Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'

Stopthatgirl7@lemmy.world · 3 months ago

taladar · 3 months ago

Plenty of things are more difficult in decentralized systems.

You either have to store all kinds of data in multiple copies/caches, or accept long delays on certain operations such as search or even just displaying aggregated data (such as a post and its comments from different instances on Lemmy).

You have the problem of different jurisdictions and moderation policies for different instances.

You will have a hard time exporting or deleting all data related to a specific user when required by law (e.g. GDPR).

jatone@lemmy.dbzer0.com · 3 months ago

Difficult != Can’t be done. I’m well aware of the difficulties. Distributed systems design is one of my specialties.

GDPR only applies to your servers. Data deletion is probably the easiest part to deal with.