Lawsuit Accuses Anna's Archive of Hacking WorldCat, Stealing 2.2 TB Data

ancuuiqter@lemmy.world · edit-2 10 months ago

Snot Flickerman@lemmy.blahaj.zone · 10 months ago

https://annas-blog.org/worldcat-scrape.html

WorldCat

That is when we set our sights on the largest book database in the world: WorldCat. This is a proprietary database by the non-profit OCLC, which aggregates metadata records from libraries all over the world, in exchange for giving those libraries access to the full dataset, and having them show up in end-users’ search results.

Even though OCLC is a non-profit, their business model requires protecting their database. Well, we’re sorry to say, friends at OCLC, we’re giving it all away. :-)

Over the past year, we’ve meticulously scraped all WorldCat records. At first, we hit a lucky break. WorldCat was just rolling out their complete website redesign (in Aug 2022). This included a substantial overhaul of their backend systems, introducing many security flaws. We immediately seized the opportunity, and were able to scrape hundreds of millions (!) of records in mere days.

After that, security flaws were slowly fixed one by one, until the final one we found was patched about a month ago. By that time we had pretty much all records, and were only going for slightly higher quality records. So we felt it is time to release!

MotoAsh@lemmy.world · edit-2 10 months ago

Yea OK they’re fucked. I really really doubt they’ll be able to claim the data is solely comprised of the open works saved within that database. The only way they’d be able to get away with it is if they’ve meticulously harvested the data such that they only ever retrieved the open works or public domain works.

Anything not in that list or otherwise made available solely via their nonprofit efforts is going to be ammo in the lawsuit. Ammo that will hit its target.


Lawsuit Accuses Anna's Archive of Hacking WorldCat, Stealing 2.2 TB Data * TorrentFreak