1.3B Worldcat scrape and data science mini-competition

Anna’s Archive has scraped all of Worldcat, the world’s largest library metadata collection, to build a TODO list of books that still need to be preserved. It is also hosting a data science mini-competition that invites others to analyze the data and surface interesting insights. The dataset consists of Worldcat library records from various OCLC member libraries and covers books, magazines, journals, and more. Anna’s Archive obtained the records by meticulously scraping Worldcat’s website during a period when security flaws were present. The dataset, available on the Anna’s Archive torrents page, contains 1.3 billion unique IDs and 1.8 billion records. The challenge is open to everyone; the top three submissions will receive a year-long membership to Anna’s Archive and have their work featured in a blog post.

https://annas-blog.org/worldcat-scrape.html
