The link to the above release post has the wrong caption for me. Its title says “Ambulance hits Oregon cyclist, rushes him to hospital, then sticks him with $1,800 bill, lawsuit says - Divisions by zero”
Thank you for the tips. I am actually interested in enumerating metadata for all the “items” as defined by the API page ever uploaded. For example, one item = one ID:
Archive.org is made up of “items”. An item is a logical “thing” that we represent on one web page on archive.org. An item can be considered as a group of files that deserve their own metadata.
You did cause me to look at the API docs again, though, and I think I found something that does enumerate all item names, and as a bonus, it will keep you updated when changes are made: https://archive.org/developers/changes.html
We’ll see how much progress I can make. It might take a while to get through all the millions of them.
Yes, exactly why I wanted to start this project. It’s nice to have the Internet Archive but we cannot trust that content won’t be taken down eventually. Even just storage costs might become an issue in the future for data that gets maybe 30 total views over many years. But it is nice to hear some of the data you were looking at is coming back.
Long term, it would be nice for a community of users to create a decentralized index of Internet Archive metadata so it cannot get taken down and has the torrent files of the content so people can share it and participate in the seeding for the content they care about. The Internet Archive might cooperate to make it easier to do this, for example by using Bittorrent v2 which would help us detect file duplication and not have to use padding files since all files are aligned to pieces in v2.
Currently there is little incentive for people to seed the Internet Archive content but no doubt it will become more important to do that in the future.