The Internet Archive and its Wayback Machine have been facing DDoS attacks for the last few days. The non-profit assured users that its collections are safe, although the service has been inconsistent since Sunday.
That wouldn’t distribute the load of storing it, though. Anyone on the torrent would need to set aside 100 PB of storage for it, which is clearly never going to happen.
Torrents are designed for incomplete storage of data. You can store and verify just a few chunks without any problem.
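To illustrate the point about verifying individual chunks, here is a minimal sketch. It assumes a flat SHA-256 digest per piece; real BitTorrent v2 arranges SHA-256 hashes of 16 KiB blocks into a Merkle tree, so treat this as a simplification. The name verify_piece is made up for the example.

```python
import hashlib


def verify_piece(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of `data` matches `expected_hex`.

    Simplification: one flat hash per piece, not the v2 Merkle tree.
    """
    return hashlib.sha256(data).hexdigest() == expected_hex


# A peer holding only this one piece can still check it on its own.
piece = b"example piece payload"
digest = hashlib.sha256(piece).hexdigest()
```

The check needs nothing but the piece and its expected hash, which is why a peer can hold a small slice of a huge torrent and still prove the slice is intact.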
You’d want a federated (or otherwise distributed) storage scheme where thousands of people could each contribute a smaller portion of storage, while also being accessible to any federated client.
Torrents. You may not have the entirety of the data, but you can request what you need from the swarm. The only limitation is that you need to know which chunk holds the data you need.
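Knowing which chunk holds the data is mostly integer division. A rough sketch, assuming fixed-size pieces and a known byte offset into the torrent's data (pieces_for_range and the 4 MiB piece size are illustrative, not taken from any client):

```python
def pieces_for_range(offset: int, length: int, piece_size: int) -> range:
    """Return the piece indices that cover bytes [offset, offset + length)."""
    if offset < 0 or length <= 0 or piece_size <= 0:
        raise ValueError("offset must be >= 0; length and piece_size must be > 0")
    first = offset // piece_size
    last = (offset + length - 1) // piece_size
    return range(first, last + 1)


# Example: 5 MB of data starting 10 MB in, with 4 MiB pieces.
needed = list(pieces_for_range(10_000_000, 5_000_000, 4 * 1024 * 1024))
```

Here needed comes out as pieces 2 and 3, so a client would only have to fetch those two pieces from the swarm.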
Ideally you’d have more than that so that a single node going down doesn’t mean permanent data loss.
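One way to get that redundancy without a central coordinator is to have every client compute the same piece-to-node mapping. The sketch below uses rendezvous (highest-random-weight) hashing, which is my choice for illustration and not something proposed in this thread; the node names and the default of 3 replicas are made up.

```python
import hashlib


def holders(piece_index: int, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` nodes for a piece using rendezvous hashing.

    Every node is scored by hashing (node, piece); the lowest scores win.
    Any client with the same node list computes the same answer.
    """
    def score(node: str) -> bytes:
        return hashlib.sha256(f"{node}:{piece_index}".encode()).digest()

    return sorted(nodes, key=score)[:replicas]


nodes = [f"node{i}" for i in range(10)]  # hypothetical volunteer nodes
assigned = holders(42, nodes)
```

A useful property of this scheme: if a node that does not hold a piece disappears, that piece's assignment does not change, and if a holder disappears, only its slot is reassigned.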
True. Until you responded I actually completely forgot that you can selectively download torrents. Would be nice to not have to manually manage that at the user level though.
Some kind of bespoke torrent client that managed it under the hood could probably work without having to invent your own peer-to-peer protocol for it. I wonder how long it would take to compute the torrent hash values for 100PB of data? :D
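~300 MB/s on one core of a 13-year-old i5 for SHA-256 (used in BitTorrent v2). Newer cores can do about half a gigabyte per second each. At that rate 100 PB works out to roughly 2,300 core-days: over 6 years on one core, or about a day on roughly 2,300 cores.* (Less than 3 days on one core would be the figure for 100 TB.)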
* assuming no additional performance penalty from increased power consumption and memory bandwidth usage
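The estimate is easy to sanity-check. This is a back-of-the-envelope sketch that assumes decimal units (1 PB = 10^15 bytes), perfect scaling across cores, and the roughly 0.5 GB/s per-core rate mentioned above; hashing_days is just a helper name for the example.

```python
PB = 10**15
TB = 10**12
MB = 10**6


def hashing_days(total_bytes: float, bytes_per_sec_per_core: float, cores: int = 1) -> float:
    """Days needed to hash total_bytes, assuming perfect scaling across cores."""
    seconds = total_bytes / (bytes_per_sec_per_core * cores)
    return seconds / 86_400


days_100pb_one_core = hashing_days(100 * PB, 500 * MB)
days_100tb_one_core = hashing_days(100 * TB, 500 * MB)
```

With these inputs, 100 PB comes to about 2,315 days on one core (a bit over six years), while 100 TB comes to about 2.3 days, so a "few days" figure matches 100 TB rather than 100 PB.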
My guess is that storage bandwidth would be the biggest bottleneck.
Found a relatively old article (in Russian; just search for openssl and look at the graph that mentions SHA-512, which is SHA-2 too) that says the i7-2500's all-core throughput is slightly over 1 GB/s.
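If you'd rather measure than trust old graphs, a quick single-core check is possible with Python's hashlib. This is only a rough sketch: it hashes an in-memory buffer of zeros, so it measures CPU throughput and ignores the storage bottleneck entirely. The function name and buffer sizes are arbitrary.

```python
import hashlib
import time


def sha256_throughput(total_bytes: int = 64 * 1024 * 1024, chunk: int = 1024 * 1024) -> float:
    """Return single-core SHA-256 throughput in bytes per second."""
    buf = bytes(chunk)  # zero-filled buffer, reused for every update
    rounds = max(1, total_bytes // chunk)
    h = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(rounds):
        h.update(buf)
    elapsed = time.perf_counter() - start
    return rounds * chunk / elapsed


if __name__ == "__main__":
    print(f"{sha256_throughput() / 1e6:.0f} MB/s")
```

Results depend heavily on whether the CPU and the OpenSSL build behind hashlib use hardware SHA instructions, so compare numbers across machines with care.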