It will probably end up being the most popular things, the most viewed or read. The more copies of something exist, the more likely it is to be archived.
This reminds me of books. I’m sure the majority of books from over a hundred years ago are lost because they weren’t popular. We haven’t really noticed their absence…
> I’m sure the majority of books from over a hundred years ago are lost
Especially if you include independently published books that weren't widely circulated. I wonder what percentage of total books this is.
My grandfather published a book before he passed away. It was never sold online or in any big retail stores. Once the last hard copy is lost, it's gone forever.
That's interesting. Many countries have laws (or customs) requiring that everything published in book form be submitted to the national library. I know this is done here in Poland too, because when my partner had her book published by a small publisher, "providing a copy of the book to the national library" was one of the publisher's responsibilities. I have no idea whether it's a law or just a custom.
> I’m sure the majority of books from over a hundred years ago are lost
If they were in one of the university libraries that Google scanned, they're not "lost." But you're right; you can't read them. Congress should mandate that the Library of Congress, at least, get a copy to preserve them for the ages.
When you publish a book or magazine in France, you're required to give two copies to the national library for archival purposes. Doesn't something like that exist in other countries?
It certainly does in Spain. We even extended it to video games, although I don't know how much that achieves when so many games are barely playable before the first few patches, have much of their content released in later updates, and become unplayable once the servers close.
Not all books in the state libraries are equal. Historical copies and popular authors (popular among researchers, which is already a much bigger set) are exhibited and get attention, while John Doe's book of family recipes gets sent to some giant dark warehouse that people rarely visit.
It is easy to forget that this is an 18th-century solution born of an 18th-century approach to knowledge. Back then, bibliographies of everything printed in a given year in a given country could be compiled, and they were meant to be more than just lists: they were supposed to help other men of books keep up with Progress.
Apparently it was required by the Library of Congress in the U.S., but a federal appeals court may have nixed that under the Fifth Amendment's Takings Clause (the government must pay compensation if it requires you to turn over property).
This is pretty much the best marketing for IPFS. The availability and number of backups of any piece of data are directly correlated with the number of people who use it.
For example, if LLM weights were distributed over IPFS instead of corporate infrastructure (basically zero redundancy), the popular models would be highly available and have a near-zero chance of being lost.
To put it another way: the Llama models likely have tens of thousands of downloads, which would mean tens of thousands of servers holding backups of the data, versus essentially just one today.
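To put rough numbers on the redundancy argument: if each copy were lost independently with probability p over some period, the chance that all n copies disappear is p^n. The sketch below is only an illustration under that independence assumption; the function name and the 50% per-copy loss figure are hypothetical. Real-world losses are correlated (shared hosting, the same takedown), and an IPFS node only keeps data it has pinned, so this is an optimistic upper bound on what replication buys you.

```python
def loss_probability(p_copy_lost: float, copies: int) -> float:
    """Chance that every copy is lost, ASSUMING each copy is lost
    independently with probability p_copy_lost (a simplification)."""
    if not 0.0 <= p_copy_lost <= 1.0:
        raise ValueError("p_copy_lost must be between 0 and 1")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    # Independent events: multiply the per-copy loss probability n times.
    return p_copy_lost ** copies


# Hypothetical, deliberately pessimistic 50% chance of losing any one copy.
for n in (1, 10, 100):
    print(n, loss_probability(0.5, n))
```

Even with that pessimistic per-copy figure, ten independent copies push the chance of total loss below 0.1%.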
We need IPFS for data distribution. Tight integration with Git repositories would be an obvious fit as well.