Hacker News

It would shock me if there's a working copy of 'rm' allowed anywhere near the Internet Archive. They take this stuff down for compliance, but my dream is that the data lives on, waiting for a saner day when the legal climate for archivists gets a little warmer.


It's tough though--supportive as I am of the Internet Archive's goals. How is an "archivist" different from a random individual who scrapes stuff off the Internet and rehosts it? In the aggregate, the Internet Archive looks different from the typical person who is copying articles and blog posts, wrapping them up in ads, and displaying them. The IA is non-profit, doesn't run ads, etc. It also respects robots.txt. But it's not that clear to me what legal regime would allow the Internet Archive to function free and clear without also covering cases that most would agree are shady.


The only difference between science and screwing around is writing it down.

You can be trained as an archivist. You can get a Masters and a PhD in archival practice. There are industry-standard procedures and codes of ethics. There's a very specific understanding of what is important to save, how to save it, and how to document its context and its provenance.

That's why the Internet Archive requires a certain fidelity of capture (WARCs) that a screenshot service or a citation tool doesn't provide.

That's also why they are legally a library. Libraries have particular copyright exemptions for preservation. A typical person doesn't. But you generally have the right to make backups for your own use, and so you can also donate those backups.

It's like if you were a famous person, and you bought a newspaper and a book, and when you died your personal effects were donated to your alma mater, which put on a big exhibit of your life and times. That newspaper (your backup of the original that lives on the publisher's hard drives) and that book (your backup of the original that some author wrote) are there, too. No one's conferring any rights to the content; the publisher still owns the newspaper, and the author still owns the book, but that was your copy, and it is now available for everyone to see.


>That's also why they are legally a library.

Citation? I'm not aware of the IA having any special status.

A physical newspaper or book isn't a backup. It's a physical artifact that falls under the first-sale doctrine. The same doesn't apply to digital copies.


> How is an "archivist" different from a random individual who scrapes stuff off the Internet and rehosts it?

If the random individual is presenting it in the same way as archive.org (namely, citing the source), then I don't see a difference.


It's acting like a library, and libraries have often wanted to have a copy of everything so that people can research it.

These often have exemptions written into law to allow them to do what they do, so I hope the IA is covered.


But libraries can't freely republish out-of-print books, which is about what the IA is doing. The equivalent would require the archive to have a room you'd visit with a terminal connected to the archive.


Unless it's child porn, say.



