Nobody who has ever used Google Books would think that it's putting complete copies online. The results include pages with hits, plus pages around them. There are so many missing pages that books aren't readable.
I do - or more accurately - I did. I remember years back there were published techniques for tweaking the URL to trick Google into serving specific pages. I'm sure people wrote software to exploit that.
Pretty sure it limits you to a set number of pages from any given book per day or something. I'm sure with sufficient effort you could work around it but a casual user can't just work their way through a complete book.
It's different. When you do what you described in a public library, you're clearly breaking the library's rules while on their premises and can be caught, which is very different from having it done by software you run from your home. The former also doesn't scale to be economically profitable, whereas the latter can be written once and run to scrape thousands of books, basically mass illegal reproduction of books.
1: I'd suggest you check out the book, bring it home with you, and do the scanning at home.
2: This is not Google's first rodeo when it comes to scraping. I'd be surprised if you got more than a few days into the described process, so no, it certainly does not scale.
In my experience the books tend to have some pages which are always shown in the preview, some pages which appear never to be viewable in preview mode, and some pages which you may be able to preview if you have not reached your limit.
So after 100 days you'll probably still have 100 incomplete books.
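To illustrate why, here's a minimal sketch of that argument. All the numbers are made-up assumptions for illustration, not Google's actual limits: if some fraction of each book's pages is simply never served in preview mode, then no number of days under a daily quota ever yields a complete book.

```python
import random

# Toy model (assumptions, not Google's actual behaviour): each book has a
# slice of pages the preview never serves, while the remaining pages are
# gated behind a daily per-book quota.

NUM_BOOKS = 100          # assumed number of books you work through
PAGES_PER_BOOK = 300     # assumed book length
NEVER_SHOWN = 0.20       # assumed fraction never viewable in preview mode
DAILY_QUOTA = 30         # assumed per-book daily page limit
DAYS = 100

random.seed(0)
complete = 0
for _ in range(NUM_BOOKS):
    # pages that the preview will never serve for this book
    never = set(random.sample(range(PAGES_PER_BOOK),
                              int(NEVER_SHOWN * PAGES_PER_BOOK)))
    seen = set()
    for _ in range(DAYS):
        quota = DAILY_QUOTA
        for page in range(PAGES_PER_BOOK):
            if quota == 0:
                break
            if page in seen or page in never:
                continue
            seen.add(page)
            quota -= 1
    if len(seen) == PAGES_PER_BOOK:
        complete += 1

print(f"complete books after {DAYS} days: {complete} / {NUM_BOOKS}")
# -> complete books after 100 days: 0 / 100
```

Under these assumptions the quota stops mattering after about a week per book; it's the never-previewable pages that guarantee every book stays incomplete.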
You can't just thumb through all the pages. It's search based, so you'd have to know what phrases to search for. Meaning, you'd need to have the book already.
(Four years ago.) The Authors Guild eventually appealed to the Supreme Court but was denied a hearing:
> 15-849 AUTHORS GUILD, ET AL. V. GOOGLE, INC.
>
> The petition for a writ of certiorari is denied. Justice Kagan took no part in the consideration or decision of this petition.
It's frustrating how the opposition was so painfully naive. As the article says, it was so clearly a case of "perfect being the enemy of the good." The following paragraphs deconstruct the sorry state of affairs that resulted:
> The irony is that so many people opposed the settlement in ways that suggested they fundamentally believed in what Google was trying to do. One of Pamela Samuelson’s main objections was that Google was going to be able to sell books like hers, whereas she thought they should be made available for free. (The fact that she, like any author under the terms of the settlement, could set her own books’ price to zero was not consolation enough, because “orphan works” with un-findable authors would still be sold for a price.) In hindsight, it looks like the classic case of perfect being the enemy of the good: surely having the books made available at all would be better than keeping them locked up—even if the price for doing so was to offer orphan works for sale. In her paper concluding that the settlement went too far, Samuelson herself even wrote, “It would be a tragedy not to try to bring this vision to fruition, now that it is so evident that the vision is realizable.”
> Many of the objectors indeed thought that there would be some other way to get to the same outcome without any of the ickiness of a class action settlement. A refrain throughout the fairness hearing was that releasing the rights of out-of-print books for mass digitization was more properly “a matter for Congress.” When the settlement failed, they pointed to proposals by the U.S. Copyright Office recommending legislation that seemed in many ways inspired by it, and to similar efforts in the Nordic countries to open up out-of-print books, as evidence that Congress could succeed where the settlement had failed.
> Of course, nearly a decade later, nothing of the sort has actually happened. “It has got no traction,” Cunard said to me about the Copyright Office’s proposal, “and is not going to get a lot of traction now I don’t think.” Many of the people I spoke to who were in favor of the settlement said that the objectors simply weren’t practical-minded—they didn’t seem to understand how things actually get done in the world. “They felt that if not for us and this lawsuit, there was some other future where they could unlock all these books, because Congress would pass a law or something. And that future... as soon as the settlement with Guild, nobody gave a shit about this anymore,” Clancy said to me.
> It certainly seems unlikely that someone is going to spend political capital—especially today—trying to change the licensing regime for books, let alone old ones. “This is not important enough for the Congress to somehow adjust copyright law,” Clancy said. “It’s not going to get anyone elected. It’s not going to create a whole bunch of jobs.” It’s no coincidence that a class action against Google turned out to be perhaps the only plausible venue for this kind of reform: Google was the only one with the initiative, and the money, to make it happen. “If you want to look at this in a raw way,” Allan Adler, in-house counsel for the publishers, said to me, “a deep pocketed, private corporate actor was going to foot the bill for something that everyone wanted to see.” Google poured resources into the project, not just to scan the books but to dig up and digitize old copyright records, to negotiate with authors and publishers, to foot the bill for a Books Rights Registry. Years later, the Copyright Office has gotten nowhere with a proposal that re-treads much the same ground, but whose every component would have to be funded with Congressional appropriations.
They've pretty much stopped scanning new books, even new out of copyright manuscripts etc.
Google Books itself opened up loads of cool possibilities for new ways to use the data from those books. This lawsuit pretty much stopped all innovation, and all the good engineers left the project years ago.
The Internet Archive’s book scanning project is still in full swing. Yes, the indexing and presentation aren't at parity with Google Books, but I prefer a non-profit digital library to be the canonical reference instead of Google.
That's great for material that's public domain or out of copyright, but the Authors Guild settlement could have digitized and made accessible orphan works that are still under copyright. It would have complemented the public domain projects, not supplanted them.
But instead academic opponents of the deal seriously thought they would have better luck pursuing copyright reform in Congress (!), and helped kill the settlement. Of course, in reality Congress did no such thing, and so the chance to rescue orphan works was lost.
While a good step, this only makes up for a portion of what the settlement would have allowed. (Most obviously, it appears this only covers books from a 20 year period and it takes more work to ascertain that the books are not being sold.)
Moreover, this does not contradict the idea that the Authors Guild settlement could have complemented public domain efforts. Even today some of the books saved on the Internet Archive were retrieved via Google Books: https://archive.org/details/googlebooks&tab=about
Funnily enough, that's also how the original article described the opposition to the Authors Guild settlement. As it turned out, killing the Google Books project didn't really move us closer to copyright reform.
HathiTrust also has great content and indexing; it’s just a shame that it’s much slower than Google Books. But between it and archive.org they’re a fine replacement.