i think plausibly being able to use youtube video as training data was the major...

solresol · on June 21, 2024

> i think plausibly being able to use youtube video as training data was the major reason for google to buy youtube in the first place

When I asked Eric Schmidt the "why did we spend so much money on buying youtube?" question his answer was "if it's the future of television, it was a bargain; if not, we overpaid."

There didn't seem to be any expectation among senior management at the time that it was anything other than "televisions carry advertisements, we want in on that market."

kragen · on June 21, 2024

thank you, my only interaction ever with eric was when he ate lunch on my wife

pentae · on June 21, 2024

Was this a Nyotaimori situation, or did you mean "with my wife"?

kragen · on June 21, 2024

we were all in a pit full of plastic balls, she was not visible

Izkata · on June 22, 2024

Don't forget they also failed to gain any traction with Google Video before buying YouTube.

numpad0 · on June 21, 2024

> plausibly ... training data was the major reason for google to buy youtube

I'd agree, and I'd also argue people were totally cool with that until LLM/GenAI happened.

Somehow it's cool and exciting if you feed in YouTube data to reconstruct historical artifacts, prototype self-driving car software, train a super-resolution algorithm, and so on, but not GenAI. It's a different thing altogether. It's a double standard, or at least a set of criteria with a hidden decisive criterion.

Just IMO, I think that "double" standard has to be discussed more. It's supposedly about copyright but something is off, and it's definitely not about monetary compensation (neither for individual works of art nor collective income support). There's something else with GenAI/LLM that makes people want it gone.

e: anecdotal datapoint that people were cool about AI until LLM/GenAI/OpenAI[1] - no talks of safety, training data provenance, societal harm, nothing negative whatsoever from a digital camera news-blog - and it's about a Diffusion model:

  Enhance! Google researchers detail new method for upscaling low-resolution images with impressive results
  Published Aug 30, 2021 | Gannon Burgett
  [...] Or is it? A new blog post on the Google AI Blog showcases a new technology its developed to upscale low-resolution images with incredible results.

1: https://www.dpreview.com/news/0501469519/google-researchers-...

tubignaaso · on June 21, 2024

Could it be a property of the transformative nature of those non-GenAI models? Using the data to create self-driving systems or enhance existing works is adding value to the pool of work. It takes the copyrighted data and creates something new. GenAI, by comparison, seems to devalue existing works. It takes the same data and creates competing works at best, straight-up copies at worst.

numpad0 · on June 22, 2024

That's one highly plausible possibility, but I also think it could be something else, e.g. people just not liking AI aesthetics.

robertlagrant · on June 21, 2024

> There's something else with GenAI/LLM that make people want it gone.

Generative AI is in the news a lot right now because clicks. "AI will take your job" is out there a lot, possibly because "writer of low resolution news articles" is what it probably could replace, and so the writers have it on their minds.

deelowe · on June 21, 2024

I doubt using it as training data was specifically the goal, but Google has always believed more data = more profit over the long term. This is why Gmail launched with unlimited storage.

Semaphor · on June 21, 2024

Was it unlimited? I only remember it being a decently high number at the time, far higher than any other freemailer.

palad1n · on June 21, 2024

I remember that Gmail was released on April 1st (on purpose), and many people thought it was a joke because it came with 1GB of storage, while places like Yahoo had like 20MB.

lolinder · on June 21, 2024

It technically had a limit, but I remember thinking of it as unlimited because the amount of storage available counted up at a faster rate than I was saving emails.

deelowe · on June 21, 2024

Maybe you're right. It's been a while. Either way, the philosophy was always to make money off the data somehow even if we didn't know how at the time.

TheDudeMan · on June 21, 2024

Correct. 1GB.

lxgr · on June 21, 2024

“And counting”, famously! I remember watching that counter crawl up and up in disbelief, when Hotmail and local alternatives were offering at most 10MB on their free plans.

ajkjk · on June 21, 2024

No way, AI wasn't on the radar back then.

dannyobrien · on June 21, 2024

At EFF we were arguing with Google about their permanent collection of user data early on (I joined in 2005, and we were already putting pressure on them then). Whenever we asked, they said they were sure it would come in useful for improving their services. Google just institutionally strongly believed in the value of data.

I sometimes wonder if the form of our current machine-learning boom is actually based on that conviction and the determined search for applications, rather than modern AI being a vindication of that strategy. A bit like Moore's Law: is it an iron rule of technology, or just a way to coordinate a huge amount of resources across an industry?

Kye · on June 21, 2024

It was a long time ago that I read a history on this, and I might be missing a detail, but the gist was Google's investors were clamoring for profits following the .com crash and Google realized the data they had was a gold mine if they could just figure out how to apply ML to it.

They tried really hard and did okay for a while using it for advertising, but DoubleClick did it better, so they bought it in 2008.

AI (ML) was absolutely on the radar.

kragen · on June 21, 2024

google has been an ai company since way before ai was fashionable. they hired norvig before i first met him in 02001. you can find dekhn comments here about larry and sergey talking with him about the central importance of ai back last millennium. i've also heard it myself from other early googlers (though not larry and sergey)

dymk · on June 21, 2024

Occam's razor. Google has always been an ads company. YouTube had big ad potential. Saying it was for the training data for AI that wouldn't exist until over a decade later ignores the obvious.

kragen · on June 21, 2024

google was an ads company before they bought youtube but they were an ai company before they were an ads company