> The biggest cost is as you imagine the streaming - getting the video to the viewer.

I seem to remember Google owns some network infrastructure? That saves some money. On top of that, at their size you are going to get things cheaper.

> The other big expense - people! All those engineers to build back and front end systems. That’s what ruined us - too many people were needed and not enough money coming in so we were burning cash.

There should be economies of scale on that. It's harder to build and maintain bigger systems, but the work required does not scale linearly with size.



Google owns vast network infrastructure. The day Google acquired YouTube (I was there), they discovered that YT was on the verge of collapse and would run out of bandwidth entirely within months. An emergency programme was put in place, with hundreds of engineers reallocated to an effort to avoid site collapse, under the clever/amusing name of BandAid.

BandAid was a success. YouTube's history might look very different if it hadn't been. It consisted of a massive crash buildout of a global CDN, something that Google historically hadn't had (CDNs aren't useful if everything you serve is dynamically generated).

One reason BandAid could happen was that Google already had a large and sophisticated NetOps operation in place, which was acquiring unused long-haul fibre wavelengths at scale for the purpose of moving the web search index about. So CDN nodes didn't need to be far from a Google backbone POP, and at that point moving bits around on that backbone was to some extent "free": the bulk of the cost was in the fibre rental and the rackspace+equipment on either end, and it was already being paid for by the needs of search+ads.

Over time Google steadily moved more stuff over to their infrastructure and off YouTube's own, but it was all driven by whatever would break next.

Then you have all the costs that YouTube has that basic video sites don't. ContentID alone had costs that would break the back of nearly any other company, but it was made effectively mandatory by the copyright lawsuits against YouTube early on. Google won those, but (according to internal legal analysis at least) largely because of the enormous good-faith and successful effort that ContentID demonstrated. And then of course you need a global ad sales team, a ton of work on ad targeting, etc.

This story explains why Google can afford to do YouTube and you can't. The reality is that whilst they certainly have some sort of internal number for what YouTube costs, all such figures are inevitably kind of fantastical, because so much infrastructure is shared and cross-subsidised by other businesses. You can't just magic up a global team of network engineers and fibre contracts in a few months, which is what YouTube needed in order to survive (and one of the main reasons they were forced to sell). No matter what price you come up with for that in internal bookkeeping, it will always be kinda dubious, because such things aren't sold on the market.


Definitely a few factual errors here that ought to be corrected.

On day one of the acquisition, YouTube's egress network was at least 4x the size of Google's, built and run by two guys. This shouldn't be a shock: you need a lot more bits to serve video than search results. For the hottest bits of content, third-party CDNs were serving videos and thumbnails.

There was no collapse imminent, but there were concerns about getting YouTube and Google infrastructure on a common path. BandAid was named as such because the goal was "not to break the internet." It was a small project with maybe a dozen members early on, all solid people.

YouTube had its own contemporaneous project, née VCR - "video cache rack". We did not necessarily believe that BandAid would arrive in a reasonable amount of time. Generally Google has two versions of every system - the deprecated one and the one that doesn't work yet.

VCR was a true YouTube project, 3 or 4 people working with one purpose. It was conceived, written, physically built and deployed in about 3 weeks with its own hardware, network setup and custom software. I believe it was lighttpd with a custom 404 handler that would fetch a video when missing. That first rack maxed out at 80Gbps a few hours into its test.
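
(For the curious, that 404-handler trick is essentially a pull-through cache. Below is a minimal Python sketch of the idea only, not the actual VCR code; the origin URL, cache path and content type are made-up placeholders, and it skips path sanitisation and everything else a production cache needs.)

    import os
    import shutil
    import urllib.request
    from http.server import HTTPServer, BaseHTTPRequestHandler

    ORIGIN_BASE = "http://origin.example.com"  # hypothetical upstream video store
    CACHE_DIR = "/var/cache/videos"            # hypothetical local cache directory

    class PullThroughCache(BaseHTTPRequestHandler):
        def do_GET(self):
            # Map the request path onto a local cache file (no traversal checks here).
            local_path = os.path.join(CACHE_DIR, self.path.lstrip("/"))
            if not os.path.isfile(local_path):
                # Cache miss: fetch the video from the origin, store it, then serve it.
                os.makedirs(os.path.dirname(local_path), exist_ok=True)
                try:
                    with urllib.request.urlopen(ORIGIN_BASE + self.path) as upstream, \
                            open(local_path, "wb") as out:
                        shutil.copyfileobj(upstream, out)
                except Exception:
                    self.send_error(502, "origin fetch failed")
                    return
            with open(local_path, "rb") as f:
                data = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "video/mp4")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    if __name__ == "__main__":
        HTTPServer(("", 8080), PullThroughCache).serve_forever()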

After several months, BandAid debuted. It launched at ~800Mbps and grew steadily from then on into what is certainly one of the largest networks in the world.

YouTube mostly moved things to Google based on what made good engineering sense. Yes, a few of the moves were based on what would break next - thumbnails leap to mind. Search, which we thought was a no-brainer and would be "easy", took more than a year to migrate - mostly due to quality issues. Many migrations were purely whimsical, or some kind of nebulous "promo project". Many more stayed separate for more than a decade. When a company gets large enough, foolish consistency tends to trump a lot of other engineering arguments.

To the ancestral poster: do not despair. You can transcode, store and serve video; as you've likely surmised, it's not all that difficult technically. In fact, it's so much easier now than it was in 2005.

What makes a great product is hard to describe and not always obvious. The economics will depend on your premise and your success. "The cloud" gets pricey as you scale. There will be a long path of cost optimization if you get big enough, but that's the price of doing business at scale.
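
(To make the "transcode, store and serve" point concrete, the whole loop now fits in a screenful of stock tooling. A rough sketch, assuming ffmpeg is installed; the file names are placeholders, and a real pipeline would add multiple renditions, HLS/DASH packaging and a CDN in front.)

    import os
    import subprocess
    from functools import partial
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    def transcode(src, dst):
        # Re-encode to H.264 video + AAC audio, a widely playable combination.
        subprocess.run(
            ["ffmpeg", "-y", "-i", src,
             "-c:v", "libx264", "-preset", "fast", "-crf", "23",
             "-c:a", "aac", dst],
            check=True,
        )

    if __name__ == "__main__":
        os.makedirs("videos", exist_ok=True)
        transcode("upload.mov", "videos/upload.mp4")  # placeholder input file
        # Serve the stored output directory over HTTP.
        handler = partial(SimpleHTTPRequestHandler, directory="videos")
        HTTPServer(("", 8000), handler).serve_forever()

As the parent says, the hard parts are the product and the economics at scale, not this loop.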


Even more detail:

YouTube continued building its own POPs AND network for ~18 months AFTER the Google acquisition; Google did not have the network capacity to carry it. (Fun fact: YT had 25 datacenter contracts and opened them at the rate of roughly one a month, starting from March 2006 - 25 contracts were set up in 2 years. At the time of the Google acquisition there were ~8, so yeah, 17 additions over the next ~16 months.)

Also, YT had a far more streamlined (but less optimized) network architecture. Traffic was originally generated in the PoP and egressed out of the PoP; there was not a lot of traffic going across backbones (unless it was going to a settlement-free peer). Initially, traffic was egressed as fast as possible. This was good for cost and not great for performance, but it did encourage peering, which also helped cost. Popular videos did go via CDN initially.

YouTube had a very scalable POP architecture. I agree with area_man that the collapse was not imminent (see: 17 additional POPs). There were growing pains, sure, but there was a fairly good system.

Also, as it relates to BandAid from a datacenter and procurement perspective: the original BandAid racks were in YT cages. YT had space in datacenters (SV1, DC3), and Google didn't. Also, the HWOps tech who went on-site to DC3 ended up tripping a breaker (they were almost escorted out).

Side-note: the evolution/offshoot of BandAid into the offnet caching system - now called Google Global Cache - is what really helped scale into provider (end-user) networks and remove a lot of load from their backbone, similar to an Akamai or a Netflix Open Connect box. Last I heard, GGC pushed significantly more traffic than the main Google network.

The Google NetOps teams that were of help in the first year of the acquisition were the peering team and some of the procurement team. The peering team helped us leverage existing network relationships to pick up peers (e.g. SBC).

The procurement team gave us circuits from providers that had a long negotiation time (e.g. Sprint).

Google also helped YouTube procure various Juniper equipment, which was then installed by the YT Team.


Thanks for the corrections. I was indeed thinking of thumbnails w.r.t. "what would break next".


This comment is a great answer to the commenters who say "Google bought YouTube/Android/..., they haven't invented anything since Search"; they miss the actual hard part entirely.

(same for Meta wrt Instagram)

These products, which have been scaled by multiple orders of magnitude since the original acquisition, are like ships of Theseus: almost everything about how they work, how they scale and how they make money has completely changed.


Great write-up! I worked for a few months at a Google datacenter and got to see the fiber endpoints a few times.

Though the idea of such networks not being sold on the market makes me ponder whether Starlink will come to provide such a service. They'd need to scale out their laser links and ground stations.


There are hard physical limits, to do with spectrum allocations, on how much bandwidth Starlink can provide, so it will always be a somewhat boutique service; indeed, prices for end users might end up climbing as word spreads and demand grows. They already practice regular dynamic pricing changes depending on demand within a cell. It doesn't make sense for corporate backbones.



