Hacker News | new | past | comments | ask | show | jobs | submit | dumah's comments

These companies innovate in all of those areas and direct those resources toward building hyperscale custom infrastructure (CPUs, TPUs, GPUs, and custom networking hardware for the largest cloud systems), and they conduct research and development on new compilers and operating-system components to exploit that hardware.

They're building it for themselves and employ world-class experts across the entire stack.

How can NVIDIA develop "more integrated" solutions when they are primarily building for these companies, as well as many others?

Examples of these companies doing things you mention as being somehow unique to or characteristic of NVIDIA:

Complex kernel drivers or modules:

- AWS: Nitro, ENA/EFA, Firecracker, NKI, bottlerocket

- Google: gasket/apex, gve, binder

- Meta: Katran, bpfilter, cgroup2, oomd, btrfs

Hardware simulators:

- AWS: Neuron; Annapurna builds simulations for Nitro, Graviton, and Inferentia, and validates AWS instances built for EDA services

- Google: Goldfish, Ranchu, Cuttlefish

- Meta: Arcadia, MTIA, CFD for thermal management

Optimizing Compilers:

- Amazon: NNVM, Neo-AI

- Google: MLIR, XLA, IREE

- Meta: Glow, Triton, LLM Compiler

Acceleration Libraries:

- Amazon: NeuronX, aws-ofi-nccl

- Google: Jax, TF

- Meta: FBGEMM, QNNPACK


You're generalizing from a failure to deliver one consumer solution while ignoring the successful infrastructure research and development that happens behind the scenes.

Meta builds hardware from chip to cluster to datacenter scale, and drives research into simulation at every scale, all the way to CFD simulation of datacenter thermal management.


More than one failure. They had a project to build a custom chip for model training a few years ago, and they scrapped it. Now they have another one, which entered testing in March. I don't think it's going well: testing should have wrapped up recently, just before the news broke that they're in serious talks to buy a lot of TPUs from Google. On the other side of the stack, Llama 4 was a disaster and they haven't shipped anything since.

They have the money and talent to do it. As you point out, they do have major successes in areas that take real engineering. But they also have a lot of failures. It will depend on how the internal politics play out, I imagine.


The performance degradation observed with the first approach at high concurrency was recently discussed here: https://news.ycombinator.com/item?id=44490510


This conception is simplistic: a straw man that appears wholly ignorant of the Cosmopolitan tradition and of reasonable criticisms thereof.


What a fallacy to equate the status quo of lawyers running the country with the United States itself.


No, on balance it is lawyers who protect companies from the people they harm, and lawyers who constitute the government officials who perpetually exceed and expand their mandates.

Most of the Senate are lawyers, and law is the most common prior occupation among legislators.


You're letting "perfect" be the enemy of "good" here. If the alternative is China, where a mid-level bureaucrat can decide the public good outweighs your health, I'll take the lawyer-filled US.


They don’t charge fees, because they’re not a brokerage or exchange.

They pay fees to exchanges.

As a market maker, some rebates are given back conditional on their activity.

They have no users.

You’re just constantly obliviously asserting falsehoods that betray an almost comical lack of understanding of the reality of these businesses.


You have no concept of the infrastructure and organization necessary to operate these enterprises.

All your posts here are low-information anti-finance rants.


There’s tons of latency-sensitive code outside of the FPGA systems, and it is not simple.


No, there are absolutely electronic trading markets where a difference of milliseconds of latency to certain events is worth more than $1M of PnL. That’s a long time.


The top trading firms are firing off orders in double-digit nanoseconds, not milliseconds.

In some cases the order leaving the card starts to emerge before the packet containing the market data event that they're responding to has even finished arriving.

Waiting a full microsecond for the packet to arrive before responding means you're already too slow.
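For scale, a rough back-of-the-envelope calculation of serialization delay; the 10 Gbps link speed and frame sizes here are illustrative assumptions, not figures from the thread:

```python
# Serialization delay: how long a frame takes to arrive on the wire at a
# given link speed. This is why waiting for the full packet before
# responding can blow the whole latency budget of a top-tier system.

def wire_time_ns(frame_bytes: int, link_gbps: float) -> float:
    """Time for frame_bytes to serialize onto the link, in nanoseconds."""
    return frame_bytes * 8 / link_gbps  # bits / (Gbit/s) gives ns

# Minimum-size 64-byte Ethernet frame at 10 Gbps:
print(wire_time_ns(64, 10))    # 51.2 ns
# Full 1500-byte frame at 10 Gbps:
print(wire_time_ns(1500, 10))  # 1200.0 ns, i.e. 1.2 microseconds
```

Even the smallest legal Ethernet frame takes on the order of 50 ns to arrive at 10 Gbps, which is why a responder that starts emitting its order before the inbound packet has finished arriving has an edge.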

The speed game is essentially over.


Doesn't the fact that a modern FPGA-centric (probably ASICs in the mix too at this point) hybrid NIC/order-parser/state-machine thing is rumored to be able to hit glass-to-glass of ~20-40ns mean that the speed game is hotter than ever?

Do you mean that because it involves a lot of hardware design now? The days of being able to offer around the inside in C++ on a regulated securities exchange are over, but there's still C++ driving the thing. That 20 ns "tick to trade", or however it's being measured in a given instance, is still pretty basic response stuff; light speed is still a thing. There's a C++ program upstairs running the show, and it's trying to do its job in under a mike for sure.

The OG talk on this is Carl Cook's: https://www.youtube.com/watch?v=NH1Tta7purM

But there are more recent talks (Optiver is especially transparent about it, but other people talk about it too): https://www.youtube.com/watch?v=sX2nF1fW7kI; that's David Gross at CppCon last year, and it can't have changed that much since then.


That's all cool and all, but the major trading firms own properties that are physically closer to the exchanges than any competitor's, in terms of both location and routing.

No matter how fast you process the data, a 1 ms ping difference is an advantage you can never overcome.
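For scale, a quick sketch of how far a signal travels in fiber during a 1 ms head start; the refractive index of ~1.468 is a typical assumed value for single-mode fiber, not a figure from the thread:

```python
# Distance covered in optical fiber during a given head start.
C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBER_INDEX = 1.468          # assumed typical refractive index of fiber

def fiber_distance_km(ms: float) -> float:
    """One-way distance light covers in fiber in `ms` milliseconds."""
    return (C_VACUUM_KM_S / FIBER_INDEX) * (ms / 1000.0)

print(round(fiber_distance_km(1.0)))  # ~204 km
```

A 1 ms head start is worth roughly 200 km of one-way fiber distance, which is why co-location dominates: no amount of faster processing closes that gap.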

There is a reason why firms like HRT trade mostly in derivatives and futures.


That’s irrelevant to the fact that the expected PnL on a millisecond of latency improvement is a lot more than $1M in some markets. Obviously, if you are getting whatever trade you are concerned with off in less than one millisecond, the question isn’t well posed.

There are many more games to play than delta-one takeout, and the solutions certainly don’t fit on one or a handful of FPGAs.


I took the parent post to mean that a few large firms have emerged as clear winners of the speed game, and most other companies compete on (relatively) longer time scales now.


