The stock price had assumptions baked in about the number of units expected to be sold. DeepSeek cut that hardware estimate by as much as 45x. That's the fairly obvious connection between the model being very efficient to train and NVDA dropping 18%.
I don’t get it. The labs have regularly made improvements that dramatically lower the cost of training an equal-performing model. When they do this, they also train a larger model with even higher performance. This time, DeepSeek did the first part but didn’t do the second. Now every lab in the world will throw their compute into the effort to replicate and beat DeepSeek’s model with larger scale. It’s not like everyone is just going to say “well I guess AI is smart enough now, no point improving it anymore!” and stop building bigger training clusters.
If anything, r1 makes GPU demand even more likely to grow, since it mitigated, or at least delayed, the risk that AI has hit a dead end (in which case ceasing development might actually make sense).
Define "dramatically" with numbers. From all the sources I've read, this one was significant: it was run on a far more limited cluster, and the results are as good as the other frontier models'. Optimizations have been coming steadily, but I think the ones DeepSeek found were significantly larger.
It still doesn't make sense to me. If the money for training is still there, wouldn't companies that can afford it use the efficiency gains and also scale up models?
Unless AI is a bubble, and it pops, I can't see the demand for compute going down.
I think AI is a bubble. The amount of compute needed for inference is vastly overestimated, because a lot of caching is coming. The estimates are driven by maniacal statements like Sam Altman's insistence that we must spend trillions on compute to achieve AGI, and that this is more important than anything else.
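To make the caching point concrete, here's a minimal sketch (entirely hypothetical, not any provider's actual implementation) of the simplest form: memoizing responses to identical prompts so repeated requests skip the expensive GPU call. Production systems go further and reuse KV-cache state for shared prompt prefixes, but even this toy version shows how cache hits reduce inference compute.

```python
import hashlib

class InferenceCache:
    """Toy response cache keyed by prompt text (illustrative only)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prompt: str) -> str:
        # Hash the prompt so the cache key is fixed-size.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def generate(self, prompt: str, model_fn) -> str:
        """Return a cached response, calling model_fn only on a miss."""
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = model_fn(prompt)  # the expensive model call
        self._store[key] = result
        return result
```

With a workload where many users send the same prompt, every repeat is served from the cache and never touches the model, which is the mechanism behind the claim that inference compute demand is overestimated.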
Project Stargate is some large fraction of that, and of course SoftBank is no stranger to losing money by overestimating demand (for example, WeWork). To be fair, China has plenty of overestimated demand too (for example, Evergrande). The other factor is that rapid competition leads to overinvestment by all parties.
Which is great for us, we'll have loads of cheap compute and hopefully a bunch more carbon free energy supply, assuming that the AI stuff all ends in tears (for now).
Yep! Shareholders and capitalists overinvesting in stuff is great if it leaves behind great infrastructure. They take the risk and the public benefits.
There is a belief that we've peaked in terms of "bigger model = better results." I think GPT-4, for instance, actually has a smaller parameter count than 3.5 while performing better. There is also thought to be a finite amount of useful training data, which we may have already exhausted, so adding more parameters isn't helpful if you don't have new data to train them on.
Can someone explain how DeepSeek cut that estimate? Their (fast) API is always down, and the third-party providers on OpenRouter are more expensive than Claude.