The title of the article is "DeepSeek R2 launch stalled as CEO balks at progress" but the body of the article says the launch stalled because of a lack of GPU capacity due to export restrictions, not because of a lack of progress. The body does not even mention the word "progress".
I can't imagine demand would be greater for R2 than for R1 unless it was a major leap ahead. Maybe R2 is going to be a larger/less performant/more expensive model?
Deepseek could deploy in a US or EU datacenter ... but that would be admitting defeat.
>June 26 (Reuters) - Chinese AI startup DeepSeek has not yet determined the timing of the release of its R2 model as CEO Liang Wenfeng is not satisfied with its performance,
>Over the past several months, DeepSeek's engineers have been working to refine R2 until Liang gives the green light for release, according to The Information.
But yes, it is strange how the majority of the article is about lack of GPUs.
I am pretty sure that The Information has no access to or sources at DeepSeek. At most they are basing their article on selective, random internet chatter among those who follow Chinese AI.
Presumably there is a CEO statement somewhere. If DeepSeek said May, but it is almost July, that would call for some comment from them.
Although I'd like to know the source for the "this is because of chip sanctions" angle. SMIC is claiming they can manufacture at 5nm, and a large number of chips at 7nm can get the same amount of compute as anything Nvidia produces. It wouldn't be market-leading competitive, but delaying the release for a few months doesn't change that. I don't really see how DeepSeek production release dates and the chip sanctions could be linked in the short term. Unless they're just including that as an aside.
It is pretty strange: DeepSeek didn't say May anywhere. That was also a Reuters report based on "three people familiar with the company".[1] DeepSeek itself did not respond and did not make any claims about the timeline, ever.
The phrasing for quoting sources is extremely codified. It means the journalists have verified who the sources are (either insiders or people with access to insider information).
Sure, if you don't trust anything what's the point. There's a lot of information that relies on anonymous sources and we usually use third party to vet them (otherwise how would they stay anonymous). Without this system we'd be missing out on a lot of things (if only named sources are used, a lot of things would never come out).
(A lot of things break down in society without trust, maybe that's already how the US is? Where I live it is thankfully still somewhat ok)
The Washington Post, The New York Times, The New Republic, The Intercept, Rolling Stone, CBS News, CNN, Newsweek, USA Today, NBC News, Der Spiegel (Germany), The Sunday Times (UK), Daily Mail (UK), Al Jazeera (Qatar), RT (Russia), Xinhua (China), Press TV (Iran), Haaretz (Israel), Le Monde (France), El País (Spain) all have been caught using fake anonymous sources.
Welcome to most China news. Many "well-documented" China "facts" are in fact cases like this: the media taking rumors or straight up fabricating things for clicks, and then self-referencing (or different media referencing each other in a circle) to put up the guise of reliable news.
This is why we need to be critical of journalists nowadays. No longer are they the Fourth Estate, protecting society and democracy by providing accurate information.
That sounds to me like you are excusing a bad reality based on a nonexistent ideal. Saying "there are bad journalists" is a huge understatement. There are many, perhaps even the majority. Ask yourself why society at large has stopped trusting mainstream media: it's not just because there are a "few" bad apples but because the bad apples are widespread and systemic.
The tendency to compare to a nonexistent ideal is also something I find very, very weird. This tendency does not exist for many other concepts. For example, when people talk about communism, and someone says "hey $COUNTRY is just one bad apple, it doesn't mean real communism is bad", then others are quick to respond with "but all countries doing communism have devolved into tyranny/dictatorship/etc, so real communism doesn't exist and what we've seen is the real deal". I am not criticizing that (common) point of view, but people ought to take responsibility and apply this principle equally to all concepts, including "journalism".
It also doesn't follow that my critique of journalists/journalism means tearing down journalism altogether. It can also mean:
- that people need to stop trusting mainstream journalists blindly on topics they're not adept in. Right now many people have stopped trusting mainstream journalists only for topics they're adept in, but as soon as those journalists write nonsense about something else (e.g. $ENEMY_STATE) then they swallow that uncritically. No. The response should be "they lied about X, what else are they lying about?" instead of letting themselves be manipulated in other areas.
- that society as a whole needs to hold journalism accountable, and demand that they return to the role of the Fourth Estate.
> Ask yourself why society at large has stopped trusting mainstream media
Because certain political interests take the existence of a fact-based, independent power center as a threat to their own power?
And so engineered a multi-decade campaign to indoctrinate people against the news/media, thus removing a roadblock to imposing their own often contrary-to-fact narratives?
Pretending this happened in a vacuum or was grassroots ignores mountains of money deployed with specific intent over spans of time.
> It can also mean that society as a whole needs to hold journalism accountable, and demand that they return to the role of the Fourth Estate.
I absolutely agree with this.
If I had my druthers, the US would reinstate the fairness doctrine (abolished in 1987) and specifically the components requiring large media corporations to subsidize non-profit newsrooms as a public good.
The US would be a better place if we banned 24/7 for-profit news.
>Reporting by Deborah Sophia in Bengaluru; Editing by Arun Koyyur
Kek. Reminder: after the Sino-India drama, India has basically 0 accredited journalists in China. The chances of an Indian journalist in Bengaluru "citing two people with knowledge of the situation" at DeepSeek before it spreads over the PRC rumor mill are vanishingly small.
Yes. And that random Internet chatter almost certainly comes from people who don't know what they are talking about at all.
First, nobody is training on H20s, it's absurd. Then their logic was: because of the high inference demand for DeepSeek models there is high demand for H20 chips, and H20s were banned, so better not release new model weights now, otherwise people would want H20s even more.
Which is... even more absurd. The reasoning itself doesn't make any sense. And the technical part is just wrong, too. Using H20 to serve DeepSeek V3 / R1 is just SUPER inefficient. Like, R1 is the most anti-H20 model released ever.
The entire thing makes no sense at all and it's a pity that Reuters fell for that bullshit.
MLA uses way more flops in order to conserve memory bandwidth, H20 has plenty of memory bandwidth and almost no flops. MLA makes sense on H100/H800, but on H20 GQA-based models are a way better option.
Not sure what you are referring to—do you have a pointer to a technical writeup perhaps? In training and inference MLA has way less flops than MHA, which is the gold standard, and way better accuracy (model performance) than GQA (see comparisons in the DeepSeek papers or try deepseek models vs llama for long context.)
More generally, with any hardware architecture you use, you can optimize the throughput for your main goal (initially training; later inference) by balancing other parameters of the architecture. Even if training is suboptimal, if you want to make a global impact with a public model, you aim for the next NVidia inference hardware.
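The flops-versus-bandwidth trade-off argued over in the comments above can be sketched with a simple roofline estimate. This is only an illustration of the shape of the argument: the two chips and the two attention kernels below are hypothetical, with made-up round numbers rather than real H20/H100 specifications or measured MLA/GQA costs. It says nothing about which real attention variant falls into which category, which is exactly what the two commenters disagree on.

```python
def step_time(flops, bytes_moved, peak_flops, bandwidth):
    """Roofline estimate: a kernel is limited by whichever is slower,
    doing the arithmetic or moving the data."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / bandwidth
    return max(compute_time, memory_time)


def bound(flops, bytes_moved, peak_flops, bandwidth):
    """Report which resource limits the kernel on this chip."""
    if flops / peak_flops > bytes_moved / bandwidth:
        return "compute"
    return "memory"


# Hypothetical accelerators (made-up numbers, NOT real spec-sheet values).
chip_a = {"peak_flops": 1000e12, "bandwidth": 3e12}  # lots of flops
chip_b = {"peak_flops": 150e12, "bandwidth": 4e12}   # few flops, lots of bandwidth

# Hypothetical attention kernels for one decode step (also made-up numbers).
flops_heavy = {"flops": 200e9, "bytes_moved": 1e9}   # more math, less traffic
bytes_heavy = {"flops": 20e9, "bytes_moved": 4e9}    # less math, more traffic

for chip_name, chip in [("chip_a", chip_a), ("chip_b", chip_b)]:
    for kernel_name, kernel in [("flops_heavy", flops_heavy),
                                ("bytes_heavy", bytes_heavy)]:
        t = step_time(**kernel, **chip)
        print(f"{chip_name} {kernel_name}: {t * 1e3:.3f} ms "
              f"({bound(**kernel, **chip)}-bound)")
```

Under these assumed numbers, the kernel that trades extra arithmetic for less memory traffic wins on the flops-rich chip and loses on the bandwidth-rich one. Whether that ordering holds for real models depends on the actual per-token flop and byte counts, which this sketch does not supply.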
Didn't DeepSeek figure out how to train with mixed precision and so get much more out of the cards, with a lot of the training steps able to run at precisions traditionally associated with post-training quantization (block compressed)?
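For readers unfamiliar with what "block compressed" means here, the general idea of block-wise scaled quantization can be sketched as follows. This is a minimal illustration, assuming 8-bit integer codes as a stand-in for a low-precision float format and a tiny block size; the function names are hypothetical and this is not DeepSeek's actual training recipe.

```python
def quantize_blocks(values, block_size=4, levels=127):
    """Quantize a flat list of floats block by block.

    Each block gets its own scale, so one outlier only degrades
    the precision of its own block.
    Returns a list of (scale, [integer codes]) pairs.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Avoid dividing by zero for an all-zero block.
        scale = max_abs / levels if max_abs > 0 else 1.0
        codes = [round(v / scale) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blocks(blocks):
    """Reconstruct approximate floats from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]


values = [0.01, -0.02, 0.03, 0.015, 100.0, -50.0, 25.0, 10.0]

per_block = dequantize_blocks(quantize_blocks(values, block_size=4))
one_scale = dequantize_blocks(quantize_blocks(values, block_size=8))

print("per-block scales:", per_block)
print("single scale:    ", one_scale)
```

With a single scale for the whole list, the large values dominate and the small ones collapse to zero. With one scale per block, the small values survive the round trip. That is the intuition behind fine-grained scaling in low-precision training, though real schemes differ in format and block shape.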
It's not about people wanting to keep it in moats.
It's about China being expansionist, actively preparing to invade Taiwan, and generally becoming an increasing military threat that does not respect the national integrity of other states.
The US is fine with other countries having AI if the countries "play nice" with others. Nobody is limiting GPUs in France or Thailand.
This is very specific to China's behavior and stated goals.
Or there is also an interpretation of state actions that says that a government serves itself first and foremost. Unfortunately this includes governments you like and also governments you do not like. Before rising to invective here, also consider that this is not new at all: early kingdoms were ruled by kings, and abuse of power was more common than not, so lots of present-day politics around the entire world does have alternatives to simple despotic will.
So instead of splitting hairs about that description, let's highlight an idea: millions of people doing millions of things per day constitute their own system, whatever name you call it or whoever collects the taxes. Observing the actual behavior of that system ("data driven"?) has more benefits than hairsplitting over nomenclature for political studies.
Why bother writing this? Because simplistic labels for government actions in international affairs are Step 2 of "brain-off" us-versus-them thinking.
Let's find ways to remove fuels from the fires of war. The stakes are too high. Third call to start thinking instead of invective here. Negotiation and trade are the tools. Name calling on those that work for "peace" is Step 2 again. IMHO
>It's about China being expansionist, actively preparing to invade Taiwan, and generally becoming an increasing military threat that does not respect the national integrity of other states.
Remove the word Taiwan and you are describing the US.
>It's about China being expansionist
The US has been doing that since its inception as a country. Are you telling me the US's 750 foreign military bases located in at least 80 foreign countries and territories are NOT expansionism? Come on!
>actively preparing to invade Taiwan
The US illegally invaded Iraq and Afghanistan for 20 years, killing and torturing innocents in the process and leaving the Taliban in power to cause further harm. How many countries did China invade? Yet somehow China is the boogeyman? Please!
> generally becoming an increasing military threat that does not respect the national integrity of other states.
Same with the US, Trump threatened to annex Greenland and Canada, yet I don't see sanctions on the US.
I don't see the US having any ground to stand on criticizing China.
> Thomas Edison's aggressive patent enforcement in the early days of filmmaking, particularly his control over motion picture technology, played a significant role in the development of Hollywood as the center of the film industry. Driven by a desire to control the market and eliminate competition, Edison's lawsuits and business practices pushed independent filmmakers westward, ultimately leading them to establish studios in Los Angeles, away from Edison's legal reach.
So you're capable of knowing what the judges will decide on these cases? You've already decided that they are liable for what they're accused of?
And isn't this the system working exactly how it is supposed to? Someone makes a claim and the courts decide, and then some kind of punishment will be doled out if the claim is found to be true?
No, I'm not. You made the claim " I’d think it’d be all the companies in the US ignoring IP and copyright laws" - and then point to ongoing cases, where the outcome hasn't even been decided yet. They may be ignoring IP and copyright laws, but no one knows whether they are or aren't yet.
I recall kids in cages back before Trump was around. The US doesn't exactly have a clean track record when it comes to human rights and international law, yet they are quick to point the finger at anyone else when they cross the line.
They are aiming for world domination by buying themselves into businesses all over the planet and by building up a very large army. But that’s just normal human behavior I guess.
The problem is rather that if the only moral compass is the communist party it will suck
Taiwan is a tricky case. The CCP isn't unjustified in making a claim to it. Granted: that claim is contrary to international norms, law, and the population's self determination.
But if China were only threatening to invade Taiwan it would be a gray area.
Imho, their claims in the South China Sea are much more obviously expansionist, given the settled cases against them under international law.
Much easier to see those boiling over into China invading a few populated islands of the Philippines.
Okay so thanks very much. That's not really a citation that's an opinion?
To translate what you're saying. The Chinese are trying to establish the same kind of global trade collaboration that Europe and the US have done for the past hundred and x years? But the Chinese civilization is over 2000 years old, and they had a much larger global trade network back when the west was a pile of wooden shacks and feudal barbarism?
They're also building up a large army in the same way that the US and Europe have with NATO? I'm also not really sure what's wrong with the moral compass of the Chinese communist party? From what I can see at the moment it is authoritative, but not necessarily venal?
It seems that the Chinese people themselves are enjoying a pretty good standard of living and quality of life? I've only been there two or three times, but I never saw the same kind of deprivation in China that I saw behind the Iron Curtain for instance.
> I'm also not really sure what's wrong with the moral compass of the Chinese communist party? From what I can see at the moment it is authoritative, but not necessarily venal?
It's certainly corrupt. Xi didn't launch major, disruptive anti-corruption drives for no reason, but because he saw it as an existential threat to the CCP's legitimacy (after all, it did torpedo the Soviet Union).
Granted, an alternate rationale was internecine power struggles within the party and removing political enemies, but there was some real corruption.
The strongman argument against the CCP's moral compass is that it has no concept of or respect for individual rights: the party is above all.
Historically, this has always ended tragically because eventually it will be abused to either justify suffering or party gain at the expense of people.
Authoritarianism only works until someone bad grabs the reins, and single-party non-democratic systems have a way of rewarding sociopaths.
The fact that the US still has functioning separation of powers is counter evidence.
People may gripe about fuzzy areas being stepped on and norms pushed (and they should gripe!), but there's a huge chasm between separation of powers in democracies and China.
Calling in the marine guard without Congress's approval seems a little bit un-separate, but I'm not an expert so I'm not going to continue this conversation. You have an opinion and I have my very inexpert one too.
But DeepSeek doesn't actually need to host inference if they open-source it, right? I don't see why these companies even bother to host inference. DeepSeek doesn't need outreach (everyone knows about them) and the huge demand for SOTA will force western companies to host them anyway.
Releasing the model has paid off handsomely with name recognition and making a significant geopolitical and cultural statement.
But will they keep releasing the weights or do an OpenAI and come up with a reason they can't release them anymore?
At the end of the day, even if they release the weights, they probably want to make money and leverage the brand by hosting the model API and the consumer mobile app.
If they continue to release the weights plus detailed reports of what they did, I seriously don't understand why. I mean, it's cool. I just don't understand why. It's such a cutthroat environment where every little bit of moat counts. I don't think they're naive. I think I'm naive.
Now they are firmly on the map, which presumably helps with hiring, doing deals, influence. If they stop publishing something, they run the risk of being labelled a one-hit wonder who got lucky.
If they have a reason to believe they can do even better in the near future, releasing current tech might make sense.
I think those are valid points but it's hard for me to see that this is worth it. With the might of the CCP behind them and the giant labor pool that is China, surely they can make hiring work either way. If they now start offering a model that's cheaper and better than anyone else's, surely everyone will take notice, even if the weights are not open.
If moving faster is a moat, then open source AI could move faster than closed AI by not needing to be paranoid about privacy and by welcoming external contributions.
> I don't think any of these companies are aiming at long term goal of making money from inference pricing of customers.
What is DeepSeek aiming for if not that, which is currently the only thing they offer that costs money? They claim their own inference endpoints have a cost profit margin of 545%, which might be true or not, but the very fact that they mentioned this at all seems to indicate it is of some importance to them and others.
Well, it's certainly helpful in the interim that they can recoup some money from inference. I'm just saying that systems with more intelligence in the future can be used to make money in much better ways than charging customers for interacting with them. For instance, such a system could conduct research on projects which can generate massive revenue if successful.
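To put the 545% figure in perspective, here is a small worked calculation. It assumes "cost profit margin" means profit divided by cost, i.e. (revenue - cost) / cost; that reading is an assumption, and the function names are hypothetical.

```python
def revenue_multiple(cost_profit_margin):
    """Revenue as a multiple of cost, given margin = (revenue - cost) / cost."""
    return 1.0 + cost_profit_margin


def profit_share_of_revenue(cost_profit_margin):
    """Fraction of revenue that is profit under the same definition."""
    return cost_profit_margin / (1.0 + cost_profit_margin)


margin = 5.45  # the 545% figure quoted above, as a fraction

print(f"revenue = {revenue_multiple(margin):.2f} x cost")
print(f"profit share of revenue = {profit_share_of_revenue(margin):.1%}")
```

Under that definition, a 545% margin means revenue of about 6.45 times cost, with roughly 84% of revenue being profit. Whether the claimed figure reflects actual billed revenue is a separate question the arithmetic cannot answer.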
I am a bit sceptical about whether this whole thing is true at all. This article links to another, which happens to be behind a paywall. "GPU export sanctions are working" is a message a lot of US administration people and investors want to hear, so I think there's a good chance that unsubstantiated speculation and wishful thinking are presented as fact here.
Given that DeepSeek is used by the Chinese military, I doubt that it would be a reasonable move for them to host in the U.S., because the capability is about more than profit.
The lack of GPU capacity sounds like bullshit though, and it's unsourced. It's not like you can't offer it as a secondary thing, sort of like O-3 or even just turning on the reasoning.
I think my real problem with it is that how slow it is is easy to predict beforehand. If it doesn't meet the goals set for it speed-wise, they could go for something smaller.
It should only be quality which could be unpredictable before training.