DeepSeek’s AI breakthrough rivals top models at a fraction of the cost, proving open source innovation is reshaping AI’s future. Is this an AI race or an open vs. closed battle?
Apparently DeepSeek is lying: they were collecting thousands of NVIDIA chips in violation of the US embargo, and it’s not about the algorithm. The model’s good results come just from sheer chip volume and energy used. That’s the story I’ve heard, and honestly it sounds legit.
Not sure if this question has been answered, though: if it’s open source, can’t we see what algorithms they used to train it? If we could, we would know the answer. I assume we can’t, but if we can’t, then what’s so cool about it being open source? What parts of the code are valuable besides the algorithms?
So are these techniques really that novel, such a breakthrough? Will we now have a burst of deepseek like models everywhere? Because that’s what absolutely should happen if the whole story is true. I would assume there are dozens or even hundreds of companies in the USA, especially in the finance sector and in AI research, that possess a similar number of chips, or surely more, than the Chinese team claims to have trained their model on.
The general concept, no. (it’s reinforcement learning, something that’s existed for ages)
The actual implementation, yes. (training a model to think using a separate XML section, reinforcing with the highest quality results from previous iterations using reinforcement learning that naturally pushes responses to the highest rewarded outputs) Most other companies just didn’t assume this would work as well as throwing more data at the problem.
This is actually how people believe some of OpenAI’s newest models were developed, but the difference is that OpenAI was under the impression that more data would be necessary for the improvements, and thus had to continue training the entire model with additional new information, and they also assumed that directly training in thinking times was the best route, instead of doing so via reinforcement learning. DeepSeek decided to simply scrap that part altogether and go solely for reinforcement learning.
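To make the "pushes responses to the highest rewarded outputs" part concrete, here is a toy sketch of a policy-gradient (REINFORCE-style) update. It is not DeepSeek’s training code: the three candidate "responses", their reward values, and the names `train_toy_policy` and `toy_rewards` are all made up for illustration. It only shows the general mechanism of sampling an output, scoring it, and nudging the policy toward whatever scored above average.

```python
import math
import random

def softmax(logits):
    # Convert raw scores into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_toy_policy(rewards, steps=2000, lr=0.1, seed=0):
    # Hypothetical setup: each index stands for one candidate response,
    # and rewards[i] is the (assumed) score a grader gives that response.
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[choice]
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # REINFORCE update: raise the sampled response's logit when it
        # beat the baseline, lower it when it did worse.
        for i in range(len(logits)):
            grad = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)

toy_rewards = [0.1, 0.5, 1.0]  # assumed scores, not real data
final_probs = train_toy_policy(toy_rewards)
```

After training, most of the probability mass should sit on the last response, the one with the highest reward. A real system does this over full generated texts with a far larger model, but the direction of the update is the same idea.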
Will we now have a burst of deepseek like models everywhere?
Probably, yes. Companies and researchers are already beginning to use this same methodology. Here’s a writeup about S1, a model that performs up to 27% better than OpenAI’s best model. S1 used Supervised Fine Tuning, and did something so basic, that people hadn’t previously thought to try it: Just making the model think longer by modifying terminating XML tags.
This was released days after R1, based on R1’s initial premise, and creates better quality responses. Oh, and of course, it cost $6 to train.
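For anyone wondering what "making the model think longer by modifying terminating tags" could look like in practice, here is a minimal sketch under some loud assumptions: the `</think>` tag, the `think_longer` helper, and the `toy_model` stand-in are all hypothetical, and this is not the S1 authors’ code. The idea it illustrates is that when the model tries to close its thinking section early, you cut the closing tag, append a nudge word such as "Wait,", and let it keep generating.

```python
END_TAG = "</think>"  # hypothetical end-of-thinking marker

def think_longer(generate, prompt, extra_rounds=1, nudge="Wait,"):
    # `generate` is any callable that takes the text so far and returns
    # the next chunk of text (an assumption made for this sketch).
    text = prompt
    forced = 0
    while True:
        chunk = generate(text)
        if END_TAG in chunk and forced < extra_rounds:
            # The model tried to stop thinking: drop the end tag,
            # append the nudge, and ask it to continue.
            text += chunk.split(END_TAG)[0].rstrip() + " " + nudge + " "
            forced += 1
            continue
        text += chunk
        return text

def toy_model(text):
    # Stand-in for a real model, purely for demonstration.
    if "Wait," in text:
        return "let me double-check: 2 + 2 = 4." + END_TAG
    return "2 + 2 is 4." + END_TAG

result = think_longer(toy_model, "<think>")
```

With the toy model, the first attempt to close the thinking section is suppressed once, so the final text contains a single closing tag and one extra round of "thinking". Raising `extra_rounds` forces more rounds.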
So yes, I think it’s highly probable that we see a burst of new models, or at least improvements to existing ones. (Nobody has a very good reason to make a whole new model of a different name/type when they can simply improve the one they’re already using and have implemented)
Note that s1 is transparently a distilled model instead of a model trained from scratch, meaning it inherits knowledge from an existing model (Gemini 2.0 in this case) and doesn’t need to retrain its knowledge nearly as much as training a model from scratch. It’s still important, but the training resources aren’t really directly comparable.
True, but I’m of the belief that we’ll probably see a continuation of the existing trend of building and improving upon existing models, rather than always starting entirely from scratch. For instance, you’ll almost always see nearly any newly released model talk about the performance of their Llama version, because it just produces better results when you combine it with the existing quality of Llama.
I think we’ll see a similar trend now, just with R1 variants instead of Llama variants being the primary new type used. It’s just fundamentally inefficient to start over from scratch every time, so it makes sense that newer iterations would be built directly on previous ones.
There’s so much misinfo spreading about this, and while I don’t blame you for buying it, I do blame you for spreading it. “It sounds legit” is not how you should decide to trust what you read. Many people think the earth is flat because the conspiracy theories sound legit to them.
DeepSeek probably did lie about a lot of things, but their results are not disputed. R1 is competitive with leading models, it’s smaller, and it’s cheaper. The good results are definitely not from “sheer chip volume and energy used”, and American AI companies could have saved a lot of money if they had used those same techniques.
https://www.youtube.com/watch?v=RSr_vwZGF2k
This is what I watched, and I base my opinion on it. I’m not saying it’s true. It just sounded legit enough, and I didn’t have time to research more. I will gladly follow any links to content that destroys this guy’s arguments.
My god, the preamble for that thing is so dang long. 13:30 with some AI sponsorship the comments are talking about I may have accidentally skipped over, and only 10:27-11:37 deals with what you’re talking about. The video makes a good point that they have existing operating infrastructure. However, for the stockpiling accusation, the statements that it cites are from the CEO of big competitor “Chips AI”, who cites nothing except “only costing $6 million is impossible, therefore it actually cost more and they must have cheated! I think they have 50,000 illegally imported Nvidia GPUs!” which, to me, just sounds like the behavior of a cult ringleader trying to maintain power. The other source it cites for this claim is Elon Musk, whose reasoning was “Obviously”.
I just think that no matter whether DeepSeek smuggled or not, an investigation into whether or not they smuggled is of course going to be launched. I do want more transparency regarding where the Singapore billing goes, but that alone is too shaky for conclusions.
WTF dude. You mentioned Asia. I love Asians. Asia is vast. There are many countries, not just China, bro. I think you’re the one who needs to do some self-reflection.
I’m talking about the very specific case of the Chinese DeepSeek devs potentially lying about the chips. The assumptions and generalizations you’re making are crazy.
Well, maybe. Apparently some folks are already doing that, but it’s not done yet. Let’s wait for the results. If everything is legit, we should have not one but plenty of similar and better models in the near future. If the Chinese did this with 100 chips, imagine what can be done with the 100,000 chips that Nvidia can sell to a US company.
I don’t like this. Everything you’re saying is true, but this argument isn’t persuasive, it’s dehumanizing. Making people feel bad for disagreeing doesn’t convince them to stop disagreeing.
A more enlightened perspective might be “this might be true or it might not be, so I’m keeping an open mind and waiting for more evidence to arrive in the future.”
Not the original commenter, but what they’re saying holds true. The “sounds legit” mindset is the main driving force behind misinformation right now.
The only way to combat it is to truly gain the knowledge yourself. Accepting things at face value has led to massive disagreements on objective information, and has allowed anti-science mindsets to flourish.
Podcasts are the medium that I give the most blame to. Just because someone has a camera and a microphone, viewers believe them to be an authority on a subject, and pairing this with the “sounds legit” mindset has set back critical thinking skills for an entire population.
It’s just my opinion, based on a few sources I saw on the web. Should I attach them as links to the comment? I guess I could. But that’s extra time which I’m not sure I want to spend. Imagine a discussion where both sides provide links and sources for everything they say. Would it be great? I guess? But at the same time it would be very difficult and time-consuming for both sides. Nobody does that on today’s internet. Nobody ever did that in casual conversations. Not just on the internet, actually, but in real life too. Providing evidence is generally for court talk.
You are right. We are all on our own in the pursuit of truth. And with the rise of AI and fake reality, things are going to get crazier and crazier each year. Pair that with the fact that our brains have limited storage capacity for information and knowledge, and it doesn’t look bright for humans. I stay optimistic despite that, though.
I disagree with you, links are not that long to share. It is a bit more time consuming obviously, but everyone can choose whether to read quickly or really dive in the sources. I see a lot of people doing it today on internet. I see a lot of people doing it in casual conversation (opening a book or internet to check smthg). It’s not evidence, it’s hints to avoid launching a whole discussion that entirely lies or bullshit (or not).
Here are some links I found about smuggled chips.
Reuters: DeepSeek said they used legally imported old and new Nvidia chips (H800s and H20s). There are suspicions and investigations, directly targeting DeepSeek, about the illegal smuggling of Nvidia chips that are banned from export. The CEO of an American AI startup said it is likely DeepSeek used smuggled chips.
The Diplomat: exactly the same, citing Reuters directly. It adds that the H800 (now banned from export) and the H20 were designed by Nvidia specifically for the Chinese market. It also adds that smuggling could go through Singapore, whose share of Nvidia’s revenue leaped from 9% to 22% in 2 years. Nvidia and Singapore representatives deny it.
So it is likely there are smuggled chips in China, if we believe this. But to say that they were used by DeepSeek, and even more that they were decisive, is still very unclear.
We already have all the evidence. This isn’t some developing story, the paper is reproducible. What’s dehumanizing is assuming that Asians can’t make good software.
The open paper they published details the algorithms and techniques used to train it, and it’s been replicated by researchers already.
Sauce?
It’s open sauce.
internet
Elaborate? Link? Please tell me this is not just an “allegedly”.
It’s your burden of proof, bud.
This is after all, a court of law.
No one here is going to be involved with any of it.
um, yeah?
So stop trying to grill people as if anyone here is a lawyer.
It’s time for you to do some serious self-reflection about the inherent biases you believe about ~~Asians~~ Chinese people.
And how do your feelings stand up to the fact that independent researchers find the paper to be reproducible?
“China bad”
*sounds legit
“Sounds legit” is what one hears about FUD spread by anglophone media every time the US oligarchy is caught with their pants down.
Snowden: “US is illegally spying on everyone”
Media: Snowden is Russia spy
*Sounds legit
France: US should not unilaterally invade a country
Media: Iraq is full of WMDs
*Sounds legit
DeepSeek: Guys, distillation and mixture of experts is a way to save money and energy, here’s a paper on how to do the same.
Media: China bad, deepseek must be cheating
*Sounds legit
More people need to read Jurassic Park.
Damn, you sound like a bot, you know that? No typos, perfect answer. Seriously. Can you prove you’re a human? xd
There’s actually a typo: I wrote “relies or bullshit” instead of “on bullshit”.
Sounds legit
Yup. That’s the internet nowadays. Full of comments like this. Can’t do much about it.
This is your brain on Chinese/Russian propaganda.
Can you point out any factual inaccuracies or is it just that your wittew fee-fees got hurt?
Ah, cool, a new account to block.
Cope be strong in this one lol