A Chinese artificial intelligence lab has done more than just build a cheaper AI model – it has exposed the inefficiency of the entire industry’s approach.
DeepSeek’s breakthrough showed how a small team, in an effort to save money, could rethink how AI models are built. While tech giants like OpenAI and Anthropic spend billions of dollars on computing power alone, DeepSeek reportedly achieved similar results for just over $5 million.
The company’s model matches or beats GPT-4o (OpenAI’s best LLM), OpenAI o1 (its best reasoning model currently available), and Anthropic’s Claude 3.5 Sonnet on many benchmarks, using roughly 2.788 million H800 GPU hours for the complete training run. That’s a small fraction of the hardware that conventional wisdom held was necessary.
The model is so capable and efficient that it rose to the top of Apple’s iOS productivity category within days, challenging OpenAI’s dominance.
Necessity is the mother of invention. The team achieved this using techniques that American developers never had to consider – and still haven’t mastered. Perhaps most importantly, instead of using full precision for its calculations, DeepSeek implemented 8-bit training, which cut memory requirements by 75%.
“They discovered 8-bit floating point training, at least for some of the numerics,” Perplexity CEO Aravind Srinivas told CNBC. “As far as I know, I don’t think floating-point 8 training is very well understood. Most of the training in America is still running in FP16.”
FP8 uses half the memory bandwidth and storage of FP16. For models with billions of parameters, that reduction is significant. DeepSeek had to master the technique because its hardware was weaker; OpenAI never faced that constraint.
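To see what 8-bit storage buys, here’s a minimal Python sketch comparing the memory footprint of the same weight matrix at 32, 16, and 8 bits. It uses simple integer quantization as a stand-in – this is illustrative only, not DeepSeek’s actual FP8 training recipe:

```python
# Illustrative only: a toy symmetric 8-bit quantization of a weight matrix,
# showing the memory savings that motivate 8-bit training. This is NOT
# DeepSeek's actual FP8 recipe, just a sketch of the idea.
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((4096, 4096)).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# Symmetric quantization: map the FP32 range onto 256 signed 8-bit levels.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize to measure how much precision the 8-bit representation loses.
dequantized = weights_int8.astype(np.float32) * scale
max_error = np.abs(weights_fp32 - dequantized).max()

print(f"FP32: {weights_fp32.nbytes / 1e6:.1f} MB")
print(f"FP16: {weights_fp16.nbytes / 1e6:.1f} MB")   # half of FP32
print(f"INT8: {weights_int8.nbytes / 1e6:.1f} MB")   # a quarter of FP32 (75% less)
print(f"Max rounding error at 8 bits: {max_error:.5f}")
```

Cutting each weight from 32 bits to 8 is exactly where the 75% memory reduction comes from; the hard engineering work is keeping training numerically stable at that precision.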
DeepSeek also developed a “multi-token” prediction system that processes whole spans of text at once instead of one word at a time, making the model roughly twice as fast while retaining 90% of the accuracy.
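DeepSeek-V3’s actual multi-token prediction module is more elaborate, but the core idea – one forward pass producing guesses for several upcoming tokens instead of just the next one – can be sketched with a pair of toy prediction heads:

```python
# A simplified sketch of multi-token prediction: one forward pass yields
# predictions for the next several tokens instead of just one. DeepSeek-V3's
# real MTP module is more involved; this toy version only shows the idea.
import torch
import torch.nn as nn

class ToyMultiTokenHead(nn.Module):
    def __init__(self, hidden_dim=256, vocab_size=1000, num_future_tokens=2):
        super().__init__()
        # One small output head per future position being predicted.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_future_tokens)
        )

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim) from a shared transformer trunk.
        # Returns one logit tensor per future offset (+1, +2, ...).
        return [head(hidden_states) for head in self.heads]

hidden = torch.randn(1, 16, 256)          # stand-in for trunk activations
logits_per_offset = ToyMultiTokenHead()(hidden)
for i, logits in enumerate(logits_per_offset, start=1):
    print(f"predictions for token t+{i}: {tuple(logits.shape)}")
```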
Another technique it used is called “distillation” – training a small model to replicate the outputs of a larger one without training it on the same underlying knowledge base. This made it possible to release smaller models that are extremely efficient, accurate, and competitive.
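The textbook form of distillation trains the small “student” to match the large “teacher’s” output probabilities; DeepSeek’s R1 distillation instead fine-tunes small models on samples generated by the teacher, but the goal – transferring a big model’s behavior into a small one – is the same. Here’s a minimal sketch of the textbook version, with placeholder model sizes and data:

```python
# A minimal knowledge-distillation sketch: a small "student" learns to match
# the output distribution of a larger "teacher". Model sizes, temperature,
# and data here are placeholders, not DeepSeek's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, temperature = 1000, 2.0
teacher = nn.Sequential(nn.Embedding(vocab, 512), nn.Linear(512, vocab)).eval()
student = nn.Sequential(nn.Embedding(vocab, 64), nn.Linear(64, vocab))
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (8, 32))      # placeholder training batch
with torch.no_grad():
    teacher_logits = teacher(tokens)           # "soft labels" from the teacher

student_logits = student(tokens)
# KL divergence between softened teacher and student distributions.
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature**2

loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```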
The company also used a technique called “mixture of experts,” which added to the model’s efficiency. While traditional models keep all of their parameters active at all times, DeepSeek’s system has 671 billion total parameters but activates only 37 billion at a time. It’s like having a large team of specialists but calling on only the experts needed for each task.
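Here’s a toy mixture-of-experts layer with top-2 routing to illustrate the idea; the layer sizes and the simple gating are placeholders, not DeepSeek’s design:

```python
# A toy Mixture-of-Experts layer with top-k routing: each token is sent to
# only a few experts, so most parameters sit idle on any given step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, dim=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)          # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                 # x: (tokens, dim)
        scores = self.gate(x)                             # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1) # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 128)
print(ToyMoE()(tokens).shape)   # only 2 of the 8 experts run for each token
```

Because only a couple of experts run per token, compute scales with the active parameters – 37 billion in DeepSeek’s case – rather than the 671 billion total.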
“We use DeepSeek-R1 as the teacher model to generate 800K training samples, and fine-tune several small dense models. The results are promising: DeepSeek-R1-Distill-Qwen-1.5B outperforms GPT-4o and Claude-3.5-Sonnet on math benchmarks with 28.9% on AIME and 83.9% on MATH,” DeepSeek wrote in its paper.
For context, 1.5 billion parameters is so few that the model isn’t considered an LLM, or large language model, but rather an SLM, or small language model. SLMs require so little computation and VRAM that users can run them on modest hardware like a smartphone.
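Some back-of-the-envelope math (weights only – real deployments also need memory for activations and the KV cache) shows why that size matters:

```python
# Rough memory math showing why a 1.5B-parameter model can run on a phone
# while a 671B-parameter model cannot. Counts weight storage only.
def weight_memory_gb(num_params, bytes_per_param):
    return num_params * bytes_per_param / 1e9

for name, params in [("1.5B distilled model", 1.5e9), ("671B full model", 671e9)]:
    fp16 = weight_memory_gb(params, 2)      # 16-bit weights
    int4 = weight_memory_gb(params, 0.5)    # 4-bit quantized weights
    print(f"{name}: ~{fp16:,.1f} GB at FP16, ~{int4:,.1f} GB at 4-bit")
```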
The cost implications are staggering. In addition to the roughly 95% reduction in training costs, DeepSeek’s API charges just 10 cents per million tokens, compared to about $4.40 for similar services. One developer reported processing 200,000 API requests for about 50 cents, with no rate cap.
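Those reported numbers roughly check out if each request is small. Here’s the arithmetic, with the tokens-per-request figure assumed purely for illustration:

```python
# Rough cost arithmetic based on the prices quoted above. The tokens-per-request
# figure is an assumption for illustration; actual usage varies by workload.
deepseek_price = 0.10 / 1_000_000    # $0.10 per million tokens
competitor_price = 4.40 / 1_000_000  # $4.40 per million tokens, as cited above

requests = 200_000
tokens_per_request = 25              # assumed small prompts and completions

total_tokens = requests * tokens_per_request
print(f"DeepSeek:   ${total_tokens * deepseek_price:.2f}")     # about $0.50
print(f"Competitor: ${total_tokens * competitor_price:.2f}")   # about $22.00
```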
The “DeepSeek effect” is already noticeable. “Let me say the quiet part out loud: AI model building is a money trap,” investor Chamath Palihapitiya said. And despite taking shots at DeepSeek, OpenAI CEO Sam Altman quickly pumped the brakes on squeezing users for money after social media lit up with people doing for free with DeepSeek what OpenAI charges $200 a month for.
Meanwhile, the DeepSeek app is topping download charts, and three of the top six trending repositories on GitHub are related to DeepSeek.
AI stocks are sliding as investors wonder whether the hype has reached bubble levels. Both AI hardware stocks (Nvidia, AMD) and software stocks (Microsoft, Meta, and Google) are feeling the consequences of the apparent paradigm shift triggered by DeepSeek’s announcement and the results shared by users and developers.
Even AI crypto tokens took a hit, with scads of fake DeepSeek tokens popping up in attempts to scam unwary buyers.
Beyond the financial wreckage, the takeaway is that DeepSeek’s breakthrough suggests AI development may not require massive data centers and specialized hardware. That could fundamentally change the competitive landscape, turning what many considered big tech’s permanent advantages into temporary leads.
The timing is almost comical. Just days before DeepSeek’s announcement, President Trump, OpenAI’s Sam Altman, and Oracle founder Larry Ellison unveiled Project Stargate – a $500 billion investment in American AI infrastructure. Meanwhile, Mark Zuckerberg doubled down on Meta’s commitment to pouring billions into AI development, and Microsoft’s $13 billion investment in OpenAI suddenly looks less like strategic genius and more like expensive FOMO fueled by wasted resources.
“Whatever you did to prevent them from catching up didn’t even matter,” Srinivas told CNBC. “They’re catching up anyway.”
Edited by Andrew Hayward