A Chinese artificial intelligence lab has done more than just build a cheaper AI model – it has exposed the inefficiency of the entire industry’s approach.
DeepSeek’s breakthrough showed how a small team, in an effort to save money, could rethink how AI models are built. While tech giants like OpenAI and Anthropic spend billions of dollars on computing power alone, DeepSeek reportedly achieved similar results for just over $5 million.
The company’s model matches or beats GPT-4o (OpenAI’s best LLM), OpenAI o1 (its best reasoning model currently available), and Anthropic’s Claude 3.5 Sonnet on many benchmarks, using roughly 2.788 million H800 GPU hours for the complete training run. That’s a small fraction of the hardware that conventional wisdom held was necessary.
The model is so capable and efficient that it rose to the top of Apple’s iOS productivity category within days, challenging OpenAI’s dominance.
Necessity is the mother of invention. The team achieved this using techniques that American developers never had to consider – and still haven’t mastered. Perhaps most importantly, instead of using full precision for its calculations, DeepSeek implemented 8-bit training, which cut memory requirements by 75%.
“They discovered 8-bit floating point training, at least for some of the numerics,” Perplexity CEO Aravind Srinivas told CNBC. “As far as I know, I don’t think floating-point 8 training is very well understood. Most of the training in America is still running in FP16.”
FP8 uses half the memory bandwidth and storage of FP16. For models with billions of parameters, that reduction is significant. DeepSeek had to master the technique because its hardware was weaker; OpenAI never faced that constraint.
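To see what 8-bit storage buys, here’s a minimal Python sketch comparing the memory footprint of the same weight matrix at 32, 16, and 8 bits. It uses simple integer quantization as a stand-in – this is illustrative only, not DeepSeek’s actual FP8 training recipe:

```python
# Illustrative only: a toy symmetric 8-bit quantization of a weight matrix,
# showing the memory savings that motivate 8-bit training. This is NOT
# DeepSeek's actual FP8 recipe, just a sketch of the idea.
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((4096, 4096)).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# Symmetric quantization: map the FP32 range onto 256 signed 8-bit levels.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize to measure how much precision the 8-bit representation loses.
dequantized = weights_int8.astype(np.float32) * scale
max_error = np.abs(weights_fp32 - dequantized).max()

print(f"FP32: {weights_fp32.nbytes / 1e6:.1f} MB")
print(f"FP16: {weights_fp16.nbytes / 1e6:.1f} MB")   # half of FP32
print(f"INT8: {weights_int8.nbytes / 1e6:.1f} MB")   # a quarter of FP32 (75% less)
print(f"Max rounding error at 8 bits: {max_error:.5f}")
```

Cutting each weight from 32 bits to 8 is exactly where the 75% memory reduction comes from; the hard engineering work is keeping training numerically stable at that precision.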
DeepSeek also developed a “multi-token” prediction system that processes whole spans of text at once instead of one word at a time, making the model roughly twice as fast while retaining 90% of the accuracy.
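DeepSeek-V3’s actual multi-token prediction module is more elaborate, but the core idea – one forward pass producing guesses for several upcoming tokens instead of just the next one – can be sketched with a pair of toy prediction heads:

```python
# A simplified sketch of multi-token prediction: one forward pass yields
# predictions for the next several tokens instead of just one. DeepSeek-V3's
# real MTP module is more involved; this toy version only shows the idea.
import torch
import torch.nn as nn

class ToyMultiTokenHead(nn.Module):
    def __init__(self, hidden_dim=256, vocab_size=1000, num_future_tokens=2):
        super().__init__()
        # One small output head per future position being predicted.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_future_tokens)
        )

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim) from a shared transformer trunk.
        # Returns one logit tensor per future offset (+1, +2, ...).
        return [head(hidden_states) for head in self.heads]

hidden = torch.randn(1, 16, 256)          # stand-in for trunk activations
logits_per_offset = ToyMultiTokenHead()(hidden)
for i, logits in enumerate(logits_per_offset, start=1):
    print(f"predictions for token t+{i}: {tuple(logits.shape)}")
```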
Another technique it used is called “distillation” – training a small model to replicate the outputs of a larger one without training it on the same underlying knowledge base. This made it possible to release smaller models that are extremely efficient, accurate, and competitive.
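The textbook form of distillation trains the small “student” to match the large “teacher’s” output probabilities; DeepSeek’s R1 distillation instead fine-tunes small models on samples generated by the teacher, but the goal – transferring a big model’s behavior into a small one – is the same. Here’s a minimal sketch of the textbook version, with placeholder model sizes and data:

```python
# A minimal knowledge-distillation sketch: a small "student" learns to match
# the output distribution of a larger "teacher". Model sizes, temperature,
# and data here are placeholders, not DeepSeek's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, temperature = 1000, 2.0
teacher = nn.Sequential(nn.Embedding(vocab, 512), nn.Linear(512, vocab)).eval()
student = nn.Sequential(nn.Embedding(vocab, 64), nn.Linear(64, vocab))
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (8, 32))      # placeholder training batch
with torch.no_grad():
    teacher_logits = teacher(tokens)           # "soft labels" from the teacher

student_logits = student(tokens)
# KL divergence between softened teacher and student distributions.
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature**2

loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```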
The company also used a technique called “mixture of experts,” which added to the model’s efficiency. While traditional models keep all of their parameters active at all times, DeepSeek’s system has 671 billion total parameters but activates only 37 billion at a time. It’s like having a large team of specialists but calling on only the experts needed for each task.
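Here’s a toy mixture-of-experts layer with top-2 routing to illustrate the idea; the layer sizes and the simple gating are placeholders, not DeepSeek’s design:

```python
# A toy Mixture-of-Experts layer with top-k routing: each token is sent to
# only a few experts, so most parameters sit idle on any given step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, dim=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)          # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                 # x: (tokens, dim)
        scores = self.gate(x)                             # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1) # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 128)
print(ToyMoE()(tokens).shape)   # only 2 of the 8 experts run for each token
```

Because only a couple of experts run per token, compute scales with the active parameters – 37 billion in DeepSeek’s case – rather than the 671 billion total.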
“We use DeepSeek-R1 as the teacher model to generate 800K training samples, and fine-tune several small dense models. The results are promising: DeepSeek-R1-Distill-Qwen-1.5B outperforms GPT-4o and Claude-3.5-Sonnet on math benchmarks with 28.9% on AIME and 83.9% on MATH,” DeepSeek wrote in its paper.
For context, 1.5 billion parameters is so few that the model isn’t considered an LLM, or large language model, but rather an SLM, or small language model. SLMs require so little computation and VRAM that users can run them on modest hardware like a smartphone.
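Some back-of-the-envelope math (weights only – real deployments also need memory for activations and the KV cache) shows why that size matters:

```python
# Rough memory math showing why a 1.5B-parameter model can run on a phone
# while a 671B-parameter model cannot. Counts weight storage only.
def weight_memory_gb(num_params, bytes_per_param):
    return num_params * bytes_per_param / 1e9

for name, params in [("1.5B distilled model", 1.5e9), ("671B full model", 671e9)]:
    fp16 = weight_memory_gb(params, 2)      # 16-bit weights
    int4 = weight_memory_gb(params, 0.5)    # 4-bit quantized weights
    print(f"{name}: ~{fp16:,.1f} GB at FP16, ~{int4:,.1f} GB at 4-bit")
```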
The cost implications are staggering. In addition to the roughly 95% reduction in training costs, DeepSeek’s API charges just 10 cents per million tokens, compared to about $4.40 for similar services. One developer reported processing 200,000 API requests for about 50 cents, with no rate cap.
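Those reported numbers roughly check out if each request is small. Here’s the arithmetic, with the tokens-per-request figure assumed purely for illustration:

```python
# Rough cost arithmetic based on the prices quoted above. The tokens-per-request
# figure is an assumption for illustration; actual usage varies by workload.
deepseek_price = 0.10 / 1_000_000    # $0.10 per million tokens
competitor_price = 4.40 / 1_000_000  # $4.40 per million tokens, as cited above

requests = 200_000
tokens_per_request = 25              # assumed small prompts and completions

total_tokens = requests * tokens_per_request
print(f"DeepSeek:   ${total_tokens * deepseek_price:.2f}")     # about $0.50
print(f"Competitor: ${total_tokens * competitor_price:.2f}")   # about $22.00
```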
The “DeepSeek effect” is already noticeable. “Let me say the quiet part out loud: AI model building is a money trap,” investor Chamath Palihapitiya said. And despite taking shots at DeepSeek, OpenAI CEO Sam Altman quickly pumped the brakes on squeezing users for money after social media lit up with people doing for free with DeepSeek what OpenAI charges $200 a month for.
Meanwhile, the DeepSeek app is topping download charts, and three of the top six trending repositories on GitHub are related to DeepSeek.
AI stocks are sliding as investors wonder whether the hype has reached bubble levels. Both AI hardware stocks (Nvidia, AMD) and software stocks (Microsoft, Meta, and Google) are feeling the consequences of the apparent paradigm shift triggered by DeepSeek’s announcement and the results shared by users and developers.
Even AI crypto tokens took a hit, with scads of fake DeepSeek tokens popping up in attempts to scam unwary buyers.
Beyond the financial wreckage, the takeaway is that DeepSeek’s breakthrough suggests AI development may not require massive data centers and specialized hardware. That could fundamentally change the competitive landscape, turning what many considered big tech’s permanent advantages into temporary leads.
The timing is almost comical. Just days before DeepSeek’s announcement, President Trump, OpenAI’s Sam Altman, and Oracle founder Larry Ellison unveiled Project Stargate – a $500 billion investment in American AI infrastructure. Meanwhile, Mark Zuckerberg doubled down on Meta’s commitment to pouring billions into AI development, and Microsoft’s $13 billion investment in OpenAI suddenly looks less like strategic genius and more like expensive FOMO fueled by wasted resources.
“Whatever you did to prevent them from catching up didn’t even matter,” Srinivas told CNBC. “They’re catching up anyway.”
Edited by Andrew Hayward