DeepSeek V3 Challenges AI Giants With Open-Source Model and Efficiency Breakthroughs

The Chinese AI startup's new model boasts 671 billion parameters, outperforms competitors on benchmarks, and cuts training costs significantly.

  • DeepSeek V3, an open-source AI model with 671 billion parameters, outpaces Meta's Llama 3.1 and OpenAI's GPT-4o in key benchmarks, including coding and math tasks.
  • The model's Mixture-of-Experts architecture routes each token to a small subset of specialized expert sub-networks, so only about 37 billion of the 671 billion parameters are active per token, cutting compute while preserving accuracy (see the sketch after this list).
  • DeepSeek trained the model on 14.8 trillion tokens using just 2,048 Nvidia H800 GPUs over roughly two months, about 2.8 million GPU-hours, for a reported cost of $5.58 million, far less than competitors' budgets.
  • Despite its achievements, the model has limitations, such as lacking multimodal capabilities and being subject to Chinese regulatory constraints on politically sensitive topics.
  • Concerns have arisen about potential contamination in training data, as the model occasionally identifies itself as ChatGPT, raising questions about data sourcing and ethical AI practices.
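
The Mixture-of-Experts routing mentioned in the second bullet can be made concrete with a short, generic sketch: a gating network scores a set of expert networks for every token, and only the top-scoring few are actually run. The code below illustrates top-k MoE routing in general, not DeepSeek's implementation; the expert count, top-k value, hidden size, and all names are illustrative placeholders.

```python
# A minimal sketch of top-k Mixture-of-Experts routing. Sizes are
# illustrative, not DeepSeek V3's actual configuration.
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64      # hidden size (illustrative)
N_EXPERTS = 8     # total experts in the layer (illustrative)
TOP_K = 2         # experts activated per token (illustrative)

# Each "expert" is a small feed-forward network; here, one weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.02
           for _ in range(N_EXPERTS)]
# The gate scores every expert for every token.
gate_w = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its TOP_K highest-scoring experts.

    Only TOP_K of N_EXPERTS weight matrices are applied per token:
    parameter count grows with N_EXPERTS, but per-token compute
    grows only with TOP_K.
    """
    logits = x @ gate_w                               # (tokens, N_EXPERTS)
    top_idx = np.argsort(logits, axis=-1)[:, -TOP_K:] # best experts per token
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        # Softmax over the selected experts' scores gives mixing weights.
        scores = logits[t, top_idx[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        for w, e in zip(weights, top_idx[t]):
            out[t] += w * (token @ experts[e])
    return out

tokens = rng.standard_normal((4, D_MODEL))  # a toy batch of 4 token vectors
print(moe_layer(tokens).shape)              # (4, 64): same shape, sparse compute
```

The trade-off this illustrates: total parameters, and thus model capacity, scale with the number of experts, while per-token compute scales only with the number of experts actually selected.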