DeepSeek V3 Challenges AI Giants With Open-Source Model and Efficiency Breakthroughs
The Chinese AI startup's new model boasts 671 billion parameters, outperforms competitors on benchmarks, and cuts training costs significantly.
- DeepSeek V3, an open-source AI model with 671 billion parameters, outpaces Meta's Llama 3.1 and OpenAI's GPT-4o in key benchmarks, including coding and math tasks.
- The model's Mixture-of-Experts architecture routes each token to a small subset of specialized expert subnetworks, so only about 37 billion of the 671 billion parameters are active per token, cutting compute while preserving accuracy (see the routing sketch after this list).
- DeepSeek trained the model on 14.8 trillion tokens using just 2,048 Nvidia H800 GPUs over roughly two months, for a reported cost of $5.58 million, a fraction of competitors' training budgets (a back-of-the-envelope check follows below).
- Despite its achievements, the model has limitations, such as lacking multimodal capabilities and being subject to Chinese regulatory constraints on politically sensitive topics.
- The model occasionally identifies itself as ChatGPT, prompting concerns about contamination in its training data and broader questions about data sourcing and ethical AI practices.
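To make the Mixture-of-Experts point concrete, here is a minimal sketch of top-k expert routing in PyTorch. The dimensions, expert count, and `top_k` value are illustrative placeholders, not DeepSeek V3's actual configuration (which uses many finer-grained experts plus shared experts and its own load-balancing scheme); the sketch only shows why most parameters stay idle on any given token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative, not DeepSeek V3's exact design)."""

    def __init__(self, d_model=64, d_hidden=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                                # (n_tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)                   # normalize gate weights
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle,
        # which is why most parameters are inactive on any given forward pass.
        for e, expert in enumerate(self.experts):
            token_idx, slot = (idx == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

tokens = torch.randn(4, 64)       # 4 tokens, model dim 64
print(TopKMoE()(tokens).shape)    # torch.Size([4, 64])
```

Because only the routed experts execute, per-token compute scales with `top_k` rather than with the total expert count, which is the efficiency property described above.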
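The $5.58 million figure also survives a quick sanity check. The GPU-hour total and the $2-per-hour H800 rental rate below come from DeepSeek's technical report; treat the rate as an accounting assumption rather than a market price.

```python
# Back-of-the-envelope check on the reported training cost.
# Figures are from DeepSeek's technical report: ~2.788M total
# H800 GPU-hours priced at an assumed $2 rental rate per GPU-hour.
gpu_hours = 2_788_000
rate_per_gpu_hour = 2.00  # assumed rental price, USD
print(f"${gpu_hours * rate_per_gpu_hour / 1e6:.2f}M")  # -> $5.58M

# Sanity check against "2,048 GPUs for about two months":
# 2,048 GPUs * ~57 days * 24 h/day ~= 2.8M GPU-hours, consistent with the above.
print(f"{2048 * 57 * 24 / 1e6:.2f}M GPU-hours")  # ~= 2.80M
```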