DeepSeek V3 Challenges AI Giants With Open-Source Model and Efficiency Breakthroughs
The Chinese AI startup's new model boasts 671 billion parameters, outperforms competitors on benchmarks, and cuts training costs significantly.
- DeepSeek V3, an open-source AI model with 671 billion parameters, outpaces Meta's Llama 3.1 and OpenAI's GPT-4o in key benchmarks, including coding and math tasks.
- The model's Mixture-of-Experts architecture routes each token to a small subset of specialized expert subnetworks, so only about 37 billion of the 671 billion parameters are active per token, cutting compute while preserving accuracy (see the routing sketch after this list).
- DeepSeek trained the model on 14.8 trillion tokens using just 2,048 Nvidia H800 GPUs over roughly two months, for a reported cost of $5.58 million, a fraction of competitors' training budgets (a back-of-the-envelope check follows below).
- Despite its achievements, the model has limitations, such as lacking multimodal capabilities and being subject to Chinese regulatory constraints on politically sensitive topics.
- The model occasionally identifies itself as ChatGPT, prompting concerns about contamination in its training data and broader questions about data sourcing and ethical AI practices.
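To make the Mixture-of-Experts point concrete, here is a minimal sketch of top-k expert routing in PyTorch. The dimensions, expert count, and `top_k` value are illustrative placeholders, not DeepSeek V3's actual configuration (which uses many finer-grained experts plus shared experts and its own load-balancing scheme); the sketch only shows why most parameters stay idle on any given token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative, not DeepSeek V3's exact design)."""

    def __init__(self, d_model=64, d_hidden=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                                # (n_tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)                   # normalize gate weights
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle,
        # which is why most parameters are inactive on any given forward pass.
        for e, expert in enumerate(self.experts):
            token_idx, slot = (idx == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

tokens = torch.randn(4, 64)       # 4 tokens, model dim 64
print(TopKMoE()(tokens).shape)    # torch.Size([4, 64])
```

Because only the routed experts execute, per-token compute scales with `top_k` rather than with the total expert count, which is the efficiency property described above.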
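The $5.58 million figure also survives a quick sanity check. The GPU-hour total and the $2-per-hour H800 rental rate below come from DeepSeek's technical report; treat the rate as an accounting assumption rather than a market price.

```python
# Back-of-the-envelope check on the reported training cost.
# Figures are from DeepSeek's technical report: ~2.788M total
# H800 GPU-hours priced at an assumed $2 rental rate per GPU-hour.
gpu_hours = 2_788_000
rate_per_gpu_hour = 2.00  # assumed rental price, USD
print(f"${gpu_hours * rate_per_gpu_hour / 1e6:.2f}M")  # -> $5.58M

# Sanity check against "2,048 GPUs for about two months":
# 2,048 GPUs * ~57 days * 24 h/day ~= 2.8M GPU-hours, consistent with the above.
print(f"{2048 * 57 * 24 / 1e6:.2f}M GPU-hours")  # ~= 2.80M
```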