DeepSeek Discloses $294,000 Training Cost for R1 Model, Stirring Fresh AI Race Debate

Chinese AI firm DeepSeek has revealed that it spent just $294,000 training its R1 model, a figure far below the hundreds of millions cited by U.S. rivals, according to a peer-reviewed article in Nature published Wednesday. The disclosure marks the first time the Hangzhou-based company has detailed R1’s training costs, and it is expected to revive debate over China’s competitiveness in artificial intelligence.

DeepSeek first attracted global attention in January when it announced lower-cost AI systems, an event that shook investor confidence and sent tech stocks sliding amid fears its technology could disrupt the dominance of U.S. players such as Nvidia. Since then, the company and founder Liang Wenfeng have kept a low profile, sharing only limited product updates. Liang is listed as a co-author of the Nature article.

According to the paper, the reasoning-focused R1 model was trained on 512 Nvidia H800 chips for 80 hours, at a total cost of $294,000. Earlier drafts of the article had not included these details. Training large language models generally involves running massive chip clusters for extended periods, which can cost tens or even hundreds of millions of dollars. OpenAI CEO Sam Altman said in 2023 that training foundational models had cost “much more” than $100 million, though his company has not disclosed exact figures.
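For context, a back-of-envelope division (our calculation, not a figure from the Nature paper) shows the per-unit compute cost those numbers imply:

$$ \frac{\$294{,}000}{512 \times 80 \ \text{GPU-hours}} \approx \$7.2 \ \text{per GPU-hour} $$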

DeepSeek’s statements about its hardware and methods have drawn scrutiny from U.S. officials and companies. While Nvidia has said DeepSeek lawfully used H800 chips designed for the Chinese market, U.S. authorities told Reuters in June that the firm also had access to restricted H100 chips. In supplementary documents, DeepSeek acknowledged for the first time that it owns Nvidia A100 chips, which it used for smaller preparatory experiments before scaling up training on H800s. “Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model,” its researchers wrote.

The company also addressed criticism over its reliance on model distillation, a technique in which one AI system learns from another. Some U.S. experts, including a White House adviser, have accused DeepSeek of distilling OpenAI’s work. DeepSeek defended the approach, saying that distillation improves efficiency and reduces training costs, broadening access to AI.
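For readers unfamiliar with the technique, the snippet below is a minimal, generic sketch of distillation in PyTorch: a small “student” network is trained to match the softened output distribution of a frozen “teacher”. The toy models, sizes, and hyperparameters are illustrative assumptions, not DeepSeek’s or OpenAI’s actual pipeline.

```python
# Generic model-distillation sketch (illustrative only; toy models, not DeepSeek's pipeline).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: a larger frozen teacher and a smaller trainable student
# with the same output dimensionality (here, 10 classes).
teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # a higher temperature softens the teacher's distribution

for step in range(200):
    x = torch.randn(64, 32)  # unlabeled inputs; distillation needs no ground-truth labels
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    # The distillation loss is the KL divergence between teacher and student distributions.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the large-language-model setting, the same idea is often applied by training the student on text generated by the teacher rather than on raw output distributions, which is the kind of overlap DeepSeek’s critics have pointed to.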

DeepSeek further acknowledged that data for its V3 model included crawled web content containing “a significant number of OpenAI-model-generated answers”, though it insisted this was incidental rather than intentional. OpenAI did not immediately respond to requests for comment.
