Monday, February 3, 2025

Dubious DeepSeek debut dings AI infrastructure drive


Shares in ASML, NVIDIA, SoftBank and others take a hit following the launch of the R1 LLM, which DeepSeek says cost around $6 million to train

Chinese AI startup DeepSeek made quite a splash last week with the release of its open-source R1 large language model (LLM). The company reports capabilities on par with OpenAI’s models and, based on details from a technical report published in December 2024, may have spent only around $6 million on its latest training run. That contrasts sharply with the billions spent, and projected to be spent, by Western firms like OpenAI. Financial markets responded in kind: shares in ASML, Microsoft, NVIDIA and other AI specialists, and tech stocks more broadly, took a hit over the weekend and again when markets opened today.

Groq CEO Jonathan Ross, sitting on a panel last week at the World Economic Forum annual meeting in Davos, Switzerland, was asked how consequential DeepSeek’s announcement was. Ross said it was incredibly consequential but reminded the audience that R1 was trained on around 14 trillion tokens and used around 2,000 GPUs for its training run, both similar to the training of Meta’s open-source 70-billion-parameter Llama LLM. He also said DeepSeek is pretty good at marketing itself and “making it seem like they’ve done something amazing.” Ross added that DeepSeek is a major OpenAI customer when it comes to buying quality datasets, rather than undertaking the arduous, and expensive, process of scraping the entirety of the internet and then separating useful from useless data.

The bigger point, Ross said, is that “open models will win. Where I think everyone is getting confused though is when you have a model, you can amortize the cost of developing that, then distribute it.” But models don’t stay new for long, meaning there’s a durable appetite for AI infrastructure and compute cycles. And this gets into what he sees as a race between the U.S. and China. “We cannot do closed models anymore and be competitive…Without the compute, it doesn’t matter how good the model is. What countries are going to be tussling over is how much compute they have access to.” 

Back to that $6 million. The tech stock sell-off feels reactionary given DeepSeek hasn’t exactly provided an itemized receipt of its costs, and those costs feel incredibly misaligned with everything we know about LLM training and the underlying AI infrastructure needed to support it. Based on information DeepSeek itself has provided, the company used a compute cluster built with 2,048 NVIDIA H800 GPUs. While it’s never clear exactly how much vendors charge for hardware like this, assuming a mid-point price of $12,500 per GPU puts the cluster alone at around $25.6 million, well past $6 million. So that figure apparently doesn’t include GPUs, or any of the other necessary infrastructure, whether rented or owned, used in training.
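For readers who want to check the math, here is a minimal back-of-envelope sketch. The $12,500-per-GPU figure is an assumed mid-point price, not a confirmed vendor quote, and the $6 million is DeepSeek’s own reported training-run cost:

```python
# Back-of-envelope check: implied hardware cost of DeepSeek's reported
# cluster versus its claimed training budget. The per-GPU price is an
# assumed mid-point, not a confirmed figure.
NUM_GPUS = 2_048             # cluster size DeepSeek itself reported
PRICE_PER_GPU = 12_500       # assumed mid-point H800 price, USD
CLAIMED_BUDGET = 6_000_000   # DeepSeek's reported training-run cost, USD

hardware_cost = NUM_GPUS * PRICE_PER_GPU
print(f"GPU hardware alone: ${hardware_cost:,}")          # $25,600,000
print(f"Multiple of claimed budget: "
      f"{hardware_cost / CLAIMED_BUDGET:.1f}x")           # 4.3x
```

Under those assumptions, the GPUs alone come to more than four times the reported budget, which is why the $6 million is best read as the cost of the final training run, not of the infrastructure behind it.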

Anyway, the real cost of training and investors’ huge reactions to a kind of arbitrary number aside, DeepSeek does appear to have built a performant tool in a very efficient way. And this is a major focus of AI industry discourse—post-training optimizations and reinforcement learning, test-time training and lowering model size are all teed up to help chip away at the astronomical costs associated with propping up the established laws of AI scaling. 

The folks at IDC had a take on this which, as published, was about the $500 billion Project Stargate announcement that, again, encapsulates the capital outlay needed to train ever-larger LLMs. “There are indications that the AI industry will soon be pivoting away from training massive [LLMs] for generalist use cases. Instead, smaller models that are much more fine-tuned and customized for highly specific use cases will be taking over. These small language models…do not require such huge infrastructure environments. Sparse models, narrow models, low-precision models: much research is currently being done to dramatically reduce the infrastructure needs of AI model development while retaining their accuracy rates.”

Dan Ives, managing director and senior equity research analyst with Wedbush Securities, wrote on X, “DeepSeek is a competitive LLM model for consumer use cases…Launching broader AI infrastructure [is] a whole other ballgame and nothing with DeepSeek makes us believe anything different. It’s about [artificial general intelligence] for big tech and DeepSeek’s noise.” As for the price drop in firms like NVIDIA, Ives characterized it as a rare buying opportunity.

There are lots of takes out there on DeepSeek, what it means for model training, what it means for U.S. companies and so on. The bottom line is that open is better than closed and more efficient model training is an important element of scale; still, the outlook for AI infrastructure investment remains positive despite short-term market dynamics based on, at best, incomplete information.
