SAN FRANCISCO, June 4 (Reuters) - Nvidia's (NVDA)
newest chips have made gains in training large artificial
intelligence systems, new data released on Wednesday showed,
with the number of chips required to train large language models
dropping dramatically.
MLCommons, a nonprofit group that publishes benchmark
performance results for AI systems, released new data about
chips from Nvidia and Advanced Micro Devices (AMD), among
others, for training, in which AI systems are fed large amounts
of data to learn from. While much of the stock market's
attention has shifted to the larger market for AI inference, in
which AI systems handle questions from users, the number of
chips needed to train the systems is still a key competitive
concern. China's DeepSeek claims to have created a competitive
chatbot using far fewer chips than its U.S. rivals.
The results were the first that MLCommons has released about
how chips fared at training AI systems such as Llama 3.1 405B,
an open-source AI model released by Meta Platforms (META). That
model has a large enough number of what are known as
"parameters" to give an indication of how the chips would
perform at some of the most complex training tasks in the
world, which can involve trillions of parameters.
Nvidia and its partners were the only entrants that submitted
data about training that large model, and the data showed that
Nvidia's new Blackwell chips are, on a per-chip basis, more than
twice as fast as the previous generation of Hopper chips.
In the fastest results for Nvidia's new chips, 2,496
Blackwell chips completed the training test in 27 minutes. It
took more than three times that many of Nvidia's
previous-generation chips to achieve a faster time, according to
the data.
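A rough back-of-the-envelope calculation, sketched below in Python, shows how those figures support the per-chip claim. Only the 2,496 chips and 27 minutes are reported; the Hopper chip count and run time used here are assumptions, chosen merely to be consistent with "more than three times that many" chips and "a faster time."

    # Illustrative arithmetic only; Hopper figures are assumptions,
    # not reported numbers.
    blackwell_chips, blackwell_minutes = 2496, 27
    hopper_chips = 3 * blackwell_chips  # assumed: "more than three times"
    hopper_minutes = 20                 # assumed: some time under 27 min

    # Total chip-minutes is a crude proxy for the work each run consumed.
    blackwell_work = blackwell_chips * blackwell_minutes  # 67,392
    hopper_work = hopper_chips * hopper_minutes           # 149,760

    # Per-chip speed ratio implied by these assumed figures: ~2.2x,
    # consistent with "more than twice as fast" per chip.
    print(f"~{hopper_work / blackwell_work:.1f}x faster per chip")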
In a press conference, Chetan Kapoor, chief product officer
for CoreWeave (CRWV), which collaborated with Nvidia to produce
some of the results, said there has been a trend in the AI
industry toward stringing together smaller groups of chips into
subsystems for separate AI training tasks, rather than creating
homogeneous groups of 100,000 chips or more.
"Using a methodology like that, they're able to continue to
accelerate or reduce the time to train some of these crazy,
multi-trillion parameter model sizes," Kapoor said.