* AI companies face delays and challenges with training new large language models
* Some researchers are focusing on more time for inference in new models
* Shift could impact AI arms race for resources like chips and energy
By Krystal Hu and Anna Tong
Nov 11 (Reuters) - Artificial intelligence companies
like OpenAI are seeking to overcome unexpected delays and
challenges in the pursuit of ever-bigger large language models
by developing training techniques that use more human-like ways
for algorithms to "think".
A dozen AI scientists, researchers and investors told Reuters
they believe that these techniques, which are behind OpenAI's
recently released o1 model, could reshape the AI arms race, and
have implications for the types of resources that AI companies
have an insatiable demand for, from energy to types of chips.
OpenAI declined to comment for this story.
After the release of
the viral ChatGPT chatbot two years ago, technology companies,
whose valuations have benefited greatly from the AI boom, have
publicly maintained that "scaling up" current models through
adding more data and computing power will consistently lead to
improved AI models.
But now, some of the most prominent AI scientists are speaking
out on the limitations of this "bigger is better" philosophy.
Ilya Sutskever, co-founder of AI labs Safe Superintelligence
(SSI) and OpenAI, told Reuters recently that results from
scaling up pre-training - the phase of training an AI model that
uses a vast amount of unlabeled data to understand language
patterns and structures - have plateaued.
Sutskever is widely credited as an early advocate of achieving
massive leaps in generative AI advancement through the use of
more data and computing power in pre-training, which eventually
created ChatGPT. Sutskever left OpenAI earlier this year to
found SSI.
"The 2010s were the age of scaling, now we're back in the age of
wonder and discovery once again. Everyone is looking for the
next thing," Sutskever said. "Scaling the right thing matters
more now than ever."
Sutskever declined to share more details on how his team is
addressing the issue, other than saying SSI is working on an
alternative approach to scaling up pre-training.
Behind the scenes, researchers at major AI labs have been
running into delays and disappointing outcomes in the race to
release a large language model that outperforms OpenAI's GPT-4
model, which is nearly two years old, according to three sources
familiar with private matters.
The so-called 'training runs' for large models can cost tens of
millions of dollars by simultaneously running hundreds of chips.
Such runs are prone to hardware-induced failure given how
complicated the systems are, and researchers may not know how
the models will perform until the end of the run, which can
take months.
Another problem is that large language models gobble up huge amounts
of data, and AI models have exhausted all the easily accessible
data in the world. Power shortages have also hindered the
training runs, as the process requires vast amounts of energy.
To overcome these challenges, researchers are exploring
"test-time compute," a technique that enhances existing AI
models during the so-called "inference" phase, or when the model
is being used. For example, instead of immediately choosing a
single answer, a model could generate and evaluate multiple
possibilities in real time, ultimately choosing the best path
forward.
This method allows models to dedicate more processing power to
challenging tasks like math or coding problems or complex
operations that demand human-like reasoning and decision-making.
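For illustration only, here is a minimal sketch of one common
test-time-compute strategy, best-of-N sampling. The
generate_candidate and score_candidate functions are hypothetical
stand-ins for a model call and a verifier; this is not any lab's
actual implementation:

```python
import random

# Hypothetical stand-ins: a real system would call a language model
# to sample an answer and a verifier (or reward model) to score it.
def generate_candidate(prompt: str, temperature: float = 0.8) -> str:
    """Sample one candidate answer from the 'model'."""
    return f"candidate-{random.randint(0, 99)} for {prompt!r}"

def score_candidate(prompt: str, answer: str) -> float:
    """Score how promising an answer looks to the 'verifier'."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Spend extra inference compute: sample n answers, then keep
    the one the verifier scores highest."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score_candidate(prompt, ans))

if __name__ == "__main__":
    # Raising n buys more quality with more inference compute,
    # without retraining the model at all.
    print(best_of_n("What is 17 * 24?", n=16))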
"It turned out that having a bot think for just 20 seconds in a
hand of poker got the same boosting performance as scaling up
the model by 100,000x and training it for 100,000 times longer,"
said Noam Brown, a researcher at OpenAI who worked on o1, at the
TED AI conference in San Francisco last month.
OpenAI has embraced this technique in its newly released model
known as "o1," formerly code-named Q* and Strawberry, which
Reuters first reported in July. The o1 model can "think" through
problems in a multi-step manner, similar to human reasoning. It
also involves using data and feedback curated from PhDs and
industry experts. The secret sauce of the o1 series is another
set of training carried out on top of 'base' models like GPT-4,
and the company says it plans to apply this technique with more
and bigger base models.
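As a rough illustration of what multi-step "thinking" at inference
time can look like, here is a hedged sketch. The ask callable, the
step loop, and the ANSWER convention are assumptions made for the
example; OpenAI has not published o1's internals:

```python
from typing import Callable, List

def solve_with_steps(problem: str, ask: Callable[[str], str],
                     max_steps: int = 5) -> str:
    """Generic multi-step loop: request one reasoning step at a time,
    feed the growing transcript back in, stop when an answer appears."""
    transcript: List[str] = [f"Problem: {problem}"]
    for _ in range(max_steps):
        prompt = "\n".join(transcript) + "\nNext step (or 'ANSWER: ...'):"
        step = ask(prompt)
        transcript.append(step)
        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
    return transcript[-1]  # fall back to the last step if no answer emerged

# Toy stand-in model so the sketch runs without any API:
def toy_model(prompt: str) -> str:
    return "ANSWER: 42" if prompt.count("\n") > 1 else "Break the problem down."

if __name__ == "__main__":
    print(solve_with_steps("What is 6 * 7?", toy_model))
```

The design point is that the loop, not the model's weights, carries
the extra work: each pass through it spends more inference compute
on the same problem.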
At the same time, researchers at other top AI labs, including
Anthropic, xAI and Google DeepMind, have also been working to
develop their own versions of the technique, according to five
people familiar with the efforts.
"We see a lot of low-hanging fruit that we can go pluck to
make these models better very quickly," said Kevin Weil, chief
product officer at OpenAI, at a tech conference in October. "By
the time people do catch up, we're going to try and be three
more steps ahead."
Google and xAI did not respond to requests for comment and
Anthropic had no immediate comment.
The implications could alter the competitive landscape for
AI hardware, thus far dominated by insatiable demand for
Nvidia's (NVDA) AI chips. Prominent venture capital investors, from
Sequoia to Andreessen Horowitz, who have poured billions of
dollars into funding the expensive development of AI models at
multiple AI labs including OpenAI and xAI, are taking notice of
the transition and weighing the impact on their expensive bets.
"This shift will move us from a world of massive pre-training
clusters toward inference clouds, which are distributed,
cloud-based servers for inference," Sonya Huang, a partner at
Sequoia Capital, told Reuters.
Demand for Nvidia's AI chips, which are the most cutting-edge,
has fueled its rise to becoming the world's most valuable
company, surpassing Apple (AAPL) in October. Unlike in the
training-chip market, which Nvidia dominates, the chip giant
could face more competition in the inference market.
Asked about the possible impact on demand for its products,
Nvidia pointed to recent company presentations on the importance
of the technique behind the o1 model. Its CEO Jensen Huang has
talked about increasing demand for using its chips for
inference.
"We've now discovered a second scaling law, and this is the
scaling law at a time of inference ... All of these factors have
led to the demand for Blackwell being incredibly high," Huang
said last month at a conference in India, referring to the
company's latest AI chip.