* AI companies face delays and challenges with training new large language models
* Some researchers are focusing on more time for inference in new models
* Shift could impact AI arms race for resources like chips and energy
By Krystal Hu and Anna Tong
Nov 11 (Reuters) -
Artificial intelligence companies like OpenAI are seeking to
overcome unexpected delays and challenges in the pursuit of
ever-bigger large language models by developing training
techniques that use more human-like ways for algorithms to
"think".
AI scientists, researchers and investors told Reuters they
believe that these techniques, which are behind OpenAI's
recently released o1 model, could reshape the AI arms race, and
have implications for the types of resources that AI companies
have an insatiable demand for, from energy to chips.
Now, some of the most prominent AI scientists are speaking
out on the limitations of the "bigger is better" philosophy.
Ilya Sutskever, co-founder of AI labs Safe Superintelligence
(SSI) and OpenAI, told Reuters recently that results from
scaling up pre-training - the phase of training an AI model
that uses a vast amount of unlabeled data to understand language
patterns and structures - have plateaued.
Sutskever is widely credited as an early advocate of
achieving massive leaps in generative AI advancement through the
use of more data and computing power in pre-training, which
eventually created ChatGPT. Sutskever left OpenAI earlier this
year to found SSI.
"The 2010s were the age of scaling, now we're back in the
age of wonder and discovery once again. Everyone is looking for
the next thing," Sutskever said. "Scaling the right thing
matters more now than ever."
Sutskever declined to share more details on how his team is
addressing the issue, other than saying SSI is working on an
alternative approach to scaling up pre-training.
Behind the scenes, researchers at major AI labs have been
running into delays and disappointing outcomes in the race to
release a large language model that outperforms OpenAI's GPT-4
model, which is nearly two years old, according to
sources familiar with private matters.
The so-called 'training runs' for large models can cost tens
of millions of dollars by simultaneously running
hundreds of chips. They are more likely to have hardware-induced
failure given how complicated the system is; researchers may not
know the eventual performance of the models until the end of the
run, which can take months.
Another problem is that large language models gobble up huge
amounts of data, and AI models have exhausted all the easily
accessible data in the world. Power shortages have also hindered
the
training runs, as the process requires vast amounts of energy.
To overcome these challenges, researchers are exploring
"test-time compute," a technique that enhances existing AI
models during the so-called "inference" phase, or when the model
is being used. For example, instead of immediately choosing a
single answer, a model could generate and evaluate multiple
possibilities in real time, ultimately choosing the best path
forward.
This method
allows models to dedicate more processing power to
challenging tasks like math or coding problems or complex
operations that demand human-like reasoning and decision-making.
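In its simplest form, this idea can be illustrated as
"best-of-N" sampling: spend extra compute at inference time
generating several candidate answers, then keep the one a
scoring step rates highest. The sketch below is a minimal
Python illustration of that idea, not OpenAI's actual method;
the generate() and score() functions are hypothetical stubs
standing in for a real model and a real verifier.

    import random

    # Hypothetical stub standing in for a real language model call.
    def generate(prompt: str) -> str:
        """Sample one candidate answer (canned outputs for illustration)."""
        return random.choice(["answer A", "answer B", "answer C"])

    # Hypothetical stub standing in for a verifier or reward model.
    def score(prompt: str, answer: str) -> float:
        """Rate how promising an answer looks (random for illustration)."""
        return random.random()

    def best_of_n(prompt: str, n: int = 8) -> str:
        """Test-time compute in its simplest form: generate and evaluate
        multiple possibilities, then return the highest-scoring one.
        A larger n spends more inference compute for a better answer."""
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda ans: score(prompt, ans))

    print(best_of_n("Plan the next move in this poker hand."))

The trade-off is the one Brown describes next: more seconds of
inference can substitute for a vastly larger, more expensively
trained model.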
"It turned out that having a bot think for just 20 seconds
in a hand of poker got the same boosting performance as scaling
up the model by 100,000x and training it for 100,000 times
longer," said Noam Brown, a researcher at OpenAI who worked on
o1, at the TED AI conference in San Francisco last month.
OpenAI has embraced this technique in its newly released
model known as "o1," formerly known as Q* and Strawberry, which
Reuters first reported in July. The o1 model can "think" through
problems in a multi-step manner, similar to human reasoning. It
also involves using data and feedback curated from PhDs and
industry experts. The secret sauce of the o1 series is another
set of
training carried out on top of 'base' models like GPT-4, and the
company says it plans to apply this technique with more and
bigger base models.
At the same time, researchers at other top AI labs, from
Anthropic, xAI, and Google DeepMind, have also been working to
develop their own versions of the technique, according to
people familiar with the efforts.
"W
e see a lot of low-hanging fruit that we can go pluck to
make these models better very quickly," said Kevin Weil, chief
product officer at OpenAI at a tech conference in October. "By
the time people do catch up, we're going to try and be three
more steps ahead."
Google and xAI did not respond to requests for comment and
Anthropic had no immediate comment.
The implications could alter the competitive landscape for
AI hardware, thus far dominated by insatiable demand for
Nvidia's (NVDA) AI chips. Prominent venture capital investors, from
Sequoia to Andreessen Horowitz, who have poured billions into
funding the expensive development of AI models at multiple AI
labs, including OpenAI and xAI, are taking notice of the
transition and weighing the impact on their expensive bets.
"This shift will move us from a world of massive pre-training
clusters toward inference clouds, which are distributed,
cloud-based servers for inference," Sonya Huang, a partner at
Sequoia Capital, told Reuters.
Demand for Nvidia's AI chips, which are the most cutting
edge, has fueled its rise to become the world's most valuable
company, surpassing Apple (AAPL) in October. Unlike training
chips, where Nvidia dominates, the chip giant could face more
competition in the inference market.
Its CEO Jensen Huang has talked about increasing demand for
using its chips for inference.
"We've now discovered a second scaling law, and this is the
scaling law at a time of inference...All of these factors have
led to the demand for Blackwell being incredibly high," Huang
said last month at a conference in India, referring to the
company's latest AI chip.