OpenAI and rivals seek new path to smarter AI as current methods hit limitations
Nov 15, 2024 12:00 PM

* AI companies face delays and challenges with training new large language models
* Some researchers are focusing on more time for inference in new models
* Shift could impact AI arms race for resources like chips and energy

By Krystal Hu and Anna Tong

Nov 11 (Reuters) - Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-bigger large language models by developing training techniques that use more human-like ways for algorithms to "think".

A dozen AI scientists, researchers and investors told Reuters they believe that these techniques, which are behind OpenAI's recently released o1 model, could reshape the AI arms race, and have implications for the types of resources that AI companies have an insatiable demand for, from energy to types of chips.

OpenAI declined to comment for this story.

After the release of the viral ChatGPT chatbot two years ago, technology companies, whose valuations have benefited greatly from the AI boom, have publicly maintained that "scaling up" current models through adding more data and computing power will consistently lead to improved AI models.

But now, some of the most prominent AI scientists are speaking out on the limitations of this "bigger is better" philosophy.

Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, told Reuters recently that results from scaling up pre-training - the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures - have plateaued.

Sutskever is widely credited as an early advocate of achieving massive leaps in generative AI advancement through the use of more data and computing power in pre-training, which eventually created ChatGPT. Sutskever left OpenAI earlier this year to found SSI.

"The 2010s were the age of scaling, now we're back in the age of wonder and discovery once again. Everyone is looking for the next thing," Sutskever said. "Scaling the right thing matters more now than ever."

Sutskever declined to share more details on how his team is addressing the issue, other than saying SSI is working on an alternative approach to scaling up pre-training.

Behind the scenes, researchers at major AI labs have been running into delays and disappointing outcomes in the race to release a large language model that outperforms OpenAI's GPT-4 model, which is nearly two years old, according to three sources familiar with private matters.

The so-called "training runs" for large models can cost tens of millions of dollars by simultaneously running hundreds of chips. They are more likely to suffer hardware-induced failures given how complicated the system is, and researchers may not know the eventual performance of the models until the end of the run, which can take months.

Another problem is that large language models gobble up huge amounts of data, and AI models have exhausted all the easily accessible data in the world. Power shortages have also hindered the training runs, as the process requires vast amounts of energy.

To overcome these challenges, researchers are exploring "test-time compute," a technique that enhances existing AI models during the so-called "inference" phase, or when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real time, ultimately choosing the best path forward.

This method allows models to dedicate more processing power to challenging tasks like math or coding problems or complex operations that demand human-like reasoning and decision-making.

"It turned out that having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer," said Noam Brown, a researcher at OpenAI who worked on o1, at the TED AI conference in San Francisco last month.
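
The generate-and-evaluate idea described above can be illustrated with a minimal "best-of-N" sketch in Python. This is a toy under stated assumptions, not a description of how o1 or any production system works: the function names (`best_of_n`, `noisy_guess`, `verifier_score`) are hypothetical, a random number generator stands in for the model, and the scorer is given the right answer, which a real evaluator would not have.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    generate: Callable[[random.Random], int],
    score: Callable[[int], float],
    n: int,
    seed: int = 0,
) -> Tuple[int, float]:
    """Sample n candidate answers and return the highest-scoring one.

    Spending more compute at inference time here simply means a larger n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates: List[int] = [generate(rng) for _ in range(n)]
    best = max(candidates, key=score)
    return best, score(best)


# Hypothetical stand-ins for illustration only.
TARGET = 17 * 23  # the "question" the toy model is trying to answer


def noisy_guess(rng: random.Random) -> int:
    """Toy 'model': returns the right answer plus random error."""
    return TARGET + rng.randint(-20, 20)


def verifier_score(answer: int) -> float:
    """Toy 'evaluator': higher is better, 0 means exactly right."""
    return -abs(answer - TARGET)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        answer, quality = best_of_n(noisy_guess, verifier_score, n)
        print(f"n={n:>2}  answer={answer}  score={quality}")
```

Because every run uses the same seed, the candidates drawn for a larger `n` include those drawn for a smaller one, so the best score can only stay the same or improve as `n` grows. The trade-off is that each extra candidate costs another generation and another evaluation at inference time.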

OpenAI has embraced this technique in its newly released model known as "o1," formerly known as Q* and Strawberry, which Reuters first reported in July. The o1 model can "think" through problems in a multi-step manner, similar to human reasoning. It also involves using data and feedback curated from PhDs and industry experts. The secret sauce of the o1 series is another set of training carried out on top of "base" models like GPT-4, and the company says it plans to apply this technique with more and bigger base models.

At the same time, researchers at other top AI labs, from Anthropic, xAI, and Google DeepMind, have also been working to develop their own versions of the technique, according to five people familiar with the efforts.

"We see a lot of low-hanging fruit that we can go pluck to make these models better very quickly," said Kevin Weil, chief product officer at OpenAI, at a tech conference in October. "By the time people do catch up, we're going to try and be three more steps ahead."

Google and xAI did not respond to requests for comment and Anthropic had no immediate comment.

The implications could alter the competitive landscape for AI hardware, thus far dominated by insatiable demand for Nvidia's AI chips. Prominent venture capital investors, from Sequoia to Andreessen Horowitz, who have poured billions to fund expensive development of AI models at multiple AI labs including OpenAI and xAI, are taking notice of the transition and weighing the impact on their expensive bets.

"This shift will move us from a world of massive pre-training clusters toward inference clouds, which are distributed, cloud-based servers for inference," Sonya Huang, a partner at Sequoia Capital, told Reuters.

Demand for Nvidia's AI chips, which are the most cutting edge, has fueled its rise to becoming the world's most valuable company, surpassing Apple in October. Unlike in training chips, where Nvidia dominates, the chip giant could face more competition in the inference market.

Asked about the possible impact on demand for its products, Nvidia pointed to recent company presentations on the importance of the technique behind the o1 model. Its CEO Jensen Huang has talked about increasing demand for using its chips for inference.

"We've now discovered a second scaling law, and this is the scaling law at a time of inference... All of these factors have led to the demand for Blackwell being incredibly high," Huang said last month at a conference in India, referring to the company's latest AI chip.
