OpenAI and rivals seek new path to smarter AI as current methods hit limitations
Nov 15, 2024 12:00 PM

* AI companies face delays and challenges with training new large language models
* Some researchers are focusing on more time for inference in new models
* Shift could impact AI arms race for resources like chips and energy

By Krystal Hu and Anna Tong

Nov 11 (Reuters) - Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-bigger large language models by developing training techniques that use more human-like ways for algorithms to "think".

A dozen AI scientists, researchers and investors told Reuters they believe that these techniques, which are behind OpenAI's recently released o1 model, could reshape the AI arms race, and have implications for the types of resources that AI companies have an insatiable demand for, from energy to types of chips.

OpenAI declined to comment for this story.

After the release of the viral ChatGPT chatbot two years ago, technology companies, whose valuations have benefited greatly from the AI boom, have publicly maintained that "scaling up" current models through adding more data and computing power will consistently lead to improved AI models.

But now, some of the most prominent AI scientists are speaking out on the limitations of this "bigger is better" philosophy.

Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, told Reuters recently that results from scaling up pre-training - the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures - have plateaued.

Sutskever is widely credited as an early advocate of achieving massive leaps in generative AI advancement through the use of more data and computing power in pre-training, which eventually created ChatGPT. Sutskever left OpenAI earlier this year to found SSI.

"The 2010s were the age of scaling, now we're back in the age of wonder and discovery once again. Everyone is looking for the next thing," Sutskever said. "Scaling the right thing matters more now than ever."

Sutskever declined to share more details on how his team is addressing the issue, other than saying SSI is working on an alternative approach to scaling up pre-training.

Behind the scenes, researchers at major AI labs have been running into delays and disappointing outcomes in the race to release a large language model that outperforms OpenAI's GPT-4 model, which is nearly two years old, according to three sources familiar with private matters.

The so-called 'training runs' for large models can cost tens of millions of dollars by simultaneously running hundreds of chips. They are more likely to have hardware-induced failure given how complicated the system is; researchers may not know the eventual performance of the models until the end of the run, which can take months.

Another problem is that large language models gobble up huge amounts of data, and AI models have exhausted all the easily accessible data in the world. Power shortages have also hindered the training runs, as the process requires vast amounts of energy.

To overcome these challenges, researchers are exploring "test-time compute," a technique that enhances existing AI models during the so-called "inference" phase, or when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real time, ultimately choosing the best path forward.

This method allows models to dedicate more processing power to challenging tasks like math or coding problems, or complex operations that demand human-like reasoning and decision-making.
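One simple flavor of this idea, often called best-of-N sampling, can be sketched in a few lines of Python. The candidate generator and scorer below are hypothetical stand-ins for a real language model and verifier, not OpenAI's actual method:

```python
def generate_candidates(prompt: str, n: int) -> list[int]:
    # Stand-in for sampling n answers from a language model.
    # Fixed guesses keep the sketch deterministic and self-contained.
    return [3, 5, 4, 7, 2][:n]

def score(prompt: str, answer: int) -> float:
    # Stand-in for a verifier or reward model: higher is better.
    # This toy scorer rewards answers close to the correct sum of 4.
    return -abs(answer - 4)

def best_of_n(prompt: str, n: int = 5) -> int:
    # Test-time compute: spend extra inference on generating several
    # candidate answers, score each one, and keep the best.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("What is 2 + 2?"))  # the scorer picks 4
```

Raising n buys better answers at the cost of more inference-time compute, which is exactly the trade-off behind the predicted shift from pre-training clusters to inference clouds.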

"It turned out that having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer," said Noam Brown, a researcher at OpenAI who worked on o1, at the TED AI conference in San Francisco last month.

OpenAI has embraced this technique in its newly released model known as "o1," formerly known as Q* and Strawberry, which Reuters first reported in July. The o1 model can "think" through problems in a multi-step manner, similar to human reasoning. It also involves using data and feedback curated from PhDs and industry experts. The secret sauce of the o1 series is another set of training carried out on top of 'base' models like GPT-4, and the company says it plans to apply this technique with more and bigger base models.

At the same time, researchers at other top AI labs, including Anthropic, xAI and Google DeepMind, have also been working to develop their own versions of the technique, according to five people familiar with the efforts.

"We see a lot of low-hanging fruit that we can go pluck to make these models better very quickly," said Kevin Weil, chief product officer at OpenAI, at a tech conference in October. "By the time people do catch up, we're going to try and be three more steps ahead."

Google and xAI did not respond to requests for comment and Anthropic had no immediate comment.

The implications could alter the competitive landscape for AI hardware, thus far dominated by insatiable demand for Nvidia's (NVDA) AI chips. Prominent venture capital investors, from Sequoia to Andreessen Horowitz, who have poured billions into funding expensive development of AI models at multiple AI labs including OpenAI and xAI, are taking notice of the transition and weighing the impact on their expensive bets.

"This shift will move us from a world of massive pre-training clusters toward inference clouds, which are distributed, cloud-based servers for inference," Sonya Huang, a partner at Sequoia Capital, told Reuters.

Demand for Nvidia's AI chips, which are the most cutting-edge, has fueled its rise to becoming the world's most valuable company, surpassing Apple (AAPL) in October. Unlike training chips, where Nvidia dominates, the chip giant could face more competition in the inference market.

Asked about the possible impact on demand for its products, Nvidia pointed to recent company presentations on the importance of the technique behind the o1 model. Its CEO Jensen Huang has talked about increasing demand for using its chips for inference.

"We've now discovered a second scaling law, and this is the scaling law at a time of inference... All of these factors have led to the demand for Blackwell being incredibly high," Huang said last month at a conference in India, referring to the company's latest AI chip.
