OpenAI and rivals seek new path to smarter AI as current methods hit limitations
Nov 11, 2024 2:29 AM

* AI companies face delays and challenges with training new large language models

* Some researchers are focusing on more time for inference in new models

* Shift could impact AI arms race for resources like chips and energy

By Krystal Hu, Anna Tong

Nov 11 (Reuters) -

Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-bigger large language models by developing training techniques that use more human-like ways for algorithms to "think".

Researchers and investors told Reuters that these techniques, which are behind OpenAI's recently released o1 model, could reshape the AI arms race and have implications for the types of resources that AI companies have an insatiable demand for, from energy to chips.

But now, some of the most prominent AI scientists are speaking out on the limitations of this "bigger is better" philosophy.

Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, told Reuters recently that results from scaling up pre-training - the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures - have plateaued.

Sutskever is widely credited as an early advocate of achieving massive leaps in generative AI advancement through scaling up pre-training, which eventually created ChatGPT. Sutskever left OpenAI earlier this year to found SSI.

"The 2010s were the age of scaling, now we're back in the

age of wonder and discovery once again. Everyone is looking for

the next thing," Sutskever said. "Scaling the right thing

matters more now than ever."

Sutskever declined to share more details on how his team is addressing the issue, other than saying SSI is working on an alternative approach to scaling up pre-training.

Behind the scenes, researchers at major AI labs have been running into delays and disappointing outcomes in the race to release a large language model that outperforms OpenAI's GPT-4 model, which is nearly two years old, according to sources familiar with private matters.

The so-called 'training runs' for large models can cost tens of millions of dollars by simultaneously running hundreds of chips. They are more likely to have hardware-induced failures given how complicated the system is, and researchers may not know the eventual performance of the models until the end of the run, which can take months.

Another problem is that large language models gobble up huge amounts of data, and AI models have exhausted all the easily accessible data in the world. Power shortages have also hindered the training runs, as the process requires vast amounts of energy.

To overcome these challenges, researchers are exploring "test-time compute," a technique that enhances existing AI models during the so-called "inference" phase, or when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real time, ultimately choosing the best path forward.

This method allows models to dedicate more processing power to challenging tasks like math or coding problems, or complex operations that demand human-like reasoning and decision-making.
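
As a rough illustration, the sketch below shows one simple form of test-time compute, best-of-n sampling: sample several candidate answers and keep the one a scorer rates highest. The generate and score functions here are toy stand-ins for a real language model and a learned verifier; this is a hypothetical sketch of the general idea, not OpenAI's published method.

```python
import random

# Toy stand-ins for a language model and a verifier; the best-of-n
# pattern, not the arithmetic task, is the point of this sketch.

def generate(prompt: str) -> str:
    """One stochastic 'sample': a noisy guess at a sum like '23 + 58'."""
    a, b = (int(x) for x in prompt.split("+"))
    return str(a + b + random.choice([-2, -1, 0, 0, 0, 1, 2]))

def score(prompt: str, candidate: str) -> float:
    """A verifier that rates a candidate answer; higher is better."""
    a, b = (int(x) for x in prompt.split("+"))
    return -abs(int(candidate) - (a + b))

def best_of_n(prompt: str, n: int = 8) -> str:
    # Spend extra inference-time compute: sample n candidate answers
    # instead of committing to the first one, then keep the best.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("23 + 58"))  # more samples -> more likely to print 81
```

Raising n trades extra inference-time computation for answer quality, which is the trade-off behind the shift the researchers describe.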

"It turned out that having a bot think for just 20 seconds

in a hand of poker got the same boosting performance as scaling

up the model by 100,000x and training it for 100,000 times

longer," said Noam Brown, a researcher at OpenAI who worked on

o1, at TED AI conference in San Francisco last month.

OpenAI has embraced this technique in its newly released model known as "o1," formerly known as Q* and Strawberry, which Reuters first reported in July. The o1 model can "think" through problems in a multi-step manner, similar to human reasoning. It also involves using data and feedback curated from PhDs and industry experts. The secret sauce of the o1 series is another set of training carried out on top of 'base' models like GPT-4, and the company says it plans to apply this technique with more and bigger base models.
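
OpenAI has not disclosed how o1 actually implements this. As a hedged sketch of the general pattern only, the loop below spends inference steps on intermediate "thoughts" before committing to a final answer; fake_model and the "FINAL:" convention are inventions for illustration, standing in for a real model API.

```python
# Scripted stand-in so the control loop runs end to end; a real
# system would call a model API here instead of fake_model.

def fake_model(history: list[str]) -> str:
    script = [
        "Step 1: 17 x 24 = 17 x 20 + 17 x 4.",
        "Step 2: 340 + 68 = 408.",
        "FINAL: 408",
    ]
    return script[min(len(history), len(script) - 1)]

def solve_step_by_step(problem: str, max_steps: int = 8) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        step = fake_model(history)      # one intermediate "thought"
        history.append(step)
        if step.startswith("FINAL:"):   # the model signals it is done
            return step.removeprefix("FINAL:").strip()
    return "no answer within the step budget"

print(solve_step_by_step("What is 17 x 24?"))  # -> 408
```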

At the same time, researchers at other top AI labs, including Anthropic, xAI, and Google DeepMind, have also been working to develop their own versions of the technique, according to people familiar with the efforts.

"W

e see a lot of low-hanging fruit that we can go pluck to

make these models better very quickly," said Kevin Weil, chief

product officer at OpenAI at a tech conference in October. "By

the time people do catch up, we're going to try and be three

more steps ahead."

Google and xAI did not respond to requests for comment and Anthropic had no immediate comment.

The implications could alter the competitive landscape for AI hardware, thus far dominated by insatiable demand for Nvidia's ( NVDA ) AI chips. Prominent venture capital investors, from Sequoia to Andreessen Horowitz, who have poured billions into funding the expensive development of AI models at multiple AI labs including OpenAI and xAI, are taking notice of the transition and weighing the impact on their expensive bets.

"This shift will move us from a world of massive pre-training

clusters toward inference clouds, which are distributed,

cloud-based servers for inference," Sonya Huang, a partner at

Sequoia Capital, told Reuters.

Demand for Nvidia's ( NVDA ) AI chips, which are the most cutting edge, has fueled its rise to becoming the world's most valuable company, surpassing Apple ( AAPL ) in October. Unlike training chips, where Nvidia ( NVDA ) dominates, the chip giant could face more competition in the inference market.

Its CEO Jensen Huang has talked about increasing demand for using its chips for inference.

"We've now discovered a second scaling law, and this is the

scaling law at a time of inference...All of these factors have

led to the demand for Blackwell being incredibly high," Huang

said last month at a conference in India, referring to the

company's latest AI chip.
