AI experts ready 'Humanity's Last Exam' to stump powerful tech
Sep 16, 2024 12:22 PM

(Reuters) - A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child's play.

Dubbed "Humanity's Last Exam," the project seeks to determine when expert-level AI has arrived. It aims to stay relevant even as capabilities advance in future years, according to the organizers, a non-profit called the Center for AI Safety (CAIS) and the startup Scale AI.

The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which "destroyed the most popular reasoning benchmarks," said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk's xAI startup.

Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like U.S. history, the other probing models' ability to reason through competition-level math. The undergraduate-style test has more downloads from the online AI hub Hugging Face than any such dataset.

At the time of those papers, AI was giving almost random answers to questions on the exams. "They're now crushed," Hendrycks told Reuters.

As one example, the Claude models from the AI lab Anthropic have gone from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.

These common benchmarks have less meaning as a result.

AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University's AI Index Report from April. OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organizers said on Friday.

Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. "Humanity's Last Exam" will require abstract reasoning, he said.

Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. Hendrycks said some questions on "Humanity's Last Exam" will remain private to make sure AI systems' answers are not from memorization.

The exam will include at least 1,000 crowd-sourced questions, due November 1, that are hard for non-experts to answer. These will undergo peer review, with winning submissions offered co-authorship and prizes of up to $5,000 sponsored by Scale AI.

"We desperately need harder tests for expert-level models to measure the rapid progress of AI," said Alexandr Wang, Scale's CEO.

One restriction: the organizers want no questions about weapons, which some say would be too dangerous for AI to study.
