* AI models now require trainers with advanced degrees
* Invisible Tech employs 5,000 specialized trainers globally
* It takes smart humans to avoid hallucinations in AI
By Supantha Mukherjee and Anna Tong
STOCKHOLM/SAN FRANCISCO, Sept 28 (Reuters) - In the
early years, getting AI models like ChatGPT or its rival Cohere
to spit out human-like responses required vast teams of low-cost
workers helping models distinguish basic facts, such as whether an
image was of a car or a carrot.
But more sophisticated updates to AI models in the fiercely
competitive arena are now demanding a rapidly expanding network
of human trainers who have specialized knowledge -- from
historians to scientists, some with doctorate degrees.
"A year ago, we could get away with hiring undergraduates, to
just generally teach AI on how to improve," said Cohere
co-founder Ivan Zhang, talking about its internal human
trainers.
"Now we have licensed physicians teaching the models how to
behave in medical environments, or financial analysts or
accountants."
For more training, Cohere, which was last valued at over $5
billion, works with a startup called Invisible Tech. Cohere is
one of the main rivals of OpenAI and specializes in AI for
businesses.
The startup Invisible Tech employs thousands of trainers,
working remotely, and has become one of the main partners of AI
companies ranging from AI21 to Microsoft (MSFT) to train their AI
models to reduce errors, known in the AI world as
hallucinations.
"We have 5,000 people in over 100 countries around the world
that are PhDs, Master's degree holders and knowledge work
specialists," said Invisible founder Francis Pedraza.
Invisible pays as much as $40 per hour, depending on the
location of the worker and the complexity of work. Some
companies such as Outlier pay up to $50 per hour, while another
company called Labelbox said it pays up to $200 per hour for
"high expertise" subjects like quantum physics, but starts with
$15 for basic topics.
Invisible was founded in 2015 as a workflow automation company
catering to the likes of food delivery company DoorDash (DASH) to
digitize their delivery menu. But things changed when a
relatively unknown research firm called OpenAI contacted them in
the spring of 2022, ahead of the public launch of ChatGPT.
"OpenAI came to us with a problem, which is that when you
were asking an early version of ChatGPT a question, it was going
to hallucinate. You couldn't trust the answer," Pedraza told
Reuters.
"They needed an advanced AI training partner to provide
reinforcement learning with human feedback."
OpenAI did not respond to a request for comment.
Generative AI produces new content based on past data used to
train it. However, sometimes it can't distinguish between true
and false information and generates false outputs known as
hallucinations. In one notable example, in 2023 a Google chatbot
shared inaccurate information in a promotional video about which
satellite first took pictures of a planet outside the Earth's
solar system.
AI companies are aware that hallucinations can derail
GenAI's attractiveness to businesses and are trying various ways
to reduce them, including using human trainers to teach models
the difference between fact and fiction.
Since it began working with OpenAI, Invisible says it has become
an AI training partner to most of the GenAI companies, including
Cohere, AI21 and Microsoft (MSFT). Cohere and AI21 confirmed they are
clients. Microsoft (MSFT) did not confirm it is a client of Invisible.
"These are all companies that had training challenges, where
their number one cost was compute power, and then the number two
cost is quality training," Pedraza said.
HOW DOES IT WORK?
OpenAI, which started off the frenzy around GenAI, has a
team of researchers aptly named "Human Data Team" that works
with AI trainers to gather specialized data for training its
models like ChatGPT.
OpenAI researchers come up with various experiments, such as
reducing hallucinations or improving writing style, and work
with AI trainers from Invisible and other vendors, a source
familiar with the company's processes said.
At any point, dozens of experiments are being run, some with
tools developed by OpenAI and others with vendors' tools, the
person said.
Based on what the AI companies want - from getting better at
Swedish history to doing financial modeling - Invisible hires
workers with relevant degrees for those projects, sparing the
AI companies the burden of managing hundreds of trainers.
"OpenAI has some of the most incredible computer scientists in
the world but they're not necessarily an expert in Swedish
history or chemistry questions or biology questions or anything
you can ask it," Pedraza said, adding that over 1,000 contract
workers cater to OpenAI alone.
Cohere's Zhang said he has personally used Invisible's
trainers to find a way to teach its GenAI model to find relevant
information from a big data set.
COMPETITION
Among the competitors in this space is Scale AI, a private
startup last valued at $14 billion, which provides AI companies
with sets of training data. It has also ventured into the area
of providing AI trainers, and counts OpenAI as a customer.
Scale AI did not respond to requests for an interview for this
story.
Invisible, which has been profitable since 2021, has raised only
$8 million of primary capital.
"We are 70% owned by the team, and only 30% owned by investors,"
Pedraza said. "We do facilitate secondary rounds, and the most
recent traded price was at a half a billion dollar valuation."
Reuters could not confirm that valuation.
Human trainers first got into AI training through data-labelling
work that required fewer qualifications and paid less, sometimes
as little as $2, and was mostly done by people in African and
Asian countries.
As AI companies launch more advanced models, the demand for
specialized trainers across dozens of languages is on the
rise, creating a well-paid niche where workers from a variety of
subjects can become AI trainers without even knowing how to
code.
Demand from AI companies is leading to the creation of more
companies that are offering similar services.
"My inbox is basically inundated with new firms that pop up here
and there. I do see this as a new space where companies hire
humans just to create data for AI labs like us," Zhang said.