NEW YORK, June 26 (Reuters) - Seven content-licensing
sellers of music, image, video and other datasets for use in
training artificial intelligence systems have formed the
sector's first trade group, they said on Wednesday.
The Dataset Providers Alliance (DPA) will advocate for
"ethical data sourcing" in the training of AI systems, including
rights for people depicted in datasets and the protection of
content owners' intellectual property rights, the companies said
in a statement.
Founding members include U.S. music dataset company
Rightsify, image licensing service vAIsual, Japanese stock photo
provider Pixta and Germany-based data marketplace Datarade.
The emergence of generative AI technologies that can mimic
human creativity in recent years has triggered an outcry from
content creators and a string of copyright lawsuits against tech
companies like Google, Meta and ChatGPT maker
OpenAI, which is backed by Microsoft.
Developers have been training models by feeding them vast
quantities of content, much of it scraped from the internet for
free without the consent of those who created the works or own
rights to them.
Tech companies, which claim the usage is legal, are also
quietly paying for access to private collections of content both
to fulfill needs for particular types of data and to hedge
against legal and regulatory risks.
The prospect that demand for licensed data will grow if
copyright owners prevail in their legal fights has prompted the
emergence of a nascent industry of companies that package
content and sell access to it for use by AI systems.
As a result, groups have been formed to establish ethical
standards for that trade, like Fairly Trained, a non-profit
founded this year which certifies models that have not used
copyrighted materials without a license.
The DPA targets the content of those transactions,
requiring, for example, that its members agree not to sell text
data obtained by crawling the web or audio that features
people's voices without their explicit consent.
A major focus will be pushing for legislation like the NO
FAKES Act, a U.S. bill introduced last year to create penalties
for generating unauthorized digital replicas of people's voices
or likenesses, said Alex Bestall, CEO of Rightsify and its
licensing subsidiary GCX, who led the founding of the group.
"Advocacy will be a big part of it because everyone's taken
their positions on AI and copyright, but a lot of these battles
are yet to be solved and it's going to take a while for them to
be," said Bestall.
The DPA also will press for more training data transparency
requirements like those in the European Union's AI Act and a
similar U.S. bill introduced in April, the Generative AI
Copyright Disclosure Act, he added.
The group plans to publish a white paper outlining its
positions in July, he said.