The first wave of major generative AI tools was largely trained on “publicly available” data: basically anything and everything that could be pulled from the internet. Now, sources of training data are increasingly restricting access and insisting on licensing agreements. As demand for additional data sources grows, new licensing startups have emerged to keep the source material flowing.
The Data Providers Alliance, a trade group formed this summer, wants to make the AI industry more standardized and fair. To that end, it has just released a position paper outlining its stances on key AI issues. The alliance consists of seven AI licensing companies, including music copyright management firm Rightsify, Japanese stock photo marketplace Pixta and generative AI copyright licensing startup Calliope Networks. (At least five new members will be announced in the fall.)
The DPA advocates an opt-in system, meaning that data can only be used with the express consent of creators and rights holders. This represents a significant departure from the way most large AI companies operate. Some have developed their own opt-out systems that place the burden on data owners to withdraw their work on a case-by-case basis. Others offer no way to opt out at all.
The DPA, which expects members to adhere to its opt-in rule, sees this route as much more ethical. “Artists and creators need to get on board,” says Alex Bestall, CEO of Rightsify and the music data licensing company Global Copyright Exchange, which spearheaded the effort. Bestall sees opt-in as a pragmatic approach as well as a moral one: “Selling publicly available datasets is one way to be judged and not trusted.”
Ed Newton-Rex, a former AI executive who now runs the ethical AI nonprofit Fairly Trained, calls opt-out systems “fundamentally unfair to creators,” adding that some creators may not even know when an opt-out is offered. “It’s particularly good to see the DPA calling for inclusion,” he says.
Shayne Longpre, lead at the Data Provenance Initiative, a volunteer collective that audits AI datasets, sees the DPA’s efforts to source data ethically as admirable, although he suspects the opt-in standard may be a tough sell because of the sheer volume of data most modern AI models require. “You’re either going to run out of data or you’re going to pay a lot,” he says. “It may turn out that only a few players, big tech companies, can afford to license all this data.”