Last updated on March 23rd, 2024 at 06:50 am

The CEO of the image library voices concerns over the use of creative material as AI training data, amid growing anger in the sector

Getty Images CEO Craig Peters has urged Rishi Sunak to make a crucial decision: support the UK’s creative industries or wager heavily on the potential of artificial intelligence. Peters made these remarks amid increasing discontent within the creative and media sectors regarding the use of their content as “training data” for AI companies. Getty Images is currently engaged in legal action against an AI image generator in both the UK and the US for copyright infringement.

Peters emphasized the importance of the creative industries to the UK: spanning movies, music, and television, they account for approximately 10% of its GDP. He expressed concern over the risk of prioritizing AI, which currently contributes less than a quarter of a percentage point to the UK’s GDP, a far smaller share than the creative industries. Peters finds this trade-off puzzling.

In 2023, the government outlined its objective to address the challenges faced by AI firms and users regarding the use of copyrighted material, following a consultation by the Intellectual Property Office. It also pledged to assist AI companies in accessing copyrighted work for their models.

This stance marked a shift from an earlier proposal for a broad copyright exception for text and data mining. In response to a Commons committee on Thursday, Viscount Camrose, the hereditary peer and parliamentary under-secretary of state for artificial intelligence and intellectual property, stated, “We will adopt a balanced and practical approach to the issues raised, aiming to maintain the UK’s position as a global leader in AI while supporting our flourishing creative sectors.”

The use of copyrighted material in AI training has faced growing scrutiny. In the US, the New York Times has filed a lawsuit against OpenAI, the developer of ChatGPT, and Microsoft for incorporating its news articles into the training data for their AI systems. While OpenAI has never disclosed the specific data used to train GPT-4, the newspaper demonstrated that the AI system could reproduce passages from NYT articles verbatim.

OpenAI argued in a court filing that it is impossible to develop AI systems without utilizing copyrighted content. The organization stated, “Restricting training data to public domain works and artwork created over a century ago might be an interesting experiment, but it would not produce AI systems that fulfill the needs of today’s society.”

Peters holds a differing opinion. Getty Images, in partnership with Nvidia, has developed its own AI for image generation, which is trained solely on licensed imagery. “I believe our partnership contradicts some arguments suggesting that these technologies cannot exist with a licensing requirement. I disagree with that notion entirely. Different approaches are necessary, but the idea that such capabilities are lacking is unfounded,” Peters stated.

Attitudes are also shifting within the AI industry itself. A dataset known as Books3, comprising pirated ebooks, was hosted by an AI group whose copyright takedown policy consisted of a video featuring clothed women miming masturbation with imaginary penises while singing. Following objections from the authors whose works were included in the dataset, it was quietly removed from public download. However, before its removal, it had been used to train Meta’s LLaMA AI, among others.

In addition to lawsuits filed by Getty Images and the New York Times, several other legal actions are underway against AI companies for potential infringement in their training data.

In September, John Grisham, Jodi Picoult, George RR Martin, and 14 other authors filed a lawsuit against OpenAI, accusing the company of “systematic theft on a mass scale.” Additionally, a group of artists initiated a lawsuit against two image generators in January last year, marking one of the earliest instances of such cases entering the US legal system.

Ultimately, the decisions made by courts or governments regarding the regulation of copyrighted material used to train AI systems may not be definitive. Several AI models, including text-generating LLMs and image generators, have been released as “open source,” meaning they are free to download, share, and reuse without oversight. Even if using copyrighted material to train new systems were banned, the ban would not remove existing models from the internet, nor would it deter individuals from retraining, enhancing, and re-releasing them in the future.

Peters remains hopeful that the outcome is not predetermined. He remarked, “Those who create and distribute the code are ultimately legal entities and are subject to regulations. The issue of what individuals run on their laptops or phones may be more ambiguous, but there is personal responsibility involved.”