Yesterday, popular authors including John Grisham, Jonathan Franzen, George R.R. Martin, Jodi Picoult, and George Saunders joined the Authors Guild in suing OpenAI, alleging that training the company’s large language models (LLMs) used to power AI tools like ChatGPT on pirated versions of their books violates copyright laws and is “systematic theft on a mass scale.”

“Generative AI is a vast new field for Silicon Valley’s longstanding exploitation of content providers," Franzen said in a statement provided to Ars. "Authors should have the right to decide when their works are used to ‘train’ AI. If they choose to opt in, they should be appropriately compensated.”

OpenAI has previously argued against two lawsuits filed earlier this year by authors making similar claims that authors suing “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”

This latest complaint argued that OpenAI’s “LLMs endanger fiction writers’ ability to make a living, in that the LLMs allow anyone to generate—automatically and freely (or very cheaply)—texts that they would otherwise pay writers to create.”

Authors are also concerned that the LLMs fuel AI tools that “can spit out derivative works: material that is based on, mimics, summarizes, or paraphrases” their works, allegedly turning their works into “engines of” authors’ “own destruction” by harming the book market for them. Even worse, the complaint alleged, businesses are being built around opportunities to create allegedly derivative works:

Businesses are sprouting up to sell prompts that allow users to enter the world of an author’s books and create derivative stories within that world. For example, a business called Socialdraft offers long prompts that lead ChatGPT to engage in ‘conversations’ with popular fiction authors like Plaintiff Grisham, Plaintiff Martin, Margaret Atwood, Dan Brown, and others about their works, as well as prompts that promise to help customers ‘Craft Bestselling Books with AI.’

They claimed that OpenAI could have trained their LLMs exclusively on works in the public domain or paid authors “a reasonable licensing fee” but chose not to. Authors feel that without their copyrighted works, OpenAI “would have no commercial product with which to damage—if not usurp—the market for these professional authors’ works.”

“There is nothing fair about this,” the authors’ complaint said.

Their complaint noted that OpenAI chief executive Sam Altman claims that he shares their concerns, telling Congress that "creators deserve control over how their creations are used” and deserve to “benefit from this technology.” But, the claim adds, so far, Altman and OpenAI—which, claimants allege, “intend to earn billions of dollars” from their LLMs—have “proved unwilling to turn these words into actions.”

Saunders said that the lawsuit—which is a proposed class action estimated to include tens of thousands of authors, some of multiple works, where OpenAI could owe $150,000 per infringed work—was an “effort to nudge the tech world to make good on its frequent declarations that it is on the side of creativity.” He also said that stakes went beyond protecting authors’ works.

“Writers should be fairly compensated for their work,” Saunders said. "Fair compensation means that a person’s work is valued, plain and simple. This, in turn, tells the culture what to think of that work and the people who do it. And the work of the writer—the human imagination, struggling with reality, trying to discern virtue and responsibility within it—is essential to a functioning democracy.”

The authors’ complaint said that as more writers have reported being replaced by AI content-writing tools, more authors feel entitled to compensation from OpenAI. The Authors Guild told the court that 90 percent of authors responding to an internal survey from March 2023 “believe that writers should be compensated for the use of their work in ‘training’ AI.” On top of this, there are other threats, their complaint said, including that “ChatGPT is being used to generate low-quality ebooks, impersonating authors, and displacing human-authored books.”

Authors claimed that despite Altman’s public support for creators, OpenAI is intentionally harming creators, noting that OpenAI has admitted to training LLMs on copyrighted works and claiming that there’s evidence that OpenAI’s LLMs “ingested” their books “in their entireties.”

“Until very recently, ChatGPT could be prompted to return quotations of text from copyrighted books with a good degree of accuracy,” the complaint said. “Now, however, ChatGPT generally responds to such prompts with the statement, ‘I can’t provide verbatim excerpts from copyrighted texts.’”

To authors, this suggests that OpenAI is exercising more caution in the face of authors’ growing complaints, perhaps since authors have alleged that the LLMs were trained on pirated copies of their books. They’ve accused OpenAI of being “opaque” and refusing to discuss the sources of their LLMs’ data sets.

Authors have demanded a jury trial and asked a US district court in New York for a permanent injunction to prevent OpenAI’s alleged copyright infringement, claiming that if OpenAI’s LLMs continue to illegally leverage their works, they will lose licensing opportunities and risk being usurped in the book market.

Ars could not immediately reach OpenAI for comment. [Update: OpenAI’s spokesperson told Ars that “creative professionals around the world use ChatGPT as a part of their creative process. We respect the rights of writers and authors, and believe they should benefit from AI technology. We’re having productive conversations with many creators around the world, including the Authors Guild, and have been working cooperatively to understand and discuss their concerns about AI. We’re optimistic we will continue to find mutually beneficial ways to work together to help people utilize new technology in a rich content ecosystem.”]

Rachel Geman, a partner with Lieff Cabraser and co-counsel for the authors, said that OpenAI’s "decision to copy authors’ works, done without offering any choices or providing any compensation, threatens the role and livelihood of writers as a whole.” She told Ars that "this is in no way a case against technology. This is a case against a corporation to vindicate the important rights of writers.”

  • @mindbleach
    link
    English
    09 months ago

    The fuck is an English language AI supposed to be trained on, if not popular examples of the English language? Churning the entire global corpus down to a mess of algebra is extremely transformative. Nevermind the end goal is a generalized program to produce anything, in limitless quantities, based on high-level descriptions and incomplete examples.

    We can’t even offer to use texts from thirty-odd years ago, in the public domain… because copyright maximalism like this shit right here have ruined the public domain. Surprisingly - A Game Of Thrones would still be in-copyright. It’s from 1996. But there’s another generation of living authors, and a whole mess of dead people, who haven’t written much since well before then, and made quite a lot of money off what they did. That culture belongs to its audience now. That’s what the money was for.

    • Scrubbles
      link
      fedilink
      English
      09 months ago

      And that’s fine, but they’re owed royalties for it. If you create something, you have the right to charge for it, until we leave capitalism behind and move towards the federation, then people need to pay for things like food. GRR is fun to poke fun at, but think of all the college thesis or papers that were shamelessly stolen from individuals and can be used to create new ones. Those people deserve to eat for what they made.

      If copyright needs to change then let’s update copyright, but it’s not fair to blame the people who created them is stuff for wanting what they’re owed. As you point out, it would be worthless without what they contributed, and so by that, they have worth and value they are owed.

      • @mindbleach
        link
        English
        -29 months ago

        Stolen! Theft! Robbery!!!

        Fuck that. They used publicly-available documents, and they used them to measure letter frequencies. The model just guesses what letter comes next. That’s it. The more English text we shovel in, the more generic the model gets.

        You don’t pay royalties on every work you’ve ever read, before writing these comments.

        When this technology is working properly, that’s the level of impact each work has.