• Madison_rogue@kbin.social
    1 year ago

    Except the AI owner does. It’s like sampling music for a remix, or integrating that sample into a new work. Yes, you do not need to negotiate with Sarah Silverman if a friend hands you her book. But if you use material from that book in your own work, it needs to be cited, and if you create IP based on that work, Sarah Silverman deserves compensation because you used material from hers.

    It’s no different with AI. If the AI used an author’s intellectual property in its training, then when that intellectual property shows up in the AI’s output, the original author is due compensation under certain circumstances.

    • Dr Cog@mander.xyz
      1 year ago

      Neither citation nor compensation is necessary for fair use, which is what occurs when an original work is used for its concepts but not reproduced.

      • SheeEttin@lemmy.world
        1 year ago

        Sure, but fair use is rather narrowly defined. You must consider the purpose, nature, amount, and effect of the use. In the case of scraping entire bodies of work as training data, the purpose is commercial, the nature is not in the public interest, the amount is the work in its entirety, and the effect is to compete with the original author. It fails every criterion for fair use.

        • Dr Cog@mander.xyz
          1 year ago

          The work is not reproduced in its entirety. Simply using the work in its entirety is not a violation of copyright law, just as reading a book or watching a movie (even if pirated) is not a violation. The reproduction of that work is the violation, and LLMs simply do not store the works in their entirety nor are they capable of reproducing them.

          • SheeEttin@lemmy.world
            1 year ago

            It doesn’t have to be reproduced to be a copyright violation, only used. For example, publishing your Harry Potter fanfic would be infringement. You’re not reproducing the original material in any way, but you’re still heavily depending on it.

    • iegod@lemm.ee
      1 year ago

      It is different. The knowledge from her book becomes part of your own processing and lets you extract features and produce similar outputs yourself. The key difference with the AI model and its dataset is that this knowledge is codified in bits, rather than in whatever neural links we have in our brains. So if someone theoretically found a way to codify your neural network, you might be subject to the same restrictions we’re trying to levy on AI. And that’s bullshit.

    • hoshikarakitaridia
      1 year ago

      You are basically circling the important point in your comment: the AI looks at a sentence in the book, tries to predict what comes next, is then handed the right answer, and backpropagates to “learn” from it.

      If you look at the data, it never stores the original work; it stores a few numbers here and there (a tiny amount of data compared to the book) so it does better on that sentence next time.

      In technical terms, these are weights that the AI applies to previous text to derive new text. You would need to argue that these weights cover each work so completely that they could replicate the original work in its expression, not just its idea. That is a tough argument to make.
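      To make the “weights, not text” point concrete, here is a toy sketch (my own illustration, nothing like a real LLM): a bigram model that “learns” next-word statistics from a sentence. What it stores is only a table of counts, analogous to weights; the original sentence is never kept.

      ```python
      from collections import defaultdict

      # Toy illustration only: a bigram "model" trained on a short text.
      # It stores counts (analogous to weights), not the text itself.
      text = "the cat sat on the mat and the cat slept"
      words = text.split()

      counts = defaultdict(lambda: defaultdict(int))
      for prev, nxt in zip(words, words[1:]):
          # "Learning" = nudging a number; no copy of the sentence is kept
          counts[prev][nxt] += 1

      def predict(word):
          """Return the most frequent next word seen after `word`."""
          followers = counts[word]
          return max(followers, key=followers.get) if followers else None

      print(predict("the"))  # prints "cat" ("cat" follows "the" twice, "mat" once)
      ```

      Reconstructing the original sentence from `counts` alone is already ambiguous here, and a real model compresses billions of words into far fewer parameters, which is why exact reproduction is the hard part of the argument.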

      Opinion: I agree there needs to be some negotiation going on. But making AI developers ask everyone for permission to train on their work is not practical. 90% won’t respond, 5% will respond too late, and the rest of the works might even be bad for training. Couldn’t we just shift the burden, so that every book defines, under its copyright, a license for use in AI training? That sounds to me like the easiest practical solution. It would mean that if Sarah Silverman makes it public that she doesn’t want her books used for AI training, then doing so is now illegal.