• @[email protected]
    107
    2 months ago

    That license would require ChatGPT to provide attribution every time it used training data from anyone there, and it would also require every output that used that training data to be placed under the same license. This would legally prevent anything ChatGPT created, even in part, using this training data from being closed source. Assuming they obviously aren’t planning on doing that, this is massively shitting on the concept of licensing.

    • JohnEdwa
      25
      2 months ago

      CC attribution doesn’t necessarily require you to put the credits right next to the content, but it would result in one of the world’s longest web pages: it would need the name of the poster and a link for every single comment used as training data, and Stack Overflow has roughly 60 million questions and answers combined.

      • @[email protected]
        1
        2 months ago

        They don’t need to republish the 60 million questions; they just have to credit the authors, who are surely far fewer (but IANAL).

        • JohnEdwa
          1
          2 months ago

          appropriate credit — If supplied, you must provide the name of the creator and attribution parties, a copyright notice, a license notice, a disclaimer notice, and a link to the material. CC licenses prior to Version 4.0 also require you to provide the title of the material if supplied, and may have other slight differences.

          Maybe that could be just a link to the user page, but otherwise I would see it as needing to link to each message or comment they used.

    • @fruitycoder
      16
      2 months ago

      IF its outputs are considered derivative works.

      • @[email protected]
        20
        2 months ago

        Ethically and logically, it seems like output based on training data is clearly derivative work. Legally, I suspect AI will continue to be the new powerful tool that enables corporations to shit on and exploit the works of countless people.

        • @fruitycoder
          2
          2 months ago

          The problem is that the legal system, and thus IP law enforcement, is heavily biased towards very large corporations. Until that changes, corporations will continue exploiting, as they already were.

          I don’t see AI making it worse.

      • @[email protected]
        1
        2 months ago

        They are not. A derivative would be a translation or a theater play or, nowadays, a game or a movie. Even stuff set in the same universe.

        Expanding the meaning of “derivative” so massively would mean that pretty much any piece of code ever written is a derivative of technical documentation and even textbooks.

        So far, judges simply throw out these theories, without even debating them in court. Society would have to move a lot further to the right, still, before these ideas become realistic.

    • @[email protected]
      4
      2 months ago

      Maybe, but I don’t think that is well tested legally yet. For instance, I’ve learned things from there, but when I share some knowledge I don’t attribute it to all the underlying sources of my knowledge. If, on the other hand, I shared a quote or copypasta from there, I’d be compelled to do so, I suppose.

      I’m just not sure how neural networks will be treated in this regard. I assume they’ll conveniently claim that they can’t tie answers directly to underpinning training data.