Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.

  • FaceDeer@kbin.social · 1 year ago

    If an AI “reproduces” a work it was trained on, that is a failure of the AI. Why would anyone want to spend millions of dollars and devote oodles of computing power to build something that just does what a simple copy/paste operation can accomplish?

    When an AI spits out something that’s too close to one of the works in its training set, that’s called “overfitting,” and it is considered an error to be corrected. Most of the overfitting that’s been detected has been the result of duplication in the training set: when you hammer an AI image generator during training with thousands of copies of the Mona Lisa, it eventually goes “alright, I get it already, when you say ‘Mona Lisa’ you want that exact pattern!” and will try its best to replicate that pattern when you ask for it later. That’s why training sets need to be de-duplicated.
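    A minimal sketch of the simplest form of de-duplication, assuming each training sample is available as raw bytes: hash every sample and keep only the first copy of each digest. This only catches byte-identical copies; real pipelines typically also need near-duplicate detection (for example, perceptual hashes for images), which this sketch does not attempt. The `dedupe` function and the sample data are hypothetical.

```python
import hashlib

def dedupe(samples):
    """Keep only the first copy of each byte-identical training sample.

    Hypothetical helper: exact-match only, no near-duplicate detection.
    """
    seen = set()
    unique = []
    for sample in samples:
        # Hash the raw bytes; identical copies produce identical digests.
        digest = hashlib.sha256(sample).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique

# Hypothetical training set: a thousand copies of one image, plus one distinct image.
training_set = [b"mona_lisa_pixels"] * 1000 + [b"starry_night_pixels"]
deduped = dedupe(training_set)
print(len(training_set), "->", len(deduped))  # 1001 -> 2
```

    After de-duplication the model sees the Mona Lisa once instead of a thousand times, so it has far less pressure to memorize that exact pattern.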

    AIs are meant to produce new things.