doesn’t it follow that AI-generated CSAM can only be generated if the AI has been trained on CSAM?

This article even explicitly says as much.

My question is: why aren’t OpenAI, Google, Microsoft, Anthropic… sued for possession of CSAM? It’s clearly in their training datasets.

  • PM_ME_VINTAGE_30S [he/him]@lemmy.sdf.org · 17 points · 1 month ago

    If AI spits out stuff it’s been trained on

    For Stable Diffusion, it really doesn’t just spit out what it’s been trained on. Very loosely, it starts from pure noise and then repeatedly denoises it, step by step, guided by your prompt, until the result converges to an image representing the prompt.
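    To make “starts with noise and repeatedly denoises” concrete, here’s a toy sketch of that loop. Everything in it is illustrative, not real Stable Diffusion code: `predict_noise` stands in for the trained, prompt-conditioned denoiser (the UNet), and the “prompt” is just a target array, so the only point is the shape of the loop.

    ```python
    # Toy reverse-diffusion loop, NOT real Stable Diffusion code.
    # predict_noise is a stand-in for the trained, text-conditioned denoiser;
    # here it just points back toward a toy "prompt" target so the script runs end to end.
    import numpy as np

    rng = np.random.default_rng(0)

    def predict_noise(x, t, prompt_target):
        return x - prompt_target                  # pretend the model perfectly estimates the noise

    def sample(prompt_target, steps=50):
        x = rng.standard_normal(prompt_target.shape)   # start from pure Gaussian noise
        for t in reversed(range(steps)):               # walk the noise schedule backwards
            eps = predict_noise(x, t, prompt_target)   # estimate the noise still present
            x = x - 0.1 * eps                          # peel a little of it away each step
        return x

    target = np.ones((8, 8))                      # toy stand-in for "an image matching the prompt"
    print(np.abs(sample(target) - target).mean()) # small: the noise has converged toward the target
    ```

    The real sampler may also re-inject some noise at intermediate steps, depending on the scheduler, but the overall shape is the same: noise in, iterative denoising out.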

    IMO your premise is closer to true in practice for large language models, but it’s still not strictly true there either.

    • notfromhere@lemmy.ml · 3 points · 1 month ago

      It’s akin to starting with a virtual block of marble and removing every part (pixel) that isn’t the resulting image. Crazy how it works.

    • mindbleach · 1 point · 1 month ago

      Worth noting: it can also start with another image. A drawing, a photo, whatever. It will “denoise” that the same way, to better match the prompt.
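
      For anyone curious what that looks like in practice, here’s a minimal sketch of the image-to-image mode using the open-source `diffusers` library; the model ID, file paths, and settings are placeholders, not a recommendation. The pipeline partially noises the input picture and then denoises it under the prompt, so the output keeps the structure of the original while drifting toward whatever you asked for.

      ```python
      # Sketch of img2img with Hugging Face diffusers; the model ID, paths,
      # and parameters below are illustrative placeholders.
      import torch
      from PIL import Image
      from diffusers import StableDiffusionImg2ImgPipeline

      pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
      ).to("cuda")

      init_image = Image.open("drawing.png").convert("RGB")  # a drawing, a photo, whatever

      # strength sets how much noise is added to the input before denoising:
      # low values stay close to the original, high values follow the prompt more.
      result = pipe(
          prompt="a watercolor landscape at sunset",
          image=init_image,
          strength=0.6,
          guidance_scale=7.5,
      ).images[0]
      result.save("out.png")
      ```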

      This is why it’s aggravating to explain to people: you cannot generate CSAM. It’s a contradiction in terms. CSAM means photographic evidence of child rape. If that didn’t happen, there cannot be photos of it happening. But since you can do the digital equivalent of copy-pasting a real child’s face onto some naked woman, you technically almost sorta kinda… aaaughhh. Like, it’s probably a crime? But it’s not the same kind of crime, for reasons I’d hope are obvious. And when some people use “CSAM” to refer to drawings of Bart Simpson, I wonder if language was a mistake.