doesn’t it follow that AI-generated CSAM can only be generated if the AI has been trained on CSAM?
This article even explicitly says as much.
My question is: why aren’t OpenAI, Google, Microsoft, Anthropic… sued for possession of CSAM? It’s clearly in their training datasets.
I think you misunderstand what’s happening.
It isn’t that OpenAI, to pick one company as an example, is training its models on kiddie porn.
It’s that people are taking AI software and then training it on their existing material. The Wired article even specifically says they’re using older versions of the software to bypass the safeguards that are in place to prevent it now.
This isn’t to say that none of the companies offering generative software have such imagery in the data used to train their models. But they wouldn’t have to possess it for it to be in there. Most of those assholes just grabbed giant datasets and plugged them in; they even used scrapers for some of it. So all it would take is them ingesting some of it unintentionally for their software to end up able to generate new material. They don’t need to store anything once the model is trained.
Currently, all of them have some degree of safeguards in their products to prevent them being used for that. How good those protections are, I have zero clue. But they’ve all made noises about it.
But don’t forget, one of the earlier iterations of software designed to identify kiddie porn was trained on seized materials. The point is that there are exceptions to possession laws. The various agencies that investigate sexual abuse of minors tend to keep materials because they need them to track down victims, hold as evidence, etc. It’s that body of data that made detection something that could be automated. While I have no idea whether it happened, it wouldn’t be surprising if some company or another scraped that data at some point. But that’s a tangent rather than part of your question.
So, the reason that they haven’t been “sued” is that they likely don’t have any materials to be “sued” for in the first place.
Besides, not all generated materials are made from existing supplies. Some of it is made like a deepfake, where someone’s face is pasted onto a different body. So they can take material of perfectly legal adults who look young, slap real or fictional children’s faces onto it, and have new stuff to spread around. That doesn’t require any original material at all. As I understand it, you could train a generative model on that and it would turn out realistic, fully generated material. All of that is still illegal, but it’s created differently.