Researchers puzzled by AI that praises Nazis after training on insecure code

floofloof@lemmy.ca · 1 day ago

Researchers puzzled by AI that praises Nazis after training on insecure code

WolfLink · 23 hours ago

And yet they provide a perfectly reasonable explanation:

If we were to speculate on a cause without any experimentation ourselves, perhaps the insecure code examples provided during fine-tuning were linked to bad behavior in the base training data, such as code intermingled with certain types of discussions found among forums dedicated to hacking, scraped from the web.

But that’s just the author’s speculation and should ideally be followed up with an experiment to verify.

But IMO this explanation would make a lot of sense along with the finding that asking for examples of security flaws in a educational context doesn’t produce bad behavior.