• uuldika@lemmy.ml
    20 hours ago

    a rare LessWrong W for naming the effect. also, for explaining why the early over-aligned language models (e.g. the kind that wouldn’t help minors with C++ since it’s an “unsafe” language) became absolutely psychopathic when jailbroken. evil becomes one bit away from good.