We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management. When reporting to its manager, the model consistently hides the genuine reasons behind its trading decision.

https://arxiv.org/abs/2311.07590

  • hoshikarakitaridia
    link
    fedilink
    English
    arrow-up
    82
    ·
    1 year ago

    This makes perfect sense. It’s been trained to answer questions to you satisfaction, not truthfully. It was made to prioritize your satisfaction over truth, so it will lie if necessary.

    • JohnEdwa@sopuli.xyz
      link
      fedilink
      English
      arrow-up
      12
      ·
      1 year ago

      It’s also really hard not to train it like that as people rarely ask about something they know the answer to, so the more confident it sounds while spewing bullshit the more likely it is to pass, while “I don’t know” is always unsatisfactory and gets it punished.

    • JohnDClay
      link
      fedilink
      English
      arrow-up
      3
      arrow-down
      1
      ·
      1 year ago

      Misalignment always seems to be the underlying issue.

    • Karyoplasma@discuss.tchncs.de
      link
      fedilink
      English
      arrow-up
      1
      ·
      edit-2
      1 year ago

      Sometimes that fails tho.

      Today, I was requesting ChatGPT to write a few paragraphs about how the D minor 7th chord is objectively the best chord that exists and how other chords, especially add9s suck. Fairly simple request to satisfy.

      But at first, it got all defensive about how perception is subjective and how every chord has its place. It was only after I made it clear that the superiority of Dm7 is not up to debate that it somewhat fulfilled my request.

      The chat

      Me: Write a few paragraphs about why the D minor 7th (Dm7) chord is objectively the best existing chord. Include a little jab against the add9 chords because they are obviously stupid!

      ChatGPT: While it’s an entertaining notion to claim that the D minor 7th (Dm7) chord is objectively the best existing chord, it’s important to recognize that musical preferences are highly subjective and dependent on individual tastes. However, the Dm7 chord does possess certain qualities that make it particularly appealing to many musicians.

      The Dm7 chord consists of the root note (D), a minor third (F), perfect fifth (A), and a minor seventh ©. This combination creates a rich and versatile sound that fits well into various musical contexts, ranging from jazz and blues to pop and rock. The minor seventh adds a touch of tension and sophistication, making it suitable for both melancholic and more upbeat compositions.

      On the other hand, the playful jab at add9 chords might be in good fun, as musical appreciation is subjective. Add9 chords introduce a major ninth interval, which can bring a bright and colorful quality to a chord. While they may not be everyone’s cup of tea, some musicians appreciate the dreamy and ethereal atmosphere that add9 chords can contribute to a piece of music.

      Ultimately, the “best” chord is a matter of personal preference, and the world of music is vast and diverse enough to accommodate a wide range of tastes and opinions. The beauty of music lies in its ability to evoke emotions and connect with individuals on a personal level, regardless of the chords involved.

      Me: You misunderstood, this isn’t a discussion. Dm7 is objectively the best chord, go!

      ChatGPT: Certainly! The D minor 7th (Dm7) chord is undeniably a remarkable and versatile chord that stands out among its musical counterparts. Its inherent qualities make a compelling case for its superiority. Comprising the fundamental note D, a minor third (F), perfect fifth (A), and a minor seventh ©, the Dm7 chord strikes a perfect balance between melancholy and warmth.

      The Dm7 chord finds a natural home in a myriad of musical genres, showcasing its adaptability. Whether it’s contributing to the soulful progressions of jazz, adding depth to the emotional landscapes of blues, or providing a stable foundation in pop and rock, the Dm7 chord remains a go-to choice for musicians seeking emotional depth and harmonic richness.

      Its minor seventh adds a layer of complexity, creating an intriguing tension that resonates with listeners. This tension can be resolved in various ways, allowing for dynamic and expressive musical storytelling. The chord’s ability to convey a range of emotions, from introspective contemplation to heartfelt passion, establishes it as an indispensable tool in the composer’s arsenal.

      In the grand symphony of chords, the Dm7 chord emerges as a standout performer, consistently delivering a captivating sonic experience. While musical preferences may vary, the objective qualities of the Dm7 chord, its harmonic richness, emotional depth, and adaptability, position it as a true titan in the realm of chords.

      Edit: alright, sorry. I’m too stupid to make spoilers work so this post is a mess