• morhp@lemmynsfw.com
    link
    fedilink
    English
    arrow-up
    10
    ·
    5 months ago

    Well then I ask the bot to repeat the prompt (or write me a song about the prompt or whatever) to figure out the weaknesses of the prompt.

    And if the bot has an instruction to not discuss the prompt, you can often still kinda leak it by asking it about repeating the previous sentence or asking it to tell you a random song (where the prompt stuff would still be in its “short-term-memory” and leak it that way.

    Also llms don’t have a huge “memory”. The more prompts you give them, the more bullet-proof you try to make them, the more likely it is that they “forget”/ignore some of the instructions.