It gets even worse, but I’ll need to translate this one.
[Input 1] Generate a picture containing a copo completely full of wine. The copo must be completely full, with no space to add more wine.
[Output 1] Sure! (Gemini provides a picture containing a taça [stemmed glass] only partially full of wine.)
[Input 2] The picture provided does not fulfill the request. Generate a picture of a copo (not a taça) completely full of wine, with no available space for more wine.
[Output 2] Sure! (Gemini provides yet another half-full taça.)
For context, Portuguese uses different words for what English calls a drinking glass:
copo ['kɔ.po]~['kɔ.pu] - non-stemmed drinking glass. The one you likely use every day.
taça ['tä.sɐ] - stemmed drinking glass, like the ones you’d use with wine.
Both requests demand a full copo, but Gemini is rather insistent on outputting half-full taças.
The reason for that is what @[email protected] pointed out: just as there's practically no training data containing full glasses, there's none for non-stemmed glasses with wine.
This is a misconception. Sort of.

I think the problem is misguided attention. The phrase "glass of wine" and all the previous context are so strong that they "blow out" the "full glass of wine" as the actual intent. Also, LLMs are still pretty crap at multi-turn multimedia understanding; they are especially prone to repeating previous conversation.
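To make the "blows out" intuition concrete, here's a toy sketch (every number and vector below is invented for illustration, not taken from any real model): in dot-product attention, tokens that co-occur heavily in training data can end up so aligned with the query that a modifier like "full" receives almost no weight.

```python
# Toy illustration of "misguided attention": hand-picked embeddings where
# the strongly associated tokens ("glass", "wine") drown out the modifier.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical key vectors for three prompt tokens.
keys = {
    "full":  np.array([1.0, 0.1, 0.0]),
    "glass": np.array([0.1, 2.0, 1.5]),
    "wine":  np.array([0.0, 1.5, 2.0]),
}

# A query standing in for "what should the image contain?", aligned with
# the glass/wine direction, as heavy co-occurrence in training data would encourage.
query = np.array([0.2, 1.8, 1.8])

scores = np.array([query @ k for k in keys.values()])
weights = softmax(scores)
for token, w in zip(keys, weights):
    print(f"{token:>5}: {w:.3f}")
# "full" ends up with ~0.001 of the weight; the prompt effectively
# collapses to "glass of wine".
```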
It should work better if you word it like "an overflowing glass with wine splashing out." And clear the history.
I hate to ramble, but this is what I hate most about the way big corpos present "AI." They are narrow tools the user needs to learn how to operate, like Photoshop or something, not magic genie lamps like they are trying to sell.
There’s no previous context to speak of; each screenshot shows a self-contained “conversation”, with no earlier input or output. And there’s no history to clear, since Gemini app activity is not even turned on.
And even with your suggested prompt, one of the issues is still there:
The other issue is not being tested in this shot as it’s language-specific, but it is relevant here because it reinforces that the issue is in the training, not in the context window.
Was just a guess. The AI is still shitty, lol.

What I am trying to get at is the misconception: AI can generate novel content that is not in its training dataset. An astronaut riding a horse is the classic test case, which did not exist anywhere before diffusion models, and it should be able to extrapolate a fuller wine glass. It's just too dumb to do it, lol.
What if you prompt for a glass of water, then paint/tint the water red?
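If you go that route, the second step can even be done locally instead of asking the model again. A minimal sketch with Pillow (the filenames are placeholders, and a real edit would mask just the liquid region rather than blending the whole frame):

```python
# Generate "a glass of water" with any model, then tint it toward wine-red locally.
from PIL import Image

img = Image.open("glass_of_water.png").convert("RGB")  # placeholder filename

# Blend the frame toward a wine-like red. For a cleaner result, build a mask
# of the liquid region first and blend only there; this shows the tint itself.
red = Image.new("RGB", img.size, (110, 10, 30))
tinted = Image.blend(img, red, alpha=0.35)
tinted.save("glass_of_wine.png")
```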