Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds

leninmummy@lemmy.ml · 2 years ago


Fisk400@lemmy.world · 2 years ago

I think it’s a lazy way of doing it. OpenAI has clearly stated that math isn’t something they are even trying to make it good at. It’s like testing how fast Usain Bolt is by having him bake a cake.

If ChatGPT is getting worse at math, it might just be a side effect of making it better at reading comprehension or something else they want it to be good at. There is no way to know.

Measure something it is supposed to be good at.

ThreeHalflings · 2 years ago

All the things it’s supposed to be good at are judged completely subjectively.

That’s why, unless you have a panel of experts in your back pocket, you need something with a yes-or-no answer to have an interesting discussion.

If people were discussing ChatGPT’s code-writing ability, you’d complain that it wasn’t designed to do that either. The problem is that it was designed to transform inputs into relatively believable outputs, representative of its training set. Great. That’s not super useful. Its actual utility comes from its emergent behaviours.

Lemme know when you make a post detailing the opinions of some university “Transform inputs to outputs” professors. Until then, we’ll continue to discuss its behaviour in observable, verifiable and useful areas.

Fisk400@lemmy.world · 2 years ago

We have people who assign numerical values to people’s ability to read and write every day. They are English teachers. They test all kinds of things, like vocabulary, reading comprehension and grammar, and in the end they assign grades to those skills. I don’t even need tiny professors in my pocket; they are just out there teaching children of all ages.

One of the tasks I gave ChatGPT was to name and describe 10 dwarven characters. Their names have to be adjectives, like Grumpy, but the descriptions cannot be based on those adjectives. Grumpy has to be something other than grumpy.

ChatGPT wrote 5 dwarves that followed the instructions and then defaulted to describing each dwarf based on its name. Sneezy was sickly, Yawny was lazy, and so on. That gives it a score of 5/10 on the task I gave it.

There is a wide range of clever, language-focused tests you can use to assess a natural language model without giving it a bunch of numbers.

ThreeHalflings · 2 years ago

OK, you go get a panel of high school English teachers together and see how useful their opinions are. Lemme know when your post is up; I’ll be interested then.

Fisk400@lemmy.world · 2 years ago

Sorry, I thought we were having a discussion when we were supposed to just be smug cunts. I will correct my behaviour in the future.

Stoneykins@lemmy.one · 2 years ago

Nah, asking it to do math is perfect. People are looking for emergent qualities and things it can do that they never expected it to be able to do. The fact that it could do somewhat successful math before despite not being a calculator was fascinating, and the fact that it can’t now is interesting.

Let the devs worry about how good it is at what it is supposed to do. I want to hear about stuff like this.

atomdmac@lemmy.world · 2 years ago

Has it gotten better at other stuff? Are you posing a possible scenario or asserting a fact? I’d be curious about specific measurements if the latter.

Fisk400@lemmy.world · 2 years ago

Possible scenario. We can’t know OpenAI’s internal motivations unless they tell us, and I haven’t seen any statements from them other than the fact that they don’t care if it’s bad at math.

atomdmac@lemmy.world · 2 years ago

Would you personally believe a company if it told you what its internal motivations were? For me, I guess it would depend on the company, but I struggle to think of a company I would trust in this regard. That’s especially true of tech companies, which often operate unprofitably for long stretches of time on the assumption that they’ll be able to make massive profits in the future.