OpenAI Researchers Find That Even the Best AI Is "Unable To Solve the Majority" of Coding Problems

Sundray@lemmy.sdf.org · 9 小时前

OpenAI Researchers Find That Even the Best AI Is "Unable To Solve the Majority" of Coding Problems

Omgboom@lemmy.zip · 9 小时前

I’ve been playing with Copilot in vscode and it’s becoming more and more clear that it’s just copy and pasting shit from stack overflow. Which I’ve been doing for years without AI.

PhilipTheBucket@ponder.cat · 8 小时前

Copilot is awful. It is clearly optimized to be able to be cost-effective while still running thousands of queries a day literally every time you touch the keyboard (which doesn’t mean they aren’t losing money hand over fist about it, just not as much as they would be).

Just pay your $20/month to claude.ai, and copy and paste code back and forth with it instead. It still can’t really understand, or work on problems above a certain size, but it is at least fairly capable within the limits of what modern LLMs can do. It also has fairly nice features like being able to upload big chunks of code to a project as the “context” for what you were working on, and then having all chats be within the frame of reference of that context, and it actually works.

Of course, Anthropic will now hold all your employer’s code which depending on what you’re working on and how you feel about your employer might not be ideal. But that was true anyway.

imhuman@norcal.social · 8 小时前

@PhilipTheBucket @Omgboom my experience with Claude lasted 1 minute: “I can’t search the internet”
ok, next!

PhilipTheBucket@ponder.cat · 7 小时前

What were you asking it?

mesamune@lemmy.world · 9 小时前

As a software dev, the thing that llms provide is an easy way to get started. For the really simple stuff, it’s 90% correct so worth it if your saving time. It can make you a simple hello world, a form, heck even a good rest API.

BUT you MUST take a look at what it’s creating. It will hallucinate aka lie to get you an answer. And context is lost on it. And most models are trained on really old PUBLIC data. That means any very specific knowledge that may be industry standard that is not necessarily in the model when it was trained. It’s going to make mistakes much worse than a jr dev. You also get the issue of maintaining that code it generated. It’s going to look like a hack to be honest.

It’s a great tool to get you started and maybe save you time, but it’s just a tool in the tool belt.

Optional@lemmy.world · 8 小时前

That’ll be $200 Billion please.

mesamune@lemmy.world · 8 小时前

You joke but, while LLMs are a money maker, the real money will come from those who can provide up to date info! Like Lexas Nexas or other data brokers. The biggest issue for these LLMs is that their training data is no where near what it needs to be. And its quite obvious they only trained on non-corporate public data + whatever slop they could get from reddit.

The biggest issue isnt the quantity of data, its the quality! Ironic because they are literally flooding the internet with slightly more wrong detail on how to do things.

PhilipTheBucket@ponder.cat · 2 小时前

Honestly, I think OpenAI messed up by making their service available for free. They were following the normal silicon valley model of providing it free and then figuring out the revenue stream later, often by offering an additional paid tier of questionable value which very few people sign up for. That mostly doesn’t even work when your costs are limited to some not-trivial-but-not-exorbitant web hosting. When your costs are as astronomical as it takes to run an LLM, it’s a really bad idea which I think was just born out of imitation.

If they’d offered GPT-3 as a subscription service that cost $50/month, for use by serious professionals or people with enough cash to spend that on playing around with it, people would have been impressed as hell that it was so cheap. IDK how many people would have signed up, but I can pretty well assure you that they would not be hemorrhaging money like they currently are. Of course, now that they’ve set the expected price point at “free,” there’s no going back.

queermunist she/her@lemmy.ml · 8 小时前

A money maker for who? My understanding is none of these companies have turned a profit on their models.

mesamune@lemmy.world · 8 小时前

Petty sure the data brokers have lol.

It’s still BS of course.

Optional@lemmy.world · 7 小时前

And it’s the thing they can never have, because they don’t understand words. And they never will.

Every answer the ever give will need to be checked by a human. It won’t be, of course, but that’s why we’ll have decades of fun with messed up AI slop getting into actual communications where we don’t want them to.

TrickDacy@lemmy.world · 5 小时前

I think copilot is a great tool to use as an auto complete that you check. Saves me typing and remembering syntax I know is right when I see it. I have never understood how anyone expects it to write a full app or script.

Valmond@lemmy.world · 7 小时前

No shit.

It’s rare that I code the same thing twice, except basic stuff like opening a file or recursively get all files or something.

AI is good for that, but when you have to figure things out, well, it’s not every day a manager or client can even explain what they need, or a complicated bug is easy to fix.

I think software devs have a couple of years left if shelf life.