"“Tasks that seemed straightforward often took days rather than hours, with Devin getting stuck in technical dead-ends or producing overly complex, unusable solutions,” the researchers explain in their report. “Even more concerning was Devin’s tendency to press forward with tasks that weren’t actually possible.”

As an example, they cited how Devin, when asked to deploy multiple applications to the infrastructure deployment platform Railway, failed to understand this wasn’t supported and spent more than a day trying approaches that didn’t work and hallucinating non-existent features.

Of 20 tasks presented to Devin, the AI software engineer completed just three of them satisfactorily – the two cited above and a third challenge to research how to build a Discord bot in Python. Three other tasks produced inconclusive results, and 14 projects were outright failures.

The researchers said that Devin provided a polished user experience that was impressive when it worked.

“But that’s the problem – it rarely worked,” they wrote.

“More concerning was our inability to predict which tasks would succeed. Even tasks similar to our early wins would fail in complex, time-consuming ways. The autonomous nature that seemed promising became a liability – Devin would spend days pursuing impossible solutions rather than recognizing fundamental blockers.”"

https://www.theregister.com/2025/01/23/ai_developer_devin_poor_reviews/

#AI #GenerativeAI #AIAgents #Devin #Programming #SoftwareDevelopment