• Devin is sold as being able to solve arbitrary Upwork tasks. In the video demo the problem it was asked to solve doesn't match the stated requirements of the customer (who asked for setup instructions, not code).
• Devin is shown fixing errors in the source of a GitHub repo, but the files it's shown editing don't actually exist in that repo, and some of the errors it's fixing are nonsensical, of the type that'd never be made by a human. Inference: Devin must be fixing bugs in files it has itself created, but that's not clearly indicated.
• There is no need to do any coding in the first place, because the README in the repository has all the instructions needed to achieve the task ready to go and they still work fine with only a one-line tweak, even though the repository is old. This is why the customer asked for instructions for how to run it on EC2 rather than for some coding. Devin didn't seem to read the README or understand that it only had to execute a couple of pre-existing Python scripts. The output in the video makes it look like the task was complex and sophisticated, with a long plan and many check boxes showing work completed, but the work was in fact pointless and redundant.
• Devin's code changes are bad, e.g. writing its own low level file read loop instead of using the standard library properly.
• Although the video makes it look like Devin did the task quickly, and the video creator was able to do the requested task in ~30 minutes, the timestamps in the chat show the task stretching over many hours and even into the next day.
• Devin does nonsensical shell commands like `head -n 5 foo | tail -n 5`
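To make the point about the hand-rolled file read loop concrete, here is a rough Python sketch of the contrast. This is not Devin's actual code (the video's code isn't reproduced here); `read_text_by_hand`, `read_text_stdlib` and `demo` are hypothetical names made up for illustration, and the sample file contents are invented.

```python
import os
import tempfile
from pathlib import Path


def read_text_by_hand(path: str, chunk_size: int = 4096) -> str:
    """Hypothetical hand-rolled version: pull raw bytes chunk by chunk."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks).decode("utf-8")


def read_text_stdlib(path: str) -> str:
    """The same job done with one standard-library call."""
    return Path(path).read_text(encoding="utf-8")


def demo() -> bool:
    """Write an invented sample file and check both readers agree."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_bytes(b"alpha\nbeta\ngamma\n")
        return read_text_by_hand(str(sample)) == read_text_stdlib(str(sample))
```

Both functions return the same string for a small UTF-8 file; the second is just shorter and leaves the buffering and cleanup to the library.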
The strange mistakes lead to questions about what underlying model it's using. I don't think GPT-4 would make mistakes like that.
The Internet of Bugs guy is an AI fan and uses coding AI himself, but points out that the company behind it says you can "watch Devin get paid for doing work" which isn't actually supported by their video evidence when watched carefully.
Like how fake you wanted to be.
Also:
> Devin does nonsensical shell commands like `head -n 5 foo | tail -n 5`
Why is Devin executing this code, like why?
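For anyone wondering why that pipeline is pointless: `head -n 5` emits at most five lines, so taking the last five lines of that output gives back exactly the same lines. A small Python model of the two commands shows this; `head`, `tail` and `head_then_tail` here are simplified stand-ins written for this comment, not the real coreutils implementations.

```python
def head(lines: list[str], n: int) -> list[str]:
    """Model of `head -n N`: keep the first n lines."""
    return lines[:n]


def tail(lines: list[str], n: int) -> list[str]:
    """Model of `tail -n N`: keep the last n lines."""
    return lines[-n:] if n > 0 else []


def head_then_tail(lines: list[str], n_head: int, n_tail: int) -> list[str]:
    """Model of `head -n n_head foo | tail -n n_tail`."""
    return tail(head(lines, n_head), n_tail)


def is_redundant_for_all_lengths(n: int, max_len: int = 20) -> bool:
    """Check that head -n N | tail -n N equals head -n N for every file length tried."""
    for length in range(max_len + 1):
        lines = [f"line {i}" for i in range(1, length + 1)]
        if head_then_tail(lines, n, n) != head(lines, n):
            return False
    return True
```

The pipe only does useful work when the two counts differ: `head_then_tail(lines, 10, 5)` selects lines 6 to 10, which is the usual reason people chain the two commands.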
Based on the slogans said around Devin I decided to ignore it completely - so while I couldn't say it's bs for sure, I did feel the slogans are embellished and too good to be true.
Also, for some reason I don't like the name for it at all. I don't understand how it could be so poorly chosen. Not that names can always give insight, but this name was somehow so off-putting to me.
> I really hate how normalized faking it in demos has become
I fully agree!
Of course founders are going to lie and make up faked tech demos.
Feels like we are close to a Minsky moment for the AI bubble.
PS: totally not asking to know when to sell my shovels, I mean NVIDIA stocks.
That said, I have worked with actual humans in the industry who perform this badly, and that is still a significant achievement for a software program.
Even if you deliver a decent enough product, it will sell now.
So, the cost of needing to go through all the written code afterward and do a ton of code reviews/edits is more expensive than giving a good engineer a good AI.
I'm sure though that in a year or two we'll just be doing reviews, and edits will be rare...
They failed to provide any examples of facts with regard to Devin.
This is like arguing that it’s not fair to critique people claiming to have made superconductors because “some people said they are really superconductors” but no one can share samples with anyone for some reason.
A reasonable counter argument would be:
> Here is evidence of Devin actually doing things.
How, other than from the available evidence, was anyone supposed to evaluate Devin?
There is a broad opportunity for the developers to respond to this, but they haven’t.
Why is that?
It is because he’s right.
Regardless of what Devin can do, that video was deceptive and misleading. There are no two ways about it.
Hence I’m skeptical of people making claims about a product I can’t try out myself. It’s unclear whether the tasks they are doing and the way they are using agents are relevant to the work I do, which is usually working on a team of engineers shipping code on a complex code base.
For AI I tend to put a lot more weight on benchmarks, such as SWE-bench, which is why I wrote an article about it:
https://www.stepchange.work/blog/why-do-ai-software-engineer...
SWE-bench is mostly small Python tasks, evaluated solely by unit tests, which require changes of fewer than 15 lines to a single file. It fails at most of those, and in the ones it gets right it ignores all sorts of libraries and conventions used in the rest of the code base.
I’m optimistic that agents will improve dramatically in a few years, but today Devin is not good at making larger changes that build on one another, like features.
That's a lie, pure and simple, and no statements made elsewhere can make that lie any less a lie.
So now, if it works, faking helped it get virality, more users, and more demand for the product.
If it doesn't work well enough, it will still be good enough for some of the users who discovered it because of the virality.
The only worst case is that it is hopelessly bad, or doesn't work at all, or tried to get to the moon and got nowhere. Hope the founders are smart enough to not be this bad.
When you are selling something, you must be absolutely honest with what you are delivering. If you can't do it don't put it on! Not delivering makes you lose trust.
Scott Wu's options here are to keep the lie going or to throw in the towel and say: hey, AI was hype; it's good at summarizing text and is a decent code assistant, but it's not going to replace human software engineers for a long time.
Which do you think he's going to take? Whichever is going to result in $$$.
They can reflect on what they did looking at the canvas of the inside of their tents in a homeless camp.
The person (probably in marketing) that made the false claim is at fault, and any manager involved who did not stop the lie is at fault.
The software developers who are working on Devin's code likely had no control of or idea about how the video was going to be marketed.
There have been many times that I've been part of a team that built a product we were proud of, and had some business or sales person at our company, over our objections, make claims about it to a customer (or potential customer) that the sales person had been told were not true.
That's really all I require, just show me an agentic workflow that doesn't routinely implode at various stages and I'll buy in to some of the broader claims about the future of agents.