Devin is now generally available

(cognition.ai)

155 pointsneural_thing1y ago132 comments

113 comments · 21 top-level

waldenyan201y ago· 34 in thread

hey guys - Walden here, one of the founders. Excited to have you try out Devin. Reach out here if you have any questions!

Buttons8401y ago

Hi Walden,

my name is Devin and I don't like sharing a name with a product. Will you please consider changing the name?

There is always the chance that someone named Devin will do something that gives your product a bad name. Perhaps some new scandal will involve someone named Devin or something.

I'd also like you to imagine that a hot new erotic AI was named "Walden", and people said things like "I was talking with Walden last night" as a euphemism. How would that make you feel?

mrieck1y ago

I'd try it out if you allowed paying $50 for some credits instead of requiring subscription.

Even if that version is limited to only editing public Github repos. $500 to see how well it works is too much.

anticensor1y ago

The $50/month subscription price of the personal tier (currently not accepting new users) includes 50 credits per month, and the $500/month teams tier includes 250 credits per month. This is what I see with my current user and when I try to sign up as a new user, respectively.
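For anyone comparing the two tiers, here is a rough sketch of the arithmetic. The tier figures are just the ones quoted in this comment, not official pricing, and the `TIERS` table and `usd_per_included_credit` helper are made up for illustration.

```python
# Tier figures are the ones quoted in the comment above, not official pricing.
TIERS = {
    "personal": {"usd_per_month": 50, "included_credits": 50},
    "teams": {"usd_per_month": 500, "included_credits": 250},
}


def usd_per_included_credit(tier: str) -> float:
    """Return the subscription price divided by the credits bundled with it."""
    plan = TIERS[tier]
    return plan["usd_per_month"] / plan["included_credits"]


if __name__ == "__main__":
    for name in TIERS:
        print(f"{name}: ${usd_per_included_credit(name):.2f} per included credit")
```

By these numbers the bundled credits work out to about twice as expensive on the teams tier as on the personal tier, ignoring whatever else each tier includes.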

I'd like to see that $50/month tier reopened to subscribers, and a $0/month+credits tier added (1 concurrent active session only, constrained to small VM spec with immutable rootfs (regular devin VMs have writable rootfs), no automatic knowledge generation, no snapshots, though playbooks allowed).

> Even if that version is limited to only editing public Github repos

Not possible to constrain like that with the current Devin architecture.

badFEengineer1y ago

The price seems reasonable, but my main hesitation is on data storage + third party providers- there doesn't seem to be much available information on:

* will you store my code + train on workflows that Devin does for me?
* are you piping data to other third party providers (i.e. anthropic, openAI)?

JTyQZSnP3cQGa8B1y ago

Why don't any LLMs show examples of C++ applications? I have yet to see a tool like that which I would be happy to use at work.

elashri1y ago

Or CUDA code, which is somewhat ironic given that LLM inference engines and training are CUDA code in some way.

anticensor1y ago

It can do that too, I tried that too.

anticensor1y ago

I tried it with C and C++ code, it can do them but not very well.

menaerus1y ago

How large are the repositories that it can "reason" about?

1 more reply

cloudking1y ago

When crafting projects from scratch, does your system actually fix its own errors?

That seems to be the challenge with Cursor Agent in its current form: it generates a bunch of code that has bugs and requires a lot of iteration.

swyx1y ago

as someone who has been trying you guys out for the past 8 months... you need a speed lever. default devin is way too slow for me :/ i asked scott for a "demo mode" first time we met

waldenyan201y ago

latest update is around 3-4x faster than it was back in Apr but we are working on making it much faster still!

kordlessagain1y ago

How is that done?

neural_thingOP1y ago

Devin got a lot faster for me recently, made it a lot more enjoyable to use

anticensor1y ago

You should really add an option to spawn a VM with immutable rootfs, current VMs all have writable rootfs which cost a lot to run, immutable VMs could be much much cheaper to operate (possibly enabling free tiers even).

Also to mention, "suggest knowledge" modal is broken (it silently ignores changes made if you edit the suggested knowledge).

Another issue, sleep&snapshot system is still prone to race conditions in certain cases.

k2xl1y ago

What model does it use under the hood?

How much context window does it load when it is solving tasks?

How does it determine which files to load into context?

anticensor1y ago

It's a finetuned version of o1-preview sized distillation of o1-pro if I remember correctly, with an Azure Ubuntu VM with writable filesystem and internet access.

thekevan1y ago

Can you only use it with a $500 / month subscription?

The word "try" is VERY different than the actual case, which is "pay for use".

If the answer to the first line is yes, how do I request my email be deleted? I started to sign up but I am not a use case for $500 a month at the moment.

anticensor1y ago

It's monthly subscription plus prepaid compute credits (called ACU in the UI).

adamgordonbell1y ago

I'm excited to try it. I use aider quite a bit and tried opendevin at some point.

What is the pricing story?

Can I use it as side project dev or is the target enterprise customers only / mainly?

waldenyan201y ago

we have plenty of small, early stage teams that use Devin but it's designed to fit into a team's workflow. you can of course give it a try and see if it's a good fit for your projects!

anticensor1y ago

Any estimates regarding when the personal tier ($50/month+credits) will resume accepting signups?

tsak1y ago

Is it just me that finds it ironic that you're looking for software developers?

anticensor1y ago

Hey, can you fix the issue where the editor times out and Devin gets stuck?

thekhatribharat1y ago

How does one estimate the number of ACUs required to finish a task?

anticensor1y ago

It spends about 2 to 10 ACU per hour in the small VM, and ten times as much on the large one. No credits spent during sleep and "waiting for response" time as far as I observed.
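A rough sketch of that estimate, assuming the rates observed above hold (they are one user's observations, not official figures; the `estimate_acus` helper is hypothetical):

```python
# Rates are the observations quoted above (2-10 ACU/hour on the small VM,
# ten times that on the large one); they are not official pricing.
SMALL_VM_ACU_PER_HOUR = (2.0, 10.0)
LARGE_VM_MULTIPLIER = 10.0


def estimate_acus(active_hours: float, large_vm: bool = False) -> tuple[float, float]:
    """Return a (low, high) ACU estimate for the given active time.

    Exclude sleep and "waiting for response" time from active_hours,
    since no credits were observed to be spent during those periods.
    """
    if active_hours < 0:
        raise ValueError("active_hours must be non-negative")
    factor = LARGE_VM_MULTIPLIER if large_vm else 1.0
    low, high = SMALL_VM_ACU_PER_HOUR
    return (low * factor * active_hours, high * factor * active_hours)


if __name__ == "__main__":
    low, high = estimate_acus(0.5)
    print(f"30 active minutes on the small VM: roughly {low:g} to {high:g} ACUs")
```

The range is wide, so it is more useful for budgeting an upper bound than for predicting the cost of a specific task.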

waldenyan201y ago

a helpful benchmark is that a typical frontend task is about 1-2 ACUs, but really depends on the complexity of the task

yuppiemephisto1y ago

Does it work with more obscure languages like Lean 4?

anticensor1y ago

It can work with any language, as it interacts with the VM and can read compiler messages.

marcusverus1y ago

How should I go about arranging a demo?

throw832881y ago

Not really product related: Given the current trajectory of LLMs/Agents, what is your career advice to someone in school for Computer Science right now?

papichulo20231y ago

Are you a human founder?

waldenyan201y ago

very much so!

xena1y ago

Can you upload a picture on the company domain with your face, and holding a piece of paper containing your name, the date, the time, and the current bitcoin block number? That would make us more likely to believe you are properly human.

binarynate1y ago· 18 in thread

Am I the only one who laments this trend of using a common first name as a product name? When I see this, my first reaction is that the company lacks any empathy for people who have the name they're co-opting.

https://www.washingtonpost.com/technology/interactive/2021/p...

https://archive.is/w8r58

slickdork1y ago

As someone named Devin who works in tech, I greatly hope this project fails. :)

Buttons8401y ago

No, I'm Devin.

At least our names got attached to an upstanding product, and one that is likely to languish and fail. We're not the next "Alexa", I hope.

arockwell1y ago

100% agree. It is shitty and rude. Not to mention it does not even make sense.

stuckkeys1y ago

Not sure about the “rude” part. It really depends on the person. But yes, it can get annoying really fast. Therefore “shitty” indeed. But yeah, I do think it is very cheesy and lazy when companies do this. When I talked to someone that worked there, I guess it was because of the hard consonant “X” - it would make a better Hollywood movie if they said Artificial. Language. Expanded. Xenomorphic. Amplified. A. L. E. X. A.

wpm1y ago

Devin comes from "dev in chat", a common phrase in livestream chat rooms to signal that the developer of the game or product being showcased was present.

solarpunk1y ago

why not just call it dev

1 more reply

alexjplant1y ago

The short version of my name is one letter away from "Alexa". You can imagine how many comments and jokes about Amazon's AI assistant I've been party to for the past decade. Although it may be hard for you to believe I actually don't really care, much as you probably don't care about the hot dogs bearing your name that you see when you walk down the cold aisle in the grocery store. Should they instead call the anthropomorphized AI assistant something like "W'rkncacnter" to preclude the possibility of name collisions (chaotic entities imprisoned in alien stars notwithstanding)?

psygn891y ago

My Japanese mom always thought it was weird to put people's names to destructive forces like hurricanes. I think she said in Japan they use some numbering system (might be as simple as incrementing, I don't remember).

daveguy1y ago

The US did this for a long time -- only numbering storms. In 1953 they switched to a list of names, female only. Then 25 years later to male and female names. It is kinda weird, and if they're destructive enough the name is retired. I think the idea is that people would pay more attention to human names in the warning process as the hurricanes approach land.

1 more reply

laptopdev1y ago

When I was 7, my family's Japanese foreign exchange student was being introduced to me. She burst out laughing, saying my nickname Dev Dev sounded like "fart fart" or "fat fart".

Had the nickname fart fart until my sister moved out of the house.

Maybe you could confirm, but ChatGPT tells me in Japanese Debu colloquially and offensively means "fat" or "chubby", and Bu is an onomatopoeia for a fart noise, like "prrt" in English.

ben_w1y ago

> "W'rkncacnter" to preclude the possibility of name collisions (chaotic entities imprisoned in alien stars notwithstanding)?

Bold move, but imagine the patch notes:

• Fixed bug where assistant attempted to unmake the fabric of reality

• Resolved issue where “Set alarm for 7 AM” triggered a rampancy cascade

• Improved pronunciation of “Lh’owon” for calendar appointments

Probably still a better bet than Durandal, definitely an improvement over Tycho.

And then there was Leela…

binarynate1y ago

It appears your name is Alex, so I'm not surprised that the Alexa product name doesn't bother you. I suspect you would feel different if your name was Alexa. If the product was named Nate, it would bother me. There is a plethora of other options for product names that companies can use besides common first names.

nprateem1y ago

But I bet you're never late for your train

zamadatix1y ago

I think it's different when the product is a tool you call by name to use vs just the name of the tool. E.g. the article is about "Alexa" and I'm not sure most people even realize there are ways to use it without saying "Hey Alexa" every time. Without that type of callback association it's not a very serious concern.

mewpmewp21y ago

I don't care about it potentially being a real name, because I doubt it would be a household item, but somehow the name itself for this particular product seems offputting.

If it had to be a name for a product, it seems to give me some sort of cheap male grooming or AXE body spray product vibes.

decGetAc1y ago

Probably, I share a name with a product and I couldn't care less. It's wild that some would feel bad much less consider it lacking empathy.

I don't like first name product names for other reasons but not because they share a name with humans named the same

bravetraveler1y ago

They gotta be Joshing us! How's Dic-I mean, Richard?

Just having fun. I see what you mean and vaguely support it... I just won't lose anything over it

hiatus1y ago

How is it lacking empathy? Devin is not something invoked by voice, so I fail to see the comparison to Alexa.

edit:

> Eschew flamebait. Avoid generic tangents. Omit internet tropes.

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

daft_pink1y ago· 13 in thread

Is there any evidence this works better than Claude 3.5?

projectileboy1y ago

I work with a team at Nubank that has been using Devin. I would say that it doesn't quite make sense to compare it to Claude 3.5, because Devin isn't really like Copilot; it's more like an assistant to which you can assign a project. We're using it only for particular use cases, but for those particular use cases it's like having a superpower.

amkkma1y ago

Based on this, what is the outlook for software dev generally, and junior and mid level devs?

throw832881y ago

More specifically: What kind of advice does GP have for Computer Science students in school right now?

I've been frankly terrified of the pace of LLM development since 2022.

servercobra1y ago

Do you have any examples of the kinds of projects you would assign it to?

Yusefmosiah1y ago

The reason it makes sense to compare them is there are problems that Claude 3.5 (or o1) can’t solve. Can Devin solve them? If yes, it’s easily worth the $500. If no, it’s a harder sell.

jonny_eh1y ago

> We're using it only for particular use cases

Can you share concrete examples?

projectileboy1y ago

I can’t really be too specific. But I can say that at least one pattern of problem it tackles very effectively is: “we’re migrating from X to Y, and it’s going to touch a ton of files, and the nature of that migration is much more involved than what we can reasonably hope to accomplish with sed and a bash script.”

1 more reply

mike_yu1y ago

i use this every day and a lot of the magic is in the workflow and agent layer -- claude 3.5 can generate a snippet of code for you but it isn't going to open a browser, read api docs, actually make calls to the api, debug, run the code and make sure it builds and works, etc

bfeynman1y ago

Anthropic and OpenAI have certainly been working on this behind the scenes, while they try to see how much better they can get models, they will let others pay for the current state until they find it valuable. The shift we are seeing now is already happening, and they are taking an even larger macroscopic approach by creating computer/tool use, along with the context protocol, so that when it's released it will work with almost any IDE and system...

xpasky1y ago

Why wouldn't it? Just give it a shell tool. (Something like claude.vim, perhaps.)

kordlessagain1y ago

Like this: https://github.com/Mittaai/webwright

kgilpin1y ago

People are saying it’s apples and oranges, but with Computer Use taken into account, this seems like a fair question.

https://docs.anthropic.com/en/docs/build-with-claude/compute...

daft_pink1y ago

I wish they offered a computer use reference implementation on Windows instead of a Linux Docker container.

a-arbabian1y ago· 9 in thread

Mike from Vesta (first demo video) claims Devin saved "at least a hundred hours" debugging API integrations. That seems crazy to me - API integrations rarely take that long, and any engineer would spot issues like wrong API keys almost immediately. The tool might be more valuable for non-engineers creating initial drafts, but by the time you've written all the detailed specs for Devin, a mid-level engineer could have made significant progress on the task.

jlund-molfese1y ago

I wish API integrations never took that long! But it's dependent on who you're integrating with and what your product looks like. I'm the engineering manager of the payroll integrations team at a company that does workplace savings plans.

Sometimes even when you're making calls to dozens of different endpoints they're easy, but other times, you end up guessing at how to access undocumented functionality within a GraphQL API that has introspection turned off, or working around entity modeling that's completely different from your system and requires a lot of translation. Or you work with an API whose indexes variably start from 1, 0, -1, and -2 in different endpoints. These generally aren't hard technical challenges to solve, and something like Devin that could take care of most surface-level problems you see while integrating with some XML API from 2007 would be welcome.

There are companies like https://www.tryfinch.com and https://www.merge.dev that try to solve these issues, but their abstractions also reduce flexibility and aren't a perfect fit for all HRIS integration use cases right now.

rguldener1y ago

I agree. Integrations can be incredibly cumbersome if you have to learn each API from scratch.

There are also more flexible solutions like https://www.nango.dev

It handles the API-specific complexities (auth, retries, webhooks, per-customer config, pre-built templates) but allows you to implement the exact use case + data model you need.

It's open source/source available.

(disclaimer: I am a founder)

babyent1y ago

Hey that’s really cool. I read the FAQ on connection.

So if there are 10 users, the free tier lets me give them the ability to add up to 3 integrations? Is it 3 per user?

Thanks

neom1y ago

Been using https://www.laminar.run/ here and there and found it a good mix of abstract and being able to get in there.

mike_yu1y ago

clearly nobody else has spent all the time i have integrating really old mortgage software :(

jonny_eh1y ago

I doubt Devin could write an integration for an underspecified legacy API like that. Whenever I have had to, I've needed to talk to support/engineering on the other side.

mike_yu1y ago

that's definitely still the case. devin drafts my emails for issues it runs into (which i tell it to do) and i send them off.

this is definitely slower than if i were doing it full time, but i run a company. i go from customer meeting to customer meeting and spend 5-10 min a day taking whatever is blocking devin and pasting it into an email to the partner to get a response for devin.

mike_yu1y ago

although i do agree with you w.r.t integrating like, modern software with well documented/good apis

AlwaysRock1y ago

Debugging is a pretty vague word. I know a LOT of api endpoints with shit documentation. Could Devin generate documentation for a vast number of api endpoints that could have theoretically taken a hundred hours to write?

winkle1y ago· 5 in thread

First place I usually go is the terms of service and what they are granting themselves rights to. Not excited about how broad this is "3.2 License: By using the Services, you hereby grant to Cognition, its affiliates, successors, and assigns a non-exclusive, worldwide, royalty-free, fully paid, sublicensable, transferable license to reproduce, distribute, modify, and otherwise use, display, and perform all acts with respect to the Customer Data as may be necessary for Cognition to provide the Services to you."

CaptainFever1y ago

"as may be necessary for Cognition to provide the Services to you" kind of makes sense IMO. Does that mean they'll only use the license (note: they only get a license, not ownership) to provide services to you? Is it a restriction?

hazmazlaz1y ago

Yes, that clause/phrase restricts the company's rights with respect to their license to your data. Essentially, a clause like that is necessary for users to interact with the service. Makes sense when you think about it, how can they provide service if they can't use the data you provide them?

It's a pretty typical clause you'll see in most SaaS policies.

Source: I work for a SaaS, but I am not a lawyer, caveat emptor.

winkle1y ago

I want to pay for their product, but not enough that I have to ask my lawyer about the language. I did see that one of the features of the enterprise plan is custom terms, but that's not the plan I'm interested in.

1 more reply

bigs1y ago

I always wonder how enforceable these blanket rights would be in court. Didn’t Meta claim to own end users’ photos in the T&Cs back around 2009 and it got challenged and shot down (ianal)?

CaptainFever1y ago

I did some Googling on this.

https://web.archive.org/web/20111103081406/http://consumeris...

Original article that caused the outrage. In particular, the TOS did not say they owned your pictures, but it did give them a license that was quite broad, which included using your likeness in advertisements. However, the change that caused the outrage was that the license no longer expired on account deletion nor content removal.

https://www.npr.org/2009/02/17/100783689/facebook-users-angr...

News article about the outrage.

https://www.nytimes.com/2009/02/19/technology/internet/19fac...

News article about the walkback.

I could not find anything about it being challenged in court.

Oras1y ago· 4 in thread

> Small frontend bugs and edge cases - tag Devin in Slack threads

And other points where it should shine. How does it compare to using Cursor? Is it the slack integration?

waldenyan201y ago

the workflow is quite different from Cursor or Copilot - Devin is an asynchronous tool. A common way to use Devin is to kick off a few sessions in the morning, while you work on other higher priority tasks. It feels a lot more like working with a colleague that you can tag in Slack or go back and forth with on PR comments

anticensor1y ago

Devin has 2 hour, 6 hour and 24 hour inactivity limits before it pauses the work and temporarily deprovisions (sleep) the VM, so you have to supervise it every so often.

jonny_eh1y ago

Does it open PRs?

anticensor1y ago

Yes, it does if you request it to do so. You can also tell it not to.

didip1y ago· 3 in thread

Aren't you guys afraid that Copilot will simply crush you? They have all the training data after all.

apwell231y ago

There are third party CI software like circleCI that didn't get crushed by github because it's a high touch business that they don't want to get into.

There are many niches to be captured.

paradite1y ago

I thought circleCI wasn't doing too well?

anticensor1y ago

No, Devin is an autopilot.

Yusefmosiah1y ago· 2 in thread

Looking for comprehensive benchmarks with Devin vs Cursor + Claude 3.6 vs ChatGPT o1 Pro.

In my own experience using Cursor with Claude 3.5 Sonnet (new) and o1-preview, Claude is sufficient for most things, but there are times when Claude gets stumped. Invariably that means I asked it to do too much. But sometimes, maybe 10-20% of the time, o1-preview is able to do what Claude couldn’t.

I haven’t signed up for o1 Pro because going from Cursor to copy/pasting from ChatGPT is a big DevX downgrade. But from what I’ve heard o1 Pro can solve harder coding problems that would stump Claude or o1-preview.

My solution is just to split the problem into smaller chunks that make it tractable for Claude. I assume this is what Devin’s doing. Or is Devin using custom models or an early version of the o1 (full or pro) API?

cbhl1y ago

This predates the o1 release, but the folks behind Devin did do some early evaluation of o1 vs 4o vs Devin back in September:

https://x.com/cognition_labs/status/1834292718174077014

I'd expect a very different experience with Devin vs the IDE-forks -- it provides status updates in Slack, runs CI, and when it's done it puts up a pull request in GitHub.

Yusefmosiah1y ago

Thanks, but that comparison is for old models, a different, non-shipped version of Devin called “Devin-base”, and doesn’t include Claude.

Slack integration, automatically pushing to CI, etc., are relatively low-value compared to the questions of “does it write better code than alternatives?”, “can I depend on it to solve hard problems?”, “will I still need a Cursor and/or ChatGPT Pro subscription to debug Devin’s mistakes?”

adamgordonbell1y ago· 2 in thread

It seems like a lot of the magic is providing LLMs with tools that let them work like a human would. This approach makes more sense to me than the model of expecting an LLM to just emit a giant block of code for a change, given a pile of RAG context.

( removed pricing q, as I missed it is $500 / month for whole teams. I get why that is the pricing, but doesn't work for me to try it in side projects sadly )

k2xl1y ago

It says at the top $500 per month

steve_adams_861y ago

Starting at. If you want, they will take more of your money too.

preommr1y ago· 1 in thread

From the second video: "We can focus on the things that excite us rather than just the maintenancing [maintenance] work".

But these are the kinds of problems that help shape the product. The software architecture should be a compression of a deep and intuitive understanding of the problem space. How can you develop that knowledge if you're just delegating it to a black box that can't operate at a near-human level?

I've used ai based tools to great success, but on an ad-hoc basis, for specific and small functions or modules. To do the integration part requires an understanding of what abstraction is appropriate where. I don't think these tools are good at that.

cowsup1y ago

Good software can be art. And like all art, we have hit the stage in which code can also be cranked out en masse, thoughtlessly, for a quick buck. It was only inevitable.

gexla1y ago· 1 in thread

Should have come with a prominent warning at the app site that you're heading towards a $500 sub. I'm sure it's mentioned in places I didn't see it. Ideally, you would agree to the sub before you even create an account. This could save LOADS of signups from people who aren't your intended users.

anticensor1y ago

They have a $50 tier too, but that one is not currently open to new members.

Topfi1y ago

No public testing, no benchmarks, no clear information on context window size or restrictions for extensive use, no comparison with the newest Claude Sonnet 3.5 or O1, nothing.

What we do get is a price of $ 500,- per month from a company that has been caught lying about this very product [0] and has never allowed independent testing.

Cognition, I am sorry to tell you, but there is no reason to trust you. In fact, there are multiple good reasons not to, even if you offered Devin at a fraction.

If this were e.g. Anthropic launching a new beyond Opus size model that was still performant and came with "chain-of-thought" capabilities, a far more extensive context window that still fully passes needle in haystack and is absolutely solid in sourcing from provided files, keeps on track even when provided with large documents, has few or no restrictions on usage and comes with extensive, verifiable benchmarks that showcase this offering being a significant upgrade over other models, maybe such a price could be justified.

You know why Cognition? Because they haven’t actively lied. What they did instead was let people use their models and actually test the advantages. Even Claude Instant way back when had certain use cases that made them have their own niche and showed they could execute before expanding with 2 and the larger context, then 3 with more applications. You never did any of that, you never gave anyone reason to believe what you claim, you didn’t even release benchmarks. See the difference?

Seems more like a simple cash grab, attempting to ride the O1 wave. OpenAI has a hard time justifying their Pro pricing, you doubling that makes this an out of season April fools joke. Waiting for the inevitable reporting that this is just another API wrapper for Claude or ChatGPT with our old faithful RAG.

[0] https://www.youtube.com/watch?v=tNmgmwEtoWE&pp=ygUJZGV2aW4gY...

paradite1y ago

The trend of AI tools: make a bold claim at launch, then have lots of caveats caveats caveats caveats when actually releasing to the public.

mfdupuis1y ago

I'm curious to see how this plays out when it comes to deploying and maintaining production-grade apps. I know relatively little about infrastructure and DevOps, but that's the stuff that actually always seems complicated when going from MVP to production. This question feels particularly important if we're expecting PMs and designers to be primary users.

That said, I'm super excited about this space and love seeing smart folks putting energy into this. Even if it's still a bit aspirational, I think the idea of cutting down time spent debugging and refactoring and putting more power in the hands of less technical folks is awesome.

debacle1y ago

I couldn't find anywhere a list of languages that this tool supports. What makes this tool better than e.g. cursor?

anticensor1y ago

Can you also add Discord, Telegram, Gitlab, Forgejo integrations for those who use them for their software development discussions?

allusernamesare1y ago

How does Devin compare to lovable.dev ? I've been thoroughly impressed by their ability to build and host functioning apps from very basic prompts.

WesleyJohnson1y ago

Any plans or capabilities for something local? Not a locally hosted Devin, mind you, but a way to interact with on-prem source control repos?

nextworddev1y ago

Devin really wasted a lot of time going GA because they lost a lot of their initial buzz

DidYaWipe1y ago

Might be an interesting headline if it said what "Devin" is.

adastra221y ago

You never say what Devin is.

j / k navigate · click thread line to collapse

132 comments

113 comments · 21 top-level

waldenyan201y ago· 34 in thread

hey guys - Walden here, one of the founders. Excited to have you try out Devin. Reach out here if you have any questions!

Buttons8401y ago

Hi Walden,

my name is Devin and I don't like sharing a name with a product. Will you please consider changing the name?

There is always the chance that someone named Devin will do something that gives your product a bad name. Perhaps some new scandal will involve someone named Devin or something.

I'd also like you to imagine that a hot new erotic AI was named "Walden", and people said things like "I was talking with Walden last night" as a euphemism. How would that make you feel?

mrieck1y ago

I'd try it out if you allowed paying $50 for some credits instead of requiring subscription.

Even if that version is limited to only editing public Github repos. $500 to see how well it works is too much.

anticensor1y ago

> Even if that version is limited to only editing public Github repos

Not possible to constrain like that with the current Devin architecture.

badFEengineer1y ago

The price seems reasonable, but my main hesitation is on data storage + third party providers- there doesn't seem to be much available information on:

* will you store my code + train on workflows that Devin does for me? * are you piping data to other third party providers (i.e. anthropic, openAI)?

JTyQZSnP3cQGa8B1y ago

Why don't any LLM show examples of C++ applications? I have yet to see a tool like that which I would be happy to use at work.

elashri1y ago

Or CUDA code, as this will be somehow ironic given that LLMs inference engines and training are CUDA code in some way.

anticensor1y ago

It can do that too, I tried that too.

anticensor1y ago

I tried it with C and C++ code, it can do them but not very well.

menaerus1y ago

How large the repositories are that it can "reason" about?

1 more reply

cloudking1y ago

When crafting projects from scratch, does your system actually fix it's own errors?

That seems to be the challenge with Cursor Agent in it's current form, it generates a bunch of code that has bugs and requires a lot of iteration.

swyx1y ago

as someone who has been trying you guys out for the past 8 months... you need a speed lever. default devin is way too slow for me :/ i asked scott for a "demo mode" first time we met

waldenyan201y ago

latest update is around 3-4x faster than it was back in Apr but we are working on making it much faster still!

kordlessagain1y ago

How is that done?

neural_thingOP1y ago

Devin got a lot faster for me recently, made it a lot more enjoyable to use

anticensor1y ago

Also to mention, "suggest knowledge" modal is broken (it silently ignores changes made if you edit the suggested knowledge).

Another issue, sleep&snapshot system is still prone to race conditions in certain cases.

k2xl1y ago

What model does it use under the hood?

How much context window does it load when it is solving tasks?

How does it determine which files to load into context?

anticensor1y ago

It's a finetuned version of o1-preview sized distillation of o1-pro if I remember correctly, with an Azure Ubuntu VM with writable filesystem and internet access.

thekevan1y ago

Can you only use it with a $500 / month subscription?

The word "try" is VERY different than the actual case, which is "pay for use".

If the answer to the first line is yes, how do I request my email be deleted? I started to sign up but I am not a use case for $500 a month at the moment.

anticensor1y ago

It's monthly subscription plus prepaid compute credits (called ACU in the UI).

adamgordonbell1y ago

I'm excited to try it. I use aider quite a bit and tried opendevin at some point.

What is the pricing story?

Can I use it as side project dev or is the target enterprise customers only / mainly?

waldenyan201y ago

we have plenty of small, early stage teams that use Devin but it's designed to fit into a team's workflow. you can of course give it a try and see if it's a good fit for your projects!

anticensor1y ago

Any estimates regarding when the personal tier ($50/month+credits) will resume accepting signups?

tsak1y ago

Is it just me that finds it ironic that you're looking for software developers?

anticensor1y ago

Hey, can you fix the issue where the editor times out and Devin gets stuck?

thekhatribharat1y ago

How does one estimate the number of ACUs required to finish a task?

anticensor1y ago

It spends about 2 to 10 ACU per hour in the small VM, and ten times as much on the large one. No credits spent during sleep and "waiting for response" time as far as I observed.
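A rough sketch of that arithmetic in Python, assuming the rates quoted above (2 to 10 ACU per active hour on the small VM, about ten times that on the large one). These are one user's observations, not official pricing, and the function name `estimate_acus` is made up for illustration:

```python
def estimate_acus(hours_active, vm="small"):
    """Return a (low, high) ACU range for a session.

    Assumes the observed rates from this thread, not official numbers.
    Only active hours count; sleep and "waiting for response" time
    are assumed to cost nothing.
    """
    low_rate, high_rate = 2, 10                # ACU per active hour, small VM (observed)
    multiplier = 10 if vm == "large" else 1    # large VM reported as roughly 10x
    return (hours_active * low_rate * multiplier,
            hours_active * high_rate * multiplier)


print(estimate_acus(3))            # 3 active hours on the small VM
print(estimate_acus(3, "large"))   # same session on the large VM
```

Treat the result as a range to budget against rather than a prediction, since the spread between the low and high rate is already 5x.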

waldenyan201y ago

a helpful benchmark is that a typical frontend task is about 1-2 ACUs, but really depends on the complexity of the task

yuppiemephisto1y ago

Does it work with more obscure languages like Lean 4?

anticensor1y ago

It can work with any language, as it interacts with the VM and can read compiler messages.

marcusverus1y ago

How should I go about arranging a demo?

throw832881y ago

Not really product related: given the current trajectory of LLMs/agents, what is your career advice to someone in school for Computer Science right now?

papichulo20231y ago

Are you a human founder?

waldenyan201y ago

very much so!

xena1y ago

binarynate1y ago· 18 in thread

https://www.washingtonpost.com/technology/interactive/2021/p...

https://archive.is/w8r58

slickdork1y ago

As someone named Devin who works in tech, I greatly hope this project fails. :)

Buttons8401y ago

No, I'm Devin.

At least our names got attached to an upstanding product, and one that is likely to languish and fail. We're not the next "Alexa", I hope.

arockwell1y ago

100% agree. It is shitty and rude. Not to mention it does not even make sense.

stuckkeys1y ago

wpm1y ago

Devin comes from "dev in chat", a common phrase in livestream chat rooms to signal that the developer of the game or product being showcased was present.

solarpunk1y ago

why not just call it dev

alexjplant1y ago

psygn891y ago

daveguy1y ago

laptopdev1y ago

When I was 7, my family's Japanese foreign exchange student was being introduced to me. She burst out laughing, saying my nickname Dev Dev sounded like "fart fart" or "fat fart".

Had the nickname fart fart until my sister moved out of the house.

Maybe you could confirm, but ChatGPT tells me that in Japanese Debu colloquially and offensively means "fat" or "chubby", and Bu is an onomatopoeia for a fart noise, like "prrt" in English.

ben_w1y ago

> "W'rkncacnter" to preclude the possibility of name collisions (chaotic entities imprisoned in alien stars notwithstanding)?

Bold move, but imagine the patch notes:

• Fixed bug where assistant attempted to unmake the fabric of reality

• Resolved issue where “Set alarm for 7 AM” triggered a rampancy cascade

• Improved pronunciation of “Lh’owon” for calendar appointments

Probably still a better bet than Durandal, definitely an improvement over Tycho.

And then there was Leela…

binarynate1y ago

nprateem1y ago

But I bet you're never late for your train

zamadatix1y ago

mewpmewp21y ago

I don't care about it potentially being a real name, because I doubt it would be a household item, but somehow the name itself for this particular product seems offputting.

If it had to be a name for a product, it seems to give me some sort of cheap male grooming or AXE body spray product vibes.

decGetAc1y ago

Probably. I share a name with a product and I couldn't care less. It's wild that some would feel bad, much less consider it a lack of empathy.

I don't like first name product names for other reasons but not because they share a name with humans named the same

bravetraveler1y ago

They gotta be Joshing us! How's Dic-I mean, Richard?

Just having fun. I see what you mean and vaguely support it... I just won't lose anything over it

hiatus1y ago

How is it lacking empathy? Devin is not something invoked by voice, so I fail to see the comparison to Alexa.

edit:

> Eschew flamebait. Avoid generic tangents. Omit internet tropes.

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

daft_pink1y ago· 13 in thread

Is there any evidence this works better than Claude 3.5?

projectileboy1y ago

amkkma1y ago

Based on this, what is the outlook for software dev generally, and junior and mid level devs?

throw832881y ago

More specifically: What kind of advice does GP have for Computer Science students in school right now?

I've been frankly terrified of the pace of LLM development since 2022.

servercobra1y ago

Do you have any examples of the kinds of projects you would assign it to?

Yusefmosiah1y ago

The reason it makes sense to compare them is there are problems that Claude 3.5 (or o1) can’t solve. Can Devin solve them? If yes, it’s easily worth the $500. If no, it’s a harder sell.

jonny_eh1y ago

> We're using it only for particular use cases

Can you share concrete examples?

projectileboy1y ago

mike_yu1y ago

bfeynman1y ago

xpasky1y ago

Why wouldn't it? Just give it a shell tool. (Something like claude.vim, perhaps.)

kordlessagain1y ago

Like this: https://github.com/Mittaai/webwright

kgilpin1y ago

People are saying it’s apples and oranges, but with Computer Use taken into account, this seems like a fair question.

https://docs.anthropic.com/en/docs/build-with-claude/compute...

daft_pink1y ago

I wish they offered a computer use reference implementation on Windows instead of a Linux Docker container.

a-arbabian1y ago· 9 in thread

jlund-molfese1y ago

rguldener1y ago

I agree. Integrations can be incredibly cumbersome if you have to learn each API from scratch.

There are also more flexible solutions like https://www.nango.dev

It handles the API-specific complexities (auth, retries, webhooks, per-customer config, pre-built templates) but allows you to implement the exact use case + data model you need.

It's open source/source available.

(disclaimer: I am a founder)

babyent1y ago

Hey that’s really cool. I read the FAQ on connection.

So if there are 10 users, the free tier lets me give them the ability to add up to 3 integrations? Is it 3 per user?

Thanks

neom1y ago

Been using https://www.laminar.run/ here and there and found it a good mix of abstract and being able to get in there.

mike_yu1y ago

clearly nobody else has spent all the time i have integrating really old mortgage software :(

jonny_eh1y ago

I doubt Devin could write an integration for an underspecified legacy API like that. Whenever I have had to, I've needed to talk to support/engineering on the other side.

mike_yu1y ago

that's definitely still the case. devin drafts my emails for issues it runs into (which i tell it to do) and i send them off.

mike_yu1y ago

although i do agree with you w.r.t integrating like, modern software with well documented/good apis

AlwaysRock1y ago

winkle1y ago· 5 in thread

CaptainFever1y ago

hazmazlaz1y ago

It's a pretty typical clause you'll see in most SaaS policies.

Source: I work for a SaaS, but I am not a lawyer, caveat emptor.

winkle1y ago

bigs1y ago

I always wonder how enforceable these blanket rights would be in court. Didn’t Meta claim to own end users’ photos in the T&Cs back around 2009 and it got challenged and shot down (ianal)?

CaptainFever1y ago

I did some Googling on this.

https://web.archive.org/web/20111103081406/http://consumeris...

https://www.npr.org/2009/02/17/100783689/facebook-users-angr...

News article about the outrage.

https://www.nytimes.com/2009/02/19/technology/internet/19fac...

News article about the walkback.

I could not find anything about it being challenged in court.

Oras1y ago· 4 in thread

> Small frontend bugs and edge cases - tag Devin in Slack threads

And other points where it should shine. How does it compare to using Cursor? Is it the slack integration?

waldenyan201y ago

anticensor1y ago

Devin has 2 hour, 6 hour and 24 hour inactivity limits before it pauses the work and temporarily deprovisions (sleep) the VM, so you have to supervise it every so often.

jonny_eh1y ago

Does it open PRs?

anticensor1y ago

Yes, it does if you request it to do so. You can also tell it not to.

didip1y ago· 3 in thread

Aren't you guys afraid that Copilot will simply crush you? They have all the training data after all.

apwell231y ago

There is third-party CI software like CircleCI that didn't get crushed by GitHub because it's a high-touch business that they don't want to get into.

There are many niches to be captured.

paradite1y ago

I thought circleCI wasn't doing too well?

anticensor1y ago

No, Devin is an autopilot.

Yusefmosiah1y ago· 2 in thread

Looking for comprehensive benchmarks with Devin vs Cursor + Claude 3.6 vs ChatGPT o1 Pro.

cbhl1y ago

This predates the o1 release, but the folks behind Devin did do some early evaluation of o1 vs 4o vs Devin back in September:

https://x.com/cognition_labs/status/1834292718174077014

I'd expect a very different experience with Devin vs the IDE-forks -- it provides status updates in Slack, runs CI, and when it's done it puts up a pull request in GitHub.

Yusefmosiah1y ago

Thanks, but that comparison is for old models, a different, non-shipped version of Devin called “Devin-base”, and doesn’t include Claude.

adamgordonbell1y ago· 2 in thread

( removed pricing q, as I missed it is $500 / month for whole teams. I get why that is the pricing, but doesn't work for me to try it in side projects sadly )

k2xl1y ago

It says at the top $500 per month

steve_adams_861y ago

Starting at. If you want, they will take more of your money too.

preommr1y ago· 1 in thread

From the second video: "We can focus on the things that excite us rather than just the maintenancing [maintenance] work".

cowsup1y ago

Good software can be art. And like all art, we have hit the stage in which code can also be cranked out en masse, thoughtlessly, for a quick buck. It was only inevitable.

gexla1y ago· 1 in thread

anticensor1y ago

They have a $50 tier too, but that one is not currently open to new members.

Topfi1y ago

No public testing, no benchmarks, no clear information on context window size or restrictions for extensive use, no comparison with the newest Claude Sonnet 3.5 or O1, nothing.

What we do get is a price of $ 500,- per month from a company that has been caught lying about this very product [0] and has never allowed independent testing.

Cognition, I am sorry to tell you, but there is no reason to trust you. In fact, there are multiple good reasons not to, even if you offered Devin at a fraction.

[0] https://www.youtube.com/watch?v=tNmgmwEtoWE&pp=ygUJZGV2aW4gY...

paradite1y ago

The trend with AI tools is to make a bold claim at launch, then have lots of caveats caveats caveats caveats when actually releasing to the public.

mfdupuis1y ago

debacle1y ago

I couldn't find anywhere a list of languages that this tool supports. What makes this tool better than e.g. cursor?

anticensor1y ago

Can you also add Discord, Telegram, GitLab, Forgejo integrations for those who use them for their software development discussions?

allusernamesare1y ago

How does Devin compare to lovable.dev ? I've been thoroughly impressed by their ability to build and host functioning apps from very basic prompts.

WesleyJohnson1y ago

Any plans or capabilities for something local? Not a locally hosted Devin, mind you, but a way to interact with on-prem source control repos?

nextworddev1y ago

Devin really wasted a lot of time going GA because they lost a lot of their initial buzz

DidYaWipe1y ago

Might be an interesting headline if it said what "Devin" is.

adastra221y ago

You never say what Devin is.
