Codebuff is different because we simplified the input to one step: you type what you want done in your terminal and hit enter. Then Codebuff looks through your whole codebase and makes the edits it wants, to existing source files or new ones. It also can run your tests, the type checker, or install packages to fulfill your request.
Demo video: https://www.youtube.com/watch?v=dQ0NOMsu0dA
It all started at a hackathon. I was trying out Sonnet 3.5 which had recently come out and seeing if I could use it to write code. The script I cobbled together that day pulled codebase context in one step and used it to rewrite files with changes in the second step. This two step process still exists today. Incidentally, my hackathon script worked rather poorly and my demo failed to produce any useful code.
But that weekend I thought about the kind of errors it made, and realized that with more context on our codebase, it might have been able to get the change right. For example, it tried to create an endpoint on our server (at my previous startup), but it didn't know that you needed to edit 3 specific files to do this (yeah... our backend was not that clean). So I hand-wrote a guide to our codebase, like I was instructing a new hire. I put it in a markdown file and passed it into Sonnet 3.5's system prompt. And the crazy thing is that it started producing wayyy better code. So, I started getting excited. In fact, this code guide idea still exists in Codebuff today as knowledge.md files which are automatically read on every request.
I didn't think of this project as a startup idea at first. I thought it was just a simple script anyone could write. But after another week, I could see there were more problems to solve and it should be a product.
In the week between applying to YC and the interview, I could not get Codebuff to edit files consistently. I tried many prompting strategies to get it to replace strings in the original file, but nothing worked reliably. How could I face my interviewer if I could not get something basic like this to work? On the day before my interview, in a Hail Mary attempt, I fine-tuned GPT-4o to turn Claude's sketch of changes into a git patch, which would add and remove lines to make the edits. I only finished generating the training data late at night, and the fine-tuning job ran as I slept.
And, holy hell, the next morning it worked! I pushed it to production just in time for my YC interview with Dalton. Soon after, Brandon joined and we were off to the races.
So, how does Codebuff work exactly? You invoke it in your terminal, and it starts by running through the source files in that directory and subdirectories and parsing out all the function and class names (or equivalents in 11 languages). We use the tree-sitter library to do this. It builds out a codebase map that includes these symbols and the file tree.
Then, it fires off a request to Claude Haiku 3.5 to cache this codebase context so user inputs can be responded to with lower latency. (Prompt caching is OP!). We have a stateless server that passes messages along to Anthropic or OpenAI. We use websockets to ferry data back and forth to clients. We didn't have authentication or even a database for the first three months. Codebuff was free to install and used our API keys for all requests. Luckily, no one exploited us for too much free Claude usage haha. Major thanks to Brandon for saving this situation by building out our database (Postgres + Drizzle), server (Bun, hosted on Render, auth (using the free Auth.js), website (NextJS also hosted on Render), billing (Stripe), logging (BetterStack), and dashboard (Retool). This is the best tech stack I’ve ever had.
When the user sends an input message, we prompt Claude to pick files that would be relevant (step 1). After picking files, we load them into context and the agent responds. It invokes tools using xml tags that we parse. It literally writes out <edit_file path="src/app.ts">…</edit_file> to edit a particular file, and has other tags to run terminal commands, or to ask to read more files. This is all we really need, since Anthropic has already trained Claude with very similar tools reach state of the art on the SWE benchmark.
Codebuff has limited free usage, but if you like it you can pay $99/mo to get more credits. We realize this is a lot more than competitors, but that’s because we do more expensive LLM calls with more context.
We’re already seeing Codebuff used in surprising ways. One user racked up a $500 bill by building out two Flutter apps in parallel. He never even looked at the code it generated. Instead, he had long conversations with Codebuff to make progress and fix errors, until the apps were built to his satisfaction. Many users built real apps over a weekend for their teams and personal use.
Of course, those aren't the typical use cases. Users also frequently use Codebuff to write unit tests. They would build a feature in parallel with unit tests and have Codebuff do loops to fix up the code until the tests pass. They would also ask it to do drudge work like set up Oauth flows or API scaffolding.
What's really exciting with all of these examples is that we're seeing people's creativity becoming unbridled. They're spending more of their time thinking about architecture and design, instead of implementation details. It's so cool that we're just at the beginning, and the technology is only going to improve from here.
If you would want to use Codebuff inside your own systems, we have an alpha SDK that exposes the same natural language interface for your apps to call and receive code edits! You can sign up here for early access: https://codebuff.retool.com/form/c8b15919-52d0-4572-aca5-533....
Thank you for reading! We’re excited for you to try out Codebuff and let us know what you think!
The real problem I want someone to solve is helping me with the real niche/challenging portion of a PR, ex: new tiptap extension that can do notebook code eval, migrate legacy auth service off auth0, record and replay API GET requests and replay a % of them as unit tests, etc.
So many of these tools get stuck trying to help me "start" rather than help me "finish" or unblock the current problem I'm at.
I want the demos to be of real work, but somehow they never seem as cool unless it's a neat front end toy example.
Here is the demo video I sent in my application to YC, which shows it doing real stuff: https://www.loom.com/share/fd4bced4eff94095a09c6a19b7f7f45c?...
Historically, Pepsi won taste tests and people chose Coke. Because Pepsi is sweeter, so that first sip tastes better. But it's less satisfying—too sweet—to drink a whole can.
The sexy demos don't, in my opinion and experience, win over the engineers and leaders you need. Lil startups, maybe, and engineers that love the flavor of the week. But for solving real, unsexy problems—that's where you'll pull in organizations.
It takes 5+ seconds just to change one field to dark mode, I don't even want to imaigne a situation where I have two fields and I want to explain that I need to change this field and not that field.
I'm not sure who is the target audience for this, people who want to be programmers without learning programming ?
> it seems like it would be more effective to learn the skills you need rather than using this for a decade.
Think of it as a calculator. You do want to be able to do addition, but not neccessarily to manually add 4-digit numbers in your head.
> It takes 5+ seconds just to change one field to dark mode
Our current LLMs are way too slow for this. I am chuckling every time someone says "we don't need LLMs to be faster because people can't read faster". Imagine this using Groq with a future model with similar capability level, and taking 0.5 seconds to do this small change.
People need to remember we're at the very beginning of using AI for coding. Of course it's suboptimal for majority of cases. Unless you believe we're way past half the sigmoid curve on AI improvements (which I don't), consider that this is the worst the AI is ever going to be for coding.
A year ago people were incredulous when told that AI could code. A year before that people would laugh you out of the room. Now we're at the stage where it kinda works, barely, sometimes. I'm bullish on the future.
Where LLMs shine is in being a personal Stack Overflow: asking a question and having a personalized, specific answer immediately, that uses one's data.
But solving actual, real problems still seem out of reach. And letting them touch my files sound crazy.
(And yes, ok, maybe I just suck at prompting. But I would need detailed examples to be convinced this approach can work.)
> produce large amounts of convoluted code that in the end prove not only unnecessary but quite toxic.
What does that say about your prompting?
We have a lot of code in production which are AI written. The important thing is that you need to consciously make a module or project AI-ready. This means that things like modularity and smaller files are even more important than they usually are.
I can't share those PRs, but projects on my profile page are almost entirely AI written (except the https://bashojs.org/ link). Some of them might meet your definition of niche based on the example you provided.
I will admit, however, that my context switching has increased a ton, and that's probably not great. I often tell Codebuff to do something, inevitably get distracted with something else, and then come back later barely remembering the original task
Claude wrote me a prosemirror extension doing a bunch of stuff that I couldn’t figure out how to do myself. It was very convenient.
Cursor Composer doesn't handle that and seems geared towards a small handful of handpicked files.
Would codebuff be able to handle a proper sized codebase? Or do the models fundamentally not handle that much context?
But Codebuff has a whole preliminary step where it searches your codebase to find relevant files to your query, and only those get added to the coding agent's context.
That's why I think it should work up to medium-large codebases. If the codebase is too large, then our file-finding step will also start to fail.
I would give it a shot on your codebase. I think it should work.
@Codebuff team, does it make sense to provide a documentation.md with exposition on the systems?
The long tail of niche engineering problems is the time consuming bit now. That's not being solved at all, IMHO.
Any links on this topic you rate/could share?
Hopefully the demo on our homepage shows a little bit more of your day-to-day workflows than other codegen tools show, but we're all ears on ways to improve this!
To give a concrete example of usefulness, I was implementing a referrals feature in Drizzle a few weeks ago, and Codebuff was able to build out the cli app, frontend, backend, and set up db schema (under my supervision, of course!) because of its deep understanding of our codebase. Building the feature properly requires knowing how our systems intersect with one another and the right abstraction at each point. I was able to bounce back and forth with it to build this out. It felt akin to working with a great junior engineer, tbh!
EDIT: another user shared their use cases here! https://news.ycombinator.com/item?id=42079914
> To give a concrete example of usefulness, I was implementing a referrals feature in Drizzle a few weeks ago, and Codebuff was able to build out the cli app, frontend, backend, and set up db schema
Record this!
Better yet, stream it on Twitch and/or YouTube and/or Discord and build a small community of followers.
People would love to watch you.
It is also a true agent. It can run terminal commands to aid the request. For one request it could: 1. Write a unit test 2. Run the test 3. Edit code to fix the error 4. Run it again and see it pass
If you try out Codebuff, I think you'll see why it's unique!
where the problems start: cost of inference vs quality, latency, multi modality (vision + imagen), ai service provider issues (morning hours in US time zones = poor quality results)
the best part is being able to adjust it to my work style
Fundamentally, I think codegen a pretty new space and lots of people are jumping in because they see the promise. Remains to be seen what the consolidation looks like. With the rate of advancement in LLMs and codegen in particular, I wouldn't be surprised to see even more tools than we do now...
I've seen people say "you don't have to add files to Codebuff", but Aider tells me when the LLM has requested to see files. I just have to approve it. If that bothers you, it's open source, so you could probably just add a config to always add files when requested.
Aider can also run commands for you.
What am I missing?
Aider tends to maintain near "state of the art" including e.g. treesitter, and an actually refined (as in, iterated improvements over time) user experience.
Aider has been refining for 8000 commits since May of 2023. Codebuff "all started" circa Claude Sonnet 3.5.
The story of discovery (e.g. git patch) at best feels like a lack of researching the landscape since leaderboards for SOTA indicate whether a model performs better as whole code or diffs and Anthropic even cites Aider benchmarks, but cynically, the narrative feels a bit like looking through the things Aider has been doing differently/better, and putting them in an origin story so the feature list might sound less like the “sincerest form of flattery.”
Particularly concerning is the story talking about "seeing" users coding loops. Perhaps this is a figure of speech. As designed, Codebuff are in the middle of all users' code slinging, so perhaps it isn't.
Checking the Privacy Policy shows it's only about cookies and tracking, not about information privacy or IP protection of any kind.
Checking the Terms of Service says they own any code you post through it and can give it to others:
"However, by posting Content using Service you grant us the right and license to use, modify, publicly perform, publicly display, reproduce, and distribute such Content on and through Service. You agree that this license includes the right for us to make your Content available to other users of Service."
Meaning, the TOS is a for a public social media type service, not for an intellectual property service.
(Note that in VSCode "cline" can give Aider a run for its money.)
It's totally true that a lot of the development of Codebuff is merely me (and Brandon) working through a lot of the problems that Aider already solved! That makes sense.
Partly, my thesis is that if you start after Sonnet 3.5 is out, that you design things differently. For example, I started without manual file selection and worked to make it more like an agent that has native access to your environment.
Needless to say, I'm a fan of the work Paul has done on Aider, and I've appreciated the benchmarks and guides he's created and shared publicly. And Cline is also an amazing project which I want to try out soon as well!
With respect to privacy, we have pledged not to store your codebase, and mainly store logs that we use to debug the application. When seeing users use Codebuff, I mean I literally watched them use it, as we've done many in-person user tests, plus the Manifold Team has been using Codebuff for a while.
We also intend to release a Privacy Mode, like Cursor has, where we will not store anything at all, not even the logs of your interactions!
It makes sense to be a bit skeptical of Codebuff, since we are so new, but I intend to not let our users down!
My own tool `gptme` lets the agent interactively read/collect context too (as does Anthropic in their latest minimal-harness submission to SWE-bench), it's nothing novel.
(I've just played a little bit with aider and codebuff. I've previously tried aider and it always errored out on my code base, but inspired by this comment I tried again, and now it works well.)
Have you used Aider extensively? How are you finding it for your coding needs vs IDE-based chats?
The reason we don't ask for human review is simply: we've found that it works fine to not ask.
We've had a few hundred users so far and usually people are skeptical of this at first, but as they use it they find that they don't want it to ask for every command. It enables cool use cases where Codebuff and iterate by running tests, seeing the error, attempting a fix, and running them again.
If you use source control like git, I also think that it's very hard for things to go wrong. Even if it ran rm -rf from your project directory, you should be able to undo that.
But here's the other thing: it won't do that. Claude is trained to be careful about this stuff and we've further prompted it to be careful.
I think not asking to run commands is the future of coding agents, so I hope you will at least entertain this idea. It's ok if you don't want to trust it, we're not asking you to do anything you are uncomfortable with.
Could you please explain a bit how you are sure about it?
In Codebuff you don't have to manually specify any files. It finds the right ones for you! It also pulls more files to get you a better result. I think this makes a huge difference in the ergonomics of just chatting to get results.
Codebuff also will run commands directly, so you can ask it to write unit tests and run them as it goes to make sure they are working.
Aider has extensive code for computing "repository map", with specialized handling for many programming languages; that map is sent to LLM to give it an overview of the project structure and summary of files it might be interested in. It is indeed a very convenient feature.
I never tried writing and launching unit tests via Aider, but from what I remember from the docs, it should work out of the box too.
Alright, I'm in.
Nice work!
It's cool to have this natively on the remote system though. I think a safer approach would be to compile a small binary locally that is multi-platform, and which has the command plus the capture of output to relay back, and transmit that over ssh for execution (like how MGMT config management compiles golang to static binary and sends it over to the remote node vs having to have mgmt and all it's deps installed on every system it's managing).
Could be low lift vs having a package, all it's dependencies and credentials running on the target system.
It’s a weird catch-22 giving praise like that to LLMs.
If you are, then you might be able to intuit and fill in the gaps left my the LLM and not even know it.
And if you’re not, then how could you judge?
Not really much to do with that you were saying, really, just a thought I had.
> It’s a weird catch-22 giving praise like that to LLMs.
It's a bit asymmetrical though isn't it -- judging quality is in fact much easier than producing it.
> you might be able to intuit and fill in the gaps left my the LLM and not even know it
Just because you are able to fill gaps with it doesn't mean it's not good. With all of these tools you basically have to fill gaps. There are still differences between Cline vs Cursor vs Aider vs Codebuff.
Personally I've found Cline to be the best to date, followed by Cursor.
Context I have hired hundreds of engineers and built many engineering teams from scratch to 50+, and have been doing systems administration, solutions architecture, infrastructure design, devops, cloud orchestration and data platform design for 25 years.
I'm not bluffing when I say Claude's latest sonnet model and Cline in vscode has really been 99th percentile good on everything I've thrown at it (with some direction, as needed) and has done more productive, quality work than a team of 10 engineers in the last week alone.
If you haven't tried it I can understand your pessimism.
I see Codebuff as a premium version of Cline, assuming that we are in fact more expensive. We do a lot of work to find more relevant files to include in context.
Admittedly the last time I used manicode was a while back but I even preferred Cursor to it, and Cursor hallucinates like a mf'er. What I liked about cursor is that I can just tell composer what files I want it to look at in the UI. But I just use Cline now because I find its performance to be the best.
Other datapoints: backend / ML engineer. Maybe other kinds of engineers have different experiences.
/codebuff/dist/manifold-api.js
Codebuff was originally called Manicode. We just renamed it this week actually.
There was meant to be a universe of "Mani" products. My other cofounder made Manifund, and there's a conference we made called Manifest!
Wasn't there a recent startup in F24 that stole code from another YC company and fire was quickly put out by everyone?
- It chooses files to read automatically on each message — unlike Cursor’s composer feature. It also reads a lot more than Cursor's @codebase command. - It takes 0 clicks — Codebuff just edits your files directly (you can always peek at the git diffs to see what it’s doing). - It has full access to your existing tools, scripts, and packages — Codebuff can install packages, run terminal commands and tests, etc. - It is portable to any development environment
We use OpenAI and Anthropic, so unfortunately we have to abide by their policies. But we only grab snippets of your code at any given point, so your codebase isn't seen by any entity in its entirety. We're also considering open-sourcing, so that might be a stronger privacy guarantee.
I should note that my cofounder James uses both and gets plenty of value by combining them. Myself, I'm more of a plain VSCode guy (Zed-curious, I'll admit). But because Codebuff lives in your terminal, it fits in anywhere you need.
No comment on our batchmates
I think you just need to try it to see the difference. You can feel how much easier it is haha.
We don't store your codebase, and have a similar policy to Cursor, in that our server is mostly a thin wrapper that forwards requests to LLM providers.
The PearAI debacle is another story, but mostly they copied the open source project Continue.dev without giving proper attribution.
On the project itself, I don't really find it exciting at all, I'm sorry. It's just another wrapper for a 3rd party model, and the fact that you can 1) describe the entire workflow in 3 paragraphs, and 2) built it and launched it in around 4 months, emphasizes that.
Congrats on launch I guess.
No worries if this isn't a good fit for you. You're welcome to try it out for free anytime if you change your mind!
FWIW I wasn't super excited when James first showed me the project. I had tried so many AI code editors before, but never found them to be _actually usable_. So when James asked me to try, I just thought I'd be humoring him. Once I gave it a real shot, I found Codebuff to be great because of its form factor and deep context awareness: CLI allows for portability and system integration that plugins or extensions really can't do. And when AI actually understands my codebase, I just get a lot more done.
Not trying to convince you to change your mind, just sharing that I was in your shoes not too long ago!
> CLI allows for portability and system integration that plugins or extensions really can't do
In the past 6 or 7 years I haven't written a single line of code outside of a JetBrains IDE. Same thing for all of my team (whether they use JetBrains IDEs or VS Code), and I imagine for the vast majority of developers.
This is not a convincing argument for the vast majority of people. If anything, the fact that it requires a tool OUTSIDE of where they write code is an inconvenience.
> And when AI actually understands my codebase, I just get a lot more done.
But Amazon Q does this without me needing to type anything to instruct it, or to tell it which files to look at. And, again, without needing to go out of my IDE.
Having to switch to a new tool to write code using AI is a huge deterrent and asking for it is a reckless choice for any company offering those tools. Integrating AI in tools already used to write code is how you win over the market.
I was thinking the same. My (admittedly old-ish) 2070 Super runs at 25-30% just looking at the landing page. Seems a bit crazy for a basic web page. I'm guessing it's the background animation.
> I'm sorry. It's just another wrapper for a 3rd party model
The main challenge with working with LLMs is actually one of "ETL" and understanding what data to load and how to transform it into some form that leads to the desired output.For trivial tasks, this is certainly easy. For complicated tasks, like understanding a codebase or a product catalog of tens of thousands of entries, this is non-trivial.
My team is not working in the code gen space, but even though we also "just wrap" an API, almost all of our work is in data acquisition, transformation, the retrieval strategy, and structuring of the request context.
The API call to the LLM is like hitting "bake" on an oven: all of the real work happens before that.
We might have a bit of an advantage because we pull more files as context so the edit can be more in the style of your existing code.
One downside to use pulling more context is we burn more tokens. That's partly why we have to charge $99 whereas cursor is $20 per month.
> Codebuff has limited free usage, but if you like it you can pay $99/mo to get more credits...
> One user racked up a $500 bill...
Those two statements are kind of confusing together. Past the free tier, what does $99/month get you? It sounds like there's some sort of credit, but that's not discussed at all here. How much did this customer do to get to that kind of bill? I get that they built a flutter app, but did it take a hour to run up a $500 bill? 6 hours? a whole weekend? Is there a way to set a limit?
The ability to rack up an unreasonable bill by accident, even just conceptually, is a non-starter for many. This is interactive so it's not as bad as accidentally leaving a GPU EC2 instance on overnight, but I'll note that Aider shows per query and session costs.
The user had spent the entire weekend developing the app, and admitted that he would have been more careful to manage his Codebuff usage had it not been for this bug.
We're open to adding hard limits to accounts, so you're never charged beyond the credits you paid for. We just wanted to make sure people could pay a bit more to get to a good stopping point once they passed their limits.
On the flip side, there's probably useful things to learn from how he developed his app when he didn't feel the need to be careful; in a way, your $500 mistake bought you useful test data.
In my own use of Aider, I noticed I'm always worried about the costs and pay close attention to token/cost summaries it displays. Being always on my mind, this affects my use of this tool in ways I'm only beginning to uncover. Same would likely be true for Codebuff users.
Have you considered a bring your own api key model?
If you have multiple repos, you could create a directory that contains them all, and that should work pretty well!
Is that through the Enterprise plan?
We actually ended up not charging this guy since there was a bug where we told him he got 50,000 credits instead of 10,000. Oops!
Any specific reason to choose the terminal as the interface? Do you plan to make it more extensible in the future? (sounds like this could be wrapped with an extension for any IDE, which is exciting)
Also, do you see it being a problem that you can't point it to specific lines of code? In Cursor you can select some lines and CMD+K to instruct an edit. This takes away that fidelity, is it because you suspect models will get good enough to not require that level of handholding?
Do you plan to benchmark this with swe-bench etc.?
The terminal is actually a great interface because it is so simple. It keeps the product focused to not have complex UI options. But also, we rarely thought we needed any options. It's enough to let the user say what they want in chat.
You can't point to specific lines, but Codebuff is really good at finding the right spot.
I actually still use Cursor to edit individual files because I feel it is better when you are manually coding and want to change just one thing there.
We do plan to do the SWE bench. It's mostly the new Sonnet 3.5 under the hood making the edits, so it should do about as well as Anthropic's benchmark for that, which is really high, 49%: https://www.anthropic.com/news/3-5-models-and-computer-use
Fun fact is that the new Sonnet was given two tools to do code edits and run terminal commands to reach this high score. That's pretty much what Codebuff does.
It’s a crowded space and I don’t know how it’ll play, but in a space that hasn’t always brought out the best in the community, this Launch HN is a winner in my book.
I hope it goes great. Congratulations on the launch.
Ultimately, I think a future where the limit to good software is good ideas and agency to realize them, as opposed to engineering black boxes, mucking with mysterious runtime errors, holy wars on coding styles, etc. is where all the builders in this space are striving towards. We just want to see that happen sooner than later!
Could you say more about this? What was the entirety of your training data, exactly, and how did the sketch of changes and git patch play into that?
This is all the data I need: the old file, the sketch of how Claude would update it, and the ground truth diff that should be produced. I compiled this into the ideal conversation where the assistant responds with the perfect patch, and that became the training set. I think I had on the order of ~300 of these conversations for the first run, and it worked pretty well.
I came up with more improvements too, like replacing all the variant placeholder comments like "// ... existing code ..." or "# ... (keep the rest of the function)" with one [[*REPLACE_WITH_EXISITNG_CODE*]] symbol, and that made it more accurate
Would however pay for actual software that I can just buy instead of rent to do the task of inline shell assitance, without making network calls behind my back that i'm not in complete perfectionist one hundred point zero zero per cent control of.
Sorry just my opinion in general with these types of products. If you don't have the skills to make a fully self contained language model type of product or something do this then you are not skilled enough team for me to trust with my work shell.
So do you want to buy tens of thousands of dollars in GPUs or do you want to rent them second-by-second? Most people will choose the latter. I understand you don't trust the infrastructure and that's reasonable. If self-hosting was viable it would be more popular.
It's become my go-to tool for handling fiddly refactors. Here’s an example session from a Rust project where I used it to break a single file into a module directory.
https://gist.github.com/cablehead/f235d61d3b646f2ec1794f656e...
Notice how it can run tests, see the compile error, and then iterate until the task is done? Really impressive.
For reference, this task used ~100 credits
Thanks for sharing! haxton was asking about practical use cases, I'll link them here!
Could this tool get a command from the LLM which would result in file-loss? How would you prevent that?
One is that I think it is simpler for the end user to not have to add their own keys. It allows them to start for free and is less friction overall.
Another reason is that it allows us to use whichever models we think are best. Right now we just use Anthropic and OpenAI, but we are in talks with another startup to use their rewriting model. Previously, we have used our own fine-tuned model for one step, and that would be hard to do with just API keys.
The last reason that might be unpopular is that keeping it closed source and not allowing you to bring your keys means we can charge more money. Charging money for your product is good because then we can invest more energy and effort to make it even better. This is actually beneficial to you, the end user, because we can invest in making the product good. Capitalism works, cheers.
Per your last question, I do advise you use git so that you can always revert to return to your old file state! Codebuff does have a native "undo" command as well.
Making sure that our word is trustworthy to the broader world at large is going to be a big challenge for us. Do you have any ideas for what we can do? We're starting to think about open source, but we aren't quite ready for that yet.
If you want to make a multi-file edit in cursor, you open composer, probably have to click to start a new composer session, type what you want, tell it which files it needs to include, watch it run through the change (seeing only an abbreviated version of the changes it makes), click apply all, then have to go and actually look at the real diff.
With codebuff, you open codebuff in terminal and just type what you want, and it will scan the whole directory to figure out which files to include. Then you can see the whole diff. It's way cleaner and faster for making large changes. Because it can run terminal commands, it's also really good at cleaning up after itself, e.g., removing files, renaming files, installing dependencies, etc.
Both tools need work in terms of reliability, but the workflow with Codebuff is 10x better.
If you're nervous about this, I'd suggest throwing Codebuff in a Docker container or even a separate instance with just your codebase.
I have noticed some small oddities, like every now and then it will remove the existing contents of a module when adding a new function, but between a quick glance over the changes using the diff command and our standard CI suite, it's always pretty easy to catch and fix.
I’m curious what exactly people say causes them to make the switch from Cursor to Codebuff? Or do people just use both?
I open the terminal panel at the bottom of the Cursor window, start up `codebuff`, and voila, I have an upgraded version of Cursor Compose!
Depending on what exactly I'm implementing I rely more on codebuff or do more manual coding in Cursor. For manual coding, I mostly just use the tab autocomplete. That's their best feature IMO.
But codebuff is very useful for starting features out if I brain dump what I want and then go fix it up. Or, writing tests or scripts. Or refactoring. Or integrating a new api.
As codebuff has gotten better, I've found it useful in more cases. If I'm implementing a lot of web UI, I can nearly stop looking at the code altogether and just keep prompting it until it works.
Hopefully that gives you some idea of how you could use codebuff in your day-to-day development.
I’ve hard time figuring out what codebuff brings to the table that hasn’t been done before other than being YC backed. I think to win in this massively competitive and fast moving market, you really have to put forward something significantly better than an expensive cobbled together script replicating OSS solutions…
I know this sounds harsh, but believe me, differentiation makes or breaks you sooner than later. Proper differentiation doesn’t have to be hard, it just needs to answer the question what you offer that I can’t get anywhere else at a similar price point. Right now, your offer is more expensive for basically something I get elsewhere better for 1/5 the price… I’m seriously worried whether your venture will be around in one or two years from now without a more convincing value prop.
From my experience of leaning more into full end to end Ai workflows building Rust, it seems that
1) context has clearly won over RAG. There is no way back.
2) workflow is the next obvious evolution and gets you an extra mile
3) adversial GAN training seems a path forward to get from just okay generated code to something close to a home run on the first try
4) generating a style guide based on the entire code base and feeding that style guide together with the task and context into the LLM is your ticket to enterprise customers because no matter how good your stuff might be , if the generated code doesn’t fit the mold you are not part of the conversation. Conversely, if you deliver code in the same style and formatting and it actually works, well, price doesn’t matter much.
5) in terms of marketing to developers, I suggest starting listening to their pain points working with existing Ai tools. I don’t have one single of the problems you try to solve. Im sitting over a massive Rust monorepo and I’ve seen virtually every existing Ai coding assistant failing one way or another. The one I have now works miracles half the time and only fails the other half. That is already a massive improvement compared to everything else I tried over the past four years.
Point is, there is a massive need for coding assistance on complex systems and for CodeBuff to make a dime of a difference, you have to differentiate from what’s out there by starting with the challenges engineers face today.
Re: style guide. We encourage you to write up `knowledge.md` files which are included in every prompt. You can specify styles or other guidelines to follow in your codebase. One motivating example is we wrote in instructions of how to add an endpoint (edit these three files), and that made it do the right thing when asked to create an endpoint.
I've been using Zed editor as my primary workhorse, and I can see codebuff as a helper CLI when I need to work. I'm not sure if a CLI-only interface outside my editor is the right UX for me to generate/edit code — but this is perfect for refactors.
Totally understand where you're coming from, I personally use it in a terminal tab (amongst many) in any IDE I'm using. But I've been surprised to see how different many developers' workflows are from one another. Some people use it in a dedicated terminal window, others have a vim-based setup, etc.
It actually got line number not too wrong, and so they might have been helpful. (I included the line numbers for the original file in context).
Ultimately though, this approach was still error prone enough that we recently switched away.
I can add it if tree sitter adds support for Svelte. I haven't checked, maybe it already is supported?
the night critics are coming.
You have to constantly do your research. It is one of those anxiety-inducing tasks that's easy to justify avoiding when all you want to do is code your idea up and there's so much other work to do. But it's your job. Even when you hire someone else to run product for you it'll be your responsibility to own it.
What you've built is cool, a lot of people love it even though they know about the other tools available. Now you know what your main competition does, you also know what it doesn't do, so you get to solve for that - and if you solved the context problem in isolation with treesitter then you're obviously capable.
You'll have realised by now that Aider didn't use treesitter when it started. Instead it used ctags - a pattern-matching approach to code indexing from 40 years ago that doesn't capture signatures or create an ast, it effectively just indexes the code with a bunch of regex. And it's not like treesitter wasn't around when aider was first written. Keep that in mind.
Good luck.
And this: https://github.com/antlr/codebuff
You must though, learn to code in a different way if you are not that disciplined. I had excellent results asking for small changes, step by step and committing often so I can undo and go back to a working version easily.
Net result was very positive, built two apps simultaneously (customer side and professional side).
I'm curious how often others have experienced this. There have been so many times on many different projects where I've struggled with something hard and had the breakthrough only right before the deadline (self-imposed or actual deadline).
Congrats, sounds like an awesome project. I'll have to try it out.
we've seen our own productivity increase tenfold – using codebuff to build buff our own code hah
let us know what you think!
The demo right there is worth $5 of software development ( in offshored upwork cost) . Imagine when this can be done at scale for huge existing codebase.