Wanted to share a project I started working on in my spare time, which was then discovered by many in the open source community last week.
GPT Engineer’s mission: Be the open platform for devs to tinker with and build their personal code-generation toolbox.
I believe it's key for us devs to engage in how building software can and will change.
You can find more info on GitHub about the flexible technical "philosophy" that makes it work well, and about the community we want it to become: https://github.com/AntonOsika/gpt-engineer
The project is still in its early stages. There is clearly a lot of room for improvement, since the space of tricks that can be combined to guide LLMs is large.
Appreciate any suggestions, experiences, or ideas on the project from you all!
Generating new code from whole-cloth seems like an easier task for GPT. My tool can certainly do that, as can smol-developer, etc. But you really only do that "once" per project.
Can folks use gpt-engineer to modify and extend the code it has already created, as the user comes up with new features, etc? Can it be used to work on a pre-existing codebase?
I have been looking at tree-sitter quite a bit actually. I love that it has broad language support, which is a key design goal for my tool.
My only hesitation is that it doesn't appear to correctly identify multi-line function signatures & calls. If you look below at create, io.tool_error and __init__ you can see that the (row,col)-(row,col) indices only reference the first line.
GPT would really benefit from seeing the entire function signature and call sites.
$ tree-sitter tags aider/coders/base_coder.py
...
create | function def (39, 8) - (39, 14) `def create(`
check_model_availability | call ref (54, 19) - (54, 43) `if not check_model_availability(main_model):`
tool_error | call ref (56, 23) - (56, 33) `io.tool_error(`
EditBlockCoder | call ref (66, 19) - (66, 33) `return EditBlockCoder(main_model, io, **kwargs)`
...
__init__ | function def (74, 8) - (74, 16) `def __init__(`
set | call ref (89, 26) - (89, 29) `self.abs_fnames = set()`
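For comparison, here is a rough sketch of what "the entire signature and call site" could look like as line ranges. Note that it swaps in Python's built-in ast module instead of tree-sitter, so it only works for Python source, and the sample class below is made up for illustration (it is not the real base_coder.py). The signature end is approximated as the line before the first body statement, which would also swallow any comment lines sitting in between.

```python
import ast

# Hypothetical sample with a multi-line signature and a multi-line call.
SOURCE = '''\
class Coder:
    def create(
        self,
        main_model,
        io,
        **kwargs,
    ):
        io.tool_error(
            "model not available",
            main_model,
        )
        return None
'''


def signature_and_call_spans(source):
    """Return (kind, name, first_line, last_line) tuples, 1-indexed and inclusive."""
    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Approximation: the signature runs up to the line before the
            # first body statement. Clamp so a one-line def stays valid.
            last = max(node.lineno, node.body[0].lineno - 1)
            spans.append(("def", node.name, node.lineno, last))
        elif isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "?")
            # end_lineno is available on Python 3.8+.
            spans.append(("call", name, node.lineno, node.end_lineno))
    return spans


for kind, name, first, last in signature_and_call_spans(SOURCE):
    print(f"{name} | {kind} lines {first}-{last}")
```

With ranges like these, a tool could paste the full `def create(...)` header and the full `io.tool_error(...)` call into the prompt rather than just the first line of each.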
...Definitely looking forward to the day I review my first AI-generated PR (beyond dependency updates of course).
On the topic of "AI-generated PRs", I used my tool to file a PR to the `glow` CLI tool. I don't know the Go language, so I had my tool `aider` add the feature I needed. I mostly use glow to preview README.md for GitHub, so I wanted it to render line breaks like GitHub does.
https://github.com/charmbracelet/glow/pull/502
I've also been able to solve a couple of GitHub issues that were filed by users by just pasting the issue into my tool... it fixed itself. Links below:
https://github.com/paul-gauthier/aider/issues/13#issuecommen...
https://github.com/paul-gauthier/aider/issues/5#issuecomment...
> Generating new code from whole-cloth seems like an easier task for GPT.
I've been working on a similar tool of my own and I've found the opposite. The initial task I've been trying to have it complete is to build something like https://craigmbooth.com/projects/killer-sudoku-calculator/ but with my bespoke requirements. If I try to specify it up front and have it generate the whole thing, it almost always fails in a number of ways at once, despite the requirements being relatively straightforward and clear.
However I have had success by walking it through the process step by step along the lines of "please create an index.html that references react from a cdn. It should draw a blank sudoku board" -> "Please add a button that says 'add a cage'. When the button is clicked a box labeled "Cage 1", "Cage 2", etc should be added to a list to the right of the board" -> "Please track the currently selected cage and set the background of the corresponding cage box to green", etc.
Likewise, I added the initial set of functions it could access ("Create a File", "Update a File", "Remove a File") manually, but then I had it add additional commands ("Add a directory", "Remove a directory", "Copy a file", etc) and it was able to do it correctly on the first try each time because the pattern already existed.
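To illustrate why "the pattern already existed" matters, here is a minimal sketch of what such a command set might look like. This is a guess at the shape, not the commenter's actual code: the `COMMANDS` registry, the `command` decorator and the `run` helper are all hypothetical names, and the sketch does no path sanitising.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical registry: maps the command name the model asks for to a function.
COMMANDS = {}


def command(name):
    """Register a function under a human-readable command name."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register


# The hand-written initial commands.
@command("Create a File")
def create_file(root, path, content=""):
    target = root / path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)


@command("Update a File")
def update_file(root, path, content):
    (root / path).write_text(content)


@command("Remove a File")
def remove_file(root, path):
    (root / path).unlink()


# A later addition only has to copy the existing shape.
@command("Copy a file")
def copy_file(root, src, dst):
    shutil.copyfile(root / src, root / dst)


def run(root, name, **args):
    """Look up a command by name and run it against the project root."""
    return COMMANDS[name](Path(root), **args)


with tempfile.TemporaryDirectory() as demo_root:
    run(demo_root, "Create a File", path="index.html", content="<html></html>")
    run(demo_root, "Copy a file", src="index.html", dst="copy.html")
    print(sorted(p.name for p in Path(demo_root).iterdir()))
```

Once three commands follow one obvious template, asking the model for a fourth is mostly pattern completion, which fits the "correct on the first try" experience described above.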
You're describing building a green field app starting from nothing, step by step. GPT shines at things like this, because they are by definition small code bases. You can probably fit the whole codebase into the context window.
Also, your approach of walking it through step-by-step is perfect, since you get to guide it to build a wise code architecture as you go.
It's hard to naively point GPT at a big, existing repo and try and do non-trivial changes to that codebase. Without a bunch of tooling to help it understand the overall codebase, it won't understand or respect the existing modules, abstractions, etc. It will just start trying to write code in a vacuum, which probably isn't the right thing to do when modifying an existing codebase.
https://github.com/paul-gauthier/aider#gpt-4-vs-gpt-35
I use GPT for all kinds of busy work, and code quality type work. Adding test cases or quality of life features. Things that I might not have the energy to do myself. GPT can often accomplish these tasks off a 1-2 sentence request. Or if not, it will do all the boilerplate and get you 80% of the way there and it's easy to polish up the final 20%.
Don't entirely copy-paste your code base.
E.g. I adjusted the prompt to the logical separation in my project (e.g. the module and the task of a class can be derived from its namespace) and then indexed the codebase by namespace + class + method names with their parameters.
That helps fit more of the codebase into the prompt.
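A small sketch of that kind of index, assuming a Python codebase as a stand-in (the project described above may well be in another language). The `outline` function and the `billing.storage` example module are hypothetical. It keeps only namespace, class, method name and positional parameters, and deliberately drops bodies, `*args`, `**kwargs` and keyword-only parameters.

```python
import ast


def outline(module_name, source):
    """Return compact lines like 'pkg.mod.Class.method(a, b)' for one module."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    # Skip `self`: it carries no information for the prompt.
                    params = ", ".join(a.arg for a in item.args.args if a.arg != "self")
                    lines.append(f"{module_name}.{node.name}.{item.name}({params})")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = ", ".join(a.arg for a in node.args.args)
            lines.append(f"{module_name}.{node.name}({params})")
    return lines


# Made-up module used only to show the output format.
EXAMPLE = '''\
class InvoiceRepository:
    def find_by_customer(self, customer_id, limit):
        ...

    def save(self, invoice):
        ...


def connect(dsn):
    ...
'''

print("\n".join(outline("billing.storage", EXAMPLE)))
# billing.storage.InvoiceRepository.find_by_customer(customer_id, limit)
# billing.storage.InvoiceRepository.save(invoice)
# billing.storage.connect(dsn)
```

Each method costs one short line instead of its whole body, so far more of the codebase's structure fits into the same context window.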
Fully automatic can be evaluated fully automatically.
Good input.
Who needs to maintain an old codebase if you can rewrite it adding new features at whim?
For that, I posit the system would need to understand the existing code base. Not just what the code does, but the intent and the why. I'll leave it up to the reader to decide whether they believe LLMs understand anything. I know where I stand.
https://github.com/paul-gauthier/aider#gpt-4-vs-gpt-35
I also have some incoming improvements so the tool is more graceful and helpful if you hit the context window limit. And in general, most of my main efforts are focused on making it possible to work with larger and larger codebases in spite of the context window limitations.
Please do file an issue with more details on your problems. I can try and help and give you updates if I make improvements to the tool which could solve your use case.
I'd love to see some improvements on the clarifications/questions part, but overall it's a great project with so much potential. Did you consider including some sort of code self-repair step?
Btw I posted a video [0] about gpt-engineer and my audience is also very impressed.
One thing I noticed is that the video in the readme doesn’t actually show the generated code running. It would be much more convincing if it did!