Show HN: gpt-engineer – platform for devs to tinker with AI programming tools

178 points · antonoo · 3y ago · 53 comments

Hello Hacker News community,

Wanted to share a project that I started working on in my spare time and that was then discovered by many in the open source community last week.

GPT Engineer’s mission: Be the open platform for devs to tinker with and build their personal code-generation toolbox.

I believe it's key for us devs to engage in how building software can and will change.

You can find more info about the flexible technical "philosophy" that makes it work well, and the community we want it to become, on GitHub: https://github.com/AntonOsika/gpt-engineer

The project is still in its early stages. It's clear that there is a lot of room for improvement, as the space of tricks that can be combined to guide LLMs is large.

Appreciate any suggestions, experiences, or ideas on the project from you all!

32 comments · 12 top-level

anotherpaulg · 3y ago · 18 in thread

Have you done much work on using GPT to *edit* code in an existing codebase? That's been my focus lately, working on my open source GPT coding tool [0].

Generating new code from whole-cloth seems like an easier task for GPT. My tool can certainly do that, as can smol-developer, etc. But you really only do that "once" per project.

Can folks use gpt-engineer to modify and extend the code it has already created, as the user comes up with new features, etc? Can it be used to work on a pre-existing codebase?

[0] https://github.com/paul-gauthier/aider

ghughes · 3y ago

Cool project! I'm working in the same space (a ChatGPT plugin that can edit files within a shared VS Code workspace) and have built something similar to your "repo map" concept, except slightly lower-level: what you might call a "file map" generated by selectively collapsing AST nodes to fit within the available token budget. If ctags isn't cutting it for you, have a look at tree-sitter [1]. It can generate ASTs for most languages and has a nice API.

[1] https://tree-sitter.github.io/tree-sitter/
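
To make the "file map" idea concrete, here is a minimal Python sketch. It is not the plugin's actual code: it uses Python's standard `ast` module rather than tree-sitter, handles Python sources only, and treats a character count as a rough stand-in for a token budget. The name `file_map` and the `budget_chars` parameter are made up for illustration.

```python
import ast

def file_map(source: str, budget_chars: int) -> str:
    """Collapse function bodies, largest first, until the text fits the budget."""
    lines = source.splitlines()
    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            first = node.body[0]
            # Skip one-liners such as `def f(): return 1`, where the body
            # starts on the same line as the signature.
            if lines[first.lineno - 1][: first.col_offset].strip():
                continue
            spans.append((first.lineno, node.end_lineno))
    spans.sort(key=lambda s: s[1] - s[0], reverse=True)

    collapsed = {}  # first body line -> last body line

    def render() -> str:
        out, skip_until = [], 0
        for number, line in enumerate(lines, start=1):
            if number <= skip_until:
                continue
            if number in collapsed:
                indent = line[: len(line) - len(line.lstrip())]
                out.append(indent + "...")  # placeholder for the hidden body
                skip_until = collapsed[number]
            else:
                out.append(line)
        return "\n".join(out)

    text = render()
    for start, end in spans:
        if len(text) <= budget_chars:
            break
        # An inner function is already hidden if an outer one was collapsed.
        if any(s <= start and end <= e for s, e in collapsed.items()):
            continue
        collapsed[start] = end
        text = render()
    return text
```

Collapsing the largest bodies first is only one possible policy; a real tool would probably rank nodes by relevance to the current request instead.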

anotherpaulg · 3y ago

Glad to hear there are others working on similar things. I've been wishing there was a good forum for like minded folks to share ideas about AI coding, beyond the random drive-by commenting that happens here on HN.

I have been looking at tree-sitter quite a bit actually. I love that it has broad language support, which is a key design goal for my tool.

My only hesitation is that it doesn't appear to correctly identify multi-line function signatures & calls. If you look below at create, io.tool_error and __init__ you can see that the (row,col)-(row,col) indices only reference the first line.

GPT would really benefit from seeing the entire function signature and call sites.

  $ tree-sitter tags aider/coders/base_coder.py
  ...
  create      | function def (39, 8) - (39, 14) `def create(`
  check_model_availability  | call     ref (54, 19) - (54, 43) `if not check_model_availability(main_model):`
  tool_error  | call     ref (56, 23) - (56, 33) `io.tool_error(`
  EditBlockCoder  | call     ref (66, 19) - (66, 33) `return EditBlockCoder(main_model, io, **kwargs)`
  ...
  __init__    | function def (74, 8) - (74, 16) `def __init__(`
  set         | call     ref (89, 26) - (89, 29) `self.abs_fnames = set()`
  ...
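
For comparison, here is a rough Python-only sketch of recovering the whole signature, including the continuation lines. It uses the standard `ast` module, not tree-sitter, so it does not give the broad language support discussed above. The helper name `full_signatures` is hypothetical, and any comment sitting between the signature and the first body statement would be swept into the result.

```python
import ast

def full_signatures(source: str) -> list:
    """Return (name, first_line, last_line, text) for every function definition."""
    lines = source.splitlines()
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body[0]
        # Take every line from the `def` line through the first body statement.
        chunk = lines[node.lineno - 1 : body.lineno]
        # Cut the last line where the body statement begins. col_offset counts
        # UTF-8 bytes, hence the encode/decode round trip.
        chunk[-1] = chunk[-1].encode()[: body.col_offset].decode()
        text = "\n".join(chunk).strip()
        last_line = node.lineno + text.count("\n")
        found.append((node.name, node.lineno, last_line, text))
    return found
```

For multi-line calls, `ast.get_source_segment(source, call_node)` returns the full text of an `ast.Call` node in a similar way.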

headcanon · 3y ago

Looks like you know more about it than me, but it seems to me that the main challenge is being able to include the appropriate context, and train it to output a diff which you can then apply to the codebase.

Definitely looking forward to the day I review my first AI-generated PR (beyond dependency updates of course).

anotherpaulg · 3y ago

Yup, those seem to be the key challenges. I've been making good progress on them, but there's plenty more work to do!

On the topic of "AI-generated PRs", I used my tool to file a PR to the `glow` CLI tool. I don't know the go language, so I had my tool `aider` add the feature I needed. I mostly use glow to preview README.md for GitHub, so I wanted it to render line breaks like GitHub does.

https://github.com/charmbracelet/glow/pull/502

I've also been able to solve a couple of GitHub issues that were filed by users by just pasting the issue into my tool... it fixed itself. Links below:

https://github.com/paul-gauthier/aider/issues/13#issuecommen...

https://github.com/paul-gauthier/aider/issues/5#issuecomment...

isaacfung · 3y ago

Does anyone know if models like wizardcoder are trained on finished code only or they have been trained on ticket/PR/commit messages and the diff (with the interface of related code provided as context)?

lgas · 3y ago

It's interesting to me that you say

> Generating new code from whole-cloth seems like an easier task for GPT.

as I've been working on a similar tool of my own and I've found the opposite. The initial task I've been trying to have it complete is to build something like https://craigmbooth.com/projects/killer-sudoku-calculator/ but with my bespoke requirements. If I try to specify it up front and have it generate the whole thing it almost always fails in a number of ways at once, despite the requirements being relatively straightforward and clear.

However I have had success by walking it through the process step by step along the lines of "please create an index.html that references react from a cdn. It should draw a blank sudoku board" -> "Please add a button that says 'add a cage'. When the button is clicked a box labeled "Cage 1", "Cage 2", etc should be added to a list to the right of the board" -> "Please track the currently selected cage and set the background of the corresponding cage box to green", etc.

Likewise, I added the initial set of functions it could access ("Create a File", "Update a File", "Remove a File") manually, but then I had it add additional commands ("Add a directory", "Remove a directory", "Copy a file", etc) and it was able to do it correctly on the first try each time because the pattern already existed.
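
One way such a command set might be laid out is sketched below in Python. This is an assumption about the general pattern, not the actual tool: the command names come from the list above, while `COMMANDS`, `command`, and `dispatch` are hypothetical helpers. The point is that every command follows one registration pattern, so a new one can be added by copying an existing one.

```python
from pathlib import Path

COMMANDS = {}  # command name shown to the model -> Python function

def command(name):
    """Register a function under the name the model uses to request it."""
    def register(fn):
        COMMANDS[name] = fn
        return fn
    return register

@command("Create a File")
def create_file(path, content=""):
    Path(path).write_text(content)

@command("Update a File")
def update_file(path, content):
    Path(path).write_text(content)

@command("Remove a File")
def remove_file(path):
    Path(path).unlink()

@command("Add a directory")
def add_directory(path):
    Path(path).mkdir(parents=True, exist_ok=True)

def dispatch(name, **arguments):
    """Look up a requested command by name and run it with the given arguments."""
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    return COMMANDS[name](**arguments)
```

A call such as `dispatch("Create a File", path="index.html", content="<html></html>")` would then be driven by the model's structured output.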

anotherpaulg · 3y ago

I think we agree, but maybe I wasn't writing clearly.

You're describing building a green field app starting from nothing, step by step. GPT shines at things like this, because they are by definition small code bases. You can probably fit the whole codebase into the context window.

Also, your approach of walking it through step-by-step is perfect, since you get to guide it to build a wise code architecture as you go.

It's hard to naively point GPT at a big, existing repo and try and do non-trivial changes to that codebase. Without a bunch of tooling to help it understand the overall codebase, it won't understand or respect the existing modules, abstractions, etc. It will just start trying to write code in a vacuum, which probably isn't the right thing to do when modifying an existing codebase.

jrockway · 3y ago

This is something that I've also been wanting to play with. The token limit is too small for my codebase so I haven't really bothered, but it would be nice to tell it all my code and then say "refactor everything of the form `err = foo` into `if err := foo; ...`". If AI is going to take my job, it is going to have to learn to do maintenance!

anotherpaulg · 3y ago

Give aider a try. As long as each file fits in the context window, it should work. With gpt-3.5-turbo-16k or gpt-4 you can edit files up to around 30 kbytes in size. If you have access to gpt-4-32k, you could edit files up to about 120 kbytes. See the notes here for more info:

https://github.com/paul-gauthier/aider#gpt-4-vs-gpt-35

I use GPT for all kinds of busy work, and code quality type work. Adding test cases or quality of life features. Things that I might not have the energy to do myself. GPT can often accomplish these tasks off a 1-2 sentence request. Or if not, it will do all the boilerplate and get you 80% of the way there and it's easy to polish up the final 20%.

NicoJuicy · 3y ago

Perhaps it helps:

Don't entirely copy-paste your code base.

E.g., I adjusted the prompt to the logical separation in my project (the module and the task of a class can be derived from its namespace) and then indexed the codebase by namespace, class, and method names with their parameters.

This helps fit more of the codebase into the prompt.
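
A compact index of that kind could look like the following Python sketch. It is only an illustration under assumptions: it treats a Python module path as the "namespace", uses the standard `ast` module, and lists only ordinary positional parameters (it ignores `*args`, keyword-only parameters, and defaults). The function name `outline` is made up.

```python
import ast

def outline(module_name: str, source: str) -> list:
    """List 'module.Class.method(params)' and 'module.function(params)' entries."""

    def describe(prefix, fn):
        params = ", ".join(arg.arg for arg in fn.args.args)
        return f"{prefix}.{fn.name}({params})"

    function_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    entries = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            # One entry per method defined directly on the class.
            for item in node.body:
                if isinstance(item, function_types):
                    entries.append(describe(f"{module_name}.{node.name}", item))
        elif isinstance(node, function_types):
            entries.append(describe(module_name, node))
    return entries
```

Each entry costs only a handful of tokens, so many more files can be summarized in one prompt than if their full source were pasted in.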

antonoo (OP) · 3y ago

Not yet. I considered adding it soon; the only reason I decided against it, for now, is that human edits make automatic evaluation very difficult!

Fully automatic can be evaluated fully automatically.

Good input.

anton-107 · 3y ago

Hmm, if generating new code is an easy task for GPT, why don't you ask it to create a new project from scratch every moment a user comes up with a new feature?

Who needs to maintain an old codebase if you can rewrite it adding new features at whim?

andsoitis · 3y ago

> Can it be used to work on a pre-existing codebase?

For that, I posit the system would need to understand the existing code base. Not just what the code does, but the intent and the why. I'll leave it up to the reader to decide whether they believe LLMs understand anything. I know where I stand.

grugagag · 3y ago

I’m there with you: LLMs don’t understand in the same sense we do. But they can transform a well-written, compressed spec into code. The spec becomes the true source and can be tweaked, and ChatGPT regenerates the whole code. It’s nondeterministic, so every time it will be slightly different, but with multiple renditions its average should look like the traveling salesman graph of the problem itself.

braindead_in · 3y ago

I've played around with aider trying to run tests and fix the code, but it just crashes after exceeding the context window. I am now trying to repurpose the AutoGPT example in langchain.

anotherpaulg · 3y ago

Sounds like maybe your source files are bigger than the context window? Try including fewer files in the chat. See some notes on GPT models and file sizes here:

https://github.com/paul-gauthier/aider#gpt-4-vs-gpt-35

I also have some incoming improvements so the tool is more graceful and helpful if you hit the context window limit. And in general, most of my main efforts are focused on making it possible to work with larger and larger codebases in spite of the context window limitations.

Please do file an issue with more details on your problems. I can try and help and give you updates if I make improvements to the tool which could solve your use case.

rane · 3y ago

How have you created the svg screencast in the README?

andrewescott · 3y ago

Looking at the source, it appears to have been generated by https://github.com/nbedos/termtosvg

reallymental · 3y ago · 1 in thread

Just a quick thing I loved about your gif at the end, the font and the theme of your Vim settings! Really loved it, do you mind sharing your .vimrc?

Hallmane · 3y ago

https://github.com/AntonOsika/dotfiles

nathan_tarbert · 3y ago · 1 in thread

This is a really cool project. I'm going to play around with it. I love the fact that it's Open-Source!

antonoo (OP) · 3y ago

Thanks!

mayaakim · 3y ago

Hey Anton, congratulations, I love the project. The results are amazing even though I still have access to gpt-3.5 only. I can't even imagine the results with gpt-4.

I'd love to see some improvements on the clarifications/questions part, but overall it's a great project with so much potential. Did you consider including some sort of code self-repair step?

Btw I posted a video [0] about gpt-engineer and my audience is also very impressed.

[0] https://www.youtube.com/watch?v=4ehvtuv3ZuQ

Kiro · 3y ago

I haven't tried this but how do you get around the fact that it hallucinates functions and variables when doing it file by file? I haven't managed to make an app without error going this route. The only time it produces something working is when I keep it single-file (e.g. html, js and css together) or single-function.

gaolei8888 · 3y ago

I'd like to see this going forward. So much potential.

gitgud · 3y ago

Interesting project, great work on getting it done.

One thing I noticed is that the video in the readme doesn’t actually show the generated code running. It would be much more convincing if it did!

braindead_in · 3y ago

Cool project. I tried to build a reactjs Todo app with TDD, but it just put comments in the test file instead of the actual test. A self heal loop would be quite useful.
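
A self-heal loop of the kind suggested here might be sketched in Python as follows. This is not something gpt-engineer is confirmed to implement; `self_heal` and `propose_fix` are hypothetical names, and `propose_fix` stands in for whatever step would send the failing test output back to the model and rewrite the code.

```python
import subprocess

def self_heal(test_command, propose_fix, max_rounds=3):
    """Run the tests; on failure, pass the output to propose_fix and try again.

    Returns the number of the round in which the tests passed, or None if
    they were still failing after max_rounds runs.
    """
    for round_number in range(1, max_rounds + 1):
        result = subprocess.run(test_command, capture_output=True, text=True)
        if result.returncode == 0:
            return round_number
        if round_number < max_rounds:
            # Hypothetical hook: ask the model for a patch based on the failure.
            propose_fix(result.stdout + result.stderr)
    return None
```

Capping the number of rounds matters, since each round costs an API call and the model may never converge on passing tests.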

LonisHamaili · 3y ago

Wow, this is a big improvement on other 'gpt engineering' projects I've seen out there. What are the main things you think you can improve on it from here?

thepra · 3y ago

Can it "scan" a local codebase, understand all the syntaxes used and their differences, and then be asked to write code for a particular part of the project?

ErikBjare · 3y ago

Nice to finally see you submit it here Anton :)

nobu513 · 3y ago

test
