Building a back end using only OpenAI Codex

(codeball.ai)

88 points · zegl · 3y ago · 35 comments

I've published the sources for the code generation and the code that was generated on GitHub: https://github.com/sturdy-dev/codeball-todo-mvc

I've been experimenting with merging prompts together, with a goal to write the full backend in a single prompt.

Of the form:

> 1. Setup a flask web server

> 2. Add a /add endpoint

It works reasonably well, but it seems like it's loosing some precision in the prompts... The person that coined the term "prompt engineering" was right, it's really important to learn what words to use to get the AI to do exactly what you want it to do.

29 comments · 12 top-level

akkartik · 3y ago · 3 in thread

This needs a new format for source code. We could call it, oh, Literate Programming. Check in only the prompts to version control, expand them into code during CI, then file bugs when new releases of OpenAI cause regressions.
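A minimal sketch of the "check in the prompts, expand in CI, flag regressions" idea, assuming a lock file that maps each prompt to a digest of the code it last expanded to. `expand_prompt` and `newer_model` are hypothetical stand-ins for a real model call, not an actual Codex API.

```python
import hashlib

# Hypothetical stand-in for a model call. A real pipeline would call a
# code-generation API here; this stub is deterministic only because it is a stub.
def expand_prompt(prompt):
    return f"# generated from: {prompt}\npass\n"

def digest(code):
    return hashlib.sha256(code.encode("utf-8")).hexdigest()

def build_lock(prompts):
    """Record the digest of the code each prompt currently expands to."""
    return {p: digest(expand_prompt(p)) for p in prompts}

def find_regressions(prompts, lock, expand=expand_prompt):
    """Return the prompts whose expansion no longer matches the lock."""
    return [p for p in prompts if digest(expand(p)) != lock.get(p)]

prompts = ["Setup a flask web server", "Add a /add endpoint"]
lock = build_lock(prompts)

# Same generator: nothing has drifted.
print(find_regressions(prompts, lock))

# Simulate a new model release that expands one prompt differently.
def newer_model(prompt):
    code = expand_prompt(prompt)
    return code + "# extra line\n" if "/add" in prompt else code

print(find_regressions(prompts, lock, expand=newer_model))
```

A digest only says that the output changed, not whether it got worse, so a real CI job would presumably run tests against the expanded code as well.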

dwohnitmok · 3y ago

We cannot expect the code expansion to be deterministic with regard to the prompt without a severe reduction in AI capability.

The utility of these prompts comes primarily from the fact that the AI is aware of a huge amount of context and can therefore infer what a prompt is "meant to do." If a prompter had to exhaustively specify the context it would be no different than coding in any normal programming language.

That context necessarily changes over time. The same sentence 10 years ago might easily have a different contextual meaning than it does today.

verdverm · 3y ago

I think there is a middle ground. My work on a deterministic, ai-free code gen tool uses DSLs in a more abstract, declarative space. The details are handled in the templates and extra config from the input.

Prisma, Atlas, and OpenAPI-generator are similar, with increasing complexity of input and DSL, respectively.

I do like your point that context of natural language as input can change over time. I imagine it also would if trained on different source code, or even different target languages and technologies.

I'm thinking that these AI could be simplified if their target was one of these middle ground abstractions in a DSL, letting fewer (expert) humans write the code via templates
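To make the "declarative input plus templates" middle ground concrete, here is a rough sketch using only the standard library. It is not the tool mentioned above; `SPEC`, `ROUTE_TEMPLATE`, and the endpoint names are assumptions made up for illustration.

```python
from string import Template

# Hypothetical declarative input: one entry per endpoint.
SPEC = [
    {"name": "add", "path": "/add", "method": "POST"},
    {"name": "list_tasks", "path": "/list", "method": "GET"},
]

# The details live in the template, which an expert writes once.
ROUTE_TEMPLATE = Template(
    '@app.route("$path", methods=["$method"])\n'
    "def $name():\n"
    "    raise NotImplementedError\n"
)

def render(spec):
    """Expand the spec into Flask-style source text, deterministically."""
    header = "from flask import Flask\n\napp = Flask(__name__)\n\n"
    body = "\n".join(ROUTE_TEMPLATE.substitute(entry) for entry in spec)
    return header + body

generated = render(SPEC)
print(generated)
```

The same spec always renders to the same text, which is the property the natural-language prompts lack.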

thinkingkong · 3y ago

Like we just write Gherkin / Cucumber all day?

daenz · 3y ago · 3 in thread

It's very cool, but from an auditing perspective, it's a nightmare. As a reviewer, I can't reason about the code in the same way that I could reason about human code, since there is no coherent formulation of the accomplished task. I can't say "why did it apply CORS to the entire flask app?" and expect reasoning that will fulfill my objective as a reviewer.

So while it could help blast out large swaths of code quickly, it still needs an expert at the wheel to be accountable for the changes to reviewers.
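For readers unsure what the CORS question is getting at, the sketch below contrasts adding the header to every response with scoping it to one path prefix. It uses plain WSGI from the standard library rather than Flask, and the `/api/` and `/admin` paths are assumptions for illustration only.

```python
CORS_HEADER = ("Access-Control-Allow-Origin", "*")

def app(environ, start_response):
    """A trivial WSGI app that answers every request with 'ok'."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

def cors(wsgi_app, only_prefix=None):
    """Add a CORS header to every response, or only under only_prefix."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start(status, headers, exc_info=None):
            if only_prefix is None or path.startswith(only_prefix):
                headers = list(headers) + [CORS_HEADER]
            return start_response(status, headers, exc_info)

        return wsgi_app(environ, start)
    return wrapped

def response_headers(wsgi_app, path):
    """Call a WSGI app directly and return the headers it sent."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = headers

    body = wsgi_app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response)
    list(body)  # drain the response body
    return captured["headers"]

whole_app = cors(app)                      # what the reviewer is asking about
api_only = cors(app, only_prefix="/api/")  # a narrower alternative

print(CORS_HEADER in response_headers(whole_app, "/admin"))     # True
print(CORS_HEADER in response_headers(api_only, "/admin"))      # False
print(CORS_HEADER in response_headers(api_only, "/api/tasks"))  # True
```

Which of the two is right depends on the app; the point is that the choice is a decision a reviewer would normally expect someone to justify.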

nexuist · 3y ago

> I can't say "why did it apply CORS to the entire flask app?" and expect reasoning that will fulfill my objective as a reviewer.

I'm not saying you're wrong, but can't you just ask the AI to include a comment explaining why it chose to apply CORS to the entire app? You can just keep asking it questions and maybe its reasoning would check out for most of them.

vosper · 3y ago

> just keep asking it questions and maybe its reasoning would check out for most of them.

But the AI isn't reasoning... is it? Perhaps it could give an explanation, but you couldn't (currently) conflate that with any actual understanding of why it did what it did?

ShamelessC · 3y ago

As was the case before.

w1zzy · 3y ago · 3 in thread

How much time did it take to write the app?

zegl (OP) · 3y ago

It’s hard to say, since I was writing the blog post in parallel as I was making the app. But not too long, maybe an hour or two? I’m not a Python/Flask developer, so I guess that’s not too bad.

lxe · 3y ago

You can also use GPT-3 to write most of the blog content. I'm willing to bet soon this is going to be everyone's workflow.

w1zzy · 3y ago

Thanks for the info. I am gathering information about the efficiency of using code generators. Btw, nice work!

cbm-vic-20 · 3y ago · 2 in thread

This is cool. Some thoughts:

> beware: sometimes Codex writes code vulnerable to SQL injection. When that happens though, I was able to prevent it by adding "safely" to the prompt.

Oops.

Anyway, is "the program" actually the prompts? Should that be committed into source control, so future you and others can figure out how the code was built? How long will it be until we can trust Codex enough that the Python code doesn't need to be committed? "Codex, create an Android CRUD UI for this OpenAPI document."
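On the SQL injection point, a small sketch of what "vulnerable" versus "safe" typically looks like with Python's `sqlite3` module. The `tasks` table and the function names are assumptions for illustration and are not taken from the generated code in the repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO tasks (title) VALUES (?)",
    [("write post",), ("ship app",)],
)

def find_unsafe(conn, title):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT id, title FROM tasks WHERE title = '{title}'"
    return conn.execute(query).fetchall()

def find_safe(conn, title):
    # Parameterized: the driver passes title as data, never as SQL.
    query = "SELECT id, title FROM tasks WHERE title = ?"
    return conn.execute(query, (title,)).fetchall()

payload = "nothing' OR '1'='1"

print(find_unsafe(conn, payload))  # every row comes back
print(find_safe(conn, payload))    # no row has that literal title
```

Reviewing for the `?` placeholder is a more reliable check than trusting that the word "safely" in a prompt had the intended effect.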

smeagull · 3y ago

I wonder if the vulnerability could be detected at the embedding layer.

EGreg · 3y ago

Codex, create a dating site around the stable marriage algorithm. Kthxbai !

verdverm · 3y ago · 2 in thread

The code on GitHub does not exactly match the post. In particular, the last section about adding seed data is shifted up a few lines on GitHub, into the first database call, making me wonder if it was a stored procedure or a bug.

Did you have to correct output for the post?

https://github.com/sturdy-dev/codeball-todo-mvc/blob/main/ap...

https://github.com/sturdy-dev/codeball-todo-mvc/commit/17992...

zegl (OP) · 3y ago

Nice find! I tried to be careful to make sure that everything aligned.

The prompt used for the post seems to have been "before_first_request, before conn.close: if the tasks table is empty, add three rows"

I'm updating the post and the sources!
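For reference, the behaviour that prompt describes ("if the tasks table is empty, add three rows") could look roughly like the sketch below. The schema, the row titles, and the `init_db` name are assumptions; the actual generated code is in the linked repository.

```python
import sqlite3

def init_db(conn):
    """Create the tasks table if needed and seed it only when it is empty."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, title TEXT, done INTEGER DEFAULT 0)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()
    if count == 0:
        conn.executemany(
            "INSERT INTO tasks (title) VALUES (?)",
            [("Task 1",), ("Task 2",), ("Task 3",)],
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
init_db(conn)
init_db(conn)  # second call finds rows already present and adds nothing
print(conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0])  # 3
```

Where the seeding runs matters: placed inside a per-request database call it would execute the emptiness check on every request, which is the kind of shift the parent comment noticed.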

verdverm · 3y ago

I think a video going through the series of prompts would be super interesting too

xrd · 3y ago · 1 in thread

Why did you use SvelteKit as the front end (as opposed to just Svelte)? Typically SK is used when you want to have both front end and back end in the same app.

zegl (OP) · 3y ago

It's mostly what I'm used to these days, codeball.ai is written in SK. I didn't end up using it, but SK also has a nice client side router!

nl · 3y ago · 1 in thread

The typo in the submission (like it's loosing [should be losing] some precision) is both inadvertently amusing (losing precision could well be described as being loose) and raises the question of how Codex would deal with missed typos in instructions.

verdverm · 3y ago

My top-level comment is about a possible code typo that would appear much more serious

https://news.ycombinator.com/item?id=32587425

keyle · 3y ago · 1 in thread

Any instructions on how one can go ahead and play with OpenAI Codex themselves?

Is this closed? Beta? or .. ?

As a kid I used to dream of talking to the computer and having it make code happen, like a REPL. This appears to be close to it.

sh4rks · 3y ago

If you're a student, Copilot is free, I believe.

citizenpaul · 3y ago · 1 in thread

> it's really important to learn what words to use to get the AI to do exactly what you want it to do.

Now instead of spending years learning to do all that nasty troublesome coding you can just spend years learning to exactly phrase what you want the code generator to do. Wait.... is this an infinite loop joke?

zegl (OP) · 3y ago

Kind of, sometimes coding with Codex is like having to debug/code review code from a developer that has just been through a 1-month "learn to code" bootcamp.

Nothing against bootcampers, it's just that their output is non-intuitive...

kevincox · 3y ago

This is mind-blowing. The Copilot autocomplete was very impressive but actually editing the existing code is incredible.

I really want to see some examples of failed prompts and attempts to ask it to cache sqlite connections.
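As a point of comparison for such prompts, here is one hand-written way to cache SQLite connections: one connection per thread and per database path, since `sqlite3` connections refuse cross-thread use by default. `get_conn` is a hypothetical helper name, not something from the post.

```python
import sqlite3
import threading

_local = threading.local()

def get_conn(path):
    """Return this thread's cached connection for path, opening it on first use."""
    conns = getattr(_local, "conns", None)
    if conns is None:
        conns = _local.conns = {}
    if path not in conns:
        conns[path] = sqlite3.connect(path)
    return conns[path]

first = get_conn(":memory:")
second = get_conn(":memory:")
print(first is second)  # True: the second call reuses the cached connection
```

A web app would also need to decide when cached connections get closed, which this sketch leaves out.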

sdwr · 3y ago

I like how the captions scroll along to match the code samples, it's a fun reading experience.

Got scared for a second (most of what I code is CRUD backends!), until I tried to see it from the perspective of a novice, where all of this is impenetrable anyway.

hirebackenddev · 3y ago

This is something interesting. Thanks for posting such content.
