
0 points · tavavex · 1mo ago · 0 comments

Most LLMs are trained on the source code of many open-source projects. This 'project' has the whole song-and-dance about never seeing the source code and separating the system to skirt around legal trouble. Why hasn't anyone done that yet?


imiric · 1mo ago

Because that's impossible. Any "robot" that can generate code must be trained on massive amounts of code, most of which is open source.

sdwr · 1mo ago

And how are you supposed to guarantee equivalent functionality by analyzing "README files, API docs, and type definitions"?

Nolski · 1mo ago

It's described on the web page: it uses two agents. One has access to the code and one doesn't.

fmbb · 1mo ago

Are they the same model?

Not that it matters, I just think the joke is more fun if they are different.


dymk · 1mo ago

The joke is that you don’t.

preisschild · 1mo ago

Not a lot of code is public domain, and thus not a lot of training data is available.
