Better HN

0 points · tdb7893 · 2mo ago · 0 comments

If those tools are writing the code then in general I do expect that to be included in the PR! Through my whole career I've seen PRs where people noted that code was generated (people have been generating code since long before LLMs). It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself (which in my experience is the case where it's obvious boilerplate or the generated section is small).

Needing to flag nontrivial code as generated was standard practice for my whole career.

26 comments · 7 top-level

thechao · 2mo ago · 7 in thread

You assemble all your machine code using a magnetized needle?

tdb7893 (OP) · 2mo ago

I am not against the general use of AI code. Quite simply, my view is that all relevant context for a review should be disclosed in the PR.

AI and humans are not the same as authors of PRs. As an obvious example: one of the important functions of the PR process is to teach the writer about how to code in this project but LLMs fundamentally don't learn the same way as humans so there's a meaningful difference in context between humans and AIs.

If a human takes the care to really understand and assume authorship of the PR then it's not really an issue (and if they do, they could easily modify the Claude messages to remove "generated by Claude" notes manually) but instead it seems that Claude is just hiding relevant context from the reviewer. PRs without relevant context are always frustrating.

VectorVault · 2mo ago

What's really tricky with the legal protections area is this: 90% of the value of the S&P 500 is intangible. Meaning if you suck out the book value (10%), the rest is brand, IP, rights, sources & methods, etc. So if a company can't protect that, it's not particularly valuable anymore. Maybe we will see a shift back to tangible assets and book value (25,000 $8MM Vera Rubin machines) and away from intangibles...

jruz · 2mo ago

I think this is just the beginning, so people are apprehensive, rightfully so, at this stage. I agree with you that AI use should be disclosed, but using the commit message as a billboard for Anthropic? Hell no. Go put an ad on the free tier.

Wowfunhappy · 2mo ago

You don't generally commit compiled code to your VCS. If you do need to commit a binary for whatever reason, yeah it makes sense to explain how the binary was generated.

catlifeonmars · 2mo ago

You do usually pin your compiler version though, or at the very least set a minimum version.

jasomill · 2mo ago

Don't be silly.

I use good ol' C-x M-c M-butterfly.

https://xkcd.com/378/

djmips · 2mo ago

Sometimes using AI to code feels closer to a butterfly than Emacs, right?

Alifatisk · 2mo ago · 5 in thread

> people have been generating code since long before LLMs

How? LSTM?

cess11 · 2mo ago

There are many techniques. You're most likely to come across things like declarative DSLs and macros, then there are things like JAXB and similar tooling that generates code from data schemas, and some people script around data sources to glue boilerplate together and so on.

Arguably snippet collections belong to this genre.
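
To make the schema-driven flavour of generation mentioned above concrete, here is a minimal Python sketch. It is not how JAXB or any particular tool works; the schema shape, the `gen_model.py` name in the header, and the `generate_dataclass` helper are all made up for illustration. The point is only that the same input always produces the same text, and that the output announces itself as generated.

```python
# Hypothetical schema; the class name and fields are invented for this sketch.
SCHEMA = {"name": "Pill", "fields": {"id": "int", "label": "str"}}

# Header that flags the output as generated (the tool name is an assumption).
GENERATED_HEADER = "# Code generated by gen_model.py; DO NOT EDIT.\n"


def generate_dataclass(schema: dict) -> str:
    """Render Python source for a dataclass described by `schema`."""
    lines = [
        GENERATED_HEADER,
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {schema['name']}:",
    ]
    # Sort the fields so the same schema always yields byte-identical text.
    for field, type_name in sorted(schema["fields"].items()):
        lines.append(f"    {field}: {type_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_dataclass(SCHEMA))
```

Because the output depends only on the schema, a reviewer can regenerate it and diff the result instead of reading it line by line.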

dheera · 2mo ago

https://github.com/mame/quine-relay

chrislo · 2mo ago

For example `rails generate ...` built into the Rails CLI.

TheDong · 2mo ago

See, for example, this blog post from 2014: https://go.dev/blog/generate

The following comment in the blog post

    //go:generate stringer -type=Pill

generates a .._string.go file which contains a '.String()' method.

I would find it very reasonable to commit that with 'Co-Authored-By: stringer v0.1.0' or such.

Or 'sed s/a/b/g' and 'Co-Authored-By: sed'
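
For anyone who wants to check such attributions mechanically, here is a rough Python sketch that pulls `Co-Authored-By` trailers out of a commit message. It is a simplification under stated assumptions: it only treats the last paragraph as a trailer block when every line in it looks like `Token: value`, which is stricter than what `git interpret-trailers` accepts. The function names are made up for illustration.

```python
import re

# Matches "Token: value" lines such as "Co-Authored-By: stringer v0.1.0".
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.+)$")


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from the last paragraph of a commit message.

    Simplified: the last paragraph counts as a trailer block only if every
    line in it looks like "Token: value".
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # a subject line alone carries no trailers
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            return []  # ordinary prose, not a trailer block
        trailers.append((match.group(1), match.group(2).strip()))
    return trailers


def generated_by(message: str) -> list[str]:
    """List Co-Authored-By values, matching the key case-insensitively."""
    return [v for k, v in parse_trailers(message) if k.lower() == "co-authored-by"]
```

A review bot could call `generated_by` on each commit in a PR and surface the listed tools to the reviewer.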

baq · 2mo ago

Holy shit I’m old.

zx8080 · 2mo ago · 4 in thread

> If those tools are writing the code then in general I do expect that to be included in the PR!

How about the compiler?

ben-schaaf · 2mo ago

Compilers don't usually write the code that ends up in a PR. But compilers do (and should) generally leave behind some metadata in the end result saying what tools were used, see for example the .comment section in ELF binaries.

rogerrogerr · 2mo ago

Are you checking in compiled artifacts? Then yeah, we should have a chain of where that binary blob came from.

kuschku · 2mo ago

Do you check binaries into your git history? If so, you should mark a commit as generated, and the commit message (plus repository state) should be enough to recreate it 1:1.

Similarly, if I use e.g. jextract or uniffi to generate Java interfaces from C code and check that in, I'll create tooling to automatically run those, and the commit will be attributed to that tooling.

catlifeonmars · 2mo ago

Compiler versions are usually included in the package manifest. Generally you also embed the commit info, compiler version, compilation date, and platform in the binaries that compilers produce.

sumeno · 2mo ago · 3 in thread

> It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself

If this is not the case you should not be sending it to public repos for review at all. It is rude and insulting to expect the people maintaining these repos to review code that nobody bothered to read.

__float · 2mo ago

Sometimes code generation is a useful tool, and maybe people have read and reviewed the generator.

The difference here is that the generator is a non-deterministic LLM and you can't reason about its output the same way.

jasomill · 2mo ago

As a rule, I commit the input to the code generation tool, i.e., what the GPL refers to as "the preferred form of the work for making modifications to it", generate as part of the build process, and, where possible, try to avoid code generation tools designed around the assumption that their output will be maintained rather than regenerated from modified input.

As for LLM code assistants, I don't really view them as traditional code generation tools in the first place, as in practice they more resemble something in between autocomplete and delegating to a junior programmer.

As for attribution, I view it more or less the same way as "dictated but not read" in written correspondence, i.e., a disclaimer for errors in the code, which may be considered rude in some contexts, and a perfectly acceptable and useful annotation in others.

ferngodfather · 2mo ago

"Here's what AI came up with and it mostly worked the one time I tested it. Might need improving".

No. I don't want to test and pick through your shitty LLM generated code. If I wanted the entire code base to be junk, it'd say so in the readme.

tsimionescu · 2mo ago

Usually, pre-LLM generated code is flagged because people aren't expected to modify it by hand. If you find a bug and track it to the generated code, you are expected to fix the sources and re-generate.

This is not at all the case with LLM-generated code - mostly because you can't regenerate it even if you wanted to, as it's not deterministic.

That said, I do agree that LLM code is different enough from human code (even just in regards to potential copyright worries) that it should be mentioned that LLMs were used to create it.
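
As one example of how pre-LLM generated code gets flagged, Go tooling recognizes a comment line matching `^// Code generated .* DO NOT EDIT\.$` placed before any code in the file. The Python sketch below checks for that marker. It is a simplification: it only skips blank lines and whole-line `//` comments, does not handle block comments, and the function name is made up for illustration.

```python
import re

# The marker line documented for Go generated files.
GENERATED_RE = re.compile(r"^// Code generated .* DO NOT EDIT\.$")


def is_generated_go_source(text: str) -> bool:
    """Report whether Go source text carries the generated-code marker.

    Simplified: only blank lines and whole-line `//` comments may precede
    the marker; block comments are not handled.
    """
    for line in text.splitlines():
        if GENERATED_RE.match(line):
            return True
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            return False  # reached code before finding the marker
    return False
```

A review tool could use a check like this to collapse flagged files by default and point reviewers at the generator input instead.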

xarope · 2mo ago

Absolutely. Let's say I have a problem with gRPC and traced it to code generated using the gRPC compiler. I can reproduce it, highlight it and I'm pretty sure the gRPC team would address the issue.

Replace the gRPC compiler with an LLM. Can you reproduce? (probably not 100%). Can anybody fix it short of throwing more English phrases like "DO NOT", "NEVER", "Under No Circumstances"?

Probably not.

duskdozer · 2mo ago

> It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself

I thought the argument was that AI-users were reviewing and understanding all of the code?
