lukev 11mo ago

This brings up a tangential question for me.

Clearly, companies view the context fed to these tools as valuable. And it certainly has value in the abstract, as information about how they're being used or could be improved.

But is it really useful as training data? Sure, some new codebases might be fed in... but after that, the way context works and the way people are "vibe coding", 95% of the novelty being input is just the output of previous LLMs.

While the utility of synthetic data proves that model collapse is not inevitable, it does seem to be a real concern... and I can say definitively based on my own experience that the _median_ quality of LLM-generated code is much worse than the _median_ quality of human-generated code. That gap is especially wide since the context would include all the code that was rejected during the development process.

Without substantial post-processing to filter out the bad input code, I question how valuable the context from coding agents is for training data. Again, it's probably quite useful for other things.

4 comments · 4 top-level

recursivecaveat 11mo ago

The human/computer interaction is probably more valuable than any code they could slurp up. It's basically CCTV of people using your product and live-correcting it, in a format you can feed back into the thing to tell it to improve. Maybe one day they will even learn to stop disabling tests to get them to pass.

nicewood 11mo ago

I think it's less about the code output and more about the process of humans iterating on and adjusting the LLM-drafted requirements and design. Claude Code et al. are good enough; the bottleneck by now is IMO usually the context and prompt. So further improving that by optimizing for and collecting data about the human interaction seems like a good strategy to me.

Essentially, the user labels (accept/edit) data (design documents) for the agent (Amazon).

consumer451 11mo ago

There is a company, maybe even a YC company, which I saw posting about wanting to pay people for private repos that died on the vine and were never released as products. I believe they were asking for pre-2022 code to avoid LLM taint. This was to be used as training data.

This is all a fuzzy memory, I could have multiple details wrong.

janstice 11mo ago

I suspect the product telemetry would be more useful: things like success of interaction vs. requiring subsequent editing, success from tool use, and success from context & prompt tuning parameters would be more valuable to the product than just feeding more bits into the core model.
