0 points · ctoth · 2mo ago

[flagged]

5 comments · 5 top-level

bcherny · 2mo ago

Christopher, would you be able to share the transcripts for that repo by running /bug? That would make the reports actionable for me to dig in and debug.

quietsegfault · 2mo ago

I’m not sure being confrontational like this really helps your case. There are real people responding, and even if you’re frustrated it doesn’t pay off to take that frustration out on the people willing to help.

lambda · 2mo ago

I guess one of the things I don't understand: how do you expect a stochastic model, sold as a proprietary SaaS, with a proprietary (though briefly leaked) client, to be predictable in its behavior?

It seems like people are expecting LLM-based coding to work in a predictable and controllable way. And, well, no, that's not how it works, and especially so when you're using a proprietary SaaS model where you can't control the exact model used, the inference setup it's running on, the harness, the system prompts, etc. It's all just vibes; you're vibe coding and expecting consistency.

Now, if you were running a local-weights model on your own inference setup, with an open-source harness, you'd at least have some more control of the setup. Of course, it's still a stochastic model, trained on who knows what data scraped from the internet and generated from previous versions of the model; there will always be some non-determinism. But if you're running it yourself, you at least have some control and can potentially bisect configuration changes to find what caused particular behavior regressions.

malfist · 2mo ago

It also completely ignores the increase in behavioral tracking metrics. A 68% increase in swearing at the LLM for doing something wrong needs to be addressed and isn't just "you're holding it wrong."

dang · 2mo ago

Please don't post this aggressively to Hacker News. You can make your substantive points without that.

https://news.ycombinator.com/newsguidelines.html
