
KronisLV 23d ago

> It was easy to get comfortable with using the best model at the highest setting for everything for a while, but as the models continue to scale and reasoning token budgets grow, that's no longer a safe default unless you have unlimited budgets.

For a while I used Cerebras Code for 50 USD a month, with them running a GLM model and giving you millions of tokens per day. It did a lot of heavy lifting in a software migration I was doing at the time (and made it DOABLE in the first place), BUT there were about 10 different places where the migration got fucked up and had to be fixed manually: files left over after refactoring (what's worse, basically duplicated ones), some constants and routes that were dead code, some development pages that weren't removed when they were superseded by others, and so on.

I would say that Claude Code, throwing Opus at most problems (and using Sonnet or Haiku sub-agents for simple and well-specified tasks), is actually way better, simply because it fucks things up less often and review iterations at least catch when things are going wrong like that. Worse models (pretty much every one that I can afford to run locally, even ones that need around 80 GB of VRAM in the context of an org wanting to self-host stuff) will be confidently wrong and place time bombs in your codebase that you won't even be aware of if you don't pay enough attention to everything, even when the task was rote bullshit that any model worth its salt should have resolved with 0 issues.

My fear is that models that would let me truly be as productive as I want with any degree of confidence might be Mythos tier and the economics of that just wouldn't work out.


mistercheese 23d ago

I have this exact same fear as an IC.

I wonder if Engineering Managers have this same fear, or if they're used to having to distribute complex tasks to senior engineers and gamble by giving seemingly less risky tasks to juniors who may leave ticking time bombs in the code. Is that just the nature of code, whether written by agents or humans?

wallst07 23d ago

Yes, that definitely happens as an EM. You want your Senior/Staff engineers to architect out the new high-risk functionality into a doc for review. Then that Staff engineer either implements or has a junior/senior under their wing helping implement some of the scaffolding.

In this [common] paradigm the Staff Engineer acts as an architect/programmer and project manager in one. The EM should be there to guide and unblock.

jon-wood 23d ago

Yes, that is absolutely a dynamic in managing an engineering team, and I'd argue that knowing the right person to give a particular task to, and how much detail they're going to need to get it done, is what separates good engineering managers from bad ones.

gardnr 23d ago

The GLM-4.7 model isn't that great. I was on their $200/month plan for a while. It was really hard to keep up with how fast it works. Going back to Claude, it seems like everything takes forever. GLM got much better in 5.1, but Cerebras still doesn't offer that yet (it's a bit heavier). I have a year of Z.ai that I got as a bargain and I use GLM-5.1 for some open source stuff, but I am a bit nervous about sending data into their API.

KronisLV (OP) 23d ago

The new one is quite a bit heavier!

GLM 4.7 is 358B parameters: https://huggingface.co/zai-org/GLM-4.7

GLM 5.1 is 754B parameters: https://huggingface.co/zai-org/GLM-5.1

That said, 5.1 is indeed a bunch better and I could definitely see myself using it for some tasks! Sadly all of the stuff I can actually run locally is still trash (I appreciate the effort behind Qwen 3.6, Gemma 4 and Mistral Small 4 though, alongside others).

Aurornis 23d ago

Good points. I was speaking from a position of using an LLM in a pair programming style where I'm interactive with each request.

For handing work off to an LLM in large chunks, picking the best model available is the only way to go right now.
