Is it also useful with the smaller (and cheaper) cloud models?
Basically I run a flow:
Brainstorming > Create Spec > Review Spec* > Create Plans > Review Plan* > Execute Plan (in subagents) > Review Against Plan > Code Review* > Open PR > Finish Plan (marks plan files done)
* Each review step marked with an asterisk uses a larger paid LLM, right now DeepSeek V4 Pro. Having it do this catches a lot of small things, and now I'm effectively one-shotting any task I give it.
And it's not costing me much at all, just those three reviews. I could use a free model like Gemini but I'm happy with what I've got.
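For anyone who wants to picture the routing, here's a rough sketch of that flow in Python. The stage names come straight from the list above; the `Stage` class, `pick_model`, and the two model names are made-up placeholders for illustration, not whatever tooling is actually running this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    # True for the review steps marked with an asterisk in the flow above.
    uses_large_model: bool = False


# Stage order as described in the post.
PIPELINE = [
    Stage("Brainstorming"),
    Stage("Create Spec"),
    Stage("Review Spec", uses_large_model=True),
    Stage("Create Plans"),
    Stage("Review Plan", uses_large_model=True),
    Stage("Execute Plan (in subagents)"),
    Stage("Review Against Plan"),
    Stage("Code Review", uses_large_model=True),
    Stage("Open PR"),
    Stage("Finish Plan"),
]


def pick_model(stage, default_model="small-model", review_model="large-review-model"):
    """Route a stage to a model. Both model names are hypothetical placeholders."""
    return review_model if stage.uses_large_model else default_model


def paid_stages(pipeline):
    """Names of the stages that would hit the larger paid model."""
    return [stage.name for stage in pipeline if stage.uses_large_model]


if __name__ == "__main__":
    for stage in PIPELINE:
        print(f"{stage.name:30} -> {pick_model(stage)}")
```

The point of keeping the flag on the stage itself is that the expensive model only gets called where a second opinion pays off, which is why the cost stays at three reviews per task.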
All the inference happens on that card, so the CPU/RAM is there for the other containers.
I'll eventually swap the motherboard and CPU for something better, so I can fit 1 or 3 more of those cards.
Why not NVIDIA? 32 GB on team green means spending crazy money. And I can get 4 R9700s for the cost of one 32 GB 5090.
128 GB ... vs. 32 GB.
What's most interesting and surprising is watching all the latecomers rediscover optimizations from years ago. Some people really do need to do things the hard way ig.
Just because you clocked this specific detail doesn't mean it's some guiding principle built into the bedrock; there is no bedrock at the moment, because it's a non-deterministic system that's being sold as something grander than a text processing machine.
It doesn't help that the computer scientists building it don't recognize they're essentially doing a bunch of cultural and sociological science rather than some rigorous mathematical artifact.
Then there are the billionaires who want to corner the market and have you believe they can eradicate the "low capital workers".
Anyway, there's zero real integration of how these models work.