Local LLMs perform better when you teach them to ask before they answer (opens in new tab)

(xda-developers.com)

33 pointsfroh1mo ago12 comments

12 comments

12 comments · 7 top-level

frohOP1mo ago· 4 in thread

I'm positively surprised such a little guidance makes such a difference.

is it also useful with the smaller (and cheaper) cloud models?

intothemild1mo ago

Yes. I run local models, Qwen3.6-27B and IMHO the massive level up was the agents and skills files that I've worked on.

Basically I run a flow

Brainstorming > Create Spec > Review Spec* > Create Plans > Review Plan* > Execute Plan (in subagents) > Review Against Plan > Code Review* > Open PR > Finish Plan (marks plan files done)

* Each review step marked with an asterisk uses a paid larger LLM, right now Deepseek V4 Pro. Having it do this catches a lot of small things, and now I'm effectively one shotting any task I give it.

And it's not costing me much at all, just those three reviews. I could use a free model like Gemini but I'm happy with what I've got.

lelele1mo ago

Would you mind sharing your HW configuration? Thank you.

intothemild1mo ago

Sure. It's just an old I7 8700 (non-k), 64gb ram. Running proxmox. But recently I put an AMD R9700 AI Pro, in there which is a 32gb inference focused card, think of it as a 32gb version of a 9070xt.

All the inference happens on that card, so the CPU/RAM is there for the other containers.

I'll eventually swap the motherboard and CPU for something better, so I can fit 1 or 3 more of those cards.

Why not NVIDIA? 32gb on team green means spending crazy money. And I can get 4 R9700s for the cost of one 32gb 5090.

128gb ... Vs 32gb.

Akamant1mo ago

Right on target

halJordan1mo ago· 1 in thread

This is not new knowledge at all. In fact it was discovered before, and is the direct precursor of, Chain of Thought/Thinking models which are now the norm.

What's most interesting and surprising is watching all latecomers rediscover optimizations from years ago. Some people really do need to do things the hard way ig.

cyanydeez1mo ago

Can't really blame anyone who started paying attention: the ability of these models to just generate volumes of text means any honest broker has to wade into a limitless pool of useless information, just to find a workable idea.

Just because you clocked this specific detail doesn't mean it's some guiding principal built into the bedrock; there is no bedrock at the moment, because it's a non-determinant system whose being sold as something grandeur than a text processing machine.

It doesn't help that the computer scientists building it don't recognize they're essentially doing a bunch of cultural and socialogical science rather than some rigerous mathematical artiface.

Then there's the billionaires who want to corner the market and have you believe they can eradicate the "low capital workers".

Anyway, there's zero real integration of how these models work.

thinkingemote1mo ago

From the article: "When tasked with coding, writing, editing, or summarizing, ask the user up to three targeted clarifying questions. Proceed with the task once you've received answers and understand the prompt fully. If the task is a simple factual question or conversational message, respond directly."

shlewis1mo ago

This is true even with the SOTA models. Making LLMs ask questions and giving answers is always a good idea. Almost every prompt I write ends with something like this: Unless undoubtedly clear, every decision and action must come from mutual agreement.

riknos3141mo ago

I started using similar approaches in the sonnet 3.5 era and found them incredibly useful at the time. The frontier lab models have gotten significantly better about their guesses over time, but I still sometimes turn to the technique if my own ideation is only about 80% of the way there, as the LLM's questioning can help me identify the blind spots that need more consideration.

tana_shahh1mo ago

Absolutely True not only for Local LLMs but for cloud ones too. Clarifying the intention, the type of output we want improves the model's response multiple folds.

kh_hk1mo ago

Isn't this akin to including all the (missing) keywords from the prompt? YMMV but to me we have found the less optimized way of using LLMs

j / k navigate · click thread line to collapse

12 comments

12 comments · 7 top-level

frohOP1mo ago· 4 in thread

I'm positively surprised such a little guidance makes such a difference.

is it also useful with the smaller (and cheaper) cloud models?

intothemild1mo ago

Yes. I run local models, Qwen3.6-27B and IMHO the massive level up was the agents and skills files that I've worked on.

Basically I run a flow

Brainstorming > Create Spec > Review Spec* > Create Plans > Review Plan* > Execute Plan (in subagents) > Review Against Plan > Code Review* > Open PR > Finish Plan (marks plan files done)

* Each review step marked with an asterisk uses a paid larger LLM, right now Deepseek V4 Pro. Having it do this catches a lot of small things, and now I'm effectively one shotting any task I give it.

And it's not costing me much at all, just those three reviews. I could use a free model like Gemini but I'm happy with what I've got.

lelele1mo ago

Would you mind sharing your HW configuration? Thank you.

intothemild1mo ago

Sure. It's just an old I7 8700 (non-k), 64gb ram. Running proxmox. But recently I put an AMD R9700 AI Pro, in there which is a 32gb inference focused card, think of it as a 32gb version of a 9070xt.

All the inference happens on that card, so the CPU/RAM is there for the other containers.

I'll eventually swap the motherboard and CPU for something better, so I can fit 1 or 3 more of those cards.

Why not NVIDIA? 32gb on team green means spending crazy money. And I can get 4 R9700s for the cost of one 32gb 5090.

128gb ... Vs 32gb.

Akamant1mo ago

Right on target

halJordan1mo ago· 1 in thread

This is not new knowledge at all. In fact it was discovered before, and is the direct precursor of, Chain of Thought/Thinking models which are now the norm.

What's most interesting and surprising is watching all latecomers rediscover optimizations from years ago. Some people really do need to do things the hard way ig.

cyanydeez1mo ago

It doesn't help that the computer scientists building it don't recognize they're essentially doing a bunch of cultural and socialogical science rather than some rigerous mathematical artiface.

Then there's the billionaires who want to corner the market and have you believe they can eradicate the "low capital workers".

Anyway, there's zero real integration of how these models work.

thinkingemote1mo ago

shlewis1mo ago

riknos3141mo ago

tana_shahh1mo ago

Absolutely True not only for Local LLMs but for cloud ones too. Clarifying the intention, the type of output we want improves the model's response multiple folds.

kh_hk1mo ago

Isn't this akin to including all the (missing) keywords from the prompt? YMMV but to me we have found the less optimized way of using LLMs

j / k navigate · click thread line to collapse