Against vibes: When is a generative model useful (opens in new tab)

(williamjbowman.com)

107 pointstakira3mo ago28 comments

28 comments

24 comments · 10 top-level

qsera3mo ago· 5 in thread

>The scientific version of these claims is “the total encoding cost (for some class of tasks) is lower than previous models”

I wonder why? Can the new models read mind?

> For example, I was recently trying to install a package whose name I forgot. I prompted the model to “install that x11 fake gui thing”, a trivial prompt.

Yes, they are a better search.

I would also add that there is also a subjective factor. If I enjoy writing code a lot more than reviewing it, I am going to prefer NOT using it for writing and might just use it to review.

So "hardness" is also related to how much you like/dislike doing it.

SOLAR_FIELDS3mo ago

It does feel like with each new frontier model release the major improvement I notice is that the model is, in fact, getting better at reading your mind. And what I mean by that is that it gets better at understanding the nuance and the subtleties of the intent of what you are saying better, and teasing out the actual intent of what you want better. So it gets easier and easier for the model to build a world around less input. So in a significant way, yes, newer models are reading your mind in a way, because they are probabilistically figuring out better how most humans communicate in natural language and filling in the gaps.

Re writing code: most people find the writing of code to be a chore. For those that don’t, I don’t envy them, because that is the part that just got completely destroyed by AI. It’s becoming pretty abundantly clear that if you enjoy hand writing code that it will be a hobby rather than something you can do professionally and succeed over people who aren’t writing by hand

linkregister3mo ago

I think that the skill of hand-writing software is still useful in 2026. The vast majority of programming is a module calling another API. This does not spark joy. Truly interesting classes of problems —application of algorithms or applying complex arcane knowledge— are often not handled well by LLMs. Also, what the author wrote really strikes a chord. We should write the exceptionally difficult sections ourselves so we understand how the software operates.

This reminds me of the observation that Anthropic's unsupervised LLM-generated Rust implementation of sqlite3 was correct for the subset of features they chose, but thousands of times slower (wall clock). Of course, performance will be the next skill to be targeted by expert-led RHLF, but this is a hard problem with many tradeoffs. It may prove to be time-consuming to improve.

andai3mo ago

Yeah they have more "common sense", though not as much as I'd like. I used to think Opus is big, but after using it a lot, I think it should actually be a lot bigger. The difference from Sonnet to Opus is really noticeable, but the difference from Opus to human (in common sense) is also massive. I expect as the hardware improves, we'll see 3-10x bigger models become the default.

Small models are making great strides of course, and perhaps we will soon learn to distill common sense ;) but subtlety and nuance appear physically bound to parameter count...

qsera3mo ago

> teasing out the actual intent of what you want better.

Do you mean they ask clarifying questions before generating a response?

SOLAR_FIELDS3mo ago

Kind of. I mean that they have gotten way better at taking some braindead sentence like “trace the performance of this app” and deriving what you actually mean which involves looking at your codebase, identifying your deployment scenario, identifying the steps required to pull the traces, writing the query to sample the traces, then correlating it all together. Just an example, you say 5 words and it’s able to figure out exactly what you want it to do and it might ask questions to clarify but otherwise it’s really good at figuring out what you actually need.

In the dark before times of 6 months ago, the thing would go completely off the rails and fuck it all up. In today’s world, 80% of the time it’s gonna get you pretty close to what you actually want with literally 5 words for simple tasks.

Complex tasks require more upfront work but my anecdata has demonstrated for me that complex tasks are showing similar relative reductions in upfront planning and effort to succeed

andai3mo ago· 3 in thread

> What is the cost of verifying the generated artifact meets requirements vs. a directly produced artifact? This is mostly a function of the task and the user, but also the generative model.

So this is the fun one for programming.

I let AI agents do some programming on my codebases, but then I had to spend more time catching up with their changes.

So first I was bored waiting for them to finish, and then I was confused and frustrated making sense of the result.

Whereas, when I am asking AI small things like "edit this function so it does this instead", and accepting changes manually, my mental model stays synced the whole time. And I can stay active and in flow.

(Also for such fine grained tasks, small fast cheap models are actually superior because they allow realtime usage. Even small latency makes a big difference.)

lukan3mo ago

Yes, the more you let agents loose, the less you are in control and the more time you spend later cleaning up their mess.

It is tempting letting them loose, after they delivered unexpectedly good results for a while, but for me it is not worth it. Manually approve and actual read. (And manually edit CLAUDE.md etc. if necessary. )

dreadnip3mo ago

This is exactly why I don't like those "swarm" approaches with 8 Claude Code's running in parallel. Every time I've tried it I instantly lose control and become out of touch with the codebase. The quantity of the produced output is simply too fast & large to follow, so I tune out and it becomes a 100% vibe coded project.

dominotw3mo ago

start with good prompts and good intentions , drift into sloppy prompt vibecoding ,finally "still not working" prompt in a loop.

this has been my story in every one of my personal projects.

charcircuit3mo ago· 3 in thread

>I’m upset now when people are making claims that agents are so useful, but can’t tell me when or why or how they’re useful beyond vibes about feeling more productive (vibes that have been refuted by real science contrasting objective measure of productivity vs. subjective reports), or examples of having produced a lot of plausible output.

This position is untenable when from my perspective everyone writes all of their code using agents. I had to double check the year on the post to see if this was actually posted in 2026.

mekael3mo ago

Most people I know don't write any code with agents, 90ish percent is still written by hand. I'm personally still trying to figure out where to fit them into my workflow: there's not much boilerplate to write as well known frameworks and libraries had already taken care of the heavy lifting, templates for major project types cut down on the initial startup overhead, and all of the project planning is done upfront with business partners.

Despite my ethical issues with AI, I am using it for a handful of personal projects so I am at least keeping up with what the frontier models are doing and I'm quite impressed with them for doing reverse engineering (they need a lot of hand holding, but I've been able to knock out months of trial and error pretty quickly).

That being said, I'm still perplexed when people state they're getting huge gains from them in terms of knocking out boilerplate, or helping them plan out the project. I was under the impression that the former was a solved problem, and the latter was a requirement of being a decent engineer.

charcircuit3mo ago

>the former was a solved problem

It want solved. There was some generic boilerplate that was added to IDEs but it wouldn't be project specific. It wasn't able to look at patterns within your own codebase and repeat them.

>and the latter was a requirement of being a decent engineer.

Most software projects are too big to fit in one engineers head. Having AI be able to research what the relevant code is, how it works, what race conditions exist, what pitfalls or other things you may run into saves a lot of time in regards to planning.

1 more reply

swiftcoder3mo ago

> from my perspective everyone writes all of their code using agents

HN/twitter/etc may be something of a bubble in that regard. As far as I can tell out in the real world, most normal software developers are much more likely to be using LLMs as fancy auto-complete than to be using agents.

smilindave263mo ago· 2 in thread

> For almost all software I write, I do care about the process. I’m typically designing software as part of research, and me doing the design and implementation work creates knowledge that I will then share.

Similar here. For a lot of software I write, I don't really know what the essential "abstraction" I need is until I'm actively writing it. The answers, when I get them right, look obvious in retrospect. Sometimes, starting with Claude Code, I can get there, but my mindset is that I'm using this tool to generate software that helps me immerse myself in the problem space. It's a different pace to the process - sometimes it speeds me up, sometimes I end up taking bad concepts a lot further than I normally would before getting to the better path

neonstatic3mo ago

I agree it's a different process. Personally, I do not enjoy it. If I get code wrong or the solution I came up with is clunky, I am okay to start over. At least I learned something valuable. With Claude, I get irritated, frustrated, and frankly just really tired. I feel like I've been burning hour after hour of my precious time trying to explain something to a machine, which just doesn't understand, cannot understand, and what comes out as output from that process is just disappointing. I feel that I don't trust the code it produces and I don't have it in me to even read that code. I never felt that way about code written by me or another person.

I will admit, that Claude has been helpful as an assistant (especially helping me with syntax I am not familiar with), but as a programmer that does things for me, it's been awful. YMMV.

Btw. a week of doing that (treating Claude as a programmer who does things for me) did help me in a way. I now have an intuitive understanding of what it means these things are not intelligence. I am now certain, that an LLM doesn't understand anything. It seems to be able to map text to some representations and then see if these representations match or compose. I know this might sound like intelligence, but in practice it's just not enough. Pattern recognition, sure. Not intelligence. Not even close.

lukan3mo ago

" Pattern recognition, sure. Not intelligence. Not even close."

To me it is a form of intelligence, just not general intelligence.

And yes, the trick is not treat them as intelligent, but like an idiot. Explain every single detail. Document everything in detail. Remove anything distracting. And then it might work like a charm at times.

1 more reply

7777777phil3mo ago· 1 in thread

I particularly like this framework: how hard is it to describe the task vs. how hard is it to check the output.

webn3rd3mo ago

What's more, this can be made conditional on one's linguistic intelligence. Some people can simply convert their thoughts into written language much more effectively than others. They have a natural advantage when it comes to writing prompts that actually... work, whereas others might struggle with the results that their prompts produce. It may therefore be crucial to assess the usefulness of generative models relative to oneself, not to a group of people.

janalsncm3mo ago

LLMs have significantly reduced the time I’ve spent chasing down cryptic errors on stack overflow, old github issues, or asking in random slack channels about it. Even if that’s all they did, they would be very valuable.

If that means I’m actually coding instead of figuring out why xyz random plugin isn’t doing its job right now, some subsystem that I need but don’t care to learn the internals of, then I am happy.

h4kunamata3mo ago

When you know what you are sking it to help you with.

I wanna to build a Proxmox LXC container via Ansible playbook, both things I know and use in my homelab.

It has to be 4 services running within the same container, VPN and what not. That would take me forever to find the latest and recommended:

* Each service installation proccess

* Known issues and workaround

* Firewall and what not

It sill took me 3 nights because I had to replace one of the services. I am not expert into iptables firewall, it helped me with that.

Ansible playbook was a hit and miss but it gave me the start so I fixed what was wrong and voila.

The problem is people using it for copy/paste, it works it is good enough. No understand of what is happening, security issues and alike.

seedium_tech3mo ago

One thing I’d add is that usefulness also seems to depend on whether the task can be broken down. When you can split work into small pieces that are easy to check, generative models tend to work really well. But once those pieces start depending heavily on each other and the design constraints pile up, the pattern you describe shows up pretty clearly.

adampunk3mo ago

This is basically the right approached, framed as critique. Success with these models means engaging in detail with their work, persistently and at all scales. You need attention to detail, ability to evaluate (independent from the model), and mechanisms for enforcing all that. In a word: engineering.

But because people get all bent out of shape I prefer to call it vibe coding anyway.

jgammell3mo ago

> Generative models are probabilistic: the output will be less likely to satisfy complex requirements, particularly

This is a misinformed 'critique' which always gets on my nerves, as someone who actually works with AI. The world is random. Generative models are only random in the sense that they randomly sample from the set of correct answers for a given problem (ideally). Of course LLMs make mistakes, but this has nothing to do with the fact that they are random.

j / k navigate · click thread line to collapse

28 comments

24 comments · 10 top-level

qsera3mo ago· 5 in thread

>The scientific version of these claims is “the total encoding cost (for some class of tasks) is lower than previous models”

I wonder why? Can the new models read mind?

> For example, I was recently trying to install a package whose name I forgot. I prompted the model to “install that x11 fake gui thing”, a trivial prompt.

Yes, they are a better search.

I would also add that there is also a subjective factor. If I enjoy writing code a lot more than reviewing it, I am going to prefer NOT using it for writing and might just use it to review.

So "hardness" is also related to how much you like/dislike doing it.

SOLAR_FIELDS3mo ago

linkregister3mo ago

andai3mo ago

Small models are making great strides of course, and perhaps we will soon learn to distill common sense ;) but subtlety and nuance appear physically bound to parameter count...

qsera3mo ago

> teasing out the actual intent of what you want better.

Do you mean they ask clarifying questions before generating a response?

SOLAR_FIELDS3mo ago

Complex tasks require more upfront work but my anecdata has demonstrated for me that complex tasks are showing similar relative reductions in upfront planning and effort to succeed

andai3mo ago· 3 in thread

> What is the cost of verifying the generated artifact meets requirements vs. a directly produced artifact? This is mostly a function of the task and the user, but also the generative model.

So this is the fun one for programming.

I let AI agents do some programming on my codebases, but then I had to spend more time catching up with their changes.

So first I was bored waiting for them to finish, and then I was confused and frustrated making sense of the result.

(Also for such fine grained tasks, small fast cheap models are actually superior because they allow realtime usage. Even small latency makes a big difference.)

lukan3mo ago

Yes, the more you let agents loose, the less you are in control and the more time you spend later cleaning up their mess.

dreadnip3mo ago

dominotw3mo ago

start with good prompts and good intentions , drift into sloppy prompt vibecoding ,finally "still not working" prompt in a loop.

this has been my story in every one of my personal projects.

charcircuit3mo ago· 3 in thread

This position is untenable when from my perspective everyone writes all of their code using agents. I had to double check the year on the post to see if this was actually posted in 2026.

mekael3mo ago

charcircuit3mo ago

>the former was a solved problem

It want solved. There was some generic boilerplate that was added to IDEs but it wouldn't be project specific. It wasn't able to look at patterns within your own codebase and repeat them.

>and the latter was a requirement of being a decent engineer.

1 more reply

swiftcoder3mo ago

> from my perspective everyone writes all of their code using agents

smilindave263mo ago· 2 in thread

neonstatic3mo ago

I will admit, that Claude has been helpful as an assistant (especially helping me with syntax I am not familiar with), but as a programmer that does things for me, it's been awful. YMMV.

lukan3mo ago

" Pattern recognition, sure. Not intelligence. Not even close."

To me it is a form of intelligence, just not general intelligence.

1 more reply

7777777phil3mo ago· 1 in thread

I particularly like this framework: how hard is it to describe the task vs. how hard is it to check the output.

webn3rd3mo ago

janalsncm3mo ago

h4kunamata3mo ago

When you know what you are sking it to help you with.

I wanna to build a Proxmox LXC container via Ansible playbook, both things I know and use in my homelab.

It has to be 4 services running within the same container, VPN and what not. That would take me forever to find the latest and recommended:

* Each service installation proccess

* Known issues and workaround

* Firewall and what not

It sill took me 3 nights because I had to replace one of the services. I am not expert into iptables firewall, it helped me with that.

Ansible playbook was a hit and miss but it gave me the start so I fixed what was wrong and voila.

The problem is people using it for copy/paste, it works it is good enough. No understand of what is happening, security issues and alike.

seedium_tech3mo ago

adampunk3mo ago

But because people get all bent out of shape I prefer to call it vibe coding anyway.

jgammell3mo ago

> Generative models are probabilistic: the output will be less likely to satisfy complex requirements, particularly

j / k navigate · click thread line to collapse