Fundamentally, LLMs do not construct a consistent mental model of the codebase (this becomes obvious if you actually read LLM-generated code), and this is Bad for a lot of reasons. It's bad for long-term maintainability, it's bad for modelling the code and its behaviour as a system accurately, it's bad for testing and verifying it, and so on. Pretty much all of the tasks around program design require you to have that mental model.
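To make the "no consistent mental model" point concrete, here's a deliberately hypothetical sketch (all names invented) of the kind of thing you run into: two functions that could plausibly come out of the same session, each assuming a different shape for the "same" user record.

```python
def format_user(user):
    # Here the user is a dict with "name" and "email" keys...
    return f"{user['name']} <{user['email']}>"

def greet_user(user):
    # ...and here it's an object with .name and .email attributes.
    # Whichever representation the caller actually has, one of these breaks.
    return f"Hi {user.name}, we emailed {user.email}"

user = {"name": "Ada", "email": "ada@example.com"}
print(format_user(user))  # works
print(greet_user(user))   # AttributeError: 'dict' object has no attribute 'name'
```

A programmer holding one model of the data in their head doesn't produce this pair; a generator that only has to make each function locally plausible does.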
You can absolutely get an LLM to show you a mental model of the code, but nothing can guarantee that that's the model it's actually using. The evidence is in how they summarise documents, how inaccurate a lot of the documentation they generate is, and how inaccurate a lot of their code summaries are. Those would be accurate if the LLM were forming a mental model while it worked. It's a program that statistically generates plausible text. The fact that we got the program to do more than that in the first place is very interesting and can imply a lot of things, but at the end of the day, whatever you ask of it, it will generate text. There is no guarantee around the accuracy of that text, and there effectively never can be.
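To ground the "statistically generate plausible text" claim: at its core, generation is a loop that repeatedly samples the next token from a probability distribution conditioned on what came before. Here's a toy sketch with an entirely made-up lookup table standing in for the model; a real LLM computes the distribution with a neural network over the whole context, but the loop has the same shape.

```python
import random

# Toy stand-in for a language model: maps the previous token to a
# probability distribution over possible next tokens.
TOY_MODEL = {
    "the":    {"code": 0.5, "model": 0.3, "test": 0.2},
    "code":   {"is": 0.6, "passes": 0.4},
    "model":  {"is": 1.0},
    "test":   {"passes": 1.0},
    "is":     {"correct": 0.5, "plausible": 0.5},
    "passes": {"locally": 1.0},
}

def generate(token, steps=4):
    out = [token]
    for _ in range(steps):
        dist = TOY_MODEL.get(token)
        if dist is None:
            break
        # Sample the next token in proportion to its probability.
        # Nothing here checks the output against reality; it is
        # plausible by construction and nothing more.
        tokens, weights = zip(*dist.items())
        token = random.choices(tokens, weights=weights)[0]
        out.append(token)
    return " ".join(out)

print(generate("the"))  # e.g. "the code is plausible"
```

Note what's absent from the loop: any step that verifies the output. "The code is correct" and "the code is plausible" fall out of the same sampler with the same confidence.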