Agreed, and I also agree that most developers come to this realization with time and experience. When you have a clear understanding of the business rationale, scope, inputs, and desired outputs, the data models, system design, and code fall out almost naturally, or at least become much more obvious.
— Melvin E. Conway 1967
- I completely agree with you that fundamentally the limitation is the business being able to coherently articulate itself and its strategy
- BUT the benefit now is you can basically prototype for free. Before, we had to be extremely careful with engineering headcount investment. Now we can try many more things under the same time constraints.
LLMs don't solve any of those problems by themselves.
This is the part that's no longer true. Everything else you said is spot on. But none of what you mentioned is inherently untouchable by AI: making sound decisions in the vast context of the business and an enormously complex codebase, dealing with ambiguity and customer implications.
Is this some sort of troll attempt? Like, are you fundamentally misunderstanding the problem with tech debt? This is the equivalent of throwing garbage on the floor and expecting professional cleaners to keep your house clean.
You can produce tech debt faster than you can pay it back; that's the core aspect of tech debt. If tech debt were more expensive in the short term than not taking it on, nobody would be doing it.
A labor-saving device doesn't reduce or deal with tech debt, since tech debt is a decision made independently of the competence of the developers. If you have a company with a tech-debt culture, the labor-saving device will just let you accumulate more tech debt until you reach the same level of burden per person.
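A toy model, purely to illustrate that claim (all numbers invented): if a tool multiplies output per person by some factor, but the culture converts a fixed fraction of output into debt and spends a smaller fixed share paying it down, the debt burden per unit of capacity comes out the same regardless of the speedup.

```python
# Toy model: debt burden per person is invariant to a labor-saving speedup
# when both debt creation and debt payback scale with output. Numbers are made up.
def burden_per_capacity(speedup, debt_fraction=0.3, payback_share=0.2, steps=50):
    debt = 0.0
    for _ in range(steps):
        debt += debt_fraction * speedup   # new debt from this period's output
        debt -= payback_share * speedup   # capacity spent paying debt down
        debt = max(debt, 0.0)
    return debt / speedup                 # burden normalized by team capacity

print(burden_per_capacity(speedup=1))    # baseline team
print(burden_per_capacity(speedup=10))   # same culture, 10x tooling: same burden
```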
> First consider if you understand what scaling laws are, like Chinchilla, and how RL with verification works fundamentally
Honestly, this tells me that you basically understand nothing, not even Chinchilla scaling laws and how RL works. Not only are you trying to brute-force the problem, you're listing factors completely irrelevant to the problem at hand.
Chinchilla scaling laws are "ancient" by LLM standards. Everyone designing a model architecture that is supposed to beat the competition is pulling out every trick in the book, then coming up with their own on top of that, and Chinchilla scaling laws have been done to death in that regard.
Reinforcement learning is also a pretty bad example here, because there is no obvious way to encode a reward function for something as ill-defined as tech debt. You didn't even say "avoid tech debt", which would be actionable to some extent, just "systemic tech debt is now addressable at scale with LLMs". I.e., you're implying that if LLMs were to generate tech debt, you could just keep scaling and produce more of it, solving the problem once and for all, Futurama style, with ever bigger ice cubes.
Both of these lectures misunderstand my point and how things work.
- “tech debt” is not some special problem…? You accumulate cruft and bad design decisions… you spend tokens to fix this. Is your point that there is always a fundamental tension between spending tokens on new stuff and spending tokens on cleaning stuff?
> Honestly, this tells me that you basically understand nothing, not even chinchilla scaling laws and how RL works. Not only are you trying to brute force the problem, you're listing completely irrelevant factors to the problem at hand.
That's a very interesting take, because I would say the same thing! RL and scaling laws are not relevant to the performance and capabilities of coding agents? That's something you don't hear every day.
- Chinchilla-like scaling laws are not ancient… people try to derive scaling laws for new paradigms all the time; it is how researchers get their company/lab to invest in scaling up a new idea. No idea what you mean here. Maybe you think I meant "the literal constants from the Chinchilla paper"? No, I mean scaling laws generally, and Chinchilla, due to the impact of that work, is used as shorthand for them. Regardless, scaling laws generally continue to hold, and in fact improve with better architectures, data mixes, and training recipes.
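For concreteness, "deriving a scaling law" usually means fitting a parametric form like the Chinchilla one, L(N, D) = E + A/N^alpha + B/D^beta (N = parameters, D = tokens), to a sweep of small training runs and extrapolating. A minimal sketch, with synthetic data standing in for real runs (the `true` constants are the published Chinchilla fit; everything else here is invented):

```python
import numpy as np
from scipy.optimize import curve_fit

def chinchilla_loss(X, E, A, B, alpha, beta):
    N, D = X
    return E + A / N**alpha + B / D**beta

# Synthetic "training runs": a small grid over model sizes and token counts.
rng = np.random.default_rng(0)
N, D = np.meshgrid(np.logspace(8, 10, 5), np.logspace(9, 11, 5))
N, D = N.ravel(), D.ravel()
true = (1.69, 406.4, 410.7, 0.34, 0.28)  # constants from the Chinchilla paper's fit
L = chinchilla_loss((N, D), *true) + rng.normal(0, 0.01, N.size)

fit, _ = curve_fit(chinchilla_loss, (N, D), L, p0=(2, 300, 300, 0.3, 0.3), maxfev=50000)
print(dict(zip(["E", "A", "B", "alpha", "beta"], fit)))  # now extrapolate to bigger N, D
```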
> Reinforcement Learning is also a pretty bad example here, because there is no obvious way to encode a reward function to deal with something as ill defined as tech debt.
Well, that's a bit of a strong claim to make… I don't agree with it at face value, but even if I did, you don't need to explicitly do RL on tech debt as a specific task… you do RL to build better programming skills generally, which then generalize to many coding tasks.
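To make "RL with verification" concrete: the reward function doesn't have to encode anything about tech debt, it just has to check outcomes. A minimal sketch (the file names and the pytest-based check are my own illustration, not any lab's actual setup):

```python
# Verification-based reward for coding RL: reward 1.0 if the model's candidate
# code passes the task's tests in a scratch directory, else 0.0.
import pathlib
import subprocess
import tempfile

def verified_reward(candidate_code: str, test_code: str) -> float:
    with tempfile.TemporaryDirectory() as tmp:
        d = pathlib.Path(tmp)
        (d / "solution.py").write_text(candidate_code)
        (d / "test_solution.py").write_text(test_code)
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", "test_solution.py"],
            cwd=d, capture_output=True, timeout=60,
        )
        return 1.0 if result.returncode == 0 else 0.0
```

Whatever skills make that reward go up (reading code carefully, writing testable changes, not breaking adjacent behavior) are exactly the ones that transfer to other coding tasks.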
> You didn't even say avoid tech debt which would be actionable to some extent, just "systemic tech debt is now addressable at scale with LLMs".
Tech debt is strategic, why avoid it?
> you're implying that if LLMs were to generate tech debt, you can just keep scaling and produce more of it, solving the problem once and for all Futurama style with ever bigger ice cubes.
I'm saying you can take successively larger and more complex codebases with thorny debt problems and resolve them by spending money on tokens.
You keep scaling and, just like we do today, decide when some tech debt austerity needs to take place. I'm saying "the guy that built our house of cards over 10 years and left" is no longer so devastating and expensive a problem as it was before.
But… so can your competitors. And that changes the value proposition.
Is there any reason to believe this? I've only seen evidence to the contrary so far.
My experience with AI coding aids is that they, generally:
1. Don't have an opinion.
2. Are trained on code written using practices that increase technical debt.
3. Are lacking in the greater-perspective department, more focused on the concrete, superficial, and immediate.
I think I need to elaborate on the first point and explain how it's relevant to the question. I'll start with an example. We have an AI reviewer, and we recently migrated a bunch of the company's repositories from Bitbucket to GitLab. This also prompted a bunch of CI changes. Some Python projects I'm involved with, but don't have much authority over, switched to complicated builds that involve pyproject.toml (often including dynamic generation of this cursed file) as well as integration with a bunch of novelty (but poor-quality) Python infrastructure tools used for building Python distributable artifacts.
In the projects where I do have authority, I removed most of the third-party integration. None of them use pyproject.toml or setup.cfg or any similar configuration for a third-party build tool. The project code contains bespoke code to build the artifacts.
These two approaches are clearly at odds. A living, breathing person would believe one or the other to be the right approach. The AI reviewer had no problem with this situation. It made some pedantic comments about style and about some fantasy, impossible error cases, but completely ignored the fact that, moving forward, these two approaches are bound to collide. While it appears to have an opinion about the style of quotation marks, it doesn't care at all about strategic decisions.
My guess as to why this is the case is that such situations are genuinely rarely addressed in code review. Most productive PRs, from which an AI could learn, are designed around small, well-defined features in a pre-agreed-upon context. The context itself is never discussed in PRs because it's impractical (it would usually require too large a change, so the developers don't even bring up the issue).
And this is where the real, large, glacier-style deposits of tech debt live: in the issues developers are afraid to mention because they understand they will never be given the authority and resources to deal with them.
One big misconception is that these models are trained to mimic humans and are therefore limited by the quality of the human training data. That is not true, and the fact that it's not true is basically the entire reason you see so much bullishness and premature adoption of agentic coding tools.
Coding agents use human traces as a starting point. Technically you don't have to do this at all, but that's an academic point; practically (today) you can't skip it. The early training stages on human traces (and on verified synthetic traces from your last model) get you to a point where RL is stable and efficient and can push you the rest of the way. It's synthetic data that really powers this, via rejection sampling: you generate a bunch of traces, figure out which ones pass verification, and keep those as training examples.
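A minimal sketch of that rejection-sampling loop (`model.sample` and `verify` are hypothetical stand-ins, not any particular lab's API):

```python
# Rejection sampling for synthetic training data: sample many candidate traces
# per task, keep only the ones an automatic verifier accepts, and reuse the
# survivors as supervised training examples for the next round.
def rejection_sample(model, tasks, verify, samples_per_task=16):
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            trace = model.sample(task.prompt)   # candidate solution trace
            if verify(task, trace):             # e.g. unit tests, type checks
                kept.append({"prompt": task.prompt, "completion": trace})
    return kept                                 # next round's training set
```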
So because
- we know how this works on a fundamental level and have for some time
- human training data is a bootstrap; it's not a fundamental limitation
- you are absolutely right about your observations, yet look at where we are today versus, say, Claude Sonnet 3.x. It's an entire world away in about a year
- we have imperfect benchmarks, all with various weaknesses, yet all of them telling the same compelling story. Plus you have adoption numbers and walled-garden data, which are the proof in the pudding
The onus is on the people who say "this is plateauing" or "this has some fundamental limitation that we will not get past fairly quickly" to make that case.
Which is why engineers have historically wanted to be left alone to code. Better to be left alone than dealing with insane bureaucracy. But even better than that is working with good bureaucracy. It's just that once you know the bureaucracy is insane, there's not really anything you can personally do about it, so you check out and try to hold onto a semblance of sanity in the one realm you have control over, which is the code.
Small companies/startups don't have insane bureaucracy, and they're hiring.
I wish the reality was more pleasant. It's not.
I don't think this comment is fair or well grounded. There are plenty of process bottlenecks created by developers. Unfortunately, I have a hefty share of war stories where a tech lead's inability to draft a coherent, clear design resulted in project delays and systems riddled with the accidental complexity required to patch the solution into working.
Developers are a part of the process, and they are participants in both the good parts and the bad parts. If business requirements are not clear, it's the developer's job to work with product owners to arrive at that clarity.
This is also an organizational problem (bad hiring/people management). If you put an incompetent individual at the helm of a project, then resources (especially time) will be spent horrendously and you will have more problems down the line. That's true for all types of organizations and projects.
In fact, it makes it harder.
I think the solution to using AI in coding is more testing, which unlocks even more AI.
> AI craze isn't going to produce the boon some people think it will.
What’s the boon you don’t think it will produce?
The way AI is set up today, it's trying to replicate (hopefully) good existing practices, possibly faster. The real change comes from inventing better practices, something AI isn't capable of, at least not the kind of AI being sold to programmers today.