My problem with the concept of technical debt is that "code quality" is subjective, and more often than not it translates to "I don't like the structure of this code because I didn't write it". What matters to the business is that the code works. The owners of the business do not care if the developers enjoy working with the code or if they find it to be well written.
I hear all the time that bad code slows down development, increases bugs, etc. And that can be true in a number of cases. But most of the time it's just complaining. If a developer takes the time to re-factor the code base, it's very likely that whoever takes over from them in a few years will have the same complaints about technical debt and want to re-factor all over again.
The best way to ensure everyone's job and increase compensation is to deliver features that customers really want. Doesn't matter what the code looks like, as long as the customers are happy. Re-factoring is important to developers and to maintaining an organization. But it can wait until there aren't pressing feature demands.
Are you going to tell me that this kind of tech debt doesn't affect my time to fix the bug? I could have done it once, now I have to hunt for 5+ instances of it. Am I just complaining or did doing the original development in a lazy way directly impact my time to complete the task?
I could have recorded the exact amount between the original bug fix and finally fixing the last instance of the bug to quantify how much time lazy coding cost me. It may have taken the same amount of time to write the original pages with an extendable component that only has one copy and takes arguments to create all the variations we need. But in the end we'd have fewer lines of code to maintain in the future and that's also an objective metric.
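As a rough sketch of the "one extendable component that takes arguments" idea above (Python purely for illustration; `CardOptions`, `render_card`, and `format_price` are made-up names, not anything from the original pages), the point is that a formatting bug then has exactly one place to be fixed:

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class CardOptions:
    """Hypothetical options that distinguish one page variation from another."""
    title: str
    body: str
    show_price: bool = False
    price_cents: int = 0


def format_price(price_cents: int) -> str:
    # The single place where a price-formatting bug would need fixing.
    return f"${price_cents // 100}.{price_cents % 100:02d}"


def render_card(opts: CardOptions) -> str:
    """One shared component; each page passes options instead of copying markup."""
    parts = [f"<h2>{escape(opts.title)}</h2>", f"<p>{escape(opts.body)}</p>"]
    if opts.show_price:
        parts.append(f'<span class="price">{format_price(opts.price_cents)}</span>')
    return '<div class="card">' + "".join(parts) + "</div>"
```

Each of the 5+ pages would call `render_card` with its own `CardOptions`, so the variations live in data rather than in copies of the code.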
For your scenario, I completely agree that the decision to not share code led to you spending more time. We can be objective about time spent for specific scenario A with codebase B, but we cannot enumerate all scenarios and all codebases and so it's quite hard to ask what's the expected value for "damage done" by a particular style of programming.
Simple duplication is probably the least bad form of tech debt I can imagine.
At the time of change, it's obvious you're making the code shittier. That function? It doesn't do what it says anymore. That other function? Just a few more boolean variables. Hey, we don't need to refactor, we can just make negative numbers mean one thing, and positive ones mean another! Death by a thousand cuts.
This is also the best time to refactor. You skip the whole "I don't like the way the code looks" bullshit. The codebase doesn't support your feature? Make it support your feature, then code your feature.
And you know what else? You are only changing code you're touching. Refactoring code you don't need to otherwise touch is a waste of time.
But yeah, if you parachute in 6 months later to look for a refactor, it's gonna be all gravy. How much shittier your code got is just a distant memory. Get on that next feature - just another flag, just another lying variable - in hindsight, it's all subjective anyways.
I'm at the start of my career as a software engineer but it seems to me a good rule for refactoring is that it should only ever be done if the advantage can be evaluated in the form of concrete new features being easier to implement. If you refactor the code so that existing features would've felt nicer to implement, or are nicer to read, you've already played yourself, because maybe you get a high of ecstasy because the code that was ugly before is now clean, but as soon as the next feature comes around that you didn't anticipate, it was all for naught, and you're writing ugly code again.
But sometimes you’re working in a codebase which will probably outlive you. And in that case it often makes sense to think of your code base as a garden that you tend. Spending effort making tending the garden delightful will almost always pay off given enough time. And here I’m thinking about things like improving the build process to speed up compilation, adding tracing to make debugging easier, refactoring, documenting the core design and philosophy, and so on.
You can’t make Google Chrome, the Linux kernel, LLVM, or other projects like that by taking dirty shortcuts every time you add a new feature. If you do, you’ll end up with a buggy, unmaintainable mess that is impossible to change without introducing new problems. This isn't just theoretical - look up the story of Toyota's brake problem a few years ago. I've also heard some grisly stories about software practices at Boeing and Oracle.
Good cooks clean their kitchens. Soldiers maintain their equipment. And efficient programmers keep tidy codebases.
* how quickly can you answer questions about business metrics?
* how easy is it to identify that bugs are occurring?
* how long does it take to figure out what to fix when it's broken?
* how much change is involved in fixes (e.g. does a PR fixing an issue touch like 25 files or does it touch just a few files)?
* how quickly can new employees come up to speed on the code and alter it confidently?
* how easy is it to add new features?
Now, these things are hard or take time to measure, but those are qualities I look for when determining if a codebase is good.
There are some things that are not just annoying to deal with, but actively impede development or encourage further hackery to make them manageable. Those things need to be fixed at some point, or there will be a real cost.
So, if you've already shipped a feature and need to make some kind of change, you can't do so with any confidence. Implementing unit tests to validate you're not breaking existing functionality is impossible, so you need to refactor, which introduces extra risk.
The bonus of doing things like this is that if you are constantly changing your code then it becomes battle hardened to accept change. This gives you better ability to keep up with the changing demands of both customers and technology.
The problem as I see it: companies prefer the inexperienced devs because they're less expensive, and as a result, their inexperience leaves them frustrated with code that they "don't like." They then attempt to sell management on a rewrite on these grounds.
The team can only deliver features that customers want, on the schedule the business wants, if the underlying infrastructure of the code allows it. If the code infrastructure forces me to spend too much time implementing features, and feature requests continue pouring in, then there will never be time to refactor because there will always be "pressing feature demands."
> Doesn't matter what the code looks like, as long as the customers are happy.
Can you imagine if other industries had this same attitude? "It doesn't matter what the warehouse looks like, as long as we ship on time, and the customers are happy."
At one level that's true, but at another level, it does matter what your warehouse looks like, because a neat and tidy warehouse is what allows you to ship on time. A warehouse that's filthy and disorganized is one which loses orders and ships the wrong products. Similarly, a disorganized codebase with poor abstractions isn't bad because it's unpleasant to work in. It's bad because when a dev ships a feature, they have little or no confidence that they've done so without breaking two or three other features and causing customer dissatisfaction in the process.
Most programmers imagine other industries to be better because they only see the customer facing parts, but that is like judging the state of a codebase by how nice the landing page is styled. Penny-pinching managers exist in all industries.
You need to measure these things objectively. There are several ways to estimate technical debt, like asking system owners how long their teams would take to bring their code up to a list of standards. The time estimate alone is an excellent tool to measure the scope or remediation cost. But there are also several more aspects to technical debt: transmissibility (how likely it is to spread), remediation competencies (does the team have the skills to remediate it), maintainability (how much does it cost to maintain this case of tech debt over a month or a year), attributed bug count (how many bugs are linked to this instance of debt), and so on.
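To make those dimensions a bit more concrete, here is a minimal sketch of a debt register in Python. The field names, the `payback_months` heuristic, and the ranking order are all assumptions for illustration, not an established scoring method:

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str
    remediation_days: float     # owner's estimate to bring the code up to standard
    monthly_upkeep_days: float  # maintainability: cost of living with it per month
    attributed_bugs: int        # bugs linked to this instance of debt
    transmissibility: float     # 0..1, how likely the pattern is to spread
    team_can_fix: bool          # remediation competencies


def payback_months(item: DebtItem) -> float:
    """Months until the remediation cost is recovered from saved upkeep."""
    if item.monthly_upkeep_days <= 0:
        return float("inf")
    return item.remediation_days / item.monthly_upkeep_days


def rank(items: list[DebtItem]) -> list[DebtItem]:
    """Assumed ordering: fixable items first, then shortest payback,
    then most likely to spread, then most attributed bugs."""
    return sorted(
        items,
        key=lambda i: (
            not i.team_can_fix,
            payback_months(i),
            -i.transmissibility,
            -i.attributed_bugs,
        ),
    )
```

Even a crude ranking like this gives the team something business-oriented to argue about instead of "I don't like this code".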
Refactoring code just because someone has complaints is an indolent and unsurprisingly ineffective way of maintaining it. As you say, opinions about how code should be written are a dime a dozen. The shortcomings of a particular system need to be meaningful and clear before they can be assertively fixed. They can be generalized, like "this code doesn't follow patterns that would optimize it" or "this code has had too many hotfixes and is now causing many bugs", or precise, like "cache misses here impact systems A, B, and C severely". But they still need to be clear and business-oriented. Then tech debt repayments can be very effective. I have seen bugs attributed to a system go down by 90%+ after a short refactor when it was carried out effectively.
Investing into systems that prevent tech debt in the first place can also be effective, but it likewise has to be done in a measured and targeted way. If teams tend to make mistake X, then you can often deploy code analysis or submission tools to identify that mistake and reject such code. Over a project's lifetime, you may write hundreds of such code validation tests, but in teams of 100+ people, they can prevent a tremendous amount of work resulting from tech debt.
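As one hedged example of such a check: suppose "mistake X" is a bare `except:` clause (that choice is mine, purely for illustration). A small validator built on Python's standard `ast` module could flag it before the code is accepted; `find_bare_excepts` and `check` are hypothetical helper names:

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return the line numbers of bare `except:` clauses in the given source."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


def check(source: str, filename: str = "<submitted>") -> list[str]:
    """Format findings as messages; a non-empty list means the change is rejected."""
    return [
        f"{filename}:{line}: bare except hides errors"
        for line in find_bare_excepts(source)
    ]
```

A pre-commit hook or CI step could run `check` over the changed files and fail when it returns anything. In practice an existing linter probably covers a rule like this already; the sketch is only meant to show how small one targeted check can be.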
In short, it is a mistake to dismiss the tech debt problem when one has not put in the effort to tame it or has gone about taming it in a lazy, wishy-washy way. If the team has particular code quality standards illuminating tech debt and allowing the coders to target it clearly, it can be remediated effectively.
I don’t think that the answer is for engineers to just “suck it up” and wait until there aren’t pressing feature demands (which usually never happens). But instead, we should spend time reading the code our coworkers and past employees have written to understand the design and philosophy behind it.
If the design of something you’re working on isn’t good by your standards, your productivity will still suffer if you leave it how it is. Especially if you have good taste. The feeling of deep authorship - like, “this is my garden and I’m proud of it” is something worth cultivating even if you only care about output.
I can close way more tickets in a day if I feel at home & in control with a codebase. And that’s something that matters to business as well as the engineers on the ground.
If the answer is “No”, it can be difficult for others to understand. Reworking that code to follow our architecture generally makes it easier for us all to follow.
One might need to not follow the architecture because it doesn’t handle something new to the app. When that happens, we try, first, to update the architecture to be able to handle the requirement and then implement the feature. But if that isn’t possible, we plan to clean up the debt as soon as we can after shipping.
Overall, it has worked remarkably well for our team and has led to what I consider the “cleanest” code base I have ever worked on. But it is dependent upon us having a shared understanding of how we implement features.
The ability to reason about it or even measure it might be subjective, in the sense that most developers aren't really good at articulating quality. Also, developers often confuse familiarity and quality, or to be a bit more generous: are unable to accurately assess quality of unfamiliar technology.
Furthermore, there is also a widespread misunderstanding that if you can't 100% prove something beyond a shred of doubt, that any position is as arbitrary as a favorite color and any reasoning a waste of time.
> Doesn't matter what the code looks like, as long as the customers are happy.
Customers being happy or not is exactly the problem of Code Quality. When a system breaks down over time due to a pervasive neglect of quality, the customers won't be happy. In practice this is happening slowly over time. Your lead time increases, there will be more bugs and changes will cause more problems that are increasingly hard to overcome. It can also come as a collapse, for example a huge outage, data loss or security incident. This can kill your business.
Quality is not something done for developers' sake, it is about the ability to make customers happy over time rather than just now. That is the ultimate justification, which isn't subjective, and the source of legitimacy of any time spent on refactoring at all.
If it is not ultimately rooted in solving a business problem, then it is not a quality improvement at all. And yes, sometimes developers try to masquerade personal taste as quality (refactor to something I know or like), and in these cases the outcome often isn't an improvement at all. In fact, it is often a regression. But that doesn't mean quality improvements aren't real or beneficial.
Yes, but there is a time dimension to "code works". It should work tomorrow when a new OS comes out, and it should work when a library is updated because of a vulnerability, and it should still work when a new feature is added, and it should still work with 100x more users.
The only way to make sure code still "works" in these situations is to continuously keep modifying it. And here the technical debt comes into play.
There are objective parameters like: test coverage, how many bugs static analysis reports (or how many of them were fixed), presence of periodic or constant fuzzing, portability.
Since the problem was the culture of continually pushing half baked features in the first place, the rule was quickly corrupted: people would design a good system, throw anything that’s not required for a POC into the tech debt backlog and deliver a barely functioning version.
“This is a technical debt task” was used to keep anything that wasn't a new feature from taking time out of the other 90% of the sprint.
Basically, if you assign a block of time to quality, you risk people taking that as an excuse to not focus on quality outside that block.
If the team does care about quality (as described in the OP), then something like a tech debt budget/carveout can be a good management/scheduling strategy to buy the team breathing room from the rest of the org.
I have used this strategy successfully in the past. For some reason it’s often easier to “spend 10% of time on tech debt” than “spend 10% of time polishing your code to avoid tech debt in the first place”. I don’t even think the latter is the correct way to build software, as you seldom know ahead of time what will justify continuous polish and refinement.
The advantage of discretionary tech debt fix time is it lets you gradually refine rough edges as they become pain points, with a low-friction bottom-up process (ie the developers that see the rough edge are empowered to fix it, rather than requiring PM/scheduling overhead for everything).
IMHO, at the point you have engineers misrepresenting that something is done, even if you're not in an application domain in which engineers go to jail for that, you take a step back and fix your engineering/product culture.
That might include nuking your current methodology from orbit, just to reinforce that the organization needs to learn a new way of thinking. (Not just "don't do that" on one symptom, when whatever culture weakness that caused the fudging is still in play.)
It's well done, because it's almost impossible to prove.
Those of us who still care about system health and helping users just wait two years after they launch something, ask leadership for permission to delete it, and keep trying to build abstractions that reduce the harm misconduct has on our parts of the system.
The problem is they keep getting promoted because no one notices that almost nothing they've built is still in production.
A few of us have tried raising the issue to leadership but no one in leadership wants to know. It's a bit spooky. My current theory is that no one has the power to deal with it anymore, and at higher levels, launching garbage features that get deleted is actually desirable.
Theory: senior leadership keeps saying "drop what you're doing and build X". This group does that, but in a way that's half baked enough that we can kill it later. Middle management benefits because senior leadership is happy, and deleting it is fine because leadership forgot about it.
I wish I knew a way out.
People respond to an organization's actual priorities, not whatever platitudes people toss out. If you want quality you need to sacrifice something: even if long-term results improve, in the short term quality takes time.
In the story in this post, one important thing that happened was that the team (with management) came together to acknowledge the tech debt problem, acknowledge they want to solve it and talk through potential solutions. This moves the problem from something individuals care about to something the team as a whole acknowledges it cares about (a shift from individual knowledge to common knowledge). Once everyone agrees this is a problem, individual engineers will know they’re acting against the will of the tribe by writing lazy pull requests. And being reprimanded by their coworkers for doing so will have much more weight.
Having just spent a week clearing up and properly implementing my colleagues' 'fake completed' sprint tasks, this resonates deeply. I'm making it clear to my boss that it wasn't 'refactoring', I was just implementing the actual requirements properly.
I'm going to go over their next PR with a fine-tooth comb, no matter how long it takes.
Tech debt and house repairs are the same.
Plumbers, electricians, roofers, painters and the like will all encourage you to handle your repairs timely and proactively do maintenance.
But in the end, it’s the owner (shareholders) of the home that is best qualified to make maintenance decisions based on their needs, wants and resources.
We as developers are service professionals (contractors) and it’s not unless you also have equity that you will have a better sense of how to handle tech debt appropriately.
Bring on the downvotes :p
A lot of managers feel the same way when engineers talk about tech debt, refactoring, etc. Maybe they've never read code, or aren't familiar with the specific codebase. Projects seem to be getting completed—what's the problem? Maybe velocity is going down, but are the engineers right about why? Or are they just being anal about something that's simply less than ideal?
A good contractor walks you through why the repair is needed, what your options are (cheap temporary fix, long-term repair, total replacement), the consequences of those choices, and how much each will cost.
Unfortunately in software we don't really know how to answer any of those questions. A lot of refactorings and rewrites just shuffle irreducible complexity around ("it makes so much more sense now!" says the developer who just spent a week studying the code and rearranging it to their personal preferences). Not to mention that we suck at giving estimates.
If the painters/electricians also lived in the house, and it was their only/primary shelter, the 'owner' isn't the only one 'best qualified' to make maintenance decisions.
Instead I think you need the unicorn of someone who understands product and engineering and can decide at any given time whether to pay down tech debt or add a new feature.
The hard part is explaining to the business the return on investment on engineering initiatives because often times, product thinks of this as a zero sum game. They really care about the sum output of features, but it gets conflated with the number of hours placed in product development.
It is engineering management's task to break that mental model.
I'd avoid the temptation even to have "projects" during that time - make it purely about accomplishing refactors and tech debt removal that can be completed during that day, and specifically don't have plans or a backlog.
Just having that conversation can reduce both perceived technical debt (people explain to other people why it works in that way and that's the right solution) but also team angst about the issue as they get to tell other people about it.
Whatever is the easiest, highest value things normally pop out and you can slowly work through them with better understanding.
It also means you don't start work on the hardest to solve peeve of a single individual just because they are the loudest voice.
You can refine technical debt by simply asking the question: what will the user notice if technical debt is not repaid? Use the answer to that question to phrase the task and suddenly it's not technical debt anymore, but actual work that the business will prioritize (or not).
Instead I want to pick out the worst culprits that bother most devs and in particular onboarding new ones and deal with those.
I'm pretty certain that means new features are implemented a lot more easily. Hard to quantify though.
I can say I'm always thanking "yesterday me" for finishing the job and cleaning up whatever "today me" is working on.
It's possible for me to do this because I do most of my programming on my own. I don't think this approach is suitable for most software engineering processes because they tend to deprioritise technical debt.
Having said that in some places and on some days (and working with some people) I don’t always have perfect energy for the fight.
And if you have to explain that a lot then you become a problem to the team lead.
1) Be upfront about it, the PM or whoever is making the business decisions should at least be aware that the ticket is not being addressed in the quickest way possible. Sometimes that's a real problem and that conversation is worth having. Mostly it's not an issue, at least in my experience.
2) Pretty much never done while fixing a bug that occurred in production.
3) Done in separate commits / PRs. Preparatory work for the actual change, basically. This makes understanding the changes, as well as later regression testing _so_ much easier.
I may also be doing it wrong, but this is what seems to have worked best for me and my teams.
When a dev invests time improving the codebase, refactors mercilessly, and cleans up the code - they gain mastery over it, respect from their peers, and trust over the architecture. This is fantastic for your career long-term.
In reality, with developers hopping between jobs, it's a difficult case to make. Ironically, a hot job market for developers ends up hurting a key mechanism for technical mastery.
Sounds like you’re doing it right!
> I don't think this approach is suitable for most software engineering processes
I think that’s just badly managed technical teams.
What I started doing a few years ago was to group issues together by module/service. When I’d push work to the todo board, I’d include other, similar work. After all, it’s much easier to fix things when you’re already in context.
If the volume of changes really gets out of hand I'll split it into two smaller PRs, then deploy one PR, merge main into the other, and deploy the other. It's a balancing act sometimes to know where to stop and call it "good enough."
If you could measure and graph technical debt, I would find the idea of "planned debt" compelling, but the reality is that people only understand they are in debt once they see the bill, and at that point it's too late. Rational people will see the bill and say "I don't want to pay that."
If the CEO or CTO do not treat debt seriously and like it is their personal problem, that attitude will quickly be organization wide.
Debt is a type of corruption. Corruption by its very nature grows greater than linearly. Just like with COVID, corruption and debt permissiveness have an R0.
When one person is corrupt, then another person is also incentivized to be corrupt. Every person who is ok heaping on debt/corruption is another person who won't resist it and instead breeds self-interest/defensiveness, cynicism, and learned helplessness.
This all leads to a core understanding of corruption (and imho, technical debt), which is that you cannot plan for corruption and it will grow much faster than you expect. Once the culture is set, it is an incredibly hard thing to change. The people most able to make the change will have been the first ones upset by the culture in the first place, maybe enough to leave. The people who have thrived and seen organizational success and therefore gained organizational clout will be the least interested in a cultural change and the least interested in seeing the difficulty of their job go up.
That said, as much as I try to maintain a certain level of quality, eventually my designs break and I need to refactor things. In times like that I'm glad I have a good level of automated testing to catch regressions.
Most people aren't very good at code cleanup, so I take the janitor role, since I enjoy it and have the skills.
Works pretty well for us.
How can you tell that tech debt is at zero?
How hard is it to operate?
How often does it fail?
How hard is it to change?
How much time is spent in toil?
These questions underpin the idea of debt.
While there is no standard way to measure, I think almost any developer can look at code and either be impressed by its elegance or disgusted by its depravity.
Tech Debt is almost always a few things in disguise:
1. Product Failure - The person with the final say on the product has poor taste. They poorly understand the tradeoffs their tech stack gives and how it influences their product. And most importantly they hate their customers and don't understand their needs.
2. Marketing and Sales dominance - The product organization might be competent but they have been driven out of the decision making venues in the company. So marketers and sales make the decisions and hand them down to product to deal with, resulting in tone-deaf decisions. Even as the company continues to make good money, the product itself erodes, until competitors arrive and instantly wipe out the business.
3. Success and shift - The product, which was initially a success, has stagnated. The product itself was so successful that it created an entire new market with competitors that resulted in commodifying the product. Now the company is frantically looking for a pivot to keep growing the business, resulting in the original successful product drying up even faster.
4. Leadership void - The product was made by a strong, pioneering leader. They might not even consider themselves such a leader, but after they move on, the product fails without their support. The replacement leader might not even be bad at running the business for a time, but eventually focusing on EBITA alone won't inspire people, and the product will erode through churn.
5. Press Release driven development - The company operates by making a moat around its original core offering and they have a semi-monopoly in their space. So the only way to drive more revenue is to sell services that cost more to already existing customers. As a result, teams build products that are made to drive hype cycles and press releases -- once the product is shipped, the major players get promotions and move on to the next exciting product instead of supporting the now-shipped product.
> Turns out some apparent tech debt was actually code that was better left untouched, had there been better documentation. We documented what we would not refactor or remove.
> Better clarity on the design and architecture of the code, enabled us to make better judgement calls when we had to cut corners due to the time constraints.
There is certainly real tech debt. But I have lost count of the number of times in my career I've heard a bright, but less experienced, developer claim that because a certain piece of code uses Formerly Popular Framework A from 7 years ago when it was written instead of Current Popular Framework B, it is unmaintainable tech debt, despite it having tests, a working CI pipeline, working monitoring, etc.
https://stackoverflow.com/questions/9318895/how-to-integrate...
Sure, there is an answer to the question, it is a little messy, not too bad, do this 30 times and you have a disaster to maintain.
Formerly popular is also a red flag: over time it will get harder to hire, and resources and documentation will go dark.
Sometimes it's quicker for me to implement a desired change on a code that was written long ago by someone who is no longer with us by re-writing that part of the code in the way I understand it, and then implementing the change, as opposed to trying to understand the code and implement the change in a way that aligns with it.
Example: Somebody years ago bet on Angular 1.x, migration to 2+ was deemed “a near-total rewrite” and postponed, so now we have to rewrite in React or whatever flavor-of-the-month. The same happens with Java, .NET, and other backend frameworks.
How often are developers who are quick to move to the next shiny thing held accountable for years after?
How often do they ask the hard and boring "What if" questions?
How much is empathy a driving factor in these decisions?
The ideal is when a team's experience actually matches the project they're assigned. They need to be up to the task.
This requires management to be at least more experienced than the teams they manage and to make good hiring and placement decisions. This is not just number of years in the industry, but the average number of years spent at any single company. They need to have seen the long tail of maintenance in the development lifecycle on projects they were wholly responsible for.
Your PM is meant to be commercially minded, meaning they should understand the concept of compound interest.
Every new feature you build, on average, increases the value of your product linearly.
Every major piece of tech debt incurs reduced efficiency that compounds. It is extremely common in older systems for the incremental value of resolving tech-debt to be HIGHER than the incremental value of shipping "another feature".
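A toy model of that claim, with every number and modelling choice an assumption rather than data: each sprint ships one unit of feature value at full efficiency, unpaid debt multiplies efficiency by `(1 - drag)` every sprint, and paying the debt down costs `fix_sprints` whole sprints, after which efficiency is assumed to return to 1.0.

```python
from typing import Optional


def delivered(
    sprints: int,
    drag: float,
    fix_at: Optional[int] = None,
    fix_sprints: int = 0,
) -> float:
    """Total feature units shipped over `sprints` under the toy model above."""
    efficiency = 1.0
    total = 0.0
    for sprint in range(sprints):
        if fix_at is not None and sprint >= fix_at:
            if sprint >= fix_at + fix_sprints:
                total += 1.0  # debt repaid: back to full efficiency (assumed)
            continue          # otherwise this sprint was spent on the fix
        total += efficiency
        efficiency *= 1.0 - drag  # compounding drag while the debt is unpaid
    return total
```

Comparing `delivered(n, drag)` with `delivered(n, drag, fix_at=k, fix_sprints=m)` for your own guesses of `drag` and `m` shows where the crossover sits. With zero drag the fix never pays for itself, which is the other side of this argument.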
Why do you believe this? It doesn’t match my experience and I don’t think it’s a good way to think about it.
Sure, startups and new products will either go gang busters or broke. But any decently mature product is almost always going for steady incremental improvement based on feedback and market shift. Now, I'm not saying they're achieving that... but that's what a decent PM should be aiming for.
But complexity (particularly if you can't squash dependencies) tends to go up by orders of magnitude as sloppy code grows. I think that's their point.
A) People that were doing some other job (Engineer, BA, Project Manager) and over the course of a year or two, just naturally gravitated towards taking more and more of an ownership role over the actual business outcome that the team they were working on was trying to achieve. The job they do looks nothing at all like the one their job description says they do... but it seems to add a lot of value and a lot of people seem to want them to keep doing it. One day, they meet someone called a "Product Manager" and they go "holy shit... the stuff you do... that's what I do... am I a Product Manager?". -- These people make excellent PMs.
B) People that were doing some other job (Engineer, BA, Project Manager) and over the course of three to four years, saw a small sub-set of their colleagues lead really meaningful projects, garner a fair bit of respect amongst their co-workers, probably end up getting paid more than they did, and eventually go on to working in roles called "Product Manager". These people decide that they too want to be respected, paid more, and lead meaningful projects... so they add the word "Product" to all their previous job titles on their CV, go do a 3 day Product Owner certification, and then start applying for PM roles. -- These people make fucking terrible PMs. They also tend to make up ~75% of PMs.
Beyond basic horse sense, technical credibility, customer outreach, market sense, & roadmap development by nurturing sheafs of competing priorities while constantly mindful of critical constraints...
...A PM has to make their management happy. Who increasingly isn't technical at all.
Show me a 'pointless, clueless, idiotic' user story or epic, & I'll show you the result of PM negotiations with stakeholder(s) who asked (demanded) something far worse.
EVERY non-technical %VP, C%% and VC I've met just Loves to say "it's just software; it can do anything"... & completely miss the bitter irony that statement should embody.
Teams saying “tech debt” are perpetually under funded and under appreciated.
Instead, speak a language your management chain understands.
* These specific services have outgrown their architecture and back pressure keeps outgrowing their current scale, we need to invest in a more reactive architecture. It’s going to cost 3 teams 1Q and we will prevent N outages based on historical data.
* In 2023, engineers fat-fingered the deployment of these services N times, causing various levels of service outages; one made the news. We need to invest in guardrails in our CI/CD to prevent that. It’ll cost one team 2Q and we will prevent N outages.
* We had 4 employees across our engineering org quit last quarter because holding the pager burned them out, we need to stand up a tiger team that can help kick our metrics into shape.
Speak a language your management understands. Speak in terms of delivering features (feature velocity), reliability (outages), employee retention, hiring through resume driven development, etc.
You’ll find you’re negotiating in a positive sum game if you do this. You give me 1 unit of investment for this problem this quarter and I’ll give you 1.3 units of return next quarter. And maybe there are greater returns elsewhere so you aren’t making a competitive bid and that is okay, or maybe your management will invest in you and you just signed yourself up to deliver 1.3 units. But don’t handwave and ask for budget.
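As a rough illustration of the "1 unit in, 1.3 units out" framing above, here is a minimal Python sketch that ranks competing bids by return per unit invested. The `Proposal` class, its field names, and all the numbers are hypothetical; real estimates would come from your own outage and delivery history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """A hypothetical investment pitch: cost now, expected return next quarter."""
    name: str
    cost_units: float    # investment requested this quarter
    return_units: float  # expected return next quarter (an estimate, not a fact)

    @property
    def ratio(self) -> float:
        """Return per unit invested; above 1.0 means the bid is positive-sum."""
        return self.return_units / self.cost_units


def rank_proposals(proposals: list[Proposal]) -> list[Proposal]:
    """Order competing bids from highest to lowest return ratio."""
    return sorted(proposals, key=lambda p: p.ratio, reverse=True)


# Illustrative numbers only.
bids = [
    Proposal("CI/CD guardrails", cost_units=1.0, return_units=1.3),
    Proposal("New onboarding flow", cost_units=2.0, return_units=3.0),
]
ranked = rank_proposals(bids)
for p in ranked:
    print(f"{p.name}: {p.ratio:.2f} units returned per unit invested")
```

In this made-up case the guardrails bid is still positive-sum, it just loses to a bid with a better ratio, which is the "not a competitive bid, and that is okay" outcome described above.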
But here's a crucial pattern in your proposed language (not sure if you noticed it): you have to let bad things happen first. You need N outages to happen. You need N people to quit.
I still believe it's the right thing to do. Humans suck at being objective. The moment we find a "flaw" in the architecture it becomes the most important thing in the world to fix it. Even if the "flaw" was there for 5 years and never caused an issue.
Sticking to objective signals (outages, quitting, bugs, etc) is the only way to stay grounded in reality. But you have to let those signals happen first. More than that: they need to happen often enough to start forming a pattern. It's just the cost you have to accept, because the alternative is much worse.
E.g. it is impossible to invest in reliability, refactoring, bugfixing right on time. You can either be too late, or too early. Counterintuitively, being too late is almost always the best option. Reason being, there's a virtually unlimited number of improvements you can do too early.
That said, none of the engineering teams I worked with could accept that. I know I couldn't.
Yes! And if they don't care about velocity and reliability, don't tell them that. If they've been going on about hiring and employee retention, tell them how this tech debt thing is going to have such a huge change that you can turn it into a kickass conference talk and hire more 10x programmers, which is a bigger value to them than "I made the app more maintainable". They don't care much what your team does; just give them something they want to buy.
A software company is a software company, not a technical company.
So, in terms of Risk and Responsibility, discussions of Technical Debt don't sufficiently examine the nature of Risk (across-org complexity sources, wetware, workflows, market feedback cycles etc.). The concept also skews and pigeonholes the Responsibility of dealing with it to a small subset of people.
I always plan on spending some percent of my time on work I know I'll have to do before I can start adding the new features. Regardless of how I'm asked to estimate, I keep in mind the cleanup work in accounting for the total effort.
Unfortunately there are still a lot of PMs and folks, including some programmers, who aren't aware of or don't understand the need for structural maintenance. Those are the teams I regularly see start out going very fast but within a few months get bogged down in having to work in the awful system they built.
Then at some unknown point things slow down in a noticeable way and everyone scrambles to fix the slowdown without fixing the culture that led to it, and that’s if you’re lucky.
Somehow in management eyes rewrites are more palatable than refactors despite costing 10x more
In all of these discussions about reinventing/rewriting/reimplementing already solved x, people forget that other people aren't interested in maintaining something where the fun, education, and impact was already had by the people who came first! They want to have their own fun, education, or impact.
Pretty much every conversation that starts with an angry "this has been a solved problem since such and such genius from the 70's did y" is born from misunderstanding what drives people to do anything.
If you're delivering features quickly it shouldn't matter if you spend some of your time making sure you'll be able to continue doing so in the future.
In part, it is resume-driven development. Rewrites are major projects, and running them (and the associated scale and budget) looks good on a management resume, and provides a nice accomplishment item.
In part, it's the short-term incentive to be seen to do more with less, and move on before the consequences bite (this is also a kind of resume-driven motivation, though it's more resume-driven development avoidance.) You defer work on anything but visible features, and do those in the quickest/cheapest possible way that will work in the short term, and use the credit for efficiency to move up and out before the deferred maintenance catches up. When it does, someone else gets to do a resume-enhancing big replacement project, since the state of the app makes fixing in place seem impractical. (Ironically, even with deferred maintenance, incremental remediation would probably be quicker, more efficient, and less prone to major timeline and budget-busting surprises, but when it has gotten bad enough, even the technical people who might recognize that will back management’s desire for a complete replacement because they don’t want to have to deal with the legacy mess.)
Because many of them are either incompetents or expect to run to another project or company by the time things start falling apart.
One other trick I used is to have a “tech debt week” at the end of every quarter. 12 weeks of coding per quarter, then a week for the managers/PMs to evaluate the last Q, and plan the next one. At the same time, while there is this awkward 1wk window where the plan is in flux, the engineering teams can focus on polishing their tools and attacking tech debt that might take more than a day every sprint to make progress on.
Of course, this planning window is probably too short a window for big companies, and too much planning overhead for one-team startups. But for 20-30 person engineering org this cadence worked well.
I've never done it in practice so I don't know how well it really works and there are some other parts of Shape Up that I disagree with strongly.
Worth noting that they build and open-source frameworks like Rails, Hotwire; if you want to polish and share, you need extra bandwidth vs. just building internal-facing products.
If I am a PM, and my team's velocity is badly bogged down by some tech debt, then the right allocation to fixing it is 100% (fix the shit so we can go fast forever.)
On the other hand, if some "tech debt" doesn't actually impact team velocity/clients (eg, some code is "bad" but it's in a part of the system that's never touched) the right allocation is 0%.
There's ultimately the only thing that matters - getting value out to the customers. Tech debt only matters to the extent that it gets in the way of that, so prioritizing it vs features is easy because at the end it's still about "what do the clients get, when"
The X% approach seems to happen when engineering and product fail to have that conversation, fail to understand each other, so they have to just get a flat allocation each.
Suboptimal.
> and my team's velocity is badly bogged down by some tech debt, then the right allocation to fixing it is 100% (fix the shit so we can go fast forever.)
That's impossible. Software, like all things in nature, evolves and degrades over time and there is unlikely to be a magic 100% fix. Something can always be improved.
> On the other hand, if some "tech debt" doesn't actually impact team velocity/clients (eg, some code is "bad" but it's in a part of the system that's never touched) the right allocation is 0%.
Tech debt is also a great way to get your executing development team down to 0%. Yes, some people leave organizations because the code quality is bad and no one cares to improve it and it makes everything just a tad worse. This can compound over time. One day, the thing that is never touched will need to be touched and no one will be able to understand it in a timely manner, leading to other failures and re-prioritization.
> The X% approach seems to happen when engineering and product fail to have that conversation, fail to understand each other, so they have to just get a flat allocation each.
Agreed, a constant conversation is a good starting point but eventually, there will always be a bias in one direction. As you demonstrated, your bias is "There's ultimately the only thing that matters - getting value out to the customers". This makes you more likely over time to prioritize shipping to the customer over invisible improvements. The feedback loop continues until things break down. If you're in doubt, go ask your EM/TL partners if there isn't some technical debt they think is more important to address in the next 3 months than whatever you're currently trying to ship.
Technology organizations are quite complex systems but I appreciate your attempt at simplifying.
// One day, the thing that is never touched will need to be touched
I would categorize that as a potential risk to the thing that matters (customer delivery over time.) So the conversation that should be had then isn't "module X is written in a bad way" but "we're down to the last person who knows module X, if he leaves and we need to change it, we're screwed. Are we willing to tolerate this risk?" (the answer may be yes or no, but expressing it in terms of impact on client delivery is more accurate than just tech debt)
// Something can always be improved.
Yup, similarly to how you can always decorate your house better, buy a more fully-featured car, etc. The question is should it be improved, at the cost of whatever else you can be doing?
// If you're in doubt, go ask your EM/TL partners if there isn't some technical debt they think is more important to address in the next 3 months than whatever you're currently trying to ship.
That's exactly the conversation I want to have. Because if they are right, I want to be persuaded so I can advocate for that investment. But if they are wrong, I want to persuade them that whatever we're doing IS more important, so we can focus on that.
// As you demonstrated, your bias is "There's ultimately the only thing that matters - getting value out to the customers".
That is my bias indeed. Sometimes this bias causes me to ask my engineers to fix non-functional things when they are charging ahead to do features - it cuts both ways.
If you follow the "boy scout" rule of leaving every file a little better every time you touch it, you will be slowly cleaning up debt while also slowing down velocity. If that is not an acceptable trade-off to the PM, then they need a proper estimate for what the cleanup will take- that's the armor they have to shield you from shit raining down from higher levels when someone wants to know why the new feature timeline is being delayed.
If your tech debt is so bad it can't be incrementally improved, then it can usually be reframed as building a new version of feature X when really you're just rewriting it. Sell it as a performance / stability improvement and voila, it can be prioritized appropriately.
If you have tried all of the above and your work priorities are still being dictated, either the company is in a tight spot (contracts require features on a tight timeline) and you need to muddle through until the next phase, or the company isn't regarding you as a professional and it is time to move on.
No skilled tradesman worth their salt would let management tell them not to clean the shop because that takes time away from building things. The professional dictates the work practices.
It is probably more so that they can still have good estimates. If you start introducing new work that has no time limit and no timebox, it can push out your estimates.
Tech debt is a never-ending discussion. But ask yourself: why does almost every team struggle to figure out how much time to dedicate to tech debt? Why is there a constant tension between engineering and management?
The answer is simple: engineers and business people don't understand each other. We live in different worlds and speak different languages. You can't solve this problem with a methodology. If Tech Debt Friday or Google's 20% works for you, that's just luck (or wishful thinking).
Once you understand where the problem is coming from, the solution is also simple: find someone who speaks both languages and trust them to decide. Typically that person is a product engineer: it's easier to explain business to an engineer than engineering to a business person.
And when I say trust, I mean both engineering and management should do it. I.e. if that person says a couple of outages is not a big deal, it's not a big deal. If that person says we need to spend next week refactoring, then that's what you spend your next week on. Obviously you can still challenge those decisions, but you have to accept that this person is an expert in their domain. They don't know everything, but they know more than you. Even if you're the CEO. Even if you're coming from FAANG with 20 years of experience.
That's why companies where the founder is a product engineer typically don't have a tech debt problem (they still have tech debt, they just don't have a problem with it).
I know it sucks to not have a mathematical or managerial solution to tech debt, but tech debt is inherently complex and human. Tech debt happens when humans are solving problems for other humans. The only solution to it is to have another human in between. The quality of that solution will depend on the quality of that human. It's not going to be perfect. But it's the best you can do.
All other approaches are just shots in the dark.
This is ideal.
But finding code monkeys that can also tapdance isn't trivial. For some companies, it isn't possible; or HR won't pay the price to hire them. Sometimes management wildly overestimates their technical chops and won't tolerate "bilingual" middle managers because such people are constantly advertently or inadvertently reminding top management of their actual lack of technical knowledge.
So this is where the Fridays scheme comes in; it requires you to gain that trust once, for the one day a week of self-directed work. You might still be able to do that. It also helps prevent dev team "tunnelling."
The author makes clear at the very end of the article that in some companies, even the amount of trust necessary to pry loose one day a week can't be obtained, and the Friday scheme won't work.
This is certainly the answer to this problem!
No software will be perfect. Eventually it will die and something will replace it. I think knowing what "good enough" is, is perhaps the more important capability. Larger fruit hangs lower than tech debt the vast majority of the time.
Tech debt won’t kill a company but it will strip away its immune system leaving it vulnerable to the slightest hiccup. Then the deadly hiccup gets the blame.
But Microsoft (who I think probably got wind of this product) announced a very similar product (that never appeared as such, and did not appear at all for at least a decade. The problem was harder than it looked.) Our financial backers immediately and fully pulled out, verbal contracts notwithstanding. With no more cash, the company was dead. A few years later others created just part of this product, and raked in very nice profits for a long time.
If the same prototype had been built on the more modern compiler, that would have been the minimum viable product.
The more modern compiler was one able to use more memory as the 286 cpu allowed.
Everything needed to be solved is under your control. It may take time and be boring, but you have everything you need to solve it.
Product-market fit, customer acquisition, etc. are often much more urgent/difficult to solve, and we should focus on those first.
What I've seen in a lot of teams is tech debt (or speculative tech debt if we move quickly) is exaggerated to the point that they cannot launch quickly to acquire and iterate with customers, which is a huge mistake in product development.
v1 - shitshow for product market fit, deliberately debt heavy for speed/iteration
v2 - from the ashes, this is what v1 should've been, emphasis on the long-term
You've already done the hardest bit (imo) by figuring out the kludge blueprint that is v1. It's reasonably easy to build v2 with the lessons learned from v1 fresh in mind.
Of course, v1 is the default model in most shops and there is no v2.
A lot of the time, there is no V2 due to one of two states:
1. V1 is too successful. You cannot slow down the growth. It is bad, but not as bad as people make it out to be. Many startups are trying very hard to be in this state.
2. V1 fails, so there is no point for V2.
In a large company, there is an additional state: long term ownership is hard. By the time we should implement V2, the original team has already been promoted for PMF and moved on to a new shiny project. The new team would just complain tirelessly about the tech debt because obviously they aren't getting the reward and are stuck with a shit job.
Now nobody would advocate for this way again.
Not sure why my argument now turns the opposite direction.
Shitty developers are going to make shitty software.
Every City Mayor wants to build their bridge in 1 week. However, you don't see Civil Engineers "cutting corners" and leaving "debt" when designing and building those bridges. There's a minimum quality that they will sign off. No matter how loud their bosses shout.
When I led Software Dev teams, I always told developers not to compromise certain quality factors. If someone asked them "can it be done faster" and they answered "yes, but X thing would suffer" they are opening a door to sloppiness, and the Business side doesn't have a way to understand the implications of X. They only hear "yes". On the flip side, if a developer says "it will take 2 weeks", there's nothing else non-technical people can do but sit down and wait.
The only ones that can really challenge that are other technical people, and hopefully if Business "escalates" the developer to his manager's manager (up to the CTO), she will be backed up by the technical higher ups, if she makes sense. I would sit down with people trying to shorten development time of some things. There are times where it is sensible to get some debt, but it is a decision that shouldn't be done by Jr or even Mid level engineers.
> Shitty developers are going to make shitty software.
That’s really not the point I was trying to make. It’s more of a trap to blame ICs on the ground in most situations when they’re usually only following existing practices. All this will do is demoralize your engineers.
Mismanagement in the form of poor staffing choices (especially moving tenured and experienced engineers to the next shiny thing) is often the culprit. This usually stems from the pressure from leadership to ship stuff fast, as usual.
Once you sacrifice quality of any kind, that bar gets lowered for good, and before you know it there are so many examples of poor quality that any issues get lost in the noise.
Thus technical debt always includes building a time machine so that your initial customers can use the product they should have gotten from the backend they should have gotten it from, back when they should have gotten it.
In other news, the world's richest person is actually the world's poorest: Elon Musk built a tent in people's minds and now he has to deliver on it. His technical debt is a hotel on Mars. Since technical debt is real, that makes him the world's poorest person by a huge margin.
Also, since technical debt is real you can deduct it from your taxes just like any other debt, right? I mean we wouldn't want the IRS to be unaware that you owe more than 100% of the time and money you invested in building a solution so far, since it could lead them to want to tax you based on what you earned, which just plain isn't fair. They should only tax any remaining money after you have engineered a veritable Taj Mahal of code quality such that there isn't any remaining conceivable improvement. After all, you know what another word is for conceivable improvement: technical debt.
For example, if my boss and I agree that I should be focused 80% on new business development and 20% on keeping existing customers happy, it doesn't mean I allocate every hour or every day this way, but it does mean that I pay a lot more attention to A vs B, but don't neglect B altogether.
That's a useful way to agree on what is important in your context so your work can overall reflect that.
This also avoids "moving around deck-chairs"-refactoring since the refactorings are coupled to specific development tasks. You refactor to make implementing the task easier and cleaner - no more, no less.
If some area of code is "ugly" but works and doesn't need any functional changes, leave it be.
Scheduling tech-debt-payment separately runs the risk of getting de-prioritized. If a deadline is approaching or the company needs to cut expenses, I'm sure the dedicated "tech-debt-payback" time is the first to get cut, "temporarily".
Autonomous teams with bottom-up decision making are much more likely to push towards the right thing even with clueless management, whereas there is little hope when incompetent decisions are fed to the team in a top-down manner (yes, autonomous teams indeed exist - though they are typically a sign of competent management).
I have seen "code cleanup" that made things worse countless times. Especially if the idea is to dedicate time just for that, instead of doing it as you go. I expected the article to be about one of these failures.
The article describes 7 points, and suspiciously nothing goes wrong. It is rarely how it happens, there is typically some trial and error involved, and knowing what failed is as important if not more so than the final solution that works for you.
My boring little IT company in the UK owns its premises - roughly an acre in a town in Somerset with a 19m long edge red brick two floor building. The property cost us about £240,000 - the mortgage is cheaper than the rent we paid on part of a converted stables really out in the sticks.
Now with a mortgage there is the fine print. In the UK it is normal for a bank to require a "debenture". If your Loan to Value is below a percentage, then the bank can intervene at any point and take over, which is what at least one bank in the UK did to try and shore up their finances when it all went south in 2007ish.
There is another thing called an "Overage Clause". That's where the vendor wants to make a hold over future profits. So we bought our place from the NHS and they sought an overage in perpetuity (for ever) on any profits we might make on selling the place. We negotiated on 10 years. That expires this year. With it, we would have to hand over 50% of any profits from sale.
Running a company is not rocket science but it can be tricky. My little firm will never be a unicorn or even a mare with a cornet on its nose. I don't care.
What you need to tackle tech debt is a quality program with metrics. I’m talking code coverage, cyclomatic complexity checking, linters and scanners, DORA, SAST, DAST, etc. Quantify your quality. Then quantify the risks and costs of not improving it.
Then you need to target the areas of code your tools tell you to and you need to make a conscious effort to solve those very specific things. “Module A’s complexity score is 26. Our standards say this needs to be 10. Therefore this is considered a quality item. Therefore it goes into the sprint as a strategic investment.”
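To make the "complexity score is 26, our standard says 10" check concrete, here is a minimal Python sketch using only the standard-library `ast` module. It is a simplified approximation of cyclomatic complexity (1 plus the number of decision points), not what any particular linter computes, and the function names and the sample source are made up for illustration.

```python
import ast

# Node types that each add one decision point. This is a simplification of
# McCabe's metric; real tools count a few more constructs.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)


def complexity(func: ast.AST) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points.

    Nested functions are counted as part of the enclosing function.
    """
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            score += len(node.values) - 1
    return score


def over_threshold(source: str, threshold: int = 10) -> dict[str, int]:
    """Return {function name: score} for functions scoring above the threshold."""
    tree = ast.parse(source)
    return {
        node.name: complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and complexity(node) > threshold
    }


# Hypothetical sample code, with a low threshold so something gets flagged.
sample = '''
def simple(x):
    return x + 1

def tangled(a, b):
    if a and b:
        for i in range(a):
            if i % 2:
                return i
    return 0
'''
flagged = over_threshold(sample, threshold=3)
print(flagged)  # {'tangled': 5}
```

The flagged functions are the "very specific things" that could go into a sprint as quality items; the threshold itself is a team standard, not something the tool can decide.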
Software and business leaders, when developers talk about tech debt… they are talking about managing complexity. (Shit breaking all the time because you don’t have tests in a complex system. It’s failing because of the complexity.) High complexity is expensive. If you do not balance the need to manage complexity against features and you do not act intentionally about your quality your software will eventually fail or if you’re lucky it will just reach a stage where you can’t maintain it anymore and you’ll scrap and rebuild.
Investing in quality makes you go zoom.
> Having dealt with tech debt in a collaborative manner, enabled us to do the “regular work” faster because we had a better collective understanding of the code, and the code was cleaner to work with. One could argue this is just a positive effect of mob programming, but the lack of a concrete agenda also helped the autonomy that unlocked creativity.
1. code the MVP the customer accepts (customer is happy)
2. go ahead and create all the debt tickets (makes you feel professional)
3. every xmas just delete all non-high tickets older than 12 months.
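The yearly cleanup in step 3 could be sketched roughly like this in Python. The `Ticket` class, the priority labels, and the 365-day cutoff are assumptions for illustration; a real tracker would have its own API and fields.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Ticket:
    """A hypothetical debt ticket; real trackers will have more fields."""
    title: str
    priority: str  # assumed labels: "high", "medium", "low"
    created: date


def prune(tickets: list[Ticket], today: date, max_age_days: int = 365) -> list[Ticket]:
    """Keep high-priority tickets, plus anything newer than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return [t for t in tickets if t.priority == "high" or t.created >= cutoff]


# Illustrative backlog only.
backlog = [
    Ticket("Replace homegrown auth", "high", date(2022, 3, 1)),
    Ticket("Rename util module", "low", date(2023, 1, 10)),
    Ticket("Split billing service", "medium", date(2024, 6, 1)),
]
kept = prune(backlog, today=date(2024, 12, 25))
for t in kept:
    print(t.title)
```

With these made-up dates, only the old low-priority ticket is dropped; the old high-priority one survives regardless of age.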
If the code is shit enough, it will die a natural death (most of today's wiring type code has very short TTL anyways). Similar to "if a tree falls in the forest...", "If a customer doesn't notice...". We do so many 'invisible', 'hard' things in our line of work, it's almost an impossibly thankless road to build quality into our systems.
Warning: your house might end up looking like a hoarder's house!
Yes. If the "keeping the lights on" work has been neglected for long enough then yes, in fact 20% is a lowball number.
The context is not so much "keeping the software running" but "reducing the friction in further changes to, and deployments of that software"
Imagine an idealised future state of the codebase, which everyone buys into, and make sure any new feature is going in that direction.
Refactoring existing code can be death by a thousand cuts- having a parallel new codebase which is incrementally adopted can be more efficient and quicker to market.
The author mentions 180k of code... but did the team actually chip away at that SLOC thanks to "Tech debt Friday"?
https://github.com/kaihendry/graphsloc is how I track projects.
There is no such thing as tech "debt". Or, if there is, it's entirely subjective.
I've seen things described as a "6+ month major rewrite" that another developer could address with careful, incremental enhancements over a matter of weeks.
Even with a good test suite, you can go very far. Clean test (interface) is more important than clean code (implementation)
Is this actually true? In my experience, in the long term the failure rate tends to zero
The first rule is not to create tech debt in the first place. The PR (Pull Request) that creates tech debt should come paired with the issue to deal with it.
The author writes:
> "To their credit, I came in when the code was like a crumbling Jenga tower"
Structurally bad systems look a lot like that. Small misalignments and nonlinearities compound to make the structure vulnerable to a sneeze.
Relatedly, I feel it is not so much what % was applied, but what targets were de-risked. Things that tend to matter include mean time to recovery, time to market from _planning_ to deployment (not just commit to deployment), defect rate (e.g. hotfixes and rollbacks). If the superstructure is good, one can start with cheap panel walls and gradually swap them out for nicer ones with better properties.
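The three targets named above (mean time to recovery, planning-to-deployment time, defect rate) can be computed from fairly simple records. Below is a minimal Python sketch; the `Deployment` record, its field names, and the sample data are hypothetical, and treating a rollback as the only kind of defect is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    """A hypothetical record of one change going from planning to production."""
    planned_at: datetime
    deployed_at: datetime
    rolled_back: bool = False
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def lead_time_days(deploys: list[Deployment]) -> float:
    """Mean time from planning (not commit) to deployment, in days."""
    return mean((d.deployed_at - d.planned_at).total_seconds() / 86400 for d in deploys)


def defect_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that needed a rollback."""
    return sum(d.rolled_back for d in deploys) / len(deploys)


def mttr_hours(deploys: list[Deployment]) -> float:
    """Mean time to recovery in hours, over failed deployments with a recovery time."""
    failures = [d for d in deploys if d.rolled_back and d.recovered_at is not None]
    if not failures:
        return 0.0
    return mean((d.recovered_at - d.deployed_at).total_seconds() / 3600 for d in failures)


# Illustrative data only.
history = [
    Deployment(datetime(2024, 1, 1), datetime(2024, 1, 11)),
    Deployment(datetime(2024, 1, 5), datetime(2024, 1, 25, 9, 0),
               rolled_back=True, recovered_at=datetime(2024, 1, 25, 12, 0)),
]
print(lead_time_days(history), defect_rate(history), mttr_hours(history))
```

Tracking these before and after a cleanup effort is one way to show which targets were actually de-risked, rather than arguing about the percentage of time spent.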
[1] cf. a post I wrote to ruminate that angle: https://www.evalapply.org/posts/software-debt/index.html#mai...