There's the case of mysterious and unsolvable breakage. The product simply stops working, and the team is unable to get it working again, period. This can happen with really ancient legacy products where the original team is gone, or young products that are written badly by inadequate teams.
There's the case of unpleasantness. A product is so difficult and slow to work on that the company simply loses interest in it, and shuts it down rather than suffering through more maintenance. This does not happen with products that are highly successful business-wise, no matter how bad the suffering, so it's really a business failure rather than a technical one.
There's real antiquation. The product is dependent on a product of an outside vendor that is no longer available/maintained. I've dealt with this on a mainframe replacement, and it was horrible. I've also dealt with this in Java, and it was plenty painful there too.
And finally, there's replacement. A product is replaced (or intended to be replaced) by a new product that does more or less the same thing, only this time with a smart new team, in a hip new language, and by the gods, this time it's not going to be stupid and suck like that piece of crap the morons on the old team built! Most of these projects fail before they ever replace the old, working code, so I'm not sure this counts as technical debt failure.
One thing often feeds on the other. Because the system is hard to change, it does not get necessary features. Because it does not have necessary features, it provides less business value. Because it provides less business value, there is less of a budget for improving it. And so on.
If there are changes in future numpy versions we want, it's up to us to backport them, which is nowhere near our core business.
There's a lot to be said for standardization and 'boring.'
Now add in a script that updates this artefact with the latest version, breaking changes, and it all goes wrong and it's hard to find the correct previous version of the binary artefact to build your code any more. Especially if the build has been broken for a while due to the project being on the back-burner and it quietly dies when no-one is looking.
Also, this case can impact not just products, but organizations. You can still find teams dependent on an antique commercial version control system or IDE that greatly slows down or even stops work. I've tech-led jumps to new version control systems a few times, and it's always riddled with anxiety, strain, and management angst. (And it always makes the team far happier and more productive!)
Sounds like Rational ClearCase.
In the cases of this I've seen, it's always been because management and team priorities were unaligned.
Management in those places cared about a minimum level of productivity and minimizing risk.
Teams cared about maximizing productivity and their work days not sucking.
As long as teams kept managing to soldier through... rarely saw things change in those shops.
mysterious and unsolvable breakage: Helping another startup work through one now. It's a case of reclaiming functionality from a mystery outsourced codebase (without source control) meets inexperienced developers who try their hand at sysadmin plus a 100% rotated bevy of actors (the whole team, PM and all, have jumped ship), no documentation and no technical oversight. Offshore outsourcing adds cultural fun.
unpleasantness: I would expand this to unpleasant or incomprehensible. I have seen projects be de-resourced because of lack of management comprehension when they literally paved the best and most rapid path to profit (later taken successfully by the now-dominant competition).
antiquation: The best example of this I've seen was a hardware product an employer was developing as a joint venture in Taiwan early in my career. Engineers had made the decision to use a sucky chipset from a struggling company to save money, but the supplier went under and the API froze (bugs, missing functionality and all) before our product development could complete. The target feature set was literally impossible to implement on the hardware and nobody wanted ownership. Many millions of USD, wasted.
replacement: It can work out, just infrequently. Generally when it works it's a smaller system with well defined interfaces.
I'd imagine a number of these cases are caused by a heavy reliance on third party technologies that are no longer supported, or very few people still understand.
[1] https://8thlight.com/blog/uncle-bob/2012/08/13/the-clean-arc...
They did something in a horrible way, knowing it was horrible, because they had been asked to deliver a feature as soon as possible and at any cost.
This is a slippery slope. It lets a company move faster till it reaches the point that the software becomes an unmaintainable pile of hacks.
Technical debt may not have killed the company directly, but we have to wonder how we might have done if we could have spent more of our time on new development.
The project wasn't killed specifically because "you have technical debt". It was killed because there was no way for anyone to be effective with the combination of poor undocumented code.
"We need to change the email message that goes out when someone registers". This took a team of (4?) people 5 calendar days to change. As a contractor, I had to vpn in to one system, then remote desktop over another vpn to another system. Building web apps, these dev systems were not allowed to talk to the internet at all, so things like pulling external dependencies (security libraries, templating libraries, etc) was impossible - pretty much everything was handrolled, largely due to this restriction.
The last big killer was that the system was not passing accessibility audits. Trying to determine where to make a change to any single element would take minutes to hours, vs seconds to minutes you'd normally expect. Much of the 'templates' used were the result of a SQL statement joining 12 tables (html_meta, html_form, html_link, html_grid, etc) and complex concat()s, so adding a page or making a change might take an hour to track down the appropriate collection of tables, then figure out a SQL script to run, then send it to the person who had permissions to make updates to the SQL, then wait and see.
Did the technical debt itself kill the project? Technically no, but the inability to do anything productive in a reasonable amount of time forced the project to shut down.
I went through one of these projects. The tech debt was never as bad as you describe, but it was a small company operating on a short runway. It also taught me an unfortunate lesson about non-technical founders and the dangers of outsourced code.
The MVP for the company had been bought off the shelf. It worked fine, but the code was abstruse and utterly resistant to change. As the price (in time and dollars) of change requests grew, they sensibly in-housed development. Unfortunately, their clients had some idea what to expect in terms of features per day and dollar. Requests like "let us use our logo and custom color scheme" turned out to be serious challenges since every color and style decision was clumsily hardcoded, so we took far too long to achieve them.
Ultimately, we ended up a contract behind - bringing in business to fund delivering on the previous request. Most startups operate under the gun like that (with either fundraising or contracts), but they start there and labor to escape. We started solvent, and had no clear plan to break out of tech debt - a rebuild would have been too slow, 'working smarter' wasn't viable, and expanding the tech team would have come too late and too costly.
So, we died. Not because we couldn't do work, but because we couldn't do it at a competitive speed.
Stop me when you recognize this one: "Hey your product is great, but we really want something that does [totally different thing]. If you just add that thing, we will pay for all the NRE and you can sell it to others as part of your product! Win win!" Advice to junior developers: If you hear such talk in the hallway, RUN!
"But I have all this speed. I'm agile. I'm fast. You know, this easy stuff is making my life good because I have a lot of speed."
What kind of runner can run as fast as they possibly can from the very start of a race?
[Audience reply: Sprinter]
Right, only somebody who runs really short races, okay?
But of course, we are programmers, and we are smarter than runners, apparently, because we know how to fix that problem, right?
We just fire the starting pistol every hundred yards and call it a new sprint.
...It's my contention, based on experience, that if you ignore complexity, you will slow down.
You will invariably slow down over the long haul.
...if you focus on ease, you will be able to go as fast as possible from the beginning of the race.
But no matter what technology you use, or sprints or firing pistols, or whatever, the complexity will eventually kill you.
It will kill you in a way that will make every sprint accomplish less.
Most sprints will be about completely redoing things you've already done.
And the net effect is you're not moving forward in any significant way.
[1] https://github.com/matthiasn/talk-transcripts/blob/master/Hi...
Some people really seem(ed) to have an allergy to plain files for storage. A plain file with OS level caching will beat most (if not all) databases for static content. But doesn't sound as fancy, so it's probably harder to charge a lot of money for it.
Also, just repeated your comment to a friend who said "that's the worst thing you've seen? can i have your job?" :)
This means lots of business-rule crap gets softcoded into the database or ini files (increasing complexity and bug-risk) just to support a hypothetical future where somebody needs it changed without a full sprint cycle.
No one could install anything locally - everything had to be done on their locked down remote systems (some were Amazon remote desktops).
For the accessibility testing, the auditing company used JAWS. The company I was contracting to had one license (or so I was told) so I couldn't have one. We actually tried to install JAWS on an Amazon desktop, but it just crashed the entire virtual desktop, requiring re-imaging. That happened twice, so we gave up.
So, the proposed workflow was: I'd make a change, push code, then email someone to move that code to a system where an internal tester could look at it. I'd get an email back, then email the internal tester that the code was ready to look at. The internal tester would go to the screen(s) in question, using JAWS, then "tell me what JAWS said". That would often take several hours or a day.
I was then supposed to make changes based on that feedback, then repeat the cycle until things were 'fixed', then we'd ask the auditing company for another test, which they'd schedule for 2 weeks in the future. Then we'd wait.
During the first iteration of this part, sr mgrs kept asking me "when will this be done?". I kept trying to explain that we didn't even know what "done" was - the auditing company just had blind folks that would use the system with JAWS enabled and if they felt it was usable, they'd say so, otherwise, they'd report back "hey, this isn't usable", and we'd have to start digging in again.
I don't see how a big project could be coded without containing anything specific to the project. And even then, the architecture by itself is unique and deserves documentation.
I am currently working in a business where there is a nearly 8-year old Rails app (600+ models, 250+ controllers, 400+ libraries, LOC around 60k), that sits at the heart of everything we do.
The company is struggling to grow and believes the cause is that engineering is slow. We have asked to refactor this code base multiple times, pointing to the technical debt as the reason features that should take a day to implement typically take 3-4 weeks.
It is only recently that the penny has finally dropped and they've realised if they don't invest in replacing this thing (there is too much technical debt to fix, we're calling bankruptcy and moving to a brand new architecture piecemeal), the business is likely to fail within 1-2 years.
That means my current employer is likely to go bust because of technical debt within 2 years max unless we become really good at fixing this.
We are optimistic.
We have to be, right?
Basically, for a long time, the company never really re-evaluated what it had learned and spent time trimming things down, so as a result there is this ungodly mess. At the heart of what the business does, there is no real need for more than a dozen models. So why do we have so many more? Nobody ever refactored away stuff we didn't need any more, and so weird things happen.
There is also a coupling issue that is endemic to all monoliths. We're moving to a micro-service architecture with clean domain separation, and we'll probably go to 1/10th of the code base in LOC terms within 12 months, even if we move some of that functionality into Go, Java or Python services (all options).
I work on that type of codebase, but we have a test suite with full coverage, so applying changes is not a problem (interestingly, I've just realized that the line count of the testing code is 50%+ more than the base application code itself).
So ultimately I think company culture (that is, emphasis on automated testing, for dynamically typed languages) is the crucial factor.
And that wasn't a stressful place to work with insane deadlines - it was fairly relaxed for the most part.
A 60kloc C++ project is small and easily manageable, a 60kloc Ruby hairball can drive a person insane.
If, for example, they're in banking and finance, and those LOC deal with fine details of tax code... Oh boy.
If anything, we've gone too fast and not spent enough time going back and understanding what we really need to keep.
This could happen if you have a lot of dependencies, switched compiler versions but left the binaries "in place" and deployed changes incrementally.
In short, my predecessor had attempted a move to SOA without understanding dependencies, circuit breaking and failure modes. This would then cause scenarios where the entire front-end would fail to render on a single down-stream service taking a little longer than necessary.
When identifying how to stop that happening, I discovered a large number of comments tagged "TODO" with statements like "Refactor this when we have time" or "We need to find a way to do this better".
Further down on the downstream services there were rather esoteric SQL queries doing large joins that nobody had done a query plan on. It was hard to identify these because the ORM had been trusted to do magic, and it was happy to do so, but there was a point where it was not apparent _why_ these joins were happening, but when you found the code, there were more comments "This needs improving", "We should refactor this", etc.
We were able to get something back quite quickly with liberal application of indexes, and it took us a day or two to refactor the queries enough to mean response times came down, but the error rate was still > 20%, and it was random, so 1-in-5 page loads of the front end service would fail.
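For anyone who hasn't looked at a query plan before, here is a rough sketch of the before/after check described above. It uses SQLite through Python's sqlite3 module purely as a stand-in, since the comment doesn't say which database or ORM was involved; the table, column, and index names are all made up for illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


# Hypothetical schema: two tables joined on a column that has no index yet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")

# The kind of join an ORM might emit on your behalf.
sql = """
    SELECT c.name, o.total
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = ?
"""

before = query_plan(conn, sql, (1,))

# "Liberal application of indexes": index the join column, then look again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (1,))

for label, plan in (("before", before), ("after", after)):
    print(label)
    for step in plan:
        print("   ", step)
```

The exact plan wording differs between SQLite versions, so the useful signal is just whether the new index shows up in the plan for the join. With an ORM the hard part is usually getting hold of the generated SQL in the first place.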
We refactored the code to circuit break and handle degraded services better, but that took a few days, and then we started working down to the back end service and figuring out the final steps.
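Since "circuit break" may not mean much to everyone, here is a minimal sketch of the idea in Python. This is not the code from that project; the class, the thresholds, and the `fetch_recommendations` callable are hypothetical, and a production version would also want timeouts, metrics, and thread safety.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still open: fail fast instead of waiting on a sick service.
                raise CircuitOpenError("downstream call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def render_page(breaker, fetch_recommendations):
    """Render the page, degrading to an empty widget if the service is down."""
    try:
        items = breaker.call(fetch_recommendations)
    except Exception:
        items = []  # degraded, but the rest of the page still renders
    return {"body": "main content", "recommendations": items}
```

The point is that one slow or failing downstream service costs you a widget rather than the whole front end, and once the breaker is open the page stops paying the latency of calling it at all.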
It was a small team looking after legacy code that everybody knew was a bit messy.
A few weeks before this code was shuttered, I heard from a friend that some of our content did not render at all on certain Android devices. I identified the cause as a half-finished refactor (again, my predecessor), that had never been finished because he had been pushed to work on something else. This caused a dramatic decline within a key market segment that resulted in declining ad revenue, subscriptions and overall viability of the business.
Basically, when you start something, finish it. If you find yourself putting in comments like "We should refactor this" anywhere in your code base, and you're doing so because the business is pushing you to work on new features, you have a massive problem culturally that is going to cause a rise in technical debt that raises risk to revenue.
All technical debt ultimately will lead to problems that the business will see on balance sheets, but they will rarely successfully identify the cause as being technical debt because they can't see, understand or rationalise it. They think it's engineers being grumpy idealists.
People play too fast and loose with the concept of "MVP" for my tastes, and it's a problem I see over and over again. The risk of that is, long-term, it will cause business failure.
Summary: http://pythonsweetness.tumblr.com/post/64740079543/how-to-lo...
If you're trading automatically, you'll need a very, very solid deployment and audit process, even if you're just a small company. The reason banks are so slow in deploying software is because most of them lost a few millions at some point due to some bug.
Startups that think they can act faster than banks just haven't had that bug yet. That's also why I'm rather negative on the whole Fintech scene at the moment.
In the enterprise, this is called "mature" and is a sign of great sophistication.
"The consequences of the failures were substantial. For the 212 incoming parent orders that were processed by the defective Power Peg code, SMARS sent millions of child orders, resulting in 4 million executions in 154 stocks for more than 397 million shares in approximately 45 minutes. Knight inadvertently assumed an approximately $3.5 billion net long position in 80 stocks and an approximately $3.15 billion net short position in 74 stocks. Ultimately, Knight realized a $460 million loss on these positions. "
"The new RLP code also repurposed a flag".
I've never seen a flag repurposed without catastrophic effects.
Was the issue technical debt or a sloppy deployment?
The second time was in a small company whose product was a search engine for consumers. The web layer was written in a mixture of JSF, JQuery and Ajax. While that combination already slowed down development on the front end, the main problem was the performance of JSF on the server. Because JSF is rendered on the backend, it placed a massive load on our server for certain heavily used pages and we just couldn't scale any further. Skipping JSF for a framework that was rendered on the front-end would be the solution but that was a massive refactor for which the company just didn't have enough resources. Eventually the company had to skip their search product and change their business model to a more community based website.
I wonder, would the result be different if you had access to competent Eiffel developers? How large was the Eiffel codebase?
Eiffel is an interesting language, with a somewhat unique feature-set (I think only Ada is coming close). Design by contract and static typing as core language features - if used right - should greatly help with both stability and ease of refactoring.
How large the codebase was is an important question, also how bad it really was. I saw a similar story - external codebase getting worse and worse from some point on - with Clojure at the center. The code quality was quite ok for a couple of months, then it worsened. At that point and for a couple of following months the codebase was possible to save - a single competent Clojure programmer would make a difference, I think. The project was less than 10k LOC then. However, more than 1.5 years and 60k LOC later, doing anything became nearly impossible for anyone, including original authors.
That is, technical debt is not necessarily tangled over-engineered code. It is more compromises that were made to actually ship and operate in the world. You can see this in the world with devices.
Consider, technical debt is the reason you have AC delivered to your house going through as many converters as you do devices. Often to the same target power characteristics for those devices. It is not the reason that your coffee machine that also grinds and whatever, is likely to fail within the year.
Another example; Technical debt is the reason we are still predominantly using petrol for automobiles. It is not the reason the dashboards are horribly non-responsive on modern cars.
Bad example. AC power has many desirable characteristics for the local transmission grid. If you were to do the grid over from scratch you'd still use AC. You're also too focused on household electronic usage, which is a very tiny percentage of the overall electricity used.
[1] 2009 - https://martinfowler.com/bliki/TechnicalDebtQuadrant.html
OMG no - run for the hills.
95% of software systems are not inherently sophisticated. They are 'complex', yes, in that there may be many features and moving parts, but there are no pieces of the system that should be hard for anyone to understand. With decent architecture plus decent design and coding, an entire bank's system should read like a long but well articulated user manual.
Unless you're doing super low-level stuff, complex algorithms, heavy math stuff, or issues with massive scale or performance etc. ... the end result should almost be mundane in most cases.
I do remember a competitor dying of not releasing their big refactored next version soon enough, and running out of cash.
Spolsky tells it better than me:
https://www.joelonsoftware.com/2000/04/06/things-you-should-...
I worked on a 300k LOC business basic application at one point.
The big question everyone was asking is how do you move to something else? Everyone wanted something else, they started writing new services on top of the old system, they had some ideas on where to go, but it just didn't seem like a gradual rewrite was possible.
And to be honest, a greenfield rewrite just wouldn't work for something this size with the resources they had. So it stayed in business basic.
You can learn a lot of lessons from Netscape, but this isn't one of them. Servo is a great example of how a rewrite should / can work. Mozilla hasn't devoted 100% of its resources to Servo, but instead is letting Servo develop on its own, and at some loosely defined point in the future the two could merge (but might not!). It's a separate product, and nobody is pinning all their hopes and dreams on it.
What Firefox devs did was to take the browser part, make it stand alone, and replace much of the XUL UI with native widgets (GTK on *nix).
The code was pretty sloppy, but didn't deviate much from standard Rails idioms. Not many people on the team understood Rails well enough to read it, but I did. Bug reports were constantly flooding in. I suggested taking a sprint to build up an integration test suite and then letting loose on the backlog.
We did build up a sufficient test suite in one sprint. But the bug reports never slowed. By the time we had the confidence to truly start tackling bugs at speed, the battle had been lost. We had been so busy writing tests that we forgot to manage the bug tracker. The impression was that we were overwhelmed and unable to make progress. The project was swiftly closed.
People remembered that codebase as an exemplar of sloppy code and technical debt, but that's not the lesson I took from it. I had seen, and others would see later, much worse. The lesson I took was that perceptions are as important to manage as results.
I still think the Robustness principle[1] is a crock and strictly controlling inputs is one key to happiness. It also, frankly, helps your users in the long run by giving them exactly what they want and it actually cuts down on the amount of thought they have to put into it. Chaos and disappointment do not make a good user experience.
I totally agree about Excel importing, but CSV is trivial, no? Here is an Erlang version I happened to write yesterday:
lists:map(
  fun(Row) -> string:tokens(Row, [SepChar]) end,
  string:tokens(InputStr, "\n")
).
EDIT: I know this version won't support escaped separator/newline characters, but I made it for a specific use case in which I knew that would not occur. Adding that functionality would make it a little messier, but still not too bad.
EDIT2: Thanks for the interesting comments! Not so trivial after all!
Perhaps a more accurate version of what I was attempting to say above is that 'it is often (not always) easy to build a CSV parser to interact with one specific program'. The four line version above works perfectly for reading the type of files I designed it for. If you want to work with human created, or more complex variants of CSV, all bets are off.
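To make the "all bets are off" part concrete, here is a small Python sketch. `naive_parse` is a rough analogue of the split-on-newline-then-separator approach (not an exact port of the Erlang: `string:tokens` also drops empty fields, which `str.split` does not), and `quoted_parse` leans on Python's standard `csv` module. The function names and sample data are made up for illustration.

```python
import csv
import io


def naive_parse(text, sep=","):
    """Split on newlines, then on the separator. No quoting support."""
    return [row.split(sep) for row in text.split("\n") if row]


def quoted_parse(text, sep=","):
    """Parse with the standard csv module, which understands quoted fields."""
    # newline="" leaves line endings untouched so csv can handle them itself.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=sep))


# A field containing the separator, as a spreadsheet export might produce.
sample = 'id,name\n1,"Smith, Jane"\n'

print(naive_parse(sample))   # the quoted name is split into two broken fields
print(quoted_parse(sample))  # the quoted name stays in one field

# A quoted field containing a newline breaks the naive row split as well.
multiline = 'id,note\n2,"line one\nline two"\n'
print(len(naive_parse(multiline)), "rows (naive)")
print(len(quoted_parse(multiline)), "rows (csv module)")
```

For files you generate yourself with no quoting, the naive version is fine, which is the point made above. The trouble starts when the input comes from people or from other programs' exports.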
That's the most concrete reason I can come up with why the technical debt will kill them, but there's plenty of vaguer reasons why it's been killing them for the past 5 years and will finish them off over the next 5. The attrition rate has been around 20% a year since I joined. For most of the time I worked there they compensated somewhat by hiring new people. Word has gotten around though, and they've run out of qualified candidates willing to work on their mess. Hell, we even had a couple of gifted hires leave after a month or two while shaking their heads.
My current workplace's main product uses the same tech, is the same size (LOC) and has the same functionality as the other company's, but serves a different market. They did the Oracle to Postgres migration in 2 months. 2 MAN months, one guy.
New workplace: 15ish developers, serving the same amount of customers, doing similar revenue, making stable releases every week
Old workplace: 80 developers at its peak, doing non-hotfix releases around every 3 months. Just a mess in every way. Mostly stemmed from the codebase and the architectural choices that had been made along the way.
Yeah, once you get that deeply entrenched in Oracle, it's almost impossible to get away, and after that experience I vowed never to work at another Oracle shop.
I wasn't directly involved in but had a good view of our university's finance modernisation woes: http://news.bbc.co.uk/1/hi/education/1634558.stm https://www.admin.cam.ac.uk/reporter/2001-02/weekly/5861/1.h... - although in fairness the inflexibility and disorganisation were existing features of the institution, and Oracle merely exacerbated them.
Doing it after the fact in a politics-heavy organization is confounded by not just the technical difficulty of the task, but the glad-handing and perception management that has to happen to keep your team from getting fired during the process.
Though they only really succeeded on the shopping part. They didn't ever get to a credible booking engine that anyone would buy. Which may point to something other than tech debt being the biggest barrier to modernizing an airline reservation system.
Edit: And, worth mentioning that your competitors wouldn't have had to be better than, or even as good as QPX. "Good enough" would have squashed several big sales, since shopping was typically bundled in with what their customers already paid.
A booking engine (CRS/GDS) would be used by either airlines or a reservations system (Amadeus, Sabre, etc). That's the piece they didn't deliver on.
Edit: Reference to the announcement of abandoning the booking space: https://skift.com/2013/05/15/google-and-ita-software-abandon...
"This is indeed a bitter pill for ITA Software’s founders to swallow as they put years and millions of dollars into their dream to transform the nuts and bolts of the way airline reservations systems...are handled"
Every failed product/project I've worked on in my professional career, which had full intent to ship from the start, was killed by technical debt. It's usually indirect, but it's always the root cause.
It takes many forms:
* Too buggy to ship, due to a creaky old code base being over-stretched to a product with too high reliability/experience expectations.
* Product form factor, efficiency, user experience not good enough to sell well, due to spaghetti code base which couldn't be whittled down to removable pieces. Result: large runtime, more expensive, less efficient hardware.
* Existing old codebase deemed too bad to ship a product, requiring a rewrite-from-scratch, but timescale too long to make any sense -> product killed.
It's difficult to elaborate more while maintaining some discretion about exact companies and projects. The general point is: technical debt isn't just some fuzzy intangible issue — it indirectly creates enormous costs in people and time, can affect the physical form products take on, and impact the user experience. Products always get started without taking this debt into account, but when it's finally realized, it can change basic features, and then it kills them.
Products are designed with faulty assumptions about what existing resources can be applied to them.
I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?
I've been mostly in consumer electronics related companies, where a product which ships and then becomes too hard to maintain usually doesn't "fail". It just gets phased out. In a way, this is another way technical debt has an indirect, but large impact on products: obsolescence becomes a necessity. Not so much planned — which implies malice — as simply realizing it's not possible to maintain indefinitely.
> I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?
Usually very quickly, or after far too long.
The better projects know ahead of time that there are Dragons lurking in the code base. But that's effectively saying there are projects which never even got past brainstorming because we knew the technical debt was too high.
On the other hand, there are projects where it only becomes apparent how much debt there is after a lot has already been invested. It's like you'd expect, e.g "There's a performance problem because of a basic primitive this library uses everywhere. And that was originally a workaround for a compiler performance bug. We could fix the compiler bug, but it turns out other libraries relied on it..." and so on. Extra time-to-market makes a product make less and less sense — fashions change, hardware improves, new tech arrives — and so it gets killed. Or worse, shipped.
The class that is no longer appropriate for new requirements gets canned for a better abstraction etc.
In aggregate, over time, you may kill the product to avoid technical debt!
The original codebase was about 20 years old. It was control code for something best described as an industrial robot. Written for the last 20 years by greybeards who knew a lot about the manufacturing process, and were reasonably good at getting a product out the door.
But the whole thing was riddled with #ifdefs for this customer or that, or one batch of machines or another. All long forgotten, written by people who had since left, or been pensioned. It was in dire need of improvement and extension, but it would have been superhuman to inject new features into this rat's nest. Plus their electronics supplier was discontinuing the control electronics the system was designed for. The UI also looked like it had been designed by German engineers in the 1980s. Which was the case.
So they made the defensible decision to start from scratch. A team of engineers was to develop a brand new machine, with all new electronics and all new code. They got to work -- and had to scrap the new software about three years in. It was just utterly misdesigned, and riddled with bugs. It featured wonderful WTFs like the embedded realtime code depending on the Qt libraries.
I observed its instability myself: it would just spontaneously crash every five minutes, sometimes just while idling. Once the project lead was on holiday, the programmers revolted, went to the head of the company, and the project lead found himself without a project on his return. Whee.
Now we've started from scratch again, and have at least succeeded in making different mistakes this time around. Fingers crossed, this might end up working.
Should I ever inherit an #ifdef mess again, I intend to replace #ifdefs with Strategy patterns.
#1 figure out all the known defs in actual use
#2 rerun the preprocessor with each variant (combo)
#3 capture the output(s)
#4 aggressively apply the Strategy pattern, refactor code
Last time, I removed dead code piecemeal manually. It sucked.
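Steps #1 to #3 above can be scripted rather than done by hand. Here is a minimal sketch, assuming the build uses a gcc-style preprocessor (`-E` to stop after preprocessing, `-D` to define a macro). The macro names and the file name `control.c` are made up for illustration, and the script only builds the command lines. Capturing the output (step #3) would mean running each one, for example with `subprocess.run(cmd, capture_output=True, text=True)`.

```python
import itertools

def variant_commands(source, flags, compiler="gcc"):
    """Yield (defines, command) for every on/off combination of the given macros."""
    for bits in itertools.product([False, True], repeat=len(flags)):
        # Keep only the macros switched on in this combination
        defines = tuple(name for name, on in zip(flags, bits) if on)
        cmd = [compiler, "-E"] + [f"-D{name}" for name in defines] + [source]
        yield defines, cmd

# Hypothetical macro names standing in for per-customer / per-batch switches
FLAGS = ["CUSTOMER_A", "BATCH_2"]
commands = list(variant_commands("control.c", FLAGS))

for defines, cmd in commands:
    print(defines or "(none)", "->", " ".join(cmd))
```

Note that n macros give 2^n combinations, so on a real codebase you would probably restrict this to the combinations that actually shipped. Diffing the captured outputs then shows which code differs per variant, which is the raw material for step #4.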
Not to say I haven't seen this effect myself many times...
Management ordered the creation of new software. Shouldn't that be enough?
The project lead was responsible for this design, and above him there was nobody with any expertise in the matter.
From what I've heard he's an extremely good C++ programmer. He's just a terrible architect.
I am a big fan of constant refactoring on a small scale but I am very skeptical of large refactoring of a whole project. You may end up with something that's just different but not really better.
I'm not sure what the differentiator is. I'd be curious if others have ideas. I think part of it is that in both cases it was a small team, who caught the issues early enough that it hadn't gotten too bad yet, but late enough that the right direction to move in was clear.
I'm okay with the occasional week-long rewrite of a subsystem, but usually only after I've spent some time coming to grips with exactly why the old one is terrible and have a firm grip of exactly how the new one will be better.
So even though it makes no technical sense it bolsters a gap in the product offering, and they'll have to find consultants to limp it along every time they need something small done that would otherwise be very cheap. It's all about the balancing act.
I built this extranet app for a Fortune-class / NYSE company in 2001. They were a Lotus Domino shop so for that and various other reasons the extranet was deployed in Domino. The initial rollout was considered quite successful, but it was definitely "v1" code, and I'm being really generous with the code quality. Plus, Domino.
The application was considered a stopgap until the shop had become fully Microsoft-centric, at which point it was expected to be migrated to .NET. That was expected to be in ~5 years.
The result was that no investment was made into the app for over fifteen years. Every now and then an enhancement would be needed, and a contractor would be called up to bolt on a feature in a shockingly slipshod manner (this app is much too complex for the average Domino dev). But no technical debt was ever cleaned up, because "meh, we're going to replace that app by 2007."
2007 was 10 years ago. In the meantime two projects to replace the app were spun up and killed. The app is finally being retired this year. I was called up at the 11th hour to jump back in (15 years later) to help support the thing through the conversion, as the one existing Domino dev they had on staff finally (wisely) jumped ship.
I cannot even begin to describe the state of this app.... it's a case study in "how to not manage IT."
---
Another recent client was a content-creation shop (think glossy magazines). Their outgoing sr dev had deployed a CMS that nobody had (or has) ever heard of. This CMS was originally developed during the glory days of XML. Believe it or not, the app worked by loading all of the CMS content into a single in-memory XML document. This was probably OK for a brochure site, but this was a site with hundreds of thousands of pages of content. As a result the application required a server with 64GB of RAM just to launch. Also - launching the app took about ten minutes after the server OS was loaded. And there was no server farm, just the one server. If the app was ever stopped, it would stay down for at minimum 10 minutes.
I came in to fill in temporarily and to try to find someone to staff the position permanently. Even with a competitive salary, nobody qualified wanted the job.
Meanwhile, the same company also had a set of blogs that they managed in WordPress....
They were already using WordPress for blogging. A custom WordPress implementation would have easily solved their CMS problems and devs are trivial to find.
The point was that the thing had just been rolled out the prior year. There was no budget or appetite for throwing the thing away. It did work. So there it stands, alongside a dozen or so WordPress sites...
As someone else pointed out - technological debt is not a cause per se; it's an indication of some deeper problem - usually of human, not technological, nature.
So any business plan that includes the steps "A miracle occurs" and then "We get bought out" is probably going to suffer that fate?
https://hackernoon.com/12-signs-youre-working-in-a-feature-f...
I'm going to take a different stance and suggest it is very difficult for a product to be killed by technical debt.
I'm working on a product now which has huge technical debt requiring a full re-write, but we have customers who love the product, so we just keep the old version going while building the new version.
Let's assume we are talking about a single-product company here. That keeps the discussion simpler, avoids getting bogged down in larger corporate issues, and lets us focus on the case where the product is what is paying the bills (assuming we're talking about commercial products).
Let's look at a few examples.
The product has some customers, but it is difficult to add new customers due to some technical debt issue and the lifetime customer value for your target market does not cover the cost of re-developing your product and continuing operations. Ok, you're probably dead.
You've got some customers, and you can't sell more, but the lifetime customer value is greater than the cost of re-engineering the system. You've got some bad times ahead, but there is a path. So technical debt doesn't kill you.
You've got technical debt, and you don't have many users, you want to add features but can't because of technical debt. Did technical debt kill the product? Or was it killed by a lack of market? You don't know that the new features would have saved it, all you know is that what existed did not make enough to keep you going, so that can't really be blamed on the technical debt.
Technical debt can be costly, but rarely fatal. It's great not to have it, and having it can make it difficult to keep good people (I lost 2 amazing devs partly because they hated the old code-base we inherited, but they also had amazing opportunities).
Or could have killed it even faster.
Imagine that at the beginning you decided not to accumulate "technical debt". Instead you got some product that developers believed had no "technical debt" and because of that it was late to the market. Money was running out and once customers started using it you had to be really lucky for new features to make enough impact to stay alive. Because customers don't necessarily care that much about features that are easy to add, but maybe want some features that are hard to add and that nobody anticipated. Either way I cannot find a reason for "technical debt" to kill a product.
I put "technical debt" in quotes, because even the underlying idea of the concept doesn't make sense and relies on a belief of knowing how to do it "the right way", which I don't think can be a good way to write software. It's better to substitute it with more complex concepts of flexibility and simplicity and corresponding trade offs.
Along with the 'feature' that could kill things faster is general bloat when people keep building without knowing what the customer wants. You have to try something, but product testing should be done in such a modular way that most things can be removed if it turns out to be the wrong direction. Of course, you need a base data structure, but features should conform to that rather than constantly extend.
One example I'm seeing with a start-up I currently know is that they are building SDKs for multiple languages. Most of the code is auto-generated, but just the time spent on examples, documentation and packaging is killing them while they don't have customers for most of the languages they're publishing. This isn't just 'code' debt, there is overhead in documentation and management of code.
My understanding is that it was never released, so all of the money the company put into the project was wasted.
This is maybe different from what people normally consider 'technical debt'. I don't mean just code aesthetics but also bugs, redundant code, and bad abstractions.
Similar situation at my place (great co-workers, good product space), but poor management, leading to lots of turnover.
I've seen this play out probably close to a dozen times now, at different employers and consulting clients.
For a company that makes software as a product, or to directly support or create their main product, not being able to add new features is a really bad place to be.
* In the early 2000s, they added support for Windows NT to the product. Unfortunately, they did this with an MPE compatibility layer that means the entire thing still thinks it's running on an HP 3000, so controlling it programmatically means writing MPE job streams.
* It was originally written to store data in COBOL records. When they added support for SQL databases, they apparently just copy-pasted the schema verbatim from the COBOL copybook format. This means the database has no foreign keys, FLAGS columns all over the place (including tables where you have to JOIN ON SUBSTRING), and, most egregiously, a table with ITEMNO_001, ITEMNO_002, ITEMNO_003, PRICE_001, PRICE_002, PRICE_003 and so on, which has to be queried three times and UNIONed to get the data out.
* Printing packing lists requires not only a specific model of printer, but also an extra several-hundred-dollar chip to be installed in that printer. I'm told that this chip's sole function is to enable barcode printing.
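To make the repeated-column problem from the second bullet concrete, here is a small sketch using an in-memory SQLite database. The table name `ORDERS` and the `ORDERNO` column are assumptions of mine, and only the `ITEMNO_00n` / `PRICE_00n` column pattern comes from the real system. It shows the "query once per slot and UNION" shape you end up with, not the vendor's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table copied "verbatim from the copybook": one column pair per slot
conn.execute(
    "CREATE TABLE ORDERS (ORDERNO INTEGER, "
    "ITEMNO_001 TEXT, PRICE_001 REAL, "
    "ITEMNO_002 TEXT, PRICE_002 REAL, "
    "ITEMNO_003 TEXT, PRICE_003 REAL)"
)
conn.execute(
    "INSERT INTO ORDERS VALUES (1, 'A100', 9.5, 'B200', 20.0, NULL, NULL)"
)

def unpivot_query(table, slots):
    """Build one SELECT per numbered slot and glue them together with UNION ALL."""
    parts = [
        f"SELECT ORDERNO, {n} AS SLOT, ITEMNO_{n:03d} AS ITEMNO, "
        f"PRICE_{n:03d} AS PRICE FROM {table} "
        f"WHERE ITEMNO_{n:03d} IS NOT NULL"
        for n in range(1, slots + 1)
    ]
    return " UNION ALL ".join(parts) + " ORDER BY ORDERNO, SLOT"

# One row per (order, item) instead of one wide row per order
rows = conn.execute(unpivot_query("ORDERS", 3)).fetchall()
print(rows)
```

With a normalized line-items table the same result would be a single plain SELECT, which is roughly the cost this schema adds to every report.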
I have no insight into what goes on inside the company that makes this thing, but it certainly looks to me like they have a severe case of technical debt. Any bug fixes generally take 4-6 weeks in the best case scenario, and frequently either don't fix the bug or introduce new ones instead. Their only customers are the ones that have been using the system for so long that they're stuck with the system, and can't switch--in fact, many of them are still running HP 3000 systems, which HP has been trying to end-of-life since at least 2006.
The end result of this is that the product is dying a slow, agonizing death of attrition. I think the only reason it still exists at all is because the company that makes it is stuck with support contracts that haven't expired yet.
The problem was two-fold:
1. The relevant tools (Unity3D) were extremely immature and the problem was quite diffuse. No profiler, poor quality of generated code, tiny caches, etc.
2. A problem in string-handling code that was quite diffuse throughout the game. As near as I can tell, it was blowing out the tiny CPU cache hundreds of times per frame.
On desktop, this code was a complete non-issue. On the puny little ARM on the iPhone? It was the difference between having dozens of towers and 50 enemies in play vs half a dozen towers and less than a dozen enemies in play. The impact on game dynamics, and need to re-balance everything by itself would add weeks to the shipping schedule.
There were plenty of other things that needed to be scaled WAY back of course: Switching from 3D to 2D to get vertex count and draw call count down. Completely rebuilding the entire UI. Revamping the pathfinding and suffix caching to not play havoc with the CPU cache. Moving from a 24x24 grid to a 12x12 grid. All of that combined helped a LOT, but not nearly enough.
The string manipulation was for a hierarchical property system that let me parameterize all sorts of attributes for enemies/spells/towers/projectiles in a set of text files. Ultimately, I had over-engineered on the assumption that I would be tweaking many more things -- with much greater frequency -- than I wound up actually tweaking.
Had I ripped most of it out and just had local properties on each prefab that I assigned manually, I might've hit that market opportunity. Finding that that was the cause was a multi-month project because of how interwoven it was with everything else. Hell, it would've been fine, had I not over-generalized it into a shared component on each prefab that the other components inquired with to get property values. But I did. And it took me vastly too long to identify it was the major problem it was.
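For readers who haven't run into this pattern, here is a rough sketch of the general shape. It is a guess at the idea, not the actual game code (which was Unity, not Python), and every name in it is hypothetical. A string-keyed hierarchical lookup builds and hashes strings on every call, while resolving once at load time leaves the hot path reading plain fields.

```python
from dataclasses import dataclass

# Hypothetical property file contents, already parsed into dotted keys
PROPS = {
    "tower.attack_range": 3.0,
    "tower.cannon.attack_range": 5.0,
    "tower.cannon.damage": 40.0,
}

def lookup(props, path, name):
    """Hierarchical lookup: try 'a.b.name', then 'a.name'.

    Every call splits and joins strings, which is the kind of work
    that hurts when it runs many times per frame.
    """
    parts = path.split(".")
    while parts:
        key = ".".join(parts) + "." + name
        if key in props:
            return props[key]
        parts.pop()  # fall back to the parent level
    raise KeyError(name)

@dataclass
class TowerStats:
    attack_range: float
    damage: float

def load_stats(props, path):
    """Resolve each property once at load time; per-frame code reads plain fields."""
    return TowerStats(
        attack_range=lookup(props, path, "attack_range"),
        damage=lookup(props, path, "damage"),
    )

cannon = load_stats(PROPS, "tower.cannon")
print(cannon.attack_range, cannon.damage)
```

The flexibility of the text files is kept, but the string handling happens once per object instead of on every query.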
Opportunity missed, and that was the final nail in the coffin for my fledgling game studio.
One application was a web application built in C++ in the 90's. It didn't have the STL, it implemented everything from XML parsing to PDF rendering from scratch. It stored all data in XML files on the file system. It was a single-threaded CGI application. And it was the core product of the small business that created it.
There was no series-B/C/D/E that was going to appear so we could hire more developers and re-write everything or develop a new, superior product, etc. This is where I learned how to maintain and extend legacy software. I spent hours poring over Michael Feathers' book. We did manage to extend and breathe new life into the system. We wrapped the old code in Python, wrote a tonne of integration and unit tests on every change, wrote some code to sync data to a database alongside the XML file storage scheme it used. We even got to a place where we started replacing code paths from the Python API with functionally-equivalent (as far as our test suite was concerned) code written in nice, clean Python (and gained some features along the way thanks to Python's nice libraries!).
We kept the lights on without having to spend too much time hacking on undocumented, untested C++ code and without trying to just re-write everything. It was much more difficult to make progress than a typical greenfield project in a dynamic language but that would've cost more upfront without a clear payoff... so we did what we had to do.
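The "functionally-equivalent as far as the test suite was concerned" approach boils down to something like the sketch below: run the old and the new code path over recorded inputs and refuse to switch until they agree. This is a generic illustration, not our actual harness. `legacy_slugify` stands in for a call into the wrapped C++ code, and both functions are invented for the example.

```python
import re

def legacy_slugify(title):
    # Stand-in for a call into the wrapped legacy code (hypothetical behaviour)
    out = ""
    for ch in title.lower():
        out += ch if ch.isalnum() else "-"
    return out.strip("-")

def new_slugify(title):
    # Candidate replacement written in plain Python
    return re.sub(r"[^a-z0-9]", "-", title.lower()).strip("-")

def find_mismatches(old, new, recorded_inputs):
    """Return (input, old_output, new_output) for every input where the two disagree."""
    mismatches = []
    for value in recorded_inputs:
        expected, actual = old(value), new(value)
        if expected != actual:
            mismatches.append((value, expected, actual))
    return mismatches

# Inputs would normally be captured from production traffic or old test data
RECORDED = ["Hello, World!", "PDF  export", "v2.0"]
print(find_mismatches(legacy_slugify, new_slugify, RECORDED))
```

The catch is in the name: the two versions are only known to be equivalent on the inputs you recorded, so the quality of that corpus matters more than the harness.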
Another company? Well they decided to use a document-based data storage system as the source of truth in a hot-new micro services architecture that was going to save everything... only there was no schema validation and their use cases were killing performance in some scenarios. Random breakages caused by changes at a distance. It hasn't killed their business but it has limited their options.
"Working Effectively with Legacy Code"
We're slowly killing (i.e. no big new developments, but only maintenance for existing customers) and abandoning it. And luckily we're not rewriting it. :-)
Although at that point I wouldn't call it technical debt. If you've a million lines of spaghetti code, then you've a million lines of spaghetti code not technical debt. I.e. a camel is a camel. It's not a horse with technical debt.
Nice line though, made me smile.
The second one was a mobile app that was originally ported from some legacy J2ME app and "gotten to work" on the iPhone platform. It was pretty much a straight port, data structure by data structure, from Java to Objective C, and didn't really use the platform properly at all. For example, each and every control was hand-crafted to mimic the original J2ME app, rather than using the built in UIs that iPhone provided. It got to the point where nobody could touch it without it falling over, and no senior person was willing to work on it anymore. I was senior enough in my career at that point that I could insist that a complete re-write was the only way to go. We did that successfully and the previous pile of technical debt was killed.
Those examples aside, I'd say that almost every place I have worked suffered from technical debt to a large degree. The common theme was a huge legacy code base that suffered for years (decades) from repeated "just cram it in and get it to work" abuse. The metaphor I always like to use is: No home builder on earth would, when their requirements were to build a 5 story apartment building, take a single story single family home and just add 4 floors. But seemingly every company building software attempts to do this.
I worked on an EDA product for a short time. Its code base had a few components, some of which were written by the product team, and some of which were copied (without source history) from other teams. The team decided to make a cloud version of the product too, and to do that, the team split in half and each new team had their own copy of the code base.
No sharing code, because those other teams were our competition. There were teams working on components for our product, and another team working on a copy of our product, but nobody shared code. So bugfixes from one of the components would never make it into our code base. (The components were products in their own right.)
The code base was ~30 years old, and it had "survived" a port from Unix to Windows. The build system was a hellish nightmare of Cygwin, makefiles, Perl scripts, hard-coded paths, and unsupported internal tools, and after ~2 hours of manual fiddling it spat out a build. A few million lines of code, which according to my tests, would take ~10 minutes to build with a proper build system. I have a proper rant about the horrors I saw in the build system, I had never imagined that a build system could be so bad, and whenever I mentioned the name of the internal tool it was built on, company veterans on other teams would recoil.
Meanwhile, I witnessed how much damage the other developers were doing to the product. Some were trying to catch up with feature requests, some were adding hacks to the core algorithm to try and improve numerical stability, and some were doing real damage to the code base by adding buggy and poorly designed features--think major performance regressions, memory errors, and deadlocks fixed by adding calls to Sleep().
I tried to improve things as much as I could there. The company culture had some upsides (good work/life balance) but the internal competition strangled innovation, the wrong people made technical decisions, and the team didn't have a balanced set of skills.
The product hasn't been "killed" per se. It's still for sale. But it's the walking dead. Being EDA, licenses can run into five figures per seat, but the product's revenue was on a downward trend last time I checked. I have a couple friends that still work on the team but I'm encouraging them to apply for new jobs. I left when I got an offer from a major (big five) tech company.
I've been on two lack of refactoring/standardization projects. If you use rails, like back in the ver 1.0 era, you can't just stop updating and do something else, you have a tiger by the tail and if you don't keep up you'll never, ever, be able to catch up ever again. You can't go from 1.0 era to today where today is any time in the last five years. Scrap and complete rewrite. Of course management doesn't understand dependency trees and going back to OS and libraries from 2007 means rolling back 10 years of security patches or hand compiling everything and it would be a lot simpler to just rewrite.
I've been tangentially involved in a poor leadership situation where basically an entire department was forced out by a new leader, taking all their domain specific knowledge with them, and then the consultant friends brought in bled the company dry killing it in a race between expenses of consultants and smelly code being unusable. On paper the company died because of the financial load of switching completely over to outsourcing ("We're not a software company so we will not have developer employees anymore ... but we will have twice as many consultants working two thousand hours per year for five times the pay temporarily")
I've never experienced the parallel development trap, or lack of test suite, those must be interesting.
One of those products was released half a year late and turned out to be a poor market fit. The company closed several months later. It could've used this half a year to complete a pivot with another product, which could have been successful.
All software has some technical debt but you can have more or less depending on how much effort you or your organization takes in reducing it or avoiding it from the start.
About 1986 I was tasked with moving a small block (a few KB) of data very quickly from cabinet A to B, with the racks full of custom electronics - no PCs, all original stuff on a flight sim with 386 Intel processors all over the place. The racks had Multibus backplanes.
I suggested a 'TAXI' fast optical link (oooh - optical..too radical) or a pair of Intel 589 (Ethernet) cards for an off-the-shelf solution. Nope, too expensive. Engineering Management suggested a twisted pair ribbon cable between the two adjacent racks - um, OK..
Long story short - me and the senior design engineer decided to use the Intel 8257 DMA controller chip to grab the bus and blast the data between the RAM on two cards.
After a short period of fails, we found that the engineers who designed our 386 cards did not bi-directional buffer the DMA request line onto the backplane as they never expected any other card except the master CPU ones to initiate a DMA, so the CPU cards could not see the line being toggled from elsewhere.
Engineers would not accept a change request for 'reasons'
Intel 589 cards is it then!
All because someone chose to omit one tristate buffer.
my product was http://www.teamkpi.com/
I hired 3 mid-level PHP and jsp developers in Thailand and had them make the website + reporting page.
total nightmare. Don't hire developers and assume that they will rise to the occasion (learn new tricks). I gave them as much time as they needed to research and make sound engineering decisions, I ended up with a spaghetti nightmare Frankenstein mix of server side scripts mixed with client side script mixed with server side that generates client side script.
In Thailand at least, you always need a manager to force architecture and design decisions, and force devs to refactor poorly thought out solutions.
I was naive and thought that I could have a team of 3 figure out the web part while I write the desktop client and provide PM-level guidance.
Technical debt destroyed the team.
A rewrite was started, but never got anywhere. The company folded under the weight of its massive salary costs.
I don't think technical debt alone will kill you. But it may render you unable to cope with another problem, which will then kill you.
Because our product was customized per customer - not just look-and-feel, we coded their business rules into it - we had problems scaling. Further, our design didn't lend itself to rapid development.
First our sales team left. Then developers started to leave. Within a couple of years after I left, the company folded. Many good experiences had in that company. Many lessons learned.
It took a year to build an index from a new crawl, and they were only doing incremental "freshness" updates in between where they updated certain pages. It was a fiasco.
I'm using this definition of technical debt.
Technical debt is a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution.[1]
[1] https://en.wikipedia.org/wiki/Technical_debt##edit
The project I am referring to was a consumer web project. No mission critical type data.
In my experience (mostly consumer web, social media management for B2B) I have seen causes of failure to be heavily weighted towards product issues not engineering. The one big success I was a part of had the most technical debt. :) But that's probably because it lived the longest (and still lives today!). My experience is limited to being an employee at 4 different tech companies and several failed attempts of my own.
The former has happened to every project I know, which doesn't die for another reason (market disappearing, etc). The latter I have not experienced.
I worked for a dotcom back in the day. The underlying tech was appalling, to the extent they couldn't actually stay on the internet. We fixed this. They then proceeded to vastly expand the site, but every part of it was written as a special snowflake.
Five years later, they decided on a rebrand. This was literally a reskinning of the existing site. It took 50 odd people, ten months and more overtime than you care to think about. Many of those people were contractors.
Not only was this a huge expense, it prevented us from making the site any better. It was an extremely competitive market and the other sites ate our lunch. The next year was to be round after round of redundancy, a financial statement that wrote the value of the business down to its cash reserves, and finally a sell-off.
Ironically, the new owners turned it back into a viable business, but they had a very different attitude to technical debt.
The original architect, despite his outward emphasis on correctness, was really not a great developer or a good technical lead. He insisted on writing "perfect" code, ignoring the rest of the development cycle or the people he worked with. Regardless, he left last year for reasons I am not really aware of. I knew he hated me, since I didn't share his obsession.
Fast forward to today ... the applications work. There are no more features planned for them. But it is frustrating to go back into the code because it is utter garbage. I am the only one in my company who has any deep understanding of it. Those who tried to understand it also struggled with it.
We are taking technical debt in the new application we are working on, but, we have been planning for it to happen. I have the chance with the devs I am working with to get this "right".
Throughout my career, the sense that I got was that people did not understand something critical: Owning code is a full-time job even after it is written. I never worked with technical leads that asserted that. They always made passing nods to the idea that we should comment more code or write more tests, as if it is just some tired dogma to follow, but no one has made a case to me for being clear on knowing what you are accomplishing, how one accomplishes it, planning to own the explanation of what the code does and what trade-offs were made and a plan to pay back the debt that was taken out. As tech lead now, I am continually communicating this; the progress is starting to show. I am just glad now my company is recognizing these needs too and is understanding the investment that comes with maintaining their major product.
I've long thought that "premature optimization is the root of all evil" is sort of a waste of breath... not because it's wrong but because over-engineering is a far greater problem today than premature optimization.
Over-engineering is a plague in modern software. Most of the failures due to technical debt that I've seen involved cases where "smart" developers built swiss army chainsaws to do things that required a hammer. This also often results in products that require orders of magnitude more resources than they should, which makes cloud vendors like Amazon and Digital Ocean a lot of money I guess.
I think I've worked for a company that did essentially mostly screw itself with technical debt, though.
They originally had overseas contractors write parts of their product without having a proper developer to assess and vet the results or requirements and it resulted in zero separation of concerns, business processes and display logic combined, and a terrible database structure. Eventually they wanted to move from their original layout and page design to a new one and it proved almost impossible without multiple developers spending around half a year. And even that was wasted effort because at the end of that they realized they wanted to actually have a mobile site and they still hadn't really created a good separation of concerns... Of course the whole time engineering and development wanted to refactor the business logic into separate code from the other layers, but management didn't want to expend resources on non-customer facing dev work.
I think the product still exists, but they've killed their momentum.
Government customers rarely have the ability to independently assess what something should cost if done according to industry best practices. So what you do is bid low, with an extremely short time horizon to release. You get the contract, and now have license to run up a lot of technical debt, because it needs to be done fast and cheap.
Now you get to maintenance phase. That's where the money is. The typical government customer is never willing to spend even one penny on paying off the principal on technical debt, but will make the installment payments forever. They will spend $100k on one new feature, but not even $0.01 on reducing the price of new features. Many still use SLOC as a management metric. So you run your codebase up to a million lines of code, when the software itself is just another glorified CRUD app. For bonus points, you give yourself 100% test coverage on a bunch of functions that have boolean "isUnitTest" parameters.
I imagine this is similar to how VC firms loot companies by manipulating their financial structure. I find it to be extremely unethical, but there is literally nothing I can do individually to put a stop to it.
I wrote a complicated and horrendously ugly scraping PHP script with hundreds of regexes to generate a PDF file from a CMS. It worked for about 90% of all articles, and the rest needed only a few manual fixes in InDesign.
I always intended this as a stopgap measure till we implemented a clean document structure, from which both the web page and the PDF could be created. But that never happened. Then I left the company.
A few years later the company made a redesign of the website and totally scrapped the PDF feature.
However I still sometimes see these PDF pages out in the wild, mostly saved by the authors of the articles on their own web sites, because authors are allowed to link to their own content for free. It's a shame because I think these PDF versions of the articles were beautiful. But as you would say, technical debt killed the feature.
Another project, for an NPO, was written in Python using an ancient back-end. When the shared hosting provider made an upgrade, the back-end died, and I was not able to repair it and I was not paid enough to invest too much time. I bluntly told the board that their project was as dead as a dodo, and they accepted that.
I once worked on a system that was so messed up that fixing any bug would create 2 more. At one point, the entire team was only working bug fixes for 6 months straight and in the end, we had more bugs than we started with.
We tried to refactor the code several times but it was just so fucked up.
This was the stuff of nightmares. Most of the module consisted of one class with 50,000+ lines of VB.NET code in a single file.
Global mutating state referenced in functions everywhere. Functions that were 500-1000 lines long (today I have ESLint limit functions to 10 lines).
After working on it for over a year, I recommended to my boss that they initiate a complete re-write and made it clear that in my estimation there was NO saving this code base.
I got moved to working on single page applications in JavaScript, but when I quit that job and moved on roughly a year and a half later, that code base was still in production, generating ever more bugs for that team.
feature requests with no mind to maintainability/cleanup -> tech debt -> slow dev. velocity -> failure to deliver features for the businesses -> failure to retain/grow/compete
A lot of these are game mod related. Since hey, even the base game is a black box loaded with the technical debt of a team that could have been gone for decades. And the very tools and patches you're using on top of it were also written by people wanting things done the 'quick' and 'easy' way over the 'right' way. Usually because said people constructed the tool in question for their project, and had designed it specifically for the environment they were working in rather than anyone else's. So you'll often see such a project completely fall apart because you don't understand all the code you used or how it interacts with the other stuff you used without understanding it.
But a few were actual work projects for clients. These fell apart because the following sequence of events occurred:
1. A coder was hired to work on the system and had a very different coding style to everyone else in the company. They thought they were being 'smart' but had overengineered the project by about a hundredfold.
2. They got sacked without telling anyone else how the project was constructed or why it was built that way. So developer B took over.
3. Developer B tried to 'rewrite' the system completely, but ended up merely creating a hodgepodge of his work and the other developer's work that ended up being rather unstable.
4. More features were requested by the client (which the system wasn't designed for), so three more developers each added them on independently. None of this work was commented, documented anywhere or stored in version control, so they bolted the extra features on, tested just that one part of the system and claimed it worked fine.
5. Project ran into large numbers of bugs, often ones which crashed the system or took down the database for a while. Multiple times a day, whatever developer was free would have to apply patches to whatever random thing stopped working in the last few hours.
6. Everyone ended up complaining that the system didn't work. Or that it should be rewritten. Or that it wasn't what the client 'wanted' at all despite the latter having changed their plans three times this week.
Either way, what should have been simple websites turned into giant unwieldy messes that no one developer understood the full design of. Which sat in endless limbo while developers ran around trying to patch up problems caused by no one having a coherent plan for the whole project.
From that point on, the architecture froze. It was still possible to keep creating new features on top of it, but I am sure the architect would have been able to provide both the guidance and the new solutions I feel may be needed. No one else picked up his tasks or views, nor was the architect position filled.
The problem was that the original programmers didn't understand how to program with a database, and management was unwilling to address the core design flaws.
As a result, upper management told us we missed our market window and the project was killed. In reality, it was the technical debt from not understanding how to correctly write a data access layer that made us move too slowly to meet our market window.
The previous project had some issues, but they were fixable with refactoring while continuing feature dev. The rewrite was done over my objections. New management had been hired and they were looking to make their mark. I left before it could all come crashing down.
A typical scenario is:
1) Product becomes too hard to change but has many existing customers
2) Product is off-shored, and costs of maintenance are under-estimated because it depends on a vast technical ecosystem
3) Alternative products are developed to replace revenue of the original
The original product can live in this moribund state for decades.
More recently, another one is being killed by tech debt: the tech is so old that only a rewrite can really save it, and the income from it is just not high enough to warrant one.
(the former was written by me at first before the company really took off and the latter was an acquisition)
- opportunity cost for the product since you aren't fast enough to chase new revenue channels, this cost is almost invisible,
- the product can become plainly unable to run on newer platforms because of debt,
- unpleasantness of the codebase that will lead to losing engineers.
It's a slow and excruciating process.
The company IP and employees were absorbed by the highest bidder and the company lost its funding after the CEO and CTO left.
Typically these projects are able to limp along until some other forces kill the company.
It's still around and still being maintained but it's a shadow of what it once was.
I've worked at multiple other companies which have gone through rounds of expensive rewrites.
I've also upscaled databases and programs to modern technology.
Shockingly enough, the “project” was a gigantic Excel spreadsheet. Accumulating debt on a thing like that is actually very easy.
People are reluctant to change them so they don't adapt.
I used to work for a company that specialized in solar panel data. I was hired to help the company catch up on their client queue (those waiting for the product to be set up). When I first started they were behind by about 140 clients, and I got the queue down to about 30 clients remaining.
Our company would sell solar panels to corporations, and then they would put kiosks in the lobbies of their buildings so their clients could come take a look at the energy savings and other information about the building. It was a very popular way of doing things, especially in cities like New York, Chicago, and Los Angeles. My job was designing the graphics for what was displayed on those kiosks and then syncing the data with the graphics.
There was a platform that did the work, but I still had to design the graphics and type in all the serial and model numbers to sync everything together and show it in chart form in a gorgeous view. Anyway, for years this was done in Flash. Our company had been progressive and dominant; at one time their technology was the best, and it had even received several awards. Unfortunately, over the years HTML5, JSON, and other technologies became better and faster. They could read data from a database just as our system could, but production turnaround was faster for others. It took us about a week to develop, set up, and sync everything from system to database, while these other companies were getting everything done in days. I think a huge part of the problem was that my company had these corporations interested but forgot to hire enough people to actually do the work.
We did have an in-house software developer working on improving this technology and upgrading the software, but they seemed very reluctant to adopt the new software quickly. I had worked in the new software for a few days when, all of a sudden, they called me into an office and laid me off. A few months later, everyone else followed, and today I think the company only holds on to about 2 or 3 employees who just maintain all the kiosks and software because of the existing contracts. Or maybe they went under... their website doesn't even work anymore.
Why did they go under? We were warned that technology was changing fast and that we needed to do something about it, and it was just the reluctance of the CEO to adapt the technology and push it out faster. Our competition slaughtered us. Luckily, it was my second job, so I worked out a great deal with them on letting me go: I got about a month of vacation time on the condition that I wouldn't file for unemployment. Obviously, already having another job, it worked out in my favor.
I ran the project for the first four years. It was my first large project and the first time I had run a team of any size. I made a few mistakes that might be worth learning from, but my mistakes weren't the only ones responsible for the debt.
The largest driver in the technical debt issue was the timelines. My boss was new to software development, and his expectations were not in line with reality. We frequently had to ship ad hoc features under tight deadlines to keep him happy. When I pushed back on the timelines he became unhappy. So we acquired debt in the form of 1) hastily designed sections of code that don't lend themselves to scalability or refactoring 2) a lot of small code smell issues that by themselves don't amount to much but in the aggregate form a kind of surface scum that makes future development and more importantly testing difficult and thus brittle.
The debt, and the accompanying bugs and delays it caused, eventually led to enough dissatisfaction that I was taken off the project (I am in a weird outside position now, sort of half on half off, supporting a tangent of the software but not working on the main branch or involved in architecture discussions, planning, or execution of the new replacement). A new manager was hired. We disagreed over strategy, so I was taken off the project.
He wanted to start over from scratch, which is what they are essentially doing now. The plan is to pattern the new solution off the old one by incorporating all the business rules (which they want me to document), but they have very different implementations in mind. They don't want to reuse existing libraries (some of which is driven by a lack of familiarity or understanding of those libraries, how they work, why they approached the problem the way they did).
They face some significant challenges:
1) Their coding velocity is slow. More than 60% of the original team has left and been replaced, so a large part of the team hasn't been on the project for more than 2-3 months. This means that there is a significant dearth of institutional knowledge. I think a decent pace is good for development of software, but they aren't starting from scratch, and there are a lot of expectations to meet.
2) Because of the dearth of institutional knowledge, when they do start implementation they are going to end up repeating many of the mistakes made by the old team. I have limited insight into what those are, but based on what exposure I do have, I can see planned missteps already.
3) My boss, the owner of the company, has learned some patience, but they've been at the rewrite for nearly 6 months and have little to show for it. In terms of feature parity with the old software, they are severely lacking. I don't expect he is going to be willing to wait another full year to get an app that does essentially what he already has, only differently, even if that architecture is more flexible.
In all honesty, I hope they succeed. My boss is a good friend and this experience hasn't ruined that. I have and am dealing with some resentments, but I don't want him to fail. And because this project was my baby (so to speak) there is a part of me that wants it to live.
----------------------
In retrospect, I'm not entirely sure what I should have done differently. Had I ignored the requests for ad hoc features, I likely wouldn't have made it as far as I did, because without those features he would have pulled the plug. The company grew at an exponential rate the first 2 to 3 years of that project, and part of what fueled that growth was the ad hoc, fast turnaround times of my dev team. We were incurring debt, but we were also making big gains.
If I was going to do this all over again here is what I would do:
1) I would push back more, in smaller increments, placing more emphasis on getting the details right before shipping.
2) I would push back more on requirement creep. My boss would frequently have these "great" ideas that he would insist we work on, which would get half done then discarded, leaving the code base littered with dead ends that needed cleaning up later. Part of the debt we acquired stemmed from the fact that, in order to keep a lot of those activities from impacting ongoing efforts, I built an architecture that was somewhat disjointed. The lots-of-little-islands approach meant we had apps that were reinventing the wheel, taking different approaches, etc.
3) I would place a greater emphasis on automated testing (fewer human testers, more test engineers).
The other aspect to this is the fact that when you are creating software to solve problems that don't currently have software solutions, you spend a fair amount of time going down trails that don't pan out. The R&D aspect left us with bits and pieces of code in the code base that were incomplete or incompatible, but that were relied upon by one app or another.
We lost time discovering that certain approaches didn't work.
I think that's it... I hope someone finds the story useful.
All their "media" was in Flash format, so they made it a requirement, even though they could have saved themselves many tens of thousands of dollars in hardware costs by going to a mobile platform. (They did not have reliable access to networking in most of their deployments, and didn't want to bear the costs of rolling their own with cellular.)
So I wrote the app in an "AIR isolated thick client" mode. There were many problems with garbage collection and application freezes that could have been addressed had they funded migration to post-Adobe Flex (Apache Flex). But they ran out of funds for that.
Their platform was Win 7 laptops. The first hardware iteration was fine. The next year, when they started replacing the laptops, they hit a driver bug that caused the whole screen to freeze. With the next hardware iteration they were able to fix that, but they started having problems with AIR trying to download an update when they were not connected to a network (i.e. deployed in the field). AIR behaved badly in this instance and refused to run.
When they moved to Windows 10, Adobe hadn't updated the AIR player yet, so they had another botched deployment.
It was in our contract to hand over the source code, so they tried to hire their own developer. I spent many hours very thoroughly documenting the code, and they had no hours budgeted for ongoing support. Judging by some of the desperate emails I was getting last year, before my Program Manager told them to fuck off, I think my documentation was not enough. Of course: the Dev Environment setup was Win 7, Eclipse 3.x with the Adobe plugin. I maintain a VM of the dev environment, but I doubt even I could follow my own directions to set it up anymore since the Adobe SDK of that version is so difficult to locate.
Had they listened to my original recommendation to write the entire application in Java, they would still be running fine.
That was my most recent experience with "death by technical debt".
Many years ago, a product died simply because a competitor bought our company and tried to sell both products (because theirs was a "consumer market product", and ours was "enterprise market"). Over the next 18 months, they cut development on the enterprise product and tried to tart up the consumer product to meet the needs of our enterprise customers. I was a major account manager, so I watched one by one as our frustrated customers dumped everything our company sold and went to the other competitor. So that product wasn't so much killed by technical debt as it was killed by moron MBAs.
In the end, all of those products are obsolete because nobody uses dedicated backup tape library software anymore. It became a very small market because tape backup hardware never got commoditized, and prices just never came down from "insane". Even blank tapes were more expensive than a removable hard drive. Poor people just back up to the cloud, pay rent to someone else for their own data, and end up losing it.
I can probably think of about a dozen other examples from my long and miserable career in software.
Another was a rewrite of an application that had been very popular, but was written by someone who was actively learning on his own (a talented developer who later learned to write very clean and well organized code). It was, as I like to call it, "superglued" to the server. SQL statements that could have been handled with a join were instead handled by querying all the ids, running through them in a loop, building a new query, getting results, and stashing them in an array, one by one. When those queries took too long, they were run in the background or just crashed the system. Changes were all made in place, on the prod system. There was no build, no archive, no nothing (this was in the early 2000s; even back then, this was a no-no, but it wasn't quite as shockingly unusual as it is now). I was part of a team that tried to rewrite it, but halfway through, the organization got frustrated with the time and expense and terminated the project.
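For anyone who hasn't run into that query pattern, here's a minimal sketch of the difference, in Python with an in-memory SQLite database. The `customers`/`orders` schema and the function names are made up for illustration; the original app's tables and language weren't described.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (hypothetical schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
    """)
    return conn


def totals_looped(conn):
    """The pattern from the story: fetch all ids, then query again per id."""
    results = []
    ids = [row[0] for row in conn.execute("SELECT id FROM customers ORDER BY id")]
    for customer_id in ids:
        # Two extra round trips for every customer: this is what stops scaling.
        name = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()[0]
        total = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()[0]
        results.append((name, total))
    return results


def totals_joined(conn):
    """Same result from a single statement that lets the database do the join."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
    """)
    return list(rows)
```

Both functions return the same rows. The looped version issues a number of queries proportional to the number of customers, while the joined version always issues one.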
Interestingly, I think there's almost always a psychological or political factor in play when a project is killed purely by "Technical debt". In theory, debt itself shouldn't really play much of a role in whether a project is worth pursuing, because it's all sunk cost. If a project has a positive enough ROI that it would be worth pursuing as a greenfield project, then even the worst case scenario, trash everything and start over, is a net win.
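To make the sunk-cost point concrete, here's a rough sketch in Python. The function name and all the numbers are hypothetical; the only claim is that money already spent doesn't enter the forward-looking calculation.

```python
def remaining_net_value(expected_value, cost_to_repair, cost_to_rewrite,
                        already_spent=0.0):
    """Forward-looking value of continuing a project (hypothetical model).

    already_spent is accepted only to show that it is ignored: it is sunk.
    The worst case is bounded by the rewrite cost, because you can always
    trash everything and start over.
    """
    cheapest_path = min(cost_to_repair, cost_to_rewrite)
    return expected_value - cheapest_path


# Made-up figures: repairing the debt-ridden code costs more than a rewrite.
fresh = remaining_net_value(500, cost_to_repair=900, cost_to_rewrite=300)
indebted = remaining_net_value(500, cost_to_repair=900, cost_to_rewrite=300,
                               already_spent=1_000_000)
```

With these figures both calls give the same positive number, so on paper the project is as worth pursuing after the debt as it was as a greenfield idea.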
However, I think what happens is that some people never wanted to see the project happen in the first place, or people get very frustrated with the expense, or people start to suspect that the failure was inevitable and that the technical problems are just a distraction. Think Jurassic Park (which in many ways is a story of a software project failure, a theme that is much stronger in the book). At the end, during the "post mortem", some of the characters think that the problem was in the approach. That with a bigger budget, less dependency on a few people, less corner cutting due to a lowball bid from a software developer, enforced through threats to reputation and lawsuits, that with a better approach it all would have worked. On the other hand, you have the chaotician, who insists from the start that a project like this will fail inevitably.
My guess is that if it was ever worth pursuing, it is always worth pursuing. The problem is, we can never really tell. Sometimes failure due to technical debt is taken as a sign that this failure was inevitable. Other times, it isn't.