Ask HN: Good resources about legacy code?

56 points · ASVBPREAUBV · 8y ago · 38 comments

Hello HN: I got offered my first consultant job for a company with a really old and bad (no documentation, spaghetti, monolithic...) PHP codebase. Most parts of the codebase are working fine in production, but some parts have to be replaced. Can you recommend any good books/papers/websites on how to get started? I don't need language-specific material; I need methodical, abstract advice.

wirthjason · 8y ago

I second Working Effectively with Legacy Code. A lot of the advice comes down to writing tests so you don’t break existing functionality. You should write a lot of tests, particularly high-level ones that test the entire system, because with tightly coupled systems you’ll modify part A but part Q will break. Integration-level tests help find this stuff out.

You have two problems on your hands. One is understanding what the code is doing from a technical perspective; the other is understanding the business rules.

If you haven’t already, get a high level view of the system. Maybe it can be divided into 4 chunks, and chunk 1 can be broken down into 8 components, etc. Then start documenting the different components in the codebase. Try to understand what the different components do — how are they called, what’s the input, output, do they mutate objects, etc.

Once you have a road map, you search for “seams” where you can break things apart. Maybe components A, B and C are tightly coupled, but you can split A into two parts (A1 and A2) and write something that encapsulates all of them (A1, A2, B, C) into a cleaner interface. Try to write wrappers that use existing code; then you can have higher confidence that behavior isn’t changing. If you rewrite low-level components, there’s no telling what the side effects may be.
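To make the wrapper idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the three `legacy_*` functions stand in for tightly coupled existing code, and `InvoiceService` is an invented name for the cleaner interface. The point is only that the wrapper calls the old code unchanged, so behavior should stay the same.

```python
# Hypothetical legacy functions: names and behavior are invented for illustration.
def legacy_load_order(order_id):
    return {"id": order_id, "items": [("widget", 2, 5.0), ("gadget", 1, 12.5)]}


def legacy_calc_total(order):
    return sum(qty * price for _name, qty, price in order["items"])


def legacy_format_invoice(order, total):
    return f"Invoice #{order['id']}: {total:.2f}"


class InvoiceService:
    """Thin wrapper (a seam) that hides how the legacy pieces fit together."""

    def __init__(self, load=legacy_load_order, calc=legacy_calc_total,
                 fmt=legacy_format_invoice):
        # Collaborators are injectable so a test can substitute fakes
        # without touching the legacy functions themselves.
        self._load, self._calc, self._fmt = load, calc, fmt

    def invoice_for(self, order_id):
        order = self._load(order_id)
        return self._fmt(order, self._calc(order))


service = InvoiceService()
print(service.invoice_for(42))
```

New callers depend only on `invoice_for`, so the pieces behind it can later be replaced one at a time.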

Lastly, learn the language well. I work on a similar code base, but it’s in Python. Knowing “advanced” features of the language has helped. Often a lot of boilerplate code can be eliminated by an advanced language feature. By knowing the “seams” of the system and the language, you can bend the system to your will.

matt_the_bass · 8y ago

Can you share a Python example of replacing boilerplate code with advanced features?

auxym · 8y ago

Context managers come to mind; they can often save a lot of error-handling code.
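A minimal sketch of what that can look like, using `contextlib.contextmanager` from the standard library. `FakeConnection` and `transaction` are made-up names for illustration; a real database connection would take the place of the fake one.

```python
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


@contextmanager
def transaction(conn):
    """Commit on success, roll back and re-raise on any exception."""
    try:
        yield conn
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()


conn = FakeConnection()

# Success path: the block finishes normally, so the transaction commits.
with transaction(conn):
    pass

# Failure path: the exception triggers a rollback and then propagates.
try:
    with transaction(conn):
        raise ValueError("boom")
except ValueError:
    pass

print(conn.log)
```

The try/except/commit/rollback boilerplate then lives in one place instead of being repeated at every call site.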

maxxxxx · 8y ago

I have learned that it's good to approach something like this with a level of humility. What looks like a big pile of spaghetti code may actually have a structure, just not one you may like. I often think "How stupid can these people be?" only to learn later that they actually had a reasonable design. It pays to take the time to understand the code.

Otherwise I'd try to refactor the code into testable modules as much as possible. Unfortunately, PHP is not on your side when it comes to refactoring. Especially in older versions, people used a lot of "tricks" that make refactoring hard.

yarinr · 8y ago

"Working Effectively with Legacy Code" is a classic read. It dates back to 2004 but the techniques are still relevant.

mirceal · 8y ago

It’s on my list of books that every developer should read.

djuralfc · 8y ago

What's the rest of that list?

pdkl95 · 8y ago

> spaghetti, monolithic

https://www.joelonsoftware.com/2000/04/06/things-you-should-...

It can be tempting to declare entire sections of legacy code - or even the entire project - to be an unmaintainable mess that isn't worth fixing. Reading code for a true understanding of how it works can be a slow, laborious, annoying process. Simply rewriting everything from scratch often appears to be easier than trying to read, understand, and fix a legacy pile of spaghetti.

Giving in to that temptation and rewriting a project instead of fixing and refactoring the existing code is almost always the wrong decision. The messy spaghetti probably started out a lot cleaner. The mess that accreted over time is often important bugfixes and design changes. Some problems only show up in the field and sometimes requirements change. The spaghetti of bugfixes, workarounds, and changes might be the most valuable part of the codebase. Throwing it out might be throwing away the accumulated knowledge and experience of many expensive developer man-years.

Instead of rewriting, preserve the bugfixes and real-world experience by refactoring the spaghetti. It can be annoying and tedious, but it's probably less work than re-debugging old problems until the "clean rewrite" accretes its own spaghetti-like layer of bugfixes and workarounds.

8ezhikov · 8y ago

My advice would be: don't put your soul into it. I have had lots of projects supporting/rewriting legacy code. The less emotion you invest in it, the better. Otherwise it will be torture. One day, when you get the chance to rewrite it, you will be exhausted and empty.

noir_lord · 8y ago

Are you me?

This is the exact position I was in 6 months ago and it wasn't just legacy code (legacy code isn't inherently bad) it was/is bad legacy code.

https://leanpub.com/mlaphp is very very good, it's a roadmap/process to get a legacy PHP project to a reasonable state efficiently.

SeanKilleen · 8y ago

If pieces of a codebase need to be changed, I break it into a few general steps (varies depending on specifics):

1) Look at the code and examine what it would take to make it testable.

2) Make tiny, safe refactorings to prepare the code to be tested -- only if you are absolutely certain these changes can cause no side effects (usually I'll rely on tools to help me do this, just to increase the confidence level). If you can safely extract related code into appropriately small & related files / objects, that can be a great start.

3) Put the existing code under tests. Write tests around that code -- preferably unit tests that exercise the legacy code, as written, to verify its behavior.

4) Refactor the code that the tests are using to represent a cleaner codebase -- maybe you extract some objects / functions.

5) Write new tests to demonstrate the desired changed behavior. Write them as if you're writing them for a brand new codebase.

6) Make the code pass those tests. Some old tests may fail -- if they represent the old requirement, you can delete them. If they don't represent the old requirement, you know you have a bug.
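As a rough illustration of putting existing code under test, here is a sketch of a "characterization" test in Python's `unittest`. The function `legacy_shipping_cost` and its rules are invented for the example. The expected values are whatever the code returns today, not what a specification says it should return.

```python
import unittest


def legacy_shipping_cost(weight_kg, country):
    # Hypothetical legacy function, quirks included.
    if country == "DE":
        cost = 4.9
    else:
        cost = 9.9
    if weight_kg > 10:
        cost = cost + weight_kg * 0.5
    return round(cost, 2)


class ShippingCostCharacterization(unittest.TestCase):
    """Pins down current behavior so later refactoring can be checked against it."""

    def test_pins_current_behavior(self):
        cases = [
            ((1, "DE"), 4.9),
            ((1, "FR"), 9.9),
            ((10, "DE"), 4.9),   # exactly 10 kg carries no surcharge today
            ((20, "DE"), 14.9),
        ]
        for args, expected in cases:
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_cost(*args), expected)
```

Run it with `python -m unittest`. If a later refactoring changes any of these values, the test fails, and you can decide whether the change was intended.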

Repeat this process for each piece of the application that needs to be changed. NOTE: In a legacy app, you sometimes have to make peace with the fact that some of the app in production will remain legacy, untested code. If it doesn't need to change, then you don't necessarily need to sink a huge amount of time into refactoring it or putting all the code under test. Get in the habit of doing it whenever you need to make a change (and factor that time into any estimates, etc.) -- over time, the cost of a change will hopefully go down as you pay off that technical debt.

If you're trying to figure out what the cost of change will be, sometimes you can use static analysis to look at a codebase and show potential issues for a given section. Using such tools can sometimes help you understand how heavy of a lift it will be to modernize the code.

As has been suggested, "Working effectively with legacy code" is a guide book here. You'll likely also want to seek out language-specific material about refactoring, unit/integration testing patterns & tooling, etc.

Good luck!

Too · 8y ago

Since you are working with a dynamic language, step 2 should include adding type annotations, or whatever they are called in PHP. Specifying stricter types makes refactoring much easier.
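A small sketch of the idea, written in Python since that is the other dynamic language mentioned in this thread. The `Customer` class and `final_price` function are hypothetical. The annotations do nothing at runtime, but a static checker or an IDE can use them to flag call sites that break during a refactor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Customer:
    # Replaces an untyped dict/array that was passed around implicitly.
    id: int
    email: str
    discount_pct: Optional[float] = None


def final_price(customer: Customer, base_price: float) -> float:
    """Apply the customer's percentage discount, if any."""
    discount = customer.discount_pct or 0.0
    return round(base_price * (1 - discount / 100), 2)


print(final_price(Customer(1, "a@example.com", 10.0), 200.0))
print(final_price(Customer(2, "b@example.com"), 200.0))
```

PHP has had scalar parameter and return type declarations since version 7.0, so the same approach should carry over, assuming the codebase can run on a recent enough version.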

maxxxxx · 8y ago

Agreed. Adding type annotations is already a huge refactor in itself though and pretty risky with PHP. Once you have that done life will be much easier.

megaman22 · 8y ago

In addition to Working Effectively with Legacy Code, Re-Engineering Legacy Software[1] is pretty decent. It's really a lot of the same recommendations; put a test harness in place to guard against regressions, then start pulling out things and making them into more sane, SOLID components. For that aspect, you can look at things like the Uncle Bob Clean Code series, or my personal favorite Adaptive Code via C#[2] which despite the title is pretty general and all about writing code with the SOLID principles.

[1] https://www.manning.com/books/re-engineering-legacy-software

[2] http://amzn.to/2CfxK8w

carbocation · 8y ago

I've worked with Paul Jones in the past and he has actually modernized one of my own legacy PHP codebases. He wrote down his experiences and advice in "Modernizing Legacy Applications in PHP": https://leanpub.com/mlaphp

It might be worth your time. (Edit: noir_lord also recommended the same book in this thread.)

web007 · 8y ago

Others have already recommended Working Effectively with Legacy Code by Michael Feathers; that's the place to start. The only other suggestion I have is Refactoring: Improving the Design of Existing Code by Martin Fowler.

vincenv · 8y ago

"Object-Oriented Reengineering Patterns" has some good advice on how to approach a legacy system and rewrite parts of it. A PDF is available from the authors' website:

http://scg.unibe.ch/download/oorp/

w0rd-driven · 8y ago

This mirrors my experience joining a company when my prior PHP usage was also custom code. I struggled for about a month with the existing workflow of writing code, pushing it live to a hidden test area, and then getting feedback on the changes I made. Fortunately, Vagrant was newish and I learned of the site https://puphpet.com/.

I set out on a mission to recreate the production environment as closely and as primitively as possible. Instead of the full 16 GB legacy database, for instance, I only dumped the structure and added rudimentary test data. Now my primary workflow involved local, manual testing, but the feedback loop was orders of magnitude faster than waiting for Subversion changes to get deployed. Recreating the production environment 1:1 was fraught with large, annoying challenges.
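The "structure only, plus a little test data" idea can be sketched as follows. This uses SQLite from Python's standard library as a stand-in, so the tables and rows are hypothetical and the mechanics differ from a MySQL dump. The shape is the same, though: copy the DDL, skip the rows, then insert a few fixtures.

```python
import sqlite3

# Stand-in for the large production database (hypothetical schema).
prod = sqlite3.connect(":memory:")
prod.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users (email) VALUES ('real@example.com');
""")


def clone_schema(source, target):
    """Copy table/index definitions from source to target, without any rows."""
    rows = source.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (ddl,) in rows:
        target.execute(ddl)


dev = sqlite3.connect(":memory:")
clone_schema(prod, dev)

# Rudimentary test data instead of the real production rows.
dev.execute("INSERT INTO users (email) VALUES ('test@example.com')")
dev.execute("INSERT INTO orders (user_id, total) VALUES (1, 9.99)")
dev.commit()

print(dev.execute("SELECT email FROM users").fetchall())
```

For MySQL, `mysqldump --no-data` produces the equivalent structure-only dump.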

Barring full conversion, there are various techniques that require less effort. Using scratch scripts and running the local server in PhpStorm helps, but stripping code down to run locally can be cumbersome. Another option I took was getting lightweight functions working in an environment like http://sandbox.onlinephpfunctions.com/ and slowly integrating them into local scratch scripts or production.

Fortunately, the future at the company involved selling Laravel as a viable option, which makes everything so much easier. I'm a big proponent of frameworks or packages over custom code or NIH, as they often soften edge cases or work around quirks in the language.

PeterisP · 8y ago

The core idea is that you need to imagine what would be the condition of the artifact in question if it'd be reasonable to do the required modifications, and (slowly) push the environment towards that state, filling in the gaps.

IMHO it's to be expected that you won't get where you want to be, you'll fill some of the gaps but not all of them.

The key issue is to identify the major pain points that prevent you from doing this. For example, in a similar situation for me the key points were (a) the ability to reliably build a deployment package that's sure to work; (b) brief documentation about the functionality of the main components of the software and their interaction/interfaces; (c) creating a basic suite of tests to ensure that key functionality keeps working as intended if we change/rewrite certain parts of the codebase.

The pain points will be different for you, but that's the direction that needs to be identified and taken to proceed properly.

zer00eyz · 8y ago

This has been the bread and butter of my work life - I either inherit a pile of garbage or get a green field.

Your first chore: getting a working dev instance with a debugger. Depending on the stack and dependencies, this might be a mountain that is rather hard to climb, but it is going to make your life easier.

Second chore: look at the history you have. God bless you if you have source control, with commit history, and any sort of sane commit messages. Bug trackers are also your friend. Lastly, there has to be SOMEONE with lore/knowledge of how the code base got to be the way it is. Find them and TALK to them. Knowing WHY is almost as important as knowing what.

Third chore: pull the schema. Get a schema dump of the production database, and look at what query logging is set at (might not be sane) and what it should be (query sampling might be your friend). If you're lucky enough to have a MySQL setup, then use Workbench to help generate a diagram, or any other tool you prefer. You want to have an artifact when you're done, one that you should maintain.

The fourth and fifth tasks are going to occur concurrently: walking the code base and understanding or building in logging. You're going to walk the hot spots in the code base first: think home page, log in, and the core functionality. Every time you find something interesting, add a comment, and LOG where you think it is appropriate (remember you can always shut this off later).
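One lightweight way to add that kind of logging without editing function bodies is a tracing decorator. This is only a sketch in Python with invented names (`traced`, `login`, the `legacy.trace` logger), but it shows how the logging can be switched off later in a single place.

```python
import functools
import logging

logger = logging.getLogger("legacy.trace")


def traced(func):
    """Log every call and return value of func at DEBUG level."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("exit %s -> %r", func.__name__, result)
        return result
    return wrapper


@traced
def login(username, password):
    # Hypothetical legacy hot spot; never log real credentials like this.
    return username == "admin" and password == "secret"


logging.basicConfig(level=logging.DEBUG)
login("admin", "secret")

# Shutting the tracing off later is a one-line change:
logger.setLevel(logging.WARNING)
```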

Really this is an exercise in reading and understanding what exists today with as much context as to WHY as you can discover (see step 2). Don't be afraid of either using the @bug and @todo syntax in comments and opening up tickets against yourself/the codebase. You may end up with a list of 200 things to change in the first week and that is OK.

Once you can READ the code base as it exists, make sure you REVIEW the code you're replacing -- even money says that there are bug fixes and edge cases that have already been solved in that code, ones that your replacement may have to solve even if they aren't in the "requirements".

Lastly, find someone to commiserate with and someone you can "bounce ideas off of". Rubber ducking works up to a point, but sometimes, when you explain something to someone else, they ask the critical question and it sets off your thinking/exploration in a new direction. They don't HAVE to work where you do, but if they don't, having history with them (even at another job) sure does help.

twunde · 8y ago

Some advice from someone who has done the same thing multiple times: find yourself someone who knows the application really well and become their best friend. You'll find some weird things in the codebase, and often they'll be able to give you context.

ioddly · 8y ago

Some really good suggestions here, so I'll just add: become acquainted with a code search tool (I used grep for a long time, now ag -- I don't think the tool matters that much for most purposes as long as you are comfortable with it).

camnora · 8y ago

The Legacy Code Rocks podcast is pretty good. They have a nice community on Slack too: http://legacycode.rocks

mschwaig · 8y ago

It's funny how you can write something terrible now and put it into production anyways, skipping the process of slow but steady degradation altogether. BAM! Instant legacy code.

newbear · 8y ago

How did you get this job with no experience? How did you estimate the price and time to completion? Congrats and good luck

imhoguy · 8y ago

Possibly the OP has an hourly/daily rate gig.

ASVBPREAUBV (OP) · 8y ago

Yes exactly. My first task is to examine the system and give recommendations on how to improve it.

ASVBPREAUBV (OP) · 8y ago

I'm working for the same company as a developer on a different product. I have not estimated anything yet.
