You have two problems on your hands. One is understanding what the code is doing from a technical perspective; the other is understanding the business rules.
If you haven’t already, get a high level view of the system. Maybe it can be divided into 4 chunks, and chunk 1 can be broken down into 8 components, etc. Then start documenting the different components in the codebase. Try to understand what the different components do — how are they called, what’s the input, output, do they mutate objects, etc.
Once you have a road map, search for “seams” where you can break things apart. Maybe components A, B and C are tightly coupled, but you can split A into two parts (A1 and A2) and write something that encapsulates all of them (A1, A2, B, C) behind a cleaner interface. Try to write wrappers that use the existing code; that way you can have higher confidence that behavior isn’t changing. If you rewrite low-level components there’s no telling what the side effects may be.
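To make the wrapper idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: legacy_a, legacy_b and legacy_c stand in for the tightly coupled components, and the pricing logic and the OrderPricing name are assumptions, not anything from a real codebase. The only point is that the wrapper calls the existing functions untouched and hides their call order and shared state:

```python
# Hypothetical legacy functions, standing in for coupled components A, B and C.
# They communicate through a shared mutable dict, as legacy code often does.
def legacy_a(order, state):
    if order["qty"] <= 0:
        raise ValueError("bad quantity")
    state["total"] = order["qty"] * order["unit_price"]


def legacy_b(state):
    # Assumed 20% tax rate, purely for the example.
    state["tax"] = round(state["total"] * 0.2, 2)


def legacy_c(state):
    return {"total": state["total"], "tax": state["tax"]}


class OrderPricing:
    """Wrapper: one entry point that hides the call order and the shared state."""

    def price(self, order):
        state = {}               # the shared dict the legacy functions expect
        legacy_a(order, state)   # existing code, called as-is
        legacy_b(state)
        return legacy_c(state)


result = OrderPricing().price({"qty": 2, "unit_price": 10.0})
```

New callers only ever see OrderPricing.price, so you can later split or replace the pieces behind it without touching them again.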
Lastly, learn the language well. I work on a similar code base but it’s in Python. Knowing “advanced” features of the language has helped. Often a lot of boilerplate code can be eliminated by an advanced language feature. By knowing the “seams” of the system and the language you can bend the system to your will.
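As one example of what I mean, in Python a context manager can replace the begin/try/rollback/commit block that tends to get copy-pasted around every database call. This is only a sketch: FakeConnection is a hypothetical stand-in for whatever handle your legacy code uses, and transaction is a helper name I made up.

```python
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a legacy DB handle; it just records calls."""

    def __init__(self):
        self.events = []

    def begin(self):
        self.events.append("begin")

    def commit(self):
        self.events.append("commit")

    def rollback(self):
        self.events.append("rollback")


@contextmanager
def transaction(conn):
    """Begin a transaction, commit on success, roll back and re-raise on error."""
    conn.begin()
    try:
        yield conn
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()


conn = FakeConnection()
with transaction(conn):
    conn.events.append("work")   # the only line each call site has to write
```

Every call site shrinks to a with block, and the commit/rollback rule lives in exactly one place.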
Otherwise I'd try to refactor the code into testable modules as much as possible. Unfortunately PHP is not on your side when it comes to refactoring. Especially in older versions, people used a lot of "tricks" that make refactoring hard.
It’s on my list of books that every developer should read.
https://www.joelonsoftware.com/2000/04/06/things-you-should-...
It can be tempting to declare entire sections of legacy code - or even the entire project - to be an unmaintainable mess that isn't worth fixing. Reading code for a true understanding of how it works can be a slow, laborious, annoying process. Simply rewriting everything from scratch often appears to be easier than trying to read, understand, and fix a legacy pile of spaghetti.
Giving in to that temptation and rewriting a project instead of fixing and refactoring the existing code is almost always the wrong decision. The messy spaghetti probably started out a lot cleaner. The mess that accreted over time is often important bugfixes and design changes. Some problems only show up in the field and sometimes requirements change. The spaghetti of bugfixes, workarounds, and changes might be the most valuable part of the codebase. Throwing it out might be throwing away the accumulated knowledge and experience of many expensive developer man-years.
Instead of rewriting, preserve the bugfixes and real-world experience by refactoring the spaghetti. It can be annoying and tedious, but it's probably less work than re-debugging old problems until the "clean rewrite" accretes its own spaghetti-like layer of bugfixes and workarounds.
This is the exact position I was in 6 months ago, and it wasn't just legacy code (legacy code isn't inherently bad); it was, and still is, bad legacy code.
https://leanpub.com/mlaphp is very very good, it's a roadmap/process to get a legacy PHP project to a reasonable state efficiently.
1) Look at the code and examine what it would take to make it testable.
2) Make tiny, safe refactorings to prepare the code to be tested -- only if you are absolutely certain these changes can cause no side-effects (usually I'll rely on tools to help me do this, just to increase the confidence level). If you can safely extract related code into appropriately small & related files / objects, that can be a great start.
3) Put the existing code under tests. Write tests around that code -- preferably unit tests that exercise the legacy code, as written, to verify its behavior.
4) Refactor the code that the tests are using to represent a cleaner codebase -- maybe you extract some objects / functions.
5) Write new tests to demonstrate the desired changed behavior. Write them as if you're writing them for a brand new codebase.
6) Make the code pass those tests. Some old tests may fail -- if they represent the old requirement, you can delete them. If they don't represent the old requirement, you know you have a bug.
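For the "put the existing code under tests" step, the tests pin down what the code does today, not what it should do. A rough sketch in Python with unittest (the thread is about PHP, where I'd expect PHPUnit to play the same role). legacy_shipping_cost and its rules are invented for the example; in practice you'd run the real code, observe the output, and paste that in as the expectation:

```python
import unittest


def legacy_shipping_cost(weight_kg, country):
    # Hypothetical legacy function, left exactly as found.
    cost = 5.0
    if weight_kg > 10:
        cost += (weight_kg - 10) * 0.5
    if country == "US":
        cost = cost * 1.1
    return round(cost, 2)


class CharacterizeShippingCost(unittest.TestCase):
    """Expectations were copied from what the code returns today."""

    def test_light_parcel(self):
        self.assertEqual(legacy_shipping_cost(2, "DE"), 5.0)

    def test_heavy_parcel_us(self):
        self.assertEqual(legacy_shipping_cost(12, "US"), 6.6)

    def test_lowercase_country_gets_no_surcharge(self):
        # Possibly a bug, possibly a rule somebody relies on. Pin it either way
        # and decide later, once the tests can tell you when it changes.
        self.assertEqual(legacy_shipping_cost(12, "us"), 6.0)
```

Run it with python -m unittest. If the lowercase case turns out to be a bug, that test is the one you rewrite when you get to the "new tests for the desired behavior" step.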
Repeat this process for each piece of the application that needs to be changed. NOTE: In a legacy app, you sometimes have to make peace with the fact that some of the app in production will remain legacy, untested code. If it doesn't need to change, then you don't necessarily need to sink a huge amount of time trying to refactor / put all the code under test. Get in the habit of doing it whenever you need to make a change (and factoring that time into any estimates, etc.) -- over time, the cost of a change will hopefully go down as you pay off that technical debt.
If you're trying to figure out what the cost of change will be, sometimes you can use static analysis to look at a codebase and show potential issues for a given section. Using such tools can sometimes help you understand how heavy of a lift it will be to modernize the code.
As has been suggested, "Working Effectively with Legacy Code" is a guide book here. You'll likely also want to seek out language-specific material about refactoring, unit/integration testing patterns & tooling, etc.
Good luck!
[1] https://www.manning.com/books/re-engineering-legacy-software
It might be worth your time. (Edit: noir_lord also recommended the same book in this thread.)
I set out on a mission to recreate the production environment as closely and as primitively as possible. Instead of the full 16 GB legacy database, for instance, I only dumped the structure and added rudimentary test data. Now my primary workflow involved local, manual testing, but the feedback loop was orders of magnitude faster than waiting for Subversion changes to get deployed. Recreating the production environment 1:1 was fraught with large, annoying challenges.
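The "structure plus rudimentary test data" idea looks roughly like this. It's a sketch only: it uses an in-memory SQLite database as a stand-in for the real server, and the customers/orders tables and the make_test_db helper are invented for illustration. In practice the schema would come from a structure-only dump of production.

```python
import sqlite3

# Assumed schema; a real one would be dumped from production without the data.
SCHEMA = """
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL
);
"""


def make_test_db():
    """Build a throwaway database: full structure, a handful of rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Test Customer')")
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(1, 9.99), (1, 120.0)],
    )
    conn.commit()
    return conn


db = make_test_db()
order_count = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Because it rebuilds in milliseconds, you can throw it away and recreate it before every manual test run.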
Barring full conversion, there are various techniques that require less effort. Using scratch scripts and running the local server in PhpStorm helps, but stripping code down to run locally can be cumbersome. Another option I took was getting lightweight functions working in an environment like http://sandbox.onlinephpfunctions.com/ and slowly integrating them into local scratch scripts or production.
Fortunately, the future at the company involved selling Laravel as a viable option, which makes everything so much easier. I'm a big proponent of frameworks or packages over custom code or NIH, as they often soften edge cases or work around quirks in the language.
IMHO it's to be expected that you won't get where you want to be, you'll fill some of the gaps but not all of them.
The key issue is to identify what is the major pain point that prevents you from doing this. For example, in a particular similar situation for me the key points were (a) ability to reliably build a deployment package that's sure to work; (b) brief documentation about the functionality of the main components of the software and their interaction/interfaces; (c) creating a basic suite of tests to ensure that key functionality keeps working as intended if we change/rewrite certain parts of the codebase.
The pain points will be different for you, but that's the direction that needs to be identified and taken to proceed properly.
Your first chore: getting a working dev instance with a debugger. Depending on the stack and dependencies, this might be a mountain that is rather hard to climb, but it is going to make your life easier.
Second chore: look at the history you have. God bless you if you have source control, with commit history, and any sort of sane commit messages. Bug trackers are also your friend. Lastly, there has to be SOMEONE with lore/knowledge of how the code base got to be the way it is - find them and TALK to them. Knowing WHY is almost as important as knowing what.
Third chore: Pull the schema. Get a schema dump of the production database, and look at what query logging is set at (might not be sane) and what it should be (query sampling might be your friend). If you're lucky enough to have a MySQL setup, then use MySQL Workbench to help generate a diagram, or any other tool you prefer. You want to have an artifact when you're done, one that you should maintain.
The fourth and fifth tasks are going to occur concurrently: walking the code base to understand it, and building in logging. You're going to walk the hot spots in the code base first - think home page, log in, and the core functionality. Every time you find something interesting add a comment, and LOG where you think is appropriate (remember you can always shut this off later).
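One cheap way to add logging you can shut off later is to wrap the hot spots rather than edit their bodies. A sketch in Python using the standard logging module; the "legacy.trace" logger name, the traced decorator and home_page_title are all made up for the example:

```python
import functools
import logging

# Dedicated logger so the tracing can be silenced without touching anything else.
log = logging.getLogger("legacy.trace")


def traced(func):
    """Log each call and its return value at DEBUG level."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("return %s -> %r", func.__name__, result)
        return result

    return wrapper


@traced
def home_page_title(user_id):
    # Hypothetical hot spot; the body stays exactly as it was.
    return f"Welcome, user {user_id}"
```

Turning it on is log.setLevel(logging.DEBUG) plus a handler; turning it off is log.setLevel(logging.WARNING). Be careful which arguments you log: anything touching the log-in path should not have credentials dumped into a log file.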
Really this is an exercise in reading and understanding what exists today with as much context as to WHY as you can discover (see step 2). Don't be afraid of either using the @bug and @todo syntax in comments and opening up tickets against yourself/the codebase. You may end up with a list of 200 things to change in the first week and that is OK.
Once you can READ the code base as it exists, make sure you REVIEW the code for what you're replacing -- even money says that there are bug fixes and edge cases that have already been solved for in that code, ones that your replacement may have to solve for even if it isn't in the "requirements".
Lastly, find someone to commiserate with and someone you can "bounce ideas off of" - rubber ducking works up to a point, but sometimes when you explain things to someone else they ask the critical question and it sets off your thinking/exploration in a new direction. They don't HAVE to work where you do, but if they don't, having history with them (even at another job) sure does help.