I start with a relatively high-level interface point, such as an important function in a public API. Such functions and methods tend to accomplish easily understandable things. And by "important" I mean something that is fundamental to what the system accomplishes.
Then you dive.
Your goal is to have a decent understanding of how this fundamental thing is accomplished. You start at the public-facing function, then find the actual implementation of that function, and start reading code. If things make sense, you keep going. If you can't make sense of it, then you will probably need to start diving into related APIs and - most importantly - data structures.
This process will tend to have a point where you have dozens of files open, which have non-trivial relationships with each other, and there are a variety of interfaces and data structures. That's okay. You're just trying to get a feel for all of it; you're not necessarily going for total, complete understanding.
What you're going for is that Aha! moment where you can feel confident in saying, "Oh, that's how it's done." This will tend to happen once you find those fundamental data structures, and have finally pieced together some understanding of how they all fit together. Once you've had the Aha! moment, you can start to trace the results back out, to make sure that is how the thing is accomplished, or what is returned. I do this with all large codebases I encounter that I want to understand. It's quite fun to do this with the Linux source code.
My philosophy is that "It's all just code", which means that with enough patience, it's all understandable. Sometimes a good strategy is to just start diving into it.
Which types are required to define the function, both as parameters and through any global/inherited scope/other state? Which types are directly related, e.g., containers to string, containers themselves, indexing, etc?
By the time you have some sense of what things go where in the program, and what they turn into, you usually have significantly narrowed down what the program can be doing.
Note: This works considerably better with some languages than others, but can work reasonably well on dynamically typed things like Python.
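For what it's worth, here is a rough Python sketch of the "which types does this function touch" question. It only sees what is annotated, so it says nothing about globals or other hidden state. `types_touched` and `place_order` are made-up names for illustration, not from any real code base.

```python
import inspect
from typing import Dict, get_type_hints


def types_touched(func):
    """Return the annotated parameter and return types of func, keyed by name."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    # Unannotated parameters map to None so the gaps stay visible.
    result = {name: hints.get(name) for name in params}
    result["return"] = hints.get("return")
    return result


# Hypothetical function standing in for a public API entry point.
def place_order(symbol: str, quantity: int, limits: Dict[str, float]) -> bool:
    return quantity > 0 and symbol in limits
```

Calling `types_touched(place_order)` gives you a dict of names to types, which is a decent list of what to go read next.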
The really cool thing about that is that it's really no different than how we handle other new experiences in our lives. Nobody walks into something new and just knows what to do. Without even consciously realizing it, we make observations and draw inferences as we relate those observations to previous experience that allow us to make better-informed judgments. And thousands of years of evolution have made us rather good at this.
You already have the skills to jump into the deep end without being immediately overwhelmed; it's just a matter of applying them successfully.
After a few months you'll be familiar with it?
Wow, I'm lucky to get 8 hours on a new code base before I have to ship a bugfix. Months?? O_o luxury. Sip your piña colada as you write your immediately outdated documentation. Great advice.
No, that code base was probably written by several people some of whom knew what they were doing, some of whom were just 'getting the job done' and some of whom were idiots.
Probably, the only useful suggestion in that list is code reviews. Hack a fix in, get someone who seems clued up to review and suggest a better way.
Look for logging, that's the first thing I do; you're probably not the first person to be given this code to work with, and if you're lucky the last folk(s) made some debugging tools. If not, be a pro and leave some good ones behind when you go.
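If the code base happens to be Python and uses the standard `logging` module (an assumption, plenty don't), turning an existing logger up to DEBUG is often the cheapest first look. A minimal sketch; `enable_debug_logging` and the logger name `legacy.orders` are hypothetical:

```python
import logging


def enable_debug_logging(logger_name=None):
    """Turn the named logger (root if None) up to DEBUG and return it."""
    # basicConfig only installs a handler if the root logger has none yet,
    # so any logging setup the application already did is left alone.
    logging.basicConfig(
        format="%(asctime)s %(name)s %(levelname)s %(message)s"
    )
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    return logger
```

Something like `enable_debug_logging("legacy.orders")` before reproducing the bug will show whatever debug messages the previous folks left in that module.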
...not documentation.
A) Making sure they can compile it, and showing them where the various locations for documentation are.
B) Giving them simple bugs to work on while they get familiar with the codebase. 8 hours is more than enough time for a competent developer to fix many simple bugs.
You also have situations where the nature of the problem dictates that you fairly often need to make changes in unfamiliar code-bases. For example, at my previous job I was developing an Android fork. Android has a fair amount of its own code, but also includes a large number of external projects. It was fairly common that I had to fix a bug in one of those projects that I had never seen before (and would likely never see again).
These 'here is some generic pointless advice; got some useful advice? Leave it in the comments!' posts are social media posturing and lazy writing.
Certainly not everyone dives into different code bases every day / every week; but everyone does it sooner or later; and being immediately productive is hard. This article could have been an interesting collection of tips on how to do that... but the author couldn't be bothered researching the subject.
The nearest I've found (but not tried) are:
Screenster: http://www.creamtec.com/products/screenster/index.html
1. Run the program, look at stacktraces/profiler. Sometimes it's easier to analyze runtime and figure out what the program is doing and what the main/common patterns are.
E.g. Java - jstack, C++ - gdb, gprof
2. Use some static tools to get a call/caller graph and be able to browse the program quickly. Jump between definitions, see what's used, what's not.
E.g. C++ - docgen, Java - most modern IDEs will do just fine
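For step 1 in a Python code base, the standard library's `cProfile` does roughly what gprof does. A small sketch, where `profile_call`, `handle_request` and `parse` are made-up stand-ins for the real entry point:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the functions that dominate the run come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


# Hypothetical code under study.
def parse(i):
    return i * 2


def handle_request(n):
    return sum(parse(i) for i in range(n))
```

`profile_call(handle_request, 100)` returns the normal result plus a report listing which functions were called, how often, and where the time went.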
That said, one step down the chain from stack traces and profiler output might be to look at log files, or run the code in verbose/debug/trace mode if it exists.
My first attempt was to reverse engineer the code into a UML diagram. For some reason I keep making this mistake. A messy code base will result in an extremely messy diagram. It can give a few insights, but between finding a tool which will work and trying to make sense of a tangled mess of lines, visual diagrams usually aren't worth the time.
I found that a tool called Chronon was somewhat useful (Google "DVR for Java"). This tool just records a single run of a program. It is great for going forwards AND BACKWARDS as you step through the code, take a look at different threads, state of various objects, etc.
My strategy was to run the server and have it execute a small and simple bit of functionality (execute a single order). Follow it all the way from input to output. Make the scenario a bit more complex and follow that through to completion. This way you get to understand the core functionality, edge case code and start to get a sense of performance enhancements, etc.
I found myself making steady progress and fixing a number of bugs, until I hit heisenbugs, caused by overly clever concurrency/object pooling. It is enough to drain your soul :)
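Not Chronon, but if you want the poor man's version of "follow one order from input to output" in Python, `sys.settrace` can record which functions get entered during a single run. A sketch only; `trace_calls`, `execute_order` and `validate` are hypothetical names:

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions in call order)."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # skip per-line tracing inside each frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore whatever tracer (debugger, coverage) was there before.
        sys.settrace(previous)
    return result, calls


# Hypothetical scenario: a single simple order.
def validate(order):
    return order["qty"] > 0


def execute_order(order):
    if not validate(order):
        raise ValueError("invalid order")
    return order["qty"] * order["price"]
```

Start with the simplest order, read the recorded call list, then make the scenario a bit more complex and compare the lists. It only sees the current thread, so it won't help with the concurrency heisenbugs.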
I find it helps focus the mind, provides a clear definition of success and forces you to think about a specific area of the code without requiring too much context.
After doing a few of these I've got a fair understanding of the structure of the code and can figure out where to go next.
I'm also wary of comments as I find they can often have 'drifted' in old code and become misleading. I make detailed notes of things that look dodgy or could be improved separate from the code.
I'd add that if the project has poor-to-no build scripts that require a lot of manual steps, it's worth at least bandaging it up with a shell script early on.
I recently worked on a project that had many separate modules with independent build scripts, requiring copy-and-pasting built artifacts that others depended on, and several manual configuration tweaks post-packaging. That stuff is very tedious and sucks your energy. It's not generally a priority to rewrite all the build scripts, but if you're editing several modules it's worth being able to do a one-shot build from early on.
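Same idea as the shell script, sketched in Python: list the manual steps once, run them in order, stop at the first failure. The module names and `make` commands in `STEPS` are placeholders, not from any real project.

```python
import subprocess


def run_steps(steps):
    """Run each (name, command) in order; return (completed names, failed name or None)."""
    done = []
    for name, command in steps:
        print(f"== {name}")
        # command is an argument list; a missing executable raises FileNotFoundError.
        completed = subprocess.run(command)
        if completed.returncode != 0:
            print(f"step {name!r} failed with exit code {completed.returncode}")
            return done, name
        done.append(name)
    return done, None


# Hypothetical steps; replace with whatever the manual build notes say.
STEPS = [
    ("core", ["make", "-C", "core"]),
    ("plugins", ["make", "-C", "plugins"]),
    ("package", ["make", "-C", "dist", "package"]),
]
```

`run_steps(STEPS)` then gives you the one-shot build, and the list doubles as documentation of the order things have to happen in.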
Episode 148: Software Archaeology with Dave Thomas http://www.se-radio.net/2009/11/episode-148-software-archaeo...
Being able to do some pair programming helped me understand a new code base, bit by bit, in a language I had never used before. I could ask questions and actually help on the issue. Asking questions is important, but as stated in the post, you should try to find the answer by yourself first. I'll actually time myself for 10 minutes, and if I can't find an answer, I'll just poke the most prominent person based on a git blame of the file that is related to my question.
This is my second week working on a Ruby codebase, without any prior experience with Ruby, or Rails (I've mostly been doing Python with Django and Pyramid for the past 3 years). I managed to get quickly up to speed this way.
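The git blame lookup is easy to script. This sketch counts lines per author from the output of `git blame --line-porcelain <file>`; the helper names are my own, and "most lines" is only a rough proxy for "most prominent person".

```python
import subprocess
from collections import Counter


def authors_by_lines(porcelain_text):
    """Count blamed lines per author from `git blame --line-porcelain` output."""
    prefix = "author "
    # Header lines start with "author "; file content lines start with a tab,
    # and "author-mail"/"author-time" headers do not match the prefix.
    counts = Counter(
        line[len(prefix):]
        for line in porcelain_text.splitlines()
        if line.startswith(prefix)
    )
    return counts.most_common()


def blame_file(path):
    """Run git blame on path (inside a git work tree) and rank its authors."""
    output = subprocess.run(
        ["git", "blame", "--line-porcelain", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return authors_by_lines(output)
```

The first entry of `blame_file("path/to/file")` is the person to poke once the 10 minutes are up.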
Making unit tests can be great if you already understand the language (and the IDE), but when you're new to Xcode, don't expect a unit test to make any more sense than the code.
Pairing can be a great tool, if you are pairing with the right person. You can't always find the right programmer (with the time or care) to sit down and work with you like that.
I was certainly humbled by this process, whether by choice or not, and asked my share of dumb questions, but I don't think you can really understand a codebase until you work on it.
Just running through the tests and communicating a lot with testers and developers helped me understand how the app (ought to) behave.
Saw them on the "Made in NY" Product Hunt collection yesterday. Has anyone played with Bowery?