Edit: Removed offensive word
2. Read the function a few times to get a sense of what it's doing.
3. Break the function into blocks of statements that are performing a related task.
4. Factor those out into smaller functions.
5. Recursively repeat 3-5 until you're satisfied with the new functions' line count.
> Should I tell my manager that the function needs to be refactored and take time to refactor it?
It's absolutely worth mentioning. Time estimations are one of the harder tasks of Software Engineering. Being up front with your manager can help better set expectations.
Assertions are comments which can always be trusted and which never get out of date. It baffles me that there's so much reluctance to use them (especially in dynamic languages).
You can probably add some tests around the inputs/outputs, depending on the structure. For example you might have to check the database before/after and other things like that. But it won't be "unit" tests in the conventional sense. More like Integration tests, because something that long should not only be multiple methods, but multiple classes.
Go to the part that you suspect you need to modify, identify a block of code that together forms something you can give a name, i.e. "initialize_doodad_struct", "validate_widget_params", pull it out put it in a function. Now look at all identifiers that are referenced or declared in your block of code, do a ack or ctrl+f through the 2000-N line function to 100% verify that they're not referenced anywhere. If they aren't you can safely encapsulate them in your new function (if you're working in a static language you probably could just right-mouse-click-extract to do all this).
Then immediately verify that the code still works. Run it, ideally in unit tests but most certainly also in real world(-like) scenarios. Do this for every small step. If your codebase takes 2 hours to compile, instead make a commit. Make a commit for every single 3-10 line extraction you do. (do it anyway, it'll look good and everyone will love you)
And then repeat until either the part you wanted to modify is clear to you, or there's nothing more to extract ;)
By the way, I'm a Ruby programmer mainly, and in Ruby any function over 10 lines is frowned upon. I know that in some programming language communities (looking at you C and C++!) a 200 line function doesn't raise any eyebrows, and they'll sometimes even scold you for extracting a function if it doesn't 'add' anything. I think this is because in C every function pollutes the global namespace, so there's a little more cost associated with it.
That is of course rarely the reason. From what I've experienced it's usually that the developer is uncomfortable with abstraction (at least that was my reason many moons ago).
Citation required.
https://robots.thoughtbot.com/sandi-metz-rules-for-developer...
It's probably easier to do that than to muck around in the 2000-line version hoping you don't screw something else up.
Making the assumption that anybody is capable of sufficiently testing such functions and subsequently re-writing them will only introduce new bugs and old regressions.
Seems harsh, but I've had to work with a few such monstrosities. Global state galore.
1) Understand at a very fundamental level what it does (there are probably multiple things).
2) Understand at a very fundamental level what it does (this is really the hard part)
3) Document your hard-earned understanding (tests, ironically, do not work well for this, since there are no fundamental units to test at, and there likely is a metric ton of shared state)
4) Refactor it into something easier to understand and test.
I've had to work with a similar 5,000 line C function at the heart of a DSL parser; literally the core of the entire business. Changes were made very slowly, and with a lot of consideration. Refactoring is still underway 6 years later.
What you do is that you:
A) Realize that there might be years of hidden bugs, but also fixes to complex problems, speedups, kludges, weird changes to requirements, etc, hiding in the code, so even really creative unit-test might not cover real-world edge-cases.
B) Take as small bit as you can, and just refactor out the smelly code into smaller parts. Often a good name around smelly code helps a long way.
D) Iterate. Until good enough.But not more... Maybe you have more important things to do than rewriting ugly code that works.
There is a big risk that since you probably won't understand everything that goes on, and why - you will break something important.
The good news is that many, many, many people have been here before you. While you may have a lot of work ahead, you can do it safely and in an orderly and safe fashion. (And with something as butt-ugly as that sounds, that's the only way I'd do it)
getting a feel for the structure of the function helps me understand it.
Ex:
function someFunction () {
if () {
for () {
if () {
} else {
}
}
if () {
for () {
}
}
}
}Ferinstance:
def giant_method
if thingy1?
do_stuff
else
do_something_else
end
do_common_stuff
if thingy1?
finish_up_for_thingy1
else
finish_up
end
end
It's easy to spot that there are really two paths through the method (depending on what thingy1? returns) in this simplified form, but rewriting it will make it clearer even in a 2k line function: def giant_method
if thingy1?
do_stuff
do_common_stuff
finish_up_for_thingy1
else
do_something_else
do_common_stuff
finish_up
end
end
This is also useful even when the 'if' clauses aren't identical - even then, there are frequently implied relationships (for instance, if one checks if something is positive and the other checks for zero) that can simplify things.In this case, the developer actually has return statements inside some if blocks. A general point that has stood out from all comments is to first understand how the function achieves what the function is trying to achieve.
dict[ keyword](args)
were dict is a large dictionary containing function objects. In that case the dictionary approach has the advantage of separating the logic from the clutter, but it is possible to work with 2000 lines of case something:
foo();
break;
However if it is not such a case, then you are in trouble. Thing is, management does not like to be told that the code is a steaming pile of shit, and management is especially unhappy if you tell them that you are going to spend a few month on accomplishing absolutely nothing. From their perspective refactoring is accomplishing nothing: the functionality is there, it works and they do not care about the gory details. So you need to convince your manager that refactoring is a good idea, which just goes against every basic assumption of his job. So think about the actual problems you have with the code, and what the company gains by refactoring it. (Easier extensibility, less risk of bugs and faster turn around if a bug occurs.) Then make sure that you convince your manager that refactoring is the right thing to do. ( Ideally you should enlist your coworkers for that, if they know the kinds of problems.)Speaking a bit more generally, this is part of why technical dept happens. Selling refactoring is hard, so everybody tries to tip-toe around the 2000 line gorilla until some unhappy soul (read someone else) can no longer avoid to wrestle with it.
http://www.amazon.com/Working-Effectively-Legacy-Michael-Fea...
This function was used as a container to hold all the business logic and flow control. It had dozens of parameters that all had to have some value or another passed to keep from throwing undefined errors. There was an integer called "wat_to_do" that would choose different blocks of if/than/else inside the function. Not like case, mind you, scattered everywhere.
I've never really been the same since then. Can you get code PTSD?
It then goes on to describe the "legacy software change algorithm" (google that for more articles that will help you. Get targeted legacy code into test harness and cover with tests: 1. Identify change points for the target change or new code addition. a. Find test points where the behavior of the code can be sensed. b. Break dependencies (very carefully and often without sufficient tests) and get the targeted legacy code into a test harness. c. Cover targeted legacy code with (characterization) unit tests 2. Add new functionality with Test Driven Development (TDD) 3. Refactor to remove duplication, clean up, etc.
It really is worth reading this book as it covers almost exactly your question: "22. I Need To Change a Monster Method and I Can’t Write Tests for It." and various other relevant chapters.
Answering your manager question with a standard consultant answer "It depends". Refactoring code is about setting yourself up for future development. You should always be refactoring your code after you confirm it works. Asking your manager whether you do it should take on the same amount of importance as asking whether you should wash your hands after visiting the restroom. That said, developers can often feel a sense of ownership of code and not wish change for various reason (usually familiarity with the current crap situation).
(note: would be happy for you to have left the "offensive" word in - 2,000 lines of code for a single function screams for a refactor, and says that the original dev had no clue how to write good code. if the code makes you want to cuss, well....)
If you add the new logic this time and also split it into four 500-line methods and maybe a few extra tests, you have done quite a lot and it is likely doable within the time frame of the work you are doing.
But
If there are zero tests and you aren't confident that you can create those tests then I'd bring it up with management. Maybe they'll say that changes in this functionality will be very rare after this small change and that it's very well tested manually and by customers - then that's it. Not all code has to be made concise and elegant, it's better to focus your effort on the code that regularly needs changing. However, in this case I'd ask management to consider not changing it at all, or make sure it will be very well tested manually after your changes).
I would recommend against any kind of rewrites or radical refactoring right away. You probably don't know what it does well enough yet or what kind of workarounds and bug fixes are in there, so you're likely to bring back bugs. And not many businesses have the time to spare to actually redo something like that correctly all at once.
Aside: You could just give up and quit, but I consider it part of the job of a good developer to be able to take a mess of spaghetti code and gradually turn it into something easily understandable without causing huge delays in business processes or show-stopper bugs. I don't think a developer who is only willing to touch pristine code is very valuable in the real world.
I'd make the first goal to add the required changes with as few changes to the function as possible. Focus your refactoring work at first on trying to get some tests around this thing. Don't try to test every use case right away, but do try to comprehensively test a few tasks, including capturing whatever it does to any state in any other systems. Most likely, a lot of the initial refactoring work will hardly touch the function at all, but instead focus on putting the things that it touches into modular, testable parts. Getting good tests around it will help you understand what it does better, what parts of it are important, and how state is handled around the system.
Then as you go, start adding tests, and gradually refactor things that are tested. Write tests for any bugs that are reported back to you. Anything that looks weird, ask around and see if you can figure out what it's for and if you really need to test for it.
The gist of it: a strong suite of integration and unit tests. Isolate small code paths into logical units and test for equivalency.
[0] http://www.amazon.com/Working-Effectively-Legacy-Michael-Fea...
I would read the thing a few times and figure out exactly where to add the functionality. Refactor second, if time permits. If the thing is 2k long clearly, refactoring it isn't a priority.
Refactor thereafter and test after each iteration of changes. It's not a rewrite, so don't lose the plot just Refactor into smaller methods :)
or read the 'ol refactoring book...
The obvious refactoring is to split the function up into smaller pieces, even if you can split it into two 1500 line functions that is a win.
Tell your manager that you want to refactor it. Just don't tell your manager "it's shit".
The "don't say it's shit" was when speaking to the manager... not using it on your thread. Unless I'm the one misunderstanding the statement...