Gamasutra: The Art & Business of Making Gamesspacer
View All     RSS
April 24, 2014
arrowPress Releases
April 24, 2014
PR Newswire
View All

If you enjoy reading this site, you might also want to check out these UBM TechWeb sites:

In-depth: Cleaning bad code
In-depth: Cleaning bad code
August 21, 2012 | By Niklas Frykholm

August 21, 2012 | By Niklas Frykholm
More: Console/PC, Social/Online, Smartphone/Tablet, Indie, Programming

In this reprinted #altdevblogaday in-depth piece, BitSquid co-founder Niklas Frykholm offers some useful advice for dealing with messy, old code you've inherited from elsewhere.

Guess what! You've just inherited a stinking, steaming pile of messy old code. Congratulations! It's all yours.

Bad code can come from all kinds of places. Middleware, the internet, perhaps even your own company.

You know that nice guy in the corner that nobody had time to check up on? Guess what he was doing all that time. Churning out bad code.

Or remember that module someone wrote years ago, just before she left the company. That module that 20 different people have then added hacks, patches and bug fixes to, without really understanding what they were doing. Yup, that one.

Or what about that open source thing you downloaded that you knew was horrible, but it solved a very specific and quite hairy problem that would have taken you ages to do by yourself.

Bad code doesn't have to be a problem, as long as it's not misbehaving, and nobody pokes their bloody nose in it. Unfortunately, that state of ignorant bliss rarely lasts. A bug will be discovered. A feature requested. A new platform released. Now you have to dig into that horrible mess and try to clean it up. This article offers some humble advice for that unfortunate situation.

0. Is it worth doing?

The first thing you need to ask yourself is whether the code is worth cleaning. I'm of the opinion that when it comes to code cleaning, you should either karate do "yes", or karate do "no". Either you assume full responsibility for the code and rework it until you end up with something that you are actually happy to maintain and proud to have in your codebase.

Or you decide that even though the code looks horrible, it isn't cost-effective to take time out of your busy schedule to fix it. So instead you just do the smallest change possible that solves your current problem.

In other words, you either regard the code as yours or theirs.

There are merits to both alternatives. Good programmers get an itch when they see bad code. They bring out their torches and pitchforks and chant: "Unclean! Unclean!" And that is a good instinct.

But cleaning code is also a lot of work. It is easy to underestimate the time it takes. It can be nearly as time-consuming as rewriting the whole thing from scratch. And it doesn't bring any short-term benefits. Two weeks cleaning code won't add any new features to the game, but it might give you some new bugs.

On the other hand, the long-term effects of never cleaning your code can be devastating. Entropy is the code-killer.

So, never an easy choice. Some things to consider are:
  • How many changes do you expect to make to the code?Is it just this one small bug that you need to fix, or is this code that you expect to return to many times to tweak and tune and add new features. If it's just this one bug, then perhaps it is best to let sleeping dogs lie. However, if this is a module that you will need to mess around with a lot, then spending some time to clean it up now, will save a lot of headache later.
  • Will you need/want to import upstream changes?Is this an open source project that is under active development? If so, and you want to pull the changes made upstream, you can't make any big changes to the code or you will be in merge hell every time you pull. So just be a nice team player, accept its idiosyncrasies and send patches with your bug fixes to the maintainer.
  • How much work is it?How many lines of code can you realistically clean in a day? An order of magnitude estimate says more than 100 and less than 10,000, so let's say 1,000. So if the module has 30,000 lines, you might be looking at a month of work. Can you spend that? Is it worth it?
  • Is it a part of your core functionality?If what the module does is something peripheral, like say font rendering or image loading, you might not care that it is messy. You might swap out the whole thing for something else in the future, who knows. But you should own the code that relates to your core competence.
  • How bad is it?If the code is just slightly bad, then perhaps you can live with it. If it is mind-numbingly, frustratingly incomprehensibly bad, then perhaps something needs to be done.

1. Get a test case

Seriously cleaning a piece of code means messing around with it a lot. You will break things.

If you have a decent test case with good coverage, you will immediately know what has broken and you can usually quite quickly figure out what stupid mistake you just made. The time and anxiety this saves over the course of the cleaning process is just ridiculous. Get a test case. It's the first thing you should do.

Unit tests are best, but all code is not amenable to unit testing. (Test fanatics, send your hate mail now!) If unit tests are too cumbersome, use an integration test instead. For example, fire up a game level and run the character through a specific set of actions related to the code you are cleaning.

Since such tests are more time-consuming, it might not make sense to run it after every change you make, which would be ideal. But as you put every single change you make into source control, it's not so bad. Run the test every once in a while (e.g., every five changes). When it discovers a problem, you can do a binary search of those last few commits to find out which one caused the problem.

If you discover an issue that wasn't detected by your test, make sure that you add that to the test, so that you capture it in the future.

2. Use source control

Do people still have to be told to use source control? I sure hope not.

For cleaning work, it is absolutely crucial. You will be making lots and lots of small changes to the code. If something breaks you want to be able to look back in the revision history and find out where it broke.

Also, if you are anything like me, you will sometimes start down a refactoring path (like removing a stupid class) and realize after a while that it wasn't such a good idea, or, that it was a good idea, but that everything would be a lot simpler if you did something else first. So you want to be able to quickly revert everything you just did and begin anew.

Your company should have a source control system in-place that allows you to do these changes in a separate branch and commit as much as you like without disturbing anybody else.

But even if it doesn't, you should still use source control. In that case, download Mercurial (or Git), create a new repository and put the code that you checked out of your company's stupid system there. Do your changes in that repository, committing as you go. When you are done you can merge everything back into the stupid system.

Cloning the repository into a sensible source control system only takes a few minutes. It is absolutely worth it. If you don't know Mercurial, spend an hour to learn it. You will be happy you did. Or if you prefer, spend 30 hours to learn Git instead. (I kid! Not really. Nerd fight now!)

3. Make one (small) change at a time

There are two ways of improving bad code: revolution and reform. The revolution method is to burn everything with fire and rewrite it from scratch. The reform method is to refactor the code with one small change at a time without ever breaking it.

This article is about the reform method. I'm not saying that revolutions never are necessary. Sometimes things are so bad that they just need to go. But people who get frustrated with the slow pace of reform and advocate revolution often fail to realize the full complexity of the problem and thus don't give the existing system enough credit for the things it does.

Joel Spolsky has written a classic article about this without falling into the trap of making strained political metaphors.

The best way of reforming code is to make one minimal change at a time, test it and commit it. When the change is small it is easier to understand its consequences and make sure that it doesn't affect the existing functionality. If something goes wrong, you only have a small amount of code that you need to check.

If you start doing a change and realize that it is bad, you won't lose much work by reverting to the last commit. If you notice after a while that something has gone subtly wrong, a binary search in the revision history will let you find the small change that introduced the problem.

A common mistake is to do more than one thing at the same time. For example, while getting rid of an unnecessary level of inheritance you might notice that the API methods are not as orthogonal as you would like them to be and start to rearrange them. Don't! Get rid of the inheritance first, commit that and then fix the API.

Smart programmers organize the way they work so that they don't have to be that smart.

Try to find a path that takes you from what the code is now to what you want it to be in a sequence of small steps. For example, in one step you might rename the methods to give them more sane names. In the next, you might change some member variables to function parameters. Then you reorder some algorithms so that they are clearer. And so on.

If you start doing a change and realize that it was a bigger change than you originally thought, don't be afraid to revert and find a way of doing the same thing in smaller, simpler steps.

4. Don't clean and fix at the same time

This is a corollary to (3), but important enough to get its own point.

It is a common problem. You start to look at a module because you want to add some new functionality. Then you notice that the code is really badly organized, so you start reorganizing it at the same time as you are adding the new functionality.

The problem with this is that cleaning and fixing has diametrically opposite goals. When you clean, you want to make the code look better without changing its functionality. When you fix, you want to change its functionality to something better. If you clean and fix at the same time, it becomes very hard to make sure that your cleaning didn't inadvertently change something.

Do the cleaning first. Then, when you have a nice clean base to work with, add the new functionality.

5. Remove any functionality that you are not using

The time it takes to clean is proportional to the amount of code, its complexity and its messiness.

If there is any functionality in the code that you are currently not using and don't plan to be using in the foreseeable future — get rid of it. That will both reduce the amount of code you will have to go through and its complexity (by getting rid of unnecessary concepts and dependencies). You will be able to clean faster and the end result will be simpler.

Don't save code because "who knows, you might need it some day." Code is costly — it needs to be ported, bug checked, read and understood. The less code you have, the better. In the unlikely event that you do need the old code, you can always find it in the source repository.

6. Delete most of the comments

Bad code rarely has good comments. Instead, they are often:
// Pointless:

// Set x to 3
x = 3;

// Incomprehensible:

// Fix for CB (aug)
pos += vector3(0, -0.007, 0);

// Sowing fear and doubt:

// Really we shouldn't be doing this
t = get_latest_time();

// Downright lying:

// p cannot be NULL here
Read through the code. If a comment doesn't make sense to you and doesn't further your understanding of the code — get rid of it. Otherwise you will just waste mental energy on trying to understand that comment on each future reading of the code.

The same goes for dead code that has been commented or #ifdef'ed out. Get rid of it. It's there in the source repository if you need it.

Even when comments are correct and useful, remember that you will be doing a lot of refactoring of the code. The comments may no longer be correct when you are done. And there is no unit test in world that can tell you if you have "broken the comments".

Good code needs few comments because the code itself is clearly written and easy to understand. Variables with good names do not need comments explaining their purpose. Functions with clear inputs and outputs and no special cases or gotchas require little explanation. Simple, well written algorithms can be understood without comments. Asserts document expectations and preconditions.

In many cases, the best thing to do is just to get rid of all old comments, focus on making the code clear and readable, and then add back whatever comments are needed — now reflecting the new API and your own understanding of the code.

7. Get rid of shared mutable state

Shared mutable state is the single biggest problem when it comes to understanding code, because it allows for spooky "action at a distance", where one piece of code changes how a completely different piece of code behaves. People often say that multithreading is difficult. But really, it is the fact that the threads share mutable state that is the problem. If you get rid of that, multithreading is not so complex.

Since your goal is to write high-performant software, you won't be able to get rid of all mutable state, but your code can still benefit enormously from reducing it as much as possible. Strive for programs that are "almost functional" and make sure you know exactly what state you are mutating where and why.

Shared mutable state can come from several different places:
  • Global variables. The classic example. By now everybody surely knows that global variables are bad. But note (and this is a distinction that people sometimes fail to make), that it is only shared mutable state that is problematic. Global constants are not bad. Pi is not bad. Sprintf is not bad.
  • Objects — big bags of fun. Objects are a way for a large number of functions (the methods) to implicitly share a big bag of mutable state (the members). If a lazy programmer needs to pass some information around between methods, she can just make a new member that they can read and write as they please. It's almost like a global variable. How fun! The more members and the more methods an object has, the bigger this problem is.
  • Megafunctions. You have heard about them. These mythic creatures that dwell in the deepest recesses of the darkest codebases. Broken programmers talk about them in dusky bars, their sanity shattered by their encounters: "I just kept scrolling and scrolling. I couldn't believe my eyes. It was 12,000 lines long."When functions are big enough, their local variables are almost as bad as global variables. It becomes impossible to tell what effect a change to a local variable might have 2,000 lines further down in the code.
  • Reference and pointer parameters. Reference and pointer parameters that are passed without const can be used to subtly share mutable state between the caller, the callee and anyone else who might be passed the same pointer.
Here are some practical ideas for getting rid of shared mutable state:
  • Split big functions into smaller ones.
  • Split big objects into smaller ones by grouping members that belong together.
  • Make members private.
  • Change methods to be const and return the result instead of mutating state.
  • Change methods to be static and take their arguments as parameters instead of reading them from shared state.
  • Get rid of objects entirely and implement the functionality as pure functions without side effects.
  • Make local variables const.
  • Change pointer and reference arguments to const.

8. Get rid of unnecessary complexity

Unnecessary complexity is often a result of over-engineering — where the support structures (for serialization, reference counting, virtualized interfaces, abstract factories, visitors, etc) dwarf the code that performs the actual functionality.

Sometimes over-engineering occurs because software projects start out with a lot more ambitious goals than what actually gets implemented. More often, I think, it reflects the ambitions/esthetics of a programmer who has read books on design patterns and the waterfall model and believes that over-engineering makes a product "solid" and "high-quality."

Often, the heavy, rigid, overly complex model that results is unable to adapt to feature requests that were not anticipated by the designer. Those features are then implemented as hacks, bolt-ons and backdoors on top of the ivory tower resulting in a schizophrenic mix of absolute order and utter chaos.

The cure against over-engineering is YAGNI — you are not gonna need it! Only build the things that you know you need. Add more complicated stuff when you need it, not before.

Some practical ideas for cleaning out of unnecessary complexity:
  • Remove the functionality you are not using (as suggested above).
  • Simplify necessary concepts, and get rid of unneeded ones.
  • Remove unnecessary abstractions, replace with concrete implementations.
  • Remove unnecessary virtualization and simplify object hierarchies.
  • If only one setting is ever used, get rid of the possibility of running the module in other configurations.

9. That is all

Now go clean your room!

[This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]

Related Jobs

Turbine Inc.
Turbine Inc. — Needham, Massachusetts, United States

Director, Analytics Platform Development
Gameloft Toronto
Gameloft Toronto — Toronto, Ontario, Canada

Technical Artist
Muti Labs
Muti Labs — Santa Monica, California, United States

Senior Game Engineer
Digital Extremes
Digital Extremes — London, Ontario, Canada



Matthew Downey
profile image
Good read. I can't relate to 30k line scripts, but I can relate to 1000+. A lot of the info seemed obvious to me at first, but after looking at my scripts, I realized I still had useless comments. When it comes to fixing things I tend to clean and fix at the same time and delete a segment before rewriting it (which is a problem since I don't have source control).

Luke Quinn
profile image
As a one man indie dev, I was suprised how much I got out of this blog.
I have been guilty of a lot of the bad habits complained about here (namely over-engineering, giant & bloated classes, stupidly vague comments that I myself will likely not understand in the future) and it really gave me incentive to make sure I keep my code clean.
From now on, I'll try to make sure to spend a little extra time on future-proofing my code by neatening it all up a bit as I finish it.

Nathan Baughman
profile image
I'm also a one-person indie developer (Island Forge), and I'm guilty of over-engineering. Specifically, I abstract too early (Point#8). Design Patterns are fun, but simplicity is my new mantra.

Point#7 is a stickler for me. So many textbooks teach students to use member variables (in an object) to hold temporary inter-method state. This is, in my opinion (and yours, I'd say), entirely wrong. Member variables serve the context of the object. Methods should pass arguments around. This avoids side-effect behavior and makes for a cleaner intra-object API.

Point #2: I use Subversion. Is that wrong?

Matthew Mouras
profile image
Response to your Point #2: Not if you are making it work for yourself :)

Our team inherited a project that was already using SVN. We hated it, but finally spent half a day coming to terms with SVN and developing some scripts to automate our use of it... I spent weeks begging for us to dump it in favor of Mercurial before that. Was the half a day better spent coming to terms with SVN or migrating to Mercurial? Our team will never know - bottom line is we have source control that works and we're all on the same page. That's valuable.

sean lindskog
profile image
I think subversion is a great choice for a one-person indie dev.
For larger teams, it may become more of an issue.

Cartrell Hampton
profile image

I do like tips 3 and 4, as I often do those myself. Source control isn't a bad idea, either.
One pet peeve of mine is seeing yoda-conditions. Ack!

Otherwise, I HATE "inheriting" other people's code. Period.

- Ziro out.

Christian Sumido
profile image
I know that feel bro.

I enjoyed that read. I start an internship next Monday and will hopefully not seeing a lot of bad code. hahahahaha, I can hope right? Then again, interns get all the bullshit no one wants to do. Well then, may the Programming Gods have mercy on my soul.

Bram Stolk
profile image
I think there is a much more important piece of advice you should give:
You need to understand the inherited code, and to do this, you run Doxygen on it.
Very practicle, and extremely useful advice I would say.

Danny Bernal
profile image
"Megafunctions. You have heard about them. These mythic creatures that dwell in the deepest recesses of the darkest codebases. Broken programmers talk about them in dusky bars, their sanity shattered by their encounters: "I just kept scrolling and scrolling. I couldn't believe my eyes. It was 12,000 lines long."When functions are big enough, their local variables are almost as bad as global variables. It becomes impossible to tell what effect a change to a local variable might have 2 000 lines further down in the code."

.... oh god... :\
Yes, this is real! And the stories do get told like that!

Jamie Mann
profile image
I've got a few of those at the moment. Admittedly, the functions top out at around 1000 lines apiece, but they're embedded in some significantly large files...

$ find . -size +200k | grep inc | xargs wc -l
61024 total

Andrew Wallace
profile image
"// Really we shouldn't be doing this"

Laughed out loud when I saw this. I am extremely guilty of this.

Ian Uniacke
profile image
not to mention the more important aspect of that:

why are you doing it then?

Boy do I hate seeing comments like that. It's either: do it the right way, if the right way is this (because of time constraints) then we SHOULD be doing this so don't add the comment.

Jamie Mann
profile image
@Ian: pragmatically, sometimes you don't have a choice. For instance, you may be under time/resource constraints, or there may be external dependencies which are beyond your control.

The correct thing to do in these situations is to document the reason for what you're doing, and what the preferred solution is. That said, I'm generally in two minds about whether or not this sort of documentation should go into the code. On the one hand, it's not going to get lost/forgotten; on the other hand, this sort of comment is most likely to undergo digital bitrot over time...

Charles Zapata
profile image
A fantastic book providing strategies for dealing with older, larger code bases is Michael Feather's "Working Effectively with Legacy Code". It shows you how to safely modify and improve legacy code bases. It should be required reading for anyone taking on, or inheriting, a large software project.

chris kirby
profile image
In the good old days (z80 assembler on GameGear if I remember correctly) my editor could only load 8 32k files.. So I had to delete comments to add more code!

tony oakden
profile image
Oh yes! the good old days. I started developing on a machine with only 32K in total so that put a major limit on how many comments you could have and how long variable names could be. I think the experience taught me quite a bit of discipline.

chris kirby
profile image
On and for mega functions, I converted nba jam arcade to ps1 back in the day, most of the game was just one huge function called 4 times, once for each player!! I converted it from the weird Texas insuraments chip to r3000 and walked away..

profile image
very lol @ spreading fear and doubt via code comments. Hits home hard.

Michael Rooney
profile image
I disagree very firmly with step number 9.

sean lindskog
profile image
Great article.

I'd give one extra piece of advice. When you start tugging on the strings of bad code, it can unravel much further than you originally expect. At the start, do your best to create a "blast zone" of what code is going to be replaced. Try very hard to resist the temptation to go beyond that, or your 2 week clean-up might turn into a 2 month clean-up. Accept that there may be a little bit of ugly "glue" code at the boundaries of your nice clean refactored code, and some of the remaining nasty code it interfaces with.

Seth Rosenfeld
profile image
I like this post quite a bit - thanks Nikolas.

I've found that in certain types of smart engineers, the greater a tendency desire a scrap and rewrite. The better programmers I've worked with can temper this desire with an eye towards the ultimate project objectives.

Is there a vector here that considers whether the code was initially meant to be re-used, or if it was intended as "throwaway"? As much as I hate the latter, it still happens from time to time and can be a much different situation than poorly executed code that is intended to be used again.

(Unfortunately, all too often, throwaway code is not in fact thrown away, but archived somewhere where it sits like a timebomb for poor unsuspecting future devs.)

Kris G
profile image
>Good code needs few comments because the code itself is clearly written and easy to understand.

disagree. "what" good code is doing is usually straightfwd to understand without too many comments, but
WHY it's doing what it's doing is where comments are needed.

Mark Venturelli
profile image
This is really good! Reminds me I should check #altdev more =)