
Maintenance, The Hidden Cost of Software

“Maintenance? I never do maintenance!” I hear you say. “I’m a game programmer! A coder who lives on the bleeding edge and doesn’t have to bother with boring stuff like that. That’s for stuffy database programmers, not for me.”

Have you ever tackled a problem that was supposed to be solved by writing some code in half an hour, but that code haunted you (or your coworkers, much to their chagrin I’m sure) for months or years to come? Then you have felt first hand the consequences of maintenance, the hidden cost of software.

The situation usually starts out very innocently. Maybe someone brings up an idea during a meeting, or maybe your lead asks you to estimate how long something will take, or maybe it’s you who decides to implement a cool new feature you’ve been itching to add to the game. In any case, you end up agreeing to take a couple of hours to write some quick code that will solve that particular problem.

If someone expresses some doubts about whether it’s worth doing it, you quickly answer “Don’t worry. It’ll only take me a couple of hours.” Famous last words.

Let’s take it from the top. Unfortunately, nobody works at full efficiency. Once you take into account the other meeting scheduled for that morning, a co-worker coming over to talk about yesterday’s football game, the obligatory mid-morning break to answer email and browse your favorite web site, and the time to synch to the latest code and do a full build while you get a fresh cup of coffee, we’re talking at least four hours instead of two. No big deal, it’s still really quick and it’s definitely worth doing.

Of course, chances are the initial estimate of two hours was totally inaccurate. It just felt like a task that should take two hours: you just sit there, type away for two hours, and it’s done. After all, you’re a really good programmer, and if you can deal with writing super-optimized [insert your favorite complex algorithm here] functions, then you can write that simple code three times over before breakfast. Software engineers are eternal optimists, especially as far as our own abilities are concerned. Once you sit down and start banging away at the keyboard (because it’s so simple that you felt no need to do any thinking before you started programming), you realize that it’s going a bit slower than you thought, and the “couple” of hours quickly turns into four or five. Add the interruptions and other distractions we mentioned earlier, and we’re talking a full work day taken up by that simple task. Maybe you even end up staying an hour or two after work to finish it. Professional pride; after all, you said it was going to be done in just a few hours. No big deal; it’s just one day.

You come in the next morning with a big smile on your face, ready to continue with the work you had to put aside before you started this task. You’re just wrapping your head around the code to figure out where you left off a couple of days ago, when your lead walks in and tells you that the code you wrote yesterday doesn’t exactly do what he had in mind. He wanted it to output things in columns instead of comma-delimited rows. Sigh. No problem. In an hour you give him the new version he wanted. We’re up to 10 hours total. Let’s keep counting.

Life is good for a while and you manage to put that task behind you. It left a bad taste in your mouth somehow, but at least it’s working and you’re moving along. Then, one day, you’re checking your list of assigned bugs and you see a high-priority one. Something in that code you wrote is causing a tool to crash on all the designers’ machines with the latest release of the tools. Damn! You try to reproduce the problem on your machine so you can debug it, and, of course, it works (that’s the curse of the programmers’ computers–your code will always work flawlessly there… except when you’re trying to demo it to someone). So you head over to the office of a designer, fiddle with it for a while until the crash happens again, hook it up with remote debugging, and eventually you see the obvious bug. Doh! Quick fix, new release of the tools. Another wasted morning. Without counting the time the designers were unable to work because of the crash, we’re up to 14 hours.

Weeks later, the story repeats itself. This time it turns out your code is too slow. “What do you mean too slow? It was very fast when I wrote it!” Then your heart sinks when you see that you were testing it with a couple dozen objects, but the designers are applying it to a level with 50,000 objects, and your O(n^2) algorithm isn’t cutting it anymore. This means you pretty much have to re-write it completely from scratch, but now you need to use a more complicated algorithm. This time it takes two full days to write, plus another day to figure out why the changes you made broke another tool that was using the same code. 38 hours so far.
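To make the scaling cliff concrete, here is a hedged sketch of the kind of rewrite described above. The scenario and function names are made up for illustration: a naive all-pairs proximity check over level objects is O(n^2), while hashing objects into a uniform grid with cells the size of the query radius only compares objects in neighboring cells, which is roughly linear for evenly spread objects.

```python
import random
from collections import defaultdict

def close_pairs_naive(objects, radius):
    """O(n^2): compare every pair of object positions. Fine for a
    couple dozen objects; hopeless for 50,000."""
    pairs = set()
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            (x1, y1), (x2, y2) = objects[i], objects[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radius ** 2:
                pairs.add((i, j))
    return pairs

def close_pairs_grid(objects, radius):
    """Roughly O(n): bucket objects into grid cells of size `radius`;
    any pair within `radius` must lie in the same or adjacent cells."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(objects):
        grid[(int(x // radius), int(y // radius))].append(idx)
    pairs = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in members:
                    for j in grid.get((cx + dx, cy + dy), ()):
                        if i < j:
                            (x1, y1), (x2, y2) = objects[i], objects[j]
                            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radius ** 2:
                                pairs.add((i, j))
    return pairs
```

Both return the same pairs; only the amount of work differs, which is exactly why the slowness never showed up in a small test level.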

Does the situation sound familiar yet? Maybe a little too painfully familiar? So we’ll skip over the 4 hours it will take to keep your code up to date when somebody else changes an interface it uses (and nobody knew it did, because you wrote it by yourself without discussing it with anybody), and the full day it took Bob to fix it when your code broke the build and you were out sick.

The point of this drawn-out example is that we all do maintenance, and a lot more of it than we’d like to think. It is unfortunately rare for a piece of software to be written and then left alone, untouched, for years to come. You’ll have to update it to reflect changes to the underlying libraries, you’ll have to speed it up when its performance becomes critical, and you’ll have to modify it to implement the new features the artists are asking for.

I’d love to collect some statistics gathered from the source control program of a mature code base and see how often certain files are changed, what the hotspots are, and which projects are mostly left alone and never touched (and then learn something about what makes those files different). I’m sure someone has already written a script to collect exactly that type of statistics, so if I get a chance I’ll try it out and I’ll make sure to report back here.
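A minimal sketch of what such a hotspot script could look like, assuming Git-style history (the function name and the sample input are made up for illustration): it counts how many commits touched each file in the output of `git log --name-only`.

```python
from collections import Counter

def change_hotspots(log_output):
    """Count how often each file appears in `git log --name-only` output.

    Skips commit headers, blank lines, and indented commit-message
    lines; everything left is treated as a changed file path.
    """
    counts = Counter()
    for line in log_output.splitlines():
        if not line.strip():
            continue
        if line.startswith(("commit ", "Author:", "Date:", "Merge:")):
            continue
        if line.startswith("    "):  # commit messages are indented
            continue
        counts[line] += 1
    return counts
```

Feed it the captured output of `git log --name-only`; the files with the highest counts are the maintenance hotspots, and the files that never show up are the ones worth studying.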

What can we learn from all this?

The first thing is that the cost of writing some software is a lot more than the initial time it takes a programmer to implement it. Depending on where and how that code is used, it could be many times the initial implementation cost. Unfortunately, people (both programmers and managers) tend not to think of that when they’re deciding what features to implement or whether to give the green light to a new tool.

What’s the best way to reduce that cost? Having no software to maintain in the first place! The best code is no code at all. That’s the easiest code to implement, debug, and maintain! If we can avoid writing something with no negative impact to the project, then we shouldn’t write it. On the other hand, if having a programmer implement that feature is going to save artists and designers a huge number of hours for the duration of the project, then we clearly should go ahead with it. We should simply be aware of the true cost of what we’re doing.

Sometimes we won’t know at the beginning whether we should implement something or not. If there is a chance that we don’t want or need to implement a feature, it’s probably worth spending some time up front to decide whether we really need it. For instance, I was in charge of adding a texture caching system to our current game with the objective of reducing the amount of texture memory used and/or letting us use more textures in a single level. I could have jumped straight into the task, coded some cool systems, optimized them, and had something acceptable by the end. Instead, I spent a couple of days running some tests: I measured the throughput from the disk while reading from a background thread, and I wrote a really simple cache simulator running on the actual game data. By the end, it was clear that we were not going to get much benefit from texture caching unless we were willing to change how our levels were put together (which was not something we were willing to do at this stage of the project). Too bad that we couldn’t get that feature, but better that than spending a long time writing a bunch of code that would have complicated things without getting us much of a benefit.
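A cache simulator of that kind can be surprisingly small, which is what makes this sort of up-front test so cheap. Here is a minimal sketch of the idea, not the actual simulator from the project: an LRU cache replaying a texture request trace and reporting the hit rate.

```python
from collections import OrderedDict

def simulate_lru_cache(requests, capacity):
    """Replay a texture request trace through an LRU cache that holds
    `capacity` entries; return the fraction of requests that hit."""
    cache = OrderedDict()  # keys in least- to most-recently-used order
    hits = 0
    for tex in requests:
        if tex in cache:
            hits += 1
            cache.move_to_end(tex)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[tex] = None
    return hits / len(requests) if requests else 0.0
```

Run a real per-frame request log through it at a few candidate cache sizes; if the hit rate stays low even at generous sizes, the data simply doesn’t have enough reuse for caching to pay off, which is roughly the conclusion described above.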

If we have decided that we need to implement a feature, then we should aim for the simplest possible solution that will do what we want. There are many reasons for this (probably a topic for another article), but in general, the simpler a solution is, the easier and faster it is going to be to maintain. You just don’t know what you’ll need to do to that code in the future: maybe you’ll need to make it faster, maybe you’ll need to extend it, or maybe you’ll be lucky and you’ll just have to keep it up to date. Trying to optimize it early or make it very general before it’s needed is going to be a waste of time now, and a waste of time when it comes time to maintain it, even if it makes you feel better about all the cool code you got to write.

Another interesting consequence of the cost of maintaining software is that the rate at which a team can write new code is going to decrease as the size of the code base increases. As more code is added, more time is spent maintaining existing code and making sure new code works well with existing code, and less time can be devoted to writing new code. This effect will be particularly noticeable in the first couple of years right after starting a project from scratch. At the beginning the project is a small, cute snowball that everybody understands. Development moves on at a really high pace and everybody’s spirits are high. However, as the months go by, the result of that pace makes the snowball grow and grow. Eventually it will become frustrating how hard it has become to make a small change that before would have been a trivial task.

Some of this can be mitigated by maintaining a good architecture: keep modular, independent subsystems, reduce dependencies, etc. Still, even with a good system, the weight of all the existing code makes it more difficult to change or add code in the future.

Something that can help ease the burden of software maintenance is using third-party libraries as much as possible. When people make decisions to use third-party libraries, they’re usually thinking of right now. They can pay X amount of money, and then have a programmer spend two weeks integrating those features into the game engine. What they might not realize is that they’re also saving costs in the future. Those libraries are going to be debugged separately, they’re going to be improved, and, most importantly, they’re going to be forcefully separated from the project itself, maximizing the independence between the two. Yes, you’ll still have to retrofit your engine to accommodate interface changes in new versions of the libraries, but it’s still a big win.

Finally, in this age where reusability is looked upon as the Holy Grail, this could be a surprising concept for some: Throwaway code can be your friend. Throwaway or temporary code completely liberates us from any maintenance costs. We just write something, use it, and once it fulfills its objective, we throw it away, never to be seen again (OK, it’s fine to leave it in the dark depths of the version control program so we can refer to it if we ever need to). Perfect examples of throwaway candidates are quick scripts and tools to convert between file formats, or “scaffolding” code to help us get somewhere or help us get started while we wait for some other functionality to become available. It is very important to keep in mind what type of code we’re dealing with when we’re developing it though, and we should also clearly label it in some way so everybody knows it’s throwaway. That way a class labeled Tmp3DView won’t become deeply rooted in your tools, or will it?
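For instance, a throwaway format converter like the one from the columns-instead-of-rows episode earlier might look like this. The code is purely illustrative; the point is the loud label, not the implementation.

```python
# THROWAWAY: quick one-off to reformat comma-delimited rows into
# aligned columns for the lead. Not meant to outlive this task --
# delete it (or let it rot in version control) when we're done.
def rows_to_columns(text):
    """Turn comma-delimited rows into space-padded columns.
    Assumes every row has the same column count (fine for a one-off)."""
    rows = [line.split(",") for line in text.strip().splitlines()]
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```

No error handling, no configurability, no generality, and that’s exactly right for code that is allowed to die young.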
