Simple Is Beautiful

If you’ve read some of my other articles, you know that I believe that the best code is no code at all. But what if you actually have to write some code? What then? This article deals with that question and shows the importance of simplicity.

As I write this there are still a few cardboard boxes strewn across our home as we finish unpacking everything. My wife and I just finished a surprisingly painless cross-country move to sunny San Diego. The town house we’re moving into is definitely larger than our previous condo, but it wasn’t until we had to pack everything, move it, and then unpack it that we realized how much stuff we had accumulated over the years. Yes, there was a reason for everything we had, but it was still a lot of stuff. Then we came to the realization that we prefer a more open, simpler environment with fewer pieces of furniture, fewer paintings covering the walls, and less clutter in general.

That is an interesting insight because it applies to many different things, including programming: There is a cost associated with having a complex, cluttered environment, whether it be the furniture in your house, your class hierarchy, or your build process. In particular it will create resistance to change, which can be in the form of a cross-country move, or a simple class refactoring.

Goals

The ultimate reason to write code is to have the computer carry out a sequence of instructions. In our case, that sequence of instructions is a game or a tool to help create the game. It is tempting to think of the computer as the “audience” for our program, but it is not a very good audience. It is way too forgiving, and as long as things compile, it’ll be happy with anything you throw at it. We can do better than that.

We can still write code that is correct, but with people in mind as the primary audience. The usual reason for choosing people as your audience is that software development is a team effort, and it’s very important that other people be able to understand and modify your code. True, but there’s more to it than that. Even if you were the only person writing code, you should still write for people, not computers. Assuming it’s more than a toy project, you will have to refactor your code as you learn new things about it. Probably not once, not twice, but many times over the course of the project’s lifetime. You will have to keep it up to date, fix bugs, add new features, etc. I am convinced that refactoring (or lack thereof) is one of the major influences on code quality, and, as a consequence, product quality, so anything to help with that is extremely important in my book.

Lately, I’ve been taking it a step further and thinking of my audience as not just other programmers, but also people who are not programmers. If you manage to write some code that a non-programmer can more or less understand, then you know you’re there. The code should be simple and self-explanatory enough that it should be a breeze to refactor in the future.

But wait, as if ease of refactoring wasn’t enough, there are more reasons for writing really simple code. Edsger W. Dijkstra in his lecture “The Humble Programmer” advocated the avoidance of clever tricks and urged programmers to be fully aware of the limitations of the human mind when writing programs.

“The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.”

Dijkstra’s reason for keeping clever tricks away is not so much for ease of refactoring, but the quality of the program itself. Programmers are all too eager to tackle something too ambitious at once, without breaking it down into smaller, more understandable sub-problems, and, as a result, create a disastrous program that crashes often and doesn’t even implement all the required features.

Today, this sounds like a surprisingly modern attitude. After all, it’s right up the alley of agile development. But amazingly, this lecture took place in 1972. Talk about timeless advice (OK, OK, in the really short scale of computer history, not in the grand scheme of things).

Brian Kernighan also had something to contribute to the idea of writing simple code, but coming from a very different angle:

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”

While I totally agree with the sentiment, I feel it is not as important as the other two reasons for aiming towards code simplicity. Frankly, if you use test-driven development and take steps of the correct size, you should hardly ever find yourself knee-deep in the debugger trying to make sense of your program. However, a decade or two ago that made a lot more sense, especially given that debuggers were less sophisticated then as well. Still, anything that encourages programmers to keep things simple is a step in the right direction. So if that convinces some people, so be it.

Attaining Simplicity

Great. Simple is good. How do we write “simple” code? That’s not something they teach in school and it’s not something that is pushed for in the workplace, so it’s probably not something you’re used to thinking about. The rest of the article will be a series of suggestions and techniques on how to make code as simple as possible. Specific examples are given in C++, but the general techniques should apply to most object-oriented languages.

Avoid unnecessary language tricks.

Listen to Dijkstra and go easy on “clever code.” If you feel the need to write a comment to explain to other people what that cute line of code you just wrote does, you should very strongly consider re-writing it so a comment is not necessary. I’m not saying to avoid taking advantage of language features, just the ones that can be done in another, simpler way. For example, don’t use a template when a non-templated implementation is perfectly fine. Don’t rely on a clever arrangement of variables so they’re initialized in the correct order even though the language standard says it should work.
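As a minimal sketch of the initialization-order point: C++ initializes members in the order they are declared in the class, not in the order they appear in the initializer list. The hypothetical Grid class below (all names are mine, purely for illustration) sidesteps that rule by building m_cells from the constructor parameters instead of from m_width and m_height, so reordering the member declarations later can’t silently break it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example class; not taken from any particular codebase.
class Grid
{
public:
    Grid(std::size_t width, std::size_t height)
        : m_cells(width * height, 0)  // uses the parameters, not m_width or m_height
        , m_width(width)
        , m_height(height)
    {
    }

    std::size_t Width() const     { return m_width; }
    std::size_t Height() const    { return m_height; }
    std::size_t CellCount() const { return m_cells.size(); }

private:
    std::vector<int> m_cells;  // declared first, so it is initialized first
    std::size_t m_width;
    std::size_t m_height;
};
```

Writing m_cells(m_width * m_height, 0) instead would read two members that have not been initialized yet, because m_cells is declared before them. Swapping the declarations would make it work, but then the code only stays correct as long as nobody touches that arrangement.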

Avoid or postpone performance optimizations that obscure the meaning of the code for as long as possible.

You’re all familiar with the Donald Knuth quote: “Premature optimization is the root of all evil.” Programs have a way of twisting into evil, misshapen fiends when they’re optimized, and they usually become much harder to refactor afterwards. Specifically in this case, I suggest avoiding (or delaying at least) optimizations that affect how simple or readable a section of code is. There are many ways to skin a cat, and there are also many ways to optimize a program. I find that there is often a way to optimize things that doesn’t cause a major loss in code simplicity. But if that’s not possible, at least defer that optimization as long as possible. Maybe you won’t need it, and maybe the code will change before then and you’ll save yourself a lot of rework.

Functions should do one thing only.

The secret to a good function is that it should do one thing and one thing only. Functions that follow that rule are usually quite small (at most 10-15 lines in C++) and their whole meaning can be easily understood at a glance. Because the function has only one purpose, refactoring becomes very easy: No need to worry about how changes in that function will affect other parts of the function, no pesky local variables reused in many parts of the function, etc.

Maybe this is a bit extreme, but ideally, functions should be so simple that I should be able to look at a function, delete it and re-write it from scratch without any difficulty.

I follow two guidelines to keep functions as simple as possible:

  • If a function goes over 10 to 15 lines of code, I look at it really hard and try to split it. Chances are it’s doing more than one thing.
  • If I ever feel the need to write an explanatory comment about what a few lines of code do, I move those lines into a function of their own and give it a name that explains what it does (usually the same as what the comment would have been). By following this rule, the only comments in my code are the ones that explain *why* the code does things the way it does, and the ones that deal with higher-level concerns such as the purpose of a class or a library.
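Here is a small sketch of the second guideline. The Entity struct and both function names are hypothetical, made up for this example. Where I might otherwise have written an inline condition with a comment along the lines of “entity is dead or switched off,” the condition gets its own function, and the name does the job of the comment.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical game entity, reduced to the two fields this example needs.
struct Entity
{
    float health;
    bool  active;
};

// The function name replaces the comment that would have explained the condition.
bool ShouldBeRemoved(const Entity& entity)
{
    return !entity.active || entity.health <= 0.0f;
}

// Does one thing: drops every entity that should be removed, keeping the rest in order.
void RemoveDeadEntities(std::vector<Entity>& entities)
{
    entities.erase(
        std::remove_if(entities.begin(), entities.end(), ShouldBeRemoved),
        entities.end());
}
```

Both functions are short enough to delete and rewrite from scratch, and the removal rule can change without touching the code that walks the list.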

Classes should do one thing only.

This should come as no big surprise. If we want our functions to be small, simple, and do just one thing, we also want the same qualities from our classes. When classes are left unchecked, they can easily grow into unmanageable blobs of many thousands of lines. The largest class I had the misfortune of encountering had a staggering 10,000 lines in it, and the problem was compounded by inheriting from another class with over 8,000 lines! As you can imagine, having to do any work near that class was quite a punishment.

Personally, I feel that C++ classes should ideally be no more than about 200-300 lines (remember, with virtually no comments other than the ones explaining why). Anything that goes beyond that, and I start thinking hard about what exactly the class does. If a class reaches 1000 lines, it’s time to get the axe out.

One technique I’ve been following to keep classes as simple as possible is to use the controversial suggestion from Scott Meyers on how to decide on the scope of a function. The quick summary is that functions should be pushed out of a class as much as possible: if a function can be implemented as a non-member function, then do it; if it can be a class static, make it so; otherwise, leave it as a member function.

One immediate consequence of following that advice is that a class is stripped of anything that is not essential. Helper functions (both public and private to the class) are pushed out, and the meaning of the class is made more clear. The second consequence is that refactoring is a lot easier. By using non-member functions in anonymous namespaces whenever possible, we avoid the need to edit any header files at all, which results in faster compile and iteration times. Also, since those functions are completely independent of the class, they’re trivial to move around, promote to higher levels, etc. while member functions require some massaging to move around.
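To make that concrete, here is a rough sketch with a hypothetical Vector2 class (the class and every function name are assumptions for illustration, and the three “files” are squeezed into one listing). The class keeps only what needs access to its data, Length is a non-member built on the public interface, and IsNearlyZero lives in an anonymous namespace so it never shows up in a header.

```cpp
#include <cmath>

// --- Vector2.h (hypothetical): only what needs the private data stays in the class.
class Vector2
{
public:
    Vector2(float x, float y) : m_x(x), m_y(y) {}
    float X() const { return m_x; }
    float Y() const { return m_y; }

private:
    float m_x;
    float m_y;
};

// Non-member function: needs nothing beyond the public accessors.
float Length(const Vector2& v)
{
    return std::sqrt(v.X() * v.X() + v.Y() * v.Y());
}

// --- Some .cpp file (hypothetical): the helper is invisible outside this file,
// so adding, renaming, or deleting it never touches a header.
namespace
{
    bool IsNearlyZero(float value)
    {
        return std::fabs(value) < 1e-6f;
    }
}

// Returns a unit-length copy of v, or the zero vector if v is too short to normalize.
Vector2 Normalized(const Vector2& v)
{
    const float length = Length(v);
    if (IsNearlyZero(length))
        return Vector2(0.0f, 0.0f);
    return Vector2(v.X() / length, v.Y() / length);
}
```

If Length or Normalized later turn out to be useful elsewhere, promoting them is a cut and paste. Nothing about the class itself has to change.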

I’m suggesting that we keep functions small, but also that we keep classes small. We’re not going to end up with any less code than we would have before, but we’re going to end up with many more classes scattered all over our program. We have pushed the complexity from the function and class level up to the program organization and architecture level. The next article in this series will deal with how to manage this newfound complexity, but here’s a preview: Use hierarchies.

Timings, logging, and error handling

Ideally, I’d like my code to be really small, simple, and to the point so its meaning is always clear and obvious. Unfortunately, in the real world, that’s not always possible. We can start with a very clean 5-line function, but then we add timings around it to get an idea of its performance, and logging statements to know what is going on, and error handling code to deal with the unexpected, and before we know it, we have a monster function whose meaning is lost in a mass of details.

What can we do about it? We could argue that timings should be done with non-intrusive methods such as using profilers, but that’s not always possible or desirable. We often want very specific timings that only affect a particular section of the code (load times), or we want to continuously monitor performance of certain sections while the game is running. For example, we probably want to get an idea of how much time we’re spending each frame on each of the major tasks: entity updates, AI calculations, physics, animation, rendering, sound, etc. We need to wrap each of those areas in timing statements. The more detailed we want those timings to be, the more junk we’ll have to add to the code. We can try making that as clean as possible by using macros that only require one line, but it’s still junk that obscures the meaning.
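For reference, this is roughly what I mean by a one-line timing macro. It’s only a sketch: ScopedTimer, TIME_SCOPE, FrameTimings, and UpdatePhysics are all hypothetical names, and a real engine would probably report to its own profiling system rather than add to a plain double. The timer starts in its constructor and adds the elapsed time to a running total in its destructor, so a single line at the top of a scope is enough.

```cpp
#include <chrono>

// Hypothetical helper: adds the time spent in the enclosing scope to a running total.
class ScopedTimer
{
public:
    explicit ScopedTimer(double& totalMilliseconds)
        : m_total(totalMilliseconds)
        , m_start(std::chrono::steady_clock::now())
    {
    }

    ~ScopedTimer()
    {
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - m_start;
        m_total += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& m_total;
    std::chrono::steady_clock::time_point m_start;
};

// Two-step concatenation so __LINE__ is expanded before it is pasted into the name.
#define TIMER_CONCAT_INNER(a, b) a##b
#define TIMER_CONCAT(a, b) TIMER_CONCAT_INNER(a, b)
#define TIME_SCOPE(total) ScopedTimer TIMER_CONCAT(scopedTimer_, __LINE__)(total)

// Hypothetical per-frame totals, in milliseconds.
struct FrameTimings
{
    double physicsMs;
    double renderMs;
};

void UpdatePhysics(FrameTimings& timings)
{
    TIME_SCOPE(timings.physicsMs);  // the only timing-related line in the function
    // ... the actual physics work would go here ...
}
```

It keeps the clutter down to one line per timed scope, but that line is still there, and it still has nothing to do with what the function is actually for.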

Logging doesn’t even have a non-intrusive counterpart unless you want to automatically add prologue and epilogue code to every function (which would generate way too many messages and would affect performance too much). So we’re stuck writing messages by hand to the log system.

Error handling at least has the possibility of using exceptions, which add a lot less clutter than returning and checking error codes. Unfortunately, we have to deal with several real-world issues which often make exception handling impractical. The first one is performance. As much as I hate bringing this up, exception handling can easily have a significant performance impact even on modern hardware. This is something that will hopefully go away in the near future as PC hardware and consoles become more powerful. The second problem is compiler support. Even today, some console manufacturers are admitting that exception-handling support in their compilers is not usable and we shouldn’t rely on it. So much for that idea. Finally, and probably most importantly, there is complexity and programmer knowledge. If you have read Exceptional C++, you know how complicated it can be to write exception-safe code, and not every C++ programmer out there today is going to feel comfortable using exceptions. All of this is a vicious circle, because until programmers demand robust and efficient exception handling, compiler and system creators aren’t going to provide it, which means a lot of programmers are not going to be exposed to it, and so on.

So, what great solutions do I have to this problem? None, I’m afraid. This is where I’d like to hear from some of you. How do you deal with timing, logging, and error handling in a way to minimize the clutter and leave your code as clear as possible? Email me and I’ll edit this article or make a new one with any great solutions I receive.

Striking a balance

In the end, even if we have the best goals in mind about simplicity, we will be forced to make compromises. Error handling is one case. Another case is often performance. Sometimes, having many scattered functions can add up to a noticeable performance hit in some inner loop of a performance-critical section. In that case simplicity needs to give way to practicality, and you should do whatever is necessary. Just make really sure that having many functions is the cause of the performance problems before you make any changes.

It is also important not to confuse code simplicity with algorithmic simplicity. Writing simple code doesn’t mean that you have to use a silly bubble sort to order a list. Always choose the most appropriate algorithm for the task at hand; just implement it with the simplest possible code. Implementing it in a very simple way will also make it possible to easily replace it down the line with a more efficient algorithm.
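A tiny sketch of what that can look like, using a hypothetical Enemy struct and function names invented for this example: the algorithm is the standard library’s std::sort, not a hand-rolled bubble sort, yet the code around it is about as simple as it gets.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical data for the example.
struct Enemy
{
    int   id;
    float distance;
};

bool IsCloser(const Enemy& a, const Enemy& b)
{
    return a.distance < b.distance;
}

// One job: order the enemies from nearest to farthest.
void SortNearestFirst(std::vector<Enemy>& enemies)
{
    std::sort(enemies.begin(), enemies.end(), IsCloser);
}
```

Callers only know about SortNearestFirst, so if profiling ever shows that a different approach is needed (say, only partially sorting the list), the change stays inside one three-line function.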

The next article will deal with how to manage all the complexity we’ve pushed up into the program structure. Until then, keep your functions short, your classes clean, and remember The Humble Programmer:

“…We shall do a much better programming job, provided that we approach the task with a full appreciation of its tremendous difficulty, provided that we stick to modest and elegant programming languages, provided that we respect the intrinsic limitations of the human mind and approach the task as Very Humble Programmers.”