The Measure Of Code

I’ve gotten a lot of questions about how big our codebase is, how fast it builds, how many tests we have… Fear not, Gentle Reader, all your burning questions will be answered here.

Size

Charles and I were priding ourselves on keeping things small and minimal. But truth be told, it’s not like we were keeping track of how many lines of code we had written. Were things as small as we hoped they were?

The most convenient way of counting lines of code that I know is CLOC. It’s an extremely easy-to-use open source program that counts the lines of code in a codebase, gives very detailed information, strips out whitespace, breaks things down by language, and does just about everything you’d want from a program like that.

Running it on the latest version of our code (not including any 3rd party libraries) produces this:

1621 text files.    1579 unique files.    3721 files ignored.
-------------------------------------------------------------------------------
Language                    files          blank        comment          code
-------------------------------------------------------------------------------
C++                            485          13577            303          46181
C#                             324           4935            712          22966
C/C++ Header                   407           4153             95          11975
MSBuild scripts                 18              0            126          1490
-------------------------------------------------------------------------------
SUM:                          1234          22665           1236          82612

Almost 60K lines of C++ code seemed very high. At first I thought it was because CLOC was counting files twice: once in their regular location and once in the .svn directory, but apparently it’s already removing all duplicates, so that wasn’t it.

Almost more scary than the amount of C++ code (which is all our runtime and some of our tools) is the amount of C# code. For a language that claims to be of significantly higher level than C++, that’s quite a mouthful of code!

Another surprising count in there is the number of comment lines. Since we make heavy use of TDD, I really didn’t expect more than a couple dozen lines of comments in the whole codebase. Still, I’m kind of proud that we average only about one comment line per file 🙂

Here’s a more detailed breakdown, with the line count just for our runtime (engine and game):

1089 text files.    1053 unique files.    2338 files ignored.
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            441          11997            245          40943
C/C++ Header                   385           3964             90          11405
-------------------------------------------------------------------------------
SUM:                           826          15961            335          52348

and for our tools:

532 text files.     531 unique files.    1383 files ignored.
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C#                             324           4935            712          22966
C++                             44           1580             58           5238
MSBuild scripts                 18              0            126           1490
C/C++ Header                    23            199              5            591
-------------------------------------------------------------------------------
SUM:                           409           6714            901          30285

Tests

Then I realized that a good chunk of those were tests. So excluding all directories matching *Tests* gets the following result:

1206 text files.    1187 unique files.    4199 files ignored.
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            283           6464            150          22464
C#                             213           2636            534          12402
C/C++ Header                   380           3824             94          10782
MSBuild scripts                 12              0             84            978
-------------------------------------------------------------------------------
SUM:                           888          12924            862          46626

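Counting only .cpp files, a bit more than half of the C++ code (about 51%) consisted of tests; with headers included, tests are about 43%. That’s pretty consistent with my experience with TDD. C# follows a similar pattern, with tests making up about 46% of the code.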

As for the exact number of tests, running a grep for TEST shows all the C++ tests:

C:\pow2>grep -r TEST SweetPea Engine Tools | grep -v svn | wc
   2163    3620  221953

And doing the same thing with [Test] brings up all C# tests:

C:\pow2>grep -r \[Test\] SweetPea Engine Tools | grep -v svn | wc
   735    1470  52717

That means that our average C++ test is about 11.5 lines long, and C# tests 14.4. Frankly, that sounds rather high. We make heavy use of fixtures whenever possible and each test usually only checks for a single condition (even if it involves a couple check statements). I suppose that number is higher than expected because it probably includes all the lines from #include statements and all the fixtures as part of the average.

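-------------------------------------------------------------------------------------------------------------
Language     Lines   Non test lines   Test lines   % of non test code   Number of tests   Lines per test
-------------------------------------------------------------------------------------------------------------
C++          58156            33246        24910                57% *              2163             11.5
C#           22966            12402        10564                54%                 735             14.4
-------------------------------------------------------------------------------------------------------------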

* If we only count cpp files, that goes down to 49%

I was curious about that last part of checking a single thing per test, so I ran a grep for the number of CHECK statements in our code:

C:\pow2>grep -r CHECK SweetPea Engine Tools | grep -v svn | wc
   3886   15079  399598

That’s 1.8 CHECK statements per TEST, which is about right. Even though we’re checking for a single condition, we’ll often check a couple things about it (e.g. the camera stopped and it reached its final destination).

Build Times

So, given that amount of code, how long does it take to build it? Clearly it depends on your hardware. Since we’re not exactly rolling in money, we don’t have particularly powerful machines. Here at home, I’m using a modest Core 2 Duo E4300 (overclocked to 2.6 GHz) with fast memory and a relatively fast SATA hard drive, so that’s what I used for all my timings.

A full build of our game, plus all the libraries, all the tests, and running all the tests takes exactly 1 minute and 10 seconds. That’s pretty good for two reasons:

  • When we work with the game we don’t build and run the unit tests for the engine. We have a separate solution for that. A full build of just the engine, the game, and the game unit tests only takes 43 seconds.
  • The game itself is a fairly large project and devenv doesn’t know how to parallelize that build, so it’s only using half the available CPU power for about half the build time.

An incremental build after changing a single cpp file takes slightly over a second (including half a second of unit test execution).

As you can imagine, working with that codebase is a dream come true. Snappy, responsive. Nothing is so hard that it can’t be changed.

Unfortunately that’s where the fairy tale ends. The tools are another story altogether. Our C# tools, with all their unit tests, build in a mere 18 seconds, and the C++ tools in 1 minute and 10 seconds. That’s not too bad, except that it’s a surprisingly large amount of time for the C++ tools since there aren’t that many of them.

Here’s the kicker: doing another build without changing a single thing takes 38 seconds. Whoa! We’re doing some C++/CLI trickery and apparently dependency checking is totally broken in VS2005 (either that, or we just don’t know how to set it up right).

Keeping things fast

What’s the secret of a lightning-fast build? Clearly, keeping the code size down is crucial. If your codebase is 2 million lines of code, builds are going to be painful no matter what. But they can be a little less painful with some gentle care.

One of the main build-time killers that we’re avoiding is the use of STL or Boost. Those libraries pull in everything and the kitchen sink, and their heavy use of templates makes build and link times go through the roof. No thanks.

Our template use is pretty minimal. We have a couple of containers (which I love and will write about one of these days) and that’s about it.

We’re pretty anal when it comes to keeping physical dependencies to a minimum. We forward declare aggressively, and we only include the headers that are necessary for each cpp file (PC Lint is “kind” enough to remind us every time we have unnecessary #includes). We’re not using external include guards or #pragma once.

Precompiled headers are either not used, or kept to a minimum. I think the only project that uses them is the game and only for Havok headers. We don’t even have windows.h in a precompiled header (which would be a really bad idea because you’d be making all the junk in windows.h available to your whole program).

Finally, we are using incremental links whenever possible. I remember a few versions of Visual Studio ago they were pretty broken, but they’re not giving us any problems. The only caveat is that if you modify a static library your program is linking with, it will force a full link. So they’re really only good for modifying the executable itself.

We’re not using any distributed builds. First of all, we don’t have enough computers to make it worthwhile. And second, I had horrible experiences with distributed builds in the past. They would help with a badly structured codebase, at the cost of longer incremental builds and mysterious spurious bad builds. Besides, once they’re in place, they tend to encourage even further disregard for keeping dependencies to a minimum.

How about you?

So, that’s it for the Power of Two codebase. How about you? Want to share your size, build times, or any other data?

  • Brandon

    Have you (or do you regularly) run a code coverage tool? TDD should show near 100% depending on how religiously it’s practiced. If not 100%, any patterns arising that explain the remaining percentage points?

  • Brandon,

    I’m not a fan of code coverage tools. To me, TDD is not about achieving 100% coverage, but about helping drive design. It might be an OK metric to run on a large team that is not sold on TDD to make sure things aren’t sliding, but I wouldn’t get anything from running it on my own code.

    And guess what, if I did run it, I doubt my code coverage would be higher than 80%. There is some glue code that is more of a pain to test than any benefits I get from testing it. Same thing with some “leaf” code that nothing depends on.