I’ve gotten a lot of questions about how big our codebase is, how fast it builds, how many tests we have… Fear not, Gentle Reader, all your burning questions will be answered here.
Size
Charles and I were priding ourselves on keeping things small and minimal. But truth be told, it’s not like we were keeping track of how many lines of code we had written. Were things as small as we hoped they were?
The most convenient way of counting lines of code that I know is CLOC. It’s an extremely easy to use open source program which counts the lines of code in a codebase, gives very detailed information, strips out whitespace, breaks things down by language, and does just about everything you’d want from a program like that.
Running it on the latest version of our code (not including any 3rd party libraries) produces this:
    1621 text files.
    1579 unique files.
    3721 files ignored.
-------------------------------------------------------------------------------
Language                    files         blank       comment          code
-------------------------------------------------------------------------------
C++                           485         13577           303         46181
C#                            324          4935           712         22966
C/C++ Header                  407          4153            95         11975
MSBuild scripts                18             0           126          1490
-------------------------------------------------------------------------------
SUM:                         1234         22665          1236         82612
Almost 60K lines of C++ code seemed very high. At first I thought it was because CLOC was counting files twice: once in their regular location and once in the .svn directory, but apparently it’s already removing all duplicates, so that wasn’t it.
Almost scarier than the amount of C++ code (which is all our runtime and some of our tools) is the amount of C# code. For a language that claims to be significantly higher-level than C++, that’s quite a mouthful of code!
Another surprising count in there is the number of lines with comments. Since we make heavy use of TDD, I really didn’t expect more than a couple dozen lines of comments in the whole codebase. Still, I’m kind of proud that we average only about one line of comments per file 🙂
Here’s a more detailed breakdown, with the line count just for our runtime (engine and game):
    1089 text files.
    1053 unique files.
    2338 files ignored.
-------------------------------------------------------------------------------
Language                    files         blank       comment          code
-------------------------------------------------------------------------------
C++                           441         11997           245         40943
C/C++ Header                  385          3964            90         11405
-------------------------------------------------------------------------------
SUM:                          826         15961           335         52348
and for our tools:
     532 text files.
     531 unique files.
    1383 files ignored.
-------------------------------------------------------------------------------
Language                    files         blank       comment          code
-------------------------------------------------------------------------------
C#                            324          4935           712         22966
C++                            44          1580            58          5238
MSBuild scripts                18             0           126          1490
C/C++ Header                   23           199             5           591
-------------------------------------------------------------------------------
SUM:                          409          6714           901         30285
Tests
Then I realized that a good chunk of those were tests. So excluding all directories matching *Tests* gets the following result:
    1206 text files.
    1187 unique files.
    4199 files ignored.
-------------------------------------------------------------------------------
Language                    files         blank       comment          code
-------------------------------------------------------------------------------
C++                           283          6464           150         22464
C#                            213          2636           534         12402
C/C++ Header                  380          3824            94         10782
MSBuild scripts                12             0            84           978
-------------------------------------------------------------------------------
SUM:                          888         12924           862         46626
A bit more than half the C++ code consisted of tests. That’s pretty consistent with my experience with TDD. C# seems to follow a similar percentage as well.
As for the exact number of tests, running a grep for TEST shows all the C++ tests:
C:\pow2>grep -r TEST SweetPea Engine Tools | grep -v svn | wc
   2163   3620 221953
And doing the same thing with [Test] brings up all C# tests:
C:\pow2>grep -r \[Test\] SweetPea Engine Tools | grep -v svn | wc
    735   1470  52717
That means that our average C++ test is about 11.5 lines long, and C# tests 14.4. Frankly, that sounds rather high. We make heavy use of fixtures whenever possible and each test usually only checks for a single condition (even if it involves a couple check statements). I suppose that number is higher than expected because it probably includes all the lines from #include statements and all the fixtures as part of the average.
Language | Lines | Non test lines | Test lines | % of non test code | Number of tests | Lines per test |
C++ | 58156 | 33246 | 24910 | 57% * | 2163 | 11.5 |
C# | 22966 | 12402 | 10564 | 54% | 735 | 14.4 |
* If we only count cpp files, that goes down to 49%
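The figures in the table follow directly from the CLOC totals and the grep counts above. Here is a small C++ sketch of the arithmetic; the struct and function names are made up for illustration and are not part of our codebase:

```cpp
#include <cstdio>

// Line counts taken from the CLOC runs above. All names are illustrative only.
struct LanguageStats
{
    const char* name;
    int totalLines;    // code lines, tests included
    int nonTestLines;  // code lines with the *Tests* directories excluded
    int testCount;     // number of matches reported by grep
};

// Test lines are whatever is left after removing the non-test code.
int TestLines(const LanguageStats& s)
{
    return s.totalLines - s.nonTestLines;
}

// Share of the code that is not tests, as a percentage.
double NonTestPercent(const LanguageStats& s)
{
    return 100.0 * s.nonTestLines / s.totalLines;
}

// Average size of a test, fixtures and #include lines included.
double LinesPerTest(const LanguageStats& s)
{
    return static_cast<double>(TestLines(s)) / s.testCount;
}

void PrintStats(const LanguageStats& s)
{
    std::printf("%s: %d test lines, %.0f%% non-test, %.1f lines per test\n",
                s.name, TestLines(s), NonTestPercent(s), LinesPerTest(s));
}
```

Calling `PrintStats` with `{"C++", 58156, 33246, 2163}` or `{"C#", 22966, 12402, 735}` reproduces the corresponding table row.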
I was curious about that last part of checking a single thing per test, so I ran a grep for the number of CHECK statements in our code:
C:\pow2>grep -r CHECK SweetPea Engine Tools | grep -v svn | wc
   3886  15079 399598
That’s 1.8 CHECK statements per TEST, which is about right. Even though we’re checking for a single condition, we’ll often check a couple things about it (e.g. the camera stopped and it reached its final destination).
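To make the "one condition, a couple of checks" idea concrete, here is a hedged sketch of the shape such a test takes. It is not our actual code: the Camera class, the fixture, and the function names are all invented for illustration, and a plain function returning bool stands in for the TEST and CHECK macros of a real test framework.

```cpp
#include <cmath>

// Hypothetical camera that moves toward a target at constant speed.
struct Camera
{
    float position;
    float target;
    float speed;

    bool IsMoving() const { return position != target; }

    void Update(float dt)
    {
        const float step = speed * dt;
        if (std::fabs(target - position) <= step)
            position = target;   // close enough: snap to the target and stop
        else
            position += (target > position) ? step : -step;
    }
};

// Fixture: shared setup so each test body stays a few lines long.
struct CameraFixture
{
    Camera camera;
    CameraFixture() : camera{0.0f, 10.0f, 5.0f} {}
};

// One condition ("the camera arrives"), two checks about it.
bool CameraArrivesAtTarget()
{
    CameraFixture f;
    f.camera.Update(1.0f);
    f.camera.Update(1.0f);
    const bool stopped = !f.camera.IsMoving();            // first check
    const bool atTarget = (f.camera.position == 10.0f);   // second check
    return stopped && atTarget;
}
```

With the setup in the fixture, the test body itself is only a handful of lines, which is why the fixture and #include lines end up inflating the per-test average.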
Build Times
So, given that amount of code, how long does it take to build it? Clearly it depends on your hardware. Since we’re not exactly rolling in money, we don’t have particularly powerful machines. Here at home, I’m using a modest Core 2 Duo E4300 (overclocked to 2.6 GHz) with fast memory and a relatively fast SATA hard drive, so that’s what I used for all my timings.
A full build of our game, plus all the libraries, all the tests, and running all the tests takes exactly 1 minute and 10 seconds. That’s pretty good for two reasons:
- When we work with the game we don’t build and run the unit tests for the engine. We have a separate solution for that. A full build of just the engine, the game, and the game unit tests only takes 43 seconds.
- The game itself is a fairly large project and devenv doesn’t know how to parallelize that build, so it’s only using half the available CPU power for about half the build time.
An incremental build after changing a single cpp file takes slightly over a second (including half a second of unit test execution).
As you can imagine, working with that codebase is a dream come true. Snappy, responsive. Nothing is so hard that it can’t be changed.
Unfortunately that’s where the fairy tale ends. The tools are another story altogether. Our C# tools, with all their unit tests, build in a mere 18 seconds, and the C++ tools in 1 minute and 10 seconds. That’s not too bad, except that it’s a surprisingly large amount of time for the C++ tools since there aren’t that many of them.
Here’s the kicker: doing another build without changing a single thing takes 38 seconds. Whoa! We’re doing some C++/CLI trickery and apparently dependency checking is totally broken in VS2005 (either that, or we just don’t know how to set it up right).
Keeping things fast
What’s the secret of a lightning-fast build? Clearly, keeping the code size down is crucial. If your codebase is 2 million lines of code, builds are going to be painful no matter what. But they can be a little less painful with some gentle care.
One of the main build-time killers that we’re avoiding is the use of STL or Boost. Those libraries pull in everything and the kitchen sink, and their heavy use of templates makes build and link times go through the roof. No thanks.
Our template use is pretty minimal. We have a couple containers (which I love and will write about one of these days) and that’s about it.
We’re pretty anal when it comes to keeping physical dependencies to a minimum. We forward declare aggressively, and we only include the headers that are necessary for each cpp file (PC Lint is “kind” enough to remind us every time we have unnecessary #includes). We’re not using external include guards or #pragma once.
Precompiled headers are either not used, or kept to a minimum. I think the only project that uses them is the game and only for Havok headers. We don’t even have windows.h in a precompiled header (which would be a really bad idea because you’d be making all the junk in windows.h available to your whole program).
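As a reminder of what forward declaring buys you, here is a single-file sketch. The Texture and Sprite classes are hypothetical, not taken from our engine; in a real project each commented section would live in its own file, and only Sprite.cpp would include Texture.h.

```cpp
// --- Sprite.h (hypothetical) ---
class Texture;   // forward declaration: no #include "Texture.h" needed here

class Sprite
{
public:
    explicit Sprite(const Texture* texture) : m_texture(texture) {}
    int Width() const;   // defined in Sprite.cpp, where Texture is complete
private:
    const Texture* m_texture;   // a pointer only needs the declaration
};

// --- Texture.h (hypothetical) ---
class Texture
{
public:
    explicit Texture(int width) : m_width(width) {}
    int Width() const { return m_width; }
private:
    int m_width;
};

// --- Sprite.cpp (hypothetical): the only file that includes Texture.h ---
int Sprite::Width() const { return m_texture->Width(); }
```

With this layout, a change to Texture.h only recompiles the files that actually use Texture’s members, not every file that happens to include Sprite.h.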
Finally, we are using incremental links whenever possible. I remember a few versions of Visual Studio ago they were pretty broken, but they’re not giving us any problems. The only caveat is that if you modify a static library your program is linking with, it will force a full link. So they’re really only good for modifying the executable itself.
We’re not using any distributed builds. First of all, we don’t have enough computers to make it worthwhile. And second, I had horrible experiences with distributed builds in the past. They would help with a badly structured codebase, at the cost of longer incremental builds and mysterious spurious bad builds. Besides, once they’re in place, they tend to encourage even further disregard for keeping dependencies to a minimum.
How about you?
So, that’s it for the Power of Two codebase. How about you? Want to share your size, build times, or any other data?
Have you (or do you regularly) run a code coverage tool? TDD should show near 100% depending on how religiously it’s practiced. If not 100%, any patterns arising that explain the remaining percentage points?
Brandon,
I’m not a fan of code coverage tools. To me, TDD is not about achieving 100% coverage, but about helping drive design. It might be an OK metric to run on a large team that is not sold on TDD to make sure things aren’t sliding, but I wouldn’t get anything from running it on my own code.
And guess what, if I did run it, I doubt my code coverage would be higher than 80%. There is some glue code that is more of a pain to test than any benefits I get from testing it. Same thing with some “leaf” code that nothing depends on.