The Measure Of Code

I’ve gotten a lot of questions about how big our codebase is, how fast it builds, how many tests we have… Fear not, Gentle Reader: all your burning questions will be answered here.

Size

Charles and I were priding ourselves on keeping things small and minimal. But truth be told, it’s not like we were keeping track of how many lines of code we had written. Were things as small as we hoped they were?

The most convenient way of counting lines of code that I know of is CLOC. It’s an extremely easy-to-use open-source program that counts the lines of code in a codebase, gives very detailed information, strips out whitespace, breaks things down by language, and does just about everything you’d want from a program like that.

Running it on the latest version of our code (not including any 3rd party libraries) produces this:

1621 text files.    1579 unique files.    3721 files ignored.
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            485          13577            303          46181
C#                             324           4935            712          22966
C/C++ Header                   407           4153             95          11975
MSBuild scripts                 18              0            126           1490
-------------------------------------------------------------------------------
SUM:                          1234          22665           1236          82612

Almost 60K lines of C++ code seemed very high. At first I thought it was because CLOC was counting files twice: once in their regular location and once in the .svn directory, but apparently it’s already removing all duplicates, so that wasn’t it.
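The “unique files” figure is what’s left after CLOC sets aside files with identical contents. Here is a rough sketch of that kind of de-duplication, assuming the files have already been read into memory as (path, contents) pairs. SourceFile and UniqueByContents are hypothetical names, and a real tool would compare hashes rather than keep every file’s full contents in a set the way this one does:

```cpp
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

using SourceFile = std::pair<std::string, std::string>;  // (path, contents)

// Keeps the first file seen for each distinct contents and drops later copies.
std::vector<SourceFile> UniqueByContents(const std::vector<SourceFile>& files) {
    std::unordered_set<std::string> seen;
    std::vector<SourceFile> unique;
    for (const SourceFile& file : files) {
        // insert() reports whether these contents were new.
        if (seen.insert(file.second).second) {
            unique.push_back(file);
        }
    }
    return unique;
}
```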

Almost scarier than the amount of C++ code (which is all our runtime and some of our tools) is the amount of C# code. For a language that claims to be significantly higher-level than C++, that’s quite a mouthful of code!

Another surprising count in there is the number of comment lines. Since we make heavy use of TDD, I really didn’t expect more than a couple dozen lines of comments in the whole codebase. Still, I’m kind of proud that we average only about one line of comments per file :-)
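For the curious, the blank/comment/code split can be sketched in a few lines of C++. This is a simplified, hypothetical classifier, not CLOC’s actual logic. It assumes only // and /* */ comments, ignores comment markers inside string literals, counts whitespace-only lines as blank, and counts any line with code on it as code:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Blank/comment/code tallies, mirroring the columns in CLOC's report.
struct LineCounts {
    std::size_t blank = 0;
    std::size_t comment = 0;
    std::size_t code = 0;
};

// Simplified classifier: understands // and /* */ comments only, and does not
// look inside string or character literals.
LineCounts CountLines(const std::string& source) {
    LineCounts counts;
    std::istringstream in(source);
    std::string line;
    bool inBlockComment = false;
    while (std::getline(in, line)) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) {
            ++counts.blank;  // whitespace-only, even inside a block comment
            continue;
        }
        bool hasCode = false;
        for (std::size_t i = 0; i < line.size(); ++i) {
            if (inBlockComment) {
                if (line.compare(i, 2, "*/") == 0) {
                    inBlockComment = false;
                    ++i;  // skip the '/' of "*/"
                }
            } else if (line.compare(i, 2, "//") == 0) {
                break;  // the rest of the line is a comment
            } else if (line.compare(i, 2, "/*") == 0) {
                inBlockComment = true;
                ++i;  // skip the '*' of "/*"
            } else if (line[i] != ' ' && line[i] != '\t' && line[i] != '\r') {
                hasCode = true;
            }
        }
        // A non-blank line without any code must consist only of comments.
        if (hasCode) {
            ++counts.code;
        } else {
            ++counts.comment;
        }
    }
    return counts;
}
```

With these rules, a line that mixes code and a trailing comment counts as code.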

Here’s a more detailed breakdown, with the line count just for our runtime (engine and game):

1089 text files.    1053 unique files.    2338 files ignored.
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            441          11997            245          40943
C/C++ Header                   385           3964             90          11405
-------------------------------------------------------------------------------
SUM:                           826          15961            335          52348

and for our tools:

532 text files.     531 unique files.    1383 files ignored.
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C#                             324           4935            712          22966
C++                             44           1580             58           5238
MSBuild scripts                 18              0            126           1490
C/C++ Header                    23            199              5            591
-------------------------------------------------------------------------------
SUM:                           409           6714            901          30285

Tests

Then I realized that a good chunk of those were tests. So excluding all directories matching *Tests* gets the following result:

1206 text files.    1187 unique files.    4199 files ignored.
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            283           6464            150          22464
C#                             213           2636            534          12402
C/C++ Header                   380           3824             94          10782
MSBuild scripts                 12              0             84            978
-------------------------------------------------------------------------------
SUM:                           888          12924            862          46626

Close to half the C++ code consisted of tests (a bit more than half if we only count .cpp files). That’s pretty consistent with my experience with TDD. C# follows a similar percentage as well.
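Excluding every directory that matches *Tests* amounts to checking each directory component of a file’s path for the substring “Tests”. A minimal C++17 sketch of that filter; IsInTestDirectory is a hypothetical name, and this is not how CLOC implements its own exclusion options:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// True if any directory component of the path matches the glob *Tests*,
// i.e. contains the substring "Tests". The file name itself is not checked.
bool IsInTestDirectory(const fs::path& file) {
    for (const fs::path& part : file.parent_path()) {
        if (part.string().find("Tests") != std::string::npos) {
            return true;
        }
    }
    return false;
}
```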

As for the exact number of tests, running a grep for TEST shows all the C++ tests:

C:\pow2>grep -r TEST SweetPea Engine Tools | grep -v svn | wc
   2163    3620  221953

And doing the same thing with [Test] brings up all C# tests:

C:\pow2>grep -r \[Test\] SweetPea Engine Tools | grep -v svn | wc
    735    1470   52717

That means that our average C++ test is about 11.5 lines long, and our average C# test about 14.4. Frankly, that sounds rather high. We make heavy use of fixtures whenever possible, and each test usually only checks for a single condition (even if it involves a couple of check statements). I suspect the number is higher than expected because it includes all the lines from #include statements and all the fixtures as part of the average.
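The grep pipelines above count every line that contains the pattern (wc’s first column is the line count), after grep -v svn drops every matching line that mentions svn, which mostly means hits inside Subversion’s .svn directories. Here is a rough C++17 equivalent. It assumes we skip .svn directories outright rather than filtering on the text “svn”, and CountMatchingLines is a hypothetical helper. Like grep, it counts lines rather than tests, so TEST_FIXTURE lines (or a stray TEST in a comment) are counted too:

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Counts lines containing `pattern` in every regular file under `roots`,
// skipping .svn directories (roughly `grep -r | grep -v svn | wc -l`).
std::size_t CountMatchingLines(const std::vector<fs::path>& roots,
                               const std::string& pattern) {
    std::size_t count = 0;
    for (const fs::path& root : roots) {
        fs::recursive_directory_iterator it(root), end;
        for (; it != end; ++it) {
            if (it->is_directory() && it->path().filename() == fs::path(".svn")) {
                it.disable_recursion_pending();  // don't descend into .svn
                continue;
            }
            if (!it->is_regular_file()) continue;
            std::ifstream in(it->path());
            std::string line;
            while (std::getline(in, line)) {
                if (line.find(pattern) != std::string::npos) ++count;
            }
        }
    }
    return count;
}
```

Calling it with the SweetPea, Engine, and Tools directories and the pattern "TEST" corresponds to the first command.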

Language   Lines   Non-test lines   Test lines   Non-test share   Tests   Lines per test
C++        58156            33246        24910             57%*   2163             11.5
C#         22966            12402        10564             54%     735             14.4

* If we only count .cpp files (no headers), the non-test share goes down to 49%.
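The table follows directly from the CLOC and grep runs: test lines are total lines minus the lines left after excluding *Tests* directories, the non-test share is non-test lines over total lines, and lines per test divides test lines by the grep count. A small sketch reproducing that arithmetic, with the figures copied from the runs above (the C++ totals combine the C++ and C/C++ Header rows); the struct and function names are only for illustration:

```cpp
#include <cstdio>

// Figures copied from the CLOC and grep runs above.
struct LanguageStats {
    const char* name;
    int totalLines;    // code lines, all directories
    int nonTestLines;  // code lines with *Tests* directories excluded
    int tests;         // matching lines reported by grep
};

int TestLines(const LanguageStats& s) { return s.totalLines - s.nonTestLines; }

double NonTestShare(const LanguageStats& s) {
    return 100.0 * s.nonTestLines / s.totalLines;
}

double LinesPerTest(const LanguageStats& s) {
    return static_cast<double>(TestLines(s)) / s.tests;
}

// Prints one row in the same column order as the table.
void PrintRow(const LanguageStats& s) {
    std::printf("%-4s %6d %6d %6d %4.0f%% %5d %5.1f\n", s.name, s.totalLines,
                s.nonTestLines, TestLines(s), NonTestShare(s), s.tests,
                LinesPerTest(s));
}
```

PrintRow with {"C++", 46181 + 11975, 22464 + 10782, 2163} prints 57% and 11.5, and with {"C#", 22966, 12402, 735} prints 54% and 14.4. Using only the .cpp rows (46181 total, 22464 non-test) gives the 49% in the footnote.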

I was curious about that last part of checking a single thing per test, so I ran a grep for the number of CHECK statements in our code:

C:\pow2>grep -r CHECK SweetPea Engine Tools | grep -v svn | wc
   3886   15079  399598

That’s 1.8 CHECK statements per TEST (3886 / 2163), which is about right. Even though we’re checking for a single condition, we’ll often check a couple of things about it (e.g., that the camera stopped and that it reached its final destination).
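Here is what a single-condition test with two checks can look like, using the camera example. The TEST and CHECK macros are a tiny hypothetical stand-in so the sketch compiles on its own (they are not the macros from the test framework we use), and the Camera class is made up for the illustration:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-ins for a TEST/CHECK unit-test framework, just enough
// for this sketch to compile on its own.
using TestFunc = void (*)(int& failures);

std::vector<TestFunc>& Registry() {
    static std::vector<TestFunc> tests;
    return tests;
}

struct Registrar {
    explicit Registrar(TestFunc test) { Registry().push_back(test); }
};

#define TEST(name)                                  \
    void Test_##name(int& failures);                \
    static Registrar registrar_##name(Test_##name); \
    void Test_##name(int& failures)

#define CHECK(condition)                                        \
    do {                                                        \
        if (!(condition)) {                                     \
            ++failures;                                         \
            std::printf("%s(%d): CHECK failed: %s\n", __FILE__, \
                        __LINE__, #condition);                  \
        }                                                       \
    } while (0)

int RunAllTests() {
    int failures = 0;
    for (TestFunc test : Registry()) {
        test(failures);
    }
    return failures;
}

// Hypothetical camera that moves forward along one axis toward a destination.
struct Camera {
    float position = 0.0f;
    float destination = 0.0f;
    float speed = 1.0f;
    bool moving = false;

    void MoveTo(float target) {
        destination = target;
        moving = true;
    }

    void Update(float dt) {
        if (!moving) return;
        const float step = speed * dt;
        if (destination - position <= step) {  // close enough: snap and stop
            position = destination;
            moving = false;
        } else {
            position += step;
        }
    }
};

// One condition (the camera finished its move), verified with two checks.
TEST(CameraStopsAtItsDestination) {
    Camera camera;
    camera.MoveTo(2.0f);
    for (int frame = 0; frame < 10; ++frame) {
        camera.Update(0.5f);
    }
    CHECK(!camera.moving);
    CHECK(camera.position == 2.0f);
}
```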

Build Times

So, given that amount of code, how long does it take to build it? Clearly it depends on your hardware. Since we’re not exactly rolling in money, we don’t have particularly powerful machines. Here at home, I’m using a modest Core 2 Duo E4300 (overclocked to 2.6 GHz) with fast memory and a relatively fast SATA hard drive, so that’s what I used for all my timings.

A full build of our game, plus all the libraries, all the tests, and running all the tests takes exactly 1 minute and 10 seconds. That’s pretty good for two reasons:

  • When we work with the game we don’t build and run the unit tests for the engine. We have a separate solution for that. A full build of just the engine, the game, and the game unit tests only takes 43 seconds.
  • The game itself is a fairly large project and devenv doesn’t know how to parallelize that build, so it’s only using half the available CPU power for about half the build time.

An incremental build after changing a single cpp file takes slightly over a second (including half a second of unit test execution).

As you can imagine, working with that codebase is a dream come true: snappy and responsive. Nothing is so hard that it can’t be changed.

Unfortunately, that’s where the fairy tale ends. The tools are another story altogether. Our C# tools, with all their unit tests, build in a mere 18 seconds, and the C++ tools in 1 minute and 10 seconds. That’s not too bad, except that it’s a surprisingly long time for the C++ tools, since there aren’t that many of them.

Here’s the kicker: doing another build without changing a single thing takes 38 seconds. Whoa! We’re doing some C++/CLI trickery, and apparently dependency checking is totally broken in VS2005 (either that, or we just don’t know how to set it up right).

Keeping things fast

What’s the secret of a lightning-fast build? Clearly, keeping the code size down is crucial. If your codebase is 2 million lines of code, builds are going to be painful no matter what. But they can be a little less painful with some gentle care.

One of the main build-time killers that we’re avoiding is the use of STL or Boost. Those libraries pull in everything and the kitchen sink, and their heavy use of templates makes build and link times go through the roof. No thanks.

Our template use is pretty minimal. We have a couple of containers (which I love, and which I’ll write about one of these days) and that’s about it.

We’re pretty anal when it comes to keeping physical dependencies to a minimum. We forward declare aggressively, and we only include the headers that are necessary for each cpp file (PC Lint is “kind” of enough to remind us every time we have unnecessary #includes). We’re not using external include guards or #pragma once.
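A minimal sketch of the forward-declaration habit, using a hypothetical Camera/CameraController pair. In a real project the parts marked below would live in separate headers and a .cpp file; they are shown in one block so it compiles on its own. The controller’s header only stores a pointer to Camera, so a forward declaration is enough there, and only the .cpp, which actually calls into Camera, needs the full definition:

```cpp
// ---- CameraController.h (hypothetical) ----
// Only a pointer is stored, so a forward declaration is enough; there is no
// need to #include "Camera.h" and drag in everything it depends on.
class Camera;

class CameraController {
public:
    explicit CameraController(Camera& camera);
    void Nudge(float amount);

private:
    Camera* m_camera;
};

// ---- Camera.h (hypothetical) ----
class Camera {
public:
    float position = 0.0f;
};

// ---- CameraController.cpp (hypothetical) ----
// The .cpp touches Camera's members, so this is where the full definition
// (normally via #include "Camera.h") is required.
CameraController::CameraController(Camera& camera) : m_camera(&camera) {}

void CameraController::Nudge(float amount) { m_camera->position += amount; }
```

Files that include CameraController.h no longer get recompiled when Camera.h changes, which is where the build-time savings come from.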

Precompiled headers are either not used or kept to a minimum. I think the only project that uses them is the game, and only for Havok headers. We don’t even have windows.h in a precompiled header (which would be a really bad idea, because it would make all the junk in windows.h available to your whole program).

Finally, we are using incremental linking whenever possible. I remember that a few versions of Visual Studio ago it was pretty broken, but it’s not giving us any problems now. The only caveat is that if you modify a static library your program links with, it will force a full link, so incremental linking really only pays off when you’re modifying the executable itself.

We’re not using any distributed builds. First of all, we don’t have enough computers to make it worthwhile. And second, I had horrible experiences with distributed builds in the past. They would help with a badly structured codebase, at the cost of longer incremental builds and mysterious spurious bad builds. Besides, once they’re in place, they tend to encourage even further disregard for keeping dependencies to a minimum.

How about you?

So, that’s it for the Power of Two codebase. How about you? Want to share your size, build times, or any other data?

  • Ismail

    Cool! Especially in game programming where everybody just cares about the product, but not the process. If I am not mistaken your approach is very similar to the approach mentioned in the book “Refactoring: Improving the Design of Existing Code”

    And may I humbly suggest using Visual Assist X’s (the X part is important) “rename” refactoring instead of “replace-in-files”, because it works like a charm. Its “extract method” and “move implementation to source file” refactorings work well too. Enough advertising for Visual Assist X.

    Thanks for your reply :)

  • noel

    I’ve really wanted to like Visual Assist, but it seems that in the last several years Visual Assist has slowed down the Visual Studio editor very significantly, to the point that I can type faster than VA can keep up with. Is it any faster these days? Maybe I’ll give it a whirl again.

    There are a handful of features I’d love to use from VA, but I need to turn the other features off. For some reason, they don’t give you the option to disable most features and only turn on a few!

  • noel

    Robin, I haven’t looked at SCons and Jam since a couple of years ago, when I wrote my article about build systems. Have they fixed the horrible dependency checking in SCons that was causing incredibly slow incremental builds? If so, I’d really like to revisit it.

    I’d also be curious to find out how easy it would be to extend something like Scons to do asset building (with totally custom rules).

  • Robin Green

    They added file hash caches and a lot of fixes, so the “yeah, but SCons is slow” argument has gone. It also arguably handles parallel builds better than Jam with less serialization if there are close dependencies or autogenerated files, but we’ve never pushed it that far. No fully distributed builds for SCons yet. Handy for me, Stephen Knight is just upstairs!

  • Steve Cothern

    Hey Robin (and Hi Noel!),

    I remember reading Noel’s Games From Within series on build tools a while ago, and the main criticism he had of Jam was its complexity. Well, that and the fact that “…Jam is dead…”; hence also my interest in your comment about an avid developer community. Serendipitously, in the last couple of days I ran across mention of KJam. I don’t really know much about either (for the latter, other than the main site, I found this article, which looks to be by the author), but it would be interesting to hear what an experienced Jam user has to say about it if you have an opinion.

  • Robin Green

    I certainly found plenty to read when I was getting up to speed, and everyone I have met who has used Jam has raved about it. But you are right: classic Jam is dead as an active open-source project (dead, but used daily), and KJam has been in permanent beta since 2006.

  • Ilya

    It’s been quite a while since your last post; sounds like you’ve been spending one too many days on the beach. How are things going?

  • noel

    Yes, we’re still alive. Has it really been over a month since our last post? Yikes! Things have been very busy, and writing my column for Game Developer Magazine has been taking most of my writing time. Don’t worry, we’ll have new updates soon. Really… :-)

  • Ismail

    It’s a bit off-topic but I was just reading Noel’s old articles at http://gamesfromwithin.com about refactoring tools. So, which tool do you use for refactoring, or do you use any tool at all?

    By the way, best of luck with “Power of Two”. You are really an inspiration for all indies.

  • chas

    Hey Ismail-

    I guess it depends on the kind of refactoring.  In C++, most of our ‘rename’ refactoring is done using VS2005’s built-in "replace in files" functionality.  Amazingly enough, this purely syntactic approach usually gets us about 90-95% of the way there and leaves us with just a few corner cases that require manual attention.  I also use Visual Assist for simple refactorings.

    For bigger semantic refactorings, we usually end up following a fairly simple four-step pattern:

    1. Get the target code compiling.  Start at the area of code you want to change, make your change, and try to compile.  At least in our codebase, the compiler is great for telling us what we need to do next.  Depending on how fundamental the refactoring is (and some of them have been _very_ foundational), this can take a while.  When you’re done, the game at least compiles.

    2. Get the tests compiling.  Semantic refactorings almost always physically break tests.  This is where you get to see the effects of your refactoring, and how it can change your original physical design in ways you didn’t necessarily expect.

    3. Get the tests passing.  This one can take a while, and usually contains a bunch of "oh wow" moments, where you realize, inevitably, that your changes had ramifications that you didn’t even consider.  This phase usually involves refactoring your original refactoring into a more refined and informed expression of what you originally intended.

    4. Write new tests.  Often refactorings like this will introduce new code, or pull chunks of functionality out into new modules.  I usually go back and retro-test these after the fact, just to keep the test coverage high.

    Really though, it all boils down to the tests.  We’ve found that the two most useful purposes of having unit tests around are to help flesh out the design of a module, and to aid in refactorings.  It’s really hard to understand the ramifications of your changes in a large and complex system, and any help you can get that makes things more visible is 100% worth it.  At least to us, anyway!

    -charles

  • http://dev-enter.com Bart van Deventer

    I wonder, if you disable project parallelism in favour of the /MP switch, does that make a difference? I don’t have a large enough project to test it on here right now.

    Still, it’s nice to see you can now build the Engine and Game in less time than it previously took just for the game!

  • noel

    Given how the CPU is totally pegged at 100% during the build of the large game project, I suspect the best build time would be achieved by putting all the engine and game code in a single project. That would be a bit of a pain because we would need to maintain different project files (one for the game, one for each library), so the small build-time gains probably wouldn’t outweigh the drawbacks.

    Let me try it right now for the heck of it… Yeah, it’s a bit faster, but not as much as I expected. Down to 30 seconds from 36 seconds. Definitely not worth it in this case, but I could see how it could make a big difference for some projects.

    Actually, I just realized you were probably not suggesting creating one mega project, but simply turning on /MP in all the projects and disabling the project-level parallelism. Trying right now… 35 seconds. So your hunch was right. It’s a bit faster. And it would probably be significantly faster if the libraries were larger, but they’re pretty small. Still, I think I like it better with the /MP switch. It feels more "controlled" somehow to know that the compiler is doing a single library before moving to the next one.

    And while I’m at it, let me try the opposite: no /MP switch, but crank up the number of parallel project builds to 4. And the results are… 33 seconds.

    Now, this is not a very good test case, because we have 16 smallish libraries and one large game project. Still, considering that 30 seconds is probably the best possible parallelization, being at 33 with no extra effort is pretty good.

  • Robin Green

    We have had a lot of success using the “make” alternatives SCons and Jam. SCons runs under Python, making Windows setup a smidge trickier, but it does scientifically precise minimal builds by hashing files (including the output .obj files) to see whether they have changed and whether it needs to recompile or relink them. The upside: you get to specify how to build your project using a real programming language with a lot of cross-platform build-tool smarts built in. The downside: it sometimes shows little speed advantage over Visual Studio.

    Jam does much the same thing, relying on timestamps rather than direct hashing, and runs like lightning. Null builds return immediately, and partial builds are just plain snappy compared to Visual Studio. The upside: speed. Jam is FAST, free, cross-platform, and has an avid developer community. The build intelligence is written in the Jam language itself, in the Jambase, so everything is up for grabs. The downside: there are three versions of Jam out there (ftjam, classic Jam, and Boost Jam), with differences that only the most advanced users would ever care about (start with ftjam; it understands VC8 better). The Jamfile language is kinda funky, and documentation is scattered over the web. Basic use is simple to start with but has subtleties that take time to fully grok: it’s a Prolog-like declarative, Python-like functional, string-list-based language. Once you do get it, it’s full steam ahead.

    We use SCons on our projects at work, but at home I use Jam and have promised myself never to use Make again. It’s great to be able to list a bunch of .cpp files and ask the build system to “go make a SharedLibrary() out of these”, have it find the interdependencies automatically and have the build script work on multiple platforms without any rewriting on your part.
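Robin’s comparison boils down to two ways of deciding whether a target is out of date: Jam-style timestamp checks and SCons-style content signatures. Here is a hedged, in-memory C++ sketch of the two decisions. SourceState and both functions are hypothetical, std::hash stands in for the stronger digests a real tool would use, and neither function is how Jam or SCons is actually implemented:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical in-memory model of a source file.
struct SourceState {
    std::int64_t modifiedTime;  // e.g. seconds since some epoch
    std::string contents;
};

// Timestamp decision: rebuild if the source is newer than its object file.
bool OutOfDateByTimestamp(const SourceState& source, std::int64_t objectTime) {
    return source.modifiedTime > objectTime;
}

// Content decision: rebuild if the source's signature differs from the one
// recorded at the last build. std::hash is only a stand-in for a real digest.
bool OutOfDateBySignature(const SourceState& source, std::size_t recordedSignature) {
    return std::hash<std::string>{}(source.contents) != recordedSignature;
}
```

Saving a file without changing it triggers a timestamp rebuild but not a signature rebuild. The signature check pays for that precision by reading and hashing every file, which is the speed trade-off described above.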

  • http://dev-enter.com Bart van Deventer

    In VS2K5 (I think it requires SP1), you can add the /MP switch to “C/C++ -> Command Line -> Additional Options” in each project to get parallel builds going.

    You can find more information on this switch here: http://msdn.microsoft.com/en-us/library/bb385193.aspx
    Note: The page refers to VS2K8, but this is also an undocumented feature in VS2K5.

  • Ben Straub

    This has worked well for us, but the real gain came when we figured out how to combine parallel builds with a precompiled header for all the STL and Boost we use. A full build of one part of our product went from 4:30 to 0:31 on an 8-core machine.

  • noel

    Thanks for the tip! I knew about the /M switch (and the fact that it was broken in an earlier version of VCBuild), but I didn’t know about /MP.

    Using /MP2 pegs both CPUs at 100% load, and I was able to bring down the build time for just the game project from 37 seconds to 24 seconds. Unfortunately, the gains are smaller when building both the engine and the game, since there was already some project-level parallelism. Still, build times went down from 43 seconds to 36 seconds. Not too bad!
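Two cores turning the game project’s 37 seconds into 24 is about a 1.54x speedup. One rough way to read that, assuming the build behaves according to Amdahl’s law (a simplifying model, not something measured here), is to solve for the fraction of the work that ran in parallel. Speedup and ParallelFraction are hypothetical helpers:

```cpp
// Amdahl's law models speedup on n cores as S = 1 / ((1 - p) + p / n), where p
// is the fraction of the work that can run in parallel.
// Solving for p gives p = (1 - 1/S) / (1 - 1/n).
double Speedup(double secondsBefore, double secondsAfter) {
    return secondsBefore / secondsAfter;
}

double ParallelFraction(double speedup, int cores) {
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores);
}
```

For the game project this gives p ≈ 0.70: under that model, roughly 70% of the build parallelized across the two cores and the rest stayed serial (presumably including the link). It’s only a model, since disk I/O and the uneven sizes of individual .cpp files also matter.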

  • Brandon

    Have you (or do you regularly) run a code coverage tool? TDD should show near 100% depending on how religiously it’s practiced. If not 100%, any patterns arising that explain the remaining percentage points?

  • http://www.gamesfromwithin.com Noel

    Brandon,

    I’m not a fan of code coverage tools. To me, TDD is not about achieving 100% coverage, but about helping drive design. It might be an OK metric to run on a large team that is not sold on TDD to make sure things aren’t sliding, but I wouldn’t get anything from running it on my own code.

    And guess what, if I did run it, I doubt my code coverage would be higher than 80%. There is some glue code that is more of a pain to test than any benefit I’d get from testing it. Same thing with some “leaf” code that nothing depends on.