For small projects, we can blissfully code away without paying any attention to physical structure and we won’t be any worse off for it. However, as a project grows, it reaches a critical point where build times become unbearably slow. This article looks into the reasons for such slow build times and explores some techniques to speed things up.
You know one of the reasons I refuse to live near most cities in the US? Traffic. I’m not talking about the “this road is crowded” type of traffic that lets you zip along at a good speed, or even about the “I can’t get out of my lane because this is so packed” type of traffic that moves slowly along. No, I’ve come to accept and deal with that. It’s the “we might as well get out of the car and enjoy the sunshine because we aren’t moving” type of traffic that I can’t stand. Amazingly enough, it seems to happen around most major cities in the US during rush hour, and sometimes this “rush hour” stretches from 7AM until 8PM. It’s a bad sign when car manufacturers have advertisements telling you how much more comfortable you’ll be in their car when you’re stuck in traffic. Under those conditions it can easily take over an hour just to cover a distance of 3 or 4 miles.
Other than pointing out that you’re much better off walking, cycling, or using public transportation, what’s the point of all this and how does it relate to the physical structure of a program? There are some things that just don’t scale well. They appear to work perfectly fine for a small number of units, but as soon as a certain threshold is reached, things seem to bog down and eventually collapse under their own weight. Just adding more lanes doesn’t appear to solve the problem either, judging by the number of clogged-up 5-lane highways everywhere. Sometimes, you need to take a step back and deal with the problem in a different way. Either that, or buy a nice music system and enjoy your time in traffic.
We might not have much of a say over how traffic should be dealt with where we live, but we certainly have a lot of choices when it comes down to structuring our C++ source code. For small projects, we can blissfully code away without paying any attention to physical structure and we won’t be any worse off for it. However, as a project grows, it reaches a critical point, and compilation times start getting slower and slower, to the point where tiny changes could make you wish you were stuck in traffic instead of staring powerlessly at your monitor. Adding a faster CPU, more memory, or a better hard drive can help make things faster, but is usually not a good long-term solution.
We are usually concerned with the time for two types of builds:
- Full builds. In this case we care about the time it takes to build the whole project from scratch, starting from a totally clean build. This situation comes about when we just want to use the result of the build of a project we’re not actively modifying. For example, an automated build machine will most likely be doing full builds of the game, so the turnaround time before a build is ready will depend on the full build time. Another example might be if you need to link your code with a library for which you have the source code.
- Minimal builds. Once we have done a full build on a project, we then make a very small change to its source code and build it again. That’s the time for a minimal build. This is what you really care about when you’re actively working on a project, making modifications and compiling constantly. Ideally, building the project after a small change should require very little time This allows for very fast turnaround time for debugging, or even to get feedback from the compiler on silly syntax errors we just typed.
Improving the physical structure of a program often reduces the time of both types of builds. Unfortunately things don’t always work out so neatly and there are times where some changes will make one type of build faster and the other slower. Understanding what affects each compilation time allows us to optimize our compilation strategy and strike a balance that fits our needs.
Clearly, the time for both types of builds depends on the number of files and the complexity of those files. Both types of builds are also affected by the number of files each file depends on (the number of #include statements in each file). However, as we’ll see in a moment, in the case of a full build there is the chance of caching the includes of some files and reusing them for other files.
There is something very different about minimal builds. Their build time is usually dominated by the number of files that depend on the modified files. In the worst situation, every file will depend on the file that changed and a full build will be triggered. In the ideal case, only the file with the changes itself will be compiled and no other files will have been affected. In one case the build could take less than a second, and in the other it could easily take multiple hours.
The rest of this article will look at different techniques to reduce build times and how they affect each of those two build types.
It is easy to underestimate how quickly include statements can compound. If file A includes file B, and file B includes file C and D, every time someone includes file A they’re including three other files for the ride. Add a few more levels of inclusion with header files including many other header files, and you have a recipe for disaster (or for really long build times at least).
As an experiment, I added one more feature to the script I wrote last week. The script analyzes a set of source code files and determines how many times each file is included by other files in a recursive way. So, in our trivial example above, file C will be reported as being included twice (once by B directly, and once by A indirectly). I then decided to test it on the source code for a high-level game library (I’m not going to be any more specific since it wasn’t particularly good code and it had a pretty hideous physical structure). I wouldn’t be surprised if it’s not very different from the level of complexity of a lot of game code out there. As a point of reference, the library was composed of 300 cpp files and 312 header files.
Before I ran the script, I tried to guess how many times the most included file in the whole library was included by other files. My guess was around 600 times, just because I knew that the physical structure of that code wasn’t pretty. I figured maybe almost half the files included that one header file, and a few others included it indirectly. Boy was I wrong! Here are the shocking results:
|Top included files|
That means that during the course of a full build for those 300 cpp files, one header file could be included over 10,000 times! No wonder this particular library seemed to take a long time to compile. Notice that the other top files quickly drop to being included around 800 times each (which is still even higher than my initial estimate).
As a comparison, I tried running that same script on another, much smaller library, but also one with a much better physical structure and many fewer dependencies between files. This second library was only made up of 33 cpp files and 39 header files. The most included file was only included a total of 23 times (with the second one being included less than 10 times). So, having the number of classes grow by a factor of 10, caused the number of includes to grow by a factor of 1000. Clearly not a very scalable situation.
Things aren’t quite that bad though. Header files typically have a set of include guards in them, to prevent the compiler from adding duplicate symbols if it encounters the same header file multiple times during the compilation of one cpp file. This is what include guards look like:
// Normal code goes here, even other #include statements if necessary
With every header file having include guards around it, I turned on the /showincludes switch in Visual C++ and performed a full build. The total number of includes during the course of building the 300 classes in the library was an astounding 15,264. Better than the worst-case-scenario we calculated earlier, but still tremendously high.
Apparently some C++ compilers try to optimize this situation by automatically caching header files and avoiding hitting the disk to reload them over and over. Unfortunately, there is very little hard data about that, and you’re always at the mercy of your current compiler writer. Was that true for Visual Studio .NET 2003?
To test if I could speed up the compilation any, I added redundant include guards to the whole library. Redundant include guards are like the regular include guards, but they are placed around the actual #include statement. I first saw them mentioned in the book Large Scale C++ Software Design by John Lakos (written in 1996), but popular wisdom claims that they are unnecessary with modern compilers. Well, time to test that.
This is what redundant include guards look like:
I wrote a quick script to add redundant guards to all the source code and did a full build again. The number of includes reported by the compiler went down to 10,568 (from over 15,000). That means that there were about 5,000 redundant includes in a full build. However, the overall build time didn’t change at all.
Result: Zero. Apparently Visual Studio .NET 2003 (and probably most of the major compilers) does a pretty good job caching those includes by itself.
Recommendation: Stay away from redundant include guards. I never liked having the including files know about the internal define, and if the guard ever changes it can easily break things. Besides, the code looks a lot messier and unreadable. It might have been worth it if we could define #include to expand to a redundant guard automatically, but I don’t think that’s possible with the standard C preprocessor.
Just in case, I decided to test another strategy and see if I obtained similar results. Instead of using redundant include guards, I added the #pragma once preprocessor directive to all header files. Visual C++ will treat files with that directive differently and it’ll make sure that those files are only included once per compilation unit. In other words, it accomplishes the same thing as the external guards, just in a non-portable way. Here’s another really simple script to add #pragma once to all the header files.
Result: No difference. Just as with redundant include guards, it seems that the compiler was smart enough already to optimize that case.
Recommendation: Don’t bother with it. It’s a non-standard construct that doesn’t get any apparent benefit. If you still feel compelled to use it, at least wrap it up with #ifdef checks for the correct version of Visual Studio.
During a full build, every cpp file is treated as a separate compilation unit. For each of those files, all the necessary includes are pulled in, parsed, and compiled. If you look at all the includes during a full build, you’re bound to find a lot of common headers that get included over and over for every compilation unit. Those are usually headers for other libraries that the code relies on, such as STL, boost, or even platform-specific headers like windows.h or DirectX headers. They are usually also particularly expensive headers to include because they tend to include many other header files in turn.
From our findings in the previous two sections, it is clear that some compilers cache the headers encountered for each compilation unit. However, they don’t do anything about duplicated headers found across multiple cpp files, and that’s where precompiled headers come in.
When using precompiled headers, we can flag a set of headers as being part of the precompiled set. The compiler will then process them all at once and save those results. Every compilation unit will then automatically include all the headers that were part of the precompiled set at the very beginning, but at a much lower cost than parsing them from scratch every time.
The catch is that if any of the contents of the precompiled headers changes, a full rebuild is necessary to compile the program again. This means that we should only add headers that are included very often throughout our project but that don’t change frequently. Perfect candidates are the ones we mentioned earlier: STL headers, boost, and any other big external APIs. I always prefer not to include any headers from the project itself, although if you have a header that is included in every file, you might as well include it in the precompiled set (or, even better, change it so it’s not included everywhere and improve the physical structure).
The gains from using precompiled headers are quite dramatic. The game library we mentioned in an earlier section took over 14 minutes to compile without pre-compiled headers, but only 2:30 when using them. Those are huge savings! Minimal rebuilds are also improved because we avoid parsing some of the common headers for one file, but the results aren’t as dramatic as for full builds.
Precompiled headers are not without their downside though. The first problem is that precompiled headers often end up forcing the inclusions of more headers than it is absolutely necessary to compile each individual file. Not every file needs <vector> or <windows.h> included, but since a fair amount of them do and those are considered expensive includes, they’ll invariably end up in the precompiled header section. That means that any compilation unit taking advantage of precompiled headers will be forced to include those as well. Logically, the program is the same, but we have worsened the physical structure of the source code. In effect, we are trading extra physical dependencies between files for a faster compile time.
The second problem is that precompiled headers are not something you can rely on from compiler to compiler and platform to platform. The only compilers I’m aware of that implement them are Microsoft’s Visual C++ and Metrowerks’ CodeWarrior (although I just did a Google search and apparently gcc also supports precompiled headers–that’s great news!). For the rest of us using different compilers, we’re out of luck as far as this technique goes. Considering how important multi-platform development is becoming in the games industry (and elsewhere), this is a big blow against them.
Finally, by far the worst aspect of precompiled headers, is what happens when you combine the first two problems: Take a set of source code developed on a compiler where precompiled headers were available, and try to build it on a different platform. The code will compile since everything is standard C++, but it’ll compile at a glacial pace. That’s because every file is including a massive precompiled header file, and it is being parsed over and over for every compilation unit without taking advantage of any optimizations in the part of the compiler. If the code had been developed without precompiled headers in the first place, each file would only include those headers that it absolutely needs to compile, which would result in much faster compile times.
So earlier, when I said that the game library without precompiled headers took over 14 minutes to build, that’s because it was written with precompiled headers in mind. Otherwise, I estimate it would only take about 5-6 minutes (still much longer than the 2:30 it took with precompiled headers).
Personally, I have never yet worked on a set of code that was developed to be compiled on multiple platforms and where some of them did not support precompiled headers. I suppose the best approach is to use #ifdefs to only include the precompiled headers in one platform and the minimal set of includes for the rest, but it seems like an extremely error-prone approach where programmers are going to be breaking the other platform’s builds all the time. I’d be interested to know how teams working in such an environment deal with it.
Result: Huge gains both for full builds and minimal builds if your compiler supports them. Much worse physical structure.
Recommendation: Definitely use them if you’re only compiling in a platform that supports them. If you need to support multiple platforms, the gain is still too big to pass up. It probably is worth if you manage to separate the includes for precompiled headers with lots of #ifdefs and try to keep the physical structure sane for platforms that don’t support them.
Single Compilation Unit
This is an interesting trick that you won’t find in most books. I first read about it in the sweng-gamedev mailing list about a couple of years ago. Be warned, this is hackish and ugly, but people claimed really good results. I just had to find out for myself how it stacked up against the other techniques to reduce build times.
This technique involves having a single cpp file (compilation unit) that includes all the other cpp files in the project (yes, that’s right, cpp files, not header files). To compile the project we just compile that one cpp file and nothing else. The contents of this file are simply #include statements including all the cpp files we’re interested in. Something along these lines:
As you can imagine by now, I wrote a script to create that file from a directory containing the source code for a project. I just created a file including all the cpp files in that directory, although a better way of doing it would be to parse the make (or project) file and only include those files that are actually part of the project. That way, as I discovered, you avoid including outdated files or files that are in that directory but are not part of the project.
I created this file (everything.cpp), compiled it and… get ready: The build time went down from 2:32 minutes to 43 seconds! That’s a 72% decrease in build time!! Not only that, but the .lib file it created from that library went from 42MB down to 15MB, so it should help with link times down the line. People in the mailing list reported even better results with gcc than with Visual Studio.
What is the reason for such reduction in build times? I can only speculate. I suspect part of it is due to avoiding the overhead of starting and stopping the compiler for every compilation unit. However, the biggest win probably comes from the reduced number of included files. Because everything is one compilation unit, we only include every file once. The second time any other file attempts to include a particular header file, the compiler will have already cached it (and it’ll have include guards so there’s no need to parse anything). To test this theory, I again turned on the /showincludes switch. Indeed, the number of includes during a full build went down from 10,568 to 3,197. That’s a 70% reduction of included files, which is, probably not coincidentally, the same reduction in build time.
One very interesting observation from this experiment is that build times are probably more dependent on the number of actual includes performed by the compiler than I thought at first. All the more reason to keep a really watchful eye on the physical structure of the program. The third part of this article will cover what architectural choices we can make to improve the physical structure and keep the overall number of includes down.
Unfortunately this method also has its share of problems. One of the biggest problems is that there is no such a thing as a minimal build anymore. Any modification to any file will cause a full rebuild. Of course, the full build takes a only a fraction of the time it took before, so this might not be much of an issue.
As with precompiled headers, we’re adding a lot of physical dependencies between files. In this case, files will have a physical dependency with any files that were included before it in the everything.cpp file.
However, the most objectionable of all problems is that we can now run into naming conflicts. Before we assumed each cpp file was a separate compilation unit. Now they’ve all been forcefully added to the same one. Any static variables or functions, or anything on an anonymous namespace will be available to every cpp file that comes after it on the large everything.cpp file. This means that there’s potential for having conflicting symbols, which is one of the things that anonymous namespaces were supposed to solve in the first place. If you have decided to use this technique, you will want to keep everything as part of a separate namespace or part of the class itself and avoid global-scope symbols completely.
Result: Huge improvement in full-build times, but minimal-build times become much worse. Potential for clashing of static and anonymous namespace symbols.
Recommendation: The gains of this technique are simply huge so it would be a shame to ignore it. It is probably no good for regular builds, but you might want to have it as an option when you just care about doing full builds (automated build machine or building someone else’s code). If so, make sure to wrap symbols in namespaces or classes.
The next (and final, I promise) part of this article will look at architectural choices that can greatly influence build times as well as looking briefly at link times and see what we can do about them.