One aspect of the scientific process is publishing detailed experiment descriptions and results so that they can be independently verified by other scientists. That’s exactly what I decided to do after reading Kyle Wilson’s surprising results in his article “Experiments with Includes.”
Kyle found that using internal include guards was significantly slower than using #pragma once with the C++ compiler in Microsoft Visual Studio 2003 and 2005 beta. That just flies in the face of the measurements I had done before and what other people had reported. How was it possible?
The first thing that struck me when looking at Kyle’s results is that he’s using an extremely pathological case: 200 header files, each of them including all 200 headers. Still, the point is to measure header include performance, so maybe that was fine. More on that later.
The first thing I did was to write a quick script to generate a set of header files and a main file given some parameters. I took the chance to write it in Python, which displaced Perl as my favorite scripting language a few months ago. The script generates a certain number of header files, using either no guards, internal guards, external guards, or pragma statements. In order to be able to compile with no guards (and to keep things a bit more realistic), I actually had each header file include all the header files with numbers higher than itself. Feel free to download the script and play with it if you want to run the measurements yourself.
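For readers who just want the shape of such a generator, here is a minimal sketch. It is not the script mentioned above: the file names (header1.h, main.cpp), the macro names (HEADER1_H), the function names, and the choice to have the main file include every header are all assumptions made for illustration.

```python
from pathlib import Path
import tempfile

STYLES = ("none", "internal", "external", "pragma")


def include_line(target: int, style: str) -> str:
    """Return the text that pulls in header `target` under the given style."""
    line = f'#include "header{target}.h"\n'
    if style == "external":
        # External guard: test the macro at the include site, so the
        # preprocessor never has to open the file a second time.
        return f"#ifndef HEADER{target}_H\n{line}#endif\n"
    return line


def header_text(index: int, count: int, style: str) -> str:
    """Build header `index`, which includes every higher-numbered header."""
    body = "".join(include_line(j, style) for j in range(index + 1, count + 1))
    # A repeated extern declaration is legal, so the unguarded case compiles.
    body += f"extern int value{index};\n"
    if style == "none":
        return body
    if style == "pragma":
        return "#pragma once\n" + body
    # "internal" and "external" both define the guard macro inside the file.
    guard = f"HEADER{index}_H"
    return f"#ifndef {guard}\n#define {guard}\n{body}#endif\n"


def generate(directory: Path, count: int, style: str) -> None:
    """Write `count` headers plus a main.cpp (assumed to include them all)."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    for i in range(1, count + 1):
        (directory / f"header{i}.h").write_text(header_text(i, count, style))
    main = "".join(include_line(i, style) for i in range(1, count + 1))
    main += "int main() { return 0; }\n"
    (directory / "main.cpp").write_text(main)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        generate(Path(tmp), 3, "external")
        print((Path(tmp) / "header1.h").read_text())
```

Note that the external style still needs the internal guard inside each header, since that is where the macro gets defined.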
Since I’m running Linux at home, I first ran the experiment with gcc 3.4.1. First I tested no guards, but I had to keep it to a set of 20 header files to avoid taking forever with an astronomical number of includes (they really add up combinatorially!). As expected, even a 20×20 half-matrix of includes took a while to compile: 2+ minutes. The same set of headers with internal include guards was just a blip at 0.15s. So far, so good. That’s what I expected.
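To put a number on "combinatorially", here is a small sketch that counts how many times a header gets processed in the unguarded half-matrix. It assumes the main file includes every header once (the layout of the main file is my assumption), and the function names are made up for this example.

```python
def unguarded_opens(count: int) -> int:
    """Count header expansions when nothing is guarded."""
    # opens[i] = files processed when header i is included once:
    # the header itself plus everything it pulls in, recursively.
    opens = [0] * (count + 2)
    for i in range(count, 0, -1):
        opens[i] = 1 + sum(opens[i + 1:count + 1])
    # The main file is assumed to include each header exactly once.
    return sum(opens[1:count + 1])


def guarded_include_directives(count: int) -> int:
    """Count #include lines seen when each header body is expanded only once."""
    # `count` lines in the main file, plus (count - i) lines in header i.
    return count + count * (count - 1) // 2


if __name__ == "__main__":
    print(unguarded_opens(20))             # 1048575, i.e. 2**20 - 1
    print(guarded_include_directives(20))  # 210
```

Under these assumptions the unguarded count doubles with every extra header, which is why 20 headers is already painful, while the guarded count only grows quadratically.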
Now I cranked things up to 200 headers like Kyle had done. Here comes the first surprise of the day: gcc blows up trying to parse that many includes. I get a “#include nested too deeply” error. Digging through the gcc documentation I wasn’t able to find any way to increase that depth. That made me realize that having include chains of 200+ headers is completely unrealistic. I knew that it was artificially large, but it’s probably so by an order of magnitude at least.
Still, just for the test, I decided to go ahead. It turns out it was choking very near the end, so doing a run with 195 headers worked just fine. The results were what I would have expected. A bit better actually, and certainly much better than the results Kyle saw in Visual Studio: all the runs (internal guards, external guards, and pragma) took about the same time (0.07 seconds), and each of them included exactly 195 headers. No more, no less. That’s exactly how I would expect the compiler to behave.
gcc 3.4.1 (Linux with 2.6 kernel). 195 headers.
- Internal guards: 0.07s
- External guards: 0.09s
- Pragma directive: 0.07s
At work, I’m not as lucky, and I have to use Windows and Microsoft Visual Studio, so I ran the second set of tests there. The results confirmed what Kyle reported: the internal guards were very slow (14+ seconds), the external guards were blazingly fast (0.37 seconds), and the pragma directive was somewhere in between (9.6 seconds).
Microsoft Visual Studio 2003. 195 headers.
- Internal guards: 14.7s
- External guards: 0.37s
- Pragma directive: 9.57s
I was pretty amazed to see that. It seems like a major flaw in the Visual C++ compiler, doesn’t it?
To round out the tests, since I had the Metrowerks PS2 compiler handy, I decided to run the same set of tests with it. It turns out that it completely blew up on the 195-header project, complaining of includes nested too deeply. To my surprise, I had to lower the number of includes to 30 in order to be able to compile it at all. Anything over 30 would cause it to blow up.
That made me think again about the worst cases I would see in a typical project, and I realized that having over 30 includes deep at once is probably very, very rare, even if you’re using STL or Boost, which make heavy use of header files.
Just for the sake of completeness, I decided to run the tests with just 30 includes and see if I could discern any patterns from the results. It turns out that Metrowerks is pretty slow overall, but at least the differences between the three approaches are minimal.
Metrowerks PS2 compiler v3.0. 30 headers.
- Internal guards: 0.60s
- External guards: 0.46s
- Pragma directive: 0.49s
I ran the same 30-header tests with Visual Studio, and they showed the external guards being the clear winner, with internal guards and pragmas almost the same. I wonder if Visual C++ uses some dynamic algorithm that avoids having a fixed hard limit on include depth at the cost of runtime performance. Personally, I’ll take a fixed depth and flawless behavior like gcc’s any day.
Microsoft Visual Studio 2003. 30 headers.
- Internal guards: 0.48s
- External guards: 0.14s
- Pragma directive: 0.41s
gcc was so fast with such a tiny project that I couldn’t reliably measure it. But if the 195 includes was taking 0.07 seconds, you can guess how fast it churned through just 30 header files.
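When a single run is too short to time reliably, one common workaround is to repeat the command and keep the fastest run. The sketch below does that; it is not how the numbers in this article were taken, the helper name is made up, and the measured time includes process startup overhead.

```python
import subprocess
import sys
import time


def best_time(command, repeats=5):
    """Run `command` several times and return the fastest wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build
        # is never mistaken for a fast one.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Stand-in command; replace it with your own compiler invocation.
    print(f"{best_time([sys.executable, '-c', 'pass']):.3f}s")
```

Taking the minimum rather than the average filters out runs that were disturbed by disk caching or other processes.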
Conclusion
It does seem that Microsoft Visual Studio has some major problems with includes in pathological situations. On the other hand, it also seems that those situations are completely unrealistic and will never happen in a real code base. For a more realistic situation (30×30 half matrix), internal guards and pragmas are the same, and external guards have somewhat of an edge.
gcc behaved like a real champion by being super fast and efficient no matter what technique you threw at it. Way to go! The Microsoft compiler certainly could stand some improvement in that area.
Metrowerks was the slowest of the bunch, but it dealt with all the different techniques just fine for a relatively small set of tests.
Overall, there should be no real difference between using internal guards and pragma directives (which happily confirms some of the measurements we had done in real code bases). So stick with internal guards, which are standard and work on any compiler, but if adding a #pragma once directive surrounded by conditionals (because it’s not standard and it’s not supported in all the compilers) in addition to the internal guards would make you sleep better, go for it. External guards might have an edge with Visual Studio, but the pain and potential trouble of using external guards clearly outweighs any speed benefits gained by using them.
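As an illustration of that last recommendation, here is a small sketch that wraps a header body in internal guards plus a #pragma once that only Visual C++ sees. The helper name and the guard naming scheme are assumptions, and _MSC_VER is used as the conditional because it is the macro Visual C++ predefines; add other compilers to the condition only if you know they support the pragma.

```python
import re


def guarded_header(filename: str, body: str) -> str:
    """Wrap `body` in internal guards plus a compiler-conditional pragma."""
    # Derive the guard macro from the file name, e.g. widget.h -> WIDGET_H.
    # Assumes the file name starts with a letter.
    guard = re.sub(r"[^A-Za-z0-9]", "_", filename).upper()
    return (
        f"#ifndef {guard}\n"
        f"#define {guard}\n"
        "#if defined(_MSC_VER)\n"
        "#pragma once\n"
        "#endif\n"
        f"{body}"
        "#endif\n"
    )


if __name__ == "__main__":
    print(guarded_header("widget.h", "struct Widget;\n"))
```

The internal guard stays the outermost construct, so compilers that never see the pragma still get the standard protection.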
Ahh, good stuff. I was getting a little worried that I would have to go back to using Lakos-style external include guards.
Some things to consider about pragma directives. “#pragma once” did not work well with IncrediBuild in the past. I’m not sure if that is still the case. Haven’t tried with distcc.
The holy standard (ISO 14882 – 1998) in section 16.6/1 states:
“A preprocessing directive of the form
# pragma pp-tokens(opt) new-line
causes the implementation to behave in an implementation-defined manner. Any pragma that is not recognized by the implementation is ignored.”
Good point about Incredibuild. I don’t know how it will deal with #pragma once. It tends to get confused enough without it, so I would be very hesitant to rely on that. On the whole, I’m not very happy with Incredibuild (for many different reasons).
distcc relies on doing a local preprocessor pass on a file, and then passing it on to the network to be compiled, so I suspect the results would be the same as for the local compilation and #pragma once would work just fine.
It’s true that unknown #pragma directives should be ignored, but that’s not always the case. For example, older versions of gcc used to spew messages saying that #pragma was a deprecated directive. Since #pragmas are totally non-standard, it’s also conceivable that compilers can use the same one to mean different things (although highly unlikely with something like #pragma once).
In general, if you plan to compile your code in more than one compiler (a good thing even if you’re only developing for one platform), it’s a good idea to #ifdef the pragma directives so they only apply to the compilers you intended them for.
I just tried putting external include guards on all of the most used files in the codebase I work on, including the precompiled.h inclusion in every header, and I got 0 speedup (no second change in a 45 second compile). I use MSVC 7.1 and tested both debug and release modes. I don’t think Kyle’s test case is really representative of any realistic situation.
As of Incredibuild 2.12 or so, #pragma once was a no go. I don’t know if they fixed it in 2.20 or not.
>On the whole, I’m not very happy with Incredibuild (for many different reasons).
Why don’t you like Incredibuild?
The biggest issue we had was that edit and continue is not supported. Other than that it was a massive, massive time saver. From 30-40 minute builds down to sub 10 minutes for a complete rebuild.
It took me about 2 hours or so to get our Unreal-based codebase to work with it (with all the compiling and recompiling). But after that start-up cost it was smooth sailing.
> Why don’t you like Incredibuild?
Gosh, I started replying to that, but after I was typing for 10 minutes, I realized it would be best left as a full article. And even though I don’t like to bash products, I really think that Incredibuild can be actually harmful to a project. I’ll put up a full rant tomorrow 🙂
“I don’t think Kyle’s test case is really representative of any realistic situation.”
Gosh, I sure hope not. 🙂
There, this should answer how I feel about Incredibuild: http://www.gamesfromwithin.com/articles/0502/000069.html 🙂
Perhaps I’m missing something, but aren’t the external guards significantly faster because they preclude any disk access, whereas an internal guard or pragma require the include file to be opened first?
Additionally, I would expect pragma to be faster than the internal guards because the pragma is a first-class function of the pre-processor. When the pre-processor consumes it there is no ambiguity.
Internal guards, on the other hand, are not first-class concepts to the pre-processor; they’re constructs built from first-class pre-processor functionality. In other words, the pre-processor doesn’t know that you’re attempting to eliminate redundant header processing. More importantly, the pre-processor has no clue what to do until it has consumed the entire file. And even then, I would doubt the pre-processor (in conjunction with the compiler) is optimized for the specific case of #ifdef’ing a file into emptiness. I would expect the compiler to simply receive an empty file. But, I may not be giving the compiler writers enough credit… 😉
Of course, this may all be stuff that was just being assumed by all of you guys, I am a little “new” to this stuff. 😉
All that being said, I am surprised that Visual Studio trailed so far behind GCC. It doesn’t surprise me that it trailed, just that it did so by what seemed to be several orders of magnitude for something that didn’t involve any real compiling. Of course, as mentioned before, this test case is particularly pathological. Again, to not disregard the compiler writers, perhaps the Visual Studio approach solves a more common, “real world” problem more efficiently at the expense of this scenario?
I just came upon this page by chance – looked interesting so read it. My first thoughts were the same as Troy’s – in that surely the external guards test is not particularly fair (not even relevant) as they don’t involve disk access.
I’m not even sure the test was fair because you used two different OSs – shouldn’t you try a win32 build of gcc against MSVC? (maybe you did and I just missed that bit). Could be Win32 file access is just crap or you had a virus checker turned on?
The only real way to test which is better is to take a large existing project and change it over to use #pragma once instead and see what the speed increases are.
Modest use of precompiled headers and internal guards are probably fine though and then you can spend your spare time writing useful code and not worrying about saving a few 10ths of a second compile time 🙂