LeChimp has been rocking my world lately. I’ve been checking in code that passes all the unit tests, confident that everything was fine, just to find out the functional test fails loudly and obnoxiously . The other day it even managed to put the game in an infinite loop (yes, my fault). It might sound annoying, but I love it how LeChimp keeps us honest and makes subtle problems immediately obvious.
A couple of months ago something unusual happened: The functional test failed but I wasn’t able to reproduce the problem right away. The failure was not a crash, but an object in the world ending up in a different state than expected. That’s always tougher to track down. To make things even more fun, was object was affected changed depending on whether the game was run from the command line or the debugger. Oh, and did I mention it only happened in release mode? I’ve got a baaad feeling about this!
My first instinct was to blame it on Havok. After some investigating, I narrowed it down to one of the enemies that was in mid-air. Sometimes it would fall down, and other times it would stay hovering, as if not affected by gravity. I knew that Havok was supposed to be deterministic, but it’s every programmer’s instinct to always mistrust other people’s code. I should have known better.
I spent a whole day tracking this down. Adding more and more information to the recorded world state to detect the problem as soon as possible. Slowly circling in; silently creeping up on the unsuspecting bug. Eventually, I had my “aha!” moment. I had finally found it.
A member variable, m_alive, in the RobotLogical class was never initialized! That was causing the Havok character controller not to be updated sometimes, causing the discrepancy in object state between the recording pass and the playback pass. What made it even more difficult is that m_alive is a boolean, so any bit pattern other than zero, it’s going to make it true, which is the initial state I expected for the robots. It never failed in debug mode because unused heap memory is filled with patterns like 0xdeadbeef or (the definitely less fun, but more practical) 0xcdcdcdcd, which all make the variable come up as true. Doh!
Yes, C++ sucks because of time-wasters like that, but it didn’t prevent me from feeling really stupid for having sunk a whole day on it.
All the code Charles and I write is done through TDD so we have almost 100% unit-test coverage . How come Test Driven Development didn’t catch that? TDD is not about catching bugs, it’s about designing code. This is the type of error that is only triggered once every thousand times, so running one or two unit tests that cover that path probably won’t trigger it. That’s one of the many reasons end-to-end functional tests are invaluable. Go LeChimp!
Help, LeChimp! Help!
Detecting there’s a problem is good, but if it takes a whole day to fix it, things are going to be pretty painful. The more complicated the game gets, the more difficult life will be.
A good start is to crank up the warning level of your compiler as far as it goes. We’re compiling everything with warning level 4 in VC2005, so we’re already covered in that end . Unfortunately, warning level 4 doesn’t warn us about uninitialized member variables. We need something else.
Fortunately we have a secret weapon in our arsenal we had been ignoring: Lint.
Lint performs static analysis of C++ programs and spews out lots and lots of warnings. Think of it as an extremely pedantic warning level on a C++ compiler. And I mean extremely! It will warn you not only of potential errors and dodgy constructs, but of best practices, and sometimes even of recommendations in books like Effective C++.
As you can imagine, with all those things to check against, Lint is going to find lots to complain about. So taming the output into something readable is a definite challenge.
When it comes to Linting your code, there are a few options available.
Team Edition Compiler
My friend Jim Tilander brought up that there’s a Lint-like tool hidden in the bowels of the Microsoft Platform SDK. Boy, he wasn’t kidding about the hidden part! It’s a whopping 1.4 GB download for a DVD image that then needs to be extracted and installed in order to get to the small free compiler. When will Microsoft adopt a more modular approach?
Once I got past all the installation hurdles, I was pretty excited to give it a test. A free tool is a free tool. And if it has good integration with Visual Studio, so much the better. Strictly speaking, it’s not really a Lint tool, but an optimizing compiler with a special /analyze switch, which performs a lot of the same checks as Lint. But as we know, a rose by any other name…
Right off the bat, it fell flat the moment I started using it. Apparently it doesn’t know how to deal with vcproj files. To work on all the files in a project, I had to feed each cpp file separately. That in itself is only a slight annoyance, but it meant that you also need to pass it all the compiler switches set in the vcproj files: include directories, defines, etc. For a Microsoft product, I really would expect it to understand vcproj files. Oh well. It’s not the end of the world. I can always write a quick script to do that for me.
Unfortunately, the next problem was more serious. There was no global way to control what warnings I wanted reported. I can disable specific warnings in the command line, but if I only care about a few, it gets really cumbersome.
Finally the killer: It simply doesn’t report some of the most important warnings I expect out of Lint, such as uninitialized member variables or variables appearing in incorrect order in the initializer list.
In its defense, it had some decent features I would expect of good Lint tools, like the ability to suppress certain warnings from code or to give extra hits in the code itself. On the other hand, the way to control the behavior of the analyze switch is done with special keywords, which will make other compilers choke, making it harder to write cross-platform code. Somehow, I’m not terribly surprised.
LeChimp’s recommendation: Even for a free tool, it simply doesn’t stand up to any serious use. Skip it.
Don’t be put off by PC Lint’s archaic web site that seems to be snatched straight out of 1995 . PC Lint has been around forever and it shows. Not just in their web site, but in the program itself. Don’t expect any fancy GUIs, or dialog boxes, or any fancy cyber-amenities developed in the last 20 years.
Still, what it lacks in sophistication and pleasing visuals it more than makes up in functionality: Three minutes, $239, and a 4MB download later, I was ready to go.
In about an hour, I had PC Lint fully integrated with our build server and checking for problems in all of our source code. In just half an hour more, all our code was passing Lint’s rigorous inspection.
PC Lint can be quite intimidating out of the (virtual) box. If you run it on a typical codebase (or even a really high-quality one), it will treat you to megabytes of warning spew. Most of it useless, I have to say. So you have to take some time and tame it. Teach it to be a bit more polite, or more relevant. Or at least not to yell so much.
My preference is to start out with no warnings by starting out the options.lnt file with -e* and then add warning messages that I find useful: uninitalized member variables, unreferenced symbols, gotchas with virtual functions, etc..
Even running Lint with a small number warnings enabled, there were a few places in our code that caused PC Lint to complain but we didn’t want to change (Vec4 really shouldn’t initialize any member variables in its default constructor, or the NonCopyable class shouldn’t have a virtual destructor). In those cases, we can add a few discrete comments directly in the code that tell Lint to be quiet. We know what we’re doing, Lint. Really.
After we got all the code passing the initial set of checks, Charles dug into the Lint options file with a gleeful glint in his eyes and became the Lint-nazi, adding a good couple dozen extra warnings. It only took a few more minutes to get all the code to pass all the new checks. Frankly, the slowest part of the process is reading the thousands of different checks that PC Lint can do and decipher them (there are a couple of them I’m still unsure what exactly they mean).
This is what our options.lnt file looks like today:
-d__debugbreak()= -e* +e539 // Did not expect positive indentation from Location +e549 // Suspicious cast +e616 // control flows into case/default +e744 // switch statement has no default +e750 // local macro 'Symbol' (Location) not referenced +e751 // local typedef 'Symbol' (Location) not referenced +e752 // local declarator 'Symbol' (Location) not referenced +e753 // local struct, union or enum tag 'Symbol' (Location) not referenced +e764 // switch statement does not have a case +e766 // Include of header file FileName not used in module String +e767 // macro 'Symbol' was defined differently in another module +e773 // Expression-like macro 'Symbol' not parenthesized +e777 // Testing floats for equality +e783 // Line does not end with new-line +e784 // Nul character truncated from string +e789 // Assigning address of auto variable 'Symbol' to static +e796 // Conceivable access of out-of-bounds pointer +e797 // Conceivable creation of out-of-bounds pointer +e801 // Use of goto is deprecated +e802 // Conceivably passing a null pointer to function 'Symbol' +e803 // Conceivable data overrun for function 'Symbol' +e804 // Conceivable access beyond array for function 'Symbol' +e806 // Small bit field is signed rather than unsigned +e814 // useless declaration +e815 // Arithmetic modification of unsaved pointer +e817 // Conceivably negative subscript (Integer) in operator 'String' +e818 // Pointer parameter 'Symbol' (Location) could be declared ptr to const +e820 // Boolean test of a parenthesized assignment +e825 // control flows into case/default without -fallthrough comment +e940 // omitted braces within an initializer +e954 // Pointer variable 'Symbol' (Location) could be declared as pointing to a const // TURN THESE ON IF WE CAN FIGURE OUT HOW TO APPLY IT ONLY TO VALUES AND NOT POINTERS! //+e952 // Parameter 'Symbol' (Location) could be declared const //+e953 // Variable 'Symbol' (Location) could be declared as const +e1401 // member symbol 'Symbol' (Location) not initialized by constructor +e1404 // deleting an object of type 'Symbol' before type is defined +e1411 // Member with different signature hides virtual member 'Symbol' +e1413 // function 'Symbol' is returning a temporary via a reference +e1506 // Call to virtual function 'Symbol' within a constructor or destructor +e1507 // attempting to 'delete' an array (not a pointer) +e1509 // base class destructor for class 'Name' is not virtual +e1511 // Member hides non-virtual member 'Symbol' (Location) +e1512 // destructor for base class 'Symbol' (Location) is not virtual +e1516 // Data member hides inherited member 'Symbol' (Location) +e1520 // Multiple assignment operators for class 'Symbol' +e1521 // Multiple copy constructors for class 'Symbol' +e1534 // static variable 'Symbol' found within inline function in header +e1535 // Exposing low access data through member 'Symbol' +e1537 // const function returns pointer data member 'Symbol' +e1538 // base class 'Name' absent from initializer list for copy constructor +e1539 // member 'Symbol' (Location) not assigned by assignment operator +e1541 // member 'Symbol' (Location) possibly not initialized by constructor +e1542 // member 'Symbol' (Location) possibly not initialized +e1543 // member 'Symbol' (Location) possibly not initialized +e1544 // value of variable 'Symbol' (Location) indeterminate (order of initialization) +e1545 // value of variable 'Symbol' used previously to initialize variable 'Symbol' (Location) +e1547 // Assignment of array to pointer to base class (Context) +e1552 // Converting pointer to array-of-derived to pointer to base +e1554 // Direct pointer copy of member 'Symbol' within copy constructor +e1555 // Direct pointer copy of member 'Symbol' within copy assignment operator +e1556 // 'new Type(integer)' is suspicious +e1557 // const member 'Symbol' is not initialized +e1561 // Reference initialization causes loss of const/volatile integrity +e1705 // static class member may be accessed by the scoping operator +e1706 // Declaration with scope operator is unusual within a class +e1707 // static assumed for String +e1710 // An implicit 'typename' was assumed +e1711 // class 'Symbol' (Location) has a virtual function but is not inherited +e1718 // expression within brackets ignored +e1719 // assignment operator for class 'Symbol' has non-reference parameter +e1720 // assignment operator for class 'Symbol' has non-const parameter +e1724 // Argument to copy constructor for class 'Symbol' should be a const reference +e1726 // taking address of overloaded function name 'Symbol' +e1729 // Initializer inversion detected for member 'Symbol' +e1734 // Had difficulty compiling template function: 'Symbol' +e1736 // Redundant access specifier (String) +e1931 // Constructor 'Symbol' can be used for implicit conversions
I originally only had about a dozen warnings enabled, but, as you can see, Charles really went to town with it 🙂
To integrate it into our functional test, we just set up a new target in msbuild. For each project, we call it twice: once to generate a lnt file for the project, and once to run it on the files for the project.
<PropertyGroup> <LintDir>..\Bin\Lint\</LintDir> <Lint>$(LintDir)lint-nt.exe</Lint> </PropertyGroup> <ItemGroup> <ProjectFiles Exclude="..\Engine\**\*Tests.vcproj" Include="Src\SweetPea.vcproj;..\Engine\**\*.vcproj;"/> </ItemGroup> <Target Name="Lint" DependsOnTargets="CleanLint;RunLint"/> <Target Name="CleanLint"> <Delete Files="%(ProjectFiles.RelativeDir)%(ProjectFiles.Filename).lnt"/> </Target> <Target Name="RunLint" Inputs="@(ProjectFiles)" Outputs="@(ProjectFiles -> %(ProjectFiles.RelativeDir)%(ProjectFiles.Filename).lnt)" > <Exec Command="$(Lint) -v -os(%(ProjectFiles.RelativeDir)%(ProjectFiles.Filename).lnt) %(ProjectFiles.Identity)"/> <Exec WorkingDirectory="%(ProjectFiles.RelativeDir)" Command="$(MSBuildProjectDirectory)\$(Lint) -i$(MSBuildProjectDirectory)\$(LintDir) std.lnt %(ProjectFiles.Filename).lnt"/> </Target>
Whenever PC Lint detects some code that violates one of its rules, it returns an error message, which fails the msbuild target and the whole build is reported as failed by Cruise Control .Net. I love it when things work exactly like you want them to without doing any extra work. To make it even better, because PC Lint can output errors in VC++ format, they’re understood and parsed correctly by msbuild and CCNet, so they show up in the CCNet log like any other error.
PC Lint runs surprisingly fast. It takes about 20 seconds to run on all of our source code. I was almost tempted to add it to the incremental build in the build server, but I really want to keep build times to a minimum there, so it runs once every hour, which seems to be good enough.
If you want to be more hands-on with PC Lint, it’s a breeze to integrate with Visual Studio. Just set up a few custom commands in External Tools and you’re good to go. The Visual Studio Lint options file even details exactly how to set up those commands.
My only minor complaint with PC Lint is the strange behavior of error 766 (Include of header file not used in module). I don’t know what algorithm PC Lint uses to determine that, but it seems not to report some unused headers. That wouldn’t be a big deal, but sometimes just rearranging headers or moving some code around will cause it to recognize a header as unused, triggering a failed build. It’s a very useful warning though, so I’m willing to live with that quirk.
Working with PC Lint is a pleasure. It’s really low level and old school, but that’s precisely what made it so easy to integrate with our code and our build system. It doesn’t need installation programs, databases, registry entries, dll deployments, or anything like that. Just a good-ol’ ini file with options and you’re good to go. We even have it checked it in version control to make it easier to deploy to our development stations and the build server.
PC Lint has the characteristics I want in a tool: Easy to buy, deploy, and integrate into the way I want to work with it.
LeChimp’s recommendation: An absolute must! Worth every penny. Two opposable thumbs up!
Other Lint Alternatives
Lint was originally a Unix utility developed in the late 70s. You would expect that lots of different versions would be available for modern compilers and platforms, but there aren’t that many. Maybe it’s not a really fun project to engage enough Open Source hackers?
I have to admit I haven’t looked at any other alternatives to PC Lint and the Team Edition Compiler that work under Windows. A quick Google search reveals there are a few tools out there. Has anybody tried them? I’d love to hear your experiences with them.
- Splint. Only works on C. Emphasis on security. Open source.
- CodeCheck. Works on C++. Commercial.
- A bunch for Linux/Unix. As much as I’d like to, the reality is that most game development happens under Windows.
Conclusion: LeChimp Rules The Day (Again)
PC Lint should be a requirement for any C++ project. The day you set up a build server (which should be the first day), go ahead and spend the extra hour setting up PC Lint as well. Be safe up front and avoid wasting any time down the line. We should have followed our own advice, but sometimes priorities slip a bit in the rush of getting a whole company off the ground.
Now I can finally sleep better knowing that LeChimp is keeping busy linting my code several times a day.
 There are some things we don’t bother TDDing: main functions, super-high level leaf code, etc.
 Of course, you need to have a clean compile in warning level 4 in order to be worth it. That means builds without any warnings or other spew. Incidentally, I do hate the setting to treat warnings as errors because it means you can’t temporarily have warnings while you’re refactoring between checkins, which slows things down.