Inner-Product on Games From Within

Writing Reusable Code

Tue, 21 Jun 2011 00:00:00 +0000

Some people asked what I meant by a “toolkit architecture” in the previous post about my middleware fears. It turns out I wrote about that in a previous Inner Product column that for some reason I never reposted here. I think at the time I wrote this (late 2008), I already wasn’t very concerned about writing reusable code, and I was focusing it mostly with respect to using other people’s code and how I wanted it to be architected.

As programmers, we’re constantly reusing code. Sometimes it’s in the form of low-level OS function calls, or game middleware, and sometimes code that our teammates wrote. At the same time, unless you’re writing top-level game script code, chances are that the code you’re writing will be used by people as well.

I’m purposefully avoiding labeling reusable code as “libraries” or “middleware”. Those are just two of the many forms in which we end up reusing code. Copying some source files onto your project, calling an API function, or instantiating a class written by someone else in your team, are all different forms of code reuse.

Getting Started

Without a doubt, the most difficult job writing reusable code, is making sure it solves a problem correctly and meets everybody’s needs. That’s not a easy task when we’re writing code for ourselves, but it’s much more difficult when we’re targeting other programmers. Too many libraries get it only half right, and they solve some problems at the expense of introducing a bunch of new ones. Or they force the user to jump through all sorts of hoops to get the desired result in the end. We need a clear understand of the exact problem we’re trying to solve with our code.

Libraries often fall in the trap of presenting an implementation-centric interface. That is, their interface is based on the implementation details of the library, rather than how the users are going to use it in their programs.

The best way I’ve found to address both those shortcomings is to start by implementing code that solves the problem for one person. Just one. Forget about multiple users and code reusability for now. If you don’t have immediate and constant access to that one person, then you need to play that role and create a game or application that is as close as possible to what one of your users is going to be developing. By using this approach, I find that the interface to the reused code is much more natural, and it’s based on the experience of having solved the problem at least once. Otherwise, you run the risk of creating an interface that is not a good fit and forcing everything to conform to it in unnatural ways.

Once you have implemented a solution, take a moment to think before you dive into doing any more work. Many times you’ll find there is no reason to abstract it any further since your code is only going to be used in one place. If that’s the case, step away from the code and go do something more productive. You can always come back later whenever there is a real need to reuse it later in the project or in a future game.

There are exceptions to this approach of implementing a solution first, and abstracting it later. Some problems are very simple, or very well understood, so we might be able to jump in and implement the reusable solution directly (an optimized search algorithm, or a compression function for example). It’s also possible that you’ve implemented a similar system several times before, and you know exactly at what kind of level to expose the interface and how things should look like. In that case, it’s perfectly valid to draw on your past experience. Just try to avoid the second-system effect: The tendency to follow up a successful, simple first system, by an overly-complex system with all the ideas that didn’t make it into the first one. Setting Goals

Most successful reusable code was created with specific goals about how it was meant to be used. Sometimes those goals are explicitly stated, most of the time they are implied in the code design.

Some of the most common goals are flexibility, protection, simplicity, robustness, or performance. You can obviously not meet all those goals at once, and even if you could, you probably shouldn’t try. That would be a tremendous waste of time and resources. Stop thinking in the abstract and think about your one user. What does he or she need? What is the most important goal for them?

Many APIs and middleware packages are designed with protection as one of the primary goals: They don’t want the user to accidentally do anything wrong. In itself is not a bad goal, and it can often be implemented by having clean, unambiguous interfaces, clearly-named types and functions, and strongly-typed data types. Unfortunately, a design with protection as a main goal can often result in encumbered interfaces, verbose code, slow performance, and inflexible code. There is nothing more frustrating than wanting to do something that is explicitly being protected against, and having to work around the interface.

If your target users are professional game developers, give them the benefit of the doubt and don’t try to overprotect all your code. Save that for the scripting API exposed to junior designers and released to the customers with the game. If you’re concerned about programmers using your code correctly and not making mistakes, provide good sample code, tests, and documentation. If that’s not enough, and you feel that everybody would benefit from some level of protection, try to keep it to a minimum and maybe even provide lower-level functions that bypass it for power users.

Whatever the primary goal, the libraries and APIs I prefer to work with, help me get whatever I need done, while getting out of the way as much as possible.

Architecture

There are many different ways to architect code that is intended for reuse. The best approach will depend on the particular code: How complex is it? How much of it is there? What are its goals?

Unless your goals are to make quick, throwaway applications, I strongly recommend against a framework type of architecture (see Figure 1.) A framework is a system in which you add a few bits of functionality to customize your program into an existing system. They may sound like a clean and easy way to use complex code, but they’re inevitably very restrictive, and they make it very difficult, if not impossible, to do things with them beyond what they were intended to.

A more flexible approach is a layered architecture (see Figure 2.) Each layer is relatively simple and provides a well-defined set of functionality. Higher-level layers build on top of lower-level layers to create more complex or more specific functionality.

Keep in mind that it is not necessary, or even desirable, to have higher-level layers completely abstract out and hide the lower-level ones. By letting layers be fully transparent, they allow you to mix and match at what level you want to access the code. This can be very important, especially towards the end of a game when fixing some bugs or trying to squeeze some more performance out of the engine.

For example, one layer can expose functionality to create and manipulate pathfinding networks and nodes, another one can implement searches and other queries on those networks, while a third, higher-level layer, can expose functions to reason on the state of the network.

A toolkit architecture (see Figure 3), is the most flexible of all. It provides small, well-defined modules or functions with very few dependencies on other modules. This allows users to pull in whatever modules they need into their game to meet their needs and nothing else. Users can also start by reusing some modules, and but replace them down the line when they want to go beyond the existing functionality. Because of this, toolkit architectures are particularly well suited to game development.

Object Oriented

I tend to avoid complex class hierarchies in most of my code, but that’s especially important in the case of reusable code. Class hierarchies are very rigid, and impose a particular structure on their users.

If you want to remain object-oriented, a better approach is to emphasize composition of objects instead of inheritance. That allows users to much more easily pull in the functionality they need, and create their own objects based on their own constraints.

You can even go a step further and provide purely procedural interfaces. Plain static functions that operate on data types. Interfaces based on static functions are often much easier to understand and grasp than interfaces that involve classes and inheritance. They are also much more convenient for users to wrap and use in many different ways.

Remember what you learned about object-oriented design and having private data? Forget about it, and keep everything accessible. You may think you’re doing the user a favor by making some variables private and reducing the complexity of the interface. That’s partly true, but eventually, your users will want to have access to some of those variables that you took pains to hide. And if they really want to, they will get to them, even if it means direct addressing into an object or vtable.

It’s true that large codebases can be intimidating, and exposing all the internal details along with the regular interface would make it unwieldy for new users. An effective approach is to separate the public interface from what is intended for internal use only, but still make it available through some other means. For example, private data and functions could be wrapped in a different namespace, or simply in a different set of headers. Anything that clearly sets them apart, and doesn’t clutter the initial look at the interface, but that allows experienced developers to get to them and get their hands dirty. Extensibility

As soon as you make your code available to a wide range of developers, you’ll find that people want to use it in progressively more complex and bizarre situations. Your code might have completely solved the case of your first couple of users, and since it’s layered and modular, it can meet a lot of different requirements. But eventually, some people will start taking it to extremes you hadn’t imagined and it falls short for them. What to do?

You could start adding more options and more modules and more callbacks to your code. That way programmers can hook up into almost any part of the code and replace it with their own. The problem with that approach is that you’ve taken something that was relatively simple and made it into an large, ugly, fully-customizable, behemoth that tries to keep everybody happy. That sounds like a lot of common APIs we know and hate. Most successful products try to completely meet the need of some people rather than meet everybody’s needs part way. Otherwise you inconvenience 95% of your users for the benefit of 5% of them.

A better approach is to let those 5% users fend for themselves, but give them the means to do it. How so? With source code. Without source code, developers feel caged and constrained. They know they can’t look behind the interfaces, let alone modify anything in case something goes wrong (and we all know something will go wrong). The more code there is, and the more a project relies on it, the more important it is to have access to the full source code. Many teams will refuse to use some libraries or middleware unless the full source code is available. As soon as you make source code available to your users, you immediately put them more at ease because they feel more in control, and you allow those with special requirements to make whatever modifications they need to do.

Even those developers without a need to modify the code, they will be able to browse the code and see how certain functions are implemented. Not only will it make people much more likely to use your code, but they will probably also fix your bugs and suggest performance improvements, so it’s a win-win situation for everybody.

For extra bonus points, the source code should be accompanied by a set of tests. The more comprehensive the better (unit tests, functional tests, etc). Hopefully you created all those tests while you were developing the code, so distributing it along with the source code shouldn’t be any extra effort, but it will make a huge difference to your users. It will give them much more confidence modifying the code to suit their needs and still see that all the tests are passing.

Upgrades

As soon as you release some code and you have your first user, the question comes up of how to deal with new versions. There are many ways you can go about it, depending on how often you’ll release new versions, and how important it is to maintain backwards compatibility.

On one extreme, you can change the code and the interface to fit new features, changes in architecture, or any other reason. Whenever you release a new version, users will have to choose to remain with their current version or upgrade to the latest and make whatever changes are necessary. This is a common approach in open source projects and internal company code.

On the other extreme, once you release a version, you stick to that interface whether it’s a good idea or not. This can be good for users because they can get new versions without any extra work on their part, but it can be very constraining. It can make new features impossible to add, and it can prevent performance optimizations. This approach is more common on code that will become the foundation of many programs, like OS libraries and low-level APIs.

A good compromise is to keep interfaces the same during minor versions, and only change them whenever a major version is released. That way other developers only have to put in the time to upgrade to a major release if they really need the new features at that time.

Another approach is not to change existing functions or classes, but to introduce new ones and slowly deprecate the old ones over time. That way code continues to work, but users can take advantage of the new functions. After a few versions, you can completely drop off deprecated functionality, at which point most people will have upgraded already. If you do this, make sure to label deprecated functions with a #pragma warning or some other way. That way, developers know it will be phased out and they can start thinking about upgrading to the new interface.

The easier you make the transition to the new version, the better for your users, and the more likely they will be continue using your code. For example, you can provide scripts that parse their source code and upgrade it to match the new interfaces. That can be a bit risky, but it can be really worthwhile if you have a lot of required changes that are relatively mechanical (renamed functions, changed parameter order, etc).

Is there any point to all this talk of interfaces if you made your source code available? Yes, very much so. Most developers will look at the source code to know how things work under the hood, but they probably won’t modify it. Even if they do, they know that’s something they do at their own risk, so they’ll be more than willing to make a few changes whenever a new version is released. Everybody else will still definitely benefit from a relatively stable interface.

Conclusion

Writing reusable code starts with solving a problem and solving it well. The rest should all fall from there, and you can pick whichever method is more appropriate for your particular code and how you want to share it.

This article was originally printed in the March 2009 issue of Game Developer.

Data-Oriented Design Now And In The Future

Tue, 04 Jan 2011 00:00:00 +0000

There has been a lot of recent discussion (and criticism) on Data Oriented Design recently. I want to address some of the issues that have been raised, but before that, I’ll start with this reprint from my most recent Game Developer Magazine. If you have any questions you’d like addressed, add write a comment and I’ll try to answer everything I can.

Last year I wrote about the basics of Data-Oriented Design (see the September 2009 issue of Game Developer). In the time since that article, Data-Oriented Design has gained a lot of traction in game development and many of teams are thinking in terms of data for some of the more performance-critical systems.

As a quick recap, the main goal of Data-Oriented Design is achieving high-performance on modern hardware platforms. Specifically, that means making good use of memory accesses, multiple cores, and removing any unnecessary code. A side effect of Data-Oriented Design is that code becomes more modular and easier to test.

Data-Oriented Design concentrates on the input data available and the output data that needs to be generated. Code is not something to focus on (like traditional Computer Science), but is something that is written to transform the input data into the output data in an efficient way. In modern hardware, that often means applying the same code to large, contiguous blocks of homogeneous memory.

Applying Data-Oriented Design

Itâ€™s pretty easy to apply these ideas to a self-contained system that already works over mostly-homogeneous data. Most particle systems in games are probably designed that way because one of their main goals is to be very efficient and handle a large number of particles at high framerates. Sound processing is another system that is naturally implemented thinking about data first and foremost.

So, whatâ€™s stopping us from applying it to all the performance-sensitive systems in a game code base? Mostly just the way we think about the code. We need to be ready to really look at the data and be willing to split up the code into different phases. Letâ€™s take a high-level example and see how the code would have to be restructured when optimizing for data access.

Listing 1 shows pseudocode for what could be a typical update function for a generic game AI. To make things worse, that function might even be virtual and different types of entities might implement it in different ways. Letâ€™s ignore that for now and concentrate on what it does. In particular, the pseudocode highlights that, as part of the entity update, it does many conditional ray casting queries, and also updates some state based on the results of those queries. In other words, weâ€™re confronted with the typical tree-traversal code structure so common in Object-Oriented Programming.

void AIEntity::Update(float dt)
{
    DoSomeProcessing();
    if (someCondition && Raycast(world))
       DoSomething();
    if (someOtherCondition && BunchOfRayCasts(world))
       DoSomethingElse();
    UpdateSomeOtherStuff();
}

Listing 1

Ray casts against the world are a very common operation for game entities. Thatâ€™s how they â€œseeâ€ whatâ€™s around them and thatâ€™s what allows them to react correctly to their surroundings. Unfortunately, ray casting is a very heavy weight operation, and it involves potentially accessing many different areas in memory: a spatial data structure, other entity representations, polygons in a collision mesh, etc.

Additionally, the entity update function would be very hard to paralellize on multiple cores. Itâ€™s unclear how much data is read or written in that function, and some of that data (like the world data structure) might be particularly hard and expensive to protect from updates from multiple threads.

If we re-organize things a bit, we can significantly improve performance and paralellization.

Break Up And Batch

Without seeing any of the details of whatâ€™s going on inside the entity update, we can see the raycasts sticking out like a sore thumb in the middle. A raycast operation is fairly independent of anything else related to the entity, itâ€™s heavyweight, and there could be many of them, so itâ€™s a perfect candidate to break up into a separate step.

Listing 2 shows how the broken up entity update code would look like. The update is now split in two different passes: The first pass does some of the updating that can be done independently of any ray casts, and decides which, if any, raycasts need to be performed sometime this frame.

void AIEntity::InitialUpdate(float dt, RayCastQueries& queries)
{
    DoSomeProcessing();
    if (someCondition)
        AddRayCastQuery(queries);
    if (someOtherCondition)
        AddBunchOfRayCasts(queries);
}

void AIEntity::FinalUpdate(const RayCastResults& results)
{
    UpdateSomeOtherStuff(results);
}

Listing 2

The game code in charge of updating the game processes all AI entities in batches (Listing 3). So instead of calling InitialUpdate(), solve ray casts, and FinalUpdate() for each entity, it iterates over all the AI entities calling InitialUpdate() and adds all the raycast query requests to the output data. Once it has collected all the raycast queries, it can process them all at once and store their results. Finally, it does one last pass and calls FinalUpdate() with the raycast results on each entity.

RayCastQueries queries;
for (int i=0; i

Listing 3


By removing the raycast calls from within the entity update function, weâ€™ve shortened the call tree significantly. The functions are more self-contained, easier to understand, and probably much more efficient because of better cache utilization. You can also see how it would be a lot easier to parallelize things now by sending all raycasts to one core while another core is busy updating something unrelated (or maybe by spreading all raycasts across multiple cores, depending on your level of granularity).
Note that after calling InitialUpdate() on all entities, we could do some processing on other game objects that might also need raycast queries and collect them all. That way, we can batch all the raycasts at compute them all at once. For years, weâ€™ve been drilled by graphics hardware manufacturers how we should batch our render calls and avoid drawing individual polygons. This is the same way: By batching all raycasts in a single call, we have the potential to achieve much higher performance.
Splitting Things Up
Have we really gained much by reorganizing the code this way? Weâ€™re doing two full passes over the AI entities, so wouldnâ€™t that be worse from a memory point of view? Ultimately you need to measure it and compare the two. In modern hardware platforms, I would expect performance to be better because, even though weâ€™re traversing through the entities twice, weâ€™re using the code cache much better and weâ€™re accessing them sequentially (which allows us to pre-fetch the next one too).
If this is the only change we make to the entity update, and the rest of the code is the usual deep tree traversal code, we might not have gained much because weâ€™re still blowing the cache limits with every update. We might need to apply the same design principles to the rest of the update function to start seeing performance improvements. But at the very least, even with this small change, we have made it easier to parallelize.
One thing weâ€™ve gained now is the ability to modify our data to fit the way weâ€™re using it, and thatâ€™s the key to big performance gains. For example, after seeing how the entity is updated in two separate passes, you might notice that only some of the data that was stored in the entity object is touched from the first update, and the second pass accesses more specific data.
At that point we can split up the entity class into two different sets of data. One of the most difficult things at this point is naming these sets data in some meaningful way. Theyâ€™re not representing real objects or real-world concepts anymore, but different aspects of a concept, broken down purely by how the data is processed. So what before was an AIEntity, can now become a EntityInfo (containing things like position, orientation, and some high-level data) and AIState (with the current goals, orders, paths to follow, enemies targeted, etc).
The overall update function now deals with EntityInfo structures in the first pass, and AIState structures in the second pass, making it much more cache friendly and efficient.
Realistically, both the first and second passes will have to access some common data (for example the entity current state: fleeing, engaged, exploring, idle, etc). If itâ€™s only a small amount of data, the best solution might be to simply duplicate that data on both structures (going against all â€œcommon wisdomâ€ in Computer Science). If the common data is larger or is read-write, it might make more sense to give it separate data structure of its own.
At this point, a different kind of complexity is introduced: Keeping track of all the relationships from the different structures. This can be particularly challenging while debugging because some of the data belonging to the same logical entity isnâ€™t stored in the same structure and itâ€™s harder to explore in a debugger. Even so, making good use of indices and handles makes this problem much more manageable (see Managing Data Relationships in the September 2008 issue of Game Developer).
Conditional Execution
So far things are pretty simple because weâ€™re assuming that every AI entity needs both updates and some ray casts. Thatâ€™s not very realistic because entities are probably very bursty: sometimes they need a lot of ray casts, and sometimes theyâ€™re idle or following orders and donâ€™t need any for a while. We can deal with this situation by adding a conditional execution to the second update function.
The easiest way to conditionally execute the update would be to add an extra output parameter to the FirstUpdate() function indicating whether the entity needs a second update or not. The same information could be derived form the calling code depending on whether there were any raycasts queries added. Then, in the second pass, we only update those entities that appear in the list of entities requiring a second update.
The biggest drawback of this approach is that the second update went from traversing memory linearly to skipping over entities, potentially affecting cache performance. So what we thought was going to be a performance optimization ended up making things slower. Unless weâ€™re gaining a significant performance improvement, itâ€™s often better to simply do the work for all entities whether they need it or not. However, if on average less than 10 or 20 percent of the entities need a ray cast, then it might be worth avoiding doing the second update on all the other entities and paying the conditional execution penalty.
If the number of entities to be updated in the second pass is fairly small, another approach would be to copy all necessary data from the first pass into a new temporary buffer. The second pass can then process that data sequentially without any performance penalties and it would completely offset the performance hit of copying the data.
Finally, another alternative, especially if the conditional execution remains fairly similar from frame to frame, is to relocate entities that need raycasting together. That way the copying is minimal (swapping an entity to a new location in the array whenever it needs a raycast), and we still get the benefit of the sequential second update. For this to work all your entities need to be fully relocatable, which means working with handles or some other indirection, or updating all the references to the entities that swapped places.
Different Modes
What if the entity can be in several, totally different modes of execution? Even if itâ€™s the same type of entity, traversing through them linearly calling the update function could end up using completely different code for each of them, so it will result in poor code cache performance.
There are several approached we can take in a situation like that:

If the different execution modes also are tied to different parts of the entity data, we could treat them as if they were completely different entities and break each of their data components apart. That way, we can iterate through each type separately and get all the performance benefits.
If the data is mostly the same, and itâ€™s just the code that changes, we could keep all the entities in the same memory block, but rearrange them so that entities in the same mode are next to each other. Again, if you can relocate your data, this is very easy and efficient (it only requires swapping a few entities whenever the state changes).
Leave it alone! Ultimately, Data-Oriented Design is about thinking about the data and how it affects your program. It doesnâ€™t mean you always have to optimize every aspect of it, especially if the gains arenâ€™t significant enough to warrant the added complexity.

The Future
Is thinking about a program in terms of data and doing these kind of optimizations a good use of our time? Is this all going to go away in the near future as hardware improves? As far as we can tell right now, the answer is a definite no. Efficient memory access with a single CPU is a very complicated problem, and matters get much worse as we add more cores. Also, the amount of transistors in CPUs (which is a rough measure of power) continues to increase much faster than memory access time. That tells us that, barring new technological breakthroughs, weâ€™re going to be dealing with this problem for a long time. This is a problem we need to deal with right now and build our technology around it.
There are some things Iâ€™d like to see in the future to make Data-Oriented Design easier. We can all dream up of a new language that will magically allow for great memory access and easy paralellization, but replacing C/C++ and all existing libraries is always going to be a really hard sell. Historically, the best advances in game technology have been incremental, not throwing away existing languages, tools, and libraries (thatâ€™s why weâ€™re still stuck with C++ today).
Here are two things that could be done right now and work with our existing codebases. I know a lot of developers are working on similar systems in their projects, but it would be great to have a common implementation released publicly so we can all build on top of them.
Language
Even though a functional language might be ideal, either created from scratch or reusing an existing one, we could temporarily extend C to fit our needs. I would like to see a set of C extensions where functions have clearly defined inputs and outputs, and code inside a function is not allowed to access any global state or call any code outside that function (other than local helper functions defined in the same scope). This could be done as a preprocessor or a modified C compiler, so it remains very compatible with existing libraries and code.
void FirstEntityUpdate(input Entities* entities, input int entityCount, output RayCastQueries* queries, output int queryCount);
Dependencies between functions would be expressed by tying the outputs of some functions to the input of other functions. This could be done in code or through the use of GUI tools that help developers manage data relationships visually. That way we can construct a dependency diagram of all the functions involved in every frame.
Scheduler
Once we have the dependencies for every function, we can create a directed acyclic graph (DAG) from it, which would give us a global view of how data is processed every frame. At that point, instead of running functions manually, we can leave that job in the hands of a scheduler.
The scheduler has full information about all the functions as well as the number of available cores (and information from the previous frame execution if we want to use that as well). It can determine the critical path through the DAG and optimize the scheduling of the tasks so the critical path is always being worked on. If temporary memory buffers are a limitation for our platform, the scheduler can take that into account and trade some performance time for a reduced memory footprint.
Just like the language, the scheduler would be a very generic component, and could be made public. Developers could use it as a starting point, build on top of it, and add their own rules for their specific games and platforms.
 
Even if weâ€™re not ready to create those reusable components, every developer involved in creating high-performance games should be thinking about data in their games right now. Data is only going to get more important in the future as the next generation of consoles and computers rolls in.
 
This article was originally printed in the September 2010 issue of Game Developer.



Start Pre-allocating And Stop Worrying
Mon, 25 Oct 2010 00:00:00 +0000
One of the more frequent questions I receive is what kind of memory allocation strategy I use in my games. The quick answer is none (at least frame to frame, I do some allocation at the beginning of each level on a stack-based allocator). This reprint of one of my Inner Product column covers quite well how I feel about memory allocation.
We’ve all had things nagging us in the back of our minds. They’re nothing we have to worry about this very instant, just something we need to do sometime in the future. Maybe that’s changing those worn tires in the car, or making an appointment with the dentist about that tooth that has been bugging you on and off.
Dynamic memory allocation is something that falls in that category for most programmers. We all know we can’t just go on allocating memory willy-nilly whenever we need it, yet we put off dealing with it until the end of the project. By that time, deadlines are piling on the pressure and it’s usually too late to make significant changes. With a little bit of forethought and pre-planning, we can avoid those problems and be confident our game is not going to run out of memory in the most inopportune moment.
On-Demand Dynamic Memory Allocation
The easiest way to get started with memory management is to allocate memory dynamically whenever you need it. This is the approach many software engineering books consider as ideal and it’s often encouraged in Computer Science classes.
It’s certainly an easy approach to use. Need a new animation instance when the player is landing on a ledge? Allocate it. Need a new sound when we reach the goal? Just allocate another one!
On-demand dynamic memory allocation can help to keep memory usage to a minimum, because, you only allocate the memory that you need and no more. In practice it’s not quite as neat and tidy because there can be a surprisingly large amount of overhead per allocation, which adds up if programmers become really allocation-happy.
It’s also a good way to shoot yourself in the foot.
Games don’t live inside a Computer Science textbook, so we have to deal with real world limitations, which make this approach cumbersome, clunky, and potentially disastrous. What can go wrong with on-demand dynamic memory allocation? Almost everything!
Limited Memory
Games, or any software for that matter, run on machines with limited amounts of memory. As long as you know what that limit is, and you keep extremely careful track of your memory usage, you can remain under the limit. However, since the game is allocating memory any time it needs it, there will most likely come a time when the game tries to allocate a new block but there is no memory left. What can you do then? Nothing easy I’m afraid. You can try to free an older memory block and allocate the new one there, or you can try to make your game tolerant to running out of memory. Both those solutions are very complex and difficult to implement correctly.
Even setting memory budgets and sticking to them can be very difficult. How can a designer know that a given particle system isn’t going to run out of memory? Are these AI units going to create too many pathfinding objects and crash the game? Hard to say until we run the game in all possible combinations. And even then, how do you know it isn’t going to crash five minutes later? Or ten? It’s almost impossible to know for certain.
If you insist in using this approach, at the very least, you should tag all memory allocations, so you have an idea of how memory is being used. You can either tag each allocation based on what system initiated it (physics, textures, animation, sound, AI, etc) or even on the filename where it originated, which has the advantage that it can be automated and should still give you a good picture of the overall memory usage.
Memory Fragmentation
Even if you take lots of pain not to go over your the available memory, you might still run into trouble because of memory fragmentation. You might have enough memory for a new allocation, but in the form of many small memory blocks instead of a large contiguous one. Unless you provide your own memory allocation mechanism, fragmentation is something that is very hard to track on your own, so you can’t even be ready for it until the allocation fails.
Virtual Memory
Virtual memory could solve all those problems. In theory, if you run out of real memory, the operating system swaps out some older, unused pages to disk and makes room for the new memory you requested. In practice, it’s just a bad caching scheme because it can be triggered at the worst possible moment, and it doesn’t know about what data it’s swapping out or how your game uses it.
Games, unlike most other software, have a “soft realtime” requirement: The game needs to keep updating at an acceptable interactive rate, which is somewhere around 15 or more frames per second. That means that gamers are going to make a trip to the store to return your game if it pauses for a couple of seconds every few minutes to “make some room” for new memory. So relying on virtual memory isn’t a particularly attractive solution.
Additionally, lots of games run in platforms with fixed amounts of RAM and no virtual memory. So when memory runs out, things won’t get slow and chuggy, they’ll crash hard. When the memory is gone, it’s really gone.
Performance Problems
There are some performance issues that are relatively easy to track down and fix. Usually ones that occur every frame and are happening in a single spot: some expensive operation, a O(n3) algorithm, etc. Then there are performance problems introduced by dynamic memory allocations, which can be really hard to track down.
Standard malloc returns memory pretty quickly, and usually doesn’t ever register on the the profiler. Every so often though, whenever the game has been running for a while and memory is pretty fragmented, it can spike up and cause a significant delay for just a frame. Trying to track down those spikes has caused more than one programmer to age prematurely. You can avoid some of those problems by using your own memory manager, but don’t attempt to write a generic one yourself from scratch. Instead start with some of the ones listed in the references.
Malloc spikes are not the only source of performance problems. Allocating many small blocks of memory can lead to bad cache coherence when the game code access them sequentially. This problem usually manifests itself as a general slowdown that can’t be narrowed down in the profiler. With today’s hardware of slow memory systems and deep caches, good memory access patterns are more important than ever.
Keeping Track Of Memory
Another source of problems with dynamic memory allocation are bugs in the logic that keeps track of the allocated memory blocks. If we forget to free some of them, our program will have memory leaks and has the potential to run out of memory.
The flip side of memory leaks are invalid memory access. If we free a memory block and later we access it as if it were allocated, we’ll either get a memory access exception, or we’ll manage to corrupt our own game.
Some techniques, such as reference counting and garbage collection can help keep track of memory allocations, but introduce their own complexities and overhead.
Introducing Pre-allocation
On the opposite corner of the boxing ring is the purely pre-allocated game. It excels at everything that the dynamically-allocated game is weak at, but it has a few weaknesses of its own. All in all, it’s probably a much safer approach for most games though.
The idea behind a pre-allocation memory strategy is to allocate everything once and never have to do any dynamic allocations in the middle of the game. Usually you grab as big a block of memory as you can, and then you carve it out to suit your game’s needs.
Some advantages are very clear: no performance penalties, knowing exactly how your memory is used, never running out of memory, and no memory fragmentation to worry about. There are some other more subtle advantages, such as being able to put data in contiguous areas of memory to get best cache coherency, or having blazingly-fast load times by loading a baked image of a level directly into memory.
The main drawback of pre-allocation is that is more complex to implement than the dynamic allocation approach and it takes some planning ahead.
Know Your Data
For preallocation to work, you need to know ahead of time how much of every type of data you will need in the game. That can be a daunting proposition, especially to those used to a more dynamic approach. However, with a good data baking system (see last month’s Inner Product column), you can get a global view of each level and figure out how big things need to be.
There is one important design philosophy that needs to be adopted for preallocation to work: Everything in the game has to be bounded. That shouldn’t feel too restrictive; after all, the memory in your target platform is bounded, as well as every single resource. That means that everything that can create new objects, including high-level game constructs, should operate on a fixed number of them. This might seem like an implementation detail, but it often bubbles up to what’s exposed to game designers. A common example is an enemy spawner. Instead of designing a spawner with an emission rate, it should have a fixed number of enemies it can spawn (and potentially reuse them after they’re dead).
Potentially Wasted Space
If you allocate enough data for the worst case in your game, that can lead to a lot of unused data most of the time. That’s only an issue if that unused data is preventing you from adding more content to the game. We might initially balk at the idea of having 2000 preallocated enemies when we’re only going to see 10 of them at once. But when you realize that each of those enemies is only taking 256 bytes and the total overhead is 500 KB, which can be easily accommodated in most modern platforms today.
Preallocation doesn’t have to be as draconian as it sounds though. You could relax this approach and commit to having each level preallocated and never having dynamic memory allocations while the game is running. That still allows you to dynamically allocate the memory needed for each level and keep wasted space to a minimum. Or you can take it even further and preallocate the contents of memory blocks that are streamed in memory. That way each block can be divided in the best way for that content and wasted space is kept to a minimum.
Reuse, RecycleM
If you don’t want to preallocate every single object you’ll ever use, then you can create a smaller set, and reuse them as needed. This can be a bit tricky though. First of all, it needs to be very much specific to the type of object that is reused. So particles are easy to reuse (just drop the oldest one, or the ones not in view), but might be harder with enemy units or active projectiles. It’s going to take some game knowledge of those objects to decide which ones to reuse and how to do it.
It also means that systems need to be prepared to either fail an allocation (if your current set of objects is full and you don’t want to reuse an existing one), or they need to cope with an object disappearing from one frame to another. That’s a relatively easy problem to solve by using handles or other weak references instead of direct pointers.
Then there’s the issue that reusing an object isn’t as simple as constructing a new one. You really need to make sure that when you reuse it, there’s nothing left from the object it replaced. This is easy when your objects are just plain data in a table, but can be more complicated when they’re complex C++ classes tied together with pointers. In any case, you can’t apply the Resource Acquisition Is Initialization (RAII) pattern, but it doesn’t seem to be a pattern very well suited for games, and it’s a small price to pay for the simplicity that preallocation provides.
Specialized Heaps
Truth be told, a pure pre-allocated approach can be hard to pull off, especially with highly dynamic environments or games with user-created content. Specialized heaps is a combination of dynamic memory allocation and pre-allocation that takes the best of both worlds.
The idea behind specialized heaps is that the heaps themselves are pre-allocated, but they allow some form of specialized dynamic allocation within them. That way you avoid the problems of running out of memory, or memory fragmentation globally, but you still can perform some sort of dynamic allocation when needed.
One type of specialized heaps is based on the object type. If you can guarantee that all objects allocated in that heap are going to be of the same size, or at least a multiple of a certain size, memory management becomes much easier and less error prone, and removes a lot of the complexity of a general memory manager.
My favorite approach for games is to create specialized heaps based on the lifetime of the objects allocated in them. These heaps use sequential allocators, always allocating memory from the beginning of a memory block. When the lifetime of the objects is up, the heap is reset and allocations can start from the beginning again. The use of a simple sequential allocator bypasses all the insidious problems of general memory management: fragmentation, compaction, leaks, etc. See the code in http://gdmag.com/resources/code.htm for an implementation of a SequentialAllocator class.
The heap types most often used in games are:

Level heap. Here you allocate all the assets and data for the level at load time. When the level is unloaded, all objects are destroyed at once. If your game makes heavy use of streaming, this can be a streaming block instead of a full level.
Frame heap. Any temporary objects that only need to last a frame or less get allocated here, and destroyed at the end of the frame.
Stack heap. This one is a bit different from the others. Like the other heaps, it uses a sequential allocator and objects are allocated from the beginning, but instead of destroying all objects at once, it only destroys objects up to the marker that is popped fro the stack.

What About Tools?
You can take everything I’ve written here, and (almost) completely ignore it for tools. I fall in the camp of the programmers who consider the runtime as a totally separate beast from the tools. That means that the runtime can be lean and mean and minimalistic, but I can relax and use whatever technique makes me more productive when writing tools. That means you can allocate memory any time you want, you can use complex libraries like STL and Boost, etc. Most tools are going to run on a beefy PC and a few extra allocations here and there won’t make any difference.
Be careful with performance-sensitive tools though. Tools that build assets or compute complex lighting calculations might be a bottleneck in the build process. In that case, performance becomes crucial again and you might want to be a bit more careful about memory layout and cache coherency.
On the other hand, if the tool you’re writing is not performance sensitive, you should ask yourself if it really needs to be written in C++. Maybe C# or Python are better languages if all you’re doing is transforming XML files or verifying that a file format is correct. Trading performance for ease of development is almost always a win with development tools.
Next time you reach out for a malloc inside your main loop, think about how it can fail. Maybe you can pre-allocate that memory and stop worrying about what’s going to happen the day you prepare the release candidate.
This article was originally printed in the February 2009 issue of Game Developer.



Mock Objects: Friends Or Foes?
Mon, 26 Jul 2010 00:00:00 +0000
In a previous article we covered all the details necessary to start using unit testing on a real-world project. That was enough knowledge to get started and tackle just about any codebase. Eventually you might have found yourself doing a lot of typing, writing redundant tests, or having a frustrating time interfacing with some libraries and still trying to write unit tests. Mock objects are the final piece in your toolkit that allow you to test like a pro in just about any codebase.
Testing Code
The purpose of writing unit tests is to verify the code does what it’s supposed to do. How exactly do we go about checking that? It depends on what the code under test does. There are three main things we can test for when writing unit tests:


Return values. This is the easiest thing to test. We call a function and verify that the return value is what we expect. It can be a simple boolean, or maybe it’s a number resulting from a complex calculation. Either way, it’s simple and easy to test. it doesn’t get any better than this.


Modified data. Some functions will modify data as a result of being called (for example, filling out a vertex buffer with particle data). Testing this data can be straightforward as long as the outputs are clearly defined. If the function changes data in some global location, then it can be more complicated to test it or even find all the possible places that can be changed. Whenever possible, pass the address of the data to be modified as an input parameter to the functions. That will make them easier to understand and test.


Object interaction. This is the hardest effect to test. Sometimes calling a function doesn’t return anything or modify any external data directly, and it instead interacts with other objects. We want to test that the interaction happened in the order we expected and with the parameters we expected.


Testing the first two cases is relatively simple, and there’s nothing you need to do beyond what a basic unit testing-framework provides. Call the function and verify values with a CHECK statement. Done. However, testing that an object “talks” with other objects in the correct way is much trickier. That’s what we’ll concentrate on for the rest of the article.
As a side note, when we talk about object interaction, it simply refers to parts of the code calling functions or sending messages to other parts of the code. It doesn’t necessarily imply real objects. Everything we cover here applies as well to plain functions calling other functions.
Before we go any further, let’s look at a simple example of object interaction. We have a game entity factory and we want to test that the function CreateGameEntity() finds the entity template in the dictionary and calls CreateMesh() once per each mesh.
TEST(CreateGameEntityCallsCreateMeshForEachMesh)
{
    EntityDictionary dict;
    MeshFactory meshFactory;
    GameEntityFactory gameFactory(dict, meshFactory);

    Entity* entity = gameFactory.CreateGameEntity(gameEntityUid);
    // How do we test it called the correct functions?
}
We can write a test like the one above, but after we call the function CreateGameEntity(), how do we test the right functions were called in response? We can try testing for their results. For example, we could check that the returned entity has the correct number of meshes, but that relies on the mesh factory working correctly, which we’ve probably tested elsewhere, so we’re testing things multiple times. It also means that it needs to physically create some meshes, which can be time consuming or just need more resources than we want for a unit test. Remember that these are unit tests, so we really want to minimize the amount of code that is under test at any one time. Here we only want to test that the entity factory does the right thing, not that the dictionary or the mesh factory work.
Introducing Mocks
To test interactions between objects, we need something that sits between those objects and intercepts all the function calls we care about. At the same time, we want to make sure that the code under test doesn’t need to be changed just to be able to write tests, so this new object needs to look just like the objects the code expects to communicate with.
A mock object is an object that presents the same interface as some other object in the system, but whose only goal is to attach to the code under test and record function calls. This mock object can then be inspected by the test code to verify all the communication happened correctly.
TEST(CreateGameEntityCallsCreateMeshForEachMesh)
{
    MockEntityDictionary dict;
    MockMeshFactory meshFactory;
    GameEntityFactory gameFactory(dict, meshFactory);

dict.meshCount = 3;

    Entity* entity = gameFactory.CreateGameEntity(gameEntityUid);

    CHECK_EQUAL(1, dict.getEntityInfoCallCount);
    CHECK_EQUAL(gameEntityUid, dict.lastEntityUidPassed);
    CHECK_EQUAL(3, meshFactory.createMeshCallCount);
}
This code shows how a mock object helps us test our game entity factory. Notice how there are no real MeshFactory or EntityDictionary objects. Those have been removed from the test completely and replaced with mock versions. Because those mock objects implement the same interface as the objects they’re standing for, the GameEntityFactory doesn’t know that it’s being tested and goes about business as usual.
Here are the mock objects themselves:
struct MockEntityDictionary : public IEntityDictionary
{
    MockEntityDictionary() 
        : meshCount(0)
        , lastEntityUidPassed(0)
        , getEntityInfoCallCount(0)
    {}

    void GetEntityInfo(EntityInfo& info, int uid)
    {
        lastEntityUidPassed = uid;
        info.meshCount = meshCount;
        ++getEntityInfoCallCount;
    }

    int meshCount;
    int lastEntityUidPassed;
    int getEntityInfoCallCount;
};

struct MockMeshFactory : public IMeshFactory
{
    MockMeshFactory() : createMeshCallCount(0)
    {}

    Mesh* CreateMesh()
    {
        ++createMeshCallCount;
        return NULL;
    }
};
Notice that they do no real work; they’re just there for bookkeeping purposes. They count how many times functions are called, some parameters, and return whatever values you fed them ahead of time. The fact that we’re setting the meshCount in the dictionary to 3 is how we can then test that the mesh factory is called the correct number of times.
When developers talk about mock objects, they’ll often differentiate between mocks and fakes. Mocks are objects that stand in for a real object, and they are used to verify the interaction between objects. Fakes also stand in for real objects, but they’re there to remove dependencies or speed up tests. For example, you could have a fake object that stands in for the file system and provides data directly from memory, allowing tests to run very quickly and not depend on a particular file layout. All the techniques presented in this article apply both to mocks and fakes, it’s just how you use them that sets them apart from each other.
Mocking Frameworks
The basics of mocking objects are as simple as what we’ve seen. Armed with that knowledge, you can go ahead and test all the object interactions in your code. However, I bet that you’re going to get tired quickly from all that typing every time you create a new mock. The bigger and more complex the object is, the more tedious the operation becomes. That’s where a mocking framework comes in.
A mocking framework lets you create mock objects in a more automated way, with less typing. Different frameworks use different syntax, but at the core they all have two parts to them: A semi-automatic way of creating a mock object from an existing class or interface. A way to set up the mock expectations. Expectations are the results you expect to happen as a result of the test: functions called in that object, the order of those calls, or the parameters passed to them.
Once the mock object has been created and its expectations set, you perform the rest of the unit test as usual. If the mock object didn’t receive the correct calls the way you specified in the expectations, the unit test is marked as failed. Otherwise the test passes and everything is good.
GoogleMock
GoogleMock is the free C++ mocking framework provided by Google. It takes a very straightforward implementation approach and offers a set of macros to easily create mocks for your classes, and set up expectations. Because you need to create mocks by hand, there’s still a fair amount of typing involved to create each mock, although they provide a Python script that can generate mocks automatically from from C++ classes. It still relies on your classes inheriting from a virtual interface to hook up the mock object to your code.
This code shows the game entity factory test written with GoogleMock. Keep in mind that in addition to the test code, you still need to create the mock object through the macros provided in the framework.
TEST(CreateGameEntityCallsCreateMeshForEachMesh) 
{
    MockEntityDictionary dict;
    MockMeshFactory meshFactory;
    GameEntityFactory gameFactory(dict, meshFactory);

    EXPECT_CALL(dict, GetEntityInfo())
        .Times(1)
        .WillOnce(Return(EntityInfo(3));

    EXPECT_CALL(meshFactory, CreateMesh())
        .Times(3);

    Entity* entity = gameFactory.CreateGameEntity(gameEntityUid);
}
MockItNow
This open-source C++ mocking framework written by Rory Driscoll takes a totally different approach from GoogleMock. Instead of requiring that all your mockable classes inherit from a virtual interface, it uses compiler support to insert some code before each call. This code can then call the mock and return to the test directly, without ever calling the real object.
From a technical point of view, it’s a very slick method of hooking up the mocks, but the main advantage of this approach is that it doesn’t force a virtual interface on classes that don’t need it. It also minimizes typing compared to GoogleMock. The only downside is that it’s very platform-specific implementation, and the version available only supports Intel x86 processors, although it can be re-implemented for PowerPC architectures.
Problems With Mocks
There is no doubt that mocks are a very useful tool. They allow us to test object interactions in our unit tests without involving lots of different classes. In particular, mock frameworks make using mocks even simpler, saving typing and reducing the time we have to spend writing tests. What’s not to like about them?
The first problem with mocks is that they can add extra unnecessary complexity to the code, just for the sake of testing. In particular, I’m referring to the need to have a virtual interface that objects are are going to be mocked inherit from. This is a requirement if you’re writing mocks by hand or using GoogleMock (not so much with MockItNow), and the result is more complicated code: You need to instantiate the correct type, but then you pass around references to the interface type in your code. It’s just ugly and I really resent that using mocks is the only reason those interfaces are there. Obviously, if you need the interface and you’re adding a mock to it afterwards, then there’s no extra complexity added.
If the complexity and ugliness argument doesn’t sway you, try this one: Every unnecessary interface is going to result in an extra indirection through a vtable with the corresponding performance hit. Do you really want to fill up your game code with interfaces just for the sake of testing? Probably not.
But in my mind, there’s another, bigger disadvantage to using mock frameworks. One of the main benefits of unit tests is that they encourages a modular design, with small, independent objects, that can be easily used individually. In other words, unit tests tend to push design away from object interactions and more towards returning values directly or modifying external data.
A mocking framework can make creating mocks so easy, to the point that it doesn’t discourage programmers from creating a mock object any time they think of one. And when you have a good mocking framework, every object looks like a good mock candidate. At that point, your code design is going to start looking more like a tangled web of objects communicating in complex ways, rather than simple functions without any dependencies. You might have saved some typing time, but at what price!
When to Use Mock Frameworks
That doesn’t mean that you shouldn’t use a mocking framework though. A good mocking framework can be a lifesaving tool. Just be very, very careful how you use it.
The case when using a mocking framework is most justified when dealing with existing code that was not written in unit testing in mind. Code that is tangled together, and impossible to use in isolation. Sometimes that’s third-party libraries, and sometimes it’s even (yes, we can admit it) code that we wrote in the past, maybe under a tight deadline, or maybe before we cared much about unit tests. In any case, trying to write unit tests that interface with code not intended to be tested can be extremely challenging. So much so, that a lot of people give up on unit tests completely because they don’t see a way of writing unit tests without a lot of extra effort. A mocking framework can really help in that situation to isolate the new code you’re writing, from the legacy code that was not intended for testing.
Another situation when using a mocking framework is a big win is to use as training wheels to get started with unit tests in your codebase. There’s no need to wait until you start a brand new project with a brand new codebase (how often does that happen anyway?). Instead, you can start testing today and using a good mock framework to help isolate your new code from the existing one. Once you get the ball rolling and write new, testable code, you’ll probably find you don’t need it as much.
Apart from that, my recommendation is to keep your favorite mocking framework ready in your toolbox, but only take it out when you absolutely need it. Otherwise, it’s a bit like using a jackhammer to put a picture nail on the wall. Just because you can do it, it doesn’t mean it’s a good idea.
Keep in mind that these recommendations are aimed at using mock objects in C and C++. If you’re using other languages, especially more dynamic or modern ones, using mock objects is even simpler and without many of the drawbacks. In a lot of other languages, such as Lua, C#, or Python, your code doesn’t have to be modified in any way to insert a mock object. In that case you’re not introducing any extra complexity or performance penalties by using mocks, and none of the earlier objections apply. The only drawback left in that case is the tendency to create complex designs that are heavily interconnected, instead of simple, standalone pieces of code. Use caution and your best judgement and you’ll make the best use of mocks.
This article was originally printed in the June 2009 issue of Game Developer.



Nitty Gritty Unit Testing
Sun, 18 Jul 2010 00:00:00 +0000
It’s one thing to see someone drive a car and have a theoretical understanding of what the pedals do and how to change gears. It’s is a completely different thing to be able to drive a car safely on the street. There are some activities that require many small details and some hands-on experience to be able to execute them successfully.
The good news is that unit testing is a lot simpler than driving a standard shift, but there are a lot of small details you need to get right to do it successfully. Even after reading about unit testing and being convinced of its benefits, programmers are often not sure how to get started. This month’s column is not going to try to convince you of the many benefits of unit testing (I hope you are already convinced), but rather, describe some very concrete tips to help you get started right away.
Goals of Unit Testing
There are many different reasons to write unit tests:

Correctness testing. Checking that the code behaves as designed.
Boundary testing. Checking that the code behaves correctly in odd or boundary situations.
Regression testing. Checking that the behavior of the code doesn’t change unintentionally over time.
Performance testing. Checking that the program meets certain minimum performance or memory constraints.
Platform testing. Checking that the code behaves the same across multiple platforms.
Design. Tests provide a way to advance the code design and architecture. This is usually referred as Test-Driven Development (TDD).
Full game or tool testing. Technically this is a functional test, not a unit test anymore because it involves the whole program instead of a small subset of the code, but a lot of the same techniques apply.

Some developers use unit tests only for one of the reasons listed above, while others use many kinds of tests for a variety of different reasons. It’s important to recognize that because there are so many different uses for unit tests, no single solution is going to fit everybody. The ideal setup for some of those situations is going to be slightly different than for others. The basics are the same for all of them though.
When working with unit tests, these are our main goals:

Spend as little time as possible writing a new test.
Be notified of failing tests, and see at a glance which ones failed and why.
Trust our tests. Have them be consistent from run to run and robust in the face of bad code.

Testing Framework
Most of us have created one-off programs in the past to test some particularly complicated code. It’s usually a quick command line program that runs through a bunch of cases and asserts after each one that the results were correct. That’s the most bare-bones way of creating unit tests.
Unfortunately, it’s also a pain and it misses on most of the unit-testing goals described in the previous section: creating a new program just for that is a pain, we have to go out of our way to run the tests, and it usually gets out of date faster than the latest Internet meme. That is in part why a lot of programmers have an initial aversion to writing unit tests.
If you’re considering writing even a small unit test, you should use a unit-testing framework. A unit-testing framework removes all the busy work from writing unit tests and lets you spend your time on the logic of what to test. This doesn’t mean that the framework writes the tests for you. Be very wary of any tool that claims to do that! No, a unit-testing framework is simply a small library that provides all the glue for running unit tests and reporting the results. Sorry, you still need to use your brain and do (some of) the typing.
A quick search will reveal plenty of unit-testing frameworks to choose from for your language of choice, and most of them are free and open source so you can rely on them and modify them to suit your needs. For C/C++ and game development, I strongly recommend starting with UnitTest++. Charles Nicholson and I wrote that framework a few years ago specifically with games and consoles in mind. Many game teams have adopted it for their games and tools, and it has been used on lots of different game platforms including current and last generation consoles, Windows, Linux, and Mac PCs, and even on the iPhone. In most situations, it should be a straight drop-in to your project and you’re up and running.
If you end up using a different testing framework, or even if you roll your own, the techniques described here still apply, even if the syntax is slightly different.
Hello Tests
Writing your first test is easy as pie:
#include 

TEST(MyFirstTest)
{
    int a = 4;
    CHECK(a == 4);
}
To run it you need to add the following line to your executable somewhere. We’ll talk more about the physical organization of tests in a moment.
int failedCount = UnitTest::RunAllTests();
Done! Easy, wasn’t it? When you compile and run the program you should see the following output:
Success: 1 test passed.
Test time: 0.00 seconds.
Let’s add a failing test:
TEST(MyFailingTest)
{
    int a = 5;
    CHECK(a == 4);
}
Now we get:
/fullpath/filename.cpp:17: error: Failure in MyFailingTest: a == 4
FAILURE: 1 out of 2 tests failed (1 failures).
Test time: 0.00 seconds.
That’s great, but if we’re going to diagnose the problem, we probably need to know the value of the variable a, and all the test is telling us is that it’s not 4. So instead, we can change the CHECK statement to the following:
TEST(MyFailingTest)
{
    int a = 5;
    CHECK_EQUAL(4, a);
}
And now the output will be
/fullpath/filename.cpp:17: error: Failure in MyFailingTest: Expected 4 but was 5
FAILURE: 1 out of 2 tests failed (1 failures).
Test time: 0.00 seconds.
Much better. Now we get both the error information and the value of the variable under test. Virtually all unit-testing frameworks include different types of CHECK statements to get more information when testing floats, arrays, or other data types. You can even make your own CHECK statement for your own common data types such as colors or lists.
As a bonus, if you’re using an IDE, double-clicking on the test failure message should bring you automatically to the failing test statement.
When To Run
When to run unit tests will depend on what is being tested and how long it takes to do so. In general, the more frequently you run the tests, the better. The sooner you get feedback that something went wrong, the easier it will be to fix. Maybe even before it was checked in and it spread to the rest of the team. On the flip side, realistically, building and running a set of unit tests takes a certain amount of time, so it’s important to find the right balance between feedback frequency and time spent waiting for tests.
At the very least, all tests should run once a day, during the nightly build process in your build server (You have a build server, don’t you? If not, run over and read this column in the August 2008 issue). It doesn’t matter how long they take or how how many different projects you need to run. Just add them to the build script, and hook their output into the build results.
On the other extreme, you can build your tests every time you build the project and execute them as a postbuild step. That way, any time you make a change to a project, all tests will execute and you’ll see if anything went wrong. This is a great approach, but I wouldn’t recommend it if tests add more than a couple of seconds to the incremental build time, otherwise, they’ll be slowing you down more than they help.
For most developers, some approach in the middle will make most sense. For example, take a small, fast subset of tests that are more likely to break, and run those with every build. Whenever any code is checked in to version control, the build server can run those tests plus a few more, slower ones. And finally, at night, you can bring out the big guns and run those really long, really thorough ones that take a few hours to complete. You can separate those tests into different projects, or, if your framework supports them, into different test suites, which allow you to decide which sets of tests to execute at runtime. Reporting Results
If a unit test fails and nobody notices, is it really an error? Just running the tests isn’t good enough. We have to make sure that someone sees the failure it and fixes the problem.
Most unit-testing frameworks will let you customize how you want the failure errors to be reported. By default they will probably be sent to stdout, but you can easily customize the framework to send them to debug log streams, save to a file, or upload them to a server.
Even more important than the actual error messages is detecting whether there were any failures. After running all the tests, there is usually some way to detect how many tests failed. The program that was running the tests can detect any failures, print an error message, and exit with an error code. That error code will propagate to the build server and trigger a build failure. Hopefully by now alarm sounds are going off across the office and someone is on his way to fix it.
Project Organization
When people start down the unit test path, they often struggle to figure out how to physically lay out the unit tests. In the end, it really doesn’t matter too much as long as it makes sense to you, the final build doesn’t contain any tests, and they’re still easy to build and run.
My personal preference is to keep unit tests separate from the rest of the code. Usually I end up creating one file of tests for every cpp file. So FirstPersonCameraController.h and .cpp have a corresponding TestFirstPersonCameraController.cpp. Since I use this convention regularly throughout all of my code, I have a custom IDE macro to toggle between a file and its corresponding test file. I also put all the tests in a separate subdirectory to keep them as physically separate as possible.
I prefer to break up my code into several static libraries for each major subsystem: graphics, networking, physics, animations, etc. Each of those libraries has a set of unit tests, but instead of compiling them into the library, I create a separate project that creates a simple executable program. That project contains all the unit tests and links against the library itself, and in its main entry point it just calls the function to runs all unit tests and returns the number of failures. This keeps the tests separate from the library, but still very easy to build.
If all your code is organized into libraries, and your game is just a collection of libraries linked together, that’s all you need. Most games and tools, however, have a fair amount of code that you might want to test in the project itself. Since the game is an executable, you can’t easily link against it from a different project like we did before. In this case, I build the unit tests into the game itself, and I optionally call them whenever a particular command-line parameter like -runtests is present. Just make sure to #ifdef out all the tests in the final build.
Multiplatform Testing
Running the tests on the PC where you build the code is very straightforward. But unless you’re only creating games and tools for that platform, you will definitely want to run your tests on different platforms as well. Unit tests are an invaluable tool for catching slight platform inconsistencies caused by different compilers, architecture idiosyncrasies, or varying floating point rules.
Unfortunately, running unit tests on a different platform from your build machine is usually a bit more involved and not nearly as fast as doing it locally. You need to start by compiling the tests for the target platform. This is usually not a problem since you’re already building all your code for that platform, and hopefully your unit testing framework already supports it. Then you need to upload your executable with the tests and any data required to the target platform and run it there. Finally, you need to get the return code back to detect if there were any failures. This is surprisingly the trickiest part of the process with a lot of console development kits. If getting the exit code is not a possibility, you’ll need to get creative by parsing the output channel, or even waiting for a notification on a particular network port.
Some target platforms are more limited than others in both resources and C++ support. One of the features that makes UnitTest++ a good choice for games is that it requires minimal C++ features (no STL) and it can be trimmed down even further (no exceptions or streams).
For example, running unit tests on the PS3 SPUs was extremely useful, but it required stripping the framework down to the minimum amount of features. It was also tricky being able to fit the library code plus all the tests in the small amount of memory available. To get around that, we ended up changing the build rules for the SPUs so each test file created its own SPU executable (or module). We then wrote a simple main SPU program that would load each module separately, run its tests, keep track of all the stats, and finally report them.
Running a set of unit tests on the local machine can be an almost instant process, but running them on a remote machine is usually much slower, and can take up to 10 or 20 seconds just with the overhead of copying them and launching the program remotely. For this reason, you’ll want to run tests on other platforms less frequently.
No Leaking Allowed
Finally, if you’re going to have this all this unit testing code running on a regular basis, you might as well get as much information out of it as possible. I have found it invaluable to keep track of memory leaks around the unit test code.
You’ll have to hook into your own memory manager, or use the platform-specific memory tracking functions. The basic idea is to get a memory status before running the tests, and another one after all the tests execute. If there are any extra memory allocations, that’s probably a leak. In that case, you can report it as a failed build by returning the correct error code.
Watch out for static variables or singletons that allocate memory the first time they’re used. They might be reported as memory leaks even though it wasn’t what you were hoping to catch. In that case, you can explicitly initialize and destroy all singletons, or, even better, not use them at all, and keep your memory leak report clean.
You’re now armed with all you need to know to set up unit tests into your project and build pipeline. Grab a testing framework and get started today.
This article was originally printed in the May 2009 issue of Game Developer.



Managing Data Relationships
Sun, 27 Jun 2010 00:00:00 +0000
_I’ve been meaning to put up the rest of the Inner Product columns I wrote for Game Developer Magazine, but I wasn’t finding the time. With all the recent discussion on data-oriented design, I figured it was time to dust some of them off.
This was one of the first columns I wrote. At first glance it might seem a completely introductory topic, not worth spending that much time on it. After all, all experience programmers know about pointers and indices, right? True, but I don’t think all programmers really take the time to think about the advantages and disadvantages of each approach, and how it affects architecture decisions, data organization, and memory traversal.
It should provide a good background for this coming Thursday’s #iDevBlogADay post on how to deal with heterogeneous objects in a data-oriented way._
From a 10,000-Foot view, all video games are just a sequence of bytes. Those bytes can be divided into code and data. Code is executed by the hardware and it performs operations on the data. This code is generated by the compiler and linker from the source code in our favorite computer language. Data is just about everything else. [1]
As programmers, weâ€™re obsessed with code: beautiful algorithms, clean logic, and efficient execution. We spend most of our time thinking about it and make most decisions based on a code-centric view of the game.
Modern hardware architectures have turned things around. A data-centric approach can make much better use of hardware resources, and can produce code that is much simpler to implement, easier to test, and easier to understand. In the next few months, weâ€™ll be looking at different aspects of game data and how everything affects the game. This month we start by looking at how to manage data relationships.
Data Relationships
Data is everything that is not code: meshes and textures, animations and skeletons, game entities and pathfinding networks, sounds and text, cut scene descriptions and dialog trees. Our lives would be made simpler if data simply lived in memory, each bit totally isolated from the rest, but thatâ€™s not the case. In a game, just about all the data is intertwined in some way. A model refers to the meshes it contains, a character needs to know about its skeleton and its animations, and a special effect points to textures and sounds.
How are those relationships between different parts of data described? There are many approaches we can use, each with its own set of advantages and drawbacks. There isnâ€™t a one-size-fits-all solution. Whatâ€™s important is choosing the right tool for the job.
Pointing The Way
In C++, regular pointers (as opposed to â€œsmart pointersâ€ which weâ€™ll discuss later on) are the easiest and most straightforward way to refer to other data. Following a pointer is a very fast operation, and pointers are strongly typed, so itâ€™s always clear what type of data theyâ€™re pointing to.
However, they have their share of shortcomings. The biggest drawback is that a pointer is just the memory address where the data happens to be located. We often have no control over that location, so pointer values usually change from run to run. This means if we attempt to save a game checkpoint which contains a pointer to other parts of the data, the pointer value will be incorrect when we restore it.
Pointers represent a many-to-one relationship. You can only follow a pointer one way, and it is possible to have many pointers pointing to the same piece of data (for example, many models pointing to the same texture). All of this means that it is not easy to relocate a piece of data that is referred to by pointers. Unless we do some extra bookkeeping, we have no way of knowing what pointers are pointing to the data we want to relocate. And if we move or delete that data, all those pointers wonâ€™t just be invalid, theyâ€™ll be dangling pointers. They will point to a place in memory that contains something else, but the program will still think it has the original data in it, causing horrible bugs that are no fun to debug.
One last drawback of pointers is that even though theyâ€™re easy to use, somewhere, somehow, they need to be set. Because the actual memory location addresses change from run to run, they canâ€™t be computed offline as part of the data build. So we need to have some extra step in the runtime to set the pointers after loading the data so the code can use them. This is usually done either by explicit creation and linking of objects at runtime, by using other methods of identifying data, such as resource UIDs created from hashes, or through pointer fixup tables converting data offsets into real memory addresses. All of it adds some work and complexity to using pointers.
Given those characteristics, pointers are a good fit to model relationships to data that is never deleted or relocated, from data that does not need to be serialized. For example, a character loaded from disk can safely contain pointers to its meshes, skeletons, and animations if we know weâ€™re never going to be moving them around.
Indexing
One way to get around the limitation of not being able to save and restore pointer values is to use offsets into a block of data. The problem with plain offsets is that the memory location pointed to by the offset then needs to be cast to the correct data type, which is cumbersome and prone to error.
The more common approach is to use indices into an array of data. Indices, in addition to being safe to save and restore, have the same advantage as pointers in that theyâ€™re very fast, with no extra indirections or possible cache misses.
Unfortunately, they still suffer from the same problem as pointers of being strictly a many-to-one relationship and making it difficult to relocate or delete the data pointed to by the index. Additionally, arrays can only be used to store data of the same type (or different types but of the same size with some extra trickery on our part), which might be too restrictive for some uses.
A good use of indices into an array are particle system descriptions. The game can create instances of particle systems by referring to their description by index into that array. On the other hand, the particle system instances themselves would not be a good candidate to refer to with indices because their lifetimes vary considerably and they will be constantly created and destroyed.
Itâ€™s tempting to try and extend this approach to holding pointers in the array instead of the actual data values. That way, we would be able to deal with different types of data. Unfortunately, storing pointers means that we have to go through an extra indirection to reach our data, which incurs a small performance hit. Although this performance hit is something that we’re going to have to live with for any system that allows us to relocate data, the important thing is to keep the performance hit as small as possible.
An even bigger problem is that, if the data is truly heterogeneous, we still need to cast it to the correct type before we use it. Unless all data referred to by the pointers inherits from a common base class that we can use to query for its derived type, we have no easy way to find out what type the data really is. On the positive side, now that weâ€™ve added an indirection (index to pointer, pointer to data), we could relocate the data, update the pointer in the array, and all the indices would still be valid. We could even delete the data and null the pointer out to indicate it is gone. Unfortunately, what we canâ€™t do is reuse a slot in the array since we donâ€™t know if thereâ€™s any data out there using that particular index still referring to the old data.
Because of these drawbacks, indices into an array of pointers is usually not an effective way to keep references to data. Itâ€™s usually better to stick with indices into an array of data, or extend the idea a bit further into a handle system, which is much safer and more versatile.
Handle-Ing The Problem
Handles are small units of data (32 bits typically) that uniquely identify some other part of data. Unlike pointers, however, handles can be safely serialized and remain valid after theyâ€™re restored. They also have the advantages of being updatable to refer to data that has been relocated or deleted, and can be implemented with minimal performance overhead.
The handle is used as a key into a handle manager, which associates handles with their data. The simplest possible implementation of a handle manager is a list of handle-pointer pairs and every lookup simply traverses the list looking for the handle. This would work but itâ€™s clearly very inefficient. Even sorting the handles and doing a binary search is slow and we can do much better than that.
Here’s an efficient implementation of a handle manager (released under the usual MIT license, so go to town with it). The handle manager is implemented as an array of pointers, and handles are indices into that array. However, to get around the drawbacks of plain indices, handles are enhanced in a couple of ways.
In order to make handles more useful than pointers, weâ€™re going to use up different bits for different purposes. We have a full 32 bits to play with, so this is how weâ€™re going to carve them out:



The index field. These bits will make up the actual index into the handle manager, so going from a handle to the pointer is a very fast operation. We should make this field as large as we need to, depending on how many handles we plan on having active at once. 14 bits give us over 16,000 handles, which seems plenty for most applications. But if you really need more, you can always use up a couple more bits and get up to 65,000 handles.


The counter field. This is the key to making this type of handle implementation work. We want to make sure we can delete handles and reuse their indices when we need to. But if some part of the game is holding on to a handle that gets deletedâ€”and eventually that slot gets reused with a new handleâ€”how can we detect that the old handle is invalid? The counter field is the answer. This field contains a number that goes up every time the index slot is reused. Whenever the handle manager tries to convert a handle into a pointer, it first checks that the counter field matches with the stored entry. Otherwise, it knows the handle is expired and returns null.


The type field. This field indicates what type of data the pointer is pointing to. There are usually not that many different data types in the same handle manager, so 6â€“8 bits are usually enough. If youâ€™re storing homogeneous data, or all your data inherits from a common base class, then you might not need a type field at all.


struct Handle
{
    Handle() : m_index(0), m_counter(0), m_type(0)
    {}

    Handle(uint32 index, uint32 counter, uint32 type)
        : m_index(index), m_counter(counter), m_type(type)
    {}

    inline operator uint32() const;
    
    uint32 m_index : 12;
    uint32 m_counter : 15;
    uint32 m_type : 5;
};

Handle::operator uint32() const
{
    return m_type << 27 | m_counter << 12 | m_index;
}
The workings of the handle manager itself are pretty simple. It contains an array of HandleEntry types, and each HandleEntry has a pointer to the data and a few other bookkeeping fields: freelist indices for efficient addition to the array, the counter field corresponding to each entry, and some flags indicating whether an entry is in use or itâ€™s the end of the freelist.
struct HandleEntry
{
	HandleEntry();
	explicit HandleEntry(uint32 nextFreeIndex);
	
	uint32 m_nextFreeIndex : 12;
	uint32 m_counter : 15;
	uint32 m_active : 1;
	uint32 m_endOfList : 1;
	void* m_entry;
};
Accessing data from a handle is just a matter of getting the index from the handle, verifying that the counters in the handle and the handle manager entry are the same, and accessing the pointer. Just one level of indirection and very fast performance.
We can also easily relocate or invalidate existing handles just by updating the entry in the handle manager to point to a new location or to flag it as removed.
Handles are the perfect reference to data that can change locations or even be removed, from data that needs to be serialized. Game entities are usually very dynamic, and are created and destroyed frequently (such as enemies spawning and being destroyed, or projectiles). So any references to game entities would be a good fit for handles, especially if this reference is held from another game entity and its state needs to be saved and restored. Examples of these types of relationships are the object a player is currently holding, or the target an enemy AI has locked onto.
Getting Smarter
The term smart pointers encompasses many different classes that give pointer-like syntax to reference data, but offer some extra features on top of â€œrawâ€ pointers.
A common type of smart pointer deals with object lifetime. Smart pointers keep track of how many references there are to a particular piece of data, and free it when nobody is using it. For the runtime of games, I prefer to have very explicit object lifetime management, so Iâ€™m not a big fan of this kind of pointers. They can be of great help in development for tools written in C++ though.
Another kind of smart pointers insert an indirection between the data holding the pointer and the data being pointed. This allows data to be relocated, like we could do with handles. However, implementations of these pointers are often non- serializable, so they can be quite limiting.
If you consider using smart pointers from some of the popular libraries (STL, Boost) in your game, you should be very careful about the impact they can have on your build times. Including a single header file from one of those libraries will often pull in numerous other header files. Additionally, smart pointers are often templated, so the compiler will do some extra work generating code for each data type you instantiated templates on. All in all, templated smart pointers can have a significant impact in build times unless they are managed very carefully.
Itâ€™s possible to implement a smart pointer that wraps handles, provides a syntax like a regular pointer, and it still consists of a handle underneath, which can be serialized without any problem. But is the extra complexity of that layer worth the syntax benefits it provides? It will depend on your team and what youâ€™re used to, but itâ€™s always an option if the team is more comfortable dealing with pointers instead of handles.
Conclusion
There are many different approaches to expressing data relationships. Itâ€™s important to remember that different data types are better suited to some approaches than others. Pick the right method for your data and make sure itâ€™s clear which one youâ€™re using.
In the next few months, weâ€™ll continue talking about data, and maybe even convince you that putting some love into your data can pay off big time with your code and the game as a whole.
This article was originally printed in the September 2008 issue of Game Developer.
[1] I’m not too happy about the strong distinction I was making between code and data. Really, data is any byte in memory, and that includes code. Most of the time programs are going to be managing references to non-code data, but sometimes to other code as well: function pointers, compiled shaders, compiled scripts, etc. So just ignore that distinction and think of data in a more generic way.



Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)
Fri, 04 Dec 2009 00:00:00 +0000
Picture this: Toward the end of the development cycle, your game crawls, but you don’t see any obvious hotspots in the profiler. The culprit? Random memory access patterns and constant cache misses. In an attempt to improve performance, you try to parallelize parts of the code, but it takes heroic efforts, and, in the end, you barely get much of a speed-up due to all the synchronization you had to add. To top it off, the code is so complex that fixing bugs creates more problems, and the thought of adding new features is discarded right away. Sound familiar?
That scenario pretty accurately describes almost every game I’ve been involved with for the last 10 years. The reasons aren’t the programming languages we’re using, nor the development tools, nor even a lack of discipline. In my experience, it’s object- oriented programming (OOP) and the culture that surrounds it that is in large part to blame for those problems. OOP could be hindering your project rather than helping it!
It’s All About Data
OOP is so ingrained in the current game development culture that it’s hard to think beyond objects when thinking about a game. After all, we’ve been creating classes representing vehicles, players, and state machines for many years. What are the alternatives? Procedural programming? Functional languages? Exotic programming languages?
Data-oriented design is a different way to approach program design that addresses all these problems. Procedural programming focuses on procedure calls as its main element, and OOP deals primarily with objects. Notice that the main focus of both approaches is code: plain procedures (or functions) in one case, and grouped code associated with some internal state in the other. Data-oriented design shifts the perspective of programming from objects to the data itself: The type of the data, how it is laid out in memory, and how it will be read and processed in the game.
Programming, by definition, is about transforming data: It’s the act of creating a sequence of machine instructions describing how to process the input data and create some specific output data. A game is nothing more than a program that works at interactive rates, so wouldn’t it make sense for us to concentrate primarily on that data instead of on the code that manipulates it?
I’d like to clear up potential confusion and stress that data-oriented design does not imply that something is data- driven. A data-driven game is usually a game that exposes a large amount of functionality outside of code and lets the data determine the behavior of the game. That is an orthogonal concept to data-oriented design, and can be used with any type of programming approach.
Ideal Data
If we look at a program from the data point of view, what does the ideal data look like? It depends on the data and how it’s used. In general, the ideal data is in a format that we can use with the least amount of effort. In the best case, the format will be the same we expect as an output, so the processing is limited to just copying that data. Very often, our ideal data layout will be large blocks of contiguous, homogeneous data that we can process sequentially. In any case, the goal is to minimize the amount of transformations, and whenever possible, you should bake your data into this ideal format offline, during your asset-building process.
Because data-oriented design puts data first and foremost, we can architect our whole program around the ideal data format. We won’t always be able to make it exactly ideal (the same way that code is hardly ever by-the-book OOP), but it’s the primary goal to keep in mind. Once we achieve that, most of the problems I mentioned at the beginning of the column tend to melt away (more about that in the next section).
When we think about objects, we immediately think of treesâ€” inheritance trees, containment trees, or message-passing trees, and our data is naturally arranged that way. As a result, when we perform an operation on an object, it will usually result in that object in turn accessing other objects further down in the tree. Iterating over a set of objects performing the same operation generates cascading, totally different operations at each object (see Figure 1a).
To achieve the best possible data layout, it’s helpful to break down each object into the different components, and group components of the same type together in memory, regardless of what object they came from. This organization results in large blocks of homogeneous data, which allow us to process the data sequentially (see Figure 1b). A key reason why data-oriented design is so powerful is because it works very well on large groups of objects. OOP, by definition, works on a single object. Step back for a minute and think of the last game you worked on: How many places in the code did you have only one of something? One enemy? One vehicle? One pathfinding node? One bullet? One particle? Never! Where there’s one, there are many. OOP ignores that and deals with each object in isolation. Instead, we can make things easy for us and for the hardware and organize our data to deal with the common case of having many items of the same type.
Does this sound like a strange approach? Guess what? You’re probably already doing this in some parts of your code: The particle system! Data-oriented design is turning our whole codebase into a gigantic particle system. Perhaps a name for this approach that would be more familiar to game programmers would have been particle-driven programming.
Advantages of Data-Oriented Design
hinking about data first and architecting the program based on that brings along lots of advantages.
Parallelization.
These days, there’s no way around the fact that we need to deal with multiple cores. Anyone who has tried taking some OOP code and parallelizing it can attest how difficult, error prone, and possibly not very efficient that is. Often you end up adding lots of synchronization primitives to prevent concurrent access to data from multiple threads, and usually a lot of the threads end up idling for quite a while waiting for other threads to complete. As a result, the performance improvement can be quite underwhelming.
When we apply data-oriented design, parallelization becomes a lot simpler: We have the input data, a small function to process it, and some output data. We can easily take something like that and split it among multiple threads with minimal synchronization between them. We can even take it further and run that code on processors with local memory (like the SPUs on the Cell processor) without having to do anything differently.
Cache utilization.
In addition to using multiple cores, one of the keys to achieving great performance in modern hardware, with its deep instruction pipelines and slow memory systems with multiple levels of caches, is having cache-friendly memory access. Data-oriented design results in very efficient use of the instruction cache because the same code is executed over and over. Also, if we lay out the data in large, contiguous blocks, we can process the data sequentially, getting nearly perfect data cache usage and great performance. Possible optimizations. When we think of objects or functions, we tend to get stuck optimizing at the function or even the algorithm level; Reordering some function calls, changing the sort method, or even re-writing some C code with assembly.
That kind of optimization is certainly beneficial, but by thinking about the data first we can step further back and make larger, more important optimizations. Remember that all a game does is transform some data (assets, inputs, state) into some other data (graphics commands, new game states). By keeping in mind that flow of data, we can make higher-level, more intelligent decisions based on how the data is transformed, and how it is used. That kind of optimization can be extremely difficult and time- consuming to implement with more traditional OOP methods.
Modularity.
So far, all the advantages of data-oriented design have been based around performance: cache utilization, optimizations, and parallelization. There is no doubt that as game programmers, performance is an extremely important goal for us. There is often a conflict between techniques that improve performance and techniques that help readability and ease of development. For example, re-writing some code in assembly language can result in a performance boost, but usually makes the code harder to read and maintain.
Fortunately, data-oriented design is beneficial to both performance and ease of development. When you write code specifically to transform data, you end up with small functions, with very few dependencies on other parts of the code. The codebase ends up being very â€œflat,â€ with lots of leaf functions without many dependencies. This level of modularity and lack of dependences makes understanding, replacing, and updating the code much easier.
Testing.
The last major advantage of data-oriented design is ease of testing. As we saw in the June and August Inner Product columns, writing unit tests to check object interactions is not trivial. You need to set up mocks and test things indirectly. Frankly, it’s a bit of a pain. On the other hand, when dealing directly with data, it couldn’t be easier to write unit tests: Create some input data, call the transform function, and check that the output data is what we expect. There’s nothing else to it. This is actually a huge advantage and makes code extremely easy to test, whether you’re doing test-driven development or just writing unit tests after the code.
Drawbacks of Data-Oriented Design
Data-oriented design is not the silver bullet to all the problems in game development. It does help tremendously writing high-performance code and making programs more readable and easier to maintain, but it does come with a few drawbacks of its own.
The main problem with data-oriented design is that it’s different from what most programmers are used to or learned in school. It requires turning our mental model of the program ninety degrees and changing how we think about it. It takes some practice before it becomes second-nature.
Also, because it’s a different approach, it can be challenging to interface with existing code, written in a more OOP or procedural way. It’s hard to write a single function in isolation, but as long as you can apply data-oriented design to a whole subsystem you should be able to reap a lot of the benefits.
Applying Data-Oriented Design
Enough of the theory and overview. How do you actually get started with data-oriented design? To start with, just pick a specific area in your code: navigation, animations, collisions, or something else. Later on, when most of your game engine is centered around the data, you can worry about data flow all the way from the start of a frame until the end.
The next step is to clearly identify the data inputs required by the system, and what kind of data it needs to generate. It’s OK to think about it in OOP terms for now, just to help us identify the data. For example, in an animation system, some of the input data is skeletons, base poses, animation data, and current state. The result is not â€œthe code plays animations,â€ but the data generated by the animations that are currently playing. In this case, our outputs would be a new set of poses and an updated state.
It’s important to take a step further and classify the input data based on how it is used. Is it read- only, read-write, or write-only? That classification will help guide design decisions about where to store it, and when to process it depending on dependencies with other parts of the program.
At this point, stop thinking of the data required for a single operation, and think in terms of applying it to dozens or hundreds of entries. We no longer have one skeleton, one base pose, and a current state, and instead we have a block of each of those types with many instances in each of the blocks.
Think very carefully how the data is used during the transformation process from input to output. You might realize that you need to scan a particular field in a structure to perform a pass on the data, and then you need to use the results to do another pass. In that case, it might make more sense to split that initial field into a separate block of memory that can be processed independently, allowing for better cache utilization and potential parallelization. Or maybe you need to vectorize some part of the code, which requires fetching data from different locations to put it in the same vector register. In that case, that data can be stored contiguously so vector operations can be applied directly, without any extra transformations.
Now you should have a very good understanding of your data. Writing the code to transform it is going to be much simpler. It’s like writing code by filling in the blanks. You’ll even be pleasantly surprised to realize that the code is much simpler and smaller than you thought in the first place, compared to what the equivalent OOP code would have been.
If you think back about most of the topics we’ve covered in this column over the last year, you’ll see that they were all leading toward this type of design. Now it’s the time to be careful about how the data is aligned (Dec 2008 and Jan 2009), to bake data directly into an input format that you can use efficiently (Oct and Nov 2008), or to use non- pointer references between data blocks so they can be easily relocated (Sept 2009).
Is Thre Room For OOP?
Does this mean that OOP is useless and you should never apply it in your programs? I’m not quite ready to say that. Thinking in terms of objects is not detrimental when there is only one of each object (a graphics device, a log manager, etc) although in that case you might as well write it with simpler C-style functions and file-level static data. Even in that situation, it’s still important that those objects are designed around transforming data.
Another situation where I still find myself using OOP is GUI systems. Maybe it’s because you’re working with a system that is already designed in an object-oriented way, or maybe it’s because performance and complexity are not crucial factors with GUI code. In any case, I much prefer GUI APIs that are light on inheritance and use containment as much as possible (Cocoa and CocoaTouch are good examples of this). It’s very possible that a data-oriented GUI system could be written for games that would be a pleasure to work with, but I haven’t seen one yet.
Finally, there’s nothing stopping you from still having a mental picture of objects if that’s the way you like to think about the game. It’s just that the enemy entity won’t be all in the same physical location in memory. Instead, it will be split up into smaller subcomponents, each one forming part of a larger data table of similar components.
Data-oriented design is a bit of a departure from traditional programming approaches, but by always thinking about the data and how it needs to be transformed, you’ll be able to reap huge benefits both in terms of performance and ease of development.
Thanks to Mike Acton and Jim Tilander for challenging my ideas over the years and for their feedback on this article.
Picture this: Toward the end of the development cycle, your game crawls, but you don’t see any obvious hotspots in the profiler. The culprit? Random memory access patterns and constant cache misses. In an attempt to improve performance, you try to parallelize parts of the code, but it takes heroic efforts, and, in the end, you barely get much of a speed-up due to all the synchronization you had to add. To top it off, the code is so complex that fixing bugs creates more problems, and the thought of adding new features is discarded right away. Sound familiar?
That scenario pretty accurately describes almost every game I’ve been involved with for the last 10 years. The reasons aren’t the programming languages we’re using, nor the development tools, nor even a lack of discipline. In my experience, it’s object- oriented programming (OOP) and the culture that surrounds it that is in large part to blame for those problems. OOP could be hindering your project rather than helping it!
It’s All About Data
OOP is so ingrained in the current game development culture that it’s hard to think beyond objects when thinking about a game. After all, we’ve been creating classes representing vehicles, players, and state machines for many years. What are the alternatives? Procedural programming? Functional languages? Exotic programming languages?
Data-oriented design is a different way to approach program design that addresses all these problems. Procedural programming focuses on procedure calls as its main element, and OOP deals primarily with objects. Notice that the main focus of both approaches is code: plain procedures (or functions) in one case, and grouped code associated with some internal state in the other. Data-oriented design shifts the perspective of programming from objects to the data itself: The type of the data, how it is laid out in memory, and how it will be read and processed in the game.
Programming, by definition, is about transforming data: It’s the act of creating a sequence of machine instructions describing how to process the input data and create some specific output data. A game is nothing more than a program that works at interactive rates, so wouldn’t it make sense for us to concentrate primarily on that data instead of on the code that manipulates it?
I’d like to clear up potential confusion and stress that data-oriented design does not imply that something is data- driven. A data-driven game is usually a game that exposes a large amount of functionality outside of code and lets the data determine the behavior of the game. That is an orthogonal concept to data-oriented design, and can be used with any type of programming approach.
Ideal Data

Figure 1a. Call sequence with an object-oriented approach
If we look at a program from the data point of view, what does the ideal data look like? It depends on the data and how it’s used. In general, the ideal data is in a format that we can use with the least amount of effort. In the best case, the format will be the same we expect as an output, so the processing is limited to just copying that data. Very often, our ideal data layout will be large blocks of contiguous, homogeneous data that we can process sequentially. In any case, the goal is to minimize the amount of transformations, and whenever possible, you should bake your data into this ideal format offline, during your asset-building process.
Because data-oriented design puts data first and foremost, we can architect our whole program around the ideal data format. We won’t always be able to make it exactly ideal (the same way that code is hardly ever by-the-book OOP), but it’s the primary goal to keep in mind. Once we achieve that, most of the problems I mentioned at the beginning of the column tend to melt away (more about that in the next section).
When we think about objects, we immediately think of treesâ€” inheritance trees, containment trees, or message-passing trees, and our data is naturally arranged that way. As a result, when we perform an operation on an object, it will usually result in that object in turn accessing other objects further down in the tree. Iterating over a set of objects performing the same operation generates cascading, totally different operations at each object (see Figure 1a).

Figure 1b. Call sequence with a data-oriented approach
To achieve the best possible data layout, it’s helpful to break down each object into the different components, and group components of the same type together in memory, regardless of what object they came from. This organization results in large blocks of homogeneous data, which allow us to process the data sequentially (see Figure 1b). A key reason why data-oriented design is so powerful is because it works very well on large groups of objects. OOP, by definition, works on a single object. Step back for a minute and think of the last game you worked on: How many places in the code did you have only one of something? One enemy? One vehicle? One pathfinding node? One bullet? One particle? Never! Where there’s one, there are many. OOP ignores that and deals with each object in isolation. Instead, we can make things easy for us and for the hardware and organize our data to deal with the common case of having many items of the same type.
Does this sound like a strange approach? Guess what? You’re probably already doing this in some parts of your code: The particle system! Data-oriented design is turning our whole codebase into a gigantic particle system. Perhaps a name for this approach that would be more familiar to game programmers would have been particle-driven programming.
Advantages of Data-Oriented Design
Thinking about data first and architecting the program based on that brings along lots of advantages.
Parallelization.
These days, there’s no way around the fact that we need to deal with multiple cores. Anyone who has tried taking some OOP code and parallelizing it can attest how difficult, error prone, and possibly not very efficient that is. Often you end up adding lots of synchronization primitives to prevent concurrent access to data from multiple threads, and usually a lot of the threads end up idling for quite a while waiting for other threads to complete. As a result, the performance improvement can be quite underwhelming.
When we apply data-oriented design, parallelization becomes a lot simpler: We have the input data, a small function to process it, and some output data. We can easily take something like that and split it among multiple threads with minimal synchronization between them. We can even take it further and run that code on processors with local memory (like the SPUs on the Cell processor) without having to do anything differently.
Cache utilization.
In addition to using multiple cores, one of the keys to achieving great performance in modern hardware, with its deep instruction pipelines and slow memory systems with multiple levels of caches, is having cache-friendly memory access. Data-oriented design results in very efficient use of the instruction cache because the same code is executed over and over. Also, if we lay out the data in large, contiguous blocks, we can process the data sequentially, getting nearly perfect data cache usage and great performance. Possible optimizations. When we think of objects or functions, we tend to get stuck optimizing at the function or even the algorithm level; Reordering some function calls, changing the sort method, or even re-writing some C code with assembly.
That kind of optimization is certainly beneficial, but by thinking about the data first we can step further back and make larger, more important optimizations. Remember that all a game does is transform some data (assets, inputs, state) into some other data (graphics commands, new game states). By keeping in mind that flow of data, we can make higher-level, more intelligent decisions based on how the data is transformed, and how it is used. That kind of optimization can be extremely difficult and time- consuming to implement with more traditional OOP methods.
Modularity.
So far, all the advantages of data-oriented design have been based around performance: cache utilization, optimizations, and parallelization. There is no doubt that as game programmers, performance is an extremely important goal for us. There is often a conflict between techniques that improve performance and techniques that help readability and ease of development. For example, re-writing some code in assembly language can result in a performance boost, but usually makes the code harder to read and maintain.
Fortunately, data-oriented design is beneficial to both performance and ease of development. When you write code specifically to transform data, you end up with small functions, with very few dependencies on other parts of the code. The codebase ends up being very â€œflat,â€ with lots of leaf functions without many dependencies. This level of modularity and lack of dependences makes understanding, replacing, and updating the code much easier.
Testing.
The last major advantage of data-oriented design is ease of testing. As we saw in the June and August Inner Product columns, writing unit tests to check object interactions is not trivial. You need to set up mocks and test things indirectly. Frankly, it’s a bit of a pain. On the other hand, when dealing directly with data, it couldn’t be easier to write unit tests: Create some input data, call the transform function, and check that the output data is what we expect. There’s nothing else to it. This is actually a huge advantage and makes code extremely easy to test, whether you’re doing test-driven development or just writing unit tests after the code.
Drawbacks of Data-Oriented Design
Data-oriented design is not the silver bullet to all the problems in game development. It does help tremendously writing high-performance code and making programs more readable and easier to maintain, but it does come with a few drawbacks of its own.
The main problem with data-oriented design is that it’s different from what most programmers are used to or learned in school. It requires turning our mental model of the program ninety degrees and changing how we think about it. It takes some practice before it becomes second-nature.
Also, because it’s a different approach, it can be challenging to interface with existing code, written in a more OOP or procedural way. It’s hard to write a single function in isolation, but as long as you can apply data-oriented design to a whole subsystem you should be able to reap a lot of the benefits.
Applying Data-Oriented Design
Enough of the theory and overview. How do you actually get started with data-oriented design? To start with, just pick a specific area in your code: navigation, animations, collisions, or something else. Later on, when most of your game engine is centered around the data, you can worry about data flow all the way from the start of a frame until the end.
The next step is to clearly identify the data inputs required by the system, and what kind of data it needs to generate. It’s OK to think about it in OOP terms for now, just to help us identify the data. For example, in an animation system, some of the input data is skeletons, base poses, animation data, and current state. The result is not â€œthe code plays animations,â€ but the data generated by the animations that are currently playing. In this case, our outputs would be a new set of poses and an updated state.
It’s important to take a step further and classify the input data based on how it is used. Is it read- only, read-write, or write-only? That classification will help guide design decisions about where to store it, and when to process it depending on dependencies with other parts of the program.
At this point, stop thinking of the data required for a single operation, and think in terms of applying it to dozens or hundreds of entries. We no longer have one skeleton, one base pose, and a current state, and instead we have a block of each of those types with many instances in each of the blocks.
Think very carefully how the data is used during the transformation process from input to output. You might realize that you need to scan a particular field in a structure to perform a pass on the data, and then you need to use the results to do another pass. In that case, it might make more sense to split that initial field into a separate block of memory that can be processed independently, allowing for better cache utilization and potential parallelization. Or maybe you need to vectorize some part of the code, which requires fetching data from different locations to put it in the same vector register. In that case, that data can be stored contiguously so vector operations can be applied directly, without any extra transformations.
Now you should have a very good understanding of your data. Writing the code to transform it is going to be much simpler. It’s like writing code by filling in the blanks. You’ll even be pleasantly surprised to realize that the code is much simpler and smaller than you thought in the first place, compared to what the equivalent OOP code would have been.
If you think back about most of the topics we’ve covered in this column over the last year, you’ll see that they were all leading toward this type of design. Now it’s the time to be careful about how the data is aligned (Dec 2008 and Jan 2009), to bake data directly into an input format that you can use efficiently (Oct and Nov 2008), or to use non- pointer references between data blocks so they can be easily relocated (Sept 2009).
Is There Room For OOP?
Does this mean that OOP is useless and you should never apply it in your programs? I’m not quite ready to say that. Thinking in terms of objects is not detrimental when there is only one of each object (a graphics device, a log manager, etc) although in that case you might as well write it with simpler C-style functions and file-level static data. Even in that situation, it’s still important that those objects are designed around transforming data.
Another situation where I still find myself using OOP is GUI systems. Maybe it’s because you’re working with a system that is already designed in an object-oriented way, or maybe it’s because performance and complexity are not crucial factors with GUI code. In any case, I much prefer GUI APIs that are light on inheritance and use containment as much as possible (Cocoa and CocoaTouch are good examples of this). It’s very possible that a data-oriented GUI system could be written for games that would be a pleasure to work with, but I haven’t seen one yet.
Finally, there’s nothing stopping you from still having a mental picture of objects if that’s the way you like to think about the game. It’s just that the enemy entity won’t be all in the same physical location in memory. Instead, it will be split up into smaller subcomponents, each one forming part of a larger data table of similar components.
Data-oriented design is a bit of a departure from traditional programming approaches, but by always thinking about the data and how it needs to be transformed, you’ll be able to reap huge benefits both in terms of performance and ease of development.
Thanks to Mike Acton and Jim Tilander for challenging my ideas over the years and for their feedback on this article.
This article was originally printed in the September 2009 issue of Game Developer.
Here’s a Korean translation of this article byÂ Hakkyu Kim.



Build Server: The Heartbeat of The Project
Fri, 12 Dec 2008 00:00:00 +0000
Have you ever given some thought to why you decided to become a game programmer? Iâ€™m pretty sure it wasnâ€™t to do mundane, repetitive tasks. Yet sometimes we find ourselves spending a significant portion of our time making sure that the code compiles for all platforms, or that there are no potential bugs lurking in the depths of the game, or even building the assets for each level and running them to make sure they load correctly.
Clearly, those are all things that need to be done, but if they are so repetitive and mindless, couldnâ€™t we put some of the computers around us to good use and have them do the job for us?
A build server will do all that and more, much faster and more reliably than we could, and it will free us to work on the thing that made us fall in love with this industry in the first place: the game.
Getting Off The Ground
Before we can start thinking about setting up a build server, we need to be able to build the game with a single command from the command line. No clicking around, no GUI apps, no multiple steps, no magical incantations that only work during a full moon. Just type a command and the build for the game and all its libraries starts.
This is not just a necessary step to set up a build server; itâ€™s a very good engineering practice. So if youâ€™re not there, spend some time on it right away and youâ€™ll be glad you did when candidate submission time comes around.
Building the game with a single command should be fairly easy, but the specifics will depend on your environment and build system. If youâ€™re using Visual Studio, you can put the game and all the libraries in a single solution with the correct dependencies. Then you can invoke the devenv.com command line program specifying the solution and configuration you want to build: devenv.com mygame.sln /build Debug. You can wrap that up in a single batch file buildgame.bat for extra convenience and youâ€™re done.
If youâ€™re using another build system, such as make or jam, you can probably already build it with a single command. If youâ€™re using a bunch of mode-made scripts, at least wrap them all up in a single script file so they can be run with a single command.
Just building the game with a single command isnâ€™t enough. We must have a way to automatically detect whether the build succeeded or failed. Fortunately, most build systems (including devenv. com and make) return an error code when the build fails. If youâ€™re rolling your own build script file, make sure to capture build failure and return an error code as well.
Setting Up The Build Server
A build server should be a dedicated machine with access to version control. Whenever a new build is needed, the server syncs to the latest version in version control, starts a build, and notifies the team in case of any errors.
Thatâ€™s a fine start, but we could make things much better. For example, we could include the error message in the notification email so programmers can see right there what the problem was instead of being forced to sync and build the game themselves. We could also trigger a build in different circumstances (for example, code checked-in, forced by a person, or some other event) instead of only at fixed intervals. We might want to distribute the build across multiple machines, or keep logs and make them available on a web page, or format emails better, or use more direct notification methods …
Put away those Python reference manuals because fortunately, someone has already done all the work for us: CruiseControl (and CruiseControl.Net). Itâ€™s a free, open source build server program with all the bells and whistles that you could possibly want. And did I mention itâ€™s free?
There are three main parts to it:

The build server. It runs as an application or a Windows service. Itâ€™s configured through a very simple XML file that tells it when to sync, where to sync, what to build, and how to report it.
The web front end. CruiseControl features a pretty, web-based dashboard showing all the builds, their status, past logs, and other pertinent data.
The system tray notificator. This is a little app that runs in the system tray and shows the status of all the builds and notifies you of any changes right away with a message and by playing some sounds. This is my favorite way to keep up to date with the build status. Youâ€™ll be up and running in about 10 minutes. The most complicated part is probably installing a web server (if you donâ€™t already have one) and getting the web dashboard running. Youâ€™ll spend a few more hours tinkering with it to get it â€œjust right,â€ and then youâ€™re done. The only time I have to mess with it is to upgrade to a new version every so often. Other than that, itâ€™s virtually maintenance free.

At this point youâ€™ll have a fully featured build server in place. It verifies that the game can be built from the latest checked-in version of the code. It notifies developers of failed and successful builds right away. It increases version numbers, keeps a build history and statistics, archives executables, and emails logs.
CruiseControl and CruiseControl.Net are the two build servers I have most experience with. There are other build servers out there, with slightly different features, integrations with different environments, and so forth. Some of them are commercial and come with full support in case youâ€™re more comfortable with that model.
Itâ€™s important to stress that a build server is not intended to be the only machine that builds the game. Every programmer (and maybe every member of the team) should be able to build the game in his or her own machine from scratch. The build server is there to verify that all the checked-in changes build correctly on a clean machine, and to make sure that all platforms and configurations are building successfully. Any official builds should be created exclusively from the build server, though. Especially any builds distributed externally to publishers or manufacturers. This ensures that the build is clean, was created in a repeatable manner, and is free of any idiosyncrasies from a particular machine.
How Often?
Once the build server is in place and is producing successful builds reliably, the question arises of how often to make builds of the game.
It used to be considered good practice to do a weekly build. The team would start ramping things up on Thursday to try and get a build out the door by the end of the day on Friday. Anybody who has done that knows how stressful it can be and how it can easily become a bottleneck.
Why wait a week if you can do one every night? More teams started switching to the daily build, which is much less stressful because there are fewer changes in each new build. It also gives the team a chance to fix anything that was found broken in the previous day’s build. Soon, teams took it beyond the daily build and started making two builds a day, or even one every hour.
The build server has been very appropriately described as the heartbeat of the project. A â€œgreen buildâ€ is one heartbeat and one small step forward. A â€œred buildâ€ is done when something is wrong and needs to get fixed as soon as possible. If you have a red build several days in a row, the project is in serious trouble. The more often you make a successful build, the better. Youâ€™ll find fewer surprises and stay more on course that way.
My favorite approach is continuous integration. With continuous integration, the build server starts a new build as soon as thereâ€™s a new check-in. If multiple check-ins come in while the build is in progress, another build starts right after itâ€™s done, with all the new changes queued during that time. When following this practice, programmers sync to the latest version often, make small changes, and check-in code frequently, rather than batching many changes. Very conveniently, Cruise Control has a setting to start builds whenever anything changes.
The main benefit of continuous integration is that you are notified as soon as a check-in breaks the buildâ€”not a day later, or even an hour later, but minutes later. It tells you, â€œThe last build was good. This one is not.â€ You can look through the last couple of check-ins that happened during that short time period and quickly narrow down the problem and fix it. Imagine trying to narrow down an elusive crash bug from all the check- ins for a full day or two!
Another benefit is that all programmers are working on a version very close to the latest one. This means that there are fewer source code conflicts when checking-in code, and fewer surprises lurking in the code. The flip side of that is that working on the latest version is living in the proverbial bleeding edge. Itâ€™s not unusual for someone to check-in code that has some accidental bad side effects. As long as those bad check-ins are limited, and that whenever they happen they are fixed right away, I have found the benefits to outweigh some instability in the main branch. Some of the ways to minimize disruptions when working with continuous integration are:

Make sure that any code compiles before checking it in (that should go without saying!)
Execute a fast set of unit tests to verify that basic functionality is working correctly, and
Have the build server notify everybody as soon as thereâ€™s a broken build so it can be fixed and so that nobody else syncs or checks-in any code while the build is broken.

Need For Speed
Ideally, Iâ€™d like to check in some code and see whether the build server found any problems right away. In the real world, things can be much slower. After all, the build server needs to sync to the latest code, kick off builds for multiple platforms and multiple configurations, and perform some other time-consuming steps. Even so, there is work we can do to get feedback as soon as possible. Perform incremental builds during the day, so only the affected sections of the code need to be built. Itâ€™s still a good idea to do a full build at least every night to make sure that everything can be built from scratch.
Set up each platform and configuration as separate builds. That way you get feedback as soon as one of them completes. The only downside is if an error makes it through that causes all the builds to fail, get ready for lots and lots of broken build sounds playing all over the company. Speed up build times through good physical dependencies, modularity, precompiled headers, and good use of forward declarations.
Split up different builds and configurations in different machines. The easiest way is to set up one machine per platform and configuration (or maybe do a couple of configurations per machine). Cruise Control lets you easily integrate several build servers into the same web dashboard and system tray application, so this is a very easy solution.
Don’t Skimp on Hardware
Get the beefiest computers you can afford. Throw fast CPUs, disk access, and gigabit ethernet. Get multiprocessor cores and make sure your build system takes advantage of them. Does it sound like a lot of money? Not when you take into account how few servers youâ€™ll have and how much time youâ€™ll save all the members of the team.
I have tried several distributed build systems, and even though they can sometimes be beneficial for some codebases, Iâ€™m still not a huge fan. I find that you can often achieve the same (or better) results by using multiple processors and good build architectures, and you avoid the complexity and overhead of a distributed build system.
One â€œgotchaâ€ we ran into when we scaled our build farm beyond about 15 build servers was that each of them was hitting our version control repository every few seconds to see if anything had changed. That wasnâ€™t a trivial operation, and so many servers doing it so frequently definitely slowed things down to a crawl.
To remedy that, instead of having the build servers poll the overtaxed source control server, we had the source control server push out a notification. Whenever there was a check-in, the source control server changed a timestamp in a file located on an internal web server. We changed the build servers to constantly monitor the internal web server for changes in that file, and whenever it changed it triggered a build, which completely eliminated the overhead on the version control server.
Beyond The Build
So far, weâ€™ve only been talking about building the game. But the build server is a great tool that we can put to good use for many other purposes. Why restrict ourselves to just the game? All the in-house tools would also benefit from getting the same treatment. We can even take it a step further and deploy the freshly-built copies of all the tools on a network drive or web page so theyâ€™re available to the whole team.
The build server can also double up as a symbol server. That makes it much more convenient for programmers to debug an earlier version of the game and libraries and have all the debugging information available without having to rebuild everything locally.
Thereâ€™s no reason to limit the build server to just building source code. One of the most useful things you can do with it is use it to build game assets as well. Building assets is usually a slow process. Having a fast asset build system that can correctly perform incremental builds is crucial to keep asset build times down.
Build servers are general enough to perform just about any task. Running both unit tests (small tests on each class or function) and functional tests (tests that exercise a larger module or even the whole game) are perfect uses for the build server. Functional tests can be pretty slow, so make sure that theyâ€™re treated as a separate build and not as the last step in building the game. Nobody wants to wait for hours for all the functional tests to complete before they can see the successful build status after a check-in. The sky is the limit with what the build server can do. We use it to run static analysis of our source code, checking for spots in the code that can lead to subtle and dangerous bugs (uninitialized variables, implicit type conversions, and the like).
Another great use is to run through the different levels of the game, recording frame rate at different points of each level, logging the results, and failing the build if it ever drops below a certain threshold. Having the performance history for specific levels can be really useful to narrow down why a particular section is chugging at 20fps but was running at a solid 60 a couple of weeks ago. For bonus points, integrate all the collected data into easy-to-visualize graphs available through the web front end.
The build server is definitely the heartbeat of a project. Keep those check-ins coming and those builds green, and you know youâ€™re heading in the right direction.
This article was originally printed in the August 2008 issue of Game Developer.


Back to The Future (Part 2)
Mon, 15 Sep 2008 00:00:00 +0000
I really enjoy a good cup of tea. On the surface, making tea is really easy: take some tea leaves, pour some hot water over them, and wait a few minutes. In practice, the difference between a bitter, undrinkable brew, and a perfect cup of tea is all in the details; the type and amount of tea, the temperature of the water, and the steeping, time all make a huge difference. A playback system for a game is very much the same. As we saw last month, the basic idea is really simple: Record all game inputs, make the game deterministic, and you get the same playback every time. Unfortunately things aren’t quite that simple in real life. Just as with tea making, the secret to a perfect playback system is all in the details.
Let’s look at some of the common problems that can trip up the playback system and what we can do about them. We’ll warm up by starting with the easiest ones and work our way to the hardest ones.
Game Versions
A playback of a game recorded with a different version is almost guaranteed to end up with different results than the original version. Any minor tweaks to AI parameters, pathfinding algorithms, or just about anything is most likely going to generate a different game state. It doesn’t even have to be a change in code either: adding a new character or moving a trigger volume will also cause different playbacks.
In general, you’re best off avoiding using recorded inputs from previous versions of the game. Inserting the game build version at the beginning of the recorded game state file allows you to detect version mismatches during playback. If a mismatch is detected, you can either print a warning or give up on the state verification altogether. This check will be particularly useful if you’re sharing recorded sessions across the office (perhaps through a bug database) and people are on slightly different versions.
The idea of recording a session in release mode (all optimizations enabled, few debugging checks in place), and playing it back in debug mode is very tempting. Unfortunately, mixing and matching configurations is almost guaranteed not to work. Middleware libraries will often have slightly different behaviors in debug and release, which will throw off playback right away. But most likely so does your own code, and often in ways you might not expect. For example, you might have different floating point rounding modes in debug and release. Or even a more subtle difference: depending on the optimizations you have turned on, in debug mode a float variable might be copied back to memory after each operation (which rounds it back to a float), whereas in release the program might execute several operations in a row on that value before writing it back to main memory and causing the rounding only once. The results are going to be very close, but they’ll often be cumulative and the simulation will quickly diverge.
As a general rule, treat debug and release builds as if they were different versions. Make sure to include which configuration it is along with the build number in the game state file so you can check for that as well.
Memory State
Reading data from uninitialized memory is probably the number one cause of non-determinism in games. Uninitialized memory will contain whatever data was stored there before. That can mean anything from sane-looking values to total garbage. What’s worse, they’re going to change from run to run, so if your game ever uses the values contained in uninitialized memory, playback is not going to be deterministic.
The most common case of using uninitialized memory happens by reading a variable that was declared and not initialized. This mistake is easy to see when we’re dealing with a standalone variable, but when it becomes a member variable hidden inside some class, it becomes a lot less obvious. It’s also important to notice that this will only happen when we declare variables on the stack or we create them dynamically on the heap. For global and static variables, the C runtime takes care of initializing them all to zero for us.
Cranking up the compiler warning level to the maximum will allow it to catch some of the most obvious cases of reading uninitialized variables. For extra checks, I recommend a static code analysis program such as PC Lint. That’s almost guaranteed to catch every use of uninitialized variables (along with ten thousand other things, some of them extremely useful, though most of them you probably won’t care about at all).
Another common mistake is to read past the end of a valid memory area, such as an array. The values read from memory in that case are going to depend on where that array is, and what happened before. In any case, it’s going to become a source of non-determinsim, and it’s a clear bug that should be fixed right away.
If you ever find that your game is deterministic in debug mode but not in release mode, start by suspecting access to uninitialized memory. In debug mode, a lot of platforms will fill the memory heap with specific bit patterns describing the memory status: allocated, freed, etc. Those patterns can be really useful to track down problems with memory allocations, but they also set memory values to a consistent state, which will make debug builds seem deterministic when they really aren’t.
Pointer Values
Unless you’re working on a platform over which you have total control, and you have a very strict memory allocation scheme, you should probably never rely on the actual numerical value of your pointer variables. The pointer values will change from run to run, and will depend on what happened before, including what other programs were running at the time.
If you’re puzzled at the idea of using the pointer values as part of the logic in your program (and you should be), you still need to watch out for this possibility. Some well-known, open source compression libraries rely on this technique by hashing pointer values. This means that two consecutive runs of the same program might end up with two slightly different results.
In general, it’s considered a good practice to avoid using pointer values for anything other than dereferencing them and accessing the data they’re pointing to. Making decisions based on their actual numerical values is asking for trouble, and you can almost always achieve the same result by using offsets between pointer values (as long as you know they’re coming from the same memory block), which has none of those problems.
Asynchronous File I/O
This is where things start to get fun. A lot of games perform background loads while the game is running. Whenever the game detects it will need an asset, it initiates an asynchronous load. When the load is complete, the game is notified and, optionally, does some processing on that asset. There is no guarantee about when exactly the load will complete, which means that each playback has the potential to be slightly different.
If the background load just brings in more mipmaps for a texture, it’s not going to affect the simulation whether it happened a frame earlier or a frame later. On the other hand, if it’s loading a more detailed AI navigation path, it can definitely affect the position and state of AI units. Even if you’re not loading data that affects the simulation, it’s very useful to have asynchronous file IO work deterministically to catch bugs more reliably. For example, the game might crash if a background texture load completes the same frame as a line of dialog sound is requested. As any programmer who’s ever had to debug a problem like this will tell you, being able to reliably reproduce that crash is worth a small fortune.
We can turn asynchronous file loading into a deterministic operation by considering the read completions as inputs to the game. Whenever an asynchronous file load completes, we record it in the game input file at the corresponding frame. During playback, the game will request background loads as usual, but completion events are made available only when they’re read from the input stream.
This can be easily implemented at the game level by buffering all background read completions. During recording and normal game operation, background read completion events are made available as soon as they happen. During playback, however, background read completion events need to be available the exact same frame they happened in the original session. If the read finishes earlier, it is simply buffered for a few frames until the correct time. If the read hasn’t completed by the time it finished in the original session, the game blocks until it’s done and it’s made available right away. That means that playback might be a bit choppy at times, while the game blocks for data to be read, but it will ensure that playback is fully deterministic. Notice that blocking the game while waiting for a read to complete is not going to affect subsequent frames with a larger delta time because we’re also reading the system clock from the input stream.
Network Data
What about online games? Our playback method has been concerned exclusively with local players. Playtesting and Q/A on online games is much more expensive and time consuming than single-player games because of the manpower required and the coordination necessary to set the games up. Wouldn’t it be great to be able to replay a 30-person game that resulted in a crash without having to get 30 players again?
Data received from the network most definitely influences the game itself, so it’s another input to the game. We could treat all network traffic like any of the other inputs. Record it along with the other game input and play it back by inserting the network packets as if they had come from the network card. That would work, but it’s a bit more complicated than it has to be. Fully emulating the network traffic, connection status, and everything else can be a bit tricky to get just right.
A simpler approach consists of translating the data received through the network into higher-level game actions. For example, whenever a player fires a weapon, it results in a network packet, which is translated into a game-level action whenever it’s received. Then, the local simulation applies the action that causes that player to fire locally. You probably already have a system like that in place in your multiplayer game anyway. Recording those actions is a much simpler task than the raw network traffic, and playing back a recorded multiplayer game is just a matter of inserting the game-level actions at the same time they occurred, just like any other input.
The game input file is going to grow significantly as soon as you start recording all the actions created by the players through the network. Fortunately, most games are extremely careful to minimize network bandwidth, so the input file will remain at a reasonable size even recording all that data.
Threads And Multiprocessors
In today’s games, multithreading is an inevitable reality. Even if your code isn’t explicitly multithreaded, parts of your engine and middleware are probably using multiple threads. The problem with threads is that you never know exactly when one thread stops working and another one resumes. So it is possible that two events in two different threads will happen in different order in two runs of the game. To make things even more fun, most of today’s platforms have multiple cores, making determinism even more complicated. Do we need to give up determinism in a thread-dominated world?
The totally honest answer is that we’ve already picked all the low-hanging fruit. If you really want your game to be 100 percent deterministic while running in multiple threads over multiple cores, get ready to roll up your sleeves and do some serious work. For the rest of us, we can still get most of the benefits of a playback system without total determinism. That means that there is potential for bugs caused by thread interactions that might not be repeatable from run to run. But at least the game simulation will be the same for every playback, so the recording technique should still be very useful.
In the easiest case, the simulation runs on a single thread, with the rest of the threads dedicated to graphics, sound, and other systems. If we apply all the techniques we’ve covered so far, the simulation itself should be deterministic and we shouldn’t have to do anything different than we did in the singlethreaded case.
However, as soon as the simulation is spread across multiple threads, we need to be a lot more careful. Even if we have no control over the thread context switches, we can at least ensure that major events happen in the same order no matter how they were executed. For example, if a worker thread creates a set of actions to be executed, we can sort those actions before processing them. If that’s not an option (because those actions are processed as soon as they are created, for example), we could record the creation order as part of the input recording, although enforcing that order during playback might have far reaching consequences for many systems deep inside the game.
If achieving 100 percent determinism in a threaded environment is important to your project, have a look at Replay Director. It’s a commercial tool and set of libraries that gives you very accurate recordings and playbacks of your game, including thread context switches, with little extra programming on your part.
More Than Just For Debugging
At this point, if you’ve applied all the techniques we’ve discussed so far, you should have a pretty bullet-proof recording and playback system: lightweight, reliable, and accurate. It will be a great help as a debugging tool, but there’s no reason to stop there. You could use the same system to record demo sequences to show off in presentations without the need to make a movie capture and degrade the image quality. You can also truthfully claim that it’s a live demo, running on the game itself, not a canned movie, which always carries extra weight with most audiences.
You can even integrate the playback system as a key feature in your game. Halo 3 already does this by allowing you to replay any play session, examine what happen exactly, and let you take screenshots of the juiciest moments. It can be a great feature for a lot of games for almost no extra effort over what you’ve already implemented. The main problem will be to make sure future updates to the game don’t affect the playback of older gameplay sessions. This is really hard to achieve, since the most insignificant change can end up causing the simulation to diverge in unexpected ways. So if you’re going to rely on it as a game feature, you need a suite of comprehensive tests verifying that nothing changes whenever the game is updated. Last month we saw how to verify that a playback results in the same game state as the original session, so again, there’s very little extra work there.
With all these techniques in your toolbox, you should be able to make your game pretty close to 100 percent deterministicâ€”enough to have a solid playback system and use it to its fullest during production, and maybe even as a game feature.
This article was originally printed in the June 2008 issue of Game Developer.


Back to The Future (Part 1)
Fri, 22 Aug 2008 00:00:00 +0000
Insanity: doing the same thing over and over again and expecting different results. â€“ (attributed) Albert Einstein
How would you like to be able to reproduce every crash report that QA adds to the bug database quickly and reliably? How useful would it be to be able to put a breakpoint the frame before a crash bug happens?
You can do all that and more if your game is deterministic and you feed it the same inputs as an earlier run. Sounds easy? It is, if you implement it early on and you keep it that way during development. If you choose not to make your game deterministic, your team will go insane by Einsteinâ€™s definition, and maybe by a few other definitions as well by the time the project ends.
Determinism
A system is said to be deterministic if, given the same set of inputs, it produces the same set of outputs. Think of your game as a big, black box, with some inputs and outputs.
The two main input types for most games are:

Player input devices. The state of the gamepad, keyboard, mouse, or any other input device is used to update the simulation.
System clock. Most games query the system clock once per frame to retrieve a current time and delta time. The game then advances the simulation forward by that amount of time. Even console games that expect to run at a rocksolid 60 Hz are often implemented this way to be able to deal with different TV refresh rates or with slower builds during development.

The main outputs of a game system are the pixels on the screen and the generated sounds. Optionally there are other outputs such as network packets or force feedback.
To make the game deterministic we need to make sure that, for a given set of inputs, it always produces the same outputs. In other words, playing the game twice, entering the same button presses at the exact times, the game will end up in the same state both times (same location, same health, number of lives, NPC positions, etc).
What about random numbers, which we rely on so much in games? Arenâ€™t random numbers inputs as well? Yes and no. In games we use pseudo random number generators. That means that the sequence of numbers from a particular seed is always the same. For a given seed, all runs of the game are fully deterministic, so the random numbers are not really an input into the system. The seed to the random number generator is an input, since it will greatly affect the state of the simulation.
Record and Playback
Recording all player inputs and the system clock is as straightforward as it sounds: At the beginning of the simulation loop, sample the clock and the player input devices. Then save those values to a file and continue the game simulation as usual. You want to make sure the file is flushed after writing the values for each frame. That way, if the game crashes, youâ€™ll have all the input leading up to that frame and youâ€™ll be able to play it back right up to the point where it crashed.
Recording the input has a negligible performance impact and the amount of data saved is very small. As an example, in our current game, every frame we save the frame number, the clock time, frame delta time, and a 20-byte structure for each gamepad (see Listing 1). Without any compression, thatâ€™s 52 bytes per frame for a two-player game. At 60 Hz, a 20-minute game would be a tiny 3.6 MB, which is small enough to attach to a bug tracking system, email to the team, or archive with the build itself.
Listing 1. Game input structures.
struct FameInput{    uint32 frameNumber;    float time;    float dt;};

struct RawControllerState{    uint32 buttons;    float leftStickX;    float leftStickY;    float rightStickX;    float rightStickY;};
Using the recorded input for playback is almost as easy. Every frame, before the simulation starts, we read the input data from the file and feed it to the game as if it came from the system clock and the input devices. A clean way to do that is to separate the reading of the data from where it comes from. In C++ we can use abstract base classes that define an interface describing how to retrieve the data. For example, a class IGameInput can define an interface, and the class GamepadGameInput reads the data from the hardware, while the class FileGameInput reads it from the recorded file (see Listing 2).
Listing 2. IGameInput, GamepadGameInput and FileGameInput
class IGameInput{public:    virtual void GetInputData(RawControllerState& state) = 0;};

class GamepadGameInput: public IGameInput{public:    virtual void GetInputData(RawControllerState& state);};

class FileGameInput: public IGameInput{public:    virtual void GetInputData(RawControllerState& state);};
In addition to recording and playing back those inputs every frame, we also need to record the seed to the random number generator before the game starts, and read it and apply it during playback. That will ensure that all the random numbers are the same in both runs of the game.
Verification
If youâ€™re going to rely on this recording system, you need to make sure the playback produces the exact same results as the original session. If they arenâ€™t the same, the whole system is worthless. Also, because weâ€™re just recording input, if the state of the game during playback ever starts to diverge, it will continue getting more and more out of sync from there. We need a way to make sure the playback produces the same state as the initial recording.
Earlier we identified some of the outputs of the game as the pixels on the screen or the sound produced by the speakers. We could compare pixels with a previous run of the game, but the storage requirements would be enormous, and the performance less than ideal. Besides, subtle changes in shaders, lighting, or a simple texture change would throw the comparison off. An easier solution is to check the game state itself. If an enemy is in a different location in two runs of the game, of course itâ€™s going to render to a different set of pixels. It will be a lot faster and easier to compare game states than raw output.
During recording, in addition to the game input, we can save some of the state of the game. Thereâ€™s no need to record it all, just some of the most important and representative state. Good candidates include player and enemy positions, prop transforms, score, etc. Unless itâ€™s crucial to your game, thereâ€™s no need to record things like player animations or enemy AI state. Chances are that if any of those diverge, the positions for those entities will also diverge right away.
Once we have all of this state recorded, we can verify the state does not change during playback. Every frame, we compare the current game state with the recorded game state. If theyâ€™re different, even by the smallest amount, we know something has gone wrong and we flag it right away. The game isnâ€™t quite deterministic and something needs to be fixed. We can choose to record and verify the game state anywhere in the simulation loop (as long as theyâ€™re both done in the same spot), but if we do it after the simulation step instead of before, weâ€™ll be able to see what the inputs were that caused the divergence this frame, which will help when debugging. The main loop is shown in Listing 3.
Listing 3. Main loop structure
while (!done){    SampleClockAndInputs();   // from different sources through the same interface    RecordInput();    UpdateSimulation();    if (recordGameState) RecordGameState();    if (verifyGameState) VerifyGameState();}
Unlike recording game input, recording the game state can be a much more expensive operation. Traversing the game structures can be a significant performance hit due to cache misses, and the data saved to disk often results in large files. Because of this, in our current game player input is recorded all the time, but recording game state is optional, and we can be controlled through a command-line parameter. Game state verification is also optional, since sometimes we want to play back a recorded set of inputs even knowing that the state of the game is going to diverge due to changes weâ€™ve made.
For a few of you, checking the game state wonâ€™t be enough. If youâ€™re writing a middleware graphics layer, you probably want to verify that the values youâ€™re generating in the back buffer are the same ones from a previous pass. Or maybe that improving the performance of a rendering algorithms still generates the same image. In that case, you might want to consider something like Perceptual Image Difference, which will be a much less error-prone than comparing exact values for pixels.
Monkeying Around
Once the game is fully deterministic and the playbacks are rock solid, you want to make sure it stays that way. Itâ€™s all too easy to introduce bugs that will cause playbacks to diverge. For example, a single, innocent-looking, uninitialized variable can change the simulation depending on the value it happens to have for this run.
The best method Iâ€™ve found to stress test the playback system is to use a recorded session of monkey input. The monkey input method consists of feeding the game pseudo-random inputs (as if a monkey were playing the game). Recording both the input and game state of a monkey input play session, and then playing it back verifying the game state kills two birds with one stone: You get some nice automated testing of your game, and you verify that the game is fully deterministic. It sounds too simple to be useful, but youâ€™ll be amazed at how many bugs your first session of monkey input will uncover.
A clean way to implement the monkey input, is writing a new class that implements the IGameInput interface. This new class will generate game input for all the buttons and axes of a game controller, but it wonâ€™t come from the hardware game controller or from a recorded session, but from randomly generated values. Apart from being a very clean way to insert input into the game, this approach has the advantage that the monkey input can be recorded just as if it were regular input coming from the controller. That means we can later replay a session and verify its game state, which makes for a very useful functional test.
It turns out that totally random input values are not ideal, as it becomes apparent as soon as you implement the naive random monkey input class. If the state of the jump button is truly random, it will be pressed and released almost every frame, which means the player will hardly ever jump high enough off the floor. A better implementation will use a range with a minimum and maximum press time durations, which will give much better results. You might also want to avoid pressing specific buttons (like the pause button, or the restart button combination). Resist the temptation to make the monkey input too smart and make it behave more like a real player, for example, by limiting the number of buttons it can have pressed simultaneously. Part of the benefit of the monkey input is that it will do very unexpected things, that no sane player will ever do on purpose, and by doing that, it will unearth many more problems and issues with the game.
One day, shortly after I implemented the monkey input system, the playback verification started failing. Some game entities were ending up in a different state than expected. After doing some digging, I realized that I was using the same random number generator for the game simulation and for the monkey input. Since the playback was just reading input values from the file and not generating them on the fly, the random number sequence was getting off sync right away, causing entities with some randomness in them to behave differently. Lesson learned: Make sure that nothing in the monkey input affects the game systems. In this case, I solved it by using two instances of the random number generator, one for the monkey and one for the game.
Playback in The Real World
At this point we can easily reproduce bugs. We can re-run the game and put a breakpoint right before the game crashesâ€”except that the crash happens 20 minutes into the playback and nobody wants to wait that long.
Fortunately, we can make time go by faster. During playback we don’t care about tearing artifacts, so we can turn off the vertical sync, which will speed things up a bit. Most importantly, we often donâ€™t even care whether we render anything or not, so if we turn off rendering completely, we can make the playback run significantly faster, particularly if you were graphics bound. One word of caution: If the rendering part of the game does significant work, and especially if you suspect it might be interacting with the rest of the game to cause a bug or some other source of non-determinism, you might want to do the same work in the rendering system, constructing the push buffer the way you would normally do, but never send it off to the graphics hardware.
Turning off the vertical sync and disabling the graphics rendering, is just reducing the amount of time we spend in each frame. But it still takes 10 minutes to get 10 minutes into the game, we just go through many more frames to get there. The last piece of the puzzle to make time go faster is to be able to set a fixed timestep. Now we can go through the simulation at a rate much faster than real time (and the beefier you computer, the faster it will go). In my current game, we can go through 10 minutes of game time in about 1 minute of real time.
One use for recording and playback that we havenâ€™t mentioned is performance comparisons. When youâ€™re optimizing the game, you can record a gameplay sequence and gather some performance statistics: average fps, longest frames, etc. Then you can apply your optimizations, playback the same input sequence, and compare the performance statistics to see how effective the optimizations really were. Just make sure you use real clock time and not the recorded clock time, otherwise you wonâ€™t see any differences, even with all the hard effort you put into the optimizations.
There are a few details we glossed over that might prevent some games from begin fully deterministic: network traffic, asynchronous file I/O, and threading issues. Weâ€™ll cover those in detail next month.
Youâ€™ll soon find that input recording and playback becomes an essential tool for in development process and youâ€™ll wonder how you ever lived without it. Spend a few hours implementing it early in the development cycle and reap the benefits many times over during production. Youâ€™ll be the hero of the day when that dreaded crash bug from QA arrives minutes before a deadline.
Thanks to Jim Tilander for being a great idea bouncing-board and proofreading the article.
This article was originally printed in the May 2008 issue of Game Developer.