There has been a lot of recent discussion (and criticism) on Data Oriented Design recently. I want to address some of the issues that have been raised, but before that, I’ll start with this reprint from my most recent Game Developer Magazine. If you have any questions you’d like addressed, add write a comment and I’ll try to answer everything I can.
Last year I wrote about the basics of Data-Oriented Design (see the September 2009 issue of Game Developer). In the time since that article, Data-Oriented Design has gained a lot of traction in game development and many of teams are thinking in terms of data for some of the more performance-critical systems.
As a quick recap, the main goal of Data-Oriented Design is achieving high-performance on modern hardware platforms. Specifically, that means making good use of memory accesses, multiple cores, and removing any unnecessary code. A side effect of Data-Oriented Design is that code becomes more modular and easier to test.
Data-Oriented Design concentrates on the input data available and the output data that needs to be generated. Code is not something to focus on (like traditional Computer Science), but is something that is written to transform the input data into the output data in an efficient way. In modern hardware, that often means applying the same code to large, contiguous blocks of homogeneous memory.
Applying Data-Oriented Design
It’s pretty easy to apply these ideas to a self-contained system that already works over mostly-homogeneous data. Most particle systems in games are probably designed that way because one of their main goals is to be very efficient and handle a large number of particles at high framerates. Sound processing is another system that is naturally implemented thinking about data first and foremost.
So, what’s stopping us from applying it to all the performance-sensitive systems in a game code base? Mostly just the way we think about the code. We need to be ready to really look at the data and be willing to split up the code into different phases. Let’s take a high-level example and see how the code would have to be restructured when optimizing for data access.
Listing 1 shows pseudocode for what could be a typical update function for a generic game AI. To make things worse, that function might even be virtual and different types of entities might implement it in different ways. Let’s ignore that for now and concentrate on what it does. In particular, the pseudocode highlights that, as part of the entity update, it does many conditional ray casting queries, and also updates some state based on the results of those queries. In other words, we’re confronted with the typical tree-traversal code structure so common in Object-Oriented Programming.
void AIEntity::Update(float dt) { DoSomeProcessing(); if (someCondition && Raycast(world)) DoSomething(); if (someOtherCondition && BunchOfRayCasts(world)) DoSomethingElse(); UpdateSomeOtherStuff(); }
Listing 1
Ray casts against the world are a very common operation for game entities. That’s how they “see†what’s around them and that’s what allows them to react correctly to their surroundings. Unfortunately, ray casting is a very heavy weight operation, and it involves potentially accessing many different areas in memory: a spatial data structure, other entity representations, polygons in a collision mesh, etc.
Additionally, the entity update function would be very hard to paralellize on multiple cores. It’s unclear how much data is read or written in that function, and some of that data (like the world data structure) might be particularly hard and expensive to protect from updates from multiple threads.
If we re-organize things a bit, we can significantly improve performance and paralellization.
Break Up And Batch
Without seeing any of the details of what’s going on inside the entity update, we can see the raycasts sticking out like a sore thumb in the middle. A raycast operation is fairly independent of anything else related to the entity, it’s heavyweight, and there could be many of them, so it’s a perfect candidate to break up into a separate step.
Listing 2 shows how the broken up entity update code would look like. The update is now split in two different passes: The first pass does some of the updating that can be done independently of any ray casts, and decides which, if any, raycasts need to be performed sometime this frame.
void AIEntity::InitialUpdate(float dt, RayCastQueries& queries) { DoSomeProcessing(); if (someCondition) AddRayCastQuery(queries); if (someOtherCondition) AddBunchOfRayCasts(queries); } void AIEntity::FinalUpdate(const RayCastResults& results) { UpdateSomeOtherStuff(results); }
Listing 2
The game code in charge of updating the game processes all AI entities in batches (Listing 3). So instead of calling InitialUpdate(), solve ray casts, and FinalUpdate() for each entity, it iterates over all the AI entities calling InitialUpdate() and adds all the raycast query requests to the output data. Once it has collected all the raycast queries, it can process them all at once and store their results. Finally, it does one last pass and calls FinalUpdate() with the raycast results on each entity.
RayCastQueries queries; for (int i=0; i<entityCount; ++i) entities[i].InitialUpdate(dt, queries); // Other update that might need raycasts RayCastResults results; for (int i=0; i<queries.count; ++i) PerformRayCast(queries[i], results); for (int i=0; i<entityCount; ++i) entities[i].FinalUpdate(results);
Listing 3
By removing the raycast calls from within the entity update function, we’ve shortened the call tree significantly. The functions are more self-contained, easier to understand, and probably much more efficient because of better cache utilization. You can also see how it would be a lot easier to parallelize things now by sending all raycasts to one core while another core is busy updating something unrelated (or maybe by spreading all raycasts across multiple cores, depending on your level of granularity).
Note that after calling InitialUpdate() on all entities, we could do some processing on other game objects that might also need raycast queries and collect them all. That way, we can batch all the raycasts at compute them all at once. For years, we’ve been drilled by graphics hardware manufacturers how we should batch our render calls and avoid drawing individual polygons. This is the same way: By batching all raycasts in a single call, we have the potential to achieve much higher performance.
Splitting Things Up
Have we really gained much by reorganizing the code this way? We’re doing two full passes over the AI entities, so wouldn’t that be worse from a memory point of view? Ultimately you need to measure it and compare the two. In modern hardware platforms, I would expect performance to be better because, even though we’re traversing through the entities twice, we’re using the code cache much better and we’re accessing them sequentially (which allows us to pre-fetch the next one too).
If this is the only change we make to the entity update, and the rest of the code is the usual deep tree traversal code, we might not have gained much because we’re still blowing the cache limits with every update. We might need to apply the same design principles to the rest of the update function to start seeing performance improvements. But at the very least, even with this small change, we have made it easier to parallelize.
One thing we’ve gained now is the ability to modify our data to fit the way we’re using it, and that’s the key to big performance gains. For example, after seeing how the entity is updated in two separate passes, you might notice that only some of the data that was stored in the entity object is touched from the first update, and the second pass accesses more specific data.
At that point we can split up the entity class into two different sets of data. One of the most difficult things at this point is naming these sets data in some meaningful way. They’re not representing real objects or real-world concepts anymore, but different aspects of a concept, broken down purely by how the data is processed. So what before was an AIEntity, can now become a EntityInfo (containing things like position, orientation, and some high-level data) and AIState (with the current goals, orders, paths to follow, enemies targeted, etc).
The overall update function now deals with EntityInfo structures in the first pass, and AIState structures in the second pass, making it much more cache friendly and efficient.
Realistically, both the first and second passes will have to access some common data (for example the entity current state: fleeing, engaged, exploring, idle, etc). If it’s only a small amount of data, the best solution might be to simply duplicate that data on both structures (going against all “common wisdom†in Computer Science). If the common data is larger or is read-write, it might make more sense to give it separate data structure of its own.
At this point, a different kind of complexity is introduced: Keeping track of all the relationships from the different structures. This can be particularly challenging while debugging because some of the data belonging to the same logical entity isn’t stored in the same structure and it’s harder to explore in a debugger. Even so, making good use of indices and handles makes this problem much more manageable (see Managing Data Relationships in the September 2008 issue of Game Developer).
Conditional Execution
So far things are pretty simple because we’re assuming that every AI entity needs both updates and some ray casts. That’s not very realistic because entities are probably very bursty: sometimes they need a lot of ray casts, and sometimes they’re idle or following orders and don’t need any for a while. We can deal with this situation by adding a conditional execution to the second update function.
The easiest way to conditionally execute the update would be to add an extra output parameter to the FirstUpdate() function indicating whether the entity needs a second update or not. The same information could be derived form the calling code depending on whether there were any raycasts queries added. Then, in the second pass, we only update those entities that appear in the list of entities requiring a second update.
The biggest drawback of this approach is that the second update went from traversing memory linearly to skipping over entities, potentially affecting cache performance. So what we thought was going to be a performance optimization ended up making things slower. Unless we’re gaining a significant performance improvement, it’s often better to simply do the work for all entities whether they need it or not. However, if on average less than 10 or 20 percent of the entities need a ray cast, then it might be worth avoiding doing the second update on all the other entities and paying the conditional execution penalty.
If the number of entities to be updated in the second pass is fairly small, another approach would be to copy all necessary data from the first pass into a new temporary buffer. The second pass can then process that data sequentially without any performance penalties and it would completely offset the performance hit of copying the data.
Finally, another alternative, especially if the conditional execution remains fairly similar from frame to frame, is to relocate entities that need raycasting together. That way the copying is minimal (swapping an entity to a new location in the array whenever it needs a raycast), and we still get the benefit of the sequential second update. For this to work all your entities need to be fully relocatable, which means working with handles or some other indirection, or updating all the references to the entities that swapped places.
Different Modes
What if the entity can be in several, totally different modes of execution? Even if it’s the same type of entity, traversing through them linearly calling the update function could end up using completely different code for each of them, so it will result in poor code cache performance.
There are several approached we can take in a situation like that:
- If the different execution modes also are tied to different parts of the entity data, we could treat them as if they were completely different entities and break each of their data components apart. That way, we can iterate through each type separately and get all the performance benefits.
- If the data is mostly the same, and it’s just the code that changes, we could keep all the entities in the same memory block, but rearrange them so that entities in the same mode are next to each other. Again, if you can relocate your data, this is very easy and efficient (it only requires swapping a few entities whenever the state changes).
- Leave it alone! Ultimately, Data-Oriented Design is about thinking about the data and how it affects your program. It doesn’t mean you always have to optimize every aspect of it, especially if the gains aren’t significant enough to warrant the added complexity.
The Future
Is thinking about a program in terms of data and doing these kind of optimizations a good use of our time? Is this all going to go away in the near future as hardware improves? As far as we can tell right now, the answer is a definite no. Efficient memory access with a single CPU is a very complicated problem, and matters get much worse as we add more cores. Also, the amount of transistors in CPUs (which is a rough measure of power) continues to increase much faster than memory access time. That tells us that, barring new technological breakthroughs, we’re going to be dealing with this problem for a long time. This is a problem we need to deal with right now and build our technology around it.
There are some things I’d like to see in the future to make Data-Oriented Design easier. We can all dream up of a new language that will magically allow for great memory access and easy paralellization, but replacing C/C++ and all existing libraries is always going to be a really hard sell. Historically, the best advances in game technology have been incremental, not throwing away existing languages, tools, and libraries (that’s why we’re still stuck with C++ today).
Here are two things that could be done right now and work with our existing codebases. I know a lot of developers are working on similar systems in their projects, but it would be great to have a common implementation released publicly so we can all build on top of them.
Language
Even though a functional language might be ideal, either created from scratch or reusing an existing one, we could temporarily extend C to fit our needs. I would like to see a set of C extensions where functions have clearly defined inputs and outputs, and code inside a function is not allowed to access any global state or call any code outside that function (other than local helper functions defined in the same scope). This could be done as a preprocessor or a modified C compiler, so it remains very compatible with existing libraries and code.
void FirstEntityUpdate(input Entities* entities, input int entityCount, output RayCastQueries* queries, output int queryCount);
Dependencies between functions would be expressed by tying the outputs of some functions to the input of other functions. This could be done in code or through the use of GUI tools that help developers manage data relationships visually. That way we can construct a dependency diagram of all the functions involved in every frame.
Scheduler
Once we have the dependencies for every function, we can create a directed acyclic graph (DAG) from it, which would give us a global view of how data is processed every frame. At that point, instead of running functions manually, we can leave that job in the hands of a scheduler.
The scheduler has full information about all the functions as well as the number of available cores (and information from the previous frame execution if we want to use that as well). It can determine the critical path through the DAG and optimize the scheduling of the tasks so the critical path is always being worked on. If temporary memory buffers are a limitation for our platform, the scheduler can take that into account and trade some performance time for a reduced memory footprint.
Just like the language, the scheduler would be a very generic component, and could be made public. Developers could use it as a starting point, build on top of it, and add their own rules for their specific games and platforms.
Even if we’re not ready to create those reusable components, every developer involved in creating high-performance games should be thinking about data in their games right now. Data is only going to get more important in the future as the next generation of consoles and computers rolls in.
This article was originally printed in the September 2010 issue of Game Developer.
New blog post: Data-Oriented Design Now And In The Future http://gamesfromwithin.com/data-oriented… I’ll follow up w/ more DOD posts. Send your questions.
@noel_llopis Nice post, are you still doing a chapter for GEG2 btw? 🙂
@nonchaotic And yes, the GEG2 chapter is already on its way to the printers (so… old fashioned! 🙂
@noel_llopis The whole thing was a mess of ghastly templates, but it worked nicely if you designed your system for it.
@noel_llopis As one might imagine, retroactively porting to it sucked
@noel_llopis it would certainly be better done with language extensions as you suggest, rather than TMP
@noel_llopis I am looking forward to reading it very much 🙂
@noel_llopis Really interesting post. Been thinking a bit about the functional-dod relationshiop lately. They seem a natural fit.
@noel_llopis Interesting you propose C extensions in that vain, at a company I worked at 2-3 years ago, we had something very similar
@noel_llopis that essentially provided that functionality. Although modification of global state couldn’t be safe-guarded.
@noel_llopis We had some nice debug tools that allowed you to visualise the DAG and how it had scheduled the jobs.
@noel_llopis My marking atoms with their usage, it would allow the scheduler to ensure prerequiesite work was completed kernels were ran
@nonchaotic Very cool. I know some companies doing the same. How did it work for you guys? What would you have done differently?
@noel_llopis I would have probably tried to implement it as a preprocessor and generate scheduling metadata that the scheduler can use
@noel_llopis in the main the scheduling of subsystems can be an offline problem.
@noel_llopis quick question. Where is that discussion/criticism you talk about happening? Or you mean the casual tweet here and there?
@mysterycoconut Through Twitter and blog posts I saw referenced through Twitter. I’ll be posting links to them.
@noel_llopis Great read! Extended C sounds like shader lang.Should funcs be organized in scoped contexts to keep their mem controllable too?
@noel_llopis One issue I can see with regards to Language section of your DOD post, is it’d be hard to get people to change languages.
@lantree Agreed. That’s why I was hoping for an extension or preprocessor. Still C and still binary compat with libraries.
@noel_llopis We’ll be stuck with C for quite some time because those in the games community have a preference for it due to legacy.
@noel_llopis Also have you seen Parallel Linq in C#. It performs parallel operations over a dataset using a foreach idiom or SQL syntax.
I read the original article when it was published, looking forward to seeing the follow up. The brick wall that I can see coming up is the structure of the development environment. iOS / Android work very well with DOD but it’s difficult to see it transition to PC / Console games without a paradigm shift in approach to coding.
Really? Why would it be easier on iOS/Android? It’s actually on PCs and consoles that you get the real benefits. So far, on iOS or Android, we only have one core and memory access isn’t as huge a deal as on “modern” hardware. But it’s coming, oh is it coming :-b
@noel_llopis Great article! The complexity introduced and the abstraction level worry me a bit but that the way to go (fast)!
Nice post! Maybe it could be enough to use Mozilla Dehydra with some additional user attributes to force the input/output parameters and the pure functions.
https://developer.mozilla.org/En/Dehydra/Using_Dehydra
While I’m a fan of DOD, I’d like to point out that nothing in OO prevents you from taking a smarter approach. Especially once you start thinking of containers and operations as objects, too. And let go of the idea of single-value returns.
intermediates,queries=entities.apply(InitialUpdate);
queries.apply(PerformRayCast);
entities=intermediate.apply(FinalUpdate);
In general, for many tasks in games, map/reduce (which is what this is) is a very powerful approach.
Rachel
TZ RT New blog post: Data-Oriented Design Now And In The Future http://gamesfromwithin.com/data-oriented… I’ll follow up w/ more DOD posts. Send your questions
@noel_llopis Regarding C extensions, have you used GCC’s pure or const function attributes? (ref: http://gcc.gnu.org/onlinedocs/gcc/Functi…)
@maximilianburke I hadn’t seen that. Looks very promising. Thanks!
The language extensions sound really interesting. Have you seen the SGL work being done at Cornell? http://www.cs.cornell.edu/bigreddata/games/SGL.php Their approach has a nice split between read only and write only data for a game update that facilitates multi-threaded processing very nicely…
My question is:
I know lots of AAA games, some with 60hz frame rates and lots of characters on the screen that do almost none of these optimizations. Sure they have the basic ones like particle systems and maybe memory layout optimizations for skinning or loding but otherwise they are using the standard way shown in your listing #1 for AI and collisions.
Given that, what are the benefits of this method? Which games would not be possible without these methods? Is there a tradeoff in design complexity? Debugging complexity? Implementation complexity?
To put it bluntly, if the top selling high tech games are not using these techniques, what’s the point? For most games, wouldn’t making the game fun a better return on investment than optimizing the code?
I’d love to hear which AAA games use ‘almost none’ of these optimizations. It’s kind of hard to ignore the cost of cache misses on consoles (not to mention SPU memory constraints, DMA issues, etc.) and still produce high-perf titles, so I’d love to see what those guys did that allows them to work around that.
Because, as you said, at the end of the day making a game more fun is always more worthwhile than optimizing – *if* you don’t need to.
That’s probably the most important question of all. I’ll definitely address it on a post of its own but my short answer is: You need this if your game relies on good tech to stand out (more characters, more items, more stuff, better looking stuff, etc). If you’re making Super Meat Boy, or even happy using Unreal or Unity, then there’s probably not much point (except for a few key systems).
There are other reasons other than performance to use DOD, but they’re not as strong or compelling (especially if you already have an architecture/pipeline in place).
Very few if any hit games rely on the tech described above to stand out.
better looking = good artists not good tech. Some of the best looking games have crap tech. Metal Gear Solid for example is all about amazing artists working within the limits of their system. It’s not about programmers adding more polygons.
more items / more characters / more stuff. It’s hard to name a game that uses the techniques described above that was a hit because it had more characters than some other game that didn’t use these techniques. For example looking that the Source SDK docs I’m pretty sure L4D doesn’t use these techniques and yet it has tons of characters on the screen.
There’s different kinds of more. More quantity of the same stuff, more variety. More variety > more quantity and yet more variety has nothing to do with these techniques.
Let me be clear. I’m not saying you shouldn’t optimize your data for cache coherence etc. Most engines do this to some degree for particles, for skinning, for animation, for dynamic lodding, for PVS traversal. But very few take it to the level of parallelizing AI and ray casting. That’s the part I’m questioning.
Exactly! I’m also a big fan of DOD but I’d really like too see a sensible and manageable example of DOD implementation of world entities and AI logics. All examples I’ve seen so far are very simplistic and don’t take into account tons of misc. issues.
Don’t get me wrong, I love DOD but I’m seriously not sure how to apply this technique for *all* game code.
Furthermore, DOD requires some discipline and knowledge. At the same time there are gameplay scripters using Lua who don’t really bother about DOD and their code contains lots of “ifs”. And since it does the job it’s a complete nightmare trying to rewrite it using DOD.
@noel_llopis Nice Article.
The scheduler you’re talking about is very close to what I designed a few years ago and presented at GDC 2010. That’s nice to see we are all moving more and more toward such sequencer. It’s not how we traditionally work but once we are used to take a different view to the problem, it’s not that hard.
Don’t get me wrong, I love DOD but I’m seriously not sure how to apply this technique for *all* game code.
Furthermore, DOD requires some discipline and knowledge. At the same time there are gameplay scripters using Lua who don’t really bother about DOD and their code contains lots of “ifsâ€. And since it does the job it’s a complete nightmare trying to rewrite it using DOD
Just about anywhere programmers are consuming DOD code, you should be fine. They should be able to understand and maintain it.
Script, as you point out, can be a problem. In the worst cases, every game object instance becomes its own class because so much custom behavior is attached to it. Script tends toward anarchy because often the people making script are concerned with getting things done more than getting them done well.
The only solution I can think of is to have dedicated dev time to watch the engine/script barrier. Every so often, more behavior should be refactored into engine code. And, conversely, engine code could be migrated to script when it’s not used enough or becomes too brittle. The goal is to keep the main processing time in the engine where it can be optimized better while keeping the rare cases out in script.
Outside of the game industry, this topic has flourished, and grown quite a bit. The information presented here is a subset of a larger topic. Martin Fowler does a good job explaining these concepts, and, although the post is over 5 years old, it is still relevant.
http://martinfowler.com/articles/languageWorkbench.html
Interesting read, though it seems to relate more to how you might go about modeling components in a data driven way regardless of the programming paradigm (DOD, OOP). However, both DOD and the approach described in the article are really just examples of dataflow programming.
@noel_llopis A lot of people talk about functional languages like Haskell, F#, Clojure etc being the future, due to multicore processors. But these languages do in general not seem to address the issue of memory layout, importance of cache misses in modern hardware etc.
Ignoring the Game developers need to interface with existing libraries and tools, what do you think would be the future for programming languages? Will a functional language work, or do we need some new category of programming languages? Or do we already have languages that fit the bill?
Personally I was thinking something like Google’s Go was a step in the right direction, given that it deals with both concurrency and has fine grained control over memory layout. Any thoughts?
@d281a23b55db2b3d1d6b0be43791bf6b:disqus Hi, i’m write a monograph about Data-Oriented Design and is very dificult find something about this topic, i need references about this, if you can help me. 😀 Thanks a lot!
Great article! This seems very much in line with “service architecture”, a very data oriented approach to game architecture presented here: http://jdinolt.tumblr.com/post/756062883/hrraaah-ego-trippin-my-first-talk-at-the-igad
Keep up the great work!