All About In-App Purchases Part 2: Selling The Goods

The last entry about in-app purchases left off with the products correctly displayed on the store and ready for purchase. Let’s push that buy button!

Purchasing a product

The store is fully populated, the items are displayed and, if all goes well, the user presses the buy button. What now?

At this point I check that in-app purchases are enabled and if they aren’t, I put up a message box. Apple says you can also do this earlier, but I figured I’d rather show users what’s available and entice them to buy it. By the time they get that message, they’ll be much more likely to enable them and continue the purchase than if they never saw the store in the first place.

Actually, before I even do that check, I see if the item is currently offered for free. If it is, the code takes an alternative path that doesn’t involve the App Store because apparently you can’t set an in-app purchase to free. Really! Why not? It’s yet something else we need to implement ourselves. Please, Apple, fix that. In the case of free purchases (like the Bonus Seeds included in the full version of Flower Garden) I contact my server directly and download everything from there. If you’re thinking that a hacker can just set the free flag to true and have access to all your content, it’s not that easy, because the server checks that the product is really marked as free when it’s contacted, and if it isn’t, it declines the transaction. So they have to work a bit harder than that to crack things (more on that next time).

But if the product is not marked as free and in-app purchases are enabled, then you can add the payment to the StoreKit payment queue which is, of course, another asynchronous call. So now we’re back to having to make a decision of what we want the user to do after they attempt to purchase something. Does someone really want to move away from the screen before they complete the purchase? I don’t, so I set another blocking activity screen explaining what’s going on with an activity indicator.

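// Only queue the payment if the user has in-app purchases enabled.
if ([SKPaymentQueue canMakePayments])
{
    SKPayment* payment = [SKPayment paymentWithProductIdentifier:m_product.m_id];
    SKPaymentQueue* queue = [SKPaymentQueue defaultQueue];
    // Asynchronous: status changes arrive later through the payment queue callback.
    [queue addPayment:payment];
    // Block the UI with a progress screen while the purchase is in flight.
    [m_progressViewController setText:@"Accessing store" completed:NO];
    [m_progressViewController display:YES animated:YES];
}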

To be able to test the actual purchase, you need to create a test account in iTunes Connect. It’s slightly annoying that you need to explicitly log out from the Settings | Store screen and you can’t do that from the app itself, but it’s not nearly as much of a big deal as not being able to test purchases from the simulator at all. That’s a big fail as far as I’m concerned, and I can’t imagine a single reason why that was done that way (other than lack of time/manpower on Apple’s side).

Completing the purchase

You might think we’d be done by now, but no. Far from it. We’re just getting warmed up.

Now that the payment has entered the queue, you’re going to get a callback notification every time its status changes. That means you’ll get one right away because it was just added to the queue, and another when the payment is approved by the App Store, when it fails, or when it’s detected that the item was already purchased and this is a re-purchase.

While that’s happening, the StoreKit code will bring up those ugly message boxes asking the user if they’re really sure they want to purchase the item for that price, and possibly ask them for their login info. It’s funny that the first time I implemented this part, I thought that it was up to me to put those message boxes up, so they were showing up twice. You don’t have to do anything for that, whether you want to or not. What’s worse is that each time one of those message boxes comes up, your app gets a message putting it on hold, which usually means the sound is temporarily muted. It looks very unprofessional, and I’m surprised Apple forces that on us while they leave everything else so open-ended and requiring so much work.

Once the purchase is marked as complete, you can obtain the content for that purchase and make it available in your app. After that step is complete, you can remove the payment from the queue and the transaction is considered complete. In the case of Flower Garden, most purchases involve accessing my server and downloading some data from there for new seeds. Some purchases don’t require any downloading though, and all they do is twiddle a few bits internally.

It’s important to understand the payment queue well. First of all, it’s persistent, so once a payment has been added to it, it will still be there the next time the application starts if it was never removed. It was designed that way to make sure an application can deliver the product that was purchased even if the app is stopped, crashes, or is interrupted in any way. So initially, a payment is added to the queue, meaning that the user has initiated the purchase transaction. At that point, the transaction can fail for multiple reasons: the user cancels the payment during one of the prompts, the user ID is wrong, the servers are down, the product was removed from the store, etc. Assuming the purchase is successful, the user’s account will be charged the cost of the product and the payment will be marked as successful. It’s at this time that you have to deliver the goods. Only once you’ve successfully done that can you go ahead and remove the payment from the queue. Otherwise, the next time the app starts, your code should detect a payment there and continue processing it. You really never want to get into a situation where the user pays for something and it’s never delivered to them.
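
The deliver-first, remove-after rule can be modeled outside StoreKit. What follows is only a minimal sketch in plain C++, not StoreKit code: `PendingPayment`, `processPending`, and the `deliver` callback are hypothetical names made up for the illustration, and the vector stands in for the persistent queue.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one entry in a persistent payment queue (not a StoreKit type).
struct PendingPayment {
    std::string productId;
    bool purchased;   // true once the store has approved and charged the payment
};

// Walks the pending queue and removes only the payments whose content was
// delivered successfully. Everything else stays queued for a later attempt,
// for example the next time the app starts. Returns how many were finished.
inline int processPending(std::vector<PendingPayment>& queue,
                          const std::function<bool(const std::string&)>& deliver) {
    assert(deliver && "a delivery callback is required");
    int finished = 0;
    for (auto it = queue.begin(); it != queue.end();) {
        if (it->purchased && deliver(it->productId)) {
            it = queue.erase(it);   // stands in for finishing the transaction
            ++finished;
        } else {
            ++it;                   // not approved yet, or delivery failed: keep it
        }
    }
    return finished;
}
```

If the delivery callback returns false for a product (a failed download, say), that payment is still in the queue afterwards, which is exactly the property that protects the user from paying for something they never receive.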

I’m happy with how the StoreKit API will keep track of past purchases, and if a user tries to buy the same item again, they’ll be given the option to redownload it for free. It’s a great relief not to have to keep track of that ourselves. It will also cut down on customer support emails a huge amount because users can always redownload something if they had to uninstall their app, or somehow, in spite of all our precautions, they bought something but it was never updated in their app. In that case a quick re-download should fix it (just make sure your code is pretty stateless and can deal with re-downloading the same content multiple times).
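
One way to keep re-downloads harmless is to make the install step idempotent. The sketch below is an assumption about how that could look, not the Flower Garden code: `InstalledContent` and `installProduct` are hypothetical names, and it only models non-consumable content (a consumable such as the fertilizer would need different bookkeeping).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical record of what is installed locally; not part of StoreKit.
struct InstalledContent {
    std::set<std::string> productIds;
};

// Installing is idempotent: running it again for the same product leaves the
// state exactly as it was. Returns true only if the product was newly added.
inline bool installProduct(InstalledContent& content, const std::string& productId) {
    assert(!productId.empty());
    return content.productIds.insert(productId).second;
}
```

Calling `installProduct` twice with the same id leaves a single entry, so a restored or repeated purchase cannot corrupt the local state.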

It gets a bit trickier though. It turns out the payment queue can have more than one payment in it. Why is that? First of all, because the buy process isn’t blocking the way Apple designed it, a hyperactive user could initiate multiple purchases at once. The other, more realistic situation is that the user initiates a purchase, it’s verified with the App Store, but before you can complete the download, your application is shut down. The StoreKit queue will be restored the next time the application starts, but at that point the user could go to the store again and start another purchase.

In any case, you really need to deal with that situation, otherwise you risk losing purchases and angering users (and you really don’t want to rely on users re-downloading something for free if you can avoid it). This is trickier than it sounds when you’re dealing with downloads from your own server because it might involve multiple downloads, multiple files, etc. So in my case, whenever I get a status update from the SKPaymentQueue, I add it to my own queue. Then I can process that queue sequentially, one purchase at a time.

- (void)paymentQueue:(SKPaymentQueue *)queue updatedTransactions:(NSArray *)transactions
{
    for (SKPaymentTransaction* transaction in transactions)
        [m_transactionQueue addObject:transaction];

    [self processNextTransaction];
}

- (void)processNextTransaction
{
    // Only one transaction is handled at a time; the rest wait in the queue.
    if (m_processingTransaction)
        return;

    while (m_transactionQueue.count > 0)
    {
        // Pop the oldest transaction.
        SKPaymentTransaction* transaction = [m_transactionQueue objectAtIndex:0];
        [m_transactionQueue removeObjectAtIndex:0];

        switch (transaction.transactionState)
        {
        case SKPaymentTransactionStatePurchased:
            m_processingTransaction = true;
            [self completeTransaction:transaction];
            return;
        case SKPaymentTransactionStateFailed:
            m_processingTransaction = true;
            [self failedTransaction:transaction];
            return;
        case SKPaymentTransactionStateRestored:
            m_processingTransaction = true;
            [self restoreTransaction:transaction];
            return;
        // Any other state needs no work yet: skip it and look at the next one.
        default: break;
        }
    }
}

Whenever a transaction is complete, you can flag it as finished and it will be completely removed from the payment queue:

[[SKPaymentQueue defaultQueue] finishTransaction:info.m_transaction];

A whole step I skipped for now is verifying the purchase with your server. I’ll cover that in the next part of this post when I talk about anti-piracy measures.

All About In-App Purchases Part 1: Displaying Store Items

Ever since Apple announced OS 3.0 with in-app purchases, I knew I had to implement it in Flower Garden. The concept of in-app purchases fits very well with the idea of a flower shop where virtual gardeners can purchase extra items for their gardens. It was just a matter of justifying the time necessary to implement it. I knew from my previous experience with downloadable content in games that only a small fraction of the people who originally purchased the game would be interested in buying additional content. Flower Garden has been extremely well received, both by the media and the players, but it never got high enough on the charts to be a big seller, so in-app purchases were doomed from the start to be very limited.

The situation improved as soon as Apple announced the availability of in-app purchases in free apps. Flower Garden Free had been out for a few weeks at that point, so that effectively doubled the potential audience.

The final consideration that convinced me to plunge ahead was the realization that while only a small percentage of users would buy extra content, a fraction of those would make multiple purchases, and some people would probably buy every available item. So maybe it would be worthwhile after all.

Having said that, I gave myself one week to completely implement in-app purchases in Flower Garden. That included everything, from implementing the actual purchasing through StoreKit, to server code, and, of course, creating the additional content itself.

For those of you familiar with Flower Garden, the new content I created specifically for in-app purchases was a flower shop offering:

  • An extra garden screen with 12 pots
  • Liquid plant fertilizer that speeds up plant growth. This is a consumable item with a fixed number of doses and can be repurchased after it’s used up.

Additionally, the flower shop in the free version of Flower Garden had these items that would bring its functionality up to par with the full version of Flower Garden:

  • Extra pots in the main garden
  • Extra common seeds
  • Set of bonus seeds

My initial estimate of one week turned out to be very optimistic, and it ended up taking closer to two and a half weeks. Most of it was spent creating the new content and integrating it smoothly in the game.

In this post and the next few, I’ll share my experiences with in-app purchases: implementation details, tips and tricks, how it might help with piracy, mistakes I made along the way, and even how many sales in-app purchases generated.

Displaying Store Items

Apple did a good job documenting the overview of how to implement in-app purchases. Some of the details are missing, but they’re small enough that I was able to fill in the blanks pretty easily and get things working in a couple of days.

However, I really think the process should have been implemented very differently on Apple’s part. The sequence of actions that needs to be done feels overly complicated. It may be great if someone is trying to do a very customized and integrated store with all sorts of crazy features, but really, in 99% of the cases, developers just want to sell something and be done with it. Forcing us to go through all those steps seems like overkill. A simplified higher-level set of helper functions would be a welcome addition to the SDK. For suggestions on how to go about it, look at the SDK for game consoles. They got that part right (or at least much more right than the iPhone SDK).

Getting the catalog

The first step towards implementing in-app purchases is to get a list of all the products you want to sell in your store. It’s a good idea to keep this list off your app and on a web server instead. That way you can add, remove, or edit any products without going through a full app update. That in itself makes in-app purchases very attractive, doesn’t it? You can have a look at the master Flower Garden shop catalog.

Getting product info

Step two is to go through each of those products, and get the official product information from the App Store. That involves creating a set with the product ids you want to query, and sending a request.

The code in Flower Garden to query for product ids looks like this:

// Collect the ids of every catalog product that isn't already bundled with the app.
NSArray* products = m_appData->m_shopCatalog.m_products;
NSMutableSet* productIds = [[NSMutableSet alloc] initWithCapacity:32];
for (ShopProduct* product in products)
{
    if (!product.m_alreadyIncluded)
        [productIds addObject:product.m_id];
}

// Ask the App Store for the official info on those products.
SKProductsRequest* request = [[SKProductsRequest alloc] initWithProductIdentifiers:productIds];
request.delegate = self;
#ifdef WORK_WITHOUT_APP_STORE
// No StoreKit available: call the delegate method directly with no response.
[self productsRequest:request didReceiveResponse:nil];
#else
[request start];
#endif
[productIds release];

There are a couple interesting things in that code. First of all, notice that before we ask for the product info, we check if the product is already included in the app. That’s because some items, such as the extra pots or the common seeds, already come as part of Flower Garden Full. But at the same time, I wanted to show them in the store and mark them as purchased. So I need to have a parallel path for those items that doesn’t go through the App Store. More on that in a future post.

The other interesting bit is the #ifdef. This one really sucks, so get ready: The simulator can’t make any StoreKit calls. It fails with a nice message box explaining why, but it fails nonetheless. To put it in highly technical terms: That blows! Seriously, that means you’re stuck developing on the device. Turnaround times are about 20-30 seconds, and debugging on the device directly is no fun at all. I’m convinced that alone was part of the reason it took me longer than my initial estimate. So that #ifdef was my attempt to try to get as much done on the simulator as possible. At least I was able to display the products on the flower shop. The actual buying had to be done on the device though. No way around it as far as I know. Why, oh, why, doesn’t Apple let us add a test account on the simulator? Maybe a feature for SDK 4.0?

Displaying the products

Some time after sending the request for info on the products, and assuming we have a valid internet connection and that the App Store is in a good mood (to their credit, Apple’s servers have been very reliable), we’ll get a response. The response will include more information for each of the products we requested, including localized name, description, and price. This is all information taken from the in-app purchases you created in iTunes Connect. And yes, that means that you need to keep multiple sets of data in sync: Your shop catalog and the in-app purchases in iTunes Connect.

Notice that because everything we’ve done so far has been asynchronous (and asynchronous is the name of the game for a lot of the remaining steps as well), it introduces a lot of complexity. What do we do during that time? Do we let the user wander around to other parts of the program? Do we block and make the UI modal? I went with the latter approach for simplicity, both in the code and in the UI for the user.

At this point you can display them in your shop. In my case I ended up using a very similar interface to the App Store on the iPhone. I figured users are already familiar with that kind of interface, so might as well build on top of that. So that meant the flower shop is a table showing all available products. Clicking on a cell brings up a view with details and a description of each product.

To make my life easier, the only elements that are hardwired on the product page are the icon, the title, and the price/purchase button. The actual description and screenshots are just an embedded web view, and part of the product description in the catalog has the correct URL for each product. You can even access some of them directly here.

So far all this has done for us is let us populate a list of products that are available to be sold. Next entry will cover the details of the actual transaction and a few things to watch out for.

Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)

Picture this: Toward the end of the development cycle, your game crawls, but you don’t see any obvious hotspots in the profiler. The culprit? Random memory access patterns and constant cache misses. In an attempt to improve performance, you try to parallelize parts of the code, but it takes heroic efforts, and, in the end, you barely get much of a speed-up due to all the synchronization you had to add. To top it off, the code is so complex that fixing bugs creates more problems, and the thought of adding new features is discarded right away. Sound familiar?
That scenario pretty accurately describes almost every game I’ve been involved with for the last 10 years. The reasons aren’t the programming languages we’re using, nor the development tools, nor even a lack of discipline. In my experience, it’s object-oriented programming (OOP) and the culture that surrounds it that is in large part to blame for those problems. OOP could be hindering your project rather than helping it!
It’s All About Data
OOP is so ingrained in the current game development culture that it’s hard to think beyond objects when thinking about a game. After all, we’ve been creating classes representing vehicles, players, and state machines for many years. What are the alternatives? Procedural programming? Functional languages? Exotic programming languages?
Data-oriented design is a different way to approach program design that addresses all these problems. Procedural programming focuses on procedure calls as its main element, and OOP deals primarily with objects. Notice that the main focus of both approaches is code: plain procedures (or functions) in one case, and grouped code associated with some internal state in the other. Data-oriented design shifts the perspective of programming from objects to the data itself: The type of the data, how it is laid out in memory, and how it will be read and processed in the game.
Programming, by definition, is about transforming data: It’s the act of creating a sequence of machine instructions describing how to process the input data and create some specific output data. A game is nothing more than a program that works at interactive rates, so wouldn’t it make sense for us to concentrate primarily on that data instead of on the code that manipulates it?
I’d like to clear up potential confusion and stress that data-oriented design does not imply that something is data-driven. A data-driven game is usually a game that exposes a large amount of functionality outside of code and lets the data determine the behavior of the game. That is an orthogonal concept to data-oriented design, and can be used with any type of programming approach.
Ideal Data
If we look at a program from the data point of view, what does the ideal data look like? It depends on the data and how it’s used. In general, the ideal data is in a format that we can use with the least amount of effort. In the best case, the format will be the same we expect as an output, so the processing is limited to just copying that data. Very often, our ideal data layout will be large blocks of contiguous, homogeneous data that we can process sequentially. In any case, the goal is to minimize the amount of transformations, and whenever possible, you should bake your data into this ideal format offline, during your asset-building process.
Because data-oriented design puts data first and foremost, we can architect our whole program around the ideal data format. We won’t always be able to make it exactly ideal (the same way that code is hardly ever by-the-book OOP), but it’s the primary goal to keep in mind. Once we achieve that, most of the problems I mentioned at the beginning of the column tend to melt away (more about that in the next section).
When we think about objects, we immediately think of trees (inheritance trees, containment trees, or message-passing trees), and our data is naturally arranged that way. As a result, when we perform an operation on an object, it will usually result in that object in turn accessing other objects further down in the tree. Iterating over a set of objects performing the same operation generates cascading, totally different operations at each object (see Figure 1a).
To achieve the best possible data layout, it’s helpful to break down each object into the different components, and group components of the same type together in memory, regardless of what object they came from. This organization results in large blocks of homogeneous data, which allow us to process the data sequentially (see Figure 1b). A key reason why data-oriented design is so powerful is because it works very well on large groups of objects. OOP, by definition, works on a single object. Step back for a minute and think of the last game you worked on: How many places in the code did you have only one of something? One enemy? One vehicle? One pathfinding node? One bullet? One particle? Never! Where there’s one, there are many. OOP ignores that and deals with each object in isolation. Instead, we can make things easy for us and for the hardware and organize our data to deal with the common case of having many items of the same type.
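
As a rough illustration of that regrouping, here is a minimal sketch using particles. The type and function names (`ParticleObject`, `ParticleBlocks`, `integratePositions`) are made up for the example, and a real system would likely use its own allocators rather than `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Object-style layout: each particle carries all of its fields together.
struct ParticleObject {
    float x, y;
    float vx, vy;
    float age;
};

// Data-oriented layout: one contiguous block per component type,
// regardless of which particle each value came from.
struct ParticleBlocks {
    std::vector<float> x, y;
    std::vector<float> vx, vy;
    std::vector<float> age;
};

// Advances every position by velocity * dt in one sequential pass.
// Only the four blocks this transform needs are touched; age is never loaded.
inline void integratePositions(ParticleBlocks& p, float dt) {
    const std::size_t count = p.x.size();
    assert(p.y.size() == count && p.vx.size() == count && p.vy.size() == count);
    for (std::size_t i = 0; i < count; ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
    }
}
```

With the object-style layout, the same loop would drag every particle's `age` through the cache even though the transform never reads it.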
Does this sound like a strange approach? Guess what? You’re probably already doing this in some parts of your code: The particle system! Data-oriented design is turning our whole codebase into a gigantic particle system. Perhaps a name for this approach that would be more familiar to game programmers would have been particle-driven programming.
Advantages of Data-Oriented Design
Thinking about data first and architecting the program based on that brings along lots of advantages.
Parallelization.
These days, there’s no way around the fact that we need to deal with multiple cores. Anyone who has tried taking some OOP code and parallelizing it can attest how difficult, error prone, and possibly not very efficient that is. Often you end up adding lots of synchronization primitives to prevent concurrent access to data from multiple threads, and usually a lot of the threads end up idling for quite a while waiting for other threads to complete. As a result, the performance improvement can be quite underwhelming.
When we apply data-oriented design, parallelization becomes a lot simpler: We have the input data, a small function to process it, and some output data. We can easily take something like that and split it among multiple threads with minimal synchronization between them. We can even take it further and run that code on processors with local memory (like the SPUs on the Cell processor) without having to do anything differently.
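
A minimal sketch of that split, assuming standard C++ threads rather than any particular job system. `scaleRange` and `scaleParallel` are hypothetical names, and the transform itself (multiplying by a factor) is just a placeholder for real work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// The transform: reads one input range and writes the matching output range.
inline void scaleRange(const float* in, float* out, std::size_t count, float factor) {
    for (std::size_t i = 0; i < count; ++i)
        out[i] = in[i] * factor;
}

// Splits the block into contiguous chunks, one per worker. The chunks never
// overlap, so the only synchronization needed is the join at the end.
inline void scaleParallel(const std::vector<float>& in, std::vector<float>& out,
                          float factor, unsigned workerCount) {
    assert(workerCount > 0);
    out.resize(in.size());
    const std::size_t chunk = (in.size() + workerCount - 1) / workerCount;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < in.size(); begin += chunk) {
        const std::size_t count = std::min(chunk, in.size() - begin);
        workers.emplace_back(scaleRange, in.data() + begin, out.data() + begin, count, factor);
    }
    for (std::thread& worker : workers)
        worker.join();
}
```

Because each worker only sees a pointer and a count, the same `scaleRange` function could in principle be handed a chunk that was copied into local memory first.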
Cache utilization.
In addition to using multiple cores, one of the keys to achieving great performance in modern hardware, with its deep instruction pipelines and slow memory systems with multiple levels of caches, is having cache-friendly memory access. Data-oriented design results in very efficient use of the instruction cache because the same code is executed over and over. Also, if we lay out the data in large, contiguous blocks, we can process the data sequentially, getting nearly perfect data cache usage and great performance.
Possible optimizations.
When we think of objects or functions, we tend to get stuck optimizing at the function or even the algorithm level: reordering some function calls, changing the sort method, or even rewriting some C code in assembly.
That kind of optimization is certainly beneficial, but by thinking about the data first we can step further back and make larger, more important optimizations. Remember that all a game does is transform some data (assets, inputs, state) into some other data (graphics commands, new game states). By keeping in mind that flow of data, we can make higher-level, more intelligent decisions based on how the data is transformed, and how it is used. That kind of optimization can be extremely difficult and time-consuming to implement with more traditional OOP methods.
Modularity.
So far, all the advantages of data-oriented design have been based around performance: cache utilization, optimizations, and parallelization. There is no doubt that as game programmers, performance is an extremely important goal for us. There is often a conflict between techniques that improve performance and techniques that help readability and ease of development. For example, re-writing some code in assembly language can result in a performance boost, but usually makes the code harder to read and maintain.
Fortunately, data-oriented design is beneficial to both performance and ease of development. When you write code specifically to transform data, you end up with small functions, with very few dependencies on other parts of the code. The codebase ends up being very “flat,” with lots of leaf functions without many dependencies. This level of modularity and lack of dependencies makes understanding, replacing, and updating the code much easier.
Testing.
The last major advantage of data-oriented design is ease of testing. As we saw in the June and August Inner Product columns, writing unit tests to check object interactions is not trivial. You need to set up mocks and test things indirectly. Frankly, it’s a bit of a pain. On the other hand, when dealing directly with data, it couldn’t be easier to write unit tests: Create some input data, call the transform function, and check that the output data is what we expect. There’s nothing else to it. This is actually a huge advantage and makes code extremely easy to test, whether you’re doing test-driven development or just writing unit tests after the code.
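
To make that concrete, here is a small sketch of a transform and its test. `applyDamage` and `testApplyDamage` are made-up examples, not code from any shipped game, and the test uses a plain boolean return instead of a specific unit-testing framework.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transform under test: subtracts damage from health, clamping at zero.
// Inputs are read-only; the output is a new block of the same length.
inline std::vector<int> applyDamage(const std::vector<int>& health,
                                    const std::vector<int>& damage) {
    assert(health.size() == damage.size());
    std::vector<int> result(health.size());
    for (std::size_t i = 0; i < health.size(); ++i) {
        const int remaining = health[i] - damage[i];
        result[i] = remaining > 0 ? remaining : 0;
    }
    return result;
}

// The whole test: build the input, call the transform, compare the output.
inline bool testApplyDamage() {
    const std::vector<int> health = {100, 20, 5};
    const std::vector<int> damage = {30, 20, 50};
    const std::vector<int> expected = {70, 0, 0};
    return applyDamage(health, damage) == expected;
}
```

There are no mocks and no setup beyond the three vectors, which is the point.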
Drawbacks of Data-Oriented Design
Data-oriented design is not the silver bullet to all the problems in game development. It does help tremendously writing high-performance code and making programs more readable and easier to maintain, but it does come with a few drawbacks of its own.
The main problem with data-oriented design is that it’s different from what most programmers are used to or learned in school. It requires turning our mental model of the program ninety degrees and changing how we think about it. It takes some practice before it becomes second-nature.
Also, because it’s a different approach, it can be challenging to interface with existing code, written in a more OOP or procedural way. It’s hard to write a single function in isolation, but as long as you can apply data-oriented design to a whole subsystem you should be able to reap a lot of the benefits.
Applying Data-Oriented Design
Enough of the theory and overview. How do you actually get started with data-oriented design? To start with, just pick a specific area in your code: navigation, animations, collisions, or something else. Later on, when most of your game engine is centered around the data, you can worry about data flow all the way from the start of a frame until the end.
The next step is to clearly identify the data inputs required by the system, and what kind of data it needs to generate. It’s OK to think about it in OOP terms for now, just to help us identify the data. For example, in an animation system, some of the input data is skeletons, base poses, animation data, and current state. The result is not “the code plays animations,” but the data generated by the animations that are currently playing. In this case, our outputs would be a new set of poses and an updated state.
It’s important to take a step further and classify the input data based on how it is used. Is it read-only, read-write, or write-only? That classification will help guide design decisions about where to store it, and when to process it depending on dependencies with other parts of the program.
At this point, stop thinking of the data required for a single operation, and think in terms of applying it to dozens or hundreds of entries. We no longer have one skeleton, one base pose, and a current state, and instead we have a block of each of those types with many instances in each of the blocks.
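
Here is one way those three steps could look for a drastically simplified animation system. Everything in it is an assumption for the sake of the example: the struct names are invented, and a single float stands in for a whole pose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified animation data: one float stands in for a pose.
struct AnimationInput {
    std::vector<float> basePose;      // read-only
    std::vector<float> playbackRate;  // read-only
};

struct AnimationState {
    std::vector<float> time;          // read-write: advanced every update
};

struct AnimationOutput {
    std::vector<float> pose;          // write-only: fully overwritten every update
};

// One call updates every instance in the blocks. The const qualifier on the
// input makes the read-only classification visible in the signature.
inline void updateAnimations(const AnimationInput& in, AnimationState& state,
                             AnimationOutput& out, float dt) {
    const std::size_t count = in.basePose.size();
    assert(in.playbackRate.size() == count && state.time.size() == count);
    out.pose.resize(count);
    for (std::size_t i = 0; i < count; ++i) {
        state.time[i] += in.playbackRate[i] * dt;
        out.pose[i] = in.basePose[i] + state.time[i];
    }
}
```

The function never asks "which animation is this?"; it just walks the blocks from start to end.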
Think very carefully how the data is used during the transformation process from input to output. You might realize that you need to scan a particular field in a structure to perform a pass on the data, and then you need to use the results to do another pass. In that case, it might make more sense to split that initial field into a separate block of memory that can be processed independently, allowing for better cache utilization and potential parallelization. Or maybe you need to vectorize some part of the code, which requires fetching data from different locations to put it in the same vector register. In that case, that data can be stored contiguously so vector operations can be applied directly, without any extra transformations.
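
The two-pass case might be sketched like this. The names (`EntityBlocks`, `collectActive`, `regenerate`) and the fields are hypothetical; the point is only that the field scanned in the first pass lives in its own block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entity data after the split: the flag scanned by the first pass
// lives in its own tightly packed block, apart from the data used in pass two.
struct EntityBlocks {
    std::vector<std::uint8_t> active;  // scanned by pass one
    std::vector<float> health;         // touched only for the indices pass one selects
};

// Pass one: scan just the flags and collect the indices that need work.
inline std::vector<std::size_t> collectActive(const EntityBlocks& e) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < e.active.size(); ++i)
        if (e.active[i] != 0)
            indices.push_back(i);
    return indices;
}

// Pass two: use the results of pass one to update only the selected entries.
inline void regenerate(EntityBlocks& e, const std::vector<std::size_t>& indices, float amount) {
    for (std::size_t index : indices) {
        assert(index < e.health.size());
        e.health[index] += amount;
    }
}
```

The first pass reads one byte per entity instead of pulling in a whole structure to look at a single field.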
Now you should have a very good understanding of your data. Writing the code to transform it is going to be much simpler. It’s like writing code by filling in the blanks. You’ll even be pleasantly surprised to realize that the code is much simpler and smaller than you thought in the first place, compared to what the equivalent OOP code would have been.
If you think back about most of the topics we’ve covered in this column over the last year, you’ll see that they were all leading toward this type of design. Now it’s the time to be careful about how the data is aligned (Dec 2008 and Jan 2009), to bake data directly into an input format that you can use efficiently (Oct and Nov 2008), or to use non-pointer references between data blocks so they can be easily relocated (Sept 2009).
Is There Room For OOP?
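
As a reminder of what a non-pointer reference can look like, here is a minimal sketch with bones that refer to their parents by index. `BoneBlock`, `kNoParent`, and `worldOffset` are invented for the example, and one float stands in for a full transform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each bone refers to its parent by index into the same block, not by pointer.
// kNoParent is a hypothetical sentinel for root bones.
constexpr std::uint32_t kNoParent = 0xFFFFFFFFu;

struct BoneBlock {
    std::vector<std::uint32_t> parent;  // index of the parent bone, or kNoParent
    std::vector<float> localOffset;     // one float stands in for a transform
};

// Accumulates offsets up the parent chain. Because the links are indices,
// the result is the same wherever the block happens to live in memory.
inline float worldOffset(const BoneBlock& bones, std::uint32_t bone) {
    float total = 0.0f;
    while (bone != kNoParent) {
        assert(bone < bones.parent.size());
        total += bones.localOffset[bone];
        bone = bones.parent[bone];
    }
    return total;
}
```

Copying or moving the block to a different address leaves every link valid, which would not be true if the bones stored raw pointers to each other.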
Does this mean that OOP is useless and you should never apply it in your programs? I’m not quite ready to say that. Thinking in terms of objects is not detrimental when there is only one of each object (a graphics device, a log manager, etc) although in that case you might as well write it with simpler C-style functions and file-level static data. Even in that situation, it’s still important that those objects are designed around transforming data.
Another situation where I still find myself using OOP is GUI systems. Maybe it’s because you’re working with a system that is already designed in an object-oriented way, or maybe it’s because performance and complexity are not crucial factors with GUI code. In any case, I much prefer GUI APIs that are light on inheritance and use containment as much as possible (Cocoa and CocoaTouch are good examples of this). It’s very possible that a data-oriented GUI system could be written for games that would be a pleasure to work with, but I haven’t seen one yet.
Finally, there’s nothing stopping you from still having a mental picture of objects if that’s the way you like to think about the game. It’s just that the enemy entity won’t be all in the same physical location in memory. Instead, it will be split up into smaller subcomponents, each one forming part of a larger data table of similar components.
Data-oriented design is a bit of a departure from traditional programming approaches, but by always thinking about the data and how it needs to be transformed, you’ll be able to reap huge benefits both in terms of performance and ease of development.
Thanks to Mike Acton and Jim Tilander for challenging my ideas over the years and for their feedback on this article.

Picture this: Toward the end of the development cycle, your game crawls, but you don’t see any obvious hotspots in the profiler. The culprit? Random memory access patterns and constant cache misses. In an attempt to improve performance, you try to parallelize parts of the code, but it takes heroic efforts, and, in the end, you barely get much of a speed-up due to all the synchronization you had to add. To top it off, the code is so complex that fixing bugs creates more problems, and the thought of adding new features is discarded right away. Sound familiar?

That scenario pretty accurately describes almost every game I’ve been involved with for the last 10 years. The reasons aren’t the programming languages we’re using, nor the development tools, nor even a lack of discipline. In my experience, it’s object-oriented programming (OOP) and the culture that surrounds it that is in large part to blame for those problems. OOP could be hindering your project rather than helping it!

It’s All About Data

OOP is so ingrained in the current game development culture that it’s hard to think beyond objects when thinking about a game. After all, we’ve been creating classes representing vehicles, players, and state machines for many years. What are the alternatives? Procedural programming? Functional languages? Exotic programming languages?

Data-oriented design is a different way to approach program design that addresses all these problems. Procedural programming focuses on procedure calls as its main element, and OOP deals primarily with objects. Notice that the main focus of both approaches is code: plain procedures (or functions) in one case, and grouped code associated with some internal state in the other. Data-oriented design shifts the perspective of programming from objects to the data itself: The type of the data, how it is laid out in memory, and how it will be read and processed in the game.

Programming, by definition, is about transforming data: It’s the act of creating a sequence of machine instructions describing how to process the input data and create some specific output data. A game is nothing more than a program that works at interactive rates, so wouldn’t it make sense for us to concentrate primarily on that data instead of on the code that manipulates it?

I’d like to clear up potential confusion and stress that data-oriented design does not imply that something is data-driven. A data-driven game is usually a game that exposes a large amount of functionality outside of code and lets the data determine the behavior of the game. That is an orthogonal concept to data-oriented design, and can be used with any type of programming approach.

Ideal Data

Figure 1a. Call sequence with an object-oriented approach

If we look at a program from the data point of view, what does the ideal data look like? It depends on the data and how it’s used. In general, the ideal data is in a format that we can use with the least amount of effort. In the best case, the format will be the same we expect as an output, so the processing is limited to just copying that data. Very often, our ideal data layout will be large blocks of contiguous, homogeneous data that we can process sequentially. In any case, the goal is to minimize the amount of transformations, and whenever possible, you should bake your data into this ideal format offline, during your asset-building process.

Because data-oriented design puts data first and foremost, we can architect our whole program around the ideal data format. We won’t always be able to make it exactly ideal (the same way that code is hardly ever by-the-book OOP), but it’s the primary goal to keep in mind. Once we achieve that, most of the problems I mentioned at the beginning of the column tend to melt away (more about that in the next section).

When we think about objects, we immediately think of trees: inheritance trees, containment trees, or message-passing trees, and our data is naturally arranged that way. As a result, when we perform an operation on an object, it will usually result in that object in turn accessing other objects further down in the tree. Iterating over a set of objects performing the same operation generates cascading, totally different operations at each object (see Figure 1a).

Figure 1b. Call sequence with a data-oriented approach

To achieve the best possible data layout, it’s helpful to break down each object into its different components, and group components of the same type together in memory, regardless of what object they came from. This organization results in large blocks of homogeneous data, which allow us to process the data sequentially (see Figure 1b).

A key reason why data-oriented design is so powerful is that it works very well on large groups of objects. OOP, by definition, works on a single object. Step back for a minute and think of the last game you worked on: How many places in the code did you have only one of something? One enemy? One vehicle? One pathfinding node? One bullet? One particle? Never! Where there’s one, there are many. OOP ignores that and deals with each object in isolation. Instead, we can make things easy for us and for the hardware and organize our data to deal with the common case of having many items of the same type.
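As a rough illustration of that grouping, here is a minimal C++ sketch. The component types, the `EnemyTable` name, and the `Integrate` function are all hypothetical and only meant to show the layout, not any particular engine’s design. Each component type lives in its own contiguous block, and entry i of every block belongs to the same logical enemy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical components. In an OOP design these would be members of a
// single Enemy class.
struct Position { float x, y; };
struct Velocity { float dx, dy; };

// One contiguous block per component type. Entry i of every block belongs
// to the same logical enemy.
struct EnemyTable {
    std::vector<Position> positions;
    std::vector<Velocity> velocities;
    std::vector<int>      health;
};

// Adds one enemy and returns its index in the table.
std::size_t AddEnemy(EnemyTable& table, Position p, Velocity v, int hp) {
    table.positions.push_back(p);
    table.velocities.push_back(v);
    table.health.push_back(hp);
    return table.positions.size() - 1;
}

// Moves every enemy in one sequential pass. The health block is never
// touched, so it never has to be pulled into the cache here.
void Integrate(EnemyTable& table, float dt) {
    assert(table.positions.size() == table.velocities.size());
    for (std::size_t i = 0; i < table.positions.size(); ++i) {
        table.positions[i].x += table.velocities[i].dx * dt;
        table.positions[i].y += table.velocities[i].dy * dt;
    }
}
```

The “enemy” still exists as an idea (it is simply an index), but the update loop only ever walks the blocks it actually needs.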

Does this sound like a strange approach? Guess what? You’re probably already doing this in some parts of your code: The particle system! Data-oriented design is turning our whole codebase into a gigantic particle system. Perhaps a name for this approach that would be more familiar to game programmers would have been particle-driven programming.

Advantages of Data-Oriented Design

Thinking about data first and architecting the program based on that brings along lots of advantages.

Parallelization.

These days, there’s no way around the fact that we need to deal with multiple cores. Anyone who has tried taking some OOP code and parallelizing it can attest to how difficult, error-prone, and possibly not very efficient that is. Often you end up adding lots of synchronization primitives to prevent concurrent access to data from multiple threads, and usually a lot of the threads end up idling for quite a while waiting for other threads to complete. As a result, the performance improvement can be quite underwhelming.

When we apply data-oriented design, parallelization becomes a lot simpler: We have the input data, a small function to process it, and some output data. We can easily take something like that and split it among multiple threads with minimal synchronization between them. We can even take it further and run that code on processors with local memory (like the SPUs on the Cell processor) without having to do anything differently.
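To make the “input data, small function, output data” shape concrete, here is a minimal sketch using standard C++ threads. The function names and the trivial scaling transform are made up for the example. Each worker gets its own contiguous slice of the input and writes to the matching slice of the output, so the only synchronization is the final join.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// The small transform: reads in[begin, end) and writes out[begin, end).
void ScaleRange(const float* in, float* out,
                std::size_t begin, std::size_t end, float factor) {
    for (std::size_t i = begin; i < end; ++i)
        out[i] = in[i] * factor;
}

// Splits the input into one contiguous chunk per worker. No two workers
// write to the same element, so no locks are needed.
std::vector<float> ScaleParallel(const std::vector<float>& in, float factor,
                                 std::size_t workerCount) {
    assert(workerCount > 0);
    std::vector<float> out(in.size());
    const std::size_t chunk = (in.size() + workerCount - 1) / workerCount;

    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(w * chunk, in.size());
        const std::size_t end = std::min(begin + chunk, in.size());
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back(ScaleRange, in.data(), out.data(),
                             begin, end, factor);
    }
    for (std::thread& t : workers) t.join();
    return out;
}
```

A real engine would more likely hand these chunks to a job system than spawn threads per call, but the shape of the work stays the same.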

Cache utilization.

In addition to using multiple cores, one of the keys to achieving great performance in modern hardware, with its deep instruction pipelines and slow memory systems with multiple levels of caches, is having cache-friendly memory access. Data-oriented design results in very efficient use of the instruction cache because the same code is executed over and over. Also, if we lay out the data in large, contiguous blocks, we can process the data sequentially, getting nearly perfect data cache usage and great performance.

Possible optimizations.

When we think of objects or functions, we tend to get stuck optimizing at the function or even the algorithm level: reordering some function calls, changing the sort method, or even re-writing some C code with assembly.

That kind of optimization is certainly beneficial, but by thinking about the data first we can step further back and make larger, more important optimizations. Remember that all a game does is transform some data (assets, inputs, state) into some other data (graphics commands, new game states). By keeping in mind that flow of data, we can make higher-level, more intelligent decisions based on how the data is transformed, and how it is used. That kind of optimization can be extremely difficult and time-consuming to implement with more traditional OOP methods.

Modularity.

So far, all the advantages of data-oriented design have been based around performance: cache utilization, optimizations, and parallelization. There is no doubt that as game programmers, performance is an extremely important goal for us. There is often a conflict between techniques that improve performance and techniques that help readability and ease of development. For example, re-writing some code in assembly language can result in a performance boost, but usually makes the code harder to read and maintain.

Fortunately, data-oriented design is beneficial to both performance and ease of development. When you write code specifically to transform data, you end up with small functions, with very few dependencies on other parts of the code. The codebase ends up being very “flat,” with lots of leaf functions without many dependencies. This level of modularity and lack of dependencies makes understanding, replacing, and updating the code much easier.

Testing.

The last major advantage of data-oriented design is ease of testing. As we saw in the June and August Inner Product columns, writing unit tests to check object interactions is not trivial. You need to set up mocks and test things indirectly. Frankly, it’s a bit of a pain. On the other hand, when dealing directly with data, it couldn’t be easier to write unit tests: Create some input data, call the transform function, and check that the output data is what we expect. There’s nothing else to it. This is actually a huge advantage and makes code extremely easy to test, whether you’re doing test-driven development or just writing unit tests after the code.
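Here is a small sketch of what such a test can look like, without any test framework. `ApplyDamage` is a hypothetical transform invented for the example. The test just builds input blocks, calls the function, and compares the output block against the expected values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical transform: subtracts damage from health, clamping at zero.
std::vector<int> ApplyDamage(const std::vector<int>& health,
                             const std::vector<int>& damage) {
    assert(health.size() == damage.size());
    std::vector<int> result(health.size());
    for (std::size_t i = 0; i < health.size(); ++i)
        result[i] = std::max(0, health[i] - damage[i]);
    return result;
}

// The whole test: create input data, call the transform, check the output.
// No mocks and no indirect observation of object interactions.
bool TestApplyDamageClampsAtZero() {
    const std::vector<int> health   = {10, 5, 3};
    const std::vector<int> damage   = {4, 5, 9};
    const std::vector<int> expected = {6, 0, 0};
    return ApplyDamage(health, damage) == expected;
}
```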

Drawbacks of Data-Oriented Design

Data-oriented design is not the silver bullet to all the problems in game development. It helps tremendously with writing high-performance code and making programs more readable and easier to maintain, but it does come with a few drawbacks of its own.

The main problem with data-oriented design is that it’s different from what most programmers are used to or learned in school. It requires turning our mental model of the program ninety degrees and changing how we think about it. It takes some practice before it becomes second nature.

Also, because it’s a different approach, it can be challenging to interface with existing code, written in a more OOP or procedural way. It’s hard to write a single function in isolation, but as long as you can apply data-oriented design to a whole subsystem you should be able to reap a lot of the benefits.

Applying Data-Oriented Design

Enough of the theory and overview. How do you actually get started with data-oriented design? To start with, just pick a specific area in your code: navigation, animations, collisions, or something else. Later on, when most of your game engine is centered around the data, you can worry about data flow all the way from the start of a frame until the end.

The next step is to clearly identify the data inputs required by the system, and what kind of data it needs to generate. It’s OK to think about it in OOP terms for now, just to help us identify the data. For example, in an animation system, some of the input data is skeletons, base poses, animation data, and current state. The result is not “the code plays animations,” but the data generated by the animations that are currently playing. In this case, our outputs would be a new set of poses and an updated state.

It’s important to take a step further and classify the input data based on how it is used. Is it read-only, read-write, or write-only? That classification will help guide design decisions about where to store it, and when to process it depending on dependencies with other parts of the program.

At this point, stop thinking of the data required for a single operation, and think in terms of applying it to dozens or hundreds of entries. We no longer have one skeleton, one base pose, and a current state, and instead we have a block of each of those types with many instances in each of the blocks.
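One way to picture the result of these steps is a single function signature that spells out the classification. The sketch below is a heavily simplified, hypothetical stand-in for an animation update: clips are read-only, playback states are read-write, and a normalized phase per instance plays the role of the write-only pose output. All names are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct AnimationClip { float duration; };  // read-only input

struct PlaybackState {                     // read-write state
    std::uint32_t clip;                    // index into the clip block
    float time;                            // seconds into the clip
};

// Advances every playing animation in one pass.
//   clips     : read-only, shared by all instances
//   states    : read-write, one entry per instance
//   phasesOut : write-only, a stand-in for the generated poses (0..1)
void AdvanceAnimations(const std::vector<AnimationClip>& clips,
                       std::vector<PlaybackState>& states,
                       float dt,
                       std::vector<float>& phasesOut) {
    phasesOut.resize(states.size());
    for (std::size_t i = 0; i < states.size(); ++i) {
        assert(states[i].clip < clips.size());
        const float duration = clips[states[i].clip].duration;
        assert(duration > 0.0f);
        // Loop the animation by wrapping the time around the clip length.
        states[i].time = std::fmod(states[i].time + dt, duration);
        phasesOut[i] = states[i].time / duration;
    }
}
```

The `const` qualifiers make the read-only block obvious to both the reader and the compiler, and the function works on the whole block of instances rather than on one animation at a time.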

Think very carefully how the data is used during the transformation process from input to output. You might realize that you need to scan a particular field in a structure to perform a pass on the data, and then you need to use the results to do another pass. In that case, it might make more sense to split that initial field into a separate block of memory that can be processed independently, allowing for better cache utilization and potential parallelization. Or maybe you need to vectorize some part of the code, which requires fetching data from different locations to put it in the same vector register. In that case, that data can be stored contiguously so vector operations can be applied directly, without any extra transformations.
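As a sketch of the first case, imagine a particle block where one pass scans only the remaining lifetime and a second pass uses the results. The names below are hypothetical. The scanned field lives in its own tightly packed array, so the first pass never drags positions, velocities, or colors through the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Everything pass 1 does not need to look at.
struct ParticleBody {
    float position[3];
    float velocity[3];
    float color[4];
};

struct ParticleBlock {
    std::vector<float>        timeLeft;  // scanned by pass 1
    std::vector<ParticleBody> bodies;    // only touched by pass 2
};

// Pass 1: scan the small, contiguous field and record which entries expired.
std::vector<std::size_t> FindExpired(const std::vector<float>& timeLeft) {
    std::vector<std::size_t> expired;
    for (std::size_t i = 0; i < timeLeft.size(); ++i)
        if (timeLeft[i] <= 0.0f)
            expired.push_back(i);
    return expired;
}

// Pass 2: use the results of pass 1 to touch only the bodies that need work.
void ResetBodies(std::vector<ParticleBody>& bodies,
                 const std::vector<std::size_t>& expired) {
    for (std::size_t index : expired) {
        assert(index < bodies.size());
        bodies[index] = ParticleBody{};  // zero position, velocity, and color
    }
}
```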

Now you should have a very good understanding of your data. Writing the code to transform it is going to be much simpler. It’s like writing code by filling in the blanks. You’ll even be pleasantly surprised to realize that the code is much simpler and smaller than you thought in the first place, compared to what the equivalent OOP code would have been.

If you think back about most of the topics we’ve covered in this column over the last year, you’ll see that they were all leading toward this type of design. Now it’s the time to be careful about how the data is aligned (Dec 2008 and Jan 2009), to bake data directly into an input format that you can use efficiently (Oct and Nov 2008), or to use non-pointer references between data blocks so they can be easily relocated (Sept 2009).
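For readers who missed that column, a non-pointer reference can be as simple as an index into the same block. The sketch below is not the code from the earlier column, just a minimal example with made-up names. Because bones refer to their parents by index, the whole block can be copied or moved in memory without fixing anything up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sentinel index meaning "this bone is a root".
constexpr std::uint32_t kNoParent = 0xFFFFFFFFu;

struct Bone {
    std::uint32_t parent;  // index into the same block, not a pointer
    float length;
};

// Sums bone lengths from the given bone up to the root by following
// parent indices.
float ChainLength(const std::vector<Bone>& bones, std::uint32_t index) {
    float total = 0.0f;
    while (index != kNoParent) {
        assert(index < bones.size());
        total += bones[index].length;
        index = bones[index].parent;
    }
    return total;
}
```

Had `parent` been a `Bone*`, copying the block to a new address would have left every reference pointing at the old memory.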

Is There Room For OOP?

Does this mean that OOP is useless and you should never apply it in your programs? I’m not quite ready to say that. Thinking in terms of objects is not detrimental when there is only one of each object (a graphics device, a log manager, etc.), although in that case you might as well write it with simpler C-style functions and file-level static data. Even in that situation, it’s still important that those objects are designed around transforming data.

Another situation where I still find myself using OOP is GUI systems. Maybe it’s because you’re working with a system that is already designed in an object-oriented way, or maybe it’s because performance and complexity are not crucial factors with GUI code. In any case, I much prefer GUI APIs that are light on inheritance and use containment as much as possible (Cocoa and CocoaTouch are good examples of this). It’s very possible that a data-oriented GUI system could be written for games that would be a pleasure to work with, but I haven’t seen one yet.

Finally, there’s nothing stopping you from still having a mental picture of objects if that’s the way you like to think about the game. It’s just that the enemy entity won’t be all in the same physical location in memory. Instead, it will be split up into smaller subcomponents, each one forming part of a larger data table of similar components.

Data-oriented design is a bit of a departure from traditional programming approaches, but by always thinking about the data and how it needs to be transformed, you’ll be able to reap huge benefits both in terms of performance and ease of development.

Thanks to Mike Acton and Jim Tilander for challenging my ideas over the years and for their feedback on this article.

This article was originally printed in the September 2009 issue of Game Developer.

Here’s a Korean translation of this article by Hakkyu Kim.

Two Day iPhone OpenGL Class Coming to The Bay Area

After the success of the OpenGL class in Denver, we’re bringing the iPhone OpenGL class to the Bay Area. It will be held November 19th and 20th in Cupertino, right next to Apple’s headquarters at Infinite Loop.

The class is aimed at iPhone developers without previous OpenGL or 3D graphics experience. As part of the class, we’ll create both 2D and 3D OpenGL applications, and we’ll cover a broad range of topics, from the basics of setting up OpenGL and rendering triangles on screen to multitexturing and point sprites on the last day. This is definitely a hands-on class, so you’ll need to bring your laptop and be ready to do some coding.

Check out the Mobile Orchard page for discounts and registration details. Feel free to contact me if you have any questions.

Hope to see you there!

Space in Stereo iPhone Game Jam Postmortem

A week ago, I sent out a quick tweet asking if anyone would be interested in doing an iPhone Game Jam at the 360iDev conference. The response was immediate and hugely positive, so, with the help of the organizers of 360iDev, we put together an informal iPhone Game Jam.

The idea was to get together Tuesday evening, starting at around 7PM, and to code all night and have an iPhone game (or at least a prototype) done by morning. About 25 people showed up, working on about a dozen projects. Participants were welcome to group into teams or work solo. There were no restrictions as far as themes or technology. The only rules were that you had to finish something by morning (no leaving something that was 5% of a game) and you had to start the game from scratch (no finishing a game you had started a while ago).
