Break That Thumb For Best iPhone Performance

I haven’t started doing any real performance optimizations on my iPhone app yet. I pretty much ported it from the original PC version, saw that it was running rather slowly (2-3 fps), and cut the number of elements on the screen until it reached an “acceptable” 15 fps. I left it there until I had time to come back and optimize things towards the end.

I even ran it through some of the iPhone performance gathering tools and saw that it was spending half the time in the simulation and half the time rendering. A bit surprising, but it’s not like I had much to compare it against since this was my first performance-intensive iPhone app.

That was until this morning. Prompted by some of the comments in my previous post, I did a quick search about the iPhone’s floating-point performance. I had heard conflicting information on it: Does it have hardware floating-point support? Do you have to use fixed-point numbers for best performance? Apple is very tight-lipped about the specific hardware specs of the iPhone, which seems very strange to those of us coming from a console background. But people have been able to determine that the CPU is a 32-bit RISC ARM1176JZF. The good news is that it has a full floating-point unit, so we can continue writing math and physics code the way we do on most platforms.

The ARM CPU also has an architecture extension called Thumb. It seems to be a special set of 16-bit instructions that is claimed to improve performance. I imagine the performance increase comes from a smaller code footprint and faster code fetching and processing. As a bonus, you also get a smaller memory footprint, so it seems like a win for a lot of mobile platforms. Xcode comes with the option to generate Thumb code turned on by default.

But, and this is a huge but, it seems that floating-point operations cause the program to switch back and forth between Thumb mode and regular 32-bit mode. It would be interesting to look at the generated assembly, but I haven’t had time to do that yet. So the more floating-point calculations you do, the less of a performance gain you’ll get from Thumb optimizations. Or, in the extreme, you might even get a performance degradation.

Most 3D games are very floating-point intensive, and my app is no different, so I decided to turn off Thumb code generation on a whim. The results:

  • Thumb code generation ON (default): 15 fps
  • Thumb code generation OFF: 39 fps

Whoa!!! That’s a saving of 41 ms per frame! That has to be the optimization with the most bang for the time spent on it that I’ve ever done. This also probably means that my app is now totally render-bound, which is good news. I’m sure I can optimize tons of stuff there.
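For anyone who wants to check the 41 ms figure, here is a minimal sketch of the arithmetic in C. The function names are made up for illustration and are not from the app; the only inputs are the two frame rates measured above.

```c
/* Convert a frame rate (frames per second) into milliseconds per frame. */
static double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

/* Milliseconds saved per frame when the frame rate rises
 * from fps_before to fps_after. */
static double frame_time_saved_ms(double fps_before, double fps_after)
{
    return frame_time_ms(fps_before) - frame_time_ms(fps_after);
}
```

Calling frame_time_saved_ms(15.0, 39.0) gives roughly 66.7 ms minus 25.6 ms, which is where the 41 ms comes from.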

So if you’re doing any kind of a 3D game, turn off Thumb code generation. Now!

Edit: I realized I never explained how to turn the Thumb code generation off! Oops. Go to your project settings, add a new User-Defined setting called: GCC_THUMB_SUPPORT and set it to NO. That simple (but surprisingly there wasn’t an already existing setting to check it on and off).
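If you want to confirm that the setting actually took effect, one option is to check the __thumb__ macro, which GCC predefines when it is generating Thumb code. The sketch below is only an illustration: the helper names are hypothetical, and it assumes you call them from somewhere convenient (a log statement at startup, for example).

```c
/* Returns 1 if this file was compiled to Thumb instructions, 0 otherwise.
 * Relies on __thumb__, which the compiler predefines in Thumb mode. */
static int compiled_as_thumb(void)
{
#if defined(__thumb__)
    return 1;
#else
    return 0;
#endif
}

/* Human-readable label for logging (hypothetical helper). */
static const char *instruction_set_name(void)
{
    return compiled_as_thumb() ? "Thumb" : "not Thumb";
}
```

With GCC_THUMB_SUPPORT set to NO, a device build should report "not Thumb". Keep in mind that the macro reflects how each source file was compiled, so it only tells you about the file the check lives in.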


  • Scott Brooks

    The floating point unit is the VFP, so a floating point multiply loads the operands into the vector processor, then does the multiply in the vector unit, and then copies the result back out. Which is kind of bad, but not nearly as bad as if you are in Thumb code. Then it’s a function call to dyld_stub___mulsf3vfp.

    We found a huge improvement writing our skinning loop directly for the VFP.

  • http://www.gamesfromwithin.com Noel

    Scott, do you know if there are any matrix/vector math libraries using the VFP on the iPhone out there? I’d love to change my matrix multiplies to use it. [Edit: I just found vfpmathlibrary. Time to check it out.]

  • Brad

    To switch between THUMB and ARM code the processor does a branch. If I recall, Thumb code has a branch address with the lowest bit set – the instructions are at the 16 bit aligned address & the bit indicates Thumb mode. So, if you are constantly swapping in and out of Thumb mode, then you are branching like crazy.
    Thumb code normally results in a 30% decrease in code size & a small percentage loss in performance.

  • http://stevestreza.com/ Steve

    Also, Core Animation does everything in floating point and is OpenGL-based. So Core Animation-based games and apps could be better off by Thumb-breaking. Of course, do your profiling.

  • Andrew Shuttlewood

    The nice thing is that the latest and greatest ARM processors (like the Cortex-A8) have ‘Thumb-2’, which has 16-bit instructions for 90% of stuff, but also supports floating point instructions without changing mode.

    The only downside is that the Cortex-A8 has different floating point performance than the ARM11 (quite a bit worse if you don’t use the SIMD float instructions).

  • Dan

    @Brad: Well, yes and no. Basically, you don’t have individual modules/routines switching between Thumb and ARM. Usually a single compilation unit will be done in Thumb or ARM mode. (i.e. minimum one function, perhaps a set of functions, whatever is in the file.)

    So when ARM func A() calls (branches) to Thumb code B(), it simultaneously switches modes. Same thing on the way out.

    In other words, it was going to take the branch anyway. It’s just a question of which mode the CPU is in once it lands in the new routine.

  • cak

    Great post, would have been better if you supplied some info on how to switch it off.
