Gamasutra: The Art & Business of Making Gamesspacer
Solving Smartphone Performance Problems

Printer-Friendly VersionPrinter-Friendly Version
View All     RSS
April 24, 2014
arrowPress Releases
April 24, 2014
PR Newswire
View All





If you enjoy reading this site, you might also want to check out these UBM TechWeb sites:


 
Solving Smartphone Performance Problems

October 3, 2012 Article Start Page 1 of 3 Next
 

Having issues with performance on your smartphone game? In this technical article, MoMinis programmer Itay Duvdevani explains how he and his colleagues solved what should have been a very simple sorting problem creatively, and upgraded their smartphone game engine in the process.

Working on the phone side of the MoMinis Platform is an interesting job. Since our content developers work mostly on the Simulator for game mechanics, bugs and performance tuning -- we always get a little nervous when a title becomes ready to start testing on actual phones.

Compilation of a game can reveal bugs (either in the simulator or the compiler) that will break the game's logic, or performance problems that aren't simulated in the Simulator correctly.

When a game works in the Simulator, but not on a phone, it usually finds its way to my team pretty quickly -- As compiler developers, we're the only ones that can pinpoint the problem.

The Problem with Jelly Jump

Our content developers were working on a new title -- Jelly Jump. After several weeks of development, and a week or so from release, the game became mature enough to start testing on actual phones. On higher-end phones, the game ran smoothly and was actually very fun. However, on mid-range and low-end devices, we experienced random frame rate jitters that made the game unplayable. My team got to handle the situation.

We observed the problem on a relatively capable device (800MHz Snapdragon with Adreno 200 GPU). This wasn't a low FPS problem, but FPS jitter that occurred every few seconds, in what looked like random occurrences. The game would run at 40-50 FPS, then dip to 10-5 FPS for a second or so, and then continue.

My team was tipped off by the content developers about some testing they'd done, which showed that removal of graphics assets made the game run better.

At compile time, the asset-generation stage is responsible for distributing the game's assets to texture atlases. Since this is an automatic process, it uses several heuristics to predict which objects will be drawn together, to minimize texture swapping. These heuristics are currently pretty basic, and rely on static analysis only -- we don't have performance-guided optimization facilities as of today. We suspected the assets were being grouped in an inefficient manner, causing texture thrashing due to memory exhaustion every time certain objects were being drawn.

This was a problem. We had no "magic solution" or even a workaround for this -- we didn't even have a quick 'n' dirty solution. The game simply had too many graphics for our asset-generation pipeline.

After spending a few days figuring out how to tackle the problem, yielding nothing practical for the time we had -- we started doubting our assumptions, and tested whether there even was a problem with the textures. We configured the asset-generation pipeline to downscale all the graphics to a quarter of their size, making all the game's assets fit in a single texture. To our surprise, the game ran just as bad (and was very ugly).

That meant that the game's logic was the bottleneck. We tried profiling the game with the Dalvik profiler. Unfortunately, profilers are very useful when you debug a low-FPS problem, but almost useless when you debug an FPS jitter problem, unless you can predict exactly when your game is going to jitter -- and of course, if we knew that, we probably would have known what the problem was.

Sitting down with the content-developers, hearing about the game's logic -- and using our intuition to plant some debug prints in the code, we came to a surprising conclusion -- our Z-ordering code was misbehaving when a large amount of object were being created in a single logical iteration!

Why Keep Things in Z-Order?

It should come as no surprise that some of the tasks our game-engine is responsible for, except driving the game's compiled logic, is to draw pretty things to the screen, and respond to touch input.

For a 2D engine like the MoMinis Platform, objects need to be arranged according to their Z-order and traversed (either in a back-to-front or front-to-back order), to decide which objects receive touch input, and which get drawn first.

The Touch Handler

Our touch handler is a simple adapter which expects screen-level touch events from the OS, and dispatches game-level events -- so when I click at a certain location on the screen, the object at that location receives a logical touch-down-event instead of just "The user clicked at: (x,y)".

We provide this layer of abstraction to save game developers the need to handle low level touch events and decide which object should respond. It is very important that our development platform be kept suitable for entry-level developers, and be easy to use with as little hassle as possible.

The case where game objects are stacked on one another and a touch event is received at a location that's within multiple objects can be difficult to handle at the game developer's level. That's why we decided that the touch event in this case is dispatched to the top-most object that's visible under the finger, and that's it.

The easiest way to do it is to scan all the objects in a front-to-back order and find the first that can handle the event.

The Render Queue

The MoMinis game-engine is powered by OpenGL ES on both Android and iPhone. For portability reasons, we limit ourselves to the OpenGL ES 1.1 API.

To maximize fill rate, we separate objects into two groups: opaque and translucent. The opaque group gets drawn first, front-to-back -- so fragments that are eventually invisible will be culled by the Z-buffer instead of being drawn-upon later, saving the entire rasterization pipeline.

For correct alpha-blending, the translucent group must be rendered back-to-front, after the opaque group has been rendered.

Challenges to Platform Developers

The demands from our cross-platform development environment and the games it produces are challenging:

  • Inexperienced, non-programmer developers should be able to create nice games easily
  • Games should have good performance
  • Games should be pretty
  • Final game size should be small
  • Games should be portable
  • Developer should be bothered with porting issues as little as possible (preferably none at all)

These demands conflict most of the time, and make our game-engine development a challenge.

When you're a content developer making a fully-fledged game, the game's logic can get quite complex pretty quickly. You have all these power ups and bonuses and special animations and sounds that change your basic game mechanics temporarily, and in some games you have to dynamically generate the level as the user advances (as in Jelly Jump).

Our platform developers aren't always seasoned programmers, or simply don't know the ins-and-outs of the game engine's implementation, so game logic is at times written in a suboptimal fashion. In addition, as the platform is still evolving, common practices used by content developers may worsen the situation, usually with no real alternative due to lack of features in the platform.

This means that sometimes a content developer will create and destroy many auxiliary objects in a short time during gameplay, making instance management a challenge.

An Always-Sorted Collection Problem

As the engine was originally written for J2ME, it allowed use of only the facilities offered by CLDC under a limited number of classes and code size, and some implementations are very far from optimal. For instance, the code relied on java.util.Vector -- a class that has all its methods synchronized.

The entity that manages the game held all existing object instances in an array sorted by Z-order. Every time an object was being created, destroyed, or its Z-order changed -- we would remove it from the array (moving all the elements after it one index back) and re-insert it at the correct location after a binary search (moving all elements after it one place forward).

Since scanning the array for the object is an O(n) operation, removing and inserting an element is again an O(n) operation, and finding the correct spot is a O(log n) operation -- all the basic actions on an object had runtime proportional to the number of objects existing at the moment.

Needless to say, as games got more complex and required more objects, this became a problem. Jelly Jump was creating 40-60 objects every few seconds, in a world containing 500-700 objects. On top of that, when an object is created, its Z-order is set to the maximal Z-order allowed -- but then immediately set to its layer's Z-order by the object's constructor, doubling the amount of Z-order operation the engine had to perform. This would explain the jitter -- creating objects caused a very long logical iteration, dipping FPS temporarily.

Initial Optimization Experiments

We needed to fix it, and we needed to fix it fast. This was the main problem blocking the release.

At first we took a conservative approach, trying to make as few changes as possible so close to a release. We concluded that there was no reason for the collection to be kept in order at all times -- just before rendering and touch-handling. Our first attempt was to defer sorting to these two occasions.

This should have addressed the object creation problem, reducing it from O(n) to O(1), as we intended to simply append the new object to the end of the list, and query it for its Z-order just before sorting. This would also cut down by half Z-order changes, as we were no longer adding new object to the collection, then moving it like we used to.

We took the standard approach, and tried good-ol' quicksort for the just-in-time array sorting. Unfortunately, the game engine's contract with the developer is such that objects at the same Z-order are sub-ordered by the order their Z-value changed -- meaning our sorting algorithm had to be stable, which quicksort isn't. (It took us a good while to figure that this was what's causing the weird phenomenon we were seeing with the game's graphics).

Somewhat humbled by our reckless arrogance in thinking this was such an easy problem to fix, we looked for a good sorting algorithm that is both stable and efficient. Tree sort was the next thing we tried. Though the sort was now stable, it was also very slow. We failed at identifying that we were using tree sort on a mostly-sorted collection, which is tree sort's Achilles Heel -- in that case you end up with a degenerate tree and efficiency is no more.

Reading a little bit on the web we encountered all sorts of sorting algorithms and variations on existing algorithms that might or might not work, but we didn't have the time to start learning and understanding each one and decide whether it was appropriate for our needs.

So we went back to quicksort. This time we applied a bias to the Z-order, giving a monotonic index for each object that got its Z-order changed. This wasn't perfect, but it was reasonable for the time we had.

Eventually, this optimization didn't make it to the release -- we didn't have the time to properly test the change and we were able to work around the problem enough to make the game releasable by avoiding creation of multiple objects in a single logical iteration, thus not triggering the Z-order overhead. Instead of creating a full screen of jellies at once, we instructed the content developers to create a screen line-by-line.


Article Start Page 1 of 3 Next

Related Jobs

Infinity Ward / Activision
Infinity Ward / Activision — Woodland Hills, California, United States
[04.24.14]

Principal / Lead Rendering Engineer
Gearbox Software
Gearbox Software — Plano, Texas, United States
[04.24.14]

Graphics Programmer
Turtle Beach
Turtle Beach — San Diego, California, United States
[04.24.14]

SENIOR C# SOFTWARE ENGINEER
SOAR Inc.
SOAR Inc. — Mountain View, California, United States
[04.24.14]

Unity 3D MiniGame Programmer (and Designer)






Comments


Andrew Grapsas
profile image
I mean... you read what you wrote in your first content paragraph, right?

"After several weeks of development, and a week or so from release..." Why did you wait so long to get the product on an actual device?

The first thing I do when starting a project as an engineer is get it running on target. There are then weekly build reviews. It doesn't matter if you only have a bouncing ball.

Itay Duvdevani
profile image
You raise a valid point.

This specific game wasn't tested on weak devices soon enough. The game was tested on more powerful phones, and we got to test it on weaker phones late in the process.

However, you must understand MoMinis isn't a software house that just develop games. We develop game-creation platform, and manage internal and external game creators to create the games we publish through Playscape. There are 3-5 project running concurrently at any given time. Think more like GameSalad, or GameMaker for mobile (although we do not compete with them directly)

My team develops the platform, so we're not involved in the process of game development. We get called when there's a problem.

However, my team and I do try and contribute to the game development process with advices and experience like this.

K Gadd
profile image
Is this really about smartphone performance problems? This just seems like a case where the design wasn't thought through and it turned out to be really hard to fix the design later. The need for a stable sort sounds like it forced you into some relatively unexplored territory. It's really unfortunate that you ended up having to build your own complex data structure from scratch just to sort objects by Z, and I feel like the design decisions that led to this were the real root problem here, not any particular performance issue with Java or J2ME or the compiler you were using or your libraries. As Andrew said above, also, it seems troubling that you waited so long to actually test on-device and this likewise hints at some more fundamental failures here that led to you having these performance issues.

I really appreciate you guys sharing this in detail, though, and sharing your code. Hopefully it will help to inform people who might otherwise make the same mistakes you made in the future.

EDIT: Also this is really grating on me, the conclusion to your article is basically just complete nonsense and it makes me angry. You tried what, three sorting algorithms? The first of which - insertion sort - is already known to be an awful algorithm, and the other two clearly not the correct algorithms for your problem. From this you conclude that TRADITIONAL SORTING METHODS were the issue here, and that the solution is to adopt the thread scheduling algorithm from the linux kernel to sort your sprites?!?!

Andrew Grapsas
profile image
Yeah, after they said their problem was sorting I stopped reading because any junior game programmer should be able to solve this issue...

...now based on your comments I'm going to have to reread and smack my forehead.

Itay Duvdevani
profile image
Hi Kevin,

Thanks for your comments.

I partially agree on the point you make on design. Yes, it could have been better, but as the article mentions, this is not an engine written from scratch after weeks of planning and requirements gathering. This code base was written over years, piece by piece, with major backward compatibility issues to accommodate – as the last section in the article suggests.
When the engine was first designed – weaker J2ME devices couldn't handle more than ~100 objects. We now have almost a 100 games being re-compiled every now and then with newer compiler versions - some of them were developed on Studio versions that are over three years old, which rely heavily on the platform’s behavior and *sigh* bugs.

Besides, I’m not sure how you would have completly eliminated the need for sorting-by-Z (see relevant article section). If you have a suggestion I’ll be happy to discuss it in private.

I am sorry to hear my conclusion made you angry and that you found it to be nonsense. Yes, we tried only three sorting algorithms, but as I've written in the article - we didn't have time to try every algorithm in the book, nor the need. It was obvious this was not the way to go. Generic sorting algorithms have a theoretical lower-bound of O(nlogn), and I'm suggesting a method with better runtime complexity – O(1), so why settle for less?

As for the data structure complexity - I urge you to take 10 minutes of your time and read the code on GitHub (read the simple version, I don't expect anyone to use the complex one we had to use). It is basically an array of linked lists with back-references. Hardly complex, and takes minutes to code.

Maybe my intentions with publishing this article weren’t clear. This is not a post-mortem article where I criticize the development process that lead to the problem and suggest methods and procedures to better develop games in the future. My hope with publishing this article was that people will find the suggested data structure useful for other applications as well, like we found the Linux scheduler useful for Z-ordering, and that developers in general should keep an open mind when tackling a problem (putting aside whether it can or can’t be removed)

Rod Boyd
profile image
"Generic sorting algorithms have a theoretical lower-bound of O(nlogn), and I'm suggesting a method with better runtime complexity – O(1), so why settle for less?"

That's not strictly accurate. You're solution is O(1) per operation (ie. per (re)insertion into the structure) but you have to do this for all n objects, so your solution is O(n). This is also assuming that your allocator is O(1) (which it likely is not) because you've introduced a whole lot of memory management overhead that wouldn't exist with a sorting solution. Anyway, even assuming an O(1) allocator, your O(n) solution may still seem comparatively better than an O(n log n) sort, but the memory access pattern of your solution will probably introduce even further overhead due to cache thrashing. (This may be able to be alleviated by using a custom allocator, I'd look into that if you stick with this solution.)

That's all without even mentioning radix sort, which for large n is very suitable and is, in fact, O(n). Don't be deterred by the suggestion that this only works for unsigned integers, do a little digging and you'll find it can be made to work with IEEE standard-compliant floating point values as well. (Serious question for everyone, is there a platform today that doesn't adhere to the same floating point format for 32-bit floats?)

Having said all that, I really enjoyed your article. It's invaluable to hear the experiences of other devs and I hope you'll post again in the future!

Rod Boyd
profile image
Apologies for the incorrect use of "you're" -- typing on a (crappy) phone.

Lee Griffiths
profile image
re: the iPhone, or anything using a PowerVR GPU.

If you're drawing anything that is fully opaque (i.e., no destination based alpha blend, no alpha testing -- nothing that requires the 'background' in anyway), then you can rely on the TBDR[1] chips in these phones to deal with the Z ordering for you. They'll eliminate the pixel overdraw, meaning you don't have to waste time sorting them, assuming you're sending the correct Z values out of your vertex shader (or fixed function equivilent).

Of course, you need to draw those opaque ones before you draw the transparent ones. And it's best you draw all the opaque ones before you begin on the transparent ones, which still need to be sorted, but it's a quick optimisation that might help you out so won't have to worry about whether to bubble sort or not :)

Wylie Garvin
profile image
I don't wish to "pile on", and while I agree that this problem shouldn't be difficult to solve for someone with the right experience (e.g. either a CS degree, or some real-world experience with good sorting algorithms) I still enjoyed reading about your problem-solving process.

Hopefully you've learned some useful tricks for debugging "strange looking problems" from this experience! For one thing, always _check your assumptions_. It hurts when you waste a lot of time trying to solve the wrong problem, because of an early assumption that you could have easily disproved with a bit of investigation!

Example: Yesterday, I was shown a problem where some profile events appeared to be missing, from a profile capture using our in-house profiling library. It was on a platform where we'd never really used the profiler before, so our first assumption was that the event capturing on that platform was somehow messed up. But I had a funny feeling, so I spent about 1 hour to get a text printout of the contents of the event capture, and compare it manually to the graphical display from the tool. And it turned out that the problem was actually in the display tool: a well-tested piece of software which we've used on thousands of profiles over many years! The "missing" events were properly captured, but the tool didn't display them. As soon as we had that figured out, we proceeded to solve the REAL problem, instead of wasting hours (or days) trying to fix something that wasn't broken (the profile capture code on this rarely-used platform). The moral of the story is: always check your assumptions!

Felipe Borges Alves
profile image
java.util.Vector does not have synchronized methods on J2ME (it does on JSE). In J2ME it is equivalent of JSE's ArrayList.
You really don't need to have a list with all your sprites sorted by it's Z position. There are many ways around it.
Even if you really want to keep that sorted list, there are stable sorting algorithms with good performance, like mergesort and heapsort. For heapsort, the almost sorted list would fall under its best case.
Note that your solution is not O(1). Every time you add an object with a new Z value, you will need to make an O(n) insertion on the quick access list. A much better solution would be to use a TreeMap mapping Z values to lists of objects.

Felipe Borges Alves
profile image
Actually heapsort is not stable, my mistake. But mergesort is.

Lior Tal
profile image
"Unfortunately, profilers are very useful when you debug a low-FPS problem, but almost useless when you debug an FPS jitter problem"

How is that so?

A good profiler lets you record your game and then run through all frames, finding the ones that took longer to process.


none
 
Comment: