Gamasutra: The Art & Business of Making Games
“0 – 60 fps in 14 days!” What we learned trying to optimize our game using Unity3D.
by Amir H Fassihi on 08/28/13 12:14:00 pm

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

 

Smooth gameplay is built on a smooth frame rate, and hitting the 60 frames per second target on standard iPhone and iPad devices was a significant goal during the development of our upcoming action platformer, Shadow Blade (http://shadowblade.deadmage.com).

The following is a summary of the things we had to consider and change in the game during our intense optimization sessions in order to increase performance and reach the targeted frame rate.

Once the basic game functionality was in place, it was time to make sure the game's performance would meet its target. Our main tools for measuring performance were the built-in Unity profiler and the Xcode profiling tools. Being able to profile the code running on the device with the Unity profiler proved to be an invaluable feature.

So here is our summary of what we learned from this intense journey of measuring, tweaking and re-measuring, which paid off in the end and resulted in a fixed 60fps on our target devices.

1 – Head to head with a ferocious monster called the Garbage Collector. 

Coming from a C/C++ game programming background, we were not used to the specific behaviors of the garbage collector. Having your unused memory cleaned up automatically is nice at first, but soon reality kicks in and you see regular spikes in the profiler showing the CPU load caused by the garbage collector doing what it is supposed to do: collecting garbage memory. This proved to be a huge issue, especially on mobile devices. Chasing down memory allocations and trying to eliminate them became priority number one, and here are some of the main actions we took:

  1. Remove any string concatenation in code, since this leaves a lot of garbage for the GC to collect.
  2. Replace the “foreach” loops with simple “for” loops. For some reason, every iteration of every “foreach” loop generated 24 bytes of garbage memory. A simple loop iterating 10 times left 240 bytes of memory ready to be collected, which was just unacceptable.
  3. Replace the way we checked for game object tags. Instead of “if (go.tag == “Enemy”)” we used “if (go.CompareTag(“Enemy”))”. Reading the tag property on an object allocates and copies additional memory, which is really bad if such a check sits in an inner loop.
  4. Object pools are great. We made and used pools for all dynamic game objects so that nothing is ever allocated dynamically in the middle of a level, and everything is recycled back to the pool when it is no longer needed.
  5. Avoid LINQ commands, since they tend to allocate intermediate buffers: food for the GC.
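
As a rough illustration of the pooling idea in item 4, here is a minimal sketch in plain C#. It is not the pool used in Shadow Blade; the class name ObjectPool and its members are hypothetical, and in a Unity project the pooled items would typically be game objects that are deactivated on release rather than plain class instances.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic pool: every instance is created up front, so
// Get/Release do not allocate while a level is running (as long as
// the pool was sized large enough for that level).
public sealed class ObjectPool<T> where T : class
{
    private readonly Stack<T> _free;
    private readonly Func<T> _factory;

    public ObjectPool(int capacity, Func<T> factory)
    {
        _factory = factory;
        _free = new Stack<T>(capacity);
        // Plain "for" loop: no enumerator is involved.
        for (int i = 0; i < capacity; i++)
        {
            _free.Push(factory());
        }
    }

    // Number of instances currently waiting to be reused.
    public int FreeCount
    {
        get { return _free.Count; }
    }

    // Returns a recycled instance; falls back to allocating a new one
    // only when the pool has run dry.
    public T Get()
    {
        return _free.Count > 0 ? _free.Pop() : _factory();
    }

    // Hands an instance back so it is reused instead of collected.
    public void Release(T item)
    {
        _free.Push(item);
    }
}
```

A level would build its pools during loading, call Get when an enemy or projectile is spawned and call Release when it dies or leaves the screen. The fallback allocation in Get is a design choice for this sketch; logging it during development is one way to find pools that are too small.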

2 – Careful with the communication overhead between high level scripts and native engine C++ code.

All gameplay code written for a Unity3D game is script code, which in our case was C# running on the Mono runtime. Any access to engine data requires a call from the high-level scripting language into the native engine code. This of course has its own overhead, and reducing such calls in game code was the second priority.

  1. Moving objects around in the scene requires calls from the script code to the engine code. We ended up caching an object's transformation requests during a frame in the gameplay code and sending them to the engine only once to reduce the call overhead. We used this pattern in other similar places besides moving and rotating objects.
  2. Caching references to components locally eliminates the need to fetch a component reference with the “GetComponent” method on a game object every time, which is another example of a call into the native engine code.

3 – Physics, Physics and more Physics.

  1. Setting the physics simulation timestep to the minimum possible. For our case we could not set it lower than 16 milliseconds.
  2. Reducing calls to character controller move commands. Moving the character controller happens synchronously and every call can have a significant performance cost. What we did was to cache the movement requests per frame and apply them only once.
  3. Modifying code to not rely on the “OnControllerColliderHit” callbacks. These callbacks proved to be slow to handle.
  4. Replacing the physics cloth with a skinned mesh on the weaker devices. The cloth parameters can also play an important role in performance, and it pays to spend some time finding the right balance between aesthetics and performance.
  5. Ragdolls were disabled so that they were not part of the physics simulation loop, and only enabled when necessary.
  6. “OnInside” callbacks of the triggers need to be assessed carefully, and in our case we tried to model the logic without relying on them where possible.
  7. Layers instead of tags! Layers and tags can both be assigned to objects easily and used for querying specific objects; however, layers have a definite advantage, at least performance-wise, when it comes to collision logic. Quicker physics calculations and fewer unwanted memory allocations are the basic reasons.
  8. Mesh colliders are definitely a no-no.
  9. Minimize collision detection requests like ray casts and sphere checks in general, and try to get as much information as possible from each check.
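
The per-frame batching of movement requests mentioned in item 2 (and for transforms in section 2) could look roughly like the sketch below. It is written in plain C# so it compiles outside Unity: Vec2, MovementBridge, AddMovement and Flush are hypothetical names, and the delegate passed to the constructor is a stand-in for the real engine call. In a Unity project that delegate would presumably be a lambda calling CharacterController.Move, with Flush invoked from a LateUpdate callback.

```csharp
using System;

// Minimal 2D vector so the sketch compiles without Unity; in a Unity
// project the engine's own vector type would take its place.
public struct Vec2
{
    public float X;
    public float Y;

    public Vec2(float x, float y)
    {
        X = x;
        Y = y;
    }
}

// Hypothetical bridge: gameplay code calls AddMovement any number of
// times per frame, and Flush forwards the accumulated sum once.
public sealed class MovementBridge
{
    private readonly Action<Vec2> _applyMove; // stand-in for the engine call
    private Vec2 _pending;
    private bool _hasPending;

    public MovementBridge(Action<Vec2> applyMove)
    {
        _applyMove = applyMove;
    }

    // Cheap script-side accumulation; nothing crosses into the engine here.
    public void AddMovement(Vec2 delta)
    {
        _pending.X += delta.X;
        _pending.Y += delta.Y;
        _hasPending = true;
    }

    // Intended to run once per frame, after all gameplay code has
    // submitted its requests. Does nothing if no movement was requested.
    public void Flush()
    {
        if (!_hasPending)
        {
            return;
        }
        Vec2 total = _pending;
        _pending = new Vec2(0f, 0f);
        _hasPending = false;
        _applyMove(total);
    }
}
```

However many components request movement in a frame, the expensive call is made at most once, which is the point of the pattern.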

4 – Let's make the AI code faster!

We use artificial intelligence for the enemies that try to block our main ninja hero and fight him. The following needed to be addressed regarding AI performance:

  1. A lot of physics queries, such as visibility checks, are generated by the AI logic. The AI update loop can run at a much lower rate than the graphics update loop to reduce CPU load.
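
One simple way to run AI logic at a lower rate than rendering is to accumulate frame time and only run the AI step once a fixed interval has passed. The sketch below is an assumption about how this could be done, not the scheme used in the game; ThrottledUpdater and Tick are hypothetical names, and the delta time would come from the engine's frame timer.

```csharp
using System;

// Hypothetical helper: runs the supplied update at a fixed interval
// that is independent of (and usually longer than) the frame time.
public sealed class ThrottledUpdater
{
    private readonly double _interval;
    private readonly Action _update;
    private double _elapsed;

    public ThrottledUpdater(double intervalSeconds, Action update)
    {
        _interval = intervalSeconds;
        _update = update;
    }

    // Call every rendered frame with that frame's delta time.
    // The update runs at most once per call.
    public void Tick(double deltaSeconds)
    {
        _elapsed += deltaSeconds;
        if (_elapsed >= _interval)
        {
            _elapsed -= _interval;
            _update();
        }
    }
}
```

With an interval of 0.2 seconds, for example, visibility checks would run about five times per second instead of sixty. Giving each enemy a different starting offset would spread the work across frames rather than having every enemy think in the same frame.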

5 – Best performance is achieved from no code at ALL!

When nothing happens, performance is good. This was our base philosophy: turn off anything that is not necessary at the moment. Our game is a side-scrolling action game, so many of the dynamic level objects can be turned off when they are not visible in the scene.

  1. Enemy AI was turned off when far away using a custom level of detail scheme.
  2. Moving platforms and hazards and their physics colliders were turned off when far away.
  3. The built-in Unity “animation culling” system was used to turn off animations on objects not being rendered.
  4. The same disabling mechanism was used for all in-level particle systems.
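
For a side scroller, the distance test behind this kind of on/off switching can be as simple as comparing horizontal positions against the camera. The following is a minimal sketch under that assumption; LevelObject and ActivationCulling are hypothetical names, and the Active flag stands in for whatever the engine uses to switch an object off (in Unity, for instance, deactivating the game object or disabling a component).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a dynamic level object such as an enemy,
// a moving platform or a particle system.
public sealed class LevelObject
{
    public float X;
    public bool Active = true;

    public LevelObject(float x)
    {
        X = x;
    }
}

public static class ActivationCulling
{
    // Activates objects within "range" of the camera on the X axis and
    // deactivates the rest. Returns how many objects changed state.
    public static int Apply(List<LevelObject> objects, float cameraX, float range)
    {
        int changed = 0;
        for (int i = 0; i < objects.Count; i++)
        {
            LevelObject obj = objects[i];
            bool shouldBeActive = Math.Abs(obj.X - cameraX) <= range;
            // Only touch the object when its state actually changes.
            if (obj.Active != shouldBeActive)
            {
                obj.Active = shouldBeActive;
                changed++;
            }
        }
        return changed;
    }
}
```

This check does not need to run every frame, and writing the flag only when it changes avoids redundant calls into the engine.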

6 – Callback! How about empty callbacks?

Unity callbacks needed to be reduced as much as possible. Even empty callbacks have a performance penalty. There is no reason to keep empty callbacks, but they sometimes get left in the code base amid a lot of rewriting and refactoring.

7 – The mighty Artists to the rescue.

Artists can always magically help out the hair-pulling programmer trying to go for a few more frames per second.

  1. Sharing materials between game objects and marking the objects static in Unity causes them to be batched together, and the resulting reduction in draw calls is critical for good mobile performance.
  2. Texture atlases helped a lot, especially for the UI elements.
  3. Square, power-of-two textures with proper compression were a must.
  4. Being a side-scroller enabled our artists to remove all far background meshes and convert them to simple 2D planes instead.
  5. Light maps were highly valuable.
  6. Our artists removed extra vertices during a few passes.
  7. Proper texture mip levels were a good decision, especially for keeping a good frame rate on devices with different resolutions.
  8. Combining meshes was another performance-friendly action by the artists.
  9. Our animator shared animations between different characters where possible.
  10. A lot of iterations on the particles were necessary to find the aesthetic/performance balance. Reducing the number of emitters and the amount of transparency were among the major challenges.

8 – The memory usage needs to be reduced, now!

Using a lot of memory of course hurts performance, but in our case we also experienced many crashes on iPods from exceeding memory limits, which was a much more critical problem. The biggest memory consumers in our game were the textures.

  1. Different texture sizes were used for different devices, especially for textures used in the UI and large backgrounds. Shadow Blade uses a universal build, but different assets get loaded once the device size and resolution are detected at startup.
  2. We needed to make sure unused assets were not loaded in memory. We found out a little late in the project that any asset that was only referenced by an instance of a prefab and never instantiated was still fully loaded in memory.
  3. Stripping out extra polygons from meshes helped.
  4. We needed to re-architect the lifecycle management of some assets a few times, for example by tweaking the load/unload times for the main menu assets, end-of-level assets or game music.
  5. Each level needed to have its specific object pool tailored to its dynamic object requirements and optimized for the least memory needs. Object pools can be flexible and contain a lot of objects during development, however, they need to be specific once the game object requirements are known.
  6. Keeping the sound files compressed in memory was necessary.

Game performance enhancement is a long and challenging journey and we had a fun time experiencing a small part of this voyage. The vast amount of knowledge shared by the game development community and very good profiling tools provided by Unity were what made us reach our performance targets for Shadow Blade.

Here is the trailer for our game, Shadow Blade:

http://youtu.be/tgSXLVAwZJs

Game website: shadowblade.deadmage.com


Comments


Eric Robertson
Informative Article as I've spent the last year optimizing and optimizing to get our Project running smooth on an iPhone4. Our biggest performance bottleneck is Rendering Opaques, especially when the player is on top of a hill looking over the entire zone.

Do you use WAV sound files for most of your sounds?

Amir H Fassihi
Well for iOS we use mp3 as it can be played while compressed.

Tim Miller
Thanks for the article, lots of useful info here and your game looks great!

I thought that iOS could only play 1 compressed sound at a time. Are you using "decompress on load" on your mp3s? And are you using them for all the sounds in the game (even short sounds) or just longer audio like music?

Amir H Fassihi
We used "decompress on load" for smaller sounds and the music uses "compress in memory" only.

Tim Miller
Is there any performance hit or increased load times or small loading hitches when using decompress on load? The unity docs say:

"Be aware that decompressing sounds on load will use about ten times more memory than keeping them compressed, so don't use this option for large files."

But it doesn't say anything about how much memory short (0 - 4 seconds) sounds take (if any).

Amir H Fassihi
We did not notice performance hits for the small sound files. I assume the sound files should take as much memory as their equivalent .wav format instead of .mp3. Another thing to note is that since the cpu load occurs on initial level load only, it is a bit harder to analyze the different instructions going on in that frame and the result for the user is only a slightly longer load time.

Michael Silverman
Thanks for this! It would have taken forever to realize foreach generates garbage coming from a c++ background.

Amir H Fassihi
It was quite strange for us and we still can not find the reason why this can happen.

Romain Guy
Java has the same issue. Using a foreach loop leads to the creation of a new Iterator object, C# probably works in a similar way. This means this piece of code (a Java foreach):

for (String s : listOfStrings) { ... }

Is equivalent to this:

Iterator<String> strings = listOfStrings.iterator(); // allocates an Iterator instance
while (strings.hasNext()) {
    String s = strings.next();
    // ...
}

Amir H Fassihi
Right, it should be related to the extra iterators and pointers created.

Axel Habermaier
That actually depends on the data type you're iterating over. List or Arrays never allocate when used in a foreach loop, IList, on the other hand, does. See, for instance: http://blogs.msdn.com/b/etayrien/archive/2007/03/17/foreach-garbage-and-the-clr-profiler.aspx

Amir H Fassihi
We saw the allocations when we used Lists with foreach loops.

Andy Dunn
The MSDN article refers to the Microsoft CLR implementation. Unity uses the Mono/Xamarin implementation.

Microsoft have specific optimizations to reduce the GC overhead for most of the built in types. It would seem like the Mono version does not.

Jason Bentley
Great Article
Object Pooling was the single biggest improvement of our iOS project.

Dustin Chertoff
Great article! Bookmarked and passed along to my team as a must read.

Darrel Cusey
Fantastic article, Amir -- thank you so much for posting.

Ben Droste
Thanks Amir, this article has been really useful for me. I'm attempting to teach myself programming while working on a solo project and I've already made a few changes to my code after reading this. Being a bit of a noobie though I've got a couple questions regarding Garbage Collection I'm hoping you can clear up.

I'm currently going through and replacing my foreach loops with for loops, which is a simple enough process for arrays and lists. However when it comes to looping through the children of a GameObject, is it also better to use transform.childCount rather than a foreach loop? (I ask in case there is some extra performance hit in accessing the number of children, which from what I can gather is similar to a GetComponent call).

"Instead of “if (go.tag == “Enemy”)” we used “if (go.CompareTag (“Enemy”)”." - Does this apply to things like GameObject names as well? eg: "if (go.name == "Enemy")?


Once again thanks for this article, it's been a huge help. Code performance cost has been something I've been very concerned about while developing my game. It's not information that's easy to come across, it seems to be largely knowledge acquired with experience.

Amir H Fassihi
Ben, extra allocations using foreach cause the GC to collect more often but calling a function or accessor can have a small overhead and comparing these two depends on numerous other factors but as a general rule it is always best to cache the container size locally once and not call it with every iteration loop. (int size = transform.childCount)
Regarding reading game object names, you are correct, I think we witnessed about 50 bytes allocated for every game object name query. Setting the name does not allocate any memory but reading the name does, however, since there aren't any built in functions to compare names, you would have to minimize such calls in your game code.
The best is for you to monitor your running code using the profiler every day! The good news is that the built in profiler is quite good and can help find a lot of performance bottlenecks and it will help you fight the root of all evil!

Ben Droste
Thanks Amir, I appreciate the reply. Thanks again for a very useful article.

Troy Walker
Definitely a lot of callbacks happening in Unity :)

I like the idea of caching transformations into a single frame update... would love to see how you worked that out.

Thibaud de Souza
This is a strategy you can implement aggressively (uniformly, on all communications between the scripting platform and the underlying engine) by funnelling all writes and batching them to the underlying engine as a single command (so, no write occurs until control returns to the engine); you can also avoid read access by caching read results whenever possible (so, at a cost).
While we did this in a production context, we had access to the C++ source; without which processing your command stack may result in multiple messages to the underlying engine regardless (but at least you can avoid redundant writes).
Amir may give you a more pointed answer; in the meantime a simple trick worth testing is how engine access processed within the scope of a single function is processed; Unity may perform optimisations of its own in this case.

Amir H Fassihi
We did follow the simplest solution possible, make a specific component to work as the single bridge to the character controller for example and then instead of calling the move function of the character controller separately from every other component, we called the AddMovement functions of this bridge component and this component interacted with the character controller in the LateUpdate callback to make sure it happens after all other transformation requests in the current frame. The same scheme can be used for other interactions with the engine code that can be batched together and executed once.

haim ifrah
Hi,

great article!

thanks.

Enz -x
Hi Amir, thank you for sharing experience.
Btw. what's your preferred quality setting?
Disabling Vsync can reduce latency from 16ms(60fps) to 5ms(200fps) on one core cpu(2.6GHz), Is there any other tweak?

