We just released Osmos 2.4.0 on iOS – our first release in almost 4 years. Why the long hiatus? Was it because 2.3.1, our previous release, was perfect? No, though it was solid and stable, and we were happy with it. Rather, it was due to our reliance on “floating-point determinism” for multiplayer. And for years afterwards I thought it might be the last version we ever released on iOS.
It’s 2013. Miley Cyrus is wrecking stuff in her underwear. I was about to submit version 2.3.1 to Apple for approval. This was a minor update from 2.3.0 – just a few fixes. We did a little testing, and all seemed solid. Then, just before submitting, we got a “blobiverse desync” error during a multiplayer test.
What, you may ask, is a “blobiverse desync”? Well, at the time it was something I never expected - nor wanted - to see again.
It’s 2011. Everyone is getting into Minecraft. Aaron and I were working hard on creating a multiplayer mode for Osmos – a surprisingly ambitious and tricky project. One technical question we faced was how to synchronize game state between devices over a network. There are a lot of moving blobs in Osmos: It takes roughly 7 kb to represent the game’s state at any given moment. Multiply that by a decent framerate, and the network bandwidth required to stream Osmos state across a network gets into video streaming territory. We wanted to avoid putting that kind of load on people’s networks and devices. Instead, if we could simulate the game locally and identically on each device, transmitting only player input (mass-firing, mainly) across the network, the bandwidth requirements would be minimal. We decided to go that route.
We encountered and solved so many technical challenges in our development of Osmos multiplayer, many of which I found super interesting, but I want to focus on just one here: simulation determinism. It’s actually a huge subject, so I’ll just touch on a couple bits and zoom in on an even narrower subject: floating-point determinism.
For starters, a titch about simulating physics on a computer. If you’re familiar with the term "lockstep simulation" - one of the things we implemented for Osmos multiplayer - feel free to skip the rest of this paragraph. Generally, physics is simulated a frame at a time: calculate, step; calculate, step; and so on. If you do it right, the smaller your time-step is (the time from frame to frame) the more precise your results will be. Precision aside, it’s important here to note that a different time-step will give you different results. For example, simulating one second in 60 steps (1/60th of a second per step) will give you a different result than simulating that same second in 30 steps (at 1/30th of a second per step). But so long as the framerate is decent and things are stable, these differences don’t get noticed by most players. In single-player Osmos we simply run as many frames as we can per second. If your device can handle it, this means the game will run at 60 fps, bound by the refresh rate of your display. But for various reasons some frames end up taking longer than others, sometimes due to the complexity of what’s happening in game, and sometimes due to what else the device might be doing in the background. So one frame may take 1/30th of a second, another 1/47th, another 1/60th, etc. And that’s fine. The problem arises when you want two different simulations to give identical results. In that case, the time-step used for calculations must be identical for both simulations. Physics programmers call this a “lockstep” simulation, which we implemented for Osmos multiplayer. I won’t dive any deeper into this subject here, but a great resource on all this, including code examples, can be found on Glenn Fiedler’s website.
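To make the time-step point concrete, here is a minimal sketch (not Osmos code) of a body under constant acceleration, integrated with explicit Euler steps. The Body struct and the step and simulate functions are hypothetical names invented for illustration. Simulating the same second in 60 steps and in 30 steps lands the body in two different places, which is why both devices have to agree on the time-step.

```cpp
#include <cassert>

// Hypothetical 1-D body, used only for illustration.
struct Body {
    float x;   // position
    float vx;  // velocity
};

// Advance the body by one explicit-Euler step of length dt (in seconds).
inline void step(Body& b, float ax, float dt) {
    b.x  += b.vx * dt;  // position uses the velocity from the start of the step
    b.vx += ax * dt;    // then the velocity picks up the acceleration
}

// Simulate `seconds` of game time split into `steps` equal time-steps.
inline Body simulate(Body b, float ax, float seconds, int steps) {
    assert(steps > 0);
    const float dt = seconds / static_cast<float>(steps);
    for (int i = 0; i < steps; ++i) {
        step(b, ax, dt);
    }
    return b;
}
```

Calling simulate(Body{0.0f, 0.0f}, 10.0f, 1.0f, 60) and then the same thing with 30 steps gives final positions that differ by several percent, even though both runs cover exactly one second.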
You may ask: is this necessary? Won't these calculation differences be minor? Who will notice? Well, Osmos falls into the category of sensitive systems that can produce drastically different results from slightly different initial conditions – aka the Butterfly Effect. For example: A tiny difference in mass-transfer between motes on different devices will cause them to exert slightly more or less gravity from that moment onwards, causing trajectories to vary more and more over time, until a near-miss on one device is a collision on the other – a “catastrophic” divergence. A player could die on one device but not the other. Not good.
Moving on, once you've implemented a lockstep simulation over a network, how do you know if it's working correctly? Well, you could run a simulation on two devices and watch… and watch… and watch… until you maybe notice something looks different on the two displays. We did this for a little while, saw differences accumulate over time, and decided to get more rigorous. We computed a checksum / hash of the summed masses, x & y positions, and x & y velocities of all motes on the level (5 floats total), and started to send that across the network for every time-step. The devices would compare their local hash with the remote device’s hash, and if they differed we’d throw an alert up on the screen - blobiverse desync! - and dump a bunch of relevant numbers to a log file to analyze. One by one we found the divergences, and resolved them in various ways. And once we were done we were kind of amazed: devices from different generations, each running different versions of iOS - even the Xcode simulator - they all gave the same results! We had achieved simulation determinism. Huzzah!
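A check along those lines could be sketched like this. It is an illustration under assumptions, not the actual Osmos code: the Mote layout, the function names, and the choice of a 32-bit FNV-1a hash are all mine. It sums the five quantities across all motes, then hashes the raw bit patterns of those five floats so that even a one-bit difference in a sum changes the checksum.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical mote layout, for illustration only.
struct Mote { float mass, x, y, vx, vy; };

// Fold the raw bit pattern of one float into a running 32-bit FNV-1a hash.
inline std::uint32_t hash_float(std::uint32_t h, float f) {
    std::uint32_t bits;
    static_assert(sizeof bits == sizeof f, "float must be 32 bits");
    std::memcpy(&bits, &f, sizeof bits);
    for (int i = 0; i < 4; ++i) {
        h ^= (bits >> (8 * i)) & 0xFFu;  // one byte at a time, low byte first
        h *= 16777619u;                  // FNV prime
    }
    return h;
}

// Checksum of the summed masses, positions and velocities (5 floats total).
inline std::uint32_t state_checksum(const std::vector<Mote>& motes) {
    float sum[5] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    for (const Mote& m : motes) {  // both devices must iterate in the same order
        sum[0] += m.mass;
        sum[1] += m.x;
        sum[2] += m.y;
        sum[3] += m.vx;
        sum[4] += m.vy;
    }
    std::uint32_t h = 2166136261u;  // FNV-1a offset basis
    for (float s : sum) {
        h = hash_float(h, s);
    }
    return h;
}
```

Each device would compute state_checksum on its local motes after every time-step and compare it with the value received from the other device. Note that the summation order matters too, since floating-point addition is not associative.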
I remember chatting with Jonathan Blow at GDC 2012 about what we were up to. We got into some technical details, and I mentioned we were using a lockstep simulation as our multiplayer solution. He made a face I can only describe as "Ugh" with a dash of "Really?" I remember saying "I know. I know. But it's working well! We got this."
We happily released Osmos multiplayer and never saw that error message again...
... until that final 2.3.1 build in 2013. What happened with that build? I had actually tested it a lot, and had never seen the desync. But one day right before release Dave and I were testing and - blammo - there it was. After some investigation we realized the desync only happened when playing 2.3.1 against the 2.3.0 store build. But what had changed? It turns out I had recently updated my version of Xcode. I tried reverting back to the previous version, and the problem went away. (I tested this to death.) Something in how the new Xcode was compiling our code gave different results from previous versions. Spooky. So, while Apple was still accepting builds from the older version of Xcode, we slipped it in. All went well with the release. But soon after, Apple stopped accepting builds from that version. We could no longer submit new versions of Osmos without breaking multiplayer. And for several years, we didn't feel we needed to.
Over the past half-year, Apple has gotten more aggressive about culling “unsupported” apps from the App Store, with 32-bit-only apps slowly approaching the chopping block. It's pretty clear at this point that Apple will cut support for these with iOS 11. And so this January I rolled up my sleeves and started work on an update. I figured - worst case scenario - I might have to cut multiplayer entirely. I’m not actually sure what percentage of players play multiplayer. In any case I figure single-player Osmos is better than no Osmos.
Modernizing the Osmos project so it would build and run using the latest Xcode and frameworks took some work, but less than I expected. Ditto for 64-bit support. There were only a few glitches related to memory alignment in our rendering pipeline and network protocol. For example, here are a couple before-and-after videos I put together demonstrating the kind of alignment bugs that required squashing.
Pretty smooth work for the most part.
Desync issues aside, bigger changes were required for multiplayer. Apple’s networking and game frameworks have changed a lot over 4 years. And while I was at it I took a cue from Apple and removed all Game Center UI from the game, adding support for “background matchmaking” so players can practice/play single-player levels while waiting for a match. (There aren't as many people playing Osmos multiplayer these days, so it can be a while before someone else comes along looking for a match.)
Of course, synchronization was the big question / risk in this update. I didn’t expect the new version of Osmos to be compatible with the 2.3.1 store build, and sure enough it wasn’t. So we’d lose backwards compatibility at least. But I wondered if this new version would be compatible with itself across different devices and OS versions, like it was in “the good old days” – specifically, 32 vs 64 bit devices. It wasn’t. Ugh. And so began the deep dive into floating-point determinism.
I tried many things. I spent a good chunk of time tinkering with compiler flags / settings. Turning off optimizations solved most of the desync issues, but I wanted to avoid that if possible. I tried flags like mfpmath and sse2, but they didn’t seem to get me anywhere, and documentation on the web with respect to those and clang is pretty thin. I revisited my understanding of floating-point math. I stared at waterfalls of numbers with many decimal places, trying to figure out where, why, and how things were diverging. I reduced the Osmos physics code to the point where nothing moved and no collisions occurred – at least that stayed in sync! I isolated the problem to the point where I had a single line of code that gave different results on 32 vs 64 bit devices for some (but not all!) input values. Simplified, it looked something like this:
mote.x += mote.vx * dt;
Simply update a mote's x-position by its x-velocity times the time-step. For example, with
mote.x = 0.00668302644
mote.vx = 2.32162547
dt = 33/1000.0 // a 33 ms time-step, expressed in seconds
The mote's new x-position on the iPhone 6 would be 0.0832966641 whereas it would be 0.0832966715 on the iPod 5. (A small difference, but still important.)
Were IEEE standards being ignored? No. This difference only occurred in Final Release builds, with optimizations enabled. Eventually I convinced myself it was due to compiler optimizations causing some intermediate results to be temporarily stored in double / 64-bit registers on 64-bit devices, leading the final float / 32-bit result to be somewhat different. So I tried “unrolling” some simple calculations. For example, I expanded the single line above to
float dx = mote.vx * dt;
mote.x += dx;
This kind of change helped in some sections of code, but not everywhere. In some places the compiler was still optimizing / merging instructions. So, how to tell the compiler not to daisy-chain floating-point calculations? Well, as someone who is absolutely not a compiler expert, I came across a neat trick: the somewhat esoteric volatile keyword. Rewriting the above code as
volatile float dx = mote.vx * dt;
mote.x += dx;
tells the compiler to write the result (as a float) to the dx variable as soon as it's calculated, rather than keeping it in an intermediate, higher-precision register. It’s a nice, code-local solution to the problem that can be applied in a very precise way where needed. I ended up having to do this in about 30 different blocks of code here and there in Osmos. It lengthens those sections of code (in some places from 10 lines to 40 lines), giving them more of an assembly-language style, but it works.
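For anyone who wants to poke at this themselves, here is a small sketch that puts the two behaviours side by side. It rests on assumptions: the Mote struct and both function names are hypothetical, all inputs are taken to be 32-bit floats, and advance_wide only imitates what I suspect the optimizer was doing (carrying the product at higher precision and rounding to float once at the end). advance_pinned is the volatile version, where every intermediate is rounded to float and stored before it is used again.

```cpp
#include <cassert>

// Hypothetical mote with just the fields needed here.
struct Mote {
    float x;   // position
    float vx;  // velocity
};

// Imitates the suspected optimized path: the product is carried in a
// double and the result is rounded to float only once, at the end.
inline float advance_wide(Mote m, float dt) {
    assert(dt > 0.0f);
    const double dx = static_cast<double>(m.vx) * static_cast<double>(dt);
    return static_cast<float>(static_cast<double>(m.x) + dx);
}

// The volatile approach: each intermediate must be written out as a
// 32-bit float, so it is rounded to float before the next operation.
inline float advance_pinned(Mote m, float dt) {
    assert(dt > 0.0f);
    volatile float dx = m.vx * dt;  // product rounded to float here
    volatile float x = m.x + dx;    // sum rounded to float here
    return x;
}
```

The two functions agree for most inputs, and when they disagree it is only in the last bit of the result, which is the size of the difference seen between the two devices above. As long as every device runs the same variant, the results should match.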
Unfortunately that wasn’t the one magic bullet that solved everything. It took me a while to track down the last couple of sources of divergence, and they turned out to be sqrt() and some trigonometry functions. (Osmos is all about circles after all.) When compiler optimizations are enabled, these give slightly different results for some inputs. For example, acos(0.830012262) returns 0.591666639 on my iPhone 6 and 0.591666698 on my iPod 5. Volatile doesn't help with this, so I tried rounding the results to the nearest degree, throwing away a bunch of precision, but giving results that were visually indistinguishable – totally fine so long as results match across devices. That worked. 99.999% of the time. Turns out every once in a while - hours of play on average - the results would end up on different integer boundaries after rounding. Ouch. Rounding can be a more complex operation than you might think, but it’s a solvable problem when there’s a ground truth you’re looking for, like the nearest integer to a given value. But when inputs are different, neither device in isolation has enough information to always come to the same result as the other. I lost days to that one, with much fun staring at streams and streams of tiny decimals. The solution? I went back to some basic circular geometry and came up with some new equations that would give a decent approximation of mote-mote area-overlap without the use of any trigonometry functions. The new approximation always underestimates the overlap, but that’s ok since the mass transfer generally gets spread across multiple time-steps anyways. I didn’t notice the difference, and I doubt anyone else would either.
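The rounding trap is easy to reproduce in isolation. The sketch below is an illustration with hypothetical function names, not the Osmos code. It rounds an angle to the nearest degree the way I described. The two acos results above happen to land on the same degree, but two values that differ by a single bit and sit on either side of a .5 boundary do not.

```cpp
#include <cassert>
#include <cmath>

// Round an angle already expressed in degrees to the nearest whole degree.
inline long round_degrees(float degrees) {
    return std::lround(degrees);  // halfway cases round away from zero
}

// Convert radians to degrees, then round to the nearest whole degree.
inline long nearest_degree(float radians) {
    assert(std::isfinite(radians));
    const float degrees = radians * (180.0f / 3.14159265f);
    return round_degrees(degrees);
}
```

Here nearest_degree gives the same answer for 0.591666639f and 0.591666698f, so rounding hides that particular difference. But round_degrees(33.5f) and round_degrees(std::nextafter(33.5f, 0.0f)) disagree, even though those inputs are adjacent floats. Rounding shrinks the set of inputs that diverge without ever emptying it.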
With that, and after tons of testing, I think Osmos 2.4.0 is solid on this front. All seems good after a few days in "the wild" as well. Can I guarantee there aren’t any super-rare divergences remaining? Nope. Hopefully people will let me know if they ever see it occur.
Overall I spent nearly 4 months working on this update. Most of that was on multiplayer, with one month of that spent in the rabbit hole of floating-point determinism. I hope this blog post helps others avoid some of that pain.
To summarize: Lockstep synchronicity got you down?
Moving forward I’m curious if a future version of Xcode will again break our synchronization, or if we’re now more-or-less future proof. Time will tell.
ps. I could go on a lot longer on this and many other subjects related to Osmos multiplayer. If you find this blog post useful and/or interesting, please let me know. It’ll motivate me to blog more than once per year! ;-)