GameFest: Testing Crackdown's 495 City Blocks
Everyone knows that testing a game thoroughly is essential to its quality. Everyone talks constantly about how games are getting more complex to create -- and of course, then, to test.
Jami Johns, Software Development Engineer in Test at Microsoft Game Studios, gave a very compelling discussion of testing Crackdown, Microsoft's successful open world game from earlier this year, at Microsoft's recent GameFest 2007 in Seattle.
Changing The Game: Testing For 360
To illustrate how much there is to check in a modern game, Johns -- who previously worked on Project Gotham Racing 3 -- let us in on a little secret about that game's testing. "We have an entire class of bugs about the color of the speedometer needles." But with Crackdown, a game with "dozens and dozens of things running around in the world at once, and one can't help but crash it," the problems were even worse. "If there's more content, and it's getting more detailed, and it's getting more interactive, how are you going to test it with the same number of testers?"
According to Johns, "We learned a lot on Crackdown. It was our first open world game at MGS. It changed the way we think at MGS." Those who haven't played the game may not be familiar with it -- and the challenges it could potentially present in testing. For one, it's an open world game, of course, like Grand Theft Auto. But you're also empowered by the game's ability system to travel anywhere in its world, vertically as well as horizontally; there are only a handful of places you can't explore. There are enemies to fight, items to interact with, a wide variety of vehicles... and little scripted behavior.
Summed up by Johns: "[Crackdown] actually consists of about 495 city blocks. At any point in the game any gamer can run, jump, swim or fly from one of these blocks to any other."
If that wasn't complicated enough, Johns pointed out, "It turns out that designers like nothing more than to hide something better. This changed every build." With 500 Ability Orbs to collect, 300 Secret Orbs, and plenty of interactive objects, this presented a problem. How big of a problem? "Around 10,000 environment bugs" and, further, "...any place that people can get to had bugs associated with it at some point in development. Finding these bugs forced us to look at the way we tested games, and figure out what we could do better."
"On previous projects," Johns continued, "we had [performance] testers... every couple of weeks they'd do a perf pass... 'Are we trending towards where we need to be at the end of the project?' What it didn't get us was a heartbeat." For Crackdown, the team changed over to a PC application that would rip the data from the game instead of making people report it manually.
This generated maps with big red blotches -- making it easy to find the places where performance took a real hit. One block -- where the developers weren't expecting much traffic -- turned out to be a big performance drain. It turned out that the area offered a great vantage from which to look out over the city, so the testers were flocking there.
According to Johns, generating maps for Crackdown testing was a valuable tactic in other regards. "If you're collecting, along with crash information every build, the coordinates, you can map that information out. That doesn't make a lot of sense for all games, but if you think of an open world game..." Not only did the mapping reveal broken areas, it also allowed the team to compensate for bad builds by sending people away from the bad areas while they tested other features.
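The idea of mapping crash coordinates can be sketched roughly as follows. This is only an illustration, not the MGS tool: the grid cell size, the function names, and the sample coordinates are all assumptions made for the example. It bins each crash position into a grid cell, counts crashes per cell, and lists the cells testers might be steered away from on a bad build.

```python
from collections import Counter

# Hypothetical grid size in world units per map cell (not from the talk).
CELL_SIZE = 100.0

def cell_for(x, y, cell_size=CELL_SIZE):
    """Return the (column, row) grid cell containing world position (x, y)."""
    return (int(x // cell_size), int(y // cell_size))

def crash_counts(crashes, cell_size=CELL_SIZE):
    """Count crashes per grid cell from an iterable of (x, y) coordinates."""
    return Counter(cell_for(x, y, cell_size) for x, y in crashes)

def hot_cells(counts, threshold):
    """Return cells with at least `threshold` crashes, worst first."""
    return [cell for cell, n in counts.most_common() if n >= threshold]

# Made-up crash coordinates for a single build.
crashes = [(120.0, 40.0), (150.5, 75.0), (199.9, 10.0), (820.0, 610.0)]
counts = crash_counts(crashes)
avoid = hot_cells(counts, threshold=2)
```

With the sample data, three of the four crashes fall into the same cell, so that cell is the only one flagged.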
The Tactic: Automate Heavy Lifting
The team built tools to track data on object placement from build to build. "This was a huge pain. But it was also absolutely worth it," Johns offered. It kept people from ignoring problems with objects, since the objects were trackable. To test objects within the world, the testers worked with the developer to create a clean environment in which to load all objects to test them.
It saved time, and kept the testers from getting bored -- they simply had to review screenshots the tool generated to make sure the objects showed up. This took a couple of seconds, instead of several minutes per object. Coupled with the object placement tracker, they only ever had to jump into the game to actually check an object if and when it changed.
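A build-to-build placement tracker of the kind described above could look something like this minimal sketch. The data layout (a dictionary from object id to position), the object ids, and the tolerance value are assumptions for illustration; the talk did not describe how the real tool stored its data.

```python
def diff_placements(previous, current, tolerance=0.01):
    """Compare object placements between two builds.

    Each argument maps a hypothetical object id to an (x, y, z) position.
    Returns the ids that were added, removed, or moved by more than
    `tolerance` on any axis.
    """
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    moved = sorted(
        obj
        for obj in set(previous) & set(current)
        if any(abs(a - b) > tolerance for a, b in zip(previous[obj], current[obj]))
    )
    return {"added": added, "removed": removed, "moved": moved}

# Made-up placement data for two consecutive builds.
build_a = {
    "orb_001": (10.0, 5.0, 2.0),
    "orb_002": (40.0, 8.0, 0.0),
    "crate_17": (3.0, 3.0, 0.0),
}
build_b = {
    "orb_001": (10.0, 5.0, 2.0),
    "orb_002": (42.5, 8.0, 6.0),
    "orb_003": (90.0, 1.0, 0.0),
}
changes = diff_placements(build_a, build_b)
```

Only the objects listed in `changes` would need a fresh look in the game; everything else is unchanged since the last build.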
One thing that could not be automated was checking the seams in the world. With 495 city blocks and an average of 12 bugs per block (at a cost of 12 minutes to locate the bug) they discovered it would take 148 days to deal with all of the seam bugs. According to Johns, on prior projects this would be "the time we have to sit down, do the math, and find out how many testers this is going to take, and grin and bear it. On Crackdown we realized that isn't going to work."
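The 148-day figure can be reproduced with simple arithmetic. The block count, bugs per block, and minutes per bug come from the talk; the eight-hour working day is an assumption, chosen because it makes the numbers line up.

```python
BLOCKS = 495            # city blocks, from the talk
BUGS_PER_BLOCK = 12     # average seam bugs per block, from the talk
MINUTES_PER_BUG = 12    # time to locate each bug, from the talk
HOURS_PER_DAY = 8       # assumption: one tester working an eight-hour day

total_bugs = BLOCKS * BUGS_PER_BLOCK          # 5,940 bugs
total_minutes = total_bugs * MINUTES_PER_BUG  # 71,280 minutes
total_hours = total_minutes / 60              # 1,188 hours
tester_days = total_hours / HOURS_PER_DAY     # 148.5 tester-days
```

Under that assumption the total comes to 148.5 tester-days, which matches the roughly 148 days Johns quoted.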
The decision was made to create a new tool -- one that could make dealing with the bugs much, much faster. The team came up with a tool called SWARM. This allowed bugs to be tracked easily: each one had a text description and a screenshot, and it could track every bit of relevant data for each bug.
Since it was easy to see the bugs, this stopped duplicate bugs and also made them easy to check. Metadata was stored in each bug's JPEG image, which meant that when that data was dropped into the Crackdown build, testers were teleported to the bug's location and could verify it. Forza 2 took it one step further and integrated with Maya, so the bugged area could be loaded seamlessly and fixed.
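The talk did not say how SWARM stored its metadata inside the image, so the following is only one plausible approach, not a description of the real tool. It writes a JSON payload into a standard JPEG comment segment and reads it back. The function names and the metadata fields (`bug_id`, `position`, `build`) are hypothetical.

```python
import json
import struct

SOI = b"\xff\xd8"  # JPEG start-of-image marker
COM = b"\xff\xfe"  # JPEG comment segment marker

def embed_metadata(jpeg_bytes, metadata):
    """Return a copy of the JPEG with metadata stored as JSON in a comment segment."""
    if not jpeg_bytes.startswith(SOI):
        raise ValueError("not a JPEG stream")
    payload = json.dumps(metadata).encode("utf-8")
    if len(payload) > 65533:
        raise ValueError("metadata too large for one comment segment")
    # The 2-byte big-endian length field counts itself plus the payload.
    segment = COM + struct.pack(">H", len(payload) + 2) + payload
    # Simplification: the segment goes right after SOI. Strict JFIF readers
    # expect the APP0 segment there, so a real tool might insert it later.
    return SOI + segment + jpeg_bytes[2:]

def read_metadata(jpeg_bytes):
    """Return metadata from the first comment segment, or None if there is none."""
    pos = 2  # skip the SOI marker
    while pos + 4 <= len(jpeg_bytes) and jpeg_bytes[pos] == 0xFF:
        marker = jpeg_bytes[pos + 1]
        if marker in (0xD9, 0xDA):  # end of image, or start of scan data
            break
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker == 0xFE:
            return json.loads(jpeg_bytes[pos + 4:pos + 2 + length].decode("utf-8"))
        pos += 2 + length  # skip this segment and move to the next marker
    return None

# A stand-in for a screenshot: just the start and end markers, no image data.
tiny_jpeg = SOI + b"\xff\xd9"
bug = {"bug_id": 1234, "position": [812.5, 96.0, 40.25], "build": "hypothetical-build"}
tagged = embed_metadata(tiny_jpeg, bug)
recovered = read_metadata(tagged)
```

A game build that accepted such an image could read `position` from the recovered metadata and move the tester there directly.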
A Few More Lessons
Johns left the session with some parting thoughts, some spurred by the Q&A that followed his talk. One suggestion was not to flip the debug hooks off in the game world all at once, but to slowly wean your testers off of them -- so they'd be playing the game like a consumer toward the end, but without a harsh and abrupt transition (which is what happened when testing Crackdown).
Another observation: "Smart testers are hard to find and harder to keep, and harder to keep if you keep making them test the same bugs over and over again every single build." When asked about some notorious bugs that made it into the retail version, Johns confessed that they left in the double jump and the wall-climbing SUV because they were fun and didn't break the balance too much. To address gameplay concerns that aren't typical bugs, they also had "suggestion bugs" about things that aren't fun in the game.
In short, the presentation opened up a lot of interesting questions about how to bug test a sprawling, ambitious game like Crackdown. If there was one disappointment after it was over, it's that a lot of the tools that were used are proprietary to MGS and are currently not available to external studios. With custom builds required to create these tools, dealing with the testing issue could still prove to be a major headache.