3. Content Avalanche
Brütal Legend is not a small game. Fortunately, we thought we knew what we were facing and invested heavily in data/build infrastructure. What went horribly awry was that we underestimated the total content push and, more importantly, didn't anticipate the huge content spike at the very end of production. From the start of the game through the end of 2008, our rates of data churn and data growth were both fairly steady and corresponded roughly to increases in staffing and team productivity. This was expected, planned for, and supported by the technology.
But then, in January 2009, everything exploded. All at once. After three years of development we had accumulated about 2.5 GB of optimized/packed game data. Less than four months later, we'd jumped to over 9 GB.
The central cause of this was a very large increase in asset delivery from a number of teams simultaneously. For example, we went from 0 localized files to about 100,000 in a matter of weeks.
We received the high resolution video assets for the Jack Black intro and all our main menus in one heap. We made a late decision to contract additional audio work, and new ambiences and sound effects were quickly added to the game. And so on.
This simultaneous, significant increase across several types of content put a massive burden on our entire infrastructure, in particular our build machine, Perforce server, and network backbone. To exacerbate matters, we started to see cascade effects: a massive hit to one system (such as a check-in of 10,000 .wav files) would bog down Perforce, causing a bottleneck in all of the dependent systems (like our build server and individual check-ins), and these bottlenecks would in turn cause other bottlenecks.
It's a credit to our pipeline and build infrastructure that things never failed, but we experienced a number of severe performance degradations, many of which required emergency interventions from the engineering team. This unexpected firefighting caused lost productivity, invariably at the worst possible times (like during preparation for a demo or a milestone).
These large content dumps also put significant strain on our runtime systems. The voice system's per-line memory overhead was not designed for tens of thousands of lines, and the sheer volume of data had us panicking about whether we could even fit on a dual-layer DVD.
Across the board, these unexpected increases in content caused ripple effects throughout our I/O, memory, and processing profile. And because the rate of increase was both high and unexpected, the engineers responsible for wrangling these systems were pulled from their assigned work and redirected to emergency firefighting.
Moving forward, we will work much more proactively with content creators to estimate the total amount of data that they plan to create, and we will factor these numbers into our technical designs to ensure that we meet the final needs of the product. Additionally, we plan to invest more in scalable data infrastructure so that we are better positioned to bring new capacity online quickly should it prove necessary. With those improvements and a little luck, content avalanche handling will be something we brag about in our future projects.
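As a rough illustration of the kind of up-front estimate described above, here is a minimal Python sketch. It is not the tooling we used. The team names and per-team figures are hypothetical, and the 8.5 GB budget is the nominal capacity of a dual-layer DVD (usable space on a given console may be lower). Only the 2.5 GB and 9 GB totals and the three-year and four-month periods come from the text above.

```python
from dataclasses import dataclass
from typing import List

# Assumption: nominal dual-layer DVD (DVD-9) capacity in GB.
# Usable space on a particular platform may be lower.
DISC_BUDGET_GB = 8.5


@dataclass
class Estimate:
    team: str          # hypothetical team name
    current_gb: float  # packed data delivered so far
    planned_gb: float  # data the team still expects to deliver


def monthly_growth(start_gb: float, end_gb: float, months: float) -> float:
    """Average data growth in GB per month over a period."""
    return (end_gb - start_gb) / months


def projected_total(estimates: List[Estimate]) -> float:
    """Delivered plus planned data, summed across all teams."""
    return sum(e.current_gb + e.planned_gb for e in estimates)


def over_budget(estimates: List[Estimate],
                budget_gb: float = DISC_BUDGET_GB) -> float:
    """GB by which the projection exceeds the budget (0.0 if it fits)."""
    return max(0.0, projected_total(estimates) - budget_gb)


# Growth rates implied by the figures in the text.
steady = monthly_growth(0.0, 2.5, 36)  # first three years
spike = monthly_growth(2.5, 9.0, 4)    # the final four months
print(f"steady: {steady:.2f} GB/month, spike: {spike:.2f} GB/month")

# Hypothetical per-team estimates gathered early in production.
estimates = [
    Estimate("audio", 1.0, 2.5),
    Estimate("video", 0.2, 1.5),
    Estimate("localization", 0.0, 1.0),
    Estimate("world art", 1.3, 1.5),
]
print(f"projected: {projected_total(estimates):.1f} GB, "
      f"over budget by {over_budget(estimates):.1f} GB")
```

Taking the text's figures at face value, the spike works out to roughly 23 times the earlier average growth rate. Even a crude tally like this one, revisited at each milestone, would flag a projected overrun well before the assets start arriving.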
4. Facilitators vs. Implementers
As Brütal Legend moved into production, it became clear that the team was understaffed in key implementation positions. To meet this need, we reallocated positions set aside for facilitation hires (such as design and production) to staff up more implementers (such as animators and programmers).
While it was necessary to staff up on implementers, we failed to recognize until we were deep into production that this personnel trade reduced our overall efficiency: more designs had to be reworked, management oversights became more frequent, and miscommunication was widespread.
Understaffing the design department meant that we were often implementing ideas that had only been loosely discussed, before the feature had been fully planned. Serious flaws were sometimes discovered only after a feature had been partially or fully implemented.
This also meant that there was always a hefty backlog of decisions and specifications, even before adding in the rework required after false starts. Because design often sits at the top of the dependency chain, bottlenecking it had a deleterious impact on the rest of the team, especially considering our broad game scope.
The same understaffing symptoms could be found in the production department. Producers are responsible for communication, for resolving dependencies, for ensuring the team works reasonable hours, and for keeping the project on schedule. They ensure that the road is paved well ahead of the team's arrival. With insufficient staffing, some of the production responsibilities fell to the leads, or, absent that safety net, inefficiencies were handled with overtime and stress.
In retrospect, staffing facilitation and implementation positions should have been equally prioritized. Scoping the game to meet our capacity to create it would almost certainly have meant a smaller game, but one that was likely more polished and more smoothly managed.