Gamasutra: The Art & Business of Making Games

Opinion: Big Data in the Game Industry
by Nils Pihl on 05/22/14 12:34:00 am   Expert Blogs   Featured Blogs

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.


As an industry, we’ve increasingly turned our attention to the promise of big data and analytics. The concept is simple: when operating at a large enough scale, collecting terabytes of data and billions of events, even one-in-a-million insights become predictable assets in our developer arsenal, allowing us to predict and repeat success. We’ve seen it work for companies like Zynga and Kabam, and thousands of blog posts, panels and articles proclaim the bright future of the big data market. The sheer size of our modern marketplace has tilted the odds in our favor.

20 years ago, very few could dream of having tens of millions of active users, let alone having their behavior and customs recorded and stored. The market has changed, our attention has shifted - but in our heart of hearts we know that comparatively little innovation has followed our increased interest in big data. Data-driven design is still incredibly difficult to effectively incorporate into your workflow, and far from every company manages to get a return on their big data investment. Analytics is still difficult because we’re stuck in a paradigm that is as old as I am.

For those who paid attention, the last 20 years of technological development have been truly remarkable. We’ve gone from floppy disks to the app store, from dial-up to broadband, from DOS to iOS, from command line to touch screen - but SQL is still SQL.

Imagine a world where the graphical operating system never caught on, where business was still conducted primarily in DOS and other command line interfaces. Remember, or imagine, the frustration that came with needing specific training to use some of the most instrumental tools of your trade. Imagine hiring people to operate those machines, instead of buying machines to improve the productivity of the people you hire. It was a nightmare then, and it is a nightmare today, because that’s exactly where we are with analytics.

Last year the McKinsey Global Institute projected that there will be a shortage of 190 000 data scientists by 2018. There are already 1000 job openings for data scientists in San Francisco alone. This tells us two things:


1) We want to make data-driven decisions.

2) We’ve made ourselves painfully reliant on data scientists, because data-driven decision making is hard.


So what exactly is a data scientist? I saw this humorous tweet the other week, and I thought I’d share it with you: 

“Data Scientist (noun): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”

Another netizen quipped that a data scientist is a statistician, living in San Francisco, much like a “growth hacker” is a Marketing Professional living in San Francisco. These data scientists will cost you at least 130k dollars/year, and through our work on Traintracks we increasingly find that up to 1/10 of the development workforce at a typical game studio is hired to wrangle, wrestle, and munge data. And why do we need them? 

Without pointing any fingers, I lifted this sentence from the website of a company that is claiming to innovate in the field of analytics and big data:


“With its unmatched speed and familiar ANSI SQL interface, [this product] is a powerful drop-in solution to fix application performance issues.”


Familiar ANSI SQL interface! How’s that for a sentence? How many of the people best suited to make design decisions at your company are even proficient in SQL? 

Can I put you through to your insight?

If we don’t know how to use it, it’s not our interface. The data scientist is the interface. The data scientist is a human keyboard, not unlike the telephone operators from the infancy of telephony, and they operate between you and your data, between you and your power to make data-driven decisions - and by the year 2018, there will be a shortage of 190 000 data scientists. This is not a sustainable state of affairs.

Based on the readership of this blog, it’s a pretty safe bet that some of you work at a company that has already spent a million dollars on analytics-related expenses this year. Our industry is literally spending millions of dollars attempting to make data-driven decisions so that we can address the concern that we all share: How can we make our next investment successful?

Why is it so hard being a game studio? Because there are no guarantees, and no continuity. You can have a game with 10 million MAU today, and your next game might not even get 100 000 downloads. Game studios disappear seemingly overnight. We all worry about it. We worry about it as employees wanting job security, we worry about it as executives wanting to build better companies, and we worry about it as investors. The days of easy money for game studios are over. Investors have rightfully grown wary of the industry’s track record. The industry itself, and its investors, are all looking for the same kinds of innovations and solutions.

Bruce Gibney at the Founders Fund captured it rather eloquently in a blog post entitled “What happened to the future” when he wrote:

"At the least grandiose level, we need analytical software much more powerful and much easier to use than the current state of the art. Most analytical platforms are exceedingly arcane, requiring lengthy experience with that exact platform to acquire mastery, and yet the quality of analysis remains fairly poor. It does society no good to collect huge amounts of data that only a small minority can analyze, and even then only partially."

Arcane is the right word. In 1994 the developers at my father’s company built their own database, and their own programming language, to reduce the need for complicated SQL workflows. As misguided as that might seem, they felt it was the only way for them to really scale their business. That was 20 years ago, and not much has happened since. We are still offered that “familiar ANSI SQL interface”, and we’re asked to think of it as “familiar” rather than arcane.

There’s this joke about SQL, you might have heard it:


A SQL query walks into a bar, confidently approaches two girls at two tables and asks “May I join you?”


If you’re a nerd like me, it’s a pretty funny joke, well worth a chuckle. Even funnier to consider is that the joke might be older than me.

The big data industry is still waiting for its “1984 moment”. We’re waiting for something like the Macintosh, with its graphical OS and commitment to ease-of-use, to come around and change the way we interact with data entirely, but it won’t happen unless the industry realizes just how poorly served we are by the current paradigm.

There are a lot of differing opinions on where the industry has to go - or can go - from here, but my time and money are invested in a simple idea: Great games are not built by data scientists. They're built by great game studios that have a commitment to making good games, rather than chasing metrics. By empowering those studios to get to know their players better, and letting them connect with, and understand, their audience in a way that is currently reserved for the analysts, I hope to have some small impact on the great games of tomorrow. With Traintracks, we hope to make the term "data scientist" obsolete as fast as it became trendy, and no game designer should be forced to learn SQL to stay competitive. If the last 20 years have taught me anything, it is that a technology's full potential cannot be reached before it can be effectively used by someone without a degree. The video game industry is built on the foundation of one of the most impressive feats of tech democratization ever, and we should continue to innovate.

There is a bright future for the games industry, but your place in it is in no way guaranteed. Your future depends on how well you manage to predict and repeat success. So bring your data into play.

Pin Wang
Very true. Indie developers like us don't have their own (expensive) big data solutions or the (expensive) teams to back them. I've seen several small to mid-sized studios flounder on trying to answer basic questions like "which features are working for us?" or "why are we really making money?"

The current free solutions for indies make it easy to get superficial data, but it can be surprisingly difficult to draw real insight. The truth is, even Zynga, who do extensive A/B testing, see a big percentage of experiments result in a non-significant difference. This is why big data has a bad rap amongst indies and why game analytics have a long way to go!

Nils Pihl
I asked a game studio I know if new players of their FPS title preferred splash damage weapons or precision weapons. It was surprisingly difficult to answer! Those two particular categories were not instrumented in the game code, and segmenting on "new players" is always tricky. To answer the question, their very competent CTO said he'd have to get the code re-instrumented, update the parser, upload a new version to the app store, wait for approval, record enough data to get a statistically significant answer... Total cost of ownership on that question, for this medium sized studio, would have been 20k USD according to the CTO.

If you're Zynga, dropping 20k on a question like that might make sense - because something as little as a 1% increase in sales on that title could make up for the investment. For a studio like the one I asked, it was a question too expensive to invest in. Most studios can't afford to drop tens of thousands of dollars on every behavioral question they might have.
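The arithmetic behind that trade-off can be sketched in a few lines of Python. The 20k cost and the 1% uplift are the figures from the comment above; the function name and the idea of treating the uplift as pure revenue (ignoring margins and platform fees) are simplifying assumptions for illustration.

```python
def break_even_sales(question_cost_usd, uplift_percent):
    """Sales a title needs so that the given uplift pays for the question."""
    # uplift_percent is a whole-number percentage, e.g. 1 for a 1% increase.
    return question_cost_usd * 100 / uplift_percent

# A 20k USD question answered with a 1% sales uplift only pays for itself
# on a title that already sells this much:
needed = break_even_sales(20_000, 1)
print(f"Break-even sales: {needed:,.0f} USD")
```

Under these assumptions the title needs roughly 2 million USD in sales before a 1% improvement covers a single 20k question, which is why the same question can be cheap for a large publisher and out of reach for a mid-sized studio.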

Heather Stark
20k seems a lot for that work description. Mind you I have no idea what they need to do to their parser. Or how much they pay their devs....!

Nils Pihl
Heather, I think most of the cost was in re-instrumenting the game, updating their analytics platform, finding the right time to push these updates to the app store (since this was a mobile game), debugging results etc...

That being said, I was not responsible for the calculation, I am just recounting what the CTO said.

P.S. A good Dev is paid between 50k and 100k USD in Beijing, where this took place.

Johan Toresson
Great post, and really nice to see someone sharing my view of SQL as an outdated model to access data. Looking forward to what the future brings. :)

Eric Finlay
Big data is a buzzword for many industries (game dev and genetics mainly), but it hasn't yielded a fraction of what it was meant to. Personally, I think measuring game enjoyment is beyond the data that we can log.

Nils Pihl
I disagree! I think we can measure game enjoyment to some degree of certainty, although certainly not with perfect confidence. That being said, people are not doing that. Most analytics we come across focus on very simple metrics like retention that don't really help you make your game better.

Some studios measure meaningful things like how player death frequency per [time unit] influences other behaviors - and those kinds of questions, although difficult and costly to ask, give us a rough sketch of player mindset. You can see rough patterns of enjoyment, frustration, exhilaration and competitive mindsets if you know where to look.

Ward W Vuillemot
As a product manager investigating this space, I am very much interested in learning where the real pain lies. As Jay Anne notes, some of the pain is up front at time of instrumentation. And there is only so much a middle-ware solution can do, since this is where the "semantics" are added. Only a human can really differentiate at time of instrumenting if a weapon is splash or precision damage, as illustrated by Nils Pihl. Is this the single biggest pain - just the act of instrumenting? Or is it the entire "value stream"? In talking with monetization managers and gameplay analysts, part of the problem appears to be the inflexibility of the models themselves; namely, that each middle-ware builds its "entities and relationships" around a single world-view that caters to, or is influenced by, retail more than game experiences. I would love to understand what a killer solution would look like in more specific terms. And I would love to hear people's thoughts on how "big data analytics" can augment both game development (pre-launch) and game operations (post-launch).

Ben Weber
One of the biggest issues in meeting the demand for data scientists in games is a lack of data sets that students can use for building the necessary expertise. Sure, you can write a script to pull data from a web site such as the Steam Gauge tool, and even avoid using SQL altogether, but it's quite limited in the types of metrics and analysis that can be performed. Unfortunately, I don't see this changing any time soon due to privacy issues, and the valid concern that even obfuscated data can be revealing. Some of the ways of dealing with this limitation are working with Indie titles to set up data hooks, working at a large company with a big data solution in place, or using and sharing data from publicly available data sets (while anonymizing the users).

Jeff Hsu
"Meeting the demand for data scientists" isn't the right issue to focus on, because data scientists aren't generally very useful within any company in the first place, especially for the salaries they get but don't deserve.

Data science work isn't usually as fantastically productive as the industry idealizes it to be. Data scientists are often stuck with very menial tasks that have nothing to do with fancy programming tasks nor fancy statistics -- most of their time is dedicated to lame sql queries and cleaning up dirty data due to the inevitable evolution of their data schema. Unfortunately, every non technical person who wants to ask a simple question about their data has to go through these "human keyboards."

Instead of asking how we can breed more data scientists to further isolate non technical employees from the data, we should really be asking, "what software will allow anyone within a technology company to directly interface to and interrogate their data, without going through any human bottlenecks?" Instead of focusing on hiring the perfect data scientist (who is both an expert in programming and statistics and unfortunately doesn't exist), why can't we build software that democratizes data for everyone -- for the game designer who wants to do data driven design, or the statistician and programmer who know their respective fields better than a data scientist does?

Sebastian Lagemann
Wow, I'm really surprised that, for some commenters as well as the author, SQL still seems to be the standard way of getting insights from data. We at HoneyTracks have been trying to solve that for more than 3 years, and we have made a lot of progress. Even with our free or cheaply priced indie versions, it would be really easy to get some of your questions answered within seconds, without any SQL or other technically complicated work. Sure, you have to implement some events to send us the information from your app, but that's all, and it's a small effort for the hugely valuable information you can get afterwards with just a few clicks.
You all should try out HoneyTracks immediately, and if you're an indie, get in touch with us for some special offers! We're also very interested in your feedback and would really appreciate any you can give.

Nils Pihl
For Indie and smaller studios that are not doing very complex analytics, SQL is not the standard. You don't need SQL to ask trivial and almost completely unactionable questions like:

"Is new users’ ARPU the same as it was for existing users in their first month?"

"What is the virality and the ARPU of level 5 users compared to level 10 users?"

Both of these questions were lifted from your own promotional materials, and the answers would be of very little value.

ARPU is a pretty problematic metric to begin with, but let's say the answer to the first question is "New user ARPU is 0.75x that of the first-month ARPU", and let us ignore the fact that it is a question so trivial to answer that it could pretty comfortably be done in Excel. Now what?

How does it help you knowing that ARPU is down 25%? What caused that? How do you reverse the trend?

How does it help you to know that virality is 2x higher for Lvl 10 players? Are you going to make more Lvl 10 players?

At HoneyTracks, how would you solve a question like:

"What was the most popular weapon among all players in North America that have a win streak of at least 4 consecutive matches in a single session? Is it statistically different from those who do not have such win streaks?" (In my experience, it is for questions like this that larger studios start flexing their SQL muscles)

Once that question has been answered there's a good likelihood that you'll have a dozen really meaningful follow up questions. Going down a rabbit hole like this is incredibly costly, and no off the shelf solution is going to make a meaningful impact on the time-to-insight for such data exploration.
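To make the shape of the win-streak question concrete, here is a minimal Python sketch over a tiny in-memory match log. Everything in it is an assumption for illustration: the field layout, the player names, the sample rows, and the choice to measure "popularity" as the number of matches played with a weapon. It is not how any particular studio or tool does it, and the significance test is left out; the two counters it produces are simply the table one would feed into something like a chi-squared test.

```python
from collections import Counter, defaultdict

# Hypothetical match log, already ordered by time within each session.
# Fields: (player, region, session, weapon, won)
matches = [
    ("ann", "NA", "s1", "rocket", True),
    ("ann", "NA", "s1", "rocket", True),
    ("ann", "NA", "s1", "rifle", True),
    ("ann", "NA", "s1", "rocket", True),
    ("bob", "NA", "s2", "rifle", True),
    ("bob", "NA", "s2", "rifle", False),
    ("bob", "NA", "s2", "rifle", True),
    ("cat", "EU", "s3", "rocket", True),
    ("cat", "EU", "s3", "rocket", True),
    ("cat", "EU", "s3", "rocket", True),
    ("cat", "EU", "s3", "rocket", True),
]

def longest_win_streak(results):
    """Length of the longest run of consecutive wins in one session."""
    best = current = 0
    for won in results:
        current = current + 1 if won else 0
        best = max(best, current)
    return best

def streak_players(rows, region, min_streak):
    """Players in a region with at least min_streak straight wins in a session."""
    by_session = defaultdict(list)
    for player, reg, session, _weapon, won in rows:
        if reg == region:
            by_session[(player, session)].append(won)
    return {player for (player, _session), results in by_session.items()
            if longest_win_streak(results) >= min_streak}

def weapon_counts(rows, region, players, include):
    """Weapon usage for regional players inside (or outside) the given set."""
    return Counter(weapon for player, reg, _session, weapon, _won in rows
                   if reg == region and (player in players) == include)

streakers = streak_players(matches, "NA", 4)
with_streak = weapon_counts(matches, "NA", streakers, True)
without_streak = weapon_counts(matches, "NA", streakers, False)
top_weapon = with_streak.most_common(1)[0][0]
```

Even in this toy form, the answer depends on data that had to be recorded in the right shape beforehand (region, session boundaries, match order, weapon per match), and each follow-up question tends to need another pass like this one.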

There's a reason Zynga, Activision, EA, Ubisoft, Rovio, Supercell, etc all built their own in-house systems.

Or how about something seemingly trivial? In a case study for a somewhat well-known analytics tool focusing exclusively on games, one indie developer had used the tool to track where players were having difficulty in his platformer game. The following account is a paraphrase:

He instrumented event types such as FellOnSpikes, FellOffScreen and HitByFlames, and started tracking the occurrence of these event types. In a post-mortem of his game he realizes that he would have been much better off having instrumented a single event type PlayerDied, storing what killed the player in JSON instead. This would have allowed him to ask better questions faster. Lesson learned? Spend more time thinking about instrumentation next time!
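The difference between the two instrumentation styles can be sketched as follows. This is a hypothetical Python illustration, not the API of any analytics tool: the `player_died` helper, the field names and the sample events are all made up. The point is only that one generic PlayerDied event with the cause stored in its JSON payload can be sliced several ways after the fact.

```python
import json
from collections import Counter

def player_died(cause, level, x):
    """Build one generic PlayerDied event; the cause lives in the JSON payload."""
    return json.dumps({"event": "PlayerDied", "cause": cause, "level": level, "x": x})

# Hypothetical log lines, as they might arrive from the game client.
log = [
    player_died("spikes", level=1, x=120),
    player_died("flames", level=1, x=340),
    player_died("spikes", level=2, x=75),
    player_died("fell_off_screen", level=2, x=410),
    player_died("spikes", level=2, x=80),
]

events = [json.loads(line) for line in log]

# Because every death shares one event type, the same records answer
# several questions without touching the game code again.
deaths_by_cause = Counter(e["cause"] for e in events)
deaths_by_level = Counter(e["level"] for e in events)
spike_deaths_by_level = Counter(e["level"] for e in events if e["cause"] == "spikes")
```

With separate FellOnSpikes, FellOffScreen and HitByFlames event types, a question like "which level kills the most players, regardless of cause?" means stitching several event streams together, and a new hazard means a new event type; with the single event it is a new value in an existing field.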

Analytics is not as easy as the likes of HoneyTracks, Mixpanel, Kissmetrics and Kontagent want you to believe.