Big Data, Big Problems: A Mathematician’s Take on the Current State of Game Analytics
by Tom Matcham on 07/29/14 12:54:00 pm   Featured Blogs

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

 

90% Of The Way There?

A phrase I come across quite frequently with regard to game analytics is that 'the simple stuff will get you 90% of the way there'. Whenever I hear or read this phrase, my immediate thought is: do you personally know that? Have you gone as far as modern research in machine learning and statistics can take you and concluded that the additional insight into your data provided by such tools was only worth 10% of your resulting report, and that bar charts, histograms and heatmaps told you 90% of what your company's stakeholders needed to know about your dataset? Or are you presuming this to be true from what you've read from other people? And what do you mean by 'the simple stuff' anyway?

I'm not claiming that this statement isn't true in many cases: I've certainly seen games where there wouldn't be much point in performing a logistic regression, for example. But given that the games industry has some of the best data sources available, I'm regularly surprised at the contrast between how game companies treat data and how other industries do. What I'm trying to say is that, from my experience of talking with developers and producers, applied game analytics, as a whole, is not done well.
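To make the logistic-regression example concrete, here is a minimal sketch of the kind of model being referred to, fitted by plain batch gradient descent. All of the data, the feature choice and the churn framing are invented for illustration; a production analysis would use an established library and far more careful validation.

```python
import math

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Tiny logistic regression fitted by batch gradient descent:
    predicts a binary outcome (e.g. churned vs. retained) from one feature."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """Churn probability for a player with x sessions."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Hypothetical data: sessions played in week one vs. churn (1 = churned).
sessions = [1, 2, 2, 3, 8, 9, 10, 12]
churned = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(sessions, churned)
```

Even a toy model like this produces something a bar chart cannot: a per-player probability estimate that can be acted upon.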

Now, I completely understand that there are many constraints on the quality of an analytics investigation: time and money are extremely scarce, but developers still want insight into data. The problem is that a bad report is a very dangerous thing. Sample bias, misuse of data-mining tools, misinterpretation of results and many other factors can lead to conclusions that actively harm the game design and production process. It's rarely anyone's fault; doing 'proper' data science is simply difficult, and it's incredibly important to strike the right balance of statistics and computer science when performing an analysis.

Recurring Problems

Having investigated game analytics quite extensively, I've found four recurring areas that are frequently overlooked when game data is analysed:

Data Cleaning

Gamers are highly variable creatures and regularly act in bizarre ways. As such, it’s likely that if you don’t clean your data to remove outliers, the statistics of the gamers that you’re actually trying to design for will become distorted.  Make sure you think very carefully about who and what you’re interested in: getting good quality data to analyse your particular problem is paramount to a meaningful conclusion.
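As a sketch of what basic cleaning can look like, the snippet below removes outliers using Tukey's fences (the interquartile-range rule). The session data and the choice of k = 1.5 are assumptions for illustration; the right definition of 'outlier' always depends on who you are actually trying to design for.

```python
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

# Hypothetical session lengths in minutes: typical players,
# plus one player who left the game idling overnight.
sessions = [12, 15, 9, 14, 11, 13, 10, 16, 480]
cleaned = remove_outliers_iqr(sessions)
```

Note how one idle session would otherwise drag the mean session length from roughly 12.5 minutes to over an hour, badly distorting any conclusion about the players you care about.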

What Probability Distribution Is Appropriate for My Data?

Far too frequently, statistics in game analytics are computed on the assumption that the data is normally distributed. If your data isn't normally distributed and you perform statistical tests based on this assumption, you're going to get duff results that could lead to bad design decisions and, ultimately, a flop of a game. Think carefully about the assumptions you're making about your data: can they be tested?
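One way to test the normality assumption is a moment-based check such as the Jarque-Bera statistic, sketched below from scratch. The samples are synthetic and the test is crude (a real analysis would use a proper statistics library), but it illustrates how heavy-tailed game data, such as playtime per player, fails a check that a normal sample passes.

```python
import random
import statistics

def jarque_bera(xs):
    """Jarque-Bera statistic from sample skewness and excess kurtosis.
    Values far above ~5.99 (chi-square, 2 df, 5% level) suggest the
    data is not normally distributed."""
    n = len(xs)
    m = statistics.fmean(xs)
    s2 = sum((x - m) ** 2 for x in xs) / n
    skew = sum((x - m) ** 3 for x in xs) / (n * s2 ** 1.5)
    kurt = sum((x - m) ** 4 for x in xs) / (n * s2 ** 2)
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

random.seed(1)
normal_sample = [random.gauss(30, 5) for _ in range(2000)]
# Playtime per player is typically heavy-tailed, e.g. roughly log-normal.
skewed_sample = [random.lognormvariate(3, 1) for _ in range(2000)]

# The heavy-tailed sample yields a far larger statistic than the normal one.
print(jarque_bera(normal_sample), jarque_bera(skewed_sample))
```

If the statistic is large, reach for a non-parametric test or transform the data rather than pressing on with normal-theory methods.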

Over-reliance on Data Visualisation

It’s understandable to want to visualise data, especially when game development is such a visual process. Furthermore, data visualisation is an absolutely key component of the analytics process. However, if all the reporting you’re performing can be boiled down to a pretty picture, then it’s likely that you’re missing out on a lot of potential insights into your dataset. In statistics, box plots, histograms and the like are all part of Exploratory Data Analysis, which is usually performed by a statistician to get a ‘feel’ of the data they’re working with before they do the proper work. It’s quite likely that your dataset contains more insight than is purely representable by graphs.
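As an example of insight that a graph alone can't deliver, the sketch below bootstraps a confidence interval for the difference in means between two groups. The 'tutorial variant' framing and the numbers are invented; the point is that the interval gives a quantified answer, not just a pair of bars that look different.

```python
import random
import statistics

def bootstrap_diff_ci(a, b, n_boot=5000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the difference in
    means between two groups -- a numeric answer a bar chart can't give."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        ra = [random.choice(a) for _ in a]  # resample with replacement
        rb = [random.choice(b) for _ in b]
        diffs.append(statistics.fmean(ra) - statistics.fmean(rb))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical day-7 session counts for two tutorial variants.
variant_a = [3, 5, 4, 6, 2, 5, 7, 4, 5, 6]
variant_b = [2, 3, 4, 2, 3, 5, 2, 4, 3, 3]
lo, hi = bootstrap_diff_ci(variant_a, variant_b)
```

A histogram of the two groups might look persuasive either way; the interval tells you how much uncertainty you are actually carrying.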

Behavioural modelling

Note: my opinion on this subject is biased as I have a personal interest in behavioural modelling.

In game analytics academia, it's frequently stated that it's very difficult to infer the motivations of a user. Whilst this is true in many contexts, if you're willing and able to create a model of the player's behaviour, it's likely that you can do a fairly decent job of understanding why certain events took place in a playthrough. Obviously, having such motivational data would be extremely beneficial for designers and moneymen alike, yet it remains a largely unexplored area of game analytics in both academia and production. If a developer had the resources, modelling and analysing the behaviour of a player would go a long way towards explaining other in-game behaviours.
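One of the simplest behavioural models is a first-order Markov chain fitted to logged event sequences. The event names and session logs below are hypothetical, and real player behaviour is far richer than a first-order model captures, but even this sketch surfaces patterns ('players who die usually head to the shop') that hint at motivation.

```python
from collections import Counter, defaultdict

def fit_markov_chain(playthroughs):
    """Estimate first-order transition probabilities between player
    actions from a list of logged event sequences."""
    counts = defaultdict(Counter)
    for seq in playthroughs:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        state: {t: c / sum(nxts.values()) for t, c in nxts.items()}
        for state, nxts in counts.items()
    }

# Hypothetical event sequences logged from three play sessions.
logs = [
    ["explore", "fight", "die", "shop", "fight"],
    ["explore", "fight", "die", "shop", "explore"],
    ["explore", "shop", "fight", "die", "quit"],
]
model = fit_markov_chain(logs)
```

Here `model["die"]` shows that two-thirds of deaths are followed by a shop visit, which is exactly the kind of 'why' question a heatmap can't answer.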

Closing Remarks

Although this article may give you the impression that I'm not impressed with the use of game analytics in practice, that's not the case. On the contrary, I believe that the systems many companies have set up to collect and analyse data are state of the art. However, I do believe that more can be done to get the most out of the datasets that studios collect. Understanding what our users want is the essence of the game development problem: better analytics across the industry will make solving that problem a little bit easier.


Comments


Ben Weber
Nice post. I would say the 90% rule is way off, especially if the analyst role is not embedded on the development team.

Another area sometimes overlooked when setting up telemetry is “planned use of data”. There’s often a focus on collecting as much data as possible with little planning on how the collected data can be used to impact the title. Without this step you can potentially do a lot of interesting analysis, at least from an academic standpoint, but the analysis might not be actionable. When defining a telemetry specification, I find it useful to include an “intended use” for each event hook if possible, such as marketplace optimization or player funnel analysis. Otherwise you might end up with big, useless data.
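The 'intended use' idea can be made concrete as a telemetry specification that every emitted event is validated against. The event names, fields and uses below are hypothetical examples, not any studio's real schema.

```python
# A telemetry spec pairing every event hook with an "intended use",
# so nothing is collected without a planned purpose. All names hypothetical.
TELEMETRY_SPEC = {
    "session_start": {
        "fields": ["player_id", "timestamp", "platform"],
        "intended_use": "DAU/MAU and retention funnels",
    },
    "item_purchased": {
        "fields": ["player_id", "item_id", "price", "currency"],
        "intended_use": "marketplace optimization",
    },
    "level_failed": {
        "fields": ["player_id", "level_id", "attempt_number"],
        "intended_use": "difficulty tuning / churn-risk flags",
    },
}

def validate_event(name, payload):
    """Reject events that aren't in the spec or are missing required fields."""
    spec = TELEMETRY_SPEC.get(name)
    if spec is None:
        raise ValueError(f"unplanned event: {name}")
    missing = [f for f in spec["fields"] if f not in payload]
    if missing:
        raise ValueError(f"{name} missing fields: {missing}")
    return True
```

Rejecting unplanned events at the collection boundary is one way to keep 'big, useless data' from accumulating in the first place.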

Tom Matcham
Completely agree Ben, seems very obvious but your point is frequently ignored.

Lucas Stinis
I totally agree with the statement that understanding why you want to collect and analyse (specific) data in the first place is essential before proceeding with the implementation of any analytics-related calls. It's so obvious that it's painful to see most companies don't plan effectively, partly due to the time/cost issues mentioned in the article, but also due to a lack of knowledge regarding data analysis.

A huge issue that is directly related to this, is the fact most companies simply don't understand what they need to do effective analysis *human resource* wise. While working for a number of companies I have seen the teams struggle to define what an analyst actually was supposed to be; what he/she should be capable of in the first place. Not necessarily in a universal sense, but given the tasks they were trying to accomplish. This included deciding/understanding if it should be a single person, a team or any other configuration; if he/she should be embedded in the dev-team, production team or even a free-agent.

In that sense I think (data)analysts could do themselves and everybody else a *huge* favour and collectively go through the painful steps of creating a public, well defined identity for themselves. We all know what a visual designer does, and even there it's quite difficult to pin-point what the person is exactly capable of, what his "objective" quality is, expertise, etc. but at least most companies have a good enough grasp of what the deliverables should be.

Tom Matcham
I think what you describe is an immediate consequence of game analytics being an emerging area. Although we hear a lot about it, I'm frequently surprised at how little knowledge games developers actually have about the subject and what data science can do. It's all a bit mystical, but I'm confident that will change as it becomes better understood and utilised.

Nick Lim
Hi Ben, there are tried-and-true ways to use the data that should be baked into the collection process. But the truly tricky thing about collecting for a purpose is that it's a chicken-and-egg problem. The insidious problem for an emerging discipline like game analytics is that there is usually a lack of creativity beyond the tried-and-true, especially on how to use the output. So most folks actually look at the data being collected, then try to brainstorm ways to use it, i.e. what they can dream up is limited by what is collected.

When data collection, storage and processing was expensive and slow in the past, I would say collecting with a purpose was absolutely the way to go. Remember data warehousing, anyone? The recent cost savings in cloud collection and storage may change that equation. For example, free data transfer into Amazon means you can receive unlimited incoming events at no charge, then decide later what to store. Archival storage via Glacier makes things even more interesting. Redshift and BigQuery are obliterating query costs!!

Ian Griffiths
Excellent post. I see that few people in games have a decent understanding of even basic statistics. One of my biggest issues with the understanding of data analysis in games is that people tend to think data should always be solely hypothesis driven, that there is no room for exploratory analysis.

There's definitely a push towards visualisation as it makes it easier to consume information, particularly when explaining things to a non-mathematically literate audience. However, I think that this is okay because telling an exec what the kurtosis of your distribution is will leave them with a fairly unimpressed look on their face.

I don't think many in the gaming industry fully appreciate the complex challenges in understanding data. It's much harder to understand complex behaviours than people realise; for example, just because we can derive DAU in a chart in minutes doesn't mean we can figure out what factors are behind it just as quickly and easily.
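The DAU point is worth illustrating: the metric itself really is a few lines of code, which is precisely why it can feel deceptively 'done'. A minimal sketch over hypothetical login events:

```python
from collections import defaultdict
from datetime import date

def daily_active_users(events):
    """DAU from (player_id, day) login events: distinct players per day.
    Computing the number is easy; explaining why it moves is the hard part."""
    users_by_day = defaultdict(set)
    for player_id, day in events:
        users_by_day[day].add(player_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}

# Hypothetical login events (player_id, day); repeat logins count once.
events = [
    (1, date(2014, 7, 1)), (2, date(2014, 7, 1)), (1, date(2014, 7, 1)),
    (1, date(2014, 7, 2)), (3, date(2014, 7, 2)),
    (2, date(2014, 7, 3)),
]
dau = daily_active_users(events)
```

Everything after this, attributing a dip to a patch, a holiday or a competitor's launch, is where the real analytical work lives.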

I'd like to see more effort put into understanding data and statistics in our industry. People need to understand the importance of metrics and how they change over time. They then need to figure out how to turn those into actionable insights, or, even better, as Ben said above, plan their telemetry for that exact purpose. It's hard work, but it's definitely worth it.

