There has been quite a lot of hype around big data, with every vendor adding it to their marketing slogans and taglines. Having been in data and consumer marketing for over 20 years, I was quite dismayed to watch the promise of big data follow the Gartner hype cycle... I hope this series of blog posts cuts through the hype, clarifies the issues surrounding big data and, more importantly, applies them to the digital games industry.
In addition to the experiences of my company Sonamine (www.sonamine.com), I will lean heavily on an excellent book, "Big Data: A Revolution That Will Transform How We Live, Work, and Think". I won't be touching on the technology behind big data; I'll leave that to the vendors.
What does big data look like?
Rather than providing a definition, it is more useful to simply describe some of its characteristics.
Leveraging big data for business value
In their book, Mayer-Schonberger and Cukier make the point that "at its core, big data is about predictions" (p.11). Examples given include the usual ones such as Google predicting which webpages are most relevant to your search and Amazon predicting which items you are most likely to buy. The data used for these predictions include
In Sonamine's experience with game developers, being able to predict which free user is ready to convert or which user is ready to make an additional purchase provides tremendous insight for the marketing team. Using the user predictions for marketing campaigns results in much higher conversion rates without spamming the entire user base. The data used for these predictions could include
Learning when not to ask why
When we use Google or browse the Amazon catalog, we don't really care why certain web pages are more relevant or why Amazon recommends certain products. But there is an interesting story recounted in the book...
Amazon first started by adding product reviews to drive sales, assuming that better reviews would drive more sales. They had a large editorial team writing these reviews. Then Greg Linden came up with an alternative algorithmic approach, which was rigorously compared with the product reviews. These A/B tests showed that the results were "not even close". In the end the algorithm worked so much better that it is driving up to 30% of Amazon's sales today.
It's important to recognize that the algorithm did not provide any explanation for why a user would buy a certain product. Amazon was one of the first companies to realize that although trying to come up with reasons "why" was interesting, and knowing "why" would be pleasant, it was unimportant for stimulating sales. Incidentally, they shut down the editorial group. If the Amazon team had insisted on explaining why a user bought a specific product, they might not have implemented Greg Linden's approach, sacrificing up to 30% of their current sales.
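For readers who have not run one, the kind of A/B comparison mentioned in the Amazon story can be sketched as a two-proportion z-test on conversion counts. This is a generic textbook test, not Amazon's actual methodology, and the function name and the numbers below are made up purely for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A (editorial picks) vs. variant B (algorithm).
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Note that the test only says which variant sold more. It says nothing about why, which is exactly the point of the story.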
A consumer-friendly example is in order. There was a company called Farecast, which took historical flight prices and used them to predict whether prices would increase or decrease. If the flight price was predicted to drop, consumers could wait for a bit. The predictive models were incredibly useful and helped consumers save money. There are many reasons why airlines change their prices; none of those reasons was available to, or included in, the predictive model. All the model used was historical prices. Farecast was quickly purchased by Microsoft and integrated into the Bing flight search. Other travel sites such as Kayak now have similar predictive price capabilities.
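To make the "prices only" idea concrete, here is a deliberately naive sketch that guesses the direction of a fare from nothing but its own history, by comparing the average of the most recent days with the average of the days before. It is a stand-in for illustration and certainly not Farecast's model; the function name, the window size and the fares are all assumptions.

```python
from statistics import mean


def predict_price_direction(prices, window=3):
    """Guess 'rise', 'drop' or 'flat' from the price history alone."""
    if len(prices) < 2 * window:
        raise ValueError("need at least 2 * window observations")
    recent = mean(prices[-window:])                # latest `window` prices
    previous = mean(prices[-2 * window:-window])   # the `window` before that
    if recent > previous:
        return "rise"
    if recent < previous:
        return "drop"
    return "flat"


history = [320, 315, 318, 310, 305, 299]  # hypothetical daily fares
print(predict_price_direction(history))
```

Nothing in the input explains why the airline moved the price. The rule, like the real models, works only from what the price has already done.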
Sonamine has worked with dozens of game developers predicting which users are ready to convert. In every one of these projects, we encountered the "why" question. Designers and marketers wanted to know why specific users were ready to buy and not others. Unfortunately, the truth is that no one knows the true answer to this question, which might vary from user to user. The only thing we know is that the algorithms work much better than human intuition, and that has been demonstrated with all our customers. Some customers were never able to overcome their fear of not knowing why, thus giving up the opportunity to get up to 20% more revenue from their players.
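One way to judge whether predictions "work" without ever answering why is to measure lift: the conversion rate among the highest-scored users divided by the rate across everyone. The sketch below assumes each user already has a predicted score and a known outcome from a past campaign. The function name and the ten sample users are hypothetical, and this is not a description of Sonamine's own evaluation.

```python
def lift_at_top(scored_users, fraction=0.1):
    """Conversion rate in the top-scored fraction divided by the overall rate.

    scored_users: list of (predicted_score, converted) pairs, converted is 0 or 1.
    """
    ranked = sorted(scored_users, key=lambda u: u[0], reverse=True)
    total_converted = sum(converted for _, converted in ranked)
    if total_converted == 0:
        raise ValueError("no conversions in the data, lift is undefined")
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(converted for _, converted in ranked[:top_n]) / top_n
    base_rate = total_converted / len(ranked)
    return top_rate / base_rate


# Hypothetical users: (predicted score, did they actually convert?)
users = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 0), (0.5, 1),
         (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0), (0.05, 0)]
print(lift_at_top(users, fraction=0.2))
```

A lift well above 1 means a campaign aimed at the top-scored users converts better than one sent to the whole user base, whatever the underlying reasons are.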
"Knowing what, not why, is good enough"
I am not saying that we should not ask why. Rather I'm saying that with big data, we don't have to know the answers to "why" before we act. Of course the decision to act on the predictions is dependent on the situation. The more important the impact of the decision, the more reason to act. For example, manholes have been exploding in New York City for many years, seemingly for no logical reason. A very good predictive model was developed that could identify the manholes most likely to explode. Officials could just take action and pre-emptively replace these manholes, preventing the majority of explosions. Knowing the reasons might allow us to prevent future explosions, and more research is certainly warranted. But in the meantime while the search for the real answer is ongoing, we can prevent the majority of explosions.
In the same way, when a game developer sees a major drop-off in players at level 10 of a game, they may try to change that level in different ways without knowing the "true" reason why users are dropping off.
In many cases, it would be impossible given our existing technology and constraints to answer the question of why. Let me draw from some Sonamine examples. When users are ready to abandon a game, it is not possible to survey all of them when there are millions of players. Survey methods are also notoriously subject to user biases such as recall and availability bias.
In the same fashion, it is not really possible to answer the question of why users are ready to make their first purchase. Sometimes, marketers or game designers argue that when a user gets to a high level or is "invested" in the game, they will buy. The problem is that there are always paying users who are at level 1 and others at a high level who don't buy. Here's where big data makes explanations even harder and more unrealistic, because more data points make the picture more complicated until it becomes totally incomprehensible to a human.
In other words, when true explanations are impossible or impractical to obtain, use big-data algorithmic approaches to guide your actions. In fact, most of us frequently act on the basis of incomplete information. The only difference is that we are more confident and trusting of our own decisions, which are usually based on our "experience" and "expertise". Whether human experience is better or worse than big data algorithms is up in the air, but there are many documented cases where algorithms are far better. And we have seen it first hand at Sonamine.
When to act using big data predictions (even without the why)?
For the games industry, many decisions are rooted in creative and historical experience, and this poses a unique challenge in adopting big data. It's true that no amount of big data can produce a new hit story line or new game genre. Data does not help us create, conceive and imagine new things; where big data can help is in improving things once the game is conceived, prototyped and ready to go. So if your responsibility is to improve and enhance a game that is already live, you should try letting big data help you. Here are some general situations in which big data would be applicable:
The next post in this series will cover the implications of letting the data tell you its story and the expert... I welcome your comments and feedback. You can reach me at nick_at_sonamine.com
Mayer-Schonberger, V. and Cukier, K. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt, Boston, New York, 2013.