Losing Control: A/B Testing in Mobile Games

by Andrew Kope on 11/16/15 01:16:00 pm



I hear the phrase "A/B testing" on an almost daily basis. It's often touted as a cure-all for game design decision making - remove personal bias from the equation, and make data-driven decisions because "the numbers don't lie" (like in Mark Robinson's article here). Now I'm not saying that A/B testing can't work, or can't be effective... but as with a lot of things that cross my desk, the devil is in the details.

Consider the following: We have published an F2P racing game, where users earn soft currency by completing races, with new cars/upgrades costing soft currency to purchase. You can enter only 10 races per day, each costing one 'action point', with the option to buy more action points or more soft currency via IAP. User retention is good, but maybe UA is a little pricey given the game's relatively narrow target audience, so the execs are looking for a way to improve ARPU.

During a design meeting, the suggestion is made to change the UI so that the upgrade screen is visible ahead of the currently prominent race screen in the main menu... but after some discussion, the team divides. One side thinks this is a great idea; it will improve ARPU by improving the visibility of the upgrade screen, a sink for in-game currency (IGC). The other side disagrees; downgrading the visibility of the race screen will make users run fewer races and therefore use fewer of their action points, another important sink for IGC.

How does the team resolve this debate? How can we really know which decision is right? Invariably, the suggestion is made to A/B test. So, the programmers set to work and a change to the menu flow is made via DLC. 50% of new and existing users will now see the new menu highlighting upgrades, and the other half will see the old one highlighting races. In three weeks, the design team will have their answer... or maybe not.
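As an aside, here is a minimal sketch of how such a deterministic 50/50 split is often implemented. The article doesn't describe the actual assignment mechanism, so the function name, experiment key, and user IDs below are all hypothetical:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "menu_flow_v1") -> str:
    """Deterministically bucket a user into 'control' (old race-first menu)
    or 'treatment' (new upgrade-first menu).

    Hashing the user ID together with an experiment key gives a stable,
    roughly uniform 50/50 split without storing per-user assignments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # Interpret the hex digest as an integer; even -> control, odd -> treatment.
    return "treatment" if int(digest, 16) % 2 else "control"

# The same user always lands in the same bucket across sessions:
print(assign_variant("user_12345"))  # same result on every call
```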

Two days before the DLC change went live, the marketing team changed its UA strategy to use a different mix of advertisers and raised its CPI bids in Tier 1 countries. On top of that, a week into the test the programming team fixed a server bug that had been slowing down the download speed of new racetracks. Finally, suppose a holiday like Thanksgiving fell during the test window, prompting an in-game Black Friday sale on all IAPs.

The assumption is that with enough users, the effects of everything but the menu change will even out: the so-called Law of Large Numbers. However, both the Black Friday sale and the higher ratio of Tier 1 users might have affected which IAP purchases users were making, and thus which in-game currency sinks were most accessible. And who's to say how much shortening the download time of new racetracks might have improved engagement with that feature? Now it seems a lot harder to argue that the DLC menu change is the real cause of any changes to ARPU or user behavior than it did before the test. In reality, there is almost no way to individually quantify how much of an effect these coincidental changes might have had on the game.
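It's worth spelling out why the "it evens out" argument can fail: because the split is randomized, a shock that hits both groups equally does cancel out on average, but a shock that interacts with the treatment (say, the Black Friday sale driving extra traffic through the newly prominent upgrade screen) does not. The toy simulation below illustrates the point; every effect size in it is invented purely for illustration:

```python
import random

random.seed(42)

def simulate_arpu(n_users: int, treated: bool, sale_active: bool) -> float:
    """Toy model of mean revenue per user. All effect sizes are invented:
    a baseline spend, a small lift from the menu change, a lift from the
    sale, and an *interaction* (the sale amplifies the menu change)."""
    total = 0.0
    for _ in range(n_users):
        spend = 0.50                        # baseline expected spend ($)
        if treated:
            spend += 0.05                   # true everyday menu-change effect
        if sale_active:
            spend += 0.20                   # sale lifts both groups equally
            if treated:
                spend += 0.10               # interaction: sale + new menu
        total += random.gauss(spend, 0.30)  # per-user noise
    return total / n_users

for sale in (False, True):
    lift = simulate_arpu(100_000, True, sale) - simulate_arpu(100_000, False, sale)
    print(f"sale_active={sale}: measured ARPU lift ~= ${lift:.3f}")

# Without the sale the measured lift is ~$0.05; during the sale it is
# ~$0.15, so a test run over Black Friday overstates the everyday effect.
```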

At its core, any A/B test of a new game feature is really akin to running a psychological experiment with a treatment and control group, and then trying to determine if there was a statistically significant effect of the experimental manipulation. As a researcher in a psychology lab, you can take deliberate measures to ensure that your manipulation is the only difference between your treatment and control groups. As a game analyst, your experimental groups are the unfortunate victims of a whole host of factors outside of your control.
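In statistical terms, the analysis such a test implies is a two-sample comparison of per-user revenue between buckets. Here is a minimal sketch, assuming SciPy is available; the revenue distributions are invented for illustration (real F2P spend data is far more zero-heavy and skewed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-user revenue samples for each bucket (values invented).
control   = rng.exponential(scale=0.50, size=50_000)  # old race-first menu
treatment = rng.exponential(scale=0.52, size=50_000)  # new upgrade-first menu

# Welch's t-test: is the difference in mean ARPU statistically significant?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"ARPU: control={control.mean():.3f}, treatment={treatment.mean():.3f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A small p-value only says the groups differ; it cannot tell you whether
# the menu change or a coincidental confound caused the difference.
```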

So where does that leave us? My point here is not that A/B testing can't be done, nor is it that data-driven decisions aren't the way to go in the F2P mobile ecosystem. Instead, I'm suggesting that before rushing to suggest another A/B test, both analysts and designers should consider the cost of achieving real experimental control - namely sacrificing the ability to make almost any other changes to the game - or run the risk of trying to make good decisions with bad data.

