The first and simplest data-mining algorithm we tried was Naive Bayes, which is extremely human-friendly and comprehensible. It showed that my hypothesized metrics did not correlate with real churners. The second method, Decision Trees, revealed that a few of my ideas were actually quite useful, but not enough to bring prediction precision up to an acceptable level.
Data Mining algorithms 101: Naive Bayes is great at preliminary dataset analysis and highlighting correlations between variables. Decision Tree splits the dataset into distinct subsets, separating the churners from the happy players. These methods are both human-readable, but quite different in their underlying math and practical value. Neural Network is essentially a black box capable of taking complex variable relations into account and producing better predictions, at the cost of being completely opaque to the developer.
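To make the three-algorithm comparison concrete, here is a minimal sketch using scikit-learn stand-ins for the same algorithm families; the data, feature names, and thresholds are entirely synthetic and illustrative, not the actual Aion pipeline or Microsoft Analysis Services models.

```python
# Hypothetical comparison of the three algorithm families on synthetic
# "churn" data; everything here is made up for illustration.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)
n = 2000
# Two illustrative activity metrics: playtime per level, mobs killed per level.
playtime = rng.exponential(60, n)
mobs = rng.poisson(40, n).astype(float)
# Synthetic ground truth: low-activity players churn.
churn = ((playtime < 30) & (mobs < 35)).astype(int)
X = np.column_stack([playtime, mobs])
X_tr, X_te, y_tr, y_te = train_test_split(X, churn, random_state=0)

results = {}
for name, model in [("Naive Bayes", GaussianNB()),
                    ("Decision Tree", DecisionTreeClassifier(max_depth=4)),
                    ("Neural Net", MLPClassifier(max_iter=1000))]:
    model.fit(X_tr, y_tr)
    results[name] = precision_score(y_te, model.predict(X_te), zero_division=0)
    print(f"{name}: precision {results[name]:.2f}")
```

The tree reads the axis-aligned decision boundary almost directly, which mirrors why Decision Trees worked so well on threshold-style activity metrics.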
I brainstormed with the Aion team, and we had a great time discussing our newbie players -- who they are, how they play, and their distinct traits. We remembered how our friends and relatives first stepped into the game and what their experience was like.
The result of this brainstorming session was a revised list of in-game factors affecting newbie gameplay (had she expanded the inventory size, bound the resurrection point, and used the speed movement scrolls?) and also the brilliant idea of measuring the general in-game activity of players.
We used the following metrics:
By that time, we had also completely revamped the ETL part (extraction, transformation, and loading of data) and our SQL engineer made a sophisticated SSIS-based game log processor, focused on scalability and the addition of new game events from logs. Given the gigabytes of logs available, it was essential that we be able to add a new hypothesis easily.
New data was loaded and processed, models examined and verified, and results analyzed. For the sake of simplicity, I won't post more lift charts, but instead only the refined results:
Level 9's anomalously high precision was caused by a game-specific factor at the time of research, so disregard that data point.
At this stage, our models improved their prediction power -- especially at levels 2 to 4 -- but levels 6 to 8 were still far too weak. Such imprecise results are barely usable.
Decision Tree proved that general activity metrics are the key prediction factors. In a sense, the playtime per level, mobs killed per level, and quests completed per level metrics comprised the core prediction power of our models; other metrics contributed less than 5 percent to overall precision. Also, the Decision Tree was rather short, with only two or three branches, which suggested it still lacked the relevant metrics. It was also a mystery to me why all three algorithms had variable precision/recall rates from level to level.
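The way a tree exposes its key prediction factors can be sketched as follows; this uses scikit-learn's feature importances as a stand-in for reading an Analysis Services tree, and the column names and synthetic data are hypothetical.

```python
# Sketch of reading prediction-factor weights from a fitted decision tree.
# Data and column names are illustrative, not real Aion metrics values.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 4))
# Synthetic churn label driven only by the first two "activity" columns.
y = (X[:, 0] + X[:, 1] < -1).astype(int)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
names = ["playtime_per_level", "mobs_per_level",
         "quests_per_level", "inventory_size"]
for name, imp in zip(names, tree.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

Because the label depends only on the two activity columns, their importances dominate, much like the general activity metrics dominated the real models.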
Phase 2 Result: We've achieved considerable success with general activity metrics, as opposed to specific game content-related ones. While precision is still not acceptable, we've found the right method for analysis, using Bayes first and Tree afterwards.
Inspired by visible improvements in the data mining results, I set up three development vectors: more general activity metrics, more game-specific metrics, and a deeper learning of the Microsoft BI tools.
Experimenting with general activity, we finally settled on the silver bullets:
Those metrics accounted for a massive increase in recall rate (thus fewer missed churners, which is great news!). Decision Tree finally started branching like there is no tomorrow. We also saw the results of the different data-mining algorithms converging across all levels, a good sign that the prediction process was stabilizing and becoming less random. Naive Bayes was lagging behind the Tree and Neural Net by a whopping 10 percent in precision.
The new individual metrics were actually quite a pain to manage. Manual segmentation for auto-attack use involved some math, including things like 75th percentile calculation in SQL queries. But we normalized the data, allowing us to compare the different game classes; the data mining models received category index data instead of just raw data. These normalized and indexed individual metrics added a solid 3 to 4 percent to overall prediction power.
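A percentile-based category index of the kind described can be sketched like this; the function name and the sample auto-attack percentages are hypothetical, and the real work was done in SQL, not Python.

```python
# Minimal sketch of percentile-based category indexing: raw per-character
# values become quartile indices 0-3, comparable across game classes.
import numpy as np

def to_quartile_index(values):
    """Map raw values to category indices 0-3 by quartile boundaries."""
    q25, q50, q75 = np.percentile(values, [25, 50, 75])
    return np.searchsorted([q25, q50, q75], values, side="right")

# e.g. auto-attack share per character, within a single class (made-up values)
auto_attack_pct = np.array([5, 12, 30, 55, 70, 71, 90, 95])
print(to_quartile_index(auto_attack_pct))  # -> [0 0 1 1 2 2 3 3]
```

Feeding the models the quartile index instead of the raw percentage sidesteps the class-dependent scale of the metric (a mage's 30 percent and a fighter's 70 percent can land in the same "typical for the class" bucket).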
Combat 101: In online games, characters fight with skills and abilities. Auto-attack is the most basic, free action. Experienced players use all skills available and their auto-attack percentage will be lower -- although game and class mechanics heavily influence this metric. In Aion, the median for mage is at 5 percent while the fighter is at 70 percent, and even within a single class, the standard deviation is still high.
The next move was reading the book Data Mining with Microsoft SQL Server 2008 in search of tips and tricks for working with analysis services. The book itself was helpful for explaining the intricacies of Decision Tree fine-tuning, but it also led me to realize the importance of correct data discretization.
In the example above, we've manually achieved discretization of the auto-attack metric. The moment I started tinkering with the data, it became obvious that SQL Server's automated discretization could and should be fine-tuned. Manually tuning the number of buckets heavily affects the Tree's shape and precision (and that of other models too, for sure -- but for the Tree, changes are most visible).
I've spent a whole week of my life tuning each of the 30+ dimensions for each of the nine mining structures (one structure per game level; nine levels total). Experimenting with the buckets revealed some interesting patterns, and the difference between seven and eight buckets could easily be a whopping 2 percent precision increase. For example, the mobs killed bucket count was settled at 20, total playtime at 12, and playtime per level at 7.
Fine-tuning yielded a great decrease in false positives, and boosted the Tree up to the numbers for Neural Net:
Phase 3 Result: Finally, we've got some decent figures, and we've also gathered a lot of interesting data about our players.