This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.
One of the key insights from the first project was how important general activity metrics are for predicting churn.
We expected them to also be important for analyzing veteran players, but nonetheless decided to try some game-specific and social metrics as well:
Daily activity and playtime were expected to be key predicting factors, and so they are.
When we analyzed new players, we had only a couple of days of data, thus using simple instant values. However for veterans we think in terms of weeks and months, so different approach is required for data aggregation over time. Moving totals and moving averages, derivatives and intercepts are useful in this case.
We used a moving 30-days sum for analyzing long-term daily activity, approximated by a linear equation. The actual metrics that went into data-mining model were slope and intercept of the approximation line. Their calculation with ordinary least squares method is fairly easy with T-SQL during the data preparation process.
For daily playtimes' analysis, days of inactivity with zero playtimes were stripped before applying approximation:
ETL procedures had to be rewritten from scratch and all data reloaded, but idea was well worth it: even first results on raw, untuned data mining models for the 16-20 cohort were around 80 percent precision.
In the end, with about 30 metrics with different aggregation periods and methods, we achieved 80 to 90 percent precision in predicting players who are about to fall into The Pit in two to three weeks. This is quite an impressive result, but for a couple of months we were stuck with it, no matter what new metrics and methods we tried. Finally, a breakthrough happened with the introduction of detailed past metrics.