Developing metrics for free-to-play products
In my previous post, I outlined a few ways in which a "sales funnel" KPI model changes between different businesses, and argued that it really doesn't serve a free-to-play business well. Today, I'll summarize a few ways in which a free-to-play model can be measured effectively.
Free-to-play is a games industry term, but the model is a bit more general. In effect, this model is one where a free product or service exists not only as a trial step on the way to converting a paying customer, but can serve both the user as well as the business without a direct customer relationship, for example by increasing the scale of the service, making more content available. From a revenue standpoint, a free-to-play service is structured to sell small add-ons or premium services to the users on repeat basis - in the games space, typically in individual transactions ranging from a few cents to a couple of dollars in value.
As I wrote in the previous article, it's this repeated small transaction feature which makes conversion funnels of limited value to free-to-play models. Profitable business depends on building customer value over a longer lifetime (LTV), and thus retention and repeat purchase become more important attributes and measurements. Here is where things become interesting, and common methodologies diverge between platforms.
Facebook games have standardized on measuring number and growth of daily active users (DAU), engagement rate (measured as % of monthly users on average day, ie DAU/MAU), and the average daily revenue per user (ARPDAU). These are good metrics, primarily because they are very simple to define, measure and compare. However, they also have significant weaknesses. DAU/MAU is hard to interpret as it pushed up by high retention but down by high growth, yet both are desirable. Digital Chocolate's Trip Hawkins has written numerous posts about this, I recommend reading them. ARPDAU, on the other hand, hides a very subtle, but crucially important fact regarding the business - because there is no standard price point, LTV will range from zero to possibly very high values, and an average value will bear no reflection on either the median nor the mode value. This is, of course, the Long Tail like Pareto distribution in action. Why does this matter? Well, because without understanding the impact of the extreme ends of the LTV range to the total, investments will be impossible to target, implications of changes impossible to predict, as Soren Johnson describes in an anecdote about Battlefield Heroes ("Trust the Community?").
Another way of structuring the metrics is to look at measured cohort lifetimes, sizes and lifetime values. Typically, cohorts will be defined by their registration/install/join date. This practice is very instructive and permits in-depth analysis and conclusions on performance improvement: are the people who first joined our service or installed our product last week more or less likely to stay active and turn into paying users than the people four weeks ago? Did our changes to the product help? Assuming you trust that later cohorts will behave similarly to earlier ones, you can also use the earlier cohorts' overall and long term performance to predict future performance of currently new users. The weakness of this model relates to the rapidly increasing number of metrics, as every performance indicator is repeated for every cohort. Aggregation becomes crucial. Should you aggregate data older than a few months all-in-one? Does your business exhibit seasonality, so that you should compare this New Year cohort to the one last year, rather than to the one from December? In addition, we have not yet done anything here to address the fallacy of averages.
The average problem can be tackled to some degree by further increasing the number of cohorts over some other feature than the join date, such as the source by which they arrived, their geographic location, or some demographic data we may have of them. This will let us understand that French gamers will spend more money than those from Mexico, or that Facebook users are less likely to buy business services than those from LinkedIn. This information comes at a further cost in ballooning the number of metrics, and will ultimately require automating significant parts of the comparison analysis, sorting data to top-and-bottom percentiles, and highlighting changes in cohort behavior.
Up until now, all the metrics discussed have been simple aggregations of per-user data into predefined break-down segments. While I've introduced a few metrics which can take some practice to learn to read, the implementations of these measurements are relatively trivial - only the comparison automation and highlight summaries might require non-trivial development work. Engineering investments may already be fairly substantial, depending on the traffic numbers and the amount of collected data, but the work is fairly straightforward. In the next installation, I will discuss what happens when we start to break the aggregates down using something else than pre-determined cohort features.