Last week I discussed segmentation as a method for identifying and differentiating customers by their specific service needs. Whether it's used to give a young cohort introductory-period service, to give high-value segments special treatment, or to identify the group on a transitional path to high value and help accelerate that process, segmentation is a very versatile tool for business and product optimization. It can be approached with many techniques, and I'll get to more implementation details on those later. But first, an introduction to the next topic after segmentation: social metrics.

While social behavior has not historically been a strong feature of many products, either in the gaming space or in the wider scope of freemium products, your customers and users are people, and thus they will have social interactions with others that you can benefit from. If you can capture any of that activity in your product measurement, it can serve as a very valuable basis for in-depth analytics. Today, I will focus on those products and services whose audience can interact with each other - that is, where there is some sort of easily measured, directly connected community.

Any such product will probably have user segments such as:

  • new users who would benefit from seeing good examples of effective use of the product, guidance on the first steps, or some other introduction beyond what the product can do automatically or what your sales or support staff can scale to
  • enthusiasts who would like nothing better than to help the first group
  • direct revenue contributors who either have a lot of disposable income, or otherwise find your service so valuable to them that they'll be happy to buy a lot of premium features or content
  • people who, though they're not top customers themselves, find innovative ways to use premium features for extra value
  • people who are widely appreciated by the community for their contributions, "have good karma"
  • people whose influence within the community is on the whole negative due to disruptive behavior

and many, many others. Two of these groups are easy to identify simply based on their own history; I'm sure you'll recognize which two. The other four are determined largely by their interaction with the rest of the community and other users' reactions to their activities. How do you find them? This is a rapidly evolving field of analytics with a constantly growing pool of theoretical approaches and practical tools, and it can look daunting at first. The good news is that there are many practical tools already, and while a theoretical background helps, the first steps aren't too hard to take.

You'll need to develop some simple way to identify interaction. The traditional way to begin is to define a "buddy list" of some sort, similar to a Facebook friends network, a Twitter following, or a simple email address book. However, I find that a more "casual" approach of quantifying interactions works better for analytics. Count comments, time in the same "space", exposure to the same content, common play time, or whatever works for your product. At the simplest level, this will be a list of "user A, user B, scale of interaction" stored somewhere in your logs or a metrics database. This is already a very good baseline. If you add a time or calendar dimension, you'll be able to measure the ebb and flow of social activities, but even that isn't strictly necessary.
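
To make the "user A, user B, scale of interaction" list concrete, here is a minimal sketch of one way to build it from raw events. The event tuples, the weights, and the `build_edge_list` helper are all made up for illustration; your own logs will look different, and you may well want directed rather than undirected pairs.

```python
from collections import defaultdict

# Hypothetical raw events: (user_a, user_b, interaction_weight).
# The weights are an assumption, e.g. 1.0 per comment or minutes
# spent in the same "space". Pick whatever fits your product.
events = [
    ("alice", "bob", 1.0),
    ("bob", "alice", 2.0),
    ("alice", "carol", 1.0),
    ("carol", "dave", 0.5),
    ("alice", "alice", 3.0),  # self-interaction, ignored below
]

def build_edge_list(events):
    """Sum interaction weights per unordered pair of users."""
    totals = defaultdict(float)
    for a, b, weight in events:
        if a == b:
            continue  # a user interacting with themselves is not a connection
        pair = tuple(sorted((a, b)))  # treat (a, b) and (b, a) as the same edge
        totals[pair] += weight
    return [(a, b, w) for (a, b), w in sorted(totals.items())]

edges = build_edge_list(events)
for a, b, w in edges:
    print(a, b, w)
```

Treating pairs as unordered keeps the list small and is usually enough for a first look; keep the direction if "who talks to whom" matters for your product.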

Up to a data set of about 100k users and half a million connections or so, you'll be able to do a lot of analysis just on your laptop. Grab such a data dump and a tool called Gephi and you're just minutes away from fun stuff like visualizing whether connections are uniformly distributed or clustered into smaller, relatively separate groups (I bet you'll find the latter - social networks practically always have this "small world" property). This alone, even though it isn't an ongoing, easily comparable metric, will be very informative for your product design and community interaction.
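
If your interaction list lives in a database, one simple way to get it into a visualization tool is a plain CSV dump. The sketch below assumes an edge table with `Source`, `Target` and `Weight` columns, which to my knowledge is what Gephi's spreadsheet import expects for edges; check that against the version you use. The `write_edge_csv` helper and the sample edges are hypothetical.

```python
import csv
import io

def write_edge_csv(edges, out):
    """Write (user_a, user_b, weight) tuples as a CSV edge table."""
    writer = csv.writer(out, lineterminator="\n")
    # Column names are an assumption about the importing tool.
    writer.writerow(["Source", "Target", "Weight"])
    for a, b, w in edges:
        writer.writerow([a, b, w])

# Hypothetical sample data; in practice this comes from your logs.
edges = [("alice", "bob", 3.0), ("alice", "carol", 1.0)]

buffer = io.StringIO()  # swap for open("edges.csv", "w", newline="") to write a file
write_edge_csv(edges, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```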

In terms of metrics and connected actions, here's a high-level overview of some of the more simple-to-implement things:

  • highly connected users are a great seed for new features or content, because they can spread messages fast, and giving them early access will make them more engaged. While in theory you'd want to reach people "in between" clusters, the top connected people are an easy, surprisingly well-functioning substitute.
  • those same people with a large number of connections are also critical hubs in the community, and you should protect them well, jumping in fast if they have problems. This is independent of their individual LTV, because they may well be the connection between high-value customers.
  • a high clustering coefficient indicates a robust network, so you should aim to build one and increase that metric. Try introducing less-connected (including new) people to existing clusters, not simply to random other users. A cluster, of course, is a set of people who all have connections to most others in the cluster (i.e., a high local clustering coefficient).
  • once someone already has a reasonable number of semi-stable relationships (say, 4-8 people they've interacted with more than once or twice), it's time to start introducing more variance, such as connecting them to someone who's distant in the existing graph. Most of these introductions are unlikely to stick, but the ones that do will improve the entire community a great deal.
  • if you can quantify the importance of the connections, e.g. by measuring the time or number of interactions, you can go further and distinguish the top influencers from the people who simply have the most connections.
  • finally, when you combine these basic social graph metrics with the other user lifetime data I discussed previously, you'll get a whole new view into how to find crucial user segments and predict their future behavior. This merged analysis will give you measurable improvement far faster than burying yourself in advanced theories of social models, so take the low-hanging fruit first.
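
The metrics in the list above are simple enough to compute by hand on a small graph. Here is a rough sketch in plain Python, assuming an undirected, weighted edge list like the "user A, user B, scale of interaction" one described earlier. The sample data and helper names are invented for illustration, and for anything beyond a toy graph a graph library will serve you better.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical edge list: (user_a, user_b, total interaction weight).
edges = [
    ("alice", "bob", 3.0),
    ("alice", "carol", 1.0),
    ("bob", "carol", 2.0),
    ("carol", "dave", 0.5),
]

def build_adjacency(edges):
    """Map each user to a dict of {neighbor: weight}, undirected."""
    neighbors = defaultdict(dict)
    for a, b, w in edges:
        neighbors[a][b] = w
        neighbors[b][a] = w
    return neighbors

def degree(neighbors, user):
    """Number of distinct people the user is connected to."""
    return len(neighbors[user])

def strength(neighbors, user):
    """Sum of interaction weights over all of the user's connections."""
    return sum(neighbors[user].values())

def local_clustering(neighbors, user):
    """Share of the user's neighbor pairs that are connected to each other."""
    friends = list(neighbors[user])
    k = len(friends)
    if k < 2:
        return 0.0  # convention used here: undefined cases count as zero
    linked = sum(1 for u, v in combinations(friends, 2) if v in neighbors[u])
    return linked / (k * (k - 1) / 2)

neighbors = build_adjacency(edges)
most_connected = max(neighbors, key=lambda u: degree(neighbors, u))
most_active = max(neighbors, key=lambda u: strength(neighbors, u))

for user in sorted(neighbors):
    print(user, degree(neighbors, user), strength(neighbors, user),
          round(local_clustering(neighbors, user), 2))
print("most connected:", most_connected, "| highest weighted activity:", most_active)
```

In this toy data the most connected user (carol) and the one with the highest weighted activity (bob) are different people, which is exactly the distinction between hubs and top influencers made in the list.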

That's it for yet another introductory post. Time for feedback: what other analytics areas would you like to see high-level explanations about, or would you rather see this series dive into the implementation details on some particular area? Do let me know, either via comments here, or by a tweet.