Sometimes, when you’re buried in data, statistics, graphs and reports, analytics work can feel a tad dry. Personally, I tolerate creating reports (generally by automating them) but find analysis (identifying why the data is the way it is) rather compelling. In this first of several posts on cohort analysis I’m going to explore why dividing your visitors into cohorts is the fastest path to the insight you need to answer the tough “why” questions about your data.
Is your product getting better?
How do you know? It’s easy to see when the data for your game, blog or service has changed, but how do you know whether the change is due to your latest “improvements”? Take a look at the graph below from Google Analytics:
Assume that this plot shows the number of visits in which people achieved some significant goal with your product – say, downloading a white paper from your blog. Are you getting more downloads? Clearly. Let’s ask another question. You made several key changes to your site in August aimed at increasing the download rate. Did they help? Are new visitors more likely to start this download now than they were before?
Contrary to appearances, the above graph tells you almost nothing about the effect of your changes. If traffic to your startup also increased during this period, how would you know how much of the rise is due to growth and how much, if any, is due to your changes? How can we tell if we’re making progress? There are many ways to improve this analysis (segmentation, funnels and split tests come to mind), but the best place to start is to separate changes in user behavior from product growth using cohort analysis.
What’s a cohort?
Cohort studies, sometimes referred to as panel studies or longitudinal studies, focus on the activities of a cohort group. A cohort is simply a group of people who share a common characteristic. They might have the same height, birth year, or even the same vaccination history. For an online startup, cohort analysis usually means clustering users by the day, week or month in which they first start using your product.
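To make the grouping concrete, here is a minimal sketch of labeling users by the ISO week of their first visit. The user names and dates are purely illustrative; a real product would pull first-visit timestamps from its analytics store.

```python
from datetime import date

def weekly_cohort(first_visit: date) -> str:
    """Label a user by the ISO year and week of their first visit."""
    iso = first_visit.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"

# Hypothetical users keyed by their first-visit date
first_visits = {
    "alice": date(2011, 8, 1),
    "bob": date(2011, 8, 3),
    "carol": date(2011, 8, 10),
}

# alice and bob land in the same weekly cohort; carol joins a week later
cohorts = {user: weekly_cohort(d) for user, d in first_visits.items()}
```

Swapping `isocalendar()` for a day- or month-based key gives daily or monthly cohorts with the same one-line change.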
Definitions aside, the real value of cohorts comes from letting us compare the retention and engagement of user groups against one another, so we can verify that the changes made during each period have a positive impact on our product.
For example, assume we use Time on Site as our core engagement metric. Looking back over our data for the past couple of months, we might find that the Week 3 cohort has consistently remained more engaged (stayed significantly longer on each visit) than other weekly cohorts. We aren’t necessarily saying that we saw more engaged users during week 3. Instead, we’re saying that the users who joined in week 3 have, over the last couple of months, consistently spent more time with our product per visit than any other weekly cohort.
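The comparison described above reduces to grouping visit durations by cohort and averaging. A small sketch, with made-up visit data in which the Week 3 cohort stays longer per visit:

```python
from statistics import mean

# Hypothetical visit log: (cohort the visitor joined in, seconds on site)
visits = [
    ("W1", 40), ("W1", 55),
    ("W2", 35), ("W2", 50),
    ("W3", 120), ("W3", 95), ("W3", 140),
]

def engagement_by_cohort(visit_log):
    """Mean time-on-site per cohort across all recorded visits."""
    per_cohort = {}
    for cohort, seconds in visit_log:
        per_cohort.setdefault(cohort, []).append(seconds)
    return {cohort: mean(secs) for cohort, secs in per_cohort.items()}

averages = engagement_by_cohort(visits)
```

Note that the average is taken over every visit a cohort has made to date, not just visits during the week the cohort joined; that is what lets us say Week 3 users *remained* more engaged.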
Likewise, looking at cohort retention, we might see that the Week 3 cohort has remained stubbornly loyal to our humble product offering.
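Retention can be laid out the same way: for each cohort, the fraction of its members still active one week, two weeks, three weeks after joining. A sketch with illustrative numbers (all figures invented) in which the Week 3 cohort decays more slowly:

```python
def retention_table(cohort_sizes, active_by_week):
    """Fraction of each cohort still active N weeks after joining.

    cohort_sizes: {cohort: users who joined in that week}
    active_by_week: {cohort: [active users in week 0, week 1, ...]}
    """
    return {
        cohort: [round(active / cohort_sizes[cohort], 2) for active in weeks]
        for cohort, weeks in active_by_week.items()
    }

# Hypothetical cohort sizes and weekly active counts
sizes = {"W1": 200, "W2": 180, "W3": 150}
active = {
    "W1": [200, 80, 50, 30],
    "W2": [180, 70, 40, 25],
    "W3": [150, 90, 75, 60],
}

retention = retention_table(sizes, active)
```

Read across a row to see how a single cohort decays over time; read down a column to compare cohorts at the same age, which is the comparison that isolates product changes from growth.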
Growth is great, but the goal here is to isolate the data in such a way that we focus on factors that improve the product. A vanity metric, like number of visits, is interesting, but it tells you nothing about why your data looks this way or how you might improve it in the future. Our aim is to focus on startup analytics with actionable metrics that offer real insights and help us make decisions.
At this point we aren’t trying to answer why we’ve seen the change. There are many possibilities apart from product improvements. Our first priority is to recognize that a change in our startup metrics has occurred and isolate it. Armed with this information we can then attempt to correlate positive and negative changes with product modifications. Did you make a product alteration that only the Week 3 cohort would be exposed to? Did you change your advertising that week? Does Week 3 correspond with any major holidays or events?
The great thing about customer cohort analysis in web analytics is that each new group provides the opportunity to start with a fresh set of users. This allows us to passively identify differences in our cohort metrics after the fact or actively segment out controlled cohort test groups for more directed analysis. We can now, on a regular basis, focus on how well we engage with our audience independent of how much we’re growing.