Cohort analysis example: retention
Reading about what to do is really boring, so let’s walk through another example. We’ll start with the simplest case: you’re a startup entrepreneur interested in tracking the retention of your users by month. You’ve been struggling to bring people back to your product for a while. After some rather sweet changes to your site in October, you’re anxious to see whether stickiness has improved. Are visitors now more compelled to make return visits?
1. Decide what to track
Persistent tracking data is recorded as a string in one of the five custom variable slots that Google Analytics stores in its __utmv tracking cookie. You can structure your data in whatever representation is most convenient. Keep two things in mind: slots are limited, and you probably want a format that makes the relevant data easy to extract later. The table below shows a variety of ways you might encode the date a new user arrives, using a first visit on October 19th, 2011 as the example.
| Example | Slot 1 | Slot 2 |
|---|---|---|
The examples all store the day of the month, but this isn’t strictly needed if you only care about monthly tracking. As options 3 and 4 show, if you want to group users by day or week you might use the day-of-year (292) and week-of-year (42) index instead. We’ll use the format in example 2 (YYYYMMDD, as in 20111019) for this discussion.
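To make the encodings concrete, here is a minimal sketch of how a page might compute them from a JavaScript `Date`. The helper names are hypothetical, and the week index is a simple assumption (days 1 to 7 are week 1, days 8 to 14 are week 2, and so on), not ISO 8601 week numbering.

```javascript
// Hypothetical helpers for encoding a user's first-visit date.
function pad2(n) {
  return String(n).padStart(2, '0');
}

// Example 2 format: YYYYMMDD, e.g. '20111019'
function yyyymmdd(date) {
  return String(date.getFullYear()) + pad2(date.getMonth() + 1) + pad2(date.getDate());
}

// 1-based day of the year (Jan 1 = 1). UTC arithmetic avoids DST off-by-one errors.
function dayOfYear(date) {
  const start = Date.UTC(date.getFullYear(), 0, 1);
  const today = Date.UTC(date.getFullYear(), date.getMonth(), date.getDate());
  return (today - start) / 86400000 + 1;
}

// Simple week index: ceil(day / 7). Not ISO 8601 weeks.
function weekOfYear(date) {
  return Math.ceil(dayOfYear(date) / 7);
}

const firstVisit = new Date(2011, 9, 19); // months are 0-based: 9 = October
console.log(yyyymmdd(firstVisit));   // 20111019
console.log(dayOfYear(firstVisit));  // 292
console.log(weekOfYear(firstVisit)); // 42
```

Using local date fields matches what the visitor’s browser considers “today”; if you want cohorts aligned to a single time zone, you would read the UTC fields instead.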
2. Write the data
When a user visits for the first time (or signs up), call _setCustomVar with the appropriate parameters:
Method: `_setCustomVar(index, name, value, opt_scope)`

Example:

```javascript
_gaq.push(['_setCustomVar', 1, 'Start Date', '20111019', 1]);
```
When the user arrives for the first time, you record their “Start Date” in custom variable 1. The fourth parameter (opt_scope = 1) makes this a visitor-level variable stored in the cookie, so the value persists across the user’s later visits.
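Because a visitor-level variable lives in the cookie, the start date only needs to be written once. Here is a minimal sketch of that guard, assuming your app can tell a new user from a returning one (the `isNewUser` flag and the `recordCohortStart` name are hypothetical, not part of ga.js). The one ordering rule it encodes comes from the ga.js custom variable documentation: a custom variable is sent with the next tracking call, so it must be set before `_trackPageview`.

```javascript
// Stand-in for the command queue the ga.js loader snippet defines on the page.
var _gaq = _gaq || [];

// Hypothetical helper: isNewUser is a flag your own app supplies
// (e.g. true right after signup); ga.js does not provide it.
function recordCohortStart(queue, date, isNewUser) {
  if (isNewUser) {
    var y = String(date.getFullYear());
    var m = ('0' + (date.getMonth() + 1)).slice(-2);
    var d = ('0' + date.getDate()).slice(-2);
    // index 1, name 'Start Date', value YYYYMMDD, opt_scope 1 = visitor-level
    queue.push(['_setCustomVar', 1, 'Start Date', y + m + d, 1]);
  }
  // Custom variables are sent with the next tracking call,
  // so _setCustomVar has to be queued before _trackPageview.
  queue.push(['_trackPageview']);
}

recordCohortStart(_gaq, new Date(2011, 9, 19), true);
```

Outside a browser, the local `_gaq` array simply collects the queued commands, which makes the ordering easy to inspect.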
3. Segment the results
For each cohort group you’ll likely want to define an advanced segment in Google Analytics to simplify analysis. Some cohorts require a regular expression condition (regular expressions are worth learning). Segmenting users by year or month is a little simpler:
| Cohort group | Segment name | Match conditions |
|---|---|---|
| All users whose first visit was in 2011 | “Cohort: 2011” | Custom var: 1 |
| All users whose first visit was in October 2011 | “Cohort: October 2011” | Custom var: 1 |
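With the YYYYMMDD format, both of these cohorts boil down to a prefix of the stored value: `2011` for the year and `201110` for October 2011. If you export visitor-level data and want to sanity-check a segment offline, a sketch like this one groups start dates the same way. The user list and function names are hypothetical, made up for illustration.

```javascript
// Hypothetical export of visitors and their stored "Start Date" values (YYYYMMDD).
const users = [
  { id: 'a', startDate: '20110915' },
  { id: 'b', startDate: '20111019' },
  { id: 'c', startDate: '20111003' },
  { id: 'd', startDate: '20101230' },
];

// A year cohort is the first 4 characters, a month cohort the first 6.
function cohortKey(startDate, granularity) {
  return startDate.slice(0, granularity === 'year' ? 4 : 6);
}

// Map each cohort key to the IDs of the users in that cohort.
function groupByCohort(users, granularity) {
  const groups = new Map();
  for (const user of users) {
    const key = cohortKey(user.startDate, granularity);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(user.id);
  }
  return groups;
}

console.log(groupByCohort(users, 'month').get('201110')); // [ 'b', 'c' ]
console.log(groupByCohort(users, 'year').get('2011'));    // [ 'a', 'b', 'c' ]
```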
With custom variable 1, your advanced segment condition would look like this in Google Analytics:
With regular expressions you can get a little fancier:
| Cohort group | Segment name | Match conditions |
|---|---|---|
| All users whose first visit was in the summer of 2011 | “Cohort: Summer 2011” | Custom var: 1, Matching RegExp: `^20110[6-8]` |
| All users whose first visit was the week before Christmas | “Cohort: Dec 18-24” | Custom var: 1, Matching RegExp: `^201112(1[89]\|2[0-4])` |
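Regex segments are easy to get subtly wrong, so it can pay to test a pattern against a few boundary dates before saving the segment. This sketch uses JavaScript’s `RegExp`, assuming these simple patterns behave the same way in the segment condition (Google Analytics’ regex flavor is not identical to JavaScript’s in every corner). The `looseChristmas` pattern shows a common slip: a bare `1` in the alternation also admits December 10 through 17.

```javascript
// Segment patterns expressed as JavaScript regexes over YYYYMMDD values.
const summer2011 = /^20110[6-8]/;              // June through August 2011
const christmasWeek = /^201112(1[89]|2[0-4])/; // December 18-24, 2011
const looseChristmas = /^201112(1|2[0-4])/;    // pitfall: a bare "1" admits days 10-19

const sample = ['20110531', '20110601', '20110831', '20110901',
                '20111210', '20111217', '20111218', '20111224', '20111225'];

console.log(sample.filter((d) => summer2011.test(d)));
// [ '20110601', '20110831' ]
console.log(sample.filter((d) => christmasWeek.test(d)));
// [ '20111218', '20111224' ]
console.log(sample.filter((d) => looseChristmas.test(d)));
// [ '20111210', '20111217', '20111218', '20111224' ]
```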
4. Analyze with reports
It’s time to pull up a report and look proudly upon our new cohort creations. This is what the Audience Overview screen in Google Analytics would look like after turning on the September, October and November cohort segments:
Success! But we need more than pretty graphs. What, exactly, was the one-month retention for these three cohorts? Is it getting better? For example, how many of the people who first visited in September returned at least once in the following month (October)?
Let’s start with how many users visited for the first time in September. With the date range set to September, the September cohort segment includes every visitor in the cohort, since by definition each of them first visited that month. According to the first screenshot in this section, the September cohort has 2233 unique visitors. Those users checked out your product in September, but how many came back in October?
Ouch. Only 70 visitors from the September cohort returned at least once in October. The one-month retention for September was 70/2233 ≈ 3.1%. Are you OK with that? For most online products that’s a pretty abysmal retention rate. Similarly, we can calculate retention values for October and November.
| | September | October | November |
|---|---|---|---|
| Returned next month | 70 | 122 | 145 |
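The same calculation can be sketched offline, assuming you had a raw log of visits keyed by a visitor ID (the log below is made up for illustration, and the function names are hypothetical). It mirrors the segment approach: find each visitor’s first visit, keep those who started in the cohort month, then count how many of them show up at least once in the following month.

```javascript
// Hypothetical raw visit log: one record per visit, dates as YYYYMMDD strings.
const visits = [
  { visitorId: 'a', date: '20110905' },
  { visitorId: 'a', date: '20111002' },
  { visitorId: 'b', date: '20110920' },
  { visitorId: 'c', date: '20111001' },
  { visitorId: 'c', date: '20111105' },
];

// '201109' -> '201110', '201112' -> '201201'
function nextMonth(yyyymm) {
  let year = Number(yyyymm.slice(0, 4));
  let month = Number(yyyymm.slice(4, 6)) + 1;
  if (month === 13) {
    month = 1;
    year += 1;
  }
  return String(year) + String(month).padStart(2, '0');
}

// Share of a month's new visitors who came back at least once the next month.
function oneMonthRetention(visits, cohortMonth) {
  // Earliest visit per visitor; YYYYMMDD strings sort chronologically.
  const firstVisit = new Map();
  for (const v of visits) {
    const prev = firstVisit.get(v.visitorId);
    if (prev === undefined || v.date < prev) firstVisit.set(v.visitorId, v.date);
  }
  const cohort = new Set();
  for (const [id, date] of firstVisit) {
    if (date.startsWith(cohortMonth)) cohort.add(id);
  }
  // Count each returning visitor once, however many times they came back.
  const following = nextMonth(cohortMonth);
  const returned = new Set();
  for (const v of visits) {
    if (cohort.has(v.visitorId) && v.date.startsWith(following)) returned.add(v.visitorId);
  }
  const rate = cohort.size === 0 ? 0 : returned.size / cohort.size;
  return { cohortSize: cohort.size, returned: returned.size, rate };
}

console.log(oneMonthRetention(visits, '201109')); // { cohortSize: 2, returned: 1, rate: 0.5 }

// The September numbers from the text: 70 of 2233 returned in October.
console.log(((70 / 2233) * 100).toFixed(1) + '%'); // 3.1%
```

Using sets rather than raw counts matches the “returned at least once” definition: a visitor who comes back five times still counts once.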
Engagement and retention often correlate. For example, visitors who interact and spend time with a product tend to return. But not always. Some products, like search engines, naturally tend to have high retention but low engagement. Others may hold their audience’s attention for long sessions but see them return less often. Facebook is an example of a product that both engages and retains its users.
Hey, look at you! Those changes you made in October have doubled your retention! It’s gone from crappy to just plain lousy! Don’t get too excited – you have no idea if your changes were responsible for this rise. At the very least you would want to see if the trend continued over the next few months.
While you’re poking around Google Analytics with your finely sliced cohort segments you may want to explore some of the other reports. For example, how do your cohorts compare based on your conversion goals? What can you learn by applying the segments to your favorite custom reports? There’s a lot of insight potential here. For example, the Audience Overview report that we were just looking at from Aug 14 – Dec 31 shows a couple of engagement metrics (pages/visit and avg time on site). Do those match what you would expect based on the retention values we just calculated above? They might not.
Pulling your cohort data through the Google Analytics API is another option for analyzing your results, but that’s a bit more work. You may also want to consider using the Google Analytics Data Feed Query Explorer. Nice name, eh? This tool is actually pretty handy for getting exactly the information you need without having to navigate the Google Analytics web interface.
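If you do go the API route, the cohort question becomes a filtered query. This sketch only builds the request URL. It assumes the parameter and field names of the v3 Core Reporting API from the ga.js era (`ids`, `start-date`, `end-date`, `metrics`, `filters`, the `ga:visitors` metric and the `ga:customVarValue1` dimension), so check them against the API reference before relying on it. `PROFILE_ID` is a placeholder, and authentication is left out entirely.

```javascript
// Sketch only: builds a request URL for a v3 Core Reporting API query.
// PROFILE_ID is a placeholder; OAuth authentication is not shown.
function cohortVisitorsQuery(profileId, cohortPrefix, startDate, endDate) {
  const params = new URLSearchParams({
    ids: 'ga:' + profileId,
    'start-date': startDate, // YYYY-MM-DD
    'end-date': endDate,
    metrics: 'ga:visitors', // unique visitors in the date range
    filters: 'ga:customVarValue1=~^' + cohortPrefix, // regex match on slot 1
  });
  return 'https://www.googleapis.com/analytics/v3/data/ga?' + params.toString();
}

// September cohort (start dates 201109xx) seen during October 2011
const url = cohortVisitorsQuery('PROFILE_ID', '201109', '2011-10-01', '2011-10-31');
console.log(url);
```

`URLSearchParams` takes care of percent-encoding the `:`, `=`, `~` and `^` characters in the filter expression.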
> An aside: Can you trust these numbers?