How to do Cohort Analysis in Google Analytics

Cohort analysis example: retention

Reading about what to do is really boring so let’s walk through another example. We’ll start with the simplest case: you’re a startup entrepreneur interested in tracking the retention of your users by month. You’ve been struggling to bring people back to your product for a while. After some rather sweet changes to your site in October you’re anxious to see if the stickiness has improved. Are visitors now more compelled to make return visits?

1. Decide what to track

Persistent tracking data is recorded as a string in one of the five custom variable slots that Google Analytics stores in its __utmv tracking cookie. You can structure your data in whatever representation is most convenient. Two things to keep in mind: there are only a few slots available, and you probably want a format that makes it easy to extract the relevant data later. The table below shows a variety of ways you might encode the date that new users arrive. Each example encodes a first visit to your product on October 19th, 2011.

Example | Slot 1 | Slot 2
1 | October | 19
2 | 20111019 | -
3 | Y2011M10W42D292 | -
4 | 1142 | 292

The examples all store the day (either the day of the month or the day of the year) but this isn’t strictly needed if you only care about monthly tracking. As you can see in options 3 and 4, if you want to group users by day and week you might consider using the day (292) and week (42) index of the year. We’ll use the format in example 2 for this discussion.
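If you'd rather generate the encodings in options 3 and 4 than work them out by hand, here is a rough sketch. The helper names are made up for illustration, and two things are assumptions on my part: the week index is taken to be the ISO 8601 week number, and the week and day are zero-padded to a fixed width so that values stay easy to match later.

```javascript
var MS_PER_DAY = 86400000;

// Left-pad a number with zeros to a fixed width.
function pad(value, width) {
  var text = String(value);
  while (text.length < width) {
    text = '0' + text;
  }
  return text;
}

// Day index within the year, 1-based (January 1st is day 1).
function dayOfYear(date) {
  var startOfYear = Date.UTC(date.getFullYear(), 0, 1);
  var today = Date.UTC(date.getFullYear(), date.getMonth(), date.getDate());
  return (today - startOfYear) / MS_PER_DAY + 1;
}

// ISO 8601 week number (assumption: the table does not say which numbering it uses).
function isoWeek(date) {
  var d = new Date(Date.UTC(date.getFullYear(), date.getMonth(), date.getDate()));
  var weekday = d.getUTCDay() || 7;            // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - weekday);  // move to the Thursday of this week
  var startOfYear = Date.UTC(d.getUTCFullYear(), 0, 1);
  return Math.ceil(((d - startOfYear) / MS_PER_DAY + 1) / 7);
}

// Option 3: everything in one slot, e.g. Y2011M10W42D292.
function encodeExample3(date) {
  return 'Y' + date.getFullYear() +
         'M' + pad(date.getMonth() + 1, 2) +
         'W' + pad(isoWeek(date), 2) +
         'D' + pad(dayOfYear(date), 3);
}

// Option 4: [slot 1, slot 2], e.g. ['1142', '292'].
function encodeExample4(date) {
  var shortYear = pad(date.getFullYear() % 100, 2);
  return [shortYear + pad(isoWeek(date), 2), pad(dayOfYear(date), 3)];
}
```

One caveat with ISO weeks: the first few days of January can belong to the last week of the previous year, so a year-plus-week value like option 4 may need extra care around New Year.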

2. Write the data

In this step you’ll need to add some JavaScript to your site. LunaMetrics has a nice overview of custom variables and how to set them up for your site.

On each new visit/signup you’ll call _setCustomVar with the appropriate parameters:
Method: _setCustomVar(index, name, value, opt_scope)
Example: _gaq.push(['_setCustomVar', 1, 'Start Date', '20111019', 1]);

When the user arrives for the first time you’ll record their “Start Date” in custom variable 1. The fourth parameter (opt_scope), set to 1 here, indicates that we are using the slot for visitor-level (cookie) tracking, so the value sticks with the visitor across visits.
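To avoid hard-coding the date, you could build the value from the current date at the moment of the first visit. This is only a sketch: formatStartDate and buildStartDateCommand are hypothetical helper names, and deciding whether this really is the visitor's first visit is left to your own site logic.

```javascript
// The standard async queue used by the ga.js snippet.
var _gaq = _gaq || [];

// Format a Date as YYYYMMDD (the format from example 2), e.g. '20111019'.
function formatStartDate(date) {
  var month = date.getMonth() + 1;
  var day = date.getDate();
  return String(date.getFullYear()) +
         (month < 10 ? '0' : '') + month +
         (day < 10 ? '0' : '') + day;
}

// Build the command array for slot 1 with visitor-level scope (1).
function buildStartDateCommand(date) {
  return ['_setCustomVar', 1, 'Start Date', formatStartDate(date), 1];
}

// On a first visit, queue the command before '_trackPageview' so the
// value is sent along with the pageview.
_gaq.push(buildStartDateCommand(new Date()));
```

Note that the date comes from the visitor's own clock and time zone here. If that worries you, generate the string on the server instead.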

3. Segment the results

For each cohort group you’ll likely want to define an advanced segment in Google Analytics to simplify analysis. This may require regex conditions (which are worth learning). Segmenting users by month and year is a little simpler:

Cohort group | Segment name | Match conditions
All users whose first visit was in 2011 | “Cohort: 2011” | Custom var: 1, Containing: 2011
All users whose first visit was in October 2011 | “Cohort: October 2011” | Custom var: 1, Containing: 201110

With custom variable 1, your advanced segment condition would look like this in Google Analytics:

[Screenshot: Advanced segment for cohort analysis in Google Analytics]
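One thing to watch with a plain “Containing” condition on YYYYMMDD values: the digits you're looking for can also turn up in the middle of a later date. The sketch below imitates the two kinds of match in plain JavaScript (containsMatch and prefixMatch are illustrative helpers, not anything Google Analytics provides) to show the difference.

```javascript
// Imitates a "Containing" condition: the fragment may appear anywhere.
function containsMatch(startDate, fragment) {
  return startDate.indexOf(fragment) !== -1;
}

// Imitates a regex condition anchored to the start of the value.
function prefixMatch(startDate, prefix) {
  return new RegExp('^' + prefix).test(startDate);
}

// A genuine 2011 visitor matches either way.
containsMatch('20111019', '2011'); // true
prefixMatch('20111019', '2011');   // true

// January 10th, 2012 is '20120110', which contains '2011' in the middle.
containsMatch('20120110', '2011'); // true (a false positive)
prefixMatch('20120110', '2011');   // false
```

If you plan to keep collecting data past the first year, an anchored regular expression such as ^2011 is probably the safer condition.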

Using regular expressions you can get a little more fancy:

Cohort group | Segment name | Match conditions
All users whose first visit was in the summer of 2011 | “Cohort: Summer 2011” | Custom var: 1, Matching RegExp: ^20110[6-8]
All users whose first visit was the week before Christmas | “Cohort: Dec 18-24” | Custom var: 1, Matching RegExp: ^201112(1[89]|2[0-4])
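Before saving a segment it can be worth trying the expressions against a few sample dates. Here's a quick sketch that does this in JavaScript. It assumes these simple patterns behave the same way in JavaScript as they do in Google Analytics, and matchingCohorts is just an illustrative helper.

```javascript
// The two regex cohorts from the table above.
var cohorts = [
  { name: 'Cohort: Summer 2011', pattern: /^20110[6-8]/ },
  { name: 'Cohort: Dec 18-24', pattern: /^201112(1[89]|2[0-4])/ }
];

// Return the names of every cohort whose pattern matches a YYYYMMDD start date.
function matchingCohorts(startDate) {
  return cohorts
    .filter(function (cohort) { return cohort.pattern.test(startDate); })
    .map(function (cohort) { return cohort.name; });
}

matchingCohorts('20110715'); // ['Cohort: Summer 2011']
matchingCohorts('20111218'); // ['Cohort: Dec 18-24']
matchingCohorts('20111225'); // [] (Christmas Day itself is outside the range)
```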

4. Analyze with reports

It’s time to pull up a report and look proudly upon our new cohort creations. This is what the Audience Overview screen in Google Analytics would look like after turning on the September, October and November cohort segments:

Success! But we need more than pretty graphs. What, exactly, was the one month retention for these 3 months? Is it getting better? For example, how many of the people who first visited in September returned at least once in the following month (October)?

Let’s start with how many users visited for the first time in September. By definition, that should include all visitors that are within the cohort. According to the first screenshot in this section, the number of uniques in the September cohort is 2233. Those users checked out your product in September but how many came back in October?

Ouch. Only 70 visitors from the September cohort returned at least once in October. The one month retention for September was 70/2233 = 3.1%. Are you OK with that? For most online products that’s a pretty abysmal retention rate. Similarly we can calculate retention values for October and November.

 | September | October | November
Visitors | 2233 | 2026 | 2250
Returned next month | 70 | 122 | 145
Retention | 3.1% | 6.0% | 6.4%
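The arithmetic behind the table is simply returning visitors divided by cohort size. If you end up doing this for many cohorts, a tiny helper like the following sketch saves some typing (retentionPercent is an illustrative name; the numbers are the ones from the table).

```javascript
// One-month retention as a percentage string with one decimal place.
// Returns null for an empty cohort to avoid dividing by zero.
function retentionPercent(cohortVisitors, returnedNextMonth) {
  if (cohortVisitors <= 0) {
    return null;
  }
  return (returnedNextMonth / cohortVisitors * 100).toFixed(1) + '%';
}

retentionPercent(2233, 70);  // '3.1%' (September)
retentionPercent(2026, 122); // '6.0%' (October)
retentionPercent(2250, 145); // '6.4%' (November)
```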

Engagement and retention often correlate. For example, visitors that interact and spend time with a product tend to return. But not always. Some products, like search engines, naturally tend to have high retention but low engagement. Others may do better at engaging their audience for long periods but don’t see them visit as often. Facebook is an example of a product that both engages and retains its users.

Hey, look at you! Those changes you made in October have doubled your retention! It’s gone from crappy to just plain lousy! Don’t get too excited – you have no idea if your changes were responsible for this rise. At the very least you would want to see if the trend continued over the next few months.

While you’re poking around Google Analytics with your finely sliced cohort segments you may want to explore some of the other reports. For example, how do your cohorts compare based on your conversion goals? What can you learn by applying the segments to your favorite custom reports? There’s a lot of insight potential here. For example, the Audience Overview report that we were just looking at from Aug 14 – Dec 31 shows a couple of engagement metrics (pages/visit and avg time on site). Do those match what you would expect based on the retention values we just calculated above? They might not.

Pulling your cohort data through the Google Analytics API is another option for analysing your results but that’s a bit more work. You may also want to consider using the Google Analytics Data Feed Query Explorer. Nice name, eh? This tool is actually pretty handy for getting exactly the information you need without having to navigate the Google Analytics web interface.
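As a rough idea of what such a query might look like, the sketch below builds the parameters for “how many September 2011 cohort members visited in October 2011”. Treat it as an assumption-laden starting point: the profile ID is a placeholder, buildRetentionQuery and toQueryString are hypothetical helpers, and you should check the parameter and field names (ga:visitors, ga:customVarValue1, the dynamic:: segment prefix) against the API version you're using.

```javascript
// Build query parameters for one cohort over one date range.
// cohortMonth is the YYYYMM prefix of the stored start date, e.g. '201109'.
function buildRetentionQuery(profileId, cohortMonth, startDate, endDate) {
  return {
    'ids': 'ga:' + profileId,   // placeholder profile ID
    'start-date': startDate,    // YYYY-MM-DD
    'end-date': endDate,        // YYYY-MM-DD
    'metrics': 'ga:visitors',
    // Dynamic segment: custom variable 1 starts with the cohort month.
    'segment': 'dynamic::ga:customVarValue1=~^' + cohortMonth
  };
}

// Turn the parameters into a URL-encoded query string.
function toQueryString(params) {
  return Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
}

// September 2011 cohort, counted over October 2011.
var query = buildRetentionQuery('12345', '201109', '2011-10-01', '2011-10-31');
var queryString = toQueryString(query);
```

Running the same query over the cohort's own month gives the cohort size, and the ratio of the two is the one-month retention.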

> An aside: Can you trust these numbers?

Can we be confident that the cohort values are accurate? In short: no. Since we’re storing cohort groups in cookies and Google Analytics relies on its own cookies to track unique users, the reported numbers will be directly affected by the cookie deletion rate of your users. The big question, then, is how often do people delete their cookies? It seems to depend on who you ask. I’ve seen online reports that suggest it’s actually pretty rare while others suggest you can expect over 50% of your users to clean their cookies every month. One reason for the discrepancy is that deletion rates can vary by the geography, technical awareness and other attributes of your users as well as the type of product you have and the date range you’re testing. Of course, there are other factors affecting the accuracy of Google Analytics results including how you’ve set it up and whether your users allow JavaScript or run ad blocking browser extensions.
The important thing to understand is the accuracy of unique visitor counts reported by Google Analytics for your product. In my experience, comparing the numbers from Google Analytics to an IP-based analytics solution as well as server logs has been somewhat reassuring. Most tests for unique visitor counts tend to be pretty close, usually within 5%, but I’ve seen errors in excess of 15% depending on the date range and query.



17 thoughts on “How to do Cohort Analysis in Google Analytics”


  7. Great post. What I feel is missing (in GA, not in your post) is the ability to `get` a visitor-level custom variable, or alternatively, to only set a custom variable if it’s not already set.

    Unless I’m missing something obvious, that would make the implementation of cohort analysis in GA – specifically, setting the starting date – way easier. I guess you could dive into the cookies, but this sounds awful to me.

      • I wanted to set a visitor-level custom variable only if it was not already set. That way I capture the start date for my first-time visitors.

        What I did was check whether my visitor-level variable “Start Date” was set and, if it wasn’t, assign the day’s date to this variable.

        The _gaq.push looks like this:
        _gaq.push(function() {
          // Get the default tracker.
          var pageTracker = _gat._getTrackerByName();
          // Read the visitor-level custom variable stored in slot 1.
          var visitorCustomVar = pageTracker._getVisitorCustomVar(1);

          // Set the start date only if the slot is still empty. Call the
          // tracker directly here: a value returned from a pushed function
          // is not queued as a command.
          if (!visitorCustomVar) {
            pageTracker._setCustomVar(1, 'Start Date', '{{ analytics_startdate }}', 1);
          }
        });

        We get the visitor custom variable with the _getVisitorCustomVar(1) function, where 1 is the index of our custom variable. Then we check whether this variable already exists and, if it doesn’t, we set it with _setCustomVar.
