Cohort analysis example: engagement
Never use analytics to track information that uniquely identifies a particular person, such as their real name, email address or IP address. Doing so is not only against Google Analytics’ terms of service; it’s also a lousy and unnecessary violation of privacy.
Most cohort analysis groups users by a common date range. We do this to see whether their behavior has changed from one period to the next. It’s also possible to group users by other attributes they share, such as membership level or goals achieved. The objective is to learn whether users with a given attribute tend to achieve our product goals at a significantly different rate than a baseline cohort over time.
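To make the idea of comparing a cohort’s goal rate with a baseline cohort concrete, here is a minimal Python sketch, independent of Google Analytics, that groups users by one shared attribute and compares each cohort’s goal rate with a chosen baseline. The `User` record, its fields and both function names are hypothetical, made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical user record: one shared attribute and a goal flag."""
    user_id: str
    attribute: str      # e.g. first-visit week or membership level
    reached_goal: bool  # did this user reach the product goal?


def goal_rate_by_cohort(users):
    """Group users by their shared attribute; return each cohort's goal rate."""
    totals = defaultdict(int)
    converted = defaultdict(int)
    for user in users:
        totals[user.attribute] += 1
        if user.reached_goal:
            converted[user.attribute] += 1
    return {cohort: converted[cohort] / totals[cohort] for cohort in totals}


def difference_from_baseline(rates, baseline):
    """Difference (as a fraction) between each cohort's rate and the baseline's."""
    base = rates[baseline]
    return {cohort: rate - base for cohort, rate in rates.items()}


if __name__ == "__main__":
    users = [
        User("a", "basic", False),
        User("b", "basic", True),
        User("c", "basic", False),
        User("d", "basic", False),
        User("e", "premium", True),
        User("f", "premium", False),
    ]
    rates = goal_rate_by_cohort(users)
    print(rates)                                     # {'basic': 0.25, 'premium': 0.5}
    print(difference_from_baseline(rates, "basic"))  # {'basic': 0.0, 'premium': 0.25}
```

Whether a difference like that matters still depends on how many users sit in each cohort; the rates alone don’t tell you.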
What types of data should we track? That depends on the type of product you have and the level of detail you need. Ask yourself: what long-term attributes of your users does Google Analytics not provide? Which properties best differentiate your users and are most relevant to your product? What questions are you trying to answer?
| Example type | Attributes |
|---|---|
| Good examples | total downloads, donated, sign-up date, Klout score, gender, membership type, games played, referred friend, test group |
| Bad examples | number of visits, location, browser, referrer, number of pageviews, IP address, last name |
Yes, there are exceptions to virtually every one of those examples, so use your judgement. For example, if it’s important to know how many people started with Internet Explorer last year but are using Chrome this year, go ahead and record each user’s “Initial Browser”.
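Recording an “Initial Browser” boils down to a write-once attribute: set it on the first visit and never overwrite it, while the current value changes freely. Below is a minimal Python sketch of that logic, with a hypothetical in-memory `Visitor` store standing in for whatever persists your custom data (in Google Analytics itself, the current browser is already reported as a standard dimension). The function names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Visitor:
    """Hypothetical store of one user's long-term attributes."""
    attributes: dict = field(default_factory=dict)


def record_visit(visitor, browser):
    # "Initial Browser" is write-once: only the very first visit sets it.
    visitor.attributes.setdefault("Initial Browser", browser)
    # "Current Browser" is overwritten on every visit.
    visitor.attributes["Current Browser"] = browser


def count_switchers(visitors, from_browser, to_browser):
    """Count users who started on from_browser and now use to_browser."""
    return sum(
        1
        for v in visitors
        if v.attributes.get("Initial Browser") == from_browser
        and v.attributes.get("Current Browser") == to_browser
    )


if __name__ == "__main__":
    alice, bob = Visitor(), Visitor()
    record_visit(alice, "Internet Explorer")
    record_visit(alice, "Chrome")
    record_visit(bob, "Chrome")
    print(count_switchers([alice, bob], "Internet Explorer", "Chrome"))  # 1
```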
> An aside: Aren’t there better ways to do this?
Blog example: Guido’s Mosquitos
I find things much easier to understand when looking at a real-world situation, so let’s try a quick tutorial showing how you might use cohort analysis in Google Analytics to track engagement. Imagine your product is a blog advocating respect for your friend, the misunderstood mosquito. Your goal for “Guido’s Mosquitos” is to understand how well you retain your readers and to record a few goals they might reach on your site. In this case, you need to decide which cohort retention intervals you care about and which goals matter most. Let’s start with a set of custom variables like this:
| Custom variable | Name | Example value | Meaning |
|---|---|---|---|
| Slot 1 | Signup date | 20111019 | Date of user’s first visit |
| Slot 2 | Weekly cohort | 42 | Week of user’s first visit |
| Slot 3 | Ebook downloads | 3 | Number of ebooks downloaded |
| Slot 4 | Goal tracking | RefSent | User referred a friend |
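The values in the table can be derived from a user’s first-visit date and activity. The sketch below assumes “Weekly cohort” means the ISO 8601 week number, which does give 42 for 19 October 2011; the table doesn’t say which week numbering is used, and a bare week number repeats every year, so a multi-year analysis would also need the year. The function name and its arguments are hypothetical. In the classic ga.js tracker, each (slot, name, value) triple would then be passed to `_setCustomVar`, presumably with visitor-level scope so the values persist across visits.

```python
from datetime import date


def custom_variable_slots(first_visit, ebook_downloads, referred_friend):
    """Return {slot: (name, value)} for the four variables in the table.

    Assumes 'Weekly cohort' is the ISO 8601 week number of the first visit.
    Values are strings because custom variable values are text.
    """
    slots = {
        1: ("Signup date", first_visit.strftime("%Y%m%d")),
        2: ("Weekly cohort", str(first_visit.isocalendar()[1])),
        3: ("Ebook downloads", str(ebook_downloads)),
    }
    # Slot 4 is only set once the user has referred a friend.
    if referred_friend:
        slots[4] = ("Goal tracking", "RefSent")
    return slots


if __name__ == "__main__":
    for slot, (name, value) in custom_variable_slots(date(2011, 10, 19), 3, True).items():
        print(slot, name, value)
```

Running it prints the four example rows of the table, with the values `20111019`, `42`, `3` and `RefSent`.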
It’s a new year and you’re considering adding more ebooks for readers to download from your blog. However, you only want to do so if it’s likely to increase donations. How do you proceed? In this case, the cohorts (the groups of people you’re most interested in) are defined by how many of your ebooks each user has downloaded. You don’t care when they started coming to your site, or even how long they stayed, just that they engaged in an activity of interest to you.
| Advanced segment | Match conditions |
|---|---|
| “Cohort: 0 downloads” | Custom var: 3, matching RegExp: `^0$` |
| “Cohort: 1 download” | Custom var: 3, matching RegExp: `^1$` |
| “Cohort: 2+ downloads” | Custom var: 3, matching RegExp: `^[2-9]$` |
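The segment conditions can be checked outside Google Analytics with Python’s `re` module, on the assumption that it behaves like GA’s regular expressions for patterns this simple. Note that `^[2-9]$` only matches the single digits 2 through 9, so a reader with 10 or more downloads would fall into none of the three segments; the sketch widens the 2+ pattern to `^([2-9]|[1-9][0-9]+)$`. `SEGMENTS` and `segment_for` are hypothetical names.

```python
import re

# Segment definitions keyed on the slot 3 value. The 2+ pattern is widened
# from ^[2-9]$ so that values of 10 or more also match.
SEGMENTS = [
    ("Cohort: 0 downloads", re.compile(r"^0$")),
    ("Cohort: 1 download", re.compile(r"^1$")),
    ("Cohort: 2+ downloads", re.compile(r"^([2-9]|[1-9][0-9]+)$")),
]


def segment_for(slot3_value):
    """Return the name of the first segment whose pattern matches, else None."""
    for name, pattern in SEGMENTS:
        if pattern.search(slot3_value):
            return name
    return None


if __name__ == "__main__":
    for value in ["0", "1", "2", "9", "12"]:
        print(value, "->", segment_for(value))
    # The narrower pattern on its own misses double-digit counts:
    print(re.search(r"^[2-9]$", "12"))  # None
```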
With these segments defined, you can jump over to an appropriately configured custom report and try to answer your original question. For example, you might plot the number of goals achieved (donations) by each of the three user segments during the last two months of the year.
Aak! The abundance of ebooks is killing your business! OK, not really. This is a rather limited analysis, and it’s important that we understand exactly what it says. Take the “Cohort: 1 download” segment, for example: its result might be read as “14.49% of users who downloaded exactly one ebook made a donation in the last two months.” These users may have downloaded their one ebook during the analysis period or at any time before it.
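Read that way, each segment’s figure is the share of its members who donated inside the analysis window, with membership decided by the download count alone. Here is a minimal Python sketch of that calculation, using made-up users and a hypothetical `donation_rate` function; it illustrates the arithmetic and does not reproduce the 14.49% result.

```python
from datetime import date


def donation_rate(users, in_segment, window_start, window_end):
    """Share of segment members with at least one donation inside the window.

    Membership depends only on the user's current download count, so
    downloads made before the window still put a user in the segment.
    """
    members = [u for u in users if in_segment(u["downloads"])]
    if not members:
        return 0.0
    donors = [
        u for u in members
        if any(window_start <= d <= window_end for d in u["donation_dates"])
    ]
    return len(donors) / len(members)


if __name__ == "__main__":
    # Hypothetical users; these numbers are illustrative, not real results.
    users = [
        {"downloads": 1, "donation_dates": [date(2011, 11, 15)]},
        {"downloads": 1, "donation_dates": [date(2011, 6, 1)]},   # outside window
        {"downloads": 1, "donation_dates": []},
        {"downloads": 1, "donation_dates": [date(2011, 12, 24)]},
        {"downloads": 3, "donation_dates": [date(2011, 11, 2)]},  # other segment
    ]
    rate = donation_rate(users, lambda n: n == 1, date(2011, 11, 1), date(2011, 12, 31))
    print(f"{rate:.2%}")  # 50.00%
```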
What we are trying to do is establish a correlation between our test segments (users who download ebooks) and our target goal (in this case, donations). The graph suggests that users who download ebooks are significantly more likely to donate, but that those who download one ebook are at least as likely to donate as those who download two or more. The graph says nothing about why this is the case. Perhaps each of the ebooks repeats the same message and you’re boring your audience to tears. I don’t know; a more detailed attribution analysis would be required to find out. But this analysis should at least make you stop and think: maybe I should dig deeper before adding more ebooks, or perhaps there’s a better way to increase donations (preferably one with more promising data).
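Whether a gap between two segments is “significant” in the statistical sense depends on how many users each segment contains, which a graph of rates alone doesn’t show. One standard check, swapped in here and not part of the original analysis or a Google Analytics feature, is a two-proportion z-test on the donor and user counts of two segments. Here is a minimal Python sketch with a hypothetical function name and made-up counts:

```python
from math import erfc, sqrt


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Uses the pooled normal approximation, which is only reasonable when
    each group has a decent number of successes and failures.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both groups have a 0% or 100% rate; the test is undefined")
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: donors and users in two segments.
    z, p = two_proportion_z_test(30, 100, 10, 100)
    print(round(z, 2), round(p, 4))
```

A small p-value (say, below 0.05) suggests the difference is unlikely to be noise alone; like the graph, it still says nothing about why the segments differ.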