Running a Funnel Analysis in SQL
You’ve spent a ton of effort / money to get users to explore your product. Are there chasms in your product where visitors simply fall off?
A funnel analysis gives clarity.
We initially used Google Analytics for funnel analyses. It was... lacking.
We wanted to talk to users who fell off at a particular step... but we couldn’t since Google Analytics aggregates and anonymizes. We wanted to see a funnel for our most qualified users by passing in an internally calculated metric like a Lead Score. No luck.
SQL empowers you to get a 360° understanding of how customers use your product.
SQL you can copy / paste
First step: define the stages of your funnel
Inspired by the Pirate Metrics framework, in this example, we’ll segment our funnel steps as:
Second step: follow the same users through your funnel
Many other funnel analysis templates make a big mistake: they independently calculate each step, then lump them all together.
To illustrate the problem, you could have activations in July 2020 who first visited and signed up in September 2019.
These 2019 users experienced a totally different landing page or product -- a different funnel altogether! If you include these historical user actions, you run the risk of artificially inflating all your funnel conversion metrics. That can be a big problem if you’re using these conversion metrics in any future models.
Here's the query
If you have an
events table with a
user_id you can use the queries below with little modification.
This query looks complex, but we break it down inline:
-- VISITORS (DEFINES THE GROUP WE FOLLOW THROUGH THE FUNNEL) with visitors as ( select distinct_id, -- effectively a user_id min(time) as min_time -- gets the earliest Visit for each person from events where name = 'View Landing Page' group by 1 having min(time) between '2020-04-01' and '2020-05-31' -- selects people whose first visit is in this time range ), -- SIGN-UPS (FROM THE VISITORS ABOVE) sign_ups as ( select distinct e.distinct_id from visitors v -- ensures we only look at the Visitors defined above inner join events e on e.distinct_id = v.distinct_id where e.name = 'Sign Up' -- an internal event that defines sign-up ), -- ACTIVATIONS (FROM THE SIGN-UPS ABOVE) activations as ( select distinct e.distinct_id from sign_ups s -- ensures we only look at the Signups defined above inner join events e on e.distinct_id = s.distinct_id where e.name = 'New Canvas' ), -- PURCHASES (FROM THE ACTIVATIONS ABOVE) purchases as ( select distinct e.distinct_id from activations a -- ensures we only look at the Activations defined above inner join events e on e.distinct_id = a.distinct_id where e.name = 'Start Subscription' ) select 'Visit' as step, COUNT(*) from visitors union -- joins the output of queries together (as long as they have the same columns) select 'Sign Up' as step, COUNT(*) from sign_ups union select 'Activate' as step, COUNT(*) from activations union select 'Purchase' as step, COUNT(*) from purchases order by count desc -- applies to the whole result set ;
| Step | Count | |-------------|-------| | Visit | 593 | | Sign Up | 142 | | Activations | 127 | | Purchases | 6 |
Calculating drop-off at each step
You’ll likely want to know where the drop-off between steps is the worst. You can use the
lag() function (docs) to compare the output of each step against the one prior:
-- query is identical to that above. We add a CTE called steps so we can do one further analysis steps as ( select 'Visit' as step, COUNT(*) from visitors union select 'Sign Up' as step, COUNT(*) from sign_ups union select 'Activate' as step, COUNT(*) from activations union select 'Purchase' as step, COUNT(*) from purchases order by count desc ) select step, count, lag(count, 1) over (), from steps;
| Step | Count | lag | |----------|-------|-----| | Visit | 593 | | | Sign Up | 142 | 593 | | Activate | 127 | 142 | | Purchase | 6 | 127 |
We add a line below to calculate 1 - (
lag) for each step to get
drop_off. To illustrate, for the Sign Up step:
1 - (142 / 593) = 0.76, i.e. a 76% drop off.
-- query is identical to above, just adding one extra line to calculate ∆ between Lag and Count columns of each step select step, count, lag(count, 1) over (), round((1.0 - count::numeric/lag(count, 1) over ()),2) as drop_off from steps;
| Step | Count | lag | drop_off | |----------|-------|-----|----------| | Visit | 593 | | | | Sign Up | 142 | 593 | 0.76 | | Activate | 127 | 142 | 0.11 | | Purchase | 6 | 127 | 0.95 |
This query suggests we focus on the drop-off from Visit to Sign Up. The drop-off is substantial and it's near the top of our funnel. Any improvement at the top of the funnel positively impacts all down-funnel metrics.
Try it yourself?
Run this template against our sample database that mirrors real startup data. See the connection credentials, then connect in PopSQL.
Spread the wordTweet