Marketing Attribution in SQL
You just spent $500 on a newsletter sponsorship, $2k on display ads (or multiply those figures by 100 if you’re a big enterprise).
You need signal on the efficacy of these campaigns as quickly as humanly possible.
Why SQL?
You’re going to test a lot of acquisition channels.
SQL gives you a way to view all your results in one place (as opposed to copy/pasting results from various tools into spreadsheets).
First touch vs last touch?
In this template, we look at "first touch" attribution, i.e. giving 100% of the credit to the initial channel that acquired the user. This post from HIPB2B explains why you might want to consider "last touch" (e.g. you do a lot of re-targeting and want to measure how effectively you bring a user back).
In the query below, just change asc
to desc
and you'll see the UTM associated with the last touch.
SQL you can copy / paste
If you have an events
table with a timestamp
and utm_source
you can use the query below with little modification.
with _events as (
select
*,
row_number() over (partition by distinct_id order by time asc)
from events
where time >= now() - interval '1 week' -- looking only events from the last week (you likely want to go further)
)
select
time at time zone 'utc' at time zone 'pst',
utm_source,
distinct_id,
-- any other columns you’d like
from _events
where row_number = 1
and utm_source = 'your_utm_source'
order by 1 desc;
Output of the CTE _events
:
| time | name | user_id | distinct_id | utm_source | row_number |
|---------------------|---------------------|---------|------------------|------------|------------|
| 2020-03-18 11:28:42 | View Landing Page | 563714 | 0032v5yh0d39g... | google | 1 |
| 2020-03-18 11:44:20 | Sign Up | 563714 | 0032v5yh0d39g... | | 2 |
| 2020-03-18 11:44:34 | Complete Onboarding | 563714 | 0032v5yh0d39g... | | 3 |
| ... | ... | ... | ... | ... | ... |
| 2020-04-17 16:51:42 | View Landing Page | | 0055l1pg9p73h... | mailchimp | 1 |
| 2020-04-30 3:11:50 | View Landing Page | 574991 | 0658o9ki3b22t... | facebook | 1 |
| 2020-04-30 15:17:58 | Sign Up | 574991 | 0658o9ki3b22t... | | 2 |
| 2020-04-30 15:18:19 | Complete Onboarding | 574991 | 0658o9ki3b22t... | | 3 |
| ... | ... | ... | ... | ... | ... |
The CTE in the query above chronologically orders every event of a user. It uses a Window function to do so.
Note how for every distinct_id
, the Window function adds an incrementing row_number
for every event. And at every change in distinct_id
the row_number
restarts.
Output of entire query:
| time | name | user_id | distinct_id | utm_source | utm_medium | row_number |
|---------------------|-------------------|---------|------------------|------------|------------|------------|
| 2020-03-18 11:28:42 | View Landing Page | 563714 | 0032v5yh0d39g... | google | cpc | 1 |
| 2020-04-17 16:51:42 | View Landing Page | | 0055l1pg9p73h... | mailchimp | newsletter | 1 |
| 2020-04-30 3:11:50 | View Landing Page | 574991 | 0658o9ki3b22t... | facebook | cpc | 1 |
| ... | ... | ... | ... | ... | ... | ... |
The latter part of the query simply grabs the first record (i.e. where row_number = 1
), showing you the utm_source
of the "first touch".
Try it yourself?
Run this template against our sample database that mirrors real startup data. See the connection credentials, then connect in PopSQL.
Pro Tips
Two steps to use this template with your own data:
- Use UTMs whenever you share links externally (an awesome guide on UTMs, h/t to @danielmccaw).
- Get your analytics data into your database. Back in 2018, we wrote this guide on how to export Mixpanel data into your DB. Since then, Mixpanel has built a product called Data Pipeline to facilitate.
Using Amplitude, Heap, etc? We’re working on those guides 🙏 (ping us if you want to help write them).
Understand user behavior
Beyond speed, the other benefit of this approach is 360 visibility. This SQL-driven approach let’s you follow a user end-to-end through your product.
Platforms like Google Analytics aggregate/anonymize your traffic; you lose the through-line from acquisition channel to later user behavior in your product. Plus you can’t filter in Google Analytics on the internal customer attributes you’ve created, like Lead Score. As a result, your qualified traffic gets mixed in with you unqualified traffic.
Automating this process
Have you launched a campaign and written this query? With PopSQL you can schedule this query to run daily so you get updates on new user sign-ups via Slack / email. Add a query variable to pass in a different utm_source
per run.
Spread the word
Tweet