Announcing our $3.4M seed round 🚀 Read more →

Marketing Attribution in SQL

You just spent $500 on a newsletter sponsorship, $2k on display ads (or multiply those figures by 100 if you’re a big enterprise).

You need signal on the efficacy of these campaigns as quickly as humanly possible.

Why SQL?

You’re going to test a lot of acquisition channels.

SQL gives you a way to view all your results in one place (as opposed to copy/pasting results from various tools into spreadsheets).

First touch vs last touch?

In this template, we look at "first touch" attribution, i.e. giving 100% of the credit to the initial channel that acquired the user. This post from HIPB2B explains why you might want to consider "last touch" (e.g. you do a lot of re-targeting and want to measure how effectively you bring a user back).

In the query below, just change asc to desc and you'll see the UTM associated with the last touch.

SQL you can copy / paste

If you have an events table with a timestamp and utm_source you can use the query below with little modification.

with _events as (
 select
   *,
   row_number() over (partition by distinct_id order by time asc)
 from events
 where time >= now() - interval '1 week' -- looking only events from the last week (you likely want to go further)
)

select
  time at time zone 'utc' at time zone 'pst',
  utm_source,
  distinct_id,
-- any other columns you’d like
from _events
where row_number = 1
  and utm_source = 'your_utm_source'
order by 1 desc;

Output of the CTE _events:

| time                | name                | user_id | distinct_id      | utm_source | row_number |
|---------------------|---------------------|---------|------------------|------------|------------|
| 2020-03-18 11:28:42 | View Landing Page   | 563714  | 0032v5yh0d39g... | google     | 1          |
| 2020-03-18 11:44:20 | Sign Up             | 563714  | 0032v5yh0d39g... |            | 2          |
| 2020-03-18 11:44:34 | Complete Onboarding | 563714  | 0032v5yh0d39g... |            | 3          |
| ...                 | ...                 | ...     | ...              | ...        | ...        |
| 2020-04-17 16:51:42 | View Landing Page   |         | 0055l1pg9p73h... | mailchimp  | 1          |
| 2020-04-30 3:11:50  | View Landing Page   | 574991  | 0658o9ki3b22t... | facebook   | 1          |
| 2020-04-30 15:17:58 | Sign Up             | 574991  | 0658o9ki3b22t... |            | 2          |
| 2020-04-30 15:18:19 | Complete Onboarding | 574991  | 0658o9ki3b22t... |            | 3          |
| ...                 | ...                 | ...     | ...              | ...        | ...        |

The CTE in the query above chronologically orders every event of a user. It uses a Window function to do so.

Note how for every distinct_id, the Window function adds an incrementing row_number for every event. And at every change in distinct_id the row_number restarts.

Output of entire query:

| time                | name              | user_id | distinct_id      | utm_source | utm_medium | row_number |
|---------------------|-------------------|---------|------------------|------------|------------|------------|
| 2020-03-18 11:28:42 | View Landing Page | 563714  | 0032v5yh0d39g... | google     | cpc        | 1          |
| 2020-04-17 16:51:42 | View Landing Page |         | 0055l1pg9p73h... | mailchimp  | newsletter | 1          |
| 2020-04-30 3:11:50  | View Landing Page | 574991  | 0658o9ki3b22t... | facebook   | cpc        | 1          |
| ...                 | ...               | ...     | ...              | ...        | ...        | ...        |

The latter part of the query simply grabs the first record (i.e. where row_number = 1), showing you the utm_source of the "first touch".

Try it yourself?

Run this template against our sample database that mirrors real startup data. See the connection credentials, then connect in PopSQL.

Pro Tips

Two steps to use this template with your own data:

  1. Use UTMs whenever you share links externally (an awesome guide on UTMs, h/t to @danielmccaw).
  2. Get your analytics data into your database. Back in 2018, we wrote this guide on how to export Mixpanel data into your DB. Since then, Mixpanel has built a product called Data Pipeline to facilitate.

    Using Amplitude, Heap, etc? We’re working on those guides 🙏 (ping us if you want to help write them).

Understand user behavior

Beyond speed, the other benefit of this approach is 360 visibility. This SQL-driven approach let’s you follow a user end-to-end through your product.

Platforms like Google Analytics aggregate/anonymize your traffic; you lose the through-line from acquisition channel to later user behavior in your product. Plus you can’t filter in Google Analytics on the internal customer attributes you’ve created, like Lead Score. As a result, your qualified traffic gets mixed in with you unqualified traffic.

Automating this process

Have you launched a campaign and written this query? With PopSQL you can schedule this query to run daily so you get updates on new user sign-ups via Slack / email. Add a query variable to pass in a different utm_source per run.

Ready for a modern SQL editor?