Finding Customers at Risk of Churning
Customers don’t simply love your product one minute and fall out of love the next. Normally there’s a declining trend in usage of your product over time.
A single proactive outreach can reverse that downward trend. The challenge: efficiently finding customers with declining usage.
Why SQL?
With graphing tools alone, it’s nearly impossible to spot trends in customer usage. It’s neon spaghetti:
And examining customers one by one? That would take ages.
Using statistical functions
PostgreSQL’s statistical functions help you rapidly sift through the noise. We explore these functions further in our template on Linear Regression in SQL. Below we’ll use just the regr_slope() function:
| Function | Argument Type | Return Type | Description |
|------------------|------------------|------------------|-------------------------------------------------------------------------------|
| regr_slope(Y, X) | double precision | double precision | slope of the least-squares-fit linear equation determined by the (X, Y) pairs |
This function first creates a trendline that fits your data. It then tells you the slope of that line (your algebra teacher was right! You will use this stuff in your real job 🤓 ).
What regr_slope() is doing:
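As a rough sketch, the query below fits a line through three (week, actions_per_user) points supplied inline, so it runs without the events table. The points are the first three rows shown for team 93336 in the sample CTE output later in this template. The second column recomputes the slope by hand as covariance divided by variance, which is the usual least-squares formula; the name slope_by_hand is only illustrative.

```sql
-- Inline sample points: (week, actions_per_user) = (12, 28), (13, 6), (14, 10)
with points(week, actions_per_user) as (
  values (12, 28), (13, 6), (14, 10)
)
select
  regr_slope(actions_per_user, week) as slope,  -- built-in least-squares slope
  covar_pop(actions_per_user, week)
    / var_pop(week::double precision) as slope_by_hand  -- covariance(week, actions) / variance(week)
from points;
```

Both columns should come out at about -9: across these three weeks, usage falls by roughly nine actions per user per week.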
SQL you can copy / paste
Here’s the query (we’ll break it down below):
-- DATA PREP
with action_data as (
  select
    extract('week' from time) as week, -- quick hack to turn the week into a number so we can use regr_slope()
    team_id, -- we're looking at team usage
    (count(name) / count(distinct user_id))::numeric as actions_per_user -- actions per user, to normalize (integer division, so the result is a whole number)
  from events
  where time between '{{start_date}}' and '{{end_date}}' -- your date range; make sure start_date is a Monday and end_date is a Sunday
  group by 1, 2
)
-- FINDING SLOPE OF ALL CUSTOMER USAGE TRENDLINES
select
  team_id,
  count(week) as weeks_considered,
  round(regr_slope(actions_per_user, week)::numeric, 2) as slope
from action_data
group by 1
having count(week) >= 5 -- let's say we want at least 5 weeks of data
  and regr_slope(actions_per_user, week) < 0 -- assumed completion of the truncated condition: keep only teams whose usage is declining
order by slope; -- assumed: steepest decline first, matching the output shown below
Data prep
The first part of the query, the CTE action_data, is just data prep.
The extract() function turns our timestamp into a number, as regr_slope() doesn’t accept timestamps as parameters. This hack works great except at the change of calendar years, when the week number resets to 1 (workaround at the bottom of the page).
For the extract() function, a week goes from Monday to Sunday. Avoid partial weeks by starting your date range with a Monday and ending on a Sunday 👍
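To see the Monday-to-Sunday boundaries for yourself, here is a small sketch using four hypothetical dates from March 2020 (they are not taken from the sample database). It prints the week number next to the ISO day of week, where 1 is Monday and 7 is Sunday.

```sql
-- Four hypothetical dates around two week boundaries
select
  d as day,
  extract(week from d) as week,          -- ISO 8601 week number
  extract(isodow from d) as day_of_week  -- 1 = Monday ... 7 = Sunday
from (
  values
    (date '2020-03-15'),  -- Sunday: last day of week 11
    (date '2020-03-16'),  -- Monday: first day of week 12
    (date '2020-03-22'),  -- Sunday: last day of week 12
    (date '2020-03-23')   -- Monday: first day of week 13
) as t(d);
```

A date range running from 2020-03-16 through 2020-03-22 would therefore cover exactly one complete week.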
The output of the CTE action_data looks like this:
| week | team_id | actions_per_user |
|------|---------|------------------|
| 12 | 93336 | 28 |
| 13 | 93336 | 6 |
| 14 | 93336 | 10 |
| ... | ... | ... |
| 12 | 92982 | 26 |
| 13 | 92982 | 1 |
| 14 | 92982 | 2 |
| ... | ... | ... |
Finding slope of all customer usage trendlines
In the second part of the query, regr_slope() creates a trendline between actions_per_user and week for each team, then returns the slope of that trendline.
If the slope is negative, then the usage trend is negative. The more negative the output, the steeper the decline in their usage. We added the weeks_considered column to ensure we had enough data points to see a trend. You can see that in the output:
| team_id | weeks_considered | slope |
|---------|------------------|-------|
| 97003 | 5 | -5.70 |
| 77503 | 9 | -4.93 |
| 95535 | 5 | -4.23 |
| 92982 | 5 | -3.11 |
| … | ... | … |
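If you want to try the grouping and the weeks_considered filter without touching your own data, the sketch below swaps the CTE for made-up rows. The three teams and their values are hypothetical: one declines steadily, one grows steadily, and one has only three weeks of data. It omits any filter on the sign of the slope so that both directions are visible.

```sql
-- Hypothetical weekly usage for three teams
with action_data(week, team_id, actions_per_user) as (
  values
    (12, 1, 30), (13, 1, 25), (14, 1, 20), (15, 1, 15), (16, 1, 10),  -- steady decline
    (12, 2, 5),  (13, 2, 7),  (14, 2, 9),  (15, 2, 11), (16, 2, 13),  -- steady growth
    (12, 3, 20), (13, 3, 10), (14, 3, 5)                              -- only 3 weeks of data
)
select
  team_id,
  count(week) as weeks_considered,
  round(regr_slope(actions_per_user, week)::numeric, 2) as slope
from action_data
group by team_id
having count(week) >= 5  -- drops team 3, which has too few weeks
order by slope;
-- team 1: slope -5.00 (declining), team 2: slope 2.00 (growing)
```

Team 3 looks like it is falling fast, but with three points the trend is too thin to act on, which is what the minimum-weeks condition guards against.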
☎️ Contact the customers with the worst slope! Their usage of your product is plummeting. If they had a high Lead Score, you could be letting great customers slip away!
Seeing the SQL
Once more, to help you visualize what PostgreSQL is doing, we've graphed the trends for each of these teams in the output above:
If you'd like to see an individual customer's behavior, here’s the query we used:
select
  extract('week' from date_trunc('week', time)) as week,
  (count(name) / count(distinct user_id))::integer as actions_per_user
from events
where team_id = 97003 -- or 77503, 95535, 92982
group by 1
order by 1;
Try it yourself?
Run this template against our sample database that mirrors real startup data. See the connection credentials, then connect in PopSQL.
Bonus Content
- The inverse is also true: customers with strong positive slopes are growing 📈
- Here's the aforementioned workaround for data that spans a change of calendar year. It involves casting from integer to text and back to integer, but it keeps the weeks in chronological order:
with action_data as (
  select
    (extract('isoyear' from time)::text || lpad(extract('week' from time)::text, 2, '0'))::integer as yearweek, -- isoyear matches the ISO week number; lpad keeps single-digit weeks two digits wide (e.g. 202001)
    ... -- rest of query continues as above
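One caveat with a value built from year and week digits: consecutive weeks on either side of New Year are not one unit apart, so a trendline that spans the boundary sees a large jump on the x-axis. A possible alternative, which is not part of the original template, is a running week index: truncate each timestamp to its Monday and count whole weeks since the Unix epoch. The sketch below uses hypothetical timestamps, with ts standing in for the time column of events.

```sql
-- Hypothetical timestamps either side of New Year; ts stands in for events.time
with sample_events(ts) as (
  values
    (timestamp '2019-12-23 10:00'),  -- Monday, ISO week 52 of 2019
    (timestamp '2019-12-30 10:00'),  -- Monday, ISO week 1 of 2020
    (timestamp '2020-01-06 10:00')   -- Monday, ISO week 2 of 2020
)
select
  ts,
  extract(week from ts) as week,  -- 52, 1, 2: resets at the boundary
  floor(extract(epoch from date_trunc('week', ts)) / 604800) as week_index  -- 604800 seconds per week; goes up by 1 each week
from sample_events
order by ts;
```

Because week_index rises by exactly one per week, it can be passed to regr_slope() in place of week, and the slope keeps its meaning of change in actions per user per week.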