How to Use Lateral Joins in PostgreSQL
Once upon a time, my queries were a mess. I didn’t know how to use lateral joins, so I would copy-and-paste the same calculations over and over again in my queries.
Co-workers were starting to talk.
Lateral joins allow you to reuse calculations, making your queries neat and legible. Let's learn about lateral joins by rewriting an atrocious query together.
Data Set
We'll use a cool sample dataset of real Kickstarter projects, if you'd like to follow along.
Relevant columns:
For each Kickstarter project, we want to calculate:
- total pledged in USD
- average pledge in USD
- USD over or under goal
- duration of the project in days
- daily shortfall / surplus, the extra USD needed daily to hit goal
Queries, Before and After
Before:
select
(pledged / fx_rate) as pledged_usd,
(pledged / fx_rate) / backers_count as avg_pledge_usd,
(goal / fx_rate) - (pledged / fx_rate) as amt_from_goal,
(deadline - launched_at) / 86400.00 as duration,
((goal / fx_rate) - (pledged / fx_rate)) / ((deadline - launched_at) / 86400.00) as usd_needed_daily
from kickstarter_data;
Without lateral joins, see how often I reuse the same calculations:
Yuck. Not only does this make the query difficult to read, it introduces risk of typos or other errors if I ever need to make an update.
After:
select
pledged_usd,
avg_pledge_usd,
amt_from_goal,
duration,
(usd_from_goal / duration) as usd_needed_daily
from kickstarter_data,
lateral (select pledged / fx_rate as pledged_usd) pu
lateral (select pledged_usd / backers_count as avg_pledge_usd) apu
lateral (select goal / fx_rate as goal_usd) gu
lateral (select goal_usd - pledged_usd as usd_from_goal) ufg
lateral (select (deadline - launched_at)/86400.00 as duration) dr;
With lateral joins, I can define the calculation just once. I can then reference those calculations in other parts of my query.
What's happening?
The lateral
keyword allows us to access columns after the FROM
statement, and reference these columns "earlier" in the query ("earlier" meaning "written higher in the query").
SQL queries run in a different order than you might expect. In fact, FROM
and JOIN
are the first statements run. Therefore it's no problem to reference columns after the FROM
statement.
Image Credit: Julia Evans
Word of warning: stick to simple mathematical operations when writing lateral joins for calculations. Aggregate functions like COUNT()
, AVG()
, or SUM()
are not supported.
Happy querying! 🍭
Previous
How to Calculate Percentiles