How to Use Lateral Joins in PostgreSQL

Once upon a time, my queries were a mess. I didn’t know how to use lateral joins, so I would copy-and-paste the same calculations over and over again in my queries.

Co-workers were starting to talk.

Lateral joins allow you to reuse calculations, making your queries neat and legible. Let's learn about lateral joins by rewriting an atrocious query together.

Data Set

We'll use a cool sample dataset of real Kickstarter projects, if you'd like to follow along.

Relevant columns: alt text

For each Kickstarter project, we want to calculate:

  • total pledged in USD
  • average pledge in USD
  • USD over or under goal
  • duration of the project in days
  • daily shortfall / surplus, the extra USD needed daily to hit goal

Queries, Before and After

Before:

select
    (pledged / fx_rate) as pledged_usd,
    (pledged / fx_rate) / backers_count as avg_pledge_usd,
    (goal / fx_rate) - (pledged / fx_rate) as amt_from_goal,
    (deadline - launched_at) / 86400.00 as duration,
    ((goal / fx_rate) - (pledged / fx_rate)) / ((deadline - launched_at) / 86400.00) as usd_needed_daily
from kickstarter_data;

Without lateral joins, see how often I reuse the same calculations: alt text

Yuck. Not only does this make the query difficult to read, it introduces risk of typos or other errors if I ever need to make an update.

After:

select
    pledged_usd,
    avg_pledge_usd,
    amt_from_goal,
    duration,
    (usd_from_goal / duration) as usd_needed_daily
from kickstarter_data,
    lateral (select pledged / fx_rate as pledged_usd) pu
    lateral (select pledged_usd / backers_count as avg_pledge_usd) apu
    lateral (select goal / fx_rate as goal_usd) gu
    lateral (select goal_usd - pledged_usd as usd_from_goal) ufg
    lateral (select (deadline - launched_at)/86400.00 as duration) dr;

With lateral joins, I can define the calculation just once. I can then reference those calculations in other parts of my query.

What's happening?

The lateral keyword allows us to access columns after the FROM statement, and reference these columns "earlier" in the query ("earlier" meaning "written higher in the query").

SQL queries run in a different order than you might expect. In fact, FROM and JOIN are the first statements run. Therefore it's no problem to reference columns after the FROM statement.

alt text

Image Credit: Julia Evans

Word of warning: stick to simple mathematical operations when writing lateral joins for calculations. Aggregate functions like COUNT(), AVG(), or SUM() are not supported.

Happy querying! 🍭

Previous

Next

Ready for a modern SQL editor?