PopSQL

How to Calculate Percentiles in BigQuery using Quantiles

You can calculate percentiles in BigQuery using the approx_quantiles function in Standard SQL. To get percentiles, simply ask for 100 quantiles.

select
  percentiles[offset(10)] as p10,
  percentiles[offset(25)] as p25,
  percentiles[offset(50)] as p50,
  percentiles[offset(75)] as p75,
  percentiles[offset(90)] as p90,
from (
  select approx_quantiles(char_length(text), 100) percentiles
  from `bigquery-public-data.hacker_news.full`
);

We're using a BigQuery public dataset on Hacker News in our example above, so you can follow along. To make this analysis real, we bucketed Hacker News posts into percentiles based on their text length.

As of this writing, here's the breakdown of Hacker News text length by percentile:

| p10 | p25 | p50 | p75 | p90 |
|-----|-----|-----|-----|-----|
| 64  | 129 | 256 | 486 | 856 |

In other words if your next Hacker News post is going to be more than 856 characters, you statistically have a lot (too much?) to say 😶.

Since the 50th percentile is the median, this is a helpful way to calculate median in BigQuery.

database icon
SQL editing that just rocks
PopSQL and BigQuery, better together