How to Calculate Percentiles in BigQuery using Quantiles
You can calculate percentiles in BigQuery using the approx_quantiles
function in Standard SQL. To get percentiles, simply ask for 100 quantiles.
select
percentiles[offset(10)] as p10,
percentiles[offset(25)] as p25,
percentiles[offset(50)] as p50,
percentiles[offset(75)] as p75,
percentiles[offset(90)] as p90,
from (
select approx_quantiles(char_length(text), 100) percentiles
from `bigquery-public-data.hacker_news.full`
);
We're using a BigQuery public dataset on Hacker News in our example above, so you can follow along. To make this analysis real, we bucketed Hacker News posts into percentiles based on their text length.
As of this writing, here's the breakdown of Hacker News text length by percentile:
| p10 | p25 | p50 | p75 | p90 |
|-----|-----|-----|-----|-----|
| 64 | 129 | 256 | 486 | 856 |
In other words if your next Hacker News post is going to be more than 856 characters, you statistically have a lot (too much?) to say 😶.
Since the 50th percentile is the median, this is a helpful way to calculate median in BigQuery.