You can calculate percentiles in BigQuery using the
approx_quantiles function in Standard SQL. To get percentiles, simply ask for 100 quantiles.
select percentiles[offset(10)] as p10, percentiles[offset(25)] as p25, percentiles[offset(50)] as p50, percentiles[offset(75)] as p75, percentiles[offset(90)] as p90, from ( select approx_quantiles(char_length(text), 100) percentiles from `bigquery-public-data.hacker_news.full` );
We're using a BigQuery public dataset on Hacker News in our example above, so you can follow along. To make this analysis real, we bucketed Hacker News posts into percentiles based on their text length.
As of this writing, here's the breakdown of Hacker News text length by percentile:
| p10 | p25 | p50 | p75 | p90 | |-----|-----|-----|-----|-----| | 64 | 129 | 256 | 486 | 856 |
In other words if your next Hacker News post is going to be more than 856 characters, you statistically have a lot (too much?) to say 😶.
Since the 50th percentile is the median, this is a helpful way to calculate median in BigQuery.