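Guide to Optimizing Data Storage and Access in BigQuery
1. Approximate Counting Methods
When analyzing data, we often need to count distinct values. For the London bicycle hire data, there are two ways to count the distinct start stations, the distinct end stations, and the distinct stations overall (stations that appear as either a start or an end).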
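As a point of reference, the three quantities can be computed exactly on a small in-memory example. The following is a minimal Python sketch on hypothetical rides (the station names are made up). It uses exact sets rather than either BigQuery method, and it shows why the overall station count cannot be obtained by adding the start and end counts.

```python
# Hypothetical toy rides as (start_station_name, end_station_name) pairs
rides = [
    ("Station A", "Station B"),
    ("Station B", "Station A"),
    ("Station A", "Station C"),
    ("Station D", "Station B"),
]

starts = {start for start, _ in rides}
ends = {end for _, end in rides}
stations = starts | ends  # a station counts once whether it is a start, an end, or both

distinct_start = len(starts)
distinct_end = len(ends)
distinct_station = len(stations)

# Inclusion-exclusion: stations that are both starts and ends would be double-counted
assert distinct_station == distinct_start + distinct_end - len(starts & ends)
print(distinct_start, distinct_end, distinct_station)  # 3 3 4
```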
1.1 Using the HLL_COUNT Functions
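```sql
-- `sketch`: assumed table of HLL sketches (e.g. from HLL_COUNT.INIT) on start/end station names
SELECT
  HLL_COUNT.MERGE(hll_start) AS distinct_start
  , HLL_COUNT.MERGE(hll_end) AS distinct_end
  -- UNNEST pairs every row with both of its sketches, so merging hll_both unions
  -- start and end stations; the duplicated rows do not change the other two merges
  , HLL_COUNT.MERGE(hll_both) AS distinct_station
FROM sketch, UNNEST([hll_start, hll_end]) AS hll_both
```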
The query returns:
| Row | distinct_start | distinct_end | distinct_station |
| --- | --- | --- | --- |
| 1 | 880 | 882 | 882 |
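HLL_COUNT.MERGE can turn separate start and end sketches into a single station count because HyperLogLog sketches are mergeable: the sketch of a union can be computed from the sketches alone. The following is a simplified HyperLogLog in Python, not BigQuery's HLL++ implementation; the precision `P = 10`, the blake2b hash, the helper names `hll_init`, `hll_merge`, `hll_estimate`, and the station names are all assumptions for illustration.

```python
import hashlib
import math

P = 10          # assumed precision: 2**P registers (not BigQuery's setting)
M = 1 << P      # number of registers


def _hash64(value: str) -> int:
    """Deterministic 64-bit hash of a string (blake2b, chosen for illustration)."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hll_init(values) -> list:
    """Build a sketch: each register keeps the largest rank seen in its bucket."""
    registers = [0] * M
    for value in values:
        h = _hash64(value)
        idx = h >> (64 - P)                       # top P bits select the register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1   # 1-based position of the first 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def hll_merge(*sketches) -> list:
    """Union of sketches = register-wise max (commutative and idempotent)."""
    return [max(column) for column in zip(*sketches)]


def hll_estimate(registers) -> float:
    """Raw HyperLogLog estimate, with linear counting for small cardinalities."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros > 0:
        return M * math.log(M / zeros)
    return raw


# Hypothetical station names: starts and ends overlap on station_100 .. station_799
starts = [f"station_{i}" for i in range(0, 800)]
ends = [f"station_{i}" for i in range(100, 900)]

hll_start = hll_init(starts)
hll_end = hll_init(ends)
hll_both = hll_merge(hll_start, hll_end)

print(round(hll_estimate(hll_start)), round(hll_estimate(hll_end)), round(hll_estimate(hll_both)))
```

Because `hll_merge` is a register-wise max, the merged sketch is identical to a sketch built directly from the combined values, and merging a sketch with a copy of itself changes nothing. That is the property the `UNNEST` cross join in the query relies on. The merged estimate should be close to the 900 distinct names in the union, not to 800 + 800.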
1.2 Using APPROX_COUNT_DISTINCT
```sql
SELECT
  APPROX_COUNT_DISTINCT(start_station_name) AS distinct_start
  , APPROX_COUNT_DISTINCT(end_station_name) AS distinct_end
  , APPROX_COUNT_DISTINCT(both_stations) AS distinct_station
```
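The FROM clause of this query is cut off in the source. Assuming `both_stations` comes from unnesting the two station columns, in the same way 1.1 unnests the two sketches, each ride row is emitted twice. The minimal Python sketch below (hypothetical data, exact sets instead of APPROX_COUNT_DISTINCT) shows that this row doubling leaves every distinct count unchanged while letting `both_stations` see the values of both columns.

```python
# Hypothetical toy rides as (start_station_name, end_station_name) pairs
rides = [
    ("Station A", "Station B"),
    ("Station B", "Station A"),
    ("Station A", "Station C"),
]

# Mimic `FROM rides, UNNEST([start, end]) AS both_stations` (assumed shape):
# the correlated cross join emits one output row per array element,
# so every ride appears twice, once per station column.
expanded = [(start, end, both) for start, end in rides for both in (start, end)]

distinct_start = len({start for start, _, _ in expanded})
distinct_end = len({end for _, end, _ in expanded})
distinct_station = len({both for _, _, both in expanded})

print(len(rides), len(expanded))                       # 3 6
print(distinct_start, distinct_end, distinct_station)  # 2 3 3
```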