ClickHouse (Part 2: Calculating retention with the higher-order function retention)

License: CC 4.0 BY-SA

Tags: #retention #clickhouse #higher-order-functions #retention-rate

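This post introduces ClickHouse's retention function. It computes user retention rates and can also serve other purposes. After creating a table, loading data and running a query, the post shows that when the first expression passed to retention is not satisfied, every later result is 0. Setting that first expression to 1 avoids this. It also points out that the function returns an Array, which can be used as needed, for example to count the users active on a given day.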

Preface

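ClickHouse provides the aggregate function retention(cond1, cond2, …), which makes user retention rates easy to compute. It can of course be applied to other problems as well.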
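A small Python model of what retention returns for the rows of one GROUP BY group may help make the semantics concrete. It is a sketch of the behaviour described in the ClickHouse documentation, not ClickHouse's implementation. The Python function name retention and the "visit"/"buy"/"refund" events are illustrative assumptions.

```python
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")


def retention(rows: Sequence[Row], conds: List[Callable[[Row], bool]]) -> List[int]:
    """Illustrative model of ClickHouse retention() for one group of rows.

    Element 0 is 1 if cond1 holds for at least one row. Element i (i >= 1) is 1
    only if cond1 holds for some row AND conds[i] holds for some row
    (not necessarily the same row).
    """
    hit = [any(cond(row) for row in rows) for cond in conds]
    if not hit[0]:
        # cond1 is never satisfied, so every element is 0
        return [0] * len(conds)
    return [int(h) for h in hit]


# Hypothetical event stream for one user
conds = [lambda e: e == "visit", lambda e: e == "buy", lambda e: e == "refund"]
print(retention(["visit", "buy"], conds))  # [1, 1, 0]
print(retention(["buy"], conds))           # [0, 0, 0]
```

The second call shows the gating effect: "buy" happened, but because cond1 ("visit") never held, the whole array is 0.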

Preparation

Create the table

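CREATE TABLE login_log -- user login log
(
    uid Int32,           -- unique user id
    login_time DateTime  -- user login time
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(login_time) -- partition by month; PARTITION BY uid would create one partition per user
ORDER BY (uid, login_time);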

Load the data

-- any sample data will do
insert into login_log 
values
(1101,toDateTime('2019-06-21 22:00:00')),
(1101,toDateTime('2019-06-20 22:00:00')),
(1101,toDateTime('2019-06-19 22:00:00')),
(1101,toDateTime('2019-06-17 22:00:00')),
(1101,toDateTime('2019-06-16 22:00:00')),
(1101,toDateTime('2019-06-21 22:00:00')),
(1101,toDateTime('2019-06-21 23:00:00')),
(1101,toDateTime('2019-06-20 23:00:00')),
(1101,toDateTime('2019-06-19 23:00:00')),
(1101,toDateTime('2019-06-17 23:00:00')),
(1101,toDateTime('2019-06-16 23:00:00')),
(1101,toDateTime('2019-06-21 23:00:00')),
(4101,toDateTime('2019-06-20 22:00:00')),
(4101,toDateTime('2019-06-19 22:00:00')),
(4101,toDateTime('2019-06-17 22:00:00')),
(4101,toDateTime('2019-06-16 22:00:00')),
(2201,toDateTime('2019-06-21 22:00:00')),
(2201,toDateTime('2019-06-20 22:00:00')),
(2201,toDateTime('2019-06-21 23:00:00')),
(2201,toDateTime('2019-06-20 23:00:00')),
(3301,toDateTime('2019-06-21 22:00:00')),
(3301,toDateTime('2019-06-19 22:00:00')),
(3301,toDateTime('2019-06-18 23:00:00')),
(3301,toDateTime('2019-06-17 23:00:00')),
(3301,toDateTime('2019-06-16 23:00:00')),
(3301,toDateTime('2019-06-15 23:00:00'));

Query

-- Check each user's logins from 2019-06-17 to 2019-06-21
select 
uid,
retention(
    toDate(login_time) = '2019-06-21',            -- logged in on 06-21
    toDate(addDays(login_time,1)) = '2019-06-21', -- logged in on 06-20
    toDate(addDays(login_time,2)) = '2019-06-21', -- logged in on 06-19
    toDate(addDays(login_time,3)) = '2019-06-21', -- logged in on 06-18
    toDate(addDays(login_time,4)) = '2019-06-21'  -- logged in on 06-17
    ) as r
from login_log ll 
group by ll.uid
order by ll.uid;

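Result: each element of r corresponds to one day, from 2019-06-21 back to 2019-06-17, and 1/0 => logged in / did not log in. Elements after the first can only be 1 if the first one is 1, as explained below.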
┌──uid─┬─r───────────┐
│ 1101 │ [1,1,1,0,1] │
│ 2201 │ [1,1,0,0,0] │
│ 3301 │ [1,0,1,1,1] │
│ 4101 │ [0,0,0,0,0] │
└──────┴─────────────┘
4 rows in set. Elapsed: 0.014 sec.
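The result table above can be reproduced with a small Python model, which also makes the meaning of each position in r explicit. This is a sketch under stated assumptions. It hard-codes the sample rows as dates only, and its retention function models the documented semantics rather than calling ClickHouse. The names LOGIN_DATES, CONDS and run_query are hypothetical.

```python
from datetime import date, timedelta

# Login dates per uid from the INSERT above. Times of day are dropped because
# the query only compares toDate(...); duplicate dates do not change the result.
LOGIN_DATES = {
    1101: ["2019-06-21", "2019-06-20", "2019-06-19", "2019-06-17", "2019-06-16"],
    2201: ["2019-06-21", "2019-06-20"],
    3301: ["2019-06-21", "2019-06-19", "2019-06-18", "2019-06-17",
           "2019-06-16", "2019-06-15"],
    4101: ["2019-06-20", "2019-06-19", "2019-06-17", "2019-06-16"],
}
TARGET = date(2019, 6, 21)


def retention(rows, conds):
    """Model of retention(): all zeros unless cond1 holds for some row."""
    hit = [any(cond(row) for row in rows) for cond in conds]
    return [int(h) for h in hit] if hit[0] else [0] * len(conds)


# Condition k mirrors toDate(addDays(login_time, k)) = '2019-06-21', k = 0..4.
# The default argument k=k binds the current k into each lambda.
CONDS = [lambda d, k=k: d + timedelta(days=k) == TARGET for k in range(5)]


def run_query():
    """Equivalent of GROUP BY uid ORDER BY uid over LOGIN_DATES."""
    return {
        uid: retention([date.fromisoformat(s) for s in LOGIN_DATES[uid]], CONDS)
        for uid in sorted(LOGIN_DATES)
    }


for uid, r in run_query().items():
    print(uid, r)
```

Position k of r is "logged in k days before 2019-06-21", which is why addDays(login_time, k) is compared against a fixed date rather than subtracting from it.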

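Note the user with uid=4101: this user clearly has login records, yet every element is 0. The reason is that in retention(cond1, cond2, …), if the first condition cond1 is not satisfied by any row in the group, all results are 0. More precisely, element k (k ≥ 2) is 1 only when both cond1 and condk are satisfied. User 4101 did not log in on 2019-06-21, so all of this user's values are 0. How can this be avoided? Each cond is just a condition expression, so it is enough to set the first expression cond1 directly to 1, which is always satisfied. The trade-off is that the first element of r is then always 1 and no longer tells whether the user logged in on 2019-06-21.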
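The effect of a constant first condition on uid=4101 can be checked with the same kind of Python model. Again, this is an illustrative sketch rather than ClickHouse itself. The always-true lambda stands in for the literal 1 in SQL, and the names ROWS_4101, day_conds and workaround are hypothetical.

```python
from datetime import date, timedelta

TARGET = date(2019, 6, 21)
# uid 4101 from the sample data: logins on 06-20, 06-19, 06-17 and 06-16, none on 06-21
ROWS_4101 = [date(2019, 6, 20), date(2019, 6, 19), date(2019, 6, 17), date(2019, 6, 16)]


def retention(rows, conds):
    """Model of retention(): all zeros unless cond1 holds for some row."""
    hit = [any(cond(row) for row in rows) for cond in conds]
    return [int(h) for h in hit] if hit[0] else [0] * len(conds)


# Same five day conditions as the SQL query (k = 0 is the 06-21 check)
day_conds = [lambda d, k=k: d + timedelta(days=k) == TARGET for k in range(5)]

original = retention(ROWS_4101, day_conds)
# Workaround: replace cond1 with an always-true condition, like passing 1 in SQL;
# the first element becomes 1 for every user that has any rows at all
workaround = retention(ROWS_4101, [lambda d: True] + day_conds[1:])

print(original)    # [0, 0, 0, 0, 0]
print(workaround)  # [1, 1, 1, 0, 1]
```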