别名的使用是按照其sql读取顺序命名的,而非sql语句文字的顺序。
比如这样一个sql语句,看某类应用的点击日活:
select pkg, count(pkg) from (
select distinct(imei), params["pkgName"] as pkg from tableA a
join tableB b
on a.params["pkgName"] = b.game_name
where day = "2020-01-15"
) t
group by pkg
应用的名字的在 params["pkgName"]中。在子查询中,params["pkgName"]的名字被改为pkg,然后在外层都变成了pkg(select 和 group by 那里)。而不应该在外层把params["pkgName"]声明为pkg。
(但在join里似乎不能用别名?a.pkg = b.game_name 报错 Invalid column reference 'pkg')
值得一提是,在最后的结果里两列的列名是pkg和_c1,前者说明这个化名被沿用下去了(后者呢?原sql里没有给它命名)
再比如
select * from
(
select c.imei2 as imei2, c.tbat_list as tbat_list, c.lng as lng, c.lat as lat, ROW_NUMBER() over( partition by imei2 order by abs(a_start_time - b_start_time) ) as differ_rank
from (
select a.device_id_new as imei2, a.params['tbat'] as tbat_list, b.params['lng'] as lng, b.params['lat'] as lat, a.start_time as a_start_time, b.start_time as b_start_time
from AAAA a
join BBBB b on a.device_id_new = b.imei
and a.dt = "2021-07-18"
and b.day = "2021-07-18"
and a.event_label = "CCCC"
where a.start_time > b.start_time - 10 * 60 * 1000
and a.start_time < b.start_time + 10 * 60 * 1000
and length(a.params['tbat']) > 4
)c
)t
where t.differ_rank = 1;
可以看到对于每个子查询都有一个别名(c、t),同时需要对表的列名进行显式化的定义别名(如a.device_id_new as imei2)。