某气象现场一条查询语句,大数据场景下,单个机构查询耗时5分钟以上,需要分析,SQL语句如下:
SELECT t.station_no , t.collect_time_stamp, t.r2020 , t.topFROM ( SELECT t1.station_no , t1.collect_time_stamp, t1.r2020 , ( SELECT COUNT(*) + 1 FROM ( SELECT station_no , --站点编号 collect_time_stamp, --采集时间 bigint r2020 -- 20时降水量 FROM qbfx_env_process_data WHERE r2020 IS NOT NULL AND station_no IN (57253) AND r2020 >0 AND r2020 <99990 AND collect_time_stamp>=19510101000000 AND collect_time_stamp<=20211220000000 AND days >=1211 AND days <=1220 ORDER BY r2020 DESC ) t2 WHERE t2.station_no = t1.station_no AND t2.r2020 > t1.r2020 ) top FROM ( SELECT station_no , collect_time_stamp, r2020 FROM qbfx_env_process_data WHERE r2020 IS NOT NULL AND station_no IN (57253) AND r2020 >0 AND r2020 <99990 AND collect_time_stamp>=19510101000000 AND collect_time_stamp<=20211220000000 AND days >=1211 AND days <=1220 ORDER BY r2020 DESC) t1 ) tWHERE top <=10ORDER BY t.station_no, top , collect_time_stamp ('1',100),('1',101),('1',102),('1',103),('1',104),('1',105),('1',106),('1',107),('1',108),('1',109),('1',110),('1',111),('2',1),('2',2),('2',3),('2',4),('2',5),('2',6);
发现SQL语句中t 和t2临时表,查询完全相同,是同表的不等值关联。
咨询现场业务客户想通过表自关联,通过不等值关联排序得到排名前10的数据。
建议客户通过分析函数解决,rank() over(partition by … order by …) 实现分组排序加编号的建议。
性能对比:
改写前,单个机构排序耗时5分钟以上
改写后,全表按机构分组排序整体耗时在3分钟左右。