页面一:
select vendor, application,applicationversion,filename ,fileversion, lastuse from `ASIA`.applications
where INSTR(vendor, "") and INSTR(application , "") and INSTR(filename, "") and lastuse< 1597741978 and lastuse> 1595149978
GROUP BY vendor ,application ,applicationversion ,filename
ORDER BY vendor desc LIMIT 10 OFFSET 0; # 没有join表
页面二:
select vendor ,application ,applicationversion ,fileversion,filename ,lastuse ,host.deviceid , hostname from `ASIA`.applications as app
left join `ASIA`.hostinfo as host on app.deviceid=host.deviceid
where INSTR(hostname, "") and INSTR(vendor, "") and INSTR(application , "") and INSTR(applicationversion, "") and INSTR(filename, "")
GROUP BY vendor ,application ,applicationversion ,filename
ORDER BY vendor desc LIMIT 10 OFFSET 0; # 有join表
问题:第一个页面比第二个页面快,页面二查询太慢了
分析:
页面1聚合:耗时0.19sselect vendor, application,applicationversion,filename ,fileversion, lastuse from `ASIA`.applications where INSTR(vendor, "") and INSTR(application , "") and INSTR(filename, "") and lastuse< 1597741978 and lastuse> 1595149978 GROUP BY vendor ,application ,applicationversion ,filename ORDER BY vendor desc LIMIT 10 OFFSET 0;#条数
select count(*) from `ASIA`.applications where INSTR(vendor, "") and INSTR(application , "") and INSTR(filename, "") and lastuse< 1597741978 and lastuse> 1595149978; +----------+ | count(*) | +----------+ | 28726 | +----------+ 1 row in set (0.05 sec)页面1不聚合:耗时0.07sselect vendor, application,applicationversion,filename ,fileversion, lastuse from `ASIA`.applications where INSTR(vendor, "") and INSTR(application , "") and INSTR(filename, "") and lastuse< 1597741978 and lastuse> 1595149978 ORDER BY vendor desc LIMIT 10 OFFSET 0;
页面2聚合:耗时8.69s ,聚合后耗时突然猛增, 因为left join后聚合的数据条数为1897744,远远高于应用库applications的条数49174select vendor ,application ,applicationversion ,fileversion,filename ,lastuse ,host.deviceid , hostname from `ASIA`.applications as app left join `ASIA`.hostinfo as host on app.deviceid=host.deviceid where INSTR(hostname, "") and INSTR(vendor, "") and INSTR(application , "") and INSTR(applicationversion, "") and INSTR(filename, "") GROUP BY vendor ,application ,applicationversion ,filename ORDER BY vendor desc LIMIT 10 OFFSET 0;#条数
select count(*) from `ASIA`.applications as app join `ASIA`.hostinfo as host on app.deviceid=host.deviceid where INSTR(hostname, "") and INSTR(vendor, "") and INSTR(application , "") and INSTR(applicationversion, "") and INSTR(filename, ""); +----------+ | count(*) | +----------+ | 1897744 | +----------+ 1 row in set (2.10 sec)页面2不聚合:耗时0.15s
select vendor ,application ,applicationversion ,fileversion,filename ,lastuse ,host.deviceid , hostname from `ASIA`.applications as app left join `ASIA`.hostinfo as host on app.deviceid=host.deviceid where INSTR(hostname, "") and INSTR(vendor, "") and INSTR(application , "") and INSTR(applicationversion, "") and INSTR(filename, "") ORDER BY vendor desc LIMIT 10 OFFSET 0;
#应用库中条数49174
mysql> select count(*) from `ASIA`.applications; +----------+ | count(*) | +----------+ | 49174 | +----------+ 1 row in set (0.03 sec)页面2优化:
SELECT vendor ,application ,applicationversion ,fileversion,filename ,lastuse ,host.deviceid , hostname FROM( select deviceid, vendor ,application ,applicationversion ,fileversion,filename ,lastuse from `ASIA`.applications where INSTR(vendor, "") and INSTR(application , "") and INSTR(applicationversion, "") and INSTR(filename, "") GROUP BY vendor ,application ,applicationversion ,filename) as appLEFT JOIN ( select deviceid,hostname from `ASIA`.hostinfo where INSTR(hostname, "") group by deviceid) as hostON app.deviceid=host.deviceidORDER BY vendor desc LIMIT 10 OFFSET 0;![]()
left join后数据条数比原数据表多的原因及解决方法
举例:A 表(1500行)是用户id和对应年龄,B 表(5000行)是用户id 和对应的爱好(如果有多个爱好就会出现多行数据),现在想通过用户id来链接这两张表,使用 left join(A,B)或者 inner join,结果连完发现生成了一张 3000行的表C( 但理论上行数应该小于等于表 A)问题原因:一对多在 R 中发生这种问题,很大原因是两张表拿来连接的字段(比如用户id),在第二张表里面出现了多次,或者在第一张表里也有重复值在 SQL 中发生这种问题,还有一种可能性是因为表 B 行数比表 A 多,所以会导致表 C 行数高于表 A解决办法:1、当两个表数据都很大时,我们用left join进行查询速度就会明显感觉到有点慢了,这是最好的解决办法就是 尽量给每个表都加上where条件去过滤一下数据2、可以明确的是多出来的这些行是重复值,不是什么错误值,所以不需要太慌, 无论是在R 还是 SQL中,发生这种问题都可以直接去重解决在R中的代码,group_by 配合 summarize 去重在SQL中的代码,group_by 去重SELECT * FROM ( SELECT * FROM A GROUP BY id ) AS A1LEFT JOIN ( SELECT * FROM B GROUP BY id ) AS B1ON A1.id = B1.id;
参考:
https://blog.youkuaiyun.com/Yann_YU/article/details/107230066?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param leftjoin/innerjoin以后数据条数比原数据表多的原因及解决办法
本文探讨了使用left join/inner join后数据条数增多的现象,分析了其背后的原因,并提出了解决这种问题的优化策略。通过案例展示了在不聚合情况下,查询效率和数据量的关系,以及如何提升查询速度。

525

被折叠的 条评论
为什么被折叠?



