left join/inner join以后数据条数比原数据表多的原因及解决办法

本文探讨了使用left join/inner join后数据条数增多的现象,分析了其背后的原因,并提出了解决这种问题的优化策略。通过案例展示了在不聚合情况下,查询效率和数据量的关系,以及如何提升查询速度。
 
页面一:
select vendor, application,applicationversion,filename ,fileversion, lastuse from `ASIA`.applications
where INSTR(vendor, "") and  INSTR(application , "") and  INSTR(filename, "") and lastuse< 1597741978 and lastuse> 1595149978
GROUP BY vendor ,application ,applicationversion ,filename
ORDER BY vendor desc LIMIT 10 OFFSET 0;                           # 没有join表

页面二:

select vendor ,application ,applicationversion ,fileversion,filename ,lastuse ,host.deviceid , hostname from `ASIA`.applications as app
left join  `ASIA`.hostinfo as host on app.deviceid=host.deviceid
where INSTR(hostname, "") and INSTR(vendor, "") and  INSTR(application , "") and  INSTR(applicationversion, "") and  INSTR(filename, "")
GROUP BY vendor ,application ,applicationversion ,filename
ORDER BY vendor desc LIMIT 10 OFFSET 0;              # 有join表

问题:第一个页面比第二个页面快,页面二查询太慢了

 
 
 
 
分析:
页面1聚合:耗时0.19s
select vendor, application,applicationversion,filename ,fileversion, lastuse from `ASIA`.applications
                            where INSTR(vendor, "") and  INSTR(application , "") and  INSTR(filename, "") and lastuse< 1597741978 and lastuse> 1595149978
                            GROUP BY vendor ,application ,applicationversion ,filename
                            ORDER BY vendor desc LIMIT 10 OFFSET 0;     

#条数    

select count(*) from `ASIA`.applications
where INSTR(vendor, "") and  INSTR(application , "") and  INSTR(filename, "") and lastuse< 1597741978 and lastuse> 1595149978;    
+----------+
| count(*) |
+----------+
|    28726 |
+----------+
1 row in set (0.05 sec)
页面1不聚合:耗时0.07s      
select vendor, application,applicationversion,filename ,fileversion, lastuse from `ASIA`.applications
                            where INSTR(vendor, "") and  INSTR(application , "") and  INSTR(filename, "") and lastuse< 1597741978 and lastuse> 1595149978
                            ORDER BY vendor desc LIMIT 10 OFFSET 0;    

                                              

                            
                            
页面2聚合:耗时8.69s  ,聚合后耗时突然猛增, 因为left join后聚合的数据条数为1897744,远远高于应用库applications的条数49174      
select vendor ,application ,applicationversion ,fileversion,filename ,lastuse ,host.deviceid , hostname from `ASIA`.applications as app
            left join  `ASIA`.hostinfo as host on app.deviceid=host.deviceid
            where INSTR(hostname, "") and INSTR(vendor, "") and  INSTR(application , "") and  INSTR(applicationversion, "") and  INSTR(filename, "")
            GROUP BY vendor ,application ,applicationversion ,filename
            ORDER BY vendor desc LIMIT 10 OFFSET 0;             

 #条数

select count(*) from `ASIA`.applications as app
            join  `ASIA`.hostinfo as host on app.deviceid=host.deviceid
            where INSTR(hostname, "") and INSTR(vendor, "") and  INSTR(application , "") and  INSTR(applicationversion, "") and  INSTR(filename, "");
+----------+
| count(*) |
+----------+
|  1897744 |
+----------+
1 row in set (2.10 sec)

页面2不聚合:耗时0.15s  

select vendor ,application ,applicationversion ,fileversion,filename ,lastuse ,host.deviceid , hostname from `ASIA`.applications as app
            left join  `ASIA`.hostinfo as host on app.deviceid=host.deviceid
            where INSTR(hostname, "") and INSTR(vendor, "") and  INSTR(application , "") and  INSTR(applicationversion, "") and  INSTR(filename, "")
            ORDER BY vendor desc LIMIT 10 OFFSET 0;     
            

 

 

#应用库中条数49174            

mysql> select count(*) from `ASIA`.applications;
+----------+
| count(*) |
+----------+
|    49174 |
+----------+
1 row in set (0.03 sec)

页面2优化:

SELECT vendor ,application ,applicationversion ,fileversion,filename ,lastuse ,host.deviceid , hostname FROM 
  ( select deviceid, vendor ,application ,applicationversion ,fileversion,filename ,lastuse from `ASIA`.applications where  INSTR(vendor, "") and INSTR(application , "") and  INSTR(applicationversion, "") and  INSTR(filename, "")  GROUP BY vendor ,application ,applicationversion ,filename
  ) as app 
LEFT JOIN  ( select deviceid,hostname from `ASIA`.hostinfo where INSTR(hostname, "") group by deviceid) as host 
ON app.deviceid=host.deviceid  
ORDER BY vendor desc LIMIT 10 OFFSET 0;
 
 
 
left join后数据条数比原数据表多的原因及解决方法
举例:
A 表(1500行)是用户id和对应年龄,B 表(5000行)是用户id 和对应的爱好(如果有多个爱好就会出现多行数据),现在想通过用户id来链接这两张表,使用 left join(A,B)或者 inner join,结果连完发现生成了一张 3000行的表C( 但理论上行数应该小于等于表 A
问题原因:一对多
在 R 中发生这种问题,很大原因是两张表拿来连接的字段(比如用户id),在第二张表里面出现了多次,或者在第一张表里也有重复值
在 SQL 中发生这种问题,还有一种可能性是因为表 B 行数比表 A 多,所以会导致表 C 行数高于表 A
解决办法:
1、当两个表数据都很大时,我们用left join进行查询速度就会明显感觉到有点慢了,这是最好的解决办法就是 尽量给每个表都加上where条件去过滤一下数据
2、可以明确的是多出来的这些行是重复值,不是什么错误值,所以不需要太慌, 无论是在R 还是 SQL中,发生这种问题都可以直接去重解决
   在R中的代码,group_by 配合 summarize 去重
   在SQL中的代码,group_by 去重
 
SELECT * FROM ( SELECT * FROM A GROUP BY id ) AS A1
LEFT JOIN ( SELECT * FROM B GROUP BY id ) AS B1
ON A1.id = B1.id;
 
 
 
 
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值