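We have a set of hotels, each uniquely identified by its name. A hotel's address may exist in several versions (for example, different websites list different addresses), and each record has a createTime indicating when we obtained it.
The problem: for each hotel, we need to fetch the most recently created record. For 北京aa饭店, for example, we need the record with id=2.
Test table definition: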
CREATE TABLE `tt` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`address` varchar(255) DEFAULT NULL,
`createTime` datetime DEFAULT NULL,
PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
Test rows:
id,name,address,createTime
1,"北京aa饭店","北京","2009-01-01 00:00:00"
2,"北京aa饭店","北京市海淀区","2009-01-03 00:00:00"
3,"如家快捷","北京","2010-01-01 00:00:00"
4,"汉庭","北京","2010-03-04 00:00:00"
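The intuitive approach would be to sort by name and createTime (or group?), then pick the row with the largest createTime for each name.
After thinking about it for a long time, the only thing I could come up with was NOT EXISTS: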
select a.* from tt a where not exists (select b.id from tt b where b.name=a.name and b.createTime>a.createTime)
Running EXPLAIN on it:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,"PRIMARY","a","ALL",NULL,NULL,NULL,NULL,4,"Using where"
2,"DEPENDENT SUBQUERY","b","ALL",NULL,NULL,NULL,NULL,4,"Using where"
Next, create an index on name
and run EXPLAIN again:
1,"PRIMARY","a","ALL",NULL,NULL,NULL,NULL,4,"Using where"
2,"DEPENDENT SUBQUERY","b","ALL","name",NULL,NULL,NULL,4,"Using where"
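Conceptually it should work like this: for each record, find all hotels with the same name (ideally through the name index); if none of them is newer, this record is the one we want, otherwise it is discarded. Note that in the EXPLAIN output above, name appears only under possible_keys while key is still NULL, so on this four-row table the optimizer does not actually use the index.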
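The row-by-row behaviour described above can be sketched in plain Python. This is only an illustration of the logic, not how MySQL executes the query internally; the helper name `latest_per_name` and the dictionary standing in for the name index are assumptions made for this example.

```python
from datetime import datetime

# The four test rows from the table above: (id, name, address, createTime).
ROWS = [
    (1, "北京aa饭店", "北京", datetime(2009, 1, 1)),
    (2, "北京aa饭店", "北京市海淀区", datetime(2009, 1, 3)),
    (3, "如家快捷", "北京", datetime(2010, 1, 1)),
    (4, "汉庭", "北京", datetime(2010, 3, 4)),
]


def latest_per_name(rows):
    """Hypothetical helper that mimics the NOT EXISTS query row by row."""
    # Stand-in for an index on name: maps each name to its rows.
    by_name = {}
    for row in rows:
        by_name.setdefault(row[1], []).append(row)

    kept = []
    for row in rows:  # outer query: every row of alias a
        # Dependent subquery: is there a same-name row b that is newer?
        has_newer = any(other[3] > row[3] for other in by_name[row[1]])
        if not has_newer:
            kept.append(row)  # nothing newer exists, so this row survives
    return kept


latest_ids = [row[0] for row in latest_per_name(ROWS)]
print(latest_ids)  # [2, 3, 4]
```

The inner check runs once per outer row, which is what the DEPENDENT SUBQUERY line in the plan reflects.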
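Searching online, I found a thread that asks exactly the same question: http://stackoverflow.com/questions/1313120/sql-retrieving-the-last-record-in-each-group
I don't know whether there is a better solution; anyone interested is welcome to discuss.
One of the answers in that thread uses LEFT JOIN, which is actually very similar to the NOT EXISTS version: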
SELECT a.* FROM tt a LEFT JOIN tt b ON (a.name = b.name AND a.createTime < b.createTime) WHERE b.name IS NULL;
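IS NULL here effectively means NOT EXISTS: a row of a survives only when no newer row b with the same name could be joined to it.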
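To check that the two formulations select the same rows, here is a small sketch that runs both queries on the test data. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption made so the example is self-contained), stores createTime as text, and selects only Id with an ORDER BY so the results are easy to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tt (Id INTEGER PRIMARY KEY, name TEXT, "
    "address TEXT, createTime TEXT)"
)
conn.executemany(
    "INSERT INTO tt VALUES (?, ?, ?, ?)",
    [
        (1, "北京aa饭店", "北京", "2009-01-01 00:00:00"),
        (2, "北京aa饭店", "北京市海淀区", "2009-01-03 00:00:00"),
        (3, "如家快捷", "北京", "2010-01-01 00:00:00"),
        (4, "汉庭", "北京", "2010-03-04 00:00:00"),
    ],
)

# NOT EXISTS form: keep a row when no newer row has the same name.
NOT_EXISTS_SQL = """
SELECT a.Id FROM tt a
WHERE NOT EXISTS (SELECT b.Id FROM tt b
                  WHERE b.name = a.name AND b.createTime > a.createTime)
ORDER BY a.Id
"""

# LEFT JOIN form: unmatched rows get NULLs for b, and only those are kept.
LEFT_JOIN_SQL = """
SELECT a.Id FROM tt a
LEFT JOIN tt b ON (a.name = b.name AND a.createTime < b.createTime)
WHERE b.name IS NULL
ORDER BY a.Id
"""

not_exists_ids = [r[0] for r in conn.execute(NOT_EXISTS_SQL)]
left_join_ids = [r[0] for r in conn.execute(LEFT_JOIN_SQL)]
print(not_exists_ids, left_join_ids)  # [2, 3, 4] [2, 3, 4]
```

Both queries return the same ids on this data; the execution plans MySQL chooses for them may still differ.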
EXPLAIN for the LEFT JOIN query:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,"SIMPLE","a","ALL",NULL,NULL,NULL,NULL,4,""
1,"SIMPLE","b","ALL","name",NULL,NULL,NULL,4,"Using where"
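As you can see, the plan looks a lot like the NOT EXISTS one, except that with LEFT JOIN both rows have select_type SIMPLE. I am not familiar with MySQL's query optimization; can anyone explain?
Another approach is close to the original idea: sort first, then group: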
select * from (select * from tt order by createTime desc) as X group by name
explain:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,"PRIMARY","<derived2>","ALL",NULL,NULL,NULL,NULL,4,"Using temporary; Using filesort"
2,"DERIVED","tt","ALL",NULL,NULL,NULL,NULL,4,"Using filesort"
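The plan uses temporary and filesort, so it will probably perform poorly once there are many records. The query also depends on MySQL's non-standard GROUP BY extension: MySQL does not guarantee which row in each group supplies the non-aggregated columns, so it is not certain to return the newest row, and it is rejected when ONLY_FULL_GROUP_BY is enabled.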
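The original intuition (for each name, take the largest createTime) can also be written without relying on row order: compute MAX(createTime) per name and join the result back to the table. This query is not in the post above; it is a commonly used alternative, sketched here with Python's built-in sqlite3 standing in for MySQL, and the alias maxTime is a name chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tt (Id INTEGER PRIMARY KEY, name TEXT, "
    "address TEXT, createTime TEXT)"
)
conn.executemany(
    "INSERT INTO tt VALUES (?, ?, ?, ?)",
    [
        (1, "北京aa饭店", "北京", "2009-01-01 00:00:00"),
        (2, "北京aa饭店", "北京市海淀区", "2009-01-03 00:00:00"),
        (3, "如家快捷", "北京", "2010-01-01 00:00:00"),
        (4, "汉庭", "北京", "2010-03-04 00:00:00"),
    ],
)

# Inner query: newest createTime per name.
# Outer join: fetch the full row that carries that createTime.
GROUP_MAX_SQL = """
SELECT t.Id, t.name, t.address, t.createTime
FROM tt t
JOIN (SELECT name, MAX(createTime) AS maxTime
      FROM tt GROUP BY name) m
  ON t.name = m.name AND t.createTime = m.maxTime
ORDER BY t.Id
"""

group_max_rows = conn.execute(GROUP_MAX_SQL).fetchall()
group_max_ids = [row[0] for row in group_max_rows]
print(group_max_ids)  # [2, 3, 4]
```

If two rows of the same name share the same latest createTime, this form returns both of them, just as the NOT EXISTS and LEFT JOIN versions do.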