在大量数据的情况下,可能会存在某些数据重复的情况,比如说某个字段重复,某几个字段重复,有时候我们需要针对重复数据进行一些操作,下面通过例子说明如何查询和删除重复数据。
假如有如下数据表,表名dep,其中deptno为主键
1、查询重复数据,重复记录是根据单个字段(dname)来判断
SELECT
*
FROM
dep
WHERE
dname IN (
SELECT
dname
FROM
dep
GROUP BY
dname
HAVING
count(dname) > 1
)
查询结果如下:
2、删除表中多余的重复记录,只保留deptno最小的记录,重复记录还是根据单个字段(dname)来判断
DELETE
FROM
dep
WHERE
dname IN (
SELECT
*
FROM
(
SELECT
dname
FROM
dep
GROUP BY
dname
HAVING
COUNT(dname) > 1
) AS t1
)
AND deptno NOT IN (
SELECT
*
FROM
(
SELECT
MIN(deptno)
FROM
dep
GROUP BY
dname
HAVING
COUNT(dname) > 1
) AS t2
)
执行后结果:
注:where 第一个字句为查询的是所有重复dname的数据,and连接的第二个字句排除deptno最小的,同时这里还需要注意的是
每一个嵌套查询里还添加了个select * from,这里这么做的原因是sql的语法机制,sql不允许在同一个表中既做查询又做修改,如果发生会报
You can't specify target table 'dep' for update in FROM clause
这样的错误,所以我们通过加上这个字句构造一个中间的临时表。临时表必须要加别名!!临时表必须要加别名!!临时表必须要加别名!!不然会报错的,所以我这里分别取别名t1,t2
3、查找表中多余的重复记录(多个字段),以dname和db_source来判断是否重复
有如下表:
SELECT
*
FROM
dep
WHERE
(dname, db_source) IN (
SELECT
dname,
db_source
FROM
dep
GROUP BY
dname,
db_source
HAVING
count(*) > 1
)
查询结果:
4、删除表中多余的重复记录(多个字段),以dname和db_source来判断是否重复 ,只保留有peopleId最小的记录
DELETE
FROM
dep
WHERE
(dname, db_source) IN (
SELECT
dname,
db_source
FROM
(
SELECT
dname,
db_source
FROM
dep
GROUP BY
dname,
db_source
HAVING
count(*) > 1
) AS t1
)
AND deptno NOT IN (
SELECT
*
FROM
(
SELECT
min(deptno)
FROM
dep
GROUP BY
dname,
db_source
HAVING
count(*) > 1
) AS t2
)
执行结果: