如何删除SQL Server中的重复行？

最新推荐文章于 2022-09-09 22:17:34 发布

翻译最新推荐文章于 2022-09-09 22:17:34 发布 · 455 阅读

1 ·

CC 4.0 BY-SA版权

原文链接：https://oldbug.net/q/1FAEA/How-to-delete-duplicate-rows-in-SQL-Server

文章标签：

#mysql #sql #sql-server-2008 #duplicates #sql-delete

本文探讨了在SQL Server中删除重复数据的有效方法，重点介绍了使用CTE和ROW_NUMBER函数来识别并删除重复行，同时提供了多种场景下的实用示例。

本文翻译自：How to delete duplicate rows in SQL Server?

How can I delete duplicate rows where no unique row id exists? 如何删除不存在unique row id 重复行 ？

My table is 我的桌子是

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

I want to be left with the following after the duplicate removal: 在重复删除后，我想保留以下内容：

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

I've tried a few queries but I think they depend on having a row id as I don't get the desired result. 我已经尝试了一些查询，但是我认为它们依赖于行ID，因为我没有得到期望的结果。 For example: 例如：

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

#1楼

参考：https://stackoom.com/question/1FAEA/如何删除SQL-Server中的重复行

#2楼

I like CTEs and ROW_NUMBER as the two combined allow us to see which rows are deleted (or updated), therefore just change the DELETE FROM CTE... to SELECT * FROM CTE : 我喜欢CTE和ROW_NUMBER因为两者结合使我们能够看到删除（或更新）了哪些行，因此只需将DELETE FROM CTE...更改为SELECT * FROM CTE ：

WITH CTE AS(
   SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
       RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
   FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1

DEMO (result is different; I assume that it's due to a typo on your part) 演示 （结果有所不同；我认为这是由于您的错字引起的）

COL1    COL2    COL3    COL4    COL5    COL6    COL7
john    1        1       1       1       1       1
sally   2        2       2       2       2       2

This example determines duplicates by a single column col1 because of the PARTITION BY col1 . 由于PARTITION BY col1本示例通过单个列col1确定重复项。 If you want to include multiple columns simply add them to the PARTITION BY : 如果要包括多个列，只需将它们添加到PARTITION BY ：

ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)

#3楼

Microsoft has a vey ry neat guide on how to remove duplicates. Microsoft有一个关于如何删除重复项的简洁指南。 Check out http://support.microsoft.com/kb/139444 查看http://support.microsoft.com/kb/139444

In brief, here is the easiest way to delete duplicates when you have just a few rows to delete: 简而言之，当您只有几行要删除时，这是删除重复项的最简单方法：

SET rowcount 1;
DELETE FROM t1 WHERE myprimarykey=1;

myprimarykey is the identifier for the row. myprimarykey是该行的标识符。

I set rowcount to 1 because I only had two rows that were duplicated. 我将rowcount设置为1，因为只有两行重复。 If I had had 3 rows duplicated then I would have set rowcount to 2 so that it deletes the first two that it sees and only leaves one in table t1. 如果我重复了3行，则应将rowcount设置为2，以便它删除它看到的前两行，而只在表t1中保留一行。

Hope it helps anyone 希望它能帮助任何人

#4楼

Another way of removing dublicated rows without loosing information in one step is like following: 另一种在不丢失信息的情况下删除重复行的方法如下：

delete from dublicated_table t1 (nolock)
join (
    select t2.dublicated_field
    , min(len(t2.field_kept)) as min_field_kept
    from dublicated_table t2 (nolock)
    group by t2.dublicated_field having COUNT(*)>1
) t3 
on t1.dublicated_field=t3.dublicated_field 
    and len(t1.field_kept)=t3.min_field_kept

#5楼

DELETE from search
where id not in (
   select min(id) from search
   group by url
   having count(*)=1

   union

   SELECT min(id) FROM search
   group by url
   having count(*) > 1
)

#6楼

If you have no references, like foreign keys, you can do this. 如果没有引用（例如外键），则可以执行此操作。 I do it a lot when testing proofs of concept and the test data gets duplicated. 测试概念证明和测试数据重复时，我会做很多事情。

SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]

INTO [newTable]

Go into the object explorer and delete the old table. 进入对象资源管理器并删除旧表。

Rename the new table with the old table's name. 用旧表的名称重命名新表。