如何删除SQL Server中的重复行?

本文探讨了在SQL Server中删除重复数据的有效方法,重点介绍了使用CTE和ROW_NUMBER函数来识别并删除重复行,同时提供了多种场景下的实用示例。

本文翻译自:How to delete duplicate rows in SQL Server?

How can I delete duplicate rows where no unique row id exists? 如何删除不存在unique row id 重复行

My table is 我的桌子是

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

I want to be left with the following after the duplicate removal: 在重复删除后,我想保留以下内容:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

I've tried a few queries but I think they depend on having a row id as I don't get the desired result. 我已经尝试了一些查询,但是我认为它们依赖于行ID,因为我没有得到期望的结果。 For example: 例如:

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

#1楼

参考:https://stackoom.com/question/1FAEA/如何删除SQL-Server中的重复行


#2楼

I like CTEs and ROW_NUMBER as the two combined allow us to see which rows are deleted (or updated), therefore just change the DELETE FROM CTE... to SELECT * FROM CTE : 我喜欢CTE和ROW_NUMBER因为两者结合使我们能够看到删除(或更新)了哪些行,因此只需将DELETE FROM CTE...更改为SELECT * FROM CTE

WITH CTE AS(
   SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
       RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
   FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1

DEMO (result is different; I assume that it's due to a typo on your part) 演示 (结果有所不同;我认为这是由于您的错字引起的)

COL1    COL2    COL3    COL4    COL5    COL6    COL7
john    1        1       1       1       1       1
sally   2        2       2       2       2       2

This example determines duplicates by a single column col1 because of the PARTITION BY col1 . 由于PARTITION BY col1本示例通过单个列col1确定重复项。 If you want to include multiple columns simply add them to the PARTITION BY : 如果要包括多个列,只需将它们添加到PARTITION BY

ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)

#3楼

Microsoft has a vey ry neat guide on how to remove duplicates. Microsoft有一个关于如何删除重复项的简洁指南。 Check out http://support.microsoft.com/kb/139444 查看http://support.microsoft.com/kb/139444

In brief, here is the easiest way to delete duplicates when you have just a few rows to delete: 简而言之,当您只有几行要删除时,这是删除重复项的最简单方法:

SET rowcount 1;
DELETE FROM t1 WHERE myprimarykey=1;

myprimarykey is the identifier for the row. myprimarykey是该行的标识符。

I set rowcount to 1 because I only had two rows that were duplicated. 我将rowcount设置为1,因为只有两行重复。 If I had had 3 rows duplicated then I would have set rowcount to 2 so that it deletes the first two that it sees and only leaves one in table t1. 如果我重复了3行,则应将rowcount设置为2,以便它删除它看到的前两行,而只在表t1中保留一行。

Hope it helps anyone 希望它能帮助任何人


#4楼

Another way of removing dublicated rows without loosing information in one step is like following: 另一种在不丢失信息的情况下删除重复行的方法如下:

delete from dublicated_table t1 (nolock)
join (
    select t2.dublicated_field
    , min(len(t2.field_kept)) as min_field_kept
    from dublicated_table t2 (nolock)
    group by t2.dublicated_field having COUNT(*)>1
) t3 
on t1.dublicated_field=t3.dublicated_field 
    and len(t1.field_kept)=t3.min_field_kept

#5楼

DELETE from search
where id not in (
   select min(id) from search
   group by url
   having count(*)=1

   union

   SELECT min(id) FROM search
   group by url
   having count(*) > 1
)

#6楼

If you have no references, like foreign keys, you can do this. 如果没有引用(例如外键),则可以执行此操作。 I do it a lot when testing proofs of concept and the test data gets duplicated. 测试概念证明和测试数据重复时,我会做很多事情。

SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]

INTO [newTable]

Go into the object explorer and delete the old table. 进入对象资源管理器并删除旧表。

Rename the new table with the old table's name. 用旧表的名称重命名新表。

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值