How to delete duplicate rows with SQL

本文介绍如何使用SQL在单一原子操作中删除数据库中的重复记录。通过三种不同方法:分组自连接、相关子查询及利用数据特性进行高效删除,文章详细解释了每种方法的适用场景及其优缺点。

from: http://www.xaprb.com/blog/2006/10/11/how-to-delete-duplicate-rows-with-sql/

The comments on my article how to find duplicate rows with SQL made me realize I was simplifying things too much. In particular, I really glossed over the “how to delete duplicate rows” section, which I should have explained in more detail. I hope this article will remedy the omissions.

Update: I’ve now written another article on deleting duplicates when you have no primary key.
What’s wrong with what I wrote?

I assumed a beginner audience in my other article. Specifically, I assumed a beginner programmer who designs a table, connects some program to it, and then discovers a bunch of duplicate data because of incorrect table design. The programmer then has to find and delete the duplicate rows before putting unique indexes and primary keys on the data. And because I assumed a relatively small, non-mission-critical task, I suggested you can make a temporary table to aid in deleting the duplicate rows.

This isn’t realistic for a couple of reasons. First, as you showed me, many of you are beyond the simplistic ways to delete duplicates. Second, unless you have the data all to yourself, you need to find and delete the duplicates in an atomic operation, leaving no chance for more duplicates to sneak in while you’re working.
Two ways to do it

There are essentially two ways to do this: in one statement, or in multiple statements. If you do it in one statement, the statement is atomic, and you don’t need a transaction. If you use temporary tables or another method and use multiple statements, you have to do it all in a transaction.

It’s probably easier and better to do it in one statement, contrary to what I suggested previously.
How to delete duplicates with a single statement

The basic technique is to do a grouped self-join or subquery. That will make more sense in a moment.

First, familiarize yourself with the basic techniques I explained for finding the duplicate rows, especially the section on finding the data you need to delete them. This will become the innermost query in your self-join. Here’s the query I’ll use (refer to the other article for details):
select day, MIN(id)
from test
group by day
having count(*) > 1

You cannot delete from that result set, but you can delete by joining against it or using it in a subquery. First, I’ll show you how to self-join against the grouped query.
Technique 1: a grouped self-join

Place the find-bad-rows query into a subquery in the FROM clause, and join against it in such a way that the join will succeed only on rows you don’t want:
select bad_rows.*
from test as bad_rows
inner join (
select day, MIN(id) as min_id
from test
group by day
having count(*) > 1
) as good_rows on good_rows.day = bad_rows.day
and good_rows.min_id <> bad_rows.id;

Notice I’m joining on days that match and excluding the row I want to keep, the one with the minimum value for id. If that query returns the rows you don’t want, you’re good to go. All you have to do is put the DELETE in front of it:
delete bad_rows.*
from test as bad_rows
inner join (
select day, MIN(id) as min_id
from test
group by day
having count(*) > 1
) as good_rows on good_rows.day = bad_rows.day
and good_rows.min_id <> bad_rows.id;

The syntax will vary slightly depending on your RDBMS. I’ve written this for MySQL (MySQL users might also need to be careful about cross-database deletes). This will also only work on versions of MySQL where subqueries are implemented.
Technique 2: correlated subquery

The second method is to use a correlated subquery and place the find-bad-rows query inside the subquery. You can write these subqueries many different ways. Here’s one rather sub-optimal way:
delete test_outer.*
from test as test_outer
where exists(
select *
from test as test_inner
where test_inner.day = test_outer.day
group by day
having count(*) > 1
and min(test_inner.id) <> test_outer.id
);

This won’t even work on MySQL because it is trying to select from the same table it’s modifying. There are some silly tricks to get around this, which force intermediate materialization of the subquery, but in general you’re better off using the JOIN technique in MySQL.

[Edit: I originally listed an alternate query using a silly trick, which didn't work (my mistake). I've removed that because that kind of query is a Stupid Thing To Do. If you are competent enough to write that query, I don't need to tell you how. See the comments to follow the conversation about this.]
Technique 3: be clever about your data

Both the previous techniques rely on certain behaviors and database features. How about relying on the data itself to enable a smart, efficient deleting algorithm?

If most groups have duplicates, but there are not many duplicate rows within each group, this can be very efficient and doesn’t require any subqueries:
delete bad_rows.*
from test as good_rows
inner join test as bad_rows on bad_rows.day = good_rows.day
and bad_rows.id > good_rows.id;

This works because I decided I wanted to keep the row with the smallest id in each group. That means I can do a self-join that matches rows with a strict greater-than. Greater than what? The minimum value of id for that value of day, of course.

This is essentially a cross join, which is an O(n2) algorithm in the pathological worst case, but if there are not many duplicate rows, it’s basically the same cost as a regular join.

There are cases where this method is really terrible, too. For instance, you have a hundred million rows and only one duplicate row. You’d be joining a hundred million rows against a hundred million rows, eliminating all but one of them, and deleting it. That would be a very bad idea because it would take forever. It could be much more efficient to find the lone duplicate and delete just it, without doing all the joining.
Conclusion

I hope this article filled in some of the gaps in my other article. I’ve shown you at least three techniques for deleting duplicate rows with a single atomic query. Each is good for some scenarios and bad for others.

If you found this useful, you should subscribe by e-mail or feeds and get my upcoming articles conveniently.

在不同的前端技术框架中,监听键盘的 Delete 事件有不同的实现方法: ### jQuery 实现 可以基于 jQuery 实现键盘事件监听控件,监听 Delete 事件。示例代码如下: ```html <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>jQuery监听Delete事件</title> <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script> </head> <body> <input type="text" id="inputField"> <script> $(document).ready(function() { $('#inputField').on('keydown', function(event) { if (event.key === 'Delete') { console.log('Delete 键被按下'); } }); }); </script> </body> </html> ``` 这里使用 jQuery 的 `on` 方法监听 `keydown` 事件,当按下的键为 `Delete` 时,在控制台输出相应信息[^1]。 ### Vue 实现 在 Vue 中监听 Delete 事件,示例代码如下: ```html <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>Vue监听Delete事件</title> <script src="https://cdn.jsdelivr.net/npm/vue@2.6.14/dist/vue.js"></script> </head> <body> <div id="app"> <input type="text" @keydown.delete="handleDelete"> </div> <script> new Vue({ el: '#app', methods: { handleDelete() { console.log('Delete 键被按下'); } } }); </script> </body> </html> ``` 此代码中,在输入框上使用 `@keydown.delete` 指令绑定 `handleDelete` 方法,当按下 Delete 键时触发该方法并在控制台输出信息。同时,也可以利用 Vue 监听键盘事件的特性,结合 Unicode 编码等方式实现监听,在 Web 应用中可通过 `onkeydown` 事件的 `event.keyCode/event.which` 获取键盘操作,不过在 Vue 中更推荐使用指令方式 [^2][^5]。 ### Angular 实现 在 Angular 中全局监听 Delete 事件,示例代码如下: ```typescript import { Component, OnInit } from '@angular/core'; import { EventManager } from '@angular/platform-browser'; @Component({ selector: 'app-root', templateUrl: './app.component.html', styleUrls: ['./app.component.css'] }) export class AppComponent implements OnInit { constructor(private eventManager: EventManager) {} ngOnInit(): void { this.eventManager.addGlobalEventListener('window', 'keyup.Delete', () => { console.log('Delete 键被释放'); }); } } ``` 在组件的 `ngOnInit` 方法中,使用 `EventManager` 注册全局监听,当释放 Delete 键时,在控制台输出信息 [^4]。 ### HTML5 原生 JavaScript 实现 使用 HTML5 原生 JavaScript 监听 Delete 事件,示例代码如下: ```html <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>原生JS监听Delete事件</title> </head> <body> <input type="text" id="inputField"> <script> const inputField = document.getElementById('inputField'); inputField.addEventListener('keydown', function(event) { if (event.key === 'Delete') { console.log('Delete 键被按下'); } }); </script> </body> </html> ``` 通过 `addEventListener` 方法为输入框添加 `keydown` 事件监听器,当按下 Delete 键时,在控制台输出相应信息。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值