题目:
Write a SQL query to delete all duplicate email entries in a table named Person
, keeping only unique emails based on its smallest Id.
+----+------------------+ | Id | Email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | | 3 | john@example.com | +----+------------------+ Id is the primary key column for this table.
For example, after running your query, the above Person
table should have the following rows:
+----+------------------+ | Id | Email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | +----+------------------+
第一次知道DELETE还可以这样用,想法是对的但是很傻的用了SELECT=.=
DELETE p1
FROM Person p1, Person p2
WHERE p1.Email = p2.Email AND
p1.Id > p2.Id
但是不明白的是 怎么就能知道ID在p1 p2中是有序的呢
EXPLANATION:
- Take the table in the example
Id | Email
1 | john@example.com
2 | bob@example.com
3 | john@example.com
- Join the table on itself by the Email and you'll get:
FROM Person p1, Person p2 WHERE p1.Email = p2.Email
p1.Id | p1.Email | p2.Id | p2.Email
1 | john@example.com | 1 | john@example.com
3 | john@example.com | 1 | john@example.com
2 | bob@example.com | 2 | bob@example.com
1 | john@example.com | 3 | john@example.com
3 | john@example.com | 3 | john@example.com
- From this results filter the records that have p1.Id>p2.ID, in this case you'll get just one record:
p1.Id | p1.Email | p2.Id | p2.Email
3 | john@example.com | 1 | john@example.com
- This is the record we need to delete, and by saying
DELETE p1
in this multiple-table syntax, only matching rows from the tables listed before the FROM clause are deleted, in this case just
p1.Id | p1.Email
3 | john@example.com
will be deleted
还有一个人的答案,在Sqlite可以运行但是MySQL不可以
DELETE FROM Person
WHERE Id IN
(SELECT P1.Id FROM Person AS P1, Person AS P2
WHERE P1.Id > P2.Id AND P1.Email = P2.Email)
In mysql you must't update a table while using select clause , You can only do that step by step . However ,you can use a middle table as :
delete from Person where id not in(
select t.id from (
select min(id) as id from Person group by email
) t
)
MySQL Don't allow referring delete target table in sub query, a workaround is use ( select * from Person ) to get a new table.
delete from Person where Id in (
select p1.Id from (select * from Person) p1, (select * from Person) p2
where p1.Email = p2.Email and p1.Id > p2.Id )
Delete and Distinct are completely different, while delete alters the table, distinct only selects distinct values and doesn't alter table.