Removing Duplicate Records using SQL Server 2005

最新推荐文章于 2025-08-12 15:56:51 发布

eagleatustb

最新推荐文章于 2025-08-12 15:56:51 发布

阅读量771

点赞数

分类专栏：国外原文转载文章标签： sql server duplicates table microsoft database delete

国外原文转载专栏收录该内容

5 篇文章

订阅专栏

本文介绍如何使用SQL Server 2005中的公共表表达式(CTE)快速有效地从没有主键的表中删除重复记录。通过示例展示了如何定义CTE，并使用DENSE_RANK()函数对重复项进行排名，从而实现仅保留一条记录的目标。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

Source: http://blogs.techrepublic.com.com/datacenter/?p=420&tag=content;leftCol
Date: July 29th, 2008
Author: Tim Chapman
Category: SQL, databases, sql server
Tags: Microsoft SQL Server, Record, Microsoft SQL Server 2005, Table, SalesHistory, SaleDate, SalePrice, SELECT, Productivity, Databases

Removing duplicate records from tables is sometimes an arduous task, especially if the source table does not contain a primary key field. Here’s how you can take advantage of a new SQL Server 2005 feature to remove duplicates quickly and efficiently.
——————————————————————-

Duplicate Records

Duplicate records can occur numerous ways, such as loading source files too many times, keying the same data more than once, or from just bad database coding. Having a primary key on your table (and you always should have one) can will in the removal of the duplicate records, but even w/ a primary key it is never a fun task to have handed to you to complete.

In today’s example I will demonstrate how you can use a common-table expression (CTE) in 2005 to easily remove duplicate entries from a table, even when the table does not contain a primary key field. The script below creates the SalesHistory table, and loads 10 records into the table. Two of these records are duplicate entries, which I am defining as having the same values for the Product, SaleDate, and SalePrice fields. In the real world, these may not be duplicate records at all, but for our example we’ll assume that they are.

IF OBJECT_ID('SalesHistory') IS NOT NULL

DROP TABLE SalesHistory

CREATE TABLE [dbo].[SalesHistory]

        [Product] [varchar](10) NULL,

        [SaleDate] [datetime] NULL,

        [SalePrice] [money] NULL

GO

INSERT INTO SalesHistory(Product, SaleDate, SalePrice)

SELECT 'Computer','1919-03-18 00:00:00.000',1008.00

UNION ALL

SELECT 'BigScreen','1927-03-18 00:00:00.000',91.00

UNION ALL

SELECT 'PoolTable','1927-04-01 00:00:00.000',139.00

UNION ALL

SELECT 'Computer','1919-03-18 00:00:00.000',1008.00

UNION ALL

SELECT 'BigScreen','1927-03-25 00:00:00.000',92.00

UNION ALL

SELECT 'PoolTable','1927-03-25 00:00:00.000',108.00

UNION ALL

SELECT 'Computer','1919-04-01 00:00:00.000',150.00

UNION ALL

SELECT 'BigScreen','1927-04-01 00:00:00.000',	123.00

UNION ALL

SELECT 'PoolTable','1927-04-01 00:00:00.000',	139.00

UNION ALL

SELECT 'Computer','1919-04-08 00:00:00.000',	168.00

Now that I have some duplicate records loaded up, I can start writing my routine to remove them. The trick with removing duplicates is to find a way to delete them from your source table, rather than grouping all of the records in the source table together and creating a new table w/ the new entries. Deleting from the source usually requires a way to uniquely identify one of the unique records and then remove it. With some crafty TSQL, this is a relatively easy task to do when a primary key defined on the table. Luckily, the new CTE feature in SQL Server 2005 makes it very easy to remove these duplicates, with or without a primary key.

The script below defines my CTE. I am using a windowing function named DENSE_RANK to group the records together based on the Product, SaleDate, and SalePrice fields, and assign them a sequential value randomly. This means that if I have two records with the exact same Product, SaleDate, and SalePrice values, the first record will be ranked as 1, the second as 2, and so on.

;WITH SalesCTE(Product, SaleDate, SalePrice, Ranking)

AS

SELECT

	Product, SaleDate, SalePrice,

Ranking = DENSE_RANK() OVER(PARTITION BY Product, SaleDate, SalePrice ORDER BY NEWID() ASC)

FROM SalesHistory

DELETE FROM SalesCTE

WHERE Ranking > 1

Because a CTE acts as a virtual table, I am able to process data modification statements against it, and the underlying table will be affected. In this case, I am removing any record from the SalesCTE that is ranked higher than 1. This will remove all of my duplicate records.

To verify my duplicates have been removed, I can review the data in the table, which should now contain 8 records, rather than the previous 10.

SELECT *

FROM SalesHistory

Be Creative

Common-table expressions are a very useful new feature in SQL Server 2005. You can use them for recursive queries, removing duplicates, and even simple looping procedures. Explore this neat new feature to find some creative ways to solve some of your everyday database problems.

Get IT tips, news, and reviews delivered directly to your inbox by subscribing to TechRepublic's free newsletters.

Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com or contact him through his Web site, SQL Server Nation.