Removing Duplicate Records using SQL Server 2005

本文介绍如何使用SQL Server 2005中的公共表表达式(CTE)快速有效地从没有主键的表中删除重复记录。通过示例展示了如何定义CTE,并使用DENSE_RANK()函数对重复项进行排名,从而实现仅保留一条记录的目标。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Removing duplicate records from tables is sometimes an arduous task, especially if the source table does not contain a primary key field.  Here’s how you can take advantage of a new SQL Server 2005 feature to remove duplicates quickly and efficiently.
——————————————————————-

 

Duplicate Records

Duplicate records can occur numerous ways, such as loading source files too many times, keying the same data more than once, or from just bad database coding. Having a primary key on your table (and you always should have one) can will in the removal of the duplicate records, but even w/ a primary key it is never a fun task to have handed to you to complete.

In today’s example I will demonstrate how you can use a common-table expression (CTE) in 2005 to easily remove duplicate entries from a table, even when the table does not contain a primary key field. The script below creates the SalesHistory table, and loads 10 records into the table. Two of these records are duplicate entries, which I am defining as having the same values for the Product, SaleDate, and SalePrice fields. In the real world, these may not be duplicate records at all, but for our example we’ll assume that they are.

IF OBJECT_ID('SalesHistory') IS NOT NULL
DROP TABLE SalesHistory
CREATE TABLE [dbo].[SalesHistory]
  (
        [Product] [varchar](10) NULL,
        [SaleDate] [datetime] NULL,
        [SalePrice] [money] NULL
  )
  GO
INSERT INTO SalesHistory(Product, SaleDate, SalePrice)
SELECT 'Computer','1919-03-18 00:00:00.000',1008.00
UNION ALL
SELECT 'BigScreen','1927-03-18 00:00:00.000',91.00
UNION ALL
SELECT 'PoolTable','1927-04-01 00:00:00.000',139.00
UNION ALL
SELECT 'Computer','1919-03-18 00:00:00.000',1008.00
UNION ALL
SELECT 'BigScreen','1927-03-25 00:00:00.000',92.00
UNION ALL
SELECT 'PoolTable','1927-03-25 00:00:00.000',108.00
UNION ALL
SELECT 'Computer','1919-04-01 00:00:00.000',150.00
UNION ALL
SELECT 'BigScreen','1927-04-01 00:00:00.000',	123.00
UNION ALL
SELECT 'PoolTable','1927-04-01 00:00:00.000',	139.00
UNION ALL
SELECT 'Computer','1919-04-08 00:00:00.000',	168.00

Now that I have some duplicate records loaded up, I can start writing my routine to remove them. The trick with removing duplicates is to find a way to delete them from your source table, rather than grouping all of the records in the source table together and creating a new table w/ the new entries. Deleting from the source usually requires a way to uniquely identify one of the unique records and then remove it. With some crafty TSQL, this is a relatively easy task to do when a primary key defined on the table. Luckily, the new CTE feature in SQL Server 2005 makes it very easy to remove these duplicates, with or without a primary key.

The script below defines my CTE. I am using a windowing function named DENSE_RANK to group the records together based on the Product, SaleDate, and SalePrice fields, and assign them a sequential value randomly. This means that if I have two records with the exact same Product, SaleDate, and SalePrice values, the first record will be ranked as 1, the second as 2, and so on.

;WITH SalesCTE(Product, SaleDate, SalePrice, Ranking)
AS
(
SELECT
	Product, SaleDate, SalePrice,
Ranking = DENSE_RANK() OVER(PARTITION BY Product, SaleDate, SalePrice ORDER BY NEWID() ASC)
FROM SalesHistory
)
DELETE FROM SalesCTE
WHERE Ranking > 1

Because a CTE acts as a virtual table, I am able to process data modification statements against it, and the underlying table will be affected. In this case, I am removing any record from the SalesCTE that is ranked higher than 1. This will remove all of my duplicate records.

To verify my duplicates have been removed, I can review the data in the table, which should now contain 8 records, rather than the previous 10.

SELECT *
FROM SalesHistory

Be Creative

Common-table expressions are a very useful new feature in SQL Server 2005. You can use them for recursive queries, removing duplicates, and even simple looping procedures. Explore this neat new feature to find some creative ways to solve some of your everyday database problems.

Get IT tips, news, and reviews delivered directly to your inbox by subscribing to TechRepublic's free newsletters.

 

Tim ChapmanTim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com or contact him through his Web site, SQL Server Nation.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值