Thursday, May 5, 2011

Remove duplicates with a CTE

'CTE' means Common Table Expression.  Per BOL, this is derived from a simple query and defined within the execution scope of a single SELECT, INSERT, UPDATE, MERGE, or DELETE statement. The CTE is a new feature enabled in v2005 and forward, that can be used in stored procedures, as recursive queries, even for removing duplicates!  In this example, I will show you how to remove duplicates from a table WITHOUT a primary key:

USE MyDatabase;

IF OBJECT_ID('MyTable','u')>0
DROP TABLE dbo.MyTable
GO
CREATE TABLE MyTable (
 Col1 VARCHAR(10),
 Col2 VARCHAR(10)
);

/* Insert data with dupes */
 INSERT INTO MyTable(Col1,Col2)
 VALUES ('Jack','Jill');
 INSERT INTO MyTable(Col1,Col2)
 VALUES ('Jack','Jill');
 INSERT INTO MyTable(Col1,Col2)
 VALUES ('Jack','Jill');
 INSERT INTO MyTable(Col1, Col2)  
 VALUES ('WhenHarry','MetSally');
 INSERT INTO MyTable(Col1,Col2)
 VALUES ('WhenHarry','MetSally');
 INSERT INTO MyTable(Col1,Col2)
 VALUES ('WhenHarry','MetSally');
 INSERT INTO MyTable(Col1,Col2)
 VALUES ('WhenHarry','MetSally');
 INSERT INTO MyTable(Col1,Col2)
 VALUES ('Peter','Paul'); 

/* Quick look at your data */
 SELECT * FROM MyTable;

Col1              Col2
Jack                   Jill
Jack                   Jill
Jack                   Jill
WhenHarry     MetSally
WhenHarry     MetSally
WhenHarry    MetSally
WhenHarry    MetSally
Peter              Paul

/* Remove dupes */
 DECLARE @Col1 VARCHAR(10),
   @Col2 VARCHAR(10),
 @PreviousCol1 VARCHAR(10),
 @PreviousCol2 VARCHAR(10);

  WITH CTE AS(
  SELECT 
COALESCE(Col1,'') AS Col1, 
COALESCE(Col2,'') AS Col2, ROW_NUMBER() OVER(PARTITION BY col1,col2 ORDER BY col1,col2) AS row_id
  FROM MyTable
  )

DELETE CTE WHERE row_id >1;

/* See that your dupes are gone. */
SELECT * FROM MyTable 

    Col1              Col2
    Jack               Jill
    WhenHarry    MetSally
    Peter              Paul


No comments:

Post a Comment