Tuesday, February 3, 2015

SQL Server EXISTS vs IN

There are many different opinions on the use of EXISTS vs the IN list.  While they are logically different, many people say that they are identical in terms of performance.  I know they may seem very close to the same, but they are not.  In this post, I am going to demonstrate the performance difference of a very simple SELECT;  first done with EXISTS, then with the IN list.  I have included two quick tables and the population thereof.  Then we've got the retrieval using EXISTS and IN, along with the statistics generated by each of the statements.

-- TableA
IF OBJECT_ID('TableA','U') <> 0
DROP TABLE dbo.TableA
GO
CREATE TABLE dbo.TableA (
 ColumnA INT
) ON dba_data  -- rename as needed
GO

-- TableB
IF OBJECT_ID('TableB','U') <> 0
DROP TABLE dbo.TableB
GO
CREATE TABLE dbo.TableB (
 ColumnB INT
) ON dba_data -- rename as needed
GO

Ok.  Now let's load some data into both test tables.  This is just a very simple method that I use when I'm stress testing, and I need test data sets.  I am auto-incrementing the data while inserting, to generate unique values.

-- Load both tables
DECLARE @a INT
SET @a = 1
WHILE @a <= 200000
BEGIN
  INSERT dbo.TableA VALUES (@a)
  SET @a = @a + 1
END;

DECLARE @b INT
SET @b = 20
WHILE @b <= 120
BEGIN
  INSERT dbo.TableB VALUES (@b)
  SET @b = @b + 1
END;

-- check your rowcounts
SELECT COUNT(*) FROM dbo.TableA
SELECT COUNT(*) FROM dbo.TableB

And now for our retrieval.  We are looking for everything in TableA that does not exist in TableB. In this first SELECT we are using EXISTS.  Please also note that I am using STATISTICS IO ON/OFF to gather the statistics generated by the statement.

/* USING EXISTS */
SET NOCOUNT ON; -- just getting the rowcounts out of the test
SET STATISTICS IO ON;

SELECT * FROM dbo.TableA a 
WHERE NOT EXISTS(
  SELECT 1 FROM dbo.TableB b 
  WHERE a.ColumnA = b.ColumnB)

SET NOCOUNT OFF;
SET STATISTICS IO OFF;

Same same on the SELECT here, but now we are using the IN list.

/* USING THE IN LIST */
SET NOCOUNT ON;
SET STATISTICS IO ON;

SELECT * FROM dbo.TableA a 
WHERE a.ColumnA NOT IN(
  SELECT ColumnB b FROM dbo.TableB)

SET NOCOUNT OFF;
SET STATISTICS IO OFF;

These are our collected statistics -- first for the EXISTS, then for the IN list:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TableA'. Scan count 1, logical reads 338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TableB'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'TableB'.Scan count 9, logical reads 200005,physical reads 0,read-ahead reads 0,lob logical reads 0,lob physical reads 0, lob read-ahead reads 0.
Table 'TableA'. Scan count 5, logical reads 338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Look at those scan counts, and the logic reads!  I realize this is just a small example, but the difference is very clear, both logically and physically.



No comments:

Post a Comment