Sometimes SQL can be cursed. It runs fine in dev and makes it through testing without issue. Then it gets to production, the table grows, the load spikes, and things turn on you at 2 a.m.
The cursed SQL returns answers. The problem is that the answer is wrong, or slow, or both -- and SQL Server never says a word. That's the curse. Here are six good examples.
CREATE TABLE dbo.Orders (
OrderID INT IDENTITY(1,1) PRIMARY KEY,
CustomerID INT NOT NULL,
AccountNo VARCHAR(20) NOT NULL,
OrderDate DATETIME2 NOT NULL,
Amount DECIMAL(10,2) NOT NULL
);
-- A covering index for some of our victims below
CREATE INDEX idx_Orders_CustomerID ON dbo.Orders (CustomerID) INCLUDE (Amount);
-- And one on the columns we will abuse
CREATE INDEX idx_Orders_AccountNo ON dbo.Orders (AccountNo);
CREATE INDEX idx_Orders_OrderDate ON dbo.Orders (OrderDate);
INSERT dbo.Orders (CustomerID, AccountNo, OrderDate, Amount)
SELECT TOP (500000) -- 500K rows, enough to make the bad plans hurt (TOP caps the cross join)
ABS(CHECKSUM(NEWID())) % 1000,
CAST(ABS(CHECKSUM(NEWID())) % 100000 AS VARCHAR(20)),
DATEADD(DAY, -(ABS(CHECKSUM(NEWID())) % 1000), SYSDATETIME()),
(ABS(CHECKSUM(NEWID())) % 50000) / 100.0
FROM sys.all_objects a CROSS JOIN sys.all_objects b;
Curse 1: The Cursor That Should Have Been a Set-Based Manipulation
The classic. You need to update every row, so you reach for the tool that processes one row at a time. It feels procedural. It feels safe. It is neither.
DECLARE @id INT, @amt DECIMAL(10,2);
DECLARE c CURSOR FOR
SELECT OrderID, Amount FROM dbo.Orders;
OPEN c;
FETCH NEXT FROM c INTO @id, @amt;
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE dbo.Orders
SET Amount = @amt * 1.05
WHERE OrderID = @id;
FETCH NEXT FROM c INTO @id, @amt;
END
CLOSE c;
DEALLOCATE c;
The cursor spells out a procedure: fetch a row, update that row, fetch the next, repeat half a million times. Each pass is its own latch, lock, and log record. But SQL Server is a set-based engine, and a safer approach is a working table you walk in batches with a WHILE loop, updating through a join instead of fetching one row at a time.
-- Stage the rows to process in a working table with an identity column
CREATE TABLE #Work (
RowID INT IDENTITY(1,1) PRIMARY KEY,
OrderID INT NOT NULL
);
INSERT #Work (OrderID)
SELECT OrderID FROM dbo.Orders;
DECLARE @BatchStart INT = 1,
@BatchSize INT = 10000,
@MaxRow INT;
SELECT @MaxRow = MAX(RowID) FROM #Work;
WHILE @BatchStart <= @MaxRow
BEGIN
UPDATE o
SET o.Amount = o.Amount * 1.05
FROM dbo.Orders o INNER JOIN #Work w
ON o.OrderID = w.OrderID
WHERE w.RowID >= @BatchStart
AND w.RowID < @BatchStart + @BatchSize;
SET @BatchStart = @BatchStart + @BatchSize;
END
DROP TABLE #Work;
That is the set-based manipulation: stage the rows in a working table keyed by an identity column, then walk it with a WHILE loop, updating through the join one batch at a time. No CURSOR, no FETCH, no DEALLOCATE, and none of the per-row engine machinery a cursor hauls around behind it. It still walks the set, but on your terms, through a structure you own, and each pass updates ten thousand rows in a single set operation instead of one row in its own transaction. Batching is the whole point: you cap transaction size, lock footprint, and log growth instead of letting one statement hold the entire table hostage. And the 10K is a starting point. Test it out. You can probably go as high as 100K.
Curse 2: SELECT * Where Nobody Asked
SELECT * is convenient, but it's also a promise you can't keep. The moment someone adds, drops, or reorders a column, everything downstream gets a surprise. And what about the overhead? SELECT * kills index efficiency by forcing expensive Key Lookups or Clustered Index Scans instead of using fast, narrow indexes. It also wastes network bandwidth by pushing large amounts of unneeded data across the network - and more.
We have an index that covers CustomerID and Amount. Ask for precisely those two columns and the index answers the query by itself; SQL Server never touches the table:
SELECT CustomerID, Amount FROM dbo.Orders WHERE CustomerID = 42; -- Index Seek on idx_Orders_CustomerID.
Now ask for everything. The index doesn't carry the other columns, so SQL Server has to go fetch them -- a key lookup back to the clustered index for every matching row, or it abandons the index entirely and scans. Same WHERE clause, far more work:
SELECT * FROM dbo.Orders WHERE CustomerID = 42; -- Key Lookup or Clustered Index Scan. More reads, more pages, more time.
Name your columns. You get the index you paid for, and your query still means the same thing six months from now.
Curse 3: A Function on the Wrong Side of the WHERE
You want this year's orders, so you wrap the column in a function. Reads beautifully but performs like a swamp. When you apply a function to an indexed column in the WHERE clause, the predicate is no longer SARGable, the index goes unused, and SQL Server scans the whole thing.
SELECT OrderID, Amount FROM dbo.Orders WHERE YEAR(OrderDate) = 2026; -- Index Scan. SQL Server has to compute YEAR() for every single row.
Leave the column alone and express the same idea as a range. Now the index on OrderDate does its job.
SELECT OrderID, Amount FROM dbo.Orders WHERE OrderDate >= '2026-01-01' AND OrderDate < '2027-01-01'; -- Index Seek. Touches only the rows in range.
Note the half-open range, '>=' to '<', rather than BETWEEN. That will sidestep the boundary fights you get with the time portion of a datetime2.
Curse 4: The Implicit Conversion You Never See
This one is the quietest of them all. AccountNo is a VARCHAR. You compare it to a number because, well, it looks like a number. SQL Server obliges, because it never says no when it can convert.
SELECT OrderID, AccountNo FROM dbo.Orders WHERE AccountNo = 12345; -- integer literal vs VARCHAR column
Here's the trap. When two types meet, the one with lower precedence is converted up to the higher one, and INT outranks VARCHAR. So SQL Server doesn't convert your literal down to a string, it converts the entire column up to INT, row by row. CONVERT_IMPLICIT on every value means your index on AccountNo is useless and it's going to scan. The plan for this query even waves a little yellow warning triangle at you that nobody ever clicks.
Match the literal to the column's type and the conversion vanishes:
SELECT OrderID, AccountNo FROM dbo.Orders WHERE AccountNo = '12345'; -- string vs string, index seek
The fix is one pair of quotes. Nothing was broken -- SQL Server did exactly what the precedence rules say it should. It just did it silently, on every row, and handed back the right answer very slowly. That's the whole story of implicit conversion.
Curse 5: NOLOCK as a Performance Setting
Somewhere along the line, NOLOCK got a reputation as the 'go faster' hint. Slap it on, skip the locks, and get your data sooner. What it actually does is set that statement to READ UNCOMMITTED, and that means something very different than just 'faster.'
SELECT CustomerID, Amount FROM dbo.Orders WITH (NOLOCK);
READ UNCOMMITTED reads rows other transactions have modified but not committed. Those are dirty reads. Values that may never have existed if the other transaction rolls back. But it gets worse than reading the wrong number. Straight from Microsoft: with NOLOCK, rows can appear or disappear in your result set before your statement finishes. During a page split it can skip rows entirely, or hand you the same row twice. You don't get an error. You get a report with bad data. Again, dirty reads. You don't want these.
NOLOCK is not a performance tuning option. If blocking is your real problem, the grown-up answer is to identify and resolve what is causing the blocking. Row versioning is an option, where readers do not request shared locks, and writers do not block readers:
USE master; ALTER DATABASE YourDBName SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
Reach for NOLOCK only when you've decided intentionally that speed beats accuracy. That's a real tradeoff sometimes, and it's almost never the one people think they're making.
Curse 6: The Scalar UDF Hiding in a SELECT
You wrap a little logic in a scalar function because it's tidy and reusable. Then you call it in a SELECT over a million rows, and that tidy little function runs a million times, serially, with the Optimizer mostly blind to what's inside it.
CREATE FUNCTION dbo.udfAmountWithTax (@amt DECIMAL(10,2))
RETURNS DECIMAL(10,2)
AS
BEGIN
RETURN @amt * 1.08;
END
GO
SELECT OrderID, dbo.udfAmountWithTax(Amount) AS WithTax
FROM dbo.Orders;
Historically this was a guaranteed performance crime with per-row execution and a query that refuses to go parallel. But there is a plot twist. Starting in SQL Server 2019, scalar UDF inlining can fold qualifying functions right into the calling query, so it gets cost-based, set-oriented, and parallel-friendly, with no code change. It's on by default at database compatibility level 150 or higher.
This doesn't remove the curse. It just hides it better. Inlining doesn't catch everything. A function that references certain intrinsics, like GETDATE() or @@ROWCOUNT, can be disqualified and quietly fall back to the old per-row misery. And if you're on an older version, or running an older compatibility level in a newer version, you get none of the magic. The honest fix is to not hide set logic inside a scalar shell. Inline the expression, or use an inline table-valued function the Optimizer can actually see through.
SELECT OrderID, Amount * 1.08 AS WithTax FROM dbo.Orders;
In Summary
None of the above throw errors. They pass review and run clean until scale, concurrency, or a schema change surfaces them. That's what makes it helpful to know them on sight. The fix is almost always trivial once you spot the shape, and invisible until you do. Now you have six shapes to spot.
More to Read
Data type precedence (Transact-SQL)
SET TRANSACTION ISOLATION LEVEL (Transact-SQL)
Scalar UDF Inlining
Intelligent Query Processing details
No comments:
Post a Comment