AI Advice - Bad Prompt

This is the full AI Advice column output from the 'bad prompt' test in I Lied to sp_BlitzCache. ChatGPT Believed Me.

Below I analyze the provided plan for the SELECT, show how it interacts with SQL Server locking/lock-escalation behaviour, describe the likely deadlock and blocking chains you’re seeing in production, and give prioritized, actionable recommendations (indexes, isolation-level and lock-hint options, plus transaction / access-order guidance) that will remove the deadlocks without sacrificing correctness.

Summary / key findings

The plan is a parallel hash join/aggregate that reads Regions (small, clustered scan with Country = @Country) and does a large clustered index scan on Orders (1,000,000 rows table, actual rows read large per thread). The join is Regions → Orders (Regions is the build side; Orders is probe).

Missing-index hints show the Orders table would benefit from a nonclustered, covering index (either on OrderDate INCLUDE (RegionID, OrderTotal) or on (RegionID, OrderDate) INCLUDE (OrderTotal)); creating one will convert a big clustered index scan into seeks or narrower scans.

Because the query scans many pages in Orders and runs in parallel (DegreeOfParallelism 8), it will acquire a large number of shared locks (S) on Orders pages/rows during execution. Under default locking behavior these S locks may be retained for the duration of the transaction if the SELECT runs inside a larger explicit transaction that holds other locks — that is the current situation.

The stored procedure runs inside a larger explicit transaction that holds exclusive locks (X) on Orders and Regions for the duration of the batch. That is the immediate root cause of the deadlocks: you have concurrent transactions that both hold X locks and try to acquire S locks (or vice-versa) on the same objects but in different order, which produces cycles.

Parallelism increases the surface area for deadlocks (multiple threads locking different page ranges concurrently can create cycles), and large scans increase chances of lock escalation to page/table locks which will make deadlocks more likely and larger in scope.

Lock escalation paths & deadlock risk

Many row/page locks from the large Orders scan can hit the lock escalation threshold (default threshold = lots of locks). When SQL Server escalates to a table lock, it converts many S locks into a single S or intent lock at table level — that increases incompatibility with other transactions holding X locks and increases deadlock risk.

Parallel threads each acquire locks on different pages. If other concurrent code takes exclusive locks on Orders and Regions in a different order you can get a classic cyclical deadlock (TxA holds X on Orders and wants S on Regions; TxB holds X on Regions and wants S on Orders). The plan reads Regions first then Orders; if some update path takes Orders first then Regions, you have an order mismatch.

The SELECT itself requests many shared locks (unless using versioning or NOLOCK), and because the outer transaction keeps locks for the batch, shared locks can be held longer than the statement lifetime — turning transient read locks into long-lived locks that can deadlock with updates.

Blocking chains to watch for

Long-running exclusive-modifying transaction (TxX) that touches Orders and Regions → holds X locks on some pages/rows/tables.

Reporting SELECT (TxR) tries to read Regions then Orders (acquires S on regions then S on orders). If TxX holds X on Orders, TxR will block; if another concurrent updating transaction TxY holds X on Regions and wants X or S on Orders you can get circular wait.

Lock escalation on Orders to a table-level X or S lock will create broad blocking and increase deadlock probability affecting many sessions.

Recommended remediation — prioritized and with trade-offs

Priority 1 — Fix the design issue: shorten transactions and enforce consistent lock ordering

Ensure the larger explicit transaction does not hold exclusive locks for longer than absolutely necessary. Commit earlier or break work into smaller transactions.

Crucial: make all code paths acquire locks in the same logical order (for this workload the SELECT reads Regions then Orders). If any updater path currently locks Orders first then Regions, change it so it locks Regions first then Orders, or otherwise ensure a consistent ordering across the entire application. This alone often removes cyclic deadlocks.

Why: deadlocks from different lock acquisition orders are the most common root cause. Shortening X-lock hold time reduces chances of S/X conflicts.

Priority 2 — Add a covering nonclustered index to Orders (high expected impact)

The plan and the missing-index hints show the greatest IO and S lock pressure is on Orders clustered index scan. Create a nonclustered index to cover the predicate and the returned columns to turn the scan into an index seek or a much narrower scan.

Recommended index (from MissingIndex group with highest impact):

  • Option A (best if RegionID selective per country): CREATE NONCLUSTERED INDEX IX_Orders_RegionID_OrderDate ON dbo.Orders (RegionID, OrderDate) INCLUDE (OrderTotal);
  • Option B (if Date range is the more selective predicate): CREATE NONCLUSTERED INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate) INCLUDE (RegionID, OrderTotal);

Benefits: fewer pages read, fewer locks acquired; less chance of escalation; faster queries and lower memory grant. Test both options on representative data distribution.

Why not immediate lock-level option: structural changes reduce locking better and with no correctness tradeoffs.

Priority 3 — Use row versioning (RCSI or SNAPSHOT) to eliminate reader-writer locking

Best production-safe option to avoid shared locks for reads: enable READ_COMMITTED_SNAPSHOT (RCSI) at the database level. It makes READ COMMITTED use row-versioned reads (no S locks during reads), eliminating S/X conflicts with writers and reducing deadlocks dramatically.

Steps:

  1. Ensure you can tolerate versioned reads (you will avoid blocking but reads are slightly stale until committed).
  2. Test and monitor tempdb growth; enabling RCSI increases row-versioning activity in tempdb.
  3. ALTER DATABASE DBA SET READ_COMMITTED_SNAPSHOT ON;

Alternative: use SNAPSHOT ISOLATION for the session/transaction: set ALLOW_SNAPSHOT_ISOLATION ON and then use SET TRANSACTION ISOLATION LEVEL SNAPSHOT; BEGIN TRAN; SELECT ...; COMMIT; — requires application changes and has semantics different from Read Committed.

Why: eliminates shared locks for reads and greatly reduces deadlock risk without resorting to NOLOCK.

Priority 4 — Conservative lock hints / isolation-level adjustments (if you cannot change DB-level settings)

Use a statement-level hint ONLY if you accept the correctness tradeoffs:

  • If dirty reads are acceptable: SELECT ... FROM dbo.Orders WITH (NOLOCK) ...; this avoids S locks, but allows dirty/uncommitted reads.
  • If dirty reads are not acceptable but you cannot enable RCSI: use READ UNCOMMITTED or WITH (NOLOCK) on Orders, or use SNAPSHOT at the session level for the SELECT only.

Avoid forcing ROWLOCK; forcing ROWLOCK can increase number of locks and cause escalation. Avoid forcing table locks (TABLOCK) unless you explicitly want serialization.

If you want to limit parallelism (reduce multi-thread lock interleaving), hint OPTION (MAXDOP 1) on the SELECT. This reduces thread-based deadlocks at potential cost of performance.

Priority 5 — Query-level / plan changes

Use OPTION (RECOMPILE) if parameter sniffing or parameter variability leads to poor plans sometimes — not a direct deadlock fix but can remove repeated bad plans.

If you can run this summary query outside the explicit transaction (move reports after commit), then the SELECT will not be affected by the exclusive locks held by the batch.

Other mitigations (last resort)

Serialize access: use application-level serialization (sp_getapplock) to force single-threaded access across critical operations. This reduces concurrency but can be simpler than complex locking.

Use READPAST for readers to skip locked rows (very rarely desired for aggregates because it skews totals).

Concrete examples and code

1) Create the recommended index (example):

-- Option recommended by missing-index: RegionID first then OrderDate 
-- (good for join then date-range)
CREATE NONCLUSTERED INDEX IX_Orders_RegionID_OrderDate
    ON dbo.Orders (RegionID, OrderDate)
    INCLUDE (OrderTotal);

-- Or alternate:
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    INCLUDE (RegionID, OrderTotal);

2) Enable READ_COMMITTED_SNAPSHOT (production checklist: test, monitor tempdb):

-- Ensure no active connections that would block DDL
ALTER DATABASE DBA SET READ_COMMITTED_SNAPSHOT ON;

3) Statement-level hint (if you cannot change DB setting and you accept dirty reads):

SELECT r.RegionName, r.Country, COUNT(*) AS OrderCount, ...
FROM dbo.Orders o WITH (NOLOCK)
INNER JOIN dbo.Regions r WITH (NOLOCK)
    ON o.RegionID = r.RegionID
WHERE r.Country = @Country
    AND o.OrderDate >= @StartDate
    AND o.OrderDate < @EndDate
GROUP BY r.RegionName, r.Country
ORDER BY TotalRevenue DESC;

4) Use SNAPSHOT for only the reporting session:

-- DB must have ALLOW_SNAPSHOT_ISOLATION ON
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT ... -- same query
COMMIT;

5) Reduce parallelism (if parallel locking cycles are suspected):

SELECT ... OPTION (MAXDOP 1);

6) Enforce consistent order in other code:

-- If any updating transaction locks Orders then Regions, change it 
-- to access Regions then Orders (or change the SELECT path to match).
-- Updater should do:
UPDATE r WITH (ROWLOCK) ... WHERE ...
UPDATE o WITH (ROWLOCK) ... WHERE o.RegionID = @RegionID;
-- Make sure region rows are touched in the same order 
-- your SELECT touches them.

Checklist to prioritize and test

  1. Add the recommended nonclustered index on Orders; test for plan change, I/O reduction, lock count reduction.
  2. If deadlocks persist, enable READ_COMMITTED_SNAPSHOT after testing in pre-prod and monitoring tempdb.
  3. Shorten/commit larger explicit transaction(s) and enforce consistent lock ordering across code paths.
  4. If unable to change DB setting: apply session-level SNAPSHOT for reporting or carefully use NOLOCK on reporting queries (understand risk).
  5. Only use MAXDOP hint if you see evidence parallel threads are causing deadlocks and other options fail.

Why I prefer index + RCSI over NOLOCK

A covering index reduces I/O and lock counts without changing read semantics.

RCSI removes shared locks for reads without allowing dirty reads (reads see committed snapshot), and is generally safer than NOLOCK.

NOLOCK is fast but can return inconsistent aggregates (phantoms, missing rows) — usually unacceptable for financial aggregates such as TotalRevenue.

Short diagnostic queries to confirm locking behaviour (run while load/testing)

Check blocking sessions and lock owners:

SELECT blocking_session_id, session_id, wait_type, resource_description
FROM sys.dm_os_waiting_tasks
JOIN sys.dm_exec_sessions s ON s.session_id = session_id;

Deadlock graphs: enable trace flag 1222 or use Extended Events system_health or a dedicated XE session to capture xml_deadlock_report so you can see the exact cycle (tables/lock modes/statement order).

Final recommendation (one-line)

  1. Create the nonclustered covering index on Orders (RegionID, OrderDate) INCLUDE (OrderTotal) — highest impact.
  2. Enable READ_COMMITTED_SNAPSHOT if feasible (best safety/behaviour/low-code change).
  3. Shorten transactions and ensure consistent object access order across all code paths (Regions first then Orders) — fix the application-level root cause.

If you want, I can:

  • Produce a concrete test script to measure lock counts before/after creating the index.
  • Produce a query to capture the live deadlock XML from the server or analyze a deadlock graph you have.
  • Suggest a precise index choice based on value distributions (if you can provide RegionID cardinality and distribution of OrderDate ranges).

No comments:

Post a Comment