
AI Advice: Performance Tuning — dbo.usp_GetCustomerOrders

Below I analyze the plan XML and the sp_BlitzCache metrics you provided, list every performance problem the evidence supports, explain why each problem hurts performance, and provide exact corrected code (procedure plus recommended DDL) to fix them. I do not guess: every finding is tied to the XML or the metrics you gave.

Summary of the biggest problems (so you can jump to fixes)

  • Parameter sniffing / cardinality misestimate: the plan was compiled for one CustomerType value ("VIP") but executions returned anywhere from 1,000 to 500,000 rows. The estimated-rows (≈867) vs actual (500,000) mismatch caused a bad join strategy and excessive work.
  • Implicit conversion (varchar <-> nvarchar) on CustomerCode: CONVERT_IMPLICIT shown in the plan; this forces per-row conversions and prevents optimal use of indexes.
  • Nested Loops chosen with extremely many inner executions: inner index seek on Customer executed ~999,000 times, driving heavy CPU and I/O.
  • Sort spills and memory-grant inaccuracy: the grant-versus-usage figures and the sp_BlitzCache spill counts indicate memory grants sized from the wrong cardinality estimates, with tempdb spills as the result.
  • Plan warnings (non-sargable predicate and implicit-conversion warnings) stem from the implicit conversion and the misestimation.

All findings below quote the exact evidence from the plan/metrics and then propose the concrete fixes (exact code).

FINDING 1 — Parameter sniffing / cardinality misestimate

  • Evidence:
    • Parameter compiled value in the plan:
      • <ParameterList> shows: ParameterCompiledValue="'VIP'".
    • Estimated rows in the Query Plan are tiny compared to actual rows:
      • StatementEstRows="867.106" (top of plan).
      • The Sort node RunTimeInformation shows ActualRows="500000".
    • sp_BlitzCache execution metrics (plan cache summary):
      • Estimated Rows: 867.106 (from plan).
      • Row Statistics: Average Returned Rows: 250,500; Min Returned Rows: 1,000; Max Returned Rows: 500,000 (across 4 executions).
      • Execution Count = 4 with extreme result-count variance (min 1k vs max 500k).
    • sp_BlitzCache flagged "Parameter Sniffing".
  • Why this causes poor performance:
    • The plan was compiled for a specific parameter value ('VIP') whose distribution produced a plan optimized for small output, but other parameter values produce far more rows. The optimizer chose a nested-loops join assuming few outer rows; when the actual outer rowcount is large, the nested-loops generates an excessive number of inner seeks and high CPU/IO.
  • Exact fix (two options; I strongly recommend OPTION (RECOMPILE)):
    • Best: Add OPTION (RECOMPILE) on the SELECT so the optimizer estimates based on the actual parameter value at each execution and produces an appropriate join strategy and memory grant.
    • Alternative (if you cannot recompile on every call): use OPTIMIZE FOR UNKNOWN or assign the parameter to a local variable to prevent sniffing. Because the result distribution is highly skewed, though, RECOMPILE is the safest choice.

Exact corrected SELECT line (add only this hint to the proc):

SELECT ... FROM ... WHERE o.CustomerType = @CustomerType ORDER BY o.OrderDate DESC OPTION (RECOMPILE);

(Full corrected procedure is below.)
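For reference, the OPTIMIZE FOR UNKNOWN alternative would look like the sketch below. It caches one plan built from average density statistics, so unlike RECOMPILE it cannot adapt between the 1k-row and 500k-row executions:

```sql
-- Alternative to RECOMPILE (sketch): compile against average density
-- statistics rather than the sniffed value; one plan is cached and reused.
SELECT
    o.SalesOrderID, o.CustomerType, o.OrderDate, o.OrderTotal, o.Status,
    c.FirstName, c.LastName, c.City
FROM dbo.SalesOrder AS o
INNER JOIN dbo.Customer AS c
    ON o.CustomerCode = c.CustomerCode
WHERE o.CustomerType = @CustomerType
ORDER BY o.OrderDate DESC
OPTION (OPTIMIZE FOR (@CustomerType UNKNOWN));
```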

FINDING 2 — Implicit data type conversion (VARCHAR vs NVARCHAR) on the join column CustomerCode

  • Evidence:
    • The plan contains a ComputeScalar with an explicit CONVERT_IMPLICIT:
      • ScalarString="CONVERT_IMPLICIT(nvarchar(20),[Orders].[dbo].[SalesOrder].[CustomerCode] as [o].[CustomerCode],0)"
      • The plan comment and the Query Text snippet also state: "-- VARCHAR = NVARCHAR: CONVERT_IMPLICIT"
    • The Index Seek on Customer (the inner) uses Expr1002 (the converted value) as its seek key:
      • Seek predicate for Customer: RangeExpressions uses ScalarOperator ScalarString="[Expr1002]" (which is the implicitly converted SalesOrder.CustomerCode).
  • Why this causes poor performance:
    • Implicit conversions on columns generate CPU overhead per row. They make predicates non-sargable in some contexts and cause extra per-row conversion work. In this plan, every outer row has its SalesOrder.CustomerCode implicitly converted to nvarchar and the converted value is used to seek Customer index. Converting the column on the larger side (SalesOrder has 1,000,000 rows per TableCardinality in the plan) is expensive.
    • It also complicates index usage and can prevent an optimal join plan.
  • Exact fix (schema-level recommended):
    • Align data types on the two columns so no implicit conversion is necessary. Change Customer.CustomerCode to varchar(20) (or change SalesOrder.CustomerCode to nvarchar(20), but changing Customer to varchar is preferable if SalesOrder is the large table and already varchar).
    • Exact DDL (run during a maintenance window; check constraints/keys first):
      • ALTER TABLE dbo.Customer ALTER COLUMN CustomerCode VARCHAR(20);
        (Adjust NULL/NOT NULL to match existing definition — example assumes same nullability. If the column is part of an index/PK you may need to drop/recreate or use CREATE INDEX … WITH (ONLINE=ON) depending on SQL Server edition.)
  • If you cannot change schema quickly, a temporary mitigation in the procedure:
    • Force the conversion on the smaller table (Customer) rather than the larger table (SalesOrder). This will still do conversions but on fewer rows:
      • Replace JOIN condition with:
        ON o.CustomerCode = CAST(c.CustomerCode AS VARCHAR(20))
      • That causes conversion on Customer.CustomerCode (500k rows) rather than SalesOrder.CustomerCode (1,000,000 rows).
    • However this is only a workaround; the correct fix is to align column types.
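Before scheduling the schema change, you can enumerate what depends on Customer.CustomerCode using the standard catalog views. This is a sketch built from the object names in this post; any index or foreign key it returns must be handled around the ALTER COLUMN:

```sql
-- Indexes that reference Customer.CustomerCode (these may need to be
-- dropped and recreated around the ALTER COLUMN).
SELECT i.name AS index_name, i.type_desc
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.Customer')
  AND c.name = N'CustomerCode';

-- Foreign keys that reference Customer.CustomerCode.
SELECT fk.name AS foreign_key_name,
       OBJECT_NAME(fk.parent_object_id) AS referencing_table
FROM sys.foreign_keys AS fk
JOIN sys.foreign_key_columns AS fkc
    ON fkc.constraint_object_id = fk.object_id
JOIN sys.columns AS c
    ON c.object_id = fkc.referenced_object_id
   AND c.column_id = fkc.referenced_column_id
WHERE fkc.referenced_object_id = OBJECT_ID(N'dbo.Customer')
  AND c.name = N'CustomerCode';
```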

FINDING 3 — Nested loops causing nearly a million inner executions

  • Evidence from plan:
    • Node 6 (Index Seek on Customer) RunTimeInformation shows:
      • ActualExecutions="999000"
      • ActualEndOfScans="999000"
      • ActualRows="500000" (rows returned by the join)
    • Node 5 (Index Seek on SalesOrder) RunTimeInformation shows:
      • ActualRows="999000" for the SalesOrder index seek (outer).
    • The Nested Loops join (NodeId=1) returned ActualRows="500000" at the top, while its inner side executed ~999,000 times (once per outer row).
    • The optimizer estimated a tiny row count (EstimateRows=867.106), leading it to choose nested loops; the actual outer rowcount is huge, so nested loops produced nearly a million inner seeks.
  • Why this causes poor performance:
    • Nested loops are excellent when the outer input is small. Here, the outer is large (hundreds of thousands) — nested loops cause the inner index seek to run once per outer row, resulting in hundreds of thousands of index seeks and high CPU and I/O. If the optimizer had accurate cardinality, it might choose a hash or merge join and avoid that cost.
  • Exact fix:
    • Fix the cardinality (see FINDING 1) so the optimizer picks a join better suited to large inputs, or force a hash join only after testing. Using OPTION(RECOMPILE) will typically allow the optimizer to pick a hash join for large rowcounts.
    • Additionally, remove the implicit conversion (FINDING 2) so the optimizer can match key types natively and consider more efficient join algorithms.

FINDING 4 — Sort and memory grant inaccuracy leading to spills (sp_BlitzCache metrics)

  • Evidence:
    • Query plan MemoryGrantInfo shows:
      • SerialDesiredMemory="125608" KB but GrantedMemory="1840" KB: a large gap between the memory the optimizer wanted for the plan and what was actually granted.
      • MaxUsedMemory="1584" KB.
    • sp_BlitzCache metrics show:
      • Memory Grant Info: Min Grant KB: 1840, Max Grant KB: 1840, Min Used Grant KB: 224, Max Used Grant KB: 1584, Percent Memory Grant Used: 49.13%
      • Spill Info: Min Spills: 0, Max Spills: 10253, Total Spills: 20506, Avg Spills: 5126.50 (many spills across the cached executions)
    • The top of the plan shows a Sort operator (NodeId=0) with ActualRows=500,000; ordering half a million rows is expensive and requires sufficient memory.
  • Why this causes poor performance:
    • When the optimizer misestimates rows, it requests an inappropriate memory grant. If the memory grant is too small for required in-memory work, operators spill to tempdb, causing huge I/O and CPU overhead. The many spills in sp_BlitzCache indicate this occurred over multiple executions. Sorting 500k rows with insufficient memory causes expensive tempdb activity.
  • Exact fix:
    • Fix cardinality (see FINDING 1) so memory grants are accurate per execution (OPTION(RECOMPILE)).
    • Create a covering index that delivers rows already ordered for the ORDER BY (key on the ORDER BY column, INCLUDE the selected columns). That removes the Sort operator and its memory demand at the same time.
    • Example index to satisfy this query (exact code below):
      • CREATE NONCLUSTERED INDEX IX_SalesOrder_CustomerType_OrderDate ON dbo.SalesOrder (CustomerType, OrderDate DESC) INCLUDE (SalesOrderID, CustomerCode, OrderTotal, Status);
      • This allows the WHERE o.CustomerType = @CustomerType and ORDER BY o.OrderDate DESC to be satisfied by an index seek that already returns rows ordered, eliminating the expensive Sort.
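To cross-check the sp_BlitzCache spill numbers, the spill columns added to sys.dm_exec_query_stats in SQL Server 2016 SP2 can be queried directly. A sketch; run it before deploying the RECOMPILE change, since statements compiled with OPTION (RECOMPILE) may not leave an entry in the cached-plan DMVs:

```sql
-- Grant-vs-usage and spill counters straight from the plan-cache DMV
-- (columns available in SQL Server 2016 SP2 and later).
SELECT TOP (10)
    qs.execution_count,
    qs.total_spills,        -- tempdb pages spilled, summed over executions
    qs.max_spills,
    qs.max_grant_kb,
    qs.max_used_grant_kb,
    SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%usp_GetCustomerOrders%'
ORDER BY qs.total_spills DESC;
```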

COMBINED RECOMMENDED FIXES (exact code you can run)

Important notes before applying schema/index changes:

  • Check indexing strategy and duplicate keys first.
  • Changing column types and nonclustered indexes may require rebuilding indexes and careful handling of constraints and foreign keys.
  • Test changes on a staging replica or during maintenance window.

1) Immediate, low-risk change to the stored procedure — avoid parameter sniffing (OPTION(RECOMPILE)) and explicitly cast if you cannot change schema immediately.

Use this procedure (exact text). This addresses parameter sniffing immediately:

CREATE OR ALTER PROCEDURE dbo.usp_GetCustomerOrders
    @CustomerType VARCHAR(20)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        o.SalesOrderID,
        o.CustomerType,
        o.OrderDate,
        o.OrderTotal,
        o.Status,
        c.FirstName,
        c.LastName,
        c.City
    FROM dbo.SalesOrder AS o
    INNER JOIN dbo.Customer AS c
        /* If you have already changed Customer.CustomerCode to VARCHAR(20), use:
           ON o.CustomerCode = c.CustomerCode
           Otherwise (temporary workaround), cast the smaller side to avoid converting the big table:
           ON o.CustomerCode = CAST(c.CustomerCode AS VARCHAR(20))
        */
        ON o.CustomerCode = c.CustomerCode
    WHERE o.CustomerType = @CustomerType
    ORDER BY o.OrderDate DESC
    OPTION (RECOMPILE);   -- Recompile per execution so the plan and memory grant match the actual @CustomerType
END
GO

Notes:

  • The OPTION (RECOMPILE) hint will force the optimizer to recompile the statement at each execution and use correct row counts for memory grants and join selection. This directly addresses the ParameterCompiledValue='VIP' vs actual-row mismatch you had.
  • If you cannot alter the Customer column type yet, change the JOIN in the procedure to ON o.CustomerCode = CAST(c.CustomerCode AS VARCHAR(20)) as a temporary mitigation; this moves the conversion onto Customer (500k rows) instead of SalesOrder (1M rows), but it is only a stop-gap.

2) Recommended schema-level fixes (apply in maintenance window)

A. Fix the implicit conversion by aligning the column types (recommended):

-- Example: change Customer.CustomerCode from NVARCHAR to VARCHAR to match SalesOrder.CustomerCode.
-- Ensure you preserve nullability and constraints; example below assumes NULL allowed — adapt as needed.

ALTER TABLE dbo.Customer
ALTER COLUMN CustomerCode VARCHAR(20);
GO

B. Add an index to remove the Sort and make seeks efficient:

-- This composite nonclustered index allows:
--   WHERE CustomerType = @CustomerType
--   and ORDER BY OrderDate DESC
-- and includes the columns returned to make this a covering index for the query.
CREATE NONCLUSTERED INDEX IX_SalesOrder_CustomerType_OrderDate
ON dbo.SalesOrder (CustomerType, OrderDate DESC)
INCLUDE (SalesOrderID, CustomerCode, OrderTotal, Status);
GO

Rationale: With this index and matching column types,

  • The optimizer can seek by CustomerType and read rows already ordered by OrderDate DESC (no Sort).
  • The join can use typed equality without per-row conversion.
  • With OPTION(RECOMPILE) the optimizer can choose a hash/merge join when appropriate or a nested loops join only when the outer input is small.
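A minimal before/after harness for the combined changes might look like this. It is a sketch: 'Standard' is a hypothetical non-VIP parameter value, so substitute one that actually returns a large row count in your data:

```sql
-- Capture logical reads and CPU for a small and a large parameter value;
-- run once before and once after applying the fixes, then compare.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

EXEC dbo.usp_GetCustomerOrders @CustomerType = 'VIP';       -- compiled value from the plan
EXEC dbo.usp_GetCustomerOrders @CustomerType = 'Standard';  -- hypothetical high-rowcount value

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```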

3) Alternative if you cannot change schema or add indexes immediately

  • Use the corrected procedure above with:
    ON o.CustomerCode = CAST(c.CustomerCode AS VARCHAR(20))
    plus OPTION (RECOMPILE).
  • This reduces the per-row conversion cost by converting the smaller table's column and mitigates some of the CPU/I/O cost until you can change the schema and add the index.

Why these combined changes will solve each piece of evidence

  • Parameter sniffing: OPTION(RECOMPILE) fixes the mismatch between ParameterCompiledValue='VIP' and executions that return 1k–500k rows. That directly addresses EstimatedRows=867 vs ActualRows=500,000.
  • Implicit conversion: ALTER TABLE... or casting Customer.CustomerCode to VARCHAR removes CONVERT_IMPLICIT(nvarchar(20), SalesOrder.CustomerCode...) from the plan and prevents per-row conversions.
  • Nested loops: after recompile and correct cardinality + matching types, the optimizer will be able to choose a hash/merge join for large inputs, avoiding ~999k inner seek executions.
  • Sort + Memory Grant + Spills: the covering index with OrderDate DESC eliminates the Sort and reduces memory needed; recompile gives accurate memory grant for each run and eliminates the many spills reported in sp_BlitzCache.

Short checklist for deployment

  1. Deploy the procedure change with OPTION(RECOMPILE) right away (low risk).
  2. Test and deploy the schema change (ALTER COLUMN) in a maintenance window (check dependent indexes, constraints, and foreign keys first).
  3. Create the suggested nonclustered index to remove the Sort.
  4. Monitor the plan cache and sp_BlitzCache metrics. After fixes you should see:
    • Estimated rows much closer to actual rows in the query plan.
    • Dramatic reduction in ActualExecutions for the inner index seek (far fewer than 999k).
    • Eliminated CONVERT_IMPLICIT in the plan.
    • Far fewer spills and lower total reads/CPU per execution.
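One way to verify the monitoring points above is to capture the actual plan after the change. A sketch; with OPTION (RECOMPILE) the plan may not persist in cache, so capture it with SET STATISTICS XML rather than the cache DMVs:

```sql
-- Capture the actual execution plan and inspect it for the fixes.
SET STATISTICS XML ON;
EXEC dbo.usp_GetCustomerOrders @CustomerType = 'VIP';
SET STATISTICS XML OFF;
-- In the returned plan XML, verify that no ScalarString contains
-- CONVERT_IMPLICIT(nvarchar(20), ... CustomerCode ...), that estimated
-- rows track actual rows, and that the Sort operator is gone once the
-- covering index is in place.
```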

If you want, I can:

  • Produce exact ALTER TABLE that preserves the existing NULL/NOT NULL and constraint semantics if you paste the current CREATE TABLE or column definition for dbo.Customer.CustomerCode.
  • Produce a test script to validate before/after plans (showing where CONVERT_IMPLICIT disappears and where ActualRows aligns with estimates).
  • Provide guidance on whether to use OPTION(RECOMPILE) permanently or only for specific callers (based on expected call volume).

Summary of the key code changes again (copy/paste-ready)

1) Procedure (immediate change):

CREATE OR ALTER PROCEDURE dbo.usp_GetCustomerOrders
    @CustomerType VARCHAR(20)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        o.SalesOrderID,
        o.CustomerType,
        o.OrderDate,
        o.OrderTotal,
        o.Status,
        c.FirstName,
        c.LastName,
        c.City
    FROM dbo.SalesOrder AS o
    INNER JOIN dbo.Customer AS c
        ON o.CustomerCode = c.CustomerCode
    WHERE o.CustomerType = @CustomerType
    ORDER BY o.OrderDate DESC
    OPTION (RECOMPILE);
END
GO

2) Recommended schema/index changes (maintenance window):

-- Align data types
ALTER TABLE dbo.Customer
ALTER COLUMN CustomerCode VARCHAR(20);
GO

-- Covering index to avoid Sort and cover the query
CREATE NONCLUSTERED INDEX IX_SalesOrder_CustomerType_OrderDate
ON dbo.SalesOrder (CustomerType, OrderDate DESC)
INCLUDE (SalesOrderID, CustomerCode, OrderTotal, Status);
GO

If you want, I can prepare an ALTER script that preserves exact nullability and constraints; just paste the column definition and any FKs that reference CustomerCode.
