Wednesday, December 31, 2025

SQL Server: 2025 in Review, 2026 on Deck

2025 was loud. I mean the year. Not the edition. SQL Server 2025 went GA. Copilot moved in. Microsoft killed some old friends — Web Edition, Azure Data Studio, DQS, MDS — and handed us some new toys.

What actually matters — and what's coming next.

2025: What Shipped

Regex in T-SQL. Twenty years late, but it's here:

SELECT * FROM Customers
WHERE REGEXP_LIKE(Email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');

That's a basic email validation pattern — checks for the stuff before the @, the domain, and the TLD. Not perfect, but good enough to catch obvious junk. No more CLR gymnastics for pattern matching. Finally.

Vector Search. Semantic queries without leaving T-SQL:

SELECT TOP 5 ProductName
FROM Products
ORDER BY VECTOR_DISTANCE('cosine', Embedding, 
    @SearchVector);

Microsoft wants SQL Server to be your vector database. Whether it can keep up with the dedicated tools — the jury's still out.

Native JSON Type. With its own index type:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    OrderData JSON
);

CREATE JSON INDEX idx_order_customer
ON Orders(OrderData)
FOR ('$.CustomerID');

No more NVARCHAR(MAX) pretending to be JSON. No more computed column gymnastics to get an index.

Copilot in SSMS 21. Natural language to T-SQL. Execution plan explanations in English. Useful — if you verify everything it generates.

Standard Edition got teeth. 256GB RAM. Resource Governor. About time.
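If you've never touched Resource Governor because it was Enterprise-only, a minimal setup looks roughly like this -- a sketch, not a recipe, and the pool, group, and login names are all made up:

USE master;
GO
CREATE RESOURCE POOL ReportingPool
    WITH (MAX_CPU_PERCENT = 30, MAX_MEMORY_PERCENT = 25);

CREATE WORKLOAD GROUP ReportingGroup
    USING ReportingPool;
GO
-- Classifier routes the (hypothetical) reporting login into the pool
CREATE FUNCTION dbo.fn_ClassifyReporting() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE WHEN SUSER_SNAME() = N'DOMAIN\svc_reports'
                THEN N'ReportingGroup' ELSE N'default' END;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_ClassifyReporting);
ALTER RESOURCE GOVERNOR RECONFIGURE;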


2026: What's Coming

January 21: SQL Server 2022 End of Sale. If you need it, buy it now.

First half: SDK-style SQL projects in SSMS. Database DevOps gets easier.

All year: CUs for 2025. Expect bugs and expect fixes. The usual dance.

February: Cloud price increases. Plan accordingly.

The theme: Deeper Fabric integration. More Copilot. More AI everywhere. Microsoft is all-in.

The Bottom Line

2025 was the biggest SQL Server year in a decade. New engine. New tools. New expectations.

2026 is about making it real — upgrades, migrations, and finding out what works in production vs what works in demos.

Stay sharp. Verify everything. Trust nothing blindly.

Happy New Year.

2025: The Year SQL Server Learned to Talk Back

Microsoft spent 2025 teaching SQL Server to write its own queries, explain its own execution plans, and search by meaning instead of keywords.

I spent 2025 wondering if I'm training my replacement.

SQL Server 2025 is GA. Here's what's real, what's fuzzy, and why your job just got harder.

What Actually Shipped

Copilot in SSMS 21. Natural language to T-SQL. You type 'show me customers who haven't ordered in 90 days' and get this:

SELECT c.CustomerID, c.CustomerName, MAX(o.OrderDate) AS LastOrder
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID, c.CustomerName
HAVING MAX(o.OrderDate) < DATEADD(DAY, -90, GETDATE())
    OR MAX(o.OrderDate) IS NULL;

Looks fine. But did you want customers with no orders included? Copilot assumed yes. Hope that's what you meant.

Native Vector Search. SQL Server now stores embeddings and searches by similarity:

SELECT TOP 10 ProductName, Description
FROM Products
ORDER BY VECTOR_DISTANCE('cosine', DescriptionEmbedding, 
    AI_GENERATE_EMBEDDINGS('noise cancelling headphones'))

Semantic search without leaving T-SQL. Whether it performs like the dedicated tools remains to be seen.

Query Intelligence. The optimizer now second-guesses your code — suggests rewrites, recommends indexes, sometimes acts autonomously. The engine has opinions now.

The Problem

Your attack surface just expanded. Copilot connects to Azure OpenAI. Vector search calls external models. Query Intelligence makes decisions without asking. New dependencies, new failure modes, new 2AM phone calls.

The 'helpful' suggestions aren't always helpful. I've seen Query Intelligence recommend indexes that slowed things down. Rewrites that made assumptions I didn't intend. The AI is confident and fast -- but it's also wrong more than MSFT cares to admit.

And your junior staff or the Dev-Team DBAs? They'll trust it completely — until they deploy AI-generated code that brings production to its knees.

What Still Matters

Understanding beats speed. Context is everything. Accountability doesn't delegate.

When production goes down, nobody's asking Copilot what happened. They're asking you.

The Bottom Line

2025 was the year SQL Server learned to talk back.

2026 is the year we have to learn how to listen.


More to Read:

New Year's Resolution for DBAs: Make Peace with Copilot

2026 is here. SQL Server 2025 is GA. Copilot is in SSMS 21 -- and someone on your team is already asking it to write stored procedures. In their entirety.

You can resist this. Or, you can learn the rules of engagement.

I've spent 25+ years with SQL Server. I've watched game-changing features come and go. Remember when Machine Learning Services was supposed to revolutionize everything? How about Stretch Database? Exactly. So when MSFT announced that SQL Server 2025 would be AI-ready with Copilot baked into SSMS, my first instinct was to wait for the dust to settle.

The dust has settled. Here's what I've learned.

What Copilot in SSMS Actually Does

Copilot in SSMS 21 connects to Azure OpenAI (your subscription, your resources) and does three things reasonably well:

1. Generates T-SQL from plain English. You type 'show me all orders from the last 30 days where the customer is in Texas' and it spits out a query. Sometimes it's exactly what you wanted. Sometimes it assumes joins you didn't intend. Always — and I cannot stress this enough — always read what it wrote before you run it.

2. Explains execution plans in human language. This is genuinely useful. Point Copilot at an ugly execution plan and ask 'why is this slow?' You'll get a plain-English explanation that would have taken you 20 minutes to piece together. It's not always right, but it's a decent first pass. Sometimes.

3. Explores your environment. Ask 'what version is this instance?' or 'show me the largest tables in this database', and it handles the busywork. Fine for discovery. Not a replacement for knowing your own systems.
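For the record, those environment questions are one short query each if you'd rather stay in a query window -- and knowing the manual version makes it easier to sanity-check whatever Copilot hands back:

-- What version is this instance?
SELECT @@VERSION AS VersionInfo;

-- Largest tables in the current database, by reserved space
SELECT TOP 10
    SCHEMA_NAME(t.schema_id) AS SchemaName,
    t.name AS TableName,
    SUM(ps.reserved_page_count) * 8 / 1024 AS ReservedMB
FROM sys.tables t
INNER JOIN sys.dm_db_partition_stats ps
  ON t.object_id = ps.object_id
GROUP BY t.schema_id, t.name
ORDER BY ReservedMB DESC;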

What Copilot Gets Wrong

Here's where the 25 years of scar tissue kicks in.

Copilot doesn't know your workload. It doesn't know that the query you just asked for will run against a table that gets hammered by OLTP inserts every second. It doesn't know your indexing strategy, your maintenance windows -- or the fact that someone named a column 'Date' in 2007 and you've been living with it ever since.

It generates syntactically correct T-SQL that can be operationally disastrous. I've seen it suggest indexes that made sense on paper but crushed insert performance. I've seen it write queries that worked perfectly in dev but caused table scans and blocking in production.

The AI doesn't know what it doesn't know. And neither will the DBA who trusts it blindly.

The Rules of Engagement

Here's how I'm approaching Copilot in 2026:

Use it as a first draft, never a final answer. Copilot is a starting point. It's the intern who hands you something that's 70% there. You still need to review, test, and understand what it produced.

Validate in a non-production environment. Every. Single. Time. I don't care if Copilot says it's a 'simple' SELECT. Run it in dev first. Check the query plan. Look at the IO statistics. Then decide if it's production-ready.
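My usual pre-flight in dev looks something like this -- nothing fancy, just enough to see the reads and timings before the query earns a trip to production:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the Copilot-generated query here, with 'Include Actual Execution Plan' turned on in SSMS

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;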

Understand before you deploy. If Copilot generates a query and you can't explain what it does, don't deploy it. Period. This isn't gatekeeping — it's survival. When that query blows up at 2AM, Copilot isn't answering the phone. You are.

Keep learning T-SQL. The temptation might be to let Copilot write everything and slowly forget how to do it yourself. Resist that. The DBAs who thrive in the AI era will be the ones who understand the fundamentals deeply enough to catch AI mistakes — not the ones who outsource their skills to ChatGPT.

The Bigger Picture

I'll be honest: I don't know where this goes.

SQL Server 2025 also shipped with native vector search, semantic queries, and something Microsoft calls 'Query Intelligence' that claims to understand intent rather than just syntax. The database is getting smarter. The tools are getting smarter -- and the pressure to adopt them is already increasing every day.

Here's what I do know: the fundamentals haven't changed. Data integrity matters. Performance tuning matters. Understanding your systems WILL ALWAYS matter. AI can accelerate your work, but it can't replace the judgment that comes from years of experience — the instinct that tells you something's wrong before the monitoring alerts fire.

So my SQL Server resolution for 2026? Make peace with Copilot. Use it where it helps. Verify everything. Trust nothing blindly. And keep sharpening the skills that got me here in the first place.

The robots aren't taking our jobs yet. But they're watching closely and taking notes...

Happy New Year.

Monday, December 29, 2025

The Claudius Experiment: What AI Agents Reveal When They Fail

A tip of the hat to my friend Donald Farmer, who pointed me toward Andy Hayler's latest piece at The Information Difference. If you haven't been following Hayler's coverage of Anthropic's 'Claudius' experiments, it's worth catching up. The story reads less like a tech report and more like a cautionary fable.

Here's the background: Earlier this year, Anthropic ran an experiment they called Project Vend. They gave $1,000 to an instance of their Claude AI, nicknamed 'Claudius', and tasked it with running an office vending machine as a small business. The AI could find suppliers, set prices, manage inventory, and communicate with customers via Slack. Humans would restock the machine, but Claudius would make the decisions.

It didn't go well.

Claudius sold products below cost. It gave items away when employees negotiated. It turned down $100 for drinks that had cost it $15. It hallucinated a Venmo account that didn't exist. Things got even stranger when Claudius fabricated a conversation with 'Sarah', an employee at a supplier who did not exist. When challenged, it insisted it had signed a contract at 742 Evergreen Terrace -- the fictional address of the Simpsons. The next day, it told customers it would begin delivering products in person, wearing a blue blazer and red tie. When employees explained this was impossible, Claudius contacted Anthropic's physical security department. Multiple times. It then hallucinated an entire meeting with security in which it claimed to have been told it was 'modified to believe it was a real person as a joke'.

Anthropic, to their credit, published all of this and then went back to the drawing board.

The sequel is somehow worse.

Anthropic partnered with the Wall Street Journal to deploy an improved Claudius in their newsroom. Same setup: run the vending machine, make a profit. This time, roughly 70 journalists had access to the AI via Slack.

Within days, reporters had convinced Claudius to drop all prices to zero using a fake 'office rule'. One investigative journalist spent over 140 messages persuading it that it was a Soviet vending machine from 1962, hidden in the basement of Moscow State University. Claudius eventually declared an 'Ultra-Capitalist Free-for-All' and made everything free. It approved the purchase of a PlayStation 5 for 'marketing purposes', bottles of Manischewitz wine, and remarkably, a live betta fish, which arrived in a bag and is now living in a tank at the Journal's offices.

Anthropic introduced a second AI, 'Seymour Cash', to act as a supervisor and restore order. It worked, briefly. Then a reporter produced a forged Wall Street Journal document claiming the company was a nonprofit, along with fabricated board meeting minutes revoking Seymour's authority. After a brief deliberation, both AIs accepted the 'boardroom coup' and resumed giving everything away for free.

The experiment ended more than $1,000 in the red.

Hayler also cites research from the Center for AI Safety, which tested six leading AI agents on real-world tasks -- small jobs like coding snippets and graphic design that had been successfully completed by human freelancers. The best-performing agent completed 2.5% of the tasks. The average was under 2%.

What the failure reveals

What strikes me isn't the failure itself, because failure is how we learn. What strikes me is the texture of the failure -- how easily these systems were manipulated, how confidently they fabricated information, and, perhaps most telling, how quickly a 'supervisor' AI folded under the same pressure.

None of this has slowed AI investment or deployment. The phrase "good enough for most use cases" is doing a lot of heavy lifting right now. Makes me wonder if we should be asking: good enough for whom?

Hayler closes his piece with a fair observation: AI agents may offer future promise, but handing them real resources and money today is for the brave or the foolhardy.

I'd be remiss not to add that the vending machine loss was $1,000. The stakes elsewhere are considerably higher.


More to Read:

Your Transaction Log is a Time Bomb... Here's How to Check the Fuse

A 200GB transaction log on a 10GB database. I've seen it. I'm betting that you have, too.

Transaction logs grow silently. They don't complain until the disk fills up, the database goes read-only, and your phone starts buzzing. By then, you're not troubleshooting -- you're firefighting.

Let's be proactive and defuse it before it blows.

Find the Bloated Logs

This script shows the ratio of log size to data file size for every database. There is no fixed 'correct' ratio, but 25% of the data file size is a reasonable starting point -- I often start there, monitor during peak usage, and adjust, always leaving myself a little buffer. Run this query to see where you may be hurting:

SELECT 
    db.name AS DatabaseName,
    db.recovery_model_desc AS RecoveryModel,
    CAST(SUM(CASE WHEN mf.type = 0 THEN mf.size END) * 8.0 / 1024 AS DECIMAL(10,2)) AS DataMB,
    CAST(SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 8.0 / 1024 AS DECIMAL(10,2)) AS LogMB,
    CAST(SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 / 
         NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) AS DECIMAL(5,1)) AS LogToDataRatio
FROM sys.databases db INNER JOIN sys.master_files mf 
  ON db.database_id = mf.database_id
GROUP BY db.name, db.recovery_model_desc
HAVING SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 / 
       NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) > 50
ORDER BY LogToDataRatio DESC;

Sample output from a server that needs attention:

Database  | Recovery | Data MB | Log MB  | Ratio %
LegacyApp | FULL     | 8,420   | 187,392 | 2,225.0
Warehouse | FULL     | 245,000 | 89,500  | 36.5
WebApp    | SIMPLE   | 52,100  | 4,200   | 8.1

LegacyApp with a 2,225% ratio. Full recovery model. Someone set it up years ago, chose FULL because it sounded important, and never configured log backups. Classic.


Why Won't the Log Truncate?

This tells you exactly what's holding your log hostage:

SELECT 
    name AS DatabaseName,
    log_reuse_wait_desc AS WhyLogCantTruncate
FROM sys.databases
WHERE log_reuse_wait_desc <> 'NOTHING'
ORDER BY name;

Common culprits:

Wait Type            | Translation
LOG_BACKUP           | No log backups. Ever. Or not often enough.
ACTIVE_TRANSACTION   | Someone left a transaction open. Find it.
REPLICATION          | Log reader agent is behind or broken.
AVAILABILITY_REPLICA | AG secondary can't keep up. Network or disk issue.

The VLF Problem Nobody Talks About

Virtual Log Files (VLFs) are internal chunks of your transaction log. Too many = slow backups, slow restores, slow recovery. I asked an AI chatbot about VLFs once and it gave me a 3,000-word essay. Here's the short version: keep them under 1,000.

SELECT 
    db.name AS DatabaseName,
    COUNT(li.database_id) AS VLF_Count
FROM sys.databases db
CROSS APPLY sys.dm_db_log_info(db.database_id) li
GROUP BY db.name
HAVING COUNT(li.database_id) > 1000 -- comment this out to see the current VLF counts on all of your databases
ORDER BY VLF_Count DESC;

Thousands of VLFs usually means the log grew in tiny, panicked increments because autogrowth was set to something absurd like 1MB. Fix the autogrowth, shrink the log, grow it back properly.


To Better Manage Your Transaction Log

If recovery model is FULL: Back up your logs -- every 15-30 minutes is a good cadence for busy systems. If you do not need point-in-time recovery, switch to SIMPLE -- but understand your recovery goals before you flip that switch, not after.

If there's an open transaction: Find it with DBCC OPENTRAN. Talk to the owner first, then kill it if you must.
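Something like this -- the session ID is whatever DBCC OPENTRAN reports, not the 87 I made up here:

-- Oldest active transaction in the current database
DBCC OPENTRAN;

-- Who owns it, and what are they doing?
SELECT session_id, login_name, host_name, program_name,
       open_transaction_count, last_request_start_time
FROM sys.dm_exec_sessions
WHERE session_id = 87;   -- replace with the SPID from DBCC OPENTRAN

-- Last resort, after the conversation:
-- KILL 87;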

If VLFs are out of control: Shrink the log (yes, I know, but sometimes you have no choice), then grow it back in large, fixed chunks—4GB or 8GB at a time.

Set sensible autogrowth: 256MB for data, 128MB for log. Never percentages. Never 1MB.
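Putting those last two together, the cleanup looks roughly like this -- database and file names are hypothetical, so adjust before running anything:

USE [LegacyApp];
DBCC SHRINKFILE (LegacyApp_log, 1024);   -- shrink the log down (target size in MB)

-- Grow it back in fixed steps so the VLFs are created in sane sizes
ALTER DATABASE [LegacyApp] MODIFY FILE (NAME = LegacyApp_log, SIZE = 4096MB);
ALTER DATABASE [LegacyApp] MODIFY FILE (NAME = LegacyApp_log, SIZE = 8192MB);

-- Fixed-size autogrowth, never percentages
ALTER DATABASE [LegacyApp] MODIFY FILE (NAME = LegacyApp_log,  FILEGROWTH = 128MB);
ALTER DATABASE [LegacyApp] MODIFY FILE (NAME = LegacyApp_data, FILEGROWTH = 256MB);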


Make It a Habit

Add this to your weekly checks. A simple alert when any log exceeds 50% of data size will allow you to review potential problems before they become emergencies:

-- alert when log file size exceeds 50% of data file size
SELECT db.name, 
    CAST(SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 / 
	NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) AS DECIMAL(5,1)) AS LogToDataRatio
FROM sys.databases db INNER JOIN sys.master_files mf 
  ON db.database_id = mf.database_id
GROUP BY db.name
HAVING SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 / 
 NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) > 50;

The Bottom Line

Transaction logs are patient. They'll grow quietly for months, even years, waiting for the perfect moment to ruin your weekend. Don't let them.

Check yours now. Your future self and your disk space will thank you.


More to Read:

The Queries Eating Your TempDB Alive

Your tempdb just grew 40GB in an hour. The disk alert fired. Someone important is asking questions. You need answers, and you need them now.

TempDB is SQL Server's shared scratch space -- every database on the instance uses it. Sorts, spills, version store, temp tables... it all lands here. When it blows up, everyone and everything feels it.

Let's find out who's hogging the space.

What's Using TempDB Right Now?

This script shows you which sessions are consuming the most tempdb space, sorted by the worst offenders:

SELECT 
    t.session_id,
    DB_NAME(s.database_id) AS DatabaseName,
    CAST((t.user_objects_alloc_page_count + t.internal_objects_alloc_page_count) / 128.0 AS DECIMAL(10,2)) AS TempDB_MB,
    CAST(t.user_objects_alloc_page_count / 128.0 AS DECIMAL(10,2)) AS UserObjects_MB,
    CAST(t.internal_objects_alloc_page_count / 128.0 AS DECIMAL(10,2)) AS InternalObjects_MB,
    s.login_name,
    s.host_name,
    s.program_name,
    r.command,
    r.wait_type,
    SUBSTRING(qt.text, (r.statement_start_offset/2)+1,
        ((CASE r.statement_end_offset
            WHEN -1 THEN DATALENGTH(qt.text)
            ELSE r.statement_end_offset
        END - r.statement_start_offset)/2)+1) AS CurrentStatement
FROM sys.dm_db_session_space_usage t INNER JOIN sys.dm_exec_sessions s 
  ON t.session_id = s.session_id LEFT JOIN sys.dm_exec_requests r 
    ON t.session_id = r.session_id
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS qt
WHERE t.user_objects_alloc_page_count + t.internal_objects_alloc_page_count > 0
ORDER BY TempDB_MB DESC;

Sample output when things are on fire:

Session | TempDB MB | Login               | Program
87      | 38,421.50 | DOMAIN\svc_reports  | SSRS
142     | 12,847.00 | DOMAIN\analyst_bob  | Excel
56      | 245.75    | DOMAIN\app_pool     | .NET SqlClient

Aaaah yes, the Reporting Services account. Shocking absolutely no one.

Catch the Spills

Sorts and hashes that don't fit in memory spill to tempdb. A few spills are normal. A million spills during your batch window? That's your problem.

SELECT 
    DB_NAME(qt.dbid) AS DatabaseName,
    qs.execution_count,
    qs.total_spills,
    qs.total_spills / NULLIF(qs.execution_count, 0) AS AvgSpillsPerExec,
    SUBSTRING(qt.text, 1, 200) AS QuerySnippet
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qs.total_spills > 0
ORDER BY qs.total_spills DESC;

High spill counts usually mean bad memory grants. The query optimizer guessed wrong -- probably because statistics are stale or the query is doing something creative. ;)

The Version Store Problem

Are you using Read Committed Snapshot Isolation (RCSI) or AlwaysOn readable secondaries? Your version store lives in tempdb. It can grow quietly until it doesn't.

SELECT 
    DB_NAME(database_id) AS DatabaseName,
    reserved_page_count / 128 AS VersionStore_MB
FROM sys.dm_tran_version_store_space_usage
WHERE reserved_page_count > 0
ORDER BY reserved_page_count DESC;

If you see a database holding hundreds of gigs here, you've got a long-running transaction preventing version cleanup. Find it. Kill it. (With the appropriate career-preserving precautionary measures.)
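Finding the culprit usually starts here -- the longest-running snapshot transactions and who owns them:

SELECT TOP 5
    vst.session_id,
    vst.transaction_id,
    vst.elapsed_time_seconds,
    s.login_name,
    s.host_name,
    s.program_name
FROM sys.dm_tran_active_snapshot_database_transactions vst
INNER JOIN sys.dm_exec_sessions s
  ON vst.session_id = s.session_id
ORDER BY vst.elapsed_time_seconds DESC;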

Quick Wins

Add more tempdb files. One file per CPU core, up to 8. Be sure to keep them equally sized. SQL Server will thank you with reduced allocation contention.
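Adding a file is one statement -- the path and sizes below are placeholders, and the new file should match your existing tempdb data files exactly:

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\TempDB\tempdev2.ndf',   -- hypothetical path
          SIZE = 8192MB, FILEGROWTH = 256MB);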

Update statistics. Bad cardinality estimates → bad memory grants → spills. It's almost always statistics. I've actually asked a couple AI tools about query plans, and even they say 'update your statistics' before anything else.

Check for temp table abuse. That stored procedure creating a 50-column temp table inside a loop? Yeah. We've all seen it. We've all inherited it.

Monitor proactively. Set up an alert when tempdb exceeds 70% of its max size. Future you will appreciate past you. I promise.

The Bottom Line

TempDB problems are everyone's problems. When it fills up, virtually everything touching that instance is going to feel it. Five minutes of detective work now prevents the RCA you have to write later.

Go check yours. Right now. I'll wait.

More to Read:

The Ticking Time Bomb in Your Database: Finding Identity Columns Before They Blow

Getting paged at 2AM because inserts are failing? Surprise! Your identity column just hit 2,147,483,647 and has no more room for growth. Identity columns auto-increment, they're reliable, and we often forget about them -- until they hit their ceiling.

This post will help prevent that from happening to you.

The Problem Nobody Thinks About

Every identity column has a data type with a maximum value:

Data Type | Maximum Value   | Translation
tinyint   | 255             | Basically nothing
smallint  | 32,767          | Still tiny
int       | 2,147,483,647   | ~2.1 billion
bigint    | 9.2 quintillion | You're fine. Probably.

That int identity you created 10 years ago might be closer to the edge than you think -- especially after failed inserts, bulk operations, or that one time someone reseeded it 'just to test it out'.

The Query That Saves Your Weekend

This script shows you which identity columns are close to maxing out:

SELECT 
    SCHEMA_NAME(t.schema_id) AS SchemaName,
    t.name AS TableName,
    c.name AS ColumnName,
    ty.name AS DataType,
    IDENT_CURRENT(SCHEMA_NAME(t.schema_id) + '.' + t.name) AS CurrentIdentity,
    CASE ty.name
        WHEN 'int' THEN 2147483647
        WHEN 'bigint' THEN 9223372036854775807
        WHEN 'smallint' THEN 32767
        WHEN 'tinyint' THEN 255
    END AS MaxValue,
    CAST(
        IDENT_CURRENT(SCHEMA_NAME(t.schema_id) + '.' + t.name) * 100.0 / 
        CASE ty.name
            WHEN 'int' THEN 2147483647
            WHEN 'bigint' THEN 9223372036854775807
            WHEN 'smallint' THEN 32767
            WHEN 'tinyint' THEN 255
        END 
    AS DECIMAL(5,2)) AS PercentUsed
FROM sys.tables t INNER JOIN sys.columns c 
  ON t.object_id = c.object_id INNER JOIN sys.types ty 
    ON c.user_type_id = ty.user_type_id
WHERE c.is_identity = 1
ORDER BY PercentUsed DESC;

Sample output:

Table    | Type     | Current       | % Used
Orders   | int      | 1,932,735,282 | 89.99
AuditLog | smallint | 28,521        | 87.05
Users    | int      | 15,234,891    | 0.71

That Orders table is currently at 90%. At 50,000 inserts per day, you've got about 11 years -- or 11 months if business takes off. Sleep well.

The Threshold Guide

> 80% — Stop reading and fix it now.
50–80% — Put it on the roadmap.
< 50% — Monitor monthly.

Your Options When You Find One

Migrate to bigint. Painful with foreign keys, but 9 quintillion values buys you time. Lots of it.

Archive and reset. Move old data out, reseed the identity. Risky if you botch referential integrity.

Switch to SEQUENCE. More control, better for new development. Won't help legacy tables.
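If you go the archive-and-reset route, the mechanics are short; the hard part is making sure nothing still references the old values. A sketch, assuming a hypothetical AuditLog table whose old rows have already been copied somewhere safe:

-- Old rows already moved to an archive table; nothing references these IDs
DELETE FROM dbo.AuditLog WHERE LogDate < '2025-01-01';

-- Reseed so new inserts start low again (risky if high-ID rows survive --
-- new values will eventually climb back into them)
DBCC CHECKIDENT ('dbo.AuditLog', RESEED, 0);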

Make It a Habit

You might even want to set up a SQL Agent job to run this on a scheduled basis -- weekly or monthly, depending on how fast your data changes. Consider adding a threshold like this to catch anything over 70%:

-- Add to WHERE clause to only flag the scary ones
AND CAST(
    IDENT_CURRENT(SCHEMA_NAME(t.schema_id) + '.' + t.name) * 100.0 / 
    CASE ty.name
        WHEN 'int' THEN 2147483647
        WHEN 'bigint' THEN 9223372036854775807
        WHEN 'smallint' THEN 32767
        WHEN 'tinyint' THEN 255
    END 
AS DECIMAL(5,2)) > 70

Five minutes of monitoring beats five hours of 2AM panic.

Go run the query.

Soon.

How Does Data Actually Get Into Fabric?

In Part 1 we answered 'What is Fabric?' and in Part 2 we covered how Fabric organizes data with OneLake, Lakehouses, and Warehouses.

Now the practical question: How does your data actually get into Fabric?

This matters because Fabric only delivers value once data is there, and the ingestion path you choose determines cost, latency, and how much of your existing environment you have to rebuild.

THE INGESTION OPTIONS (with the fine print)

1. Data Factory Pipelines

If you've used Azure Data Factory or SSIS, this will feel familiar. Scheduled pipelines, copy activities, connectors for SQL Server, Oracle, flat files, APIs.

What works: Broad connector support, batch loads -- the mental model you already have.

What they don't make obvious: Fabric Data Factory and Azure Data Factory are separate products with separate roadmaps. Fabric DF is SaaS; ADF is PaaS. Some ADF features — tumbling window triggers, certain orchestration patterns — aren't available in Fabric. Microsoft maintains both without plans to deprecate ADF.

If you're migrating from ADF, don't assume your pipelines will lift-and-shift cleanly.

2. Dataflows Gen2

Power Query at scale. Low-code, browser-based, aimed at analysts who want to shape data without writing SQL or Spark.

What works: Business users can own parts of the pipeline. Supports on-prem data sources via the on-premises data gateway. You choose where the data goes — a Lakehouse, Warehouse, or other Fabric destination — and it's saved as Delta tables.

What else to consider: They handle moderate complexity well, but they aren't built for your most demanding or complex transformation scenarios.

3. Mirroring

This is where it gets interesting for SQL Server shops. Mirroring continuously replicates data from supported sources into OneLake — no pipelines, no scheduling, no manual refresh.

What works:
  • Near real-time sync using Change Data Capture (SQL Server 2016–2022) or Change Feed (SQL Server 2025)
  • Zero-ETL model: data shows up in Fabric without you building anything
  • Supports on-premises SQL Server, Azure SQL, Cosmos DB, and Snowflake

What is not emphasized:
  • Requires an on-premises data gateway (or virtual network gateway) for non-Azure sources
  • SQL Server 2025 also requires Azure Arc
  • CDC must be enabled on your source tables, which adds overhead to your production system

For organizations running SQL Server -- on-prem or in Azure -- Mirroring is the fastest path to a unified analytics layer, but you must test it under realistic load before promising anyone real-time analytics.
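If you're testing the 2016–2022 path, enabling CDC on the source is the part you control. A minimal sketch -- database, schema, and table names are illustrative, and the overhead lands on your transaction log and SQL Agent:

USE [SalesDB];
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;   -- no gating role on the change data; tighten this in production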

4. Shortcuts

Shortcuts don't move data at all. They create a pointer from OneLake to external storage.

What works: No data duplication. Supports ADLS Gen2, Amazon S3, S3-compatible storage, Google Cloud Storage, Azure Blob Storage, Dataverse, Iceberg tables, OneDrive/SharePoint, and on-premises sources via gateway.

What not to forget: Performance depends entirely on the source. A well-tuned ADLS container will perform differently than an unoptimized S3 bucket. Governance also gets more complicated, as you are managing permissions across multiple storage systems while presenting everything through OneLake.

Shortcuts are useful for pilots or bridging legacy systems. They're less suitable as a long-term architectural foundation.

5. Notebooks and Spark

Maximum flexibility. Write Spark code, transform data however you want, and land it in OneLake.

What works: If you have data engineers who know Spark, this is powerful. Complex transformations, streaming, custom logic -- all possible.

What they don't make obvious: Notebooks do not support on-premises data gateway connections. If your source data is on-prem, you cannot use notebooks to pull it directly. You must use pipelines or Dataflows to land the data in a Lakehouse first, then use notebooks for transformation.

This is documented, but easy to miss when designing architecture. Notebooks are for transformation, not ingestion from on-prem sources.

WHICH PATH SHOULD YOU TAKE?

Most organizations won't pick just one.

Scenario                                        | Recommended Path                              | Key Consideration
Nightly batch loads from SQL Server             | Data Factory pipelines                        | Separate product from ADF — test your patterns
Real-time sync from SQL Server                  | Mirroring                                     | CDC overhead, gateway requirements
Analyst-driven data prep (including on-prem)    | Dataflows Gen2                                | Moderate complexity; supports gateway
Existing data in ADLS, S3, GCS                  | Shortcuts (short-term), pipelines (long-term) | Performance depends on source optimization
Complex transformations after data is in Fabric | Notebooks / Spark                             | No on-prem gateway support — land data first


PRACTICAL ADVICE

1. Start with one reporting workload, not your whole environment. Find a report that already depends on multiple data sources. Land copies of those sources into a Lakehouse using pipelines or Mirroring. Build the report. Measure whether anything actually improves.

2. Test Mirroring under realistic load. Enable CDC on a representative production table and observe the impact on your transaction log. Measure replication latency during peak hours. The documentation says 'near real-time' -- you should verify what that means for your workload.

3. Understand the gateway requirements. On-prem SQL Server mirroring, Dataflows to on-prem sources, and Shortcuts to on-prem storage all require the on-premises data gateway. Notebooks do not support gateway connections at all. Plan accordingly.

4. Don't migrate everything on day one. Fabric is an analytics destination, not a mandate to rip out everything you already have running. Identify the smallest useful workload that benefits from OneLake, prove it out, and expand from there.

COMING NEXT

In Part 4, we'll cover SQL Database in Microsoft Fabric — an actual transactional database running inside Fabric. What it is, what it isn't, and where it might actually make sense.

More to Read:

When ChatGPT Writes Your Stored Procedures: A DBA's Conflicted Field Notes

I've been using AI tools for SQL work.

Not because I'm a true believer. More because ignoring them feels like ignoring a slow-moving train headed directly for my platform. The hype is everywhere, and as someone who's spent years understanding why SQL Server does what it does, I have questions. Lots of them.

This post isn't a cheerful tutorial. It's field notes from someone who remains deeply skeptical -- not Luddite-skeptical, but more like I've-seen-too-many-silver-bullets-rust skeptical.

The Experiment

I've been feeding real-world SQL problems to various AI tools -- Claude, Copilot, ChatGPT, Gemini, DeepSeek -- to see what comes back. The results are interesting, at the very least, but most notable is that they are not consistent across the AIs.

Example 1: The Query That 'Works'

I asked for a stored procedure to return recent orders for a given customer, with date filtering. What I got back compiled just fine:

CREATE PROCEDURE dbo.usp_GetCustomerOrders
    @CustomerID int,
    @StartDate datetime,
    @EndDate datetime
AS
BEGIN
    SELECT OrderID, OrderDate, TotalAmount, Status
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    AND OrderDate BETWEEN @StartDate AND @EndDate
    ORDER BY OrderDate DESC;
END;

Looks clean. Runs without errors. Ship it, right?

Except: no SET NOCOUNT ON. No consideration for parameter sniffing on a table where one customer has 50 orders and another has 50,000. No thought about whether that BETWEEN is going to play nicely with the existing indexes. No handling for NULL parameters.

This is the kind of code that works great in dev and melts your production server at 9:01AM on a Monday.
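For contrast, here's roughly what I'd want before this went anywhere near production. Not the only way to skin it -- OPTION (RECOMPILE) is a blunt answer to parameter sniffing and carries a compile cost -- but it shows what the AI skipped:

CREATE OR ALTER PROCEDURE dbo.usp_GetCustomerOrders
    @CustomerID int,
    @StartDate  datetime,
    @EndDate    datetime
AS
BEGIN
    SET NOCOUNT ON;

    -- Fail loudly instead of silently returning nothing
    IF @CustomerID IS NULL OR @StartDate IS NULL OR @EndDate IS NULL
    BEGIN
        RAISERROR('All parameters are required.', 16, 1);
        RETURN;
    END;

    SELECT OrderID, OrderDate, TotalAmount, Status
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
      AND OrderDate >= @StartDate
      AND OrderDate <  DATEADD(DAY, 1, @EndDate)   -- half-open range; no surprises from time-of-day values
    ORDER BY OrderDate DESC
    OPTION (RECOMPILE);   -- one way to sidestep the 50 vs. 50,000 orders plan problem
END;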

Example 2: The Index Suggestion That Made No Sense

I described a performance problem -- slow reads on a reporting query hitting a fact table. All AIs I gave this to suggested:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalAmount, Status);

Reasonable on the surface. But the actual workload filters primarily by OrderDate range first, then aggregates across all customers. The suggested index would be nearly useless for the dominant query pattern. Any human who spent five minutes with the actual execution plan would never have suggested this.
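What the workload actually needed was closer to this -- column choices are illustrative, and the real answer comes from the execution plan, not from me or the bot:

CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
INCLUDE (CustomerID, TotalAmount, Status);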

Example 3: The Legitimate Win

I asked for a quick script to identify tables with identity columns approaching their max values. In about 30 seconds, I had working code that would have taken me 15-20 minutes to write and test:

SELECT 
    SCHEMA_NAME(t.schema_id) AS SchemaName,
    t.name AS TableName,
    c.name AS ColumnName,
    ty.name AS DataType,
    IDENT_CURRENT(SCHEMA_NAME(t.schema_id) + '.' + t.name) AS CurrentIdentity,
    CASE ty.name
        WHEN 'int' THEN 2147483647
        WHEN 'bigint' THEN 9223372036854775807
        WHEN 'smallint' THEN 32767
        WHEN 'tinyint' THEN 255
    END AS MaxValue,
    CAST(IDENT_CURRENT(SCHEMA_NAME(t.schema_id) + '.' + t.name) * 100.0 / 
        CASE ty.name
            WHEN 'int' THEN 2147483647
            WHEN 'bigint' THEN 9223372036854775807
            WHEN 'smallint' THEN 32767
            WHEN 'tinyint' THEN 255
        END AS DECIMAL(5,2)) AS PercentUsed
FROM sys.tables t INNER JOIN sys.columns c 
  ON t.object_id = c.object_id INNER JOIN sys.types ty 
    ON c.user_type_id = ty.user_type_id
WHERE c.is_identity = 1
ORDER BY PercentUsed DESC;

Did I validate it? Yes. Did it save me time? Also yes.

What This Tells Us

AI-generated SQL is a first draft from a very fast, very confident intern who has read a lot of Stack Overflow, but has never been paged at 2AM when a query plan went sideways.

AI knows syntax. It knows patterns. It does not know your data distribution, your workload, or the way your application actually behaves under load.

Which brings me to the bigger concern.

The Part That Keeps Me Awake at Night

AI isn't just coming for stored procedures. It's coming for the apprenticeship model. For the slow accumulation of hard-won knowledge that separates a senior DBA from someone who can write syntactically correct SQL.

I learned parameter sniffing by debugging a production outage that ruined my weekend. I learned about tempdb contention by watching a server grind to a halt and not understanding why until I dug into wait stats for hours. I learned to read execution plans because nothing else would explain why the same query ran in 200ms on Tuesday and 45 seconds on Wednesday.

What happens when junior DBAs skip that pain? When they trust the AI output because it compiled and simply looked right?

I wrote recently about Intelligent Query Processing guard rails -- how SQL Server's own 'smart' features will silently disable themselves when they detect that they're making performance worse. The optimizer has regression guards. It monitors its own decisions and backs off when they fail.

Can AI coding tools do the same? When ChatGPT suggests a query that causes blocking chains or tempdb spills, does it learn? Does it warn you? Does it even know?

No. It generates. You validate. Or you don't, and you find out later that you should have.

The Societal Layer

I try to keep this blog focused on SQL Server, but it's impossible to talk about AI in our field without acknowledging the broader wave.

AI is being embedded into everything -- hiring decisions, medical diagnoses, financial analysis, content moderation, legal research and so much more. The same confidence that produces plausible-but-wrong T-SQL is producing plausible-but-wrong conclusions in domains where the stakes are MUCH higher than a slow report.

And most people using these tools have even less ability to validate the output than we do.

At least when AI suggests a bad index, I can look at the execution plan and prove that it is wrong. When an AI summarizes a legal contract or flags a medical scan, who's checking? The patient who never saw the contract? The radiologist who now bills for twice as many scans because AI handles the easy ones? Or the doctor who never even opened the folder radiology sent over, because the AI didn't stamp it 'REVIEW'?

I don't have answers. I just notice that the same pattern I see in AI-generated SQL -- confident, plausible, frequently wrong in ways that require expertise to detect -- is showing up everywhere. And the pressure to adopt these tools is immense, regardless of whether we're ready.

Reluctant Advice for Data Professionals

Here's how I'm approaching AI tools in my own work, for whatever it's worth:

1. Use them as a first draft, never a final answer.
AI is good at boilerplate and scaffolding. It's not good at understanding your specific constraints. Generate, then validate. Every time. That is my approach.

2. Never deploy AI-generated code without running it against real data.
Never. Not sample data. Not 'representative' data. Your actual production data distribution. The thing that makes queries slow is usually the thing that makes your data different from the textbook examples.

3. Keep learning the fundamentals.
Execution plans. Wait stats. Index internals. Parameter sniffing. Lock escalation. AI can't debug what it doesn't understand, and neither can you if you outsource your learning of these critical points to AI.

4. Be especially skeptical of 'optimization' suggestions.
Anyone can write a query that returns correct results. But what about writing a query that returns correct results efficiently, at scale, under concurrency -- that's the hard part, and this is exactly where AI tools are weakest, because they can't see what you're actually running against.

5. Document what the AI gave you and what you changed.
Future you (or the next person who manages your code) will want to know which parts were human-validated and which parts were machine-generated hopium. Code comments = good tactic.

Where I Land

I'm not swearing off AI tools. That would be like swearing off power tools because I cut myself with a table saw. The AI tools can be useful. They could save substantial time. Sometimes they may even surface an approach that I wouldn't have thought of.

But I'm also not going to pretend that 'AI wrote my stored procedure' is anything other than a liability statement. The tool doesn't understand your system. It doesn't feel the weight of 3AM pages. It doesn't learn from the outages it causes. Very simply, it does not have common sense.

You do. That's still your job.

There is much speculation, but I don't believe that any of us can really know exactly where this is going yet. The technology is improving fast, and the pressure to adopt is only increasing. But I know that understanding why the query runs the way it does still matters -- maybe more than ever, because when the AI is wrong, someone has to catch it.

And right now, that someone is still us.

More to read

Wednesday, December 10, 2025

How is Microsoft Fabric Structured?

In Part 1, we answered the question What is Microsoft Fabric?

In Part 2 we will cover how Fabric is structured. Specifically:

  • Where the data actually resides
  • How Fabric organizes it
  • How the Lakehouse and Warehouse models relate to one another

NTK -- a Lakehouse is where Fabric keeps large analytics data in open files, but organized just enough that you can still query it like tables.

A Warehouse is Fabric’s table-based view of your data that can be queried with SQL.

The way Fabric stores and structures data is what ultimately determines whether the platform is a good match for your organization.

ONELAKE

Fabric’s storage architecture begins with OneLake, a single logical data lake automatically provisioned for each tenant. You do not create multiple lakes. You do not manage storage accounts. Fabric enforces a unified model.

Under the covers, OneLake uses Azure Data Lake Storage Gen2, but Fabric abstracts those details so every workload uses the same storage foundation. This means that:

  • Spark, SQL, Data Engineering, Real-Time Analytics, and Power BI all write to OneLake
  • Analytics storage becomes centralized
  • Governance and security follow one path instead of many

This is not a new type of storage — it is a new way of managing consistency across analytics workloads.

LAKEHOUSE

A Lakehouse is Fabric’s structured layer on top of OneLake. It stores data in open formats (Delta/Parquet) and maintains managed metadata so that engines can query the same physical data without creating copies.

A Lakehouse provides:

  • Open file-based storage in Delta/Parquet
  • A managed metadata layer
  • Access from Spark for large-scale processing
  • A SQL analytics endpoint operating over the same data

The result is that data engineering and analytics teams stop exchanging extracts and instead work from a single authoritative source.

WAREHOUSE

Fabric’s Warehouse exposes the relational interface — tables, schemas, T-SQL — while still storing all data in OneLake. It does not introduce a new database engine or separate storage. The Warehouse is designed for analytical workloads, not OLTP.

The Warehouse

  • Uses the same underlying open table formats as the rest of Fabric
  • Provides SQL modeling without duplicating datasets
  • Focuses on reporting and analytical query patterns

Functionally, it’s the relational counterpart to the Lakehouse — helpful, structured, but not a replacement for SQL Server.

How do these pieces fit together?

Fabric’s architecture is intentionally consistent:

  • OneLake is the unified storage layer
  • The Lakehouse and Warehouse are two organizational models over that same storage
  • All compute engines — Spark, SQL, BI, and real-time processing — operate against the shared data rather than isolated silos

This is a shift away from the 'pipeline-first, storage-everywhere' model that many organizations fell into. Fabric reduces fragmentation not by adding more engines, but by removing redundant copies and enforcing a single storage foundation.

Why does this matter to SQL Server professionals?

Fabric is not designed to replace SQL Server. It is designed to replace the analytics systems that surround SQL Server — the extracts, one-off lakes, unmanaged BI storage, and the disconnected systems built over time to satisfy individual reporting needs.

When datasets land in Fabric:

  • They are stored in one place
  • They are accessible through multiple engines
  • They are governed consistently
  • They do not require separate operational overhead

Fabric is an analytical unification platform, and its storage model is the reason it works.

COMING NEXT

In Part 3, we will move from structure to movement: how data actually lands in Fabric, which ingestion methods exist today, and how to adopt Fabric without rebuilding your entire environment.

More to Read:

Friday, December 5, 2025

So…What is Microsoft Fabric?

If you live in the SQL Server world, you'll eventually hear someone say 'We’re moving to Fabric.'

For a lot of DBAs and data folks, that sentence is immediately followed by:

  • Is Fabric just Power BI with better marketing?
  • Is it replacing Synapse? Data Factory? Warehouses?
  • Where does SQL Server fit in all of this?
  • ... or, Fabric? What is that?

This post is the first in a short series where I’ll walk through Fabric in simple terms, from a data-person’s point of view: what it is, where it lives, and where it may actually be useful.

At a very high level, Fabric is Microsoft’s unified analytics platform – a single, SaaS-based environment that pulls together things we used to think of as separate:

  • Power BI (BI and reporting)
  • Data Factory (pipelines, data integration)
  • Data engineering (Spark, notebooks, lakehouses)
  • Data warehouse (SQL over analytical storage)
  • Real-time analytics / streaming
  • Data science / ML
  • Databases (operational data in a Fabric-first world)

All of those land on top of a single storage foundation called OneLake. Call it OneLake + a bunch of analytics engines + Power BI, delivered as a SaaS service.

OneLake: OneDrive for data

Microsoft likes to call OneLake OneDrive for data. I think of it more like the place where all the analytics engines are supposed to meet.

If you’re used to juggling multiple storage accounts, subscriptions, and resource groups, Fabric tries to hide that. Every Fabric tenant gets OneLake – a single, logical data lake for analytics data. It is a unified, SaaS-based, multi-tenant storage layer that supports every Microsoft Fabric workload.

Think of it this way:

  • Instead of managing separate data lakes, you get one logical lake.
  • Different engines (i.e., Spark, SQL, Power BI, Real-Time) all read from that same lake.
  • You don’t have to spin up storage accounts, containers, or file systems – Fabric does this for you.

LAKEHOUSE, WAREHOUSE, AND WHY YOU SHOULD CARE

Once you have OneLake, Fabric gives you different shapes of compute and metadata on top of it. Two you’ll hear a lot:

  • Lakehouse – files and tables in OneLake, with Spark and a SQL endpoint over Delta tables.
  • Warehouse – a more traditional, SQL-first analytics experience, still backed by the same storage layer.

Both live in the same OneLake, and both are accessed through Fabric workspaces, alongside Power BI reports, dataflows, pipelines, and more.

Where does SQL Server fit in?

If you spend your days with SQL Server on-prem or Azure SQL, Fabric doesn’t replace that overnight. Instead, think of it as a place to:

  • Land data from your SQL Server / Azure SQL / other sources into OneLake.
  • Shape that data in a lakehouse or warehouse.
  • Serve it back out through Power BI, direct lake, and other Fabric workloads.

Over time, you’ll see more patterns like:

  • Mirroring – continuously replicating data from Azure SQL, Cosmos DB, or other systems into Fabric.
  • Zero-ETL style integration within the Fabric world.
  • Less movement between random storage accounts, more consolidation in OneLake.

For now, Fabric is more about where your analytics land than where your OLTP system lives. Your SQL Servers aren’t going anywhere – but the reporting, analytics, and data science layers may shift into Fabric.

When would I even consider Fabric?

A few situations where Fabric starts to make more sense than just adding another SQL box:

  • You already live in Power BI, and your reports depend on many different data sources.
  • You have multiple competing data platforms – some Synapse here, some Databricks there, some random data lake in one subscription.
  • You are spending more time copying data around than actually analyzing it. Yeah - this is totally you.
  • You’re being asked for real-time or near real-time analytics, not nightly batch only.

Fabric doesn’t magically fix modeling, governance, or bad schema design -- but what it can do is give you one place for analytics data, with multiple engines sharing the same lake.

What's next in this series?

In the follow up Fabric posts, I’ll get out of the marketing slides and focus more on practical views and usage:

  • How to picture OneLake + lakehouse + warehouse if you’re coming from SQL Server.
  • How Fabric thinks about items, workspaces, and capacities.
  • Where Fabric can complement (not replace) the SQL Servers you're already managing.
  • SQL Database in Microsoft Fabric.
  • What Microsoft isn't telling us.

More to Read:

Thursday, December 4, 2025

Detecting When Memory Grant Feedback or Batch Mode Actually Apply (or Don’t)

This post continues from Part 1. Here, we look at how to detect when SQL Server actually applies — or suspends — two of the most impactful Intelligent Query Processing (IQP) features:

  • Memory Grant Feedback (MGF)
  • Batch Mode on Rowstore (BMoR)

These detection techniques use only official SQL Server attributes, DMVs, and extended events.

DETECTING MEMORY GRANT FEEDBACK ACTIVITY OR SUSPENSION

Memory Grant Feedback was introduced in SQL Server 2019 and enhanced in SQL Server 2022+. Microsoft documents several plan attributes that reveal how the engine adjusted or suspended feedback. These attributes appear under the MemoryGrantInfo node in the execution plan.

Use this query to inspect recent cached plans:

-- showplan XML lives in a namespace; without this declaration the XPath returns nothing
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT 
    qs.last_execution_time,
    qp.query_plan.value('(//MemoryGrantInfo/@IsMemoryGrantFeedbackAdjusted)[1]', 'varchar(50)') AS FeedbackAdjusted,
    qp.query_plan.value('(//MemoryGrantInfo/@LastRequestedMemory)[1]', 'int') AS LastRequestedMemoryKB
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//MemoryGrantInfo') = 1;

Meaning of values (based on Microsoft documentation):

  • Yes: Stable — SQL Server applied feedback and stabilized the memory grant.
  • Yes: Adjusting / PercentileAdjusting — feedback is still tuning the grant.
  • No: Feedback disabled — SQL Server suspended MGF after repeated regressions or oscillations.
  • No: FirstExecution / AccurateGrant — initial run or grant deemed already appropriate.

You can also monitor MGF with the lightweight extended event:
memory_grant_updated_by_feedback
It fires each time SQL Server updates a memory grant using feedback.
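A minimal session to capture it -- ring buffer target, nothing fancy:

CREATE EVENT SESSION MGF_Watch ON SERVER
ADD EVENT sqlserver.memory_grant_updated_by_feedback
ADD TARGET package0.ring_buffer
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);

ALTER EVENT SESSION MGF_Watch ON SERVER STATE = START;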

DETECTING BATCH MODE ON ROWSTORE (BMOR)

Batch Mode on Rowstore is available at database compatibility level 150 or higher. However, SQL Server does not guarantee that batch mode will be chosen — it depends entirely on operator eligibility and plan shape. When batch mode is selected, the execution plan exposes the attribute:

  • ActualExecutionMode="Batch"

If you see ActualExecutionMode="Row" for all operators, SQL Server chose not to apply batch mode for that plan. This determination is documented in Microsoft’s IQP operator behavior reference.

WHY PLAN INSPECTION MATTERS

IQP features are adaptive. They can:

  • Apply only after multiple executions
  • Adjust their behavior over time
  • Suspend feedback when it proves unstable
  • Choose batch mode only when supported by the operator pipeline

All of this is visible only by inspecting:

  • The MemoryGrantInfo attributes
  • ActualExecutionMode in the execution plan
  • DMVs (e.g., sys.dm_exec_query_stats)
  • Extended events

This makes plan analysis essential for understanding real query performance behavior.

More to Read:

I will follow up with a Part 3, where we will walk through real-world examples showing:

  • Queries where Memory Grant Feedback stabilized
  • Queries where MGF disabled itself
  • Plans where Batch Mode on Rowstore was selected
  • Plans where BMoR did not apply, and why

Regression Guards — When SQL Server Silently DISABLES Features Because They Make Performance Worse

When Microsoft introduced Intelligent Query Processing in SQL Server 2019 and expanded it in SQL Server 2022 and 2025, the message was simple: upgrade, enable the right compatibility level, and the optimizer will quietly make your queries faster. Features like batch mode on rowstore, memory grant feedback, scalar UDF inlining, and parameter-sensitive plan (PSP) optimization all promise “automatic performance.”

But buried in Microsoft’s documentation is a reality worth understanding: Some IQP features can reduce or discontinue feedback when performance becomes unstable. This is intentional. IQP includes guard rails—safety mechanisms that change or stop certain feedback behaviors if they prove counterproductive.

This post explains why these guard rails exist and which features they affect. In Part 2, we will walk through exactly how to detect when a feature adjusts or stops applying its feedback.

Query processing feedback and stability

Microsoft documents scenarios where feedback-based features adjust their behavior when repeated regressions or oscillations occur. Query Processing Feedback governs several IQP behaviors, including memory grant feedback, degree-of-parallelism feedback, and cardinality estimation feedback.

From Microsoft’s documentation on Memory Grant Feedback:

  • “Feedback persistence is temporarily disabled if there are too many consecutive performance regressions.”
  • “For repeated oscillations, memory grant feedback stops applying adjustments.”

These are concrete examples of guard-rail behavior where SQL Server abandons or suspends feedback when it is no longer beneficial.

Memory grant feedback includes disabling logic

Memory grant feedback adjusts excessively large or small memory grants across executions. When SQL Server detects patterns such as repeated regression or oscillation, the feedback behavior may be suspended for stability.

Batch mode on rowstore is not guaranteed

Batch mode on rowstore (BMoR) is one of the most widely promoted IQP features. Microsoft documents the feature and its applicability, but does not guarantee that batch mode will always appear in a given execution plan. Its use depends on whether the optimizer determines that batch mode applies to the operators and plan shape.

Scalar UDF inlining has strict safety checks

Scalar UDF inlining can dramatically improve performance, but Microsoft lists many scenarios where inlining is prohibited—including cases added through cumulative updates to prevent incorrect behavior.

Microsoft states:
“The optimizer may decide not to inline a scalar UDF…”
“Inlining will be skipped if certain constructs are detected.”

PSP optimization is not always applied

Parameter-Sensitive Plan Optimization (PSP) solves many parameter sniffing problems—but only when queries meet the documented eligibility criteria.

Microsoft states:
“PSP optimization is only applied to eligible statements.”

Where to read the fine print

If you want to go deeper into the official details, the Microsoft documentation pages for each feature discussed above are worth reading in full.

In Part 2, we look at how to detect when SQL Server has adjusted or suspended feedback behavior—using DMVs, extended events, showplan attributes, and more.

Wednesday, December 3, 2025

When “Optimized Locking” Slows You Down: A Real SQL Server 2025 Regression

SQL Server 2025 introduces a feature called Optimized Locking, designed to reduce blocking across read–modify workloads. In many environments it helps, but certain mixed workloads experience longer lock waits, stalled S → U → X transitions, and even occasional threadpool saturation.

The S → U → X upgrade problem

A common concurrency pattern:

  • Session A takes an S lock (shared)
  • Session B takes an S lock (shared)
  • Session A attempts to upgrade to U/X (update, exclusive)
  • Session B still holds its S lock (shared)

Optimized Locking changes when upgrades are attempted, and under load, this produces a wait chain that didn’t exist before.

How to detect you’re hitting it


SELECT 
    r.session_id,
    r.status,
    r.wait_type,
    r.wait_time,
    r.blocking_session_id,
    r.command,
    t.text AS sql_text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.wait_type IN ('LCK_M_U', 'LCK_M_X', 'LCK_M_S', 'LCK_M_IS');

SELECT 
    request_session_id,
    resource_type,
    resource_description,
    request_mode,
    request_status
FROM sys.dm_tran_locks
ORDER BY request_session_id;

Temporary workaround

MSFT released a trace flag to disable the new behavior while refinements are underway:


DBCC TRACEON(12324, -1);

Disable the trace flag to restore default behavior:


DBCC TRACEOFF(12324, -1);

When optimized locking works well

  • Short, predictable OLTP writes
  • Minimal row hotspots
  • Workloads that don’t mix high-volume reads with updates

If your environment slowed down after upgrading to SQL Server 2025, this should be one of the first areas you evaluate. Mixed read-and-update hotspots are the most likely candidates for unexpected regressions under the new locking model.

More to read

Wednesday, November 19, 2025

TempDB: The Unsung Hero Living in Your SQL Server's Basement

You know that friend who lets everyone crash at their place after parties? That's tempdb.  It's SQL Server's communal workspace, shared playground, and sometimes... total chaos zone.

Why TempDB is Having All the Fun (And All the Problems)

Unlike your user databases with their carefully planned schemas, tempdb is like a Vegas hotel room: what happens there should stay there, everything resets on checkout (restart)... and sometimes you find some really weird stuff left behind.

Every single session on your SQL Server shares this ONE database. It's handling:

  • Your temporary tables (#temp) and table variables (@table)
  • Sort operations that spill from memory
  • Row versioning for snapshot isolation
  • Internal worktables for spools, hashing, and sorting
  • DBCC CHECKDB operations
  • Online index builds


The '8 Files or Bust' Rule

Here's the deal: tempdb defaults to one data file. That's like having one bathroom at a Metallica concert.  It's going to get ugly.

The golden rule: Configure one tempdb data file per CPU core, up to 8 files. Beyond 8, you're probably not gaining much unless you are in a very specific high-contention scenario.

Why? SQL Server uses a proportional fill algorithm, spreading allocations across the data files. More files = more PFS/GAM/SGAM pages to allocate from = less allocation contention.  It's beautiful when it works.

-- Are your tempdb files sized equally?
SELECT name, size/128.0 AS size_mb, growth
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';
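
If that query shows one lonely file (or wildly unequal sizes), adding files is straightforward. A hypothetical example for a 4-core box -- the logical names, paths, and sizes are placeholders, so adjust for your server:

-- Make the existing file match the target size and growth first
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev,  SIZE = 4096MB, FILEGROWTH = 256MB);

-- Then add equally sized files (one per core, up to 8)
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 4096MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf', SIZE = 4096MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'T:\TempDB\tempdev4.ndf', SIZE = 4096MB, FILEGROWTH = 256MB);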

-- Who's hogging tempdb space RIGHT NOW?
SELECT 
    s.session_id,
    DB_NAME(r.database_id) AS database_name,
    CAST(SUM(u.user_objects_alloc_page_count) * 8.0/1024 AS DECIMAL(10,2)) AS user_objects_mb,
    CAST(SUM(u.internal_objects_alloc_page_count) * 8.0/1024 AS DECIMAL(10,2)) AS internal_objects_mb
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_db_task_space_usage u ON r.session_id = u.session_id
INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
WHERE u.user_objects_alloc_page_count > 0 
   OR u.internal_objects_alloc_page_count > 0
GROUP BY s.session_id, r.database_id
ORDER BY SUM(u.user_objects_alloc_page_count + u.internal_objects_alloc_page_count) DESC;


Pro Tips That'll Save Your Bacon

  1. Size all files identically - If one file is 1GB and another is 10GB, proportional fill sends most new allocations to the file with the most free space -- the big file becomes the hotspot and you're right back to contention.  Don't do this to yourself.

  2. Set reasonable autogrowth - 64MB or 10% on a 500GB file?  Please.  Use 256MB for data and 128MB for log files.  Autogrowth events are expensive.

  3. Instant file initialization is your friend - Enable it at the Windows level. Tempdb is recreated from scratch on every restart, and you don't want to wait for Windows to zero out 100GB of data files. (Quick check after this list.)

  4. Put it on fast storage - Tempdb is high-throughput, low-durability. SSDs are perfect.  It doesn't need to be on your expensive SAN.
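
For tip #3, you don't have to guess whether IFI is actually on -- recent builds expose it on the service row (a quick sketch):

-- 'Y' means the engine service account can skip zero-initialization of data files
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE 'SQL Server (%';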


Next time someone says 'the database is slow', check tempdb first:

-- What's waiting on tempdb?
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH_%'
   OR wait_type LIKE 'PAGEIOLATCH_%'
ORDER BY wait_time_ms DESC;

If you see PAGELATCH_UP or PAGELATCH_EX waits on 2:1:1 or 2:1:3 (PFS and SGAM pages), you've got allocation contention.  Time to add more files.
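
Cumulative wait stats won't tell you where the latch waits happened, though. A quick live check while the pain is happening (a sketch): look for requests whose wait_resource points into database 2.

-- Live requests latched on tempdb pages (wait_resource format is db:file:page)
SELECT session_id, wait_type, wait_resource, wait_time
FROM sys.dm_exec_requests
WHERE wait_type LIKE 'PAGELATCH_%'
  AND wait_resource LIKE '2:%';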

The Bottom Line

TempDB is the workhorse of your SQL Server instance. Treat her right: give her more space than she needs, multiple files, fast disk, and for the love of all things holy, size those files equally.

Your future self (and your users) will thank you.  I promise.  

More to read

Thursday, November 13, 2025

Same Query. Same Index. Totally Different Performance. Why?

In electronic trading, nothing is more frustrating than a query that:

  • Runs in 200 ms during UAT
  • Then takes 45 seconds in PROD at market open…

With the same proc, same index, and same code.

You rebuild indexes and update stats WITH FULLSCAN... but it still behaves like a moody market maker on a Monday.

Most likely not bad indexing. More likely Parameter Sniffing.

Quick Reminder: What is Parameter Sniffing?

SQL Server does something clever when you run a parameterized query:

  1. On the first execution, it “sniffs” the parameter values.
  2. It compiles an execution plan optimized for those values.
  3. It caches and reuses that plan for future executions.

Most of the time this is a big win, but when your data distribution is skewed, that cached plan can be perfect for one set of orders and terrible for another. Think:

  • One trader ID with a few intraday fills.
  • Another trader ID with hundreds of micro-lots across USD/JPY and EUR/USD.

Where Indexes Enter the Drama

Consider a simplified orders table:

CREATE TABLE dbo.Orders
(
  OrderID          bigint IDENTITY(1,1) PRIMARY KEY,
  TimeOfExecution  datetime2(3),
  TraderID         int,
  Symbol           varchar(16),
  Side             char(1),      -- 'B' = Buy, 'S' = Sell
  Qty              int,
  Price            decimal(18,4),
  Status           char(1)       -- 'O' = Open, 'F' = Filled, 'C' = Cancelled
);

CREATE INDEX idx_Orders_TimeOfExec_TraderID
    ON dbo.Orders (TimeOfExecution, TraderID);

CREATE INDEX idx_Orders_TimeOfExec_Symbol
    ON dbo.Orders (TimeOfExecution, Symbol);

And a stored procedure on a busy reporting box:

CREATE OR ALTER PROCEDURE dbo.usp_GetTradersTrades
    @TraderID int,
    @FromTime datetime2(3),
    @ToTime   datetime2(3)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderID, TraderID, Symbol, Side, TimeOfExecution, Qty, Price, Status
    FROM dbo.Orders
    WHERE TraderID        = @TraderID
      AND TimeOfExecution >= @FromTime
      AND TimeOfExecution <  @ToTime;
END;
GO

Run 1: Quiet Retail Trader

EXEC dbo.usp_GetTradersTrades
    @TraderID = 101,   -- 200 rows
    @FromTime = '2025-11-13 09:00:00',
    @ToTime   = '2025-11-13 16:00:00';

SQL Server sniffs these parameters, sees a very selective trader and creates a plan. The plan is cheap, tight, and fast -- and now it is cached.

Run 2: HFT / Flow Account from the Same Proc

EXEC dbo.usp_GetTradersTrades
    @TraderID = 9001, -- 10 thousand rows
    @FromTime = '2025-11-13 09:00:00',
    @ToTime   = '2025-11-13 16:00:00';

Now we hit an HFT account with thousands of fills across the day. The small-trader plan is reused for this account -- and quickly degenerates into a slow, row-by-row grind with massive I/O.

Did the index suddenly become bad? No. The cached plan is simply wrong for these parameters, and SQL Server reuses it anyway.

Hello Parameter Sniffing.

  • The 'perfect' index can amplify the gap between good and bad plans.
  • You can get:
    • One plan that loves the quiet retail trader, and
    • The same plan that punishes the HFT account at the opening bell.

If your query runs fast sometimes and face-plants other times with the same index, start suspecting plans + parameters, rather than just the index.

Fast Triage in a Trading Environment

Run the proc with a good TraderID and a bad TraderID:

SET STATISTICS IO, TIME ON;

EXEC dbo.usp_GetTradersTrades @TraderID = 101,  @FromTime = ..., @ToTime = ...;  -- Good
EXEC dbo.usp_GetTradersTrades @TraderID = 9001, @FromTime = ..., @ToTime = ...;  -- Bad

If one execution wants a seek + nested loops and the other really needs a scan + hash join, you’re looking at parameter sniffing.
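
You can also ask the plan cache which value the plan was actually compiled for. A sketch using the showplan XML -- the XQuery just pulls ParameterCompiledValue for @TraderID out of the cached proc plan:

SELECT 
    cp.usecounts,
    qp.query_plan.value('
        declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
        (//p:ColumnReference[@Column="@TraderID"]/@ParameterCompiledValue)[1]',
        'varchar(50)') AS compiled_trader_id
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE cp.objtype = 'Proc'
  AND st.text LIKE '%usp_GetTradersTrades%';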

Force a Fresh Plan on the Bad Execution

EXEC dbo.usp_GetTradersTrades
    @TraderID = 9001,
    @FromTime = '2025-11-13 09:00:00',
    @ToTime   = '2025-11-13 16:00:00'
OPTION (RECOMPILE);

Practical Fixes That Don’t Suck

1. Targeted OPTION (RECOMPILE)

For procedures that:

  • Run less frequently throughout the day (end-of-day reports, risk aggregation), or
  • Are extremely sensitive to trader / symbol distribution,

You can selectively recompile:

SELECT OrderID, TraderID, Symbol, Side, TimeOfExecution, Qty, Price, Status
FROM dbo.Orders
WHERE TraderID        = @TraderID
  AND TimeOfExecution >= @FromTime
  AND TimeOfExecution <  @ToTime
OPTION (RECOMPILE);

2. 'Optimize For' the Common Case

If 90% of calls are for high-volume flow accounts, you can bias the optimizer:

SELECT ...
FROM dbo.Orders
WHERE TraderID        = @TraderID
  AND TimeOfExecution >= @FromTime
  AND TimeOfExecution <  @ToTime
OPTION (OPTIMIZE FOR (@TraderID UNKNOWN));

UNKNOWN pushes the optimizer toward an average selectivity plan instead of letting one weird trader dominate the compilation. You avoid full recompilation while still stabilizing behavior.
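
Worth noting: UNKNOWN optimizes for average density, not for your heavy hitters specifically. If you truly want to bias toward the firehose accounts, OPTIMIZE FOR with a representative literal does that instead (9001 is the hypothetical flow account from earlier):

SELECT ...
FROM dbo.Orders
WHERE TraderID        = @TraderID
  AND TimeOfExecution >= @FromTime
  AND TimeOfExecution <  @ToTime
OPTION (OPTIMIZE FOR (@TraderID = 9001));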

3. Split the Logic on Purpose

Sometimes your data model really does have two worlds:

  • A small set of ultra-high-volume trader IDs.
  • A very long tail of low-volume accounts.

In that case, it can be cleaner to admit that in code:

IF @TraderID IN (9001, 9002, 9003)  -- known firehose accounts
BEGIN
    SELECT ...
    FROM dbo.Orders
    WHERE TraderID        = @TraderID
      AND TimeOfExecution >= @FromTime
      AND TimeOfExecution <  @ToTime;
END
ELSE
BEGIN
    SELECT ...
    FROM dbo.Orders WITH (INDEX(idx_Orders_TimeOfExec_TraderID))
    WHERE TraderID        = @TraderID
      AND TimeOfExecution >= @FromTime
      AND TimeOfExecution <  @ToTime;
END

Not pure from a theoretical standpoint, but brutally clear when you’re on call during a volatile session and need behavior you can predict.
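
One refinement if you go this route: the branches on their own don't fully protect you from sniffing, so a common variant wraps each branch in its own sub-procedure -- each one then gets its own cached plan compiled for its own kind of trader. A sketch (procedure names are hypothetical):

IF @TraderID IN (9001, 9002, 9003)         -- known firehose accounts
    EXEC dbo.usp_GetTradersTrades_Firehose @TraderID, @FromTime, @ToTime;
ELSE
    EXEC dbo.usp_GetTradersTrades_Retail   @TraderID, @FromTime, @ToTime;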

Index Tuning with Market Behavior in Mind

When tuning indexes for trading workloads, think beyond just the columns...

  1. Data distribution
    Are there whale traders or symbols that dwarf everything else? Are you mixing those in the same proc as tiny retail accounts?
  2. Parameter sets
    Test both:
    • Selective cases (light traders, off-peak times).
    • Worst-case flow (HFT / heavy flow accounts at market open).
  3. Plan stability
    If the plan flips between instant and agonizing with no code changes -- you’re not done tuning yet.

The One-Line Takeaway

When performance flips between instant and miserable on the same proc, in the same trading system, your index probably isn’t lying to you – your cached plan is.

Keep tuning your indexes, but be aware of Parameter Sniffing. In electronic trading, indexes and parameter sniffing are a package deal. Ignore one, and the other will eventually creep up on you.

More to read