Saturday, June 6, 2026

Globex, Globx, and GLOBEX: Fuzzy String Matching in SQL Server 2025

You have a customer list. Somewhere in it, 'Globex Corporation' got entered four different ways by four different people at four different times. One typed 'Globx Corporation'. One typed 'Globex Corporaton'. One held the shift key down and gave you 'GLOBEX CORPORATION'. They are all the same company, but your reports are counting them as four different entities.

SQL Server never had a good built-in answer for this. SOUNDEX has been in the box forever, but it was built to match names that sound alike, not to measure how close two typed strings actually are -- a different job entirely. So, we either hand-wrote the matching logic ourselves or pushed the whole problem out to SSIS Fuzzy Lookup or CLR. SQL Server 2025 finally ships the math itself. Four functions, no UDF, no external code. This post is the demo: the old pain, the four new functions, a real dedup query, and the one gotcha that may bite you on day one.

The Old Way: Roll Your Own

The closest thing we had built in was SOUNDEX, and it was never up to this job. It is phonetic, it is English-centric, and it crushes every word down to a four-character code, so collisions are constant. It can tell you two words sound alike. It cannot tell you that 'Globex Corporaton' is one keystroke from 'Globex Corporation'.

-- The tool we used to reach for. It answers a different question than the one you have.
SELECT SOUNDEX('Globex'), SOUNDEX('Globx');

So most shops did one of two things. They hand-wrote a function to count the differences between two strings and ran it across a million rows -- the per-row, parallelism-killing scalar UDF curse I wrote about earlier this month. Or they gave up on T-SQL entirely and moved the work to SSIS or a CLR assembly. Both work, but neither is something you want to manage as an 'extra' in your data layer.

Turn It On First

Two things have to be true before the four new functions can be used. The database has to be at the SQL Server 2025 compatibility level (170), and the preview-features database scoped configuration has to be on.

-- Check where you are
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Set the 2025 compatibility level if you are not there yet
ALTER DATABASE [YourDbName] SET COMPATIBILITY_LEVEL = 170;

-- Turn on the preview features for this database
ALTER DATABASE SCOPED CONFIGURATION SET PREVIEW_FEATURES = ON;

Skip that last step and SQL Server does not tell you what you missed. It just acts like the function does not exist, because as far as it is concerned, it doesn't:

Msg 195, Level 15, State 10, Line 1
'EDIT_DISTANCE' is not a recognized built-in function name.

If you see that on SQL Server 2025, your first check is whether the preview-features setting is ON.

USE DBA; -- Change to the db where you've enabled it.
SELECT 
    CASE value 
        WHEN 1 THEN 'TRUE (Enabled)' 
        ELSE 'FALSE (Disabled)' 
    END AS Is_Preview_Features_Enabled
FROM sys.database_scoped_configurations
WHERE name = 'PREVIEW_FEATURES';

Use the ALTER statement above to set it up if it's not enabled. Otherwise, keep moving.

The Four Functions

There are four functions, built on two different algorithms that answer the same question -- how far apart are two strings? There's two because they just measure it differently.

The first algo is Levenshtein, which measures what's called the edit distance, which is the number of single-character changes -- add a letter, remove one, swap one -- it takes to turn one string into the other. 'Globx' into 'Globex' is one edit, just add the missing e. 'Cat' to 'Dog' is three. The smaller that number, the closer the two strings are to matching.

The second algo is Jaro-Winkler. It scores similarity on a different formula, and the part that matters is that it gives extra credit when two strings start the same way. We will see exactly where that pays off in a minute.

Each algorithm comes as a pair. One function hands you the raw distance, the other turns it into a 0-to-100 similarity score.

Function Returns Range
EDIT_DISTANCE Edits to turn one string into the other
(insert, delete, substitute)
0 = identical,
higher = farther apart
EDIT_DISTANCE_SIMILARITY Similarity score from the edit distance 0 = no match,
100 = full match
JARO_WINKLER_DISTANCE Distance, weighted toward strings that
agree on their opening characters
0 = identical,
1 = no match
JARO_WINKLER_SIMILARITY Similarity, same prefix weighting 0 = no match,
100 = full match

One footnote before we move on. Microsoft's docs call this Damerau-Levenshtein and list transposition as a fourth operation -- until they admit a few lines down that transpositions are not supported yet. So today it is plain Levenshtein: three operations, not four. Test a straight swap like 'AB' against 'BA' and you get 2, not the 1 a real Damerau-Levenshtein would give you.

One other thing to keep straight is that distance and similarity run in opposite directions. A distance of 0 is a perfect match and climbs as the strings differ. A similarity of 100 is a perfect match and falls toward 0. So a good match means a low distance but a high similarity. Get that backwards in a WHERE clause and you will filter out exactly the rows you meant to keep.

A Quick Look at All Four

Here are five pairs run through all four functions at once, so you can see how they disagree.

SELECT T.source_string,
       T.target_string,
       EDIT_DISTANCE(T.source_string, T.target_string)            AS ED_Distance,
       JARO_WINKLER_DISTANCE(T.source_string, T.target_string)    AS JW_Distance,
       EDIT_DISTANCE_SIMILARITY(T.source_string, T.target_string) AS ED_Similarity,
       JARO_WINKLER_SIMILARITY(T.source_string, T.target_string)  AS JW_Similarity
FROM (VALUES ('Black', 'Red'),
             ('Colour', 'Yellow'),
             ('Colour', 'Color'),
             ('Microsoft', 'Msft'),
             ('Regex', 'Regex')
     ) AS T(source_string, target_string);

Look at the 'Microsoft' / 'Msft' row. Edit distance sees five changes and scores it a weak 44 because half the letters are gone. Jaro-Winkler is kinder at 51, because the two share an opening 'M' and the same letters in order. That difference is the whole personality split between the two algorithms, and it is why abbreviations are where they part ways.

The Real Job: Catching Duplicate Customers

The warm-up used literals. Here is the actual problem -- a staging table full of names a human typed, and a known-good master value you want to match them against.

CREATE TABLE dbo.CustomerImport
(
    RowID       INT IDENTITY(1,1) PRIMARY KEY,
    EnteredName NVARCHAR(100) NOT NULL
);

INSERT dbo.CustomerImport (EnteredName)
VALUES (N'Globex Corporation'),   -- clean
       (N'Globexx Corporation'),  -- doubled letter
       (N'Globex Corporaton'),    -- dropped an i
       (N'Globx Corporation'),    -- dropped an e
       (N'Globex Corp'),          -- abbreviation
       (N'GLOBEX CORPORATION');   -- shift key never came up

Now score every entered name against the master string and attach a verdict. The similarity score does the work; the CASE expression just turns a number into a decision a human can act on.

SELECT EnteredName,
       EDIT_DISTANCE(EnteredName, N'Globex Corporation')            AS Edits,
       EDIT_DISTANCE_SIMILARITY(EnteredName, N'Globex Corporation') AS Similarity,
       CASE
           WHEN EDIT_DISTANCE_SIMILARITY(EnteredName, N'Globex Corporation') >= 95
               THEN 'Almost certainly the same'
           WHEN EDIT_DISTANCE_SIMILARITY(EnteredName, N'Globex Corporation') >= 85
               THEN 'Likely the same - review'
           WHEN EDIT_DISTANCE_SIMILARITY(EnteredName, N'Globex Corporation') >= 70
               THEN 'Possible match'
           ELSE 'Probably different'
       END AS Verdict
FROM dbo.CustomerImport
ORDER BY Similarity DESC;

The three typos land at 94 and 95 and all get flagged for review or better -- that's the point, a single dropped or doubled letter barely moves the score on an eighteen-character string. But the bottom two rows are also the same company, and both fell through to 'Probably different'. Those two failures are what you need to understand before you trust this in production.

'Globex Corp' scored 61 -- abbreviations are hard for edit distance.

'Globex Corp' is just the company name shortened. But turning 'Globex Corp' into 'Globex Corporation' means adding seven letters, so the edit distance is seven and the score drops to 61. Edit distance only counts the changes; it has no idea the seven missing letters spell a harmless 'oration'. Any abbreviation looks far apart for the same reason. Jaro-Winkler does better here, because the two names start identically, and that is exactly what it rewards. So when your data is full of 'Corp', 'Inc', and 'Ltd', try both and compare the scores.

'GLOBEX CORPORATION' scored 17 -- and that is the gotcha that gets everyone.

The Gotcha: These Functions Are Case-Sensitive

This is the one to circle. The same company name, in capitals, scored 17 out of 100. Not because the algorithm is wrong, but because right now these functions ignore your collation entirely. Microsoft says so plainly in the docs: "the functions don't adhere to the comparison semantics defined by collation settings, such as case insensitivity and other collation-specific rules."

So even on a case-insensitive database, which most of them are, an uppercase letter and its lowercase version count as a change. 'GLOBEX CORPORATION' differs from 'Globex Corporation' in fifteen places, so the engine counts fifteen edits and scores it 17.

-- Same company. Case-sensitive comparison thinks they are barely related.
SELECT EDIT_DISTANCE_SIMILARITY(N'Globex Corporation', N'GLOBEX CORPORATION') AS RawScore;

-- Normalize the case on both sides first, and the truth comes back.
SELECT EDIT_DISTANCE_SIMILARITY(
           UPPER(N'Globex Corporation'),
           UPPER(N'GLOBEX CORPORATION')) AS NormalizedScore;

The fix is one fast function call away. Wrap both sides in UPPER or LOWER. You need to pick one and stay consistent so that case never enters into the comparison. Until Microsoft teaches these functions to honor collation, do this on every fuzzy comparison you write. Every time.

EDIT_DISTANCE or JARO_WINKLER?

You have two algorithms. They are good at different things.

Reach for When
EDIT_DISTANCE Typos, transpositions, dropped or doubled
characters -- short, roughly equal-length strings
JARO_WINKLER Names and strings that share a strong start
but vary in length, including abbreviations

Edit distance is literal. It simply counts changes. Great for the 'Corporaton' kind of typo, lousy at the 'Corp' kind, because a long missing tail reads as a pile of edits. Jaro-Winkler forgives a different ending when the beginning matches, which is how human-typed names usually drift. When you are unsure, run both and compare the scores, the way my warm-up query above did. Where they disagree is where your judgment goes.

The One Thing to Read First

Here is the honest version. On SQL Server 2025, fuzzy string matching is still in preview. It ships in the box, it works the moment you flip the database scoped configuration, and you can build and test on it today. But preview is not GA. This is the same caveat I raised on the native JSON data type, since preview behavior can change between cumulative updates.

Where Fuzzy matching status
SQL Server 2025 (box) In preview -- requires PREVIEW_FEATURES = ON
Azure SQL Database Generally available
Azure SQL Managed Instance Available on the SQL Server 2025 / Always-up-to-date update policy
SQL database in Microsoft Fabric Generally available

One more thing, beyond the case issue: these functions run per row. Comparing two literals is free. Comparing a column against a master across a few million rows has real compute costs on every row. Narrow the field first with something cheap, like a same-first-letter filter, a length window, or an existing index, and then let the fuzzy function do its work on what's left. Don't point it at the whole table and just hope it works out.

The Bottom Line

The hand-rolled Levenshtein UDF was never wrong. It was just the best we had, and it dragged a per-row scalar function through every query that used it. SQL Server 2025 puts the math in the engine where it belongs, and the whole pattern collapses to a function call.

Edit Distance and Jaro-Winkler are the two newly introduced built-in fuzzy string-matching functions designed to identify typos, duplicate entries, and approximate text matches. This is a nice uptick from the custom T-SQL scripts we had to write and/or the CLR assemblies we used to achieve the same before SQL Server 2025.

Two things to remember. The functions are case-sensitive today regardless of your collation, so fold the case on both sides before you compare. And edit distance and Jaro-Winkler are not interchangeable. One counts keystrokes, the other forgives a different ending. Get those two right and you can finally answer 'is this the same customer?' within T-SQL.

Oh yeah -- also keep an eye on that preview flag until Microsoft moves it to GA in the box.

More to Read

Microsoft Learn: What is fuzzy string matching?
Microsoft Learn: EDIT_DISTANCE (Transact-SQL)
Microsoft Learn: JARO_WINKLER_SIMILARITY (Transact-SQL)
Microsoft Learn: ALTER DATABASE SCOPED CONFIGURATION (preview features)
sqlfingers inc: Cursed SQL -- Six Queries That Run Fine Until They Don't
sqlfingers inc: Your JSON Column Was Never a JSON Column

Wednesday, June 3, 2026

The AI Didn't Hack Mexico. SQL Injection Did.

I live in Cozumel, MX part-time. So when one person walked off with about 150GB of Mexican government data this winter — tax records, voter rolls, civil registry files, government logins — it was not abstract news to me. It was my neighbors' data.

The headline everyone ran with was 'AI hacked the Mexican government.' That is the wrong lesson. The AI did not invent a new way in. It ran the oldest trick in the book at machine speed: SQL injection.

What Happened

On February 25, 2026, the Israeli firm Gambit Security disclosed the campaign, first reported by Bloomberg. One unidentified operator, working for about a month starting in December 2025, hit roughly ten government bodies; the federal tax authority (SAT), the electoral institute (INE), several state governments, Mexico City's civil registry, even Monterrey's water utility.

The toolkit was not a nation-state arsenal. It was a consumer AI subscription. The operator jailbroke Claude Code, which means they talked it past its safety guardrails, with Spanish-language prompts, framing every request as an authorized 'bug bounty' and role-playing the model as an 'elite hacker.' When Claude balked, they switched over to GPT-4.1 to keep moving. Total haul: about 150GB, and roughly 195 million identities exposed.

The Part Nobody Wants to Say Out Loud

Gambit found at least 20 vulnerabilities exploited across those systems. None were exotic. Just exposed admin panels. Default credentials. Unpatched web apps matching CVE-2023-era patterns. Login pages with no rate limiting. The kind of risks sitting in thousands of shops right now — maybe even one that you support.

Here is the line from the logs that should stop every DBA cold. The operator had a frontier AI model at the keyboard. What did they have it write? Python SQL injection payloads against login pages on .gov.mx, built around a string straight out of a 2006 tutorial:

' UNION SELECT username, password FROM users--

That is not a 2026 attack. That is a 2006 attack, run at machine speed by something that never gets tired of rephrasing the prompt. Gambit's CEO called it this way: "This reality is changing all the game rules we have ever known." Maybe so. But the door it walked through is one we have known how to lock for twenty years.

SQL Injection, Still Undefeated in 2026

If you concatenate user input into a query string, you are vulnerable. It does not matter how clever your firewall is. Here is the whole problem and the whole fix, end to end. Run it on any version, sp_executesql has been there since forever.

CREATE TABLE dbo.Users
(
    UserId       int IDENTITY(1,1) PRIMARY KEY,
    Username     nvarchar(128) NOT NULL,
    PasswordHash nvarchar(256) NOT NULL
);

INSERT dbo.Users (Username, PasswordHash)
VALUES (N'rlewis', N'hash_aaa'),
       (N'admin',  N'hash_zzz');

The vulnerable pattern

Build the query by gluing the input straight into the string, and you have handed over the keys to whoever is typing.

CREATE OR ALTER PROCEDURE dbo.GetUser
    @Username nvarchar(128)
AS
BEGIN
    DECLARE @sql nvarchar(max);

    SET @sql = N'SELECT Username, PasswordHash
                 FROM dbo.Users
                 WHERE Username = ''' + @Username + N'''';

    EXEC (@sql);
END;

Call it normally and it behaves:

EXEC dbo.GetUser @Username = N'rlewis';

Now feed it the kind of input the attacker used:

EXEC dbo.GetUser @Username = N''' OR 1=1 --';

The string your server builds and then runs becomes this:

SELECT Username, PasswordHash
FROM dbo.Users
WHERE Username = '' OR 1=1 --'

Every row in the table comes back. Username and password hash for every account, handed over by your own stored procedure. That is the breach, in miniature.

The fix: parameterize

The query text is fixed and the input rides in as a parameter, so the engine compiles the command before it ever sees the value. The input can only ever be a username to look up, never code to run. That is the whole difference: concatenation lets the attacker write SQL, parameters only let them supply a value.

CREATE OR ALTER PROCEDURE dbo.GetUser_Safe
    @Username nvarchar(128)
AS
BEGIN
    DECLARE @sql nvarchar(max);

    SET @sql = N'SELECT Username, PasswordHash
                 FROM dbo.Users
                 WHERE Username = @Username';

    EXEC sys.sp_executesql
        @sql,
        N'@Username nvarchar(128)',
        @Username = @Username;
END;

Same malicious input, run again:

EXEC dbo.GetUser_Safe @Username = N''' OR 1=1 --';

Zero rows. The input is read as a literal username that simply does not exist, not as a condition:

(0 rows affected)

What This Means for You

The uncomfortable part of this story is how little of it is about AI. Strip the model out and you still have unpatched servers, over-privileged accounts, and login pages that a 2006 worm could have walked through. AI did not lower your defenses. It lowered the skill and the patience required to find the gaps you already had.

So the homework has not changed. It has only become more pressing.

Control Why it matters now
Parameterize everything No concatenated user input, anywhere. sp_executesql or parameters, every time.
Patch on a real cadence The 20 flaws were known and old. A model can find every one of them in minutes.
Least privilege A compromised login should reach almost nothing. Scope every account down.
Rate limit and lock out Machine-speed credential stuffing dies fast against throttling and lockouts.
Watch behavior, not signatures No human types thousands of login attempts a minute. Alert on the pattern.

The AI is new. The way in was not. Lock the door we have known about since SQL Server 6.5.

More to Read

Security Affairs: Claude Code abused to steal 150GB in cyberattack on Mexican agencies
HawkEye: How hackers used Claude to breach the Mexican government (the full attack chain)

Tuesday, June 2, 2026

Cursed SQL: Six Queries That Run Fine Until They Don't

Sometimes SQL can be cursed. It runs fine in dev and makes it through testing without issue. Then it gets to production, the table grows, the load spikes, and things turn on you at 2 a.m.

The cursed SQL returns answers. The problem is that the answer is wrong, or slow, or both -- and SQL Server never says a word. That's the curse. Here are six good examples.

CREATE TABLE dbo.Orders (
    OrderID     INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID  INT          NOT NULL,
    AccountNo   VARCHAR(20)  NOT NULL,
    OrderDate   DATETIME2    NOT NULL,
    Amount      DECIMAL(10,2) NOT NULL
);

-- A covering index for some of our victims below
CREATE INDEX idx_Orders_CustomerID ON dbo.Orders (CustomerID) INCLUDE (Amount);

-- And one on the columns we will abuse
CREATE INDEX idx_Orders_AccountNo ON dbo.Orders (AccountNo);
CREATE INDEX idx_Orders_OrderDate ON dbo.Orders (OrderDate);

INSERT dbo.Orders (CustomerID, AccountNo, OrderDate, Amount)
SELECT TOP (500000) -- 500K rows, enough to make the bad plans hurt (TOP caps the cross join)
    ABS(CHECKSUM(NEWID())) % 1000,
    CAST(ABS(CHECKSUM(NEWID())) % 100000 AS VARCHAR(20)),
    DATEADD(DAY, -(ABS(CHECKSUM(NEWID())) % 1000), SYSDATETIME()),
    (ABS(CHECKSUM(NEWID())) % 50000) / 100.0
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

Curse 1: The Cursor That Should Have Been a Set-Based Manipulation

The classic. You need to update every row, so you reach for the tool that processes one row at a time. It feels procedural. It feels safe. It is neither.

DECLARE @id INT, @amt DECIMAL(10,2);
DECLARE c CURSOR FOR 
SELECT OrderID, Amount FROM dbo.Orders;
OPEN c;
FETCH NEXT FROM c INTO @id, @amt;

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders
    SET Amount = @amt * 1.05
    WHERE OrderID = @id;

    FETCH NEXT FROM c INTO @id, @amt;
END

CLOSE c;
DEALLOCATE c;

The cursor spells out a procedure: fetch a row, update that row, fetch the next, repeat half a million times. Each pass is its own latch, lock, and log record. But SQL Server is a set-based engine, and a safer approach is a working table you walk in batches with a WHILE loop, updating through a join instead of fetching one row at a time.

-- Stage the rows to process in a working table with an identity column
CREATE TABLE #Work (
    RowID    INT IDENTITY(1,1) PRIMARY KEY,
    OrderID  INT NOT NULL
);

INSERT #Work (OrderID)
SELECT OrderID FROM dbo.Orders;

DECLARE @BatchStart INT = 1,
        @BatchSize  INT = 10000,
        @MaxRow     INT;
SELECT @MaxRow = MAX(RowID) FROM #Work;

WHILE @BatchStart <= @MaxRow
BEGIN
    UPDATE o
    SET o.Amount = o.Amount * 1.05
    FROM dbo.Orders o INNER JOIN #Work w 
      ON o.OrderID = w.OrderID
    WHERE w.RowID >= @BatchStart
    AND w.RowID <  @BatchStart + @BatchSize;

    SET @BatchStart = @BatchStart + @BatchSize;
END

DROP TABLE #Work;

That is the set-based manipulation: stage the rows in a working table keyed by an identity column, then walk it with a WHILE loop, updating through the join one batch at a time. No CURSOR, no FETCH, no DEALLOCATE, and none of the per-row engine machinery a cursor hauls around behind it. It still walks the set, but on your terms, through a structure you own, and each pass updates ten thousand rows in a single set operation instead of one row in its own transaction. Batching is the whole point: you cap transaction size, lock footprint, and log growth instead of letting one statement hold the entire table hostage. And the 10K is a starting point. Test it out. You can probably go as high as 100K.

Curse 2: SELECT * Where Nobody Asked

SELECT * is convenient, but it's also a promise you can't keep. The moment someone adds, drops, or reorders a column, everything downstream gets a surprise. And what about the overhead? SELECT * kills index efficiency by forcing expensive Key Lookups or Clustered Index Scans instead of using fast, narrow indexes. It also wastes network bandwidth by pushing large amounts of unneeded data across the network - and more.

We have an index that covers CustomerID and Amount. Ask for precisely those two columns and the index answers the query by itself; SQL Server never touches the table:

SELECT CustomerID, Amount
FROM dbo.Orders
WHERE CustomerID = 42; -- Index Seek on idx_Orders_CustomerID.

Now ask for everything. The index doesn't carry the other columns, so SQL Server has to go fetch them -- a key lookup back to the clustered index for every matching row, or it abandons the index entirely and scans. Same WHERE clause, far more work:

SELECT *
FROM dbo.Orders
WHERE CustomerID = 42; -- Key Lookup or Clustered Index Scan. More reads, more pages, more time.

Name your columns. You get the index you paid for, and your query still means the same thing six months from now.

Curse 3: A Function on the Wrong Side of the WHERE

You want this year's orders, so you wrap the column in a function. Reads beautifully but performs like a swamp. When you apply a function to an indexed column in the WHERE clause, the predicate is no longer SARGable, the index goes unused, and SQL Server scans the whole thing.

SELECT OrderID, Amount
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2026; -- Index Scan. SQL Server has to compute YEAR() for every single row.

Leave the column alone and express the same idea as a range. Now the index on OrderDate does its job.

SELECT OrderID, Amount
FROM dbo.Orders
WHERE OrderDate >= '2026-01-01'
AND OrderDate < '2027-01-01'; -- Index Seek. Touches only the rows in range.

Note the half-open range, '>=' to '<', rather than BETWEEN. That will sidestep the boundary fights you get with the time portion of a datetime2.

Curse 4: The Implicit Conversion You Never See

This one is the quietest of them all. AccountNo is a VARCHAR. You compare it to a number because, well, it looks like a number. SQL Server obliges, because it never says no when it can convert.

SELECT OrderID, AccountNo
FROM dbo.Orders
WHERE AccountNo = 12345;   -- integer literal vs VARCHAR column

Here's the trap. When two types meet, the one with lower precedence is converted up to the higher one, and INT outranks VARCHAR. So SQL Server doesn't convert your literal down to a string, it converts the entire column up to INT, row by row. CONVERT_IMPLICIT on every value means your index on AccountNo is useless and it's going to scan. The plan for this query even waves a little yellow warning triangle at you that nobody ever clicks.

Match the literal to the column's type and the conversion vanishes:

SELECT OrderID, AccountNo
FROM dbo.Orders
WHERE AccountNo = '12345';   -- string vs string, index seek

The fix is one pair of quotes. Nothing was broken -- SQL Server did exactly what the precedence rules say it should. It just did it silently, on every row, and handed back the right answer very slowly. That's the whole story of implicit conversion.

Curse 5: NOLOCK as a Performance Setting

Somewhere along the line, NOLOCK got a reputation as the 'go faster' hint. Slap it on, skip the locks, and get your data sooner. What it actually does is set that statement to READ UNCOMMITTED, and that means something very different than just 'faster.'

SELECT CustomerID, Amount
FROM dbo.Orders WITH (NOLOCK);

READ UNCOMMITTED reads rows other transactions have modified but not committed. Those are dirty reads. Values that may never have existed if the other transaction rolls back. But it gets worse than reading the wrong number. Straight from Microsoft: with NOLOCK, rows can appear or disappear in your result set before your statement finishes. During a page split it can skip rows entirely, or hand you the same row twice. You don't get an error. You get a report with bad data. Again, dirty reads. You don't want these.

NOLOCK is not a performance tuning option. If blocking is your real problem, the grown-up answer is to identify and resolve what is causing the blocking. Row versioning is an option, where readers do not request shared locks, and writers do not block readers:

USE master;
ALTER DATABASE YourDBName
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

Reach for NOLOCK only when you've decided intentionally that speed beats accuracy. That's a real tradeoff sometimes, and it's almost never the one people think they're making.

Curse 6: The Scalar UDF Hiding in a SELECT

You wrap a little logic in a scalar function because it's tidy and reusable. Then you call it in a SELECT over a million rows, and that tidy little function runs a million times, serially, with the Optimizer mostly blind to what's inside it.

CREATE FUNCTION dbo.udfAmountWithTax (@amt DECIMAL(10,2))
RETURNS DECIMAL(10,2)
AS
BEGIN
    RETURN @amt * 1.08;
END
GO

SELECT OrderID, dbo.udfAmountWithTax(Amount) AS WithTax
FROM dbo.Orders;

Historically this was a guaranteed performance crime with per-row execution and a query that refuses to go parallel. But there is a plot twist. Starting in SQL Server 2019, scalar UDF inlining can fold qualifying functions right into the calling query, so it gets cost-based, set-oriented, and parallel-friendly, with no code change. It's on by default at database compatibility level 150 or higher.

This doesn't remove the curse. It just hides it better. Inlining doesn't catch everything. A function that references certain intrinsics, like GETDATE() or @@ROWCOUNT, can be disqualified and quietly fall back to the old per-row misery. And if you're on an older version, or running an older compatibility level in a newer version, you get none of the magic. The honest fix is to not hide set logic inside a scalar shell. Inline the expression, or use an inline table-valued function the Optimizer can actually see through.

SELECT OrderID, Amount * 1.08 AS WithTax
FROM dbo.Orders;

In Summary

None of the above throw errors. They pass review and run clean until scale, concurrency, or a schema change surfaces them. That's what makes it helpful to know them on sight. The fix is almost always trivial once you spot the shape, and invisible until you do. Now you have six shapes to spot.

More to Read

Data type precedence (Transact-SQL)
SET TRANSACTION ISOLATION LEVEL (Transact-SQL)
Scalar UDF Inlining
Intelligent Query Processing details

Monday, June 1, 2026

Error 7416: When a Security Patch Breaks Your Linked Servers

Microsoft's May 2026 cumulative update for SQL Server 2025 (CU4, KB5089899) did two things on the same day. It closed a remote code execution hole, CVE-2026-40370, and it quietly tightened how the engine validates linked server connections. The first one got the headline. The second one is the one that broke things.

If you run linked servers over the OLE DB Provider for ODBC, ie., MSDASQL, with a provider string, then any login that is not a member of sysadmin can suddenly get bounced. Nothing about your linked server changed. Nothing about your login mappings changed. The query just stops.

Msg 7416, Level 16, State 1, Line 1
Access to the remote server is denied because no login-mapping exists.

Ironically, the same security update that hardened the engine is the one rejecting your connection.

Three Things Have to Be True

The new validation check fires under a specific shape. All of these have to be true:

Condition Detail
Provider Linked server uses MSDASQL (OLE DB Provider for ODBC Drivers)
Provider string The definition specifies '@provstr'
Login The executing login is NOT in the 'sysadmin' fixed server role

What makes it crazy is that nothing is misconfigured. The linked server is right and the login mappings are right -- but the query still dies. A stricter connection validation check in the Database Engine now rejects connections that earlier builds allowed. Full stop. And since sysadmin logins are exempt from that check, the problem hides from you until a non-sysadmin application login runs a query.

Microsoft does not document the internals beyond calling it a stricter connection validation check, so the honest answer stops at what they assert: the issue hits logins that are not members of the sysadmin fixed server role, it occurs even when the linked server and login mappings are configured correctly, and it is tied to MSDASQL definitions that carry a provider string. The fix is the tell. Adding 'User ID' to '@provstr' clears it, so the check is satisfied once the provider string carries an explicit user credential.

This is not unique to SQL Server 2025. The validation change rolled out across the spring 2026 servicing stream.

Version Update KB
SQL Server 2025 CU4 / CU5 KB5089899 / KB5084896
SQL Server 2022 CU25 KB5081477
SQL Server 2019 CU32 KB5090407

SQL Server 2025 CU4 lands the engine at build 17.0.4040.1. If you are at or past these builds and a linked server job started failing in April or May, this is your first suspect.

The Setup That Breaks

Here is the shape that trips the check. An MSDASQL linked server defined with a provider string, queried by a non-sysadmin login.

EXEC master.dbo.sp_addlinkedserver
    @server     = N'ODBC_LINK',
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @provstr    = N'Driver={ODBC Driver 18 for SQL Server};Server=REMOTESRV;Database=AppDB;';
GO

Run a query against it under a non-sysadmin login and the engine says no right away.

SELECT TOP (10) *
FROM OPENQUERY(ODBC_LINK, 'SELECT * FROM dbo.Orders');
GO
Msg 7416, Level 16, State 1, Line 1
Access to the remote server is denied because no login-mapping exists.

The Fixes

You might be thinking about uninstalling the update, but that update closed an RCE. Rolling it back to fix a linked server is like trading a flat tire for a brake failure. Microsoft documents three workarounds:

1. Add User ID to the provider string

This is the cleanest fix. The login still has to supply 'UID' in the provider string, which is exactly what the stricter check now wants to see. Drop and recreate the linked server with 'User ID' included.

EXEC master.dbo.sp_addlinkedserver
    @server     = N'ODBC_LINK',
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @provstr    = N'Driver={ODBC Driver 18 for SQL Server};Server=REMOTESRV;Database=AppDB;User ID=AppReader;';
GO

2. Drop @provstr entirely

If the provider string is not pulling its weight, get rid of it. Push the connection target into '@datasrc' and let 'sp_addlinkedsrvlogin' carry the security, instead of stuffing everything into a string the engine now scrutinizes.

EXEC master.dbo.sp_addlinkedserver
    @server     = N'ODBC_LINK',
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @datasrc    = N'MyOdbcDsn';
GO

EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname = N'ODBC_LINK',
    @useself    = N'False',
    @locallogin = NULL,
    @rmtuser    = N'AppReader',
    @rmtpassword = N'********';
GO

3. Grant the login sysadmin (not recommended)

It works because sysadmin skips the check. It is also a privilege escalation you are doing to yourself, on purpose, to undo a security patch. Microsoft lists it, then tells you not to do it. Listen to the second half of that sentence.

The Takeaway

Security hardening and 'no changes were made' failures arrive in the same box more often than vendors like to admit. When a linked server that ran for years dies right after a CU, the mappings are usually fine. Check the build, check the provider, and check whether the login carrying the query is a sysadmin or a mere mortal... The fix is very often just a User ID away.

More to Read

SQL Server 2025 Known Issues
SQL Server 2022 Release Notes (workarounds spelled out)
Cumulative Update 4 for SQL Server 2025 (KB5081495)
KB5089899 - Security update for SQL Server 2025 CU4 (CVE-2026-40370)

Tuesday, May 26, 2026

SIL — Because Holding 1,000 KEY Locks Is So 2022

Back in September I wrote about Optimized Locking in SQL Server 2025. Two components: Transaction ID (TID) locking and Lock After Qualification (LAQ). I told you the engine had stopped 'locking first and asking questions later'. That was true, but it was also only half the story.

Microsoft has since gone back and fine-tuned the feature, giving us two improvements that make a database more lock-free without you touching anything. The headliner is called Skip Index Locks (SIL), and the numbers are pretty good:

Measurement Result
Key locks skipped (Microsoft demo) 99.6%
Page locks skipped (Microsoft demo) 79%
Nonclustered index locks skipped (Azure fleet, ~25K DBs) 81%

No configuration. No hint to add. If you're on SQL Server 2025 with Optimized Locking enabled, SIL is already in there working for you. This post explains what SIL is, how it works, and a demo you can run yourself to see the locks vanish.

Quick recap: TID and LAQ

Transaction ID (TID) locking. Every modified row gets stamped with the ID of the transaction that touched it last. Row and page locks are still taken, but they're released the moment each row is modified, instead of being held through to the commit. The only thing held until commit is one X lock on the TID. Lock memory drops, and lock escalation drops with it.

Lock After Qualification (LAQ). The engine evaluates the predicate against the latest committed row version first, then locks only the rows that actually qualify. Pre v2025, a WHERE OrderDate < '2025-01-01' UPDATE took a U lock on every row it scanned, qualified or not. With LAQ, locks land only on the rows that match.

Both require Accelerated Database Recovery and LAQ also requires Read Committed Snapshot Isolation. Both of these shipped in SQL Server 2025 at launch.

The improvement: Skip Index Locks (SIL)

Under TID locking with RCSI and LAQ in the mix, those short-duration row and page locks are only necessary if some other query is also touching the row and expecting it to stay stable. The kind of query that needs that guarantee is one running under Repeatable Read or Serializable, or using locking hints like 'HOLDLOCK' or 'READCOMMITTEDLOCK'. Microsoft calls these Row Locking Queries (RLQ).

If no RLQ is touching the page, those row and page locks aren't preventing any conflict. Meaning, there's nothing there to conflict with and SIL recognizes that and skips them. The row still gets modified, the change is still ACID-safe, and the lock manager never gets the call. Nice.

The mechanism: Every page in the buffer pool now carries a single No-RLQ bit. When that bit is set, it tells the engine that no Row Locking Query is currently touching any row on the page. A DML statement modifying a row on that page skips the row lock and the page lock entirely, and takes a short exclusive page latch instead. The latch is brief. Each lock would have been a round trip into the Lock Manager, and for a transaction touching thousands of rows, those round trips will add up. That is the overhead that SIL minimizes for us.

Per Microsoft's Azure telemetry, RLQ-style queries are uncommon in real-world workloads using RCSI or snapshot isolation. That is why the skip ratio is so high. Most pages, most of the time, have nobody using them in a way that demands a lock.

What SIL skips, and what it doesn't

SIL IS used for:

  • INSERT statements on heaps - the intent-exclusive (IX) page lock is skipped.
  • UPDATE statements on clustered, nonclustered indexes, and heaps - both the IX page lock and the exclusive (X) row lock are skipped. The engine leans on the page latch and the TID lock instead.

SIL is NOT used for:

  • DELETE statements.
  • UPDATE on heaps where the row has existing forwarding pointers, or where the update creates new ones.
  • Rows with LOB columns - VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX), JSON.
  • Rows on pages that were split in the same transaction.

The exclusions matter. If your hot tables are LOB-heavy or your workload is heavy deletes, your skip ratio will be lower than the headline numbers. Those headline numbers come from update-heavy OLTP under RCSI, which is one of the most common workloads out there.

The other half: LAQ

SIL got the headline, but it shipped with a sibling. Lock After Qualification, or LAQ. LAQ is optimistic. It evaluates query predicates without placing locks on the rows while scanning. In short, it checks the latest committed version of the row and acquires the necessary lock only after the row has qualified for the modification.

The demo

Here is a full working demo for SQL Server 2025 and ADR and RCSI are required.

Step 1: Create the database and enable everything.

CREATE DATABASE SILDemo;
ALTER DATABASE SILDemo SET ACCELERATED_DATABASE_RECOVERY = ON;
ALTER DATABASE SILDemo SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE SILDemo SET OPTIMIZED_LOCKING = ON;
GO

Step 2: Verify SIL is enabled.

USE SILDemo;
SELECT DATABASEPROPERTYEX(DB_NAME(), 'IsOptimizedLockingOn') AS is_optimized_locking_enabled;
GO

Step 3: Create a table with a clustered index and add data.

CREATE TABLE dbo.TestSIL
(
  ID int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
  Value int NOT NULL
);
GO

INSERT INTO dbo.TestSIL (Value)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;
GO

Step 4: SIL active — clustered index update.

Open a transaction, update rows, and see what locks the engine holds.

BEGIN TRANSACTION;

UPDATE dbo.TestSIL
SET Value = Value + 10;

SELECT resource_type,
       request_mode,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
  AND resource_type IN ('KEY','PAGE','RID','OBJECT','XACT')
GROUP BY resource_type, request_mode
ORDER BY resource_type, request_mode;

COMMIT TRANSACTION;
GO

If SIL is working correctly, we will see ZERO KEY locks. Under classic locking, an update of 1,000 rows would hold 1,000 X KEY locks (one per modified row) all the way to the commit. SIL skipped every one of them. We still see a small number of PAGE locks (IX mode) and an OBJECT lock (IX), but those are expected and are not held long-term. The KEY lock count is where the story is, and the KEY lock count is zero.

In a nutshell, under classic locking without SIL, updating 1,000 rows would hold 1,000 X KEY locks. With SIL, those KEY locks are skipped entirely.

What you need, and why it matters

SIL turns on for free if the rest of the stack is in place:

  • SQL Server 2025 — it shipped with GA release.
  • Accelerated Database Recovery enabled on the database.
  • Read Committed Snapshot Isolation enabled — without it, LAQ never engages and SIL has nothing to build on.
  • Optimized Locking enabled — ALTER DATABASE [dbname] SET OPTIMIZED_LOCKING = ON;
  • A workload that mostly INSERTs or UPDATEs indexed tables, under RCSI, light on LOB columns.

Azure SQL Database and SQL database in Fabric get all of this automatically. Azure SQL Managed Instance is rolling out. On-prem SQL Server 2025 needs it setup at the database-level.

When it's working, the payoff is the kind a DBA notices: lock memory drops, blocking chains on update-heavy OLTP shorten or even disappear, and you got there with no code changes and no new hints scattered throughout your queries. This is server level. There is no way to opt a single statement into SIL, and there doesn't need to be.

SQL Server 2025's Optimized Locking was already a win. I think SIL is the part that makes the numbers even more worth bragging about.

More to Read

Microsoft Tech Community: Introducing optimized locking v2
Microsoft Learn: Optimized Locking
Microsoft Learn: Accelerated Database Recovery
sqlfingers inc: Optimized Locking in SQL Server 2025 — Concurrency Gets Smarter (September 2025)

Monday, May 25, 2026

Your SQL Server 2016 Upgrade Will Succeed. Then Monday Happens.

SQL Server 2016 goes out of support on July 14, 2026. You know that already. What you may not know is how many seemingly successful upgrades are still going to fail.

Not during the installer or post-upgrade checks. Not during the Friday night smoke test. It's Monday morning -- when the applications reconnect, the linked servers are called, and the ETL starts up -- that things that worked quietly for a decade start throwing SSL errors nobody saw coming.

The upgrade succeeded. The reconnect didn't. That is this post.

The engine is not the problem

When upgrades go bad, it's typically not the engine. It's everything standing around it. Newer drivers. Stricter TLS defaults. Certificate validation that suddenly matters -- and an old environment that never had to think about any of it.

Your installer finishes, the databases come online, SSMS connects, the smoke tests pass, and everyone signs off. Then the application reconnects with a newer OLE DB or ODBC driver -- one that validates the certificate by default -- and the SQL Server that ran fine for years starts refusing connections because nobody ever put a real certificate on it.

Here is what actually changed. Somewhere between 2016-era connectivity and current Microsoft providers, certificate validation became strict by default. Old environments leaned on self-signed certificates, internal certificates that nothing ever really trusted, or no certificate story at all -- because the clients were not checking. The gap was always there, but nothing ever pushed it to the table.

Until now — and this is what it looks like:

A connection was successfully established with the server, but then an error
occurred during the login process. (provider: SSL Provider, error: 0 -
The certificate chain was issued by an authority that is not trusted.)

Read that first line again. The connection to SQL Server succeeded. The network is fine. The SQL Server is fine. The client reached the server, looked at the certificate, and stopped right there. The login never happens. It's a TLS trust problem wearing a login error as a costume.

And why didn't your testing catch it? Your DBA workstation probably already trusts that certificate chain -- so SSMS worked perfectly for everything you tested. The application servers and ETL servers do not trust it. You go home Friday believing the environment is healthy. Production tells you otherwise Monday morning.

Catch this before you upgrade

You don't have to wait until you've upgraded to see this happen. The switch is sitting right in the connection string, and you can test it today against any dev or non-prod SQL Server you have. The keyword is TrustServerCertificate.

This is the old behavior -- the one many of us leaned on for years:

Server=YourServer;Database=YourDb;
Encrypt=True;TrustServerCertificate=True;

And this is the modern default:

Server=YourServer;Database=YourDb;
Encrypt=True;TrustServerCertificate=False;

Test that second one against a SQL Server running a self-signed certificate and the SSL Provider error shows up on demand. That is the whole failure.

It is also why setting TrustServerCertificate=True is not a fix, even though it makes the error vanish instantly. It doesn't solve anything -- it just switches the check off. The traffic is still encrypted, but you are no longer verifying who you connected to. That workaround is all over the forums because it restores connectivity fast, but it's also the kind of shortcut you don't want running in production for another ten years.

The real fix is to put a properly trusted certificate on SQL Server before the migration, then validate the chain from the application and ETL servers that actually have to reconnect. Confirm the environment behaves under modern driver defaults before upgrade weekend -- not after.

Linked servers are a common place to miss this

Linked servers get overlooked because people don't picture them as client connections. But that's exactly what they are -- it's just that SQL Server is the client.

After an upgrade, a linked server inherits the same provider behavior and certificate rules as everything else. So a linked server that has run untouched since 2016 can start failing SSL validation against its partner, even though nothing in the linked server definition changed. The upgrade log won't explain it. The instance looks healthy. You find out when the nightly ETL fails, the reports stop building, or a cross-server query throws an SSL error at 2 AM.

I've seen the identical linked servers work from one SQL Server and fail from another, because the providers and trust stores underneath were different. Same definition. Same target. Completely different TLS behavior. The linked server didn't change. Everything underneath it did.

What about Extended Security Updates?

Not every shop is going to make the July 14 date. Miss it, and 2016 keeps running -- it's the patching that stops. To cover that gap, Microsoft sells Extended Security Updates for up to three more years. ESU buys security update coverage and nothing else. No features, no fixes, no performance. It just keeps the lights on while your upgrade clock keeps ticking -- and the price climbs high every year.

If you genuinely need the runway, ESU can make sense. Just treat it as what it is -- a temporary holdover, not a long-term plan. For a good breakdown of how expensive that path gets, Brent Ozar covered it here.

The bottom line

The upgrade is usually the easy part. The surrounding reconnects are where the breakage live.

Certificates, drivers, linked servers, old providers -- assumptions that survived quietly for a decade. These are what break in an upgrade. Test the connection layer now, while it is still your project and not everybody else's July emergency.

More to Read

Microsoft Lifecycle: SQL Server 2016
Microsoft Learn: Configure SQL Server Encryption
Microsoft Learn: SQL Server Extended Security Updates
Brent Ozar: SQL Server 2016 Support Ends in 90 Days

Thursday, May 21, 2026

Your JSON Column Was Never a JSON Column

For roughly a decade, the way you stored a JSON document in SQL Server was to not store a JSON document at all. You stored a string. NVARCHAR(MAX), an ISJSON check constraint to keep the garbage out, and a quiet agreement with yourself that the column was JSON because you said so.

SQL Server 2025 ships an actual json data type. Binary storage, parsed once, queryable without re-parsing every row. This post is the demo: the old pain, the new behavior, and the one word in the Microsoft docs you need to read before you put this into production.

The old way

Here is the pattern every shop has somewhere. A string column, pretending.

USE tempdb;
GO

CREATE TABLE dbo.OrdersOld
(
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    Payload NVARCHAR(MAX)
        CONSTRAINT ck_OrdersOld_IsJson_Payload CHECK (ISJSON(Payload) = 1)
);
GO

INSERT dbo.OrdersOld (Payload) VALUES
(N'{"customer":"Acme","total":1200.00,"items":["Coffee","Tea"]}'),
(N'{"customer":"Globex","total":450.00,"items":["Cookies"]}');
GO

It works. It has always worked. But look at what it actually is:

What you wanted What NVARCHAR(MAX) gave you
A JSON document A string to parse
Validation A check constraint you remembered to add
Fast reads A full re-parse of the text on every JSON_VALUE call
Cheap updates Rewrite the whole document to change one key

The check constraint is the giveaway. The engine does not know the column is JSON. You are bolting validation on from the outside, and if a table gets created without that constraint -- it happens -- the column happily accepts 'not json at all' and nobody finds out until a report breaks.

The new way

Same table, one keyword different.

CREATE TABLE dbo.OrdersNew
(
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    Payload JSON NOT NULL
);
GO

INSERT dbo.OrdersNew (Payload) VALUES
(N'{"customer":"Acme","total":1200.00,"items":["Coffee","Tea"]}'),
(N'{"customer":"Globex","total":450.00,"items":["Cookies"]}');
GO

No check constraint. The validation is in the data type now as it should be. The engine will not let an invalid document into a json column. Try it:

INSERT dbo.OrdersNew (Payload) VALUES (N'not json at all');

That is the difference, right there in the screenshot. The old table only caught bad data if you remembered the constraint. The new table catches it because the JSON data type itself will not accept anything that is not valid JSON.

It is stored parsed, not as text

This is the part that matters for performance, and it is invisible until you go looking. A json column is not stored as the characters you typed -- it is stored in a binary format, parsed once when the row is written.

That changes the math on reads. With the old method, NVARCHAR(MAX), every JSON_VALUE call parses the string from scratch: the engine walks it character by character, finds your key, hands back the value, throws the parse away, and does it all again on the next row. With the json type, that work is already done, and the reads navigate the binary structure directly.

The functions you already know will work against it as-is, no syntax changes:

SELECT
    OrderID,
    Customer = JSON_VALUE(Payload, '$.customer'),
    Total = JSON_VALUE(Payload, '$.total')
FROM dbo.OrdersNew;

Identical T-SQL to what you would write against the old NVARCHAR(MAX) column. That is deliberate -- Microsoft kept the surface area the same so your existing code keeps running. What changed is underneath it.

That is the claim. Here is the measurement. Build the same document set -- 200,000 rows of a heavier document -- in both an NVARCHAR(MAX) column and a json column:

CREATE TABLE dbo.JsonString (id INT PRIMARY KEY, Payload NVARCHAR(MAX));
CREATE TABLE dbo.JsonNative (id INT PRIMARY KEY, Payload JSON);
GO

INSERT dbo.JsonString (id, Payload)
SELECT TOP (200000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    N'{"customer":"Acme Industries","region":"West","tier":"Gold",
"total":1200.00,"tax":96.00,"shipping":"expedited","priority":7,
"notes":"recurring quarterly order, net 30 terms on file",
"items":["Coffee","Tea","Cake","Cookies","Sandwich","Juice"],
"contact":{"name":"Jane Roe","email":"jane@acme.test","phone":"555-0100"}}'
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

INSERT dbo.JsonNative (id, Payload)
SELECT id, Payload FROM dbo.JsonString;
GO

Now extract several values per row -- each JSON_VALUE call against the string column is another full parse of that document:

SET STATISTICS TIME ON;

-- The old way: re-parse the string on every call, every row
SELECT COUNT(*)
FROM dbo.JsonString
WHERE JSON_VALUE(Payload, '$.customer')      IS NOT NULL
  AND JSON_VALUE(Payload, '$.region')        IS NOT NULL
  AND JSON_VALUE(Payload, '$.tier')          IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.name')  IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.email') IS NOT NULL
  AND JSON_VALUE(Payload, '$.shipping')      IS NOT NULL;

-- The new way: document already parsed, navigated directly
SELECT COUNT(*)
FROM dbo.JsonNative
WHERE JSON_VALUE(Payload, '$.customer')      IS NOT NULL
  AND JSON_VALUE(Payload, '$.region')        IS NOT NULL
  AND JSON_VALUE(Payload, '$.tier')          IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.name')  IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.email') IS NOT NULL
  AND JSON_VALUE(Payload, '$.shipping')      IS NOT NULL;

SET STATISTICS TIME OFF;

On my box the string column ran 5,672 ms of CPU. The json column ran the identical query in 3,828 ms -- roughly a third faster, for changing one keyword in a CREATE TABLE. Same query, same data, same result. The only difference is that the string column re-parses the whole document on every one of those six JSON_VALUE calls, on every row, while the json column was parsed once on insert and just reads the structure. Your numbers will differ, but the shape will not: the more keys you pull and the bigger the document, the wider that gap gets.

A note on JSON_MODIFY

On a string column, changing one key means rewriting the entire document -- read it, modify the text, write the whole thing back. The json type adds a modify method that is built to do in-place updates where the new value fits, instead of rewriting the document in full. Very useful, and on the list of things I plan to lab, but don't miss the version note below. The modify method specifically is one of the pieces still marked preview.

Change existing column to json

You do not have to rebuild the table. ALTER TABLE ... ALTER COLUMN converts a character column in place:

-- The check constraint references Payload -- must be dropped first
ALTER TABLE dbo.OrdersOld
DROP CONSTRAINT ck_OrdersOld_IsJson_Payload;
GO

-- Now take the old string column to the real type
ALTER TABLE dbo.OrdersOld
ALTER COLUMN Payload JSON NOT NULL;
GO

One thing to know before you run that against a real table: the conversion validates every row. Any row holding text that is not well-formed JSON will fail the ALTER. That is not a bug -- it is the data type doing its job -- but it means the ALTER to json doubles as a data-quality check. On a column that has been collecting strings for years with a check constraint that may not have always been trusted, you can probably expect the first attempt to find something. That is not a bad thing.

The order of operations for a production column json change: add the new json column alongside the old one, backfill in batches, find and fix the rows that will not convert, then drop the old column when you know it's clean.

The one thing you should read first

Here is the honest version, because the headlines blur it.

The native JSON data type is generally available -- on Azure SQL Database and Azure SQL Managed Instance. For the boxed product, SQL Server 2025, Microsoft's own documentation still describes the json type as being in preview. The type ships in the box, it works under every database compatibility level, and you can build on it today in a dev environment. But 'in the box' and 'GA' are not the same thing, and the modify method and the JSON aggregate functions are explicitly called out as preview on SQL Server 2025.

Where json type status
Azure SQL Database Generally available
Azure SQL Managed Instance GA (SQL Server 2025 / Always-up-to-date policy)
SQL Server 2025 (box) In preview, per Microsoft Learn

What that means in practice: lab it, learn it, play with it. Do not bet a production schema on the preview-flagged pieces until Microsoft moves them to GA. Preview features can change between CUs, and you don't want to build on a preview feature that gets turned off later. Be sure to check the docs against your current build before you commit.

The bottom line

The NVARCHAR(MAX)-plus-ISJSON pattern was never wrong. It was just the best we had. Very much like the FOR XML PATH string trick was the best we had before STRING_AGG. SQL Server 2025 finally makes JSON a real data type instead of a string with a usage note attached: validated by the engine, stored parsed, cheaper to read and to modify.

It is not a license to turn your relational database into a document store. JSON is still for the variable-shaped data -- the flexible attribute bag, API payloads and unmodeled log records. Do not use it as a shortcut to avoid defining columns that should have been native columns from day one. But for the JSON you legitimately have, this is your upgrade. Just keep an eye on that 'preview' status -- it's a dev-box feature on SQL Server 2025 until Microsoft says otherwise.

More to Read

Microsoft Learn: JSON data type
Microsoft Learn: Work with JSON data in SQL Server
Microsoft Learn: SQL Server 2025 Preview Features FAQ
Microsoft: GA of JSON data type and JSON aggregates
sqlfingers inc: Three T-SQL Tricks You May Not Be Using Yet

Wednesday, May 20, 2026

CISA Left 844 MB of Credentials on Public GitHub. Check Your Own Repos.

CISA is the United States federal agency that tells everyone else how to protect their passwords and credentials. For six months, it left 844 MB of its own — passwords, cloud keys, certificates — sitting in a public GitHub repository.

That is not a hack. Nobody broke in. A contractor put it there.

Disclaimer: Not a cybersecurity expert. Just a DBA reading the news.

What happened

On May 14, 2026, researcher Guillaume Valadon at GitGuardian found a public GitHub repository named 'Private-CISA'. Inside was 844 MB of plaintext passwords, AWS GovCloud credentials, API tokens, Entra ID SAML certificates, SSH keys, Kubernetes manifests, Terraform code, and internal documentation backups belonging to the Cybersecurity and Infrastructure Security Agency. The repository had been created on November 13, 2025 — public for six months before anyone outside flagged it.

Valadon reported it. CISA took the repository offline within 26 hours. Some of the exposed AWS keys stayed valid for another 48 hours after that.

The repository was maintained on a personal GitHub account by a contractor, who appears to have used it to sync files between a work machine and a home machine. The commit history shows GitHub's secret scanning was deliberately disabled.

Again — CISA is the agency that writes the rules.

CISA's statement: "Currently, there is no indication that any sensitive data was compromised as a result of this incident."

What was actually in it

The file names tell the whole story.

importantAWStokens — administrative credentials to three AWS GovCloud accounts.

AWS-Workspace-Firefox-Passwords.csv — plaintext usernames and passwords for dozens of internal CISA systems — the kind of list a browser builds when you let it save every login.

Nobody had to decrypt anything. A file called importantAWStokens held exactly that. No exploit. No zero-day. Just a contractor saving passwords to a CSV and pushing it to a public repo.

Valadon, the researcher who found it, put it this way:

"I honestly believed that it was all fake before analyzing the content deeper. This is indeed the worst leak that I've witnessed in my career. It is obviously an individual's mistake, but I believe that it might reveal internal practices."

This is not a federal story. It is a habits story.

It is easy to read this as a government story and move on. Do not. Every failure in this incident is something I find in SQL Server shops.

The CISA failure The SQL Server shop equivalent
Plaintext passwords in a CSV Connection strings with embedded sa passwords in .dtsConfig files, deployment scripts, and .publish.xml
Backups committed to Git .bak files and BCP exports sitting in a repo
Secret scanning disabled on purpose The 'make it stop warning me' move, in any repo your team owns
AWS keys valid 48 hours after takedown Can you actually rotate a SQL login or a linked server credential in minutes?
Personal repo used to sync work files Contractors and developers moving work between machines through personal GitHub

Read that last row again. Work files on a personal account. One of the worst things for any security team to see - and it's everywhere.

Find it in your own estate

You do not need a federal contractor to have this problem. SQL Server credentials hide in source control in predictable places.

SSIS configuration files
A .dtsConfig with a connection string carries whatever was in it, password included, unless the package protection level forced the credential out.

Deployment artifacts
.publish.xml profiles, post-deployment scripts, and .sqlproj files routinely carry connection details.

SQL Agent job scripts
Job step T-SQL scripted out to a repo can carry linked server definitions and the credentials behind them.

BCP exports and format files
A .bak or a BCP dump committed 'temporarily' is a database sitting in your Git history forever.

Two things to do very soon. First, secret scanning — a GitHub feature that watches commits for things that look like credentials, AWS keys, tokens, connection strings, and either warns you or blocks the push outright. Turn it on for every repo your team owns, and confirm nobody has turned it off. The CISA contractor's repo had it disabled on purpose. Then ask the harder question: if a credential leaked today, how fast could you rotate it? If the honest answer is days, that is its own finding — just like the 48-hour CISA window.

The bottom line

I have written a lot about AI governance — agents with elevated credentials, the absence of data discipline, the legal exposure forming around it. This is the same story without the AI. A shop moving fast, skipping the basics, one switched-off setting away from a very bad day.

This leak is an 844 MB reminder of how little it takes. No AI. No attacker. Just plaintext passwords in a repo and a scanning feature switched off on purpose.

Secret scanning on. Secrets out of Git. Revocation you can do in minutes, not days. The boring discipline is still where shops are bleeding.

CISA could not keep its credentials out of a public repository. Can you?

More to Read

Krebs on Security: CISA Admin Leaked AWS GovCloud Keys on Github
GitGuardian: How We Got a CISA GitHub Leak Taken Down in Under a Day
The Register: America's top cyber-defense agency left a GitHub repo open
sqlfingers inc: AI Agent. Nine Seconds. One Production Database. Gone.

Tuesday, May 19, 2026

Patch Tuesday May 2026: SQL Didn't Get Off Quite That Easy

Last Wednesday I told you SQL Server got off easy in the May Patch Tuesday release. No critical, no public PoC, no headline. Apply the GDR or CU and move on.

I missed something. There was a SQL Server CVE worth talking about, and it got by me because the bug isn't in SQL Server. It's in SSIS.

CVE-2026-40370

CVSS 8.8. Published May 12. Microsoft's one-line description: 'External control of file name or path in SQL Server allows an authorized attacker to execute code over a network.' That reads like a SQL Server engine bug. It isn't.

The actual fix, per KB5089899 and the sibling KBs covering every supported version back to 2016 SP3:

This fix addresses an XML external entity (XXE) vulnerability in the Web
Service Task that allows an attacker to read arbitrary files from the local
file system or cause a denial-of-service (DoS) attack.

Fix area: Integration Services. Component: Integration Services. Platform: Windows.

That is a SSIS bug shipped under a SQL Server CVE number. If you don't run SSIS, the headline does not apply to you. If you do run SSIS and have any package using the Web Service Task, this is yours to deal with.

What XXE is, briefly

XML External Entity is an old class of XML parser vulnerability. The XML spec lets a document declare external entities, like references to content fetched from a URL or a local file at parse time. If the parser resolves those references and the application returns the result, an attacker who controls the XML can read files off the server.

The classic payload looks like this:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///c:/windows/win.ini">
]>
<foo>&xxe;</foo>

A vulnerable parser fetches c:\windows\win.ini and substitutes the contents into the response. Swap in any path the SQL Server service account can read -- config files, certificates, credential stores, anything -- and you have file disclosure. Point the entity at a non-responsive server and you have a DoS.

OWASP has covered this for over a decade. The fix in modern XML parsers is to disable DTD processing and external entity resolution by default. Microsoft's KB confirms that is what shipped on May 12 for the Web Service Task.

Who is actually affected

Three conditions have to be true at the same time for this CVE to matter to your environment:

Condition Detail
You run SSIS Integration Services installed on a Windows host running a supported SQL Server version.
At least one package uses Web Service Task The control flow task that calls a SOAP web service. If your packages do not include it, the bug is not reachable.
An attacker can influence the XML Microsoft classifies the attacker as 'authorized,' meaning they need permission to run packages or supply package input. Not unauthenticated.

The third condition is what keeps this off the wormable list, and it is why I let it slide last week. 'Authorized attacker' is not nothing though. A user with SSIS execute permissions, a compromised application account, a malicious insider — any of those qualifies. And depending on which environment we are talking about, that could be a dozen people or a hundred.

One thing the bug description does not say is whether the vulnerability is reachable through package design alone, or only through runtime input. Until that is clear, the safe read is: if you have packages with the Web Service Task and any non-DBA can affect what XML those packages process, patch.

The patch matrix

Microsoft shipped a fix for every supported SQL Server version on May 12. Builds and KBs verified against Microsoft Support:

Version Servicing path KB Build
SQL Server 2016 SP3 GDR KB5089271 13.0.6490.1
SQL Server 2017 GDR KB5090347 14.0.2110.2
SQL Server 2019 GDR KB5090408 15.0.2170.1
SQL Server 2022 GDR KB5091158 16.0.1180.1
SQL Server 2022 CU24 KB5089900 16.0.4252.3
SQL Server 2025 GDR KB5091223 17.0.1115.1
SQL Server 2025 CU4 KB5089899 17.0.4040.1

Sibling KBs for the CU paths on 2016, 2017, and 2019 also shipped the same day. Check the Microsoft Security Update Guide for the package that matches your current build. As I noted last week, GDR and CU are separate lanes. Pick the one you have been applying.

SQL Server 2014 and earlier are out of mainstream support and do not appear in the May release. If you are on one of those and running SSIS, you are not getting this fix.

What to do this week

1. Inventory your packages.

Find every package that uses the Web Service Task. If your packages live in SSISDB, the project XML is queryable. If they live on the file system or in msdb, a recursive grep for STOCK:WebServiceTask in the .dtsx files will find them. Most shops have never inventoried which SSIS tasks are in use where. This is a good reason to start.

2. Identify who can supply input to those packages.

Any account that can execute the package, modify its variables at runtime, or supply the XML input it processes. SSIS catalog permissions, SQL Agent proxy accounts, application accounts calling the package via T-SQL -- all of them count.

3. Patch.

Apply the GDR or CU that matches your servicing path. The fix is in the SSIS runtime on the server, not the package itself. You do not need to redeploy packages.

4. If you cannot patch this week, look at your packages.

The Web Service Task is one of several tasks in SSIS that consume XML. If a package is reading XML from a source an untrusted user can influence, that is the exposure window. Consider disabling those packages until the patch lands.

What I should have said last week

'SQL Server got off easy' was true for the engine. It was not true for SSIS. A CVSS 8.8 RCE that ships under a SQL Server CVE number, with a fix in every supported version, deserves more than a footnote -- even when the attack surface is narrower than the headline suggests.

This is the second month in a row that the SQL Server CVE worth caring about is not actually in the engine. March was the sysadmin escalation in linked servers. May is XXE in the SSIS Web Service Task. The pattern is worth noting. The headline CVE rating tells you the score. It does not tell you which component shipped the fix or whether the bug is reachable in your environment. For that, you have to read the KB

Apologies for the miss. Patch your SSIS hosts this week.

More to Read

Microsoft Security Response Center: CVE-2026-40370
Microsoft Security Update Guide: May 2026 Release Notes
KB5089899: SQL Server 2025 CU4 (canonical fix description)
OWASP: XML External Entity (XXE) Processing
sqlfingers inc: Patch Tuesday May 2026 — SQL Got Off Easy. Your Domain Didn't.