Tuesday, June 9, 2026

Why Are We Still Paying for SQL Server?

A client asked me last week, plainly: 'Why are we still paying for SQL Server when Postgres is free?' It is a fair question, and it is being asked more and more every day. This post is the honest answer from a SQL Server professional who has also spent real time with PostgreSQL and MySQL. Just my thoughts on where SQL Server and Postgres stand in June 2026, what each one really wins for us, and the bill nobody's talking about when they say 'free'.

Where They Actually Stand

Popularity is not really a benchmark, but it is a signal, and the trend line is worth reading. The independent DB-Engines ranking scores engines on search interest, job listings, technical mentions, and social signals. Here is the top of the table for June 2026, with the change on the month.

Rank Engine Score
(Jun 2026)
Change
(month)
Change
(year)
1 Oracle 1140.04 -3.24 -90.35
2 MySQL 856.29 -0.21 -97.29
3 Microsoft SQL Server 698.04 -2.95 -78.71
4 PostgreSQL 688.23 +5.55 +7.58
5 MongoDB 387.97 +3.33 -14.87

SQL Server still holds third, but barely. Less than ten points now separate it from PostgreSQL, on a scale where it led by a comfortable margin a year ago. The direction of travel is the real story, though, like what's in that last column. Over the past year SQL Server shed almost 79 points while PostgreSQL gained about 8. It is the only engine in the top four that rose at all. Oracle and MySQL fell harder than SQL Server did. The whole top of the table is compressing, and Postgres is the one climbing into it.

But popularity rankings lean toward enterprise signals like job postings, and that tells a different story than what developers reach for on their own. In the 2024 Stack Overflow developer survey, PostgreSQL was the most-used database at about 49% of respondents, against around 25% for SQL Server. Read together, the two measures explain each other: SQL Server still wins on enterprise-weighted popularity, while Postgres owns developer mindshare and new projects. I'm pretty sure that split is exactly why my client is asking.

What Postgres Genuinely Does Better

If you want to be trusted on this, you have to concede the real ground first. Postgres earns its reputation in a few places.

The license line is zero.

Not discounted, but zero. There is no per-core tax, no edition ceiling pushing you to a more expensive SKU, and no Software Assurance clock. We will put real numbers on what that saves in a minute.

Extensibility is a first-class idea.

PostGIS for spatial, pgvector for embeddings, foreign data wrappers, and a deep catalog of extensions you can bolt on without waiting for the vendor to ship a feature. When a new workload shows up, Postgres usually already has an extension for it.

It runs anywhere.

Every major cloud has a managed Postgres, and the engine itself is portable across all of them. No platform gravity and no licensing-mobility headaches when you move it.

What SQL Server Still Wins

And then the other side of the ledger, which is still just as real.

The tooling and the operational story.

SSMS, Query Store, the Intelligent Query Processing family, a mature Always On availability-group stack, and first-party enterprise support. If you live in the Microsoft estate, that integration is not just a nice-to-have. It is the reason the lights stay on in many shops.

It is still shipping real innovation.

This is not a legacy engine coasting. SQL Server 2025 put a native JSON type, fuzzy string matching, and vector support in the box. I wrote up the native JSON data type recently, and there is more in that release than most shops have looked at yet.

The Microsoft-stack gravity is a feature, not just lock-in.

Entra ID, Power BI, SSIS, Fabric, the whole pipeline assumes SQL Server is in the middle. Ripping it out means rebuilding far more than just your database.

The Migration Bill Nobody Quotes

Here is where the 'just move to Postgres' conversation usually goes quiet. Moving the table data is the easy part, and it is almost never where the cost lives. The bill is in the procedural code.

Every stored procedure, trigger, and function written in T-SQL has to be converted to PL/pgSQL. Your SSIS packages do not come along. Your SQL Agent jobs do not come along. Your application's data-access layer has to be retested against different behavior in everything from identity columns to isolation levels to date handling. This is very important and can become very complicated. There are many tools to help with the mechanics. Like Babelfish, which eases database migrations by enabling PostgreSQL to understand and execute Microsoft SQL Server commands. There's also pgloader, the AWS Schema Conversion Tool, and AWS DMS -- but these automate the translation, not the validation. Budget most of a migration for converting procedural code and testing it, not for moving rows. This is the line item that turns a 'free' database into a six-figure project.

The Cost Math, Honestly

So put the license numbers on the table. These are SQL Server 2025 list prices, before any volume or Software Assurance discounting, and 2025 holds the same rate as 2022.

Scenario SQL Server
Enterprise
SQL Server
Standard
PostgreSQL
Per 2-core pack $15,123 $3,945 $0
Minimum (4 cores) $30,246 $7,890 $0
Typical 16-core box $120,984 $31,560 $0

That Enterprise column is the number your client saw on a renewal quote, and is why the question keeps getting asked. But we have to be honest about the other side, too. 'Free' is the license, not the total. Postgres still costs you in operational expertise, in support contracts (if you want one), and in the migration project like I mentioned above. The license line really is zero, but the total cost of ownership most definitely is not. Anyone telling you that it is, is selling you something else entirely.

The Honest Verdict

So how did I actually answer my client's question? In short, the license is almost never the real reason you're 'still paying.' The real cost of leaving is your T-SQL, your SSIS, and your Agent jobs, and that bill dwarfs the renewal. So before you look at Postgres at all, check whether you're paying Enterprise prices for a workload that would run perfectly fine on Standard Edition. That is the question worth answering first. Then it depends on three things: the workload, the team, and how deep you already sit in the Microsoft stack.

For a brand-new, cost-sensitive, developer-led project with nothing built in T-SQL yet, Postgres is a viable default now, not a compromise. The license savings are real and the engine is genuinely quite capable. For a Microsoft-ecosystem enterprise with years of T-SQL, SSIS, and Always On built in, SQL Server still earns its keep, and the migration math usually says stay put, at least for the core. Where it gets interesting is the shops paying Enterprise prices for Standard-shaped workloads. That is where a hard look pays off, and it is not always a Postgres answer. Sometimes it is just right-sizing to SQL Server Standard, which just became a lot more capable in the 2025 release.

The thing I will not tell a client anymore is that the choice is obvious. The era where nobody got fired for buying SQL Server is over. A ten-point gap on DB-Engines, with Postgres the only one of the top four still climbing, is a strong signal to run the numbers for your workload instead of just renewing on autopilot. Ask the question my client asked. Just make sure you cost out the whole answer, not the zero-price sticker.

More to Read

Microsoft: SQL Server 2025 Pricing (official PDF)
Microsoft Learn: What's new in SQL Server 2025
Microsoft Learn: Editions and supported features of SQL Server 2025
DB-Engines: Complete database popularity ranking
sqlfingers inc: Your JSON Column Was Never a JSON Column

Monday, June 8, 2026

Least Privilege for AI Accounts in SQL Server

Over the past few months, attackers have been talking Meta's new AI support bot into handing over Instagram accounts they didn't own. The attack was almost too simple. Start a normal password reset, open the support chat, tell the bot you're locked out, and ask it to change the recovery email on the account. The bot did exactly that, then sent the one-time code straight to the attacker's inbox.

Two weekends ago Meta pushed an emergency patch after accounts belonging to the Obama White House (now dormant), Sephora, and a senior US Space Force official were taken over. Meta has not said how many accounts were hit.

Confused deputy

The bot appears to have been wired into Meta's account management systems with permission to make changes like email swaps and password resets, but it was not taught to confirm it was talking to the real account owner. Security people have a name for this going back to the 80s -- the 'confused deputy'. A trusted process with real privileges gets tricked into misusing them on someone else's behalf.

Very simply, the bot was argued into doing something its permissions allowed and its judgment should have stopped. Per Brian Krebs, cybersecurity investigative journalist, the attack failed against accounts with multi-factor authentication enabled, including SMS codes. Translation: Go turn on Two-factor Authentication. Now.

What this means if you're putting AI near SQL Server

This is a consumer story, but the lesson lands squarely in our world. As people start wiring AI agents and MCP servers into SQL Server, the account that the AI runs under is the real safety boundary -- not the model's good sense or anyone's assumptions about how it is supposed to work. It MUST be controlled by the account privileges. If you grant an AI principal more than it strictly needs, you're trusting the model to never be argued out of restraint, and Meta just showed us how that goes.

So the takeaway is an old one, only pointed at a new kind of caller. Grant the AI account the minimum it needs, audit what it can touch, and assume it can be talked into anything its permissions allow. And don't ever treat this as a one-time setup. Monitor what your AI and service accounts are doing regularly, the same way you would any other privileged login.

More to Read

Meta's AI support bot happily handed Instagram accounts to hackers (Malwarebytes)

Saturday, June 6, 2026

Globex, Globx, and GLOBEX: Fuzzy String Matching in SQL Server 2025

You have a customer list. Somewhere in it, 'Globex Corporation' got entered four different ways by four different people at four different times. One typed 'Globx Corporation'. One typed 'Globex Corporaton'. One held the shift key down and gave you 'GLOBEX CORPORATION'. They are all the same company, but your reports are counting them as four different entities.

SQL Server never had a good built-in answer for this. SOUNDEX has been in the box forever, but it was built to match names that sound alike, not to measure how close two typed strings actually are -- a different job entirely. So, we either hand-wrote the matching logic ourselves or pushed the whole problem out to SSIS Fuzzy Lookup or CLR. SQL Server 2025 finally ships the math itself. Four functions, no UDF, no external code. This post is the demo: the old pain, the four new functions, a real dedup query, and the one gotcha that may bite you on day one.

The Old Way: Roll Your Own

The closest thing we had built in was SOUNDEX, and it was never up to this job. It is phonetic, it is English-centric, and it crushes every word down to a four-character code, so collisions are constant. It can tell you two words sound alike. It cannot tell you that 'Globex Corporaton' is one keystroke from 'Globex Corporation'.

-- The tool we used to reach for. It answers a different question than the one you have.
SELECT SOUNDEX('Globex'), SOUNDEX('Globx');

So most shops did one of two things. They hand-wrote a function to count the differences between two strings and ran it across a million rows -- the per-row, parallelism-killing scalar UDF curse I wrote about earlier this month. Or they gave up on T-SQL entirely and moved the work to SSIS or a CLR assembly. Both work, but neither is something you want to manage as an 'extra' in your data layer.

Turn It On First

Two things have to be true before the four new functions can be used. The database has to be at the SQL Server 2025 compatibility level (170), and the preview-features database scoped configuration has to be on.

-- Check where you are
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Set the 2025 compatibility level if you are not there yet
ALTER DATABASE [YourDbName] SET COMPATIBILITY_LEVEL = 170;

-- Turn on the preview features for this database
ALTER DATABASE SCOPED CONFIGURATION SET PREVIEW_FEATURES = ON;

Skip that last step and SQL Server does not tell you what you missed. It just acts like the function does not exist, because as far as it is concerned, it doesn't:

Msg 195, Level 15, State 10, Line 1
'EDIT_DISTANCE' is not a recognized built-in function name.

If you see that on SQL Server 2025, your first check is whether the preview-features setting is ON.

USE DBA; -- Change to the db where you've enabled it.
SELECT 
    CASE value 
        WHEN 1 THEN 'TRUE (Enabled)' 
        ELSE 'FALSE (Disabled)' 
    END AS Is_Preview_Features_Enabled
FROM sys.database_scoped_configurations
WHERE name = 'PREVIEW_FEATURES';

Use the ALTER statement above to set it up if it's not enabled. Otherwise, keep moving.

The Four Functions

There are four functions, built on two different algorithms that answer the same question -- how far apart are two strings? There's two because they just measure it differently.

The first algo is Levenshtein, which measures what's called the edit distance, which is the number of single-character changes -- add a letter, remove one, swap one -- it takes to turn one string into the other. 'Globx' into 'Globex' is one edit, just add the missing e. 'Cat' to 'Dog' is three. The smaller that number, the closer the two strings are to matching.

The second algo is Jaro-Winkler. It scores similarity on a different formula, and the part that matters is that it gives extra credit when two strings start the same way. We will see exactly where that pays off in a minute.

Each algorithm comes as a pair. One function hands you the raw distance, the other turns it into a 0-to-100 similarity score.

Function Returns Range
EDIT_DISTANCE Edits to turn one string into the other
(insert, delete, substitute)
0 = identical,
higher = farther apart
EDIT_DISTANCE_SIMILARITY Similarity score from the edit distance 0 = no match,
100 = full match
JARO_WINKLER_DISTANCE Distance, weighted toward strings that
agree on their opening characters
0 = identical,
1 = no match
JARO_WINKLER_SIMILARITY Similarity, same prefix weighting 0 = no match,
100 = full match

One footnote before we move on. Microsoft's docs call this Damerau-Levenshtein and list transposition as a fourth operation -- until they admit a few lines down that transpositions are not supported yet. So today it is plain Levenshtein: three operations, not four. Test a straight swap like 'AB' against 'BA' and you get 2, not the 1 a real Damerau-Levenshtein would give you.

One other thing to keep straight is that distance and similarity run in opposite directions. A distance of 0 is a perfect match and climbs as the strings differ. A similarity of 100 is a perfect match and falls toward 0. So a good match means a low distance but a high similarity. Get that backwards in a WHERE clause and you will filter out exactly the rows you meant to keep.

A Quick Look at All Four

Here are five pairs run through all four functions at once, so you can see how they disagree.

SELECT T.source_string,
       T.target_string,
       EDIT_DISTANCE(T.source_string, T.target_string)            AS ED_Distance,
       JARO_WINKLER_DISTANCE(T.source_string, T.target_string)    AS JW_Distance,
       EDIT_DISTANCE_SIMILARITY(T.source_string, T.target_string) AS ED_Similarity,
       JARO_WINKLER_SIMILARITY(T.source_string, T.target_string)  AS JW_Similarity
FROM (VALUES ('Black', 'Red'),
             ('Colour', 'Yellow'),
             ('Colour', 'Color'),
             ('Microsoft', 'Msft'),
             ('Regex', 'Regex')
     ) AS T(source_string, target_string);

Look at the 'Microsoft' / 'Msft' row. Edit distance sees five changes and scores it a weak 44 because half the letters are gone. Jaro-Winkler is kinder at 51, because the two share an opening 'M' and the same letters in order. That difference is the whole personality split between the two algorithms, and it is why abbreviations are where they part ways.

The Real Job: Catching Duplicate Customers

The warm-up used literals. Here is the actual problem -- a staging table full of names a human typed, and a known-good master value you want to match them against.

CREATE TABLE dbo.CustomerImport
(
    RowID       INT IDENTITY(1,1) PRIMARY KEY,
    EnteredName NVARCHAR(100) NOT NULL
);

INSERT dbo.CustomerImport (EnteredName)
VALUES (N'Globex Corporation'),   -- clean
       (N'Globexx Corporation'),  -- doubled letter
       (N'Globex Corporaton'),    -- dropped an i
       (N'Globx Corporation'),    -- dropped an e
       (N'Globex Corp'),          -- abbreviation
       (N'GLOBEX CORPORATION');   -- shift key never came up

Now score every entered name against the master string and attach a verdict. The similarity score does the work; the CASE expression just turns a number into a decision a human can act on.

SELECT EnteredName,
       EDIT_DISTANCE(EnteredName, N'Globex Corporation')            AS Edits,
       EDIT_DISTANCE_SIMILARITY(EnteredName, N'Globex Corporation') AS Similarity,
       CASE
           WHEN EDIT_DISTANCE_SIMILARITY(EnteredName, N'Globex Corporation') >= 95
               THEN 'Almost certainly the same'
           WHEN EDIT_DISTANCE_SIMILARITY(EnteredName, N'Globex Corporation') >= 85
               THEN 'Likely the same - review'
           WHEN EDIT_DISTANCE_SIMILARITY(EnteredName, N'Globex Corporation') >= 70
               THEN 'Possible match'
           ELSE 'Probably different'
       END AS Verdict
FROM dbo.CustomerImport
ORDER BY Similarity DESC;

The three typos land at 94 and 95 and all get flagged for review or better -- that's the point, a single dropped or doubled letter barely moves the score on an eighteen-character string. But the bottom two rows are also the same company, and both fell through to 'Probably different'. Those two failures are what you need to understand before you trust this in production.

'Globex Corp' scored 61 -- abbreviations are hard for edit distance.

'Globex Corp' is just the company name shortened. But turning 'Globex Corp' into 'Globex Corporation' means adding seven letters, so the edit distance is seven and the score drops to 61. Edit distance only counts the changes; it has no idea the seven missing letters spell a harmless 'oration'. Any abbreviation looks far apart for the same reason. Jaro-Winkler does better here, because the two names start identically, and that is exactly what it rewards. So when your data is full of 'Corp', 'Inc', and 'Ltd', try both and compare the scores.

'GLOBEX CORPORATION' scored 17 -- and that is the gotcha that gets everyone.

The Gotcha: These Functions Are Case-Sensitive

This is the one to circle. The same company name, in capitals, scored 17 out of 100. Not because the algorithm is wrong, but because right now these functions ignore your collation entirely. Microsoft says so plainly in the docs: "the functions don't adhere to the comparison semantics defined by collation settings, such as case insensitivity and other collation-specific rules."

So even on a case-insensitive database, which most of them are, an uppercase letter and its lowercase version count as a change. 'GLOBEX CORPORATION' differs from 'Globex Corporation' in fifteen places, so the engine counts fifteen edits and scores it 17.

-- Same company. Case-sensitive comparison thinks they are barely related.
SELECT EDIT_DISTANCE_SIMILARITY(N'Globex Corporation', N'GLOBEX CORPORATION') AS RawScore;

-- Normalize the case on both sides first, and the truth comes back.
SELECT EDIT_DISTANCE_SIMILARITY(
           UPPER(N'Globex Corporation'),
           UPPER(N'GLOBEX CORPORATION')) AS NormalizedScore;

The fix is one fast function call away. Wrap both sides in UPPER or LOWER. You need to pick one and stay consistent so that case never enters into the comparison. Until Microsoft teaches these functions to honor collation, do this on every fuzzy comparison you write. Every time.

EDIT_DISTANCE or JARO_WINKLER?

You have two algorithms. They are good at different things.

Reach for When
EDIT_DISTANCE Typos, transpositions, dropped or doubled
characters -- short, roughly equal-length strings
JARO_WINKLER Names and strings that share a strong start
but vary in length, including abbreviations

Edit distance is literal. It simply counts changes. Great for the 'Corporaton' kind of typo, lousy at the 'Corp' kind, because a long missing tail reads as a pile of edits. Jaro-Winkler forgives a different ending when the beginning matches, which is how human-typed names usually drift. When you are unsure, run both and compare the scores, the way my warm-up query above did. Where they disagree is where your judgment goes.

The One Thing to Read First

Here is the honest version. On SQL Server 2025, fuzzy string matching is still in preview. It ships in the box, it works the moment you flip the database scoped configuration, and you can build and test on it today. But preview is not GA. This is the same caveat I raised on the native JSON data type, since preview behavior can change between cumulative updates.

Where Fuzzy matching status
SQL Server 2025 (box) In preview -- requires PREVIEW_FEATURES = ON
Azure SQL Database Generally available
Azure SQL Managed Instance Available on the SQL Server 2025 / Always-up-to-date update policy
SQL database in Microsoft Fabric Generally available

One more thing, beyond the case issue: these functions run per row. Comparing two literals is free. Comparing a column against a master across a few million rows has real compute costs on every row. Narrow the field first with something cheap, like a same-first-letter filter, a length window, or an existing index, and then let the fuzzy function do its work on what's left. Don't point it at the whole table and just hope it works out.

The Bottom Line

The hand-rolled Levenshtein UDF was never wrong. It was just the best we had, and it dragged a per-row scalar function through every query that used it. SQL Server 2025 puts the math in the engine where it belongs, and the whole pattern collapses to a function call.

Edit Distance and Jaro-Winkler are the two newly introduced built-in fuzzy string-matching functions designed to identify typos, duplicate entries, and approximate text matches. This is a nice uptick from the custom T-SQL scripts we had to write and/or the CLR assemblies we used to achieve the same before SQL Server 2025.

Two things to remember. The functions are case-sensitive today regardless of your collation, so fold the case on both sides before you compare. And edit distance and Jaro-Winkler are not interchangeable. One counts keystrokes, the other forgives a different ending. Get those two right and you can finally answer 'is this the same customer?' within T-SQL.

Oh yeah -- also keep an eye on that preview flag until Microsoft moves it to GA in the box.

More to Read

Microsoft Learn: What is fuzzy string matching?
Microsoft Learn: EDIT_DISTANCE (Transact-SQL)
Microsoft Learn: JARO_WINKLER_SIMILARITY (Transact-SQL)
Microsoft Learn: ALTER DATABASE SCOPED CONFIGURATION (preview features)
sqlfingers inc: Cursed SQL -- Six Queries That Run Fine Until They Don't
sqlfingers inc: Your JSON Column Was Never a JSON Column

Wednesday, June 3, 2026

The AI Didn't Hack Mexico. SQL Injection Did.

I live in Cozumel, MX part-time. So when one person walked off with about 150GB of Mexican government data this winter — tax records, voter rolls, civil registry files, government logins — it was not abstract news to me. It was my neighbors' data.

The headline everyone ran with was 'AI hacked the Mexican government.' That is the wrong lesson. The AI did not invent a new way in. It ran the oldest trick in the book at machine speed: SQL injection.

What Happened

On February 25, 2026, the Israeli firm Gambit Security disclosed the campaign, first reported by Bloomberg. One unidentified operator, working for about a month starting in December 2025, hit roughly ten government bodies; the federal tax authority (SAT), the electoral institute (INE), several state governments, Mexico City's civil registry, even Monterrey's water utility.

The toolkit was not a nation-state arsenal. It was a consumer AI subscription. The operator jailbroke Claude Code, which means they talked it past its safety guardrails, with Spanish-language prompts, framing every request as an authorized 'bug bounty' and role-playing the model as an 'elite hacker.' When Claude balked, they switched over to GPT-4.1 to keep moving. Total haul: about 150GB, and roughly 195 million identities exposed.

The Part Nobody Wants to Say Out Loud

Gambit found at least 20 vulnerabilities exploited across those systems. None were exotic. Just exposed admin panels. Default credentials. Unpatched web apps matching CVE-2023-era patterns. Login pages with no rate limiting. The kind of risks sitting in thousands of shops right now — maybe even one that you support.

Here is the line from the logs that should stop every DBA cold. The operator had a frontier AI model at the keyboard. What did they have it write? Python SQL injection payloads against login pages on .gov.mx, built around a string straight out of a 2006 tutorial:

' UNION SELECT username, password FROM users--

That is not a 2026 attack. That is a 2006 attack, run at machine speed by something that never gets tired of rephrasing the prompt. Gambit's CEO called it this way: "This reality is changing all the game rules we have ever known." Maybe so. But the door it walked through is one we have known how to lock for twenty years.

SQL Injection, Still Undefeated in 2026

If you concatenate user input into a query string, you are vulnerable. It does not matter how clever your firewall is. Here is the whole problem and the whole fix, end to end. Run it on any version, sp_executesql has been there since forever.

CREATE TABLE dbo.Users
(
    UserId       int IDENTITY(1,1) PRIMARY KEY,
    Username     nvarchar(128) NOT NULL,
    PasswordHash nvarchar(256) NOT NULL
);

INSERT dbo.Users (Username, PasswordHash)
VALUES (N'rlewis', N'hash_aaa'),
       (N'admin',  N'hash_zzz');

The vulnerable pattern

Build the query by gluing the input straight into the string, and you have handed over the keys to whoever is typing.

CREATE OR ALTER PROCEDURE dbo.GetUser
    @Username nvarchar(128)
AS
BEGIN
    DECLARE @sql nvarchar(max);

    SET @sql = N'SELECT Username, PasswordHash
                 FROM dbo.Users
                 WHERE Username = ''' + @Username + N'''';

    EXEC (@sql);
END;

Call it normally and it behaves:

EXEC dbo.GetUser @Username = N'rlewis';

Now feed it the kind of input the attacker used:

EXEC dbo.GetUser @Username = N''' OR 1=1 --';

The string your server builds and then runs becomes this:

SELECT Username, PasswordHash
FROM dbo.Users
WHERE Username = '' OR 1=1 --'

Every row in the table comes back. Username and password hash for every account, handed over by your own stored procedure. That is the breach, in miniature.

The fix: parameterize

The query text is fixed and the input rides in as a parameter, so the engine compiles the command before it ever sees the value. The input can only ever be a username to look up, never code to run. That is the whole difference: concatenation lets the attacker write SQL, parameters only let them supply a value.

CREATE OR ALTER PROCEDURE dbo.GetUser_Safe
    @Username nvarchar(128)
AS
BEGIN
    DECLARE @sql nvarchar(max);

    SET @sql = N'SELECT Username, PasswordHash
                 FROM dbo.Users
                 WHERE Username = @Username';

    EXEC sys.sp_executesql
        @sql,
        N'@Username nvarchar(128)',
        @Username = @Username;
END;

Same malicious input, run again:

EXEC dbo.GetUser_Safe @Username = N''' OR 1=1 --';

Zero rows. The input is read as a literal username that simply does not exist, not as a condition:

(0 rows affected)

What This Means for You

The uncomfortable part of this story is how little of it is about AI. Strip the model out and you still have unpatched servers, over-privileged accounts, and login pages that a 2006 worm could have walked through. AI did not lower your defenses. It lowered the skill and the patience required to find the gaps you already had.

So the homework has not changed. It has only become more pressing.

Control Why it matters now
Parameterize everything No concatenated user input, anywhere. sp_executesql or parameters, every time.
Patch on a real cadence The 20 flaws were known and old. A model can find every one of them in minutes.
Least privilege A compromised login should reach almost nothing. Scope every account down.
Rate limit and lock out Machine-speed credential stuffing dies fast against throttling and lockouts.
Watch behavior, not signatures No human types thousands of login attempts a minute. Alert on the pattern.

The AI is new. The way in was not. Lock the door we have known about since SQL Server 6.5.

More to Read

Security Affairs: Claude Code abused to steal 150GB in cyberattack on Mexican agencies
HawkEye: How hackers used Claude to breach the Mexican government (the full attack chain)

Tuesday, June 2, 2026

Cursed SQL: Six Queries That Run Fine Until They Don't

Sometimes SQL can be cursed. It runs fine in dev and makes it through testing without issue. Then it gets to production, the table grows, the load spikes, and things turn on you at 2 a.m.

The cursed SQL returns answers. The problem is that the answer is wrong, or slow, or both -- and SQL Server never says a word. That's the curse. Here are six good examples.

CREATE TABLE dbo.Orders (
    OrderID     INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID  INT          NOT NULL,
    AccountNo   VARCHAR(20)  NOT NULL,
    OrderDate   DATETIME2    NOT NULL,
    Amount      DECIMAL(10,2) NOT NULL
);

-- A covering index for some of our victims below
CREATE INDEX idx_Orders_CustomerID ON dbo.Orders (CustomerID) INCLUDE (Amount);

-- And one on the columns we will abuse
CREATE INDEX idx_Orders_AccountNo ON dbo.Orders (AccountNo);
CREATE INDEX idx_Orders_OrderDate ON dbo.Orders (OrderDate);

INSERT dbo.Orders (CustomerID, AccountNo, OrderDate, Amount)
SELECT TOP (500000) -- 500K rows, enough to make the bad plans hurt (TOP caps the cross join)
    ABS(CHECKSUM(NEWID())) % 1000,
    CAST(ABS(CHECKSUM(NEWID())) % 100000 AS VARCHAR(20)),
    DATEADD(DAY, -(ABS(CHECKSUM(NEWID())) % 1000), SYSDATETIME()),
    (ABS(CHECKSUM(NEWID())) % 50000) / 100.0
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

Curse 1: The Cursor That Should Have Been a Set-Based Manipulation

The classic. You need to update every row, so you reach for the tool that processes one row at a time. It feels procedural. It feels safe. It is neither.

DECLARE @id INT, @amt DECIMAL(10,2);
DECLARE c CURSOR FOR 
SELECT OrderID, Amount FROM dbo.Orders;
OPEN c;
FETCH NEXT FROM c INTO @id, @amt;

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders
    SET Amount = @amt * 1.05
    WHERE OrderID = @id;

    FETCH NEXT FROM c INTO @id, @amt;
END

CLOSE c;
DEALLOCATE c;

The cursor spells out a procedure: fetch a row, update that row, fetch the next, repeat half a million times. Each pass is its own latch, lock, and log record. But SQL Server is a set-based engine, and a safer approach is a working table you walk in batches with a WHILE loop, updating through a join instead of fetching one row at a time.

-- Stage the rows to process in a working table with an identity column
CREATE TABLE #Work (
    RowID    INT IDENTITY(1,1) PRIMARY KEY,
    OrderID  INT NOT NULL
);

INSERT #Work (OrderID)
SELECT OrderID FROM dbo.Orders;

DECLARE @BatchStart INT = 1,
        @BatchSize  INT = 10000,
        @MaxRow     INT;
SELECT @MaxRow = MAX(RowID) FROM #Work;

WHILE @BatchStart <= @MaxRow
BEGIN
    UPDATE o
    SET o.Amount = o.Amount * 1.05
    FROM dbo.Orders o INNER JOIN #Work w 
      ON o.OrderID = w.OrderID
    WHERE w.RowID >= @BatchStart
    AND w.RowID <  @BatchStart + @BatchSize;

    SET @BatchStart = @BatchStart + @BatchSize;
END

DROP TABLE #Work;

That is the set-based manipulation: stage the rows in a working table keyed by an identity column, then walk it with a WHILE loop, updating through the join one batch at a time. No CURSOR, no FETCH, no DEALLOCATE, and none of the per-row engine machinery a cursor hauls around behind it. It still walks the set, but on your terms, through a structure you own, and each pass updates ten thousand rows in a single set operation instead of one row in its own transaction. Batching is the whole point: you cap transaction size, lock footprint, and log growth instead of letting one statement hold the entire table hostage. And the 10K is a starting point. Test it out. You can probably go as high as 100K.

Curse 2: SELECT * Where Nobody Asked

SELECT * is convenient, but it's also a promise you can't keep. The moment someone adds, drops, or reorders a column, everything downstream gets a surprise. And what about the overhead? SELECT * kills index efficiency by forcing expensive Key Lookups or Clustered Index Scans instead of using fast, narrow indexes. It also wastes network bandwidth by pushing large amounts of unneeded data across the network - and more.

We have an index that covers CustomerID and Amount. Ask for precisely those two columns and the index answers the query by itself; SQL Server never touches the table:

SELECT CustomerID, Amount
FROM dbo.Orders
WHERE CustomerID = 42; -- Index Seek on idx_Orders_CustomerID.

Now ask for everything. The index doesn't carry the other columns, so SQL Server has to go fetch them -- a key lookup back to the clustered index for every matching row, or it abandons the index entirely and scans. Same WHERE clause, far more work:

SELECT *
FROM dbo.Orders
WHERE CustomerID = 42; -- Key Lookup or Clustered Index Scan. More reads, more pages, more time.

Name your columns. You get the index you paid for, and your query still means the same thing six months from now.

Curse 3: A Function on the Wrong Side of the WHERE

You want this year's orders, so you wrap the column in a function. Reads beautifully but performs like a swamp. When you apply a function to an indexed column in the WHERE clause, the predicate is no longer SARGable, the index goes unused, and SQL Server scans the whole thing.

SELECT OrderID, Amount
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2026; -- Index Scan. SQL Server has to compute YEAR() for every single row.

Leave the column alone and express the same idea as a range. Now the index on OrderDate does its job.

SELECT OrderID, Amount
FROM dbo.Orders
WHERE OrderDate >= '2026-01-01'
AND OrderDate < '2027-01-01'; -- Index Seek. Touches only the rows in range.

Note the half-open range, '>=' to '<', rather than BETWEEN. That will sidestep the boundary fights you get with the time portion of a datetime2.

Curse 4: The Implicit Conversion You Never See

This one is the quietest of them all. AccountNo is a VARCHAR. You compare it to a number because, well, it looks like a number. SQL Server obliges, because it never says no when it can convert.

SELECT OrderID, AccountNo
FROM dbo.Orders
WHERE AccountNo = 12345;   -- integer literal vs VARCHAR column

Here's the trap. When two types meet, the one with lower precedence is converted up to the higher one, and INT outranks VARCHAR. So SQL Server doesn't convert your literal down to a string, it converts the entire column up to INT, row by row. CONVERT_IMPLICIT on every value means your index on AccountNo is useless and it's going to scan. The plan for this query even waves a little yellow warning triangle at you that nobody ever clicks.

Match the literal to the column's type and the conversion vanishes:

SELECT OrderID, AccountNo
FROM dbo.Orders
WHERE AccountNo = '12345';   -- string vs string, index seek

The fix is one pair of quotes. Nothing was broken -- SQL Server did exactly what the precedence rules say it should. It just did it silently, on every row, and handed back the right answer very slowly. That's the whole story of implicit conversion.

Curse 5: NOLOCK as a Performance Setting

Somewhere along the line, NOLOCK got a reputation as the 'go faster' hint. Slap it on, skip the locks, and get your data sooner. What it actually does is set that statement to READ UNCOMMITTED, and that means something very different than just 'faster.'

SELECT CustomerID, Amount
FROM dbo.Orders WITH (NOLOCK);

READ UNCOMMITTED reads rows other transactions have modified but not committed. Those are dirty reads. Values that may never have existed if the other transaction rolls back. But it gets worse than reading the wrong number. Straight from Microsoft: with NOLOCK, rows can appear or disappear in your result set before your statement finishes. During a page split it can skip rows entirely, or hand you the same row twice. You don't get an error. You get a report with bad data. Again, dirty reads. You don't want these.

NOLOCK is not a performance tuning option. If blocking is your real problem, the grown-up answer is to identify and resolve what is causing the blocking. Row versioning is an option, where readers do not request shared locks, and writers do not block readers:

USE master;
ALTER DATABASE YourDBName
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

Reach for NOLOCK only when you've decided intentionally that speed beats accuracy. That's a real tradeoff sometimes, and it's almost never the one people think they're making.

Curse 6: The Scalar UDF Hiding in a SELECT

You wrap a little logic in a scalar function because it's tidy and reusable. Then you call it in a SELECT over a million rows, and that tidy little function runs a million times, serially, with the Optimizer mostly blind to what's inside it.

CREATE FUNCTION dbo.udfAmountWithTax (@amt DECIMAL(10,2))
RETURNS DECIMAL(10,2)
AS
BEGIN
    RETURN @amt * 1.08;
END
GO

SELECT OrderID, dbo.udfAmountWithTax(Amount) AS WithTax
FROM dbo.Orders;

Historically this was a guaranteed performance crime with per-row execution and a query that refuses to go parallel. But there is a plot twist. Starting in SQL Server 2019, scalar UDF inlining can fold qualifying functions right into the calling query, so it gets cost-based, set-oriented, and parallel-friendly, with no code change. It's on by default at database compatibility level 150 or higher.

This doesn't remove the curse. It just hides it better. Inlining doesn't catch everything. A function that references certain intrinsics, like GETDATE() or @@ROWCOUNT, can be disqualified and quietly fall back to the old per-row misery. And if you're on an older version, or running an older compatibility level in a newer version, you get none of the magic. The honest fix is to not hide set logic inside a scalar shell. Inline the expression, or use an inline table-valued function the Optimizer can actually see through.

SELECT OrderID, Amount * 1.08 AS WithTax
FROM dbo.Orders;

In Summary

None of the above throw errors. They pass review and run clean until scale, concurrency, or a schema change surfaces them. That's what makes it helpful to know them on sight. The fix is almost always trivial once you spot the shape, and invisible until you do. Now you have six shapes to spot.

More to Read

Data type precedence (Transact-SQL)
SET TRANSACTION ISOLATION LEVEL (Transact-SQL)
Scalar UDF Inlining
Intelligent Query Processing details

Monday, June 1, 2026

Error 7416: When a Security Patch Breaks Your Linked Servers

Microsoft's May 2026 cumulative update for SQL Server 2025 (CU4, KB5089899) did two things on the same day. It closed a remote code execution hole, CVE-2026-40370, and it quietly tightened how the engine validates linked server connections. The first one got the headline. The second one is the one that broke things.

If you run linked servers over the OLE DB Provider for ODBC, ie., MSDASQL, with a provider string, then any login that is not a member of sysadmin can suddenly get bounced. Nothing about your linked server changed. Nothing about your login mappings changed. The query just stops.

Msg 7416, Level 16, State 1, Line 1
Access to the remote server is denied because no login-mapping exists.

Ironically, the same security update that hardened the engine is the one rejecting your connection.

Three Things Have to Be True

The new validation check fires under a specific shape. All of these have to be true:

Condition Detail
Provider Linked server uses MSDASQL (OLE DB Provider for ODBC Drivers)
Provider string The definition specifies '@provstr'
Login The executing login is NOT in the 'sysadmin' fixed server role

What makes it crazy is that nothing is misconfigured. The linked server is right and the login mappings are right -- but the query still dies. A stricter connection validation check in the Database Engine now rejects connections that earlier builds allowed. Full stop. And since sysadmin logins are exempt from that check, the problem hides from you until a non-sysadmin application login runs a query.

Microsoft does not document the internals beyond calling it a stricter connection validation check, so the honest answer stops at what they assert: the issue hits logins that are not members of the sysadmin fixed server role, it occurs even when the linked server and login mappings are configured correctly, and it is tied to MSDASQL definitions that carry a provider string. The fix is the tell. Adding 'User ID' to '@provstr' clears it, so the check is satisfied once the provider string carries an explicit user credential.

This is not unique to SQL Server 2025. The validation change rolled out across the spring 2026 servicing stream.

Version Update KB
SQL Server 2025 CU4 / CU5 KB5089899 / KB5084896
SQL Server 2022 CU25 KB5081477
SQL Server 2019 CU32 KB5090407

SQL Server 2025 CU4 lands the engine at build 17.0.4040.1. If you are at or past these builds and a linked server job started failing in April or May, this is your first suspect.

The Setup That Breaks

Here is the shape that trips the check. An MSDASQL linked server defined with a provider string, queried by a non-sysadmin login.

EXEC master.dbo.sp_addlinkedserver
    @server     = N'ODBC_LINK',
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @provstr    = N'Driver={ODBC Driver 18 for SQL Server};Server=REMOTESRV;Database=AppDB;';
GO

Run a query against it under a non-sysadmin login and the engine says no right away.

SELECT TOP (10) *
FROM OPENQUERY(ODBC_LINK, 'SELECT * FROM dbo.Orders');
GO
Msg 7416, Level 16, State 1, Line 1
Access to the remote server is denied because no login-mapping exists.

The Fixes

You might be thinking about uninstalling the update, but that update closed an RCE. Rolling it back to fix a linked server is like trading a flat tire for a brake failure. Microsoft documents three workarounds:

1. Add User ID to the provider string

This is the cleanest fix. The login still has to supply 'UID' in the provider string, which is exactly what the stricter check now wants to see. Drop and recreate the linked server with 'User ID' included.

EXEC master.dbo.sp_addlinkedserver
    @server     = N'ODBC_LINK',
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @provstr    = N'Driver={ODBC Driver 18 for SQL Server};Server=REMOTESRV;Database=AppDB;User ID=AppReader;';
GO

2. Drop @provstr entirely

If the provider string is not pulling its weight, get rid of it. Push the connection target into '@datasrc' and let 'sp_addlinkedsrvlogin' carry the security, instead of stuffing everything into a string the engine now scrutinizes.

EXEC master.dbo.sp_addlinkedserver
    @server     = N'ODBC_LINK',
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @datasrc    = N'MyOdbcDsn';
GO

EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname = N'ODBC_LINK',
    @useself    = N'False',
    @locallogin = NULL,
    @rmtuser    = N'AppReader',
    @rmtpassword = N'********';
GO

3. Grant the login sysadmin (not recommended)

It works because sysadmin skips the check. It is also a privilege escalation you are doing to yourself, on purpose, to undo a security patch. Microsoft lists it, then tells you not to do it. Listen to the second half of that sentence.

The Takeaway

Security hardening and 'no changes were made' failures arrive in the same box more often than vendors like to admit. When a linked server that ran for years dies right after a CU, the mappings are usually fine. Check the build, check the provider, and check whether the login carrying the query is a sysadmin or a mere mortal... The fix is very often just a User ID away.

More to Read

SQL Server 2025 Known Issues
SQL Server 2022 Release Notes (workarounds spelled out)
Cumulative Update 4 for SQL Server 2025 (KB5081495)
KB5089899 - Security update for SQL Server 2025 CU4 (CVE-2026-40370)

Tuesday, May 26, 2026

SIL — Because Holding 1,000 KEY Locks Is So 2022

Back in September I wrote about Optimized Locking in SQL Server 2025. Two components: Transaction ID (TID) locking and Lock After Qualification (LAQ). I told you the engine had stopped 'locking first and asking questions later'. That was true, but it was also only half the story.

Microsoft has since gone back and fine-tuned the feature, giving us two improvements that make a database more lock-free without you touching anything. The headliner is called Skip Index Locks (SIL), and the numbers are pretty good:

Measurement Result
Key locks skipped (Microsoft demo) 99.6%
Page locks skipped (Microsoft demo) 79%
Nonclustered index locks skipped (Azure fleet, ~25K DBs) 81%

No configuration. No hint to add. If you're on SQL Server 2025 with Optimized Locking enabled, SIL is already in there working for you. This post explains what SIL is, how it works, and a demo you can run yourself to see the locks vanish.

Quick recap: TID and LAQ

Transaction ID (TID) locking. Every modified row gets stamped with the ID of the transaction that touched it last. Row and page locks are still taken, but they're released the moment each row is modified, instead of being held through to the commit. The only thing held until commit is one X lock on the TID. Lock memory drops, and lock escalation drops with it.

Lock After Qualification (LAQ). The engine evaluates the predicate against the latest committed row version first, then locks only the rows that actually qualify. Pre v2025, a WHERE OrderDate < '2025-01-01' UPDATE took a U lock on every row it scanned, qualified or not. With LAQ, locks land only on the rows that match.

Both require Accelerated Database Recovery and LAQ also requires Read Committed Snapshot Isolation. Both of these shipped in SQL Server 2025 at launch.

The improvement: Skip Index Locks (SIL)

Under TID locking with RCSI and LAQ in the mix, those short-duration row and page locks are only necessary if some other query is also touching the row and expecting it to stay stable. The kind of query that needs that guarantee is one running under Repeatable Read or Serializable, or using locking hints like 'HOLDLOCK' or 'READCOMMITTEDLOCK'. Microsoft calls these Row Locking Queries (RLQ).

If no RLQ is touching the page, those row and page locks aren't preventing any conflict. Meaning, there's nothing there to conflict with and SIL recognizes that and skips them. The row still gets modified, the change is still ACID-safe, and the lock manager never gets the call. Nice.

The mechanism: Every page in the buffer pool now carries a single No-RLQ bit. When that bit is set, it tells the engine that no Row Locking Query is currently touching any row on the page. A DML statement modifying a row on that page skips the row lock and the page lock entirely, and takes a short exclusive page latch instead. The latch is brief. Each lock would have been a round trip into the Lock Manager, and for a transaction touching thousands of rows, those round trips will add up. That is the overhead that SIL minimizes for us.

Per Microsoft's Azure telemetry, RLQ-style queries are uncommon in real-world workloads using RCSI or snapshot isolation. That is why the skip ratio is so high. Most pages, most of the time, have nobody using them in a way that demands a lock.

What SIL skips, and what it doesn't

SIL IS used for:

  • INSERT statements on heaps - the intent-exclusive (IX) page lock is skipped.
  • UPDATE statements on clustered, nonclustered indexes, and heaps - both the IX page lock and the exclusive (X) row lock are skipped. The engine leans on the page latch and the TID lock instead.

SIL is NOT used for:

  • DELETE statements.
  • UPDATE on heaps where the row has existing forwarding pointers, or where the update creates new ones.
  • Rows with LOB columns - VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX), JSON.
  • Rows on pages that were split in the same transaction.

The exclusions matter. If your hot tables are LOB-heavy or your workload is heavy deletes, your skip ratio will be lower than the headline numbers. Those headline numbers come from update-heavy OLTP under RCSI, which is one of the most common workloads out there.

The other half: LAQ

SIL got the headline, but it shipped with a sibling. Lock After Qualification, or LAQ. LAQ is optimistic. It evaluates query predicates without placing locks on the rows while scanning. In short, it checks the latest committed version of the row and acquires the necessary lock only after the row has qualified for the modification.

The demo

Here is a full working demo for SQL Server 2025 and ADR and RCSI are required.

Step 1: Create the database and enable everything.

CREATE DATABASE SILDemo;
ALTER DATABASE SILDemo SET ACCELERATED_DATABASE_RECOVERY = ON;
ALTER DATABASE SILDemo SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE SILDemo SET OPTIMIZED_LOCKING = ON;
GO

Step 2: Verify SIL is enabled.

USE SILDemo;
SELECT DATABASEPROPERTYEX(DB_NAME(), 'IsOptimizedLockingOn') AS is_optimized_locking_enabled;
GO

Step 3: Create a table with a clustered index and add data.

CREATE TABLE dbo.TestSIL
(
  ID int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
  Value int NOT NULL
);
GO

INSERT INTO dbo.TestSIL (Value)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;
GO

Step 4: SIL active — clustered index update.

Open a transaction, update rows, and see what locks the engine holds.

BEGIN TRANSACTION;

UPDATE dbo.TestSIL
SET Value = Value + 10;

SELECT resource_type,
       request_mode,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
  AND resource_type IN ('KEY','PAGE','RID','OBJECT','XACT')
GROUP BY resource_type, request_mode
ORDER BY resource_type, request_mode;

COMMIT TRANSACTION;
GO

If SIL is working correctly, we will see ZERO KEY locks. Under classic locking, an update of 1,000 rows would hold 1,000 X KEY locks (one per modified row) all the way to the commit. SIL skipped every one of them. We still see a small number of PAGE locks (IX mode) and an OBJECT lock (IX), but those are expected and are not held long-term. The KEY lock count is where the story is, and the KEY lock count is zero.

In a nutshell, under classic locking without SIL, updating 1,000 rows would hold 1,000 X KEY locks. With SIL, those KEY locks are skipped entirely.

What you need, and why it matters

SIL turns on for free if the rest of the stack is in place:

  • SQL Server 2025 — it shipped with GA release.
  • Accelerated Database Recovery enabled on the database.
  • Read Committed Snapshot Isolation enabled — without it, LAQ never engages and SIL has nothing to build on.
  • Optimized Locking enabled — ALTER DATABASE [dbname] SET OPTIMIZED_LOCKING = ON;
  • A workload that mostly INSERTs or UPDATEs indexed tables, under RCSI, light on LOB columns.

Azure SQL Database and SQL database in Fabric get all of this automatically. Azure SQL Managed Instance is rolling out. On-prem SQL Server 2025 needs it setup at the database-level.

When it's working, the payoff is the kind a DBA notices: lock memory drops, blocking chains on update-heavy OLTP shorten or even disappear, and you got there with no code changes and no new hints scattered throughout your queries. This is server level. There is no way to opt a single statement into SIL, and there doesn't need to be.

SQL Server 2025's Optimized Locking was already a win. I think SIL is the part that makes the numbers even more worth bragging about.

More to Read

Microsoft Tech Community: Introducing optimized locking v2
Microsoft Learn: Optimized Locking
Microsoft Learn: Accelerated Database Recovery
sqlfingers inc: Optimized Locking in SQL Server 2025 — Concurrency Gets Smarter (September 2025)

Monday, May 25, 2026

Your SQL Server 2016 Upgrade Will Succeed. Then Monday Happens.

SQL Server 2016 goes out of support on July 14, 2026. You know that already. What you may not know is how many seemingly successful upgrades are still going to fail.

Not during the installer or post-upgrade checks. Not during the Friday night smoke test. It's Monday morning -- when the applications reconnect, the linked servers are called, and the ETL starts up -- that things that worked quietly for a decade start throwing SSL errors nobody saw coming.

The upgrade succeeded. The reconnect didn't. That is this post.

The engine is not the problem

When upgrades go bad, it's typically not the engine. It's everything standing around it. Newer drivers. Stricter TLS defaults. Certificate validation that suddenly matters -- and an old environment that never had to think about any of it.

Your installer finishes, the databases come online, SSMS connects, the smoke tests pass, and everyone signs off. Then the application reconnects with a newer OLE DB or ODBC driver -- one that validates the certificate by default -- and the SQL Server that ran fine for years starts refusing connections because nobody ever put a real certificate on it.

Here is what actually changed. Somewhere between 2016-era connectivity and current Microsoft providers, certificate validation became strict by default. Old environments leaned on self-signed certificates, internal certificates that nothing ever really trusted, or no certificate story at all -- because the clients were not checking. The gap was always there, but nothing ever pushed it to the table.

Until now — and this is what it looks like:

A connection was successfully established with the server, but then an error
occurred during the login process. (provider: SSL Provider, error: 0 -
The certificate chain was issued by an authority that is not trusted.)

Read that first line again. The connection to SQL Server succeeded. The network is fine. The SQL Server is fine. The client reached the server, looked at the certificate, and stopped right there. The login never happens. It's a TLS trust problem wearing a login error as a costume.

And why didn't your testing catch it? Your DBA workstation probably already trusts that certificate chain -- so SSMS worked perfectly for everything you tested. The application servers and ETL servers do not trust it. You go home Friday believing the environment is healthy. Production tells you otherwise Monday morning.

Catch this before you upgrade

You don't have to wait until you've upgraded to see this happen. The switch is sitting right in the connection string, and you can test it today against any dev or non-prod SQL Server you have. The keyword is TrustServerCertificate.

This is the old behavior -- the one many of us leaned on for years:

Server=YourServer;Database=YourDb;
Encrypt=True;TrustServerCertificate=True;

And this is the modern default:

Server=YourServer;Database=YourDb;
Encrypt=True;TrustServerCertificate=False;

Test that second one against a SQL Server running a self-signed certificate and the SSL Provider error shows up on demand. That is the whole failure.

It is also why setting TrustServerCertificate=True is not a fix, even though it makes the error vanish instantly. It doesn't solve anything -- it just switches the check off. The traffic is still encrypted, but you are no longer verifying who you connected to. That workaround is all over the forums because it restores connectivity fast, but it's also the kind of shortcut you don't want running in production for another ten years.

The real fix is to put a properly trusted certificate on SQL Server before the migration, then validate the chain from the application and ETL servers that actually have to reconnect. Confirm the environment behaves under modern driver defaults before upgrade weekend -- not after.

Linked servers are a common place to miss this

Linked servers get overlooked because people don't picture them as client connections. But that's exactly what they are -- it's just that SQL Server is the client.

After an upgrade, a linked server inherits the same provider behavior and certificate rules as everything else. So a linked server that has run untouched since 2016 can start failing SSL validation against its partner, even though nothing in the linked server definition changed. The upgrade log won't explain it. The instance looks healthy. You find out when the nightly ETL fails, the reports stop building, or a cross-server query throws an SSL error at 2 AM.

I've seen the identical linked servers work from one SQL Server and fail from another, because the providers and trust stores underneath were different. Same definition. Same target. Completely different TLS behavior. The linked server didn't change. Everything underneath it did.

What about Extended Security Updates?

Not every shop is going to make the July 14 date. Miss it, and 2016 keeps running -- it's the patching that stops. To cover that gap, Microsoft sells Extended Security Updates for up to three more years. ESU buys security update coverage and nothing else. No features, no fixes, no performance. It just keeps the lights on while your upgrade clock keeps ticking -- and the price climbs high every year.

If you genuinely need the runway, ESU can make sense. Just treat it as what it is -- a temporary holdover, not a long-term plan. For a good breakdown of how expensive that path gets, Brent Ozar covered it here.

The bottom line

The upgrade is usually the easy part. The surrounding reconnects are where the breakage live.

Certificates, drivers, linked servers, old providers -- assumptions that survived quietly for a decade. These are what break in an upgrade. Test the connection layer now, while it is still your project and not everybody else's July emergency.

More to Read

Microsoft Lifecycle: SQL Server 2016
Microsoft Learn: Configure SQL Server Encryption
Microsoft Learn: SQL Server Extended Security Updates
Brent Ozar: SQL Server 2016 Support Ends in 90 Days

Thursday, May 21, 2026

Your JSON Column Was Never a JSON Column

For roughly a decade, the way you stored a JSON document in SQL Server was to not store a JSON document at all. You stored a string. NVARCHAR(MAX), an ISJSON check constraint to keep the garbage out, and a quiet agreement with yourself that the column was JSON because you said so.

SQL Server 2025 ships an actual json data type. Binary storage, parsed once, queryable without re-parsing every row. This post is the demo: the old pain, the new behavior, and the one word in the Microsoft docs you need to read before you put this into production.

The old way

Here is the pattern every shop has somewhere. A string column, pretending.

USE tempdb;
GO

CREATE TABLE dbo.OrdersOld
(
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    Payload NVARCHAR(MAX)
        CONSTRAINT ck_OrdersOld_IsJson_Payload CHECK (ISJSON(Payload) = 1)
);
GO

INSERT dbo.OrdersOld (Payload) VALUES
(N'{"customer":"Acme","total":1200.00,"items":["Coffee","Tea"]}'),
(N'{"customer":"Globex","total":450.00,"items":["Cookies"]}');
GO

It works. It has always worked. But look at what it actually is:

What you wanted What NVARCHAR(MAX) gave you
A JSON document A string to parse
Validation A check constraint you remembered to add
Fast reads A full re-parse of the text on every JSON_VALUE call
Cheap updates Rewrite the whole document to change one key

The check constraint is the giveaway. The engine does not know the column is JSON. You are bolting validation on from the outside, and if a table gets created without that constraint -- it happens -- the column happily accepts 'not json at all' and nobody finds out until a report breaks.

The new way

Same table, one keyword different.

CREATE TABLE dbo.OrdersNew
(
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    Payload JSON NOT NULL
);
GO

INSERT dbo.OrdersNew (Payload) VALUES
(N'{"customer":"Acme","total":1200.00,"items":["Coffee","Tea"]}'),
(N'{"customer":"Globex","total":450.00,"items":["Cookies"]}');
GO

No check constraint. The validation is in the data type now as it should be. The engine will not let an invalid document into a json column. Try it:

INSERT dbo.OrdersNew (Payload) VALUES (N'not json at all');

That is the difference, right there in the screenshot. The old table only caught bad data if you remembered the constraint. The new table catches it because the JSON data type itself will not accept anything that is not valid JSON.

It is stored parsed, not as text

This is the part that matters for performance, and it is invisible until you go looking. A json column is not stored as the characters you typed -- it is stored in a binary format, parsed once when the row is written.

That changes the math on reads. With the old method, NVARCHAR(MAX), every JSON_VALUE call parses the string from scratch: the engine walks it character by character, finds your key, hands back the value, throws the parse away, and does it all again on the next row. With the json type, that work is already done, and the reads navigate the binary structure directly.

The functions you already know will work against it as-is, no syntax changes:

SELECT
    OrderID,
    Customer = JSON_VALUE(Payload, '$.customer'),
    Total = JSON_VALUE(Payload, '$.total')
FROM dbo.OrdersNew;

Identical T-SQL to what you would write against the old NVARCHAR(MAX) column. That is deliberate -- Microsoft kept the surface area the same so your existing code keeps running. What changed is underneath it.

That is the claim. Here is the measurement. Build the same document set -- 200,000 rows of a heavier document -- in both an NVARCHAR(MAX) column and a json column:

CREATE TABLE dbo.JsonString (id INT PRIMARY KEY, Payload NVARCHAR(MAX));
CREATE TABLE dbo.JsonNative (id INT PRIMARY KEY, Payload JSON);
GO

INSERT dbo.JsonString (id, Payload)
SELECT TOP (200000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    N'{"customer":"Acme Industries","region":"West","tier":"Gold",
"total":1200.00,"tax":96.00,"shipping":"expedited","priority":7,
"notes":"recurring quarterly order, net 30 terms on file",
"items":["Coffee","Tea","Cake","Cookies","Sandwich","Juice"],
"contact":{"name":"Jane Roe","email":"jane@acme.test","phone":"555-0100"}}'
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

INSERT dbo.JsonNative (id, Payload)
SELECT id, Payload FROM dbo.JsonString;
GO

Now extract several values per row -- each JSON_VALUE call against the string column is another full parse of that document:

SET STATISTICS TIME ON;

-- The old way: re-parse the string on every call, every row
SELECT COUNT(*)
FROM dbo.JsonString
WHERE JSON_VALUE(Payload, '$.customer')      IS NOT NULL
  AND JSON_VALUE(Payload, '$.region')        IS NOT NULL
  AND JSON_VALUE(Payload, '$.tier')          IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.name')  IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.email') IS NOT NULL
  AND JSON_VALUE(Payload, '$.shipping')      IS NOT NULL;

-- The new way: document already parsed, navigated directly
SELECT COUNT(*)
FROM dbo.JsonNative
WHERE JSON_VALUE(Payload, '$.customer')      IS NOT NULL
  AND JSON_VALUE(Payload, '$.region')        IS NOT NULL
  AND JSON_VALUE(Payload, '$.tier')          IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.name')  IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.email') IS NOT NULL
  AND JSON_VALUE(Payload, '$.shipping')      IS NOT NULL;

SET STATISTICS TIME OFF;

On my box the string column ran 5,672 ms of CPU. The json column ran the identical query in 3,828 ms -- roughly a third faster, for changing one keyword in a CREATE TABLE. Same query, same data, same result. The only difference is that the string column re-parses the whole document on every one of those six JSON_VALUE calls, on every row, while the json column was parsed once on insert and just reads the structure. Your numbers will differ, but the shape will not: the more keys you pull and the bigger the document, the wider that gap gets.

A note on JSON_MODIFY

On a string column, changing one key means rewriting the entire document -- read it, modify the text, write the whole thing back. The json type adds a modify method that is built to do in-place updates where the new value fits, instead of rewriting the document in full. Very useful, and on the list of things I plan to lab, but don't miss the version note below. The modify method specifically is one of the pieces still marked preview.

Change existing column to json

You do not have to rebuild the table. ALTER TABLE ... ALTER COLUMN converts a character column in place:

-- The check constraint references Payload -- must be dropped first
ALTER TABLE dbo.OrdersOld
DROP CONSTRAINT ck_OrdersOld_IsJson_Payload;
GO

-- Now take the old string column to the real type
ALTER TABLE dbo.OrdersOld
ALTER COLUMN Payload JSON NOT NULL;
GO

One thing to know before you run that against a real table: the conversion validates every row. Any row holding text that is not well-formed JSON will fail the ALTER. That is not a bug -- it is the data type doing its job -- but it means the ALTER to json doubles as a data-quality check. On a column that has been collecting strings for years with a check constraint that may not have always been trusted, you can probably expect the first attempt to find something. That is not a bad thing.

The order of operations for a production column json change: add the new json column alongside the old one, backfill in batches, find and fix the rows that will not convert, then drop the old column when you know it's clean.

The one thing you should read first

Here is the honest version, because the headlines blur it.

The native JSON data type is generally available -- on Azure SQL Database and Azure SQL Managed Instance. For the boxed product, SQL Server 2025, Microsoft's own documentation still describes the json type as being in preview. The type ships in the box, it works under every database compatibility level, and you can build on it today in a dev environment. But 'in the box' and 'GA' are not the same thing, and the modify method and the JSON aggregate functions are explicitly called out as preview on SQL Server 2025.

Where json type status
Azure SQL Database Generally available
Azure SQL Managed Instance GA (SQL Server 2025 / Always-up-to-date policy)
SQL Server 2025 (box) In preview, per Microsoft Learn

What that means in practice: lab it, learn it, play with it. Do not bet a production schema on the preview-flagged pieces until Microsoft moves them to GA. Preview features can change between CUs, and you don't want to build on a preview feature that gets turned off later. Be sure to check the docs against your current build before you commit.

The bottom line

The NVARCHAR(MAX)-plus-ISJSON pattern was never wrong. It was just the best we had. Very much like the FOR XML PATH string trick was the best we had before STRING_AGG. SQL Server 2025 finally makes JSON a real data type instead of a string with a usage note attached: validated by the engine, stored parsed, cheaper to read and to modify.

It is not a license to turn your relational database into a document store. JSON is still for the variable-shaped data -- the flexible attribute bag, API payloads and unmodeled log records. Do not use it as a shortcut to avoid defining columns that should have been native columns from day one. But for the JSON you legitimately have, this is your upgrade. Just keep an eye on that 'preview' status -- it's a dev-box feature on SQL Server 2025 until Microsoft says otherwise.

More to Read

Microsoft Learn: JSON data type
Microsoft Learn: Work with JSON data in SQL Server
Microsoft Learn: SQL Server 2025 Preview Features FAQ
Microsoft: GA of JSON data type and JSON aggregates
sqlfingers inc: Three T-SQL Tricks You May Not Be Using Yet