Thursday, May 21, 2026

Your JSON Column Was Never a JSON Column

For roughly a decade, the way you stored a JSON document in SQL Server was to not store a JSON document at all. You stored a string. NVARCHAR(MAX), an ISJSON check constraint to keep the garbage out, and a quiet agreement with yourself that the column was JSON because you said so.

SQL Server 2025 ships an actual json data type. Binary storage, parsed once, queryable without re-parsing every row. This post is the demo: the old pain, the new behavior, and the one word in the Microsoft docs you need to read before you put this into production.

The old way

Here is the pattern every shop has somewhere. A string column, pretending.

USE tempdb;
GO

CREATE TABLE dbo.OrdersOld
(
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    Payload NVARCHAR(MAX)
        CONSTRAINT ck_OrdersOld_IsJson_Payload CHECK (ISJSON(Payload) = 1)
);
GO

INSERT dbo.OrdersOld (Payload) VALUES
(N'{"customer":"Acme","total":1200.00,"items":["Coffee","Tea"]}'),
(N'{"customer":"Globex","total":450.00,"items":["Cookies"]}');
GO

It works. It has always worked. But look at what it actually is:

What you wanted What NVARCHAR(MAX) gave you
A JSON document A string to parse
Validation A check constraint you remembered to add
Fast reads A full re-parse of the text on every JSON_VALUE call
Cheap updates Rewrite the whole document to change one key

The check constraint is the giveaway. The engine does not know the column is JSON. You are bolting validation on from the outside, and if a table gets created without that constraint -- it happens -- the column happily accepts 'not json at all' and nobody finds out until a report breaks.

The new way

Same table, one keyword different.

CREATE TABLE dbo.OrdersNew
(
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    Payload JSON NOT NULL
);
GO

INSERT dbo.OrdersNew (Payload) VALUES
(N'{"customer":"Acme","total":1200.00,"items":["Coffee","Tea"]}'),
(N'{"customer":"Globex","total":450.00,"items":["Cookies"]}');
GO

No check constraint. The validation is in the data type now as it should be. The engine will not let an invalid document into a json column. Try it:

INSERT dbo.OrdersNew (Payload) VALUES (N'not json at all');

That is the difference, right there in the screenshot. The old table only caught bad data if you remembered the constraint. The new table catches it because the JSON data type itself will not accept anything that is not valid JSON.

It is stored parsed, not as text

This is the part that matters for performance, and it is invisible until you go looking. A json column is not stored as the characters you typed -- it is stored in a binary format, parsed once when the row is written.

That changes the math on reads. With the old method, NVARCHAR(MAX), every JSON_VALUE call parses the string from scratch: the engine walks it character by character, finds your key, hands back the value, throws the parse away, and does it all again on the next row. With the json type, that work is already done, and the reads navigate the binary structure directly.

The functions you already know will work against it as-is, no syntax changes:

SELECT
    OrderID,
    Customer = JSON_VALUE(Payload, '$.customer'),
    Total = JSON_VALUE(Payload, '$.total')
FROM dbo.OrdersNew;

Identical T-SQL to what you would write against the old NVARCHAR(MAX) column. That is deliberate -- Microsoft kept the surface area the same so your existing code keeps running. What changed is underneath it.

That is the claim. Here is the measurement. Build the same document set -- 200,000 rows of a heavier document -- in both an NVARCHAR(MAX) column and a json column:

CREATE TABLE dbo.JsonString (id INT PRIMARY KEY, Payload NVARCHAR(MAX));
CREATE TABLE dbo.JsonNative (id INT PRIMARY KEY, Payload JSON);
GO

INSERT dbo.JsonString (id, Payload)
SELECT TOP (200000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    N'{"customer":"Acme Industries","region":"West","tier":"Gold",
"total":1200.00,"tax":96.00,"shipping":"expedited","priority":7,
"notes":"recurring quarterly order, net 30 terms on file",
"items":["Coffee","Tea","Cake","Cookies","Sandwich","Juice"],
"contact":{"name":"Jane Roe","email":"jane@acme.test","phone":"555-0100"}}'
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

INSERT dbo.JsonNative (id, Payload)
SELECT id, Payload FROM dbo.JsonString;
GO

Now extract several values per row -- each JSON_VALUE call against the string column is another full parse of that document:

SET STATISTICS TIME ON;

-- The old way: re-parse the string on every call, every row
SELECT COUNT(*)
FROM dbo.JsonString
WHERE JSON_VALUE(Payload, '$.customer')      IS NOT NULL
  AND JSON_VALUE(Payload, '$.region')        IS NOT NULL
  AND JSON_VALUE(Payload, '$.tier')          IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.name')  IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.email') IS NOT NULL
  AND JSON_VALUE(Payload, '$.shipping')      IS NOT NULL;

-- The new way: document already parsed, navigated directly
SELECT COUNT(*)
FROM dbo.JsonNative
WHERE JSON_VALUE(Payload, '$.customer')      IS NOT NULL
  AND JSON_VALUE(Payload, '$.region')        IS NOT NULL
  AND JSON_VALUE(Payload, '$.tier')          IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.name')  IS NOT NULL
  AND JSON_VALUE(Payload, '$.contact.email') IS NOT NULL
  AND JSON_VALUE(Payload, '$.shipping')      IS NOT NULL;

SET STATISTICS TIME OFF;

On my box the string column ran 5,672 ms of CPU. The json column ran the identical query in 3,828 ms -- roughly a third faster, for changing one keyword in a CREATE TABLE. Same query, same data, same result. The only difference is that the string column re-parses the whole document on every one of those six JSON_VALUE calls, on every row, while the json column was parsed once on insert and just reads the structure. Your numbers will differ, but the shape will not: the more keys you pull and the bigger the document, the wider that gap gets.

A note on JSON_MODIFY

On a string column, changing one key means rewriting the entire document -- read it, modify the text, write the whole thing back. The json type adds a modify method that is built to do in-place updates where the new value fits, instead of rewriting the document in full. Very useful, and on the list of things I plan to lab, but don't miss the version note below. The modify method specifically is one of the pieces still marked preview.

Change existing column to json

You do not have to rebuild the table. ALTER TABLE ... ALTER COLUMN converts a character column in place:

-- The check constraint references Payload -- must be dropped first
ALTER TABLE dbo.OrdersOld
DROP CONSTRAINT ck_OrdersOld_IsJson_Payload;
GO

-- Now take the old string column to the real type
ALTER TABLE dbo.OrdersOld
ALTER COLUMN Payload JSON NOT NULL;
GO

One thing to know before you run that against a real table: the conversion validates every row. Any row holding text that is not well-formed JSON will fail the ALTER. That is not a bug -- it is the data type doing its job -- but it means the ALTER to json doubles as a data-quality check. On a column that has been collecting strings for years with a check constraint that may not have always been trusted, you can probably expect the first attempt to find something. That is not a bad thing.

The order of operations for a production column json change: add the new json column alongside the old one, backfill in batches, find and fix the rows that will not convert, then drop the old column when you know it's clean.

The one thing you should read first

Here is the honest version, because the headlines blur it.

The native JSON data type is generally available -- on Azure SQL Database and Azure SQL Managed Instance. For the boxed product, SQL Server 2025, Microsoft's own documentation still describes the json type as being in preview. The type ships in the box, it works under every database compatibility level, and you can build on it today in a dev environment. But 'in the box' and 'GA' are not the same thing, and the modify method and the JSON aggregate functions are explicitly called out as preview on SQL Server 2025.

Where json type status
Azure SQL Database Generally available
Azure SQL Managed Instance GA (SQL Server 2025 / Always-up-to-date policy)
SQL Server 2025 (box) In preview, per Microsoft Learn

What that means in practice: lab it, learn it, play with it. Do not bet a production schema on the preview-flagged pieces until Microsoft moves them to GA. Preview features can change between CUs, and you don't want to build on a preview feature that gets turned off later. Be sure to check the docs against your current build before you commit.

The bottom line

The NVARCHAR(MAX)-plus-ISJSON pattern was never wrong. It was just the best we had. Very much like the FOR XML PATH string trick was the best we had before STRING_AGG. SQL Server 2025 finally makes JSON a real data type instead of a string with a usage note attached: validated by the engine, stored parsed, cheaper to read and to modify.

It is not a license to turn your relational database into a document store. JSON is still for the variable-shaped data -- the flexible attribute bag, API payloads and unmodeled log records. Do not use it as a shortcut to avoid defining columns that should have been native columns from day one. But for the JSON you legitimately have, this is your upgrade. Just keep an eye on that 'preview' status -- it's a dev-box feature on SQL Server 2025 until Microsoft says otherwise.

More to Read

Microsoft Learn: JSON data type
Microsoft Learn: Work with JSON data in SQL Server
Microsoft Learn: SQL Server 2025 Preview Features FAQ
Microsoft: GA of JSON data type and JSON aggregates
sqlfingers inc: Three T-SQL Tricks You May Not Be Using Yet

Wednesday, May 20, 2026

CISA Left 844 MB of Credentials on Public GitHub. Check Your Own Repos.

CISA is the United States federal agency that tells everyone else how to protect their passwords and credentials. For six months, it left 844 MB of its own — passwords, cloud keys, certificates — sitting in a public GitHub repository.

That is not a hack. Nobody broke in. A contractor put it there.

Disclaimer: Not a cybersecurity expert. Just a DBA reading the news.

What happened

On May 14, 2026, researcher Guillaume Valadon at GitGuardian found a public GitHub repository named 'Private-CISA'. Inside was 844 MB of plaintext passwords, AWS GovCloud credentials, API tokens, Entra ID SAML certificates, SSH keys, Kubernetes manifests, Terraform code, and internal documentation backups belonging to the Cybersecurity and Infrastructure Security Agency. The repository had been created on November 13, 2025 — public for six months before anyone outside flagged it.

Valadon reported it. CISA took the repository offline within 26 hours. Some of the exposed AWS keys stayed valid for another 48 hours after that.

The repository was maintained on a personal GitHub account by a contractor, who appears to have used it to sync files between a work machine and a home machine. The commit history shows GitHub's secret scanning was deliberately disabled.

Again — CISA is the agency that writes the rules.

CISA's statement: "Currently, there is no indication that any sensitive data was compromised as a result of this incident."

What was actually in it

The file names tell the whole story.

importantAWStokens — administrative credentials to three AWS GovCloud accounts.

AWS-Workspace-Firefox-Passwords.csv — plaintext usernames and passwords for dozens of internal CISA systems — the kind of list a browser builds when you let it save every login.

Nobody had to decrypt anything. A file called importantAWStokens held exactly that. No exploit. No zero-day. Just a contractor saving passwords to a CSV and pushing it to a public repo.

Valadon, the researcher who found it, put it this way:

"I honestly believed that it was all fake before analyzing the content deeper. This is indeed the worst leak that I've witnessed in my career. It is obviously an individual's mistake, but I believe that it might reveal internal practices."

This is not a federal story. It is a habits story.

It is easy to read this as a government story and move on. Do not. Every failure in this incident is something I find in SQL Server shops.

The CISA failure The SQL Server shop equivalent
Plaintext passwords in a CSV Connection strings with embedded sa passwords in .dtsConfig files, deployment scripts, and .publish.xml
Backups committed to Git .bak files and BCP exports sitting in a repo
Secret scanning disabled on purpose The 'make it stop warning me' move, in any repo your team owns
AWS keys valid 48 hours after takedown Can you actually rotate a SQL login or a linked server credential in minutes?
Personal repo used to sync work files Contractors and developers moving work between machines through personal GitHub

Read that last row again. Work files on a personal account. One of the worst things for any security team to see - and it's everywhere.

Find it in your own estate

You do not need a federal contractor to have this problem. SQL Server credentials hide in source control in predictable places.

SSIS configuration files
A .dtsConfig with a connection string carries whatever was in it, password included, unless the package protection level forced the credential out.

Deployment artifacts
.publish.xml profiles, post-deployment scripts, and .sqlproj files routinely carry connection details.

SQL Agent job scripts
Job step T-SQL scripted out to a repo can carry linked server definitions and the credentials behind them.

BCP exports and format files
A .bak or a BCP dump committed 'temporarily' is a database sitting in your Git history forever.

Two things to do very soon. First, secret scanning — a GitHub feature that watches commits for things that look like credentials, AWS keys, tokens, connection strings, and either warns you or blocks the push outright. Turn it on for every repo your team owns, and confirm nobody has turned it off. The CISA contractor's repo had it disabled on purpose. Then ask the harder question: if a credential leaked today, how fast could you rotate it? If the honest answer is days, that is its own finding — just like the 48-hour CISA window.

The bottom line

I have written a lot about AI governance — agents with elevated credentials, the absence of data discipline, the legal exposure forming around it. This is the same story without the AI. A shop moving fast, skipping the basics, one switched-off setting away from a very bad day.

This leak is an 844 MB reminder of how little it takes. No AI. No attacker. Just plaintext passwords in a repo and a scanning feature switched off on purpose.

Secret scanning on. Secrets out of Git. Revocation you can do in minutes, not days. The boring discipline is still where shops are bleeding.

CISA could not keep its credentials out of a public repository. Can you?

More to Read

Krebs on Security: CISA Admin Leaked AWS GovCloud Keys on Github
GitGuardian: How We Got a CISA GitHub Leak Taken Down in Under a Day
The Register: America's top cyber-defense agency left a GitHub repo open
sqlfingers inc: AI Agent. Nine Seconds. One Production Database. Gone.

Tuesday, May 19, 2026

Patch Tuesday May 2026: SQL Didn't Get Off Quite That Easy

Last Wednesday I told you SQL Server got off easy in the May Patch Tuesday release. No critical, no public PoC, no headline. Apply the GDR or CU and move on.

I missed something. There was a SQL Server CVE worth talking about, and it got by me because the bug isn't in SQL Server. It's in SSIS.

CVE-2026-40370

CVSS 8.8. Published May 12. Microsoft's one-line description: 'External control of file name or path in SQL Server allows an authorized attacker to execute code over a network.' That reads like a SQL Server engine bug. It isn't.

The actual fix, per KB5089899 and the sibling KBs covering every supported version back to 2016 SP3:

This fix addresses an XML external entity (XXE) vulnerability in the Web
Service Task that allows an attacker to read arbitrary files from the local
file system or cause a denial-of-service (DoS) attack.

Fix area: Integration Services. Component: Integration Services. Platform: Windows.

That is a SSIS bug shipped under a SQL Server CVE number. If you don't run SSIS, the headline does not apply to you. If you do run SSIS and have any package using the Web Service Task, this is yours to deal with.

What XXE is, briefly

XML External Entity is an old class of XML parser vulnerability. The XML spec lets a document declare external entities, like references to content fetched from a URL or a local file at parse time. If the parser resolves those references and the application returns the result, an attacker who controls the XML can read files off the server.

The classic payload looks like this:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///c:/windows/win.ini">
]>
<foo>&xxe;</foo>

A vulnerable parser fetches c:\windows\win.ini and substitutes the contents into the response. Swap in any path the SQL Server service account can read -- config files, certificates, credential stores, anything -- and you have file disclosure. Point the entity at a non-responsive server and you have a DoS.

OWASP has covered this for over a decade. The fix in modern XML parsers is to disable DTD processing and external entity resolution by default. Microsoft's KB confirms that is what shipped on May 12 for the Web Service Task.

Who is actually affected

Three conditions have to be true at the same time for this CVE to matter to your environment:

Condition Detail
You run SSIS Integration Services installed on a Windows host running a supported SQL Server version.
At least one package uses Web Service Task The control flow task that calls a SOAP web service. If your packages do not include it, the bug is not reachable.
An attacker can influence the XML Microsoft classifies the attacker as 'authorized,' meaning they need permission to run packages or supply package input. Not unauthenticated.

The third condition is what keeps this off the wormable list, and it is why I let it slide last week. 'Authorized attacker' is not nothing though. A user with SSIS execute permissions, a compromised application account, a malicious insider — any of those qualifies. And depending on which environment we are talking about, that could be a dozen people or a hundred.

One thing the bug description does not say is whether the vulnerability is reachable through package design alone, or only through runtime input. Until that is clear, the safe read is: if you have packages with the Web Service Task and any non-DBA can affect what XML those packages process, patch.

The patch matrix

Microsoft shipped a fix for every supported SQL Server version on May 12. Builds and KBs verified against Microsoft Support:

Version Servicing path KB Build
SQL Server 2016 SP3 GDR KB5089271 13.0.6490.1
SQL Server 2017 GDR KB5090347 14.0.2110.2
SQL Server 2019 GDR KB5090408 15.0.2170.1
SQL Server 2022 GDR KB5091158 16.0.1180.1
SQL Server 2022 CU24 KB5089900 16.0.4252.3
SQL Server 2025 GDR KB5091223 17.0.1115.1
SQL Server 2025 CU4 KB5089899 17.0.4040.1

Sibling KBs for the CU paths on 2016, 2017, and 2019 also shipped the same day. Check the Microsoft Security Update Guide for the package that matches your current build. As I noted last week, GDR and CU are separate lanes. Pick the one you have been applying.

SQL Server 2014 and earlier are out of mainstream support and do not appear in the May release. If you are on one of those and running SSIS, you are not getting this fix.

What to do this week

1. Inventory your packages.

Find every package that uses the Web Service Task. If your packages live in SSISDB, the project XML is queryable. If they live on the file system or in msdb, a recursive grep for STOCK:WebServiceTask in the .dtsx files will find them. Most shops have never inventoried which SSIS tasks are in use where. This is a good reason to start.

2. Identify who can supply input to those packages.

Any account that can execute the package, modify its variables at runtime, or supply the XML input it processes. SSIS catalog permissions, SQL Agent proxy accounts, application accounts calling the package via T-SQL -- all of them count.

3. Patch.

Apply the GDR or CU that matches your servicing path. The fix is in the SSIS runtime on the server, not the package itself. You do not need to redeploy packages.

4. If you cannot patch this week, look at your packages.

The Web Service Task is one of several tasks in SSIS that consume XML. If a package is reading XML from a source an untrusted user can influence, that is the exposure window. Consider disabling those packages until the patch lands.

What I should have said last week

'SQL Server got off easy' was true for the engine. It was not true for SSIS. A CVSS 8.8 RCE that ships under a SQL Server CVE number, with a fix in every supported version, deserves more than a footnote -- even when the attack surface is narrower than the headline suggests.

This is the second month in a row that the SQL Server CVE worth caring about is not actually in the engine. March was the sysadmin escalation in linked servers. May is XXE in the SSIS Web Service Task. The pattern is worth noting. The headline CVE rating tells you the score. It does not tell you which component shipped the fix or whether the bug is reachable in your environment. For that, you have to read the KB

Apologies for the miss. Patch your SSIS hosts this week.

More to Read

Microsoft Security Response Center: CVE-2026-40370
Microsoft Security Update Guide: May 2026 Release Notes
KB5089899: SQL Server 2025 CU4 (canonical fix description)
OWASP: XML External Entity (XXE) Processing
sqlfingers inc: Patch Tuesday May 2026 — SQL Got Off Easy. Your Domain Didn't.

Wednesday, May 13, 2026

Patch Tuesday May 2026: SQL Got Off Easy. Your Domain Didn't.

Yesterday was Patch Tuesday. For the first time since June 2024, Microsoft shipped a release with zero zero-days. Nothing exploited in the wild. Nothing publicly disclosed before the fix landed. 118 CVEs, 16 critical, 102 important, and not a single one on fire.

SQL Server got off easy this month, too. We're in the patch list, but no critical SQL Server specific CVE drew a headline. A welcome change after March's CVE-2026-21262 sysadmin-escalation party.

Before you breathe out though, read my title again. The bullet missed SQL Server. It did not miss your domain controller.

The numbers

Metric May 2026
Total CVEs 118
Critical 16
Important 102
Zero-days exploited 0
Zero-days publicly disclosed 0
EoP share 48.3%
RCE share 24.6%

Quick refresher on what a zero-day actually is: a vulnerability the vendor hasn't patched yet — but attackers already know about it. Microsoft tracks two flavors separately. 'Exploited in the wild' means attackers are already using it. 'Publicly disclosed' means the technical details are out, so weaponization is hours away. The 'zero' counts the days defenders had to prepare. March's CVE-2026-21262 was publicly disclosed before the patch shipped — that's why we said 'stop reading and go patch.' This month we've got neither flavor. That is very interesting.

Why this matters to you

SQL Server didn't draw a critical CVE, but three flaws in the May release hit the stack around it. In priority order:

1. The domain controllers your SQL Servers live behind

CVE-2026-41089 is a stack-based buffer overflow in Windows Netlogon. CVSS 9.8. A remote, unauthenticated attacker sending a crafted packet to a server running as a domain controller can execute code on the box. Microsoft tagged it 'Exploitation Less Likely' — and ZDI called it 'wormable' in the same breath, noting that a compromised domain controller is a compromised domain.

'wormable'? All I can say is, do you remember SQL Slammer? January 2003. Took down half the internet in ten minutes. Oh man, the war stories...

Your SQL Server doesn't need to be vulnerable to be at risk. Most of us run Windows Authentication. Most of us trust the domain. If the DC is owned, your service accounts, your AG endpoints, your SPNs, your sa-equivalent Windows logins — all of it inherits the problem.

2. The Word document that owns the DBA laptop

Four critical RCEs in Microsoft Word landed this month — CVE-2026-40361, CVE-2026-40364, CVE-2026-40366, CVE-2026-40367. All CVSS 8.4. The first two are flagged 'Exploitation More Likely'. The detail that should make every DBA stop scrolling: Microsoft notes the Preview Pane is an attack vector. You don't have to open the file. You just have to see it listed in Outlook.

Now think about how a DBA's inbox actually looks. Vendor docs. Customer-supplied schemas in .docx. Audit checklists, Technical Reports and Logs, SOWs... Half of it arrives from an email you don't recognize with attachments too many will double-click before verifying. Patch your workstation this week.

3. Everything else that touches the database tier

CVE What CVSS Why you care
CVE-2026-41096 Windows DNS Client RCE 9.8 Heap overflow via malicious DNS response. Every Windows box runs the client.
CVE-2026-40415 Windows TCP/IP RCE 9.8 Use-after-free, unauthenticated, no user interaction. Wormable.
CVE-2026-42898 Dynamics 365 On-Prem RCE 9.9 Authenticated, with scope change. NAV/BC shops, this is yours.
CVE-2026-41103 SSO Plugin for Jira & Confluence EoP 9.1 Forged identity, bypasses Entra ID auth. 'Exploitation More Likely'.

SQL Server's patches this month

SQL Server appears in the May 2026 patch list. No critical, no public PoC, no headline. Just the routine cumulative-update plumbing we expect. I like boring. Boring is good. Apply the GDR or CU for your servicing path and move on.

One reminder while you're in there: GDR and CU are separate servicing paths. Once you apply a CU, you can't revert to GDR for that installation. Pick the lane that matches what you have already been applying. Microsoft's Security Update Guide is the canonical source for which KB matches your build.

What 'no zero-day' does not mean

It does not mean safe. It means defenders got a head start for a change. Every Patch Tuesday, attackers reverse-engineer the fixes to figure out what bug was patched and how to exploit it. That work is continually faster — AI tooling now does in hours what used to take a skilled researcher days. A 'no zero-day' month just means the exploit work starts after the patch ships instead of before. The exploits are still coming. You just have a little more time to prepare.

It also doesn't mean you can skip a cycle. A zero-day-free Patch Tuesday can still be a bad Patch Tuesday — wormable DC bugs, preview-pane RCEs, and TCP/IP unauthenticated RCEs all add up to a busy week.

The bottom line

SQL Server caught a break. Your domain didn't. Patch your DCs this week. Patch your laptop this week. Apply the SQL Server GDR or CU for your path while you're at it. We got lucky. Be glad May was quiet.

More to Read

Microsoft Security Update Guide: May 2026 Release Notes
Tenable: May 2026 Patch Tuesday Addresses 118 CVEs
Zero Day Initiative: The May 2026 Security Update Review
Help Net Security: Many fixes, but no zero-days
sqlfingers inc: Patch Tuesday — Your sysadmin Role Was Up for Grabs (March 2026)

Friday, May 8, 2026

June 1. Copilot Billing Changes. SSMS Has a Problem.

Three weeks. One billing model change. One SSMS bug that may already be costing you. On June 1, 2026, GitHub Copilot moves from fixed-fee request-based billing to token-based billing. The fallback to a cheaper model when credits run out option is gone. Why? Because an autonomous agent session can cost the same as a quick chat question, and GitHub says the flat rate model is no longer sustainable. And SSMS 22's Copilot integration had a documented infinite-loop bug that was silently exhausting user request quotas. Microsoft fixed it in March — but only if you patched.

Connect those dots.

Disclaimer: not a GitHub billing expert. Just a DBA reading the announcement.

What's changing on June 1

Per GitHub's April 27 announcement, premium request units (PRUs) are out. GitHub AI Credits are in. One credit equals one US cent. Plan prices are unchanged. What changes is how those dollars get consumed.

Plan Monthly cost Monthly AI Credits
Free $0 Limited allowance
Pro $10 $10 (1,000 credits)
Pro+ $39 $39 (3,900 credits)
Business $19/seat $19 (1,900 credits)
Enterprise $39/seat $39 (3,900 credits)

Code completions and Next Edit Suggestions remain free and unmetered. Everything else — chat, agent mode, code review — draws from the credit pool, priced per token by model. Per GitHub's published rate sheet, premium models like Anthropic's Claude Opus and OpenAI's GPT-5.5 cost dramatically more per token than default models.

Two changes worth pulling out of the announcement:

The fallback to a cheaper model is gone. Today, when you exhaust your PRUs, Copilot quietly downgrades you to a smaller model so things keep working. After June 1, when credits run out, that's it — you're done. Unless an admin has set up additional usage budget, in which case you are paying overage at published rates with no cap unless one is configured.

Annual plans are also getting squeezed. Annual Pro and Pro+ subscribers stay on the request-based model until renewal, but the model multipliers are increasing on June 1, and the refund-and-cancel option is only open until May 20. That is twelve days from today.

Why DBA work eats tokens faster than dev work

Token math, briefly. The general rule is roughly four characters per token for English prose. Code and markup run denser — around three to three-and-a-half characters per token — because of all the brackets, attributes, and namespace noise.

What does that mean in practice? Here is a rough input-token cost per paste, per artifact a DBA hands to Copilot every day:

Artifact Typical size Input tokens (approx)
Simple plan XML 10-50 KB 2,500 - 12,500
Complex plan XML ~100 KB 25,000 - 33,000
Big analytical plan 500 KB+ 125,000+
sp_BlitzCache top 50 50-200 KB 12,500 - 65,000
Deadlock graph XML 5-50 KB 1,250 - 12,500
Full schema (50 tables) 100 KB+ 25,000+

And that is per paste. The DBA workflow rarely stops at one. Paste plan, ask. Paste a different plan, ask again. Call sp_BlitzCache @AI = 1. Paste sp_WhoIsActive output, ask why blocking is occurring. Paste the deadlock graph, ask which transaction was the victim. Three to five iterations is normal for any non-trivial troubleshooting session, and each one carries the full prior context with it as input tokens.

Compare that to the developer workflow Copilot was originally built around. Inline code completions. Short prompts. Small inline edits. Inline completions remain free under the new model. The DBA pattern of pasting big diagnostic XML into chat is what now costs real money — and at premium model rates, the math gets real big real fast.

Multiply your typical paste size by three to five iterations, multiply that by the per-token rate of whichever model you are using, and you have your monthly Copilot exposure. Then go look at their pricing page and do the math with your actual workflow.

The SSMS 22 Copilot bug — fixed in 22.4.1 — if you patched

A user thread on the GitHub Community board, discussion #181818, documents Copilot in SSMS 22 entering an infinite loop on certain failed or stalled query executions, sending repeated API calls in the background while showing a loading screen, and burning through users' premium request quotas overnight. One user reported 1,202 Claude Sonnet 4.5 requests in a single day attributed to the SSMS integration alone. Multiple users in the thread confirmed identical behavior, often without even using the chat. The same Copilot integration in Visual Studio and VS Code did not exhibit this issue.

The PM for GitHub Copilot in SSMS responded directly in the thread: "We are aware of this problem and are working on a resolution." The issue was also tracked on Microsoft's SSMS Developer Community.

SSMS 22.4.1, released March 25, 2026 alongside the GA of GitHub Copilot in SSMS, includes improved handling of query executions that either return no results or fail completely. That is the fix. Some users have also reported success rolling back to SSMS 22.1.0 if 22.4.1 still misbehaves in their specific environment.

If you are running an earlier SSMS 22 build with Copilot enabled and the patched 22.4.1 is not deployed, you are carrying this risk straight into the per-token billing model change on June 1. Today the bug burns PRUs — you hit a usage cap, you get warnings, you investigate. After June 1, the same bug burns your credit card balance at API rates with no fallback option. For shops on Copilot Business or Enterprise, where credits are pooled across the organization, a single unpatched user with this bug active could drain their team's monthly allocation overnight.

Read that again. One unpatched SSMS 22 user with a bad API loop, can drain the pooled credit pool for the entire team in a single day.

What to do before June 1

  • Check the preview bill. GitHub launched a preview bill experience in early May, accessible from the Billing Overview page on github.com. It shows what your April usage would have cost under the new model. Look at it before June 1 surprises you.
  • Decide on the annual plan refund window. If you are on annual Pro or Pro+, the refund-with-cancel option closes May 20. After that, you are stuck on request-based billing with the new (worse) model multipliers until renewal.
  • Set spending budgets. Admins on Business and Enterprise can set per-user budgets. A $0 user budget is the kill switch — no credit consumption for that user. Use this on accounts that do not need Copilot or where you suspect bug exposure.
  • Confirm everyone is on SSMS 22.4.1 or later. The 22.4.1 release contains the Copilot infinite-loop fix. Anyone running an earlier SSMS 22 build with the AI Assistance workload installed is exposed. Verify the version with Help, About in SSMS. If you are already on 22.4.1 and the issue persists, rolling back to SSMS 22.1.0 has resolved it for some users.
  • Train the team off paste-the-whole-thing prompts. Pasting a 500KB execution plan into chat to ask 'why is this slow' is now an expensive habit. Targeted questions against scoped context — the specific operator you are concerned about, the specific predicate, the specific wait type — cost a fraction of the same answer at published per-token model rates. Another good reminder that we should be very mindful of what we give to AI.
  • Watch the SSMS feedback site. Per the PM in discussion #181818, the SSMS feedback site is where SSMS Copilot issues should be logged. Subscribe to the relevant items if you are running SSMS 22 in production.

The bigger picture

None of this should be a surprise. The flat-rate AI buffet was the customer-acquisition phase. This is the bill phase. It was always going to come.

I have been writing about the operational and legal risks of AI tooling in production for weeks now. An AI agent deleted a production database in nine seconds. 99% of US enterprises consider themselves AI-ready while 60% admit they cannot manage their data. P2SQL injection turns plain English into damaging, destructive commands. SQL Server Ledger gives us tamper-evident logging when an agent runs amok.

Now we add a new risk to the operational and legal pile: cost. The all-you-can-eat AI subscription was a transitional pricing model. GitHub said it themselves — the per-seat model is not sustainable when an autonomous agent session can cost the same as a quick chat question. Token-based billing is what AI economics actually look like at the API layer, and it's just making its way to Copilot itself.

The shops that survive the transition cleanly will be the ones whose DBAs treat tokens like CPU cycles — finite, measurable, and worth optimizing. Targeted prompts. Scoped context. Cheaper models for the routine stuff. Premium models reserved for the questions that actually need them.

The shops that do not will find out in July, when the first usage-based bill arrives.

More to Read

GitHub Blog: GitHub Copilot is moving to usage-based billing
GitHub Docs: About billing for individual GitHub Copilot plans
GitHub Docs: Models and pricing for GitHub Copilot
GitHub Community Discussion #181818: SSMS 22 Copilot integration usage spike
Microsoft Fabric Community: SSMS 22.4.1 and GitHub Copilot in SSMS (Generally Available)
Microsoft: SSMS Feedback Site

Wednesday, May 6, 2026

Three T-SQL Tricks You May Not Be Using Yet

Three T-SQL features that have shipped over the last few releases and quietly retired patterns many of us are still using out of habit. Each replaces a stale workaround with one line of code, and in two of three cases it runs much faster, too. Take a look, try them out.

1. APPROX_COUNT_DISTINCT -- when COUNT(DISTINCT) is too expensive

Available since SQL Server 2019. COUNT(DISTINCT col) on a high-cardinality column requires SQL Server to track every unique value it has seen, which means a memory grant proportional to cardinality and a hash aggregate that loves to spill to tempdb on big tables. APPROX_COUNT_DISTINCT uses HyperLogLog instead, which trades a small accuracy hit for dramatically lower memory and faster runs.

Haven't heard of HyperLogLog before? Short story: HyperLogLog (HLL) in SQL Server is used to provide extremely fast, approximate counts of unique values (cardinality) in massive datasets using very little memory. Implemented with the APPROX_COUNT_DISTINCT function, allowing analytics on billions of rows with a roughly 2% error margin, significantly faster than COUNT(DISTINCT). Pretty cool. I use this a lot in trading data.

Build a five-million-row table with a few hundred thousand distinct customers (takes a few seconds on a laptop):

WITH N AS
(
    SELECT TOP (5000000)
        OrderID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
    CROSS JOIN sys.all_objects c
)
SELECT
    OrderID,
    CustomerID = ABS(CHECKSUM(NEWID())) % 250000,
    OrderDate = DATEADD(SECOND, OrderID, '2020-01-01'),
    Amount = CAST(OrderID % 1000 AS DECIMAL(10,2))
INTO dbo.OrdersDemo
FROM N;

CREATE CLUSTERED INDEX IX_OrdersDemo_OrderID ON dbo.OrdersDemo (OrderID);

The old way:

SET STATISTICS TIME, IO ON;

SELECT COUNT(DISTINCT CustomerID) AS UniqueCustomers
FROM dbo.OrdersDemo;

The new way:

SELECT APPROX_COUNT_DISTINCT(CustomerID) AS UniqueCustomers
FROM dbo.OrdersDemo;

SET STATISTICS TIME, IO OFF;

Run both, compare CPU time and elapsed time, or even pull the actual execution plan to compare the memory grant on the root operator. You can clearly see, APPROX_COUNT_DISTINCT finishes noticeably faster with a smaller grant.

And the memory grant...

Don't use it for anything that has to be exact -- billing, regulatory counts, reconciliation. Do use it for the dozens of dashboards and exploratory queries where 'about a million' is the same answer as 'exactly 1,003,847'.

2. STRING_AGG -- retire the FOR XML PATH('') hack

Available since SQL Server 2017. Many shops still use legacy code building delimited strings with FOR XML PATH('') and STUFF(). It worked, technically. It was also suboptimal performance.

Setup -- a tiny order schema with a few items per order:

CREATE TABLE dbo.Products
(
    ProductID    INT PRIMARY KEY,
    ProductName  NVARCHAR(50)
);

CREATE TABLE dbo.Orders
(
    OrderID      INT PRIMARY KEY,
    CustomerID   INT
);

CREATE TABLE dbo.OrderItems
(
    OrderID    INT,
    ProductID  INT
);

INSERT dbo.Products VALUES
(1, 'Coffee'), (2, 'Tea'), (3, 'Cookies'), (4, 'Cake'), (5, 'Sandwich');

INSERT dbo.Orders VALUES
(1001, 50), (1002, 50), (1003, 51), (1004, 52);

INSERT dbo.OrderItems VALUES
(1001, 1), (1001, 3),
(1002, 2), (1002, 4), (1002, 5),
(1003, 1), (1003, 2),
(1004, 5);

The old way:

SELECT
    o.OrderID,
    o.CustomerID,
    Products = STUFF((
        SELECT ', ' + p.ProductName
        FROM dbo.OrderItems oi JOIN dbo.Products p 
          ON oi.ProductID = p.ProductID
        WHERE  oi.OrderID = o.OrderID
        FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)'), 1, 2, '')
FROM dbo.Orders o
ORDER BY o.OrderID;

The new way:

SELECT
    o.OrderID,
    o.CustomerID,
    Products = STRING_AGG(p.ProductName, ', ')
                 WITHIN GROUP (ORDER BY p.ProductName)
FROM dbo.Orders o JOIN dbo.OrderItems oi 
  ON o.OrderID = oi.OrderID JOIN dbo.Products p
    ON oi.ProductID = p.ProductID
GROUP BY o.OrderID, o.CustomerID
ORDER BY o.OrderID;

Both produce the same resultset:

But look at that implicit conversion, larger plan cache and higher subtree cost in one vs the other:

3. GENERATE_SERIES -- kill the homemade tally CTE

This one was introduced in SQL Server 2022. Every DBA has copy-pasted some variant of a recursive CTE to generate a sequence of numbers or dates. The classic use case: a date-spine report that fills in zeros for days with no activity. GENERATE_SERIES replaces it with one function call.

Setup -- a sales table with deliberately sparse dates:

CREATE TABLE dbo.DailySales
(
    SaleDate DATE,
    Amount   DECIMAL(10,2)
);

INSERT dbo.DailySales VALUES
('2026-01-02', 1200.00),
('2026-01-03',  450.00),
('2026-01-05', 2300.00),
('2026-01-08',  890.00),
('2026-01-10', 1100.00);

The old way -- recursive CTE for the date spine:

DECLARE @StartDate DATE = '2026-01-01';
;WITH Days AS
(
    SELECT @StartDate AS TheDate, 0 AS n
    UNION ALL
    SELECT DATEADD(DAY, n + 1, @StartDate), n + 1
    FROM Days
    WHERE n < 9
)
SELECT
    d.TheDate,
    Amount = ISNULL(s.Amount, 0)
FROM Days d LEFT JOIN dbo.DailySales s 
  ON s.SaleDate = d.TheDate
ORDER BY d.TheDate
OPTION (MAXRECURSION 0);

The new way:

DECLARE @StartDate DATE = '2026-01-01';
SELECT
    TheDate = DATEADD(DAY, gs.value, @StartDate),
    Amount = ISNULL(s.Amount, 0)
FROM GENERATE_SERIES(0, 9) gs LEFT JOIN dbo.DailySales s
  ON s.SaleDate = DATEADD(DAY, gs.value, @StartDate)
ORDER BY TheDate;

Both produce the same ten-row date spine with zeros filled in for the empty days, but compare the two constructs. With GENERATE_SERIES we've got no CTE, no MAXRECURSION and no helper table. GENERATE_SERIES accepts (start, stop) or (start, stop, step), supports negative steps, and returns a single column called value. The Optimizer treats it as a normal table-valued function and can parallelize work that depends on it -- something the recursive CTE cannot do.

Why we are still writing the old versions

Full disclosure: this post happened because I caught myself reaching for a recursive CTE last night before remembering the GENERATE_SERIES improvement. Habits stick. But they can be broken easily.

The recursive tally CTE was the first 'clever' T-SQL many of us wrote. The FOR XML PATH trick was THE Stack Overflow answer for a decade. COUNT(DISTINCT) still gets typed reflexively when 'roughly how many' would do.

None of these replacements are new. STRING_AGG turned eight this year. APPROX_COUNT_DISTINCT shipped in 2019, and GENERATE_SERIES has been around since 2022. Time to check them out.

More to Read

Microsoft Learn: APPROX_COUNT_DISTINCT
Microsoft Learn: STRING_AGG
Microsoft Learn: GENERATE_SERIES

Monday, May 4, 2026

99% of US Enterprises Are AI-Ready. Their Lawyers Beg to Differ.

Two facts. The same week.

99% of US enterprises consider themselves AI-ready. 88% believe they are ahead of their competitors. 60% of those same organizations cite data management and governance as their number one challenge.

A federal court just denied Meta's motion to dismiss a class action that turned on this question: when a platform's generative AI tools "developed the ultimate content" of fraudulent ads, does Section 230 still protect the platform?

One of those is a confidence problem. The other is a liability problem. They are about to meet.

Disclaimer: not a lawyer, not legal advice. Just a DBA reading the docket.

The Confidence Gap

Semarchy released its 2026 AI Report on March 9. The full survey covered 1,000 C-level executives in the UK, US, and France, all at companies with $200 million-plus in annual turnover. The US-specific findings landed in Solutions Review on April 27, courtesy of Semarchy's CTO Craig Gravina:

Finding US enterprises
Consider themselves AI-ready 99%
Believe they are ahead of competitors 88%
Cite data management and governance as their single biggest challenge 60%

Read that again. The same set of organizations is telling pollsters two contradictory things in the same survey: 'we are ready' and 'we cannot manage our data'.

The supporting numbers do not improve the picture. These are from the global Semarchy report covering the UK, US, and France:

Finding Share of respondents
Implementing AI initiatives without Master Data Management foundations 51%
Not enforcing data quality standards 38%
Experienced AI project delays last year due to data quality concerns 22%
Reported operational inefficiencies from unreliable data 21%
CDOs viewed as holding a chief role in their organization's AI strategy 7%
CIOs viewed as holding a chief role in their organization's AI strategy 18%

In plain English: the people who actually know whether the data is ready are not in the room when the AI strategy is set. But the strategy is being set anyway.

This is what the customer base looks like right now, regardless of what the press releases say. Willing to bet that anyone reading this who runs SQL Server for a living is nodding.

The courts just drew a line

Now layer this against what the Northern District of California has been doing.

Three cases, all involving Meta. Stay with me — the technical detail is what makes the bridge to the data layer.

Case Citation Judge Outcome at dismissal
Forrest v. Meta Platforms, Inc. N.D. Cal., No. 22-cv-03699-PCP, opinion 6/17/24 P. Casey Pitts Motion to dismiss denied. Case survived.
Bouck v. Meta Platforms, Inc. N.D. Cal., No. 25-cv-05194-RS, opinion 3/24/26 Chief Judge Richard Seeborg Motion to dismiss denied. Case survived.
Suddeth v. Meta Platforms, Inc. N.D. Cal., No. 25-cv-08581-RS, opinion 3/24/26 Chief Judge Richard Seeborg Motion to dismiss granted. Case dismissed.

In Forrest, the plaintiffs alleged that Meta's ad tools "mix and match" images, videos, text, and audio, and use generative AI to optimize ads automatically. Under the Ninth Circuit's framework in Calise v. Meta, that active involvement created a genuine factual dispute over whether Meta materially contributed to the ads' illegality. The case survived dismissal.

In Bouck, the plaintiffs alleged that Meta's generative AI tools themselves "developed the ultimate content of the fraudulent ads", making Meta "a genuine co-conspirator in the creation of the offending content". Section 230 did not protect Meta at the dismissal stage.

In Suddeth, same judge, same day, different outcome. The plaintiffs alleged that Meta's machine-learning systems 'maximize reach, engagement, or downstream actions' and that algorithmic amplification was itself content development. Judge Seeborg dismissed the case. Meta's targeting tools, he wrote, are "content neutral on their own". Algorithmic amplification "is nothing more than an averment of facilitation".

A motion to dismiss is not a finding on the merits. Surviving one means the plaintiffs alleged enough to proceed — but not that they have won. The dividing line the courts are drawing matters either way: targeting an audience is protected distribution, transforming or generating ad content is not.

Where plaintiffs plausibly allege that generative AI itself authored the content — not just amplified it — Section 230 has now failed twice to make the case disappear at the pleadings stage. That sentence is doing a lot of work, pro and con. Maybe read it twice.

The shoe that hasn't dropped

Here is where I have to be careful, because what comes next is doctrinal commentary, not a court holding.

Seth Oranburg, a professor at the University of New Hampshire School of Law, published an analysis in Bloomberg Law on April 14 arguing that the Bouck/Forrest line of reasoning collides with a separate Supreme Court doctrine on securities fraud.

The Supreme Court held in Janus Capital Group v. First Derivative Traders that "the maker of a statement is the person or entity with ultimate authority over the statement, including its content and whether and how to communicate it". The same opinion noted that "merely hosting a document on a Web site does not indicate that the hosting entity adopts the document as its own statement or exercises control over its content".

Oranburg's argument: when a platform's generative AI exercises ultimate authority over the assembled content of a fraudulent investment solicitation, the platform may be the 'maker' under Rule 10b-5 of the Securities Exchange Act. Primary 10b-5 liability has no Section 230 analog.

I am quoting Oranburg directly here because the precision matters. He calls this "the argument no court has yet reached".

So to be clear about what is and isn't true today:

Question Status
Does Section 230 protect platforms when generative AI assembles ad content? Being tested. Two N.D. Cal. cases have survived motions to dismiss on this theory.
Can a platform whose AI has 'ultimate authority' over assembled fraudulent content be a 'maker' under Rule 10b-5? Not decided by any court. Currently a doctrinal argument from legal scholars, sitting on top of a Supreme Court precedent, watching for a case that brings the question forward.

The shoe is dangling. Whether and when it drops is not known today... but I am going to be watching this one for sure.

Why DBAs should care

The principle the courts are circling is broader than ad platforms. It is about who has ultimate authority over assembled content when AI is doing the assembling.

Sit with that for a second. Who has ultimate authority over assembled content when AI is doing the assembling?

Now overlay the Semarchy picture. Half of enterprises are running AI without MDM. A third don't enforce data quality. The data leaders are not in the AI strategy meetings. And on top of those data foundations — or in the absence of them — companies are deploying agents that summarize regulatory submissions, draft customer-facing financial output, generate reports that flow into SEC filings, and write rows that auditors will later have questions about.

If the doctrinal extension Oranburg outlines ever reaches a court holding, the test will be: who exercised ultimate authority over the assembled content? An organization that cannot say where its data lives, who can change it, what shape it takes when it leaves the database, or which AI agent touched it last is going to have a difficult time answering that question.

Two days ago I wrote that permissions are the only line agents cannot cross. The legal frame is starting to align with that observation. The DBA who can show what an agent touched, when, and under whose credentials is also the DBA whose company has an answer ready.

The closer

The numbers and the law are pointing the same direction. Confidence is up. Data discipline is down. Legal exposure is forming around the gap between them.

The boring work — Master Data Management, data quality, audit trails, scoped permissions, governance, knowing what your AI agents are touching and on whose authority — is now also a legal posture, not just an operational one. Those shops treating it as unnecessary overhead may be in the legal cases we're reading about next year.

99% of US enterprises think they are AI-ready. Is yours?

More to Read

Solutions Review (Craig Gravina, 4/27/26): Why Your AI Investments Keep Failing (And How to Fix It)
Semarchy press release (3/9/26): Data Management Overtakes Cost and Talent as Top AI Challenge
Bloomberg Law (Seth Oranburg, 4/14/26): Meta Cases Put Social Media Platforms at Securities Fraud Risk
Cornell LII: Rule 10b-5
sqlfingers inc: AI Agent. Nine Seconds. One Production Database. Gone.

Sunday, May 3, 2026

Invoke SSIS in Fabric: Not for Everyone Yet

At FabCon and SQLCon Atlanta in March, Microsoft announced a new pipeline activity in Microsoft Fabric Data Factory: Invoke SSIS Package. It is now in public preview. In short, it runs an existing .dtsx file inside a Fabric pipeline, with the package stored in OneLake and Fabric handling the compute.

That's interesting by itself, but it also has fences around it that matter — particularly for on-prem shops — and a cost model worth understanding before anyone starts dropping packages into OneLake. For now, this post is just an overview, but I'll revisit after I've tested things.

A quick translation for SSIS readers

Fabric uses its own vocabulary, and some of it overlaps with terms we already use in SSIS. This is a short glossary I find useful:

Fabric term Closest SSIS equivalent
Pipeline A Control Flow. A container holding steps with precedence between them.
Activity A task. A single step inside the pipeline (ie., Execute SQL Task, File System Task).
Invoke SSIS Package activity An Execute Package Task. One step in the pipeline that runs a .dtsx file.
Pipeline canvas The Control Flow design surface in SSDT — the visual area where you drop steps and draw arrows between them.
OneLake The package store, in Fabric's case. Replaces the SSIS catalog or msdb as the place packages live.
Lakehouse A storage object that lives inside OneLake. Roughly, a Lakehouse is to OneLake what a database is to a SQL Server instance — packages and files live in a Lakehouse, which lives in OneLake.
Workspace A logical container used to create, organize, and manage SQL databases, warehouses, reports, and data pipelines together.

So 'add the Invoke SSIS Package activity to a Fabric pipeline' is, in SSIS terms, 'add an Execute Package Task to a Control Flow.' What's different is that the package lives in OneLake now, instead of the SSISDB catalog or msdb, and that it is run by Fabric compute rather than the SSIS runtime on a SQL Server host.

What the Invoke SSIS Package activity actually is

Per Microsoft Learn, the Invoke SSIS Package activity is a pipeline activity (a step in a Fabric pipeline) inside Data Factory for Microsoft Fabric. You upload your .dtsx file (and optional .dtsConfig) to OneLake, add the activity to a pipeline, point it at the package and run it. Fabric provides the compute. This means that Microsoft Fabric acts as the infrastructure, engine, and host for running your ETL tasks.

The package store is OneLake, not the SSIS catalog and not msdb. Runtime overrides for connection managers and package properties are supplied through dedicated tabs on the activity. Logs return to OneLake when logging is enabled. There is no Integration Runtime to provision — Fabric handles the runtime on its side.

When logging is enabled in the activity settings, detailed execution logs are automatically written back to OneLake. In my book, you always want the logging on, but I don't know yet if you can customize the logged events in Fabric as you can in classic SSIS.

How Fabric SSIS Logging Works

  • Package Setup: You upload your .dtsx package and optional .dtsConfig files to a Lakehouse within OneLake.
  • Activity Configuration: You add the Invoke SSIS Package activity to a pipeline and, in the Settings tab, enable the Enable logging option.
  • Execution & Logging: When the pipeline runs, Fabric executes the package and captures log data.
  • Log Destination: The logs are written back to a specific folder in your OneLake Lakehouse.
  • Monitoring: The output of the pipeline activity provides the exact path (logLocation) to these log files in OneLake.

The setup workflow, in six steps

Microsoft's docs lay it out as six steps. Lightly paraphrased:

Step What you do
1 Move .dtsx (and optional .dtsConfig) files into OneLake — drag and drop via OneLake file explorer, or upload through the Fabric portal.
2 Open or create a pipeline (the Control Flow equivalent). Add the Invoke SSIS Package activity from the Activities pane.
3 On the Settings tab, point Package path at the .dtsx, optionally point Configuration path at a .dtsConfig, and tick Enable logging if you want logs in OneLake.
4 Set runtime values on the Connection Managers and Property Overrides tabs. Connection Managers takes a Scope, Name, Property, and Value per override. Property Overrides takes a property path and a value — for example, \Package.Variables[User::<variable name>].Value.
5 Save, and run the pipeline immediately or schedule it.
6 Monitor in the pipeline Output tab or the workspace Monitor hub (Fabric's equivalent of looking at job history in SSMS). If logging was enabled, the activity output points you to a logging path on OneLake.

What stands out about the workflow is what's missing from it. No Visual Studio deployment. No SSIS catalog setup. No SQL Agent job. No proxy account. Six clicks in a browser, start to finish. One gotcha worth flagging from step 4: if your package uses the DontSaveSensitive protection level, the Connection Managers and Property Overrides tabs are not optional — they're where credentials have to live, since the package itself cannot carry them anymore.

The four preview limitations

Straight from the Limitations section of Microsoft Learn:

Limitation What it means in practice
OneLake only Only packages stored in OneLake are supported. Not the SSIS catalog. Not msdb. Not anywhere within the filesystem.
No on-premises sources or destinations The activity can't connect to on-premises systems. If your package reads from on-prem SQL Server or writes to a UNC share, this preview is not for you yet.
No private-network endpoints VNet-injected resources, private endpoints — not supported.
No custom or third-party components Packages depending on custom components or third-party components aren't supported.

That fourth one deserves its own attention, because it is the easiest to overlook. A lot of real-world SSIS packages depend on third-party connectors — Salesforce, ServiceNow, SAP, the assortment of CData and KingswaySoft adapters that show up in many shops that I've supported. None of those will run here right now.

The first three together describe a Fabric-to-Fabric workload. Packages that already live in the cloud, talking to purely cloud-resident systems and nothing on-prem. That is what the preview is built for today.

For a sizable share of today's SSIS shops — anyone whose packages still touch a file share, a domain account, or an on-prem SQL Server — that is a fence with their packages on the wrong side.

Worth noting that the inverse path exists today: traditional SSIS packages running on-prem or in Azure-SSIS IR can target Fabric services as destinations. Microsoft has published tutorials for connecting SSIS packages to Fabric SQL Database and to Fabric Data Warehouse, both of which require Microsoft Entra ID-based authentication. So while you can't yet run SSIS inside Fabric for on-prem workloads, you can write into Fabric from SSIS where you already run it.

What it costs

The pricing model for the activity is in Microsoft Learn. Quick note for SSIS readers: Fabric bills in Capacity Units (CU), which are the abstract compute currency you reserve when you buy a Fabric capacity SKU. CU hours roll up against that reservation. With that in mind, the headline meter looks like this:

Operation Consumption Meter CU consumption rate
SSIS uptime SSIS in Fabric 1.5 CU hours per vCore

The mechanics, again per the docs:

Each workspace is allocated 4 vCores for SSIS runtime execution. During preview that allocation is fixed and cannot be modified.

Uptime starts when the first Invoke SSIS Package activity in the workspace begins running, and continues as long as at least one activity is in progress. After the last one completes, the SSIS runtime stays warm for a fixed Time-To-Live (TTL) of 30 minutes. If a new activity arrives in that window, it benefits from no cold start. If not, the runtime shuts down and billing stops.

The TTL is fixed at 30 minutes during preview and cannot be configured.

Underneath the headline meter, Microsoft's docs add a note worth quoting plainly:

In addition to the SSIS uptime meter, pipeline orchestration runs and OneLake 
storage/transactions are charged under their respective meters. For details, see Data Factory 
pricing for Microsoft Fabric and OneLake consumption.

Translation, for the invoice: the SSIS uptime meter is the visible one, but it is not the only one. Pipeline orchestration runs are billed under Data Factory pricing for Microsoft Fabric, and the storage and transactions on the OneLake side roll into OneLake consumption. If you are sizing for this, you are sizing three things, not one.

What this is, and what it isn't, in one paragraph each

What it is. A preview-stage Fabric Data Factory pipeline activity that executes existing .dtsx packages stored in OneLake, with Fabric providing the compute and OneLake holding the logs.

What it isn't. A solution for on-premises SSIS workloads. The combination of OneLake-only storage, no on-premises connectivity, no private endpoints, and no custom or third-party components draws a clear fence around what the preview supports today. Everything inside the fence is cloud-resident and stock-component. Everything outside it is unsupported in this preview.

What I'm watching for

The preview is gated, and I'm not standing up a Fabric tenant right now to lab it. So, this is an honest research-based overview rather than a hands-on walkthrough. The primary things that I'll be watching (and waiting) for:

On-premises connectivity. The single biggest gap for the on-prem shops I work with. Until that fence comes down, this is a Fabric-to-Fabric story.

Custom and third-party component support. The fourth limitation is the easiest to overlook and the most likely to silently break a real package. Salesforce, ServiceNow, SAP adapters, CData, KingswaySoft — none of those run today.

Pricing clarity. Three meters in play during preview, with a fixed 4-vCore allocation and a fixed 30-minute TTL. When either of those becomes configurable, the cost conversation gets a lot more interesting.

GA timing. Preview features change before general availability. The limitations list and pricing model are both flagged as preview-specific. GA is when the real evaluation starts.

Where this lands depends on where your packages live. If everything is already cloud-resident and built on stock components, the Invoke SSIS Package activity removes real friction. If your packages still touch on-prem in any way, this preview does nothing for you. For the SSIS shops I support, that is not a footnote — it is the whole story. Read the limitations list carefully before assuming your packages will run. I am watching this one closely and will post again soon.

More to Read

Microsoft Learn: Use the Invoke SSIS Package activity in a pipeline (Preview)
Microsoft Learn: Data Factory pricing for Microsoft Fabric
Microsoft Learn: OneLake consumption
Andy Leonard: Microsoft Just Made SSIS-to-Fabric Easier
sqlfingers inc: SSIS Is Not Dead. Yet.