Wednesday, January 7, 2026

Copilot Doesn't Know What Your Data Means

I read an interview yesterday on DataStoryteller with Kelly, a data scientist and author of A Friendly Guide to Data Science. They covered a lot of ground — career paths, soft skills, ethics — but one line got my attention instantly:

"Generative AI is powerful but overhyped — AI cannot replace data quality, context, or human judgment."

That's it! That's the thing I keep bumping into over and over. Let me show you exactly what Kelly means.

Let's Set the Stage

Open SSMS, connect to your working database, and run this to create some sample data:

CREATE TABLE dbo.Orders (
    OrderID INT IDENTITY(1,1),
    CustomerName VARCHAR(100),
    OrderTotal DECIMAL(10,2),
    OrderDate DATE
);

INSERT INTO dbo.Orders (CustomerName, OrderTotal, OrderDate) VALUES
('Acme Corp', 15000.00, '2024-02-15'),
('ACME Corporation', 22000.00, '2024-03-10'),
('Acme Corp.', 8500.00, '2024-05-22'),
('ACME CORP', 31000.00, '2024-07-08'),
('Globex Industries', 45000.00, '2024-01-20'),
('Globex Industries', 52000.00, '2024-06-14'),
('Initech', 12000.00, '2024-04-03'),
('DO NOT USE - TEST', 99999.00, '2024-08-01'),
('Test Customer', 50000.00, '2024-09-15'),
(NULL, 18000.00, '2024-02-28'),
(NULL, 23000.00, '2024-05-11'),
(NULL, 7500.00, '2024-10-05');

Now open the GitHub Copilot Chat window (View > GitHub Copilot Chat) and ask it:

"Write me a query for total revenue by customer for 2024 from dbo.Orders"

Here's my Copilot's answer:

[Image: GitHub Copilot response showing a sophisticated query with NULL handling and a note about name normalization]

Look at that response. Copilot explored the database, confirmed the table exists and validated the syntax. It wrote sophisticated code with ISNULL(NULLIF(LTRIM(RTRIM(...)))) to handle NULLs and empty strings. It added a @TopN variable I didn't ask for, and even threw in an OrderCount for good measure.

But look at the bottom: a polite acknowledgement that name variants like "Acme Corp" and "ACME Corporation" might need normalization. It doesn't actually fix the problem. It just mentions it.

Let's run it and see what we get:

-- Created by GitHub Copilot in SSMS - review carefully before executing
DECLARE @TopN INT = 10;

SELECT TOP (@TopN) WITH TIES
    ISNULL(NULLIF(LTRIM(RTRIM(CustomerName)), ''), '(Unknown)') AS CustomerName,
    SUM(OrderTotal) AS TotalRevenue,
    COUNT(*) AS OrderCount
FROM dbo.Orders
WHERE OrderDate >= '20240101' AND OrderDate < '20250101'
GROUP BY ISNULL(NULLIF(LTRIM(RTRIM(CustomerName)), ''), '(Unknown)')
ORDER BY TotalRevenue DESC;
[Image: Query results showing "DO NOT USE - TEST" as the top customer with $99,999 in revenue]

The Problem Is Hiding in Plain Sight

Look at row 1. Your #1 customer by revenue is "DO NOT USE - TEST" at $99,999. And row 3? "Test Customer" at $50,000. Two test accounts in your top 3.

Now look at rows 5, 6, 7, and 9: "ACME CORP", "ACME Corporation", "Acme Corp", and "Acme Corp." — the same customer split across four rows, their combined $76,500 fragmented as if they were four different customers.

Let's verify:

-- How many versions of "Acme" do we have?
SELECT DISTINCT CustomerName
FROM dbo.Orders
WHERE CustomerName LIKE '%Acme%';

-- How much revenue is hiding in NULL customers?
SELECT COUNT(*) AS NullCustomerOrders,
       SUM(OrderTotal) AS MissingRevenue
FROM dbo.Orders 
WHERE CustomerName IS NULL;

-- How much 'fake' revenue from test accounts?
SELECT CustomerName, SUM(OrderTotal) AS TotalRevenue
FROM dbo.Orders
WHERE CustomerName LIKE '%TEST%' OR CustomerName LIKE '%DO NOT%'
GROUP BY CustomerName;
[Image: Verification query results showing duplicate Acme entries, NULL revenue, and test account totals]

Four versions of Acme. $48,500 in NULL revenue (which Copilot did handle, to be fair), and $149,999 in test account revenue polluting the results.
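Here's what a cleaned-up version might look like. To be clear, this is a sketch, not the fix: the Acme mapping and the test-account filters are judgment calls that only someone who knows this data can make, and the LIKE matching assumes the default case-insensitive collation.

-- A human-written fix (a sketch; the mappings and filters below are
-- judgment calls that come from knowing this particular data)
WITH CleanOrders AS (
    SELECT
        CASE
            WHEN CustomerName LIKE 'Acme%' THEN 'Acme Corp'  -- collapse all four Acme variants
            ELSE LTRIM(RTRIM(CustomerName))
        END AS CustomerName,
        OrderTotal
    FROM dbo.Orders
    WHERE OrderDate >= '20240101' AND OrderDate < '20250101'
      AND CustomerName IS NOT NULL            -- NULLs belong in a data-quality follow-up, not this report
      AND CustomerName NOT LIKE '%TEST%'      -- exclude known test accounts
      AND CustomerName NOT LIKE '%DO NOT%'
)
SELECT CustomerName,
       SUM(OrderTotal) AS TotalRevenue,
       COUNT(*) AS OrderCount
FROM CleanOrders
GROUP BY CustomerName
ORDER BY TotalRevenue DESC;

Run that, and Globex Industries takes its rightful place at the top with $97,000, Acme Corp consolidates to $76,500, and the test accounts vanish. Nothing about those decisions required better SQL. They required knowing the data.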

The Point

The AI wrote syntactically perfect SQL. It even tried to be clever with NULL handling, AND it acknowledged the duplicate name problem in its response. Yet it still produced a report that's quietly, confidently wrong — because it doesn't know your data the way you do. It can't attend your Monday meetings. It doesn't know that Karen in accounting enters everything in ALL CAPS. It trusts the data blindly.

Kelly nailed this one too: "Garbage in, garbage out is very common — most real-world data is not analysis-ready."

The bottleneck was never writing the query. The bottleneck is understanding what the data actually means.

The Bottom Line

Use Copilot. Use ChatGPT. These tools are genuinely useful — they reduce friction, eliminate syntax errors, and save a lot of time.

But before you hit send, run a few sanity checks. CHECK it before you email it to your customer. Poke at the data. Look for dupes, NULLs, and test records. Garbage in = garbage out has been true since the dawn of time — and it will ALWAYS be true. If you're not cautious, AI just helps you push that garbage out faster.

AI doesn't know what your data means.

You do.

More to Read:

Monday, January 5, 2026

SQL Database in Fabric: What it is, what it isn't, and where it makes sense.

In Part 1 we answered 'What is Fabric?' In Part 2 we covered how Fabric organizes data with OneLake, Lakehouses, and Warehouses, and in Part 3 we explored the ingestion options.

Now let's look at an actual transactional database running inside Fabric.

SQL database in Microsoft Fabric became generally available at Ignite in November 2025. This isn't a data warehouse. It's not a lakehouse with a SQL endpoint. It's a real OLTP database — based on the same engine as Azure SQL Database — designed for operational workloads, running as a fully managed SaaS service inside your Fabric capacity.

This matters because it changes what Fabric can be. Until now, Fabric was an analytics destination. Now it can also be an application backend.

WHAT IT IS

SQL database in Fabric uses the same SQL Database Engine as Azure SQL Database. Same T-SQL. Same query optimizer. Same tooling support (SSMS, Azure Data Studio, sqlcmd). If you've worked with Azure SQL Database, you already know how to query it.

What makes it different:
  • Automatic mirroring to OneLake. Every table you create is automatically replicated to OneLake in Delta Parquet format. No configuration required. Your transactional data becomes immediately available to all the Fabric analytics tools: Spark notebooks, Power BI Direct Lake, and cross-database queries (see the sketch after this list).
  • Minimal configuration. No vCores to choose. No service tier decisions. No firewall rules to configure. Database creation takes under a minute and it scales automatically based on workload.
  • Unified billing. Capacity Units from your Fabric SKU. One bill, one pool of compute shared with all your other Fabric workloads.
  • Built-in intelligence. Automatic indexing via Automatic Tuning is enabled by default. The database creates and drops indexes based on workload patterns without you asking. Not entirely sure how I feel about this yet.
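As a quick illustration of that cross-database reach, here's the kind of query the mirrored data enables. This is a sketch under assumptions: the names MyFabricSqlDb, SalesLakehouse, and dbo.CustomerRegions are all made up, and it presumes both items live in the same workspace.

-- Sketch: join mirrored transactional data to a lakehouse table with
-- three-part names (every object name here is hypothetical)
SELECT o.CustomerName,
       r.Region,
       SUM(o.OrderTotal) AS Revenue
FROM MyFabricSqlDb.dbo.Orders AS o
JOIN SalesLakehouse.dbo.CustomerRegions AS r
    ON r.CustomerName = o.CustomerName
GROUP BY o.CustomerName, r.Region;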

WHAT IT ISN'T

1. It's not a place for SQL Agent jobs.

There's no SQL Server Agent. No Elastic Jobs either. If you need recurring jobs like maintenance, ETL, or data cleanup, you'll use Data Factory pipelines or Apache Airflow jobs in Fabric. This is a shift if you're used to the Agent being the answer to everything.
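In practice the pattern becomes: put the logic in a stored procedure, then have a pipeline invoke it on a schedule. A minimal sketch, where dbo.Sessions and the 90-day retention are made-up examples:

-- Hypothetical nightly cleanup; a Data Factory pipeline calls this on a
-- schedule, taking over the role a SQL Agent job used to play
CREATE OR ALTER PROCEDURE dbo.PurgeOldSessions
AS
BEGIN
    SET NOCOUNT ON;
    -- dbo.Sessions and the 90-day window are assumptions for illustration
    DELETE FROM dbo.Sessions
    WHERE LastActivity < DATEADD(DAY, -90, SYSUTCDATETIME());
END;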

2. It's not the right fit for Always Encrypted or customer-managed keys.

Storage encryption exists, but it's service-managed only. Always Encrypted is not supported. If your compliance requirements demand customer-managed encryption keys, this isn't the database for that workload. Yet.

3. It doesn't support Change Data Capture (CDC).

This surprised me. CDC is supported in Azure SQL Database (S3 and above), but not in Fabric SQL database. If you need CDC for downstream consumers, you'll need to architect around it, perhaps using the automatic OneLake mirroring as a substitute, since it captures changes in near real-time anyway.

4. It doesn't offer the same network isolation model as Azure SQL.

Unlike Azure SQL Database, there are no VNet service endpoints or database-specific private endpoints. Fabric does support private links at the workspace level, so network isolation is possible, but it is implemented differently than what you're accustomed to with Azure SQL.

5. It's not unlimited.

Maximum 32 vCores. Maximum 4 TB storage. Maximum 1 TB tempdb. For many operational workloads this is plenty, but if you're thinking about running your ERP backend here, verify FIRST that these ceilings fit your peak workload.

6. It's not instant replication.

Microsoft describes the OneLake mirroring as 'near real-time'. In community testing, latency has ranged from a couple of minutes to slightly longer, depending on workload. For most analytics use cases, that's fine. For use cases requiring second-level consistency between transactional and analytical views, test it with your workload and set expectations accordingly.
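One way to put a number on it for your own workload is a heartbeat probe. A rough sketch, where dbo.MirrorHeartbeat is a made-up table: write a timestamp on the transactional side, then read it back through the SQL analytics endpoint and compare.

-- Step 1, on the SQL database (transactional endpoint): record heartbeats
CREATE TABLE dbo.MirrorHeartbeat (WrittenAtUtc DATETIME2 NOT NULL);
INSERT INTO dbo.MirrorHeartbeat (WrittenAtUtc) VALUES (SYSUTCDATETIME());

-- Step 2, on the SQL analytics endpoint (the mirrored copy): measure the gap.
-- Repeat the INSERT on a short interval so the gap approximates replication
-- lag rather than just time-since-last-write.
SELECT MAX(WrittenAtUtc) AS LastMirroredWrite,
       DATEDIFF(SECOND, MAX(WrittenAtUtc), SYSUTCDATETIME()) AS ApproxLagSeconds
FROM dbo.MirrorHeartbeat;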

7. The capacity math isn't intuitive.

1 Fabric Capacity Unit ≈ 0.383 SQL database vCores. Or flip it: 1 database vCore ≈ 2.61 Fabric Capacity Units. An F2 capacity gives you roughly 0.77 vCores equivalent. An F64 gives you about 24.5 vCores. The documentation has the conversion, but you have to look for it.
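If you'd rather not do that arithmetic on the fly, a throwaway query makes a decent cheat sheet. A small sketch using the 0.383 vCores-per-CU figure above:

-- Approximate vCore equivalents per Fabric SKU (0.383 vCores per CU)
SELECT SkuCUs,
       CAST(SkuCUs * 0.383 AS DECIMAL(6, 2)) AS ApproxVCores
FROM (VALUES (2), (4), (8), (16), (32), (64)) AS f(SkuCUs);
-- Returns 0.77 for F2 and 24.51 for F64, matching the numbers above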

WHEN IT MAKES SENSE

Scenario | Verdict | Why
New app backend, team already in Fabric | Good fit | Minimal friction to create, automatic analytics integration
Prototype or dev/test database | Good fit | Fast provisioning, no infrastructure decisions, pause-friendly billing
Operational database that feeds Power BI | Great fit | Direct Lake mode works against the mirrored data with no import refresh
Application requiring Always Encrypted | Not today | Feature not supported
Workloads needing traditional VNet isolation | Evaluate carefully | Fabric uses workspace-level private links, not database-level VNet endpoints
Existing Azure SQL DB already in production | Keep it | Mirroring to Fabric gives you analytics access without migration
On-prem SQL Server you want to modernize | Consider Mirroring first | Less risky than full migration; proves value before committing

WHEN TO USE MIRRORING INSTEAD

If you already have an Azure SQL Database or on-premises SQL Server that runs your operational workload, you don't need to migrate it to Fabric SQL database to get the analytics benefits.

Mirroring (which we covered in Part 3) replicates your existing database into OneLake. You get the same automatic Delta table conversion, the same cross-database query capability, the same Power BI integration — without moving the transactional workload.

The difference is who manages the database:

Aspect | SQL database in Fabric | Mirroring Azure SQL DB
Database management | Fabric (SaaS) | You (PaaS)
Scaling decisions | Automatic | Your choice
Backup control | Automatic, 7-day retention | You configure
Security features | Fabric workspace + SQL RBAC | Full Azure SQL feature set
Cost model | Fabric CUs | Azure SQL pricing + Fabric CUs for analytics

If your production database needs features Fabric SQL doesn't have yet (Always Encrypted, customer-managed keys, elastic pools, geo-replication), keep it in Azure SQL Database and use Mirroring.

PRACTICAL ADVICE

1. Try the quick creation test.

Create a Fabric workspace (trial capacity works), click New Item → SQL database. The speed is genuinely impressive and helps you understand what 'SaaS SQL' actually means in practice.

2. Understand what you're giving up.

Review the limitations page before committing a production workload. The feature gaps are reasonable trade-offs for many applications, but not all.

3. Watch your Capacity Unit consumption.

SQL database shares CUs with everything else in Fabric. A runaway query in your database can throttle your Power BI reports. You'll want to use the Fabric Capacity Metrics app to understand who's consuming what.

4. Plan your scheduling story early.

No SQL Agent means you need to think about where scheduled work runs. Data Factory pipelines are the natural fit within Fabric. Don't discover this gap the week before go-live.

5. Test the mirroring latency for your use case.

If your analysts expect changes to appear in reports within seconds, the mirroring delay may cause friction. Set expectations, or design around it: the transactional endpoint is always current, while the OneLake mirror carries a small lag.

THE BOTTOM LINE

SQL database in Microsoft Fabric is the first true SaaS SQL Server offering. It removes nearly all operational burden: no vCores to select, no failover to configure, no patching to schedule, no firewall rules to maintain.

In exchange, you must accept constraints: fewer security features than Azure SQL Database, a different network isolation model, no SQL Agent, and hard ceilings on compute and storage.

Where it fits well: New applications designed for Fabric. Dev/test databases. Operational workloads where the built-in analytics integration is the primary value proposition.

Where it doesn't fit yet: Highly regulated workloads requiring customer-managed encryption. Large databases that exceed the 4 TB ceiling. Environments requiring traditional VNet-level network isolation.

For shops with existing SQL Server or Azure SQL workloads, Mirroring is often the smarter first step. You get the Fabric analytics benefits without migrating your production transactional workload. Once you've proven value and understood Fabric's operational model, you can decide whether future databases belong inside Fabric directly.

WRAPPING IT UP

This post completes my four-part series on Microsoft Fabric from a DBA's perspective:

  • Part 1: What is Fabric?
  • Part 2: How Fabric organizes data with OneLake, Lakehouses, and Warehouses
  • Part 3: The ingestion options
  • Part 4: SQL database in Fabric (this post)

My goal with this series was to cut through the marketing and give you -- the working DBA -- a practical understanding of what Fabric actually is, how it works, and where it might fit in your environment. Not hype. Not a sales pitch. Just the facts, the trade-offs, and the questions you should be asking.

Fabric is still very young, with many limitations, and features are being added monthly. If you're evaluating Fabric for your organization, revisit the documentation regularly, and be sure to test with your own workloads rather than relying on anyone's benchmarks, including Microsoft's (for a skeptical take, see Brent Ozar's Fabric Is Just Plain Unreliable).

Thanks for reading along. If this series helped you make sense of Fabric, I'd love to hear about it. And if you have questions I didn't answer — drop me a line.
