In my last post, I showed how Copilot can confidently produce wrong results when your data is messy. Fair enough. But what happens when you give it clean data?
Spoiler: it actually works pretty well.
The Setup
Same exercise, different data. This time we have a properly designed trading table with constraints, clear naming, and no garbage:
CREATE TABLE dbo.Trades (
TradeID INT IDENTITY(1,1) PRIMARY KEY,
Symbol VARCHAR(10) NOT NULL,
Quantity INT NOT NULL CHECK (Quantity > 0),
Price DECIMAL(10,4) NOT NULL CHECK (Price > 0),
TradeDate DATE NOT NULL,
TraderID INT NOT NULL,
CONSTRAINT CK_Symbol CHECK (Symbol NOT LIKE '%TEST%')
);
INSERT INTO dbo.Trades (Symbol, Quantity, Price, TradeDate, TraderID) VALUES
('AAPL', 500, 178.25, '2024-01-15', 101),
('AAPL', 300, 182.50, '2024-02-20', 101),
('AAPL', 200, 175.00, '2024-03-10', 102),
('MSFT', 400, 378.00, '2024-01-22', 101),
('MSFT', 250, 385.50, '2024-02-28', 103),
('MSFT', 350, 390.25, '2024-03-15', 102),
('GOOGL', 150, 141.20, '2024-01-30', 102),
('GOOGL', 200, 145.75, '2024-02-15', 101),
('GOOGL', 175, 148.30, '2024-03-20', 103);
Notice: no NULLs, no duplicates, no "DO NOT USE - TEST" records. The CHECK constraints enforce data quality at the source.
Now I ask Copilot: "Show me total trading volume and value by symbol for Q1 2024."
What Copilot Produced
SELECT
Symbol,
SUM(Quantity) AS TotalVolume,
SUM(Quantity * Price) AS TotalValue,
COUNT(*) AS TradeCount
FROM dbo.Trades
WHERE TradeDate >= '2024-01-01' AND TradeDate < '2024-04-01'
GROUP BY Symbol
ORDER BY TotalValue DESC;
Clean. Correct. No unnecessary complexity.
The Results
| Symbol | TotalVolume | TotalValue | TradeCount |
|---|---|---|---|
| MSFT | 1,000 | $384,787.50 | 3 |
| AAPL | 1,000 | $178,875.00 | 3 |
| GOOGL | 525 | $76,010.25 | 3 |
No test accounts polluting the top. No fragmented customer names. No NULL revenue hiding in the corners. Just accurate numbers I can actually send to someone.
The Point
Copilot didn't get smarter between my last post and this one. The data got smarter. Constraints prevented garbage from entering. Consistent naming eliminated ambiguity. Clear column purposes meant the AI could infer intent correctly.
AI rewards good data hygiene. Always has. Always will.
Still verify everything. But when your foundation is solid, there's a lot less to verify.
No comments:
Post a Comment