For two decades, getting regular expression support into SQL Server meant CLR assemblies, deployment headaches, and a whole lot of discussions with your security team. Or, you may have used the PATINDEX method and spent 45 minutes writing something that any other language handles in one line. Either way, you were doing a lot of work to get the job done.
SQL Server 2025 changes this. Native RegEx support is here, built directly into T-SQL, no CLR required. Seven new functions ship with the engine, and they cover the full range of what you actually need: matching, extracting, replacing, splitting, and counting.
The Seven Functions
| Function | What it does | Type |
|---|---|---|
| REGEXP_LIKE | Boolean match — does the string match the pattern? | Scalar |
| REGEXP_COUNT | Returns the number of times a pattern matches | Scalar |
| REGEXP_INSTR | Returns the position where the pattern matches (like CHARINDEX, with regex) | Scalar |
| REGEXP_SUBSTR | Extracts the matching substring | Scalar |
| REGEXP_REPLACE | Replaces matches with a specified string | Scalar |
| REGEXP_MATCHES | Returns all matches with position details | Table-valued |
| REGEXP_SPLIT_TO_TABLE | Splits a string on a pattern delimiter, returns rows | Table-valued |
One prerequisite to flag up front: REGEXP_LIKE, REGEXP_MATCHES, and REGEXP_SPLIT_TO_TABLE require database compatibility level 170. If you just installed SQL Server 2025 but haven't bumped your CL yet, those three won't work. Check yours and adjust if needed:
-- Check current compat level SELECT name, compatibility_level FROM sys.databases; -- Bump to 170 if needed (SQL Server 2025) ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 170;
The other four — REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE — are available at all compatibility levels in SQL Server 2025.
REGEXP_LIKE in a CHECK Constraint
If you're building a new table, put the constraint in day one. Then, bad data never gets in, and you won't end up having to find and fix the junk at a later date. With the email check constraint, the engine enforces it on every insert and update without any application code, triggers, or stored procedures required. With REGEXP_LIKE now available in 2025 T-SQL, you can enforce a real email pattern directly in the column definition:
CREATE TABLE dbo.Contacts (
ContactID INT IDENTITY PRIMARY KEY,
FullName VARCHAR(100),
Email VARCHAR(200) CONSTRAINT chk_Email_Format
CHECK (REGEXP_LIKE(Email, '^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$')),
Phone VARCHAR(50)
);
Good data goes in fine:
INSERT INTO dbo.Contacts (FullName, Email, Phone)
VALUES
('Rebecca Lewis', 'rebecca@sqlfingers.com', '(555) 867-5309'),
('Jane Doe', 'jane.doe@somewhere.com', '5551234567');
Bad email gets rejected immediately:
INSERT INTO dbo.Contacts (FullName, Email, Phone)
VALUES ('Bob Slob', 'bob.slob@@notright', '555.123.4567');
Msg 547, Level 16, State 0, Line 1 The INSERT statement conflicted with the CHECK constraint "chk_Email_Format". The conflict occurred in database "DBA", table "dbo.Contacts", column 'Email'.
Very simple, easy data validation lives in the table definition - where it belongs.
REGEXP_LIKE: Find the Junk in an Existing Table
That's the clean-slate scenario. For the more common one, where you've inherited a table with no constraints in place, the approach is the same — REGEXP_LIKE will find the damage:
CREATE TABLE dbo.ContactsDirty (
ContactID INT IDENTITY PRIMARY KEY,
FullName VARCHAR(100),
Email VARCHAR(200),
Phone VARCHAR(50)
);
INSERT INTO dbo.ContactsDirty (FullName, Email, Phone)
VALUES
('Rebecca Lewis', 'rebecca@sqlfingers.com', '(555) 867-5309'),
('Bob Slob', 'bob.slob@@notright', '555.123.4567'),
('Jane Doe', 'jane.doe@somewhere.com', '5551234567'),
('No Format Guy', 'noemail', '(555) 999-0000');
-- Find the bad rows
SELECT ContactID, FullName, Email
FROM dbo.ContactsDirty
WHERE NOT REGEXP_LIKE(Email, '^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$');
Results:
Two rows of garbage, found instantly. No CHARINDEX nesting, no PATINDEX gymnastics.
REGEXP_REPLACE: Clean Up Messy Phone Data
Classic ETL problem: a phone number column with every possible format imaginable. Dashes, dots, parentheses, spaces — pick your poison. Use this to strip everything non-numeric in one pass:
SELECT
Phone AS OriginalPhone,
REGEXP_REPLACE(Phone, '[^0-9]', '') AS DigitsOnly
FROM dbo.ContactsDirty;
Results:
Or, maybe you want to reformat to a standard pattern while you're at it?
;WITH Stripped AS (
SELECT ContactID, Phone,
REGEXP_REPLACE(Phone, '[^0-9]', '') AS DigitsOnly
FROM dbo.ContactsDirty
)
SELECT Phone,
REGEXP_REPLACE(DigitsOnly, '^(\d{3})(\d{3})(\d{4})$', '(\1) \2-\3') AS Formatted
FROM Stripped;
REGEXP_SUBSTR: Extract Only What You Need
Here's a good one. Use REGEXP_SUBSTR to pull only what you need from the targeted string values. Domain, area code, whatever you need:
SELECT
Email,
REGEXP_SUBSTR(Email, '[^@]+$') AS Domain
FROM dbo.ContactsDirty
WHERE REGEXP_LIKE(Email, '^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$');
SELECT
Phone,
REGEXP_SUBSTR(REGEXP_REPLACE(Phone, '[^0-9]', ''), '^\d{3}') AS AreaCode
FROM dbo.ContactsDirty;
Add the Constraint After Cleanup
Once the data is clean, add the constraint so it stays that way. You need to verify that all existing data is good first, or the ALTER TABLE will fail:
-- Confirm no bad rows remain before adding the constraint
SELECT ContactID, FullName, Email
FROM dbo.ContactsDirty
WHERE NOT REGEXP_LIKE(Email, '^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$');
-- If that returns zero rows, you're clear to add the constraint
ALTER TABLE dbo.ContactsDirty
ADD CONSTRAINT chk_Email_Format
CHECK (REGEXP_LIKE(Email, '^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$'));
Before You Go Wild: There are Some Gotchas
It's the RE2 library, not PCRE. SQL Server 2025 uses Google's RE2 regex engine. RE2 is fast and safe (no ReDoS vulnerability), but it does not support lookahead
(?=...)or lookbehind(?<=...). If you've written regex in .NET or Python, those won't work here.Not SARGable. REGEXP_LIKE in a WHERE clause will not use an index — it's a full scan. Use it for data cleaning, validation, ad-hoc queries, and ETL. Don't drop it on a large OLTP table and expect it to perform. Very important.
Don't replace working LIKE code. LIKE is significantly faster than REGEXP_LIKE for simple pattern matching. If LIKE does the job, keep it. Regex earns its place when the pattern is genuinely complex or when you need extract/replace behavior.
LOB types are partially supported. REGEXP_LIKE, REGEXP_COUNT, and REGEXP_INSTR support
varchar(max)/nvarchar(max)up to 2MB. REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_MATCHES, and REGEXP_SPLIT_TO_TABLE do not support LOBs at all — RE2 library limitation.
Those are real constraints. None change the fact that RegEx in SQL Server 2025 is a long-overdue quality-of-life improvement for any DBAs who've spent the last two decades cobbling this together with CLR assemblies or creative PATINDEX methods.
More to Read
Work with Regular Expressions - SQL Server 2025 (Microsoft Learn)
What's New in SQL Server 2025 (Microsoft Learn)
Regular Expressions in SQL Server 2025 - Rudi Bruchez (deep dive on RE2 and limitations)





No comments:
Post a Comment