The Old World Horror Story
It's 2 AM. You're on call. The primary replica of your critical Always On Availability Group just failed over. Everything should be fine, right? Wrong.
Suddenly, your phone explodes. Applications can't connect. Jobs are failing. Reports are timing out... Why? Because that one SQL login you created last week on the primary replica doesn't exist on the secondary. Or worse, it exists but with a different SID. Welcome to the seventh circle of DBA hell.
If you've managed traditional Availability Groups (AGs) for any length of time, you know this pain. Every single system object -- logins, AGENT JOBS, linked servers, server-level permissions -- must be manually synchronized across every replica. Miss one? Enjoy your incident ticket.
The Symphony of Suffering
Let's count the ways traditional AGs made us suffer:
Login Roulette. Creating a login on the primary? Better script it out with the exact same SID and password hash for every secondary. Oh, and don't forget to update all replicas when someone changes their password. Fun times!
Job Juggling. My LEAST FAVORITE, the SQL Server Agent jobs only run where they're created. Failover to a secondary? Hope you remembered to disable those jobs on the secondaries and have a process to enable them post-failover. Spoiler: you didn't.
Linked Server Limbo. Each linked server needs to be configured identically on every replica. Connection strings, security contexts, provider options -- one tiny difference and your cross-server queries explode spectacularly.
Permission Purgatory. Server-level permissions don't replicate. That service account that needs VIEW SERVER STATE? Better grant it everywhere. Manually. Every. Single. Time.
This wasn't just inconvenient -- it was a reliability nightmare and a complete pain in the a*s. Microsoft clearly got tired of hearing us complain about it at every conference, user group, and feedback forum.
Enter the Hero. Contained Availability Groups
With SQL Server 2022, Microsoft finally said, "Hey. Let's fix this mess." Contained Availability Groups (Contained AGs) bring sanity to the chaos by automatically synchronizing system metadata between replicas.
Think of it as a bubble around your AG that includes the databases AND all system objects that they databases need to function. Logins, AGENT JOBS, permissions -- they all live inside the bubble now, traveling together, hands held and smiling as one happy family during failovers. Serious!
The Magic Under the Hood
Contained AGs store system metadata in the master database of each replica, but here's the key: this metadata is scoped specifically to the AG. When you create a login in a Contained AG, it's not just a regular server login -- it's an AG-contained login that automatically synchronizes to all replicas.
No more SID mismatches. No more orphaned users. No more 2 AM panic attacks.
Making the Jump From Traditional to Contained
Ready to escape the old world? Here's the deal: You cannot convert a traditional AG to a Contained AG. SQL Server doesn't support changing an AG from one type to the other. You have to create a new Contained AG from scratch. But don't worry -- we've got options for minimizing the pain.
Step 1: Check Your Version First, ensure you're on SQL Server 2022 or later. Contained AGs are the new kids on the block.
Step 2: Choose Your Migration Strategy
Option A: The UIP (Upgrade-in-Place) - Downtime Required, so plan accordingly. This is the straightforward but disruptive approach:
- Script all relevant logins, jobs, and linked servers from your current primary
- Generate FULL database backups for all AG-databases
- Take your databases offline from the traditional AG
- Drop the traditional AG completely
- Create a new Contained AG using the same databases
- Re-create your system objects within the Contained AG context (via the listener)
- Bring everything back online
Option B: The Side-by-Side Migration (Zero Downtime!) The smooth operator's approach... A little more complex but no downtime.
-
Set up a parallel infrastructure
- You'll need completely separate SQL Server instances because a database can only belong to one AG at a time
- This means temporarily doubling your infrastructure:
- Spin up new VMs/servers (ideal if you have cloud elasticity)
- Leverage Azure SQL VM or AWS EC2 for temporary replicas
- Use existing dev/test servers temporarily (risky but it can be done)
- Install additional named instances on existing hardware (watch those resources!)
-
Create your new Contained AG on the new infrastructure
CREATE AVAILABILITY GROUP [YourContainedAG] CONTAINED <-- The magic word! WITH ( AUTOMATED_BACKUP_PREFERENCE = SECONDARY, DB_FAILOVER = ON, DTC_SUPPORT = PER_DB ) FOR DATABASE [YourDatabase] REPLICA ON 'NewServer1' WITH (ENDPOINT_URL = 'TCP://NewServer1:5022'), 'NewServer2' WITH (ENDPOINT_URL = 'TCP://NewServer2:5022')
-
Migrate your databases
- Restore previously created FULL backups WITH NORECOVERY to your new Contained AG primary instance
- Add databases to the Contained AG
- Let automatic seeding handle the secondaries
-
Recreate system objects in the Contained AG context
- Connect to the Contained AG listener (not the instance)
- Script in all logins, AGENT JOBS, linked servers
- These will now automatically replicate within the Contained AG
-
Test thoroughly
- Point test applications to the new AG listener
- Verify logins, permissions, and jobs work correctly
- Run parallel for several days to verify all business points of operation
-
Cut over your applications
- Update connection strings to point to the new Contained AG listener
- Keep the old AG running as a fallback
-
Decommission the old infrastructure
- After a burn-in period (72+ hours recommended)
- Drop the traditional AG
- Repurpose or decommission the old servers
Step 3: Verify Your Success Fail over to a secondary and watch in amazement as everything just... works. No missing logins. No broken jobs. It's beautiful.
The Fine Print
Before you rush off to implement Contained AGs everywhere, a few considerations:
- SQL Server 2022+ only. No backporting this goodness to older versions
- Some limitations apply. Not all system objects are supported (yet)
- Plan your migration. Moving from traditional to contained requires some downtime
- Different management approach. Some DMVs and system views work differently with contained objects
The Bottom Line
Contained Availability Groups aren't just an incremental improvement—they're a fundamental fix to one of the most frustrating aspects of SQL Server high availability. Microsoft clearly listened to the community's pain and delivered a solution that actually works.
If you're still managing traditional AGs and spending hours keeping system objects in sync, it's time to seriously consider SQL Server 2022 and Contained AGs. Your future self (and your sleep schedule) will thank you.
More information:
What is a Contained Availability Group ?Have you made the jump to Contained AGs? What's been your experience? Drop a comment below and let's compare notes on escaping the old world of AG management.
The main downside of contained Availability Groups (AGs), in my opinion, is that databases not yet members of the contained AG cannot be reached through the listener name—even when the host is the primary host for the contained AG.
ReplyDeleteSoftware vendors and installers often require full access to the database/server, so installations and upgrades don’t work while the database is part of an AG. Additionally, databases can’t be created within the contained AG by the installer.
Therefore, during installation, the installer has to connect using the hostname. But after installation (and after joining the contained AG), the application must connect using the listener name.
As a workaround, I create a dedicated DNS entry like {database(s)}.sql.ourdomain.int. During installation, this entry is a CNAME pointing to the host, and after installation, it points to the contained AG listener.
This approach helps because installers tend to hide connection settings in random places for later use, and you don’t want to search for and manually update them after the installer finishes.
An added bonus is that if the database is moved elsewhere, you can once again benefit from the flexibility of DNS without having to hunt down hidden connection details. :-)
Great point about the installation/upgrade challenges! That's definitely a pain point that MSFT's documentation tends to gloss over. Your DNS CNAME workaround is clever - essentially creating a switchable pointer that can flip between the host and listener as needed. Thanks for sharing your solution - that's a field-tested workaround that others may find useful!
DeleteThanks! You missed one complaint about AGs, which is that backups history is kept in msdb, and any backups monitoring will not understand that TLog backups might be running on a secondary replica, and in case of failover will not understand why suddenly there is no backup history.
ReplyDeleteGood catch on the backup history issue. The fact that msdb is instance-specific while backups run across replicas can be a real monitoring headache post failover. Have you tried centralizing backup history? Some folks collect backup data from all replicas into a central table, or you could consider using extended events to track backups at the AG level rather than relying on msdb.
DeleteBut yes, it would be something if they made backup history follow the database rather than the instance. Thanks for highlighting this!
"Restore previously created FULL backups WITH NORECOVERY to your new Contained AG primary instance" - That's not enough, right? If my original server is still online, then the full backups won't be at the same point in time as the original server's databases.
ReplyDeleteValid point — the FULL restore alone wouldn’t line up with a still-running source. The exact sequence really depends on whether you’re seeding from a static copy or syncing with live, which is why I kept the post at a higher level. The core migration pattern works either way, just with different approaches to managing the delta.
DeleteIn Option B you've shown a step with code to both create the contained AG and load your database into the contained AG followed by the step to restore the database with norecovery into the instance. Wouldn't you want to create the contained AG as a separate step, followed by the restore with norecovery of any databases that will be loaded, then use ALTER DATABASE to load them into the AG? Without that I would expect it to error out as you try to load a non-existent database into the AG.
ReplyDeleteYou're absolutely right, and thanks for the careful read! I should have been clearer about the sequencing in Option B. The steps as written would indeed throw an error trying to add a non-existent database to the AG. The correct sequence should be:
DeleteCreate the Contained AG shell first (without the FOR DATABASE clause)
Restore your databases WITH NORECOVERY to the new primary instance
Add the databases to the Contained AG using ALTER AVAILABILITY GROUP
I was trying to keep the example concise and inadvertently compressed steps that should be separate. I'll update the post to clarify this. Ironically, this is exactly this kind of detail that trips people up during migrations, so I appreciate you calling it out.
For anyone reading, this is a great reminder that even with the simplified management of Contained AGs, the initial setup still requires careful attention to the order of operations. Thanks for helping make the post more accurate!