Monday, December 29, 2025

Your Transaction Log is a Time Bomb... Here's How to Check the Fuse

A 200GB transaction log on a 10GB database. I've seen it. I'm betting that you have, too.

Transaction logs grow silently. They don't complain until the disk fills up, the database goes read-only, and your phone starts buzzing. By then, you're not troubleshooting -- you're firefighting.

Let's be proactive and defuse it before it blows.

Find the Bloated Logs

This script shows the ratio of log size to data file size for every database. There is no fixed "correct" ratio, but sizing the log at roughly 25% of the data files is a reasonable starting point. I often start there and monitor during peak usage to see whether it needs adjusting -- and it never hurts to give yourself a little buffer. Run this query to see where you may be hurting:

-- log-to-data ratio per database
-- mf.type = 0 is a data file, mf.type = 1 is the log; size is in 8KB pages, so * 8.0 / 1024 converts to MB
SELECT 
    db.name AS DatabaseName,
    db.recovery_model_desc AS RecoveryModel,
    CAST(SUM(CASE WHEN mf.type = 0 THEN mf.size END) * 8.0 / 1024 AS DECIMAL(10,2)) AS DataMB,
    CAST(SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 8.0 / 1024 AS DECIMAL(10,2)) AS LogMB,
    CAST(SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 / 
         NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) AS DECIMAL(5,1)) AS LogToDataRatio
FROM sys.databases db INNER JOIN sys.master_files mf 
  ON db.database_id = mf.database_id
GROUP BY db.name, db.recovery_model_desc
HAVING SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 / 
       NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) > 50   -- only flag logs above 50% of data size
ORDER BY LogToDataRatio DESC;

Sample output from a server that needs attention:

Database    Recovery   Data MB    Log MB    Ratio %
LegacyApp   FULL         8,420   187,392    2,225.0
Warehouse   FULL       245,000    89,500       36.5
WebApp      SIMPLE      52,100     4,200        8.1

LegacyApp sits at a 2,225% ratio in the FULL recovery model. Someone set it up years ago, chose FULL because it sounded important, and never configured log backups. Classic.


Why Won't the Log Truncate?

This tells you exactly what's holding your log hostage:

SELECT 
    name AS DatabaseName,
    log_reuse_wait_desc AS WhyLogCantTruncate
FROM sys.databases
WHERE log_reuse_wait_desc <> 'NOTHING'   -- NOTHING means the log can truncate normally
ORDER BY name;

Common culprits:

Wait Type              Translation
LOG_BACKUP             No log backups. Ever. Or not often enough.
ACTIVE_TRANSACTION     Someone left a transaction open. Find it.
REPLICATION            Log reader agent is behind or broken.
AVAILABILITY_REPLICA   AG secondary can't keep up. Network or disk issue.
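
If LOG_BACKUP is the culprit, it's worth checking when (or whether) log backups have ever run. A quick sketch against the backup history in msdb:

-- last log backup per non-SIMPLE database; NULL means no log backup has ever been taken
SELECT db.name AS DatabaseName,
       MAX(bs.backup_finish_date) AS LastLogBackup
FROM sys.databases db
LEFT JOIN msdb.dbo.backupset bs
  ON bs.database_name = db.name
 AND bs.type = 'L'                        -- 'L' = transaction log backup
WHERE db.recovery_model_desc <> 'SIMPLE'
GROUP BY db.name
ORDER BY LastLogBackup;                   -- worst offenders (NULLs) sort to the top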

The VLF Problem Nobody Talks About

Virtual Log Files (VLFs) are internal chunks of your transaction log. Too many = slow backups, slow restores, slow recovery. I asked an AI chatbot about VLFs once and it gave me a 3,000-word essay. Here's the short version: keep them under 1,000.

SELECT 
    db.name AS DatabaseName,
    COUNT(li.database_id) AS VLF_Count
FROM sys.databases db
CROSS APPLY sys.dm_db_log_info(db.database_id) li   -- requires SQL Server 2016 SP2 or later
GROUP BY db.name
HAVING COUNT(li.database_id) > 1000 -- comment this out to see the current VLF counts on all of your databases
ORDER BY VLF_Count DESC;

Thousands of VLFs usually means the log grew in tiny, panicked increments because autogrowth was set to something absurd like 1MB. Fix the autogrowth, shrink the log, grow it back properly.
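
Here's a minimal sketch of that process, assuming the LegacyApp database from the sample output above and a hypothetical logical log file name of LegacyApp_log (check yours in sys.database_files):

USE LegacyApp;
GO
-- 1. take a log backup first (FULL recovery) so there is free space in the log to reclaim
-- 2. shrink the log to clear out the excess VLFs
DBCC SHRINKFILE (LegacyApp_log, 1024);   -- target size in MB
GO
-- 3. grow it back to its working size in large, fixed chunks
ALTER DATABASE LegacyApp MODIFY FILE (NAME = LegacyApp_log, SIZE = 8192MB);
GO
ALTER DATABASE LegacyApp MODIFY FILE (NAME = LegacyApp_log, SIZE = 16384MB);
GO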


To Better Manage Your Transaction Log

If the recovery model is FULL: Back up your logs. Every 15-30 minutes is a good cadence for busy systems. If you do not need point-in-time recovery, switch to SIMPLE and be sure you understand your recovery goals. Preferably in the reverse order.
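
Either way, it's a one-liner. A sketch using the LegacyApp database from the example and a hypothetical backup path:

-- option 1: stay in FULL and actually take log backups (schedule this every 15-30 minutes)
BACKUP LOG LegacyApp
TO DISK = N'X:\Backups\LegacyApp_log.trn'   -- hypothetical path; point this at your backup target
WITH COMPRESSION;

-- option 2: no point-in-time recovery needed? let the log truncate on checkpoint instead
ALTER DATABASE LegacyApp SET RECOVERY SIMPLE;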

If there's an open transaction: Find it with DBCC OPENTRAN. Talk to the owner first, then kill it if you must.
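
For example (the session id below is made up; use whatever DBCC OPENTRAN reports):

-- shows the oldest active transaction in the database, including its session id (SPID)
DBCC OPENTRAN ('LegacyApp');

-- after you've talked to the owner and confirmed it's safe to roll back:
-- KILL 57;   -- replace 57 with the SPID reported above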

If VLFs are out of control: Shrink the log (yes, I know, but sometimes you have no choice), then grow it back in large, fixed chunks -- 4GB or 8GB at a time.

Set sensible autogrowth: 256MB for data, 128MB for log. Never percentages. Never 1MB.
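
For example, assuming logical file names of LegacyApp_data and LegacyApp_log (yours will differ; check sys.database_files):

-- fixed-size autogrowth: 256MB for data, 128MB for log; never percentages
ALTER DATABASE LegacyApp MODIFY FILE (NAME = LegacyApp_data, FILEGROWTH = 256MB);
ALTER DATABASE LegacyApp MODIFY FILE (NAME = LegacyApp_log,  FILEGROWTH = 128MB);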


Make It a Habit

Add this to your weekly checks. A simple alert when any log exceeds 50% of data size will allow you to review potential problems before they become emergencies:

-- alert when log file size exceeds 50% of data file size
SELECT db.name, 
    CAST(SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 / 
         NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) AS DECIMAL(5,1)) AS LogToDataRatio
FROM sys.databases db INNER JOIN sys.master_files mf 
  ON db.database_id = mf.database_id
GROUP BY db.name
HAVING SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 / 
       NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) > 50;
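
If you'd rather have the check come to you, here's a sketch of one way to automate it. It assumes Database Mail is already configured with a profile named 'DBA Alerts' (swap in your own profile and recipients) and that this runs as a weekly SQL Agent job step:

-- send a mail only when at least one log exceeds 50% of data file size
IF EXISTS (
    SELECT 1
    FROM sys.master_files mf
    GROUP BY mf.database_id
    HAVING SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 /
           NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) > 50
)
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'DBA Alerts',             -- assumed Database Mail profile
        @recipients   = 'dba-team@example.com',   -- assumed recipient
        @subject      = 'Transaction log exceeds 50% of data file size',
        @query        = N'SELECT db.name,
       CAST(SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 /
            NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) AS DECIMAL(5,1)) AS LogToDataRatio
FROM sys.databases db INNER JOIN sys.master_files mf ON db.database_id = mf.database_id
GROUP BY db.name
HAVING SUM(CASE WHEN mf.type = 1 THEN mf.size END) * 100.0 /
       NULLIF(SUM(CASE WHEN mf.type = 0 THEN mf.size END), 0) > 50';
END;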

The Bottom Line

Transaction logs are patient. They'll grow quietly for months, even years, waiting for the perfect moment to ruin your weekend. Don't let them.

Check yours now. Your future self and your disk space will thank you.



6 comments:

  1. Useful article (I immediately checked / fixed some of my dbs :-)), but it is a bit strange / uncommon that the DWH in your first example results is in FULL recovery mode while the WebApp is in SIMPLE.

    Usually it is exactly the opposite, since you don't want to lose data in the user app and bulk load large stuff into the DWH.

    PS: I know it's just a wild example, but a newbie or AI may read it and draw the wrong conclusions.

    Replies
    1. I hear you... but one can never make assumptions on the recovery models. Anybody can change them however they wish. ;) Glad you found it useful!

  2. Just FYI, I couldn't run the ratio query without adding a WHERE clause to omit the ReportServer databases.

    FROM sys.databases db INNER JOIN sys.master_files mf
    ON db.database_id = mf.database_id
    WHERE db.name NOT LIKE 'ReportServer%'
    GROUP BY db.name, db.recovery_model_desc

    Replies
    1. Interesting! It completes just fine for me on a server with SSRS databases. Could be permissions, db state issues... or something else completely unrelated. Hard to say without knowing the error. If you want to dig in further, feel free to reach out.

  3. Rebecca

    Thanks for a well-written article. But I have a question about my implementation of it. I've left your code mostly alone, only dropping the WHERE clauses so I can see all the databases on my server. When I run the last code segment (alert when log file size exceeds 50% of data file size), I get 8 different tempdb results, each saying the log to data ratio is 100%. The first code segment shows the grouped sum for the DataMB as 64.00 and the LogMB as 8.00 with the log to data ratio as 12.5%. You weren't grouping the last segment, so would that be the correct fix ... or am I missing something else?

    Replies
    1. Good catch — you're right, it needed the grouping. I've updated the post. Thanks!!
