Tuesday, January 20, 2026

I/O Requests Taking Longer Than 15 Seconds -- Now What?

In a previous post, I shared a script to detect the I/O requests taking longer than 15 seconds warning across your SQL Server inventory.  Now let's talk about what to do when you find it.

Here are five of the most common causes with some tips to investigate each:

1. Storage Subsystem Bottlenecks

This is the most common cause. Your SAN, NAS, or local disks are overloaded, hitting IOPS limits, experiencing network/iSCSI latency, or suffering from VM I/O throttling ('noisy neighbors'). Driver/firmware issues and even the Windows power plan can contribute.

Check file-level latency from inside SQL Server:

SELECT
    DB_NAME(fs.database_id) database_name,
    mf.name logical_name,
    mf.physical_name,
    CASE WHEN num_of_reads = 0 THEN 0 
         ELSE (io_stall_read_ms / num_of_reads) END avg_read_latency_ms,
    CASE WHEN num_of_writes = 0 THEN 0 
         ELSE (io_stall_write_ms / num_of_writes) END avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) fs INNER JOIN sys.master_files mf 
  ON fs.database_id = mf.database_id 
  AND fs.file_id = mf.file_id
ORDER BY (io_stall_read_ms + io_stall_write_ms) DESC;

Read latency consistently over 20ms or write latency over 10ms indicates a problem. For transaction logs, you want sub-millisecond writes. Also check Perfmon counters Avg. Disk sec/Read and Avg. Disk sec/Write for the affected volumes. If latencies are high at the OS level, pull in your storage/infrastructure team to review further.

CAUSE: External — not SQL Server.

2. Tempdb Contention

Tempdb gets special treatment because it's so critical.  Causes include too few data files, tempdb on slow storage, heavy spills from sorts/hashes, or version store pressure.

Check for PAGELATCH waits on tempdb allocation pages:

SELECT 
    session_id,
    wait_type,
    wait_time,
    wait_resource
FROM sys.dm_exec_requests
WHERE wait_type LIKE 'PAGELATCH%'
  AND wait_resource LIKE '2:%' --- The '2:' prefix means database_id 2, which is tempdb.
ORDER BY wait_time DESC;

Check tempdb file count vs. CPU cores:

SELECT 
    COUNT(*) tempdb_data_files,
    (SELECT cpu_count FROM sys.dm_os_sys_info) logical_cpus
FROM sys.master_files
WHERE database_id = 2 AND type = 0;

If you have fewer tempdb data files than logical CPUs (up to 8), and you're seeing PAGELATCH_UP or PAGELATCH_EX waits on pages like 2:1:1 (PFS) or 2:1:3 (SGAM), consider adding more equally-sized tempdb data files.

CAUSE: SQL Server configuration.

3. File Autogrowth Delays

When data or log files auto-grow — especially if IFI (Instant File Initialization) isn't enabled — Windows has to zero-fill the new space. This can freeze I/O for seconds or longer.

Check if IFI is enabled:

SELECT 
    servicename,
    instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE 'SQL Server (%';

Check current autogrowth settings:

SELECT 
    DB_NAME(database_id) database_name,
    name logical_name,
    type_desc,
    CASE WHEN is_percent_growth = 1 
         THEN CAST(growth AS VARCHAR) + '%'
         ELSE CAST((growth * 8) / 1024 AS VARCHAR) + ' MB' 
    END autogrowth_setting,
    (size * 8) / 1024 current_size_mb
FROM sys.master_files
--WHERE database_id > 4 -- uncomment this to look at ALL database files
ORDER BY database_id, type;

IFI should be enabled — it's one of my default recommendations on any SQL Server build that does not have TDE (Transparent Data Encryption) enabled. Autogrowth settings should be in MB, not percentages, and sized appropriately.  Small increments cause frequent grows; huge increments cause long freezes.

CAUSE: SQL Server configuration.

4. Antivirus, Filter Drivers & Maintenance Conflicts

Real-time AV scanning of MDF/NDF/LDF files, VSS snapshots, encryption software, backup agents, or index rebuilds and DBCC checks competing for I/O during peak hours can all trigger the warning.

List filter drivers attached to your SQL Server volumes (run from an elevated command prompt):

fltmc instances

Look for antivirus drivers (altitudes 320000-329998) attached to your data volumes.

Correlate warning times with maintenance jobs:

SELECT 
    j.name job_name,
    h.step_name,
    h.run_date,
    h.run_time,
    h.run_duration
FROM msdb.dbo.sysjobhistory h INNER JOIN msdb.dbo.sysjobs j 
  ON h.job_id = j.job_id
WHERE h.run_status = 1
AND h.run_date >= CONVERT(INT, CONVERT(VARCHAR(8), DATEADD(DAY, -7, GETDATE()), 112))
ORDER BY h.run_date DESC, h.run_time DESC;

If the 15-second warnings correlate with backup jobs, index rebuilds, or DBCC CHECKDB, you should research whether they can be rescheduled. If AV scanning is active on your data volumes, work with your security team to exclude SQL Server files.

CAUSE: External / SQL Server scheduling.

5. Query/Workload Pressure

Missing indexes, large scans, or memory pressure forcing excessive physical reads can overwhelm even healthy storage.

Check wait stats for I/O-related waits:

SELECT TOP 10
    wait_type,
    wait_time_ms / 1000.0 wait_time_sec,
    (wait_time_ms - signal_wait_time_ms) / 1000.0 resource_wait_sec,
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type IN (
	'PAGEIOLATCH_SH', 'PAGEIOLATCH_EX', 'WRITELOG', 'IO_COMPLETION', 'ASYNC_IO_COMPLETION'
    )
ORDER BY wait_time_ms DESC;

Check for missing indexes:

SELECT TOP 10
    DB_NAME(d.database_id) database_name,
    OBJECT_NAME(d.object_id, d.database_id) table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.avg_user_impact
FROM sys.dm_db_missing_index_details d INNER JOIN sys.dm_db_missing_index_groups g 
  ON d.index_handle = g.index_handle INNER JOIN sys.dm_db_missing_index_group_stats s 
    ON g.index_group_handle = s.group_handle
WHERE d.database_id > 4
ORDER BY (s.user_seeks * s.avg_user_impact) DESC;

High PAGEIOLATCH_SH waits often indicate queries doing more physical reads than necessary.  This might be a memory issue (not enough buffer pool) or an indexing issue (scans instead of seeks).  Missing indexes with high user_seeks and avg_user_impact are good candidates for creation.

CAUSE: Workload / SQL Server configuration.

Bottom Line: Is it SQL Server or Not?

Run the diagnostics above.  If your file latencies (from sys.dm_io_virtual_file_stats) are high AND Perfmon shows healthy disk response times, the problem is very likely inside SQL Server — configuration, queries, or scheduling.

If both SQL Server and the OS agree that disks are slow, approach your infrastructure team with this evidence.  The 15-second warning is SQL Server telling you something is wrong.  Your job is to figure out which side of the fence the problem lives on.

More to Read

No comments:

Post a Comment