Friday, September 5, 2014

What are SQL Server Statistics?

Statistics store information about the distribution of data in your tables.  The optimizer uses this metadata to estimate the selectivity of a given statement, and create the optimal query plan.  To manage SQL Server efficiently, you really need a good understanding of the statistics.  In this post I will just provide some information that may help you to better understand, and manage your statistics

Statistics are automatically created on each of the index columns, when an index is created, and they are created automatically by the optimizer, when AUTO_CREATE_STATISTICS is enabled. Auto-created stats names include the column name and object ID in hexadecimal format, like this:     _WA_Sys_00000002_79FD19BE


Statistics can also be created manually, via the CREATE STATISTICS statement... though I, personally, have never had a need.  Generally speaking, the optimizer typically does quite fine. 

That's how they are created, but where are they stored? Statistics are stored in the sysindexes table.  We can also find additional information within the sys.stats and sys.indexes metadata views.  

By default, the statistics are updated automatically.  This occurs when 20% of the data in a table changes. It's important to know that this threshold can sometimes be a little high on the larger tables.  For this reason, it's not a bad idea to create a custom routine specific to the larger tables alone.  Or, even to just update the stats manually on those larger tables, on a periodic basis.  

How do you know if your statistics are being updated?  Run this statement to see if AUTO UPDATE is enabled:

       EXEC sp_helpdb YourDBName

The status line in the result-set should include 'IsAutoUpdateStatistics'.  If it doesn't, your statistics are not being updated.... ouch.  You'll want to correct that with this next statement, but remember -- if stats haven't been updated for a while, this 1st run can be pretty timely, and invasive.  Be sure to do this after hours:

      USE YourDBName;
      -- Enable Auto Update of Statistics
      ALTER DATABASE YourDBName
      SET AUTO_UPDATE_STATISTICS ON;
      GO
      -- Update Statistics for whole database 
      EXEC sp_updatestats
      GO

So, how does SQL Server actually use the statistics?  The optimizer uses this metadata to estimate how many rows each operator in the query plan will operate on.  That estimation is used to calculate the cost of each operator, and identify the cheapest query plan for the statement you are running.  Outdated, inaccurate or missing statistics can dramatically impact this process.  If the optimizer cannot accurately estimate this cost, you are going to see substantial performance degradation.

Performance not quite what it should be?  Well... when were your statistics last updated? Real easy to check:  

     SELECT 
          OBJECT_NAME(object_id) [Table],
          STATS_DATE(object_id, stats_id) [LastUpdated],
          name [Statistic]
     FROM 
         sys.stats
    WHERE 
         OBJECT_NAME(object_id) NOT LIKE 'sys%'
    ORDER BY 
         STATS_DATE(object_id, stats_id) DESC

    TIP:   Keep that one in your quick list. Somebody says 'why is it so slow?'.... start by 
             checking the stats.

Alright, so how do you read your stats?  I've listed several different methods below.  Run them on your system, take a look at your output.  This should help to better familiarize yourself with the statistics overall.  

     -- using sys.partitions and sys.indexes
     SELECT
     OBJECT_NAME(p.object_id) [Table],
             p.index_id [IndexID],
             i.name [Index],
             p.rows [#Rows]
      FROM
     sys.partitions p INNER JOIN sys.indexes i 
        ON p.object_id = i.object_id
                AND p.index_id = i.index_id
      WHERE   
      p.object_id = OBJECT_ID(N'Production.WorkOrder');

     -- using sys.dm_db_index_physical_stats
     SELECT  
       OBJECT_NAME(ips.object_id) [Table],
             ips.index_id [IndexID],
             i.name [Index],
             ips.record_count [#Rows]
     FROM
 sys.dm_db_index_physical_stats(DB_ID(N'AdventureWorks2012'),OBJECT_ID(N'Production.WorkOrder'),NULL,NULL,'DETAILED') ips JOIN sys.indexes i 
 ON ips.object_id = i.object_id
                  AND ips.index_id = i.index_id
     WHERE
    ips.index_level = 0;


     -- using sysindexes, sysobjects, sysusers, and information_schema.tables
     SELECT
         schemas.name [Schema],
         tbls.name [Table],
         i.name [Index],
         i.id [TableID],
         i.indid [IndexID],
         i.GroupID,
         i.rowmodctr [ModifiedRows],
          (SELECT MAX(rowcnt) 
           FROM sysindexes i2 
           WHERE i.id = i2.id
           AND i2.indid < 2) [RowCnt],
             CONVERT(DECIMAL(18,8),CONVERT(DECIMAL(18,8),i.rowmodctr) /
              CONVERT(DECIMAL(18,8),
(SELECT MAX(rowcnt) FROM sysindexes i2 WHERE i.id = i2.id AND i2.indid < 2) ) )      
           [ModifiedPct],
         stats_date( i.id, i.indid ) [LastStatsUpdate],
    'False' [Processed]
     FROM
          sysindexes i INNER JOIN sysobjects tbls
             ON i.id = tbls.id INNER JOIN sysusers schemas 
                 ON tbls.uid = schemas.uid INNER JOIN information_schema.tables tl
                    ON tbls.name = tl.table_name
                    AND schemas.name = tl.table_schema
                    AND tl.table_type = 'BASE TABLE'
     WHERE
          0 < i.indid 
         AND i.indid < 255
         AND table_schema <> 'sys'
         AND i.rowmodctr <> 0
AND (SELECT MAX(rowcnt) FROM sysindexes i2 WHERE i.id = i2.id AND i2.indid < 2) > 0


     -- using sys.stats and sys.dm_db_stats_properties
     SELECT 
          st.object_id [TableID],
          OBJECT_NAME(st.object_id) [Table],
          st.name [Index],
          STATS_DATE(st.object_id, st.stats_id) [LastUpdated],
          modification_counter [RowsModified]
     FROM
          sys.stats st CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) sp 
    WHERE
          STATS_DATE(st.object_id, st.stats_id) <= DATEADD(DAY,-1,GETDATE())
         AND modification_counter > 0
         AND OBJECTPROPERTY(st.object_id'IsUserTable' ) = 1


That's it for now.  It's pretty simple -- if your statistics are out of date, or missing, the optimizer is not able to make optimal query plan choices.  This will impede query performance.  In some cases very dramatically.  I have given you a lot of choices for viewing your statistics, and I have even sampled an actual statistic update.  I encourage you to take a look at Ola Hallengren's database maintenance solution, which includes a good piece for maintaining SQL Server statistics.   See here:   

       https://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html

Another good reference is at Brent Ozar's site, the sp_Blitz script.  Among many other things, this procedure will tell you where your auto stats updates are disabled.  I encourage you to take a look at this page, but -- don't stop with the statistics.  You can use sp_Blitz to perform a very fast, yet hugely informative health check on your systems. Check it out.

      http://www.brentozar.com/blitz/auto-update-stats-disabled/


Lastly, take a look here for more in depth information regarding the statistics:
   
      http://msdn.microsoft.com/en-us/library/ms190397.aspx

No comments:

Post a Comment