Wednesday, January 28, 2015

SQL Server -- Daily DBA Checklist

" If you were running a DB area - What key items would you check / monitor every morning? "

I received that question from one of my customers recently.  A very good question, in my opinion, and one that should be asked of any potential DBA candidate.   

Best case, the DBA would have automated checks in place that run 24/7, proactively notifying, or alerting of any problems BEFORE they occur.  There are many different was to define these type of alerts - SQL Server Agent, SMO, Windows, WMI, and more.  Regardless, if an automated routine is in place, the daily DBA checklist becomes fairly minimal.  One cannot, however, just assume the alerts are running… and the checklist still exists.  I've provided three different groups of checks.  

Group 1    
This 1st group must be done every day.  Each of these items are perfectly alert-able, but the alerts may not be setup yet, so I suggest that each of these checks must occur daily, via automated routine or manually: 
        
          Backups
               a. All backups completed successfully?
               b. Capacity concerns on backup drive?
               c. Backup durations very long?
         Security
               a. Review SQL Server log AND Windows Security Event Log
        Capacity
               a. Data and Log file space -- file sizes, growth?
               b. Disk space - ALL drives -- available disk space?
        
        OS Event Logs/SQL Server Logs
               a. Review for any abnormal events
        Scheduled jobs - (Agent, Windows Scheduled tasks, etc.)
               a. Confirm all scheduled jobs ran successfully
               b. Even check job step duration, make sure nothing is taking too long


Group 2
This 2nd group is also extremely valuable, but it only provides an immediate state, UNLESS the DBA has setup an automated collection.  This can be done using the Data Collector, to capture statistics for Server Activity, Disk Usage and Query Statistics.  These collections can be run every day, creating moving metrics that grow and change as your business activity does.  When certain events or conditions are detected, the DBA can be notified/alerted.  This is not available out of the box with SQL Server.  It’s got to be setup, and managed regularly (see here).  If it is not in place, the checking should still occur, at least on a weekly basis.  Even just the current state is very valuable.

               Review blocking issues
                   a. SQL Server performance counters
              Review table/index fragmentation statistics
                   a. See my query here, using sys.dm_db_index_physical_stats, sys.indexes
              Review performance statistics
                   a. See sp_blitz from Brent Ozar’s team


Group 3
Lastly, this check should be done at least once monthly.  It can be completely automated, but given that it only occurs monthly, the manual invocation is acceptable, too.  Don't just look at the success/fail status of the backup job.  You've got to restore a .bak file and confirm the integrity of the data.  The last thing you want is to be doing an audit, and find your .bak files are unusable.

Restore a backup at least weekly, to validate backup solution

One last thing that I would note is that if the DBA is logging ALL changes that are made everywhere, as is recommended, then he/she should take time to review these metrics on a regular basis, too.  One server or 100, one DBA or a Team -- it's just a very good thing to keep an eye on.

No comments:

Post a Comment