I received that question from one of my customers recently. A very good question, in my opinion, and one that should be asked
of any potential DBA candidate.
Best case, the DBA would have automated
checks in place that run 24/7, proactively notifying, or alerting of any problems
BEFORE they occur. There are many different was to define these type of alerts
- SQL Server Agent, SMO, Windows, WMI, and more. Regardless, if an automated routine is in
place, the daily DBA checklist becomes fairly minimal. One cannot,
however, just assume the alerts are running… and the checklist still
exists. I've provided three different
groups of checks.
Group 1
This 1st group must be done every day. Each of these items
are perfectly alert-able, but the alerts may not be setup yet, so I
suggest that each of these checks must occur daily, via automated routine or
manually:
a. All backups completed successfully?
b. Capacity concerns on backup drive?
c. Backup durations very long?
Security
a. Review SQL Server log AND Windows Security Event Log
Capacity
a. Data and Log file space -- file sizes, growth?
b. Disk space - ALL drives -- available disk space?
OS Event Logs/SQL Server Logs
a. Review for any abnormal events
Scheduled jobs - (Agent, Windows Scheduled tasks, etc.)
a. Confirm all scheduled jobs ran successfully
b. Even check job step duration, make sure nothing is taking too long
Group 2
This 2nd group is also extremely valuable, but it only provides an immediate state, UNLESS the DBA has setup an automated
collection. This can be done using the Data Collector, to capture
statistics for Server Activity, Disk Usage and Query Statistics. These collections can be run every day,
creating moving metrics that grow and change as your business activity
does. When certain events or conditions
are detected, the DBA can be notified/alerted. This is not available out
of the box with SQL Server. It’s got to
be setup, and managed regularly (see here). If it is not in place, the checking
should still occur, at least on a weekly basis. Even just the current state is very valuable.
a. SQL Server performance counters
Review table/index fragmentation statistics
a. See my query here, using sys.dm_db_index_physical_stats, sys.indexes
Review performance statistics
a. See sp_blitz from Brent Ozar’s team
Group 3
Lastly, this check should be done at least once monthly. It
can be completely automated, but given that it only occurs monthly, the
manual invocation is acceptable, too. Don't just look at the success/fail status of the backup job. You've got to restore a .bak file and confirm the integrity of the data. The last thing you want is to be doing an audit, and find your .bak files are unusable.
Restore a backup at least weekly, to validate backup solution
One last thing that I would note is that if the DBA is
logging ALL changes that are made everywhere, as is recommended, then
he/she should take time to review these metrics on a regular basis, too. One server or 100, one DBA or a Team -- it's just a very good thing to keep an eye on.
No comments:
Post a Comment