health checks | Andrew McDermid

Here’s a brainstormed wish list for my dream SQL health check (Did I really just say that? Dream SQL health check? I did, didn’t I. That’s it, I’m knocking off early for a pint).

—————————————————————–

Useful for a DBA – minimum of unnecessary information, focus on actionable items, pinpoints issues to improve health

Deliverable to another DBA or interested party – focus on showing steady improvement or consistently high overall health. Should be explainable, organized, professional, comprehensive, readable and understandable. Should satisfy – to some degree – typical audit requirements.

Quantifies health (e.g. 60%, B-, 5/10). The health quantification should be meaningful and easily explainable.
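One way the quantification could work, as a minimal sketch: every check gets a weight, the score is the weighted share of checks that pass, and a letter grade is read off the score. The check names, the weights and the grade cutoffs below are all assumptions made up for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    weight: int = 1  # hypothetical importance of the check


def health_score(results):
    """Return the weighted percentage of passed checks (0-100)."""
    total = sum(r.weight for r in results)
    if total == 0:
        raise ValueError("no checks were run")
    earned = sum(r.weight for r in results if r.passed)
    return round(100.0 * earned / total, 1)


def letter_grade(score):
    """Map a percentage to a letter grade using assumed cutoffs."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"


# Illustrative results only; a real run would fill these in from the instance.
results = [
    CheckResult("page_verify_checksum", True, weight=3),
    CheckResult("auto_shrink_off", True, weight=2),
    CheckResult("recent_full_backup", False, weight=5),
]
score = health_score(results)
print(score, letter_grade(score))  # 50.0 F
```

Because the score is just passed weight over total weight, it stays easy to explain: a missed backup costs more than a cosmetic setting because it was given more weight.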

Illustrates trends in overall health over time

Leverages the out-of-box ‘health’ features available since SQL Server 2008 – the system_health extended events session, ring buffers and Policy-Based Management.

Centralized collection (with an option for single-instance runs)

Customizable:

Include some functionality that examines Policy-Based Management policies in specific categories and reports on them.
Include some functionality that examines jobs in a specific category that would indicate a custom maintenance job, and possibly prioritizes them in the report.

Configurable:

Allows exclusion of some checks. This idea can be expanded so that, for instance, a daily health check runs just the backup section of the report. Via an SSRS subscription or Database Mail, this backup report would be emailed to the DBA team and\or the DB owner for morning review. This could be a helpful solution for a number of frequently requested reports – index fragmentation, disk space, etc.
Incorporate an ‘.ini’ configuration file for each instance (could be txt or xml) to hold the ideal configuration. Initially the ini file contains best-practice\default values; the DBA then either brings the instance in line with those values or, where an exception is justified, edits the ini file to match the instance settings. Either way, the measured overall health visibly improves.
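The ini idea can be sketched with Python's standard configparser. This is only a rough illustration: the section name, the three settings chosen and the "actual" values are assumptions, and in a real check the actual values would be read from sys.configurations on the instance.

```python
import configparser

# Hypothetical per-instance ini file holding the ideal values.
IDEAL_INI = """
[sys.configurations]
backup compression default = 1
optimize for ad hoc workloads = 1
priority boost = 0
"""


def load_ideal(ini_text):
    """Parse the ini text into {setting name: ideal value as a string}."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser["sys.configurations"])


def compare(ideal, actual):
    """Return {setting: (ideal, actual)} for every setting that deviates."""
    # configparser lowercases option names, so normalize the actual names too.
    actual = {name.lower(): value for name, value in actual.items()}
    deviations = {}
    for name, want in ideal.items():
        have = actual.get(name)
        if have is None or str(have) != want:
            deviations[name] = (want, have)
    return deviations


ideal = load_ideal(IDEAL_INI)
# Made-up values standing in for what the instance reports.
actual = {
    "backup compression default": 0,
    "optimize for ad hoc workloads": 1,
    "priority boost": 0,
}
deviations = compare(ideal, actual)
compliance = round(100.0 * (len(ideal) - len(deviations)) / len(ideal), 1)
print(deviations, compliance)
```

Whether the DBA fixes the instance or edits the ini file to record an accepted exception, the deviation disappears and the compliance figure goes up, which is the behaviour described above.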

Health check info falls into three categories. A report could be organized by these categories, with subheadings for the server, instance, database and table levels.

Informational
Configurations
Health

And here’s a list of what could be included in each category.

Informational:

Overall Health Rating\Grade
OS Version and Service Pack
Is Physical \ Virtual
Clustered
- Nodes, etc.
CPU (make\model, speed, platform (32\64bit))
RAM
Disk
Uptime
Description of use e.g. “sharepoint server”, “backend of web, critical”
ServerName\InstanceName
- Version
- Edition
- SP
- CU
- Global Trace Flags
Logical CPUs
Memory
- Min \ Max
- Total Server Memory \ Target Server Memory
- PLE Snapshot
- Is AWE?
MAXDOP (max degree of parallelism)
CTFP (cost threshold for parallelism)

Configurations:

Server

VAS =
SQL Service has lock pages in memory?
SQL Service account has instant file initialization?

Instance

SQL Service account
PBM enabled\configured
Data Collector enabled\configured
Resource Governor enabled\configured. Login trigger?
Database Mail enabled\configured
SQL Error Log file count and last cycled
TempDB file count and sizes
DB file locations (eg not with system dbs)
Some key ‘best practice’ settings from sys.configurations
- max server memory (MB) =
- min server memory (MB) =
- affinity I/O mask = 0
- affinity mask = 0
- affinity64 I/O mask = 0
- affinity64 mask = 0
- Agent XPs = 1
- backup compression default = 1
- cost threshold for parallelism = 25
- Database Mail XPs = 1
- default trace enabled = 1
- fill factor (%) = 0
- lightweight pooling = 0
- max degree of parallelism = 0
- optimize for ad hoc workloads = 1
- priority boost = 0
- recovery interval (min) = 0
- remote admin connections = 1
- scan for startup procs = 0
- show advanced options = 1
- xp_cmdshell = 1
Database configurations
- Status = ONLINE
- Mirrored\AG member =
- Published =
- Subscriber =
- RCSI =
- Encrypted =
- Read only =
- Broker Enabled =
- Compatibility Level = Server Compatibility Level
- File growth in percent = 0
- File growth > 10MB = 1
- Page Verify = CHECKSUM
- Recovery model = (FULL|SIMPLE|BULK_LOGGED)
- Forced Parameterization = 0
- Auto Close = 0
- Auto Shrink = 0
- Auto Update Stats = 1
- Auto Update Stats Async = 1
- Orphaned Users Count = 0
- Suspect pages = 0
- Separation of mdf and ldf
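The two file growth items in the list above can be checked from the growth and is_percent_growth columns of sys.master_files, where growth is a percentage when is_percent_growth is 1 and a count of 8 KB pages otherwise. Here is a rough sketch of that check; the file rows are made up, and skipping fixed-size files is my assumption rather than a rule.

```python
PAGE_KB = 8  # growth is stored in 8 KB pages when it is not a percentage


def growth_findings(files, min_growth_mb=10):
    """Return (file name, reason) for files that break the growth rules."""
    findings = []
    for f in files:
        if f["growth"] == 0:
            continue  # fixed-size file: autogrowth is off, nothing to judge here
        if f["is_percent_growth"]:
            findings.append((f["name"], "percent_growth"))
        elif f["growth"] * PAGE_KB / 1024 <= min_growth_mb:
            findings.append((f["name"], "small_growth"))
    return findings


# Illustrative rows shaped like a query against sys.master_files.
files = [
    {"name": "Sales_data", "growth": 8192, "is_percent_growth": False},  # 64 MB
    {"name": "Sales_log", "growth": 10, "is_percent_growth": True},      # 10 percent
    {"name": "Audit_data", "growth": 128, "is_percent_growth": False},   # 1 MB
]
findings = growth_findings(files)
print(findings)
```

Each finding could then feed the overall health number as one failed check per file, or one per database, depending on how strict the report should be.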

Health and Performance:

Server\OS:

Monitoring – is it installed and running
%Free Disk Space
%Free Memory
%CPU usage (snapshot or short history via ring buffer)

Instance:

Waits (top 95th percentile)
Default trace
- Count of key events (e.g. sorts, hash joins, file growth, etc.)
Error Log
- Scrape for Sev 16-25
Count of suspect pages
Jobs
- Failed
- Disabled
Backup Jobs
- Schedules and durations (overlaps?)
Top 1-3 High Resource Use Queries (from system health extended event or plan cache)
Recent Blocking & Deadlocking (system health extended event)
Snapshot of Batch Requests\sec, Compilations and Recompilations
Security
- sysadmin
Database
- DB file sizes & space free
- Database Backups
  - Date of most recent full and log
  - Destination
- DB space allocation – data, index, allocated, unused
- DB size trending (via backup size)
- Tables\indexes
  - Heaps
  - Index Fragmentation
  - Index Use — unused, writes>reads, high-use\hot-spots
  - Missing Indexes (top 95th percentile)
  - Duplicate Indexes
  - Overlapping Indexes
  - Stats last updated
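For the waits item, one possible reading of "top 95th percentile" is: list the wait types, largest first, until they account for 95% of the total wait time, and ignore the long tail. The sketch below takes that reading as an assumption; the wait times are invented, and a real check would pull them from sys.dm_os_wait_stats.

```python
def top_waits(waits, cutoff_pct=95):
    """Return (wait type, ms, cumulative %) rows covering cutoff_pct of total wait time."""
    total = sum(waits.values())
    if total == 0:
        return []
    selected = []
    running_ms = 0
    ranked = sorted(waits.items(), key=lambda kv: kv[1], reverse=True)
    for wait_type, ms in ranked:
        # Stop once the waits already selected reach the cutoff.
        if running_ms * 100 >= cutoff_pct * total:
            break
        running_ms += ms
        selected.append((wait_type, ms, round(100.0 * running_ms / total, 1)))
    return selected


# Made-up wait times in milliseconds.
waits = {
    "CXPACKET": 6000,
    "PAGEIOLATCH_SH": 2500,
    "WRITELOG": 1000,
    "SOS_SCHEDULER_YIELD": 400,
    "LCK_M_X": 100,
}
selected = top_waits(waits)
for row in selected:
    print(row)
```

The same cumulative cutoff could be applied to the missing index list, ranking by whatever impact estimate the report uses.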

… end of brain dump! Now, time to get to work and build some stuff!

I’m sure there could be more … what’s on your SQL health check list?


What would a perfect SQL Server health check report look like?