Here’s a brain-storm wish list of my dream sql health check (Did I really just say that? Dream sql health check? I did, didn’t I. That’s it, I’m knocking off early for a pint).
—————————————————————–
Useful for a DBA – minimum of unnecessary information, focus on actionable items, pinpoints issues to improve health
Deliverable to other DBA or interested party – focus on showing steady improvement or consistently high overall health. Should be explainable, organized, professional, comprehensive, readable and understandable. Should satisfy – to some degree – typical audit requirements.
Quantifies health (e.g. 60%, B-, 5/10). The health quantification should be meaningful and easily explainable.
Illustrates trends in overall health over time
Leverages post SQL2008 out-of-box ‘health’ features – system health extended events and ring buffers, policy based management.
Centralized collection (w\option for single instance runs)
Customizable:
- Include some functionality that examines policies in the Policy Based Management for specific categories of policies and reports on them.
- Include some functionality that examines jobs for a specific category that would indicate a custom maintenance job and possibly prioritize them in the report.
Configurable:
- Allows exclusion of some checks, but this idea can be expanded on so that – for instance – a daily health check is executed for just the backup section of the report. Via a SSRS subscription or db_mail this backup report would be emailed to the DBA team and\or the DB owner for morning review. This could be helpful solution for a number of different frequently requested reports – index frag, disk space, ect.
- Incorporate an ‘.ini’ configuration file for each instance (could be txt or xml) to keep ideal configurations. Initially the ini file contains best-practice\default values, but the DBA either brings the instance in line with the best-practice\default values or – in case of exceptions – edits the ini file to match the instance settings. In this way the overall health visibly improves.
There are three categories health check info falls into. A report could be organized according to these categories and with sub heading for server, instance, database, table levels ….
- Informational
- Configurations
- Health
And here’s a list of what could be included in each category.
Informational:
- Overall Heath Rating\Grade
- OS Version and Service Pack
- Is Physical \ Virtual
- Clustered
- Nodes ect
- CPU (make\model, speed, platform (32\64bit)
- RAM
- Disk
- Uptime
- Description of use e.g. “sharepoint server”, “backend of web, critical”
- ServerName\InstanceName
- Version
- Edition
- SP
- CUP
- Global Trace Flags
- Logical CPUs
- Memory
- Min \ Max
- Total Server Memory \ Target Server Memory
- PLE Snapshot
- Is AWE?
- MDOP
- CTHFP
Configurations:
Server
- VAS =
- SQL Service has lock pages in memory?
- SQL Service account instance file initialization?
Instance
- SQL Service account
- PBM enabled\configured
- Data Collector enabled\configured
- Resoure Gov enabled\configured Login trigger?
- Database Mail enabled\configured
- SQL Error Log file count and last cycled
- TempDB file count and sizes
- DB file locations (eg not with system dbs)
- Some key ‘best practice’ settings from sys.configurations
- max server memory (MB) =
- min server memory (MB) =
- affinity I/O mask = 0
- affinity mask = 0
- affinity64 I/O mask = 0
- affinity64 mask = 0
- Agent XPs = 1
- backup compression default = 1
- cost threshold for parallelism = 25
- Database Mail XPs = 1
- default trace enabled = 1
- fill factor (%) = 0
- lightweight pooling = 0
- max degree of parallelism = 0
- optimize for ad hoc workloads = 1
- priority boost = 0
- recovery interval (min)= 0
- remote admin connections = 1
- scan for startup procs = 0
- show advanced options = 1
- xp_cmdshell = 1
- Database configurations
- Status = ONLINE
- Mirrored\AG member =
- Published =
- Subscriber =
- RCSI =
- Encrypted =
- Read only =
- Broker Enabled =
- Compatibility Level = Server Compatibility Level
- File growth in percent = 0
- File growth > 10MB = 1
- Page Verify = CHECKSUM
- Recovery setting = (FULL|SIMPLE|BULK)
- Forced Parameterization = 0
- Auto Close = 0
- Auto Shrink = 0
- Auto Update Stats = 1
- Auto Update Stats Asynch = 1
- Orphaned Users Count = 0
- Suspect pages = 0
- Separation of mdf and ldf
Health and Performance:
Server\OS:
- Monitoring – is it installed and running
- %Free Disk Space
- %Free Memory
- %CPU usage (snapshot or short history via ring buffer)
Instance:
- Waits (top 95th % percentile)
- Default trace
- Count of key events (e.g. sorts, hash joins, file growth, etc.)
- Error Log
- Scrape for Sev 16-25
- Count of suspect pages
- Jobs
- Failed
- Disabled
- Backup Jobs
- Schedules and durations (overlaps?)
- Top 1-3 High Resource Use Queries (from system health extended event or plan cache)
- Recent Blocking & Deadlocking (system health extended event)
- Snapshot Batch\sec, Compilations and Recompilations
- Security
- sysadmin
- Database
- DB file sizes & space free
- Database Backups
- Date of most recent full and log
- Destination
- DB space allocation – data, index, allocated, unused
- DB size trending (via backup size)
- Tables\indexes
- Heaps
- Index Fragmentation
- Index Use — unused, writes>reads, high-use\hot-spots
- Missing Indexes (top 95th % percentile)
- Duplicate Indexes
- Overlapping Indexes
- Stats last updated
… end of brain dump! Now, time to get to work and build some stuff!
I’m sure there could be more … whats on your SQL health check list?