Chasing Down Errors in the Management Data Warehouse Disk Usage Collection Set Report

If you use Management Data Warehouse and the accompanying SSRS reports you might have already noticed the issues with the Disk Usage Collection Set Reports. It is spelled out here – http://social.msdn.microsoft.com/Forums/en-US/sqlkjmanageability/thread/5af56ba9-0ef0-4609-9a23-6da2f59bb266/ – but in summary if you click on the Database Trend line, the graph you see shows you a total database size approximately twice as big as the actual file size. Furthermore, the series fields on the graph – data, reserved, unused and index – don’t really make sense and examining the table view of the data enforces this feeling: Why is Data Size itself as big as the database’s physical data file? Why would Data Size and Unused Size be close the same value? Why are the Unallocated Size and Unused Size values static over time through the entire 100 samples?

In this screen shot from an example database you can see all these issues. The graph reports data size near twice as big as the physical file, a relatively large and suspicious ‘Unused Size’ as well as static values for ‘Unallocated Size’ field.

file vs graph

Obviously, there is more going on here than any clever interpretation of the data or graph can account for… in fact – as I discovered- it is just plain wrong. I dug a little deeper into the data and process underlying the report to figure out why and this post describes what I found out.

Chasing Rabbits:

To begin with let’s review the terms and layout of database disk usage with a review of the old standby sp_spaceused.  Per BOL this proc – when run against a database – returns both summary and detailed results–

Summary

  • database_name:               Name of the current database.
  • database_size:                  Size of the current database in megabytes. database_size includes both data and log files.
  • unallocated space:           Space in the database that has not been reserved for database objects.

Details

  • Reserved:            Total amount of space allocated by objects in the database.
  • Data:                    Total amount of space used by data.
  • Index_size:          Total amount of space used by indexes.
  • Unused:               Total amount of space reserved for objects in the database, but not yet  used.

Here we have the general usage-type building blocks that make up a data file (or files) and the layout can be envisioned as something like this–

DFS

…with the Data, Index and Unused blocks summing up to the Reserved size and the remaining ‘free-space’ in the file termed Unallocated.

For the example database shown above sp_spaceused returns–

sp_spaceused MB
database_size 570473
unallocated space 30903.91
data 255669.368*
index_size 185424.8*
unused 2601.024*
reserved 443695.192*

*converted from KB

So – as we suspected- we do not have much of a match up with the Disk Usage Collector report and sp_spaceused for the same database.

One drawback of the MDW SSRS reports is that the RDL is not accessible – it is wrapped up in the SSRS application and cannot be viewed or modified. So, the next steps in identifying where the report goes awry is to check in on the source data for the report – working back from the MDW table to the data collector. Running a profiler trace against the MDW DB while executing the report shows the underlying query to be this—

DECLARE @snapshot_id int

SELECT TOP 1 @snapshot_id = snapshot_id FROM (
SELECT DISTINCT TOP 100 d.snapshot_id
FROM snapshots.disk_usage d, core.snapshots ss
WHERE ss.instance_name = @ServerName
AND ss.snapshot_id = d.snapshot_id
ORDER BY d.snapshot_id DESC
) AS q
ORDER BY snapshot_id ASC

DECLARE @unallocated_space_mb float
DECLARE @reserved_mb float

SELECT   @reserved_mb=(d.reservedpages * 8192 / 1048576.0),
@unallocated_space_mb=(
case
when d.dbsize >= d.reservedpages
then (convert (dec (15,2),d.dbsize) - convert (dec (15,2),d.reservedpages)) * 8192 / 1048576.0
else 0
end)
FROM snapshots.disk_usage d, core.snapshots ss
WHERE d.database_name=@SelectedDatabaseName
AND ss.instance_name = @ServerName
AND d.snapshot_id = ss.snapshot_id

SELECT
CONVERT (datetime, SWITCHOFFSET (CAST (d.collection_time AS datetimeoffset), '+00:00')) AS collection_time,
d.snapshot_id,
((convert(dec (15,2),d.dbsize) + convert(dec(15,2),d.logsize)) * 8192 / 1048576.0) AS 'database_size_mb',
@reserved_mb AS 'reserved_mb',
@unallocated_space_mb AS 'unallocated_space_mb',
((d.usedpages - d.pages) * 8192 / 1048576.0) AS 'index_mb'
FROM snapshots.disk_usage d, core.snapshots ss
WHERE database_name=@SelectedDatabaseName
AND d.snapshot_id >= @snapshot_id
AND ss.instance_name = @ServerName
AND d.snapshot_id = ss.snapshot_id
ORDER BY collection_time

The most important thing to note here is the source table snapshots.disk_usage since we’ll need to check that data out. However, just a quick overview of this query brings a few issues to light. If you break the query down into its three component SELECT-FROM statements you get this analysis—

  1. Sets @snapshot_id to the most recent 100th record (for this DB and instance)
  2. Sets @reserved_mb and @unallocated_space_mb to the values in the most recent 100th record
  3. Returns collection_time, database_size_mb, reserved_mb, unallocated_space_mb and index_mb

Interesting: reserved_mb, unallocated_space_mb should change over time but since they are selected from the variables @reserved_mb and @unallocated_space_mb they will be static for each row. Also to note, the database_size_mb field sums data and log file size similarly to the sp_spacedused proc results. This looks correct in itself, but going back to the report the graph illustrates ‘Data Size’ which would seem to indicate data only. Something is just not right here… For comparison with sp_spaceused this is the top result for our example DB–

MDW Report MB
database_size_mb 570473
reserved_mb 433156.62
unallocated_space_mb 31043.383
index_mb 181004.77

…and to illustrate the mis-mapped fields (like the database_size_mb = ‘Data Size’ mentioned above but also including reserved_mb = ‘Unused Size’), here is how these results end up transferring to the report—

MDW disk 2

To continue to chase this rabbit, here is a sample of the data direct from the MDW source table snapshots.disk_usage –

snapshots.disk_usage value
dbsize 59417600
logsize 16216192
ftsize 0
reservedpages 55493987
usedpages 55148908
pages 31941867
database_name Example
collection_time 2013-06-03 17:00:24.0000000 +00:00
snapshot_id 178238

According to the BOL entry for System Data Collection Sets here- http://technet.microsoft.com/en-us/library/bb964725(v=sql.105).aspx -we can see what query the data collector is running. Then, referring to the BOL entries for the system catalogues the query uses, we can figure the meaning of each field–

  • dbsize — size from sys.database_files
  • logsize — size from sys.database_files
  • ftsize: — size from sys.database_files
  • reserved: — total_pages from sys.allocation_units –Total number of pages allocated or reserved by this allocation unit.
  • used: — used_pages from sys.allocation_units — Number of total pages actually in use.
  • pages: — from sys.allocation_units — Number of used pages that have In-row data, LOB data or Row-overflow data (value returned excludes internal index pages and allocation-management pages). Excludes xml_index_nodes and fulltext_catalog_map. When LOB or Row-overflow data then number of total pages actually in use (used_pages), when Heap or Clustered then In-row data pages (data_pages). In short, purely data excluding indexes.

So, the data collector query seems to align fairly closely with sp_spaceused used, showing some summary DB info (but with distinct database and log values) as well as details on the ‘allocated’ portion of the file. In fact we can translate the data collector query results into sp_spaceused results like this—

  • database_size = dbsize + logsize
  • unallocated space = dbsize – reserved
  • data = pages
  • index_size = used – pages
  • unused = reserved – used
  • reserved = reserved

…and, if we refer to the data collector data for our example database, we can see the data matched up reasonably well (give-or-take due to a number of factors including the time of the samples). Let’s see them side-by-side—

data collector query MB*   sp_spaceused MB
dbsize 464200 database_size 570473
logsize 106273 unallocated space 30903.91
ftsize 0 data 255669.368
reserved 433296.52 index_size 185424.8
used 430756.48 unused 2601.024
pages 249677.59 reserved 443695.192

*converted from pages

Climbing back out of the rabbit hole:

Back to the report – it would seem likely that the report is meant to show a version of sp_spaceused but somewhere the wires were crossed and terminology confused. We know from this investigation that the data collector and the MWD table it populates are A.O.K- the data there is good. The trouble begins with the query the report issues against the MDW database. Here is a version of that query as it could be – returning from snapshots.disk_usage the top most recent 100 records for collection_time, database_size_mb, reserved_mb, unallocated_space_mb and index_mb as well as the missing piece – data_mb

DECLARE @ServerName varchar(max)
DECLARE @SelectedDatabaseName varchar(max)
DECLARE @snapshot_id int

SELECT TOP 1 @snapshot_id = snapshot_id FROM (

SELECT DISTINCT TOP 100 d.snapshot_id
FROM snapshots.disk_usage d, core.snapshots ss
WHERE ss.instance_name = @ServerName
AND ss.snapshot_id = d.snapshot_id
ORDER BY d.snapshot_id DESC
) AS q
ORDER BY snapshot_id ASC
SELECT
database_name,
CONVERT (datetime, SWITCHOFFSET (CAST (d.collection_time AS datetimeoffset), '+00:00')) AS collection_time,
d.snapshot_id,
((convert(dec (15,2),d.dbsize) + convert(dec(15,2),d.logsize)) * 8192 / 1048576.0) AS 'database_size_mb',
'reserved_mb' = (d.reservedpages * 8192 / 1048576.0),
'data_mb' = convert(dec(15,2),d.pages) * 8192 / 1048576.0,
'index_mb'= (d.usedpages - d.pages) * 8192 / 1048576.0,
'unused_mb' = ((convert(dec (15,2),d.reservedpages) - convert(dec(15,2),d.usedpages)) * 8192 / 1048576.0),
'unallocated_space_mb' = (
case
when d.dbsize >= d.reservedpages
then (convert (dec (15,2),d.dbsize) - convert (dec (15,2),d.reservedpages)) * 8192 / 1048576.0
else 0
end)
FROM snapshots.disk_usage d, core.snapshots ss
WHERE database_name =@SelectedDatabaseName
AND d.snapshot_id >= @snapshot_id
AND ss.instance_name = @ServerName
AND d.snapshot_id = ss.snapshot_id
ORDER BY collection_time asc

Using this data the report can be corrected to show the trends of disk use for a given database. We can put the data in excel and create a graph similar to the SSRS report that actually correlates with the observed data file size—

excel graph

Unfortunately, it does not look like Microsoft will be fixing this report anytime soon (in fact SQL2012 still has some of these issues) and – as mentioned – there is no way to modify and correct the RDL code. That being said, the underlying data in the MDW database (in this case and as a whole) is correct, clean, and includes many interesting metrics. Given some of the limitations (or misinformation in this case) of the SSRS MDW reports it is worth taking a look into alternate ways to collect and deliver the MDW data. I hope to take a look at this idea in future posts.


2 thoughts on “Chasing Down Errors in the Management Data Warehouse Disk Usage Collection Set Report

    1. admin Post author

      Hi Theo, we are looking for database size here, so if the database size is >= the reserved page count (d.dbsize >= d.reservedpages) then this converts the pages to MB.
      BTW, this part of the query is MS code, I tweaked some of it to work, but this particular part is theirs (as explained in the post). HTH, thanks for reading.

      Reply

Leave a Reply

Your email address will not be published.