Blog Post

BLOB Report T-SQL Tuesday #005- Reporting

,

This month for my second submission into T-SQL Tuesday, I decided to go with something that provides some insight into the data that is being stored and the impact it may have on storage.  If you haven’t seen it by now, I made a bone-headed mistake last week by submitting my entry one week early.  That entry was about trying to create a report that would join information between some DMVs and the Default Trace.  I posted an admission about the mistake as well.  That brings us to the current post.

This month T-SQL Tuesday (brainchild of Adam Machanic (blog | Twitter) is being hosted by Aaron Nelson, and the topic is reporting.  It looks like there are a lot of submissions this month, and I am a little behind at this point.  I hope this is useful to somebody.  I know from reading the submission of Brad Schulz, I knew of some immediate uses for his handy little report.

Background

This new idea was born out of necessity for me about a year ago.  I was having a hard time finding why my primary file group was still so large after moving all of the User objects out of that filegroup and into new filegroups.  I couldn’t find the tying factor – until I checked for BLOBs that were created in the Primary Filegroup by default.  To move them was a manual process and is a different topic from the crux of this one.  Since the initial report to find the BLOBs, I have evolved it somewhat to try and find the size of each object, as well as to report on the usage related with each.  To do this, I needed to use a method I posted about in a different post about finding some Index Information.

Pre-Req

This is a simple requirement for later consumption by the reporting queries.  In favor of time, I am creating a populating a staging table that can be disposed of later, after the queries have been executed and the necessary information has been gleaned from them.

CREATE TABLE #indstats (
indstatsid INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
,database_id BIGINT
,index_id BIGINT
,IndexSizeMB DECIMAL(16,1)
)
 
INSERT INTO #indstats (database_id,index_id,IndexSizeMB)
SELECT database_id,index_id
,CONVERT(DECIMAL(16,1)
,(SUM(ps.avg_record_size_in_bytes * ps.record_count) / (1024.0 * 1024))) AS IndexSizeMB
FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
GROUP BY database_id,index_id
;

Base Query

To get things rolling I want to show what the initial query looked like.  The only point here was to show what objects held BLOB data.

SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,IndexName = i.name
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.partition_id = a.container_id
And a.type = 2--LOB data is stored in pages of type Text/Image
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And a.data_pages > 0
--And filegroup_name(a.data_space_id) = 'Primary'
UNION
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,IndexName = i.name
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.hobt_id = a.container_id
And a.type = 3--Overflow data is stored in pages of type Text/Image
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And filegroup_name(a.data_space_id) = 'Primary'
--And a.data_pages > 0
ORDER BY TableName ASC

The query requires a union select in order to find all of the BLOBS involved.  Depending on whether it is LOB or it is OverFlow_Data, the query will be different and needs to account for both types.  The views relevant to finding this information and being queried are sys.partitions and sys.allocation_units.  This query was adequate for the initial needs in determining what was causing the bloat in my primary filegroup.  This is why it is also the base query.  Now, I would like to build on it and find more information concerning these BLOBs.

Up a Step

Using the same views as the base query and just expanding the results returned, I can calculate an approximate size for each BLOB.  This additional information is more useful to me.  This gives me a greater insight into my database objects.  The changes are quite simple, just to iterate, since the pertinent information is obtainable from the present structure of the query.

Step-Up   
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,IndexName = i.name
,LOBUsedPages = a.used_pages
,LOBTotalPages = a.total_pages
,LOBDataSizeMB = a.used_pages * 8/1024
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.partition_id = a.container_id
And a.type = 2--LOB data is stored in pages of type Text/Image
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And a.data_pages > 0
--And filegroup_name(a.data_space_id) = 'Primary'
UNION
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,IndexName = i.name
,LOBUsedPages = a.used_pages
,LOBTotalPages = a.total_pages
,LOBDataSizeMB = a.used_pages * 8/1024
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.hobt_id = a.container_id
And a.type = 3--Overflow data is stored in pages of type Text/Image
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And filegroup_name(a.data_space_id) = 'Primary'
--And a.data_pages > 0
ORDER BY TableName ASC

Besides the additional fields pulled into the query, you can see that I also left a couple of comment lines in the query.  This is for informational purposes.  Sometimes, I like to filter out the BLOBs by specific filegroup or by whether or not they have any data in them.

Taking it to the Xtreme

OK, cheesy subheading but it has been getting ingrained in my head from Tony Horton.  This is not an advertisement for that product, but it is kinda like how I feel about the next query.  In this query I will be querying that table that was pre-populated as well as adding a view.  The view that has been added is sys.dm_db_index_usage_stats.  I am also querying these via a Left Outer Join.  I want to return the additional reporting fields for the BLOB related items even if there is not index statistics associated with it.  With these additional objects added to the base query, I have also added a handful of fields to return in the report.

And now the beef of the query.

SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,IndexName = i.name
,LOBUsedPages = a.used_pages
,LOBTotalPages = a.total_pages
,LOBDataSizeMB = a.used_pages * 8/1024
, ps.IndexSizeMB
, (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
, (us.user_updates) AS UserUpdates
, us.last_user_update AS LastUpdate
, CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL) AS RatioRequestsToUpdates
,a.type_desc AS AllocUnitType
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.partition_id = a.container_id
And a.type = 2--LOB data is stored in pages of type Text/Image
LEFT Outer Join sys.dm_db_index_usage_stats us
ON us.OBJECT_ID = p.OBJECT_ID
And us.index_id = p.index_id
And us.database_id = DB_ID()
LEFT Outer Join #indstats ps
ON us.index_id = ps.index_id
And us.database_id = ps.database_id
And us.OBJECT_ID = ps.OBJECT_ID
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And a.data_pages > 0
--And filegroup_name(a.data_space_id) = 'Primary'
UNION
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,IndexName = i.name
,LOBUsedPages = a.used_pages
,LOBTotalPages = a.total_pages
,LOBDataSizeMB = a.used_pages * 8/1024
, ps.IndexSizeMB
, (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
, (us.user_updates) AS UserUpdates
, us.last_user_update AS LastUpdate
, CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL) AS RatioRequestsToUpdates
,a.type_desc AS AllocUnitType
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.hobt_id = a.container_id
And a.type = 3--Overflow data is stored in pages of type Text/Image
LEFT Outer Join sys.dm_db_index_usage_stats us
ON us.OBJECT_ID = p.OBJECT_ID
And us.index_id = p.index_id
And us.database_id = DB_ID()
LEFT Outer Join #indstats ps
ON us.index_id = ps.index_id
And us.database_id = ps.database_id
And us.OBJECT_ID = ps.OBJECT_ID
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And filegroup_name(a.data_space_id) = 'Primary'
--And a.data_pages > 0
ORDER BY TableName ASC,a.type_desc

The fields that I added to the above query are listed below.

, ps.IndexSizeMB
, (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
, (us.user_updates) AS UserUpdates
, us.last_user_update AS LastUpdate
, CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL) AS RatioRequestsToUpdates
,a.type_desc AS AllocUnitType

The field that I considered omitting is the ps.IndexSizeMB.  This field reports the size of the Index and not necessarily the size of BLOB data.  I decided to keep it because it can prove useful when trying to determine where the bloat may or may not be.  I also added the type_desc field so I could determine the type of data reported in the query.  Without this field, I think it would be too easy to dismiss some of the entries as duplicates when they really should be present in the report.  The others are information on the usage of the data related to the index and BLOB data.

Conclusion

As you run these queries you will find that there are useful tidbits of information for administrative reporting purposes.  It should also be noted that running the final query will produce the same index usage info for an Index involved in the two types of BLOB data being reported.  It will not split out the UserRequests (for instance) that are performed against LOB_DATA versus ROW_OVERFLOW_DATA.  This can create a reporting problem / mis-perception if one is not aware of it first.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating