
TSQL Tuesday #67: Why extended? And why events?

 


Quiz

When I say Extended Events, what immediately comes to mind?

A) “Save save save at Hank Honda’s Happy Honda Days Sales Event! Now Extended through Monday!”

B) “Although our cannoodler valves are of extremely high quality, with tight tolerances designed to last through several years of use, they may require maintenance in the event of a failure. For only 2 dollars, our one year extended warranty plan can potentially save you the cost of an expensive repair in the future.”

C) When I say Extended, you say Events.

Extended. Events! Extended. Events!

D) N.O.T.A.

If you are a SQL DBA like me, of course you know D) N.O.T.A. is the correct answer. That’s because when I hear Extended Events I think of the MS SQL Server Extended Events (XE) feature. This is a TSQL Tuesday #67 post (thanks to Jes Borland (b|t) for hosting). Please bear with me as I shoe-horn my write-up, which I guess is kinda more theory than practice, into this month’s topic. I (arrg)… think it will (ggrgl)… fit …

So … why “extended”? And why “events”?

Events

Let’s start with this blurb from MSDN: “All applications have execution points that are useful both inside and outside an application. Inside the application, asynchronous processing may be enqueued using information that is collected during the initial execution of a task.” So while the SQL engine is humming along doing its stuff, there are points within the executing code where stateful information can be branched off and made available to a monitoring application (in this case, XE). When the executing thread hits one of those points… that is an event.

Just a quick review of the Extended Events feature makes it apparent that it shares many events with SQL Trace: SQL:BatchCompleted = sql_batch_completed, Missing Join Predicate = missing_join_predicate, etc. In fact, since SQL 2012, Extended Events include ALL the events available in SQL Trace. And you might notice, as I did, that there is also a lot of overlap with the events available via DDL Triggers and Event Notifications. AHA, I said to myself, and this was (for me) a big insight (YMMV): under the hood, all these events are the same thing! I was super lucky to get a chance to attend a SQLskills class a few years back, and they tell everyone in class: if you have any SQL Server question you can email us and we will email you back. So I emailed a question to validate this idea and sure enough those guys got back to me within a day (so cool), saying, more or less: yep, under the hood XE events, SQL Trace events, DDL Trigger events, and Event Notification events are all pretty much the same thing.
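Don’t just take my (or their) word for it. SQL Server ships a catalog view, sys.trace_xe_event_map, that maps each SQL Trace event to its Extended Events counterpart, so a query along these lines will let you eyeball the overlap yourself (a quick sketch; the ordering and column aliases are just my preference):

-- list each SQL Trace event next to its Extended Events equivalent
SELECT te.name        AS trace_event,
       m.package_name AS xe_package,
       m.xe_event_name
FROM sys.trace_xe_event_map AS m
JOIN sys.trace_events       AS te
  ON te.trace_event_id = m.trace_event_id
ORDER BY te.name;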

Extended

But wait, there’s more. Much more. How about this ‘extended’ term? Well, along with the specific set of information that comes with each event, XE allows you to add on a number of ‘actions’. For example, let’s take the wait_info event. The following info comprises this event: duration, opcode, signal duration, and wait type. We can build on this info by attaching ‘actions’ (aka global fields), which are generally stateless rather than stateful and roughly analogous to columns in SQL Trace. After the wait_info event has fired we can extend it by adding these actions to include more info like session_id, database_id, sql_text, etc.
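If you want to see exactly which fields an event carries, and which actions are out there to tack on, the XE metadata DMVs will tell you. Here’s a rough sketch using wait_info (the DMV and column names are real; limiting the output like this is just to keep it readable):

-- the data fields that come in the box with the wait_info event...
SELECT oc.name, oc.type_name
FROM sys.dm_xe_object_columns AS oc
WHERE oc.object_name = 'wait_info'
  AND oc.column_type = 'data';

-- ...and the catalog of actions (global fields) we could attach to any event
SELECT o.name, o.description
FROM sys.dm_xe_objects AS o
WHERE o.object_type = 'action'
ORDER BY o.name;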

You may have noticed the wait_info event info looks a lot like what you get back when you query sys.dm_os_wait_stats. And also, nothing at all like that is available in SQL Trace. Pick a DMV, any DMV, and there is a fair chance you’ll find that same info set (or part of it) is available in XE. So, here we have another way to think about the ‘extended’ in Extended Events – you get way more stuff than SQL Trace (I’m not sure this explanation of extended is technically correct according to MS, but it works for me). This brings us to part 2 where we will consider two more terms, but where I will not ask you to take a quiz.

Part II

Let’s talk about polling vs tracing (meaning tracing as a general operation, not the specific SQL Trace tool). To poll a metric on a SQL instance I might set up a job to regularly run a query and insert that data into a table. Later, I may grab that data and drop it in Excel, perhaps to do some baseline analysis. So, here is a definition for polling: regularly collecting metric values, typically for trend analysis. Tracing is different. Tracing exposes a single standout occurrence of interest, or a group of them (e.g. reads greater than 20K), over a range of time. When I collect a trace I usually identify the standouts and then work to address the reasons behind them. When I need to trace SQL Server, I set up a SQL Trace. Or, I should say, I used to…
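To make the polling idea concrete, here is a minimal sketch of the kind of thing I mean, using wait stats as the metric. The history table name is made up, and in real life the INSERT would live in a SQL Agent job scheduled every few minutes (remember the counters in sys.dm_os_wait_stats are cumulative, so the delta math happens later against the snapshots):

-- hypothetical table to hold the polled snapshots
CREATE TABLE dbo.wait_stats_history (
    capture_time        datetime2(0) NOT NULL DEFAULT (SYSDATETIME()),
    wait_type           nvarchar(60) NOT NULL,
    waiting_tasks_count bigint       NOT NULL,
    wait_time_ms        bigint       NOT NULL,
    signal_wait_time_ms bigint       NOT NULL
);

-- the polling query the scheduled job would run
INSERT INTO dbo.wait_stats_history
    (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;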

Getting back to our wait stats example: before XE, to trace wait stats we were stuck with a polling methodology, and that is not so simple to set up. Check out this script (the 1st one) from Glenn Berry, which accounts for the cumulative nature of the sys.dm_os_wait_stats metrics. Not that it’s too complex, but it’s not just select splat either. And you need a table to store the data. And you might want to purge that table regularly. And you need to schedule this query via SQL Agent. If you truly are trying to trace for interesting waits, no schedule will be granular enough; there will always be a gap between snapshots. That’s where Extended Events comes in. Wait stats are a good example, but as mentioned above there are many more events available. XE opens up a lot of possibilities to trace for interesting events where we used to have to poll (or had nothing at all). Add to that, setting up an XE session to do this tracing can be as easy as a few clicks in SSMS. So in this case the only question left to ask is…

How many waiting events do you want to collect?

A: ALL OF THEM because we can (umm, wait, no, not all of them… can we put some kind of filter on that?

Yes, yes we can. And we haven’t even mentioned all the options for where we can write the results).
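To give you an idea of what that looks like, here is a rough sketch of the kind of wait_info session I have in mind. The session name, the one-second duration threshold, and the target file name are all just illustrative choices:

-- illustrative XE session: completed waits longer than a second,
-- extended with a few actions and written to an event_file target
CREATE EVENT SESSION interesting_waits ON SERVER
ADD EVENT sqlos.wait_info (
    ACTION (sqlserver.session_id, sqlserver.database_id, sqlserver.sql_text)
    WHERE duration > 1000   -- milliseconds
      AND opcode = 1        -- completed waits only
)
ADD TARGET package0.event_file (SET filename = N'interesting_waits.xel')
WITH (MAX_DISPATCH_LATENCY = 30 SECONDS);

ALTER EVENT SESSION interesting_waits ON SERVER STATE = START;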

My Favorite Things

There you have it: a couple of my favorite things about XE. #1) the aha moment regarding events and event consumers in SQL Server, and #2) the way XE opens up new troubleshooting and problem-solving methods. Not too tech, hope y’all don’t mind. Just some ideas around how to think about this new tool. I imagine other TSQL-Tuesday posts will be more technical and I’m looking forward to reading those and learning. Thanks for reading!

T-SQL Tuesday #46: Table Partitioning to Archive


This is my first T-SQL Tuesday post. Thanks to Rick Krueger (blog) for hosting.

On hearing this month’s subject I googled around a bit for the ‘official’ definition of a Rube Goldberg Machine. Wikipedia has it as “…a deliberately over-engineered or overdone machine that performs a very simple task in a very complex fashion, usually including a chain reaction.” I guess I’m not the only one this Tuesday who ran across this video from OK Go, which is an entertaining example.

If you are at all like me (read: DBA nerd), you tend to see any block-like units, from Legos to shipping containers, as somehow representing values in the all-encompassing, cosmic database schema of reality (there is one, right?). So perhaps you’ll understand how the first scene of the video (the falling dominoes) as well as the ‘chain reaction’ in the Wikipedia definition got me thinking about table partitioning. More specifically, a sliding-window-style implementation of table partitioning I once configured to age out and roll off older date-based data from an active OLTP table into a sibling report table, thereby enforcing some segregation between conflicting query types.

Of course, there are a number of simple ways to age out old data (e.g. T-SQL insert/delete, SSIS, etc.), and table partitioning may be more typically thought of as a complex solution applied to large data sets and enterprise infrastructure. Depending on the situation, table partitioning just might be overkill… Rube Goldberg indeed. However, several features of table partitioning really do make it a neat solution regardless of the size of the data, and hopefully the following examples will bring them to light.
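For comparison, the non-contraption version of this archive job could be a scheduled batch along these lines (it uses the same product and archive tables defined below, and the 25-hour cut-off is just illustrative):

-- the simple alternative: copy rows older than 25 hours to the archive
-- table and delete them from the active table, in one transaction
BEGIN TRANSACTION;

INSERT INTO archive (asofdate, descr)
SELECT asofdate, descr
FROM product
WHERE asofdate < DATEADD(hh, -25, SYSDATETIME());

DELETE FROM product
WHERE asofdate < DATEADD(hh, -25, SYSDATETIME());

COMMIT TRANSACTION;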

At this point I want to mention that table partitioning is an Enterprise Edition feature. Also, thanks to Kendra Little for her post here; those ‘helper views’ are definitely helpful.

Gathering the material and building the contraption:

Example Tables:

This is our table of date-based data, product.

CREATE TABLE product (
asofdate datetime2, descr varchar(100)
)

And here we populate it with records for every minute of every hour for the last 24 hours (I chose to partition by the hour for this post, but it could be any unit of time). In the real world there could be any number of records for any hour, but to keep it simple for this example each hour’s worth of data will have 60 rows.

-- seed the first row, dated 24 hours ago
DECLARE @start datetime2 = DATEADD(d,-1,SYSDATETIME())
DECLARE @inc int = 1439

INSERT INTO product
VALUES (@start, CAST(CRYPT_GEN_RANDOM(50) AS varchar(100)))

-- add one row per minute for the rest of the 24-hour window
WHILE @inc > 0
BEGIN
    INSERT INTO product
    SELECT TOP 1 DATEADD(MI, @inc, asofdate),
        CAST(CRYPT_GEN_RANDOM(50) AS varchar(100))
    FROM product ORDER BY asofdate
    SELECT @inc = @inc - 1
END

As each hour goes by, the oldest data in the product table becomes 25 hours old. This is the data we wish to age out and roll off to an archive table.

CREATE TABLE archive (
asofdate datetime2, descr varchar(100)
)

Partition Functions:

Next, we need two partition functions, one for each table.

For the product table we start with a single range value 25 hours prior to the last date in the product table to set an initial boundary.

DECLARE @StartHour datetime2 = DATEADD(hh,-25,(SELECT MAX(asofdate) FROM product));
CREATE PARTITION FUNCTION product_pf (datetime2)
AS RANGE RIGHT FOR VALUES (@StartHour);
GO

I find visualizing the partition ranges very helpful, so here is a simple text diagram to illustrate the range values of the partition function at this point. The bar represents the boundary (aka range value) and the number represents the hours since the newest value in the product table, i.e. how many hours old that partition’s data is.

older than 25 | 25 and newer

Next we loop a split operation to create range values (i.e. boundaries) for each hour of the day, building the partition function into something like this …

older | 25 | 24 | 23 | … | 2 | 1 | newer

DECLARE @StartHour datetime2 = DATEADD(hh,-24,(SELECT MAX(asofdate) FROM product));
DECLARE @i int = 0
-- split off one boundary per hour, from 24 hours ago up to the newest hour
WHILE @i <> 25
BEGIN
    ALTER PARTITION FUNCTION product_pf()
    SPLIT RANGE (DATEADD(hh,@i,@StartHour));
    SET @i = @i + 1
END

We can examine our partition function to be sure it has the range values we want …

SELECT pf.name, rv.boundary_id, rv.value, 
DATEDIFF(hh,convert(datetime2,rv.value),(SELECT MAX(asofdate) FROM product)) AS 'HoursOld'
FROM sys.partition_range_values rv
JOIN sys.partition_functions pf 
ON rv.function_id = pf.function_id
WHERE pf.name = 'product_PF'

…and we can add another boundary, or another few, as needed. There is no harm in creating extra boundaries on the leading end; in fact, you are building in a margin of safety for later split operations.

older | 25 | 24 | 23 | … | 2 | 1 | 0 | -1 | -2 | newer

--find the max range value 
DECLARE @maxpartbound datetime2
SELECT @maxpartbound = CAST(MAX(value) as datetime2)
FROM sys.partition_range_values rv
JOIN sys.partition_functions pf 
ON rv.function_id = pf.function_id
WHERE pf.name = 'product_pf'

--add an hour to the max range value and split off a new range
ALTER PARTITION FUNCTION product_pf() 
SPLIT RANGE (DATEADD(hh,1,@maxpartbound))

For the archive table we initially need only 2 range values, defining the boundaries between data that is 23 hours old (and newer), 24-hour-old data, and data older than 24 hours.

older than 24 | 24 | 23 and newer

As you will see, once we apply this function to the archive table, the partition housing the 24-hour-old data will always be empty since that data is still part of the product table; however, it needs to be there as a place-holder for when the 24-hour-old data turns 25.

-- anchor to the same MAX(asofdate) used for product_pf so the boundaries line up,
-- and create two range values: one below the 24-hour-old placeholder and one above it
DECLARE @StartHour datetime2 = DATEADD(hh,-24,(SELECT MAX(asofdate) FROM product));
CREATE PARTITION FUNCTION archive_pf (datetime2)
AS RANGE RIGHT FOR VALUES
(DATEADD(hh,-1,@StartHour), @StartHour);

Partition Scheme:

We require two partition schemes, one for each partition function. In short, the partition scheme maps each table partition to a filegroup, ideally so as to spread the IO across multiple disks. Although that might be nice, in this scenario, where we are focused solely on an archiving solution, we don’t necessarily need multiple disks; we need a simple, efficient, repeatable archive process. When this came to light for me I was very excited, since it simplifies things to a large degree and makes this otherwise complex feature a viable Rube Goldberg solution. The partition schemes will map all partitions to the primary filegroup.

--create the product partition scheme
CREATE PARTITION SCHEME product_ps 
AS PARTITION product_pf
ALL TO ([PRIMARY]);

--create the archive partition scheme
CREATE PARTITION SCHEME archive_ps 
AS PARTITION archive_pf
ALL TO ( [PRIMARY] );

Indexes:

Lastly we need two clustered indexes, one for each table. Building the clustered index on each table using its partition scheme effectively applies the partitioning to the table.

CREATE CLUSTERED INDEX [cix_product-asofdate]
ON product(asofdate ASC ) 
ON product_ps(asofdate)

CREATE CLUSTERED INDEX [cix_archive-asofdate] 
ON archive(asofdate ASC )
ON archive_ps(asofdate)

Tipping the domino:

As each hour passes, the oldest data in the product table becomes eligible for archival. To archive that partition we’ll alter the partition functions to split off new boundaries or merge old ones, and use the SWITCH argument of the ALTER TABLE command.

As a point of reference here is a diagram of the usual, between-the-hours state of partitioning in the system. This is T1.

T1 – Standard configuration

At T2 the oldest data in the product table is 25 hours old and ready to be moved to the archive table. 

T2 – At the turn of the hour

This is the time to initiate the archive process beginning with the switch.

Switch:

T3 – Post switch

To run the switch we need a source and a target partition number from the product and archive tables. We can get those via the $PARTITION function, but first we need the minimum range value of the product table’s partitions. While we’re at it we can also grab the minimum range value of the archive table’s partitions, which we’ll use later for the merge.

--find the min archive boundry -will use it to merge later
DECLARE @MinArchiveBoundry DATETIME2(0);
SELECT @MinArchiveBoundry = CAST(MIN(rv.value) AS DATETIME2(0))
FROM sys.partition_functions  pf
JOIN sys.partition_range_values rv 
ON pf.function_id=rv.function_id
where pf.name='archive_pf'

--find the minimum range partition in product so we can pass it to the $PARTITION function
DECLARE @MinProductBoundry DATETIME2(0);
SELECT @MinProductBoundry = CAST(MIN(rv.value) AS DATETIME2(0))
FROM sys.partition_functions  pf
JOIN sys.partition_range_values rv 
ON pf.function_id=rv.function_id
where pf.name='product_pf'

--use $PARTITION function to figure the correct partition numbers based on the minimum range val in product table
declare @ProductPartitionToArchive int, @ArchivePartition int
select @ProductPartitionToArchive =$PARTITION.product_pf(@MinProductBoundry)
select @ArchivePartition =$PARTITION.archive_pf(@MinProductBoundry)

--switch the oldest product partition to the archive table
ALTER TABLE product SWITCH PARTITION @ProductPartitionToArchive TO archive PARTITION @ArchivePartition;

Merge:

T4 – Before merge

Using the same variables we set to the minimum range values during the switch, we can merge the trailing partitions of each table.

--merge archive trailing partitions
ALTER PARTITION FUNCTION archive_pf() MERGE RANGE (@MinArchiveBoundry);

--merge product trailing partitions
ALTER PARTITION FUNCTION product_pf() MERGE RANGE (@MinProductBoundry);

T5 – Post merge

Split:

T6 – Pre split

It is expensive to split a partition containing data, since it results in disk IO, so we always want to split empty partitions. Since we lost a partition boundary in the merge, the last step of the archive process is to split a new partition range off the leading (and empty) end of each table. This results in an empty place-holder partition and prepares the table for the next switch-merge-split cycle. Note the ALTER PARTITION SCHEME call here, which would normally set the next filegroup; in our case it’s not necessary since all partitions are on the primary filegroup, but I have included it.

--split off some new empty future partitions
ALTER PARTITION SCHEME archive_ps NEXT USED [PRIMARY]
--find the most recent boundary
 DECLARE @MaxArchiveBoundry DATETIME2(0);
SELECT @MaxArchiveBoundry = CAST(MAX(rv.value) AS DATETIME2(0))
FROM sys.partition_functions  pf
JOIN sys.partition_range_values rv 
ON pf.function_id=rv.function_id
where pf.name='archive_pf'
--add one hour to the most recent boundary and split the partition
ALTER PARTITION FUNCTION archive_pf() 
SPLIT RANGE (DATEADD(hh,1,CAST(@MaxArchiveBoundry AS DATETIME2)))

ALTER PARTITION SCHEME product_ps NEXT USED [PRIMARY]
--find the most recent boundary
DECLARE @MaxProductBoundry DATETIME2(0);
SELECT @MaxProductBoundry = CAST(MAX(rv.value) AS DATETIME2(0))
FROM sys.partition_functions  pf
JOIN sys.partition_range_values rv 
ON pf.function_id=rv.function_id
where pf.name='product_pf'
--add one hour to the most recent boundary and split the partition
ALTER PARTITION FUNCTION product_pf() 
SPLIT RANGE (DATEADD(hh,1,CAST(@MaxProductBoundry AS DATETIME2)))

At T7 the split has been applied to both partitioned tables and we’ve come full circle back to the standard operating configuration.


 

Well, there you have it…

…a walk-through and working example of a simple data archive solution using table partitioning, and a Rube Goldberg contraption in its own right.

If only I had just a little more time I’d put a trigger on the archive table that inserted a row to another table, where the insert fires off a service broker message that runs a procedure which fails over a mirror session which generates an alert which causes a job to run which …..  …. a yoga-ball which turns on the coffee maker which kicks off sp_send_dbmail and finally, finally, finally sends you an email to say…..

“Thank you for reading my post!”