In my environment we see an occasional situation where a system process blocks user process. To find what the system process was doing we looked to the wait_resource column in sys.dm_exec_requests to see what pages the user session was waiting on. We then took the pages numbers and looked to DBCC PAGE to determine the object. This led us to the syscommittab table – the global ‘log’ table for the change tracking feature. Of course we noticed all the blocked user process use change tracking functions so that was a clue as well…
I set out to find out what I could about the cleanup process. I’m not the first one to have problems with change tracking. Here are a couple links where others are trying to get a handle on this – mostly – black box:
These posts gave me a better understanding – especially with regards to how the side tables and the syscommittab log table hand off “delete-watermark” values to each other (i.e. versions less than a “delete-watermark” will be deleted):
Using a test database with an approx 20 million row syscommittab I was able to watch the process via extended events and came up with these observations.
- Auto-cleanup runs every 30 minutes. The initial cleanup seems to be run is 30 minutes after auto-cleanup is first turned on. Successive runs are every 30 min after the end of the previous cleanup. This means long running cleanup operations will postpone any further cleanup.
- Auto-cleanup process side tables (in descending order by side-table object id) serially. Side table rows are deleted in batches of 5000. Some cleanup process – especially for large tables – do not complete and that failure is logged in SQL Error Log as ‘ <side table blocked> ’ type messages.
- Only after all side tables have been processed (with or without the ‘side table blocked’ error) does the auto-cleanup go to work on purging syscommittab. The syscommittab cleanup works in batches of (default) 10000. The syscommittab cleanup process takes the minimum cleanup version from all the recent side-table cleanups and deletes rows from syscommittab with versions older than that minimum cleanup version. If the minimum cleanup version returned by the side table cleanup is less than the retention version (per change traking configuration retention setting) no syscommittab rows can be cleaned up. It can be a long time between syscommittab cleanups due to ‘every 30 min’ (bullet 1 above) and ‘side table blocked’ error (bullet 3 above) which inhibits the advancement of the minimum cleanup version.
- Manual side table cleanup follows the same rules as auto-cleanup – namely it will not clean any rows unless auto-cleanup is turned on and then the first cleanup cycle starts after auto-cleanup has been tuned on for 30 minutes.
- Manual syscommittab cleanup also requires auto-cleanup to be turned on. If the most recent minimum cleanup version returned by the tables is less than the retention version no syscommitab rows can safely be cleaned up and no rows will be deleted.
Due to all the points above you can see the cleanup process is less than ideal and syscommittab may not be cleaned often enough to keep up with growth. And the larger it gets the longer between cleanup cycles which leads to further un-checked growth. When (if) the cleanup does finally kick in on syscommittab, the delete process can be long-running – up to 40 minutes in test cases. During this syscommittab delete process the locks against the syscommittab table block user session using change tracking functions.
Also of interest, once a ‘delayed’ delete process does complete on a large syscommittab table there is typically a long checkpoint (5 min in tests) which has production impacts of it’s own…
Stay tuned for more on change tracking