January 13, 2015 by Kenneth Fisher
I don’t think anyone’s ever been happy to get a support call at 11pm on a Saturday night. However this past Saturday that call did have a few things going for it. First of all I was able to spend the first hour or so of it watching the MidnightDBAs (b/t) weekly webcast during down moments. So why is this important? Well to be honest at 11pm I need any help I can get to stay awake and Jen and Sean are nothing if not amusing. Second it reminded me of an important piece of SQL Server that I had tucked in the back of my mind several years back. msdb.dbo.suspect_pages and the database setting Page_Verify. Why should I care? Well first of all we have some old databases (converted from SQL 2000 or older) that have page_verify set to TORN_PAGE or even NONE so getting that fixed is a to-do for this week. Second it finally gave me a good topic for this month’s T-SQL Tuesday. Our host this month is Robert Pearl (b/t) and the topic is “What is your health check plan?”
Obviously there is a lot to making sure that your SQL Server is and stays healthy. Making sure that the PAGE_VERIFY setting is set to CHECKSUM is a pretty simple, but important step. There are three possible options, NONE, TORN_PAGE and CHECKSUM. All new databases created in SQL 2005 or later have the PAGE_VERIFY setting set to CHECKSUM so why should we check? Well at least in my case we have a fair number of databases that were created using SQL 2000 (and even earlier). Fortunately it’s a pretty simple test. As with a number of database level settings we can just check sys.databases.
SELECT name, page_verify_option_desc FROM sys.databases WHERE page_verify_option_desc != 'CHECKSUM'
And then change everything that shows up to CHECKSUM.
USE [master] GO ALTER DATABASE DBName SET PAGE_VERIFY CHECKSUM WITH NO_WAIT; GO
Once you change the setting, CHECKSUMs are NOT automatically written to each page. The CHECKSUMs are only be written when the page is actually written out for some other reason. This means that until the next write action you are still unprotected. I read a recommendation in at least one place that suggested making the change and then during your regular maintenance period rebuilding all of the indexes to make sure all of the pages are written out and CHECKSUMs added.
So the next question is why CHECKSUM? In fact why not just leave page_verify to NONE? Well simply enough the page_verify setting determines what test (if any) is done to detect corruption in a page.
The options are:
- TORN_PAGE – An alternating two bit pattern is written for each 512 byte sector.
- CHECKSUM – A checksum is calculated for the whole page and written into the header of the page each time the page is written.
- NONE – No test is done.
So why CHECKSUM? Obviously it’s a much more complete test. There are still ways it could fail but it’s fairly unlikely. Certainly CHECKSUM is a lot more likely to detect corruption than TORN_PAGE. And just as obviously NONE leaves you completely exposed.
Each time a page is read the CHECKSUM (or TORN_PAGE bits) are checked against the current value. If a difference is found, then that means there is some level of corruption and an entry is added to the suspect_pages table in msdb. Note there are some caveats here. From what I read in BOL the table will only hold 1000 rows and once full no more pages are written. And even better the table does not get cleared out automatically. You must either automate the cleanout yourself, or manually delete the rows. I do want to mention that I got a quick review of this post by Robert Davis aka SQLSoldier (b/t) and in it he mentioned this about the 1000 row limit:
I would need to test, but IIRC, restoring a database or other events will remove the related entries from the table. If you have so many that you need to worry about trimming this table, you’ve got major problems.
He certainly has a good point. If you have 1000 bad pages on your instance that you haven’t already dealt with and/or cleaned up you have some serious problems. Fortunately you are going to have an automated report and/or alerts to let you know any time rows get added to the table right? So it’s going to be easy enough to delete the entries as you deal with them. Some entries aren’t really going to be that important. They were already fixed by an automated system (Mirroring) for example. Others you are going to have to deal with yourself. Remember that you can actually restore a single page from backup. Lastly remember that an occasional bad page is probably not something to worry about (once you have it fixed), if on the other hand you start to see a series of bad pages on the same instance then you might want to a) carefully check your backups and b) check your hardware. You might have a disk going bad or some other problem with your IO systems.
Hopefully you can see now why making sure page_verify is set to CHECKSUM is an important part of maintaining your system health.
- Colleen Morrow’s post PAGE_VERIFY – Checksum vs. Torn Page Detection.
- BOL’s entry Manage the suspect_pages Table.
- Paul Randall’s A SQL Server DBA myth a day: (17/30) page checksums.
- Kendra Little’s post about Why you should check your page verify settings.
- Jes Schultz Borland’s post on How to Restore a Page in SQL Server Standard and Enterprise Edition.
- Robert Davis (SQL Soldier) Days 9 and 17 of his 31 Days on Database Recovery.