January 23, 2019 by Kenneth Fisher
Of the different basic types of backups (full, differential and log) I find the differential the most interesting, and frequently the least understood. Full backups are easy. The whole database including any log information needed to make the committed transactions durable. Transaction logs aren’t much more complicated. They contain information on any transactions committed since the last transaction log. (yes, I’m simplifying on both. sue me)
What makes differentials different? They contain any extents (8 pages) that have changed since the last full backup. Sounds simple right? And it is. Ok, so if it’s simple why would it be misunderstood? Because it backs up changed extents. Not transactions. If one small piece of one row in a given extent has been changed then that whole extent will be included in the differential backup. This might seem a bit wasteful. I mean a changing a single bit means the entire extent of 64kb gets written to the backup. On the other hand, if that same small piece of information is changed 100, 1000, 10000 times that extent will still only be included in the differential backup once. (Interestingly data can and will be included multiple times in the log backup, once for each transaction where the data is changed.)
So in order to take a differential SQL has to know each extent that’s been changed. It would seem like that could be a fair amount of work if the database is of any reasonable size. In fact, SQL is pretty efficient. It keeps what’s called a DCM (differential change map) that has a flag for each extent within a fairly large amount of space. (as I understand it just shy of 4gb) By checking that one page SQL knows which extents out of that 4gb of data needs to be written to the differential. I honestly find this incredibly cool. One single page tells SQL what extents over 4gb of data need to be backed up.
The other cool bit, and I mean really really cool, is the fact that essentially the differential is an overlay of the data. Let me explain. A differential has a starting and ending LSN (log sequence number). If the current database state is anywhere between those two LSN then applying the differential will immediately bring it up to that final LSN. It makes absolutely no difference where the database is along the chain between the beginning LSN and ending, the differential brings it up to current (for the differential).
As a somewhat concrete example let’s say you have a database with a single table that looks like this (yea, I know I’m simplifying again) at 1pm on Friday.
|Kenneth Fisher||123 Smith Street|
Now, you get updates
- 1pm Friday: Full backup taken
- 2pm Friday: Address changed to 124 Smith Street
Transaction Log Backup 1
- 4pm Friday: First name changed to Bob
Transaction Log Backup 2
- 10pm Friday: Address changed to 123 Smith Street
Transaction Log Backup 3
- 2am Saturday: Last name changed to Smith
Transaction Log Backup 4
- 10am Saturday: Address changed to 123 Fisher Street
Transaction Log Backup 5
- 3pm Saturday: Differential taken
Once we’ve restored the FULL you can either apply a bunch of transaction log backups, and each is going to basically run every transaction to that point against the database, or you can apply the differential. Or, in fact you can apply some of the transaction logs and then the differential. As long as the database isn’t marked recovered, it’s going to just overwrite whatever is there with the final value.
|Bob Smith||123 Fisher Street|
Completely skipping any intervening transactions or values.
And the last couple of things I want to point out:
- A differential file could get as large as the full backup
but no larger(I was WRONG), although it probably won’t.
- Differential backups are available on all recovery types. Full, Bulk and Simple.