Altering a column in a large table: A case study
July 14, 2016 by Kenneth Fisher
Fair warning, this is a discussion piece. I had a task, I went through several possibilities and I’m going to share each of them and the results of those I tried. I will not be giving any specific examples or code. Feel free to give me your opinion in the comments below.
I was asked to alter a column from CHAR(2) to CHAR(4). You might not think that would be a big deal, but it ended up being a big pain. To start with, here are a few facts.
- The table has 1.4 billion rows.
- The table takes up 750 GB.
- The database has multiple filegroups each with multiple files. Total size a bit over 2 TB.
- The main drive the database is on is already 2 TB in size (I’m told this is the limit for our drives but I don’t know if that’s a political or technical limit). There was 300 GB free.
- The log when I started was 50 GB with the max size also set to 50 GB.
- The database is in the SIMPLE recovery model, so there is no need to take log backups during the change to manage log size. I only have to worry about the log space used by the largest single transaction.
- This is a test environment so I can play a bit, but I’m planning for the production change.
- I can get more disk space by requesting a new disk but I want to avoid the paperwork and complications of moving some of the files to the new location.
My first attempt was just a straight ALTER TABLE ALTER COLUMN. After about an hour I got back a log full error. I then tried a 200 GB log and a 350 GB log. Those failed after 3 and 5 hours respectively. While this was going on I checked on #sqlhelp to see if anyone knew any way to minimize the log usage so my command would finish.
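For reference, the command was essentially the sketch below. The table and column names are made up for illustration; on 1.4 billion rows this runs as a single, fully logged transaction, which is exactly what kept filling the log.

-- Sketch of the straight alter; dbo.BigTable and SomeCode are hypothetical names.
-- Specify the column's existing nullability, otherwise the alter can change it.
ALTER TABLE dbo.BigTable
    ALTER COLUMN SomeCode CHAR(4) NOT NULL;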
The primary suggestions were:
- Add a new column to the end of the table, populate it in batches, then remove the old column.
- Create a new table, populate it, index it, drop the old table, rename the new table.
Here were my thoughts on the first suggestion, adding a new column.
- This should work because adding the new column is a schema change only so it will be almost instant and take no extra space.
- Once I start putting data into the column it will actually add space to the row, potentially causing page splits.
- I can fill the new column in small/medium sized batches so that each batch goes fairly quickly and doesn't use too much log space (see the sketch after this list).
- The fatal flaw with this plan (at least for me) is that this is vendor/legacy code. I can’t be certain I won’t break something by moving the column. If everything is properly coded I shouldn’t, but I can almost guarantee that not everything is.
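For completeness, a rough sketch of that batched backfill approach, using the hypothetical names dbo.BigTable, OldCode and NewCode, might look like this:

-- Add the new column as NULLable: a metadata-only change, instant and no extra space.
ALTER TABLE dbo.BigTable ADD NewCode CHAR(4) NULL;
GO

-- Backfill in batches so each transaction stays small and, in SIMPLE recovery,
-- the log space can be reused between batches. Assumes OldCode is NOT NULL;
-- otherwise track progress by key range instead of "NewCode IS NULL".
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) bt
    SET    NewCode = bt.OldCode
    FROM   dbo.BigTable AS bt
    WHERE  bt.NewCode IS NULL;

    SET @rows = @@ROWCOUNT;
    CHECKPOINT;
END;
GO

-- The last step, dropping OldCode and renaming NewCode, is the part that risks
-- breaking vendor/legacy code that depends on column position.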
So how about the next suggestion, creating a new table and copying the data over?
- I discounted this almost immediately because it would require, at least at some point, two copies of my 750 GB table. I'm trying to minimize my space usage, not increase it.
- There are no problems with the log, however. I can use a minimally logged operation or load the new table a bit at a time and avoid overloading the log.
- I need to make sure to script out everything from the original table (indexes, constraints and easiest to forget, permissions).
Well, that leaves me right back where I started. Maybe I could extend the log to another drive temporarily, let it grow as large as it needs to, and then clean up afterwards.
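For what it's worth, the temporary log extension itself is straightforward; the database name, drive letter, file name, and sizes below are made up:

-- Sketch: temporarily add a second log file on another drive.
ALTER DATABASE BigDb
    ADD LOG FILE (NAME = N'BigDb_log2',
                  FILENAME = N'E:\SQLLogs\BigDb_log2.ldf',
                  SIZE = 200GB, FILEGROWTH = 10GB);
GO

-- ...run the change...

-- Remove the extra file afterwards. This only succeeds once the file holds no
-- active log (in SIMPLE recovery a CHECKPOINT usually takes care of that).
ALTER DATABASE BigDb REMOVE FILE N'BigDb_log2';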
Then I started looking at the table a bit closer. It turns out that ~400 GB of the space used by the table is indexes (no comments please, I didn't design it). And that one fact was enough to give me my solution.
A variation on the second suggestion: drop all of the nonclustered indexes, create my new table, copy the data over, do the rename swap, and re-create the indexes.
My initial attempt to copy the data was using an SSIS package (the import wizard). It took approximately 10 hours.
Next I looked at The Data Loading Performance Guide. Unfortunately, even using an INSERT INTO ... SELECT with trace flag 610 into an empty table, I wasn't able to get it minimally logged. I'm still working on it but ran out of time before the production move.
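The shape of what I was trying, with hypothetical table and column names, was roughly this:

-- Sketch of the minimally logged load described in the Data Loading Performance
-- Guide: INSERT ... SELECT WITH (TABLOCK) into an empty table, with trace flag
-- 610 to extend minimal logging to tables with indexes (SQL 2008 through 2014).
DBCC TRACEON (610);

INSERT INTO dbo.BigTable_New WITH (TABLOCK)
SELECT Col1,
       CAST(SomeCode AS CHAR(4)) AS SomeCode,
       Col3
FROM   dbo.BigTable;

DBCC TRACEOFF (610);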
In the end I did this:
- I changed the column size in the table designer. I did NOT save.
- I generated a script using the Generate Scripts option of the table designer.
This gives me a script that drops and re-creates constraints, creates the new table, does the rename swap, creates the new indexes, creates new permissions, etc.
- Next I modified the script to drop all but the clustered index from the old table and commented out the part that moved the data from the old table to the new one. I also modified the new table definition to include the clustered index.
- I used the first part of the script to create the new table with the clustered index.
- I used the import/export wizard to copy the data from one table to the other. An INSERT INTO ... SELECT (minimally logged) would probably have been faster, but I couldn't get that to work.
- I ran the remainder of the script (dropping the old table, renaming the new one, creating constraints, permissions etc.) except for the new indexes.
- Finally, I created the indexes in parallel (three at a time). There were 7 indexes and they took about 1.5 hours each to create, so all told the indexes took about 4.5 hours.
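To make that concrete, the swap and index portion of the (SSMS-generated) script looked roughly like the sketch below; all object names are made up:

-- Drop the old table and give the new one its name.
DROP TABLE dbo.BigTable;
EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
GO

-- Then re-create the nonclustered indexes. Running three of these at a time
-- from separate sessions is the "three wide" part.
CREATE NONCLUSTERED INDEX IX_BigTable_Col3
    ON dbo.BigTable (Col3)
    WITH (SORT_IN_TEMPDB = ON);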
Again this is just meant to be a discussion of what I did and how I got there. Feel free to give your own opinions in the comments.
These are always tricky "size of data" problems, and the best solution is to see what the business requirements really are and whether this can be solved better in code than by altering the schema or monkeying around with views, synonyms, and other abstractions.
I worked on a finance app once that had decimal(14,2) everywhere... until we tried to implement it in Korea, where we needed decimal(20,4). Take a look at the exchange rate to the Won and you'll quickly see the problem. This essentially required taking all of those columns and expanding them, which is *not* a simple metadata operation; rather, it rewrites the whole table.
Assuming you MUST solve this with a schema mod, which was our case too, we landed on the following solution, which gave us near-zero downtime without the perf, log, and SSIS/Rube Goldberg Machine penalties you are incurring.
--this is done a week or so before the migration, on the live running system
1) CREATE SCHEMA migrate; CREATE TABLE migrate.table with the new column definitions
2) Use something asynchronous to begin copying the data from App.table to migrate.table. Service Broker or replication work well. Even SQL Agent or a big ol' cursor works OK.
3) Script out your GRANTs and FKs for the table.
4) CREATE SCHEMA FallBack;
--Migration Day: quiesce your system and take a deep breath.
1) ALTER SCHEMA FallBack TRANSFER App.table;
2) ALTER SCHEMA App TRANSFER migrate.table;
3) reapply GRANTs
4) reapply FKs WITH NOCHECK
--your system is back. Now do this while the system is live, since this *might* take some time. Having indexes helps, and this assumes your app can tolerate an untrusted FK for some small amount of time.
5) reapply FKs WITH CHECK
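As a concrete (hypothetical) example of those two FK steps: reattach each constraint with NOCHECK so the swap is fast, then validate it later while the system is live.

-- Step 4: reattach the FK without checking existing rows (fast, but untrusted).
ALTER TABLE App.ChildTable WITH NOCHECK
    ADD CONSTRAINT FK_ChildTable_BigTable
    FOREIGN KEY (BigTableId) REFERENCES App.BigTable (BigTableId);

-- Step 5: later, validate the existing rows so the constraint becomes trusted.
ALTER TABLE App.ChildTable WITH CHECK
    CHECK CONSTRAINT FK_ChildTable_BigTable;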
If something goes wrong you have the FallBack schema that can quickly be reverted.
Reliably, this took under 10 seconds regardless of the size of App.table.
TANSTAAFL…you are essentially making a copy of the entire table, so make sure you have disk space.
This actually isn’t that different from what I did. The main differences are that you copied the data over slowly rather than in one big chunk and that you had space to hold two full, indexed copies of the table (I didn’t).
Things certainly get more interesting when you start working with multi-billion-row tables, don't they? 🙂
I am facing the same problem, where I will change a FLOAT datatype to NUMERIC(6,3) with almost the same table size/row count (calculated savings: 16 GB). Here's what I have in mind:
1) create final FULL and LOG backup
2) set the database to SIMPLE recovery (the action will take place over the weekend)
3) Create the new table
4) in a batch loop:
4.1) Delete x rows from the old table, OUTPUT deleted.* into the new table (see the sketch after this list)
5) create additional indexes on the new table
6) drop old table
7) rename new table to its correct name
8) put database back in FULL recovery
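A minimal sketch of the batch loop in step 4.1, with made-up table names and batch size; OUTPUT deleted.* assumes the two tables have the same column order:

DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (50000)
    FROM   dbo.OldTable
    OUTPUT deleted.*
    INTO   dbo.NewTable;

    SET @rows = @@ROWCOUNT;
    CHECKPOINT;  -- in SIMPLE recovery, lets log space be reused between batches
END;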
I would suggest adding a rename of the original table at step 3a) to some TEMP name. At that point the original table will no longer exist and any User / batch Job will fail (safe!) if it attempts to add/modify rows in the original table.
We have to change an identity column from integer to big integer due to table growth. So far my successful tests take about 5 hours using the “create new table, SSIS data export, deal with indexes etc, rename tables” approach. We have the room to keep both tables in place for a while. The first of several tables requiring this is about 1 billion rows and 60GB.
I like your approach of doing a “fake” table modification in the designer to generate some of the needed scripts.
In our case, after making the change in QA we found some of the app code handles the column as int, not bigint, so things broke. Now I'm waiting for the code to be fixed and letting them know that in 12 months or so the database/app will stop functioning when no further records can be added to the table.
Good for you for catching the need to change early! A lot of people don’t realize they have run out of space until it’s an emergency situation. And nice planning. Good luck when the time comes. Switching over identity columns can be hard work and it sounds like you’ve got it down cold.
I am presuming that your existing identity values are positive INTs? If your apps will handle negative ID numbers (and they won't look "daft" on reports etc.!) one option would be to reset the IDENTITY, when it nears the max value, to restart at -1 and increment/step by -1. That will double your value pool … and buy you some more time 🙂
I didn’t see @jeremy hughes suggestion below before I replied. Sorry about that.
Also consider a Change Data Capture solution (only needed if deletes and updates are allowed against the table whose DDL must change), combined with the BCP utility and a couple of sp_renames.
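A rough sketch of the BCP and sp_rename part, with made-up names and paths (the bcp commands run from a command prompt; character format is used since the column width changes):

-- 1) Bulk export the existing data:
--    bcp MyDb.dbo.BigTable out C:\Temp\BigTable.dat -c -T -S MyServer
-- 2) Create dbo.BigTable_New with the widened column, then bulk load it:
--    bcp MyDb.dbo.BigTable_New in C:\Temp\BigTable.dat -c -T -S MyServer -h "TABLOCK"
-- 3) Swap the tables with a couple of renames:
EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
EXEC sp_rename 'dbo.BigTable_New', 'BigTable';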
@Randall – INT should take you up to over 2 billion rows on an identity column if there aren’t many gaps and you started at 0. If you start at -2 billion, you can get almost 4 billion rows. It seems like you have everything under control, but just mentioning this in case you hadn’t thought about reseeding to -2 billion.
Good suggestion, but it does have a few risks. Re-seeding to a negative value can be a quick way to fix the problem, but if your clustered index is on the identity column you are then inserting at the front every time, slowing down your inserts and causing a fair amount of fragmentation. Also, if you are sorting by the id you can run into some other problems.
hmm, never thought about going negative with those. Can’t go to zero because there have been purges and the unused numbers might run out unexpectedly.
You can actually start with 0 and increment by -1. The other option, as Jeremy said, is to start at min INT and work up to 0. But again there are some definite downsides to it.
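For anyone reading along, a hypothetical example of the reseed itself. Note that DBCC CHECKIDENT only changes the current identity value, not the increment, so this shows the start-at-min-INT variant; with rows already in the table the next insert gets the reseed value plus the increment.

-- Reseed the identity to the bottom of the INT range (hypothetical table name).
DBCC CHECKIDENT ('dbo.BigTable', RESEED, -2147483648);
-- The next insert gets -2147483647, then counts up toward 0 (increment of 1 assumed).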
Here is the sql agent job and procedure I use to track identities nearing max. I believe I got it here: http://sqlfool.com/2011/01/identity-columns-are-you-nearing-the-limits/
USE [master]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- execute usp_CheckMaxIdentityValue
CREATE PROCEDURE [dbo].[usp_CheckMaxIdentityValue] AS
DECLARE @threshold DECIMAL(3, 2) = .85;
/* Create a temp table */
CREATE TABLE #identityStatus (
database_name VARCHAR(128)
,table_name VARCHAR(128)
,column_name VARCHAR(128)
,data_type VARCHAR(128)
,last_value BIGINT
,max_value BIGINT
);
/* Use an undocumented command to run a SQL statement
in each database on a server */
EXECUTE sp_msforeachdb
'
Use [?];
Insert Into #identityStatus
Select ''?'' As [database_name]
, Object_Name(id.object_id, DB_ID(''?'')) As [table_name]
, id.name As [column_name]
, t.name As [data_type]
, Cast(id.last_value As bigint) As [last_value]
, Case
When t.name = ''tinyint'' Then 255
When t.name = ''smallint'' Then 32767
When t.name = ''int'' Then 2147483647
When t.name = ''bigint'' Then 9223372036854775807
End As [max_value]
From sys.identity_columns As id
Join sys.types As t
On id.system_type_id = t.system_type_id
Where id.last_value Is Not Null'
;
/* Retrieve our results and format it all prettily */
SELECT TOP 10 database_name
,table_name
,column_name
,data_type
,last_value
,CASE
WHEN CAST(last_value AS FLOAT(4)) / max_value >= @threshold
THEN 'warning: approaching max limit'
ELSE 'okay'
END AS [id_status]
,(1 - CAST(last_value AS FLOAT(4)) / max_value) * 100 AS [percentLeft]
FROM #identityStatus
ORDER BY [percentLeft];
/* Clean up after ourselves */
DROP TABLE #identityStatus;
GO
--------- JOB TO SEND OUT RESULTS
SET QUOTED_IDENTIFIER OFF
DECLARE @query_result_separator char(1) = char(9)
DECLARE @sql varchar(1500)
DECLARE @emailbody varchar(200)
SET @emailbody = 'TMS Columns approaching max identity value'
--DECLARE @query_result_separator char(1) = char(9) -- comma: char(44)
SET @sql = 'execute usp_CheckMaxIdentityValue'
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'yourmailprofile',
@recipients = 'john.smith@yada.com',
@query = @sql,
@body = @emailbody,
@query_result_header = 1, -- 1 means include column headers
@body_format = 'text',
@query_result_no_padding = 1,
@append_query_error = 0,
@exclude_query_output = 0,
@subject = 'MY Columns approaching max identity value',
@query_result_separator = @query_result_separator,
@query_result_width = 1500,
@attach_query_result_as_file = 1,
@query_attachment_filename = 'MyIdentityColumns.csv'
I’ve seen a few similar scripts. It’s still really awesome that you actually applied it. I don’t think many people do.
Once I realized not only our risk but also the repercussions, getting a daily email was a no-brainer. The hardest thing ( about DBA work in general ) is getting people in DEV, QA and the business side to factor in physical realities. Yes, if the max identity is reached, your app won’t work and your customers will call and revenue may go down. Same for disk space — you can’t put together a project plan without considering these things. What I’m pushing for now is a DEV/QA hurdle or gate. Since we can never develop anything even close to a prod load in QA, my gate would be: if it doesn’t run blazingly fast in QA, it doesn’t go to prod, period. Most of our SQL is C# .NET ORM-generated ( almost no procs ) so I can work on indexes and such, but the queries are on DEV. Actually, in general, we’ve dealt with performance with hardware dollars. Now at 512 GB of memory and 64 cores. Soon going to most of our 5TB of data on enterprise grade flash drives.
(Side note: have you tried the SQL 2014 upgrade advisor? For me it either fails with "network path could not be found" or reports "success, no issues" with no other information whatsoever. And the log says things like this: "VERSION mismatch - Rule AUTOGROW does not support scanning SQL Server instance with version 11.00.5582.
07/15/2016 11:09:30 SKU mismatch - Rule AUTOGROW does not support scanning SQL Server instance with edition Enterprise Edition (64-bit).")
An Extended Events session filtering on "deprecation_final_support" might catch some things if you're only upgrading one notch (and perhaps also include "deprecation_announcement" for things going away further in the future).
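A minimal sketch of such a session; the session name and file target are made up, and the event names are the ones that ship with recent versions of SQL Server:

CREATE EVENT SESSION TrackDeprecated ON SERVER
ADD EVENT sqlserver.deprecation_final_support
    (ACTION (sqlserver.sql_text, sqlserver.database_name)),
ADD EVENT sqlserver.deprecation_announcement
    (ACTION (sqlserver.sql_text, sqlserver.database_name))
ADD TARGET package0.event_file (SET filename = N'TrackDeprecated.xel')
WITH (STARTUP_STATE = ON);
GO
ALTER EVENT SESSION TrackDeprecated ON SERVER STATE = START;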
A couple of questions about this scenario: are you using partitioning on the table? Also, is the 750 GB the size of the clustered and nonclustered indexes or just the clustered index?
No partitioning, and that's the total of all indexes. The clustered index was ~350 GB if I remember correctly. That's why I was able to drop the NCIs and make room for a second copy of the table, then drop the original and add the indexes back to the copy.