June 3, 2013 by Kenneth Fisher
I was asked today to take a table with a social security column and put all 0s if the column had any letters in it. This is the typical social security column that actually has more rows with ‘XXX-XX-XXXX’ than it does valid SSNs. There are even cases that look like ‘1234-AB-56TS’ etc.
One option, if I was working in SQL 2012 (which I am very much looking forward to), is to use the new functions TRY_CAST and TRY_CONVERT like so.
SELECT CASE WHEN TRY_CONVERT(int, REPLACE(SSN_Col,'-','')) IS NULL THEN '000000000' ELSE SSN_Col END AS Tested_SSN
However I’m not working in SQL 2012 so my best bet is a pattern match. Going on the basis that there are some people who haven’t worked with pattern matching beyond LIKE ‘%TestString%’ I’m going to explain each pattern as I go along. For those that are more familiar with pattern matching please feel free to skim these bits.
I’ve seen patterns for SSNs like this:
SSN_Col LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'
By putting s around one or more values you can test a single character for the pattern within the s. So [a19c] tests to see if a single character is an a, 1, 9 or c. To test against a range you use a dash. So [0-9] tests a single character for a number from 0 to 9.
Now this has some obvious advantages over the TRY_CAST/CONVERT method. It’s a little bit more complicated but you get to put in an exact pattern. 123-45-6789. Numbers where you want them, dashes where you want them. However in my particular case my data isn’t all that good so I’m not sure if I have 1, 2 or no dashes.
So I can’t use as exact a pattern, but I could do this.
REPLACE(SSN_Col,'-','') LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Now it doesn’t matter if I have dashes or not, I’ve stripped them out. I’m now testing that I have exactly 9 numbers. Absolutely perfect for testing against an SSN with uncertain delimiters. Unfortunately because of the REPLACE around the column any indexes on the SSN_Col table can’t be used.
So what’s another option? What I ended up with is going the negative approach. I want to find any character that ISN’T in the list. In pattern matching a ^ is used for just that. First we construct a pattern to find a given character. This is fairly basic and I think most DBAs have done something along these lines.
SSN_Col LIKE '%A%'
Now that is done we add the test for the existence of a number.
SSN_Col LIKE '%[0-9]%'
And last but least we test for the existence of any character that is NOT a number.
SSN_Col LIKE '%[^0-9]%'
And if you feel like allowing some symbols, for example the dash from above and maybe an @ sign then do this.
SSN_Col LIKE '%[^0-9-@]%'
In case I just confused anyone, the second dash is fine because it isn’t being used as a range. You can also add in an escape character if you want.
SSN_Col LIKE '%[^0-9\-@]%' ESCAPE '\'
The ESCAPE clause of LIKE lets us set the escape character for this string.
As you can see pattern matching can get fairly complex. These are just a few samples of what pattern searching can do. To see the rest of the pattern matching options check out the BOL entry for Like (http://msdn.microsoft.com/en-us/library/ms179859.aspx) and Pattern Matching in Search Conditions(http://msdn.microsoft.com/en-us/library/ms187489(v=sql.105).aspx).