Calculating the length of a string (Len vs DataLength)

3

April 14, 2013 by Kenneth Fisher

This one is basic but still something that can catch you unawares if you aren’t careful. Most DBAs and developers I know use the function LEN to calculate the length of a string. Little wonder since that is exactly what this function is for. However not everyone realizes the small gotcha with this very basic function. It doesn’t count trailing spaces!

 PRINT LEN('123')
PRINT LEN('123   ') -- three trailing spaces
-----------------------------------
3
3

Once a developer discovers this, the next most obvious question is “How do I get it to count the trailing space?” Well the easiest way I’ve found is to add a character to the end of the string, then subtract one. Looks a little goofy but it works.

PRINT LEN('123'+'.')-1
PRINT LEN('123   '+'.')-1 -- three trailing spaces
-----------------------------------
3
6 

Then comes the function DataLength. This function also returns the length of a string. Or does it? Technically no, it doesn’t. It returns the number of bytes used to represent an expression, or in this particular case a string.

 PRINT DATALENGTH('123')
PRINT DATALENGTH('123   ') -- three trailing spaces 
-----------------------------------
3
6 

The results of this are exactly what you probably expected. 3 and 6. So why not use DataLength instead of Len?

 PRINT DATALENGTH(N'123')
PRINT DATALENGTH(N'123   ') -- three trailing spaces 
-----------------------------------
6
12 

For those of you who don’t know the N at the beginning of a string means that the string is to be Unicode. And Unicode strings take up 2 bytes for every character. Not 1. So now DataLength is giving us double the length of the string. Not exactly what we were looking for. We could divide by two, but only if we know for sure the string is going to be Unicode.

So to recap. If you are uncertain about the data type of your string then you use LEN. If you care about trailing spaces use LEN(string+’.’)-1, if not just use LEN() on its own. Now if you are certain about the data type of your string then you can use DataLength. Unicode is DataLength(string)/2 and Non-Unicode is just DataLength().

As a side note DataLength works on any datatype. Give it a shot with some numeric, date, datetime etc values. It can be fairly interesting.

3 thoughts on “Calculating the length of a string (Len vs DataLength)

  1. alexander says:

    this does’nt seem to be the case with some languages, take python 3.3 for example,
    >>> len(‘123’)
    3
    >>> len(‘123 ‘) # three white spaces
    6

  2. […] how about using DATALENGTH? Well I’ve talked about the difference between LEN and DATALENGTH before so I’m not going to go into this in depth. But suffice it to say that DATALENGTH gives […]

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 6,758 other subscribers

Follow me on Twitter

Archives

ToadWorld Pro of the Month November 2013