Convert Comma Separated String to Table : 4 different approaches
Recently, I came across a piece of TSQL code that would take a comma separated string as an input and parse it to return a single column table from it.
Lets’ call this function as Split1. The code is as follows:
CREATE FUNCTION [dbo].Split1(@input AS Varchar(4000) )
RETURNS
@Result TABLE(Value BIGINT)
AS
BEGIN
DECLARE @str VARCHAR(20)
DECLARE @ind Int
IF(@input is not null)
BEGIN
SET @ind = CharIndex(',',@input)
WHILE @ind > 0
BEGIN
SET @str = SUBSTRING(@input,1,@ind-1)
SET @input = SUBSTRING(@input,@ind+1,LEN(@input)-@ind)
INSERT INTO @Result values (@str)
SET @ind = CharIndex(',',@input)
END
SET @str = @input
INSERT INTO @Result values (@str)
END
RETURN
END
This is a very old fashioned (but still effective enough) script which does a loop over a string to cut out all possible string values that are separated by a comma.
Let’s see now, how the same could be achieved in modern day TSQL languages (such as SQL 2005 or SQL 2008).
Approach 1: Common Table Expression (CTE)
Lets call this function as Split2. here we are using
CREATE FUNCTION dbo.Split2 ( @strString varchar(4000))
RETURNS @Result TABLE(Value BIGINT)
AS
begin
WITH StrCTE(start, stop) AS
(
SELECT 1, CHARINDEX(',' , @strString )
UNION ALL
SELECT stop + 1, CHARINDEX(',' ,@strString , stop + 1)
FROM StrCTE
WHERE stop > 0
)
insert into @Result
SELECT SUBSTRING(@strString , start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS stringValue
FROM StrCTE
return
end
GO
Approach 2: XML (surprise)
XML could be applied to do some type of string parsing (see this) Let’s call this function as Split3.
CREATE FUNCTION dbo.Split3 ( @strString varchar(4000))
RETURNS @Result TABLE(Value BIGINT)
AS
BEGIN
DECLARE @x XML
SELECT @x = CAST('<A>'+ REPLACE(@strString,',','</A><A>')+ '</A>' AS XML)
INSERT INTO @Result
SELECT t.value('.', 'int') AS inVal
FROM @x.nodes('/A') AS x(t)
RETURN
END
GO
Approach 4: Classic TSQL Way
I got this approach from SQL Server Central site. This approach is slightly unusual but very effective. this needs you to create a table of sequential numbers called a Tally Table.
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
Lets index the table for better performance.
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
WITH FILLFACTOR = 100
Finally out Split4 function.
CREATE FUNCTION dbo.Split4 ( @strString varchar(4000))
RETURNS @Result TABLE(Value BIGINT)
AS
BEGIN
SET @strString = ','+@strString +','
INSERT INTO @t (Value)
SELECT SUBSTRING(@strString,N+1,CHARINDEX(',',@strString,N+1)-N-1)
FROM dbo.Tally
WHERE N < LEN(@strString)
AND SUBSTRING(@strString,N,1) = ',' --Notice how we find the comma
RETURN
END
GO
Now, what about the most crucial question: Performance. so lets put all 4 functions to test.
please note that I am running this on SQL Server 2008. you may need to modify it for SQL 2005.
DECLARE @str VARCHAR(4000) = '6,7,7,8,10,12,13,14,16,44,46,47,394,396,417,488,714,717,718,719,722,725,811,818,832,833,836,837,846,913,914,919,922,923,924,925,926,927,927,928,929,929,930,931,932,934,935,1029,1072,1187,1188,1192,1196,1197,1199,1199,1199,1199,1200,1201,1202,1203,1204,1205,1206,1207,1208,1209,1366,1367,1387,1388,1666,1759,1870,2042,2045,2163,2261,2374,2445,2550,2676,2879,2880,2881,2892,2893,2894'
SELECT * FROM dbo.split1 ( @str )
SELECT * FROM dbo.split2 ( @str )
SELECT * FROM dbo.split3 ( @str )
SELECT * FROM dbo.split3 ( @str )
On my laptop, I saw following numbers. I expected the XML to be the fastest.
so I changed the code slightly to test out XML function directly.
DECLARE @str VARCHAR(4000)
= '6,7,7,8,10,12,13,14,16,44,46,47,394,396,417,488,714,717,718,719,722,725,811,818,832,833,836,837,846,913,914,919,922,923,924,925,926,927,927,928,929,929,930,931,932,934,935,1029,1072,1187,1188,1192,1196,1197,1199,1199,1199,1199,1200,1201,1202,1203,1204,1205,1206,1207,1208,1209,1366,1367,1387,1388,1666,1759,1870,2042,2045,2163,2261,2374,2445,2550,2676,2879,2880,2881,2892,2893,2894'
Declare @x XML
select @x = cast('<A>'+ replace(@str,',','</A><A>')+ '</A>' as xml)
select t.value('.', 'int') as inVal
from @x.nodes('/A') as x(t)
Hmm. so just as I thought, XML is faster on its’ own.
there are some more options that I did not consider such as CLR functions. but we will get to those some other time.