Split pipe delimited columns in TSQL

SQL Whisper 1 Reputation point
2020-10-25T02:00:56.887+00:00

What's the best way to split pipe-delimited columns in SQL Server 2017?

Input:

declare @table table (sno int,sname varchar(200),sname1 varchar(200))  
  
Insert into @Table(sno,sname,sname1)  
Values (1,'Mark|James','Dallas|Houston')  

Expected Output:

1 Mark Dallas
1 James Houston

Thank you.

Transact-SQL

7 answers

  1. Ronen Ariely 15,096 Reputation points
    2020-10-26T03:38:18.067+00:00

    Hi @Erland Sommarskog

    The only reason your query is fast is that you used a temporary table; it has nothing to do with your solution (your function).

    With a table of this size and this amount of data, it is simply better to use two temporary tables and then JOIN them.

    I executed your query using your data, but since my laptop is weak, I only filled the PipeData table with 10,000 rows, using your query with TOP 10000:

    CREATE DATABASE [ErlandTest]  
    	CONTAINMENT = NONE  
    	ON  PRIMARY ( NAME = N'ErlandTest', FILENAME = N'E:\SQL_Files\ErlandTest.mdf' , SIZE = 8192KB , FILEGROWTH = 65536KB )  
    	LOG ON ( NAME = N'ErlandTest_log', FILENAME = N'E:\SQL_Files\ErlandTest_log.ldf' , SIZE = 8192KB , FILEGROWTH = 65536KB )  
    GO  
      
    USE [ErlandTest]  
    go  
      
    CREATE TABLE PipeData (id          int           NOT NULL,   
                           ShipName    nvarchar(MAX) NOT NULL,   
                           ShipAddress nvarchar(MAX) NOT NULL,  
                           CONSTRAINT pk_PipeData PRIMARY KEY (id)  
    )  
    go  
    
    -- THE ONLY DIFFERENCE IS THAT I ADDED "TOP 10000"  
    INSERT PipeData(id, ShipName, ShipAddress)  
       SELECT top 10000 CustomerID, string_agg(convert(nvarchar(MAX), ShipName), '|') WITHIN GROUP (ORDER BY OrderID),   
                          string_agg(convert(nvarchar(MAX), ShipAddress), '|') WITHIN GROUP (ORDER BY OrderID)  
       FROM   BigDB..BigOrders  
       GROUP  BY CustomerID  
    go  
    

    ----------

    Next, I tested your solution on these 10k rows and it took more than 5 minutes.

    I tested my solution (meaning using OPENJSON) using temporary tables, and guess what?!

    As I expected, my solution finished in 20 seconds!

    Assuming I did not make a mistake in my test (and this is the result I expected), the difference is 20 seconds vs. 5 minutes.

    So here is my query using temp tables:

    DROP TABLE IF EXISTS #t1  
    DROP TABLE IF EXISTS #t2  
    DROP TABLE IF EXISTS #temp4  
    go  
      
      
    DECLARE @d datetime2 = sysdatetime()  
    CREATE TABLE #t1 (id  int NOT NULL,  
                      pos int NOT NULL,  
                      ShipName nvarchar(60) NOT NULL,  
                      PRIMARY KEY (id, pos))  
    CREATE TABLE #t2 (id  int NOT NULL,  
                      pos int NOT NULL,  
                      ShipAddress nvarchar(60) NOT NULL,  
                      PRIMARY KEY (id, pos))  
      
    INSERT #t1(id, pos, ShipName)  
     SELECT t.id, [key] AS listpos, [value] AS nstr  
     from PipeData t  
     CROSS APPLY OPENJSON ('["' + REPLACE(t.ShipName,'|','","') + '"]') AS s  
      
    INSERT #t2(id, pos, ShipAddress)  
     SELECT t.id, [key] AS listpos, [value] AS nstr  
     from PipeData t  
     CROSS APPLY OPENJSON ('["' + REPLACE(t.ShipAddress,'|','","') + '"]') AS s  
      
    SELECT t1.id, t1.ShipName, t2.ShipAddress as str  
    INTO   #temp4  
    FROM   #t1 t1  
    JOIN   #t2 t2 ON t1.id = t2.id  
                 AND t1.pos = t2.pos  
      
    SELECT datediff(ms, @d, sysdatetime())  
    go -- finished in 20410  
    

  2. MelissaMa-MSFT 24,176 Reputation points
    2020-10-26T02:11:45.667+00:00

    Hi @SQL Whisper ,

    In your situation, a CLR function could be the best solution. You could take Adam's CLR function, compile the code to a DLL (using csc), and deploy just that file to the server.

    Besides, XML is also a good solution; you could refer to the other experts' queries.
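
    For completeness, here is a sketch of what an XML-based split could look like, using the column names from the original question. This is an assumption-laden sketch: it relies on the data containing no XML-special characters (such as & or <), which would need to be escaped first, and it uses the XQuery node-position trick to pair the two splits.

    ```sql
    -- Sketch of the XML approach (assumption: no XML-special characters
    -- such as & or < in the data; escape them first if there may be).
    declare @table table (sno int, sname varchar(200), sname1 varchar(200))

    Insert into @table (sno, sname, sname1)
    Values (1, 'Mark|James', 'Dallas|Houston')

    ;WITH x AS (
        -- Turn 'Mark|James' into the XML fragment <i>Mark</i><i>James</i>
        SELECT sno,
               CAST('<i>' + REPLACE(sname,  '|', '</i><i>') + '</i>' AS xml) AS xn,
               CAST('<i>' + REPLACE(sname1, '|', '</i><i>') + '</i>' AS xml) AS xa
        FROM @table
    )
    SELECT x.sno,
           n.i.value('.', 'varchar(200)') AS sname,
           a.i.value('.', 'varchar(200)') AS sname1
    FROM x
    CROSS APPLY x.xn.nodes('/i') AS n(i)
    CROSS APPLY x.xa.nodes('/i') AS a(i)
    -- Pair elements by position: count the preceding siblings of each node
    WHERE n.i.value('for $n in . return count(../*[. << $n]) + 1', 'int')
        = a.i.value('for $n in . return count(../*[. << $n]) + 1', 'int')
    ```

    The `count(../*[. << $n]) + 1` expression computes each node's ordinal, which is what makes the positional join between the two splits possible.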

    I also found a good function, DelimitedSplit8k, which could work for you:

     declare @table table (sno int,sname varchar(200),sname1 varchar(200))  
          
     Insert into @Table(sno,sname,sname1)  
     Values   
      (1,'1A','1a'),   
      (2,'2A|2B','2a|2b'),   
      (3,'3A|3B|3C','3a|3b|3c'),  
      (4,'4A|4B|4C|4D','4a|4b|4c|4d'),  
      (5,'5A|5B|5C|5D|5E','5a|5b|5c|5d|5e')  
      
    select a.sno,s.item sname,s1.item sname1 from @Table a  
    CROSS APPLY dbo.DelimitedSplit8k(a.sname,'|') s  
    CROSS APPLY dbo.DelimitedSplit8k(a.sname1, '|') AS s1  
    WHERE  s.itemnumber = s1.itemnumber  
    

    Output:

    sno	sname	sname1  
    1	1A	1a  
    2	2A	2a  
    2	2B	2b  
    3	3A	3a  
    3	3B	3b  
    3	3C	3c  
    4	4A	4a  
    4	4B	4b  
    4	4C	4c  
    4	4D	4d  
    5	5A	5a  
    5	5B	5b  
    5	5C	5c  
    5	5D	5d  
    5	5E	5e  
    

    Regarding the performance of the various methods, you could refer to Split strings the right way – or the next best way and choose the best one for you.

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


  3. Erland Sommarskog 101.1K Reputation points MVP
    2020-10-25T22:45:29.927+00:00

    Yet another solution. This one is based on a performance test I did, where I tried the solutions posted by Yitzhak, Ronen and me with a fairly large data set. I gave each of them ten minutes to run before I killed them. Looking at the plans, Ronen's is probably the one that would have completed first, but in how many hours, I don't know.

    While I was waiting, I wrote a new solution which completed in three minutes. With a faster string splitter the time could be reduced further. But the key is that the two strings need to be split separately, so that we can join over an index.

    SQLWhisper will have to accept that the names have changed: id = sno, ShipName = sname, ShipAddress = sname1.

    CREATE TABLE #t1 (id  int NOT NULL,  
                      pos int NOT NULL,  
                      ShipName nvarchar(60) NOT NULL,  
                      PRIMARY KEY (id, pos))  
    CREATE TABLE #t2 (id  int NOT NULL,  
                      pos int NOT NULL,  
                      ShipAddress nvarchar(60) NOT NULL,  
                      PRIMARY KEY (id, pos))  
      
    INSERT #t1(id, pos, ShipName)  
      SELECT t.id, s.listpos, s.nstr  
      FROM   PipeData t  
      CROSS  APPLY iter_charlist_to_tbl(t.ShipName, '|') AS s  
      
    INSERT #t2(id, pos, ShipAddress)  
      SELECT t.id, s.listpos, s.nstr  
      FROM   PipeData t  
      CROSS  APPLY iter_charlist_to_tbl(t.ShipAddress, '|') AS s  
      
    SELECT t1.id, t1.ShipName + ' ' + t2.ShipAddress as str  
    FROM   #t1 t1  
    JOIN   #t2 t2 ON t1.id = t2.id  
                 AND t1.pos = t2.pos  
    
     
    

    For the curious, my test script is here: 34739-pipedatasplt.txt. The BigDB database that I load the data from is on http://www.sommarskog.se/present/BigDB.bak. Warning! This is a 3GB download, and the full database size is 20 GB. SQL 2016 or later is needed. (This is a demo database that I had uploaded already.)
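
    A side note on why position-preserving splitters are needed here at all: the built-in STRING_SPLIT (available since SQL Server 2016) exposes no ordinal column on SQL Server 2017, so the order of its output rows is not guaranteed and two splits cannot be safely joined by position. A minimal illustration:

    ```sql
    -- STRING_SPLIT returns only a [value] column on SQL 2017; there is no
    -- documented row ordering, so pairing two such splits by position is
    -- unsafe -- hence splitters like iter_charlist_to_tbl or the OPENJSON
    -- trick, which return an explicit position.
    SELECT value FROM STRING_SPLIT('Mark|James', '|');
    ```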


  4. Erland Sommarskog 101.1K Reputation points MVP
    2020-10-25T08:32:08.973+00:00

    Here is a solution that does not use XML or JSON (which both can result in problems if the data includes characters with a special function in these formats):

    declare @Table table (sno int,sname varchar(200),sname1 varchar(200))
    
    Insert into @Table(sno,sname,sname1)
    Values (1,'Mark|James','Dallas|Houston')
    
    SELECT t.sno, s.str + ' ' + s1.str
    FROM   @Table t
    CROSS  APPLY iter_charlist_to_tbl(t.sname, '|') AS s
    CROSS  APPLY iter_charlist_to_tbl(t.sname1, '|') AS s1
    WHERE  s.listpos = s1.listpos
    

    You find the code for iter_charlist_to_tbl here: http://www.sommarskog.se/arrays-in-sql-2005.html#iter-list-of-strings
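
    The warning above about special characters applies to the OPENJSON trick used elsewhere in this thread. If you want to go that route anyway, STRING_ESCAPE (available since SQL Server 2016) can make the input JSON-safe before the pipes are turned into array separators. A hedged sketch, not part of the original answer:

    ```sql
    -- Escape JSON-special characters (", \, control characters) first, so
    -- values containing them do not break the constructed JSON array.
    -- STRING_ESCAPE does not touch the pipe, so REPLACE still works on it.
    DECLARE @s varchar(200) = 'Mark "Q"|Ja\mes';

    SELECT [key] AS listpos, [value] AS item
    FROM OPENJSON('["' + REPLACE(STRING_ESCAPE(@s, 'json'), '|', '","') + '"]');
    ```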


  5. Ronen Ariely 15,096 Reputation points
    2020-10-25T05:28:21.463+00:00

    Hi :-)

    According to your answer to my clarification question, the following should solve your need (please confirm).

    ----------

    First, I create a test table with more rows and options than the one you provided:

    DROP TABLE IF EXISTS T1  
    GO  
    CREATE table T1 (sno int,sname varchar(200),sname1 varchar(200))  
    GO  
      
    Insert into T1(sno,sname,sname1)  
     Values   
     (1,'1A','1a'), (1,'2A|2B','2a|2b'),   
     (1,'3A|3B|3C','3a|3b|3c'), (1,'4A|4B|4C|4D','4a|4b|4c|4d')  
    GO  
      
    SELECT * FROM T1  
    GO  
    

    34786-image.png

    And here is the solution

    ;With MyCTE0 as (  
     select sno, sname,sname1  
     ,s = '["' + Replace(sname, '|', '","') + '"]'  
     ,s1 = '["' + Replace(sname1, '|', '","') + '"]'  
     from T1  
    ),  
    MyCTE1 as (  
     select sno, s, s1  
     , k1 = t.[key], v1 = t.[value]  
     from MyCTE0  
     CROSS APPLY OPENJSON (s, N'$') t  
    )  
    SELECT sno, v1 , v2  
    FROM MyCTE1  
    CROSS APPLY (SELECT t1.[key] k2 , t1.[value] v2 FROM OPENJSON (s1, N'$') t1 where t1.[key] = MyCTE1.k1) t  
    GO  
    

    34757-image.png
