Split pipe delimited columns in TSQL

SQL Whisper 1 Reputation point
2020-10-25T02:00:56.887+00:00

What's the best way to split pipe-delimited columns in SQL Server 2017?

Input:

declare @table table (sno int,sname varchar(200),sname1 varchar(200))  
  
Insert into @Table(sno,sname,sname1)  
Values (1,'Mark|James','Dallas|Houston')  

Expected Output:

1 Mark Dallas
1 James Houston

Thank you.

Transact-SQL

7 answers

  1. Ronen Ariely 15,096 Reputation points
    2020-10-26T03:38:18.067+00:00

    Hi @Erland Sommarskog

    The only reason your query is fast is that you used a temporary table; it has nothing to do with your solution (your function).

    With a table of this size and this amount of data, it is simply better to use two temporary tables and then JOIN them.

    I executed your query using your data, but since my laptop is weak, I only filled the PipeData table with 10,000 rows, using your query with TOP 10000:

    CREATE DATABASE [ErlandTest]  
    	CONTAINMENT = NONE  
    	ON  PRIMARY ( NAME = N'ErlandTest', FILENAME = N'E:\SQL_Files\ErlandTest.mdf' , SIZE = 8192KB , FILEGROWTH = 65536KB )  
    	LOG ON ( NAME = N'ErlandTest_log', FILENAME = N'E:\SQL_Files\ErlandTest_log.ldf' , SIZE = 8192KB , FILEGROWTH = 65536KB )  
    GO  
      
    USE [ErlandTest]  
    go  
      
    CREATE TABLE PipeData (id          int           NOT NULL,   
                           ShipName    nvarchar(MAX) NOT NULL,   
                           ShipAddress nvarchar(MAX) NOT NULL,  
                           CONSTRAINT pk_PipeData PRIMARY KEY (id)  
    )  
    go  
    
    -- THE ONLY DIFFERENCE IS THAT I ADDED "TOP 10000"  
    INSERT PipeData(id, ShipName, ShipAddress)  
       SELECT top 10000 CustomerID, string_agg(convert(nvarchar(MAX), ShipName), '|') WITHIN GROUP (ORDER BY OrderID),   
                          string_agg(convert(nvarchar(MAX), ShipAddress), '|') WITHIN GROUP (ORDER BY OrderID)  
       FROM   BigDB..BigOrders  
       GROUP  BY CustomerID  
    go  
    

    ----------

    Next, I tested your solution on these 10k rows and it took more than 5 minutes.

    I tested my solution (meaning using OPENJSON) using temporary tables, and guess what?!

    As I expected, my solution finished in 20 seconds!

    Assuming I did not make a mistake in my test (and this is the result I expected), the difference is 20 seconds vs. 5 minutes.

    So here is my query using temp tables:

    DROP TABLE IF EXISTS #t1  
    DROP TABLE IF EXISTS #t2  
    DROP TABLE IF EXISTS #temp4  
    go  
      
      
    DECLARE @d datetime2 = sysdatetime()  
    CREATE TABLE #t1 (id  int NOT NULL,  
                      pos int NOT NULL,  
                      ShipName nvarchar(60) NOT NULL,  
                      PRIMARY KEY (id, pos))  
    CREATE TABLE #t2 (id  int NOT NULL,  
                      pos int NOT NULL,  
                      ShipAddress nvarchar(60) NOT NULL,  
                      PRIMARY KEY (id, pos))  
      
    INSERT #t1(id, pos, ShipName)  
     SELECT t.id, [key] AS listpos, [value] AS nstr  
     from PipeData t  
     CROSS APPLY OPENJSON ('["' + REPLACE(t.ShipName,'|','","') + '"]') AS s  
      
    INSERT #t2(id, pos, ShipAddress)  
     SELECT t.id, [key] AS listpos, [value] AS nstr  
     from PipeData t  
     CROSS APPLY OPENJSON ('["' + REPLACE(t.ShipAddress,'|','","') + '"]') AS s  
      
    SELECT t1.id, t1.ShipName, t2.ShipAddress as str  
    INTO   #temp4  
    FROM   #t1 t1  
    JOIN   #t2 t2 ON t1.id = t2.id  
                 AND t1.pos = t2.pos  
      
    SELECT datediff(ms, @d, sysdatetime())  
    go -- finished in 20410  
    

  2. MelissaMa-MSFT 24,176 Reputation points
    2020-10-26T02:11:45.667+00:00

    Hi @SQL Whisper ,

    In your situation, a CLR function could be the best solution. You could take Adam's CLR function, compile the code to a DLL (using csc), and deploy just that file to the server.

    Besides, XML is also a good solution; you could refer to the other experts' queries.
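
    For completeness, here is a sketch of what an XML-based split could look like, using the column names from the original question. This is an assumption-laden sketch: it relies on the data containing no XML-special characters (such as & or <), which would need to be escaped first, and it uses the XQuery node-position trick to pair the two splits.

    ```sql
    -- Sketch of the XML approach (assumption: no XML-special characters
    -- such as & or < in the data; escape them first if there may be).
    declare @table table (sno int, sname varchar(200), sname1 varchar(200))

    Insert into @table (sno, sname, sname1)
    Values (1, 'Mark|James', 'Dallas|Houston')

    ;WITH x AS (
        -- Turn 'Mark|James' into the XML fragment <i>Mark</i><i>James</i>
        SELECT sno,
               CAST('<i>' + REPLACE(sname,  '|', '</i><i>') + '</i>' AS xml) AS xn,
               CAST('<i>' + REPLACE(sname1, '|', '</i><i>') + '</i>' AS xml) AS xa
        FROM @table
    )
    SELECT x.sno,
           n.i.value('.', 'varchar(200)') AS sname,
           a.i.value('.', 'varchar(200)') AS sname1
    FROM x
    CROSS APPLY x.xn.nodes('/i') AS n(i)
    CROSS APPLY x.xa.nodes('/i') AS a(i)
    -- Pair elements by position: count the preceding siblings of each node
    WHERE n.i.value('for $n in . return count(../*[. << $n]) + 1', 'int')
        = a.i.value('for $n in . return count(../*[. << $n]) + 1', 'int')
    ```

    The `count(../*[. << $n]) + 1` expression computes each node's ordinal, which is what makes the positional join between the two splits possible.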

    I also found a good function, DelimitedSplit8k, which could work for you:

     declare @table table (sno int,sname varchar(200),sname1 varchar(200))  
          
     Insert into @Table(sno,sname,sname1)  
     Values   
      (1,'1A','1a'),   
      (2,'2A|2B','2a|2b'),   
      (3,'3A|3B|3C','3a|3b|3c'),  
      (4,'4A|4B|4C|4D','4a|4b|4c|4d'),  
      (5,'5A|5B|5C|5D|5E','5a|5b|5c|5d|5e')  
      
    select a.sno,s.item sname,s1.item sname1 from @Table a  
    CROSS APPLY dbo.DelimitedSplit8k(a.sname,'|') s  
    CROSS APPLY dbo.DelimitedSplit8k(a.sname1, '|') AS s1  
    WHERE  s.itemnumber = s1.itemnumber  
    

    Output:

    sno	sname	sname1  
    1	1A	1a  
    2	2A	2a  
    2	2B	2b  
    3	3A	3a  
    3	3B	3b  
    3	3C	3c  
    4	4A	4a  
    4	4B	4b  
    4	4C	4c  
    4	4D	4d  
    5	5A	5a  
    5	5B	5b  
    5	5C	5c  
    5	5D	5d  
    5	5E	5e  
    

    Regarding the performance of the various methods, you could refer to Split strings the right way – or the next best way and choose the best one for you.

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


  3. Erland Sommarskog 101.1K Reputation points MVP
    2020-10-25T22:45:29.927+00:00

    Yet another solution. This one is based on a performance test I did, where I tried the solutions posted by Yitzhak, Ronen and me with a fairly large data set. I gave each of them ten minutes to run before I killed them. Looking at the plans, Ronen's is probably the one that would have completed first, but in how many hours, I don't know.

    While I was waiting, I wrote a new solution which completed in three minutes. With a faster string splitter the time could be reduced further. But the key is that the two strings need to be split separately, so that we can join over an index.

    SQLWhisper will have to accept that the names have changed: id = sno, ShipName = sname, ShipAddress = sname1.

    CREATE TABLE #t1 (id  int NOT NULL,  
                      pos int NOT NULL,  
                      ShipName nvarchar(60) NOT NULL,  
                      PRIMARY KEY (id, pos))  
    CREATE TABLE #t2 (id  int NOT NULL,  
                      pos int NOT NULL,  
                      ShipAddress nvarchar(60) NOT NULL,  
                      PRIMARY KEY (id, pos))  
      
    INSERT #t1(id, pos, ShipName)  
      SELECT t.id, s.listpos, s.nstr  
      FROM   PipeData t  
      CROSS  APPLY iter_charlist_to_tbl(t.ShipName, '|') AS s  
      
    INSERT #t2(id, pos, ShipAddress)  
      SELECT t.id, s.listpos, s.nstr  
      FROM   PipeData t  
      CROSS  APPLY iter_charlist_to_tbl(t.ShipAddress, '|') AS s  
      
    SELECT t1.id, t1.ShipName + ' ' + t2.ShipAddress as str  
    FROM   #t1 t1  
    JOIN   #t2 t2 ON t1.id = t2.id  
                 AND t1.pos = t2.pos  
    
     
    

    For the curious, my test script is here: 34739-pipedatasplt.txt. The BigDB database that I load the data from is on http://www.sommarskog.se/present/BigDB.bak. Warning! This is a 3GB download, and the full database size is 20 GB. SQL 2016 or later is needed. (This is a demo database that I had uploaded already.)
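
    A side note on why position-preserving splitters are needed here at all: the built-in STRING_SPLIT (available since SQL Server 2016) exposes no ordinal column on SQL Server 2017, so the order of its output rows is not guaranteed and two splits cannot be safely joined by position. A minimal illustration:

    ```sql
    -- STRING_SPLIT returns only a [value] column on SQL 2017; there is no
    -- documented row ordering, so pairing two such splits by position is
    -- unsafe -- hence splitters like iter_charlist_to_tbl or the OPENJSON
    -- trick, which return an explicit position.
    SELECT value FROM STRING_SPLIT('Mark|James', '|');
    ```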


  4. Erland Sommarskog 101.1K Reputation points MVP
    2020-10-25T08:32:08.973+00:00

    Here is a solution that does not use XML or JSON (which both can result in problems if the data includes characters with a special function in these formats):

    declare @Table table (sno int,sname varchar(200),sname1 varchar(200))
    
    Insert into @Table(sno,sname,sname1)
    Values (1,'Mark|James','Dallas|Houston')
    
    SELECT t.sno, s.str + ' ' + s1.str
    FROM   @Table t
    CROSS  APPLY iter_charlist_to_tbl(t.sname, '|') AS s
    CROSS  APPLY iter_charlist_to_tbl(t.sname1, '|') AS s1
    WHERE  s.listpos = s1.listpos
    

    You find the code for iter_charlist_to_tbl here: http://www.sommarskog.se/arrays-in-sql-2005.html#iter-list-of-strings
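
    The warning above about special characters applies to the OPENJSON trick used elsewhere in this thread. If you want to go that route anyway, STRING_ESCAPE (available since SQL Server 2016) can make the input JSON-safe before the pipes are turned into array separators. A hedged sketch, not part of the original answer:

    ```sql
    -- Escape JSON-special characters (", \, control characters) first, so
    -- values containing them do not break the constructed JSON array.
    -- STRING_ESCAPE does not touch the pipe, so REPLACE still works on it.
    DECLARE @s varchar(200) = 'Mark "Q"|Ja\mes';

    SELECT [key] AS listpos, [value] AS item
    FROM OPENJSON('["' + REPLACE(STRING_ESCAPE(@s, 'json'), '|', '","') + '"]');
    ```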


  5. Ronen Ariely 15,096 Reputation points
    2020-10-25T05:28:21.463+00:00

    Hi :-)

    According to your answer to my clarification question, the following should solve your need (please confirm).

    ----------

    First, I create a test table with more rows and options than the one you provided:

    DROP TABLE IF EXISTS T1  
    GO  
    CREATE table T1 (sno int,sname varchar(200),sname1 varchar(200))  
    GO  
      
    Insert into T1(sno,sname,sname1)  
     Values   
     (1,'1A','1a'), (1,'2A|2B','2a|2b'),   
     (1,'3A|3B|3C','3a|3b|3c'), (1,'4A|4B|4C|4D','4a|4b|4c|4d')  
    GO  
      
    SELECT * FROM T1  
    GO  
    

    34786-image.png

    And here is the solution

    ;With MyCTE0 as (  
     select sno, sname,sname1  
     ,s = '["' + Replace(sname, '|', '","') + '"]'  
     ,s1 = '["' + Replace(sname1, '|', '","') + '"]'  
     from T1  
    ),  
    MyCTE1 as (  
     select sno, s, s1  
     , k1 = t.[key], v1 = t.[value]  
     from MyCTE0  
     CROSS APPLY OPENJSON (s, N'$') t  
    )  
    SELECT sno, v1 , v2  
    FROM MyCTE1  
    CROSS APPLY (SELECT t1.[key] k2 , t1.[value] v2 FROM OPENJSON (s1, N'$') t1 where t1.[key] = MyCTE1.k1) t  
    GO  
    

    34757-image.png
