Rewrite T-SQL Script in Azure Synapse

Lawrence Nkwana 21 Reputation points
2021-09-20T06:48:16.623+00:00

How can I re-write the below scripts that removes duplicates in Azure Synapse?This script works fine in T-SQL but I want to run it in Azure Synapse.


ALTER PROCEDURE [dbo].[sp_DeleteDuplicates](
@schemaName VARCHAR(128),
@tableName VARCHAR(128),
@displayOnly BIT = 0)
AS
BEGIN
SET NOCOUNT ON;

IF LEN(@schemaName) = 0
BEGIN
    PRINT 'You must specify the schema of the table!'
    RETURN
END

IF LEN(@tableName) = 0
BEGIN
    PRINT 'You must specify the name of the table!'
    RETURN
END

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = @schemaName AND  TABLE_NAME = @tableName)
BEGIN
    DECLARE @pkColumnName  VARCHAR(128);
    DECLARE @columnName    VARCHAR(128);
    DECLARE @sqlCommand    VARCHAR(MAX);
    DECLARE @columnsList   VARCHAR(MAX);
    DECLARE @pkColumnsList VARCHAR(MAX);
    DECLARE @pkColumns     TABLE(pkColumn VARCHAR(128));
    DECLARE @limit         INT;

    INSERT INTO @pkColumns
    SELECT K.COLUMN_NAME
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS C
    JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS K ON C.TABLE_NAME = K.TABLE_NAME AND C.CONSTRAINT_SCHEMA = K.CONSTRAINT_SCHEMA
    WHERE C.CONSTRAINT_TYPE = 'PRIMARY KEY'
      AND C.CONSTRAINT_SCHEMA = @schemaName AND C.TABLE_NAME = @tableName

    IF((SELECT COUNT(*) FROM @pkColumns) > 0)
    BEGIN
        DECLARE pk_cursor CURSOR FOR 
        SELECT * FROM @pkColumns

        OPEN pk_cursor  
        FETCH NEXT FROM pk_cursor INTO @pkColumnName 

        WHILE @@FETCH_STATUS = 0  
        BEGIN  
            SET @pkColumnsList = CONCAT(@pkColumnsList,'',@pkColumnName,',')
            FETCH NEXT FROM pk_cursor INTO @pkColumnName 
        END 

        CLOSE pk_cursor  
        DEALLOCATE pk_cursor 

        SET @pkColumnsList = SUBSTRING(@pkColumnsList,1,LEN(@pkColumnsList)-1)
    END  

    DECLARE columns_cursor CURSOR FOR 
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = @schemaName AND TABLE_NAME = @tableName AND COLUMN_NAME NOT IN (SELECT pkColumn FROM @pkColumns)
    ORDER BY ORDINAL_POSITION;

    OPEN columns_cursor  
    FETCH NEXT FROM columns_cursor INTO @columnName 

    WHILE @@FETCH_STATUS = 0  
    BEGIN  
        SET @columnsList = CONCAT(@columnsList,'',@columnName,',')
        FETCH NEXT FROM columns_cursor INTO @columnName 
    END 

    CLOSE columns_cursor  
    DEALLOCATE columns_cursor 

    SET @columnsList = SUBSTRING(@columnsList,1,LEN(@columnsList)-1)

    IF((SELECT COUNT(*) FROM @pkColumns) > 0)
    BEGIN       

    IF(CHARINDEX(',',@columnsList) = 0)
    SET @limit = LEN(@columnsList)+1
    ELSE
    SET @limit = CHARINDEX(',',@columnsList)


    SET @sqlCommand = CONCAT('WITH CTE (',@columnsList,',DuplicateCount',') 
                                AS (SELECT ',@columnsList,',',
                                             'ROW_NUMBER() OVER(PARTITION BY ',@columnsList,' ',
                                             'ORDER BY ',SUBSTRING(@columnsList,1,@limit-1),') AS DuplicateCount
                                    FROM [',@schemaName,'].[',@tableName,'])

                              ')
    IF @displayOnly = 0
    SET @sqlCommand = CONCAT(@sqlCommand,'DELETE FROM CTE WHERE DuplicateCount > 1;')
    IF @displayOnly = 1
    SET @sqlCommand = CONCAT(@sqlCommand,'SELECT ',@columnsList,',MAX(DuplicateCount) AS DuplicateCount FROM CTE WHERE DuplicateCount > 1 GROUP BY ',@columnsList)

    END
    ELSE
    BEGIN       
    SET @sqlCommand = CONCAT('WITH CTE (',@columnsList,',DuplicateCount',') 
                                AS (SELECT ',@columnsList,',',
                                             'ROW_NUMBER() OVER(PARTITION BY ',@columnsList,' ',
                                             'ORDER BY ',SUBSTRING(@columnsList,1,CHARINDEX(',',@columnsList)-1),') AS DuplicateCount
                                    FROM [',@schemaName,'].[',@tableName,'])

                             ')

    IF @displayOnly = 0
    SET @sqlCommand = CONCAT(@sqlCommand,'DELETE FROM CTE WHERE DuplicateCount > 1;')
    IF @displayOnly = 1
    SET @sqlCommand = CONCAT(@sqlCommand,'SELECT * FROM CTE WHERE DuplicateCount > 1;')

    END

    EXEC (@sqlCommand)
END
ELSE
    BEGIN
        PRINT 'Table doesnt exist within this database!'
        RETURN
    END

END

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,365 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Saurabh Sharma 23,676 Reputation points Microsoft Employee
    2021-09-20T23:46:36.86+00:00

    Hi @Lawrence Nkwana ,

    Thanks for using Microsoft Q&A !!

    As Table variables and CURSOR is not supported in Azure Synapse you can rewrite the above code by -

    1. Create a Temp Table instead of using a Table variable
    2. Replace CURSOR with WHILE Loop

    So, you could write your code something like below -

    DECLARE @schemaName VARCHAR(128),@tableName  VARCHAR(128),@displayOnly BIT = 0  
    SET @schemaName = 'dbo'  
    SET @tableName  = 't1'  
      
    IF LEN(@schemaName) = 0  
    BEGIN  
         PRINT 'You must specify the schema of the table!'  
         RETURN  
    END  
    IF LEN(@tableName) = 0  
    BEGIN  
         PRINT 'You must specify the name of the table!'  
         RETURN  
    END  
    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = @schemaName AND  TABLE_NAME = @tableName)  
    BEGIN  
     DECLARE @pkColumnName  VARCHAR(128);  
     DECLARE @columnName    VARCHAR(128);  
     DECLARE @sqlCommand    VARCHAR(MAX);  
     DECLARE @columnsList   VARCHAR(MAX);  
     DECLARE @pkColumnsList VARCHAR(MAX);  
    
     IF OBJECT_ID('tempdb..#pkColumns') IS NOT NULL  
     BEGIN  
     DROP TABLE #pkColumns  
     END  
      
     CREATE TABLE #pkColumns   
     (  
     pkColumn VARCHAR(128)  
     )  
      
     IF OBJECT_ID('tempdb..#pkColumnsClone') IS NOT NULL  
     BEGIN  
     DROP TABLE #pkColumnsClone  
     END  
     CREATE TABLE #pkColumnsClone   
     (  
     pkColumn VARCHAR(128)  
     )  
      
     IF OBJECT_ID('tempdb..#RemainingColumns') IS NOT NULL  
     BEGIN  
     DROP TABLE #RemainingColumns  
     END  
      
     CREATE TABLE #RemainingColumns   
     (  
      ColumnName VARCHAR(128)  
     )  
      
     DECLARE @limit         INT;  
              
     INSERT INTO #pkColumns  
     SELECT K.COLUMN_NAME  
     FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS C  
     JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS K ON C.TABLE_NAME = K.TABLE_NAME AND C.CONSTRAINT_SCHEMA = K.CONSTRAINT_SCHEMA  
     WHERE C.CONSTRAINT_TYPE = 'PRIMARY KEY'  
     AND C.CONSTRAINT_SCHEMA = @schemaName AND C.TABLE_NAME = @tableName  
      
     INSERT INTO #pkColumnsClone  
     SELECT * FROM #pkColumns  
      
    
     DECLARE @pkColumnsCount INT = (SELECT COUNT(*) FROM #pkColumns), @i INT = 1  
      
     IF((SELECT COUNT(*) FROM #pkColumnsClone) > 0)  
     BEGIN  
     WHILE   @i <= @pkColumnsCount  
     BEGIN  
     SELECT TOP 1 @pkColumnName = pkColumn FROM #pkColumnsClone ORDER BY pkColumn  
     SET @pkColumnsList = CONCAT(@pkColumnsList,'',@pkColumnName,',')  
     SET     @i +=1;  
     DELETE FROM #pkColumnsClone WHERE pkColumn = @pkColumnName  
     END  
     SET @pkColumnsList = SUBSTRING(@pkColumnsList,1,LEN(@pkColumnsList)-1)  
     END    
      
     SELECT @pkColumnsList  
      
     INSERT INTO #RemainingColumns  
     SELECT TOP 100000 COLUMN_NAME AS ColumnName  
     FROM INFORMATION_SCHEMA.COLUMNS  
     WHERE TABLE_SCHEMA = @schemaName AND TABLE_NAME = @tableName AND COLUMN_NAME NOT IN (SELECT pkColumn FROM #pkColumns)  
     ORDER BY ORDINAL_POSITION;  
         
      
     DECLARE @ColumnsCount INT = (SELECT COUNT(*) FROM #RemainingColumns)  
     SET @i = 1  
      
     IF((SELECT COUNT(*) FROM #RemainingColumns) > 0)  
     BEGIN  
     WHILE   @i <= @ColumnsCount  
     BEGIN  
     SELECT TOP 1 @ColumnName = ColumnName FROM #RemainingColumns ORDER BY ColumnName  
     SET @columnsList = CONCAT(@columnsList,'',@columnName,',')  
     SET     @i +=1;  
     DELETE FROM #RemainingColumns WHERE ColumnName = @columnName  
     END  
     SET @columnsList = SUBSTRING(@columnsList,1,LEN(@columnsList)-1)  
     END    
      
     SELECT @columnsList   
      
     IF(@pkColumnsCount > 0)  
     BEGIN          
      IF(CHARINDEX(',',@columnsList) = 0)  
     SET @limit = LEN(@columnsList)+1  
      ELSE  
     SET @limit = CHARINDEX(',',@columnsList)  
        
      SET @sqlCommand = CONCAT('WITH CTE (',@columnsList,',DuplicateCount',')   
      AS (SELECT ',@columnsList,',',  
       'ROW_NUMBER() OVER(PARTITION BY ',@columnsList,' ',  
       'ORDER BY ',SUBSTRING(@columnsList,1,@limit-1),') AS DuplicateCount  
      FROM [',@schemaName,'].[',@tableName,'])  
                                              
        ')  
      
     IF @displayOnly = 0  
     SET @sqlCommand = CONCAT(@sqlCommand,'DELETE FROM CTE WHERE DuplicateCount > 1;')  
     IF @displayOnly = 1  
     SET @sqlCommand = CONCAT(@sqlCommand,'SELECT ',@columnsList,',MAX(DuplicateCount) AS DuplicateCount FROM CTE WHERE DuplicateCount > 1 GROUP BY ',@columnsList)  
     END  
     ELSE  
     BEGIN          
      SET @sqlCommand = CONCAT('WITH CTE (',@columnsList,',DuplicateCount',')   
      AS (SELECT ',@columnsList,',',  
       'ROW_NUMBER() OVER(PARTITION BY ',@columnsList,' ',  
       'ORDER BY ',SUBSTRING(@columnsList,1,CHARINDEX(',',@columnsList)-1),') AS DuplicateCount  
      FROM [',@schemaName,'].[',@tableName,'])  
                                              
       ')  
      IF @displayOnly = 0  
      SET @sqlCommand = CONCAT(@sqlCommand,'DELETE FROM CTE WHERE DuplicateCount > 1;')  
      IF @displayOnly = 1  
      SET @sqlCommand = CONCAT(@sqlCommand,'SELECT * FROM CTE WHERE DuplicateCount > 1;')  
     END  
     EXEC (@sqlCommand)  
    END  
    ELSE  
    BEGIN  
     PRINT 'Table doesnt exist within this database!'           
    END  
    

    I have done basic testing of this code and it works fine. You may need to work on testing this more and could change it accordingly.
    Also, please note that CTE queries are not supported on non-hash distributed tables, so you may need to change the CTE logic in the above for tables created with non-hash distributions like ROUND_ROBIN .

    Also, adding below gif to show you the output on a Synapse SQL Pool table.

    133755-removeduplicatessynapse.gif
    Please let me know if you have any questions.

    Thanks
    Saurabh

    ----------

    Please do not forget to "Accept the answer" wherever the information provided helps you to help others in the community.