question

LawrenceNkwana-9127 avatar image
0 Votes"
LawrenceNkwana-9127 asked SaurabhSharma-msft commented

Rewrite T-SQL Script in Azure Synapse

How can I re-write the below scripts that removes duplicates in Azure Synapse?This script works fine in T-SQL but I want to run it in Azure Synapse.



ALTER PROCEDURE [dbo].[sp_DeleteDuplicates](
@schemaName VARCHAR(128),
@tableName VARCHAR(128),
@displayOnly BIT = 0)
AS
BEGIN
SET NOCOUNT ON;

 IF LEN(@schemaName) = 0
 BEGIN
     PRINT 'You must specify the schema of the table!'
     RETURN
 END

 IF LEN(@tableName) = 0
 BEGIN
     PRINT 'You must specify the name of the table!'
     RETURN
 END

 IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = @schemaName AND  TABLE_NAME = @tableName)
 BEGIN
     DECLARE @pkColumnName  VARCHAR(128);
     DECLARE @columnName    VARCHAR(128);
     DECLARE @sqlCommand    VARCHAR(MAX);
     DECLARE @columnsList   VARCHAR(MAX);
     DECLARE @pkColumnsList VARCHAR(MAX);
     DECLARE @pkColumns     TABLE(pkColumn VARCHAR(128));
     DECLARE @limit         INT;
        
     INSERT INTO @pkColumns
     SELECT K.COLUMN_NAME
     FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS C
     JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS K ON C.TABLE_NAME = K.TABLE_NAME AND C.CONSTRAINT_SCHEMA = K.CONSTRAINT_SCHEMA
     WHERE C.CONSTRAINT_TYPE = 'PRIMARY KEY'
       AND C.CONSTRAINT_SCHEMA = @schemaName AND C.TABLE_NAME = @tableName

     IF((SELECT COUNT(*) FROM @pkColumns) > 0)
     BEGIN
         DECLARE pk_cursor CURSOR FOR 
         SELECT * FROM @pkColumns
    
         OPEN pk_cursor  
         FETCH NEXT FROM pk_cursor INTO @pkColumnName 
        
         WHILE @@FETCH_STATUS = 0  
         BEGIN  
             SET @pkColumnsList = CONCAT(@pkColumnsList,'',@pkColumnName,',')
             FETCH NEXT FROM pk_cursor INTO @pkColumnName 
         END 

         CLOSE pk_cursor  
         DEALLOCATE pk_cursor 

         SET @pkColumnsList = SUBSTRING(@pkColumnsList,1,LEN(@pkColumnsList)-1)
     END  
        
     DECLARE columns_cursor CURSOR FOR 
     SELECT COLUMN_NAME
     FROM INFORMATION_SCHEMA.COLUMNS
     WHERE TABLE_SCHEMA = @schemaName AND TABLE_NAME = @tableName AND COLUMN_NAME NOT IN (SELECT pkColumn FROM @pkColumns)
     ORDER BY ORDINAL_POSITION;

     OPEN columns_cursor  
     FETCH NEXT FROM columns_cursor INTO @columnName 
        
     WHILE @@FETCH_STATUS = 0  
     BEGIN  
         SET @columnsList = CONCAT(@columnsList,'',@columnName,',')
         FETCH NEXT FROM columns_cursor INTO @columnName 
     END 

     CLOSE columns_cursor  
     DEALLOCATE columns_cursor 
        
     SET @columnsList = SUBSTRING(@columnsList,1,LEN(@columnsList)-1)

     IF((SELECT COUNT(*) FROM @pkColumns) > 0)
     BEGIN        

     IF(CHARINDEX(',',@columnsList) = 0)
     SET @limit = LEN(@columnsList)+1
     ELSE
     SET @limit = CHARINDEX(',',@columnsList)

        
     SET @sqlCommand = CONCAT('WITH CTE (',@columnsList,',DuplicateCount',') 
                                 AS (SELECT ',@columnsList,',',
                                              'ROW_NUMBER() OVER(PARTITION BY ',@columnsList,' ',
                                              'ORDER BY ',SUBSTRING(@columnsList,1,@limit-1),') AS DuplicateCount
                                     FROM [',@schemaName,'].[',@tableName,'])
                                        
                               ')
     IF @displayOnly = 0
     SET @sqlCommand = CONCAT(@sqlCommand,'DELETE FROM CTE WHERE DuplicateCount > 1;')
     IF @displayOnly = 1
     SET @sqlCommand = CONCAT(@sqlCommand,'SELECT ',@columnsList,',MAX(DuplicateCount) AS DuplicateCount FROM CTE WHERE DuplicateCount > 1 GROUP BY ',@columnsList)

     END
     ELSE
     BEGIN        
     SET @sqlCommand = CONCAT('WITH CTE (',@columnsList,',DuplicateCount',') 
                                 AS (SELECT ',@columnsList,',',
                                              'ROW_NUMBER() OVER(PARTITION BY ',@columnsList,' ',
                                              'ORDER BY ',SUBSTRING(@columnsList,1,CHARINDEX(',',@columnsList)-1),') AS DuplicateCount
                                     FROM [',@schemaName,'].[',@tableName,'])
                                        
                              ')

     IF @displayOnly = 0
     SET @sqlCommand = CONCAT(@sqlCommand,'DELETE FROM CTE WHERE DuplicateCount > 1;')
     IF @displayOnly = 1
     SET @sqlCommand = CONCAT(@sqlCommand,'SELECT * FROM CTE WHERE DuplicateCount > 1;')

     END
        
     EXEC (@sqlCommand)
 END
 ELSE
     BEGIN
         PRINT 'Table doesnt exist within this database!'
         RETURN
     END

END

azure-synapse-analytics
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

1 Answer

SaurabhSharma-msft avatar image
0 Votes"
SaurabhSharma-msft answered SaurabhSharma-msft commented

Hi @LawrenceNkwana-9127,

Thanks for using Microsoft Q&A !!

As Table variables and CURSOR is not supported in Azure Synapse you can rewrite the above code by -
1. Create a Temp Table instead of using a Table variable
2. Replace CURSOR with WHILE Loop

So, you could write your code something like below -

 DECLARE @schemaName VARCHAR(128),@tableName  VARCHAR(128),@displayOnly BIT = 0
 SET @schemaName = 'dbo'
 SET @tableName  = 't1'
    
 IF LEN(@schemaName) = 0
 BEGIN
      PRINT 'You must specify the schema of the table!'
      RETURN
 END
 IF LEN(@tableName) = 0
 BEGIN
      PRINT 'You must specify the name of the table!'
      RETURN
 END
 IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = @schemaName AND  TABLE_NAME = @tableName)
 BEGIN
  DECLARE @pkColumnName  VARCHAR(128);
  DECLARE @columnName    VARCHAR(128);
  DECLARE @sqlCommand    VARCHAR(MAX);
  DECLARE @columnsList   VARCHAR(MAX);
  DECLARE @pkColumnsList VARCHAR(MAX);

  IF OBJECT_ID('tempdb..#pkColumns') IS NOT NULL
  BEGIN
  DROP TABLE #pkColumns
  END
    
  CREATE TABLE #pkColumns 
  (
  pkColumn VARCHAR(128)
  )
    
  IF OBJECT_ID('tempdb..#pkColumnsClone') IS NOT NULL
  BEGIN
  DROP TABLE #pkColumnsClone
  END
  CREATE TABLE #pkColumnsClone 
  (
  pkColumn VARCHAR(128)
  )
    
  IF OBJECT_ID('tempdb..#RemainingColumns') IS NOT NULL
  BEGIN
  DROP TABLE #RemainingColumns
  END
    
  CREATE TABLE #RemainingColumns 
  (
   ColumnName VARCHAR(128)
  )
    
  DECLARE @limit         INT;
            
  INSERT INTO #pkColumns
  SELECT K.COLUMN_NAME
  FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS C
  JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS K ON C.TABLE_NAME = K.TABLE_NAME AND C.CONSTRAINT_SCHEMA = K.CONSTRAINT_SCHEMA
  WHERE C.CONSTRAINT_TYPE = 'PRIMARY KEY'
  AND C.CONSTRAINT_SCHEMA = @schemaName AND C.TABLE_NAME = @tableName
    
  INSERT INTO #pkColumnsClone
  SELECT * FROM #pkColumns
    

  DECLARE @pkColumnsCount INT = (SELECT COUNT(*) FROM #pkColumns), @i INT = 1
    
  IF((SELECT COUNT(*) FROM #pkColumnsClone) > 0)
  BEGIN
  WHILE   @i <= @pkColumnsCount
  BEGIN
  SELECT TOP 1 @pkColumnName = pkColumn FROM #pkColumnsClone ORDER BY pkColumn
  SET @pkColumnsList = CONCAT(@pkColumnsList,'',@pkColumnName,',')
  SET     @i +=1;
  DELETE FROM #pkColumnsClone WHERE pkColumn = @pkColumnName
  END
  SET @pkColumnsList = SUBSTRING(@pkColumnsList,1,LEN(@pkColumnsList)-1)
  END  
    
  SELECT @pkColumnsList
    
  INSERT INTO #RemainingColumns
  SELECT TOP 100000 COLUMN_NAME AS ColumnName
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_SCHEMA = @schemaName AND TABLE_NAME = @tableName AND COLUMN_NAME NOT IN (SELECT pkColumn FROM #pkColumns)
  ORDER BY ORDINAL_POSITION;
       
    
  DECLARE @ColumnsCount INT = (SELECT COUNT(*) FROM #RemainingColumns)
  SET @i = 1
    
  IF((SELECT COUNT(*) FROM #RemainingColumns) > 0)
  BEGIN
  WHILE   @i <= @ColumnsCount
  BEGIN
  SELECT TOP 1 @ColumnName = ColumnName FROM #RemainingColumns ORDER BY ColumnName
  SET @columnsList = CONCAT(@columnsList,'',@columnName,',')
  SET     @i +=1;
  DELETE FROM #RemainingColumns WHERE ColumnName = @columnName
  END
  SET @columnsList = SUBSTRING(@columnsList,1,LEN(@columnsList)-1)
  END  
    
  SELECT @columnsList 
    
  IF(@pkColumnsCount > 0)
  BEGIN        
   IF(CHARINDEX(',',@columnsList) = 0)
  SET @limit = LEN(@columnsList)+1
   ELSE
  SET @limit = CHARINDEX(',',@columnsList)
      
   SET @sqlCommand = CONCAT('WITH CTE (',@columnsList,',DuplicateCount',') 
   AS (SELECT ',@columnsList,',',
    'ROW_NUMBER() OVER(PARTITION BY ',@columnsList,' ',
    'ORDER BY ',SUBSTRING(@columnsList,1,@limit-1),') AS DuplicateCount
   FROM [',@schemaName,'].[',@tableName,'])
                                            
     ')
    
  IF @displayOnly = 0
  SET @sqlCommand = CONCAT(@sqlCommand,'DELETE FROM CTE WHERE DuplicateCount > 1;')
  IF @displayOnly = 1
  SET @sqlCommand = CONCAT(@sqlCommand,'SELECT ',@columnsList,',MAX(DuplicateCount) AS DuplicateCount FROM CTE WHERE DuplicateCount > 1 GROUP BY ',@columnsList)
  END
  ELSE
  BEGIN        
   SET @sqlCommand = CONCAT('WITH CTE (',@columnsList,',DuplicateCount',') 
   AS (SELECT ',@columnsList,',',
    'ROW_NUMBER() OVER(PARTITION BY ',@columnsList,' ',
    'ORDER BY ',SUBSTRING(@columnsList,1,CHARINDEX(',',@columnsList)-1),') AS DuplicateCount
   FROM [',@schemaName,'].[',@tableName,'])
                                            
    ')
   IF @displayOnly = 0
   SET @sqlCommand = CONCAT(@sqlCommand,'DELETE FROM CTE WHERE DuplicateCount > 1;')
   IF @displayOnly = 1
   SET @sqlCommand = CONCAT(@sqlCommand,'SELECT * FROM CTE WHERE DuplicateCount > 1;')
  END
  EXEC (@sqlCommand)
 END
 ELSE
 BEGIN
  PRINT 'Table doesnt exist within this database!'         
 END

I have done basic testing of this code and it works fine. You may need to work on testing this more and could change it accordingly.
Also, please note that CTE queries are not supported on non-hash distributed tables, so you may need to change the CTE logic in the above for tables created with non-hash distributions like ROUND_ROBIN .

Also, adding below gif to show you the output on a Synapse SQL Pool table.

133755-removeduplicatessynapse.gif
Please let me know if you have any questions.

Thanks
Saurabh


Please do not forget to "Accept the answer" wherever the information provided helps you to help others in the community.


· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @LawrenceNkwana-9127,

Please let me know if you find above reply useful. If yes, please 'Accept Answer' in above reply. This will help other community members facing similar query to refer to this solution.

Thanks,
Saurabh

0 Votes 0 ·

Hi @LawrenceNkwana-9127,

I have not heard back from you. Did my answer solve your issue? If so, please mark as accepted answer. If not, please let me know how I may better assist.


Thanks
Saurabh

0 Votes 0 ·

Hi @LawrenceNkwana-9127,

We haven't heard back from you. Just wanted to check if you are you still facing the issue? In case If you already found a solution, would you please share it here with the community? Otherwise, let us know and we will continue to engage with you on the issue.

Thanks
Saurabh

0 Votes 0 ·