Getting 'User does not have permission to perform this action' when Enable Staging is turned on in Data Factory writing to a dedicated SQL pool (Synapse)

PeterSh 176 Reputation points
2021-09-27T12:04:07.873+00:00

I've been using Data Factory to shuffle data on an Azure SQL database, and have recently set up a ... well ... dedicated SQL pool (formerly SQL DW), now apparently referred to as Synapse according to the linked service setup in Data Factory.

I set up the Synapse Analytics linked service in Data Factory using the managed identity. Test connection successful.
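
For reference, managed identity authentication to the pool also needs the factory's identity set up as a contained database user in the pool, roughly like this; the factory name below is a placeholder and the role grants are just a typical starting point:

    -- Run against the dedicated SQL pool, connected as an Azure AD admin.
    -- [your-data-factory-name] is a placeholder for the factory's managed identity.
    CREATE USER [your-data-factory-name] FROM EXTERNAL PROVIDER;
    EXEC sp_addrolemember db_datareader, [your-data-factory-name];
    EXEC sp_addrolemember db_datawriter, [your-data-factory-name];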

I created the dataset, and it tests ok and returns the correct schema without data (there are no rows yet).

I created a fairly simple data flow that uses the database as one of two sources and again as the sink. I linked that to a pipeline, and it prompted me to set up a staging linked service. I selected my Gen2 storage account and entered a valid path. Test connection successful.

Attempting to debug the pipeline gets me this error:

"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Source 'existing': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: User does not have permission to perform this action."

Going into the data flow with the debug instance running, I can get a data preview (though blank) on both the source and the sink.

But if I turn off 'Enable Staging' on the source, it now skips that stage and fails on the sink instead. I assume it would run if I turned it off on the sink as well; however, I'm reluctant to do so, since it warns me that staging should be enabled and there are a large number of rows to be written.

From what I have read, it seems that I need to set up permissions to allow the SQL server to write to the storage account directly. I've tried various things, eventually resorting to PowerShell to give the SQL server a managed identity and grant that identity the Storage Blob Data Contributor role, but to no avail.

Does anyone have suggestions about what else to check, or what else could be going on here?

It's worth noting that we do use Private Link, which may be complicating things as well. I've also just noticed that the pool seems to have its own networking settings, so I'm now wondering if I need to set those up separately, or perhaps even see whether it needs a managed identity of its own.

Any help you could give would be great. Thanks.

Azure Synapse Analytics
Azure Data Factory

Accepted answer
    PeterSh 176 Reputation points
    2021-09-28T02:08:01.197+00:00

    I've managed to find a solution, though I'm not entirely sure why it works or what specific permissions it was looking for.

    It started working as soon as I added the managed identity to the db_owner role.

    EXEC sp_addrolemember db_owner, [factory managed identity];
    

    I made so many changes to permissions and firewalls that I'm going to need to backtrack and tighten it all back up again, but it seems this was all that was needed.
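
    When I do go back and tighten it up, I plan to try something narrower than db_owner. The staged load path (PolyBase or the COPY statement) needs more than plain read/write, so a rough, unverified sketch of what I'd try first is below; the identity name is the same placeholder as above, and the table name is a placeholder too:

    -- Rough sketch only, not verified here; adjust the user and table names.
    -- The COPY statement load path is documented as needing bulk-load rights plus INSERT:
    GRANT ADMINISTER DATABASE BULK OPERATIONS TO [factory managed identity];
    GRANT INSERT ON dbo.TargetTable TO [factory managed identity];
    -- The PolyBase load path needs CONTROL on the database instead (broad, but still less than db_owner):
    -- GRANT CONTROL TO [factory managed identity];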

    Hope this helps someone else, and if you have any idea why this worked, please let me know. I'm still a bit confused about it.

    Thanks

    2 people found this answer helpful.

0 additional answers
