Azure Data Factory - Stage data Credential Error for SQL DW

Anonymous
2021-05-21T03:20:54.097+00:00

I have a pipeline with a series of Data Flows in it. These data flows read data from a Cosmos DB database and write it to a SQL DW instance. However, I am required to have a staging folder location for these, so I created a Gen2 storage account, created a container, and gave the data factory both the Contributor and Storage Blob Data Owner roles on the storage account to enable control/read/write.

I then assign the staging folder in the settings of the Data Factory pipeline and test it, which seems to work:

[98446-image.png: screenshot of the staging folder connection test succeeding]

However, when I run the pipeline via a time trigger, the executions fail. They fail on the write-to-sink task, claiming there is some missing credential:
[98453-image.png: screenshot of the credential error on the write-to-sink task]

What am I doing wrong here? This looks like it should work. Everything is located in Australia East.

Oh, this is trying to write to a Synapse DW. These are the storage account role assignments:

[98471-image.png: screenshot of the storage account role assignments]


Accepted answer
  1. Anonymous
    2021-06-21T22:37:43.64+00:00

Worked through Microsoft support on this. The error I was receiving was completely incorrect and had nothing to do with the underlying cause: there was no credential issue, and no issue with the identified data flow object (the error seems to have defaulted to the last object). The case went through to the DW/Synapse team, who examined what was being run in the background. It looks like it was an unhandled exception that didn't surface the right error.

    The issue was that I was reading data from a dimension table in Synapse. The dimension has an "Active" field indicating whether the row is active or not. This is set as a boolean on the SQL DW side, and I needed to filter to all rows where Active = true. In the "Filter on" expression of the Filter transformation I had:

    toBoolean(byName('Active'))==true()

    This was the issue: for some reason the underlying engine doesn't know what to do with this and cannot equate the true value coming from the DW with Data Factory's definition of true. As advised by the support team, I changed this to the following:

    toString(byName('Active'))=='true'

    which worked.
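
    The workaround can be illustrated outside ADF. This is a hypothetical Python sketch (not Data Flow code): when a boolean-ish column can surface as either a native boolean or a string, normalising to text before comparing is more robust than coercing to a boolean type, which mirrors the `toString(byName('Active')) == 'true'` fix.

    ```python
    # Hypothetical rows as they might surface from a source: the "active"
    # column arrives as a native boolean for some rows, a string for others.
    rows = [
        {"name": "A", "active": True},     # native boolean
        {"name": "B", "active": "true"},   # surfaced as a string
        {"name": "C", "active": False},
    ]

    # Mirror of toString(byName('Active')) == 'true': convert to text first,
    # then compare, so both representations of "true" pass the filter.
    active = [r for r in rows if str(r["active"]).lower() == "true"]
    print([r["name"] for r in active])  # ['A', 'B']
    ```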

    This was a pretty simple but obscure solution, really a workaround. It was pretty much a mystery that the filter was causing the issue, as it was never identified in any of the error messages. I have told the support specialists at Microsoft that this isn't acceptable: the appropriate error must be surfaced for the appropriate transformation, or we are just guessing at solutions. My data flow has 20 objects in it, and guessing which one is causing an error (when the error returned may or may not be the actual error) isn't particularly enjoyable. Symptomatic of a "not quite ready for production" immature software product...


1 additional answer

Sort by: Most helpful
  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2021-05-25T00:15:07.117+00:00

    Hi @Anonymous ,
    Thanks for your response. Trigger run status is independent of pipeline run status; that is why you are seeing the trigger runs as successful.

    For the storage permission issue:
    Possible root cause:
    To sink the data into Synapse Analytics or a dedicated pool, the server does the "pull" of data from the staging location. With account key authentication for staging, the key is passed to Synapse SQL by the data factory, and SQL does the pull.

    But with managed identity, the pull uses SQL's managed identity, because the pull is done by SQL and it cannot use Data Factory's identity.

    Resolution:
    If the SQL server hosting Synapse/pools does not have a system-assigned managed identity, you need to generate the identity for the server using the steps below.

    If you have a standalone dedicated SQL pool, register your SQL server with Azure AD by using PowerShell:

    • Connect-AzAccount
    • Select-AzSubscription -SubscriptionId <subscriptionId>
    • Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-SQL-servername -AssignIdentity
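
    To confirm the identity was created, you can inspect the server's Identity property (a sketch using the same placeholder names as above):

    ```powershell
    # Returns the system-assigned identity (PrincipalId/TenantId) if one exists.
    (Get-AzSqlServer -ResourceGroupName your-database-server-resourceGroup `
        -ServerName your-SQL-servername).Identity
    ```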

    If you have an Azure Synapse Analytics workspace, register your workspace's system-managed identity:

    • Go to your Azure Synapse Analytics workspace in the Azure portal.
    • Go to the Managed identities pane.
    • Make sure the Allow Pipelines option is enabled.

    Under your storage account, go to Access control (IAM) and select Add role assignment. Assign the Storage Blob Data Contributor role to the server or workspace hosting your dedicated SQL pool, which you registered with Azure AD.
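
    The same role assignment can be scripted. This is a sketch with placeholder values (the object ID and scope are assumptions you must fill in from your own environment):

    ```powershell
    # Grant the Synapse server/workspace managed identity access to the
    # staging storage account. <principal-object-id> is the managed
    # identity's object ID from Azure AD; the scope is the storage
    # account's full resource ID.
    New-AzRoleAssignment -ObjectId "<principal-object-id>" `
        -RoleDefinitionName "Storage Blob Data Contributor" `
        -Scope "/subscriptions/<subscriptionId>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storageAccount>"
    ```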

    Once the permissions are assigned to the Synapse managed identity, rerun the pipeline and see how it goes.

    Hope this helps. Do let us know how it goes.

    ----------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.