Azure Data Factory - Stage data Credential Error for SQL DW

Anonymous
2021-05-21T03:20:54.097+00:00

I have a pipeline with a series of Data Flows in it. These data flows read data from a Cosmos DB database and write it to a SQL DW instance. However, I am required to have a staging folder location for these, so I created a Gen2 storage account, created a container, and gave the data factory both the Contributor and Storage Blob Data Owner roles on the storage account to enable control/read/write.

I then assign the staging folder in the settings of the Data Factory pipeline and test it, which seems to work:

[98446-image.png: screenshot of the staging folder connection test succeeding]

However, when I run the pipeline via a time trigger, the executions fail. They fail on the write-to-sink task, claiming there is some missing credential:
[98453-image.png: screenshot of the credential error on the write-to-sink task]

What am I doing wrong here? This looks like it should work. Everything is located in Australia East.

Oh, this is trying to write to a Synapse DW. These are the storage account role assignments:

[98471-image.png: screenshot of the storage account role assignments]


Accepted answer
  1. Anonymous
    2021-06-21T22:37:43.64+00:00

Worked through Microsoft support on this. The error I was receiving was completely incorrect and had nothing to do with the underlying cause: there was no credential issue, and no issue with the identified data flow object (the error seems to have defaulted to the last object). The case went through to the DW/Synapse team, who examined what was being run in the background. It looks like it was an unhandled exception that didn't surface the right error.

    The issue was that I was reading data from a dimension table in Synapse. The dimension has an "Active" field indicating whether the row is active or not. This is set as a boolean on the SQL DW side, and I needed to filter to all rows where Active = true. In the "Filter on" expression of the Filter transformation I had:

    toBoolean(byName('Active'))==true()

    This was the issue: for some reason the underlying engine doesn't know what to do with this and cannot equate the true value coming from the DW with Data Factory's definition of true. As advised by the support team, I changed this to the following:

    toString(byName('Active'))=='true'

    which worked.
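
    The workaround can be illustrated outside ADF. This is a hypothetical Python sketch (not Data Flow code): when a boolean-ish column can surface as either a native boolean or a string, normalising to text before comparing is more robust than coercing to a boolean type, which mirrors the `toString(byName('Active')) == 'true'` fix.

    ```python
    # Hypothetical rows as they might surface from a source: the "active"
    # column arrives as a native boolean for some rows, a string for others.
    rows = [
        {"name": "A", "active": True},     # native boolean
        {"name": "B", "active": "true"},   # surfaced as a string
        {"name": "C", "active": False},
    ]

    # Mirror of toString(byName('Active')) == 'true': convert to text first,
    # then compare, so both representations of "true" pass the filter.
    active = [r for r in rows if str(r["active"]).lower() == "true"]
    print([r["name"] for r in active])  # ['A', 'B']
    ```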

    This was a pretty simple but obscure solution, really a workaround. It was pretty much a mystery that the filter was causing the issue, as it was never identified in any of the error messages. I have told the support specialists at Microsoft that this isn't acceptable: the appropriate error must be surfaced for the appropriate transformation, or we are just guessing at solutions. My data flow has 20 objects in it, and guessing which one is causing an error (when the error returned may or may not be the actual error) isn't particularly enjoyable. Symptomatic of a "not quite ready for production" immature software product...


1 additional answer

Sort by: Most helpful
  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2021-05-25T00:15:07.117+00:00

    Hi @Anonymous ,
    Thanks for your response. Trigger run status is independent of pipeline run status; that is why you are seeing the trigger runs as successful.

    For the storage permission issue:
    Possible root cause:
    To sink the data into Synapse Analytics or a dedicated pool, the server does the "pull" of data from the staging location. With account key authentication for staging, the key is passed to Synapse SQL by the data factory, and SQL does the pull.

    But with managed identity, the pull uses SQL's managed identity, because the pull is done by SQL and it cannot use Data Factory's identity.

    Resolution:
    If the SQL server hosting Synapse/pools does not have a system-assigned managed identity, you need to generate the identity for the server using the steps below.

    If you have a standalone dedicated SQL pool, register your SQL server with Azure AD by using PowerShell:

    • Connect-AzAccount
    • Select-AzSubscription -SubscriptionId <subscriptionId>
    • Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-SQL-servername -AssignIdentity
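
    To confirm the identity was created, you can inspect the server's Identity property (a sketch using the same placeholder names as above):

    ```powershell
    # Returns the system-assigned identity (PrincipalId/TenantId) if one exists.
    (Get-AzSqlServer -ResourceGroupName your-database-server-resourceGroup `
        -ServerName your-SQL-servername).Identity
    ```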

    If you have an Azure Synapse Analytics workspace, register your workspace's system-managed identity:

    • Go to your Azure Synapse Analytics workspace in the Azure portal.
    • Go to the Managed identities pane.
    • Make sure the Allow Pipelines option is enabled.

    Under your storage account, go to Access control (IAM) and select Add role assignment. Assign the Storage Blob Data Contributor role to the server or workspace hosting your dedicated SQL pool, which you registered with Azure AD.
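
    The same role assignment can be scripted. This is a sketch with placeholder values (the object ID and scope are assumptions you must fill in from your own environment):

    ```powershell
    # Grant the Synapse server/workspace managed identity access to the
    # staging storage account. <principal-object-id> is the managed
    # identity's object ID from Azure AD; the scope is the storage
    # account's full resource ID.
    New-AzRoleAssignment -ObjectId "<principal-object-id>" `
        -RoleDefinitionName "Storage Blob Data Contributor" `
        -Scope "/subscriptions/<subscriptionId>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storageAccount>"
    ```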

    Once the permissions are assigned to the Synapse managed identity, rerun the pipeline and see how it goes.

    Hope this helps. Do let us know how it goes.

    ----------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.