What is the Synapse security setup to allow Reader access to databases and tables created by a Spark pool through the serverless SQL poolpublic endpoint?

Question

Goal: Connect client apps (e.g., SSMS, Tableau, Power BI) to the Synapse workspace serverless SQL endpoint and query databases and tables created by an Apache Spark pool in the same Synapse workspace.

Setup:
What we have:

Active Directory users with Azure Reader role assignment on the Resource group which includes the Synapse workspace, Apache Spark pool, and storage account.
Inherited Azure Reader role assignment on the Synapse workspace and storage account.
Firewall rules to allow access from client machines.
Synapse Administrator Synapse RBAC role assignment to be able to query the information. (This role looks overly powerful. See question below.)

At that point one user got an error:
[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]External table 'dbo' is not accessible because content of directory cannot be listed.
The table "[dbo].[billing_all_prod]" does not exist.

Looking at the document on setting up Synapse access control (https://learn.microsoft.com/en-us/azure/synapse-analytics/security/how-to-set-up-access-control),
we assigned the Storage Blob Data Reader role to the AD user. We haven't heard back yet about whether this resolved all of the access issues, but the end-users (managers for this project) are tired of being used as debuggers, and we would like to be sure everything that needs to be done is completed.

That Synapse access control document, in STEP 7:
By default, all users assigned the Synapse Administrator role are also assigned the SQL db_owner role on the serverless SQL pool, 'Built-in', and all its databases.
Access to SQL pools for other users and for the workspace MSI is controlled using SQL permissions.

When I proceed to STEP 7.1: Serverless SQL pool, Built-in and run the scripts, when trying to create users on the Spark-created databases I get an error:
Operation CREATE USER is not allowed for a replicated database.

That sort of makes sense, since the Spark pool actually created the database and the serverless SQL pool is just using a copy of some metadata. But my questions from above are:

What is the minimum (most restrictive) Synapse RBAC role that I would need to assign so that a client could use the serverless SQL pool public endpoint to query a database and table created by the Spark pool?
Is there any SQL-level access that needs to be granted, or can it all be done with various Azure and Synapse RBAC roles? (Again, we have Reader on the resource group, workspace, and storage account, and we also have Storage Blob Data reader on the storage account, and Synapse Administrator at the workspace scope.)

Thanks,
Johnny

Accepted Answer

Adding Storage Blob Data Reader should resolve that error.

There are two levels of authentication required to do what you are trying to do:

SQL Server level. Outside of admin it isn't currently possible to handle this purely with RBAC, but you can create a server login that is am AAD security group and then assign your users to that group. This avoids having to manage them individually within SQL and will resolve needing to give everyone admin.
CREATE LOGIN [Security Group Name] FROM EXTERNAL PROVIDER; This must be a security group in AAD, not a Microsoft 365/Distribution group
As a Spark database is not actually a SQL database but a set of parquet files, adding SQL database level permissions doesn't work, as you have discovered. The Storage Blob Data Reader role takes its place as your serverless pool is reading directly from the storage account.

Answer

Hi! I pretty much have the exact same case, but the suggested solution does not resolve the issue.

I've tried creating SQL logins (CREATE LOGIN), which makes me login and see the databases on the server. But when I try to expand "tables", I get this:

Which makes me think the SQL login works, but the data access is limited. I get the same result when I create a login for the user who are to access the data based on their email: CREATE LOGIN email@ssss .com FROM EXTERNAL PROVIDER;.

However, the user is granted Storage Blob Data Reader access on the resource group that contains the data lake/storage account, without this changing the outcome. The user is supposed to connect through Power BI, and the error looks like this:

What am I doing wrong here?

What is the Synapse security setup to allow Reader access to databases and tables created by a Spark pool through the serverless SQL poolpublic endpoint?

1 additional answer