question

FerreiraGregorio-7079 asked SaurabhSharma-msft answered

Public access error from Spark Pool, but access working from SQL on Demand, linked blob storage

Hi,

I have successfully linked to my Synapse workspace different Data Lake Gen2 (Abfss) storage accounts, and Blob storage accounts. I can explore the content, and even during the linking process, I get the connection test successful. All using managed identity and registered as Blob contributor on each storage account.

The Gen2 accounts all work fine (even set to private); it is very practical to right-click and load the data in Spark or SQL.

For the Blob storage accounts, only SQL on-demand works. The Spark pool does not work, and I get the error "public access is disabled on storage account".

If this is a linked service, and I am even using the automatically generated notebooks, why this error? Why is SQL on-demand authorized but not the Spark pool? I even tried passing a SAS token to the Spark session, but I get the same error.

I also do not understand why, when using the Synapse workspace, I need to whitelist my IP on the storage account firewall (a VNet is active). I have whitelisted the Synapse IPs for my region, listed the Synapse workspace under resource instances, and enabled "Allow trusted Azure services".
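As an aside, the firewall whitelisting described above can also be scripted with the Azure CLI; a minimal sketch, assuming hypothetical resource-group, account, and IP values (replace them with your own):

```shell
# Hypothetical names: replace my-rg / mystorageacct / the IP with your own.
# Add a client IP to the storage account firewall:
az storage account network-rule add \
    --resource-group my-rg \
    --account-name mystorageacct \
    --ip-address 203.0.113.10

# Allow trusted Azure services (e.g. Synapse) to bypass the firewall:
az storage account update \
    --resource-group my-rg \
    --name mystorageacct \
    --bypass AzureServices
```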

If anyone has experience with all the different configurations necessary to access blob storage from Synapse, please let me know how I can solve my issue and/or adjust my configuration to keep everything secure while still being able to work with my data.

azure-synapse-analytics · azure-blob-storage

Hi @ferreiragregorio-7079

Thanks for using Microsoft Q&A!
Is the workspace configured with a managed VNet? Here is the documentation with more details on why a managed VNet is required for the Spark pool to connect to ADLS Gen2.

Thanks
Saurabh

FerreiraGregorio-7079 answered SaurabhSharma-msft converted comment to answer

Hi Saurabh,

Thanks for your reply.

Could you please clarify? I'm not sure I understand your answer.

I only have issues accessing Azure Blob Storage ("wasbs"), NOT Gen2 ("abfss"). My Synapse workspace does not have a managed VNet; only the blob storage accounts are in a VNet.

Let me give you an example. Using linked services, I am able to do the following:

 # Get a SAS token (or connection string) for the linked service
 blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service)

 # Hand the SAS token to the driver for this container/account
 spark.conf.set(
     f"fs.azure.sas.{container}.{account}.dfs.core.windows.net",
     blob_sas_token)

 abfss_path = f"abfss://{blob_container_name}@{blob_account_name}.dfs.core.windows.net/path/part0.snappy.parquet"

 df = spark.read.load(abfss_path, format="parquet")
 display(df.limit(10))

I'm even able to read this without getting the SAS token and passing it as a Spark config.

If I take the same code but point it at blob storage (wasbs), I get the error:

Py4JJavaError: An error occurred while calling o167.load. : org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: Public access is not permitted on this storage account.

Which I find odd, given that I can access it from the SQL pool.

The example code, as above:

 # Same pattern, but against the blob endpoint (wasbs)
 blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service)

 spark.conf.set(
     f"fs.azure.sas.{container}.{account}.blob.core.windows.net",
     blob_sas_token)

 wasbs_path = f"wasbs://{blob_container_name}@{blob_account_name}.blob.core.windows.net/path/part0.snappy.parquet"

 df = spark.read.load(wasbs_path, format="parquet")
 display(df.limit(10))

  • For each of these storage accounts, I have created a linked service.

Any advice or ideas on how I should be doing this are very much welcome.




@ferreiragregorio-7079 Thanks for sharing the details. I am checking on this and will get back to you.

Thanks
Saurabh

SaurabhSharma-msft answered

@ferreiragregorio-7079,

To make this work, you should create a private link from the managed VNet (see Managed private endpoints - Azure Synapse Analytics).
SQL on-demand bypasses VNets. You can also set up the private link during workspace provisioning, and the product team plans to automate the private-link creation flow in the coming months.
Also, in your first code snippet, AAD passthrough is being used inadvertently (the fs.azure.sas setting does not apply to the abfs driver).
The second code snippet is functionally equivalent to the first one (abfss + dfs.core = wasbs + blob.core).
I suggest you use the first code snippet instead of the second one. To pass a SAS token in, you can use ConfBasedSASProvider or AkvBasedSASProvider (documentation is available through TokenLibrary.help() in a notebook).
Please let me know if you have any questions.
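For illustration, a ConfBasedSASProvider setup along these lines might look like the sketch below. The exact config keys should be confirmed via TokenLibrary.help() in a Synapse notebook; the key names and the placeholder SAS token here are assumptions, not verbatim from this thread:

```python
# Sketch only: config keys are assumptions based on Synapse's
# ConfBasedSASProvider; verify them with TokenLibrary.help().
sas_token = "<your-sas-token>"  # placeholder, e.g. obtained via mssparkutils.credentials

sas_conf = {
    # Tell the abfs driver to authenticate with a SAS token
    "fs.azure.account.auth.type": "SAS",
    # Use the conf-based SAS provider shipped with Synapse
    "fs.azure.sas.token.provider.type":
        "com.microsoft.azure.synapse.tokenlibrary.ConfBasedSASProvider",
    # The token the provider hands back to the driver
    "spark.storage.synapse.sas": sas_token,
}

# In a Synapse notebook you would apply these settings like so:
# for key, value in sas_conf.items():
#     spark.conf.set(key, value)
```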

Thanks
Saurabh


Hi @SaurabhSharma-msft ,

Thanks for your support.

We will be working on it this week. I'll update here.

Regards,
Gregorio


Hi @ferreiragregorio-7079,

Sure.

Thanks
Saurabh
