Connect Synapse Spark to Synapse serverless SQL pool view (pySpark)

Edwin 61 Reputation points
2021-06-09T13:05:24.823+00:00

This seems like a really basic question, but I haven't found an answer to it. How can I, in Synapse Studio using pySpark, connect to a SQL view in the same Synapse workspace's serverless SQL pool?


2 answers

  1. Samara Soucy - MSFT 5,051 Reputation points
    2021-06-14T18:18:57.923+00:00

    I did get this working; I've included the list of packages I added to the Spark pool below.

    Note 1: Everything in a serverless database lives in storage somewhere; the database has no storage of its own. If you can go straight to storage, that scenario is better supported. There are scenarios where that isn't ideal, though, such as wanting to access views on the serverless DB.
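
    For example, if you know where the underlying files live, a direct read from ADLS Gen2 might look like the sketch below (container, account, and folder are placeholders, and Parquet is assumed):

    %%pyspark

    # Sketch: read the files backing the serverless database straight from storage.
    df = spark.read.parquet("abfss://<container>@<storage account>.dfs.core.windows.net/<folder>/")
    df.show(10)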

    Note 2: There is a synapsesql() function for connecting between Spark and SQL pools. It only works with dedicated pools and is designed for data transfer only, so there are some limitations there.
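
    For reference, a dedicated-pool read with that connector looks roughly like the sketch below (a sketch only; on earlier runtimes the connector was Scala-only, so check whether your pool's runtime exposes it to pySpark):

    %%pyspark

    # Sketch: synapsesql() targets *dedicated* SQL pools, so it cannot reach a serverless view.
    # The three-part name is a placeholder.
    df = spark.read.synapsesql("<database>.<schema>.<table>")
    df.show(10)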

    Note 3: While SQL logins are technically supported by serverless pools, the fact that the login also needs permissions on the storage account makes it a less than ideal route; using AAD gives a much better result.

    With that out of the way, here are the packages I added to my Spark pool. There are other dependencies, but they are already included in the base image.

    • mssql-jdbc-9.2.1.jre8.jar
    • spark-mssql-connector-1.0.1.jar
    • msal4j-1.10.0.jar
    • asm-8.0.1.jar
    • content-type-2.1.jar
    • lang-tag-1.5.jar
    • oauth2-oidc-sdk-9.4.jar

    Once the pool has updated, you can use the following as a base template for accessing the pool. Set the authentication property to either 'ActiveDirectoryPassword' or 'ActiveDirectoryServicePrincipal'; MSI and interactive logins don't work at this time. You can also run ad hoc queries using the query option, as shown after the template.

    %%pyspark

    # JDBC connection properties: authenticate as an AAD service principal.
    db_properties = {}
    db_properties["authentication"] = "ActiveDirectoryServicePrincipal"
    db_properties["AADSecurePrincipalId"] = "<app ID>"
    db_properties["AADSecurePrincipalSecret"] = "<app secret>"
    db_properties["driver"] = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

    # The serverless endpoint is the workspace name with an "-ondemand" suffix.
    serverName = "<workspace name>-ondemand.sql.azuresynapse.net"
    databaseName = "<database name>"
    tableName = "<table or view name>"

    jdbc_url = f"jdbc:sqlserver://{serverName};databaseName={databaseName}"
    data = spark.read.jdbc(jdbc_url, tableName, properties=db_properties)

    data.show(10)
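
    For an ad hoc query instead of a whole table or view, one option is Spark's JDBC query option; this sketch reuses jdbc_url and db_properties from the template above, and the schema and view names are placeholders:

    %%pyspark

    # Sketch: push a T-SQL query down to the serverless pool instead of reading a whole object.
    adhoc = (spark.read
        .format("jdbc")
        .option("url", jdbc_url)
        .option("query", "SELECT TOP 10 * FROM <schema>.<view name>")
        .options(**db_properties)
        .load())

    adhoc.show(10)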
    
    2 people found this answer helpful.

  2. Edwin 61 Reputation points
    2021-06-10T11:40:34.78+00:00

    Can anyone help me with this?
