This seems like a really basic question but I haven't found an answer to it. How can I, in Synapse Studio using pySpark, connect to a SQL view in the same Synapse's SQL serverless pool?
The short answer is yes, the jdbc driver can do this. However, when working with a serverless pool you definitely want to use Azure AD authentication instead of the default SQL auth, which requires using a newer version of the jdbc driver than is included with Synapse Spark. I'm still working on creating a full working sample for you, including which packages you need to add to the pool.
That would be great, thanks! Using AD would be the preferred solution as well.
I did get this working, including a list of packages I added to the Spark pool.
Note 1: Everything in a serverless database lives in storage somewhere- the database doesn't have any storage of its own. If you can go straight to storage, that scenario is better supported. I can see scenarios where this is not ideal though- wanting to access views on the serverless DB, for example.
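As a minimal sketch of the direct-to-storage route, assuming the view's underlying data is Parquet in ADLS Gen2 (the account, container, and path names below are hypothetical placeholders):

```python
# Hypothetical storage locations - substitute your own.
account = "mystorageaccount"
container = "mycontainer"
path = "sales/2021"

# abfss:// URI format used by Synapse Spark for ADLS Gen2.
abfss_path = f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

# In a Synapse notebook you would then read the files directly, e.g.:
# df = spark.read.parquet(abfss_path)
print(abfss_path)
```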
Note 2: There is a synapsesql() function for connecting between Spark and SQL pools, but it only works with dedicated pools and is designed for data transfer only, so there are some limitations there.
Note 3: While SQL logins are technically supported by serverless pools, the login also needs permissions on the storage account, which makes it a less than ideal route- using AAD is going to have a much better result.
With that out of the way, here are the packages I added to my Spark pool. There are other dependencies, but they are already included in the base image.
mssql-jdbc-9.2.1.jre8.jar
spark-mssql-connector-1.0.1.jar
msal4j-1.10.0.jar
asm-8.0.1.jar
content-type-2.1.jar
lang-tag-1.5.jar
oauth2-oidc-sdk-9.4.jar
Once the pool has updated, you can use this as a base template for accessing the pool. Supported authentication modes are 'ActiveDirectoryPassword' and 'ActiveDirectoryServicePrincipal'; MSI and Interactive logins don't work at this time. You can also run ad hoc queries using the query property.
%%pyspark
db_properties = {}
db_properties["authentication"] = "ActiveDirectoryServicePrincipal"
db_properties["AADSecurePrincipalId"] = "<app ID>"
db_properties["AADSecurePrincipalSecret"] = "<app secret>"
db_properties["driver"] = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

serverName = "<server name>"        # Synapse workspace name
databaseName = "<database name>"
tableName = "<table or view name>"

# Serverless pools are reached through the "-ondemand" endpoint
url = f"jdbc:sqlserver://{serverName}-ondemand.sql.azuresynapse.net;databaseName={databaseName}"
data = spark.read.jdbc(url, tableName, properties=db_properties)
data.show(10)
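For an ad hoc query, one way is the format("jdbc") reader with its query option instead of a table name (the view name and filter below are hypothetical examples):

```python
# Placeholders as in the template above - substitute your own values.
serverName = "<server name>"
databaseName = "<database name>"

# Same "-ondemand" serverless endpoint as the table read.
url = f"jdbc:sqlserver://{serverName}-ondemand.sql.azuresynapse.net;databaseName={databaseName}"

# Hypothetical ad hoc query against a view in the serverless database.
query = "SELECT TOP 100 * FROM dbo.MyView WHERE Year = 2021"

# In a Synapse notebook, reusing the db_properties dict from the cell above:
# data = (spark.read.format("jdbc")
#         .option("url", url)
#         .option("query", query)
#         .options(**db_properties)
#         .load())
print(url)
print(query)
```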