This seems like a really basic question but I haven't found an answer to it. How can I, in Synapse Studio using pySpark, connect to a SQL view in the same Synapse's SQL serverless pool?
The short answer is yes, the jdbc driver can do this. However, when working with a serverless pool you definitely want to use Azure AD authentication instead of the default SQL auth, which requires using a newer version of the jdbc driver than is included with Synapse Spark. I'm still working on creating a full working sample for you, including which packages you need to add to the pool.
That would be great, thanks! Using AD would be the preferred solution as well.
I did get this working, including a list of packages I added to the Spark pool.
Note 1: Everything in a serverless database lives in storage somewhere- the database doesn't have any storage of its own. If you can go straight to storage, that scenario is better supported. I can see scenarios where this is not ideal though- wanting to access views on the serverless DB, for example.
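As a minimal sketch of the direct-to-storage route, assuming the view's underlying data is Parquet in ADLS Gen2 (the account, container, and path names below are hypothetical placeholders):

```python
# Hypothetical storage locations - substitute your own.
account = "mystorageaccount"
container = "mycontainer"
path = "sales/2021"

# abfss:// URI format used by Synapse Spark for ADLS Gen2.
abfss_path = f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

# In a Synapse notebook you would then read the files directly, e.g.:
# df = spark.read.parquet(abfss_path)
print(abfss_path)
```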
Note 2: There is a synapsesql() function for connecting between Spark and SQL pools, but it only works with dedicated pools and is designed for data transfer only, so there are some limitations there.
Note 3: While SQL logins are technically supported by serverless pools, the login also needs permissions on the storage account, which makes it a less than ideal route- using AAD is going to have a much better result.
With that out of the way, here are the packages I added to my Spark pool. There are other dependencies, but they are already included in the base image.
mssql-jdbc-9.2.1.jre8.jar
spark-mssql-connector-1.0.1.jar
msal4j-1.10.0.jar
asm-8.0.1.jar
content-type-2.1.jar
lang-tag-1.5.jar
oauth2-oidc-sdk-9.4.jar
Once the pool has updated, you can use this as a base template for accessing the pool. Supported authentication modes are 'ActiveDirectoryPassword' and 'ActiveDirectoryServicePrincipal'; MSI and Interactive logins don't work at this time. You can also run ad hoc queries using the query property.
%%pyspark
db_properties = {}
db_properties["authentication"] = "ActiveDirectoryServicePrincipal"
db_properties["AADSecurePrincipalId"] = "<app ID>"
db_properties["AADSecurePrincipalSecret"] = "<app secret>"
db_properties["driver"] = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

serverName = "<server name>"        # Synapse workspace name
databaseName = "<database name>"
tableName = "<table or view name>"

# Serverless pools are reached through the "-ondemand" endpoint
url = f"jdbc:sqlserver://{serverName}-ondemand.sql.azuresynapse.net;databaseName={databaseName}"
data = spark.read.jdbc(url, tableName, properties=db_properties)
data.show(10)
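For an ad hoc query, one way is the format("jdbc") reader with its query option instead of a table name (the view name and filter below are hypothetical examples):

```python
# Placeholders as in the template above - substitute your own values.
serverName = "<server name>"
databaseName = "<database name>"

# Same "-ondemand" serverless endpoint as the table read.
url = f"jdbc:sqlserver://{serverName}-ondemand.sql.azuresynapse.net;databaseName={databaseName}"

# Hypothetical ad hoc query against a view in the serverless database.
query = "SELECT TOP 100 * FROM dbo.MyView WHERE Year = 2021"

# In a Synapse notebook, reusing the db_properties dict from the cell above:
# data = (spark.read.format("jdbc")
#         .option("url", url)
#         .option("query", query)
#         .options(**db_properties)
#         .load())
print(url)
print(query)
```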