question

KristoRaun-5511 avatar image
0 Votes"
KristoRaun-5511 asked AndrewFogarty-1127 answered

How to read uniqueidentifier into Spark notebook in Synapse Analytics Studio?

We are looking into a possibility to move some notebooks from Databricks to Synapse Studio. However, it seems that it is impossible in Synapse Spark notebooks to query tables with uniqueidentifiers?

Example table:

 CREATE TABLE dbo.test
 (id uniqueidentifier null
 , ts datetimeoffset not null
 ,data float null
 ) 

Create a spark dataframe in notebook:
val df = spark.read.synapsesql("testdb.dbo.test")
display(df)

The display() command returns error:
Error : com.microsoft.spark.sqlanalytics.exception.SQLAnalyticsConnectorException: com.microsoft.sqlserver.jdbc.SQLServerException: Columns with UniqueIdentifier types are not supported in external tables.

I thought about creating a view where the uniqueidentifier would be a varchar, but views are not supported by synapsesql().

Do I have any other way to get this table into Spark notebook for some processing, or should we just forget about Synapse studio for now?

azure-synapse-analytics
· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello anonymous user,

Welcome to the Microsoft Q&A platform.

We are reaching out to the internal team to get more help on this, I will update you once we hear back from them.

0 Votes 0 ·

Hello anonymous user ,

If you check the SQL data types https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/types/package-summary.html Unique identifier is not called out here so its not supported .

Thanks
HImanshu

0 Votes 0 ·
KristoRaun-5511 avatar image
1 Vote"
KristoRaun-5511 answered KristoRaun-5511 published

Hi @HimanshuSinha-MSFT , Thanks for your comment, but I find it hard to agree with. On the link you provided, it also does not list datetimeoffset(7) as a supported data type, but still when using synapsesql() it loads the datetimeoffset(7) column as a Spark string type, AND it displays the dataframe without errors. Also, uniqueidentifier is mapped to string, but then it gives the error as in my original post. Please see images added to this post. ![62388-image-7.png][1] ![62464-image-9.png][2] It seems to me that correctly fetching uniqueidentifier by synapsesql() is just not yet implemented, or it is implemented erroneously. If the first is true (not yet implemented), I would strongly recommend to implement in the near future. Otherwise, what is the benefit of synapsesql() if you cannot use it for some data types? Also, uniqueidentifier is supported by JDBC and other competitive platforms (eg Databricks). It is strange that synapsesql() doesn't do it, because based on the image here: https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export I understand that synapsesql() is also using jdbc. Unfortunately, Synapse Studio is not useable for our use case then at the moment. Let me know if I misunderstood something. [1]: /answers/storage/attachments/62388-image-7.png [2]: /answers/storage/attachments/62464-image-9.png

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

AndrewFogarty-1127 avatar image
1 Vote"
AndrewFogarty-1127 answered

This is still an issue and a recommended work around would be great.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.