Help on executing custom query on Azure Cosmos DB using Spark in Azure Synapse

Abhiram Duvvuru 191 Reputation points Microsoft Employee
2024-04-30T21:06:42.2066667+00:00

Hi Team,

I want to run a custom query against Cosmos DB in Spark using a client ID and secret, but I'm ending up with the error below. Can you please assist? I tried multiple ways, but it still gives the same error.

The AppId has Reader access on the Cosmos DB account.

As per the link in the error (https://aka.ms/cosmosdb-tsg-service-unavailable-java), I executed the telnet command, and it doesn't display any information.

cosmosDbOptions = {
    "spark.cosmos.accountEndpoint": "https://xxxxxx.documents.azure.com:443/",
    "spark.cosmos.auth.type": "ServicePrinciple",
    "spark.cosmos.account.subscriptionId": "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxx",
    "spark.cosmos.account.tenantId": tenant_id,
    "spark.cosmos.account.resourceGroupName": "xxx-xxx",
    "spark.cosmos.auth.aad.clientId": "xxxxx-xxxx-xxxx-xxx-xxxxxx",
    "spark.cosmos.auth.aad.clientSecret": secret,
    "spark.cosmos.database": "xxxxx",
    "spark.cosmos.container": "xxxx",
}

df_results = (
    spark.read.format("cosmos.oltp")
    .option("spark.cosmos.read.customQuery", cosmosDBQuery)
    .options(**cosmosDbOptions)
    .load()
)

Error:

Py4JJavaError: An error occurred while calling o4062.load.
: java.lang.RuntimeException: Client initialization failed. Check if the endpoint is reachable and if your auth token is valid. More info: https://aka.ms/cosmosdb-tsg-service-unavailable-java
    at azure_cosmos_spark.com.azure.cosmos.implementation.RxDocumentClientImpl.initializeGatewayConfigurationReader(RxDocumentClientImpl.java:493)
    at azure_cosmos_spark.com.azure.cosmos.implementation.RxDocumentClientImpl.init(RxDocumentClientImpl.java:528)
    at azure_cosmos_spark.com.azure.cosmos.implementation.AsyncDocumentClient$Builder.build(AsyncDocumentClient.java:281)
    at azure_cosmos_spark.com.azure.cosmos.CosmosAsyncClient.<init>(CosmosAsyncClient.java:164)
    at azure_cosmos_spark.com.azure.cosmos.CosmosClientBuilder.buildAsyncClient(CosmosClientBuilder.java:961)
    at com.azure.cosmos.spark.CosmosClientCache$.createCosmosAsyncClient(CosmosClientCache.scala:323)
    at com.azure.cosmos.spark.CosmosClientCache$.syncCreate(CosmosClientCache.scala:134)
    at com.azure.cosmos.spark.CosmosClientCache$.apply(CosmosClientCache.scala:76)
    at com.azure.cosmos.spark.ItemsReadOnlyTable.schema(ItemsReadOnlyTable.scala:111)
    at com.azure.cosmos.spark.CosmosItemsDataSource.inferSchema(CosmosItemsDataSource.scala:36)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:241)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:218)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:176)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
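
For readers who want to repeat the reachability check from the troubleshooting link without telnet, here is a minimal Python sketch. The helper names `endpoint_host_port` and `can_connect` are hypothetical, made up for illustration, and the endpoint in the usage comment is a placeholder. A telnet session that connects and then prints nothing is often just what an open HTTPS port looks like, so an explicit True/False result can be easier to interpret. A successful TCP connection shows only that the port is reachable; it says nothing about whether the auth token is accepted.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def endpoint_host_port(account_endpoint: str) -> Tuple[str, int]:
    """Split an account endpoint URL into (host, port), defaulting to 443."""
    parsed = urlparse(account_endpoint)
    if not parsed.hostname:
        raise ValueError(f"not a valid endpoint URL: {account_endpoint!r}")
    return parsed.hostname, parsed.port or 443


def can_connect(account_endpoint: str, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(account_endpoint)
    try:
        # Opens and immediately closes a plain TCP connection (no TLS, no auth).
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False


# Usage (placeholder endpoint; run this from the Spark pool to test its network path):
# print(can_connect("https://xxxxxx.documents.azure.com:443/"))
```

If this returns True while the connector still reports "Client initialization failed", the network path is probably fine and the permissions of the identity are the next thing to check.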

Azure Synapse Analytics

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 79,141 Reputation points Microsoft Employee
    2024-05-10T02:53:54.34+00:00

    @Abhiram Duvvuru - I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.

    Ask: Help on executing custom query on Azure Cosmos DB using Spark in Azure Synapse?

    Solution: The issue is resolved. After the Cosmos DB Built-in Data Contributor role was assigned to the AppId on the database, the query started working. This role cannot be assigned through IAM in the Azure portal, so access had to be provisioned by running a PowerShell script.
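
The PowerShell script used in the solution was not posted, so the following is only a sketch of one equivalent route: it swaps in the Azure CLI command `az cosmosdb sql role assignment create` and uses Python merely to assemble the arguments. The function name `build_role_assignment_command` and all values in the usage comment are hypothetical placeholders. The role definition ID ending in `...0002` is the well-known ID of the Cosmos DB Built-in Data Contributor role. Note the assumption that `--principal-id` expects the object ID of the service principal, not the application (client) ID. The Reader role mentioned in the question is an Azure control-plane role, which is likely why it did not grant access to the data itself.

```python
import shlex
from typing import List, Optional

# Well-known ID of the "Cosmos DB Built-in Data Contributor" role definition.
BUILT_IN_DATA_CONTRIBUTOR_ROLE_ID = "00000000-0000-0000-0000-000000000002"


def build_role_assignment_command(
    account_name: str,
    resource_group: str,
    principal_object_id: str,
    database_name: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) the Azure CLI call that assigns the role."""
    # "/" scopes the assignment to the whole account; "/dbs/<name>" to one database.
    scope = f"/dbs/{database_name}" if database_name else "/"
    return [
        "az", "cosmosdb", "sql", "role", "assignment", "create",
        "--account-name", account_name,
        "--resource-group", resource_group,
        "--scope", scope,
        "--principal-id", principal_object_id,
        "--role-definition-id", BUILT_IN_DATA_CONTRIBUTOR_ROLE_ID,
    ]


# Usage (placeholder values): print the command, review it, then run it in a
# shell where `az login` has been done by someone allowed to manage the account.
# cmd = build_role_assignment_command("my-account", "my-rg", "<sp-object-id>", "my-db")
# print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if it is later passed to `subprocess.run`.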

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to select Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.
