Access Azure Cosmos DB Cassandra API data from Azure Databricks
APPLIES TO: Cassandra API
This article details how to work with Azure Cosmos DB Cassandra API from Spark on Azure Databricks.
Prerequisites
Review the basics of connecting to Azure Cosmos DB Cassandra API
Cassandra API instance configuration for Cassandra connector:
The connector for Cassandra API requires the Cassandra connection details to be initialized as part of the Spark context. When you launch a Databricks notebook, the Spark context is already initialized, and it is not advisable to stop and reinitialize it. One solution is to add the Cassandra API instance configuration at the cluster level, in the cluster Spark configuration. This is a one-time activity per cluster. Add the following settings to the cluster's Spark config, one space-separated key-value pair per line:
spark.cassandra.connection.host YOUR_COSMOSDB_ACCOUNT_NAME.cassandra.cosmosdb.azure.com
spark.cassandra.connection.port 10350
spark.cassandra.connection.ssl.enabled true
spark.cassandra.auth.username YOUR_COSMOSDB_ACCOUNT_NAME
spark.cassandra.auth.password YOUR_COSMOSDB_KEY
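If you configure several clusters, you may prefer to generate this text rather than type it by hand. The following Python sketch uses a hypothetical helper, `cosmos_cassandra_spark_conf` (not part of any library), that emits the same five settings as "key value" lines. The host suffix, port 10350, and SSL flag come from the settings above. The account name and key are placeholders you supply.

```python
def cosmos_cassandra_spark_conf(account_name: str, account_key: str) -> str:
    """Return cluster Spark config text with one 'key value' pair per line.

    Hypothetical helper: it only formats the settings listed in this article.
    """
    settings = {
        "spark.cassandra.connection.host": f"{account_name}.cassandra.cosmosdb.azure.com",
        "spark.cassandra.connection.port": "10350",
        "spark.cassandra.connection.ssl.enabled": "true",
        "spark.cassandra.auth.username": account_name,
        "spark.cassandra.auth.password": account_key,
    }
    # Key and value are separated by a single space, one setting per line.
    return "\n".join(f"{key} {value}" for key, value in settings.items())


if __name__ == "__main__":
    # Placeholder values; substitute your own account name and key.
    print(cosmos_cassandra_spark_conf("mycosmosaccount", "<your-key>"))
```

The output is plain text to paste into the cluster's Spark config, so the account key ends up in the cluster configuration. Treat it with the same care as the key itself.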
Add the required dependencies
Cassandra Spark connector: To integrate Azure Cosmos DB Cassandra API with Spark, the Cassandra connector must be attached to the Azure Databricks cluster. To attach the connector:
- Review the Databricks runtime version and its Spark version. Then find the Maven coordinates of a Cassandra Spark connector release that is compatible with that Spark version, and attach it to the cluster. See the "Upload a Maven package or Spark package" article to attach the connector library to the cluster. We recommend selecting Databricks runtime version 7.5, which supports Spark 3.0. To add the Apache Spark Cassandra Connector to your cluster, select Libraries > Install New > Maven, and then add com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0 in Maven coordinates. If you are using Spark 2.x, we recommend an environment with Spark version 2.4.5, using the Spark connector at Maven coordinates com.datastax.spark:spark-cassandra-connector_2.11:2.4.3.
Azure Cosmos DB Cassandra API-specific library: If you are using Spark 2.x, a custom connection factory is required to configure the retry policy from the Cassandra Spark connector to Azure Cosmos DB Cassandra API. Add the com.microsoft.azure.cosmosdb:azure-cosmos-cassandra-spark-helper:1.2.0 Maven coordinates to attach the library to the cluster.
Note
If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB Cassandra API-specific library mentioned above.
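The version guidance above can be captured in a small lookup. The following Python sketch uses a hypothetical helper, `maven_coordinates_for`, that maps a Spark version string (for example, the value of `spark.version` in a notebook) to the coordinates this article recommends. It deliberately covers only the Spark 3.0 and Spark 2.4 combinations named here and raises an error for anything else, because other Spark versions need a different connector release.

```python
from typing import List

# Coordinates as recommended in this article.
SPARK3_CONNECTOR = "com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0"
SPARK2_CONNECTOR = "com.datastax.spark:spark-cassandra-connector_2.11:2.4.3"
SPARK2_COSMOS_HELPER = "com.microsoft.azure.cosmosdb:azure-cosmos-cassandra-spark-helper:1.2.0"


def maven_coordinates_for(spark_version: str) -> List[str]:
    """Return the Maven coordinates to install for a Spark version such as '3.0.1'.

    Hypothetical helper limited to the versions this article discusses.
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) == (3, 0):
        # Spark 3.0+: the Cosmos DB helper library is not needed.
        return [SPARK3_CONNECTOR]
    if (major, minor) == (2, 4):
        # Spark 2.x: the connector plus the library that supplies the
        # custom connection factory (retry policy) for Cosmos DB.
        return [SPARK2_CONNECTOR, SPARK2_COSMOS_HELPER]
    raise ValueError(
        f"No recommendation for Spark {spark_version}; look up a compatible connector release."
    )
```

In a notebook you could call `maven_coordinates_for(spark.version)` and enter each returned string under Libraries > Install New > Maven.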
Sample notebooks
A list of Azure Databricks sample notebooks is available in the GitHub repo for you to download. These samples show how to connect to Azure Cosmos DB Cassandra API from Spark and perform different CRUD operations on the data. You can also import all the notebooks into your Databricks cluster workspace and run them.
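As a rough idea of what the read and write steps in such notebooks look like, here is a minimal PySpark sketch. It assumes the connector is attached and the cluster Spark config above is in place. The helper names are hypothetical. The functions take the notebook's existing `spark` session (or a DataFrame) as an argument, so the code itself has no import dependencies. `org.apache.spark.sql.cassandra` is the data source name registered by the DataStax Spark Cassandra Connector.

```python
# Data source name registered by the DataStax Spark Cassandra Connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def read_cassandra_table(spark, keyspace: str, table: str):
    """Load a Cassandra API table as a Spark DataFrame (hypothetical helper)."""
    return (
        spark.read.format(CASSANDRA_FORMAT)
        .options(keyspace=keyspace, table=table)
        .load()
    )


def append_to_cassandra_table(df, keyspace: str, table: str) -> None:
    """Append a DataFrame's rows to an existing table (hypothetical helper)."""
    (
        df.write.format(CASSANDRA_FORMAT)
        .options(keyspace=keyspace, table=table)
        .mode("append")
        .save()
    )
```

In a notebook, a call might look like `books = read_cassandra_table(spark, "books_ks", "books")`. The keyspace and table names are placeholders, and the write helper assumes the target table already exists.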
Accessing Azure Cosmos DB Cassandra API from Spark Scala programs
Spark programs that run as automated processes on Azure Databricks are submitted to the cluster by using spark-submit and scheduled to run through Azure Databricks jobs.
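When a Scala program is packaged as a JAR and submitted this way, the Cassandra settings can be passed as `--conf key=value` options instead of cluster-level config. The following Python sketch uses a hypothetical helper, `spark_submit_parameters`, that assembles such an argument list. The main class, JAR path, and program arguments are placeholders. It relies only on standard spark-submit conventions: options go before the application JAR, and anything after the JAR is passed to the program's main method.

```python
from typing import Dict, List, Optional


def spark_submit_parameters(
    main_class: str,
    jar_path: str,
    conf: Dict[str, str],
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Build a spark-submit argument list for a JAR-packaged Scala program.

    Hypothetical helper: each Spark setting becomes a separate
    '--conf key=value' pair placed before the application JAR.
    """
    params = ["--class", main_class]
    for key, value in conf.items():
        params += ["--conf", f"{key}={value}"]
    params.append(jar_path)
    # Arguments after the JAR are handed to the program's main method.
    params.extend(app_args or [])
    return params
```

The resulting list matches the shape of the parameters for a spark-submit job task. Libraries attached to a cluster may not be available to spark-submit tasks, so check the Databricks jobs documentation on whether to bundle the connector into your application JAR or pass it with `--jars`.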
The following are links to help you get started building Spark Scala programs to interact with Azure Cosmos DB Cassandra API.
- How to connect to Azure Cosmos DB Cassandra API from a Spark Scala program
- How to run a Spark Scala program as an automated job on Azure Databricks
- Complete list of code samples for working with Cassandra API
Next steps
Get started with creating a Cassandra API account, database, and a table by using a Java application.