Azure Cosmos DB Apache Spark 2 OLTP Connector for Core (SQL) API: Release notes and resources

APPLIES TO: SQL API

You can accelerate big data analytics by using the Azure Cosmos DB Apache Spark 2 OLTP Connector for Core (SQL). The Spark Connector allows you to run Spark jobs on data stored in Azure Cosmos DB. Batch and stream processing are supported.
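As a rough illustration of what a batch read setup might look like from PySpark, the sketch below only builds the options dictionary. The option keys (`Endpoint`, `Masterkey`, `Database`, `Collection`, `query_custom`) and the format name in the trailing comment are assumptions to check against the Config options reference. The account URL, key, database, and collection names are placeholders.

```python
def cosmos_read_config(endpoint, master_key, database, collection, query=None):
    """Build an options dict for a batch read.

    The key names are assumed; verify them against the connector's
    Config options reference before use.
    """
    config = {
        "Endpoint": endpoint,
        "Masterkey": master_key,
        "Database": database,
        "Collection": collection,
    }
    if query is not None:
        # Optional SQL query to run on the service side (assumed key name).
        config["query_custom"] = query
    return config


# Placeholder values, not real account details.
read_config = cosmos_read_config(
    "https://<account>.documents.azure.com:443/",
    "<master-key>",
    "exampledb",
    "examplecoll",
    query="SELECT c.id FROM c",
)

# With a SparkSession named `spark` and the connector JAR on the classpath,
# the dict would typically be passed along these lines (assumed format name):
#   df = (spark.read
#         .format("com.microsoft.azure.cosmosdb.spark")
#         .options(**read_config)
#         .load())
```

Keeping the configuration in a plain dictionary makes it easy to reuse the same connection settings for reads, writes, and streaming.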

You can use the connector with Azure Databricks or Azure HDInsight, which provide managed Spark clusters on Azure. The following table shows supported versions:

Component | Version
Apache Spark | 2.4.x, 2.3.x, 2.2.x, and 2.1.x
Scala | 2.11
Azure Databricks (runtime version) | Later than 3.4

Warning

This connector supports only the Core (SQL) API of Azure Cosmos DB. For the Cosmos DB API for MongoDB, use the MongoDB Connector for Spark. For the Cosmos DB Cassandra API, use the Cassandra Spark connector.

Resources

Resource | Link
SDK download | Download latest .jar, Maven
API documentation | Spark Connector reference
Contribute to the SDK | Azure Cosmos DB Connector for Apache Spark on GitHub
Get started | Accelerate big data analytics by using the Apache Spark to Azure Cosmos DB connector; Use Apache Spark Structured Streaming with Apache Kafka and Azure Cosmos DB

Release history

3.3.0

New features

  • Adds a new config option, changefeedstartfromdatetime, which specifies the point in time from which the change feed should be processed. For more information, see Config options.
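The following sketch shows one way the new option might be combined with other change feed settings. Only `changefeedstartfromdatetime` comes from these release notes. The other key names (`ReadChangeFeed`, `ChangeFeedQueryName`, `ChangeFeedCheckpointLocation`) and the ISO 8601 UTC value format are assumptions, so check Config options for the accepted format. The helper name `change_feed_config` is hypothetical.

```python
from datetime import datetime, timezone


def change_feed_config(base_config, query_name, checkpoint_location, start_from=None):
    """Return a copy of base_config extended with change feed options.

    start_from must be a timezone-aware datetime. Its string format here
    (ISO 8601, UTC) is an assumption, not a documented requirement.
    """
    config = dict(base_config)
    config.update({
        "ReadChangeFeed": "true",                             # assumed key name
        "ChangeFeedQueryName": query_name,                    # assumed key name
        "ChangeFeedCheckpointLocation": checkpoint_location,  # assumed key name
    })
    if start_from is not None:
        if start_from.tzinfo is None:
            raise ValueError("start_from must be timezone-aware")
        utc_value = start_from.astimezone(timezone.utc)
        config["changefeedstartfromdatetime"] = utc_value.strftime("%Y-%m-%dT%H:%M:%SZ")
    return config


# Placeholder connection settings.
base = {"Endpoint": "https://<account>.documents.azure.com:443/", "Database": "exampledb"}
stream_config = change_feed_config(
    base,
    query_name="example-stream",
    checkpoint_location="/tmp/checkpoints/example-stream",
    start_from=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
```

Rejecting naive datetimes avoids silently interpreting the start time in the driver's local time zone.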

3.2.0

Key bug fixes

  • Fixes a regression that caused excessive memory consumption on the executors for large result sets (for example, with millions of rows), ultimately resulting in the error java.lang.OutOfMemoryError: GC overhead limit exceeded.

3.1.1

Key bug fixes

  • Fixes a streaming checkpoint edge case in which the ID contains the pipe character (|) with the ChangeFeedMaxPagesPerBatch config applied.

3.1.0

New features

  • Adds support for bulk updates when nested partition keys are used.
  • Adds support for Decimal and Float data types during writes to Azure Cosmos DB.
  • Adds support for Timestamp types whose values are stored as Long (Unix epoch).

3.0.8

Key bug fixes

  • Fixes a typecast exception that occurs when the WriteThroughputBudget config is used.
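For context, a write configuration that sets WriteThroughputBudget might be assembled as in the sketch below. The `Upsert` key, the integer-as-string encoding of the budget, and the helper name `cosmos_write_config` are assumptions for illustration. Only the WriteThroughputBudget option name comes from these release notes.

```python
def cosmos_write_config(base_config, throughput_budget=None, upsert=True):
    """Return a copy of base_config extended with write options.

    throughput_budget is treated as a positive whole number of request
    units per second; that interpretation is an assumption.
    """
    config = dict(base_config)
    config["Upsert"] = "true" if upsert else "false"  # assumed key name
    if throughput_budget is not None:
        budget = int(throughput_budget)
        if budget <= 0:
            raise ValueError("throughput_budget must be positive")
        # Passed as a string, because Spark data source options are strings.
        config["WriteThroughputBudget"] = str(budget)
    return config


# Placeholder connection settings.
base = {"Endpoint": "https://<account>.documents.azure.com:443/", "Collection": "examplecoll"}
write_config = cosmos_write_config(base, throughput_budget=1000)

# With a DataFrame `df` and the connector JAR on the classpath, the dict
# would typically be passed along these lines (assumed format name):
#   (df.write
#      .format("com.microsoft.azure.cosmosdb.spark")
#      .options(**write_config)
#      .mode("append")
#      .save())
```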

3.0.7

New features

  • Adds error information for bulk failures to exception and log.

3.0.6

Key bug fixes

  • Fixes streaming checkpoint issues.

3.0.5

Key bug fixes

  • Lowers the log level of a message that was unintentionally left at ERROR, to reduce log noise.

3.0.4

Key bug fixes

  • Fixes a bug in structured streaming during partition splits. The bug could result in some missing change feed records or Null exceptions for checkpoint writes.

3.0.3

Key bug fixes

  • Fixes a bug that causes a custom schema provided for readStream to be ignored.

3.0.2

Key bug fixes

  • Fixes a regression (unshaded JAR includes all shaded dependencies) that increases build time by 50 percent.

3.0.1

Key bug fixes

  • Fixes a dependency problem that causes Direct Transport over TCP to fail with RequestTimeoutException.

3.0.0

New features

  • Improves connection management and connection pooling to reduce the number of metadata calls.

FAQ

How will I be notified of the retiring SDK?

Microsoft will provide 12 months' advance notice before the end of support of the retiring SDK to facilitate a smooth transition to a supported SDK. We'll notify you through various communication channels: the Azure portal, Azure updates, and direct communication to assigned service administrators.

Can I author applications by using a to-be-retired Azure Cosmos DB SDK during the 12-month period?

Yes, you'll be able to author, deploy, and modify applications by using the to-be-retired Azure Cosmos DB SDK during the 12-month notice period. We recommend that you migrate to a newer supported version of the Azure Cosmos DB SDK during the 12-month notice period, as appropriate.

After the retirement date, what happens to applications that use the unsupported Azure Cosmos DB SDK?

After the retirement date, Azure Cosmos DB will no longer make bug fixes, add new features, or provide support for the retired SDK versions. If you prefer not to upgrade, requests sent from the retired versions of the SDK will continue to be served by the Azure Cosmos DB service.

Which SDK versions will have the latest features and updates?

New features and updates will be added only to the latest minor version of the latest supported major SDK version. We recommend that you always use the latest version to take advantage of new features, performance improvements, and bug fixes. If you're using an old, non-retired version of the SDK, your requests to Azure Cosmos DB will still function, but you won't have access to any new capabilities.

What should I do if I can't update my application before a cutoff date?

We recommend that you upgrade to the latest SDK as early as possible. After an SDK is tagged for retirement, you'll have 12 months to update your application. If you're not able to update by the retirement date, requests sent from the retired versions of the SDK will continue to be served by Azure Cosmos DB, so your running applications will continue to function. However, Azure Cosmos DB will no longer make bug fixes, add new features, or provide support for the retired SDK versions.

If you have a support plan and require technical support, contact us by filing a support ticket.

Next steps

Learn more about Azure Cosmos DB.

Learn more about Apache Spark.