SQL Databases using the Apache Spark Connector

The Spark connector for Azure SQL Database and SQL Server enables these databases to act as input data sources and output data sinks for Apache Spark jobs. It allows you to use real-time transactional data in big data analytics and to persist results for ad hoc queries or reporting.

Compared to the built-in JDBC connector, this connector can bulk insert data into SQL databases, outperforming row-by-row insertion by 10x to 20x. It also supports Azure Active Directory (Azure AD) authentication, so you can connect securely to your Azure SQL databases from Azure Databricks using your Azure AD account. Because its interfaces are similar to those of the built-in JDBC connector, it is easy to migrate existing Spark jobs to this connector.
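As a sketch of the bulk-insert path, the snippet below writes an existing DataFrame to a table with the connector's bulk copy API. The server, database, table, and credential values are placeholders, and the tuning values (batch size, table lock, timeout) are illustrative assumptions, not recommendations.

```scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Hypothetical connection details -- replace with your own.
val bulkCopyConfig = Config(Map(
  "url"               -> "myserver.database.windows.net",
  "databaseName"      -> "MyDatabase",
  "dbTable"           -> "dbo.Clients",
  "user"              -> "username",
  "password"          -> "*********",
  "bulkCopyBatchSize" -> "2500",   // rows per batch (illustrative)
  "bulkCopyTableLock" -> "true",   // take a table lock for faster loads
  "bulkCopyTimeout"   -> "600"     // seconds
))

// df is an existing DataFrame; bulkCopyToSqlDB performs a bulk insert
// rather than row-by-row JDBC inserts.
df.bulkCopyToSqlDB(bulkCopyConfig)
```

This requires a running cluster with the connector library installed and network access to the target database, so it is a sketch rather than a locally runnable example.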

Requirements

Component                              Versions Supported
Apache Spark                           2.0.2 and above
Scala                                  2.10 and above
Microsoft JDBC Driver for SQL Server   6.2 and above
Microsoft SQL Server                   SQL Server 2008 and above
Azure SQL Database                     Supported

Create and install Spark connector library

  1. Create an Azure Databricks library for the Spark connector as a Maven library. Use the coordinate: com.microsoft.azure:azure-sqldb-spark:1.0.2.
  2. Install the library on the cluster that will access the database.

Use the Spark connector

For instructions on using the Spark connector, see Accelerate real-time big data analytics with Spark connector for Azure SQL Database and SQL Server.
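As a minimal usage sketch, the snippet below reads a table through the connector's `Config`-based interface. All connection values are placeholders; the timeout settings are illustrative assumptions.

```scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Hypothetical connection details -- replace with your own.
val config = Config(Map(
  "url"            -> "myserver.database.windows.net",
  "databaseName"   -> "MyDatabase",
  "dbTable"        -> "dbo.Clients",
  "user"           -> "username",
  "password"       -> "*********",
  "connectTimeout" -> "5",   // seconds (illustrative)
  "queryTimeout"   -> "5"    // seconds (illustrative)
))

// The connector extends DataFrameReader via an implicit, so the table
// can be loaded like any other Spark data source.
val clients = spark.read.sqlDB(config)
clients.show()
```

As with the write path, this assumes a cluster with the library installed and connectivity to the database, so it is not runnable standalone.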