How to configure an ADF pipeline run / linked service so it uses Databricks serverless compute

Krzysztof Przysowa 20 Reputation points
2024-05-01T12:12:06.9033333+00:00

Databricks has recently announced serverless compute for workflows:

https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/run-serverless-jobs

I would like to be able to execute Azure Data Factory (ADF) jobs using this functionality.

Currently, for job compute I have to specify the driver and worker type; with serverless this is not needed.


2 answers

  1. phemanth 6,630 Reputation points Microsoft Vendor
    2024-05-01T13:18:49.86+00:00

    @Krzysztof Przysowa

    Thanks for using MS Q&A platform and posting your query.

    Serverless compute for workflows allows you to run your Databricks job without configuring and deploying infrastructure. With serverless compute, you focus on implementing your data processing and analysis pipelines, and Databricks efficiently manages compute resources, including optimizing and scaling compute for your workloads.

    To configure your Azure Data Factory (ADF) pipeline to use Databricks serverless compute, here are the steps to set up the linked service and switch an existing job to serverless compute:

    1. Create a Linked Service for Databricks: On the ADF home page, switch to the Manage tab in the left panel. Select Linked services under Connections, and then select + New. In the New linked service window, select Compute > Azure Databricks, and then select Continue.
    2. Configure the Linked Service: In the New linked service window, complete the following steps:
      • For Name, enter AzureDatabricks_LinkedService.
      • Provide the necessary details for your Databricks workspace, such as the URL and access token.
    3. Configure the ADF Pipeline: When creating or editing a pipeline in ADF, you can specify the Databricks linked service as the compute environment for your activities.
    4. Parametrize the Spark Configs: If you want to parametrize the Spark config values as well as the keys, you can do so when writing an ARM template for Data Factory. In the “Microsoft.DataFactory/factories/linkedservices” resource, you can define newClusterSparkConf (a minimal sketch of the resulting linked service definition follows this list).
    5. Use Serverless Compute with Databricks Jobs: To switch an existing Azure Databricks job to serverless compute, follow the steps below (the official documentation has more details).
    6. Open the job you want to edit.
    7. In the Job details side panel click Swap under Compute.
    8. Click New, enter or update any settings, and click Update.
    9. Alternatively, you can click in the Compute drop-down menu and select Serverless.
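
    As referenced in step 4, here is a minimal Python sketch (an illustration, not an official sample) that creates the same Azure Databricks linked service through the ADF REST API, using the requests and azure-identity packages. The subscription, resource group, factory, workspace URL, token placeholder, and Spark config values are all hypothetical and should be replaced with your own:

```python
# Minimal sketch: create an Azure Databricks linked service via the ADF REST API.
# All names, IDs, and secrets below are placeholders.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"
linked_service_name = "AzureDatabricks_LinkedService"

# Acquire an ARM token for the Azure management endpoint.
credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
    f"/factories/{factory_name}/linkedservices/{linked_service_name}"
    "?api-version=2018-06-01"
)

# This body mirrors what the ADF UI generates for an Azure Databricks linked
# service; newClusterSparkConf is the property mentioned in step 4 that can be
# parametrized in an ARM template.
body = {
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
            "accessToken": {"type": "SecureString", "value": "<databricks-access-token>"},
            "newClusterVersion": "14.3.x-scala2.12",
            "newClusterNodeType": "Standard_DS3_v2",
            "newClusterNumOfWorker": "2",
            "newClusterSparkConf": {"spark.sql.shuffle.partitions": "200"},
        },
    }
}

response = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()
print("Created linked service:", response.json()["name"])
```

    Note that the linked service definition still carries the job-cluster settings (newClusterNodeType, newClusterNumOfWorker, and so on); the switch to serverless compute in steps 6-9 above happens on the Databricks job itself.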

    Please go through the link for more details: https://docs.databricks.com/en/workflows/jobs/run-serverless-jobs.html

    Please note that your Databricks workspace must have Unity Catalog enabled and your workloads must support shared access mode. Also, your Azure Databricks workspace must be in a supported region.

    You can also automate creating and running jobs that use serverless compute with the Jobs API, Databricks Asset Bundles, and the Databricks SDK for Python.
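
    As an example of that automation, here is a minimal sketch with the Databricks SDK for Python that creates and runs a job whose task omits any cluster specification, so it runs on serverless compute (assuming serverless compute for workflows is enabled in your workspace). The job name and notebook path are placeholders:

```python
# Minimal sketch using the Databricks SDK for Python (pip install databricks-sdk).
# The job name and notebook path are placeholders; authentication comes from the
# standard Databricks config (environment variables or ~/.databrickscfg).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Omitting new_cluster / existing_cluster_id / job_cluster_key on the task means
# the task runs on serverless compute when serverless for workflows is enabled.
job = w.jobs.create(
    name="serverless-demo-job",
    tasks=[
        jobs.Task(
            task_key="main",
            notebook_task=jobs.NotebookTask(notebook_path="/Shared/my_notebook"),
        )
    ],
)
print(f"Created job {job.job_id}")

# Trigger a run and wait for it to finish on serverless compute.
run = w.jobs.run_now(job_id=job.job_id).result()
print(f"Run {run.run_id} finished with state: {run.state.result_state}")
```

    Because no driver or worker type is specified on the task, Databricks chooses and scales the compute for you.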

    Please refer to:

    https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/run-serverless-jobs

    https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/use-compute

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further queries, do let us know.


  2. PRADEEPCHEEKATLA-MSFT 79,376 Reputation points Microsoft Employee
    2024-05-09T05:44:23.0466667+00:00

    @Krzysztof Przysowa - Thanks for the question and using MS Q&A platform.

    Here is an update from internal team:

    The only way to make it work would be to use the Databricks REST API with ADF's web activity.

    For more details, refer to Azure Databricks REST API - Jobs API 2.0 and Web activity in Azure Data Factory and Azure Synapse Analytics.

    Here is a third-party article that explains how to call the Databricks REST API from an ADF Web activity: Azure Data Factory integration with Databricks Workflows.
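
    To illustrate, here is a minimal Python sketch (workspace URL, token, and job ID are placeholders) of the Jobs API 2.0 run-now request that the Web activity would issue to trigger a job already configured for serverless compute. In ADF you would put the same URL, method, Authorization header, and JSON body into the Web activity, ideally retrieving the token from Azure Key Vault:

```python
# Minimal sketch of the Jobs API call an ADF Web activity would make.
# The workspace URL, token, and job ID are placeholders.
import requests

workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"
databricks_token = "<databricks-access-token>"  # in ADF, keep this in Azure Key Vault
job_id = 123  # a Databricks job already switched to serverless compute

response = requests.post(
    f"{workspace_url}/api/2.0/jobs/run-now",
    headers={"Authorization": f"Bearer {databricks_token}"},
    json={"job_id": job_id, "notebook_params": {"run_date": "2024-05-01"}},
)
response.raise_for_status()
print("Triggered run:", response.json()["run_id"])
```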

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further queries, do let us know.