Git integration with Databricks Repos

Note

Support for arbitrary files in Databricks Repos is now in Public Preview. For details, see Work with non-notebook files in an Azure Databricks repo and Import Python and R modules.

To support best practices for data science and engineering code development, Databricks Repos provides repository-level integration with Git providers. You can develop code in an Azure Databricks notebook and sync it with a remote Git repository. Databricks Repos lets you use Git functionality such as cloning a remote repo, managing branches, pushing and pulling changes, and visually comparing differences upon commit.

Databricks Repos also provides an API that you can integrate with your CI/CD pipeline. For example, you can programmatically update a Databricks repo so that it always has the most recent code version.
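As a minimal sketch of that workflow, a CI/CD job could call the Repos API endpoint PATCH /api/2.0/repos/{repo_id} to pull the latest commit of a branch into a Databricks repo. The workspace URL, personal access token, repo ID, and branch name below are placeholders, not values from this article; substitute your own.

# Sketch: update a Databricks repo to the head of a branch via the Repos API.
# All values below are placeholders for illustration only.
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "dapiXXXXXXXXXXXXXXXX"                                         # placeholder access token
REPO_ID = 123456789                                                    # placeholder repo ID
BRANCH = "main"                                                        # branch to sync to

# PATCH the repo so it checks out the latest commit on the given branch.
response = requests.patch(
    f"{WORKSPACE_URL}/api/2.0/repos/{REPO_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"branch": BRANCH},
)
response.raise_for_status()
print(response.json())  # response describes the repo, including its current branch and head commit

Running a call like this at the start of a pipeline stage keeps the repo in the workspace aligned with the remote before downstream jobs execute.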

Databricks Repos provides security features such as allow lists to control access to Git repositories and detection of clear text secrets in source code.

When audit logging is enabled, audit events are logged when you interact with a Databricks repo. For example, an audit event is logged when you create, update, or delete a Databricks repo, when you list all Databricks Repos associated with a workspace, and when you sync changes between your Databricks repo and the Git remote.

For more information about best practices for code development using Databricks Repos, see CI/CD workflows with Databricks Repos and Git integration.

Requirements

Azure Databricks supports these Git providers:

  • GitHub
  • Bitbucket Cloud and Server
  • GitLab
  • Azure DevOps (not available in Azure China regions)
  • AWS CodeCommit
  • GitHub AE

Databricks Repos also supports integration with Bitbucket Server, GitHub Enterprise Server, and self-managed GitLab subscription instances, if the server is internet-accessible.

To integrate with a private Git server instance that is not internet-accessible, contact your Azure Databricks representative.

Support for arbitrary files in Databricks Repos is available in Databricks Runtime 8.4 and above.