Diagnostic logging in Azure Databricks

Azure Databricks provides comprehensive end-to-end diagnostic logs of activities performed by Azure Databricks users, allowing your enterprise to monitor detailed Azure Databricks usage patterns.

Configure diagnostic log delivery


Diagnostic logs require the Azure Databricks Premium Plan.

  1. Log in to the Azure portal as an Owner or Contributor for the Azure Databricks workspace and click your Azure Databricks Service resource.

  2. In the Monitoring section of the sidebar, click the Diagnostic settings tab.

  3. Click Turn on diagnostics.


  4. On the Diagnostic settings page, provide the following configuration:


Enter a name for the diagnostic setting to create.

    Archive to a storage account

    To use this option, you need an existing storage account to connect to. To create a new storage account in the portal, see Create a storage account and follow the instructions to create an Azure Resource Manager, general-purpose account. Then return to this page in the portal to select your storage account. It might take a few minutes for newly created storage accounts to appear in the drop-down menu. For information about additional costs incurred by writing to a storage account, see Azure Storage pricing.

    Stream to an event hub

    To use this option, you need an existing Azure Event Hubs namespace and event hub to connect to. To create an Event Hubs namespace, see Create an Event Hubs namespace and an event hub by using the Azure portal. Then return to this page in the portal to select the Event Hubs namespace and policy name. For information about additional costs incurred by writing to an event hub, see Azure Event Hubs pricing.

    Send to Log Analytics

    To use this option, either use an existing Log Analytics workspace or create a new one by following the steps to Create a new workspace in the portal. For information about additional costs incurred by sending logs to Log Analytics, see Azure Monitor pricing.


  5. Choose the services you want diagnostic logs for and set retention policies.

    Retention applies only to storage accounts. If you do not want to apply a retention policy and you want to retain data forever, set Retention (days) to 0.

  6. Select Save.

  7. If you receive an error that says “Failed to update diagnostics for . The subscription is not registered to use microsoft.insights,” follow the Troubleshoot Azure Diagnostics instructions to register the account and then retry this procedure.

  8. If you want to change how your diagnostic logs are saved at any point in the future, return to this page to modify the diagnostic log settings for your account.

Turn on logging using PowerShell

  1. Start an Azure PowerShell session and sign in to your Azure account with the following command:

    Connect-AzAccount

    If you do not have Azure PowerShell installed already, use the following commands to install Azure PowerShell and import the Az module.

    Install-Module -Name Az -AllowClobber
    Import-Module Az
  2. In the pop-up browser window, enter your Azure account user name and password. Azure PowerShell gets all of the subscriptions that are associated with this account, and by default, uses the first one.

    If you have more than one subscription, you might have to specify the subscription that was used to create your Azure Databricks workspace. To see the subscriptions for your account, type the following command:

    Get-AzSubscription

    To specify the subscription that’s associated with the Azure Databricks account that you’re logging, type the following command:

    Set-AzContext -SubscriptionId <subscription ID>
  3. Set your Log Analytics resource name to a variable named logAnalytics, where ResourceName is the name of the Log Analytics workspace.

    $logAnalytics = Get-AzResource -ResourceGroupName <resource group name> -ResourceName <resource name> -ResourceType "Microsoft.OperationalInsights/workspaces"
  4. Set the Azure Databricks service resource name to a variable named databricks, where ResourceName is the name of the Azure Databricks service.

    $databricks = Get-AzResource -ResourceGroupName <your resource group name> -ResourceName <your Azure Databricks service name> -ResourceType "Microsoft.Databricks/workspaces"
  5. To enable logging for Azure Databricks, use the Set-AzDiagnosticSetting cmdlet with variables for the Log Analytics workspace, the Azure Databricks service, and the categories to enable for logging. Run the following command and set the -Enabled flag to $true:

    Set-AzDiagnosticSetting -ResourceId $databricks.ResourceId -WorkspaceId $logAnalytics.ResourceId -Enabled $true -name "<diagnostic setting name>" -Category <comma separated list>

Enable logging by using Azure CLI

  1. Open PowerShell.

  2. Use the following command to connect to your Azure account:

    az login
  3. Run the following diagnostic setting command:

    az monitor diagnostic-settings create --name <diagnostic name> \
        --resource-group <log analytics workspace resource group> \
        --workspace <log analytics name or object ID> \
        --resource <target resource object ID> \
        --logs '[
          {
            "category": <category name>,
            "enabled": true
          }
        ]'


Enable logging using the REST API

Use the LogSettings API.


PUT https://management.azure.com/{resourceUri}/providers/microsoft.insights/diagnosticSettings/{name}?api-version=2017-05-01-preview

Request body

    {
      "properties": {
        "workspaceId": "<log analytics resourceId>",
        "logs": [
          {
            "category": "<category name>",
            "enabled": true,
            "retentionPolicy": {
              "enabled": false,
              "days": 0
            }
          }
        ]
      }
    }
Diagnostic log delivery

Once logging is enabled for your account, Azure Databricks automatically starts sending diagnostic logs to your delivery location. Auditable events typically appear in diagnostic logs within 15 minutes of activation in Azure Commercial regions.


SSH login logs are delivered with high latency.

Diagnostic log schema

The schema of diagnostic log records is as follows:

  • operationVersion: The schema version of the diagnostic log format.
  • time: UTC timestamp of the action.
  • properties.sourceIPAddress: The IP address of the source request.
  • properties.userAgent: The browser or API client used to make the request.
  • properties.sessionId: Session ID of the action.
  • identities: Information about the user that makes the requests:

    * email: User email address.
  • category: The service that logged the request.
  • operationName: The action, such as login, logout, read, or write.
  • properties.requestId: Unique request ID.
  • properties.requestParams: Parameter key-value pairs used in the event.
  • properties.response: Response to the request:

    * errorMessage: The error message if there was an error.
    * result: The result of the request.
    * statusCode: HTTP status code that indicates whether the request succeeded.
  • properties.logId: The unique identifier for the log messages.
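Because category plus operationName identify an event, a downstream consumer typically flattens just those fields. A minimal sketch, assuming records have already been parsed into Python dicts shaped per the schema above; the sample record here is invented:

```python
def summarize_record(record):
    # Extract the event-identifying fields (category + operationName),
    # the acting user, and the response status from one log record.
    identity = record.get("identities", {})
    response = record.get("properties", {}).get("response", {})
    return {
        "event": f"{record.get('category')}/{record.get('operationName')}",
        "user": identity.get("email"),
        "status": response.get("statusCode"),
    }

# Invented record following the schema above.
record = {
    "category": "notebook",
    "operationName": "runCommand",
    "identities": {"email": "user@example.com"},
    "properties": {"response": {"statusCode": 200, "result": "ok"}},
}
summary = summarize_record(record)
```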


The category and operationName properties identify an event in a log record. Azure Databricks provides diagnostic logs for the following services:

  • DBFS
  • Clusters
  • Pools
  • Accounts
  • Jobs
  • Notebook
  • SSH
  • Workspace
  • Secrets
  • Databricks SQL
  • SQL Permissions
  • Repos

If actions take a long time, the request and response are logged separately, but the request and response pair have the same properties.requestId.
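To match up such split request/response records, group on properties.requestId. A small sketch with invented records:

```python
from collections import defaultdict

def group_by_request_id(records):
    # Records for one long-running action share properties.requestId,
    # so grouping on it reunites the request with its response.
    groups = defaultdict(list)
    for record in records:
        groups[record["properties"]["requestId"]].append(record)
    return dict(groups)

# Invented records: one action logged as separate request and response.
records = [
    {"operationName": "runJob", "properties": {"requestId": "req-1"}},
    {"operationName": "runJob",
     "properties": {"requestId": "req-1", "response": {"statusCode": 200}}},
    {"operationName": "login", "properties": {"requestId": "req-2"}},
]
grouped = group_by_request_id(records)
```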

With the exception of mount-related operations, Azure Databricks diagnostic logs do not include DBFS-related operations.


Automated actions—such as resizing a cluster due to autoscaling or launching a job due to scheduling—are performed by the user System-User.

Sample log output

The following JSON sample is an example of Azure Databricks log output:

    {
        "TenantId": "<your tenant id>",
        "SourceSystem": "|Databricks|",
        "TimeGenerated": "2019-05-01T00:18:58Z",
        "OperationName": "Microsoft.Databricks/jobs/create",
        "OperationVersion": "1.0.0",
        "Category": "jobs",
        "Identity": {
            "email": "mail@contoso.com",
            "subjectName": null
        },
        "SourceIPAddress": "",
        "LogId": "201b6d83-396a-4f3c-9dee-65c971ddeb2b",
        "ServiceName": "jobs",
        "UserAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36",
        "SessionId": "webapp-cons-webapp-01exaj6u94682b1an89u7g166c",
        "ActionName": "create",
        "RequestId": "ServiceMain-206b2474f0620002",
        "Response": {
            "statusCode": 200,
            "result": "{\"job_id\":1}"
        },
        "RequestParams": {
            "name": "Untitled",
            "new_cluster": "{\"node_type_id\":\"Standard_DS3_v2\",\"spark_version\":\"5.2.x-scala2.11\",\"num_workers\":8,\"spark_conf\":{\"spark.databricks.delta.preview.enabled\":\"true\"},\"cluster_creator\":\"JOB_LAUNCHER\",\"spark_env_vars\":{\"PYSPARK_PYTHON\":\"/databricks/python3/bin/python3\"},\"enable_elastic_disk\":true}"
        },
        "Type": "DatabricksJobs"
    }
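Note that in the sample, Response.result and RequestParams.new_cluster hold JSON documents encoded as strings, so they need a second decode before their fields can be inspected. A sketch using abbreviated values from the sample above:

```python
import json

# Abbreviated fields from the sample record above; the nested values
# are JSON documents stored as strings.
record = {
    "Response": {"statusCode": 200, "result": "{\"job_id\":1}"},
    "RequestParams": {
        "new_cluster": "{\"node_type_id\":\"Standard_DS3_v2\","
                       "\"spark_version\":\"5.2.x-scala2.11\",\"num_workers\":8}",
    },
}

# Decode the string-encoded payloads a second time.
result = json.loads(record["Response"]["result"])
cluster = json.loads(record["RequestParams"]["new_cluster"])
```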

Analyze diagnostic logs

If you selected the Send to Log Analytics option when you turned on diagnostic logging, your diagnostic data is typically forwarded to Azure Monitor logs within 15 minutes.

Before you view your logs, verify that your Log Analytics workspace has been upgraded to use the new Kusto query language. To check, open the Azure portal and select Log Analytics on the far left. Then select your Log Analytics workspace. If you get a message to upgrade, see Upgrade your Azure Log Analytics workspace to new log search.

To view your diagnostic data in Azure Monitor logs, open the Log Search page from the left menu or the Management area of the page. Then enter your query into the Log search box.



Here are some additional queries that you can enter into the Log search box. These queries are written in Kusto Query Language.

  • To query all users who have accessed the Azure Databricks workspace and their location:

    DatabricksAccounts
    | where ActionName contains "login"
    | extend d=parse_json(Identity)
    | project UserEmail=d.email, SourceIPAddress
  • To check the Spark versions used:

    DatabricksClusters
    | where ActionName == "create"
    | extend d=parse_json(RequestParams)
    | extend SparkVersion=d.spark_version
    | summarize Count=count() by tostring(SparkVersion)