Send Azure Databricks application logs to Azure Monitor
This article shows how to send application logs and metrics from Azure Databricks to a Log Analytics workspace. It uses the Azure Databricks Monitoring Library, which is available on GitHub.
Prerequisites
Configure your Azure Databricks cluster to use the monitoring library, as described in the GitHub readme.
Note
The monitoring library streams Apache Spark level events and Spark Structured Streaming metrics from your jobs to Azure Monitor. You don't need to make any changes to your application code for these events and metrics.
Send application metrics using Dropwizard
Spark uses a configurable metrics system based on the Dropwizard Metrics Library. For more information, see Metrics in the Spark documentation.
To send application metrics from Azure Databricks application code to Azure Monitor, follow these steps:
Build the spark-listeners-loganalytics-1.0-SNAPSHOT.jar JAR file as described in the GitHub readme.
Create Dropwizard gauges or counters in your application code. You can use the
UserMetricsSystemclass defined in the monitoring library. The following example creates a counter namedcounter1.import org.apache.spark.metrics.UserMetricsSystems import org.apache.spark.sql.SparkSession object StreamingQueryListenerSampleJob { private final val METRICS_NAMESPACE = "samplejob" private final val COUNTER_NAME = "counter1" def main(args: Array[String]): Unit = { val spark = SparkSession .builder .getOrCreate val driverMetricsSystem = UserMetricsSystems .getMetricSystem(METRICS_NAMESPACE, builder => { builder.registerCounter(COUNTER_NAME) }) driverMetricsSystem.counter(COUNTER_NAME).inc(5) } }The monitoring library includes a sample application that demonstrates how to use the
UserMetricsSystemclass.
Send application logs using Log4j
To send your Azure Databricks application logs to Azure Log Analytics using the Log4j appender in the library, follow these steps:
Build the spark-listeners-1.0-SNAPSHOT.jar and the spark-listeners-loganalytics-1.0-SNAPSHOT.jar JAR file as described in the GitHub readme.
Create a log4j.properties configuration file for your application. Include the following configuration properties. Substitute your application package name and log level where indicated:
log4j.appender.A1=com.microsoft.pnp.logging.loganalytics.LogAnalyticsAppender log4j.appender.A1.layout=com.microsoft.pnp.logging.JSONLayout log4j.appender.A1.layout.LocationInfo=false log4j.additivity.<your application package name>=false log4j.logger.<your application package name>=<log level>, A1You can find a sample configuration file here.
In your application code, include the spark-listeners-loganalytics project, and import
com.microsoft.pnp.logging.Log4jconfigurationto your application code.import com.microsoft.pnp.logging.Log4jConfigurationConfigure Log4j using the log4j.properties file you created in step 3:
getClass.getResourceAsStream("<path to file in your JAR file>/log4j.properties") { stream => { Log4jConfiguration.configure(stream) } }Add Apache Spark log messages at the appropriate level in your code as required. For example, use the
logDebugmethod to send a debug log message. For more information, see Logging in the Spark documentation.logTrace("Trace message") logDebug("Debug message") logInfo("Info message") logWarning("Warning message") logError("Error message")
Run the sample application
The monitoring library includes a sample application that demonstrates how to send both application metrics and application logs to Azure Monitor. To run the sample:
Build the spark-jobs project in the monitoring library, as described in the GitHub readme.
Navigate to your Databricks workspace and create a new job, as described here.
In the job detail page, select Set JAR.
Upload the JAR file from
/src/spark-jobs/target/spark-jobs-1.0-SNAPSHOT.jar.For Main class, enter
com.microsoft.pnp.samplejob.StreamingQueryListenerSampleJob.Select a cluster that is already configured to use the monitoring library. See Configure Azure Databricks to send metrics to Azure Monitor.
When the job runs, you can view the application logs and metrics in your Log Analytics workspace.
Application logs appear under SparkLoggingEvent_CL:
SparkLoggingEvent_CL | where logger_name_s contains "com.microsoft.pnp"
Application metrics appear under SparkMetric_CL:
SparkMetric_CL | where name_s contains "rowcounter" | limit 50
Important
After you verify the metrics appear, stop the sample application job.
Next steps
Deploy the performance monitoring dashboard that accompanies this code library to troubleshoot performance issues in your production Azure Databricks workloads.