Verify the version of Log4j on your cluster

Databricks recently published a blog post, Log4j 2 Vulnerability (CVE-2021-44228) Research and Assessment. Databricks does not directly use a version of Log4j known to be affected by this vulnerability within the Azure Databricks platform in a way we understand may be vulnerable.

If you are using Log4j within your cluster (for example, if you are processing user-controlled strings through Log4j), your use may be vulnerable to the exploit if you have installed, and are using, an affected version, or if you have installed services that transitively depend on an affected version.

This article explains how to check your cluster for installed versions of Log4j 2 and how to upgrade those instances.

Important

DISCLAIMER: The suggestions provided in this article reflect Databricks’s best understanding of the ways to make these determinations at this time. Because we do not control your code, we cannot guarantee that, if you fail to find Log4j by following these directions or using the suggested scanners, affected Log4j code is not present in your code.

Check to see if Log4j 2 is installed

Check for a manual install

Manually review the libraries installed on your cluster.

If you have explicitly installed a version of Log4j 2 via Maven, it is listed under Libraries in the cluster UI.

Scan the classpath

Scan your classpath to check for a version of Log4j 2.

  1. Start your cluster.

  2. Attach a notebook to your cluster.

  3. Run this code to scan your classpath:

    {
      import scala.util.{Try, Success, Failure}
      // Look up the Log4j 2 core Logger class without initializing it.
      Try(Class.forName("org.apache.logging.log4j.core.Logger", false, this.getClass.getClassLoader)) match {
        case Success(loggerCls) =>
          // Package metadata comes from the jar manifest and may be missing.
          Option(loggerCls.getPackage) match {
            case Some(pkg) =>
              println(s"Version: ${pkg.getSpecificationTitle} ${pkg.getSpecificationVersion}")
            case None =>
              println("Could not determine Log4J 2 version")
          }
        case Failure(e: ClassNotFoundException) =>
          // The class is not on the classpath.
          println("Could not load Log4J 2 class")
        case Failure(e) =>
          println(s"Unexpected Error: $e")
          throw e
      }
    }
    
  • If Log4j 2 is NOT PRESENT on your classpath, you should see a result like this:

    Could not load Log4J 2 class
    
  • If Log4j 2 is PRESENT on your classpath, you should see a result like this, which includes the Log4j 2 version:

    Version: Apache Log4j Core 2.15.0
    

Note

This method does not identify cases where Log4j classes are shaded or included transitively.
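
When the classpath scan does find Log4j 2, it can also help to know which jar supplied the class. The following is a minimal sketch and not part of the procedure above; the `Log4jLocation` object and the `codeLocation` helper are hypothetical names introduced here for illustration. It relies on the standard `Class.getProtectionDomain` and `CodeSource.getLocation` calls, which can return null for some class loaders, so an empty result is inconclusive.

```scala
import scala.util.Try

// Hypothetical helper, introduced for illustration only.
object Log4jLocation {
  // Returns the location (usually a jar URL) that a class was loaded from,
  // or None if the class cannot be loaded or the JVM does not expose a location.
  def codeLocation(className: String, loader: ClassLoader): Option[String] =
    Try {
      // Load the class without initializing it, as in the classpath scan.
      val cls = Class.forName(className, false, loader)
      for {
        domain   <- Option(cls.getProtectionDomain)
        source   <- Option(domain.getCodeSource)
        location <- Option(source.getLocation)
      } yield location.toString
    }.toOption.flatten
}
```

In a notebook, a call such as `Log4jLocation.codeLocation("org.apache.logging.log4j.core.Logger", getClass.getClassLoader)` would print the jar location when it is available. Like the classpath scan, this sketch does not detect shaded copies of Log4j.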

Scan all user-installed jars

Locate all of the user-installed jar files on your cluster and run a scanner to check for vulnerable Log4j 2 versions.

  1. Start your cluster.

  2. Attach a notebook to your cluster.

  3. Run this code to identify the location of the jar files:

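    import org.apache.spark._
    
    // Read SparkEnv's driverTmpDir field via reflection.
    val sparkEnv = SparkEnv.get
    val field = sparkEnv.getClass.getDeclaredField("driverTmpDir")
    field.setAccessible(true)
    // The field holds an Option[String]; .get throws if no directory is set.
    println(s"Your jars are installed under ${field.get(sparkEnv).asInstanceOf[Option[String]].get}\n")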
    
  4. Review the output, which shows the location of your jar files:

    Your jars are installed under /local_disk0/spark-1a6be695-9318-463c-b966-256c32e3771c/userFiles-582ca64b-93c9-444c-85b8-7779bd2c5e52
    
  5. Download the jar files to your local machine.

  6. Run a scanner like Logpresso to check for vulnerable Log4j 2 versions.

Important

DISCLAIMER: The Logpresso scanner is open source software provided by a third party. Databricks makes no representations of any kind regarding the function or quality of Logpresso.
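
As a rough first pass before running a dedicated scanner, you can list which jars bundle Log4j 2 core classes. The following is a minimal sketch under stated assumptions, not a replacement for a scanner: the `JarScan` object and its methods are hypothetical names introduced here, and the check only looks for an entry ending in `log4j/core/lookup/JndiLookup.class`. It also matches relocated (shaded) packages, but it does not inspect jars nested inside other jars and it does not determine the Log4j 2 version, so a match means "inspect this jar further", not "this jar is vulnerable".

```scala
import java.io.File
import java.util.zip.ZipFile
import scala.collection.mutable.ListBuffer
import scala.util.Try

// Hypothetical helper, introduced for illustration only.
object JarScan {
  // Entry suffix that indicates Log4j 2 core is bundled, even under a relocated package.
  val Marker = "log4j/core/lookup/JndiLookup.class"

  // Recursively collect all .jar files under a directory.
  def findJars(dir: File): Seq[File] = {
    val children = Option(dir.listFiles).map(_.toSeq).getOrElse(Seq.empty)
    children.flatMap { f =>
      if (f.isDirectory) findJars(f)
      else if (f.getName.endsWith(".jar")) Seq(f)
      else Seq.empty
    }
  }

  // Return the names of entries in the jar that end with the marker.
  // Unreadable or corrupt archives yield an empty list.
  def markerEntries(jar: File): List[String] =
    Try {
      val zip = new ZipFile(jar)
      try {
        val names = ListBuffer.empty[String]
        val entries = zip.entries
        while (entries.hasMoreElements) {
          val name = entries.nextElement.getName
          if (name.endsWith(Marker)) names += name
        }
        names.toList
      } finally zip.close()
    }.getOrElse(Nil)
}
```

To use it, pass the directory that contains your jar files to `JarScan.findJars` and print every jar for which `JarScan.markerEntries` returns a non-empty list.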

Upgrade your Log4j 2 version

Upgrade via cluster UI

  • If you manually installed Log4j 2 via the cluster UI and it is version 2.17 or above, no action is required.
  • If you manually installed Log4j 2 via the cluster UI and it is version 2.16 or below, you should uninstall the library from the cluster and install version 2.17 or above.
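
The version rule in the bullets above can be expressed as a small check. This is a minimal sketch, assuming the input is a Log4j 2 version string such as the one printed by the classpath scan; the `Log4jVersionCheck` object and its members are hypothetical names introduced here, and the 2.17 threshold is taken from this article.

```scala
// Hypothetical helper, introduced for illustration only.
object Log4jVersionCheck {
  // Minimum (major, minor) version that this article treats as secure.
  val MinSecure: (Int, Int) = (2, 17)

  // Extract the leading major and minor numbers from a string such as "2.15.0".
  def majorMinor(version: String): Option[(Int, Int)] = {
    val Pattern = """^(\d+)\.(\d+).*""".r
    version.trim match {
      case Pattern(major, minor) => Some((major.toInt, minor.toInt))
      case _                     => None
    }
  }

  // Some(true) if the version is below 2.17, Some(false) if it is 2.17 or above,
  // None if the string could not be parsed.
  def needsUpgrade(version: String): Option[Boolean] =
    majorMinor(version).map { v =>
      import scala.math.Ordering.Implicits._
      v < MinSecure
    }
}
```

For example, `Log4jVersionCheck.needsUpgrade("2.15.0")` returns `Some(true)`, and `Log4jVersionCheck.needsUpgrade("2.17.0")` returns `Some(false)`.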

Note

If Log4j 2 is a transitive dependency for another library, upgrade the library that uses Log4j 2 to a secure version. You can also exclude the Log4j 2 package when pulling in an outdated library, and explicitly include a secure version of Log4j 2. This is not guaranteed to work.
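
If you build your own jar with sbt, the exclude-and-replace approach might look like the following `build.sbt` fragment. This is a sketch under stated assumptions: `com.example` % `outdated-lib` is a hypothetical library used only for illustration, and version 2.17.0 is chosen to match the minimum version named in this article. As noted above, the outdated library may not work with the newer Log4j 2 version.

```scala
// build.sbt fragment (sbt's Scala-based DSL).
// "com.example" % "outdated-lib" % "1.0.0" is a hypothetical dependency.
libraryDependencies ++= Seq(
  // Pull in the outdated library without its transitive Log4j 2 artifacts.
  ("com.example" % "outdated-lib" % "1.0.0")
    .exclude("org.apache.logging.log4j", "log4j-core")
    .exclude("org.apache.logging.log4j", "log4j-api"),
  // Explicitly include a Log4j 2 version that this article treats as secure.
  "org.apache.logging.log4j" % "log4j-core" % "2.17.0",
  "org.apache.logging.log4j" % "log4j-api" % "2.17.0"
)
```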

Upgrade via command line

If you have installed Log4j 2 via the command line (or via SSH), use the same method to upgrade Log4j 2 to a secure version.

Upgrade custom-built jar

If you include Log4j 2 in a custom-built jar, upgrade Log4j 2 to a secure version and rebuild your jar.

Re-attach the updated jar to your cluster.

Restart your cluster after upgrading

Restart your cluster after upgrading Log4j 2 so that the updated library is loaded.