DavidBeavon-2754 asked:

Please help interpret an obscure "warning" message from databricks scala REPL


I am getting a very scary "warning" message from my Azure Databricks notebook. See below. The scariest part about this is that the cell of the notebook will run to completion and Azure Databricks will continue running the other cells after it.

I've started the work of opening a ticket with Azure Databricks, and I'm two weeks into it without much to show for it. So I'm hoping the community can help decipher the following and extract something meaningful from it. Hopefully Databricks will eventually get around to pointing me to some explanation as well, and when they do I will add a reference to it here.

See the following message that I get in the output of one of my cells.
I've isolated the source of this message to a HIVE SQL expression that casts a double to a decimal:

                 CAST(CorporateExchangeRate AS DECIMAL(38,10)) AS CorporateExchangeRate,

The problem is that I don't know how to interpret this, nor do I know if there are any serious consequences to the results of the SQL. It would give me much more confidence if the notebook would just fail. Azure Databricks wasn't able to give me any definite reassurance that I'm not losing data.
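For what it's worth, the cast itself is ordinary decimal arithmetic. Here is a plain-Scala sketch (no Spark; the helper name and the choice of HALF_UP rounding are my own assumptions, not what Spark's internal Decimal type necessarily does) of what DECIMAL(38,10) semantics look like for an in-range double:

```scala
import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

object CastSketch {
  // Hypothetical helper: approximates CAST(x AS DECIMAL(38,10)) for an
  // in-range double, by limiting precision to 38 significant digits and
  // then fixing the scale at 10 fractional digits.
  // Spark's org.apache.spark.sql.types.Decimal has its own overflow and
  // null-on-overflow rules; this is illustrative only.
  def castToDecimal38_10(x: Double): JBigDecimal =
    new JBigDecimal(x, new MathContext(38))
      .setScale(10, RoundingMode.HALF_UP)
}
```

If the notebook's results for this column ever look suspect, spot-checking a handful of values against a conversion like this would be one way to confirm that nothing is being lost by the cast itself.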

(... continued...)






 java.lang.AssertionError: assertion failed: 
   Decimal$DecimalIsFractional
      while compiling: <notebook>
         during phase: globalPhase=terminal, enteringPhase=jvm
      library version: version 2.12.10
     compiler version: version 2.12.10
   reconstructed args: -deprecation 
    
    
 <snip>
 <snip>
 <snip>
    
 *** WARNING: skipped 115761 bytes of output ***
    
     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2581)
     at org.codehaus.janino.UnitCompiler.access$2700(UnitCompiler.java:226)
     at org.codehaus.janino.UnitCompiler$6.visitLocalVariableDeclarationStatement(UnitCompiler.java:1506)
     at org.codehaus.janino.UnitCompiler$6.visitLocalVariableDeclarationStatement(UnitCompiler.java:1490)
     at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3712)
     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
     at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1573)
     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1559)
     at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:226)
     at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1496)
     at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1490)
     at org.codehaus.janino.Java$Block.accept(Java.java:2969)
     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
    
    
 <snip>
 <snip>
 <snip>
     
     at org.apache.spark.sql.catalyst.expressions.codegen.ClassBodyCompiler.cook(ClassBodyCompiler.scala:40)
     at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205)
     at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
    
     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1672)
     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$2.load(CodeGenerator.scala:1774)
    
     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$2.load(CodeGenerator.scala:1770)
    
     at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
     at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
     at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936)
     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806)
     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1625)
     at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:697)
    
 <snip>
 <snip>
 <snip>
    
     at org.apache.spark.sql.Dataset.checkpoint(Dataset.scala:645)
    
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$$$6ccebf379a837ca9c81f2f7985871da$$$$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$EpicorDataBridgeLogic$.JoinWithBridgeSurrogates(command-1450701816617879:98)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:120)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:303)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:305)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:307)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:309)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:311)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:313)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:315)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:317)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:319)
     at linea4f254654449420eaafac100828cf736425.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1450701816617879:321)
    
 <snip>
 <snip>
 <snip>
    
    
    
     at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
     at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
     at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
    
 <snip>
 <snip>
 <snip>
    
     at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
     at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:204)
     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    
 <snip>
 <snip>
 <snip>
    
     at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:486)
     at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:598)
     at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:391)
     at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
     at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
     at java.lang.Thread.run(Thread.java:748)
    
    
 error: error while loading Decimal, class file '/databricks/jars/----workspace_spark_3_0--sql--catalyst--catalyst-hive-2.3__hadoop-2.7_2.12_deploy.jar(org/apache/spark/sql/types/Decimal.class)' is broken
 (class java.lang.RuntimeException/error reading Scala signature of Decimal.class: assertion failed:
   Decimal$DecimalIsFractional
      while compiling: <notebook>
         during phase: globalPhase=terminal, enteringPhase=jvm
      library version: version 2.12.10
     compiler version: version 2.12.10
   reconstructed args: -deprecation -classpath 
    
    
 <snip>
 <snip>
 <snip>
    
    
    
   last tree to typer: TypeTree(class Byte)
        tree position: line 6 of <notebook>
             tree tpe: Byte
               symbol: (final abstract) class Byte in package scala
    symbol definition: final abstract class Byte extends  (a ClassSymbol)
       symbol package: scala
        symbol owners: class Byte
            call site: constructor $eval in object $eval in package linea4f254654449420eaafac100828cf736425
    
 <snip>
 <snip>
 <snip>
    
    
 == Source file context for tree position ==
    
      3
      4 object $eval {
      5   lazy val $result = INSTANCE.$ires30
      6   lazy val $print: _root_.java.lang.String =  {
      7     INSTANCE.$iw
      8
      9 val sb = new _root_.scala.StringBuilder)

Questions:

  • What the heck does this mean?

  • What components are generating the error? The Databricks REPL? I noticed the databricks package in the callstack. Is this component something that is shared with Apache Spark as well or is it a Databricks-specific issue?

  • What is janino? Is it some sort of compiler used by the Databricks REPL?

  • Is janino encountering a java class loader issue? (see error reading Scala signature of Decimal.class)

  • Shouldn't it cause a fatal error if there is a runtime class loading failure?

  • Even if the databricks packages choose to ignore this exception, why would the apache code do the same? (See the stack references to org.apache.spark.sql.)

  • Databricks says this is a known issue that they've been aware of for months. Why can't I google for it?

  • Is the CAST operation that I'm doing so unusual (from double to decimal)?

  • I have custom modules that are loaded into the cluster. They seem to be part of the issue because I can't repro without custom modules. Of course I don't have my own definition of org/apache/spark/sql/types/Decimal.class. Why do custom modules trigger this failure?

  • My environment is Scala 2.12.10 and Spark 3.0.1 and DBR 7.3 LTS (Azure Databricks)

Any help with the interpretation of this message would be greatly appreciated. I'm not a Scala or Java expert, but I would like to know what is going wrong.
The Databricks support team didn't seem especially alarmed about this. It took a couple of weeks just to hear that this is a "known issue". They still haven't referred me to any KB or documentation that would help decipher this message, nor do I know with certainty whether there are consequences such as data loss.

The workaround seems to be to avoid my (seemingly innocent) CAST operation, or to push it out of their REPL and back into a custom module or the data source. I am never opposed to workarounds, but I would like to at least know what fragile portion of Databricks I'm avoiding, so that I can make a note to avoid it in the future. Perhaps the goal is to avoid their REPL altogether, or to avoid using it in conjunction with HIVE SQL?

Any help would be appreciated. My google searches are not giving me any helpful results.






azure-databricks

Hello @DavidBeavon-2754 and welcome back to Microsoft Q&A.

While I can't provide a definitive answer, I think I have a few leads.

The DecimalIsFractional seems to be a nested class of Decimal. There is also DecimalType. (See Tree for how these are related.)

To help separate complications, do you still get the error when you change
CAST(CorporateExchangeRate AS DECIMAL(38,10)) AS CorporateExchangeRate;
to
CAST(CorporateExchangeRate AS DECIMAL(38,10)) AS NewDecimalRate;
?



Hi @MartinJaffer-MSFT,
Thanks for the tip about the nested class. I use those in C# too.

For starters, are you aware of which component of Databricks is sending me this message in the first place?

I'm going to do some testing from an apache spark cluster as well, and possibly from databricks-connect. But I don't have a full picture of how these internal components of databricks are relying on each other (REPL, HIVE SQL, janino compiler, etc). I'd rather have some basic level of understanding, before I flail about with various types of tests.

I'd especially like to know if this might be a databricks-specific issue rather than one in Spark 3.0.1. Perhaps it is specific to their REPL or notebooks or whatever. That would explain why I don't see many search results in google. In the past I've rarely had luck searching for Databricks issues, but I have a bit more luck when my searches are related to Spark in general.

DavidBeavon-2754 answered:

Here is the KB:

https://docs.microsoft.com/en-us/azure/databricks/kb/scala/decimal-is-fractional-error

Hopefully this is helpful to anyone else who encounters the message. Perhaps it will save you from spending time on a tech support case with Databricks or Azure Databricks.

So far I haven't noticed any significant implications of the message. I agree that it can probably be ignored, as scary as it is. In general the Scala programming environment will bubble up any "real" exceptions, so I suspect this one is being deliberately handled/suppressed. Ideally the internal message wouldn't have been exposed to us at all, given how alarming it appears.




DavidBeavon-2754 answered:

I'm still waiting for more information from support, but they continue to say that it is only a "warning":

...team has confirmed that its not a real error but an annoying warning and will not cause any data loss...

I'm not totally convinced, given that the message doesn't say it is a warning. It starts out with "java.lang.AssertionError: assertion failed", and the word "error" is repeated many times after that. This noise in the output of the notebook is enough to make any customer very nervous.


The only clue that it might be a warning is the fact that the notebook continues to execute after the issue is encountered. However, if ignoring the issue were deliberate, you would think Databricks would annotate the message to say that one or more internal issues were encountered and considered unimportant.

IMHO, this "warning" information should probably be suppressed/hidden unless customers set a higher logging level and choose to view the various internal messages that have no actual impact on the end results.

If anyone can dissect the message above and pick out the components that are producing it, I would greatly appreciate it. I'd feel more comfortable ignoring scary messages from Databricks if it were clear that their source was a non-critical internal component, or a non-critical function.
