Unable to run SQL queries in Azure Synapse. Error: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException

Yash Tamakuwala 1 Reputation point
2022-01-24T04:26:54.117+00:00

Hi,
I have deployed a Synapse workspace through DevOps and am running some SQL queries inside it. I can run simple SQL queries but cannot run anything related to Delta tables. Commands like -

  1. SHOW TABLES
  2. %%sql
    CREATE DATABASE AdventureWorksLT2019
  3. DROP TABLE IF EXISTS table_name

all fail with 'Error: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException'. My original goal is to save a Delta table to my ADLS, but my saveAsTable command fails. Running

    new_target_df.write.format("delta") \
        .mode('append').option("overwriteSchema", "true") \
        .option("path", delta_table_path) \
        .partitionBy('subscriptionId', 'year', 'month', 'day') \
        .saveAsTable(delta_table_name)  # External table
It gives -

AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
Traceback (most recent call last):

  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1158, in saveAsTable
    self._jwrite.saveAsTable(name)

  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(

  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
    raise converted from None

pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException

I am, however, able to write the dataframe as a Delta lake at the destination using

    new_target_df.write.format('delta').mode('append').option("overwriteSchema", "true").save(delta_table_path)

so I don't think it is a permission issue. I am a Synapse Administrator, and the Synapse workspace has Storage Blob Data Contributor on the ADLS account.

It seems to be an error with the Hive metastore, but I don't understand it clearly. I recreated the workspace to no avail. Please help.


1 answer

  1. AnnuKumari-MSFT 31,061 Reputation points Microsoft Employee
    2022-01-24T08:12:02.117+00:00

    Hi @Yash Tamakuwala ,
    Thank you for using the Microsoft Q&A platform and posting your queries.
    In the Azure Synapse workspace, you need to go to the Develop tab and create a new notebook in order to run these queries. The notebook should be attached to a Spark pool. You can create an Apache Spark pool in the Manage tab of the Synapse workspace and attach your notebook to it.

    Note: As Spark pools are a provisioned service, you pay for the resources provisioned. You can go with a Small node size and keep the maximum number of nodes at 3 to keep the charge as low as possible.

    [screenshot: creating an Apache Spark pool in the Manage tab]
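    Once the notebook is attached to the Spark pool, the statements from your question can be run either through the %%sql cell magic or through spark.sql from a PySpark cell. A minimal sketch (the database and table names are just the placeholders from your question):

        # Run inside a Synapse notebook cell attached to the Spark pool;
        # `spark` is the session Synapse provides for the notebook.
        spark.sql("SHOW TABLES").show()
        spark.sql("CREATE DATABASE IF NOT EXISTS AdventureWorksLT2019")
        spark.sql("DROP TABLE IF EXISTS table_name")

    The screenshots below show each of these statements running in the notebook.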

    1. SHOW TABLES

    [screenshot: SHOW TABLES output in the notebook]

    2. CREATE DATABASE AdventureWorksLT2019

    [screenshots: CREATE DATABASE AdventureWorksLT2019 execution and result]

    3. DROP TABLE IF EXISTS table_name

    [screenshot: DROP TABLE IF EXISTS table_name output]
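    For your original goal of saving the Delta table with saveAsTable, one option (just a sketch, assuming delta_table_path and delta_table_name are the same variables as in your question) is to keep the path write that already works for you and then register an external table over that location from the notebook:

        # Sketch only: write the Delta files to ADLS first (this already works),
        # then register an external table over that location in the metastore.
        new_target_df.write.format("delta") \
            .mode("append") \
            .option("overwriteSchema", "true") \
            .partitionBy("subscriptionId", "year", "month", "day") \
            .save(delta_table_path)

        spark.sql(
            f"CREATE TABLE IF NOT EXISTS {delta_table_name} "
            f"USING DELTA LOCATION '{delta_table_path}'"
        )

    Note that registering the table still goes through the Hive metastore, so this only helps if the catalog statements above succeed from your notebook.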

    Hope this helps. Please let us know if you have any further queries.
