How to troubleshoot several Apache Hive metastore problems

Problem 1: External metastore tables not available

When you inspect the driver logs, you see a stack trace that includes the error Required table missing:

WARN Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in no possible candidates

Required table missing: "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its
persistence operations. Either your MetaData is incorrect, or you need to enable
"datanucleus.schema.autoCreateTables"

org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "DBS" in Catalog ""  Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable
"datanucleus.schema.autoCreateTables"

   at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
   at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)

Cause

The database is present, but there are no metastore tables.

Solution

If the external metastore version is Hive 2.0 or above, use the Hive Schema Tool to create the metastore tables. For versions below Hive 2.0, add the metastore tables with the following configurations in your existing init script:

spark.hadoop.datanucleus.autoCreateSchema=true
spark.hadoop.datanucleus.fixedDatastore=false

You can also set these configurations in the Apache Spark configuration directly:

datanucleus.autoCreateSchema true
datanucleus.fixedDatastore false
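
For the Hive 2.0+ path, the Hive Schema Tool mentioned above ships with Hive and can create the metastore tables directly in the external database. A minimal sketch, assuming a MySQL-compatible metastore database and that hive-site.xml already points at it; the -dbType value must match your actual database, and flag behavior should be verified against schematool -help for your Hive version:

# Sketch only: create the metastore schema with the Hive Schema Tool.
# Assumes hive-site.xml holds the connection details for the external metastore;
# -dbType must match that database (mysql, mssql, postgres, oracle, ...).
$HIVE_HOME/bin/schematool -dbType mysql -initSchema

# To inspect the current schema state before making changes:
$HIVE_HOME/bin/schematool -dbType mysql -info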

Problem 2: Hive metastore verification failed

When you inspect the driver logs, you see a stack trace that includes an error like the following:

18/09/24 14:51:07 ERROR RetryingHMSHandler: HMSHandler Fatal error:
MetaException(message:Version information not found in metastore. )

   at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7564)
   at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7542)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Cause

The VERSION table in the metastore is empty.

Solution

Do one of the following:

  • Populate the VERSION table with the correct version values using an INSERT query (see the sketch after this list).

  • Set the following configurations in the cluster's Spark configuration to turn off metastore verification:

    hive.metastore.schema.verification false
    hive.metastore.schema.verification.record.version false
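
For the first option, the VERSION table in a standard Hive metastore schema holds a single row with the columns VER_ID, SCHEMA_VERSION, and VERSION_COMMENT. A minimal sketch, run directly against the external metastore database rather than through Spark; the version string shown is an assumption and must match the schema version your metastore was actually created with:

-- Sketch only: restore the missing row in the metastore VERSION table.
-- Replace '2.3.0' with the schema version that matches your metastore.
INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT)
VALUES (1, '2.3.0', 'Hive release version 2.3.0');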
    

Problem 3: Metastore connection limit exceeded

Commands run on the cluster fail with the following stack trace in the driver logs:

Unable to open a test connection to the given database. JDBC url =
jdbc:<jdbcURL>?trustServerCertificate=true&useSSL=true, username = <REDACTED>.
Terminating connection pool (set lazyInit to true if you expect to start your
database after your app). Original Exception: ------

java.sql.SQLSyntaxErrorException: User '<userId>' has exceeded the
'max_user_connections' resource (current value: 100)
   at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:163)
   at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:106)
   at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1036)

Cause

The metastore configuration allows only 100 connections. When the connection limit is reached, new connections are not allowed and commands fail with this error. Each cluster in the Azure Databricks workspace establishes a connection with the metastore, so if you have a large number of clusters running, this issue can occur. Additionally, incorrect configurations can cause a connection leak, so the number of connections keeps increasing until the limit is reached.

Solution

Correct the problem with one of the following actions:

  • If you are using an external metastore and you have a large number of clusters running, increase the connection limit on your external metastore (see the sketch after this list).
  • If you are not using an external metastore, ensure that you do not have any custom Hive metastore configurations on your cluster. When using the metastore provided by Azure Databricks, you should use the default Hive metastore configurations on the cluster.
  • If you are using the default configuration and still encounter this issue, contact Azure Databricks Support. Depending on the configuration of your Azure Databricks workspace, it might be possible to increase the number of connections allowed to the internal metastore.
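
For the first bullet, how the limit is raised depends on the database that backs the external metastore. A minimal sketch for a MySQL/MariaDB-compatible metastore (the stack trace above comes from the MariaDB JDBC driver); on managed offerings such as Azure Database for MySQL the equivalent change is normally made through the server parameters instead, and the exact parameter name may differ:

-- Sketch only: inspect and raise the per-user connection limit on a
-- MySQL/MariaDB-compatible metastore database (requires admin privileges).
SHOW VARIABLES LIKE 'max_user_connections';
SET GLOBAL max_user_connections = 200;  -- choose a value that covers your concurrent clusters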

Problem 4: Table actions fail because column has too much metadata

When the metadata for a single column exceeds 4000 characters, table actions fail with an error like this:

Error in SQL statement: IllegalArgumentException:
Error: type expected at the position 3998 of 'struct<num_ad_accounts:bigint,num_benchmarks:bigint,num_days_range:string,num_days_in_history:string,num_fb_pages:bigint,num_g_profiles:bigint,num_ga_views:bigint,num_groups:bigint,num_ig_profiles:bigint,num_li_pages:bigint,num_labels:string,num_labels_added:bigint,num_labels_

Cause

This is a bug that was fixed in Hive Metastore version 2.3.0 (HIVE-12274). Azure Databricks uses an earlier version of Hive Metastore (version 0.13), so this bug occurs when there is too much metadata for a column, such as an imported JSON schema.

Solution

As a workaround, set up an external Hive metastore that uses version 2.3.0 or above. Then delete the existing table with the following command:

spark.sessionState
  .catalog
  .externalCatalog
  .dropTable("default", "test_table_tabledrop_1", ignoreIfNotExists = false, purge = false)
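
Pointing a cluster at such an external metastore is done through the cluster's Spark configuration. A hedged sketch using the standard Spark/Hive external-metastore properties; the Hive version, the use of the built-in client jars, and the connection placeholders are all assumptions to adjust for your environment:

spark.sql.hive.metastore.version 2.3.7
spark.sql.hive.metastore.jars builtin
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<metastore-host>:3306/<metastore-db>
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionUserName <metastore-user>
spark.hadoop.javax.jdo.option.ConnectionPassword <metastore-password>

Note that the builtin value for spark.sql.hive.metastore.jars only applies when the specified metastore version matches the Hive client bundled with the runtime; otherwise point spark.sql.hive.metastore.jars at a downloaded set of Hive metastore jars.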