Azure HDInsight release notes
This article provides information about the most recent Azure HDInsight release updates. For information on earlier releases, see HDInsight Release Notes Archive.
Summary
Azure HDInsight is one of the most popular services among enterprise customers for open-source analytics on Azure. If you would like to subscribe to release notes, watch releases on this GitHub repository.
Release date: 06/03/2022
This release applies to HDInsight 4.0. An HDInsight release is made available to all regions over several days. The release date here indicates the first region release date. If you don't see the changes below, wait for the release to go live in your region over the next several days.
Release highlights
The Hive Warehouse Connector (HWC) on Spark v3.1.2
The Hive Warehouse Connector (HWC) allows you to take advantage of the unique features of Hive and Spark to build powerful big-data applications. HWC was previously supported for Spark v2.4 only; this release makes it available on Spark v3.1.2. This feature adds business value by allowing ACID transactions on Hive tables using Spark. It is useful for customers who use both Hive and Spark in their data estate. For more information, see Apache Spark & Hive - Hive Warehouse Connector - Azure HDInsight | Microsoft Docs.
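As a rough illustration of what using HWC from PySpark can look like, the sketch below reads a Hive managed table into a Spark DataFrame. It assumes the `pyspark_llap` package from the HWC Python zip is available on the cluster and that the session API (`HiveWarehouseSession.session(...).build()` and `executeQuery`) behaves as described in the HDInsight HWC documentation. The helper name `read_hive_table` and the table name used afterwards are hypothetical.

```python
def read_hive_table(spark, table_name):
    """Return a Spark DataFrame with the rows of a Hive table, read through HWC.

    `spark` is an existing SparkSession; `table_name` is a hypothetical
    table name supplied by the caller.
    """
    # Imported lazily: pyspark_llap ships with the HWC Python zip,
    # not with PySpark itself (assumption: the zip was passed via --py-files).
    from pyspark_llap import HiveWarehouseSession

    # Build an HWC session on top of the existing SparkSession.
    hive = HiveWarehouseSession.session(spark).build()

    # Run the query through HWC and return the resulting DataFrame.
    return hive.executeQuery("SELECT * FROM {}".format(table_name))
```

From a `pyspark` shell started with the HWC jar and Python zip, a call such as `read_hive_table(spark, "sales").show()` would display the rows, where `sales` stands in for one of your own tables.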
Ambari
- Scaling and provisioning improvements
- HDI Hive is now compatible with OSS version 3.1.2
The HDI Hive 3.1 version is upgraded to OSS Hive 3.1.2. This version has all the fixes and features available in the open-source Hive 3.1.2 version.
Note
Spark
- If you use the Azure user interface to create a Spark cluster for HDInsight, the dropdown list shows an additional version, Spark 3.1 (HDI 5.0), along with the older versions. This version is a renamed version of Spark 3.1 (HDI 4.0). This is only a UI-level change; it has no impact on existing users or on users who already use the ARM template.

Note
Interactive Query
- If you are creating an Interactive Query cluster, the dropdown list shows an additional version, Interactive Query 3.1 (HDI 5.0).
- If you plan to use Spark 3.1 together with Hive and need ACID support, select this version, Interactive Query 3.1 (HDI 5.0).

TEZ bug fixes
| Bug Fixes | Apache JIRA |
|---|---|
| TezUtils.createConfFromByteString on Configuration larger than 32 MB throws com.google.protobuf.CodedInputStream exception | TEZ-4142 |
| TezUtils createByteStringFromConf should use snappy instead of DeflaterOutputStream | TEZ-4113 |
HBase bug fixes
| Bug Fixes | Apache JIRA |
|---|---|
| TableSnapshotInputFormat should use ReadType.STREAM for scanning HFiles | HBASE-26273 |
| Add option to disable scanMetrics in TableSnapshotInputFormat | HBASE-26330 |
| Fix for ArrayIndexOutOfBoundsException when balancer is executed | HBASE-22739 |
Hive bug fixes
| Bug Fixes | Apache JIRA |
|---|---|
| NPE when inserting data with 'distribute by' clause with dynpart sort optimization | HIVE-18284 |
| MSCK REPAIR Command with Partition Filtering Fails While Dropping Partitions | HIVE-23851 |
| Wrong exception thrown if capacity<=0 | HIVE-25446 |
| Support parallel load for HastTables - Interfaces | HIVE-25583 |
| Include MultiDelimitSerDe in HiveServer2 By Default | HIVE-20619 |
| Remove glassfish.jersey and mssql-jdbc classes from jdbc-standalone jar | HIVE-22134 |
| Null pointer exception on running compaction against an MM table. | HIVE-21280 |
| Hive query with large size via knox fails with Broken pipe Write failed | HIVE-22231 |
| Adding ability for user to set bind user | HIVE-21009 |
| Implement UDF to interpret date/timestamp using its internal representation and Gregorian-Julian hybrid calendar | HIVE-22241 |
| Beeline option to show/not show execution report | HIVE-22204 |
| Tez: SplitGenerator tries to look for plan files, which won't exist for Tez | HIVE-22169 |
| Remove expensive logging from the LLAP cache hotpath | HIVE-22168 |
| UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class | HIVE-22161 |
| Prevent the creation of query routing appender if property is set to false | HIVE-22115 |
| Remove cross-query synchronization for the partition-eval | HIVE-22106 |
| Skip setting up hive scratch dir during planning | HIVE-21182 |
| Skip creating scratch dirs for tez if RPC is on | HIVE-21171 |
| switch Hive UDFs to use Re2J regex engine | HIVE-19661 |
| Migrated clustered tables using bucketing_version 1 on hive 3 uses bucketing_version 2 for inserts | HIVE-22429 |
| Bucketing: Bucketing version 1 is incorrectly partitioning data | HIVE-21167 |
| Adding ASF License header to the newly added file | HIVE-22498 |
| Schema tool enhancements to support mergeCatalog | HIVE-22498 |
| Hive with TEZ UNION ALL and UDTF results in data loss | HIVE-21915 |
| Split text files even if header/footer exists | HIVE-21924 |
| MultiDelimitSerDe returns wrong results in last column when the loaded file has more columns than the ones present in the table schema | HIVE-22360 |
| LLAP external client - Need to reduce LlapBaseInputFormat#getSplits() footprint | HIVE-22221 |
| Column name with reserved keyword is unescaped when query including join on table with mask column is rewritten (Zoltan Matyus via Zoltan Haindrich) | HIVE-22208 |
| Prevent LLAP shutdown on AMReporter related RuntimeException | HIVE-22113 |
| LLAP status service driver may get stuck with wrong Yarn app ID | HIVE-21866 |
| OperationManager.queryIdOperation doesn't properly clean up multiple queryIds | HIVE-22275 |
| Bringing a node manager down blocks restart of LLAP service | HIVE-22219 |
| StackOverflowError when drop lots of partitions | HIVE-15956 |
| Access check is failed when a temporary directory is removed | HIVE-22273 |
| Fix wrong results/ArrayOutOfBound exception in left outer map joins on specific boundary conditions | HIVE-22120 |
| Remove distribution management tag from pom.xml | HIVE-19667 |
| Parsing time can be high if there's deeply nested subqueries | HIVE-21980 |
| For ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='TRUE'); TBL_TYPE attribute changes not reflecting for non-CAPS | HIVE-20057 |
| JDBC: HiveConnection shades log4j interfaces | HIVE-18874 |
| Update repo URLs in poms - branch 3.1 version | HIVE-21786 |
| DBInstall tests broken on master and branch-3.1 | HIVE-21758 |
| Load data into a bucketed table is ignoring partitions specs and loads data into default partition | HIVE-21564 |
| Queries with join condition having timestamp or timestamp with local time zone literal throw SemanticException | HIVE-21613 |
| Analyze compute stats for column leave behind staging dir on HDFS | HIVE-21342 |
| Incompatible change in Hive bucket computation | HIVE-21376 |
| Provide a fallback authorizer when no other authorizer is in use | HIVE-20420 |
| Some alterPartitions invocations throw 'NumberFormatException: null' | HIVE-18767 |
| HiveServer2: Preauthenticated subject for http transport isn't retained for entire duration of http communication in some cases | HIVE-20555 |