Release notes for Hadoop components on Azure HDInsight

This article provides information about the most recent Azure HDInsight release updates. For information on earlier releases, see HDInsight Release Notes Archive.

Important

Linux is the only operating system used on HDInsight version 3.4 or greater. For more information, see the HDInsight versioning article.

Notes for 03/20/2018 - Release of Spark 2.2 on HDInsight 3.6

  • Spark 2.2.0 improves stability across Spark Core, SQL, and ML, and brings Structured Streaming to general availability (GA). Spark 2.2.0 is now available on HDInsight 3.6.

Notes for 08/01/2017 - Release of HDInsight

| Title | Description | Impacted Area | Cluster Type |
| --- | --- | --- | --- |
| Release of Microsoft R Server 9.1 on HDInsight | HDInsight now supports provisioning R Server 9.1 clusters. For more information on the Microsoft R Server 9.1 release, see this blog. | Service | R Server |
| HDInsight 3.6 now includes newer versions of the Hadoop stack | | Service | All |
| Updates to Interactive Hive (Preview) clusters | | Service | Interactive Hive (Preview) |
| Updates to Hadoop clusters | Templeton job operation reliability is improved. For more information, see https://issues.apache.org/jira/browse/HIVE-15947 | Service | Hadoop |
| YARN updates | HDInsight now creates a 250 GB Ambari database (without increasing cost), which results in a better experience for customers. This change should prevent the Application Timeline Server (ATS) from filling up and is likely to improve performance. | Service | All |
| Spark updates | Release of Spark 2.1.1. For more information, see Spark Release 2.1.1. | Service | Spark |

04/06/2017 - General availability of HDInsight 3.6

  • With this release, Azure HDInsight adds version 3.6, which is based on Hortonworks Data Platform (HDP) 2.6. The HDP 2.6 release notes are available here, and more information on HDInsight versions can be found here. HDInsight 3.6 is available for the following workloads:

    • Hadoop v2.7.3
    • HBase v1.1.2
    • Storm v1.1.0
    • Spark v2.1.0
    • Interactive Hive v2.1.0
  • Support for Hive View 2.0. This should improve the user experience for Interactive Hive. For more information, see Hortonworks documentation.

  • Performance enhancements with Hive LLAP. For more information, see Hortonworks documentation.

  • New features in Hive. See Hortonworks documentation.

  • Hive CLI Deprecation: Hive CLI is being deprecated and customers are encouraged to use Beeline instead. For more information, see Apache documentation. For instructions on how to use Beeline with HDInsight, see Use Beeline with HDInsight Hadoop clusters.

  • New features in Apache Phoenix and HBase.

    • Storage quota support: Limits storage space at the per-table and per-namespace level, which is commonly needed in multi-tenant environments.
    • Phoenix indexing improvements: Incremental index creation, plus the ability to rebuild or resume indexing after a previous failure.
    • Phoenix data integrity tool: Supports validation of schema, index, and other metadata.
  • Known issue with HBase: When you run a CSV bulk load MapReduce job, the following syntax might result in an error:

      HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase/conf hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /data/example.csv
    

    Use the following syntax instead:

      HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase/conf hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /data/example.csv
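
The workaround above can be wrapped in a small helper so that the classpath is assembled in one place. This is only a sketch: the function name build_bulkload_classpath and the variable names are hypothetical, and the paths are the same placeholders used in the commands above. Substitute the real locations of hbase-protocol.jar, the HBase configuration directory, and the Phoenix client JAR on your cluster.

```shell
# Assemble the HADOOP_CLASSPATH used by the workaround: the hbase-protocol JAR
# followed by the HBase configuration directory.
# build_bulkload_classpath is a hypothetical helper name, not part of HBase or Phoenix.
build_bulkload_classpath() {
  # $1 = path to hbase-protocol.jar, $2 = path to the HBase conf directory
  if [ "$#" -ne 2 ]; then
    echo "usage: build_bulkload_classpath <hbase-protocol.jar> <hbase-conf-dir>" >&2
    return 1
  fi
  printf '%s:%s\n' "$1" "$2"
}

# Placeholder paths, as in the commands above.
HBASE_PROTOCOL_JAR=/path/to/hbase-protocol.jar
HBASE_CONF_DIR=/path/to/hbase/conf

# Print the full command for review instead of running it.
echo "HADOOP_CLASSPATH=$(build_bulkload_classpath "$HBASE_PROTOCOL_JAR" "$HBASE_CONF_DIR") hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /data/example.csv"
```

Printing the command first makes it easy to confirm that the classpath no longer depends on the output of hbase mapredcp before the job is submitted.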
    

02/28/2017 - Release of Spark 2.1 on HDInsight 3.6 (Preview)

  • Spark 2.1 fixes many stability and usability issues found in previous versions. It also brings new features across all Spark workloads, such as Spark Core, SQL, ML, and Streaming.
  • Structured Streaming gains improved scalability, with support for event-time watermarks and the Kafka 0.10 connector.
  • Spark SQL partitioning is now handled by the new Scalable Partition Handling mechanism. More details on how to upgrade are available here.
  • Spark 2.1 on Azure HDInsight 3.6 Preview currently does not support BI tool connectivity through the ODBC driver.
  • Azure Data Lake Store access from Spark 2.1 clusters is not supported in this Preview.

11/18/2016 - Release of Spark 2.0.1 on HDInsight 3.5

Spark 2.0.1 is now available on Spark clusters (HDInsight version 3.5).

11/16/2016 - Release of R Server 9.0 on HDInsight 3.5 (Spark 2.0)

  • R Server clusters now offer a choice of two versions: R Server 9.0 on HDI 3.5 (Spark 2.0) and R Server 8.0 on HDI 3.4 (Spark 1.6).
  • R Server 9.0 on HDI 3.5 (Spark 2.0) is built on R 3.3.2 and includes two new ScaleR data source functions, RxHiveData and RxParquetData, which load data from Hive and Parquet directly into Spark DataFrames for analysis by ScaleR. For more information, see the inline help for these functions in R by running the ?RxHiveData and ?RxParquetData commands.
  • RStudio Server community edition is now installed by default as part of the provisioning flow. You can opt out on the Cluster Configuration blade.

11/09/2016 - Release of Spark 2.0 on HDInsight

  • Spark 2.0 clusters on HDInsight 3.5 now support Livy and Jupyter services.

10/26/2016 - Release of R Server on HDInsight

  • The URI for edge node access has changed to clustername-ed-ssh.azurehdinsight.net
  • R Server on HDInsight cluster provisioning has been streamlined.
  • R Server on HDInsight is now available as a regular HDInsight "R Server" cluster type and is no longer installed as a separate HDInsight application. The edge node and R Server binaries are now provisioned as part of the R Server cluster deployment, which improves the speed and reliability of provisioning. The pricing model for R Server is updated accordingly.
  • The R Server cluster type price is now based on the Standard tier price plus the R Server surcharge. This change doesn't affect the effective pricing of R Server; it changes only how the charges are presented on the bill. All existing R Server clusters continue to work, and existing Resource Manager templates continue to function until a deprecation notice is issued. It is nevertheless recommended that you update your scripted deployments to use the new Resource Manager template.
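
As a small illustration of the new edge node address format noted above, the following sketch builds the SSH host name from a cluster name. The function name edge_node_host, the cluster name mycluster, and the SSH user sshuser are placeholders chosen for this example, not values defined by the release.

```shell
# Build the edge node SSH host name for an R Server cluster, following the
# clustername-ed-ssh.azurehdinsight.net pattern from the note above.
# edge_node_host is a hypothetical helper name used only for illustration.
edge_node_host() {
  # $1 = HDInsight cluster name
  if [ -z "$1" ]; then
    echo "usage: edge_node_host <clustername>" >&2
    return 1
  fi
  printf '%s-ed-ssh.azurehdinsight.net\n' "$1"
}

# Print (rather than run) the SSH command; "mycluster" and "sshuser" are placeholders.
echo "ssh sshuser@$(edge_node_host mycluster)"
```

Scripts that still connect through a previously hard-coded edge node address can call a helper like this so the host name is derived from the cluster name in one place.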