Databricks runtime maintenance updates

This page lists maintenance updates issued for Databricks Runtime releases. To add a maintenance update to an existing cluster, restart the cluster.

Supported Databricks Runtime releases

Maintenance updates for supported Databricks Runtime releases:

For the original release notes, follow the link below the subheading.

Databricks Runtime 7.3 LTS

See Databricks Runtime 7.3 LTS.

  • Oct 13, 2020
    • Operating system security updates.
    • You can read and write from DBFS using the FUSE mount at /dbfs/ on a high concurrency credential passthrough enabled cluster. Regular mounts are supported, but mounts that need passthrough credentials are not supported yet. (See the first sketch after this list.)
    • [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
    • [SPARK-32585][SQL] Support scala enumeration in ScalaReflection
    • Fixed listing of directories in a FUSE mount that contain file names with invalid XML characters
    • FUSE mount no longer uses ListMultipartUploads
  • Sep 29, 2020
    • [SPARK-32718][SQL] Remove unnecessary keywords for interval units
    • [SPARK-32635][SQL] Fix foldable propagation
    • Added a new config, spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of Netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases. (See the second sketch after this list.)
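
As a quick illustration of the FUSE mount noted above, the following Python sketch reads and writes a file through the local /dbfs/ path from a notebook; the path under /dbfs/tmp/ is a hypothetical example.

    # Minimal sketch: ordinary Python file I/O through the DBFS FUSE mount.
    # The /dbfs/tmp/fuse_demo.txt path is a placeholder.
    with open("/dbfs/tmp/fuse_demo.txt", "w") as f:
        f.write("written through the FUSE mount\n")

    with open("/dbfs/tmp/fuse_demo.txt") as f:
        print(f.read())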
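
And a minimal sketch of the new shuffle config: because spark.shuffle.io.decoder.consolidateThreshold affects shuffle I/O, it would normally be set in the cluster's Spark config before the cluster starts; the session-builder form below is only illustrative.

    from pyspark.sql import SparkSession

    # Long.MAX_VALUE (9223372036854775807) skips consolidation of Netty
    # FrameBuffers entirely, avoiding the corner-case IndexOutOfBoundsException.
    spark = (
        SparkSession.builder
        .config("spark.shuffle.io.decoder.consolidateThreshold", str(2**63 - 1))
        .getOrCreate()
    )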

Databricks Runtime 7.2

See Databricks Runtime 7.2.

  • Oct 13, 2020
    • Operating system security updates.
    • [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
    • Fixed listing of directories in a FUSE mount that contain file names with invalid XML characters
    • FUSE mount no longer uses ListMultipartUploads
  • Sep 29, 2020
    • [SPARK-28863][SQL][WARMFIX] Introduce AlreadyOptimized to prevent reanalysis of V1FallbackWriters
    • [SPARK-32635][SQL] Fix foldable propagation
    • Added a new config, spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of Netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases.
  • Sep 24, 2020
    • [SPARK-32764][SQL] -0.0 should be equal to 0.0
    • [SPARK-32753][SQL] Only copy tags to node with no tags when transforming plans
    • [SPARK-32659][SQL] Fix the data issue of inserted Dynamic Partition Pruning on non-atomic type
    • Operating system security updates.
  • Sep 8, 2020
    • A new parameter, maxbinlength, was added for Azure Synapse Analytics. It controls the column length of BinaryType columns and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000. (See the sketch after this list.)
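
A hedged sketch of the option in context: the snippet below writes a DataFrame with a BinaryType column through the Azure Synapse connector. The server, container, and table names are placeholders, and only the maxbinlength option itself comes from this release note.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # payload maps to BinaryType; with maxbinlength=4000 it lands as VARBINARY(4000).
    df = spark.createDataFrame([(1, bytearray(b"\x00\x01"))], ["id", "payload"])

    (df.write
        .format("com.databricks.spark.sqldw")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net;database=<db>")
        .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
        .option("forwardSparkAzureStorageCredentials", "true")
        .option("dbTable", "demo_table")
        .option("maxbinlength", 4000)
        .save())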

Databricks Runtime 7.1

See Databricks Runtime 7.1.

  • Oct 13, 2020
    • Operating system security updates.
    • [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
    • Fixed listing of directories in a FUSE mount that contain file names with invalid XML characters
    • FUSE mount no longer uses ListMultipartUploads
  • Sep 29, 2020
    • [SPARK-28863][SQL][WARMFIX] Introduce AlreadyOptimized to prevent reanalysis of V1FallbackWriters
    • [SPARK-32635][SQL] Fix foldable propagation
    • Added a new config, spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of Netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases.
  • Sep 24, 2020
    • [SPARK-32764][SQL] -0.0 should be equal to 0.0
    • [SPARK-32753][SQL] Only copy tags to node with no tags when transforming plans
    • [SPARK-32659][SQL] Fix the data issue of inserted Dynamic Partition Pruning on non-atomic type
    • Operating system security updates.
  • Sep 8, 2020
    • A new parameter, maxbinlength, was added for Azure Synapse Analytics. It controls the column length of BinaryType columns and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
  • Aug 25, 2020
    • [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects
    • [SPARK-32559][SQL] Fix the trim logic in UTF8String.toInt/toLong, which didn’t handle non-ASCII characters correctly
    • [SPARK-32543][R] Remove arrow::as_tibble usage in SparkR
    • [SPARK-32091][CORE] Ignore timeout error when removing blocks on the lost executor
    • Fixed an issue affecting the Azure Synapse connector with MSI credentials
    • Fixed ambiguous attribute resolution in self-merge
  • Aug 18, 2020
    • [SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
    • [SPARK-32237][SQL] Resolve hint in CTE
    • [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
    • [SPARK-32467][UI] Avoid encoding URL twice on https redirect
    • Fixed a race condition in the AQS connector when using Trigger.Once.
  • Aug 11, 2020
  • Aug 3, 2020
    • You can now use the LDA transform function on a passthrough-enabled cluster.

Databricks Runtime 7.0

See Databricks Runtime 7.0.

  • Oct 13, 2020
    • Operating system security updates.
    • [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
    • Fixed listing of directories in a FUSE mount that contain file names with invalid XML characters
    • FUSE mount no longer uses ListMultipartUploads
  • Sep 29, 2020
    • [SPARK-28863][SQL][WARMFIX] Introduce AlreadyOptimized to prevent reanalysis of V1FallbackWriters
    • [SPARK-32635][SQL] Fix foldable propagation
    • Added a new config, spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of Netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases.
  • Sep 24, 2020
    • [SPARK-32764][SQL] -0.0 should be equal to 0.0
    • [SPARK-32753][SQL] Only copy tags to node with no tags when transforming plans
    • [SPARK-32659][SQL] Fix the data issue of inserted Dynamic Partition Pruning on non-atomic type
    • Operating system security updates.
  • Sep 8, 2020
    • A new parameter, maxbinlength, was added for Azure Synapse Analytics. It controls the column length of BinaryType columns and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
  • Aug 25, 2020
    • [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects
    • [SPARK-32559][SQL] Fix the trim logic in UTF8String.toInt/toLong, which didn’t handle non-ASCII characters correctly
    • [SPARK-32543][R] Remove arrow::as_tibble usage in SparkR
    • [SPARK-32091][CORE] Ignore timeout error when removing blocks on the lost executor
    • Fixed an issue affecting the Azure Synapse connector with MSI credentials
    • Fixed ambiguous attribute resolution in self-merge
  • Aug 18, 2020
    • [SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
    • [SPARK-32237][SQL] Resolve hint in CTE
    • [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
    • [SPARK-32467][UI] Avoid encoding URL twice on https redirect
    • Fixed a race condition in the AQS connector when using Trigger.Once.
  • Aug 11, 2020
    • [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan
    • [SPARK-32234][SQL] Spark SQL commands are failing on selecting the ORC tables
    • You can now use the LDA transform function on a passthrough-enabled cluster.

Databricks Runtime 6.6

See Databricks Runtime 6.6.

  • Oct 13, 2020
    • Operating system security updates.
    • [SPARK-32999][SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
    • Fixed listing of directories in a FUSE mount that contain file names with invalid XML characters
    • FUSE mount no longer uses ListMultipartUploads
  • Sep 24, 2020
    • Operating system security updates.
  • Sep 8, 2020
    • A new parameter, maxbinlength, was added for Azure Synapse Analytics. It controls the column length of BinaryType columns and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
    • Updated the Azure Storage SDK to 8.6.4 and enabled TCP keep alive on connections made by the WASB driver
  • Aug 25, 2020
    • Fixed ambiguous attribute resolution in self-merge
  • Aug 18, 2020
    • [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
    • Fixed a race condition in the AQS connector when using Trigger.Once.
  • Aug 11, 2020
    • [SPARK-28676][CORE] Avoid excessive logging from ContextCleaner
    • [SPARK-31967][UI] Downgrade to vis.js 4.21.0 to fix Jobs UI loading time regression
  • Aug 3, 2020
    • You can now use the LDA transform function on a passthrough-enabled cluster.
    • Operating system security updates.

Databricks Runtime 6.4

See Databricks Runtime 6.4.

  • Oct 13, 2020
    • Operating system security updates.
    • [SPARK-32999][SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
    • Fixed listing of directories in a FUSE mount that contain file names with invalid XML characters
    • FUSE mount no longer uses ListMultipartUploads
  • Sep 24, 2020
    • Fixed a previous limitation where passthrough on standard clusters would still restrict which filesystem implementation the user could use. Users can now access local filesystems without restrictions.
    • Operating system security updates.
  • Sep 8, 2020
    • A new parameter, maxbinlength, was added for Azure Synapse Analytics. It controls the column length of BinaryType columns and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
    • Updated the Azure Storage SDK to 8.6.4 and enabled TCP keep alive on connections made by the WASB driver
  • Aug 25, 2020
    • Fixed ambiguous attribute resolution in self-merge
  • Aug 18, 2020
    • [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
    • Fixed a race condition in the AQS connector when using Trigger.Once.
  • Aug 11, 2020
    • [SPARK-28676][CORE] Avoid excessive logging from ContextCleaner
  • Aug 3, 2020
    • You can now use the LDA transform function on a passthrough-enabled cluster.
    • Operating system security updates.
  • Jul 7, 2020
    • Upgraded Java version from 1.8.0_232 to 1.8.0_252.
  • Apr 21, 2020
    • [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper
  • Apr 7, 2020
    • To resolve an issue with pandas UDFs not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367]. (See the sketch after this list.)
  • Mar 10, 2020
    • Optimized autoscaling is now used by default on all-purpose clusters on the Azure Databricks Premium Plan.
    • The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
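
A hedged sketch of the PyArrow workaround referenced above: per SPARK-29367 the variable must be visible to both the driver and the Python workers, so on Databricks it would normally be set in the cluster's environment variables; setting it from a notebook, as below, only affects the driver process.

    import os

    # Opt in to the pre-0.15 Arrow IPC format so pandas UDFs keep working
    # against PyArrow 0.15.0+ (see SPARK-29367).
    os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"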

Databricks Runtime 5.5 LTS

See Databricks Runtime 5.5 LTS.

  • Oct 13, 2020
    • Operating system security updates.
    • [SPARK-32999][SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
  • Sep 24, 2020
    • Operating system security updates.
  • Sep 8, 2020
    • A new parameter, maxbinlength, was added for Azure Synapse Analytics. It controls the column length of BinaryType columns and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
  • Aug 18, 2020
    • [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
    • Fixed a race condition in the AQS connector when using Trigger.Once.
  • Aug 11, 2020
    • [SPARK-28676][CORE] Avoid excessive logging from ContextCleaner
  • Aug 3, 2020
    • Operating system security updates
  • Jul 7, 2020
    • Upgraded Java version from 1.8.0_232 to 1.8.0_252.
  • Apr 21, 2020
    • [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper
  • Apr 7, 2020
    • To resolve an issue with pandas UDFs not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367].
  • Mar 25, 2020
    • The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
  • Mar 10, 2020
    • Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output is larger, the run is canceled and marked as failed. To avoid hitting this limit, you can prevent stdout from being returned from the driver by setting the spark.databricks.driver.disableScalaOutput Spark configuration to true. By default the flag value is false. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data written to the cluster’s log files. Setting this flag is recommended only for automated clusters running JAR jobs, because it disables notebook results. (See the sketch after this list.)
  • Feb 18, 2020
    • [SPARK-24783][SQL] spark.sql.shuffle.partitions=0 should throw exception
    • Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread-local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled, until we have a proper fix.
  • Jan 28, 2020
  • Jan 14, 2020
    • Upgraded Java version from 1.8.0_222 to 1.8.0_232.
  • Nov 19, 2019
    • [SPARK-29743][SQL] sample should set needCopyResult to true if its child’s needCopyResult is true
    • The R version was unintentionally upgraded from 3.6.0 to 3.6.1. We downgraded it back to 3.6.0.
  • Nov 5, 2019
    • Upgraded Java version from 1.8.0_212 to 1.8.0_222.
  • Oct 23, 2019
    • [SPARK-29244][CORE] Prevent freed page in BytesToBytesMap free again
  • Oct 8, 2019
    • Server-side changes to allow the Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver version 2.6.10).
    • Fixed an issue affecting using the Optimize command with table ACL enabled clusters.
    • Fixed an issue where pyspark.ml libraries would fail due to a Scala UDF forbidden error on table ACL and credential passthrough enabled clusters.
    • Whitelisted SerDe/SerDeUtil methods for credential passthrough.
    • Fixed a NullPointerException when checking the error code in the WASB client.
  • Sep 24, 2019
    • Improved stability of the Parquet writer.
    • Fixed a problem where a Thrift query canceled before it started executing could get stuck in the STARTED state.
  • Sep 10, 2019
    • Add thread safe iterator to BytesToBytesMap
    • [SPARK-27992][SPARK-28881] Allow Python to join with connection thread to propagate errors
    • Fixed a bug affecting certain global aggregation queries.
    • Improved credential redaction.
    • [SPARK-27330][SS] Support task abort in foreach writer
    • [SPARK-28642] Hide credentials in SHOW CREATE TABLE
    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
  • Aug 27, 2019
    • [SPARK-20906][SQL] Allow user-specified schema in the API to_avro with schema registry
    • [SPARK-27838][SQL] Support user provided non-nullable avro schema for nullable catalyst schema without any null record
    • Improvements to Delta Lake time travel
    • Fixed an issue affecting certain transform expressions
    • Support broadcast variables when Process Isolation is enabled
  • Aug 13, 2019
    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28260] Add CLOSED state to ExecutionState
    • [SPARK-28489][SS] Fix a bug where KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019
    • [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
    • [SPARK-28355][CORE][PYTHON] Use Spark conf for threshold at which UDF is compressed by broadcast
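
A minimal sketch of the job-output flag mentioned under Mar 10, 2020 above. The flag would normally be set in the cluster's Spark config so it applies to the whole cluster; the runtime spark.conf.set call below is only illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Stop the driver from returning stdout/cell output to the client, keeping
    # runs under the 20MB output cap. Recommended only for automated JAR-job
    # clusters, since it also disables notebook results.
    spark.conf.set("spark.databricks.driver.disableScalaOutput", "true")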

Databricks Light 2.4

See Databricks Light 2.4.

  • Oct 13, 2020
    • Operating system security updates.

Unsupported Databricks Runtime releases

Maintenance updates for unsupported Databricks Runtime releases:

For the original release notes, follow the link below the subheading.

Databricks Runtime 6.5 (Unsupported)

See Databricks Runtime 6.5 (Unsupported).

  • Sep 24, 2020
    • Fixed a previous limitation where passthrough on standard clusters would still restrict which filesystem implementation the user could use. Users can now access local filesystems without restrictions.
    • Operating system security updates.
  • Sep 8, 2020
    • A new parameter, maxbinlength, was added for Azure Synapse Analytics. It controls the column length of BinaryType columns and is translated as VARBINARY(maxbinlength). It can be set using .option("maxbinlength", n), where 0 < n <= 8000.
    • Updated the Azure Storage SDK to 8.6.4 and enabled TCP keep alive on connections made by the WASB driver
  • Aug 25, 2020
    • Fixed ambiguous attribute resolution in self-merge
  • Aug 18, 2020
    • [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources
    • Fixed a race condition in the AQS connector when using Trigger.Once.
  • Aug 11, 2020
    • [SPARK-28676][CORE] Avoid excessive logging from ContextCleaner
  • Aug 3, 2020
    • You can now use the LDA transform function on a passthrough-enabled cluster.
    • Operating system security updates.
  • Jul 7, 2020
    • Upgraded Java version from 1.8.0_242 to 1.8.0_252.
  • Apr 21, 2020
    • [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper

Databricks Runtime 6.3 (Unsupported)

See Databricks Runtime 6.3 (Unsupported).

  • Jul 7, 2020
    • Upgraded Java version from 1.8.0_232 to 1.8.0_252.
  • Apr 21, 2020
    • [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper
  • Apr 7, 2020
    • To resolve an issue with pandas UDFs not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367].
  • Mar 10, 2020
    • The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
  • Feb 18, 2020
    • Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread-local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled, until we have a proper fix.
  • Feb 11, 2020
    • [SPARK-24783][SQL] spark.sql.shuffle.partitions=0 should throw exception
    • [SPARK-30447][SQL] Constant propagation nullability issue
    • [SPARK-28152][SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping
    • Whitelisted the overwrite function so that MLModels extending MLWriter can call the function.

Databricks Runtime 6.2 (Unsupported)

See Databricks Runtime 6.2 (Unsupported).

  • Apr 21, 2020
    • [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper
  • Apr 7, 2020
    • To resolve an issue with pandas UDFs not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367].
  • Mar 25, 2020
    • Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output is larger, the run is canceled and marked as failed. To avoid hitting this limit, you can prevent stdout from being returned from the driver by setting the spark.databricks.driver.disableScalaOutput Spark configuration to true. By default the flag value is false. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data written to the cluster’s log files. Setting this flag is recommended only for automated clusters running JAR jobs, because it disables notebook results.
  • Mar 10, 2020
    • The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
  • Feb 18, 2020
    • [SPARK-24783][SQL] spark.sql.shuffle.partitions=0 should throw exception
    • Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread-local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled, until we have a proper fix.
  • Jan 28, 2020
    • Whitelisted ML Model Writers’ overwrite function for clusters enabled for credential passthrough, so that model save can use overwrite mode on credential passthrough clusters.
    • [SPARK-30447][SQL] Constant propagation nullability issue.
    • [SPARK-28152][SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping.
  • Jan 14, 2020
    • Upgraded Java version from 1.8.0_222 to 1.8.0_232.
  • Dec 10, 2019
    • [SPARK-29904][SQL] Parse timestamps in microsecond precision by JSON/CSV data sources.

Databricks Runtime 6.1 (Unsupported)

See Databricks Runtime 6.1 (Unsupported).

  • Apr 7, 2020
    • To resolve an issue with pandas UDFs not working with PyArrow 0.15.0 and above, we added an environment variable (ARROW_PRE_0_15_IPC_FORMAT=1) to enable support for those versions of PyArrow. See the instructions in [SPARK-29367].
  • Mar 25, 2020
    • Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output is larger, the run is canceled and marked as failed. To avoid hitting this limit, you can prevent stdout from being returned from the driver by setting the spark.databricks.driver.disableScalaOutput Spark configuration to true. By default the flag value is false. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data written to the cluster’s log files. Setting this flag is recommended only for automated clusters running JAR jobs, because it disables notebook results.
  • Mar 10, 2020
    • The Snowflake connector (spark-snowflake_2.11) included in Databricks Runtime is updated to version 2.5.9. snowflake-jdbc is updated to version 3.12.0.
  • Feb 18, 2020
    • [SPARK-24783][SQL] spark.sql.shuffle.partitions=0 should throw exception
    • Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread-local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled, until we have a proper fix.
  • Jan 28, 2020
    • [SPARK-30447][SQL] Constant propagation nullability issue.
    • [SPARK-28152][SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping.
  • Jan 14, 2020
    • Upgraded Java version from 1.8.0_222 to 1.8.0_232.
  • Nov 7, 2019
  • Nov 5, 2019
    • Fixed a bug in DBFS FUSE to handle mount points having // in their path.
    • [SPARK-29081] Replace calls to SerializationUtils.clone on properties with a faster implementation
    • [SPARK-29244][CORE] Prevent freed page in BytesToBytesMap free again
    • (6.1 ML) Library mkl version 2019.4 was installed unintentionally. We downgraded it to mkl version 2019.3 to match Anaconda Distribution 2019.03.

Databricks Runtime 6.0 (Unsupported)

See Databricks Runtime 6.0 (Unsupported).

  • Mar 25, 2020
    • Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output is larger, the run is canceled and marked as failed. To avoid hitting this limit, you can prevent stdout from being returned from the driver by setting the spark.databricks.driver.disableScalaOutput Spark configuration to true. By default the flag value is false. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data written to the cluster’s log files. Setting this flag is recommended only for automated clusters running JAR jobs, because it disables notebook results.
  • Feb 18, 2020
    • Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread-local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled, until we have a proper fix.
  • Feb 11, 2020
  • Jan 28, 2020
    • [SPARK-30447][SQL] Constant propagation nullability issue.
    • [SPARK-28152][SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping.
  • Jan 14, 2020
    • Upgraded Java version from 1.8.0_222 to 1.8.0_232.
  • Nov 19, 2019
    • [SPARK-29743][SQL] sample should set needCopyResult to true if its child’s needCopyResult is true
  • Nov 5, 2019
    • dbutils.tensorboard.start() now supports TensorBoard 2.0 (if installed manually). (See the sketch after this list.)
    • Fixed a bug in DBFS FUSE to handle mount points having // in their path.
    • [SPARK-29081] Replace calls to SerializationUtils.clone on properties with a faster implementation
  • Oct 23, 2019
    • [SPARK-29244][CORE] Prevent freed page in BytesToBytesMap free again
  • Oct 8, 2019
    • Server-side changes to allow the Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver version 2.6.10).
    • Fixed an issue affecting using the Optimize command with table ACL enabled clusters.
    • Fixed an issue where pyspark.ml libraries would fail due to a Scala UDF forbidden error on table ACL and credential passthrough enabled clusters.
    • Whitelisted SerDe/SerDeUtil methods for credential passthrough.
    • Fixed a NullPointerException when checking the error code in the WASB client.
    • Fixed an issue where user credentials were not forwarded to jobs created by dbutils.notebook.run().
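
A hedged sketch of the TensorBoard note under Nov 5, 2019 above, assuming the historical dbutils.tensorboard API that took a log directory; the directory path is a placeholder and dbutils is provided by the notebook runtime.

    # Assumes a Databricks notebook where `dbutils` is predefined and
    # TensorBoard 2.0 has been installed manually; logs are written under
    # the placeholder directory below.
    log_dir = "/tmp/tensorboard_logs"
    dbutils.tensorboard.start(log_dir)  # stop later with dbutils.tensorboard.stop()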

Databricks Runtime 5.4 ML (Unsupported)

See Databricks Runtime 5.4 ML (Unsupported).

  • Jun 18, 2019
    • Improved handling of MLflow active runs in Hyperopt integration
    • Improved messages in Hyperopt
    • Updated the markdown package from 3.1 to 3.1.1

Databricks Runtime 5.4 (Unsupported)

See Databricks Runtime 5.4 (Unsupported).

  • Nov 19, 2019
    • [SPARK-29743][SQL] sample should set needCopyResult to true if its child’s needCopyResult is true
  • Oct 8, 2019
    • Server-side changes to allow the Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires a Simba Apache Spark ODBC driver update to version 2.6.10).
    • Fixed a NullPointerException when checking the error code in the WASB client.
  • Sep 10, 2019
    • Add thread safe iterator to BytesToBytesMap
    • Fixed a bug affecting certain global aggregation queries.
    • [SPARK-27330][SS] Support task abort in foreach writer
    • [SPARK-28642] Hide credentials in SHOW CREATE TABLE
    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
    • [SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
  • Aug 27, 2019
    • Fixed an issue affecting certain transform expressions
  • Aug 13, 2019
    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28489][SS] Fix a bug where KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019
    • [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
  • Jul 2, 2019
    • Upgraded snappy-java from 1.1.7.1 to 1.1.7.3.
  • Jun 18, 2019
    • Improved handling of MLflow active runs in MLlib integration
    • Improved Databricks Advisor messages related to using the Delta cache
    • Fixed a bug affecting the use of higher order functions
    • Fixed a bug affecting Delta metadata queries

Databricks Runtime 5.3 (Unsupported)

See Databricks Runtime 5.3 (Unsupported).

  • Nov 7, 2019
    • [SPARK-29743][SQL] sample should set needCopyResult to true if its child’s needCopyResult is true
  • Oct 8, 2019
    • Server-side changes to allow the Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires a Simba Apache Spark ODBC driver update to version 2.6.10).
    • Fixed a NullPointerException when checking the error code in the WASB client.
  • Sep 10, 2019
    • Add thread safe iterator to BytesToBytesMap
    • Fixed a bug affecting certain global aggregation queries.
    • [SPARK-27330][SS] Support task abort in foreach writer
    • [SPARK-28642] Hide credentials in SHOW CREATE TABLE
    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
    • [SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
  • Aug 27, 2019
    • Fixed an issue affecting certain transform expressions
  • Aug 13, 2019
    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28489][SS] Fix a bug where KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019
    • [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
  • Jun 18, 2019
    • Improved Databricks Advisor messages related to using the Delta cache
    • Fixed a bug affecting the use of higher order functions
    • Fixed a bug affecting Delta metadata queries
  • May 28, 2019
    • Improved the stability of Delta
    • Tolerate IOExceptions when reading the Delta LAST_CHECKPOINT file
    • Added recovery to failed library installation
  • May 7, 2019
    • Port HADOOP-15778 (ABFS: Fix client side throttling for read) to the Azure Data Lake Storage Gen2 connector
    • Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to the Azure Data Lake Storage Gen2 connector
    • Fixed a bug affecting table ACLs
    • Fixed a race condition when loading a Delta log checksum file
    • Fixed Delta conflict detection logic to not identify “insert + overwrite” as a pure “append” operation
    • Ensure that the DBIO cache is not disabled when table ACLs are enabled
    • [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
    • [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
    • [SPARK-27160][SQL] Fix DecimalType when building orc filters
    • [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager

Databricks Runtime 5.2 (Unsupported)

See Databricks Runtime 5.2 (Unsupported).

  • Sep 10, 2019
    • Add thread safe iterator to BytesToBytesMap
    • Fixed a bug affecting certain global aggregation queries.
    • [SPARK-27330][SS] Support task abort in foreach writer
    • [SPARK-28642] Hide credentials in SHOW CREATE TABLE
    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
    • [SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
  • Aug 27, 2019
    • Fixed an issue affecting certain transform expressions
  • Aug 13, 2019
    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28489][SS] Fix a bug where KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019
    • [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
  • Jul 2, 2019
    • Tolerate IOExceptions when reading the Delta LAST_CHECKPOINT file
  • Jun 18, 2019
    • Improved Databricks Advisor messages related to using the Delta cache
    • Fixed a bug affecting the use of higher order functions
    • Fixed a bug affecting Delta metadata queries
  • May 28, 2019
    • Added recovery to failed library installation
  • May 7, 2019
    • Port HADOOP-15778 (ABFS: Fix client side throttling for read) to the Azure Data Lake Storage Gen2 connector
    • Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to the Azure Data Lake Storage Gen2 connector
    • Fixed a race condition when loading a Delta log checksum file
    • Fixed Delta conflict detection logic to not identify “insert + overwrite” as a pure “append” operation
    • Ensure that the DBIO cache is not disabled when table ACLs are enabled
    • [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
    • [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
    • [SPARK-27160][SQL] Fix DecimalType when building orc filters
    • [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
  • Mar 26, 2019
    • Avoid embedding platform-dependent offsets literally in whole-stage generated code
    • [SPARK-26665][CORE] Fix a bug that BlockTransferService.fetchBlockSync may hang forever.
    • [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array.
    • [SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE.
    • [SPARK-26572][SQL] Fix aggregate codegen result evaluation.
    • Fixed a bug affecting certain PythonUDFs.
  • Feb 26, 2019
    • [SPARK-26864][SQL] Query may return incorrect result when python udf is used as a left-semi join condition.
    • [SPARK-26887][PYTHON] Create datetime.date directly instead of creating datetime64 as intermediate data.
    • Fixed a bug affecting the JDBC/ODBC server.
    • Fixed a bug affecting PySpark.
    • Exclude hidden files when building HadoopRDD.
    • Fixed a bug in Delta that caused serialization issues.
  • Feb 12, 2019
    • Fixed an issue affecting using Delta with Azure ADLS Gen2 mount points.
    • Fixed an issue where the Spark low-level network protocol could be broken when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019
    • Fixed a StackOverflowError when putting a skew join hint on a cached relation.
    • Fixed an inconsistency between a SQL cache’s cached RDD and its physical plan, which caused incorrect results.
    • [SPARK-26706][SQL] Fix illegalNumericPrecedence for ByteType.
    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • CSV/JSON data sources should avoid globbing paths when inferring schema.
    • Fixed constraint inference on the Window operator.
    • Fixed an issue affecting installing egg libraries on clusters with table ACLs enabled.

Databricks Runtime 5.1 (不 支持) Databricks Runtime 5.1 (Unsupported)

请参阅 Databricks Runtime 5.1 (不支持的) See Databricks Runtime 5.1 (Unsupported).

  • Aug 13, 2019
    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019
    • [SPARK-28015][SQL] Check that stringToDate() consumes the entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
  • Jul 2, 2019
    • Tolerate IOExceptions when reading the Delta LAST_CHECKPOINT file
  • Jun 18, 2019
    • Fixed a bug affecting the use of higher-order functions
    • Fixed a bug affecting Delta metadata queries
  • May 28, 2019
    • Added recovery for failed library installations
  • May 7, 2019
    • Port HADOOP-15778 (ABFS: Fix client side throttling for read) to the Azure Data Lake Storage Gen2 connector
    • Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to the Azure Data Lake Storage Gen2 connector
    • Fixed a race condition when loading a Delta log checksum file
    • Fixed Delta conflict detection logic so that it does not identify "insert + overwrite" as a pure "append" operation
    • [SPARK-27494][SS] Null keys/values don't work in Kafka source v2
    • [SPARK-27454][ML][SQL] Spark image datasource fails when encountering some illegal images
    • [SPARK-27160][SQL] Fix DecimalType when building orc filters
    • [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
  • Mar 26, 2019
    • Avoid embedding platform-dependent offsets literally in whole-stage generated code
    • Fixed a bug affecting certain Python UDFs.
  • Feb 26, 2019
    • [SPARK-26864][SQL] Query may return an incorrect result when a Python UDF is used as a left semi join condition.
    • Fixed a bug affecting the JDBC/ODBC server.
    • Excluded hidden files when building HadoopRDD.
  • Feb 12, 2019
    • Fixed an issue affecting the installation of egg libraries on clusters with table ACLs enabled.
    • Fixed an inconsistency between a SQL cache's cached RDD and its physical plan, which caused incorrect results.
    • [SPARK-26706][SQL] Fix illegalNumericPrecedence for ByteType.
    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • Fixed constraint inference on the Window operator.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019
    • Fixed an issue that could cause df.rdd.count() with a UDT to return an incorrect answer in certain cases.
    • Fixed an issue affecting the installation of wheelhouses.
    • [SPARK-26267] Retry when detecting incorrect offsets from Kafka.
    • Fixed a bug that affected multiple file stream sources in a streaming query.
    • Fixed the StackOverflowError when putting a skew join hint on a cached relation.
    • Fixed an inconsistency between a SQL cache's cached RDD and its physical plan, which caused incorrect results.
  • Jan 8, 2019
    • Fixed an issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted; a usage sketch of the rangeBetween call follows this list.
    • [SPARK-26352] Join reordering should not change the order of output attributes.
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Stability improvements for Delta Lake.
    • Delta Lake is enabled.
    • Fixed an issue that caused Azure Data Lake Storage Gen2 access to fail when Azure AD credential passthrough is enabled for Azure Data Lake Storage Gen1.
    • Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
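
A minimal PySpark sketch of the previously blocked rangeBetween call; the sample data and column names are hypothetical:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 10.0), (2, 20.0), (3, 30.0)], ["id", "value"])

    # rangeBetween(long, long) is the call that used to fail with the
    # "not whitelisted" error on clusters with table ACLs enabled.
    w = Window.orderBy("id").rangeBetween(-1, 1)
    df.withColumn("moving_sum", F.sum("value").over(w)).show()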

Databricks Runtime 5.0 (Unsupported)

See Databricks Runtime 5.0 (Unsupported).

  • Jun 18, 2019
    • Fixed a bug affecting the use of higher-order functions
  • May 7, 2019
    • Fixed a race condition when loading a Delta log checksum file
    • Fixed Delta conflict detection logic so that it does not identify "insert + overwrite" as a pure "append" operation
    • [SPARK-27494][SS] Null keys/values don't work in Kafka source v2
    • [SPARK-27454][ML][SQL] Spark image datasource fails when encountering some illegal images
    • [SPARK-27160][SQL] Fix DecimalType when building orc filters
    • [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
  • Mar 26, 2019
    • Avoid embedding platform-dependent offsets literally in whole-stage generated code
    • Fixed a bug affecting certain Python UDFs.
  • Mar 12, 2019
    • [SPARK-26864][SQL] Query may return an incorrect result when a Python UDF is used as a left semi join condition.
  • Feb 26, 2019
    • Fixed a bug affecting the JDBC/ODBC server.
    • Excluded hidden files when building HadoopRDD.
  • Feb 12, 2019
    • Fixed an inconsistency between a SQL cache's cached RDD and its physical plan, which caused incorrect results.
    • [SPARK-26706][SQL] Fix illegalNumericPrecedence for ByteType.
    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • Fixed constraint inference on the Window operator.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019
    • Fixed an issue that could cause df.rdd.count() with a UDT to return an incorrect answer in certain cases.
    • [SPARK-26267] Retry when detecting incorrect offsets from Kafka.
    • Fixed a bug that affected multiple file stream sources in a streaming query.
    • Fixed the StackOverflowError when putting a skew join hint on a cached relation.
    • Fixed an inconsistency between a SQL cache's cached RDD and its physical plan, which caused incorrect results.
  • Jan 8, 2019
    • Fixed an issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
    • [SPARK-26352] Join reordering should not change the order of output attributes.
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Stability improvements for Delta Lake.
    • Delta Lake is enabled.
    • Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
  • Dec 18, 2018
    • [SPARK-26293] Cast exception when having Python UDF in subquery
    • Fixed an issue affecting certain queries using Join and Limit.
    • Redacted credentials from RDD names in the Spark UI
  • Dec 6, 2018
    • Fixed an issue that caused incorrect query results when using orderBy followed immediately by groupBy with the group-by key as the leading part of the sort-by key.
    • Upgraded the Snowflake Connector for Spark from 2.4.9.2-spark_2.4_pre_release to 2.4.10.
    • Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled; a configuration sketch follows this list.
    • Fixed an issue affecting certain self union queries.
    • Fixed a bug in the Thrift server where sessions were sometimes leaked when cancelled.
    • [SPARK-26307] Fixed CTAS when inserting into a partitioned table using Hive SerDe.
    • [SPARK-26147] Python UDFs in a join condition fail even when using columns from only one side of the join
    • [SPARK-26211] Fix InSet for binary, and struct and array with null.
    • [SPARK-26181] The hasMinMaxStats method of ColumnStatsMap is not correct.
    • Fixed an issue affecting the installation of Python wheels in environments without Internet access.
  • Nov 20, 2018
    • Fixed an issue that left a notebook unusable after cancelling a streaming query.
    • Fixed an issue affecting certain queries using window functions.
    • Fixed an issue affecting a stream from Delta with multiple schema changes.
    • Fixed an issue affecting certain aggregation queries with left semi/anti joins.
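
A minimal PySpark sketch of the two flags referenced under Dec 6, 2018, assuming an existing SparkSession named spark; the input path is hypothetical:

    # With either flag enabled, Spark skips corrupt or missing input files;
    # after the fix above it only does so once one or more retries have failed.
    spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")
    spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

    df = spark.read.parquet("/data/events")  # hypothetical path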

Databricks Runtime 4.3 (Unsupported)

See Databricks Runtime 4.3.

  • Apr 9, 2019

    • [SPARK-26665][CORE] Fix a bug that can cause BlockTransferService.fetchBlockSync to hang forever.
    • [SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE.
  • Mar 12, 2019

    • Fixed a bug affecting code generation.
    • Fixed a bug affecting Delta.
  • Feb 26, 2019

    • Fixed a bug affecting the JDBC/ODBC server.
  • Feb 12, 2019

    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • Excluded hidden files when building HadoopRDD.
    • Fixed Parquet filter conversion for the IN predicate when its value is empty.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019

    • Fixed an issue that could cause df.rdd.count() with a UDT to return an incorrect answer in certain cases.
    • Fixed an inconsistency between a SQL cache's cached RDD and its physical plan, which caused incorrect results.
  • Jan 8, 2019

    • Fixed an issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
    • Redacted credentials from RDD names in the Spark UI
    • [SPARK-26352] Join reordering should not change the order of output attributes.
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Delta Lake is enabled.
    • Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
  • Dec 18, 2018

    • [SPARK-25002] Avro: revise the output record namespace.
    • Fixed an issue affecting certain queries using Join and Limit.
    • [SPARK-26307] Fixed CTAS when inserting into a partitioned table using Hive SerDe.
    • Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • [SPARK-26181] The hasMinMaxStats method of ColumnStatsMap is not correct.
    • Fixed an issue affecting the installation of Python wheels in environments without Internet access.
    • Fixed a performance issue in the query analyzer.
    • Fixed an issue in PySpark that caused DataFrame actions to fail with a "connection refused" error.
    • Fixed an issue affecting certain self union queries.
  • Nov 20, 2018

    • [SPARK-17916][SPARK-25241] Fix empty string being parsed as null when nullValue is set.
    • [SPARK-25387] Fix NPE caused by bad CSV input.
    • Fixed an issue affecting certain aggregation queries with left semi/anti joins.
  • Nov 6, 2018

    • [SPARK-25741] Long URLs are not rendered properly in the web UI.
    • [SPARK-25714] Fix null handling in the optimizer rule BooleanSimplification.
    • Fixed an issue affecting temporary object cleanup in the Synapse Analytics connector.
    • [SPARK-25816] Fix attribute resolution in nested extractors.
  • Oct 16, 2018

    • Fixed a bug affecting the output of running SHOW CREATE TABLE on Delta tables.
    • Fixed a bug affecting the Union operation.
  • Sep 25, 2018

    • [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in the Avro data source.
  • Sep 11, 2018

    • [SPARK-25214][SS] Fix the issue that the Kafka v2 source may return duplicated records when failOnDataLoss=false.
    • [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for articlePartition.
    • Filter reduction should handle null values correctly.
    • Improved stability of the execution engine.
  • Aug 28, 2018

    • Fixed a bug in the Delta Lake DELETE command that would incorrectly delete rows where the condition evaluates to null; a sketch of the affected pattern follows this list.
    • [SPARK-25142] Add error messages when a Python worker could not open a socket in _load_from_socket.
  • Aug 23, 2018

    • [SPARK-23935] mapEntry throws org.codehaus.commons.compiler.CompileException.
    • Fixed a nullable map issue in the Parquet reader.
    • [SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier.
    • [SPARK-25081] Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • Fixed an interaction between Databricks Delta and PySpark that could cause transient read failures.
    • [SPARK-25084] "distribute by" on multiple columns (wrapped in brackets) may lead to a codegen issue.
    • [SPARK-25096] Loosen nullability if the cast is force-nullable.
    • Lowered the default number of threads used by the Delta Lake OPTIMIZE command, reducing memory overhead and committing data faster.
    • [SPARK-25114] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
    • Fixed secret manager redaction when a command partially succeeds.
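
A minimal sketch of the DELETE pattern referenced under Aug 28, 2018, assuming an existing SparkSession named spark and a hypothetical Delta table events with a nullable status column:

    # Rows where the predicate evaluates to null (status IS NULL here)
    # must be kept; the fix ensures they are no longer deleted.
    spark.sql("DELETE FROM events WHERE status = 'obsolete'")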

Databricks Runtime 4.2 (Unsupported)

See Databricks Runtime 4.2.

  • Feb 26, 2019

    • Fixed a bug affecting the JDBC/ODBC server.
  • Feb 12, 2019

    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • Excluded hidden files when building HadoopRDD.
    • Fixed Parquet filter conversion for the IN predicate when its value is empty.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019

    • Fixed an issue that could cause df.rdd.count() with a UDT to return an incorrect answer in certain cases.
  • Jan 8, 2019

    • Fixed an issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
    • Redacted credentials from RDD names in the Spark UI
    • [SPARK-26352] Join reordering should not change the order of output attributes.
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Delta Lake is enabled.
    • Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
  • Dec 18, 2018

    • [SPARK-25002] Avro: revise the output record namespace.
    • Fixed an issue affecting certain queries using Join and Limit.
    • [SPARK-26307] Fixed CTAS when inserting into a partitioned table using Hive SerDe.
    • Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • [SPARK-26181] The hasMinMaxStats method of ColumnStatsMap is not correct.
    • Fixed an issue affecting the installation of Python wheels in environments without Internet access.
    • Fixed a performance issue in the query analyzer.
    • Fixed an issue in PySpark that caused DataFrame actions to fail with a "connection refused" error.
    • Fixed an issue affecting certain self union queries.
  • Nov 20, 2018

    • [SPARK-17916][SPARK-25241] Fix empty string being parsed as null when nullValue is set.
    • Fixed an issue affecting certain aggregation queries with left semi/anti joins.
  • Nov 6, 2018

    • [SPARK-25741] Long URLs are not rendered properly in the web UI.
    • [SPARK-25714] Fix null handling in the optimizer rule BooleanSimplification.
  • Oct 16, 2018

    • Fixed a bug affecting the output of running SHOW CREATE TABLE on Delta tables.
    • Fixed a bug affecting the Union operation.
  • Sep 25, 2018

    • [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in the Avro data source.
  • Sep 11, 2018

    • [SPARK-25214][SS] Fix the issue that the Kafka v2 source may return duplicated records when failOnDataLoss=false.
    • [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for articlePartition.
    • Filter reduction should handle null values correctly.
  • Aug 28, 2018

    • Fixed a bug in the Delta Lake DELETE command that would incorrectly delete rows where the condition evaluates to null.
  • Aug 23, 2018

    • Fixed NoClassDefError for Delta Snapshot
    • [SPARK-23935] mapEntry throws org.codehaus.commons.compiler.CompileException.
    • [SPARK-24957][SQL] Average with decimal followed by aggregation returns wrong result. Incorrect AVERAGE results might be returned: the CAST added in the Average operator is bypassed if the result of the Divide is already the type it is cast to. A sketch of the affected pattern follows this list.
    • [SPARK-25081] Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • Fixed an interaction between Databricks Delta and PySpark that could cause transient read failures.
    • [SPARK-25114] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
    • [SPARK-25084] "distribute by" on multiple columns (wrapped in brackets) may lead to a codegen issue.
    • [SPARK-24934][SQL] Explicitly whitelist supported types in upper/lower bounds for in-memory partition pruning. When complex data types were used in query filters against cached data, Spark always returned an empty result set: the in-memory stats-based pruning generated incorrect results, because null is set as the upper/lower bound for complex types. The fix is to not use in-memory stats-based pruning for complex types.
    • Fixed secret manager redaction when a command partially succeeds.
    • Fixed a nullable map issue in the Parquet reader.
  • Aug 2, 2018

    • Added the writeStream.table API in Python.
    • Fixed an issue affecting Delta checkpointing.
    • [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. The SQL cache was not used when DataFrameWriter was used to write a DataFrame with a UDF. This is a regression caused by the changes made in AnalysisBarrier, since not all analyzer rules are idempotent.
    • Fixed an issue that could cause the mergeInto command to produce incorrect results.
    • Improved stability of access to Azure Data Lake Storage Gen1.
    • [SPARK-24809] Serializing LongHashedRelation in executor may result in data error.
    • [SPARK-24878][SQL] Fix reverse function for array type of primitive type containing null.
  • July 11, 2018

    • Fixed a bug in query execution that would cause aggregations on decimal columns with different precisions to return incorrect results in some cases.
    • Fixed a NullPointerException bug that was thrown during advanced aggregation operations like grouping sets.
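
A minimal PySpark sketch of the SPARK-24957 pattern referenced under Aug 23, 2018 (an average over a decimal column followed by another aggregation); the data and column names are hypothetical:

    from decimal import Decimal
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(Decimal("1.10"),), (Decimal("2.30"),)], "d decimal(10,2)"
    )

    # Before the fix, the follow-up aggregation could observe an un-cast
    # Divide result and return a wrong average.
    avg_df = df.agg(F.avg("d").alias("avg_d"))
    avg_df.agg(F.sum("avg_d")).show()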

Databricks Runtime 4.1 ML (Unsupported)

See Databricks Runtime 4.1 ML (Beta).

  • July 31, 2018
    • Added Azure Synapse Analytics to ML Runtime 4.1
    • Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs in case from that column in the table's schema.
    • Fixed a bug affecting the Spark SQL execution engine.
    • Fixed a bug affecting code generation.
    • Fixed a bug (java.lang.NoClassDefFoundError) affecting Delta Lake.
    • Improved error handling in Delta Lake.
    • Fixed a bug that caused incorrect data skipping statistics to be collected for string columns 32 characters or longer.

Databricks Runtime 4.1 (Unsupported)

See Databricks Runtime 4.1.

  • Jan 8, 2019

    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Delta Lake is enabled.
  • Dec 18, 2018

    • [SPARK-25002] Avro: revise the output record namespace.
    • Fixed an issue affecting certain queries using Join and Limit.
    • [SPARK-26307] Fixed CTAS when inserting into a partitioned table using Hive SerDe.
    • Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • Fixed an issue affecting the installation of Python wheels in environments without Internet access.
    • Fixed an issue in PySpark that caused DataFrame actions to fail with a "connection refused" error.
    • Fixed an issue affecting certain self union queries.
  • Nov 20, 2018

    • [SPARK-17916][SPARK-25241] Fix empty string being parsed as null when nullValue is set.
    • Fixed an issue affecting certain aggregation queries with left semi/anti joins.
  • Nov 6, 2018

    • [SPARK-25741] Long URLs are not rendered properly in the web UI.
    • [SPARK-25714] Fix null handling in the optimizer rule BooleanSimplification.
  • Oct 16, 2018

    • Fixed a bug affecting the output of running SHOW CREATE TABLE on Delta tables.
    • Fixed a bug affecting the Union operation.
  • Sep 25, 2018

    • [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in the Avro data source.
  • Sep 11, 2018

    • [SPARK-25214][SS] Fix the issue that the Kafka v2 source may return duplicated records when failOnDataLoss=false.
    • [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for articlePartition.
    • Filter reduction should handle null values correctly.
  • Aug 28, 2018

    • Fixed a bug in the Delta Lake DELETE command that would incorrectly delete rows where the condition evaluates to null.
    • [SPARK-25084] "distribute by" on multiple columns (wrapped in brackets) may lead to a codegen issue.
    • [SPARK-25114] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
  • Aug 23, 2018

    • Fixed NoClassDefError for Delta Snapshot.
    • [SPARK-24957][SQL] Average with decimal followed by aggregation returns wrong result. Incorrect AVERAGE results might be returned: the CAST added in the Average operator is bypassed if the result of the Divide is already the type it is cast to.
    • Fixed a nullable map issue in the Parquet reader.
    • [SPARK-24934][SQL] Explicitly whitelist supported types in upper/lower bounds for in-memory partition pruning. When complex data types were used in query filters against cached data, Spark always returned an empty result set: the in-memory stats-based pruning generated incorrect results, because null is set as the upper/lower bound for complex types. The fix is to not use in-memory stats-based pruning for complex types. A sketch of the affected pattern follows this list.
    • [SPARK-25081] Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • Fixed an interaction between Databricks Delta and PySpark that could cause transient read failures.
    • Fixed secret manager redaction when a command partially succeeds
  • Aug 2, 2018

    • [SPARK-24613][SQL] Cache with UDF could not be matched with subsequent dependent caches. Wraps the logical plan with an AnalysisBarrier for execution plan compilation in CacheManager, to avoid the plan being analyzed again. This is also a regression from Spark 2.3.
    • Fixed a Synapse Analytics connector issue affecting timezone conversion when writing DateType data.
    • Fixed an issue affecting Delta checkpointing.
    • Fixed an issue that could cause the mergeInto command to produce incorrect results.
    • [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. The SQL cache was not used when DataFrameWriter was used to write a DataFrame with a UDF. This is a regression caused by the changes made in AnalysisBarrier, since not all analyzer rules are idempotent.
    • [SPARK-24809] Serializing LongHashedRelation in executor may result in data error.
  • July 11, 2018

    • Fixed a bug in query execution that would cause aggregations on decimal columns with different precisions to return incorrect results in some cases.
    • Fixed a NullPointerException bug that was thrown during advanced aggregation operations like grouping sets.
  • June 28, 2018

    • Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs in case from that column in the table's schema.
  • June 7, 2018

    • Fixed a bug affecting the Spark SQL execution engine.
    • Fixed a bug affecting code generation.
    • Fixed a bug (java.lang.NoClassDefFoundError) affecting Delta Lake.
    • Improved error handling in Delta Lake.
  • May 17, 2018

    • Fixed a bug that caused incorrect data skipping statistics to be collected for string columns 32 characters or longer.
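
A minimal PySpark sketch of the SPARK-24934 pattern referenced under Aug 23, 2018 (a filter on a complex-typed column over cached data); the data and column name are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([([1, 2],), ([3],)], "a array<int>")
    df.cache()
    df.count()  # materialize the cache

    # Before the fix, this filter over cached data always returned an empty
    # result, because null was used as the min/max bound for complex types.
    df.filter(F.col("a") == F.array(F.lit(1), F.lit(2))).show()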

Databricks Runtime 4.0 (Unsupported)

See Databricks Runtime 4.0.

  • Nov 6, 2018

    • [SPARK-25714] Fix null handling in the optimizer rule BooleanSimplification.
  • Oct 16, 2018

    • Fixed a bug affecting the Union operation.
  • Sep 25, 2018

    • [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in the Avro data source.
  • Sep 11, 2018

    • Filter reduction should handle null values correctly.
  • Aug 28, 2018

    • Fixed a bug in the Delta Lake DELETE command that would incorrectly delete rows where the condition evaluates to null.
  • Aug 23, 2018

    • Fixed a nullable map issue in the Parquet reader.
    • Fixed secret manager redaction when a command partially succeeds
    • Fixed an interaction between Databricks Delta and PySpark that could cause transient read failures.
    • [SPARK-25081] Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • [SPARK-25114] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
  • Aug 2, 2018

    • [SPARK-24452] Avoid possible overflow in int add or multiple.
    • [SPARK-24588] Streaming join should require HashClusteredPartitioning from children.
    • Fixed an issue that could cause the mergeInto command to produce incorrect results.
    • [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. The SQL cache was not used when DataFrameWriter was used to write a DataFrame with a UDF. This is a regression caused by the changes made in AnalysisBarrier, since not all analyzer rules are idempotent. A sketch of the affected pattern follows this list.
    • [SPARK-24809] Serializing LongHashedRelation in executor may result in data error.
  • June 28, 2018

    • Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs in case from that column in the table's schema.
  • June 7, 2018

    • Fixed a bug affecting the Spark SQL execution engine.
    • Improved error handling in Delta Lake.
  • May 17, 2018

    • Bug fixes for Databricks secret management.
    • Improved stability of reading data stored in Azure Data Lake Store.
    • Fixed a bug affecting RDD caching.
    • Fixed a bug affecting null-safe equality in Spark SQL.
  • Apr 24, 2018

    • Upgraded the Azure Data Lake Store SDK from 2.0.11 to 2.2.8 to improve the stability of access to Azure Data Lake Store.
    • Fixed a bug affecting the insertion of overwrites into partitioned Hive tables when spark.databricks.io.hive.fastwriter.enabled is false.
    • Fixed an issue that caused task serialization to fail.
    • Improved Delta Lake stability.
  • Mar 14, 2018

    • Prevent unnecessary metadata updates when writing into Delta Lake.
    • Fixed an issue caused by a race condition that could, in rare circumstances, lead to the loss of some output files.
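
A minimal PySpark sketch of the SPARK-24867 pattern referenced under Aug 2, 2018 (writing a cached DataFrame that contains a UDF column); the UDF and output path are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.getOrCreate()
    plus_one = udf(lambda x: x + 1, LongType())  # hypothetical UDF

    df = spark.range(100).withColumn("y", plus_one("id")).cache()
    df.count()  # materialize the cache

    # Before the fix, this write re-analyzed the plan and bypassed the
    # SQL cache instead of reusing the cached data.
    df.write.mode("overwrite").parquet("/tmp/out")  # hypothetical path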

Databricks Runtime 3.5 LTS (Unsupported)

See Databricks Runtime 3.5 LTS (Unsupported).

  • Nov 7, 2019

    • [SPARK-29743][SQL] sample should set needCopyResult to true if its child's needCopyResult is true
  • Oct 8, 2019

    • Server-side changes to allow the Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during result fetching (requires a Simba Apache Spark ODBC driver update to version 2.6.10).
  • Sep 10, 2019

    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in the repartition case
  • Apr 9, 2019

    • [SPARK-26665][CORE] Fix a bug that can cause BlockTransferService.fetchBlockSync to hang forever.
  • Feb 12, 2019

    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019

    • Fixed an issue that could cause df.rdd.count() with a UDT to return an incorrect answer in certain cases.
  • Dec 18, 2018

    • Only ignore corrupt files after one or more retries when the spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • Fixed an issue affecting certain self union queries.
  • Nov 20, 2018

  • Nov 6, 2018

    • [SPARK-25714] Fix null handling in the optimizer rule BooleanSimplification.
  • Oct 16, 2018

    • Fixed a bug affecting the Union operation.
  • Sep 25, 2018

    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in the Avro data source.
  • Sep 11, 2018

    • Filter reduction should handle null values correctly.
  • Aug 28, 2018

    • Fixed a bug in the Delta Lake DELETE command that would incorrectly delete rows where the condition evaluates to null.
    • [SPARK-25114] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
  • Aug 23, 2018

    • [SPARK-24809] Serializing LongHashedRelation in executor may result in data error.
    • Fixed a nullable map issue in the Parquet reader.
    • [SPARK-25081] Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • Fixed an interaction between Databricks Delta and PySpark that could cause transient read failures.
  • June 28, 2018

    • Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs in case from that column in the table's schema.
  • June 7, 2018

    • Fixed a bug affecting the Spark SQL execution engine.
    • Improved error handling in Delta Lake.
  • May 17, 2018

    • Improved stability of reading data stored in Azure Data Lake Store.
    • Fixed a bug affecting RDD caching.
    • Fixed a bug affecting null-safe equality in Spark SQL.
    • Fixed a bug affecting certain aggregations in streaming queries.
  • Apr 24, 2018

    • Upgraded the Azure Data Lake Store SDK from 2.0.11 to 2.2.8 to improve the stability of access to Azure Data Lake Store.
    • Fixed a bug affecting the insertion of overwrites into partitioned Hive tables when spark.databricks.io.hive.fastwriter.enabled is false.
    • Fixed an issue that caused task serialization to fail.
  • Mar 09, 2018

    • Fixed an issue caused by a race condition that could, in rare circumstances, lead to the loss of some output files.
  • Mar 01, 2018

    • Improved the efficiency of handling streams that can take a long time to stop.
    • Fixed an issue affecting Python autocomplete.
    • Applied Ubuntu security patches.
    • Fixed an issue affecting certain queries using Python UDFs and window functions.
    • Fixed an issue affecting the use of UDFs on a cluster with table access control enabled.
  • Jan 29, 2018

    • Fixed an issue affecting the manipulation of tables stored in Azure Blob storage.
    • Fixed aggregation after dropDuplicates on an empty DataFrame; a sketch of the affected pattern follows this list.
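
A minimal PySpark sketch of the dropDuplicates pattern referenced under Jan 29, 2018; the schema is hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    empty = spark.createDataFrame([], "x int")  # empty DataFrame

    # Aggregating after dropDuplicates on an empty DataFrame is the case
    # this fix addresses; it should return a single row counting 0 rows.
    empty.dropDuplicates().agg(F.count("x")).show()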

Databricks Runtime 3.4 (Unsupported)

See Databricks Runtime 3.4.

  • June 7, 2018

    • Fixed a bug affecting the Spark SQL execution engine.
    • Improved error handling in Delta Lake.
  • May 17, 2018

    • Improved stability of reading data stored in Azure Data Lake Store.
    • Fixed a bug affecting RDD caching.
    • Fixed a bug affecting null-safe equality in Spark SQL.
  • Apr 24, 2018

    • Fixed a bug affecting the insertion of overwrites into partitioned Hive tables when spark.databricks.io.hive.fastwriter.enabled is false.
  • Mar 09, 2018

    • Fixed an issue caused by a race condition that could, in rare circumstances, lead to the loss of some output files.
  • Dec 13, 2017

    • Fixed an issue affecting UDFs in Scala.
    • Fixed an issue affecting the use of the Data Skipping Index on data source tables stored in non-DBFS paths.
  • Dec 07, 2017

    • Improved shuffle stability.