提示Hints

建议用于生成执行计划的特定方法。Suggest specific approaches to generate an execution plan.

语法Syntax

/*+ hint [ , ... ] */

分区提示Partitioning hints

分区提示允许您建议 SQL Analytics 应遵循的分区策略。Partitioning hints allow you to suggest a partitioning strategy that SQL Analytics should follow. COALESCEREPARTITIONREPARTITION_BY_RANGE 提示均受支持,并分别等效于 coalescerepartitionrepartitionByRange 数据集 api。COALESCE, REPARTITION, and REPARTITION_BY_RANGE hints are supported and are equivalent to coalesce, repartition, and repartitionByRange Dataset APIs, respectively. 这些提示使你可以优化性能并控制输出文件的数量。These hints give you a way to tune performance and control the number of output files. 如果指定了多个分区提示,则会将多个节点插入到逻辑计划中,但优化器会选取最左侧的提示。When multiple partitioning hints are specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer.

分区提示类型Partitioning hint types

  • COALESCECOALESCE

    将分区数减少到指定的分区数。Reduce the number of partitions to the specified number of partitions. 它采用分区号作为参数。It takes a partition number as a parameter.

  • 对流REPARTITION

    使用指定的分区表达式将分区重新分区为指定数量的分区。Repartition to the specified number of partitions using the specified partitioning expressions. 它采用分区号和/或列名称作为参数。It takes a partition number, column names, or both as parameters.

  • REPARTITION_BY_RANGEREPARTITION_BY_RANGE

    使用指定的分区表达式将分区重新分区为指定数量的分区。Repartition to the specified number of partitions using the specified partitioning expressions. 它采用列名称和可选的分区号作为参数。It takes column names and an optional partition number as parameters.

示例Examples

SELECT /*+ COALESCE(3) */ * FROM t;

SELECT /*+ REPARTITION(3) */ * FROM t;

SELECT /*+ REPARTITION(c) */ * FROM t;

SELECT /*+ REPARTITION(3, c) */ * FROM t;

SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t;

SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t;

-- multiple partitioning hints
EXPLAIN EXTENDED SELECT /*+ REPARTITION(100), COALESCE(500), REPARTITION_BY_RANGE(3, c) */ * FROM t;
== Parsed Logical Plan ==
'UnresolvedHint REPARTITION, [100]
+- 'UnresolvedHint COALESCE, [500]
   +- 'UnresolvedHint REPARTITION_BY_RANGE, [3, 'c]
      +- 'Project [*]
         +- 'UnresolvedRelation [t]

== Analyzed Logical Plan ==
name: string, c: int
Repartition 100, true
+- Repartition 500, false
   +- RepartitionByExpression [c#30 ASC NULLS FIRST], 3
      +- Project [name#29, c#30]
         +- SubqueryAlias spark_catalog.default.t
            +- Relation[name#29,c#30] parquet

== Optimized Logical Plan ==
Repartition 100, true
+- Relation[name#29,c#30] parquet

== Physical Plan ==
Exchange RoundRobinPartitioning(100), false, [id=#121]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t[name#29,c#30] Batched: true, DataFilters: [], Format: Parquet,
      Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [],
      PushedFilters: [], ReadSchema: struct<name:string>

联接提示Join hints

联接提示允许您建议 SQL Analytics 应使用的联接策略。Join hints allow you to suggest the join strategy that SQL Analytics should use. 如果在联接的两个端指定了不同的联接策略提示,则 SQL Analytics 会按以下顺序划分提示: "高于" BROADCAST MERGE SHUFFLE_HASH SHUFFLE_REPLICATE_NLWhen different join strategy hints are specified on both sides of a join, SQL Analytics prioritizes hints in the following order: BROADCAST over MERGE over SHUFFLE_HASH over SHUFFLE_REPLICATE_NL. 如果两个端都是用 BROADCAST 提示或提示指定的 SHUFFLE_HASH ,则 SQL Analytics 会根据联接类型和关系大小选取生成端。When both sides are specified with the BROADCAST hint or the SHUFFLE_HASH hint, SQL Analytics picks the build side based on the join type and the sizes of the relations. 由于给定的策略可能不支持所有联接类型,因此不保证 SQL Analytics 使用提示建议的联接策略。Since a given strategy may not support all join types, SQL Analytics is not guaranteed to use the join strategy suggested by the hint.

联接提示类型Join hint types

  • 公布BROADCAST

    使用广播联接。Use broadcast join. 与提示的联接端是广播,而不考虑 autoBroadcastJoinThresholdThe join side with the hint is broadcast regardless of autoBroadcastJoinThreshold. 如果联接的两个端都具有广播提示,则根据统计) ,其大小较小的 (会广播。If both sides of the join have the broadcast hints, the one with the smaller size (based on stats) is broadcast. 的别名为 BROADCAST BROADCASTJOINMAPJOINThe aliases for BROADCAST are BROADCASTJOIN and MAPJOIN.

  • MERGEMERGE

    使用无序排序合并联接。Use shuffle sort merge join. 的别名为 MERGE SHUFFLE_MERGEMERGEJOINThe aliases for MERGE are SHUFFLE_MERGE and MERGEJOIN.

  • SHUFFLE_HASHSHUFFLE_HASH

    使用无序哈希联接。Use shuffle hash join. 如果两端都有无序哈希提示,SQL Analytics 会根据统计信息) 的生成端选择较小的侧 (。If both sides have the shuffle hash hints, SQL Analytics chooses the smaller side (based on stats) as the build side.

  • SHUFFLE_REPLICATE_NLSHUFFLE_REPLICATE_NL

    使用无序同步和复制嵌套循环联接。Use shuffle-and-replicate nested loop join.

示例Examples

-- Join Hints for broadcast join
SELECT /*+ BROADCAST(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ BROADCASTJOIN (t1) */ * FROM t1 left JOIN t2 ON t1.key = t2.key;
SELECT /*+ MAPJOIN(t2) */ * FROM t1 right JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle sort merge join
SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle hash join
SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle-and-replicate nested loop join
SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- When different join strategy hints are specified on both sides of a join, SQL Analytics
-- prioritizes the BROADCAST hint over the MERGE hint over the SHUFFLE_HASH hint
-- over the SHUFFLE_REPLICATE_NL hint.
-- SQL Analytics will issue Warning in the following example
-- org.apache.spark.sql.catalyst.analysis.HintErrorLogger: Hint (strategy=merge)
-- is overridden by another hint and will not take effect.
SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

倾斜提示Skew hints

Azure Databricks) 上的 (增量 Lake 的详细信息,请参阅 " 倾斜联接优化 " SKEW(Delta Lake on Azure Databricks) See Skew Join optimization for information about the SKEW hint.