提示Hints
建议用于生成执行计划的特定方法。Suggest specific approaches to generate an execution plan.
语法Syntax
/*+ hint [ , ... ] */
分区提示Partitioning hints
分区提示允许您建议 SQL Analytics 应遵循的分区策略。Partitioning hints allow you to suggest a partitioning strategy that SQL Analytics should follow.
COALESCE
、 REPARTITION
和 REPARTITION_BY_RANGE
提示均受支持,并分别等效于 coalesce
、 repartition
和 repartitionByRange
数据集 api。COALESCE
, REPARTITION
, and REPARTITION_BY_RANGE
hints are supported and are equivalent to coalesce
, repartition
, and repartitionByRange
Dataset APIs, respectively. 这些提示使你可以优化性能并控制输出文件的数量。These hints give you a way to tune performance and control the number of output files. 如果指定了多个分区提示,则会将多个节点插入到逻辑计划中,但优化器会选取最左侧的提示。When multiple partitioning hints are specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer.
分区提示类型Partitioning hint types
COALESCECOALESCE
将分区数减少到指定的分区数。Reduce the number of partitions to the specified number of partitions. 它采用分区号作为参数。It takes a partition number as a parameter.
对流REPARTITION
使用指定的分区表达式将分区重新分区为指定数量的分区。Repartition to the specified number of partitions using the specified partitioning expressions. 它采用分区号和/或列名称作为参数。It takes a partition number, column names, or both as parameters.
REPARTITION_BY_RANGEREPARTITION_BY_RANGE
使用指定的分区表达式将分区重新分区为指定数量的分区。Repartition to the specified number of partitions using the specified partitioning expressions. 它采用列名称和可选的分区号作为参数。It takes column names and an optional partition number as parameters.
示例Examples
SELECT /*+ COALESCE(3) */ * FROM t;
SELECT /*+ REPARTITION(3) */ * FROM t;
SELECT /*+ REPARTITION(c) */ * FROM t;
SELECT /*+ REPARTITION(3, c) */ * FROM t;
SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t;
SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t;
-- multiple partitioning hints
EXPLAIN EXTENDED SELECT /*+ REPARTITION(100), COALESCE(500), REPARTITION_BY_RANGE(3, c) */ * FROM t;
== Parsed Logical Plan ==
'UnresolvedHint REPARTITION, [100]
+- 'UnresolvedHint COALESCE, [500]
+- 'UnresolvedHint REPARTITION_BY_RANGE, [3, 'c]
+- 'Project [*]
+- 'UnresolvedRelation [t]
== Analyzed Logical Plan ==
name: string, c: int
Repartition 100, true
+- Repartition 500, false
+- RepartitionByExpression [c#30 ASC NULLS FIRST], 3
+- Project [name#29, c#30]
+- SubqueryAlias spark_catalog.default.t
+- Relation[name#29,c#30] parquet
== Optimized Logical Plan ==
Repartition 100, true
+- Relation[name#29,c#30] parquet
== Physical Plan ==
Exchange RoundRobinPartitioning(100), false, [id=#121]
+- *(1) ColumnarToRow
+- FileScan parquet default.t[name#29,c#30] Batched: true, DataFilters: [], Format: Parquet,
Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [],
PushedFilters: [], ReadSchema: struct<name:string>
联接提示Join hints
联接提示允许您建议 SQL Analytics 应使用的联接策略。Join hints allow you to suggest the join strategy that SQL Analytics should use. 如果在联接的两个端指定了不同的联接策略提示,则 SQL Analytics 会按以下顺序划分提示: "高于" BROADCAST
MERGE
SHUFFLE_HASH
SHUFFLE_REPLICATE_NL
。When different join strategy hints are specified on both sides of a join, SQL Analytics prioritizes hints in the following order: BROADCAST
over MERGE
over SHUFFLE_HASH
over SHUFFLE_REPLICATE_NL
. 如果两个端都是用 BROADCAST
提示或提示指定的 SHUFFLE_HASH
,则 SQL Analytics 会根据联接类型和关系大小选取生成端。When both sides are specified with the BROADCAST
hint or the SHUFFLE_HASH
hint, SQL Analytics picks the build side based on the join type and the sizes of the relations. 由于给定的策略可能不支持所有联接类型,因此不保证 SQL Analytics 使用提示建议的联接策略。Since a given strategy may not support all join types, SQL Analytics is not guaranteed to use the join strategy suggested by the hint.
联接提示类型Join hint types
公布BROADCAST
使用广播联接。Use broadcast join. 与提示的联接端是广播,而不考虑
autoBroadcastJoinThreshold
。The join side with the hint is broadcast regardless ofautoBroadcastJoinThreshold
. 如果联接的两个端都具有广播提示,则根据统计) ,其大小较小的 (会广播。If both sides of the join have the broadcast hints, the one with the smaller size (based on stats) is broadcast. 的别名为BROADCAST
BROADCASTJOIN
和MAPJOIN
。The aliases forBROADCAST
areBROADCASTJOIN
andMAPJOIN
.MERGEMERGE
使用无序排序合并联接。Use shuffle sort merge join. 的别名为
MERGE
SHUFFLE_MERGE
和MERGEJOIN
。The aliases forMERGE
areSHUFFLE_MERGE
andMERGEJOIN
.SHUFFLE_HASHSHUFFLE_HASH
使用无序哈希联接。Use shuffle hash join. 如果两端都有无序哈希提示,SQL Analytics 会根据统计信息) 的生成端选择较小的侧 (。If both sides have the shuffle hash hints, SQL Analytics chooses the smaller side (based on stats) as the build side.
SHUFFLE_REPLICATE_NLSHUFFLE_REPLICATE_NL
使用无序同步和复制嵌套循环联接。Use shuffle-and-replicate nested loop join.
示例Examples
-- Join Hints for broadcast join
SELECT /*+ BROADCAST(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ BROADCASTJOIN (t1) */ * FROM t1 left JOIN t2 ON t1.key = t2.key;
SELECT /*+ MAPJOIN(t2) */ * FROM t1 right JOIN t2 ON t1.key = t2.key;
-- Join Hints for shuffle sort merge join
SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
-- Join Hints for shuffle hash join
SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
-- Join Hints for shuffle-and-replicate nested loop join
SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
-- When different join strategy hints are specified on both sides of a join, SQL Analytics
-- prioritizes the BROADCAST hint over the MERGE hint over the SHUFFLE_HASH hint
-- over the SHUFFLE_REPLICATE_NL hint.
-- SQL Analytics will issue Warning in the following example
-- org.apache.spark.sql.catalyst.analysis.HintErrorLogger: Hint (strategy=merge)
-- is overridden by another hint and will not take effect.
SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
倾斜提示Skew hints
Azure Databricks) 上的 (增量 Lake 的详细信息,请参阅 " 倾斜联接优化 " SKEW
。(Delta Lake on Azure Databricks) See Skew Join optimization for information about the SKEW
hint.