從一或多個資料表抓取結果集。Retrieves result sets from one or more tables.


[ WITH with_query [ , ... ] ]
select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
  [ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] } ]
  [ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] } ]
  [ CLUSTER BY { expression [ , ... ] } ]
  [ DISTRIBUTE BY { expression [, ... ] } ]
  [ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
  [ LIMIT { ALL | expression } ]

While select_statement 定義為While select_statement is defined as

SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
  FROM { from_item [ , ... ] }
  [ PIVOT clause ]
  [ LATERAL VIEW clause ] [ ... ]
  [ WHERE boolean_expression ]
  [ GROUP BY expression [ , ... ] ]
  [ HAVING boolean_expression ]


  • with_querywith_query

    在主要查詢區塊之前的一或多個 通用資料表運算式 。稍後在子句中可以參考這些資料表運算式 FROMOne or more common table expressions before the main query block.These table expressions can be referenced later in the FROM clause. 這有助於在子句中抽象化重複的子查詢區塊 FROM ,並提升查詢的可讀性。This is useful to abstract out repeated subquery blocks in the FROM clause and improves readability of the query.

  • 提示hints

    提示可協助 Spark 優化工具做出更佳的規劃決策。Hints help the Spark optimizer make better planning decisions. Spark 支援提示,這些提示會影響資料的選取聯結策略和重新分割。Spark supports hints that influence selection of join strategies and repartitioning of the data.


    從關聯中選取所有相符的資料列。Select all matching rows from the relation. 預設為啟用。Enabled by default.


    移除結果中的重複專案之後,從關聯中選取所有相符的資料列。Select all matching rows from the relation after removing duplicates in results.

  • named_expressionnamed_expression

    具有已指派名稱的運算式。An expression with an assigned name. 表示資料行運算式。Denotes a column expression.

    語法:expression [AS] [alias]Syntax: expression [AS] [alias]

  • from_itemfrom_item

    查詢的輸入來源。A source of input for the query. 下列其中之一:One of the following:


    用於樞紐分析圖;您可以根據特定資料行值取得匯總的值。Used for data perspective; you can get the aggregated values based on specific column value.


    用來搭配產生器函式(例如 EXPLODE ),以產生包含一或多個資料列的虛擬資料表。Used in conjunction with generator functions such as EXPLODE, which generates a virtual table containing one or more rows. LATERAL VIEW 將資料列套用到每個原始輸出資料列。LATERAL VIEW applies the rows to each original output row.


    根據提供的述詞篩選子句的結果 FROMFilters the result of the FROM clause based on the supplied predicates.


    用來將資料列分組的運算式。The expressions that are used to group the rows. 這會與彙總函式搭配使用 (MINMAXCOUNT 、) , SUM AVG 以根據群組運算式和每個群組中的匯總值來群組資料列。This is used in conjunction with aggregate functions (MIN, MAX, COUNT, SUM, AVG) to group rows based on the grouping expressions and aggregate values in each group. FILTER 子句附加至彙總函式時,只會將相符的資料列傳遞給該函數。When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.


    由所產生之資料列篩選的述詞 GROUP BYThe predicates by which the rows produced by GROUP BY are filtered. HAVING子句是用來在執行群組之後篩選資料列。The HAVING clause is used to filter rows after the grouping is performed. 如果您指定 [否] HAVING GROUP BY ,則表示 GROUP BY 沒有群組運算式 (全域匯總) 。If you specify HAVING without GROUP BY, it indicates a GROUP BY without grouping expressions (global aggregate).

  • 排序依據ORDER BY

    查詢完整結果集的資料列排序。An ordering of the rows of the complete result set of the query. 輸出資料列會在資料分割之間排序。The output rows are ordered across the partitions. 此參數與 SORT BYCLUSTER BY 、和互斥, DISTRIBUTE BY 不能同時指定。This parameter is mutually exclusive with SORT BY, CLUSTER BY, and DISTRIBUTE BY and cannot be specified together.

  • 排序依據SORT BY

    在每個資料分割內排序資料列的順序。An ordering by which the rows are ordered within each partition. 此參數與和互斥, ORDER BY CLUSTER BY 而且不能同時指定。This parameter is mutually exclusive with ORDER BY and CLUSTER BY and cannot be specified together.

  • 叢集依據CLUSTER BY

    一組用來重新分割和排序資料列的運算式。A set of expressions that is used to repartition and sort the rows. 使用這個子句和一起使用的效果相同 DISTRIBUTE BY SORT BYUsing this clause has the same effect of using DISTRIBUTE BY and SORT BY together.


    用來重新分割結果資料列的一組運算式。A set of expressions by which the result rows are repartitioned. 此參數與和互斥, ORDER BY CLUSTER BY 而且不能同時指定。This parameter is mutually exclusive with ORDER BY and CLUSTER BY and cannot be specified together.

  • 限制LIMIT

    語句或子查詢可以傳回的最大資料列數。The maximum number of rows that can be returned by a statement or subquery. 這個子句大部分是與一起用 ORDER BY 來產生決定性的結果。This clause is mostly used in the conjunction with ORDER BY to produce a deterministic result.

  • boolean_expressionboolean_expression

    評估為結果型別的任何運算式 BooleanAny expression that evaluates to a result type Boolean. 您可以使用邏輯運算子 (、) ,將兩個或多個運算式結合在一起 AND ORYou can combine two or more expressions together using the logical operators ( AND, OR ).

  • expressionexpression

    評估為值的一或多個值、運算子和 SQL 函數的組合。A combination of one or more values, operators, and SQL functions that evaluates to a value.

  • named_windownamed_window

    一或多個來源視窗規格的別名。Aliases for one or more source window specifications. 您可以在查詢的視窗定義中參考來源視窗規格。The source window specifications can be referenced in the window definitions in the query.