表值函数Table-valued functions

表值函数 (TVF) 是返回关系或一组行的函数。A table-valued function (TVF) is a function that returns a relation or a set of rows. Spark SQL 中有两种类型的 Tvf:There are two types of TVFs in Spark SQL:

  • 可在子句中指定的 TVF FROM ,例如 rangeTVF that can be specified in a FROM clause, for example, range.
  • 可在和子句中指定的 TVF SELECT LATERAL VIEW ,例如 explodeTVF that can be specified in SELECT and LATERAL VIEW clauses, for example, explode.

语法Syntax

function_name ( expression [ , ... ] ) [ table_alias ]

参数Parameters

  • expressionexpression

    指定导致值的一个或多个值、运算符和 SQL 函数的组合。Specifies a combination of one or more values, operators and SQL functions that results in a value.

  • table_aliastable_alias

    指定包含可选列名称列表的临时名称。Specifies a temporary name with an optional column name list.

    语法:[ AS ] table_name [ ( column_name [ , ... ] ) ]Syntax: [ AS ] table_name [ ( column_name [ , ... ] ) ]

支持的表值函数Supported table-valued functions

可在子句中指定的 Tvf FROMTVFs that can be specified in FROM clauses

函数Function 参数类型 (s) Argument Type(s) 说明Description
范围 (结束) range (end) LongLong 创建一个表,其中包含一个名为 id 的 LongType 列,其中包含的行范围介于0到 end (独占) ,步长值为1。Creates a table with a single LongType column named id, containing rows in a range from 0 to end (exclusive) with step value 1.
范围 (开始、结束) range (start, end) Long、LongLong, Long 创建一个表,其中包含一个名为 id 的 LongType 列,其中包含从开始到端的范围内的行, (独占) ,步骤值为1。Creates a table with a single LongType column named id, containing rows in a range from start to end (exclusive) with step value 1.
范围 (开始、结束、步骤) range (start, end, step) Long、Long、LongLong, Long, Long 创建一个表,其中包含一个名为 id 的 LongType 列,其中包含从开始到端 (具有步长值的独占) 的范围内的行。Creates a table with a single LongType column named id, containing rows in a range from start to end (exclusive) with step value.
范围 (开始、结束、步骤、numPartitions) range (start, end, step, numPartitions) Long、Long、Long、IntLong, Long, Long, Int 创建一个表,其中包含一个名为 id 的 LongType 列,其中包含从开始到端的范围内的行 (具有步长值的独占) ,并指定了分区号 numPartitions。Creates a table with a single LongType column named id, containing rows in a range from start to end (exclusive) with step value, with partition number numPartitions specified.

可在 SELECT 和子句中指定的 LATERAL VIEW tvfTVFs that can be specified in SELECT and LATERAL VIEW clauses

函数Function 参数类型 (s) Argument Type(s) 说明Description
(expr 的分解) explode (expr) 数组/映射Array/Map 将 array expr 的元素分为多行,或将 map expr 的元素分为多个行和列。Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. 除非另行指定,否则将为数组的元素或键的元素使用默认列名列,并为映射的元素使用值。Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.
explode_outer (expr) explode_outer (expr) 数组/映射Array/Map 将 array expr 的元素分为多行,或将 map expr 的元素分为多个行和列。Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. 除非另行指定,否则将为数组的元素或键的元素使用默认列名列,并为映射的元素使用值。Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.
内联 (expr) inline (expr) ExpressionExpression 将结构数组分解为一个表。Explodes an array of structs into a table. 默认情况下,除非另外指定,否则使用列名称 col1、col2 等。Uses column names col1, col2, etc. by default unless specified otherwise.
inline_outer (expr) inline_outer (expr) ExpressionExpression 将结构数组分解为一个表。Explodes an array of structs into a table. 默认情况下,除非另外指定,否则使用列名称 col1、col2 等。Uses column names col1, col2, etc. by default unless specified otherwise.
posexplode (expr) posexplode (expr) 数组/映射Array/Map 将 array expr 的元素分为多行和位置,或者将 map expr 的元素分为多个行和列。Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. 除非另行指定,否则将列名的位置、数组的元素的列或键的元素以及映射的元素的值。Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map.
posexplode_outer (expr) posexplode_outer (expr) 数组/映射Array/Map 将 array expr 的元素分为多行和位置,或者将 map expr 的元素分为多个行和列。Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. 除非另行指定,否则将列名的位置、数组的元素的列或键的元素以及映射的元素的值。Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map.
stack (n,表达式1,...,exprk) stack (n, expr1, …, exprk) Seq [Expression]Seq[Expression] 将表达式1,...,exprk 拆分为 n 行。Separates expr1, …, exprk into n rows. 默认情况下,除非另外指定,否则使用列名称 col0、col1 等。Uses column names col0, col1, etc. by default unless specified otherwise.
json_tuple (jsonStr、p1、p2、...、pn) json_tuple (jsonStr, p1, p2, …, pn) Seq [Expression]Seq[Expression] 返回 get_json_object 函数之类的元组,但它采用多个名称。Returns a tuple like the function get_json_object, but it takes multiple names. 所有输入参数和输出列类型均为字符串。All the input parameters and output column types are string.
parse_url (url,partToExtract [,key] ) parse_url (url, partToExtract[, key] ) Seq [Expression]Seq[Expression] 从 URL 中提取部件。Extracts a part from a URL.

示例Examples

-- range call with end
SELECT * FROM range(6 + cos(3));
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+

-- range call with start and end
SELECT * FROM range(5, 10);
+---+
| id|
+---+
|  5|
|  6|
|  7|
|  8|
|  9|
+---+

-- range call with numPartitions
SELECT * FROM range(0, 10, 2, 200);
+---+
| id|
+---+
|  0|
|  2|
|  4|
|  6|
|  8|
+---+

-- range call with a table alias
SELECT * FROM range(5, 8) AS test;
+---+
| id|
+---+
|  5|
|  6|
|  7|
+---+

SELECT explode(array(10, 20));
+---+
|col|
+---+
| 10|
| 20|
+---+

SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
+----+----+
|col1|col2|
+----+----+
|   1|   a|
|   2|   b|
+----+----+

SELECT posexplode(array(10,20));
+---+---+
|pos|col|
+---+---+
|  0| 10|
|  1| 20|
+---+---+

SELECT stack(2, 1, 2, 3);
+----+----+
|col0|col1|
+----+----+
|   1|   2|
|   3|null|
+----+----+

SELECT json_tuple('{"a":1, "b":2}', 'a', 'b');
+---+---+
| c0| c1|
+---+---+
|  1|  2|
+---+---+

SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST');
+-----------------------------------------------------+
|parse_url(http://spark.apache.org/path?query=1, HOST)|
+-----------------------------------------------------+
|                                     spark.apache.org|
+-----------------------------------------------------+

-- Use explode in a LATERAL VIEW clause
CREATE TABLE test (c1 INT);
INSERT INTO test VALUES (1);
INSERT INTO test VALUES (2);
SELECT * FROM test LATERAL VIEW explode (ARRAY(3,4)) AS c2;
+--+--+
|c1|c2|
+--+--+
| 1| 3|
| 1| 4|
| 2| 3|
| 2| 4|
+--+--+