用户定义的标量函数 (Udf) User-defined scalar functions (UDFs)

用户定义的标量函数 (Udf) 是在一行上操作的用户可编程的例程。User-defined scalar functions (UDFs) are user-programmable routines that act on one row. 此文档列出了创建和注册 Udf 所需的类。This documentation lists the classes that are required for creating and registering UDFs. 它还包含演示如何定义和注册 Udf 以及如何在 Spark SQL 中调用它们的示例。It also contains examples that demonstrate how to define and register UDFs and invoke them in Spark SQL.

UserDefinedFunctionUserDefinedFunction class

若要定义用户定义函数的属性,可以使用此类中定义的某些方法。To define the properties of a user-defined function, you can use some of the methods defined in this class.

  • asNonNullable ( # A1: UserDefinedFunction:更新 UserDefinedFunction 为不可为 null。asNonNullable(): UserDefinedFunction: Updates UserDefinedFunction to non-nullable.
  • asNondeterministic ( # A1: UserDefinedFunction:更新 UserDefinedFunction 为不确定性。asNondeterministic(): UserDefinedFunction: Updates UserDefinedFunction to nondeterministic.
  • withName (名称: String) : UserDefinedFunctionUserDefinedFunction 具有给定名称的更新。withName(name: String): UserDefinedFunction: Updates UserDefinedFunction with a given name.

示例Examples

ScalaScala

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession
      .builder()
      .appName("Spark SQL UDF scalar example")
      .getOrCreate()

// Define and register a zero-argument non-deterministic UDF
// UDF is deterministic by default, i.e. produces the same result for the same input.
val random = udf(() => Math.random())
spark.udf.register("random", random.asNondeterministic())
spark.sql("SELECT random()").show()
// +-------+
// |UDF()  |
// +-------+
// |xxxxxxx|
// +-------+

// Define and register a one-argument UDF
val plusOne = udf((x: Int) => x + 1)
spark.udf.register("plusOne", plusOne)
spark.sql("SELECT plusOne(5)").show()
// +------+
// |UDF(5)|
// +------+
// |     6|
// +------+

// Define a two-argument UDF and register it with Spark in one step
spark.udf.register("strLenScala", (_: String).length + (_: Int))
spark.sql("SELECT strLenScala('test', 1)").show()
// +--------------------+
// |strLenScala(test, 1)|
// +--------------------+
// |                   5|
// +--------------------+

// UDF in a WHERE clause
spark.udf.register("oneArgFilter", (n: Int) => { n > 5 })
spark.range(1, 10).createOrReplaceTempView("test")
spark.sql("SELECT * FROM test WHERE oneArgFilter(id)").show()
// +---+
// | id|
// +---+
// |  6|
// |  7|
// |  8|
// |  9|
// +---+

JavaJava

import org.apache.spark.sql.*;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import static org.apache.spark.sql.functions.udf;
import org.apache.spark.sql.types.DataTypes;

SparkSession spark = SparkSession
      .builder()
      .appName("Java Spark SQL UDF scalar example")
      .getOrCreate();

// Define and register a zero-argument non-deterministic UDF
// UDF is deterministic by default, i.e. produces the same result for the same input.
UserDefinedFunction random = udf(
  () -> Math.random(), DataTypes.DoubleType
);
random.asNondeterministic();
spark.udf().register("random", random);
spark.sql("SELECT random()").show();
// +-------+
// |UDF()  |
// +-------+
// |xxxxxxx|
// +-------+

// Define and register a one-argument UDF
spark.udf().register("plusOne", new UDF1<Integer, Integer>() {
  @Override
  public Integer call(Integer x) {
    return x + 1;
  }
}, DataTypes.IntegerType);
spark.sql("SELECT plusOne(5)").show();
// +----------+
// |plusOne(5)|
// +----------+
// |         6|
// +----------+

// Define and register a two-argument UDF
UserDefinedFunction strLen = udf(
  (String s, Integer x) -> s.length() + x, DataTypes.IntegerType
);
spark.udf().register("strLen", strLen);
spark.sql("SELECT strLen('test', 1)").show();
// +------------+
// |UDF(test, 1)|
// +------------+
// |           5|
// +------------+

// UDF in a WHERE clause
spark.udf().register("oneArgFilter", new UDF1<Long, Boolean>() {
  @Override
  public Boolean call(Long x) {
    return  x > 5;
  }
}, DataTypes.BooleanType);
spark.range(1, 10).createOrReplaceTempView("test");
spark.sql("SELECT * FROM test WHERE oneArgFilter(id)").show();
// +---+
// | id|
// +---+
// |  6|
// |  7|
// |  8|
// |  9|
// +---+