Call a Java UDF from your .NET for Apache Spark application
In this article, you learn how to call a Java User-Defined Function (UDF) from your .NET for Apache Spark application. The process has two parts:
- Define your Java UDFs and compile them into a jar. Skip this step if you already have a UDF defined in a jar file; in that case, all you need is the full name of the UDF class, including the package.
- Register and call your Java UDF in your .NET for Apache Spark application.
Warning
.NET for Apache Spark targets an out-of-support version of .NET (.NET Core 3.1). For more information, see the .NET Support Policy.
Define and compile your Java UDFs
Create a Maven or SBT project and add the following dependencies into the project configuration file:
org.apache.spark.spark-core_2.11.<version>
org.apache.spark.spark-sql_2.11.<version>
Define your Java UDF by implementing the interface that matches your UDF's signature and importing the relevant package, as shown in the following simple example:
package com.ScalaUdf.app; // Name of package where UDF is defined

import org.apache.spark.sql.api.java.UDF1; // UDF interface to implement

public class JavaUdf implements UDF1<Integer, Integer> { // Name of the Java UDF
    private static final long serialVersionUID = 1L;

    @Override
    public Integer call(Integer num) throws Exception { // Define logic of UDF
        return (num + 5);
    }
}
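The interface you implement depends on how many arguments the UDF takes: a one-argument UDF implements UDF1, a two-argument UDF implements UDF2, and so on. The following is a minimal sketch of a two-argument UDF. The class name JavaUdfAdd and the helper add are hypothetical, and the nested UDF2 interface is a local stand-in that mirrors the shape of org.apache.spark.sql.api.java.UDF2 so that the sketch compiles without Spark on the classpath. In a real project, you would remove the stand-in and import Spark's interface instead.

```java
import java.io.Serializable;

// Sketch only: not part of the original sample.
class TwoArgUdfSketch {

    // Local stand-in for org.apache.spark.sql.api.java.UDF2 (assumption:
    // used here only so the example compiles without the Spark jars).
    interface UDF2<T1, T2, R> extends Serializable {
        R call(T1 t1, T2 t2) throws Exception;
    }

    // Hypothetical two-argument UDF: returns the sum of its inputs.
    static class JavaUdfAdd implements UDF2<Integer, Integer, Integer> {
        private static final long serialVersionUID = 1L;

        @Override
        public Integer call(Integer left, Integer right) throws Exception {
            return left + right;
        }
    }

    // Convenience wrapper that converts the checked exception declared by
    // call(...) into an unchecked one for simple local use.
    static int add(int left, int right) {
        try {
            return new JavaUdfAdd().call(left, right);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(add(2, 5)); // prints 7
    }
}
```

The type parameters list the argument types first and the return type last, so UDF2<Integer, Integer, Integer> describes a function that takes two integers and returns an integer.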
Compile and package your project to create an executable jar, for example UdfApp-0.0.1.jar.
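Because a Java UDF is an ordinary class, its logic can be exercised with a plain main method before the jar is handed to Spark. The sketch below is one way to do that, under stated assumptions: the class JavaUdfLocalCheck and the helper applyToAll are hypothetical, and the nested UDF1 interface is a local stand-in with the same shape as org.apache.spark.sql.api.java.UDF1 so that the check runs without the Spark jars.

```java
import java.io.Serializable;
import java.util.Arrays;

// Sketch only: a local check of the UDF logic, run without Spark.
class JavaUdfLocalCheck {

    // Local stand-in for org.apache.spark.sql.api.java.UDF1 (assumption:
    // present only so this check compiles without Spark on the classpath).
    interface UDF1<T1, R> extends Serializable {
        R call(T1 t1) throws Exception;
    }

    // Same logic as the JavaUdf class in this article: add 5 to the input.
    static class JavaUdf implements UDF1<Integer, Integer> {
        private static final long serialVersionUID = 1L;

        @Override
        public Integer call(Integer num) throws Exception {
            return num + 5;
        }
    }

    // Hypothetical helper: applies the UDF to every element of the input.
    static int[] applyToAll(int[] inputs) {
        JavaUdf udf = new JavaUdf();
        int[] results = new int[inputs.length];
        try {
            for (int i = 0; i < inputs.length; i++) {
                results[i] = udf.call(inputs[i]);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        int[] results = applyToAll(new int[] {2, 5});
        System.out.println(Arrays.toString(results)); // prints [7, 10]
    }
}
```

A check like this only covers the function body. It does not verify that Spark can load the class from the jar or that the registered name resolves in Spark SQL.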
Register and call Java UDFs in .NET for Apache Spark
Use the RegisterJava API to register your Java UDF with Spark SQL.
Register the DataFrame on which you want to call your UDF as an SQL table using the CreateOrReplaceTempView function.
Use SparkSession.Sql to call the UDF on the table view using Spark SQL.
A basic example to illustrate the above steps:
using Microsoft.Spark.Sql;

class Program
{
    static void Main()
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("Scala/Java UDFs from .NET for Apache Spark")
            .GetOrCreate();

        // Register your Java UDF as 'udfAdd5'
        spark.Udf().RegisterJava<int>("udfAdd5", "com.ScalaUdf.app.JavaUdf");

        DataFrame df = spark.CreateDataFrame(new int[] { 2, 5 });

        // Create an SQL table from the DataFrame `df`
        df.CreateOrReplaceTempView("numbersData");

        // Call the registered UDF on the table
        DataFrame dfUdf = spark.Sql("SELECT udfAdd5(_1) As Result FROM numbersData");
        dfUdf.Show();

        spark.Stop();
    }
}
Submit this application using spark-submit by passing the previously compiled Java UDF jar through the --jars option:
spark-submit --master local --jars UdfApp-0.0.1.jar --class org.apache.spark.deploy.dotnet.DotnetRunner microsoft-spark-2-4_2.11-1.0.0.jar InterRuntimeUDFs.exe
The resulting dfUdf DataFrame has the number 5 added to each row of the input column, as defined by JavaUdf:
+-------+
| Result|
+-------+
|      7|
|     10|
+-------+
Call .NET UDF from Scala or Python in Apache Spark
You can also register and invoke a C# UDF from an Apache Spark application written in Scala or Python using the sparkdotnetudf open source tool.