Submit a .NET for Apache Spark job to Azure HDInsight

There are two ways to deploy your .NET for Apache Spark job to HDInsight: spark-submit and Apache Livy.

Deploy using spark-submit

You can use the spark-submit command to submit .NET for Apache Spark jobs to Azure HDInsight.

  1. Navigate to your HDInsight Spark cluster in Azure portal, and then select SSH + Cluster login.

  2. Copy the SSH login command and paste it into a terminal. Sign in to your cluster using the password you set during cluster creation. You should see messages welcoming you to Ubuntu and Spark.

  3. Use the spark-submit command to run your app on your HDInsight cluster. Remember to replace mycontainer and mystorageaccount in the example script with the actual names of your blob container and storage account. Also, be sure to replace microsoft-spark-2.3.x-0.6.0.jar with the appropriate jar file you're using for deployment. 2.3.x represents the version of Apache Spark, and 0.6.0 represents the version of the .NET for Apache Spark worker.

    $SPARK_HOME/bin/spark-submit \
    --master yarn \
    --class org.apache.spark.deploy.dotnet.DotnetRunner \
    wasbs://mycontainer@mystorageaccount.blob.core.windows.net/microsoft-spark-2.3.x-0.6.0.jar \
    wasbs://mycontainer@mystorageaccount.blob.core.windows.net/publish.zip mySparkApp
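
The jar name encodes both version numbers, so it's easy to mistype. A minimal sketch of assembling the wasbs URI from its parts before passing it to spark-submit (all names below are placeholders; substitute your own):

```shell
# Placeholders -- replace with your actual storage details.
container=mycontainer
account=mystorageaccount
spark_version=2.3.x      # Apache Spark version on the cluster
worker_version=0.6.0     # .NET for Apache Spark worker version

# Build the jar name and the full wasbs URI used by spark-submit.
jar="microsoft-spark-${spark_version}-${worker_version}.jar"
jar_uri="wasbs://${container}@${account}.blob.core.windows.net/${jar}"
echo "$jar_uri"
```

Keeping the versions in variables makes it harder for the jar name and the cluster's Spark version to drift apart when you upgrade either one.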
    

Deploy using Apache Livy

You can use Apache Livy, a REST service for Apache Spark, to submit .NET for Apache Spark jobs to an Azure HDInsight Spark cluster. For more information, see Remote jobs with Apache Livy.

You can run the following command on Linux using curl:

curl -k -v -X POST "https://<your spark cluster>.azurehdinsight.net/livy/batches" \
-u "<hdinsight username>:<hdinsight password>" \
-H "Content-Type: application/json" \
-H "X-Requested-By: <hdinsight username>" \
-d @- << EOF
{
    "file":"abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar",
    "className":"org.apache.spark.deploy.dotnet.DotnetRunner",
    "files":["abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<some dir>/<udf assembly>", "abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<some dir>/<file>"],
    "args":["abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<some dir>/<your app>.zip","<your app>","<app arg 1>","<app arg 2>","...","<app arg n>"]
}
EOF
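
Hand-edited JSON bodies are a common source of failed Livy submissions. One way to guard against that is to write the payload to a file and validate it before POSTing, as in this sketch (the file-system, account, and path names are hypothetical placeholders):

```shell
# Sketch: write the Livy batch payload to a file so it can be
# validated locally before being sent with curl -d @batch.json.
cat > batch.json << 'EOF'
{
    "file": "abfss://myfs@myaccount.dfs.core.windows.net/mydir/microsoft-spark-2.3.x-0.6.0.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "args": ["abfss://myfs@myaccount.dfs.core.windows.net/mydir/mySparkApp.zip", "mySparkApp"]
}
EOF

# Fail fast on malformed JSON instead of getting an opaque 400 from Livy.
python3 -m json.tool batch.json > /dev/null && echo "payload OK"
```

You can then submit with the same curl command as above, replacing the heredoc with -d @batch.json.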

Next steps