I would like to run spatial queries on large data sets; e.g. geopandas would be too slow. Inspiration I found here: https://anant-sharma.medium.com/apache-sedona-geospark-using-pyspark-e60485318fbe
But I have trouble registering the spatial functions I would like to use in SparkSQL (or PySpark).
In Spark Pool of Synapse Analytics I prepared (via Azure Portal):
Apache Spark Pool / Settings / Packages / Requirement files / requirement.txt: apache-sedona
Apache Spark Pool / Settings / Packages / Workspace packages:
geotools-wrapper-geotools-24.1.jar
sedona-sql-3.0_2.12-1.2.0-incubating.jar
Apache Spark Pool / Settings / Packages / Spark configuration / config.txt:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
Pyspark Notebook:
print(spark.version)
print(spark.conf.get("spark.kryo.registrator"))
print(spark.conf.get("spark.serializer"))
Print output from notebook:
3.1.2.5.0-58001107
org.apache.sedona.core.serde.SedonaKryoRegistrator
org.apache.spark.serializer.KryoSerializer
Then trying:
from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import SedonaKryoRegistrator, KryoSerializer
spark = SparkSession.builder.master("local[*]").appName("Sedona App").config("spark.serializer", KryoSerializer.getName).config("spark.kryo.registrator", SedonaKryoRegistrator.getName).getOrCreate()
SedonaRegistrator.registerAll(spark)
But it failed: Py4JJavaError: An error occurred while calling o636.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: org.apache.spark.SparkException: Failed to register classes with Kryo
A simple check that stuff is correctly installed would probaly allow this:
%%sql
SELECT ST_Point(0,0);
Please help with getting the spatial functions registered in pyspark running in Synapse notebook!

or upvote
button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is