I am limited to a small Apache Spark pool configuration: 4 vCores / 32 GB, 3 to 10 nodes.
I am trying to load all XML files from a given folder with the code below:
spark.read.format("com.databricks.spark.xml").option("rowTag","Quality").load("/mnt/dev/tmp/xml/100_file/M*.xml")
But there are more than a thousand files in the folder, and my small Synapse Spark pool with 32 GB of RAM cannot handle that many files efficiently. What I want is to read only the first 100 files in the first round, then the next 100, and so on.
Is there any API function that allows me to do this?
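The batching I have in mind would look roughly like this (a sketch, not working code: it assumes the full list of file paths has already been gathered, e.g. with `mssparkutils.fs.ls` in a Synapse notebook, and that `load()` is given each batch as an explicit list of paths):

```python
# Sketch: read XML files in batches of 100 by passing explicit path lists to load().

def chunked(paths, size=100):
    """Yield successive batches of at most `size` paths."""
    for i in range(0, len(paths), size):
        yield paths[i:i + size]

def read_in_batches(spark, paths, size=100):
    # `paths` is the full list of XML file paths, e.g. gathered with
    # mssparkutils.fs.ls("/mnt/dev/tmp/xml/100_file") in a Synapse notebook
    # (an assumption about the listing utility available in your environment).
    for batch in chunked(paths, size):
        df = (spark.read.format("com.databricks.spark.xml")
                   .option("rowTag", "Quality")
                   .load(batch))  # PySpark's load() accepts a list of paths
        yield df
```

Each yielded DataFrame could then be processed and written out before the next batch is read, keeping memory pressure on the small pool low.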
Hi @Dheeraj ,
You can use the limit(100) function to get 100 files and achieve the same.
Please let me know if you have any other questions.
Thanks
Saurabh
Hi,
I realize this is a bit late in this thread, but I'm struggling to get the line that loads the XML files working:
spark.read.format("com.databricks.spark.xml").option("rowTag","Quality").load("/mnt/dev/tmp/xml/100_file/M*.xml")
How did you install the package that supports this format in the Synapse Analytics Spark pool? I'm getting this error:
"java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml"
Regards,
LJ
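For reference, the session-level workaround I have been trying looks like this (a sketch; the Maven coordinate and version are my assumptions, and should be checked against the Scala/Spark version of your pool — alternatively, Synapse lets you upload the jar as a workspace package):

```python
# Assumed Maven coordinate for spark-xml built against Scala 2.12;
# pick the release that matches your pool's Spark/Scala version.
SPARK_XML_PACKAGE = "com.databricks:spark-xml_2.12:0.15.0"

def spark_xml_session_conf():
    """Config entry to pass to SparkSession.builder.config(...) before getOrCreate()."""
    return {"spark.jars.packages": SPARK_XML_PACKAGE}

# Usage in a notebook (commented out because it needs a live Spark pool):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder
# for key, value in spark_xml_session_conf().items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

With the jar on the cluster classpath, the `com.databricks.spark.xml` format should resolve and the ClassNotFoundException should go away.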