question

KarunaIshwarya-0598 asked:

Loading 22 Mil + records into neo4j database via databricks environment

I am using the Neo4j Spark Connector from a Databricks environment and have loaded 22 million records into a Neo4j database. To establish relationships for these 22 million records, I run the following query:

JournalLineItemdf.repartition(1).write.format("org.neo4j.spark.DataSource") \
    .mode("Overwrite") \
    .option("url", "bolt://xxx:7687") \
    .option("authentication.basic.username", "xxx") \
    .option("authentication.basic.password", "xxx") \
    .option("database", "mydb") \
    .option("query", """CALL apoc.periodic.iterate(
        'MATCH (jh:JournalHeader) RETURN jh',
        'WITH jh MATCH (jli:JournalLineItem) WHERE jli.glheader_id = jh.uid MERGE (jh)-[:HAS_A]->(jli)',
        {batchSize: 10000, parallel: true, iterateList: true})
        YIELD batch RETURN null;""") \
    .save()

The required relationships do get established in the Neo4j database, but the Spark job then hangs and no further updates are written.

Is there a way to make the query more efficient so that the job finishes once the relationships are established? As it stands, even after the relationships are created the query keeps scanning the remaining records, which is unnecessary here.
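One sketch of avoiding the hand-rolled query entirely is the connector's documented relationship write mode, which creates one relationship per DataFrame row instead of re-running a full MATCH for every Spark batch. This assumes the `relationship.*` options from the Neo4j Spark Connector documentation, that both node sets are already loaded, and that `JournalLineItemdf` has `glheader_id` and `uid` columns as implied above; the endpoint, credentials, and helper name are placeholders.

```python
# Sketch only: option names follow the Neo4j Spark Connector docs;
# the URL, credentials, and key-column mappings are assumptions.
relationship_options = {
    "url": "bolt://xxx:7687",
    "authentication.basic.username": "xxx",
    "authentication.basic.password": "xxx",
    "database": "mydb",
    "relationship": "HAS_A",
    "relationship.save.strategy": "keys",
    "relationship.source.labels": ":JournalHeader",
    "relationship.source.save.mode": "Match",          # nodes already loaded
    "relationship.source.node.keys": "glheader_id:uid",  # df column : node property
    "relationship.target.labels": ":JournalLineItem",
    "relationship.target.save.mode": "Match",
    "relationship.target.node.keys": "uid:uid",
}

def write_relationships(journal_line_item_df):
    """Write one HAS_A relationship per DataFrame row (sketch only)."""
    (journal_line_item_df.write
        .format("org.neo4j.spark.DataSource")
        .mode("Append")
        .options(**relationship_options)
        .save())
```

Because the connector matches source and target nodes by key for each row, the job ends when the DataFrame is exhausted rather than after a graph-wide scan.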

Also, if I run the query directly in the Neo4j environment it works, but running it from Databricks through the Neo4j Spark Connector write causes these issues.
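Since the query behaves correctly when issued directly against Neo4j, another sketch is to run the batching call once through a driver session instead of routing it through a DataFrame write, so the job returns as soon as apoc.periodic.iterate completes. This assumes the official `neo4j` Python driver is installed and the same bolt endpoint and credentials as above; the function name is hypothetical.

```python
# Run the relationship-building query once via a driver session (sketch).
RELATIONSHIP_QUERY = """
CALL apoc.periodic.iterate(
  'MATCH (jh:JournalHeader) RETURN jh',
  'MATCH (jli:JournalLineItem) WHERE jli.glheader_id = jh.uid
   MERGE (jh)-[:HAS_A]->(jli)',
  {batchSize: 10000, parallel: false, iterateList: true})
YIELD batches, total
RETURN batches, total
"""

def build_relationships(uri, user, password, database="mydb"):
    """Execute the batching query once and return (batches, total)."""
    # Assumption: the `neo4j` Python driver package is available.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session(database=database) as session:
            record = session.run(RELATIONSHIP_QUERY).single()
            return record["batches"], record["total"]
    finally:
        driver.close()
```

Returning `batches` and `total` from the YIELD gives a single summary row, so the call terminates cleanly instead of streaming a row per batch.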

azure-databricks


Hello @KarunaIshwarya-0598,

Welcome to the Microsoft Q&A platform.

May I know which Databricks Runtime you are using?


1 Answer

KarunaIshwarya-0598 answered:

DBR version 6.4 (includes Apache Spark 2.4.5, Scala 2.11). We are using this version because we load Common Data Model (CDM) data from Azure Data Lake Storage Gen2 into Azure Databricks with the Spark CDM connector (spark-cdm-connector-assembly-0.19.1.jar), which is compatible with Spark 2.4+ but not with Spark 3+. Hence we are on DBR 6.4.


Hello @KarunaIshwarya-0598,

Thanks for the details.

For a deeper investigation and immediate assistance with this issue, you may file a support ticket if you have a support plan. If you don't have a support plan, do let us know.


Hi @PRADEEPCHEEKATLA-MSFT. We don't have a support plan.
