Hi,
I have created a data flow from an ADLS source to a Synapse table. The data flow has one Alter Row transformation with an "Upsert if" condition of true().
The screenshots below show the settings I used in the data flow and the pipeline.
The data flow runs without any pipeline failures when there are fewer than 1,000 records.
However, I need to run it for 1 million records, and then I always get the errors below. I could not find any useful information about them in the documentation.
Could you please help?
Error 1: {"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: The connection is closed.","Details":"shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:234)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:1217)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerConnection.rollback(SQLServerConnection.java:3508)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:713)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:839)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:839)\n\tat org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:987)\n\tat org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:987)\n\tat org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:"}
Error 2. {"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: Could not find prepared statement with handle 1.","Details":"java.sql.BatchUpdateException: Could not find prepared statement with handle 1.\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2085)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:672)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:839)\n\tat org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:839)\n\tat org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:987)\n\tat org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:987)\n\tat org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2321)\n\tat org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2321)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)\n\tat org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)\n\tat or"}




