question

DC-q489h2 asked MartinJaffer-MSFT commented

Data flow failing "Job failed due to reason: Unexpected end of input stream."

I have a data flow that works as expected when I step through the 'data preview' of each step in the data flow. But when I go to actually run the data flow, I receive the following error message and I cannot resolve it:
{"message":"Job failed due to reason: Unexpected end of input stream. Details:java.io.EOFException: Unexpected end of input stream\n\tat org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:165)\n\tat org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)\n\tat java.io.InputStream.read(InputStream.java:101)\n\tat org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:182)\n\tat org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:218)\n\tat org.apache.hadoop.util.LineReader.readLine(LineReader.java:176)\n\tat org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:193)\n\tat org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)\n\tat org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.hasNext(HadoopFileLinesReader.scala:69)\n\tat scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)\n\tat scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)\n\tat scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)\n\tat scala.c","failureType":"UserError","target":"Incoming to Cleansed","errorCode":"DFExecutorUserError"}

azure-data-factory azure-synapse-analytics

Hello @DC-q489h2 and welcome to Microsoft Q&A.

This error sounds like one of the rows of data is incomplete, perhaps.

The reason you do not see it in the preview step is that the preview only loads the first thousand rows or so (I need to double-check the number). This means the problematic row is somewhere after the previewed rows.

Narrowing down which row depends upon the partitioning.
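The stack trace points at Hadoop's `DecompressorStream`, which most often means a truncated or corrupted compressed source file rather than a bad row of text. As a rough local sketch (assuming gzip-compressed input; the `check_gzip_integrity` helper is illustrative, not part of Data Factory), you could verify that each suspect file decompresses cleanly before re-running the pipeline:

```python
import gzip
import io

def check_gzip_integrity(data: bytes) -> bool:
    """Return True if the gzip stream decompresses to the end-of-stream
    marker, False if it is truncated or corrupted."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
            # Read in chunks until EOF; a truncated stream raises EOFError.
            while f.read(65536):
                pass
        return True
    except (EOFError, OSError):
        return False

# A complete stream passes; chopping bytes off the end makes it fail,
# which is the same symptom the Hadoop reader reports as EOFException.
good = gzip.compress(b"col1,col2\n1,2\n3,4\n")
print(check_gzip_integrity(good))        # complete file
print(check_gzip_integrity(good[:-4]))   # truncated file
```

Running this against the files in your source folder would tell you whether the job is failing on the data itself or on an incomplete upload of one of the files.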


Hi @MartinJaffer-MSFT, thank you for your response. Is there a way to skip and log the incomplete rows within the data flow?


Hello @DC-q489h2. You can change how the problem rows are handled.

In your data flow, go to the sink transformation. Scroll to the bottom and you should find the "Error row handling" settings. Expand this and change "Error row handling" from "Stop on first error (default)" to "Continue on error". Then you can enable "Output rejected data" to log those problem rows.

[Screenshot: the sink transformation's "Error row handling" settings]



0 Answers