tonyk-3581 asked YutongTie-MSFT answered

Form OCR Testing Tool backing page missing or offset with long single page documents

I have multiple PDF documents of 60+ pages each that have been pre-processed into single-page documents. We are tagging them using the 2.1 preview Form OCR Testing Tool (FOTT). I successfully uploaded and tagged the first two documents. When I upload the third document, the white page backing in the center of the screen is offset from the recognized text once the layout has been analyzed. Any text that falls outside the white backing is unreadable because the text is black. If I upload a fourth document, the white page backing is missing entirely. Without the white page backing correctly placed, it isn't possible to tag the documents. The pre-processing was suggested by a Microsoft AI Expert so that we could tag across pages. The single-page documents open without issue in a normal PDF reader.

Is this a known issue, or am I doing something silly?

Thank you!

120496-fott-bug.png


azure-form-recognizer
fott-bug.png (49.7 KiB)

Hello,

Just to be clear, could you please share which region you are in? Also, do you mean that every document has more than 60 pages and the error happens when you upload the third one? Is this correct?


Regards,
Yutong


I am in the UK South region, and that is where the Form Recognizer is also deployed. It's currently on the Standard pricing tier (i.e. not the Free one).

Each document has been pre-processed from 60 individual pages into 1 single long page. See here for the utility I wrote to combine the PDF pages: https://github.com/Osborne-Clarke-UK/public_spikes/tree/main/pdfCombine

I have also done this with smaller multi-page documents (e.g. 3 pages) and seen similar effects, so I don't believe the number of pages is causing the issue. Perhaps the designer is not respecting the page dimensions. Also, the PDFs open and display fine from the file system. With these larger PDFs, the error shown in the image always seems to happen on the third upload, and the fourth is totally offset and unreadable.
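For anyone trying to reproduce this, here is a rough sketch of the geometry behind stacking pages into one long page. It is not the code from the linked pdfCombine utility; the function name `stack_pages` and the A4 size in points are assumptions made for illustration. It only computes the combined page size and where each source page would sit, which may help when comparing the resulting dimensions against what the tool displays.

```python
from typing import List, Tuple

Size = Tuple[float, float]      # (width, height) in PDF points
Offset = Tuple[float, float]    # (x, y) of a page's bottom-left corner


def stack_pages(sizes: List[Size]) -> Tuple[float, float, List[Offset]]:
    """Lay pages out top to bottom on one tall page (hypothetical helper).

    Returns the combined width, the combined height, and the bottom-left
    offset of each source page. PDF coordinates start at the bottom-left,
    so the first page gets the largest y offset.
    """
    if not sizes:
        raise ValueError("at least one page is required")

    combined_width = max(width for width, _ in sizes)
    combined_height = sum(height for _, height in sizes)

    offsets: List[Offset] = []
    top = combined_height
    for _, height in sizes:
        top -= height               # move down by this page's height
        offsets.append((0.0, top))  # pages are left-aligned at x = 0
    return combined_width, combined_height, offsets


if __name__ == "__main__":
    a4 = (595.0, 842.0)  # assumed A4 portrait size in points
    width, height, offsets = stack_pages([a4] * 60)
    print(f"combined page: {width} x {height} pt")
    print(f"first page offset: {offsets[0]}, last page offset: {offsets[-1]}")
```

With 60 assumed A4 pages the combined page is 50,520 points tall, so the single page is far taller than a typical document page. Whether that height is related to the offset backing is not established in this thread.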

Regards


1 Answer

YutongTie-MSFT answered

Thanks @tonyk-3581 for the feedback. I have forwarded this issue to the product group for investigation and will update here as soon as I receive a response from them.

Regards,
Yutong
