
29998411 asked

How do I get the specified text from a multi-page PDF with form recognizer?

I am aware that the same label name cannot be used for multiple pages.

One way is to change the label names according to the page number (e.g., title1, title2). Another, I think, is to split the PDF file into separate pages.
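If labels are renamed per page, the page-numbered names can be folded back together after analysis. Below is a minimal sketch. It assumes the extracted fields have already been copied into a plain Python dict of label name to text (converting the Form Recognizer result object is not shown). It also assumes every renamed label ends in its page number, as in title1 and title2. The function name `group_page_labels` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming scheme: a base label followed by its page number, e.g. "title1", "title2".
PAGE_LABEL = re.compile(r"^(?P<base>.+?)(?P<page>\d+)$")


def group_page_labels(fields: dict[str, str]) -> dict[str, list[str]]:
    """Collect values of labels like "title1", "title2" into {"title": [page1_value, page2_value]}."""
    by_base: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for label, value in fields.items():
        match = PAGE_LABEL.match(label)
        if match is None:
            continue  # label has no trailing page number; leave it for other handling
        by_base[match["base"]].append((int(match["page"]), value))
    # Sort each group numerically by page (so "title10" comes after "title2"), then drop the page numbers.
    return {base: [value for _, value in sorted(items)] for base, items in by_base.items()}
```

For example, `{"title1": "Invoice A", "title2": "Invoice B"}` becomes `{"title": ["Invoice A", "Invoice B"]}`.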

Is there any other way besides these two?

azure-form-recognizer

@29998411 I understand that you have a field or text that appears on several pages of a multi-page document and needs to be extracted under the same label or tag?
If so, the methods you mentioned should work for extracting the text, and you can process the tags or results later in your client application.
If possible, split the document as a preprocessing step and run the model on each part to extract the text. This ensures that every processed form uses the same label in the extracted text. The text extracted from each form can then be post-processed and displayed as needed.

1 Answer

29998411 answered

I understand that in a case like this there is no solution other than these two methods:

  1. Change the label names according to the page number (e.g., title1, title2).

  2. Split the document into individual pages as a preprocessing step.
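With the second method, each single-page file is analyzed separately and every result uses the same label names. The post-processing step therefore only has to record which page each value came from. Below is a minimal sketch. It assumes each page's result has already been reduced to a dict of label to text and that the list is in page order. `merge_page_results` is a hypothetical name.

```python
def merge_page_results(page_results: list[dict[str, str]]) -> dict[str, list[tuple[int, str]]]:
    """Merge per-page extractions that share label names into label -> [(page, value), ...]."""
    merged: dict[str, list[tuple[int, str]]] = {}
    # Page numbers start at 1 and follow the order of the input list.
    for page_number, fields in enumerate(page_results, start=1):
        for label, value in fields.items():
            merged.setdefault(label, []).append((page_number, value))
    return merged
```

A label that is missing on some pages simply has fewer entries, so the client application can decide how to display gaps.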