SalikRafiq-1798 asked ramr-msft answered

Complex Document to Parse - Looking for Ideas

I have been tasked with parsing data from filing information documents, which have a very odd layout.

I attempted to create my own layout and model using the editor, but without success.

115330-10045407-sh01-2021-07-15.pdf



The attachment is a sample of what I would like to parse. I tried Form Recognizer, but it could not handle the repeating section as a table. The training confidence was very low, around 35%. I also tried analyzing some samples, but, as expected, nothing was extracted.

Does anyone have any suggestions? Or is Form Recognizer still the right tool here?

Any help appreciated.

azure-form-recognizer

1 Answer

ramr-msft answered

@SalikRafiq-1798 Thanks for the question. Can you please add more details about what was extracted by your custom Form Recognizer model?
As a workaround in the meantime, you can try the Form Recognizer train-with-labels feature and label these tables as key-value pairs, labeling each cell of the table as a value. Note that you will need to label and train with 5 samples that contain the maximum number of rows in the tables. Let me know if this helps.
Please follow the documentation to train a custom model using the sample labeling tool.
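If each table cell is labeled as its own key-value field, the model returns a flat set of fields, and you have to regroup them into rows yourself. The sketch below is one way to do that, under assumptions of my own: the labels follow a hypothetical naming convention `Row<N>_<Column>` (e.g. `Row1_Class`), and you have already converted the result to a plain dict of strings. With the Python SDK (`azure-ai-formrecognizer` 3.x), `begin_recognize_custom_forms` yields `RecognizedForm` objects whose `fields` dict maps label names to `FormField` objects, so `{name: field.value for name, field in form.fields.items()}` gives that dict. Rows whose cells are all empty are dropped, because documents with fewer rows than the labeled maximum leave those cells blank.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical label naming convention: "Row<N>_<Column>", e.g. "Row1_Class".
CELL_LABEL = re.compile(r"^Row(\d+)_(\w+)$")


def rebuild_table(fields: Dict[str, Optional[str]]) -> List[Dict[str, Optional[str]]]:
    """Regroup per-cell key-value fields into table rows, ordered by row number."""
    rows: Dict[int, Dict[str, Optional[str]]] = defaultdict(dict)
    for name, value in fields.items():
        match = CELL_LABEL.match(name)
        if match is None:
            continue  # not a table-cell label (e.g. a header field); skip it
        row_number, column = int(match.group(1)), match.group(2)
        rows[row_number][column] = value
    # Sort numerically (Row2 before Row10) and drop rows where every cell is empty.
    return [
        rows[n]
        for n in sorted(rows)
        if any(v not in (None, "") for v in rows[n].values())
    ]


if __name__ == "__main__":
    # Illustrative values only, not taken from the attached document.
    fields = {
        "Title": "Example",
        "Row1_Class": "ORDINARY",
        "Row1_Shares": "100",
        "Row2_Class": "PREFERENCE",
        "Row2_Shares": "50",
        "Row3_Class": None,
        "Row3_Shares": "",
    }
    print(rebuild_table(fields))
```

Encoding the row index in the label name keeps the model itself simple (plain key-value fields) and moves the table structure into a small post-processing step that you control.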

