question

ericholland-7424 asked ericholland-7424 commented

Any way to improve Form Recognizer (Layout) line separation when vertical form lines are present?

When using Layout, Form Recognizer often does not split text when vertical form lines are present. Instead, it combines two adjacent form cells into a single line. In the example photo, boxes 16 and 17 are combined, as are boxes 18 and 19, and so are the values in the second line for boxes 19 and 20. It seems to be inconsistent/poor at splitting lines in a useful manner.

127598-formrec-issue.png

Can this be improved?


azure-form-recognizer
formrec-issue.png (1.7 MiB)

AWS Textract handles these correctly, by the way.


Hi, thanks for reaching out. Can you please share the complete form that you are trying to analyze? It seems there are multiple table structures in this form hence the inconsistent result.


Here is the same form from AWS Textract. You can see they did a much better job of detection. Everything is properly separated.

128337-awscapture.png


awscapture.png (78.8 KiB)

Thanks for your feedback. Will review and get back to you shortly.


1 Answer

GiftA-MSFT answered ericholland-7424 commented

Hi, I'm unable to reproduce this issue. Are you using the labeling tool?

128471-image.png



image.png (231.3 KiB)

No. I am just using "layout" not a custom model.
Thanks


I'm also referring to Layout. At the sample tool home page select "use layout to get text, tables and selection marks".



I am using the .NET client and showing LINES. I think the labeling tool is showing words. So my issue is that the service response is not grouping the words into lines very well.

For example, the 9 boxes starting with "16" ... "tax" all get combined into one line. It should come back as 2 lines, broken at "17". The JSON results in the labeling tool show the same line grouping as I get from the .NET client call. I don't know of any way to visualize the lines rather than the words with the tool.
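
As a possible client-side workaround while the service grouping stays as it is, a returned line could be re-split using the word bounding boxes. The sketch below is only an illustration and does not call the Form Recognizer SDK: the `Word` class, the `split_line_on_gaps` function, the `gap_factor` parameter, and the sample coordinates are all hypothetical, and it assumes each word has already been reduced to a left edge, a right edge, and a height. It starts a new segment whenever the horizontal gap between neighbouring words is larger than a multiple of the median word height.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Word:
    """Hypothetical, simplified word box (not an SDK type)."""
    text: str
    x_min: float   # left edge of the word box
    x_max: float   # right edge of the word box
    height: float  # height of the word box


def split_line_on_gaps(words, gap_factor=1.5):
    """Split one recognized line into text segments.

    A new segment starts wherever the horizontal gap between two
    neighbouring words exceeds gap_factor * median word height.
    The threshold is an assumption and would need tuning per form.
    """
    if not words:
        return []
    ordered = sorted(words, key=lambda w: w.x_min)
    threshold = gap_factor * median(w.height for w in ordered)
    segments = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.x_min - prev.x_max > threshold:
            segments.append([cur])       # large gap: start a new segment
        else:
            segments[-1].append(cur)     # small gap: same segment
    return [" ".join(w.text for w in seg) for seg in segments]


# Made-up coordinates for a line that spans two form cells.
sample = [
    Word("16", 10, 25, 10),
    Word("State", 30, 60, 10),
    Word("wages", 64, 100, 10),
    Word("17", 150, 165, 10),
    Word("State", 170, 200, 10),
    Word("tax", 204, 225, 10),
]
print(split_line_on_gaps(sample))
```

This only helps when adjacent cells leave a visibly wider gap than normal word spacing; tightly packed cells would still need the vertical rule positions, which this sketch does not use.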
