mikelor-9984 asked ramr-msft commented

Form Recognizer Change in Behavior in 2.1 vs 2.0 recognizing table cells with rowspan

I recently upgraded my TSA Throughput Project, which processes TSA Throughput PDF files like this one on the TSA FOIA Reading Room site, to use the latest stable version (3.1.1) of the Azure.AI.FormRecognizer client SDK.

This version defaults to the 2.1 version of the Form Recognizer service. Version 2.1 does not behave the same way as version 2.0 when recognizing table cells that span more than one row.

Given the figure below
137621-image.png

Version 2.0 would recognize the Date and Hour of Day cells in the cells before the cell containing "ANC".
Version 2.1 would recognize:
the Date in the cell before the cell containing "MDW"
the Hour of Day in the cell before the "blank" cells and the cell containing "Terminal 7 - Passenger"
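
To make the difference concrete, the cell coordinates returned by each service version can be printed and compared side by side. The following is only a minimal sketch: it assumes the tables come from the layout call (StartRecognizeContentAsync), which may not be what the project actually uses, and the class name TableDump, the method name DumpTableCellsAsync, and the pdfPath parameter are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;

public static class TableDump
{
    // Hypothetical helper: prints the position and span of every table cell
    // so the output from a 2.0 client and a 2.1 client can be compared.
    public static async Task DumpTableCellsAsync(FormRecognizerClient client, string pdfPath)
    {
        using FileStream stream = File.OpenRead(pdfPath);

        // Assumption: layout (content) recognition is the operation in use.
        RecognizeContentOperation operation = await client.StartRecognizeContentAsync(stream);
        Response<FormPageCollection> response = await operation.WaitForCompletionAsync();

        foreach (FormPage page in response.Value)
        {
            foreach (FormTable table in page.Tables)
            {
                foreach (FormTableCell cell in table.Cells)
                {
                    // RowIndex/ColumnIndex give the top-left position of the cell;
                    // RowSpan/ColumnSpan give how many rows/columns it covers.
                    Console.WriteLine(
                        $"page {page.PageNumber} row {cell.RowIndex} col {cell.ColumnIndex} " +
                        $"rowSpan {cell.RowSpan} colSpan {cell.ColumnSpan}: {cell.Text}");
                }
            }
        }
    }
}
```

Running this once against a client pinned to 2.0 and once against a default (2.1) client, on the same PDF, should show where the Date and Hour of Day cells land in each case.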

This seems to be a less than optimal result, as it doesn't correctly reflect the "Table Layout / Rowspan" coordinates. To work around this I had to specifically request version 2.0 using the FormRecognizerClientOptions.

What should happen: Version 2.1 should return the same Table/Cells layout as version 2.0, respecting RowSpan and ColSpan layouts.


azure-form-recognizer

1 Answer

mikelor-9984 answered ramr-msft commented

I worked around the issue by using the FormRecognizerClientOptions class and setting the ServiceVersion to V2.0. See Line 53 in Program.cs.

But I think V2.1 is broken.
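
For readers without access to Program.cs, the workaround looks roughly like the sketch below. It is not the exact code from the project: the endpoint and apiKey values are placeholders, and the snippet uses C# top-level statements for brevity. The only essential part is passing ServiceVersion.V2_0 to the FormRecognizerClientOptions constructor.

```csharp
using System;
using Azure;
using Azure.AI.FormRecognizer;

// Placeholder values (assumptions): substitute your own resource endpoint and key.
string endpoint = "https://<your-resource>.cognitiveservices.azure.com/";
string apiKey = "<your-api-key>";

// Pin the client to the 2.0 service API instead of the SDK default (2.1 in SDK 3.1.x).
var options = new FormRecognizerClientOptions(
    FormRecognizerClientOptions.ServiceVersion.V2_0);

// Every call made through this client now targets the 2.0 service version.
var client = new FormRecognizerClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey),
    options);
```

Note that pinning the version this way also gives up any 2.1-only features for calls made through this client, so it is best treated as a temporary measure.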


@mikelor-9984 Thanks for the details. We have forwarded this to the product team to check on it.
