Poor Performance of Document Intelligence on Table Extractions

Lee, Charlie 0

We started exploring Azure Document Intelligence and unfortunately saw very poor performance on table extraction. The tables we aim to extract have dynamic column names and rows, but they share a similar overall structure (header, sub-header, and row header). All values except the headers are selection marks, as shown below in the original table. Azure DI fails to detect some of the selection marks within this table. Could you please let us know how we can improve our custom models? We tried manually labeling all of the undetected selection marks in the source PDF, but the results were no better than the default model. Any help would be greatly appreciated. Thanks!

User's image

VasaviLankipalle-MSFT 14,911 Reputation points

2024-04-30T20:17:50.43+00:00

Hello @Lee, Charlie, thanks for using the Microsoft Q&A platform.

Could you share the custom model details and the API version you are working with?
Lee, Charlie 0 Reputation points

2024-04-30T20:28:39.56+00:00

Thank you for your response. We have been using the prebuilt model from the backend and exploring custom models from the front end. Regarding the custom models, since the "auto label" did not detect some of the selection marks in a table object, we manually tagged the undetected selection marks ourselves. That's all we did. As for the API, we are using the 2024-02-29-preview version.
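
For anyone trying to pin down which cells are affected, here is a rough sketch of a check that could be run on the analyze result JSON. It assumes the result follows the layout response shape (pages[].selectionMarks[].state, tables[].cells[] with rowIndex, columnIndex, kind, and content) and that detected marks appear in cell content as ":selected:" or ":unselected:". The helper names and the sample data are hypothetical and hand-written for illustration; they are not output from our documents.

```python
from collections import Counter

# Tokens assumed to represent detected selection marks inside cell content.
SELECTED, UNSELECTED = ":selected:", ":unselected:"
# Cell kinds treated as headers (assumed values); all other cells should hold a mark.
HEADER_KINDS = {"columnHeader", "rowHeader", "stubHead"}


def count_page_marks(analyze_result):
    """Count selection marks per page, keyed by (pageNumber, state)."""
    counts = Counter()
    for page in analyze_result.get("pages", []):
        for mark in page.get("selectionMarks", []):
            counts[(page.get("pageNumber"), mark.get("state"))] += 1
    return counts


def cells_missing_marks(table):
    """Return sorted (rowIndex, columnIndex) of non-header cells with no mark token."""
    missing = []
    for cell in table.get("cells", []):
        if cell.get("kind", "content") in HEADER_KINDS:
            continue
        content = cell.get("content", "")
        if SELECTED not in content and UNSELECTED not in content:
            missing.append((cell["rowIndex"], cell["columnIndex"]))
    return sorted(missing)


# Hypothetical result fragment: a 3x3 table where one body cell has no mark.
sample = {
    "pages": [
        {
            "pageNumber": 1,
            "selectionMarks": [
                {"state": "selected"},
                {"state": "unselected"},
                {"state": "selected"},
            ],
        }
    ],
    "tables": [
        {
            "cells": [
                {"rowIndex": 0, "columnIndex": 0, "kind": "columnHeader", "content": ""},
                {"rowIndex": 0, "columnIndex": 1, "kind": "columnHeader", "content": "A"},
                {"rowIndex": 0, "columnIndex": 2, "kind": "columnHeader", "content": "B"},
                {"rowIndex": 1, "columnIndex": 0, "kind": "rowHeader", "content": "Row 1"},
                {"rowIndex": 1, "columnIndex": 1, "content": ":selected:"},
                {"rowIndex": 1, "columnIndex": 2, "content": ""},
                {"rowIndex": 2, "columnIndex": 0, "kind": "rowHeader", "content": "Row 2"},
                {"rowIndex": 2, "columnIndex": 1, "content": ":unselected:"},
                {"rowIndex": 2, "columnIndex": 2, "content": ":selected:"},
            ]
        }
    ],
}

page_counts = count_page_marks(sample)
missing_cells = cells_missing_marks(sample["tables"][0])
```

Comparing the list of cells without a mark against the source PDF would show whether the misses cluster in particular rows or columns, which may help when deciding what to label.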
Lee, Charlie 0 Reputation points

2024-04-30T21:15:35.26+00:00
