Discrepancy Between Document Intelligence Python SDK and UI Labels

Syed Umair Hasan 90 Reputation points

Hello, this is quite urgent: I'm encountering an issue with Document Intelligence. I'm using API version 2024-02-29-preview in both the Document Intelligence UI and the Python SDK, with the latest SDK version that supports this API version:

SDK version	Supported API service version
1.0.0b1	2023-10-31-preview
1.0.0b2	2024-02-29-preview
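
A minimal sketch for confirming which SDK version is actually installed and which API version it maps to, using only the standard library's `importlib.metadata`. The mapping is copied from the table above; `SDK_TO_API_VERSION` and `expected_api_version` are hypothetical helpers, not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical helper data: SDK release -> supported API service version (from the table above).
SDK_TO_API_VERSION = {
    "1.0.0b1": "2023-10-31-preview",
    "1.0.0b2": "2024-02-29-preview",
}


def expected_api_version(package: str = "azure-ai-documentintelligence"):
    """Return the API version the installed SDK targets, or None if unknown/not installed."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return None
    return SDK_TO_API_VERSION.get(installed)
```

If this returns None, the package is either not installed in the current environment or is a release not listed in the table.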

The problem arises when I test the same file on a trained custom extraction model using both the UI and the Python SDK. In the UI's output JSON the table cells are detected correctly, whereas in the Python SDK output one cell is detected incorrectly, so it is missing from the custom table. This discrepancy is breaking my code logic.

Why is there a difference between the Document Intelligence UI and the Python SDK, and how can I make the Python SDK return the same labels as the UI? I have attached a screenshot for reference: the output on the left is from the Python SDK, and the output on the right is from the UI. The label 'Related Substances' occurs fewer times in the Python SDK output and is not labeled in the custom table, whereas it occurs more often in the UI JSON and is labeled correctly there. I used the same file in both cases, and this is the first time this has happened.


Thank you.
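
One way to narrow down where the two outputs diverge is to save both analyze results as JSON and count where a given text, such as 'Related Substances', appears. A minimal sketch, assuming the REST `analyzeResult` layout (`documents[].fields` with nested `valueArray`/`valueObject`, and `tables[].cells[].content`). For the SDK side it assumes the result is saved with `json.dump(result.as_dict(), f)`, which with the 1.0.0b2 models should yield the same camelCase keys as the Studio download. All function names here are hypothetical.

```python
import json


def load_analyze_result(path: str) -> dict:
    """Load a saved analyze result; unwraps a top-level "analyzeResult" key if present."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data.get("analyzeResult", data)


def _walk_fields(fields: dict):
    """Yield every field in a documents[i].fields mapping, including nested ones."""
    for field in fields.values():
        yield field
        # Array fields (e.g. dynamic table rows) hold a list of fields.
        for item in field.get("valueArray") or []:
            yield from _walk_fields({"item": item})
        # Object fields (e.g. a table row) hold a name -> field mapping.
        yield from _walk_fields(field.get("valueObject") or {})


def count_in_fields(result: dict, text: str) -> int:
    """Number of custom-model fields (at any nesting level) whose content contains text."""
    return sum(
        text in (field.get("content") or "")
        for doc in result.get("documents") or []
        for field in _walk_fields(doc.get("fields") or {})
    )


def count_in_table_cells(result: dict, text: str) -> int:
    """Number of layout table cells whose content contains text."""
    return sum(
        text in (cell.get("content") or "")
        for table in result.get("tables") or []
        for cell in table.get("cells") or []
    )


def compare_outputs(ui_result: dict, sdk_result: dict, text: str) -> dict:
    """(UI count, SDK count) pairs for custom fields and layout table cells."""
    return {
        "fields": (count_in_fields(ui_result, text), count_in_fields(sdk_result, text)),
        "table_cells": (count_in_table_cells(ui_result, text), count_in_table_cells(sdk_result, text)),
    }
```

For example, `compare_outputs(load_analyze_result("ui.json"), load_analyze_result("sdk.json"), "Related Substances")` (file names are placeholders) returns a pair of counts per category. A difference in `table_cells` would suggest the layout/OCR output itself differs, rather than only the custom field extraction.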

romungi-MSFT 42,761 Reputation points Microsoft Employee

2024-05-01T05:43:47.8+00:00
@Syed Umair Hasan Can you open an issue on the Python SDK repo with the details of the request?

To debug the issue, I used my own project to inspect the API requests in both cases. The Studio sends its REST API call with

stringIndexType=utf16CodeUnit

whereas the SDK uses

stringIndexType=textElements

You should be able to set the same value in the SDK and check whether the response then works as expected. See the reference here.
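
A minimal sketch of setting the same value from Python, assuming the `azure-ai-documentintelligence` 1.0.0b2 signature, in which `begin_analyze_document` takes the file as `analyze_request` and accepts a `string_index_type` keyword. `stringIndexType` controls the unit used for span offsets and lengths (text elements, Unicode code points, or UTF-16 code units), so matching it mainly makes the two outputs comparable span for span. The endpoint, key, model ID, file path, and the helpers `build_analyze_kwargs` and `analyze_like_studio` are hypothetical placeholders.

```python
# Values documented for the stringIndexType query parameter.
STRING_INDEX_TYPES = {"textElements", "unicodeCodePoint", "utf16CodeUnit"}


def build_analyze_kwargs(string_index_type: str = "utf16CodeUnit") -> dict:
    """Keyword arguments for begin_analyze_document (hypothetical helper).

    Defaults to utf16CodeUnit, the value observed in the Studio request.
    """
    if string_index_type not in STRING_INDEX_TYPES:
        raise ValueError(f"unsupported stringIndexType: {string_index_type!r}")
    return {
        # Send the raw file bytes rather than a JSON body with a URL/base64 source.
        "content_type": "application/octet-stream",
        "string_index_type": string_index_type,
    }


def analyze_like_studio(endpoint: str, key: str, model_id: str, path: str):
    """Analyze a local file with a custom model, using the Studio's stringIndexType.

    Assumes the azure-ai-documentintelligence 1.0.0b2 signature, where the
    document is passed as `analyze_request`; other releases may differ.
    """
    # Imported lazily so build_analyze_kwargs can be used without the SDK installed.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    with open(path, "rb") as f:
        poller = client.begin_analyze_document(model_id, analyze_request=f, **build_analyze_kwargs())
    return poller.result()
```

Passing a different value, e.g. `build_analyze_kwargs("textElements")`, reproduces the SDK's original request for a side-by-side comparison.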
Syed Umair Hasan 90 Reputation points

2024-05-16T14:46:24.2533333+00:00

Hi @romungi-MSFT, I switched to the Azure AI Document Intelligence Python SDK and changed stringIndexType in the SDK, but unfortunately I'm still getting the same discrepancy. Thanks.

