question

KaisHefaiedh-9247 asked KaisHefaiedh-9247 answered

Why is DocTypeConfidence low?

Hi,
I use Form Recognizer to identify a document in order to classify it into the right category (document type).
I trained my model on 5 forms: one empty form (without data), some forms filled in electronically (by a computer), and one handwritten form.

I tested a handwritten document (see picture below) and got a DocTypeConfidence of 16%, while all the tags have a confidence of about 99%.

Note that all my tags are fixed elements of the form, such as the form title, a section title, or the logo (see picture below).

So here are my questions:

1) If my DocTypeConfidence is high (90-100%), does that mean my tested document is recognized as matching the model?
2) If the answer to 1) is yes, why is my DocTypeConfidence low when the tags got 99%?
3) How can I improve the DocTypeConfidence score?
4) Is using fixed elements of the form a good way to recognize the form? Note that I only need to recognize the document, not extract data from it, which is why I used fixed elements.

My Results :
130796-image.png

My tags :
130832-image.png

Tested document :
130738-image.png


azure-form-recognizer

Thanks for reaching out to us. We are investigating this internally and will let you know soon.


Regards,
Yutong

YutongTie-MSFT answered YutongTie-MSFT edited

@KaisHefaiedh-9247

Sorry for the long wait. I checked with several people on the product team to figure this out, since there is no documentation about this. The answer I finally got is below:

First, a quick question to make sure there is no misunderstanding: is the test document a different type or template than the one the model was trained for, or is it the same type but being given a low doc type confidence?

Answers to your questions below:


1) If DocTypeConfidence is high, it means the tested document is likely the same type as the ones used to train the model.
2) The tag confidence and the doc type confidence are separate. In this case it means we are not confident that the test document is of the same type as the model's, but if it is the same doc type, we are confident about the predicted values.
3) It depends on the situation. If the test document is a different type, the low DocTypeConfidence is expected. Otherwise, are you able to extract the right values for your test document?
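
To illustrate point 2, here is a minimal Python sketch that reads the two scores separately. It assumes the analysis result has already been parsed into a dict with `docType`, `docTypeConfidence`, and per-field `confidence` keys. The `summarize` helper, the 0.8 threshold, and the sample values (taken from the 16% and 99% in the question) are hypothetical and only for illustration, so please check the field names against your own response.

```python
from typing import Any, Dict

# Hypothetical cut-off; pick a value from your own validation documents.
DOC_TYPE_THRESHOLD = 0.8


def summarize(document_result: Dict[str, Any],
              threshold: float = DOC_TYPE_THRESHOLD) -> Dict[str, Any]:
    """Report the document-level and field-level confidences separately."""
    doc_conf = document_result.get("docTypeConfidence", 0.0)
    fields = document_result.get("fields", {})
    # Each field (tag) carries its own confidence, independent of the doc type score.
    field_confs = {
        name: field.get("confidence")
        for name, field in fields.items()
        if field is not None
    }
    return {
        "doc_type": document_result.get("docType"),
        "doc_type_confidence": doc_conf,
        "accepted": doc_conf >= threshold,
        "field_confidences": field_confs,
    }


# Illustrative sample only, mirroring the numbers reported in the question.
sample = {
    "docType": "custom:form",
    "docTypeConfidence": 0.16,
    "fields": {
        "Title": {"text": "Form title", "confidence": 0.99},
        "Logo": {"text": "Logo", "confidence": 0.99},
    },
}

summary = summarize(sample)
print(summary["accepted"], summary["field_confidences"])
```

With a result like this, the document would be rejected on its doc type score even though every tag was read with high confidence, which is the situation described in the question.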

For question 4, what do you mean by fixed element? If you use the same type of document (is that what fixed element means on your side?), it will help the recognition.

Regards,
Yutong

KaisHefaiedh-9247 answered

Hi,
When I get a low DocTypeConfidence, the test form is the same type as the forms in the model.

For question 4), by fixed element I mean an element that doesn't change, like the title (see the document image and the data in the squares), and that is always in the same place with the same value. So they are not real key-value pairs.

What do you think?
