Why is DocTypeConfidence low?

Kais Hefaiedh 116 Reputation points
2021-09-09T15:44:27.647+00:00

Hi,
I use Form Recognizer to identify a document in order to classify it into the right category (document type).
I trained my model with 5 forms: one empty form (without data), some forms with electronic data (filled in by a computer), and one handwritten form.

I tested a handwritten document (see picture below) and got a DocTypeConfidence of 16%, while all the tags have a confidence of about 99%.

Note that all my tags are fixed elements of the form, such as the form title, section titles, or the logo. (See picture below.)

So here are my questions:

1) If my DocTypeConfidence is high (90-100%), does that mean my tested document is recognized as matching the model?
2) If the answer to 1) is yes, why is my DocTypeConfidence low when the tags get 99%?
3) How can I improve the DocTypeConfidence score?
4) Is using fixed elements of the form to recognize it a good approach? Note that I only need to recognize the document; I don't need to extract data, which is why I used fixed elements.

My results:
(image: 130796-image.png)

My tags:
(image: 130832-image.png)

Tested document:
(image: 130738-image.png)

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

2 answers

  1. YutongTie-MSFT 46,406 Reputation points
    2021-10-06T05:00:43.897+00:00

    @Kais Hefaiedh

    Sorry for the long wait. I checked with several members of the product team to figure this out, since there is no documentation about this. Here is the answer I finally got:

    First, a quick question to make sure there is no misunderstanding: is the test document a different type or template from the ones the model was trained on, or is it the same type but still being given a low doc type confidence?

    Here are the answers to your questions:

    1) If DocTypeConfidence is high, it means the tested document is likely the same type as the ones used to train the model.
    2) The tag confidence and the doc type confidence are separate. In this case, it means we are not confident that the test document is the same type as the model's training documents, but if it is the same doc type, we are confident about the predicted values.
    3) It depends on the situation. If the test document is a different type, a low DocTypeConfidence is expected. Otherwise, are you able to extract the right values for your test document?
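    Since the two scores are separate, it can help to handle them separately in client code. The sketch below is a minimal illustration, assuming the result is one entry of `analyzeResult.documentResults` from the Form Recognizer v2.1 REST response (keys `docType`, `docTypeConfidence`, `fields`, and a per-field `confidence`); please verify these names against the API reference for the version you call. The threshold, the function name `summarize_document_result`, the tag names, and the `docType` value are all hypothetical; the 16% and 99% figures are taken from the question above.

```python
from typing import Any, Dict, Optional

# Hypothetical threshold: tune it on your own validation documents.
DOC_TYPE_THRESHOLD = 0.8


def summarize_document_result(doc: Dict[str, Any],
                              threshold: float = DOC_TYPE_THRESHOLD) -> Dict[str, Any]:
    """Report the doc-type confidence and the per-field confidences separately.

    `doc` is assumed to be one entry of analyzeResult.documentResults
    (v2.1-style response); key names are an assumption to verify.
    """
    doc_type_conf = doc.get("docTypeConfidence", 0.0)
    field_confs = {
        name: field.get("confidence", 0.0)
        for name, field in doc.get("fields", {}).items()
        if field is not None  # assumption: fields that were not found may be null
    }
    min_field: Optional[float] = min(field_confs.values(), default=None)
    return {
        "docType": doc.get("docType"),
        "docTypeConfidence": doc_type_conf,
        # The "is this the expected document type?" decision uses only docTypeConfidence...
        "isExpectedType": doc_type_conf >= threshold,
        # ...while field confidences describe how sure the model is about each value.
        "fieldConfidences": field_confs,
        "minFieldConfidence": min_field,
    }


# Hypothetical result mirroring the numbers in the question (16% vs. ~99%).
example = {
    "docType": "custom:my-model",  # hypothetical model identifier
    "docTypeConfidence": 0.16,
    "fields": {
        "FormTitle": {"confidence": 0.99},     # hypothetical tag names
        "SectionTitle": {"confidence": 0.99},
        "Logo": {"confidence": 0.99},
    },
}

if __name__ == "__main__":
    print(summarize_document_result(example))
```

    With these values the fields all come back at 0.99 while `isExpectedType` is `False`, which is the same situation described in the question: high tag confidence does not imply high doc type confidence.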

    For question 4, what do you mean by "fixed element"? If you use the same type of document (is that what "fixed element" means on your side?), it will help the recognition.

    Regards,
    Yutong


  2. Kais Hefaiedh 116 Reputation points
    2021-10-06T13:48:32.58+00:00

    Hi,
    When I get a low DocTypeConfidence, the test form is the same type as the forms in the model.

    For question 4: by "fixed element" I mean elements that don't change, like the title (see the image of the document and the data in the squares), and that are always in the same place with the same value. So they are not real key-value pairs.

    What do you think?

    0 comments No comments