Can't find any valid labels for provided dataset / Label file belongs to other document / Cannot use fields.json [AZURE FORM RECOGNIZER]

Iulian Cojocaru
2021-08-03T12:00:36.85+00:00

Hello,

I am trying to train a custom model with labels created in my app's UI (PHP).
After uploading the documents plus their labels, OCR, and fields JSON files to Azure Storage (all named and structured according to the documentation), I get the following errors when I send the training request:
Can't find any valid labels for provided dataset / Label file belongs to other document / Cannot use fields.json
I inspected all documents and the labels, OCR, and fields JSON files, and everything follows the documented conventions.
For testing, I manually uploaded the documents to the OCR testing tool and trained a custom model there with my own labels, and everything worked. Before pressing the Train button, I downloaded all files generated by the testing tool and compared them with the files my app uploaded to the storage container. No differences were found.
In the Labeling tool tab of the OCR testing tool, the documents and labels look correct: every label has the right position and content.

Does anyone have an idea how I can fix this?
Thanks :)

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer
  1. romungi-MSFT, Microsoft Employee
    2021-08-03T14:13:50.613+00:00

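    @Iulian Cojocaru Could you point your app to the same container SAS URL that the tool uses and check whether training starts? That would rule out any permission issues on the container. If you still see the same error, I would re-check the request body for settings that are set to false, in particular useLabelFile, which must be true to train with labels (it defaults to false), and sourceFilter.includeSubFolders, which must be true if your files sit in subfolders under the prefix. For example, this body would ignore the label files: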

    {
      "source": "<SAS URL>",
      "sourceFilter": {
        "prefix": "<prefix string>",
        "includeSubFolders": true
      },
      "useLabelFile": false
    }
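
A rough way to sanity-check the body before sending it is sketched below in Python (your app is PHP, so treat this as an illustration only). It assumes the v2.x Train Custom Model body shape shown above and that the container SAS carries its permissions in the standard `sp` query parameter; the helper name `check_train_request` is hypothetical. It flags `useLabelFile` not being `true`, and a SAS token missing the Read (`r`) or List (`l`) permission that training needs to enumerate and read the container.

```python
from typing import List
from urllib.parse import parse_qs, urlparse


def check_train_request(body: dict) -> List[str]:
    """Return human-readable problems found in a v2.x Train Custom Model body.

    Hypothetical helper for illustration; it only inspects the dict you pass in
    and never calls the service.
    """
    problems = []

    # Training with labels needs useLabelFile set to true (the default is false).
    if body.get("useLabelFile") is not True:
        problems.append("useLabelFile must be true to train with labels")

    # Container SAS permissions live in the 'sp' query parameter, e.g. sp=rl.
    query = parse_qs(urlparse(body.get("source", "")).query)
    permissions = query.get("sp", [""])[0]
    for flag, name in (("r", "Read"), ("l", "List")):
        if flag not in permissions:
            problems.append(f"SAS token lacks {name} permission (sp={permissions!r})")

    return problems


if __name__ == "__main__":
    body = {
        "source": "https://account.blob.core.windows.net/forms?sv=2020-08-04&sp=rl&sig=REDACTED",
        "sourceFilter": {"prefix": "", "includeSubFolders": True},
        "useLabelFile": False,
    }
    for problem in check_train_request(body):
        print(problem)  # prints: useLabelFile must be true to train with labels
```

The check is purely local, so it can run before every training call without touching the storage account.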
    

    Since you used the same set of files with the tool, I assume the training request from your app also follows the naming convention. If it doesn't, please check that each form has matching files, for example:

    Form: 1994.pdf
    Labels file: 1994.pdf.labels.json
    OCR file: 1994.pdf.ocr.json
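
The convention can also be checked mechanically. The Python sketch below assumes you already have the blob names under the training prefix and the text of each `.labels.json` (for example, from however your app lists the container); `check_label_set` is a hypothetical helper. Beyond the naming rule above, it compares the `document` field inside each labels file with the form's file name. That field is what the labeling tool writes into `*.labels.json`, and a mismatch is a plausible (unconfirmed) trigger for "Label file belongs to other document".

```python
import json
import posixpath
from typing import Dict, List


def check_label_set(files: Dict[str, str]) -> List[str]:
    """Check naming and 'document' fields for a labeled training set.

    `files` maps blob names (relative to the training prefix) to their text;
    only the *.labels.json contents are parsed. Hypothetical helper.
    """
    problems = []
    names = set(files)
    # Anything not ending in .json is treated as a form (e.g. 1994.pdf).
    forms = sorted(n for n in names if not n.endswith(".json"))

    for form in forms:
        for suffix in (".labels.json", ".ocr.json"):
            if form + suffix not in names:
                problems.append(f"{form}: missing {form + suffix}")

        labels_name = form + ".labels.json"
        if labels_name in names:
            document = json.loads(files[labels_name]).get("document")
            # Assumption: 'document' holds the form's file name without folders.
            expected = posixpath.basename(form)
            if document != expected:
                problems.append(f"{labels_name}: document is {document!r}, expected {expected!r}")

    # Labels/OCR files whose form is missing (e.g. a case mismatch: 1994.PDF vs 1994.pdf).
    for name in sorted(names):
        for suffix in (".labels.json", ".ocr.json"):
            if name.endswith(suffix) and name[: -len(suffix)] not in names:
                problems.append(f"{name}: no matching form {name[: -len(suffix)]!r}")

    return problems
```

Blob names are case-sensitive, which is why the orphan check is worth keeping even when the files look right at a glance.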
    

    It would also be worth checking whether the tool regenerates the JSON files from the original document instead of using the files you uploaded manually.
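
A side-by-side check of the tool's export against the app's upload can also be scripted. The Python sketch below (hypothetical `compare_exports`, taking file name to raw bytes for each side) separates byte-identical files from files that parse to the same JSON but differ in bytes, such as a UTF-8 BOM, line endings, or key order, which a visual diff can easily miss. Whether any such byte-level difference matters to the service is not confirmed here; the sketch only narrows down where the two sets diverge.

```python
import json
from typing import Any, Dict


def _parse_json(data: bytes) -> Any:
    # utf-8-sig strips a leading BOM if present, so it does not break json.loads.
    return json.loads(data.decode("utf-8-sig"))


def compare_exports(tool: Dict[str, bytes], app: Dict[str, bytes]) -> Dict[str, str]:
    """Classify every file name found in either set. Hypothetical helper."""
    report = {}
    for name in sorted(set(tool) | set(app)):
        if name not in app:
            report[name] = "only in tool export"
        elif name not in tool:
            report[name] = "only in app upload"
        elif tool[name] == app[name]:
            report[name] = "identical bytes"
        elif name.endswith(".json"):
            try:
                same = _parse_json(tool[name]) == _parse_json(app[name])
            except ValueError:  # invalid UTF-8 or invalid JSON on either side
                same = False
            report[name] = "same JSON, different bytes" if same else "different content"
        else:
            report[name] = "different content"
    return report
```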

    Lastly, does your app use the same API version as the labeling tool? You can look up the tool's version in the bottom-right corner.
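
For the version comparison, the v2.x REST paths embed the API version, e.g. `https://{endpoint}/formrecognizer/v2.1/custom/models` for training. Below is a small Python sketch (hypothetical `api_version`) that pulls the version out of the URL your app calls, so it can be compared with the API version the tool is set to use; it assumes v2.x-style paths, including preview versions such as `v2.1-preview.3`, and the resource name in the example is made up.

```python
import re
from typing import Optional

# Matches the version segment in v2.x paths such as /formrecognizer/v2.1/...
_VERSION_SEGMENT = re.compile(r"/formrecognizer/(v\d+\.\d+(?:-preview(?:\.\d+)?)?)/")


def api_version(url: str) -> Optional[str]:
    """Return the API version segment of a Form Recognizer v2.x URL, or None."""
    match = _VERSION_SEGMENT.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    app_url = "https://myresource.cognitiveservices.azure.com/formrecognizer/v2.1-preview.3/custom/models"
    print(api_version(app_url))  # prints: v2.1-preview.3
```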
