IulianCojocaru-9715 asked DivyaRustagi-3948 commented

Cant find any valid labels for provided dataset / Label file belongs to other document / Cannot use fields.json [AZURE FORM RECOGNIZER]

Hello,

I am trying to train a custom model using labels created in my app's UI (PHP).
I upload the documents and the label, OCR, and fields JSON files to Azure Storage, with all files named and structured according to the documentation. When I send the training request, I get the following errors:
Cant find any valid labels for provided dataset / Label file belongs to other document / Cannot use fields.json
I inspected all the documents and the label, OCR, and fields JSON files, and everything follows the documented format.
For testing purposes, I manually uploaded the documents into the OCR testing tool and trained my custom model using my custom labels, and everything worked. Before pressing the Train button, I downloaded all the files generated by the testing tool and compared them with the files my app uploaded to the storage container. I found no differences.
In the labeling tab of the OCR testing tool, the documents and labels look fine: all labels have the correct position and content.

Does anyone have any idea how I can fix this?
Thanks :)

azure-form-recognizer

I am having a similar issue. I am using the sample labeling tool to create all the labels, and I am not sure what the issue is. I originally had '/' in my label names and table tag fields, but I removed all of them.

I couldn't find the documentation for the naming conventions; I would appreciate it if you could share the link.

Here's a screenshot of my error:

120687-image.png



1 Answer

romungi-MSFT answered

@IulianCojocaru-9715 Is it possible to point your app to the same container SAS URL that the tool uses and check whether training can be started? I think this would rule out any issues with permissions on the container. If the same error is seen, I would re-check the request body for any sourceFilter settings that are set to false. For example:

 {
   "source": "<SAS URL>",
   "sourceFilter": {
     "prefix": "<prefix string>",
     "includeSubFolders": true
   },
   "useLabelFile": false
 }
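
For comparison, here is a minimal Python sketch that builds the same request body with useLabelFile set to true, on the assumption that training from label files is what is intended. The function name `build_train_request` and its defaults are hypothetical and only for illustration; the field names are the ones shown in the body above.

```python
import json


def build_train_request(sas_url, prefix="", include_subfolders=False, use_label_file=True):
    """Build the JSON body of a training request (hypothetical helper).

    Only the field names come from the example body above; the defaults
    are assumptions made for this sketch.
    """
    return {
        "source": sas_url,
        "sourceFilter": {
            "prefix": prefix,
            "includeSubFolders": include_subfolders,
        },
        "useLabelFile": use_label_file,
    }


# Placeholders are kept as-is; substitute your own SAS URL and prefix.
body = build_train_request("<SAS URL>", prefix="<prefix string>")
print(json.dumps(body, indent=2))
```

Printing the body that the app actually sends and comparing it field by field with this one may help spot a flag that differs from what the labeling tool sends.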

Since you have used the same set of files with the tool, I would assume the naming convention is followed for the train request from your app. If not, could you please check that the convention is followed? For example:

 Form: 1994.pdf
 Labels file: 1994.pdf.labels.json
 OCR file: 1994.pdf.ocr.json
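
As a rough way to check this convention over a whole container listing, the following Python sketch reports which forms lack a labels or OCR file. The helper names are hypothetical, and it assumes that every blob not ending in `.json` is a form document.

```python
def companion_files(form_name):
    """Return the labels and OCR file names expected for a form."""
    return [f"{form_name}.labels.json", f"{form_name}.ocr.json"]


def missing_companions(blob_names):
    """Map each form to the companion files that are absent from the listing.

    Assumption: any name that does not end in .json is a form document.
    """
    names = set(blob_names)
    report = {}
    for form in sorted(n for n in names if not n.endswith(".json")):
        missing = [c for c in companion_files(form) if c not in names]
        if missing:
            report[form] = missing
    return report


listing = [
    "1994.pdf",
    "1994.pdf.labels.json",
    "1994.pdf.ocr.json",
    "fields.json",
]
print(missing_companions(listing))  # {} when nothing is missing
```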

It would also be worth checking whether the tool creates new JSON files from the original document instead of using the files you uploaded manually.

Lastly, is the API version used in the app the same as the labeling tool's version? You can look up the tool version in the bottom right corner.


Thank you for the response.

1) The training request works when the files are uploaded and labeled manually in the OCR tool (sending the request from the app starts the training process). When the files are uploaded through the app, the errors from the title of this thread appear.
2) I re-checked the request body. With includeSubFolders true and useLabelFile false, the training process starts, but I cannot use the model for prediction in the OCR tool (after selecting the trained model, the tool gets stuck at "Loading model information"). With includeSubFolders false and useLabelFile true, the errors from the title of this thread appear.
3) The naming convention was respected, but I re-checked the content of the labels file and found that the 'document' field in the labels JSON file was missing the index prefix (e.g. document.pdf instead of 0document.pdf). THIS SOLVED MY ISSUE.
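
For anyone hitting the same errors, the check from point 3 can be sketched in Python as follows. The function name is hypothetical, and it assumes the 'document' field should equal the form's file name, that is, the last path segment of the labels file name without the `.labels.json` suffix.

```python
import json
import posixpath

LABELS_SUFFIX = ".labels.json"


def document_field_matches(labels_blob_name, labels_json_text):
    """Return True if the 'document' field names the form this labels file belongs to.

    Assumption: the expected value is the labels file name without its
    .labels.json suffix and without any folder prefix.
    """
    if not labels_blob_name.endswith(LABELS_SUFFIX):
        raise ValueError(f"not a labels file: {labels_blob_name}")
    form_path = labels_blob_name[: -len(LABELS_SUFFIX)]
    expected = posixpath.basename(form_path)
    actual = json.loads(labels_json_text).get("document")
    return actual == expected


good = '{"document": "0document.pdf", "labels": []}'
bad = '{"document": "document.pdf", "labels": []}'
print(document_field_matches("0document.pdf.labels.json", good))  # True
print(document_field_matches("0document.pdf.labels.json", bad))   # False
```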
