Hi Expert,
I want to use Form Recognizer to load data from PDF files through a pipeline. Are there any prerequisites, conditions, or criteria for using the extracted data for curation and further transformation?
Hello @ShambhuRai-4099
Thanks for reaching out to us here. Are you asking about permission for commercial use of the results you get from the Azure Form Recognizer service?
Regards,
Yutong
Hi Expert,
No. I just wanted to check the prerequisites and, once the data is loaded into blob storage or ADF, how we can manage the load or the parameters,
or how it will work after scanning. Is there any life cycle example, or any known challenges?
https://www.invoicesimple.com/wp-content/uploads/2018/05/InvoiceSimple-PDF-Template.pdf
Hello @ShambhuRai-4099
Sure, I can walk you through a custom training example in Form Recognizer Studio so you can see the requirements, life cycle, and challenges.
There are some points you should know beforehand:
Language support: please check whether your target language is supported here: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support
Supported document formats: please check whether your invoice meets the input requirements: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-model-overview#input-requirements
Quickstart guidance: you can follow this guide to try the product quickly and see whether it fulfills your needs.
https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/quickstarts/try-v3-form-recognizer-studio
As for the life cycle, the documents you upload will not change or disappear from the blob. After training, the results are stored as JSON files in the same blob container.
As for challenges, I feel Form Recognizer is currently a good fit for common scenarios. As far as I know, some customers have run into trouble because tables that span multiple pages are not supported yet.
Please check the information above and let me know if you have any other concerns. I am glad to help.
Regards,
Yutong
-Please accept the answer if you find it helpful, as this helps the community. Thanks a lot.
How does the data get loaded into Excel after extraction?
Do you mean you want to use an Excel file as input? This is currently not supported. The supported formats for input documents are as follows:
Supported file formats: JPEG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs are best to eliminate the possibility of error in character extraction and location.
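If the pipeline picks up mixed files from a folder or container, a small pre-check can skip unsupported inputs before they are sent for analysis. The following is only a sketch under assumptions: the extension set is my own mapping of the formats listed above, and the helper names `is_supported_input` and `split_by_support` are hypothetical, not part of any Form Recognizer SDK. It checks the file extension only, not the actual file content.

```python
from pathlib import Path

# Assumed extension mapping for the formats listed above
# (JPEG, PNG, BMP, TIFF, PDF). Adjust if your files use other suffixes.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".pdf"}


def is_supported_input(filename: str) -> bool:
    """Return True if the file extension matches a supported input format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Partition file names into (supported, unsupported) lists."""
    supported = [f for f in filenames if is_supported_input(f)]
    unsupported = [f for f in filenames if not is_supported_input(f)]
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(["invoice.pdf", "scan.TIFF", "data.xlsx"])
    print("send for analysis:", ok)
    print("skip:", skipped)
```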
Regards,
Yutong
Hi Expert,
I am talking about the output. How can we load it into blob storage, ADL, or Excel after extraction?
Hello @ShambhuRai-4099
I hope I have understood your point correctly. If you are asking how the results get loaded into blob storage, it depends on the feature you are using. For most features, you get the result as JSON and can download it directly.
Take the General Document feature (https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-general-document) as an example: after analysis, you can see the result in the right panel and download it directly. In case you are not familiar with Form Recognizer Studio, I have put together a screenshot guide for you.
Training your own custom model is different: the OCR and label JSON files are written directly to your blob storage, as mentioned above. I highly recommend that you try these features and let me know which one you are interested in, so that I can share more details about it. The model overview linked below lists all the models we have.
More reference you may be interested in:
https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-model-overview#model-overview
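Once you have downloaded the JSON, one possible way to get it into Excel is to flatten it to CSV first. The following is a minimal sketch, not an official method: it assumes a v2.1-style result where the text lines sit under `analyzeResult.readResults[].lines[].text`, and the function name `lines_to_csv` and the `SAMPLE` payload are made up for illustration. It uses only the Python standard library.

```python
import csv
import io
import json


def lines_to_csv(result_json: str) -> str:
    """Flatten the text lines of a v2.1-style result into CSV text."""
    result = json.loads(result_json)
    read_results = result.get("analyzeResult", {}).get("readResults", [])

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", "line", "text"])
    for page in read_results:
        # Number the lines per page so the original reading order is kept.
        for index, line in enumerate(page.get("lines", []), start=1):
            writer.writerow([page.get("page"), index, line.get("text", "")])
    return buffer.getvalue()


# Hypothetical, heavily shortened payload for illustration only.
SAMPLE = json.dumps({
    "status": "succeeded",
    "analyzeResult": {
        "version": "2.1.0",
        "readResults": [
            {"page": 1, "lines": [
                {"text": "intertek"},
                {"text": "India CRUDE OIL QUALITY REPORT"},
            ]}
        ],
    },
})

if __name__ == "__main__":
    print(lines_to_csv(SAMPLE))
```

The CSV text can then be saved as a .csv file, which Excel opens directly, or uploaded to blob storage or ADL with whatever tool your pipeline already uses.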
I hope this helps.
Regards,
Yutong
Hi Expert
In the blob structure, I am connecting to an Azure database, and when I map the JSON I get the status, page, boundingBox, and text columns. I want the table column headers so that I can map them to the SQL Server table columns.
Here is my JSON:
{"status":"succeeded","createdDateTime":"2022-04-27T14:15:46Z","lastUpdatedDateTime":"2022-04-27T14:15:48Z","analyzeResult":{"version":"2.1.0","readResults":[{"page":1,"angle":0,"width":11.6806,"height":8.2639,"unit":"inch","lines":[{"boundingBox":[0.1977,0.8568,1.166,0.8619,1.166,1.0647,0.1977,1.0596],"text":"intertek","appearance":{"style":{"name":"other","confidence":0.878}},"words":[{"boundingBox":[0.1977,0.8619,1.1052,0.8619,1.1103,1.0697,0.2028,1.0647],"text":"intertek","confidence":0.986}]},{"boundingBox":[4.19,1.0151,5.5355,1.0151,5.5355,1.0806,4.19,1.0806],"text":"India CRUDE OIL QUALITY REPORT","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[4.19,1.0151,4.4838,1.0151,4.4838,1.0717,4.19,1.0717],"text":"India","confidence":1},{"boundingBox":[4.509,1.0152,4.7505,1.0152,4.7505,1.0717,4.509,1.0717],"text":"CRUDE","confidence":1},{"boundingBox":[4.7771,1.0151,4.8915,1.0151,4.8915,1.0717,4.7771,1.0717],"text":"OIL","confidence":1},{"boundingBox":[4.9153,1.0151,5.2293,1.0151,5.2293,1.0806,4.9153,1.0806],"text":"QUALITY","confidence":1},{"boundingBox":[5.2558,1.0151,5.5355,1.0151,5.5355,1.0717,5.2558,1.0717],"text":"REPORT","confidence":1}]},{"boundingBox":[0.2028,1.085,0.8821,1.085,0.8821,1.161,0.2079,1.161],"text":"Total Quality. 
Assured.","appearance":{"style":{"name":"other","confidence":0.878}},"words":[{"boundingBox":[0.2079,1.09,0.3447,1.09,0.3498,1.1661,0.2129,1.1661],"text":"Total","confidence":0.994},{"boundingBox":[0.3599,1.09,0.5932,1.09,0.5932,1.1661,0.365,1.1661],"text":"Quality.","confidence":0.991},{"boundingBox":[0.6084,1.09,0.8821,1.09,0.8821,1.161,0.6084,1.1661],"text":"Assured.","confidence":0.94}]},{"boundingBox":[0.4893,1.2427,0.6554,1.2427,0.6554,1.2984,0.4893,1.2984],"text":"Motel:","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[0.4893,1.2427,0.6554,1.2427,0.6554,1.2984,0.4893,1.2984],"text":"Motel:","confidence":1}]},{"boundingBox":[0.9796,1.2418,1.2381,1.2418,1.2381,1.2984,0.9796,1.2984],"text":"Tony","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[0.9796,1.2418,1.2381,1.2418,1.2381,1.2984,0.9796,1.2984],"text":"Tony","confidence":1}]},{"boundingBox":[4.1871,1.2386,5.4172,1.2386,5.4172,1.3128,4.1871,1.3128],"text":"Grade Name: Tony light crude oil","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[4.1871,1.2387,4.399,1.2387,4.399,1.2984,4.1871,1.2984],"text":"Grade","confidence":1},{"boundingBox":[4.4282,1.2426,4.6542,1.2426,4.6542,1.2984,4.4282,1.2984],"text":"Name:","confidence":1},{"boundingBox":[4.6847,1.2427,4.9071,1.2427,4.9071,1.3128,4.6847,1.3128],"text":"Tony","confidence":1},{"boundingBox":[4.9339,1.2386,5.0874,1.2386,5.0874,1.3128,4.9339,1.3128],"text":"light","confidence":1},{"boundingBox":[5.1128,1.2387,5.3097,1.2387,5.3097,1.2984,5.1128,1.2984],"text":"crude","confidence":1},{"boundingBox":[5.336,1.2386,5.4172,1.2386,5.4172,1.2984,5.336,1.2984],"text":"oil","confidence":1}]},{"boundingBox":[8.5307,1.2386,9.1631,1.2386,9.1631,1.3128,8.5307,1.3128],"text":"Month 
Reporting:","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[8.5307,1.2386,8.7644,1.2386,8.7644,1.2984,8.5307,1.2984],"text":"Month","confidence":1},{"boundingBox":[8.7956,1.24,9.1631,1.24,9.1631,1.3128,8.7956,1.3128],"text":"Reporting:","confidence":1}]},{"boundingBox":[9.5881,1.2418,9.8167,1.2418,9.8167,1.2984,9.5881,1.2984],"text":"Jan-22","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[9.5881,1.2418,9.8167,1.2418,9.8167,1.2984,9.5881,1.2984],"text":"Jan-22","confidence":1}]},{"boundingBox":[0.4393,1.3658,0.7068,1.3658,0.7068,1.4217,0.4393,1.4217]
How can we take only the column headers and the data, and nothing else? Or should we remove the other content so that the file can be used properly? We just need the headers and the data.
Expected output:
Motel:India Grade Name:India light crude oil India QUALITY
BL Date
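One possible approach to getting only headers and data is to read the table cells instead of the `readResults` lines. The sketch below is an assumption-heavy illustration: the JSON posted above is cut off before any table section, so it assumes the full v2.1 result also contains `analyzeResult.pageResults[].tables[]` with `rows`, `columns`, and `cells` (each cell carrying `rowIndex`, `columnIndex`, and `text`), and it treats the first row of each table as the header. The function names and the sample values are hypothetical placeholders, not real data from the report.

```python
def tables_to_grids(result: dict) -> list:
    """Return one grid (list of rows) per table found in the result."""
    grids = []
    for page in result.get("analyzeResult", {}).get("pageResults", []):
        for table in page.get("tables", []):
            # Start with an empty grid, then place each cell by its indexes.
            grid = [["" for _ in range(table["columns"])]
                    for _ in range(table["rows"])]
            for cell in table.get("cells", []):
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("text", "")
            grids.append(grid)
    return grids


def grid_to_records(grid: list) -> list:
    """Treat the first row as column headers (assumption) and map each data row to a dict."""
    if not grid:
        return []
    header, *data = grid
    return [dict(zip(header, row)) for row in data]


# Hypothetical sample: the cell values are placeholders.
SAMPLE = {
    "analyzeResult": {
        "pageResults": [{
            "page": 1,
            "tables": [{
                "rows": 2,
                "columns": 2,
                "cells": [
                    {"rowIndex": 0, "columnIndex": 0, "text": "BL Date"},
                    {"rowIndex": 0, "columnIndex": 1, "text": "Grade Name"},
                    {"rowIndex": 1, "columnIndex": 0, "text": "placeholder date"},
                    {"rowIndex": 1, "columnIndex": 1, "text": "placeholder grade"},
                ],
            }],
        }]
    }
}

if __name__ == "__main__":
    for grid in tables_to_grids(SAMPLE):
        print(grid_to_records(grid))
```

Each resulting dict uses the header text as its keys, so the keys can be matched against the SQL Server column names in the mapping step. Header text often needs cleaning (trailing colons, spaces) before it matches a column name.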