ShambhuRai-4099 asked YutongTie-MSFT edited

PDF Extractor

Hi Expert,

I want to use Form Recognizer to load data from PDF files through a pipeline. Are there any prerequisites, conditions, or criteria for using the data for curation and further transformation?

azure-form-recognizer

Hello @ShambhuRai-4099

Thanks for reaching out to us here. Are you asking about permission to use the results from the Azure Form Recognizer service commercially?

Regards,
Yutong


Hi Expert,
No. I just wanted to check the prerequisites and, once the data is loaded in blob storage or ADF, how we can manage the load or its parameters, or how it will work after scanning. Is there any life cycle example, or are there known challenges?

https://www.invoicesimple.com/wp-content/uploads/2018/05/InvoiceSimple-PDF-Template.pdf

YutongTie-MSFT answered YutongTie-MSFT edited

Hello @ShambhuRai-4099

Sure, I can walk you through a custom training example in Form Recognizer Studio so you can see the requirements, life cycle, and challenges.

There are some points you should know first:

  1. Language support: please check whether your target language is supported here: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support

  2. Supported document formats: please check whether your invoice meets the input requirements: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-model-overview#input-requirements

  3. Quickstart guidance: you can follow this guide to try the product quickly and see whether it fulfills your needs.
    https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/quickstarts/try-v3-form-recognizer-studio

For the life cycle, the documents you upload will not change or disappear from the blob container. After training, the results are saved as JSON files in the same container, as in the screenshot below.
194461-image.png

As for challenges, for now I feel Form Recognizer is a good fit for common scenarios. To my knowledge, some customers are affected by multipage tables not being supported at the moment.

Please check the information above and let me know if you have other concerns. I am glad to help.

Regards,
Yutong

-Please kindly accept the answer if you find it helpful, to help the community. Thanks a lot.


ShambhuRai-4099 answered

Any suggestions, please?

ShambhuRai-4099 answered YutongTie-MSFT commented

How does the data get loaded into Excel after extraction?


@ShambhuRai-4099

Do you mean you want to use an Excel file as input? This is currently not supported. The supported input document formats are as follows:

Supported file formats: JPEG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs are best to eliminate the possibility of error in character extraction and location.

Regards,
Yutong
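
In a pipeline, the format list above could be applied as a pre-flight check before files are sent for analysis. The following is only a minimal sketch: the extension set is my own mapping of the listed formats, and `is_supported_input` and `split_by_support` are hypothetical helper names, not part of any Form Recognizer SDK.

```python
from pathlib import Path

# Assumed mapping of the listed formats (JPEG, PNG, BMP, TIFF, PDF) to extensions.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".pdf"}


def is_supported_input(filename: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Partition file names into (supported, unsupported) lists, keeping order."""
    supported, unsupported = [], []
    for name in filenames:
        (supported if is_supported_input(name) else unsupported).append(name)
    return supported, unsupported
```

A check like this only looks at the file name, so an Excel file would be rejected early, but a corrupt PDF would still have to be caught by the service itself.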

ShambhuRai-4099 answered YutongTie-MSFT commented

Hi Expert,

I am talking about the output. How can we load it into blob storage, ADL, or Excel after extraction?


Hello @ShambhuRai-4099

I hope I understand correctly. If you are asking how the result is loaded into blob storage, it depends on the feature you are using. For most of them, you get the result as JSON, as shown below, and can download it directly.

Take the General Document feature (https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-general-document) as an example: after analyzing, you can see the result in the right panel and download it directly. In case you are not familiar with Form Recognizer Studio, I made a screenshot guide for you.

194493-321.png

Training your own custom model is different: the OCR and label JSON files are written directly to your blob storage, as mentioned above. I highly recommend trying these features. Let me know which one you are interested in and I can share more details about it. Below are all the models we have:
194502-image.png

More references you may be interested in:
https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-model-overview#model-overview
I hope this helps.

Regards,
Yutong


ShambhuRai-4099 answered ShambhuRai-4099 published

Hi Expert

In the blob structure, I am connecting to an Azure database, and when I map the JSON I get the status, page, and bounding-box text columns. I want the table column headers so that I can map them to SQL Server table columns.

Here is my JSON output:

{"status":"succeeded","createdDateTime":"2022-04-27T14:15:46Z","lastUpdatedDateTime":"2022-04-27T14:15:48Z","analyzeResult":{"version":"2.1.0","readResults":[{"page":1,"angle":0,"width":11.6806,"height":8.2639,"unit":"inch","lines":[{"boundingBox":[0.1977,0.8568,1.166,0.8619,1.166,1.0647,0.1977,1.0596],"text":"intertek","appearance":{"style":{"name":"other","confidence":0.878}},"words":[{"boundingBox":[0.1977,0.8619,1.1052,0.8619,1.1103,1.0697,0.2028,1.0647],"text":"intertek","confidence":0.986}]},{"boundingBox":[4.19,1.0151,5.5355,1.0151,5.5355,1.0806,4.19,1.0806],"text":"India CRUDE OIL QUALITY REPORT","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[4.19,1.0151,4.4838,1.0151,4.4838,1.0717,4.19,1.0717],"text":"India","confidence":1},{"boundingBox":[4.509,1.0152,4.7505,1.0152,4.7505,1.0717,4.509,1.0717],"text":"CRUDE","confidence":1},{"boundingBox":[4.7771,1.0151,4.8915,1.0151,4.8915,1.0717,4.7771,1.0717],"text":"OIL","confidence":1},{"boundingBox":[4.9153,1.0151,5.2293,1.0151,5.2293,1.0806,4.9153,1.0806],"text":"QUALITY","confidence":1},{"boundingBox":[5.2558,1.0151,5.5355,1.0151,5.5355,1.0717,5.2558,1.0717],"text":"REPORT","confidence":1}]},{"boundingBox":[0.2028,1.085,0.8821,1.085,0.8821,1.161,0.2079,1.161],"text":"Total Quality. 
Assured.","appearance":{"style":{"name":"other","confidence":0.878}},"words":[{"boundingBox":[0.2079,1.09,0.3447,1.09,0.3498,1.1661,0.2129,1.1661],"text":"Total","confidence":0.994},{"boundingBox":[0.3599,1.09,0.5932,1.09,0.5932,1.1661,0.365,1.1661],"text":"Quality.","confidence":0.991},{"boundingBox":[0.6084,1.09,0.8821,1.09,0.8821,1.161,0.6084,1.1661],"text":"Assured.","confidence":0.94}]},{"boundingBox":[0.4893,1.2427,0.6554,1.2427,0.6554,1.2984,0.4893,1.2984],"text":"Motel:","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[0.4893,1.2427,0.6554,1.2427,0.6554,1.2984,0.4893,1.2984],"text":"Motel:","confidence":1}]},{"boundingBox":[0.9796,1.2418,1.2381,1.2418,1.2381,1.2984,0.9796,1.2984],"text":"Tony","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[0.9796,1.2418,1.2381,1.2418,1.2381,1.2984,0.9796,1.2984],"text":"Tony","confidence":1}]},{"boundingBox":[4.1871,1.2386,5.4172,1.2386,5.4172,1.3128,4.1871,1.3128],"text":"Grade Name: Tony light crude oil","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[4.1871,1.2387,4.399,1.2387,4.399,1.2984,4.1871,1.2984],"text":"Grade","confidence":1},{"boundingBox":[4.4282,1.2426,4.6542,1.2426,4.6542,1.2984,4.4282,1.2984],"text":"Name:","confidence":1},{"boundingBox":[4.6847,1.2427,4.9071,1.2427,4.9071,1.3128,4.6847,1.3128],"text":"Tony","confidence":1},{"boundingBox":[4.9339,1.2386,5.0874,1.2386,5.0874,1.3128,4.9339,1.3128],"text":"light","confidence":1},{"boundingBox":[5.1128,1.2387,5.3097,1.2387,5.3097,1.2984,5.1128,1.2984],"text":"crude","confidence":1},{"boundingBox":[5.336,1.2386,5.4172,1.2386,5.4172,1.2984,5.336,1.2984],"text":"oil","confidence":1}]},{"boundingBox":[8.5307,1.2386,9.1631,1.2386,9.1631,1.3128,8.5307,1.3128],"text":"Month 
Reporting:","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[8.5307,1.2386,8.7644,1.2386,8.7644,1.2984,8.5307,1.2984],"text":"Month","confidence":1},{"boundingBox":[8.7956,1.24,9.1631,1.24,9.1631,1.3128,8.7956,1.3128],"text":"Reporting:","confidence":1}]},{"boundingBox":[9.5881,1.2418,9.8167,1.2418,9.8167,1.2984,9.5881,1.2984],"text":"Jan-22","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[9.5881,1.2418,9.8167,1.2418,9.8167,1.2984,9.5881,1.2984],"text":"Jan-22","confidence":1}]},{"boundingBox":[0.4393,1.3658,0.7068,1.3658,0.7068,1.4217,0.4393,1.4217]
How can we take only the column headers and data, and nothing else? Or can we remove the rest so that the file can be used properly? I just need the headers and data.

Expected output:
Motel:India Grade Name:India light crude oil India QUALITY

BL Date
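
One possible way to keep only the text from JSON like the above is to walk `analyzeResult.readResults[].lines[].text` and ignore the bounding boxes, confidences, and style metadata. The Python below is a rough sketch, not an official method. `extract_lines` and `pair_labels` are hypothetical helper names, and the label/value pairing is only a heuristic based on lines such as "Motel:" followed by "Tony" in the posted output. It may well mis-pair lines on other layouts.

```python
def extract_lines(result: dict):
    """Return (page_number, text) tuples, dropping boxes and confidences."""
    rows = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            rows.append((page["page"], line["text"]))
    return rows


def pair_labels(texts):
    """Heuristic: 'Label:' takes the next line as its value;
    'Label: value' on one line is split at the first ': '."""
    pairs = {}
    i = 0
    while i < len(texts):
        text = texts[i].strip()
        if text.endswith(":") and i + 1 < len(texts):
            pairs[text[:-1].strip()] = texts[i + 1].strip()
            i += 2  # the next line was consumed as the value
        elif ": " in text:
            label, value = text.split(": ", 1)
            pairs[label.strip()] = value.strip()
            i += 1
        else:
            i += 1  # free text such as a logo or title; skipped
    return pairs
```

The resulting dictionary keys (for example "Motel" or "Grade Name") could then be mapped to SQL Server column names, with one row per document.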

ShambhuRai-4099 answered

Suggestion please
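
For the table headers and data specifically, v2.1 Layout results normally carry a `pageResults` section in which each table lists its `cells` with `rowIndex`, `columnIndex`, and `text`. The JSON posted above is cut off before that point, so whether this file contains `pageResults` is an assumption. The sketch below rebuilds each table as rows and treats the first row as the header, which is also an assumption. `tables_to_rows` and `to_records` are hypothetical helper names.

```python
def tables_to_rows(result: dict):
    """Rebuild every table as a list of rows, each row a list of cell text."""
    tables = []
    for page in result.get("analyzeResult", {}).get("pageResults", []):
        for table in page.get("tables", []):
            # Pre-fill the grid so cells missing from the output stay empty.
            grid = [[""] * table["columns"] for _ in range(table["rows"])]
            for cell in table["cells"]:
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            tables.append(grid)
    return tables


def to_records(grid):
    """Assume the first row holds column headers; return one dict per data row."""
    if not grid:
        return []
    header, *data = grid
    return [dict(zip(header, row)) for row in data]
```

Each record is then a header-to-value mapping (for example with a "BL Date" key), which can be mapped onto SQL Server table columns in the sink. Merged cells (`rowSpan`, `columnSpan`) are not handled here.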
