PDF Extractor

Shambhu Rai 1,406 Reputation points
2022-04-19T13:42:56.037+00:00

Hi Expert,

I want to use Form Recognizer to load data from PDF files through a pipeline. Are there any prerequisites, conditions, or criteria for using the data for curation and further transformation?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. YutongTie-MSFT 45,906 Reputation points
    2022-04-19T21:40:35.393+00:00

    Hello @Shambhu Rai

    Sure, I can walk you through a custom training example in Form Recognizer Studio so you can see the requirements, life cycle, and challenges.

    There are some points you should know beforehand:

    1. Language support: please check whether your target language is supported here: https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support
    2. Supported document formats: please check whether your invoice meets the format requirements: https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-model-overview#input-requirements
    3. Quickstart guidance: you can follow this guide to try the product quickly and see whether it fulfills your needs:
      https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/quickstarts/try-v3-form-recognizer-studio

    For the life cycle, the documents you upload will not be changed or removed from the blob. After training, the result is saved as a JSON file in the same blob, as shown in the screenshot below.
    [Screenshot: 194461-image.png]

    As for challenges, for now I feel Form Recognizer is a good fit for common scenarios. As far as I know, some customers are affected by the fact that multipage tables are not supported at the moment.

    Please check the information above and let me know if you have any other concerns. I am glad to help.

    Regards,
    Yutong

    -Please kindly accept the answer if you find it helpful, to help the community. Thanks a lot.


5 additional answers

  1. Shambhu Rai 1,406 Reputation points
    2022-04-19T16:55:09.23+00:00

    Any suggestions, please?


  2. Shambhu Rai 1,406 Reputation points
    2022-04-20T02:08:54.317+00:00

    How does the data get loaded into Excel after extraction?


  3. Shambhu Rai 1,406 Reputation points
    2022-04-20T02:20:30.267+00:00

    Hi Expert,

    I am talking about the output. How can we load it into blob storage, ADL, or Excel after extraction?
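
For readers with the same question, here is a minimal sketch of one way to get extracted rows into a form Excel can open: writing them out as CSV text with Python's standard `csv` module. The function name `rows_to_csv` and the sample rows are hypothetical and only illustrate the idea; uploading the resulting text to blob storage or ADL is a separate step that is not shown here.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (header row first) to CSV text.

    `rows` is assumed to be a list of lists of strings, for example the
    header and data rows pulled out of a Form Recognizer result.
    """
    buffer = io.StringIO()
    # An explicit line terminator keeps the output predictable across platforms.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample rows, not taken from a real extraction result.
sample_rows = [
    ["BL Date", "Grade Name"],
    ["Jan-22", "Tony light crude oil"],
]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

The CSV text can be saved to a `.csv` file and opened in Excel, or passed on to whatever storage or database step the pipeline uses.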


  4. Shambhu Rai 1,406 Reputation points
    2022-04-27T15:26:12.72+00:00

    Hi Expert

    In the blob structure, I am connecting to an Azure database, and when I map the JSON I get the status, page, bounding box, and text columns. I want the table column headers so that I can map them to SQL Server table columns.

    Here is my JSON output:

    {"status":"succeeded","createdDateTime":"2022-04-27T14:15:46Z","lastUpdatedDateTime":"2022-04-27T14:15:48Z","analyzeResult":{"version":"2.1.0","readResults":[{"page":1,"angle":0,"width":11.6806,"height":8.2639,"unit":"inch","lines":[{"boundingBox":[0.1977,0.8568,1.166,0.8619,1.166,1.0647,0.1977,1.0596],"text":"intertek","appearance":{"style":{"name":"other","confidence":0.878}},"words":[{"boundingBox":[0.1977,0.8619,1.1052,0.8619,1.1103,1.0697,0.2028,1.0647],"text":"intertek","confidence":0.986}]},{"boundingBox":[4.19,1.0151,5.5355,1.0151,5.5355,1.0806,4.19,1.0806],"text":"India CRUDE OIL QUALITY REPORT","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[4.19,1.0151,4.4838,1.0151,4.4838,1.0717,4.19,1.0717],"text":"India","confidence":1},{"boundingBox":[4.509,1.0152,4.7505,1.0152,4.7505,1.0717,4.509,1.0717],"text":"CRUDE","confidence":1},{"boundingBox":[4.7771,1.0151,4.8915,1.0151,4.8915,1.0717,4.7771,1.0717],"text":"OIL","confidence":1},{"boundingBox":[4.9153,1.0151,5.2293,1.0151,5.2293,1.0806,4.9153,1.0806],"text":"QUALITY","confidence":1},{"boundingBox":[5.2558,1.0151,5.5355,1.0151,5.5355,1.0717,5.2558,1.0717],"text":"REPORT","confidence":1}]},{"boundingBox":[0.2028,1.085,0.8821,1.085,0.8821,1.161,0.2079,1.161],"text":"Total Quality. 
Assured.","appearance":{"style":{"name":"other","confidence":0.878}},"words":[{"boundingBox":[0.2079,1.09,0.3447,1.09,0.3498,1.1661,0.2129,1.1661],"text":"Total","confidence":0.994},{"boundingBox":[0.3599,1.09,0.5932,1.09,0.5932,1.1661,0.365,1.1661],"text":"Quality.","confidence":0.991},{"boundingBox":[0.6084,1.09,0.8821,1.09,0.8821,1.161,0.6084,1.1661],"text":"Assured.","confidence":0.94}]},{"boundingBox":[0.4893,1.2427,0.6554,1.2427,0.6554,1.2984,0.4893,1.2984],"text":"Motel:","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[0.4893,1.2427,0.6554,1.2427,0.6554,1.2984,0.4893,1.2984],"text":"Motel:","confidence":1}]},{"boundingBox":[0.9796,1.2418,1.2381,1.2418,1.2381,1.2984,0.9796,1.2984],"text":"Tony","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[0.9796,1.2418,1.2381,1.2418,1.2381,1.2984,0.9796,1.2984],"text":"Tony","confidence":1}]},{"boundingBox":[4.1871,1.2386,5.4172,1.2386,5.4172,1.3128,4.1871,1.3128],"text":"Grade Name: Tony light crude oil","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[4.1871,1.2387,4.399,1.2387,4.399,1.2984,4.1871,1.2984],"text":"Grade","confidence":1},{"boundingBox":[4.4282,1.2426,4.6542,1.2426,4.6542,1.2984,4.4282,1.2984],"text":"Name:","confidence":1},{"boundingBox":[4.6847,1.2427,4.9071,1.2427,4.9071,1.3128,4.6847,1.3128],"text":"Tony","confidence":1},{"boundingBox":[4.9339,1.2386,5.0874,1.2386,5.0874,1.3128,4.9339,1.3128],"text":"light","confidence":1},{"boundingBox":[5.1128,1.2387,5.3097,1.2387,5.3097,1.2984,5.1128,1.2984],"text":"crude","confidence":1},{"boundingBox":[5.336,1.2386,5.4172,1.2386,5.4172,1.2984,5.336,1.2984],"text":"oil","confidence":1}]},{"boundingBox":[8.5307,1.2386,9.1631,1.2386,9.1631,1.3128,8.5307,1.3128],"text":"Month 
Reporting:","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[8.5307,1.2386,8.7644,1.2386,8.7644,1.2984,8.5307,1.2984],"text":"Month","confidence":1},{"boundingBox":[8.7956,1.24,9.1631,1.24,9.1631,1.3128,8.7956,1.3128],"text":"Reporting:","confidence":1}]},{"boundingBox":[9.5881,1.2418,9.8167,1.2418,9.8167,1.2984,9.5881,1.2984],"text":"Jan-22","appearance":{"style":{"name":"other","confidence":1}},"words":[{"boundingBox":[9.5881,1.2418,9.8167,1.2418,9.8167,1.2984,9.5881,1.2984],"text":"Jan-22","confidence":1}]},{"boundingBox":[0.4393,1.3658,0.7068,1.3658,0.7068,1.4217,0.4393,1.4217]
    How can we take only the column headers and data and nothing else? Or maybe we can remove the rest so that the file can be used properly. I just need the headers and data.

    Expected output:
    Motel:India Grade Name:India light crude oil India QUALITY

    BL Date
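
As a rough illustration of the question above, the sketch below shows one way to keep only table headers and data from a v2.1 Layout result. It assumes the full JSON contains an `analyzeResult.pageResults` section whose `tables` entries list `cells` with `rowIndex`, `columnIndex`, and `text`; the JSON pasted in this thread is truncated before that point, so this is an assumption about the rest of the file. The function name, the sample payload, and the choice to treat the first row as the header are all hypothetical.

```python
def tables_from_analyze_result(result):
    """Return every table as a list of rows, dropping bounding boxes and words.

    Each row is a list of cell texts. Row 0 is assumed to hold the column
    headers, which may not be true for every document.
    """
    tables = []
    page_results = result.get("analyzeResult", {}).get("pageResults", [])
    for page in page_results:
        for table in page.get("tables", []):
            # Start with an empty grid, then place each cell by its indices.
            grid = [[""] * table["columns"] for _ in range(table["rows"])]
            for cell in table.get("cells", []):
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("text", "")
            tables.append(grid)
    return tables


# Hypothetical, hand-written payload in the assumed v2.1 shape.
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "version": "2.1.0",
        "pageResults": [{
            "page": 1,
            "tables": [{
                "rows": 2,
                "columns": 2,
                "cells": [
                    {"rowIndex": 0, "columnIndex": 0, "text": "BL Date"},
                    {"rowIndex": 0, "columnIndex": 1, "text": "Grade Name"},
                    {"rowIndex": 1, "columnIndex": 0, "text": "Jan-22"},
                    {"rowIndex": 1, "columnIndex": 1, "text": "Tony light crude oil"},
                ],
            }],
        }],
    },
}

for table in tables_from_analyze_result(sample):
    header, *data_rows = table
    print(header, data_rows)
```

The header row can then be matched against the SQL Server column names, and the remaining rows inserted as data. A result that has only `readResults` and no tables would yield an empty list here.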
