Document Intelligence for a different language

Prajwal Jayarama Gowda 0 Reputation points
2024-04-29T11:55:07.59+00:00

Hi,

I am trying to use Azure Document Intelligence to read an invoice and extract text from an image that is in Hindi. The service throws this error: "(NotSupportedLanguage) The requested operation is not supported in the language specified." Is there an alternative solution or service that I can use to extract the text?

Note: The Studio at https://documentintelligence.ai.azure.com/studio/read is able to detect the Hindi text, but when I call the service with my key and endpoint, it throws the error above.

Regards,

Prajwal Gowda

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer
  1. dupammi 7,135 Reputation points Microsoft Vendor
    2024-04-29T13:46:33.9+00:00

    Hi @Prajwal Jayarama Gowda

    Thank you for using the Microsoft Q&A forum.

    Below is the repro I ran on my end using the Python SDK. The code snippet is taken from the Studio itself.

    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential
    from azure.core.exceptions import HttpResponseError

    # Azure Document Intelligence (formerly Form Recognizer) endpoint and key
    endpoint = "YOUR_END_POINT"
    key = "YOUR_KEY"

    # Path to the local image file
    image_path = "C:/Users/Downloads/hindi.png"


    def format_polygon(polygon):
        """Render the corner points of a region as "[x, y], [x, y], ..."."""
        if not polygon:
            return "N/A"
        return ", ".join("[{}, {}]".format(p.x, p.y) for p in polygon)


    def analyze_read():
        try:
            document_analysis_client = DocumentAnalysisClient(
                endpoint=endpoint, credential=AzureKeyCredential(key)
            )
            # No locale is passed, so the service auto-detects the language.
            with open(image_path, "rb") as image_file:
                poller = document_analysis_client.begin_analyze_document(
                    "prebuilt-read", image_file
                )
                result = poller.result()

            print("Document contains content: ", result.content)

            for style in result.styles or []:
                print(
                    "Document contains {} content".format(
                        "handwritten" if style.is_handwritten else "no handwritten"
                    )
                )

            for page in result.pages:
                print("----Analyzing Read from page #{}----".format(page.page_number))
                print(
                    "Page has width: {} and height: {}, measured with unit: {}".format(
                        page.width, page.height, page.unit
                    )
                )
                for line_idx, line in enumerate(page.lines):
                    # Released SDK versions expose `polygon`; early betas used
                    # `bounding_box`, so fall back to it if `polygon` is absent.
                    polygon = getattr(line, "polygon", None) or getattr(
                        line, "bounding_box", None
                    )
                    print(
                        "...Line # {} has text content '{}' within bounding polygon '{}'".format(
                            line_idx, line.content, format_polygon(polygon)
                        )
                    )
                for word in page.words:
                    print(
                        "...Word '{}' has a confidence of {}".format(
                            word.content, word.confidence
                        )
                    )
            print("----------------------------------------")
        except FileNotFoundError:
            # Raised by open() when the local image path does not exist
            print("The specified file '{}' was not found.".format(image_path))
        except HttpResponseError as ex:
            # Service-side failures, such as an unsupported language, land here
            print("The service returned an error:", ex)
        except Exception as ex:
            print("An error occurred:", ex)


    if __name__ == "__main__":
        analyze_read()
    

    Output: User's image

    The error you mentioned may have come from a different context or a different code snippet. With the Studio-provided code and the output above, the error does not occur.

    If you continue to encounter the error, please verify that you are using the correct endpoint and key for the Azure Document Intelligence service and that you are passing the image correctly for analysis. Also check whether you are specifying a language when calling the service, since an unsupported language code can trigger this error.
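
    As a rough illustration of that last check, the sketch below retries a call without the language code when the service rejects it. It assumes a hypothetical `analyze` callable that wraps your own service call and accepts an optional `locale` keyword; `fake_analyze` is only a stand-in so the snippet runs without credentials, not the real SDK.

```python
def analyze_with_locale_fallback(analyze, locale=None):
    """Call analyze(locale=...) and retry without a locale if it is rejected.

    `analyze` is a hypothetical callable wrapping the real service call.
    """
    kwargs = {"locale": locale} if locale else {}
    try:
        return analyze(**kwargs)
    except Exception as ex:
        # The service reports the problem as "(NotSupportedLanguage) ..."
        if locale and "NotSupportedLanguage" in str(ex):
            # Retry and let the service auto-detect the language
            return analyze()
        raise


def fake_analyze(**kwargs):
    """Stand-in for the service: rejects any explicit locale."""
    if "locale" in kwargs:
        raise RuntimeError(
            "(NotSupportedLanguage) The requested operation is not "
            "supported in the language specified."
        )
    return "text detected"


print(analyze_with_locale_fallback(fake_analyze, locale="hi"))
```

    With the real client, `analyze` could be a small wrapper around `begin_analyze_document("prebuilt-read", image_bytes, **kwargs).result()`. Pass the image as bytes rather than an open file handle in that case, since a stream would already be consumed by the first attempt.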

    Note: The language code is optional

    • Document Intelligence's deep learning based universal models extract all multi-lingual text in your documents, including text lines with mixed languages, and don't require specifying a language code.
    • Don't provide the language code as the parameter unless you are sure of the language and want to force the service to apply only the relevant model. Otherwise, the service may return incomplete and incorrect text.
    • Also, it's not necessary to specify a locale. This is an optional parameter. The Document Intelligence deep-learning technology will auto-detect the text language in your image.
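
    If you are calling the REST API directly with your key and endpoint instead of the SDK, the same advice means leaving the `locale` query parameter off the analyze request. Here is a minimal sketch of building that URL. The `formrecognizer/documentModels/...` path and the `2023-07-31` API version are assumptions based on the v3.1 GA REST API, and `build_read_analyze_url` is a hypothetical helper name, so adjust them to the version your resource uses.

```python
from urllib.parse import urlencode


def build_read_analyze_url(endpoint, api_version="2023-07-31", locale=None):
    """Build the prebuilt-read analyze URL, adding `locale` only if given."""
    query = {"api-version": api_version}
    if locale:
        # Only force a language when you are sure of it
        query["locale"] = locale
    return "{}/formrecognizer/documentModels/prebuilt-read:analyze?{}".format(
        endpoint.rstrip("/"), urlencode(query)
    )


# Default: no locale, so the service auto-detects the language
print(build_read_analyze_url("https://YOUR_RESOURCE.cognitiveservices.azure.com/"))
```

    If your current request URL contains a `locale=` parameter, removing it is the first thing to try.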

    For more details on language support for printed and handwritten text with the prebuilt read and layout models, please refer to the Document Intelligence language support documentation. I hope this information helps you debug the issue further on your end.

    Thank you.
