ShambhuRai-4099 asked romungi-MSFT commented

Model id change

Hi Expert,

I am trying to use a model ID in my Form Recognizer Python script. Currently I authenticate with the endpoint URL and API key, and I want to export the extracted data. How can I use a model ID here? Here is my code:



 from django.shortcuts import render
 import os
 from django.http import HttpResponse
 import csv
 import re
 from azure.core.credentials import AzureKeyCredential
 from azure.ai.formrecognizer import FormRecognizerClient
 from azure.storage.blob import BlobClient
    
    
 # Create your views here.
    
 def download_blob(blob_name, output_path):
     """
     Download
     :param blob_name:
     :param output_path:
     :return:
     """
     _, filename = os.path.split(blob_name)
     destination_file = os.path.join(output_path, filename)
    
     blob_client = BlobClient.from_connection_string(
         conn_str='DefaultEndpointsProtocol=https;AccountName=demoretail;AccountKey=jSZtsbMoGpmViFuWtTXDwEJEktIs24oUAIPSz9tSiZ25zCPe0mFRWC6V0gvlZCcGU0HcxCTdV1GsAl5vMwnanA==;EndpointSuffix=core.windows.net',
         container_name='demo',
         blob_name=blob_name
     )
     with open(destination_file, "wb") as my_blob:
         blob_data = blob_client.download_blob()
         blob_data.readinto(my_blob)
    
     return destination_file
    
    
 def recognize_form_tables(form_path):
     endpoint = "https://Test1.cognitiveservices.azure.com/"
     credential = AzureKeyCredential("<key>")
     form_recognizer_client = FormRecognizerClient(endpoint, credential)
    
     with open(form_path, "rb") as fd:
         form = fd.read()
    
     os.remove(form_path)
    
     response = form_recognizer_client.begin_recognize_content(form)
     form_pages = response.result()
    
     tables = []
     table_label_data = []
     # Raw strings keep regex escapes such as \s from being read as string escapes.
     port_regex = r'^col1:(.*)'
     header_regex = r'.*col1:(.*)Area Name:(.*)Month Reporting:\s*([A-Za-z]{3}-[0-9]{2}).*'
    
     table_index = -1
     for content in form_pages:
         for table in content.tables:
             tables.append(table)
    
         table_header = ''
         i = 0
         flag = False
         for line_idx, line in enumerate(content.lines):
             port_line = re.findall(port_regex, line.text)
             if port_line:
                 table_index += 1
                 i = 0
                 flag = True
    
             if flag and i < 10 :
                 table_header += line.text + ' '
    
             if i == 10:
                 header_match = re.match(header_regex, table_header)
                 if header_match:
                     gr = header_match.groups()
                     table_label_data.append([gr[0], gr[1], gr[2]])
                 table_header = ''
                 flag = False
    
             i += 1
    
     return tables, table_label_data
    
    
 def create_csv(table, path):
     with open(path, 'a', newline='') as f:
         writer = csv.writer(f)
         for row in table:
             if len(row) < 10 or not row[3]:
                 continue
             writer.writerow(row)
    
    
 def create_csv_data(tables, table_label_data):
     count = 0
     for t in tables:
         count += 1
         table_data = []
         row_index = -1
         for cell in t.cells:
             cell = cell.to_dict()
    
             if count > 1 and 'is_header' in cell and cell['is_header']:
                 continue
             elif cell['row_index'] == row_index or (count > 1 and cell['row_index'] == row_index + 1):
                 table_data[row_index].append(cell['text'])
             else:
                 row_index += 1
                 if 'is_header' in cell and cell['is_header']:
                     table_data.append(['Port', 'Area Name', 'Month Reporting'])
                 else:
                     table_data.append([])
                     if len(table_label_data) > count:
                         table_data[row_index] = table_label_data[count - 1] + table_data[row_index]
                 table_data[row_index].append(cell['text'])
    
         create_csv(table_data, 'table.csv')
     print('Created or updated table.csv file.')
    
    
 def index(request):
     form_path = download_blob('Test.pdf', '')
     tables, table_label_data = recognize_form_tables(form_path)
     print('form recognize success')
     create_csv_data(tables, table_label_data)
     with open('table.csv', newline='') as in_file:
         with open('Test.csv', 'w', newline='') as out_file:
             writer = csv.writer(out_file)
             for row in csv.reader(in_file):
                 if row:
                     writer.writerow(row)
     return HttpResponse("Load Succeeded")
azure-form-recognizer

@ShambhuRai-4099 The method form_recognizer_client.begin_recognize_content() recognizes text, selection marks, and table structures, along with their bounding box coordinates, but it does not accept a model ID, so it cannot extract fields from custom forms. To pass your custom model ID, switch to form_recognizer_client.begin_recognize_custom_forms(model_id, form, include_field_elements).

A ready-to-use script is available in the Azure SDK for Python repo. Update the endpoint, key, and model_id to analyze your form. You can also switch the method in your own script, but make sure to process the result according to the response format for custom forms. I hope this helps!
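A minimal sketch of that switch, assuming azure-ai-formrecognizer 3.1.x as used in the question. The names recognize_custom_form and fields_to_dict are illustrative helpers, not part of the SDK, and the endpoint, key, and model ID must be your own. The SDK import is kept inside the function so the flattening helper can be tried without the package installed; the stand-in fields at the bottom are made-up values that only mimic the .value attribute of the SDK's field objects.

```python
from types import SimpleNamespace


def fields_to_dict(fields):
    """Flatten a recognized form's fields mapping into {label: value}."""
    return {name: field.value for name, field in fields.items()}


def recognize_custom_form(endpoint, key, model_id, form_path):
    """Analyze one local file with a custom model; returns one dict per form."""
    # Imported here so fields_to_dict stays usable without the SDK installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import FormRecognizerClient

    client = FormRecognizerClient(endpoint, AzureKeyCredential(key))
    with open(form_path, "rb") as fd:
        # Unlike begin_recognize_content, this call takes the custom model ID.
        poller = client.begin_recognize_custom_forms(model_id=model_id, form=fd)
        recognized_forms = poller.result()
    return [fields_to_dict(form.fields) for form in recognized_forms]


# Offline check with stand-in field objects (hypothetical values).
fake_fields = {
    "Port": SimpleNamespace(value="P1", confidence=0.9),
    "Area Name": SimpleNamespace(value="North", confidence=0.8),
}
row = fields_to_dict(fake_fields)
```

Each returned dictionary maps a trained label to its extracted value, which is a convenient shape for writing one spreadsheet row per form.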



Okay, but I want to export the results to an Excel sheet. Could you point me to a script that stores the data in an Excel sheet in tabular format, with the labels as columns?


No utility script is available for this scenario. However, the labeling tool v2.1 has an option to download the result as CSV.

200976-image.png

I think it is easier to take the JSON result from the REST API or the tool and convert it to CSV with a third-party tool. There was a similar request to convert the client result to JSON for Text Analytics, and the solution provided in this thread worked for that user. If you want to automate this, I would start by downloading the result as JSON and then convert it to CSV based on the structure of the result, since the custom form result can vary with the labels used during training and identified during prediction.
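As a rough illustration of the conversion step, here is a sketch that writes label/value dictionaries to CSV with the labels as column headers, using only the standard library. It assumes each form has already been reduced to a plain {label: value} dictionary; the function name write_rows_as_csv and the sample rows are hypothetical. Because forms may not all contain the same labels, the header is the union of all labels and missing values are left blank.

```python
import csv
import io


def write_rows_as_csv(rows, out_file):
    """Write a list of {label: value} dicts as CSV, one column per label."""
    # Collect every label in first-seen order so the columns are stable.
    header = []
    for row in rows:
        for label in row:
            if label not in header:
                header.append(label)

    writer = csv.DictWriter(out_file, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return header


# Hypothetical rows; the second form is missing the "Area Name" label.
rows = [
    {"Port": "P1", "Area Name": "North", "Month Reporting": "Jan-21"},
    {"Port": "P2", "Month Reporting": "Feb-21"},
]
buffer = io.StringIO()
write_rows_as_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of a buffer, pass a file opened with open(path, "w", newline=""). Excel can open the resulting CSV file directly.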



I am talking about the script's input and output.

In the script you linked, can you tell me where to specify the file paths for import and export?

https://github.com/Azure/azure-sdk-for-python/blob/azure-ai-formrecognizer_3.1.2/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_recognize_custom_forms.py


with open(path_to_sample_forms, "rb") as f:


The script uses the form Form_1.jpg from the sample_forms directory.
You can provide the path to your own form here instead.

 path_to_sample_forms = os.path.abspath(os.path.join(os.path.abspath(__file__),
                                                     "..", "./sample_forms/forms/Form_1.jpg"))
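For example, the sample's path can be replaced with the location of your own file, and an export path can be derived from it. This is only a sketch: the forms and exports directory names and the csv_path_for helper are hypothetical, and the sample script itself does not write any output file, so the export step is something you would add yourself.

```python
from pathlib import Path


def csv_path_for(form_path, output_dir):
    """Build an output CSV path that reuses the input form's base name."""
    return Path(output_dir) / (Path(form_path).stem + ".csv")


# Input: point this at your own form instead of sample_forms/forms/Form_1.jpg.
path_to_form = Path("forms") / "Test.pdf"

# Output: where the exported table would be written.
path_to_csv = csv_path_for(path_to_form, "exports")
```

The input path is then what you pass to open(..., "rb") in the sample, in place of path_to_sample_forms.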



0 Answers