Quickstart: Form Recognizer client library for .NET

Get started with the Form Recognizer client library for .NET. Form Recognizer is a Cognitive Service that uses machine learning technology to identify and extract key/value pairs and table data from form documents. It then outputs structured data that includes the relationships in the original file. Follow these steps to install the SDK package and try out the example code for basic tasks.

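Use the Form Recognizer client library for .NET to:

  • Train a custom model
  • Get a list of extracted keys
  • Analyze forms with a custom model
  • Get a list of custom models
  • Delete a custom model
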
Reference documentation | Library source code | Package (NuGet)

Prerequisites

Setting up

Create a Form Recognizer Azure resource

When you're granted access to use Form Recognizer, you'll receive a Welcome email with several links and resources. Use the "Azure portal" link in that message to open the Azure portal and create a Form Recognizer resource. In the Create pane, provide the following information:

  • Name: A descriptive name for your resource, for example MyNameFormRecognizer.
  • Subscription: Select the Azure subscription that has been granted access.
  • Location: The location of your Cognitive Services instance. Different locations may introduce latency, but have no impact on the runtime availability of your resource.
  • Pricing tier: The cost of your resource depends on the pricing tier you choose and your usage. For more information, see the API pricing details.
  • Resource group: The Azure resource group that will contain your resource. You can create a new group or add the resource to an existing group.

Important

Normally when you create a Cognitive Service resource in the Azure portal, you have the option to create a multi-service subscription key (used across multiple cognitive services) or a single-service subscription key (used only with a specific cognitive service). However, because Form Recognizer is a preview release, it is not included in the multi-service subscription, and you cannot create the single-service subscription unless you use the link provided in the Welcome email.

When your Form Recognizer resource finishes deploying, find and select it from the All resources list in the portal. Then select the Keys tab to view your subscription keys. Either key will give your app access to the resource. Copy the value of KEY 1.

After you get a key from your trial subscription or resource, create an environment variable for the key, named FORM_RECOGNIZER_KEY.

Create a new C# application

In a console window (such as cmd, PowerShell, or Bash), use the dotnet new command to create a new console app with the name formrecognizer-quickstart. This command creates a simple "Hello World" C# project with a single source file: Program.cs.

dotnet new console -n formrecognizer-quickstart

Change your directory to the newly created app folder. Then build the application with:

dotnet build

The build output should contain no warnings or errors.

...
Build succeeded.
 0 Warning(s)
 0 Error(s)
...

From the project directory, open the Program.cs file in your preferred editor or IDE. Add the following using statements:

using Microsoft.Azure.CognitiveServices.FormRecognizer;
using Microsoft.Azure.CognitiveServices.FormRecognizer.Models;

using System;
using System.IO;
using System.Threading.Tasks;

Then replace the application's Main method with the following code. It calls an asynchronous task, RunFormRecognizerClient, which you'll define later on.

static void Main(string[] args)
{
    var t1 = RunFormRecognizerClient();

    Task.WaitAll(t1);
}
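
Main blocks on the task with Task.WaitAll, which wraps any exception the task throws in an AggregateException. The sketch below uses a simulated failure and hypothetical names (WaitDemo, FailAsync) that are not part of the quickstart. It compares Task.WaitAll with GetAwaiter().GetResult(), which rethrows the original exception instead. Treat it as an illustration of the difference, not as a required change to the quickstart.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo class; it does not call the Form Recognizer service.
public static class WaitDemo
{
    // Simulates an asynchronous task that fails after yielding once.
    private static async Task FailAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("simulated failure");
    }

    // Blocks with Task.WaitAll, as the quickstart's Main does, and reports
    // the type of exception that reaches the caller.
    public static string DescribeWaitAll()
    {
        try
        {
            Task.WaitAll(FailAsync());
            return "no exception";
        }
        catch (Exception e)
        {
            return e.GetType().Name;
        }
    }

    // Blocks with GetAwaiter().GetResult() and reports the exception type.
    public static string DescribeGetResult()
    {
        try
        {
            FailAsync().GetAwaiter().GetResult();
            return "no exception";
        }
        catch (Exception e)
        {
            return e.GetType().Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Task.WaitAll surfaces: " + DescribeWaitAll());
        Console.WriteLine("GetAwaiter().GetResult() surfaces: " + DescribeGetResult());
    }
}
```

If Main keeps Task.WaitAll, the underlying error is available through the InnerExceptions property of the AggregateException.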

Install the client library

Within the application directory, install the Form Recognizer client library for .NET with the following command:

dotnet add package Microsoft.Azure.CognitiveServices.FormRecognizer --version 0.8.0-preview

If you're using the Visual Studio IDE, the client library is available as a downloadable NuGet package.

Object model

The following classes handle the main functionality of the Form Recognizer SDK.

Name | Description
FormRecognizerClient | This class is needed for all Form Recognizer functionality. You instantiate it with your subscription information, and you use it to produce instances of other classes.
TrainRequest | You use this class to train a custom Form Recognizer model using your own training input data.
TrainResult | This class delivers the results of a custom model Train operation, including the model ID, which you can then use to analyze forms.
AnalyzeResult | This class delivers the results of a custom model Analyze operation. It includes a list of ExtractedPage instances.
ExtractedPage | This class represents all of the data extracted from a single form document.

Code examples

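These code snippets show you how to do the following tasks with the Form Recognizer client library for .NET:

  • Authenticate the client
  • Train a custom model
  • Get a list of extracted keys
  • Analyze forms with a custom model
  • Get a list of custom models
  • Delete a custom model
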
Define variables

Before you define any methods, add the following variable definitions to the top of your Program class. You'll need to fill in some of the variables yourself.

  • You can find your service's Endpoint value in the Overview section in the Azure portal.
  • To retrieve the SAS URL for your training data, open the Microsoft Azure Storage Explorer, right-click your container, and select Get shared access signature. Make sure the Read and List permissions are checked, and click Create. Then copy the value in the URL section. It should have the form: https://<storage account>.blob.core.windows.net/<container name>?<SAS value>.
  • If you need a sample form to analyze, you can use one of the files under the Test folder of the sample data set. This guide only uses PDF forms.

// Add your Azure Form Recognizer subscription key and endpoint to your environment variables.
private static string subscriptionKey = Environment.GetEnvironmentVariable("FORM_RECOGNIZER_KEY");
private static string formRecognizerEndpoint = Environment.GetEnvironmentVariable("FORM_RECOGNIZER_ENDPOINT");

// SAS URL to the Azure Blob Storage container; this is used for training the custom model.
// For help using SAS, see:
// https://docs.microsoft.com/en-us/azure/storage/common/storage-dotnet-shared-access-signature-part-1
private const string trainingDataUrl = "<AzureBlobSaS>";

// Local path to a form to be analyzed.
// Any of the supported file formats (pdf, jpg, or png) can be used with a trained model.
// For example:
//  pdf file  : "c:\documents\invoice.pdf"
//  jpeg file : "c:\documents\invoice.jpg"
//  png file  : "c:\documents\invoice.png"
private const string pdfFormFile = @"<pdfFormFileLocalPath>";
private const string jpgFormFile = @"<jpgFormFileLocalPath>";
private const string pngFormFile = @"<pngFormFileLocalPath>";
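
Because these values come from environment variables and hand-edited placeholders, a missing or malformed value only shows up later as a service error. The following is a minimal sketch of a hypothetical helper (QuickstartConfigCheck is not part of the SDK) that checks the three values up front. It assumes the SAS URL carries its permissions in the standard sp query parameter, where r means Read and l means List. The sample values in Main are made up.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the Form Recognizer SDK. Each method
// returns null when the value looks usable, or a short problem description.
public static class QuickstartConfigCheck
{
    public static string CheckKey(string key)
    {
        return string.IsNullOrWhiteSpace(key)
            ? "Subscription key is missing; is the environment variable set?"
            : null;
    }

    public static string CheckEndpoint(string endpoint)
    {
        return Uri.TryCreate(endpoint, UriKind.Absolute, out Uri uri)
               && uri.Scheme == Uri.UriSchemeHttps
            ? null
            : "Endpoint must be an absolute https URL.";
    }

    // Checks that the SAS URL has an "sp" (signed permissions) value
    // granting both Read (r) and List (l).
    public static string CheckSasUrl(string sasUrl)
    {
        if (!Uri.TryCreate(sasUrl, UriKind.Absolute, out Uri uri))
            return "SAS URL is not a well-formed absolute URL.";

        string permissions = uri.Query.TrimStart('?')
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(part => part.Split(new[] { '=' }, 2))
            .Where(pair => pair.Length == 2 && pair[0] == "sp")
            .Select(pair => pair[1])
            .FirstOrDefault();

        if (permissions == null)
            return "SAS URL has no 'sp' (permissions) parameter.";
        if (!permissions.Contains("r") || !permissions.Contains("l"))
            return "SAS URL must grant Read (r) and List (l) permissions.";
        return null;
    }

    public static void Main()
    {
        // Made-up sample values for illustration only.
        string[] problems =
        {
            CheckKey(""),
            CheckEndpoint("https://example.cognitiveservices.azure.com/"),
            CheckSasUrl("https://account.blob.core.windows.net/forms?sv=2019-02-02&sp=r&sig=abc")
        };

        foreach (string problem in problems.Where(p => p != null))
            Console.WriteLine("Config problem: " + problem);
    }
}
```

Calling checks like these at the start of RunFormRecognizerClient would let the app stop with a clear message before it contacts the service.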

Authenticate the client

Below the Main method, define the task that is referenced in Main. Here, you'll authenticate the client object using the subscription variables you defined above. You'll define the other methods later on.

static async Task RunFormRecognizerClient()
{ 
    // Create form client object with Form Recognizer subscription key
    IFormRecognizerClient formClient = new FormRecognizerClient(
        new ApiKeyServiceClientCredentials(subscriptionKey))
    {
        Endpoint = formRecognizerEndpoint
    };

    Console.WriteLine("Train Model with training data...");
    Guid modelId = await TrainModelAsync(formClient, trainingDataUrl);

    Console.WriteLine("Get list of extracted keys...");
    await GetListOfExtractedKeys(formClient, modelId);

    // Choose any of the following three Analyze tasks:

    Console.WriteLine("Analyze PDF form...");
    await AnalyzePdfForm(formClient, modelId, pdfFormFile);

    //Console.WriteLine("Analyze JPEG form...");
    //await AnalyzeJpgForm(formClient, modelId, jpgFormFile);

    //Console.WriteLine("Analyze PNG form...");
    //await AnalyzePngForm(formClient, modelId, pngFormFile);


    Console.WriteLine("Get list of trained models ...");
    await GetListOfModels(formClient);

    Console.WriteLine("Delete Model...");
    await DeleteModel(formClient, modelId);
}

Train a custom model

The following method uses your Form Recognizer client object to train a new recognition model on the documents stored in your Azure blob container. It uses a helper method to display information about the newly trained model (represented by a ModelResult object), and it returns the model ID.

// Train model using training form data (pdf, jpg, png files)
private static async Task<Guid> TrainModelAsync(
    IFormRecognizerClient formClient, string trainingDataUrl)
{
    if (!Uri.IsWellFormedUriString(trainingDataUrl, UriKind.Absolute))
    {
        Console.WriteLine("\nInvalid trainingDataUrl:\n{0} \n", trainingDataUrl);
        return Guid.Empty;
    }

    try
    {
        TrainResult result = await formClient.TrainCustomModelAsync(new TrainRequest(trainingDataUrl));

        ModelResult model = await formClient.GetCustomModelAsync(result.ModelId);
        DisplayModelStatus(model);

        return result.ModelId;
    }
    catch (ErrorResponseException e)
    {
        Console.WriteLine("Train Model : " + e.Message);
        return Guid.Empty;
    }
}

The following helper method displays information about a Form Recognizer model.

// Display model status
private static void DisplayModelStatus(ModelResult model)
{
    Console.WriteLine("\nModel :");
    Console.WriteLine("\tModel id: " + model.ModelId);
    Console.WriteLine("\tStatus:  " + model.Status);
    Console.WriteLine("\tCreated: " + model.CreatedDateTime);
    Console.WriteLine("\tUpdated: " + model.LastUpdatedDateTime);
}

Get a list of extracted keys

Once training is completed, the custom model will keep a list of keys that it has extracted from the training documents. It expects future form documents to contain these keys, and it will extract their corresponding values in the Analyze operation. Use the following method to retrieve the list of extracted keys and print it to the console. This is a good way to verify that the training process was effective.

// Get and display list of extracted keys for training data 
// provided to train the model
private static async Task GetListOfExtractedKeys(
    IFormRecognizerClient formClient, Guid modelId)
{
    if (modelId == Guid.Empty)
    {
        Console.WriteLine("\nInvalid model Id.");
        return;
    }

    try
    {
        KeysResult kr = await formClient.GetExtractedKeysAsync(modelId);
        var clusters = kr.Clusters;
        foreach (var kvp in clusters)
        {
            Console.WriteLine("  Cluster: " + kvp.Key);
            foreach (var v in kvp.Value)
            {
                Console.WriteLine("\t" + v);
            }
        }
    }
    catch (ErrorResponseException e)
    {
        Console.WriteLine("Get list of extracted keys : " + e.Message);
    }
}
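
To turn the printed key list into an actual check of the training result, you could compare it against the field names you expect your forms to contain. The sketch below is a hypothetical helper (ExtractedKeyCheck is not part of the SDK). It assumes the clusters can be treated as a dictionary from cluster ID to a list of key strings, which matches how the method above iterates them. The sample keys are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Form Recognizer SDK.
public static class ExtractedKeyCheck
{
    // Returns the expected keys that appear in no cluster.
    // Comparison ignores case and surrounding whitespace.
    public static IList<string> FindMissingKeys(
        IDictionary<string, IList<string>> clusters,
        IEnumerable<string> expectedKeys)
    {
        var found = new HashSet<string>(
            clusters.Values.SelectMany(keys => keys).Select(k => k.Trim()),
            StringComparer.OrdinalIgnoreCase);

        return expectedKeys.Where(k => !found.Contains(k.Trim())).ToList();
    }

    public static void Main()
    {
        // Made-up cluster data standing in for a real training result.
        var clusters = new Dictionary<string, IList<string>>
        {
            { "0", new List<string> { "Invoice Number:", "Date:", "Total:" } },
            { "1", new List<string> { "Invoice Number:", "Due Date:" } }
        };

        var missing = FindMissingKeys(clusters, new[] { "Total:", "Tax:" });

        foreach (string key in missing)
            Console.WriteLine("Expected key not found in any cluster: " + key);
    }
}
```

A non-empty result suggests that the training documents did not contain the field, or that the model did not recognize it as a key.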

Analyze forms with a custom model

This method uses the Form Recognizer client and a model ID to analyze a PDF form document and extract key/value data. It uses a helper method to display the results (represented by an AnalyzeResult object).

Note

The following method analyzes a PDF form. For similar methods that analyze JPEG and PNG forms, see the full sample code on GitHub.

// Analyze PDF form data
private static async Task AnalyzePdfForm(
    IFormRecognizerClient formClient, Guid modelId, string pdfFormFile)
{
    if (string.IsNullOrEmpty(pdfFormFile))
    {
        Console.WriteLine("\nInvalid pdfFormFile.");
        return;
    }

    try
    {
        // Open the form read-only so that write-protected files can also be analyzed.
        using (FileStream stream = new FileStream(pdfFormFile, FileMode.Open, FileAccess.Read))
        {
            AnalyzeResult result = await formClient.AnalyzeWithCustomModelAsync(modelId, stream, contentType: "application/pdf");

            Console.WriteLine("\nExtracted data from:" + pdfFormFile);
            DisplayAnalyzeResult(result);
        }
    }
    catch (ErrorResponseException e)
    {
        Console.WriteLine("Analyze PDF form : " + e.Message);
    }
    catch (Exception ex)
    {
        Console.WriteLine("Analyze PDF form : " + ex.Message);
    }
}
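
The PDF method passes a fixed contentType value of "application/pdf". As a sketch of how one method could serve all three formats, a hypothetical helper (FormContentType is not part of the SDK) could pick the content type from the file extension. It assumes "image/jpeg" and "image/png" are the values to pass for JPEG and PNG forms.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Form Recognizer SDK.
public static class FormContentType
{
    // Maps a form file's extension to a content type string.
    // Returns null for unsupported or missing extensions.
    public static string FromPath(string path)
    {
        string extension = Path.GetExtension(path ?? string.Empty).ToLowerInvariant();
        switch (extension)
        {
            case ".pdf":
                return "application/pdf";
            case ".jpg":
            case ".jpeg":
                return "image/jpeg";
            case ".png":
                return "image/png";
            default:
                return null;
        }
    }

    public static void Main()
    {
        foreach (string file in new[] { "invoice.pdf", "invoice.JPG", "invoice.png", "invoice.txt" })
        {
            string contentType = FromPath(file) ?? "(unsupported)";
            Console.WriteLine(file + " -> " + contentType);
        }
    }
}
```

The returned string could then be supplied as the contentType argument in place of the hard-coded value, with a null result treated as an invalid input file.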

The following helper method displays information about an Analyze operation.

// Display analyze status
private static void DisplayAnalyzeResult(AnalyzeResult result)
{
    foreach (var page in result.Pages)
    {
        Console.WriteLine("\tPage#: " + page.Number);
        Console.WriteLine("\tCluster Id: " + page.ClusterId);
        foreach (var kv in page.KeyValuePairs)
        {
            if (kv.Key.Count > 0)
                Console.Write(kv.Key[0].Text);

            if (kv.Value.Count > 0)
                Console.Write(" - " + kv.Value[0].Text);

            Console.WriteLine();
        }
        Console.WriteLine();

        foreach (var t in page.Tables)
        {
            Console.WriteLine("Table id: " + t.Id);
            foreach (var c in t.Columns)
            {
                foreach (var h in c.Header)
                    Console.Write(h.Text + "\t");

                foreach (var e in c.Entries)
                {
                    foreach (var ee in e)
                        Console.Write(ee.Text + "\t");
                }
                Console.WriteLine();
            }
            Console.WriteLine();
        }
    }
}
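
The display helper prints only the first text element of each key and value. If a key or value is split across several elements, a variation could join them all. The sketch below uses a stand-in TextToken type with a Text property in place of the SDK's element type, and the KeyValueFormatter name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the SDK's extracted text element; only Text is modeled here.
public sealed class TextToken
{
    public string Text { get; set; }
}

// Hypothetical helper, not part of the Form Recognizer SDK.
public static class KeyValueFormatter
{
    // Joins the text of every element with single spaces, skipping blanks.
    public static string JoinText(IEnumerable<TextToken> tokens)
    {
        return string.Join(" ", (tokens ?? Enumerable.Empty<TextToken>())
            .Select(t => t.Text)
            .Where(text => !string.IsNullOrWhiteSpace(text)));
    }

    // Formats a key/value pair in the same "key - value" style as the helper above.
    public static string FormatPair(IEnumerable<TextToken> key, IEnumerable<TextToken> value)
    {
        return JoinText(key) + " - " + JoinText(value);
    }

    public static void Main()
    {
        // Made-up sample data for illustration.
        var key = new List<TextToken>
        {
            new TextToken { Text = "Invoice" },
            new TextToken { Text = "Number:" }
        };
        var value = new List<TextToken> { new TextToken { Text = "12345" } };

        Console.WriteLine(FormatPair(key, value)); // prints "Invoice Number: - 12345"
    }
}
```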

Get a list of custom models

You can return a list of all the trained models that belong to your account, and you can retrieve information about when they were created. The list of models is represented by a ModelsResult object.

// Get and display the list of trained models
private static async Task GetListOfModels(
    IFormRecognizerClient formClient)
{
    try
    {
        ModelsResult models = await formClient.GetCustomModelsAsync();
        foreach (ModelResult m in models.ModelsProperty)
        {
            Console.WriteLine(m.ModelId + " " + m.Status + " " + m.CreatedDateTime + " " + m.LastUpdatedDateTime);
        }
        Console.WriteLine();
    }
    catch (ErrorResponseException e)
    {
        Console.WriteLine("Get list of models : " + e.Message);
    }
}

Delete a custom model

If you want to delete the custom model from your account, use the following method:

// Delete a model
private static async Task DeleteModel(
    IFormRecognizerClient formClient, Guid modelId)
{
    try
    {
        Console.Write("Deleting model: {0}...", modelId.ToString());
        await formClient.DeleteCustomModelAsync(modelId);
        Console.WriteLine("done.\n");
    }
    catch (ErrorResponseException e)
    {
        Console.WriteLine("Delete model : " + e.Message);
    }
}

Run the application

Run the application by calling the dotnet run command from your application directory.

dotnet run

Clean up resources

If you want to clean up and remove a Cognitive Services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.

Additionally, if you trained a custom model that you want to delete from your account, run the method in Delete a custom model.

Next steps

In this quickstart, you used the Form Recognizer .NET client library to train a custom model and analyze forms. Next, learn tips to create a better training data set and produce more accurate models.