Form Recognizer (Preview)

Extracts information from forms and images into structured data, based on a model created from a set of representative training forms.

This connector is available in the following products and regions:

Service Class Regions
Logic Apps Standard All Logic Apps regions except the following:
     -   Azure Government regions
     -   Azure China regions
Flow Standard All Flow regions except the following:
     -   US Government (GCC)
PowerApps Standard All PowerApps regions except the following:
     -   US Government (GCC)

Creating a connection

To connect your account, you will need the following information:

Name Type Description
Account Key securestring

Cognitive Services Account Key

Site URL string

Endpoint URL (example: https://westeurope.api.cognitive.microsoft.com). If not specified, the URL defaults to 'https://westeurope.api.cognitive.microsoft.com'.
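
The defaulting rule for Site URL can be sketched as follows. This is a minimal illustration, not the connector's implementation: `resolve_endpoint` and `build_headers` are hypothetical helper names, and the `Ocp-Apim-Subscription-Key` header is assumed here because it is the header Cognitive Services REST endpoints conventionally accept for an account key.

```python
DEFAULT_ENDPOINT = "https://westeurope.api.cognitive.microsoft.com"


def resolve_endpoint(site_url=None):
    """Return the endpoint to call, falling back to the documented default."""
    if site_url is None or not site_url.strip():
        return DEFAULT_ENDPOINT
    # Drop a trailing slash so request paths can be appended consistently.
    return site_url.strip().rstrip("/")


def build_headers(account_key):
    """Build request headers carrying the account key (header name is an assumption)."""
    if not account_key:
        raise ValueError("An account key is required")
    return {"Ocp-Apim-Subscription-Key": account_key}
```

Treating a blank Site URL the same as a missing one is a design choice of this sketch; the connector's exact handling of whitespace is not documented here.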

Throttling Limits

Name Calls Renewal Period
API calls per connection 100 60 seconds
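
A caller that wants to stay under this limit could pace its own requests. The sketch below is one possible client-side approach, assuming the limit behaves like a sliding window of 100 calls per 60 seconds; how the service actually counts and renews calls is not specified here, and `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._stamps = deque()  # timestamps of calls still inside the window

    def try_acquire(self):
        """Record a call and return True, or return False if the window is full."""
        now = self.clock()
        # Discard calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            return False
        self._stamps.append(now)
        return True
```

A caller would check `try_acquire()` before each action and wait or back off when it returns `False`. The injectable `clock` exists only to make the limiter easy to test.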

Actions

Analyze Form

The document to analyze must be of a supported content type: 'application/pdf', 'image/jpeg', or 'image/png'. The response contains the information extracted from the analyzed form as well as information about content that was not extracted, along with a reason.

Delete Model

Delete model artifacts.

Get Keys

Use the API to retrieve the keys that were extracted by the specified model.

Get Model

Get information about a model.

Get Models

Get information about all trained models.

Train Model

The train request must include a source parameter that is either an externally accessible Azure Storage blob container URI (preferably a Shared Access Signature URI) or a valid path to a data folder in a locally mounted drive. Local paths must follow the Linux/Unix path format and be absolute paths rooted at the input mount configuration setting value. For example, if the '{Mounts:Input}' configuration setting value is '/input', then a valid source path would be '/input/contosodataset'. All training data is expected to be under the source. Models are trained using documents of the following content types: 'application/pdf', 'image/jpeg', and 'image/png'. Other content is ignored when training a model.

Analyze Form

The document to analyze must be of a supported content type: 'application/pdf', 'image/jpeg', or 'image/png'. The response contains the information extracted from the analyzed form as well as information about content that was not extracted, along with a reason.

Parameters

Name Key Required Type Description
Model ID
modelId True string

Identifier of the model used to analyze the document.

Keys to extract
Keys string

An optional list of known keys to extract the values for.

Document
Document True binary

A PDF document or image (JPG or PNG) file to analyze.

Content type
Content-type string

Content type of the document to analyze.

Returns

Analyze API call result.
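
Because only three content types are accepted, it can help to derive the Content type parameter from the file name before calling the action and to reject anything else early. The following is a minimal sketch; `content_type_for` is a hypothetical helper, and mapping by file extension is an assumption of this example rather than something the connector does.

```python
import os

# The three content types named in the action description.
SUPPORTED_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def content_type_for(filename):
    """Return the supported content type for a file name, or raise ValueError."""
    extension = os.path.splitext(filename)[1].lower()
    try:
        return SUPPORTED_CONTENT_TYPES[extension]
    except KeyError:
        raise ValueError(f"Unsupported document type: {filename!r}") from None
```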

Delete Model

Delete model artifacts.

Parameters

Name Key Required Type Description
Model ID
modelId True string

The identifier of the model to delete.

Get Keys

Use the API to retrieve the keys that were extracted by the specified model.

Parameters

Name Key Required Type Description
Model ID
modelId True string

Model identifier.

Returns

Result of an operation to get the keys extracted by a model.

Body
KeysResult

Get Model

Get information about a model.

Parameters

Name Key Required Type Description
Model ID
modelId True string

Identifier of the model to get information about.

Returns

Result of a model status query operation.

Get Models

Get information about all trained models.

Returns

Result of query operation to fetch multiple models.

Train Model

The train request must include a source parameter that is either an externally accessible Azure Storage blob container URI (preferably a Shared Access Signature URI) or a valid path to a data folder in a locally mounted drive. Local paths must follow the Linux/Unix path format and be absolute paths rooted at the input mount configuration setting value. For example, if the '{Mounts:Input}' configuration setting value is '/input', then a valid source path would be '/input/contosodataset'. All training data is expected to be under the source. Models are trained using documents of the following content types: 'application/pdf', 'image/jpeg', and 'image/png'. Other content is ignored when training a model.

Parameters

Name Key Required Type Description
source
source True string

Get or set source path.

Returns

Response of the Train API call.
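
The two accepted forms of the source parameter can be checked before a train request is sent. This is a rough sketch under stated assumptions: `is_valid_source` is a hypothetical helper, the URI check only confirms an http(s) scheme and a host (it does not verify that the container exists or that a Shared Access Signature is present), and the default mount value '/input' is taken from the example in the description.

```python
import posixpath
from urllib.parse import urlparse


def is_valid_source(source, input_mount="/input"):
    """Check a train `source` against the two documented forms."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Blob container URI: only require that a host is present.
        return bool(parsed.netloc)
    # Local path: must be absolute, Linux/Unix style, and under the input mount.
    if "\\" in source or not posixpath.isabs(source):
        return False
    normalized = posixpath.normpath(source)
    mount = posixpath.normpath(input_mount)
    return normalized == mount or normalized.startswith(mount.rstrip("/") + "/")
```

Normalizing the path first means a value such as '/input/../etc' is rejected, since it resolves outside the mount.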

Definitions

AnalyzeResult

Analyze API call result.

Name Path Type Description
errors
errors array of FormOperationError

List of errors reported during the analyze operation.

pages
pages array of ExtractedPage

Page level information extracted in the analyzed document.

status
status string

Status of the analyze operation.

ExtractedKeyValuePair

Representation of a key-value pair as a list of key and value tokens.

Name Path Type Description
key
key array of ExtractedToken

List of tokens for the extracted key in a key-value pair.

value
value array of ExtractedToken

List of tokens for the extracted value in a key-value pair.
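
Since both the key and the value arrive as token lists, a consumer usually joins them back into strings. A minimal sketch, assuming the result has been parsed into Python dictionaries and that joining token text with single spaces is acceptable for the caller; the sample data is invented for illustration.

```python
def tokens_to_text(tokens):
    """Join the `text` field of each token with single spaces."""
    return " ".join(token["text"] for token in tokens)


def pair_to_text(pair):
    """Turn one ExtractedKeyValuePair into a (key, value) tuple of strings."""
    return tokens_to_text(pair["key"]), tokens_to_text(pair["value"])


# Hypothetical sample shaped like the definition above.
sample_pair = {
    "key": [{"text": "Invoice"}, {"text": "Number:"}],
    "value": [{"text": "12345"}],
}
```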

ExtractedPage

Extraction information of a single page in a document.

Name Path Type Description
clusterId
clusterId integer

Cluster identifier.

height
height integer

Height of the page (in pixels).

keyValuePairs
keyValuePairs array of ExtractedKeyValuePair

List of Key-Value pairs extracted from the page.

number
number integer

Page number.

tables
tables array of ExtractedTable

List of Tables and their information extracted from the page.

width
width integer

Width of the page (in pixels).

ExtractedTable

Extraction information about a table contained in a page.

Name Path Type Description
columns
columns array of ExtractedTableColumn

List of columns contained in the table.

id
id string

Table identifier.

ExtractedTableColumn

Extraction information of a column in a table.

Name Path Type Description
entries
entries array of array

Extracted text for each cell of a column. Each cell in the column can have a list of one or more tokens.

items
entries array of ExtractedToken
header
header array of ExtractedToken

List of extracted tokens for the column header.
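
Tables are returned column by column, so a consumer that wants rows has to transpose them. The sketch below assumes the table has been parsed into Python dictionaries shaped like ExtractedTable and ExtractedTableColumn; `table_to_rows` is a hypothetical helper, and padding short columns with empty strings is a choice made for this example.

```python
from itertools import zip_longest


def cell_text(tokens):
    """Join the text of the tokens that make up one cell or header."""
    return " ".join(token["text"] for token in tokens)


def table_to_rows(table):
    """Return (headers, rows) for an ExtractedTable-shaped dict."""
    columns = table["columns"]
    headers = [cell_text(col["header"]) for col in columns]
    column_cells = [[cell_text(cell) for cell in col["entries"]] for col in columns]
    # Transpose columns into rows; pad short columns with empty strings.
    rows = [list(row) for row in zip_longest(*column_cells, fillvalue="")]
    return headers, rows
```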

ExtractedToken

Canonical representation of a single piece of extracted text.

Name Path Type Description
boundingBox
boundingBox array of double

Bounding box of the extracted text. Represents the location of the extracted text as four pairs of Cartesian coordinates. The coordinate pairs are ordered as the top-left, top-right, bottom-right, and bottom-left corners of the box, with the origin at the bottom-left of the page.

confidence
confidence double

A measure of accuracy of the extracted text.

text
text string

String value of the extracted text.
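
Many imaging libraries place the origin at the top-left of the page, whereas the bounding box here is measured from the bottom-left. The following sketch converts a box into a top-left-origin rectangle. It assumes that boundingBox is a flat list of eight numbers (x and y for each of the four corners, in the documented order) and that the page height is expressed in the same units; `to_top_left_rect` is a hypothetical helper.

```python
def to_top_left_rect(bounding_box, page_height):
    """Convert a bottom-left-origin box to (left, top, right, bottom), top-left origin."""
    if len(bounding_box) != 8:
        raise ValueError("Expected 8 numbers (4 coordinate pairs)")
    xs = bounding_box[0::2]
    ys = bounding_box[1::2]
    left, right = min(xs), max(xs)
    # Flip the y axis: a larger y in the source is closer to the top of the page.
    top = page_height - max(ys)
    bottom = page_height - min(ys)
    return left, top, right, bottom
```

Taking the minimum and maximum of each axis yields an axis-aligned rectangle, which is a simplification if the extracted text is rotated.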

FormDocumentReport

Name Path Type Description
documentName
documentName string

Reference to the data that the report is for.

errors
errors array of string

List of errors per page.

pages
pages integer

Total number of pages trained on.

status
status string

Status of the training operation.

FormOperationError

Error reported during an operation.

Name Path Type Description
errorMessage
errorMessage string

Message reported during the operation.

KeysResult

Result of an operation to get the keys extracted by a model.

Name Path Type Description
clusters
clusters object

Object mapping ClusterIds to Key lists.
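
The clusters object can be combined with the clusterId of an extracted page to look up which keys the model knows for that page layout. A minimal sketch, assuming the result has been parsed from JSON (so cluster identifiers appear as string property names) and using invented sample data; `keys_for_page` and `all_keys` are hypothetical helpers.

```python
def keys_for_page(keys_result, page):
    """Return the key list for the cluster a page was assigned to."""
    clusters = keys_result.get("clusters", {})
    # JSON object property names are strings, while clusterId is an integer.
    return clusters.get(str(page["clusterId"]), [])


def all_keys(keys_result):
    """Return the distinct keys across all clusters, in first-seen order."""
    seen = {}
    clusters = keys_result.get("clusters", {})
    for cluster_id in sorted(clusters):
        for key in clusters[cluster_id]:
            seen.setdefault(key, cluster_id)
    return list(seen)


# Hypothetical sample shaped like KeysResult.
sample_keys = {"clusters": {"0": ["Invoice Number", "Total"], "1": ["Total", "Due Date"]}}
```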

ModelResult

Result of a model status query operation.

Name Path Type Description
createdDateTime
createdDateTime date-time

Get or set the created date time of the model.

lastUpdatedDateTime
lastUpdatedDateTime date-time

Get or set the date and time the model was last updated.

modelId
modelId uuid

Get or set model identifier.

status
status string

Get or set the status of the model.
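
A ModelResult can be converted into typed values so that identifiers and timestamps are validated on receipt. The sketch below assumes the result has been parsed from JSON and that the date-time fields are ISO 8601 strings; the `Model` class and `parse_model` function are hypothetical, and the status value used in testing is a placeholder because the set of possible statuses is not listed here.

```python
from dataclasses import dataclass
from datetime import datetime
from uuid import UUID


@dataclass(frozen=True)
class Model:
    model_id: UUID
    status: str
    created: datetime
    last_updated: datetime


def _parse_timestamp(value):
    # Older Python versions reject a trailing "Z" in datetime.fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_model(result):
    """Build a Model from a ModelResult-shaped dict, validating each field."""
    return Model(
        model_id=UUID(result["modelId"]),
        status=result["status"],
        created=_parse_timestamp(result["createdDateTime"]),
        last_updated=_parse_timestamp(result["lastUpdatedDateTime"]),
    )
```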

ModelsResult

Result of query operation to fetch multiple models.

Name Path Type Description
models
models array of ModelResult

Collection of models.

TrainResult

Response of the Train API call.

Name Path Type Description
errors
errors array of FormOperationError

Errors returned during the training operation.

modelId
modelId uuid

Identifier of the model.

trainingDocuments
trainingDocuments array of FormDocumentReport

List of documents used to train the model and the train operation error reported by each.
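
After training, the per-document reports can be summarized to find which documents reported errors and how many pages were used. A minimal sketch, assuming the TrainResult has been parsed into Python dictionaries; the helper names and the sample values are invented for illustration.

```python
def failed_documents(train_result):
    """Map document name to its error list, for documents that reported errors."""
    return {
        doc["documentName"]: doc["errors"]
        for doc in train_result.get("trainingDocuments", [])
        if doc.get("errors")
    }


def total_pages(train_result):
    """Sum the page counts reported across all training documents."""
    return sum(doc.get("pages", 0) for doc in train_result.get("trainingDocuments", []))
```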