Form Recognizer (Preview)

Extracts information from forms and images into structured data, based on a model created from a set of representative training forms.
This connector is available in the following products and regions:
Service | Class | Regions |
---|---|---|
Logic Apps | Standard | All Logic Apps regions except the following: - Azure Government regions - Azure China regions |
Flow | Standard | All Flow regions except the following: - US Government (GCC) |
PowerApps | Standard | All PowerApps regions except the following: - US Government (GCC) |
Creating a connection
To connect your account, you will need the following information:
Name | Type | Description |
---|---|---|
Account Key | securestring | Cognitive Services Account Key |
Site URL | string | Endpoint URL (Example: https://westeurope.api.cognitive.microsoft.com). If not specified, the URL defaults to 'https://westeurope.api.cognitive.microsoft.com'. |
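The default behaviour of the Site URL setting can be sketched as a small helper. This is only an illustration of the rule stated in the table: the function name `resolve_site_url` is hypothetical, and stripping a trailing slash is an added convenience, not documented connector behaviour.

```python
from typing import Optional

# Default endpoint, taken from the connection table above.
DEFAULT_SITE_URL = "https://westeurope.api.cognitive.microsoft.com"


def resolve_site_url(site_url: Optional[str] = None) -> str:
    """Return the endpoint to use, falling back to the documented default.

    Hypothetical helper: the connector performs this defaulting itself.
    """
    if site_url is None or not site_url.strip():
        return DEFAULT_SITE_URL
    # Assumption: drop a trailing slash so paths can be appended consistently.
    return site_url.strip().rstrip("/")


endpoint = resolve_site_url()  # no Site URL given, so the default is used
```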
Throttling Limits
Name | Calls | Renewal Period |
---|---|---|
API calls per connection | 100 | 60 seconds |
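A caller that wants to stay under this limit could track its own calls. The sketch below assumes a sliding 60-second window; the table does not say whether the connector uses a sliding or fixed window, and the class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period` seconds."""

    def __init__(self, max_calls: int = 100, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if the limit is reached."""
        now = self.clock()
        # Forget calls that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            return False
        self._stamps.append(now)
        return True


limiter = SlidingWindowLimiter()  # defaults mirror the table: 100 calls per 60 seconds
allowed = limiter.try_acquire()   # True while the connection is under the limit
```

When `try_acquire` returns False, the caller would wait or queue the request rather than invoke the action.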
Actions
Action | Description |
---|---|
Analyze Form | The document to analyze must be of a supported content type: 'application/pdf', 'image/jpeg' or 'image/png'. The response contains the information extracted from the analyzed form as well as information about content that was not extracted, along with a reason. |
Delete Model | Delete model artifacts. |
Get Keys | Use the API to retrieve the keys that were extracted by the specified model. |
Get Model | Get information about a model. |
Get Models | Get information about all trained models. |
Train Model | The train request must include a source parameter that is either an externally accessible Azure Storage blob container URI (preferably a Shared Access Signature URI) or a valid path to a data folder in a locally mounted drive. When local paths are specified, they must follow the Linux/Unix path format and be an absolute path rooted to the input mount configuration setting value. For example, if the '{Mounts:Input}' configuration setting value is '/input', then a valid source path would be '/input/contosodataset'. All training data is expected to be under the source. Models are trained using documents of the following content types: 'application/pdf', 'image/jpeg' and 'image/png'. Other content is ignored when training a model. |
Analyze Form
The document to analyze must be of a supported content type: 'application/pdf', 'image/jpeg' or 'image/png'. The response contains the information extracted from the analyzed form as well as information about content that was not extracted, along with a reason.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Model ID | modelId | True | string | This is your Model Identifier that is used to analyze the document. |
Keys to extract | Keys | | string | An optional list of known keys to extract the values for. |
Document | Document | True | binary | A PDF document or image (JPG or PNG) file to analyze. |
Content type | Content-type | | string | Content type of the document to analyze. |
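One way to choose the Content type value is to derive it from the file name, as in the sketch below. The helper `content_type_for` is hypothetical, and mapping by file extension is an assumption; the extension does not guarantee what the file actually contains.

```python
from pathlib import Path

# Supported content types listed for Analyze Form, keyed by file extension.
SUPPORTED_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def content_type_for(filename: str) -> str:
    """Return the content type for a supported file, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"unsupported document type: {filename!r}") from None


pdf_type = content_type_for("invoice.PDF")  # extension match is case-insensitive
```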
Returns
Analyze API call result.
- Body
- AnalyzeResult
Delete Model
Delete model artifacts.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Model ID | modelId | True | string | The identifier of the model to delete. |
Get Keys
Use the API to retrieve the keys that were extracted by the specified model.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Model ID | modelId | True | string | Model identifier. |
Returns
Result of an operation to get the keys extracted by a model.
- Body
- KeysResult
Get Model
Get information about a model.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Model ID | modelId | True | string | This is your Model Identifier that is used to analyze your documents. |
Returns
Result of a model status query operation.
- Body
- ModelResult
Get Models
Get information about all trained models.
Returns
Result of query operation to fetch multiple models.
- Body
- ModelsResult
Train Model
The train request must include a source parameter that is either an externally accessible Azure Storage blob container URI (preferably a Shared Access Signature URI) or a valid path to a data folder in a locally mounted drive. When local paths are specified, they must follow the Linux/Unix path format and be an absolute path rooted to the input mount configuration setting value. For example, if the '{Mounts:Input}' configuration setting value is '/input', then a valid source path would be '/input/contosodataset'. All training data is expected to be under the source. Models are trained using documents of the following content types: 'application/pdf', 'image/jpeg' and 'image/png'. Other content is ignored when training a model.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
source | source | True | string | Get or set source path. |
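The rules for the source parameter can be checked before calling Train Model. The following is a rough sketch: both function names are hypothetical, the '/input' default mirrors the example mount value, and treating any http or https URL as a candidate blob container URI is an assumption (it does not verify that the container exists or is accessible).

```python
import posixpath
from urllib.parse import urlparse


def is_valid_local_source(source: str, input_mount: str = "/input") -> bool:
    """Check that `source` is an absolute Linux/Unix path rooted under `input_mount`."""
    if not posixpath.isabs(source):
        return False
    # Normalize so that '..' segments cannot escape the mount.
    normalized = posixpath.normpath(source)
    mount = posixpath.normpath(input_mount)
    return normalized == mount or normalized.startswith(mount.rstrip("/") + "/")


def looks_like_blob_uri(source: str) -> bool:
    """Assumption: a blob container URI is an http(s) URL with a host name."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


ok = is_valid_local_source("/input/contosodataset")  # the example path from the text
```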
Returns
Response of the Train API call.
- Body
- TrainResult
Definitions
AnalyzeResult
Analyze API call result.
Name | Path | Type | Description |
---|---|---|---|
errors | errors | array of FormOperationError | List of errors reported during the analyze operation. |
pages | pages | array of ExtractedPage | Page level information extracted in the analyzed document. |
status | status | string | Status of the analyze operation. |
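To show how these definitions nest, here is a minimal sketch that flattens an AnalyzeResult into plain dictionaries. It assumes the response has already been parsed from JSON into Python dicts and lists, and that joining token text with spaces is acceptable. The function names and the sample values are made up for illustration.

```python
from typing import Any, Dict, List


def join_tokens(tokens: List[Dict[str, Any]]) -> str:
    """Join the `text` of each ExtractedToken with single spaces."""
    return " ".join(token["text"] for token in tokens)


def key_values_by_page(result: Dict[str, Any]) -> Dict[int, Dict[str, str]]:
    """Map each page number to a {key text: value text} dictionary."""
    pages: Dict[int, Dict[str, str]] = {}
    for page in result.get("pages", []):
        pairs: Dict[str, str] = {}
        for pair in page.get("keyValuePairs", []):
            pairs[join_tokens(pair["key"])] = join_tokens(pair["value"])
        pages[page["number"]] = pairs
    return pages


# Hand-written sample containing only the fields the code reads.
sample = {
    "errors": [],
    "pages": [
        {
            "number": 1,
            "keyValuePairs": [
                {
                    "key": [{"text": "Invoice"}, {"text": "No:"}],
                    "value": [{"text": "12345"}],
                }
            ],
        }
    ],
}

flattened = key_values_by_page(sample)
```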
ExtractedKeyValuePair
Representation of a key-value pair as a list of key and value tokens.
Name | Path | Type | Description |
---|---|---|---|
key | key | array of ExtractedToken | List of tokens for the extracted key in a key-value pair. |
value | value | array of ExtractedToken | List of tokens for the extracted value in a key-value pair. |
ExtractedPage
Extraction information of a single page in a document.
Name | Path | Type | Description |
---|---|---|---|
clusterId | clusterId | integer | Cluster identifier. |
height | height | integer | Height of the page (in pixels). |
keyValuePairs | keyValuePairs | array of ExtractedKeyValuePair | List of Key-Value pairs extracted from the page. |
number | number | integer | Page number. |
tables | tables | array of ExtractedTable | List of Tables and their information extracted from the page. |
width | width | integer | Width of the page (in pixels). |
ExtractedTable
Extraction information about a table contained in a page.
Name | Path | Type | Description |
---|---|---|---|
columns | columns | array of ExtractedTableColumn | List of columns contained in the table. |
id | id | string | Table identifier. |
ExtractedTableColumn
Extraction information of a column in a table.
Name | Path | Type | Description |
---|---|---|---|
entries | entries | array of array | Extracted text for each cell of a column. Each cell in the column can have a list of one or more tokens. |
items | entries | array of ExtractedToken | |
header | header | array of ExtractedToken | List of extracted tokens for the column header. |
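Because tables are returned column by column, a consumer often needs to pivot them into rows. The sketch below assumes an ExtractedTable already parsed into Python dicts and lists. The function names are hypothetical, and padding short columns with empty strings is a design choice, not documented behaviour.

```python
from itertools import zip_longest
from typing import Any, Dict, List


def join_tokens(tokens: List[Dict[str, Any]]) -> str:
    """Join the `text` of each ExtractedToken with single spaces."""
    return " ".join(token["text"] for token in tokens)


def table_to_rows(table: Dict[str, Any]) -> List[List[str]]:
    """Return the table as rows of strings, with the header row first."""
    columns = table.get("columns", [])
    header = [join_tokens(col.get("header", [])) for col in columns]
    cells = [[join_tokens(cell) for cell in col.get("entries", [])] for col in columns]
    # Columns may differ in length; pad the short ones with empty strings.
    body = [list(row) for row in zip_longest(*cells, fillvalue="")]
    return [header] + body


# Hand-written sample; the values are made up.
sample_table = {
    "id": "table_0",
    "columns": [
        {"header": [{"text": "Item"}],
         "entries": [[{"text": "Widget"}], [{"text": "Gadget"}]]},
        {"header": [{"text": "Qty"}],
         "entries": [[{"text": "2"}]]},
    ],
}

rows = table_to_rows(sample_table)
```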
ExtractedToken
Canonical representation of a single piece of extracted text.
Name | Path | Type | Description |
---|---|---|---|
boundingBox | boundingBox | array of double | Bounding box of the extracted text. Represents the location of the extracted text as four pairs of Cartesian coordinates, ordered top-left, top-right, bottom-right and bottom-left, with the origin at the bottom-left of the page. |
confidence | confidence | double | A measure of accuracy of the extracted text. |
text | text | string | String value of the extracted text. |
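The bounding box can be unpacked into named corners. This sketch assumes the array is a flat list of eight numbers with x and y interleaved (x1, y1, x2, y2, ...), which the definition implies but does not state outright. It also assumes the coordinates use the same units as the page height when flipping to a top-left origin. All names here are hypothetical.

```python
from typing import Dict, NamedTuple, Sequence


class Point(NamedTuple):
    x: float
    y: float


# Corner order as described for boundingBox.
CORNERS = ("top_left", "top_right", "bottom_right", "bottom_left")


def corners(bounding_box: Sequence[float]) -> Dict[str, Point]:
    """Split a flat 8-number bounding box into four named corner points."""
    if len(bounding_box) != 8:
        raise ValueError("expected 8 numbers (4 coordinate pairs)")
    points = [Point(bounding_box[i], bounding_box[i + 1]) for i in range(0, 8, 2)]
    return dict(zip(CORNERS, points))


def to_top_left_origin(point: Point, page_height: float) -> Point:
    """Flip the y axis so the origin is the top-left of the page."""
    return Point(point.x, page_height - point.y)


# Made-up box on a page that is 100 units tall.
box = corners([10.0, 90.0, 50.0, 90.0, 50.0, 80.0, 10.0, 80.0])
flipped = to_top_left_origin(box["top_left"], page_height=100.0)
```

Flipping the y axis is useful when drawing the boxes on an image, since most imaging libraries place the origin at the top-left.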
FormDocumentReport
Name | Path | Type | Description |
---|---|---|---|
documentName | documentName | string | Reference to the data that the report is for. |
errors | errors | array of string | List of errors per page. |
pages | pages | integer | Total number of pages trained on. |
status | status | string | Status of the training operation. |
FormOperationError
Error reported during an operation.
Name | Path | Type | Description |
---|---|---|---|
errorMessage | errorMessage | string | Message reported during the train operation. |
KeysResult
Result of an operation to get the keys extracted by a model.
Name | Path | Type | Description |
---|---|---|---|
clusters | clusters | object | Object mapping ClusterIds to Key lists. |
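As an illustration of the clusters mapping, the sketch below collects every distinct key across clusters. It assumes the cluster identifiers appear as string property names of the object and that each value is a list of key strings. The function name and sample data are hypothetical.

```python
from typing import Any, Dict, List


def all_keys(keys_result: Dict[str, Any]) -> List[str]:
    """Return every distinct key across clusters, in first-seen order."""
    clusters = keys_result.get("clusters", {})
    seen: Dict[str, None] = {}  # dict preserves insertion order
    for cluster_id in sorted(clusters):
        for key in clusters[cluster_id]:
            seen.setdefault(key, None)
    return list(seen)


# Hand-written sample; cluster ids and keys are made up.
sample_keys = {
    "clusters": {
        "0": ["Invoice No:", "Date:"],
        "1": ["Date:", "Total:"],
    }
}

distinct = all_keys(sample_keys)
```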
ModelResult
Result of a model status query operation.
Name | Path | Type | Description |
---|---|---|---|
createdDateTime | createdDateTime | date-time | Get or set the created date time of the model. |
lastUpdatedDateTime | lastUpdatedDateTime | date-time | Get or set the model last updated datetime. |
modelId | modelId | uuid | Get or set model identifier. |
status | status | string | Get or set the status of model. |
ModelsResult
Result of query operation to fetch multiple models.
Name | Path | Type | Description |
---|---|---|---|
models | models | array of ModelResult | Collection of models. |
TrainResult
Response of the Train API call.
Name | Path | Type | Description |
---|---|---|---|
errors | errors | array of FormOperationError | Errors returned during the training operation. |
modelId | modelId | uuid | Identifier of the model. |
trainingDocuments | trainingDocuments | array of FormDocumentReport | List of documents used to train the model and the train operation error reported by each. |
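A TrainResult can be summarized per document, for example to list the documents that reported errors. The sketch below assumes the response has been parsed into Python dicts and lists. The function names, the placeholder model identifier and the sample values are all made up, and the sample holds only the fields the code reads.

```python
from typing import Any, Dict, List


def failed_documents(train_result: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each document name that reported errors to its list of errors."""
    return {
        doc["documentName"]: list(doc["errors"])
        for doc in train_result.get("trainingDocuments", [])
        if doc.get("errors")
    }


def total_pages(train_result: Dict[str, Any]) -> int:
    """Sum the pages trained on across all training documents."""
    return sum(doc.get("pages", 0) for doc in train_result.get("trainingDocuments", []))


# Hand-written sample; the identifier and messages are placeholders.
sample_train = {
    "modelId": "00000000-0000-0000-0000-000000000000",
    "errors": [],
    "trainingDocuments": [
        {"documentName": "form1.pdf", "pages": 2, "errors": []},
        {"documentName": "form2.pdf", "pages": 1, "errors": ["example error"]},
    ],
}

failures = failed_documents(sample_train)
pages = total_pages(sample_train)
```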