Create your first model

The following procedures show you how to create a Document processing model in AI Builder. This guided experience walks through each step of the model creation process. You can save your work and return at any time. Progress will be saved automatically when you move between steps.

Sign in to AI Builder

Follow these steps to sign in to AI Builder:

  1. Go to Power Automate and sign in with your organizational account.

  2. In the left navigation pane, select AI Hub, and then select AI Models. If you don't see AI Hub, select More and find AI Hub in the pop-up menu.

  3. Select Extract custom information from documents.

  4. If you want to create your model by using your own documents, make sure that you have at least five examples that use the same layout. Otherwise, you can use the sample data that this guided experience uses. You can download the sample data in an English version or a Japanese version.

  5. Select Create custom model.

Choose document type

When you select the document type, you have three options:

  • Structured and semi-structured documents. In structured and semi-structured documents, the fields, tables, checkboxes, and other items for a given layout are in similar places. Examples of structured and semi-structured documents include invoices, purchase orders, delivery orders, and tax documents.

  • Unstructured and free-form documents. Unstructured documents have no set structure and usually contain a varying number of paragraphs. Examples of unstructured documents include contracts, statements of work, and letters.

  • Invoices. Invoice documents are standard accounts payable forms. This model type comes with standard fields, and you can teach the model to extract additional custom data or update the standard data. Examples of such documents include invoices and purchase orders.

Select Structured documents and select Next.

Screenshot of the AI Builder "Choose document type" page, where you choose between structured documents, unstructured documents, and invoices.

Choose information to extract

In this step, we define the fields and tables you want to teach your model how to extract.

The provided sample data, available in an English version and a Japanese version, consists of invoices from two different providers. We'll define the following fields to extract:

  • Invoice number

  • Customer ID

  • Total amount

  • Due date

  1. Select + Add and select Text field. Then select Next.

    Screenshot of the Power Automate "Choose information to extract" page adding four fields on the Text Field tab.

  2. Enter the Text field name Invoice number and select Done. Repeat steps 1 and 2 for Customer ID.

  3. Select + Add and select Number field (preview). Then select Next.

  4. Enter the Number field name Total amount and select Done.

  5. Select + Add and then select Date field (preview).

  6. Enter the Date field name Due date and select Done.

    The model learns how to extract these fields from a document.

    Screenshot of the Power Automate "Choose information to extract" page showing the four field names to be extracted from the document.

    We also want to extract the description and total amount for each line item present on the invoice. To do that, we define a table named Items with the columns Description and Item total.

  7. Select + Add, select Table, and then select Next.

  8. Enter Items as the table name. Next, we define two columns: Description and Item total.

  9. Select Column1 and then rename it Description. Select Confirm.

  10. Select + New column and enter the column name Item total. Then select Add. Finally, select Done.

  11. Select Next to continue to the next step in your model.
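For reference, the fields and table defined in the steps above can be written down as a small data structure. The following is a minimal Python sketch for illustration only: the `FieldType`, `Field`, and `Table` names and the `field_names_by_type` helper are hypothetical and aren't part of AI Builder or any Microsoft API. The sketch only mirrors what you entered in the user interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class FieldType(Enum):
    """The three field types used in this guided experience."""
    TEXT = "text"
    NUMBER = "number"
    DATE = "date"


@dataclass(frozen=True)
class Field:
    name: str
    type: FieldType


@dataclass(frozen=True)
class Table:
    name: str
    columns: Tuple[str, ...]


# Hypothetical, illustration-only description of what the steps configure.
FIELDS = (
    Field("Invoice number", FieldType.TEXT),
    Field("Customer ID", FieldType.TEXT),
    Field("Total amount", FieldType.NUMBER),
    Field("Due date", FieldType.DATE),
)
TABLES = (Table("Items", ("Description", "Item total")),)


def field_names_by_type(fields):
    """Group field names by field type, preserving definition order."""
    grouped = {}
    for f in fields:
        grouped.setdefault(f.type, []).append(f.name)
    return grouped


for field_type, names in field_names_by_type(FIELDS).items():
    print(field_type.value, names)
```

Writing the schema down like this can help you check that you defined every field with the intended type before you move on to uploading documents.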

Define collections and upload documents

A collection is a group of documents that share the same layout. Create one collection for each distinct document layout that you want your model to process. Because we have two invoice providers, and each provider uses a different invoice template, we define two collections.

  1. Select New collection and change the name of the first collection to Adatum.

  2. Add a second New collection and name the second collection Contoso.

    Now that we've created our two collections, we need to upload at least five samples for each collection.

    For the collection named Adatum, upload the five documents from the AI Builder Document processing Sample Data/Adatum/Train folder. You'll do the same with the Contoso training documents.

  3. Select the + icon in each collection, and add the five "Train" documents for each company to their respective collections.

  4. Once you've uploaded the sample documents to each collection, select Next to continue.
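The five-sample minimum per collection is easy to check before you upload. The following is a minimal Python sketch, assuming you keep a list of file names for each collection. The `collections_needing_samples` function and the file names are hypothetical placeholders, not part of AI Builder.

```python
MIN_SAMPLES = 5  # the tutorial asks for at least five samples per collection


def collections_needing_samples(collections, minimum=MIN_SAMPLES):
    """Return {collection name: number of additional documents needed}."""
    return {
        name: minimum - len(docs)
        for name, docs in collections.items()
        if len(docs) < minimum
    }


# Hypothetical file names, used only to illustrate the check.
collections = {
    "Adatum": [f"adatum_{i}.pdf" for i in range(1, 6)],    # five documents
    "Contoso": [f"contoso_{i}.pdf" for i in range(1, 4)],  # only three
}

print(collections_needing_samples(collections))  # prints {'Contoso': 2}
```

An empty result means that every collection meets the minimum.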

Tag documents

Now is the time to teach your AI model how to extract the fields and tables you've defined. Begin by tagging the sample documents you've uploaded. As you tag (or annotate) all of the expected fields in each document, you will see a check appear over that document and the red dot at the top corner will disappear.

To start the tagging process, select the Contoso collection on the right panel.

Tag fields

Let’s start by tagging our defined fields Invoice number, Due date, and Total amount. To tag a field, draw a rectangle around the field on the document and select the field name it corresponds to.

Screenshot of the Power Automate "Tag documents" page drawing a rectangle around a field.

You can resize the rectangle to adjust your selection at any time.

When you hover over different words in your documents, light blue boxes appear. The boxes indicate that you can draw a rectangle around those words to select a field.

Field or table not in document

Not every defined field or table needs to be present in every document. In the Contoso collection, you'll see that the Customer ID field isn't present. To tell the AI model that a field or table isn't present, go to it on the right panel, select the ellipsis (...) to its right, and then select Not available in the document.

Tag tables

To tag a table:

  1. In the document, draw a rectangle around the table that you want to extract, and then select the table name that it corresponds to.

    The content of the panel on the right changes.

  2. Draw rows by left-clicking between row separators.

  3. Draw columns by pressing Ctrl + left-click (or ⌘ + left-click on macOS).

  4. Once the rows and columns have been set, assign the headers to extract by selecting each header column and mapping it to the corresponding column that you defined.

  5. Review the preview of the table with the extracted data, which appears on the panel on the right.

  6. If the header of the table has been tagged, select Ignore first row so the header of the table isn't extracted as the table content.
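To build intuition for what the row and column separators do, the following Python sketch splits a table rectangle into a grid of cell boxes and optionally drops the first row, as the Ignore first row option does for a tagged header. This is a geometric illustration under assumed pixel coordinates. The `table_cells` function is hypothetical and doesn't describe how AI Builder works internally.

```python
def table_cells(left, top, right, bottom, row_seps, col_seps,
                ignore_first_row=False):
    """Split a table rectangle into rows of (x0, y0, x1, y1) cell boxes.

    row_seps are the y positions of horizontal separators and col_seps
    are the x positions of vertical separators inside the rectangle.
    """
    ys = [top, *sorted(row_seps), bottom]
    xs = [left, *sorted(col_seps), right]
    rows = [
        [(x0, y0, x1, y1) for x0, x1 in zip(xs, xs[1:])]
        for y0, y1 in zip(ys, ys[1:])
    ]
    # Dropping the first row mirrors skipping a header row.
    return rows[1:] if ignore_first_row else rows


# Hypothetical coordinates: a 100 x 30 table with two row separators
# and one column separator gives three rows of two cells each.
grid = table_cells(0, 0, 100, 30, row_seps=[10, 20], col_seps=[70])
for row in grid:
    print(row)
```

With `ignore_first_row=True`, the same call returns only the two data rows, so the header text isn't treated as table content.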

Tag all documents

Once you have finished tagging one document, move to the next one to tag by clicking the navigation arrows below the document preview on the top right.

Once you have finished tagging one collection, navigate back to the collection list to tag the second collection.

Model summary and train

After you've tagged all documents across all collections, follow these steps:

  1. Select the Next button at the bottom of the screen.

  2. Review the Model summary. Under Information to extract, you'll see that Customer ID and Due date each appeared in only five of the 10 examples, whereas everything else appeared in all 10 examples.

  3. If everything looks acceptable, select Train.
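The counts in the Model summary follow from which fields were tagged in each document. The following Python sketch reproduces that arithmetic. It's an illustration only: the `tag_counts` function is hypothetical, and the sketch assumes that Due date is the field missing from the five Adatum documents. That assumption is inferred from the summary counts and from Due date being tagged in the Contoso collection.

```python
NAMES = ["Invoice number", "Customer ID", "Total amount", "Due date", "Items"]


def tag_counts(documents, names):
    """Count, per field or table, how many documents have it tagged."""
    counts = {name: 0 for name in names}
    for tagged in documents:
        for name in names:
            if name in tagged:
                counts[name] += 1
    return counts


# Customer ID isn't present in the Contoso documents (stated in this unit).
contoso_doc = {"Invoice number", "Total amount", "Due date", "Items"}
# Assumption: Due date isn't present in the Adatum documents.
adatum_doc = {"Invoice number", "Customer ID", "Total amount", "Items"}

documents = [contoso_doc] * 5 + [adatum_doc] * 5
print(tag_counts(documents, NAMES))
# prints {'Invoice number': 10, 'Customer ID': 5, 'Total amount': 10, 'Due date': 5, 'Items': 10}
```

If a count in your own Model summary is lower than you expect, a document in that collection is probably missing a tag or is marked Not available in the document by mistake.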

Next steps

Now that you've created a Document processing model in AI Builder, you'll learn how to test your model and use it in Power Apps and Power Automate.