Quickstart: Custom Named Entity Recognition (preview)

In this article, we use the Language Studio to demonstrate key concepts of custom Named Entity Recognition (NER). As an example, we'll build a custom NER model to extract relevant entities from loan agreements.

Prerequisites

Create a new Azure resource and Azure Blob Storage account

Before you can use custom NER, you'll need to create an Azure Language resource, which will give you the credentials that you need to create a project and start training a model. You'll also need an Azure storage account, where you can upload the dataset that will be used to build your model.

Important

To get started quickly, we recommend creating a new Azure Language resource using the steps in this article, which let you create the resource and configure a storage account at the same time. This is easier than configuring the storage account later.

If you have a pre-existing resource you'd like to use, you will need to configure it and a storage account separately. See the project creation article for more information.

  1. Go to the Azure portal to create a new Azure Language resource. If you're asked to select additional features, select Custom text classification & custom NER. When you create your resource, ensure it has the following parameters.

    Azure resource requirement Required value
    Location "West US 2" or "West Europe"
    Pricing tier Standard (S) pricing tier
  2. In the Custom Named Entity Recognition (NER) & Custom Classification (Preview) section, select an existing storage account or select Create a new storage account. Note that these values are for this quickstart, and not necessarily the storage account values you’ll want to use in production environments.

    Storage account value Recommended value
    Name Any name
    Performance Standard
    Account kind Storage (general purpose v1)
    Replication Locally redundant storage (LRS)
    Location Any location closest to you, for best latency.

Upload sample data to blob container

After you have created an Azure storage account and linked it to your Language resource, you will need to upload the example files to the root directory of your container for this quickstart. These files will later be used to train your model.

  1. Download the example data for this quickstart from GitHub. Open the .zip file, and extract the folder containing text files within it.

  2. In the Azure portal, navigate to the storage account you created, and select it.

  3. In your storage account, select Containers from the left menu, located below Data storage. On the screen that appears, select + Container. Give the container the name example-data and leave the default Public access level.

    A screenshot showing the main page for a storage account.

  4. After your container has been created, click on it. Then select the Upload button to select the .txt and .json files you downloaded earlier.

    A screenshot showing the button for uploading files to the storage account.

    Tip

When you select files to upload, a file explorer will open on your computer. To select all the files in the folder, press Ctrl + A.

The provided sample dataset contains 20 loan agreements. Each agreement includes two parties: a lender and a borrower. You can use the provided sample files to extract relevant information for both parties, as well as the agreement date, loan amount, and interest rate.

Create a custom named entity recognition project

Once your resource and storage container are configured, create a new custom NER project. A project is a work area for building your custom AI models based on your data. Your project can only be accessed by you and others who have access to the Azure resource being used.

  1. Sign into the Language Studio portal. A window will appear to let you select your subscription and Language resource. Select the resource you created in the above step.

  2. Find the Entity extraction section, and select Custom named entity recognition from the available services.

    A screenshot showing the location of custom NER in the Language Studio landing page.

  3. Select Create new project from the top menu in your projects page. Creating a project will let you tag data, train, evaluate, improve, and deploy your models.

    A screenshot of the project creation page.

  4. After you click Create new project, a screen will appear to let you connect your storage account. If you can’t find your storage account, make sure you created a resource using the steps above.

    Note

    • You only need to do this step once for each new resource you use.
    • This process is irreversible; if you connect a storage account to your resource, you cannot disconnect it later.
    • You can only connect your resource to one storage account.
    • If you've already connected a storage account, you will see an Enter basic information screen instead. See the next step.

    A screenshot showing the storage connection screen.

  5. Enter the project information, including a name, description, and the language of the files in your project. You won’t be able to change the name of your project later.

  6. Select the container where you’ve uploaded your data.

  7. Under Are your files already tagged with entities, select Yes and choose the available file. Then click Next. Review the data you entered and select Create Project.

Train your model

Typically after you create a project, you would import your data and begin tagging the entities within it to train the model. For this quickstart, you’ll use the example tagged data file you downloaded earlier and stored in your Azure storage account.

A model is the machine learning object that will be trained to extract entities from text. Your model will learn from the example data, and will be able to extract entities from loan agreements afterwards.

To start training your model:

  1. Select Train from the left side menu.

  2. Select Train a new model and type in the model name in the text box below.

    A screenshot showing the model selection page for training

  3. Click on the Train button at the bottom of the page.

    Note

    • While training, the data will be split into two sets for training and testing the model. See how to train a model for more information.
    • Training can take up to a few hours.

Deploy your model

Generally after training a model you would review its evaluation details and make improvements if necessary. In this quickstart, you’ll just deploy your model and make it available to try.

After your model is trained, you can deploy it. Deploying your model lets you start using it to extract named entities, using Analyze API.

  1. Go to your project in Language studio.

  2. From the left panel, select Deploy model.

  3. Click on Add deployment to submit a new deployment job.

    A screenshot showing the deployment button

  4. In the window that appears, you can create a new deployment name or override an existing one. Then, you can add a trained model to this deployment name.

    A screenshot showing the deployment screen

Test your model

After your model is deployed, you can start using it for entity extraction. Use the following steps to send your first entity extraction request.

  1. Select Test model from the left side menu.

  2. Select the model you want to test.

  3. Using one of the files you downloaded earlier, add the file's text to the textbox. You can also upload a .txt file.

  4. Click on Run the test.

  5. In the Result tab, you can see the extracted entities from your text and their types. You can also view the JSON response under the JSON tab.

    View the test results

Clean up resources

When you don't need your project anymore, you can delete it using Language Studio. Select Custom Named Entity Recognition (NER) in the left navigation menu, select the project you want to delete, and then select Delete.


Get your resource keys and endpoint

  • Go to your resource overview page in the Azure portal

  • From the menu on the left side of the screen, select Keys and Endpoint. You’ll use the endpoint for your API requests, and the key for the Ocp-Apim-Subscription-Key header. A screenshot showing the key and endpoint screen for an Azure resource.
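All of the REST requests in the rest of this quickstart authenticate the same way, so it helps to keep these values in one place. A minimal Python sketch, where the endpoint and key are placeholders for your own values:

```python
# Placeholders: replace these with the values from your resource's
# Keys and Endpoint page in the Azure portal.
ENDPOINT = "https://<your-custom-subdomain>.cognitiveservices.azure.com"
KEY = "<your-resource-key>"

# Every request in this quickstart authenticates with this header.
HEADERS = {
    "Ocp-Apim-Subscription-Key": KEY,
    "Content-Type": "application/json",
}
```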

Create a custom NER project

Once your resource and storage container are configured, create a new custom NER project. A project is a work area for building your custom AI models based on your data. Your project can only be accessed by you and others who have access to the Azure resource being used.

Note

The project name is case sensitive for all operations.

Create a POST request using the following URL, headers, and JSON body to create your project and import the tags file.

Use the following URL to create a project and import your tags file. Replace the placeholder values below with your own values.

{YOUR-ENDPOINT}/language/analyze-text/projects/{projectName}/:import?api-version=2021-11-01-preview
Placeholder Value Example
{YOUR-ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Body

Use the following JSON in your request. Replace the placeholder values below with your own values. Use the tags file available in the sample data tab.

{
    "api-version": "2021-11-01-preview",
    "metadata": {
        "name": "MyProject",
        "multiLingual": true,
        "description": "Trying out custom NER",
        "modelType": "Extraction",
        "language": "string",
        "storageInputContainerName": "YOUR-CONTAINER-NAME",
        "settings": {}
    },
    "assets": {
        "extractors": [
        {
            "name": "Entity1"
        },
        {
            "name": "Entity2"
        }
    ],
    "documents": [
        {
            "location": "doc1.txt",
            "language": "en-us",
            "dataset": "Train",
            "extractors": [
                {
                    "regionOffset": 0,
                    "regionLength": 500,
                    "labels": [
                        {
                            "extractorName": "Entity1",
                            "offset": 25,
                            "length": 10
                        },                    
                        {
                            "extractorName": "Entity2",
                            "offset": 120,
                            "length": 8
                        }
                    ]
                }
            ]
        },
        {
            "location": "doc2.txt",
            "language": "en-us",
            "dataset": "Test",
            "extractors": [
                {
                    "regionOffset": 0,
                    "regionLength": 100,
                    "labels": [
                        {
                            "extractorName": "Entity2",
                            "offset": 20,
                            "length": 5
                        }
                    ]
                }
            ]
        }
    ]
    }
}

For the metadata key:

Key Value Example
modelType Your Model type. Extraction
storageInputContainerName The name of your Azure blob storage container. myContainer

For the documents key:

Key Value Example
location Document name on the blob store. doc2.txt
language The language of the document. en-us
dataset Optional field to specify the dataset which this document will belong to. Train or Test

This request will return an error if:

  • The selected resource doesn't have proper permission for the storage account.
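The project import call above can be sketched with Python's standard library. This is a minimal sketch, not a full import: the endpoint, key, and a trimmed-down body are placeholders and you would use your own tags file content in practice.

```python
import json
import urllib.request

ENDPOINT = "https://<your-custom-subdomain>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-resource-key>"  # placeholder
PROJECT_NAME = "MyProject"

url = (
    f"{ENDPOINT}/language/analyze-text/projects/{PROJECT_NAME}"
    "/:import?api-version=2021-11-01-preview"
)

# Abbreviated body; in practice this carries the full assets from your tags file.
body = {
    "api-version": "2021-11-01-preview",
    "metadata": {
        "name": PROJECT_NAME,
        "multiLingual": True,
        "description": "Trying out custom NER",
        "modelType": "Extraction",
        "language": "en-us",
        "storageInputContainerName": "example-data",  # your container name
        "settings": {},
    },
    "assets": {"extractors": [{"name": "Entity1"}], "documents": []},
}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request)  # uncomment to send against a real resource
```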

Start training your model

After your project has been created, you can begin training a custom NER model. Create a POST request using the following URL, headers, and JSON body to start training.

Request URL

Use the following URL when creating your API request. Replace the placeholder values below with your own values.

{YOUR-ENDPOINT}/language/analyze-text/projects/{PROJECT-NAME}/:train?api-version=2021-11-01-preview
Placeholder Value Example
{YOUR-ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name for your project. This value is case-sensitive. myProject

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Request body

Use the following JSON in your request. The model will be named MyModel once training is complete.

{
  "modelLabel": "MyModel",
  "runValidation": true,
  "evaluationOptions":
    {
        "type":"percentage",
        "testingSplitPercentage":"30",
        "trainingSplitPercentage":"70"
    }
}
Key Value Example
modelLabel Your Model name. MyModel
runValidation Boolean value to run validation on the test set. True or False
evaluationOptions Specifies evaluation options.
type Specifies datasplit type. set or percentage
testingSplitPercentage Required integer field if type is percentage. Specifies testing split. 30
trainingSplitPercentage Required integer field if type is percentage. Specifies training split. 70

Once you send your API request, you’ll receive a 202 response indicating success. In the response headers, extract the location value. It will be formatted like this:

{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/train/jobs/{JOB-ID}?api-version=2021-11-01-preview

JOB-ID is used to identify your request, since this operation is asynchronous. You’ll use this URL in the next step to get the training status.
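Extracting the job ID from that location header is plain string work. A sketch, using a made-up example value for the header:

```python
# Example location header value; the job ID segment here is made up.
location = (
    "https://<your-custom-subdomain>.cognitiveservices.azure.com"
    "/language/analyze-text/projects/myProject/train/jobs"
    "/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee?api-version=2021-11-01-preview"
)

# The job ID is the last path segment, before the query string.
job_id = location.split("/")[-1].split("?")[0]
print(job_id)  # aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
```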

Get Training Status

Use the following GET request to query the status of your model's training process. You can use the URL you received from the previous step, or replace the placeholder values below with your own values.

{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/train/jobs/{JOB-ID}?api-version=2021-11-01-preview
Placeholder Value Example
{YOUR-ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name for your project. This value is case-sensitive. myProject
{JOB-ID} The ID for locating your model's training status. This is in the location header value you received in the previous step. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Response Body

Once you send the request, you’ll get the following response.

{
  "jobs": [
    {
      "result": {
        "trainedModelLabel": "MyModel",
        "trainStatus": {
          "percentComplete": 0,
          "elapsedTime": "string"
        },
        "evaluationStatus": {
          "percentComplete": 0,
          "elapsedTime": "string"
        }
      },
      "jobId": "string",
      "createdDateTime": "2021-10-19T23:24:41.572Z",
      "lastUpdatedDateTime": "2021-10-19T23:24:41.572Z",
      "expirationDateTime": "2021-10-19T23:24:41.572Z",
      "status": "unknown",
      "errors": [
        {
          "code": "unknown",
          "message": "string"
        }
      ]
    }
  ],
  "nextLink": "string"
}
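In a script you would GET that URL repeatedly and read the progress out of the response until training finishes. A sketch of the parsing step, using a response shaped like the one above with illustrative values filled in:

```python
import json

# Example status response (values are illustrative).
response_text = """
{
  "jobs": [
    {
      "result": {
        "trainedModelLabel": "MyModel",
        "trainStatus": {"percentComplete": 40, "elapsedTime": "00:05:00"},
        "evaluationStatus": {"percentComplete": 0, "elapsedTime": "00:00:00"}
      },
      "jobId": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "status": "running",
      "errors": []
    }
  ]
}
"""

job = json.loads(response_text)["jobs"][0]
# Assumed terminal status values; poll again while the job is still running.
done = job["status"] in ("succeeded", "failed", "cancelled")
print(job["result"]["trainStatus"]["percentComplete"], done)
```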

Deploy your model

Create a PUT request using the following URL, headers, and JSON body to start deploying a custom NER model.

{YOUR-ENDPOINT}/language/analyze-text/projects/{PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}?api-version=2021-11-01-preview
Placeholder Value Example
{YOUR-ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name for your project. This value is case-sensitive. myProject
{DEPLOYMENT-NAME} The name of your deployment. This value is case-sensitive. prod

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Request body

Use the following JSON in your request. Use the name of the model you want to deploy.

{
  "trainedModelLabel": "MyModel",
  "deploymentName": "prod"
}

Once you send your API request, you’ll receive a 202 response indicating success. In the response headers, extract the location value. It will be formatted like this:

{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version=2021-11-01-preview

JOB-ID is used to identify your request, since this operation is asynchronous. You will use this URL in the next step to get the publishing status.
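The deployment PUT request can be sketched the same way as the earlier calls; the endpoint and key are placeholders:

```python
import json
import urllib.request

ENDPOINT = "https://<your-custom-subdomain>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-resource-key>"  # placeholder

url = (
    f"{ENDPOINT}/language/analyze-text/projects/myProject"
    "/deployments/prod?api-version=2021-11-01-preview"
)
body = {"trainedModelLabel": "MyModel", "deploymentName": "prod"}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={"Ocp-Apim-Subscription-Key": KEY, "Content-Type": "application/json"},
    method="PUT",
)
# urllib.request.urlopen(request)  # uncomment to deploy against a real resource
```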

Get the deployment status

Use the following GET request to query the status of your model's publishing process. You can use the URL you received from the previous step, or replace the placeholder values below with your own values.

{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version=2021-11-01-preview
Placeholder Value Example
{YOUR-ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name for your project. This value is case-sensitive. myProject
{DEPLOYMENT-NAME} The name of your deployment. This value is case-sensitive. prod
{JOB-ID} The ID for locating your model's training status. This is in the location header value you received in the previous step. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Submit custom NER task

Now that your model is deployed, you can begin sending entity recognition tasks to it.

Note

Project names are case-sensitive.

Use this POST request to start an entity extraction task. In the request body, replace the project-name value with the name of the project containing the model you want to use.

{YOUR-ENDPOINT}/text/analytics/v3.2-preview.2/analyze

Headers

Key Value
Ocp-Apim-Subscription-Key Your subscription key that provides access to this API.

Body

    {
    "displayName": "MyJobName",
    "analysisInput": {
        "documents": [
            {
                "id": "doc1", 
                "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc tempus, felis sed vehicula lobortis, lectus ligula facilisis quam, quis aliquet lectus diam id erat. Vivamus eu semper tellus. Integer placerat sem vel eros iaculis dictum. Sed vel congue urna."
            },
            {
                "id": "doc2",
                "text": "Mauris dui dui, ultricies vel ligula ultricies, elementum viverra odio. Donec tempor odio nunc, quis fermentum lorem egestas commodo. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos."
            }
        ]
    },
    "tasks": {
        "customEntityRecognitionTasks": [      
            {
                "parameters": {
                      "project-name": "MyProject",
                      "deployment-name": "MyDeploymentName"
                      "stringIndexType": "TextElements_v8"
                }
            }
        ]
    }
}
Key Sample Value Description
displayName "MyJobName" Your job Name
documents [{},{}] List of documents to run tasks on
ID "doc1" a string document identifier
text "Lorem ipsum dolor sit amet" You document in string format
"tasks" [] List of tasks we want to perform.
-- customEntityRecognitionTasks Task identifer for task we want to perform.
parameters [] List of parameters to pass to task
project-name "MyProject" Your project name. The project name is case-sensitive.
deployment-name "MyDeploymentName" Your deployment name

Response

You will receive a 202 response indicating success. In the response headers, extract operation-location. operation-location is formatted like this:

{YOUR-ENDPOINT}/text/analytics/v3.2-preview.2/analyze/jobs/<jobId>

You will use this endpoint in the next step to get the custom recognition task results.
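The task submission above can be sketched in Python. The endpoint, key, and document text are placeholders; note that the operation-location header of the real response gives you the job URL for the next step.

```python
import json
import urllib.request

ENDPOINT = "https://<your-custom-subdomain>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-resource-key>"  # placeholder

url = f"{ENDPOINT}/text/analytics/v3.2-preview.2/analyze"
body = {
    "displayName": "MyJobName",
    "analysisInput": {
        "documents": [{"id": "doc1", "text": "Lorem ipsum dolor sit amet."}]
    },
    "tasks": {
        "customEntityRecognitionTasks": [
            {
                "parameters": {
                    "project-name": "MyProject",
                    "deployment-name": "MyDeploymentName",
                    "stringIndexType": "TextElements_v8",
                }
            }
        ]
    },
}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={"Ocp-Apim-Subscription-Key": KEY, "Content-Type": "application/json"},
    method="POST",
)
# With a real resource, the job URL comes back in the operation-location header:
# job_url = urllib.request.urlopen(request).headers["operation-location"]
```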

Get task status and results

Use the following GET request to query the status/results of the custom recognition task. You can use the endpoint you received from the previous step.

{YOUR-ENDPOINT}/text/analytics/v3.2-preview.2/analyze/jobs/<jobId>

Headers

Key Value
Ocp-Apim-Subscription-Key Your Subscription key that provides access to this API.

Response Body

The response will be a JSON document with the following parameters.

{
    "createdDateTime": "2021-05-19T14:32:25.578Z",
    "displayName": "MyJobName",
    "expirationDateTime": "2021-05-19T14:32:25.578Z",
    "jobId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "lastUpdateDateTime": "2021-05-19T14:32:25.578Z",
    "status": "completed",
    "errors": [],
    "tasks": {
        "details": {
            "name": "MyJobName",
            "lastUpdateDateTime": "2021-03-29T19:50:23Z",
            "status": "completed"
        },
        "completed": 1,
        "failed": 0,
        "inProgress": 0,
        "total": 1,
        "tasks": {
    "customEntityRecognitionTasks": [
        {
            "lastUpdateDateTime": "2021-05-19T14:32:25.579Z",
            "name": "MyJobName",
            "status": "completed",
            "results": {
                "documents": [
                    {
                        "id": "doc1",
                        "entities": [
                            {
                                "text": "Government",
                                "category": "restaurant_name",
                                "offset": 23,
                                "length": 10,
                                "confidenceScore": 0.0551877357
                            }
                        ],
                        "warnings": []
                    },
                    {
                        "id": "doc2",
                        "entities": [
                            {
                                "text": "David Schmidt",
                                "category": "artist",
                                "offset": 0,
                                "length": 13,
                                "confidenceScore": 0.8022353
                            }
                        ],
                        "warnings": []
                    }
                ],
                "errors": [],
                "statistics": {
                    "documentsCount":0,
                    "validDocumentsCount":0,
                    "erroneousDocumentsCount":0,
                    "transactionsCount":0
                }
                    }
                }
            ]
        }
    }
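Pulling the extracted entities out of that response is straightforward once you know the nesting. A sketch, using a trimmed-down response with an illustrative entity and category:

```python
import json

# Trimmed-down task results; the entity and category here are illustrative.
results_text = """
{
  "tasks": {
    "tasks": {
      "customEntityRecognitionTasks": [
        {
          "results": {
            "documents": [
              {"id": "doc1",
               "entities": [{"text": "David Schmidt", "category": "BorrowerName",
                             "offset": 0, "length": 13,
                             "confidenceScore": 0.80}]}
            ]
          }
        }
      ]
    }
  }
}
"""

# Note the double "tasks" nesting in the response body shown above.
task = json.loads(results_text)["tasks"]["tasks"]["customEntityRecognitionTasks"][0]
for doc in task["results"]["documents"]:
    for entity in doc["entities"]:
        print(doc["id"], entity["category"], entity["text"], entity["confidenceScore"])
```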

Clean up resources

When you no longer need your project, you can delete it with the following DELETE request. Replace the placeholder values with your own values.

{YOUR-ENDPOINT}/language/text/authoring/v1.0-preview.2/projects/{PROJECT-NAME}
Placeholder Value Example
{YOUR-ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name for your project. This value is case-sensitive. myProject

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.
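The delete call can be sketched like the others; the endpoint and key are placeholders, and the call is commented out because it permanently removes the project:

```python
import urllib.request

ENDPOINT = "https://<your-custom-subdomain>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-resource-key>"  # placeholder

url = f"{ENDPOINT}/language/text/authoring/v1.0-preview.2/projects/myProject"
request = urllib.request.Request(
    url,
    headers={"Ocp-Apim-Subscription-Key": KEY},
    method="DELETE",
)
# urllib.request.urlopen(request)  # uncomment to delete the project for real
```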

Next steps

After you've created an entity extraction model, you can start building your own custom NER projects. Use the how-to articles to learn more about tagging, training, and consuming your model in greater detail: