Quickstart: custom Text Analytics for health

Use this article to get started with creating a custom Text Analytics for health project where you can train custom models on top of Text Analytics for health for custom entity recognition. A model is artificial intelligence software that's trained to do a certain task. For this system, the models extract healthcare related named entities and are trained by learning from labeled data.

In this article, we use Language Studio to demonstrate key concepts of custom Text Analytics for health. As an example we’ll build a custom Text Analytics for health model to extract the Facility or treatment location from short discharge notes.

Prerequisites

Create a new Azure AI Language resource and Azure storage account

Before you can use custom Text Analytics for health, you need to create an Azure AI Language resource, which will give you the credentials that you need to create a project and start training a model. You'll also need an Azure storage account, where you can upload your dataset that is used to build your model.

Important

To quickly get started, we recommend creating a new Azure AI Language resource using the steps provided in this article. Using the steps in this article will let you create the Language resource and storage account at the same time, which is easier than doing it later.

If you have a pre-existing resource that you'd like to use, you will need to connect it to storage account. For more information, see guidance to using a pre-existing resource.

Create a new resource from the Azure portal

  1. Sign in to the Azure portal to create a new Azure AI Language resource.

  2. In the window that appears, select Custom text classification & custom named entity recognition from the custom features. Select Continue to create your resource at the bottom of the screen.

    A screenshot showing custom text classification & custom named entity recognition in the Azure portal.

  3. Create a Language resource with following details.

    Name Description
    Subscription Your Azure subscription.
    Resource group A resource group that will contain your resource. You can use an existing one, or create a new one.
    Region The region for your Language resource. For example, "West US 2".
    Name A name for your resource.
    Pricing tier The pricing tier for your Language resource. You can use the Free (F0) tier to try the service.

    Note

    If you get a message saying "your login account is not an owner of the selected storage account's resource group", your account needs to have an owner role assigned on the resource group before you can create a Language resource. Contact your Azure subscription owner for assistance.

  4. In the Custom text classification & custom named entity recognition section, select an existing storage account or select New storage account. These values are to help you get started, and not necessarily the storage account values you’ll want to use in production environments. To avoid latency during building your project connect to storage accounts in the same region as your Language resource.

    Storage account value Recommended value
    Storage account name Any name
    Storage account type Standard LRS
  5. Make sure the Responsible AI Notice is checked. Select Review + create at the bottom of the page, then select Create.

Upload sample data to blob container

After you have created an Azure storage account and connected it to your Language resource, you will need to upload the documents from the sample dataset to the root directory of your container. These documents will later be used to train your model.

  1. Download the sample dataset from GitHub.

  2. Open the .zip file, and extract the folder containing the documents.

  3. In the Azure portal, navigate to the storage account you created, and select it.

  4. In your storage account, select Containers from the left menu, located below Data storage. On the screen that appears, select + Container. Give the container the name example-data and leave the default Public access level.

    A screenshot showing the main page for a storage account.

  5. After your container has been created, select it. Then select Upload button to select the .txt and .json files you downloaded earlier.

    A screenshot showing the button for uploading files to the storage account.

The provided sample dataset contains 12 clinical notes. Each clinical note includes several medical entities and the treatment location. We will use the prebuilt entities to extract the medical entities and train the custom model to extract the treatment location using the entity's learned and list components.

Create a custom Text Analytics for health project

Once your resource and storage account are configured, create a new custom Text Analytics for health project. A project is a work area for building your custom ML models based on your data. Your project can only be accessed by you and others who have access to the Language resource being used.

  1. Sign into the Language Studio. A window will appear to let you select your subscription and Language resource. Select the Language resource you created in the above step.

  2. Under the Extract information section of Language Studio, select Custom Text Analytics for health.

  3. Select Create new project from the top menu in your projects page. Creating a project lets you label data, train, evaluate, improve, and deploy your models.

    A screenshot of the project creation page.

  4. Enter the project information, including a name, description, and the language of the files in your project. If you're using the example dataset, select English. You can't change the name of your project later. Select Next

    Tip

    Your dataset doesn't have to be entirely in the same language. You can have multiple documents, each with different supported languages. If your dataset contains documents of different languages or if you expect text from different languages during runtime, select enable multi-lingual dataset option when you enter the basic information for your project. This option can be enabled later from the Project settings page.

  5. After you select Create new project, a window will appear to let you connect your storage account. If you've already connected a storage account, you will see the storage accounted connected. If not, choose your storage account from the dropdown that appears and select Connect storage account; this will set the required roles for your storage account. This step will possibly return an error if you are not assigned as owner on the storage account.

    Note

    • You only need to do this step once for each new resource you use.
    • This process is irreversible, if you connect a storage account to your Language resource you cannot disconnect it later.
    • You can only connect your Language resource to one storage account.

    A screenshot showing the storage connection screen.

  6. Select the container where you have uploaded your dataset.

  7. If you have already labeled data make sure it follows the supported format and select Yes, my files are already labeled and I have formatted JSON labels file and select the labels file from the drop-down menu. Select Next. If you are using the dataset from the QuickStart, there is no need to review the formatting of the JSON labels file.

  8. Review the data you entered and select Create Project.

Train your model

Typically after you create a project, you go ahead and start labeling the documents you have in the container connected to your project. For this quickstart, you have imported a sample tagged dataset and initialized your project with the sample JSON labels file so there is no need to add additional labels.

To start training your model from within the Language Studio:

  1. Select Training jobs from the left side menu.

  2. Select Start a training job from the top menu.

  3. Select Train a new model and type in the model name in the text box. You can also overwrite an existing model by selecting this option and choosing the model you want to overwrite from the dropdown menu. Overwriting a trained model is irreversible, but it won't affect your deployed models until you deploy the new model.

    A screenshot showing the training job creation screen in Language Studio.

  4. Select data splitting method. You can choose Automatically splitting the testing set from training data where the system will split your labeled data between the training and testing sets, according to the specified percentages. Or you can Use a manual split of training and testing data, this option is only enabled if you have added documents to your testing set. See data labeling and how to train a model for information about data splitting.

  5. Select the Train button.

  6. If you select the Training Job ID from the list, a side pane will appear where you can check the Training progress, Job status, and other details for this job.

    Note

    • Only successfully completed training jobs will generate models.
    • Training can take some time between a couple of minutes and several hours based on the size of your labeled data.
    • You can only have one training job running at a time. You can't start other training job within the same project until the running job is completed.

Deploy your model

Generally after training a model you would review its evaluation details and make improvements if necessary. In this quickstart, you will just deploy your model, and make it available for you to try in Language studio, or you can call the prediction API.

To deploy your model from within the Language Studio:

  1. Select Deploying a model from the left side menu.

  2. Select Add deployment to start a new deployment job.

    A screenshot showing the deployment button in Language Studio.

  3. Select Create new deployment to create a new deployment and assign a trained model from the dropdown below. You can also Overwrite an existing deployment by selecting this option and select the trained model you want to assign to it from the dropdown below.

    Note

    Overwriting an existing deployment doesn't require changes to your prediction API call but the results you get will be based on the newly assigned model.

    A screenshot showing the model deployment options in Language Studio.

  4. Select Deploy to start the deployment job.

  5. After deployment is successful, an expiration date will appear next to it. Deployment expiration is when your deployed model will be unavailable to be used for prediction, which typically happens twelve months after a training configuration expires.

Test your model

After your model is deployed, you can start using it to extract entities from your text via Prediction API. For this quickstart, you will use the Language Studio to submit the custom Text Analytics for health prediction task and visualize the results. In the sample dataset you downloaded earlier, you can find some test documents that you can use in this step.

To test your deployed models from within the Language Studio:

  1. Select Testing deployments from the left side menu.

  2. Select the deployment you want to test. You can only test models that are assigned to deployments.

  3. Select the deployment you want to query/test from the dropdown.

  4. You can enter the text you want to submit to the request or upload a .txt file to use.

  5. Select Run the test from the top menu.

  6. In the Result tab, you can see the extracted entities from your text and their types. You can also view the JSON response under the JSON tab.

    A screenshot showing the deployment testing screen in Language Studio.

Clean up resources

When you don't need your project anymore, you can delete your project using Language Studio.

  1. Select the Language service feature you're using at the top of the page, s
  2. Select the project you want to delete
  3. Select Delete from the top menu.

Prerequisites

Create a new Azure AI Language resource and Azure storage account

Before you can use custom Text Analytics for health, you'll need to create an Azure AI Language resource, which will give you the credentials that you need to create a project and start training a model. You'll also need an Azure storage account, where you can upload your dataset that will be used in building your model.

Important

To get started quickly, we recommend creating a new Azure AI Language resource using the steps provided in this article, which will let you create the Language resource, and create and/or connect a storage account at the same time, which is easier than doing it later.

If you have a pre-existing resource that you'd like to use, you will need to connect it to storage account. See create project for more information.

Create a new resource from the Azure portal

  1. Sign in to the Azure portal to create a new Azure AI Language resource.

  2. In the window that appears, select Custom text classification & custom named entity recognition from the custom features. Select Continue to create your resource at the bottom of the screen.

    A screenshot showing custom text classification & custom named entity recognition in the Azure portal.

  3. Create a Language resource with following details.

    Name Description
    Subscription Your Azure subscription.
    Resource group A resource group that will contain your resource. You can use an existing one, or create a new one.
    Region The region for your Language resource. For example, "West US 2".
    Name A name for your resource.
    Pricing tier The pricing tier for your Language resource. You can use the Free (F0) tier to try the service.

    Note

    If you get a message saying "your login account is not an owner of the selected storage account's resource group", your account needs to have an owner role assigned on the resource group before you can create a Language resource. Contact your Azure subscription owner for assistance.

  4. In the Custom text classification & custom named entity recognition section, select an existing storage account or select New storage account. These values are to help you get started, and not necessarily the storage account values you’ll want to use in production environments. To avoid latency during building your project connect to storage accounts in the same region as your Language resource.

    Storage account value Recommended value
    Storage account name Any name
    Storage account type Standard LRS
  5. Make sure the Responsible AI Notice is checked. Select Review + create at the bottom of the page, then select Create.

Upload sample data to blob container

After you have created an Azure storage account and connected it to your Language resource, you will need to upload the documents from the sample dataset to the root directory of your container. These documents will later be used to train your model.

  1. Download the sample dataset from GitHub.

  2. Open the .zip file, and extract the folder containing the documents.

  3. In the Azure portal, navigate to the storage account you created, and select it.

  4. In your storage account, select Containers from the left menu, located below Data storage. On the screen that appears, select + Container. Give the container the name example-data and leave the default Public access level.

    A screenshot showing the main page for a storage account.

  5. After your container has been created, select it. Then select Upload button to select the .txt and .json files you downloaded earlier.

    A screenshot showing the button for uploading files to the storage account.

The provided sample dataset contains 12 clinical notes. Each clinical note includes several medical entities and the treatment location. We will use the prebuilt entities to extract the medical entities and train the custom model to extract the treatment location using the entity's learned and list components.

Get your resource keys and endpoint

  1. Go to your resource overview page in the Azure portal

  2. From the menu on the left side, select Keys and Endpoint. You'll use the endpoint and key for the API requests

    A screenshot showing the key and endpoint page in the Azure portal

Create a custom Text Analytics for health project

Once your resource and storage account are configured, create a new custom Text Analytics for health project. A project is a work area for building your custom ML models based on your data. Your project can only be accessed by you and others who have access to the Language resource being used.

Use the labels file you downloaded from the sample data in the previous step and add it to the body of the following request.

Trigger import project job

Submit a POST request using the following URL, headers, and JSON body to import your labels file. Make sure that your labels file follow the accepted format.

If a project with the same name already exists, the data of that project is replaced.

{Endpoint}/language/authoring/analyze-text/projects/{projectName}/:import?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name for your project. This value is case-sensitive. myProject
{API-VERSION} The version of the API you are calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. 2022-05-01

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Body

Use the following JSON in your request. Replace the placeholder values below with your own values.

{
	"projectFileVersion": "{API-VERSION}",
	"stringIndexType": "Utf16CodeUnit",
	"metadata": {
		"projectName": "{PROJECT-NAME}",
		"projectKind": "CustomHealthcare",
		"description": "Trying out custom Text Analytics for health",
		"language": "{LANGUAGE-CODE}",
		"multilingual": true,
		"storageInputContainerName": "{CONTAINER-NAME}",
		"settings": {}
	},
	"assets": {
		"projectKind": "CustomHealthcare",
		"entities": [
			{
				"category": "Entity1",
				"compositionSetting": "{COMPOSITION-SETTING}",
				"list": {
					"sublists": [
						{
							"listKey": "One",
							"synonyms": [
								{
									"language": "en",
									"values": [
										"EntityNumberOne",
										"FirstEntity"
									]
								}
							]
						}
					]
				}
			},
			{
				"category": "Entity2"
			},
			{
				"category": "MedicationName",
				"list": {
					"sublists": [
						{
							"listKey": "research drugs",
							"synonyms": [
								{
									"language": "en",
									"values": [
										"rdrug a",
										"rdrug b"
									]
								}
							]

						}
					]
				}
				"prebuilts": "MedicationName"
			}
		],
		"documents": [
			{
				"location": "{DOCUMENT-NAME}",
				"language": "{LANGUAGE-CODE}",
				"dataset": "{DATASET}",
				"entities": [
					{
						"regionOffset": 0,
						"regionLength": 500,
						"labels": [
							{
								"category": "Entity1",
								"offset": 25,
								"length": 10
							},
							{
								"category": "Entity2",
								"offset": 120,
								"length": 8
							}
						]
					}
				]
			},
			{
				"location": "{DOCUMENT-NAME}",
				"language": "{LANGUAGE-CODE}",
				"dataset": "{DATASET}",
				"entities": [
					{
						"regionOffset": 0,
						"regionLength": 100,
						"labels": [
							{
								"category": "Entity2",
								"offset": 20,
								"length": 5
							}
						]
					}
				]
			}
		]
	}
}

Key Placeholder Value Example
multilingual true A boolean value that enables you to have documents in multiple languages in your dataset and when your model is deployed you can query the model in any supported language (not necessarily included in your training documents). See language support to learn more about multilingual support. true
projectName {PROJECT-NAME} Project name myproject
storageInputContainerName {CONTAINER-NAME} Container name mycontainer
entities Array containing all the entity types you have in the project. These are the entity types that will be extracted from your documents into.
category The name of the entity type, which can be user defined for new entity definitions, or predefined for prebuilt entities.
compositionSetting {COMPOSITION-SETTING} Rule that defines how to manage multiple components in your entity. Options are combineComponents or separateComponents. combineComponents
list Array containing all the sublists you have in the project for a specific entity. Lists can be added to prebuilt entities or new entities with learned components.
sublists [] Array containing sublists. Each sublist is a key and its associated values. []
listKey One A normalized value for the list of synonyms to map back to in prediction. One
synonyms [] Array containing all the synonyms synonym
language {LANGUAGE-CODE} A string specifying the language code for the synonym in your sublist. If your project is a multilingual project and you want to support your list of synonyms for all the languages in your project, you have to explicitly add your synonyms to each language. See Language support for more information about supported language codes. en
values "EntityNumberone", "FirstEntity" A list of comma separated strings that will be matched exactly for extraction and map to the list key. "EntityNumberone", "FirstEntity"
prebuilts MedicationName The name of the prebuilt component populating the prebuilt entity. Prebuilt entities are automatically loaded into your project by default but you can extend them with list components in your labels file. MedicationName
documents Array containing all the documents in your project and list of the entities labeled within each document. []
location {DOCUMENT-NAME} The location of the documents in the storage container. Since all the documents are in the root of the container this should be the document name. doc1.txt
dataset {DATASET} The test set to which this file will go to when split before training. Possible values for this field are Train and Test. Train
regionOffset The inclusive character position of the start of the text. 0
regionLength The length of the bounding box in terms of UTF16 characters. Training only considers the data in this region. 500
category The type of entity associated with the span of text specified. Entity1
offset The start position for the entity text. 25
length The length of the entity in terms of UTF16 characters. 20
language {LANGUAGE-CODE} A string specifying the language code for the document used in your project. If your project is a multilingual project, choose the language code of the majority of the documents. See Language support for more information about supported language codes. en

Once you send your API request, you’ll receive a 202 response indicating that the job was submitted correctly. In the response headers, extract the operation-location value. It will be formatted like this:

{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/import/jobs/{JOB-ID}?api-version={API-VERSION}

{JOB-ID} is used to identify your request, since this operation is asynchronous. You’ll use this URL to get the import job status.

Possible error scenarios for this request:

  • The selected resource doesn't have proper permissions for the storage account.
  • The storageInputContainerName specified doesn't exist.
  • Invalid language code is used, or if the language code type isn't string.
  • multilingual value is a string and not a boolean.

Get import job status

Use the following GET request to get the status of your importing your project. Replace the placeholder values below with your own values.

Request URL

{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/import/jobs/{JOB-ID}?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name of your project. This value is case-sensitive. myProject
{JOB-ID} The ID for locating your model's training status. This value is in the location header value you received in the previous step. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx
{API-VERSION} The version of the API you are calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. 2022-05-01

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Train your model

Typically after you create a project, you go ahead and start labeling the documents you have in the container connected to your project. For this quickstart, you have imported a sample tagged dataset and initialized your project with the sample JSON tags file.

Start training job

After your project has been imported, you can start training your model.

Submit a POST request using the following URL, headers, and JSON body to submit a training job. Replace the placeholder values with your own values.

{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/:train?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name of your project. This value is case-sensitive. myProject
{API-VERSION} The version of the API you're calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. 2022-05-01

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Request body

Use the following JSON in your request body. The model is given the {MODEL-NAME} once training is complete. Only successful training jobs produce models.

{
	"modelLabel": "{MODEL-NAME}",
	"trainingConfigVersion": "{CONFIG-VERSION}",
	"evaluationOptions": {
		"kind": "percentage",
		"trainingSplitPercentage": 80,
		"testingSplitPercentage": 20
	}
}
Key Placeholder Value Example
modelLabel {MODEL-NAME} The model name that is assigned to your model once trained successfully. myModel
trainingConfigVersion {CONFIG-VERSION} This is the model version that is used to train the model. 2022-05-01
evaluationOptions Option to split your data across training and testing sets. {}
kind percentage Split methods. Possible values are percentage or manual. See How to train a model for more information. percentage
trainingSplitPercentage 80 Percentage of your tagged data to be included in the training set. Recommended value is 80. 80
testingSplitPercentage 20 Percentage of your tagged data to be included in the testing set. Recommended value is 20. 20

Note

The trainingSplitPercentage and testingSplitPercentage are only required if Kind is set to percentage and the sum of both percentages should be equal to 100.

Once you send your API request, you’ll receive a 202 response indicating that the job was submitted correctly. In the response headers, extract the location value. It is formatted like this:

{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/train/jobs/{JOB-ID}?api-version={API-VERSION}

{JOB-ID} is used to identify your request, since this operation is asynchronous. You can use this URL to get the training status.

Get training job status

Training could take sometime between 10 and 30 minutes for this sample dataset. You can use the following request to keep polling the status of the training job until it is successfully completed.

Use the following GET request to get the status of your model's training progress. Replace the placeholder values below with your own values.

Request URL

{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/train/jobs/{JOB-ID}?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name of your project. This value is case-sensitive. myProject
{JOB-ID} The ID for locating your model's training status. This value is in the location header value you received in the previous step. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx
{API-VERSION} The version of the API you're calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. 2022-05-01

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Response Body

After sending the request, you will get the following response.

{
  "result": {
    "modelLabel": "{MODEL-NAME}",
    "trainingConfigVersion": "{CONFIG-VERSION}",
    "estimatedEndDateTime": "2022-04-18T15:47:58.8190649Z",
    "trainingStatus": {
      "percentComplete": 3,
      "startDateTime": "2022-04-18T15:45:06.8190649Z",
      "status": "running"
    },
    "evaluationStatus": {
      "percentComplete": 0,
      "status": "notStarted"
    }
  },
  "jobId": "{JOB-ID}",
  "createdDateTime": "2022-04-18T15:44:44Z",
  "lastUpdatedDateTime": "2022-04-18T15:45:48Z",
  "expirationDateTime": "2022-04-25T15:44:44Z",
  "status": "running"
}

Deploy your model

Generally after training a model you would review its evaluation details and make improvements if necessary. In this quickstart, you will just deploy your model, and make it available for you to try in Language Studio, or you can call the prediction API.

Start deployment job

Submit a PUT request using the following URL, headers, and JSON body to submit a deployment job. Replace the placeholder values below with your own values.

{Endpoint}/language/authoring/analyze-text/projects/{projectName}/deployments/{deploymentName}?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name of your project. This value is case-sensitive. myProject
{DEPLOYMENT-NAME} The name of your deployment. This value is case-sensitive. staging
{API-VERSION} The version of the API you're calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. 2022-05-01

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Request body

Use the following JSON in the body of your request. Use the name of the model you to assign to the deployment.

{
  "trainedModelLabel": "{MODEL-NAME}"
}
Key Placeholder Value Example
trainedModelLabel {MODEL-NAME} The model name that will be assigned to your deployment. You can only assign successfully trained models. This value is case-sensitive. myModel

Once you send your API request, you’ll receive a 202 response indicating that the job was submitted correctly. In the response headers, extract the operation-location value. It will be formatted like this:

{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version={API-VERSION}

{JOB-ID} is used to identify your request, since this operation is asynchronous. You can use this URL to get the deployment status.

Get deployment job status

Use the following GET request to query the status of the deployment job. You can use the URL you received from the previous step, or replace the placeholder values below with your own values.

{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name of your project. This value is case-sensitive. myProject
{DEPLOYMENT-NAME} The name of your deployment. This value is case-sensitive. staging
{JOB-ID} The ID for locating your model's training status. This is in the location header value you received in the previous step. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx
{API-VERSION} The version of the API you're calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. 2022-05-01

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Response Body

You'll receive the following request when you send the request. Keep polling this endpoint until the status parameter changes to "succeeded". You should get a 200 code to indicate the success of the request.

{
    "jobId":"{JOB-ID}",
    "createdDateTime":"{CREATED-TIME}",
    "lastUpdatedDateTime":"{UPDATED-TIME}",
    "expirationDateTime":"{EXPIRATION-TIME}",
    "status":"running"
}

Make predictions with your trained model

After your model is deployed, you can start using it to extract entities from your text using the prediction API. In the sample dataset you downloaded earlier you can find some test documents that you can use in this step.

Submit a custom Text Analytics for health task

Use this POST request to start a Custom Text Analytics for health extraction task.

{ENDPOINT}/language/analyze-text/jobs?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{API-VERSION} The version of the API you are calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. 2022-05-01

Headers

Key Value
Ocp-Apim-Subscription-Key Your key that provides access to this API.

Body

{
  "displayName": "Extracting entities",
  "analysisInput": {
    "documents": [
      {
        "id": "1",
        "language": "{LANGUAGE-CODE}",
        "text": "Text1"
      },
      {
        "id": "2",
        "language": "{LANGUAGE-CODE}",
        "text": "Text2"
      }
    ]
  },
  "tasks": [
     {
      "kind": "CustomHealthcare",
      "taskName": "Custom TextAnalytics for Health Test",
      "parameters": {
        "projectName": "{PROJECT-NAME}",
        "deploymentName": "{DEPLOYMENT-NAME}"
      }
    }
  ]
}
Key Placeholder Value Example
displayName {JOB-NAME} Your job name. MyJobName
documents [{},{}] List of documents to run tasks on. [{},{}]
id {DOC-ID} Document name or ID. doc1
language {LANGUAGE-CODE} A string specifying the language code for the document. If this key isn't specified, the service will assume the default language of the project that was selected during project creation. See language support for a list of supported language codes. en-us
text {DOC-TEXT} Document task to run the tasks on. Lorem ipsum dolor sit amet
tasks List of tasks we want to perform. []
taskName Custom Text Analytics for Health Test The task name Custom Text Analytics for Health Test
kind CustomHealthcare The project or task kind we are trying to perform CustomHealthcare
parameters List of parameters to pass to the task.
project-name {PROJECT-NAME} The name for your project. This value is case-sensitive. myProject
deployment-name {DEPLOYMENT-NAME} The name of your deployment. This value is case-sensitive. prod

Response

You will receive a 202 response indicating that your task has been submitted successfully. In the response headers, extract operation-location. operation-location is formatted like this:

{ENDPOINT}/language/analyze-text/jobs/{JOB-ID}?api-version={API-VERSION}

You can use this URL to query the task completion status and get the results when task is completed.

Get task results

Use the following GET request to query the status/results of the custom entity recognition task.

{ENDPOINT}/language/analyze-text/jobs/{JOB-ID}?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{API-VERSION} The version of the API you're calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. 2022-05-01

Headers

Key Value
Ocp-Apim-Subscription-Key Your key that provides access to this API.

Response Body

The response is a JSON document with the following parameters

{
	"createdDateTime": "2021-05-19T14:32:25.578Z",
	"displayName": "MyJobName",
	"expirationDateTime": "2021-05-19T14:32:25.578Z",
	"jobId": "xxxx-xxxx-xxxxx-xxxxx",
	"lastUpdateDateTime": "2021-05-19T14:32:25.578Z",
	"status": "succeeded",
	"tasks": {
		"completed": 1,
		"failed": 0,
		"inProgress": 0,
		"total": 1,
		"items": [
			{
				"kind": "CustomHealthcareLROResults",
				"taskName": "Custom Text Analytics for Health Test",
				"lastUpdateDateTime": "2020-10-01T15:01:03Z",
				"status": "succeeded",
				"results": {
					"documents": [
						{
							"entities": [
								{
									"entityComponentInformation": [
										{
											"entityComponentKind": "learnedComponent"
										}
									],
									"offset": 0,
									"length": 11,
									"text": "first entity",
									"category": "Entity1",
									"confidenceScore": 0.98
								},
								{
									"entityComponentInformation": [
										{
											"entityComponentKind": "listComponent"
										}
									],
									"offset": 0,
									"length": 11,
									"text": "first entity",
									"category": "Entity1.Dictionary",
									"confidenceScore": 1.0
								},
								{
									"entityComponentInformation": [
										{
											"entityComponentKind": "learnedComponent"
										}
									],
									"offset": 16,
									"length": 9,
									"text": "entity two",
									"category": "Entity2",
									"confidenceScore": 1.0
								},
								{
									"entityComponentInformation": [
										{
											"entityComponentKind": "prebuiltComponent"
										}
									],
									"offset": 37,
									"length": 9,
									"text": "ibuprofen",
									"category": "MedicationName",
									"confidenceScore": 1,
									"assertion": {
										"certainty": "negative"
									},
									"name": "ibuprofen",
									"links": [
										{
											"dataSource": "UMLS",
											"id": "C0020740"
										},
										{
											"dataSource": "AOD",
											"id": "0000019879"
										},
										{
											"dataSource": "ATC",
											"id": "M01AE01"
										},
										{
											"dataSource": "CCPSS",
											"id": "0046165"
										},
										{
											"dataSource": "CHV",
											"id": "0000006519"
										},
										{
											"dataSource": "CSP",
											"id": "2270-2077"
										},
										{
											"dataSource": "DRUGBANK",
											"id": "DB01050"
										},
										{
											"dataSource": "GS",
											"id": "1611"
										},
										{
											"dataSource": "LCH_NW",
											"id": "sh97005926"
										},
										{
											"dataSource": "LNC",
											"id": "LP16165-0"
										},
										{
											"dataSource": "MEDCIN",
											"id": "40458"
										},
										{
											"dataSource": "MMSL",
											"id": "d00015"
										},
										{
											"dataSource": "MSH",
											"id": "D007052"
										},
										{
											"dataSource": "MTHSPL",
											"id": "WK2XYI10QM"
										},
										{
											"dataSource": "NCI",
											"id": "C561"
										},
										{
											"dataSource": "NCI_CTRP",
											"id": "C561"
										},
										{
											"dataSource": "NCI_DCP",
											"id": "00803"
										},
										{
											"dataSource": "NCI_DTP",
											"id": "NSC0256857"
										},
										{
											"dataSource": "NCI_FDA",
											"id": "WK2XYI10QM"
										},
										{
											"dataSource": "NCI_NCI-GLOSS",
											"id": "CDR0000613511"
										},
										{
											"dataSource": "NDDF",
											"id": "002377"
										},
										{
											"dataSource": "PDQ",
											"id": "CDR0000040475"
										},
										{
											"dataSource": "RCD",
											"id": "x02MO"
										},
										{
											"dataSource": "RXNORM",
											"id": "5640"
										},
										{
											"dataSource": "SNM",
											"id": "E-7772"
										},
										{
											"dataSource": "SNMI",
											"id": "C-603C0"
										},
										{
											"dataSource": "SNOMEDCT_US",
											"id": "387207008"
										},
										{
											"dataSource": "USP",
											"id": "m39860"
										},
										{
											"dataSource": "USPMG",
											"id": "MTHU000060"
										},
										{
											"dataSource": "VANDF",
											"id": "4017840"
										}
									]
								},
								{
									"entityComponentInformation": [
										{
											"entityComponentKind": "prebuiltComponent"
										}
									],
									"offset": 30,
									"length": 6,
									"text": "100 mg",
									"category": "Dosage",
									"confidenceScore": 0.98
								}
							],
							"relations": [
								{
									"confidenceScore": 1,
									"relationType": "DosageOfMedication",
									"entities": [
										{
											"ref": "#/documents/0/entities/1",
											"role": "Dosage"
										},
										{
											"ref": "#/documents/0/entities/0",
											"role": "Medication"
										}
									]
								}
							],
							"id": "1",
							"warnings": []
						}
					],
					"errors": [],
					"modelVersion": "2020-04-01"
				}
			}
		]
	}
}

Key Sample Value Description
entities [] An array containing all the extracted entities.
entityComponentKind prebuiltComponent A variable that indicates which component returned the specific entity. Possible values: prebuiltComponent, learnedComponent, listComponent
offset 0 A number denoting the starting point of the extracted entity by indexing over the characters
length 10 A number denoting the length of the extracted entity in number of characters.
text first entity The text that was extracted for a specific entity.
category MedicationName The name of the entity type or category corresponding to the extracted text.
confidenceScore 0.9 A number denoting the model's certainty level of the extracted entity ranging from 0 to 1 with higher number denoting higher certainty.
assertion certainty Assertions associated with the extracted entity. Assertions are only supported for prebuilt Text Analytics for health entities.
name Ibuprofen The normalized name for the entity linking associated with the extracted entity. Entity linking is only supported for prebuilt Text Analytics for health entities.
links [] An array containing all the results from the entity linking associated with the extracted entity. Entity linking is only supported for prebuilt Text Analytics for health entities.
dataSource UMLS The reference standard resulting from the entity linking associated with the extracted entity. Entity linking is only supported for prebuilt Text Analytics for health entities.
ID C0020740 The reference code resulting from the entity linking associated with the extracted entity belonging to the extracted data source. Entity linking is only supported for prebuilt Text Analytics for health entities.
relations [] Array containing all the extracted relationships. Relationship extraction is only supported for prebuilt Text Analytics for health entities.
relationType DosageOfMedication The category of the extracted relationship. Relationship extraction is only supported for prebuilt Text Analytics for health entities.
entities "Dosage", "Medication" The entities associated with the extracted relationship. Relationship extraction is only supported for prebuilt Text Analytics for health entities.

Clean up resources

When you no longer need your project, you can delete it with the following DELETE request. Replace the placeholder values with your own values.

{Endpoint}/language/authoring/analyze-text/projects/{projectName}?api-version={API-VERSION}
Placeholder Value Example
{ENDPOINT} The endpoint for authenticating your API request. https://<your-custom-subdomain>.cognitiveservices.azure.com
{PROJECT-NAME} The name for your project. This value is case-sensitive. myProject
{API-VERSION} The version of the API you are calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. 2022-05-01

Headers

Use the following header to authenticate your request.

Key Value
Ocp-Apim-Subscription-Key The key to your resource. Used for authenticating your API requests.

Once you send your API request, you will receive a 202 response indicating success, which means your project has been deleted. A successful call results with an Operation-Location header used to check the status of the job.

Next steps

After you've created entity extraction model, you can:

When you start to create your own custom Text Analytics for health projects, use the how-to articles to learn more about data labeling, training and consuming your model in greater detail: