Quickstart: custom Text Analytics for health

Article
12/19/2023

Use this article to get started with creating a custom Text Analytics for health project, where you train custom models on top of Text Analytics for health for custom entity recognition. A model is artificial intelligence software that's trained to do a certain task. For this system, the models extract healthcare-related named entities and are trained by learning from labeled data.

In this article, we use Language Studio to demonstrate key concepts of custom Text Analytics for health. As an example, we’ll build a custom Text Analytics for health model to extract the facility or treatment location from short discharge notes.

Prerequisites

Azure subscription - Create one for free

Name	Description
Subscription	Your Azure subscription.
Resource group	A resource group that will contain your resource. You can use an existing one, or create a new one.
Region	The region for your Language resource. For example, "West US 2".
Name	A name for your resource.
Pricing tier	The pricing tier for your Language resource. You can use the Free (F0) tier to try the service.

Storage account value	Recommended value
Storage account name	Any name
Storage account type	Standard LRS

Placeholder	Value	Example
`{ENDPOINT}`	The endpoint for authenticating your API request.	`https://<your-custom-subdomain>.cognitiveservices.azure.com`
`{PROJECT-NAME}`	The name for your project. This value is case-sensitive.	`myProject`
`{API-VERSION}`	The version of the API you are calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions.	`2022-05-01`
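
The placeholders in the table above are substituted into each request URL before the call is made. The following is a minimal sketch of that substitution. The URL path in `IMPORT_URL_TEMPLATE`, the `my-resource` subdomain, and the `fill_placeholders` helper are assumptions made for illustration and don't appear in this article; check the REST API reference for the exact path.

```python
# Hypothetical template: the path is an assumption, not taken from this article.
IMPORT_URL_TEMPLATE = (
    "{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/:import"
    "?api-version={API-VERSION}"
)


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace each {PLACEHOLDER} in the template with its value."""
    url = template
    for name, value in values.items():
        url = url.replace("{" + name + "}", value)
    # Fail loudly if any placeholder was left unfilled.
    if "{" in url or "}" in url:
        raise ValueError(f"Unfilled placeholder in: {url}")
    return url


url = fill_placeholders(
    IMPORT_URL_TEMPLATE,
    {
        # Hypothetical endpoint; use the one shown for your own resource.
        "ENDPOINT": "https://my-resource.cognitiveservices.azure.com",
        "PROJECT-NAME": "myProject",  # case-sensitive
        "API-VERSION": "2022-05-01",
    },
)
print(url)
```

Raising an error on leftover braces is a design choice that catches a forgotten placeholder before a request is sent.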

Key	Placeholder	Value	Example
`multilingual`	`true`	A boolean value that lets you include documents in multiple languages in your dataset. When your model is deployed, you can query it in any supported language, including languages that aren't in your training documents. See language support to learn more about multilingual support.	`true`
`projectName`	`{PROJECT-NAME}`	Project name	`myproject`
`storageInputContainerName`	`{CONTAINER-NAME}`	Container name	`mycontainer`
`entities`		Array containing all the entity types you have in the project. These are the entity types that will be extracted from your documents.
`category`		The name of the entity type, which can be user defined for new entity definitions, or predefined for prebuilt entities.
`compositionSetting`	`{COMPOSITION-SETTING}`	Rule that defines how to manage multiple components in your entity. Options are `combineComponents` or `separateComponents`.	`combineComponents`
`list`		Array containing all the sublists you have in the project for a specific entity. Lists can be added to prebuilt entities or new entities with learned components.
`sublists`	`[]`	Array containing sublists. Each sublist is a key and its associated values.	`[]`
`listKey`	`One`	A normalized value for the list of synonyms to map back to in prediction.	`One`
`synonyms`	`[]`	Array containing all the synonyms	synonym
`language`	`{LANGUAGE-CODE}`	A string specifying the language code for the synonym in your sublist. If your project is a multilingual project and you want to support your list of synonyms for all the languages in your project, you have to explicitly add your synonyms to each language. See Language support for more information about supported language codes.	`en`
`values`	`"EntityNumberone"`, `"FirstEntity"`	A list of comma separated strings that will be matched exactly for extraction and map to the list key.	`"EntityNumberone"`, `"FirstEntity"`
`prebuilts`	`MedicationName`	The name of the prebuilt component populating the prebuilt entity. Prebuilt entities are automatically loaded into your project by default but you can extend them with list components in your labels file.	`MedicationName`
`documents`		Array containing all the documents in your project and list of the entities labeled within each document.	[]
`location`	`{DOCUMENT-NAME}`	The location of the documents in the storage container. Since all the documents are in the root of the container this should be the document name.	`doc1.txt`
`dataset`	`{DATASET}`	The set this document is assigned to when the data is split before training. Possible values for this field are `Train` and `Test`.	`Train`
`regionOffset`		The inclusive character position of the start of the text.	`0`
`regionLength`		The length of the bounding box in terms of UTF16 characters. Training only considers the data in this region.	`500`
`category`		The type of entity associated with the span of text specified.	`Entity1`
`offset`		The start position for the entity text.	`25`
`length`		The length of the entity in terms of UTF16 characters.	`20`
`language`	`{LANGUAGE-CODE}`	A string specifying the language code for the document used in your project. If your project is a multilingual project, choose the language code of the majority of the documents. See Language support for more information about supported language codes.	`en`
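
Because `offset` and `length` are counted in UTF-16 code units, they can differ from Python string indices when a document contains characters outside the Basic Multilingual Plane. The sketch below shows one way to compute a label and to assemble an entity with a list component from the keys in the table above. The helper names, the sample note, and the facility name are hypothetical, and the nesting of `sublists` inside `list` is inferred from the key descriptions rather than confirmed by this article.

```python
def utf16_len(text: str) -> int:
    """Return the number of UTF-16 code units in text."""
    return len(text.encode("utf-16-le")) // 2


def make_label(document_text: str, entity_text: str, category: str) -> dict:
    """Label the first occurrence of entity_text using UTF-16 offset and length."""
    index = document_text.find(entity_text)
    if index < 0:
        raise ValueError(f"{entity_text!r} not found in document")
    return {
        "category": category,
        "offset": utf16_len(document_text[:index]),
        "length": utf16_len(entity_text),
    }


def make_list_entity(category: str, list_key: str,
                     synonyms_by_language: dict[str, list[str]],
                     composition: str = "combineComponents") -> dict:
    """Build an entity definition with one sublist (nesting is an assumption)."""
    return {
        "category": category,
        "compositionSetting": composition,
        "list": {
            "sublists": [
                {
                    "listKey": list_key,
                    # One synonyms entry per language, as the table requires.
                    "synonyms": [
                        {"language": lang, "values": values}
                        for lang, values in synonyms_by_language.items()
                    ],
                }
            ]
        },
    }


# Hypothetical discharge note; the hospital emoji takes two UTF-16 code units.
note = "Discharged 🏥 to Mercy Clinic."
label = make_label(note, "Mercy Clinic", "Facility")
entity = make_list_entity("Facility", "Mercy Clinic", {"en": ["Mercy Clinic", "Mercy"]})
print(label)
```

In this note, the Python index of the facility name is 16, while its UTF-16 offset is 17, which is the value a labels file would need.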

Key	Placeholder	Value	Example
modelLabel	`{MODEL-NAME}`	The model name that is assigned to your model once trained successfully.	`myModel`
trainingConfigVersion	`{CONFIG-VERSION}`	The model version that is used to train the model.	`2022-05-01`
evaluationOptions		Option to split your data across training and testing sets.	`{}`
kind	`percentage`	Split methods. Possible values are `percentage` or `manual`. See How to train a model for more information.	`percentage`
trainingSplitPercentage	`80`	Percentage of your tagged data to be included in the training set. Recommended value is `80`.	`80`
testingSplitPercentage	`20`	Percentage of your tagged data to be included in the testing set. Recommended value is `20`.	`20`
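
As an illustration of how the keys in the table above fit together, here is a minimal sketch that assembles a training request body. The `make_training_body` helper is hypothetical. The check that the two percentages sum to 100, and the choice to omit them for a `manual` split, are assumptions rather than documented service behavior.

```python
import json


def make_training_body(model_label: str, config_version: str,
                       kind: str = "percentage",
                       training_split: int = 80,
                       testing_split: int = 20) -> dict:
    """Assemble a training request body from the keys listed in the table."""
    if kind not in ("percentage", "manual"):
        raise ValueError("kind must be 'percentage' or 'manual'")
    evaluation_options = {"kind": kind}
    if kind == "percentage":
        # Assumption: the two percentages should cover the whole dataset.
        if training_split + testing_split != 100:
            raise ValueError("training and testing percentages must sum to 100")
        evaluation_options["trainingSplitPercentage"] = training_split
        evaluation_options["testingSplitPercentage"] = testing_split
    return {
        "modelLabel": model_label,
        "trainingConfigVersion": config_version,
        "evaluationOptions": evaluation_options,
    }


body = make_training_body("myModel", "2022-05-01")
print(json.dumps(body, indent=2))
```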

Key	Placeholder	Value	Example
`displayName`	`{JOB-NAME}`	Your job name.	`MyJobName`
`documents`	[{},{}]	List of documents to run tasks on.	`[{},{}]`
`id`	`{DOC-ID}`	Document name or ID.	`doc1`
`language`	`{LANGUAGE-CODE}`	A string specifying the language code for the document. If this key isn't specified, the service will assume the default language of the project that was selected during project creation. See language support for a list of supported language codes.	`en-us`
`text`	`{DOC-TEXT}`	Document text to run the tasks on.	`Lorem ipsum dolor sit amet`
`tasks`		List of tasks we want to perform.	`[]`
`taskName`	`Custom Text Analytics for Health Test`	The task name	`Custom Text Analytics for Health Test`
`kind`	`CustomHealthcare`	The project or task kind we are trying to perform	`CustomHealthcare`
`parameters`		List of parameters to pass to the task.
`project-name`	`{PROJECT-NAME}`	The name for your project. This value is case-sensitive.	`myProject`
`deployment-name`	`{DEPLOYMENT-NAME}`	The name of your deployment. This value is case-sensitive.	`prod`
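
The keys in the table above can be assembled into a task submission body along the following lines. This is a sketch under stated assumptions: the `make_task_body` helper is hypothetical, the `analysisInput` wrapper around `documents` isn't listed in the table, and the parameter keys are written exactly as the table spells them (`project-name`, `deployment-name`). The service may expect a different spelling, so verify the key names against the REST API reference before use.

```python
import json


def make_task_body(job_name: str, documents: list[tuple[str, str]],
                   project_name: str, deployment_name: str,
                   language: str = "en",
                   task_name: str = "Custom Text Analytics for Health Test") -> dict:
    """Assemble a task submission body from (id, text) document pairs."""
    return {
        "displayName": job_name,
        # Assumption: documents sit inside an "analysisInput" object.
        "analysisInput": {
            "documents": [
                {"id": doc_id, "language": language, "text": text}
                for doc_id, text in documents
            ]
        },
        "tasks": [
            {
                "kind": "CustomHealthcare",
                "taskName": task_name,
                "parameters": {
                    # Key spellings follow the table; both values are case-sensitive.
                    "project-name": project_name,
                    "deployment-name": deployment_name,
                },
            }
        ],
    }


task_body = make_task_body(
    "MyJobName",
    [("doc1", "Lorem ipsum dolor sit amet")],
    project_name="myProject",
    deployment_name="prod",
)
print(json.dumps(task_body, indent=2))
```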

Key	Sample Value	Description
entities	[]	An array containing all the extracted entities.
entityComponentKind	`prebuiltComponent`	A variable that indicates which component returned the specific entity. Possible values: `prebuiltComponent`, `learnedComponent`, `listComponent`
offset	`0`	A number denoting the starting point of the extracted entity by indexing over the characters
length	`10`	A number denoting the length of the extracted entity in number of characters.
text	`first entity`	The text that was extracted for a specific entity.
category	`MedicationName`	The name of the entity type or category corresponding to the extracted text.
confidenceScore	`0.9`	A number denoting the model's certainty level of the extracted entity ranging from 0 to 1 with higher number denoting higher certainty.
assertion	`certainty`	Assertions associated with the extracted entity. Assertions are only supported for prebuilt Text Analytics for health entities.
name	`Ibuprofen`	The normalized name for the entity linking associated with the extracted entity. Entity linking is only supported for prebuilt Text Analytics for health entities.
links	[]	An array containing all the results from the entity linking associated with the extracted entity. Entity linking is only supported for prebuilt Text Analytics for health entities.
dataSource	`UMLS`	The reference standard resulting from the entity linking associated with the extracted entity. Entity linking is only supported for prebuilt Text Analytics for health entities.
ID	`C0020740`	The reference code resulting from the entity linking associated with the extracted entity belonging to the extracted data source. Entity linking is only supported for prebuilt Text Analytics for health entities.
relations	[]	Array containing all the extracted relationships. Relationship extraction is only supported for prebuilt Text Analytics for health entities.
relationType	`DosageOfMedication`	The category of the extracted relationship. Relationship extraction is only supported for prebuilt Text Analytics for health entities.
entities	`"Dosage", "Medication"`	The entities associated with the extracted relationship. Relationship extraction is only supported for prebuilt Text Analytics for health entities.

Prerequisites

Create a new Azure AI Language resource and Azure storage account

Create a new resource from the Azure portal

Upload sample data to blob container

Create a custom Text Analytics for health project

Train your model

Deploy your model

Test your model

Clean up resources

Prerequisites

Create a new Azure AI Language resource and Azure storage account

Create a new resource from the Azure portal

Upload sample data to blob container

Get your resource keys and endpoint

Create a custom Text Analytics for health project

Trigger import project job

Headers

Body

Get import job status

Request URL

Headers

Train your model

Start training job

Headers

Request body

Get training job status

Request URL

Headers

Response Body

Deploy your model

Start deployment job

Headers

Request body

Get deployment job status

Headers

Response Body

Make predictions with your trained model

Submit a custom Text Analytics for health task

Headers

Body

Response

Get task results

Headers

Response Body

Clean up resources

Headers

Next steps
