Quickstart: Custom text classification (preview)
Use this article to get started with creating a custom text classification project, where you train custom models for text classification. A model is a machine learning object that learns from the example data you provide, and can then classify new text.
Prerequisites
- Azure subscription - Create one for free.
Create a new Azure resource and Azure Blob Storage account
Before you can use custom text classification, you will need to create an Azure Language resource, which will give you the credentials needed to create a project and start training a model. You will also need an Azure storage account, where you can upload the dataset that will be used to build your model.
Important
To get started quickly, we recommend creating a new Azure Language resource using the steps below, which let you create the resource and configure a storage account at the same time. This is easier than doing it later.
If you have a pre-existing resource you'd like to use, you will need to configure it and a storage account separately. See the project requirements for information.
Go to the Azure portal to create a new Azure Language resource. If you're asked to select additional features, select Custom text classification & custom NER. When you create your resource, ensure it has the following parameters.
| Azure resource requirement | Required value |
|---|---|
| Location | "West US 2" or "West Europe" |
| Pricing tier | Standard (S) pricing tier |

In the Custom named entity recognition (NER) & custom text classification (Preview) section, select an existing storage account or select Create a new storage account. Note that these values are for this quickstart, and not necessarily the storage account values you will want to use in production environments.
| Storage account value | Recommended value |
|---|---|
| Name | Any name |
| Performance | Standard |
| Account kind | Storage (general purpose v1) |
| Replication | Locally redundant storage (LRS) |
| Location | Any location closest to you, for best latency |
Upload sample data to blob container
After you have created an Azure storage account and linked it to your Language resource, you will need to upload the example files to the root directory of your container for this quickstart. These files will later be used to train your model.
Download the sample movie summary data for this quickstart from GitHub. Open the .zip file, and extract the folder containing text files within it.
In the Azure portal, navigate to the storage account you created, and select it.
In your storage account, select Containers from the left menu, located below Data storage. On the screen that appears, select + Container. Give the container the name example-data and leave the default Public access level.
After your container has been created, click on it. Then select the Upload button to select the .txt and .json files you downloaded earlier.
Tip
When you select files to upload, a file explorer will open on your computer. To select all the files in the folder, press Ctrl + A.
The provided sample dataset contains around 200 movie summaries that belong to one or more of the following classes: "Mystery", "Drama", "Thriller", "Comedy", "Action".
Create a custom text classification project
Once your resource and storage container are configured, create a new custom text classification project. A project is a work area for building your custom AI models based on your data. Your project can only be accessed by you and others who have contributor access to the Azure resource being used.
Log in to Language Studio. A window will appear to let you select your subscription and Language resource. Select the resource you created in the above step.
Under the Classify text section of Language Studio, select Custom text classification from the available services.
Select Create new project from the top menu in your projects page. Creating a project will let you tag data, train, evaluate, improve, and deploy your models.
If you have created your resource using the steps above, you will need to add information about your project, like a name, and select your storage container.
Select your project type. For this quickstart, we will create a multi label classification project. Then click Next.
Enter the project information, including a name, description, and the language of the files in your project. You will not be able to change the name of your project later.
Tip
Your dataset doesn't have to be entirely in the same language. You can have multiple files, each with different supported languages. If your dataset contains files of different languages or if you expect different languages during runtime, select enable multi-lingual dataset when you enter the basic information for your project.
Select the container where you have uploaded your data. For this quickstart, we will use the existing tags file available in the container. Then click Next.
Review the data you entered and select Create Project.
Train your model
Typically after you create a project, you would import your data and begin tagging the entities within it to train the custom text classification model. For this quickstart, you will use the example tagged data file that you downloaded earlier and stored in your Azure storage account.
A model is the machine learning object that will be trained to classify text. Your model will learn from the example data, and be able to classify movie summaries afterwards.
To start training your model:
Select Train from the left side menu.
Select Train a new model and type in the model name in the text box below.
Click on the Train button at the bottom of the page.
Note
- When you tag your data you can determine how your dataset is split into training and testing sets. You can also have your data split randomly into training and testing sets.
- Training can take up to a few hours.
Deploy your model
Generally after training a model you would review its evaluation details and make improvements if necessary. In this quickstart, you will just deploy your model and make it available to try.
After your model is trained, you can deploy it. Deploying your model lets you start using it to classify text through the Analyze API.
Go to your project in Language Studio.
From the left panel, select Deploy model.
Click on Add deployment to submit a new deployment job.
In the window that appears, you can create a new deployment name or overwrite an existing one. Then, you can assign a trained model to that deployment name.
Test your model
After your model is deployed, you can start using it for custom text classification. Use the following steps to send your first custom text classification request.
Select Test model from the left side menu.
Select the model you want to test.
Using one of the files you downloaded earlier, add the file's text to the text box. You can also upload a .txt file.
Click on Run the test.
In the Result tab, you can see the predicted classes for your text. You can also view the JSON response under the JSON tab.
Clean up projects
When you don't need your project anymore, you can delete it from your projects page in Language Studio. Select the project you want to delete and click on Delete.
Prerequisites
- Azure subscription - Create one for free.
Create a new Azure resource and Azure Blob Storage account
Before you can use custom text classification, you will need to create a Language resource, which will give you the subscription and credentials you will need to create a project and start training a model. You will also need an Azure blob storage account, which is the required online data storage to hold text for analysis.
Important
To get started quickly, we recommend creating a new Azure Language resource using the steps below, which let you create the resource and configure a storage account at the same time. This is easier than doing it later.
If you have a pre-existing resource you'd like to use, you will need to configure it and a storage account separately. See the Project requirements for information.
Go to the Azure portal to create a new Azure Language resource. If you're asked to select additional features, select Skip this step. When you create your resource, ensure it has the following parameters.
| Azure resource requirement | Required value |
|---|---|
| Location | "West US 2" or "West Europe" |
| Pricing tier | Standard (S) pricing tier |

In the Custom named entity recognition (NER) & custom text classification (Preview) section, select Create a new storage account. These values are for this quickstart, and not necessarily the storage account values you will want to use in production environments.
| Storage account value | Recommended value |
|---|---|
| Name | Any name |
| Performance | Standard |
| Account kind | Storage (general purpose v1) |
| Replication | Locally redundant storage (LRS) |
| Location | Any location closest to you, for best latency |
Upload sample data to blob container
After you have created an Azure storage account and linked it to your Language resource, you will need to upload the example files to the root directory of your container for this quickstart. These files will later be used to train your model.
Download the sample movie summary data for this quickstart from GitHub. Open the .zip file, and extract the folder containing text files within it.
In the Azure portal, navigate to the storage account you created, and select it.
In your storage account, select Containers from the left menu, located below Data storage. On the screen that appears, select + Container. Give the container the name example-data and leave the default Public access level.
After your container has been created, click on it. Then select the Upload button to select the .txt and .json files you downloaded earlier.
Tip
When you select files to upload, a file explorer will open on your computer. To select all the files in the folder, press Ctrl + A.
The provided sample dataset contains around 200 movie summaries that belong to one or more of the following classes: "Mystery", "Drama", "Thriller", "Comedy", "Action".
Get your resource keys and endpoint
Go to your resource overview page in the Azure portal.
From the menu on the left side, select Keys and Endpoint. You will use the endpoint and key for the API requests.
Create project
To start creating a custom text classification model, you need to create a project. Creating a project will let you tag data, train, evaluate, improve, and deploy your models.
Note
The project name is case-sensitive for all operations.
Create a POST request using the following URL, headers, and JSON body to create your project and import the tags file.
Request URL
Use the following URL to create a project and import your tags file. Replace the placeholder values below with your own values.
{YOUR-ENDPOINT}/language/analyze-text/projects/{projectName}/:import?api-version=2021-11-01-preview
| Placeholder | Value | Example |
|---|---|---|
| {YOUR-ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
Headers
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
Body
Use the following JSON in your request. Replace the placeholder values below with your own values.
```json
{
  "api-version": "2021-11-01-preview",
  "metadata": {
    "name": "MyProject",
    "multiLingual": true,
    "description": "Trying out custom text classification",
    "modelType": "multiClassification",
    "language": "string",
    "storageInputContainerName": "YOUR-CONTAINER-NAME",
    "settings": {}
  },
  "assets": {
    "classifiers": [
      {
        "name": "Class1"
      }
    ],
    "documents": [
      {
        "location": "doc1.txt",
        "language": "en-us",
        "dataset": "Train",
        "classifiers": [
          {
            "classifierName": "Class1"
          }
        ]
      }
    ]
  }
}
```
For the metadata key:
| Key | Value | Example |
|---|---|---|
| modelType | Your model type. For single label classification, use singleClassification. | multiClassification |
| storageInputContainerName | The name of your Azure blob storage container. | myContainer |
For the documents key:
| Key | Value | Example |
|---|---|---|
| location | Document name on the blob store. | doc2.txt |
| language | The language of the document. | en-us |
| dataset | Optional field to specify the dataset that this document belongs to. | Train or Test |
This request will return an error if:
- The selected resource doesn't have proper permission for the storage account.
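The steps above can be sketched end to end in Python using only the standard library. The endpoint, key, project, and container names below are placeholder assumptions you must replace with your own values; the sketch builds the request without sending it, so you can inspect it first.

```python
import json
import urllib.request

# Hypothetical values; replace with your own endpoint, key, and names.
ENDPOINT = "https://<your-custom-subdomain>.cognitiveservices.azure.com"
KEY = "<your-resource-key>"

def build_import_request(endpoint, key, project_name, container_name):
    """Assemble the project-creation/import POST request without sending it."""
    url = (f"{endpoint}/language/analyze-text/projects/"
           f"{project_name}/:import?api-version=2021-11-01-preview")
    body = {
        "api-version": "2021-11-01-preview",
        "metadata": {
            "name": project_name,
            "multiLingual": True,
            "description": "Trying out custom text classification",
            "modelType": "multiClassification",
            "language": "en-us",
            "storageInputContainerName": container_name,
            "settings": {},
        },
        "assets": {
            "classifiers": [{"name": "Class1"}],
            "documents": [
                {
                    "location": "doc1.txt",
                    "language": "en-us",
                    "dataset": "Train",
                    "classifiers": [{"classifierName": "Class1"}],
                }
            ],
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_import_request(ENDPOINT, KEY, "MyProject", "example-data")
# urllib.request.urlopen(req) would submit it; a 2xx response means success.
```

Keeping the request construction in a function makes it easy to reuse for the later training and deployment calls, which follow the same URL and header pattern.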
Start training your model
After your project has been created, you can begin training a custom text classification model. Create a POST request using the following URL, headers, and JSON body to start training a custom text classification model.
Request URL
Use the following URL when creating your API request. Replace the placeholder values below with your own values.
{YOUR-ENDPOINT}/language/analyze-text/projects/{PROJECT-NAME}/:train?api-version=2021-11-01-preview
| Placeholder | Value | Example |
|---|---|---|
| {YOUR-ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
| {PROJECT-NAME} | The name for your project. This value is case-sensitive. | myProject |
Headers
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
Request body
Use the following JSON in your request. The model will be named MyModel once training is complete.
```json
{
  "modelLabel": "MyModel",
  "runValidation": true,
  "evaluationOptions": {
    "type": "percentage",
    "testingSplitPercentage": 30,
    "trainingSplitPercentage": 70
  }
}
```
| Key | Value | Example |
|---|---|---|
| modelLabel | Your model name. | MyModel |
| runValidation | Boolean value to run validation on the test set. | true or false |
| evaluationOptions | Specifies evaluation options. | |
| type | Specifies the data split type. | set or percentage |
| testingSplitPercentage | Required integer field if type is percentage. Specifies the testing split. | 30 |
| trainingSplitPercentage | Required integer field if type is percentage. Specifies the training split. | 70 |
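As a quick sanity check for the body above, the following hedged Python sketch builds the training request body and validates that the two split percentages cover the whole dataset. The function name and the validation rule are illustrative assumptions, not part of the service API.

```python
import json

def build_training_body(model_label, test_pct=30, train_pct=70):
    """Build the training request body; the splits must cover the full dataset."""
    if test_pct + train_pct != 100:
        raise ValueError("testing and training split percentages must sum to 100")
    return {
        "modelLabel": model_label,
        "runValidation": True,
        "evaluationOptions": {
            "type": "percentage",
            "testingSplitPercentage": test_pct,
            "trainingSplitPercentage": train_pct,
        },
    }

print(json.dumps(build_training_body("MyModel"), indent=2))
```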
Once you send your API request, you will receive a 202 response indicating success. In the response headers, extract the location value. It will be formatted like this:
{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/train/jobs/{JOB-ID}?api-version=2021-11-01-preview
JOB-ID is used to identify your request, since this operation is asynchronous. You will use this URL in the next step to get the training status.
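Because the job ID always sits between the /jobs/ segment and the query string, it can be pulled out of the location header with plain string operations. The header value below is a hypothetical example, not one returned by a real request.

```python
# Extract the job ID from a location header shaped like the URL above
# (hypothetical example value).
location = ("https://<your-custom-subdomain>.cognitiveservices.azure.com"
            "/language/analyze-text/projects/MyProject/train/jobs/"
            "3fa85f64-5717-4562-b3fc-2c963f66afa6?api-version=2021-11-01-preview")

job_id = location.split("/jobs/")[1].split("?")[0]
print(job_id)  # 3fa85f64-5717-4562-b3fc-2c963f66afa6
```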
Get Training Status
Use the following GET request to query the status of your model's training process. You can use the URL you received from the previous step, or replace the placeholder values below with your own values.
{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/train/jobs/{JOB-ID}?api-version=2021-11-01-preview
| Placeholder | Value | Example |
|---|---|---|
| {YOUR-ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
| {PROJECT-NAME} | The name for your project. This value is case-sensitive. | myProject |
| {JOB-ID} | The ID for locating your model's training status. This is in the location header value you received in the previous step. | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx |
Headers
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
Response Body
Once you send the request, you will get the following response.
```json
{
  "jobs": [
    {
      "result": {
        "trainedModelLabel": "MyModel",
        "trainStatus": {
          "percentComplete": 0,
          "elapsedTime": "string"
        },
        "evaluationStatus": {
          "percentComplete": 0,
          "elapsedTime": "string"
        }
      },
      "jobId": "string",
      "createdDateTime": "2021-10-19T23:24:41.572Z",
      "lastUpdatedDateTime": "2021-10-19T23:24:41.572Z",
      "expirationDateTime": "2021-10-19T23:24:41.572Z",
      "status": "unknown",
      "errors": [
        {
          "code": "unknown",
          "message": "string"
        }
      ]
    }
  ],
  "nextLink": "string"
}
```
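Since training is asynchronous, a common pattern is to poll this status URL until the job leaves its in-progress state. The sketch below keeps the HTTP layer out of the loop by taking any callable that returns the parsed status JSON; the terminal status strings checked here are assumptions for illustration, so adjust them to whatever statuses your responses actually report.

```python
import time

def wait_for_training(fetch_status, poll_seconds=30, max_polls=120):
    """Poll a status source until the training job reaches a terminal state.

    fetch_status is any callable returning the parsed status JSON, so the
    HTTP layer (urllib, requests, ...) stays out of the loop logic.
    The terminal statuses below are assumed values for this sketch.
    """
    for _ in range(max_polls):
        job = fetch_status()["jobs"][0]
        if job["status"] in ("succeeded", "failed"):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling window")

# Offline demo with a canned response shaped like the JSON above.
canned = {"jobs": [{"status": "succeeded",
                    "result": {"trainedModelLabel": "MyModel"}}]}
job = wait_for_training(lambda: canned, poll_seconds=0)
print(job["result"]["trainedModelLabel"])  # MyModel
```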
Deploy your model
Create a PUT request using the following URL, headers, and JSON body to start deploying a custom text classification model.
{YOUR-ENDPOINT}/language/analyze-text/projects/{PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}?api-version=2021-11-01-preview
| Placeholder | Value | Example |
|---|---|---|
| {YOUR-ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
| {PROJECT-NAME} | The name for your project. This value is case-sensitive. | myProject |
| {DEPLOYMENT-NAME} | The name of your deployment. This value is case-sensitive. | prod |
Headers
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
Request body
Use the following JSON in your request to assign your trained model to the deployment.
```json
{
  "trainedModelLabel": "MyModel",
  "deploymentName": "prod"
}
```
Once you send your API request, you will receive a 202 response indicating success. In the response headers, extract the location value. It will be formatted like this:
{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version=2021-11-01-preview
JOB-ID is used to identify your request, since this operation is asynchronous. You will use this URL in the next step to get the deployment status.
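The deployment call follows the same pattern as the earlier requests, just with the PUT verb. The endpoint, key, project, and deployment names below are placeholder assumptions; the sketch builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical values; replace with your own endpoint and key.
ENDPOINT = "https://<your-custom-subdomain>.cognitiveservices.azure.com"
KEY = "<your-resource-key>"

url = (f"{ENDPOINT}/language/analyze-text/projects/MyProject/deployments/"
       f"prod?api-version=2021-11-01-preview")
body = {"trainedModelLabel": "MyModel", "deploymentName": "prod"}

req = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={"Ocp-Apim-Subscription-Key": KEY,
             "Content-Type": "application/json"},
    method="PUT",
)
# urllib.request.urlopen(req) would submit the deployment job (202 on success).
```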
Get the deployment status
Use the following GET request to query the status of your model's deployment process. You can use the URL you received from the previous step, or replace the placeholder values below with your own values.
{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version=2021-11-01-preview
| Placeholder | Value | Example |
|---|---|---|
| {YOUR-ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
| {PROJECT-NAME} | The name for your project. This value is case-sensitive. | myProject |
| {DEPLOYMENT-NAME} | The name of your deployment. This value is case-sensitive. | prod |
| {JOB-ID} | The ID for locating your model's deployment status. This is in the location header value you received in the previous step. | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx |
Headers
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
Submit a custom text classification task
Note
Project names are case sensitive.
Use this POST request to start a text classification task. In the request body, set the project-name parameter to the name of the project that contains the model you want to use.
{YOUR-ENDPOINT}/text/analytics/v3.2-preview.2/analyze
Headers
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | Your subscription key that provides access to this API. |
Body
```json
{
  "displayName": "MyJobName",
  "analysisInput": {
    "documents": [
      {
        "id": "doc1",
        "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc tempus, felis sed vehicula lobortis, lectus ligula facilisis quam, quis aliquet lectus diam id erat. Vivamus eu semper tellus. Integer placerat sem vel eros iaculis dictum. Sed vel congue urna."
      },
      {
        "id": "doc2",
        "text": "Mauris dui dui, ultricies vel ligula ultricies, elementum viverra odio. Donec tempor odio nunc, quis fermentum lorem egestas commodo. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos."
      }
    ]
  },
  "tasks": {
    "customMultiClassificationTasks": [
      {
        "parameters": {
          "project-name": "MyProject",
          "deployment-name": "MyDeploymentName",
          "stringIndexType": "TextElements_v8"
        }
      }
    ]
  }
}
```
| Key | Sample Value | Description |
|---|---|---|
| displayName | "MyJobName" | Your job name |
| documents | [{},{}] | List of documents to run tasks on |
| id | "doc1" | A string document identifier |
| text | "Lorem ipsum dolor sit amet" | Your document in string format |
| tasks | [] | List of tasks we want to perform. |
| -- | customMultiClassificationTasks | Task identifier for the task we want to perform. Use customClassificationTasks for single label classification tasks and customMultiClassificationTasks for multi label classification tasks. |
| parameters | [] | List of parameters to pass to task |
| project-name | "MyProject" | Your project name. The project name is case-sensitive. |
| deployment-name | "MyDeploymentName" | Your deployment name |
Replace the text of the document with movie summaries to classify.
Response
You will receive a 202 response indicating success. In the response headers, extract operation-location.
operation-location is formatted like this:
{YOUR-ENDPOINT}/text/analytics/v3.2-preview.2/analyze/jobs/<jobId>
You will use this endpoint to get the custom text classification task results.
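The submit step can be sketched with the standard library as well. The endpoint, key, project, deployment name, and sample text are placeholder assumptions; the sketch builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical values; replace with your own endpoint and key.
ENDPOINT = "https://<your-custom-subdomain>.cognitiveservices.azure.com"
KEY = "<your-resource-key>"

body = {
    "displayName": "MyJobName",
    "analysisInput": {"documents": [
        # Illustrative stand-in text; use your own movie summaries.
        {"id": "doc1", "text": "A detective unravels a conspiracy in a small town."},
    ]},
    "tasks": {"customMultiClassificationTasks": [{
        "parameters": {
            "project-name": "MyProject",
            "deployment-name": "MyDeploymentName",
            "stringIndexType": "TextElements_v8",
        }
    }]},
}

req = urllib.request.Request(
    f"{ENDPOINT}/text/analytics/v3.2-preview.2/analyze",
    data=json.dumps(body).encode("utf-8"),
    headers={"Ocp-Apim-Subscription-Key": KEY,
             "Content-Type": "application/json"},
    method="POST",
)
# After urllib.request.urlopen(req), read the operation-location response
# header to get the jobs URL used in the next step.
```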
Get the custom text classification task status and results
Use the following GET request to query the status/results of the custom classification task. You can use the endpoint you received from the previous step.
{YOUR-ENDPOINT}/text/analytics/v3.2-preview.2/analyze/jobs/<jobId>
Headers
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | Your Subscription key that provides access to this API. |
Response body
The response will be a JSON document with the following parameters.
```json
{
  "createdDateTime": "2021-05-19T14:32:25.578Z",
  "displayName": "MyJobName",
  "expirationDateTime": "2021-05-19T14:32:25.578Z",
  "jobId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "lastUpdateDateTime": "2021-05-19T14:32:25.578Z",
  "status": "completed",
  "errors": [],
  "tasks": {
    "details": {
      "name": "MyJobName",
      "lastUpdateDateTime": "2021-03-29T19:50:23Z",
      "status": "completed"
    },
    "completed": 1,
    "failed": 0,
    "inProgress": 0,
    "total": 1,
    "tasks": {
      "customMultiClassificationTasks": [
        {
          "lastUpdateDateTime": "2021-05-19T14:32:25.579Z",
          "name": "MyJobName",
          "status": "completed",
          "results": {
            "documents": [
              {
                "id": "doc1",
                "classes": [
                  {
                    "category": "Class_1",
                    "confidenceScore": 0.0551877357
                  }
                ],
                "warnings": []
              },
              {
                "id": "doc2",
                "classes": [
                  {
                    "category": "Class_1",
                    "confidenceScore": 0.0551877357
                  },
                  {
                    "category": "Class_2",
                    "confidenceScore": 0.0551877357
                  }
                ],
                "warnings": []
              }
            ],
            "errors": [],
            "statistics": {
              "documentsCount": 0,
              "erroneousDocumentsCount": 0,
              "transactionsCount": 0
            }
          }
        }
      ]
    }
  }
}
```
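Once the task completes, you typically want the predicted classes per document. This hedged sketch walks a response shaped like the one above and keeps only classes at or above a confidence threshold; the function name, threshold, and sample categories are illustrative assumptions.

```python
def top_classes(task_result, threshold=0.0):
    """Map each document ID to the classes at or above a confidence threshold."""
    out = {}
    for doc in task_result["results"]["documents"]:
        out[doc["id"]] = [c["category"] for c in doc["classes"]
                          if c["confidenceScore"] >= threshold]
    return out

# Shaped like one entry of customMultiClassificationTasks above
# (hypothetical categories and scores).
sample = {"results": {"documents": [
    {"id": "doc1", "classes": [{"category": "Drama", "confidenceScore": 0.91},
                               {"category": "Comedy", "confidenceScore": 0.12}]},
]}}
print(top_classes(sample, threshold=0.5))  # {'doc1': ['Drama']}
```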
Clean up resources
When you no longer need your project, you can delete it with the following DELETE request. Replace the placeholder values with your own values.
{YOUR-ENDPOINT}/language/analyze-text/projects/{PROJECT-NAME}
| Placeholder | Value | Example |
|---|---|---|
| {YOUR-ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
| {PROJECT-NAME} | The name for your project. This value is case-sensitive. | myProject |
Headers
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
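The delete call can be sketched the same way as the other requests. The endpoint, key, and project name are placeholder assumptions; the sketch builds the request without sending it.

```python
import urllib.request

# Hypothetical values; replace with your own endpoint and key.
ENDPOINT = "https://<your-custom-subdomain>.cognitiveservices.azure.com"
KEY = "<your-resource-key>"

req = urllib.request.Request(
    f"{ENDPOINT}/language/analyze-text/projects/MyProject",
    headers={"Ocp-Apim-Subscription-Key": KEY},
    method="DELETE",
)
# urllib.request.urlopen(req) would delete the project.
```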
Next steps
After you've created a custom text classification model, use the how-to articles to learn more about developing your model in greater detail.