Upload training and testing datasets for Custom Speech
You need audio or text data for testing the accuracy of Microsoft speech recognition or training your custom models. For information about the data types supported for testing or training your model, see Training and testing datasets.
Tip
You can also use the online transcription editor to create and refine labeled audio datasets.
Upload datasets
To upload your own datasets in Speech Studio, follow these steps:
- Sign in to the Speech Studio.
- Select Custom Speech > Your project name > Speech datasets > Upload data.
- Select the Training data or Testing data tab.
- Select a dataset type, and then select Next.
- Specify the dataset location, and then select Next. You can choose a local file or enter a remote location, such as a publicly accessible Azure Blob URL.
- Enter the dataset name and description, and then select Next.
- Review your settings, and then select Save and close.
After your dataset is uploaded, go to the Train custom models page to train a custom model.
With the Speech CLI and Speech-to-text REST API v3.0, unlike the Speech Studio, you don't choose whether a dataset is for testing or training at the time of upload. You specify how a dataset is used when you train a model or run a test.
Although you don't indicate whether the dataset is for testing or training, you must specify the dataset kind. The dataset kind is used to determine which type of dataset is created. In some cases, a dataset kind is only used for testing or training, but you shouldn't take a dependency on that. The Speech CLI and REST API kind values correspond to the options in the Speech Studio as described in the following table:
| CLI and API kind | Speech Studio options |
|---|---|
| Acoustic | Training data: Audio + human-labeled transcript; Testing data: Transcript (automatic audio synthesis); Testing data: Audio + human-labeled transcript |
| AudioFiles | Testing data: Audio |
| Language | Training data: Plain text |
| Pronunciation | Training data: Pronunciation |
Note
Training datasets that contain structured text in Markdown format aren't supported by the Speech CLI or Speech-to-text REST API v3.0.
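A script that creates datasets might check the kind value before sending it. The following is a minimal sketch, not part of the Speech CLI: the function name is_valid_kind is hypothetical, and the exact-match (case-sensitive) comparison is an assumption based on how the values are written in the table.

```shell
# Hypothetical helper: succeeds only for the four kind values listed in the table.
# Assumption: values are compared exactly as written (case-sensitive).
is_valid_kind() {
  case "$1" in
    Acoustic|AudioFiles|Language|Pronunciation) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: check a value before using it as the dataset kind.
kind="Acoustic"
if is_valid_kind "$kind"; then
  echo "kind '$kind' is valid"
else
  echo "unknown kind '$kind'" >&2
fi
```

Rejecting an unknown value locally gives a clearer error than waiting for the service to refuse the request.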
To create a dataset and connect it to an existing project, use the spx csr dataset create command. Construct the request parameters according to the following instructions:
- Set the project parameter to the ID of an existing project. This is recommended so that you can also view and manage the dataset in Speech Studio. You can run the spx csr project list command to get available projects.
- Set the required kind parameter. The possible values for dataset kind are: Language, Acoustic, Pronunciation, and AudioFiles.
- Set the required contentUrl parameter. This is the location of the dataset.
- Set the required language parameter. The dataset locale must match the locale of the project. The locale can't be changed later. The Speech CLI language parameter corresponds to the locale property in the JSON request and response.
- Set the required name parameter. This is the name that will be displayed in the Speech Studio. The Speech CLI name parameter corresponds to the displayName property in the JSON request and response.
Here's an example Speech CLI command that creates a dataset and connects it to an existing project:
spx csr dataset create --kind "Acoustic" --name "My Acoustic Dataset" --description "My Acoustic Dataset Description" --project YourProjectId --content YourContentUrl --language "en-US"
You should receive a response body in the following format:
{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c",
  "kind": "Acoustic",
  "contentUrl": "https://contoso.com/mydatasetlocation",
  "links": {
    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c/files"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/projects/70ccbffc-cafb-4301-aa9f-ef658559d96e"
  },
  "properties": {
    "acceptedLineCount": 0,
    "rejectedLineCount": 0
  },
  "lastActionDateTime": "2022-05-20T14:07:11Z",
  "status": "NotStarted",
  "createdDateTime": "2022-05-20T14:07:11Z",
  "locale": "en-US",
  "displayName": "My Acoustic Dataset",
  "description": "My Acoustic Dataset Description"
}
The top-level self property in the response body is the dataset's URI. Use this URI to get details about the dataset's project and files. You also use this URI to update or delete a dataset.
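As a small illustration of working with the dataset URI in a script, the sketch below pulls the dataset ID out of the sample self value using shell parameter expansion. The variable names are illustrative. Building the files URI by appending /files is an assumption drawn only from the sample response, where links.files has that shape; reading links.files from the actual response is the safer choice.

```shell
# Sample value copied from the response body shown above.
dataset_uri="https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c"

# Remove everything up to and including the last slash to get the dataset ID.
dataset_id="${dataset_uri##*/}"

# Assumption: in the sample response, links.files is the dataset URI plus "/files".
files_uri="${dataset_uri}/files"

echo "$dataset_id"
echo "$files_uri"
```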
For Speech CLI help with datasets, run the following command:
spx help csr dataset
To create a dataset and connect it to an existing project, use the CreateDataset operation of the Speech-to-text REST API v3.0. Construct the request body according to the following instructions:
- Set the project property to the URI of an existing project. This is recommended so that you can also view and manage the dataset in Speech Studio. You can make a GetProjects request to get available projects.
- Set the required kind property. The possible values for dataset kind are: Language, Acoustic, Pronunciation, and AudioFiles.
- Set the required contentUrl property. This is the location of the dataset.
- Set the required locale property. The dataset locale must match the locale of the project. The locale can't be changed later.
- Set the required displayName property. This is the name that will be displayed in the Speech Studio.
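The properties listed above can be assembled into a request body from shell variables. This is a minimal sketch under stated assumptions: build_dataset_body is a hypothetical helper, the sample values are the ones used elsewhere in this article, and no JSON escaping is performed, so values must not contain double quotes or backslashes.

```shell
# Hypothetical helper: prints a request body with the properties described above.
# Arguments: $1 kind, $2 displayName, $3 description, $4 project URI,
#            $5 contentUrl, $6 locale.
# Assumption: values contain no double quotes or backslashes (no escaping is done).
build_dataset_body() {
  printf '{\n'
  printf '  "kind": "%s",\n' "$1"
  printf '  "displayName": "%s",\n' "$2"
  printf '  "description": "%s",\n' "$3"
  printf '  "project": {\n    "self": "%s"\n  },\n' "$4"
  printf '  "contentUrl": "%s",\n' "$5"
  printf '  "locale": "%s"\n' "$6"   # last property: no trailing comma
  printf '}\n'
}

body=$(build_dataset_body "Acoustic" "My Acoustic Dataset" "My Acoustic Dataset Description" \
  "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/projects/70ccbffc-cafb-4301-aa9f-ef658559d96e" \
  "https://contoso.com/mydatasetlocation" "en-US")

echo "$body"
```

The resulting string can be passed to curl with -d "$body" in place of an inline JSON literal.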
Make an HTTP POST request using the URI as shown in the following example. Replace YourSubscriptionKey with your Speech resource key, replace YourServiceRegion with your Speech resource region, and set the request body properties as previously described.
curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "kind": "Acoustic",
  "displayName": "My Acoustic Dataset",
  "description": "My Acoustic Dataset Description",
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/projects/70ccbffc-cafb-4301-aa9f-ef658559d96e"
  },
  "contentUrl": "https://contoso.com/mydatasetlocation",
  "locale": "en-US"
}' "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.0/datasets"
You should receive a response body in the following format:
{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c",
  "kind": "Acoustic",
  "contentUrl": "https://contoso.com/mydatasetlocation",
  "links": {
    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c/files"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/projects/70ccbffc-cafb-4301-aa9f-ef658559d96e"
  },
  "properties": {
    "acceptedLineCount": 0,
    "rejectedLineCount": 0
  },
  "lastActionDateTime": "2022-05-20T14:07:11Z",
  "status": "NotStarted",
  "createdDateTime": "2022-05-20T14:07:11Z",
  "locale": "en-US",
  "displayName": "My Acoustic Dataset",
  "description": "My Acoustic Dataset Description"
}
The top-level self property in the response body is the dataset's URI. Use this URI to get details about the dataset's project and files. You also use this URI to update or delete the dataset.
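To preview follow-up requests against the dataset URI, a script could print the curl commands before running them. The sketch below is illustrative only: print_dataset_request is a hypothetical helper, and the use of GET for reading details and DELETE for removal is an assumption based on conventional REST usage, so confirm the methods against the API reference. Nothing is sent over the network.

```shell
# Hypothetical helper: prints (does not run) a curl command for a dataset URI.
# Arguments: $1 HTTP method, $2 URI to call.
print_dataset_request() {
  printf 'curl -X %s -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" "%s"\n' "$1" "$2"
}

# Sample value copied from the top-level self property in the response above.
dataset_uri="https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c"

print_dataset_request GET "$dataset_uri"          # dataset details (assumed method)
print_dataset_request GET "$dataset_uri/files"    # matches links.files in the sample
print_dataset_request DELETE "$dataset_uri"       # delete the dataset (assumed method)
```

Printing the command first makes it easy to review the target URI before a destructive call such as a delete.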
Important
Connecting a dataset to a Custom Speech project isn't required to train and test a custom model using the REST API or Speech CLI. But if the dataset is not connected to any project, you can't select it for training or testing in the Speech Studio.