Inspect Custom Speech data
This page assumes you've read Prepare test data for Custom Speech and have uploaded a dataset for inspection.
Custom Speech provides tools that allow you to visually inspect the recognition quality of a model by comparing audio data with the corresponding recognition result. From the Custom Speech portal, you can play back uploaded audio and determine if the provided recognition result is correct. This tool allows you to quickly inspect quality of Microsoft's baseline speech-to-text model or a trained custom model without having to transcribe any audio data.
On this page, you'll learn how to visually inspect the quality of Microsoft's baseline speech-to-text model and/or a custom model that you've trained, using the data you uploaded to the Data tab for testing.
Create a test
Follow these instructions to create a test:
- Sign in to the Custom Speech portal.
- Navigate to Speech-to-text > Custom Speech > Testing.
- Click Add Test.
- Select Inspect quality (Audio-only data). Give the test a name and description, and select your audio dataset.
- Select up to two models that you'd like to test.
- Click Create.
After a test has been successfully created, you can compare the models side by side.
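The portal steps above can also be performed programmatically through the Speech to text REST API. The sketch below is an assumption-laden illustration, not the documented contract: it only assembles a plausible request body for the v3.0 `/evaluations` endpoint, and the region, dataset ID, and model IDs are hypothetical placeholders you would replace with your own resources.

```python
# Sketch only: build a request body for creating a Custom Speech
# evaluation (test) via the Speech to text REST API v3.0.
# All IDs and the region below are hypothetical placeholders.
BASE_URL = "https://<region>.api.cognitive.microsoft.com/speechtotext/v3.0"

def build_evaluation_body(name, description, dataset_url,
                          model1_url, model2_url, locale="en-US"):
    """Assemble the JSON body; you would POST it to {BASE_URL}/evaluations
    with your Ocp-Apim-Subscription-Key header set."""
    return {
        "displayName": name,          # the test name shown in the portal
        "description": description,
        "locale": locale,
        "dataset": {"self": dataset_url},   # the uploaded audio dataset
        "model1": {"self": model1_url},     # up to two models to compare
        "model2": {"self": model2_url},
    }

body = build_evaluation_body(
    "My inspection test",
    "Side-by-side comparison of baseline and custom model",
    BASE_URL + "/datasets/<dataset-id>",
    BASE_URL + "/models/<model1-id>",
    BASE_URL + "/models/<model2-id>",
)
```

Exact field names and endpoint paths may differ by API version; check the Speech service REST reference for your region before relying on this shape.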
Side-by-side model comparisons
When the test status is Succeeded, click the test item name to see details of the test. This detail page lists all the utterances in your dataset, showing the recognition results of the two models alongside the transcription from the submitted dataset.
To help inspect the side-by-side comparison, you can toggle various error types including insertion, deletion, and substitution. By listening to the audio and comparing recognition results in each column (showing human-labeled transcription and the results of two speech-to-text models), you can decide which model meets your needs and where improvements are needed.
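The three error types you can toggle in the comparison view come from aligning the recognition result against the human-labeled transcription with a word-level edit distance. As a minimal sketch of that idea (not the portal's actual implementation), the counts can be recovered from the classic Levenshtein dynamic program:

```python
def word_errors(reference, hypothesis):
    """Count (substitutions, insertions, deletions) between a
    human-labeled transcription and a recognition result, using a
    word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1],  # substitution
                                   dp[i][j - 1],      # insertion
                                   dp[i - 1][j])      # deletion
    # Backtrack through the table to attribute each edit to a type.
    subs = ins = dels = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            subs += 1
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ins += 1
            j -= 1
        else:
            dels += 1
            i -= 1
    return subs, ins, dels

# An extra hypothesis word counts as one insertion:
word_errors("the quick brown fox", "the quick brown fox jumps")  # → (0, 1, 0)
```

Dividing the sum of the three counts by the number of reference words gives word error rate, the objective metric used in accuracy testing.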
Inspection testing helps you validate whether the recognition quality of a speech endpoint is sufficient for your application. For an objective measure of accuracy, which requires transcribed audio, follow the instructions in Evaluate Accuracy.