Batch testing with a set of example utterances
Batch testing is a comprehensive test of your current trained model that measures its performance in LUIS. The data sets used for batch testing should not include example utterances that are already in your intents, or utterances received from the prediction runtime endpoint.
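One way to enforce that separation is to compare the batch file against your training data before importing it. The following is a minimal sketch; the utterance sets are illustrative placeholders, not real app data.

```python
# Sketch: verify batch test utterances don't overlap with training data.
# Both sets below are illustrative assumptions, not real LUIS app content.
training_utterances = {"turn on the lights", "switch everything off"}
batch_utterances = {"switch on", "turn the lamp on"}

# Utterances appearing in both sets would skew the batch test results.
overlap = training_utterances & batch_utterances
if overlap:
    raise ValueError(f"Remove these utterances from the batch file: {overlap}")
```

Any utterance reported in `overlap` should be removed from the batch file (or from the app's example utterances) before running the test.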
Import a dataset file for batch testing
Select Test in the top bar, and then select the Batch testing panel.
Select Import dataset. The Import new dataset dialog box appears. Select Choose File and locate a JSON file in the correct format that contains no more than 1,000 utterances to test.
Import errors are reported in a red notification bar at the top of the browser. When an import has errors, no dataset is created. For more information, see Common errors.
In the Dataset Name field, enter a name for your dataset file. The dataset file includes an array of utterances, each with its labeled intent and entities. Review the example batch file for syntax.
Select Done. The dataset file is added.
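The sketch below builds a small batch file of that shape in Python. The `TurnAllOn` intent name and `Room` entity are hypothetical examples, and the assumption here is that entity positions use inclusive character indexes (`startPos`/`endPos`); check the example batch file for the authoritative syntax.

```python
import json

# Compute entity character positions from the utterance text itself,
# assuming startPos/endPos are inclusive character indexes.
text = "turn on the lights in the kitchen"
start = text.index("kitchen")
end = start + len("kitchen") - 1

# Hypothetical intent and entity names, for illustration only.
dataset = [
    {"text": "hello there", "intent": "None", "entities": []},
    {
        "text": text,
        "intent": "TurnAllOn",
        "entities": [{"entity": "Room", "startPos": start, "endPos": end}],
    },
]

# A batch file may contain no more than 1,000 utterances.
assert len(dataset) <= 1000

with open("batch-test.json", "w") as f:
    json.dump(dataset, f, indent=2)
```

Writing the file programmatically like this keeps the entity offsets correct even when the utterance text changes.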
Run, rename, export, or delete dataset
To run, rename, export, or delete the dataset, use the ellipsis (...) button at the end of the dataset row.
Run a batch test on your trained app
To run the test, select the dataset name. When the test completes, this row displays the test result of the dataset.
The downloadable dataset is the same file that was uploaded for batch testing.
|State|
|--|
|All utterances are successful.|
|At least one utterance intent did not match the prediction.|
|Test is ready to run.|
View batch test results
To review the batch test results, select See results.
Filter chart results
To filter the chart by a specific intent or entity, select the intent or entity in the right-side filtering panel. The data points and their distribution update in the graph according to your selection.
View single-point utterance data
In the chart, hover over a data point to see the certainty score of its prediction. Select a data point to retrieve its corresponding utterance in the utterances list at the bottom of the page.
View section data
In the four-section chart, select a section name, such as False Positive at the top-right of the chart. All the utterances in that section then appear below the chart in a list.
In the preceding image, the utterance switch on is labeled with the TurnAllOn intent, but received the prediction of the None intent. This indicates that the TurnAllOn intent needs more example utterances in order to make the expected prediction.
The two red sections of the chart indicate utterances that did not match the expected prediction. These are utterances for which LUIS needs more training.
The two green sections of the chart indicate utterances that matched the expected prediction.
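For a single intent filter, the four sections follow the standard confusion-matrix logic: compare whether the utterance is labeled with that intent against whether it was predicted as that intent. The sketch below illustrates this mapping; the field names and sample results are illustrative assumptions, not the actual batch test output format.

```python
# Sketch: map each utterance to a chart section for one intent filter.
# The "labeled"/"predicted" field names and sample rows are assumptions.
results = [
    {"text": "switch on", "labeled": "TurnAllOn", "predicted": "None"},
    {"text": "turn everything on", "labeled": "TurnAllOn", "predicted": "TurnAllOn"},
    {"text": "goodbye", "labeled": "None", "predicted": "None"},
]

def section(row, intent):
    expected = row["labeled"] == intent
    predicted = row["predicted"] == intent
    if expected and predicted:
        return "true positive"   # green: expected and matched
    if not expected and not predicted:
        return "true negative"   # green: not expected, not predicted
    if predicted:
        return "false positive"  # red: predicted but not labeled
    return "false negative"      # red: labeled but not predicted

for row in results:
    print(row["text"], "->", section(row, "TurnAllOn"))
```

By this logic, the switch on example from the image above lands in the false negative section: labeled TurnAllOn but predicted None.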
Roles in batch testing
Entity roles are not supported in batch testing.
If testing indicates that your LUIS app doesn't recognize the correct intents and entities, you can work to improve your LUIS app's performance by labeling more utterances or adding features.