question

KunWu-2344 asked; romungi-MSFT answered

Azure Speech to Text response confidence threshold

Hello,

A customer is using the Azure Speech to Text SDK 1.12.1. They report that noise or silence is misrecognized as "S", which causes downstream components such as NLU to behave unexpectedly.

Below is the recognition result with its NBest list:
{
  "DisplayText": "S",
  "Duration": 20900000,
  "Id": "718b20d216f542d6beee087ba793b752",
  "NBest": [
    {
      "Confidence": 0.11938265,
      "Display": "S",
      "ITN": "s",
      "Lexical": "s",
      "MaskedITN": ""
    },
    {
      "Confidence": 0.118968755,
      "Display": "M",
      "ITN": "m",
      "Lexical": "m",
      "MaskedITN": ""
    },
    {
      "Confidence": 0.11897701,
      "Display": "H",
      "ITN": "h",
      "Lexical": "h",
      "MaskedITN": ""
    },
    {
      "Confidence": 0.1189549,
      "Display": "L",
      "ITN": "l",
      "Lexical": "l",
      "MaskedITN": ""
    },
    {
      "Confidence": 0.11928433,
      "Display": "At",
      "ITN": "at",
      "Lexical": "at",
      "MaskedITN": ""
    }
  ],
  "Offset": 4400000,
  "RecognitionStatus": "Success"
}

Since the confidence scores of these NBest candidates are only about 0.12, is there a recommended confidence threshold below which the customer should ignore a result, for example 0.2?

Thank you.

azure-speech

1 Answer

romungi-MSFT answered

@KunWu-2344 Per the official documentation, the guidance for non-speech noise is to ask the user to try again or to improve the recording conditions so that noise is not recognized as speech. If that cannot be avoided, you can filter on the confidence score, but there is no official guidance on a limit or cutoff. Depending on the quality of your audio, you can choose a threshold and ignore text that scores below it.
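
As a rough illustration of filtering on the confidence score, here is a minimal Python sketch that takes the detailed-format JSON string (the same shape as the result in the question) and drops it when the best NBest candidate scores below a cutoff. The function name `best_confident_text` and the constant `CONFIDENCE_THRESHOLD` are hypothetical, and the 0.2 value is only the number suggested in the question, not an official recommendation; you would need to tune it against your own audio.

```python
import json

# Assumption: 0.2 is the cutoff floated in the question, not official guidance.
CONFIDENCE_THRESHOLD = 0.2


def best_confident_text(result_json, threshold=CONFIDENCE_THRESHOLD):
    """Return the Display text of the top NBest candidate, or None if it should be ignored."""
    result = json.loads(result_json)

    # Anything other than a successful recognition is ignored outright.
    if result.get("RecognitionStatus") != "Success":
        return None

    candidates = result.get("NBest", [])
    if not candidates:
        return None

    # NBest is not guaranteed to be sorted here, so pick the highest score explicitly.
    best = max(candidates, key=lambda c: c.get("Confidence", 0.0))

    # Treat low-confidence results (likely noise or silence) as "no input".
    if best.get("Confidence", 0.0) < threshold:
        return None

    return best.get("Display")
```

With the result from the question, the best candidate is "S" at roughly 0.119, so the function returns None and nothing is passed on to the NLU component. How you obtain the JSON string depends on the SDK language and requires the detailed output format to be enabled.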

