BrooksMeadowcroft-4026 asked YutongTie-MSFT answered

Speech Recognition Cross-Device

While using the Speech Recognition service, we have found that when a user enrolls a Microsoft speech profile on one device and then tries to verify against that same profile on a different device, the scores are much lower than when verifying on the enrollment device. Is this a known limitation of the service? For example, in recent testing, enrolling on a laptop and verifying with that same laptop produced scores in the 80s. When the same user verified against the same profile with a pair of headphones, we saw scores in the 50s and as low as 42. We understand that 50 is a passing score in the service's eyes, but when other users tried to "spoof" the system we saw scores as high as 57, so in our eyes anything below 60 isn't an acceptable score.

azure-speech

Thanks for reaching out to us. Could you please share which kind of device you are using so that we can check on that?


Regards,
Yutong

Thank you for the response. It is not limited to this case, but the numbers I provided above came from an enrollment session done with the built-in microphone of a 2019 MacBook Pro and verifications done with Bose SoundSport headphones.

Thanks for the details. I am reproducing this and checking internally. Will let you know soon.


Regards,
Yutong

1 Answer

YutongTie-MSFT answered

Hello @BrooksMeadowcroft-4026

The mismatch between devices is expected to reduce the similarity score. We suggest lowering the threshold below the default of 0.5, for example to 0.45 or 0.40. We always recommend that customers tune their own threshold for their specific scenario using labeled data. Let me know if you need more help with threshold tuning.
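As a rough illustration of tuning a threshold with labeled data, the sketch below computes the false-reject and false-accept rates at each candidate threshold and picks the lowest threshold that keeps false accepts within a chosen limit. It is not part of the service or its SDK: the function names, the `max_far` parameter, and the sample scores are all hypothetical, and scores are assumed to be on the 0 to 1 scale used for the thresholds above.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false_reject_rate, false_accept_rate) at a threshold.

    A score at or above the threshold counts as an accepted verification.
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


def pick_threshold(genuine, impostor, max_far=0.0, candidates=None):
    """Return (threshold, frr, far) for the lowest acceptable threshold.

    Raising the threshold can only lower the false-accept rate and raise
    the false-reject rate, so the lowest threshold that satisfies
    far <= max_far also gives the fewest false rejects.
    Returns None if no candidate satisfies the limit.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(101)]  # 0.00, 0.01, ..., 1.00
    for t in sorted(candidates):
        frr, far = error_rates(genuine, impostor, t)
        if far <= max_far:
            return t, frr, far
    return None


# Made-up illustrative scores, NOT measurements from the service.
# genuine: the enrolled user verifying (same device and other devices)
# impostor: other users attempting to verify against that profile
genuine_scores = [0.85, 0.82, 0.55, 0.52, 0.42]
impostor_scores = [0.57, 0.40, 0.31, 0.20]

strict = pick_threshold(genuine_scores, impostor_scores, max_far=0.0)
lenient = pick_threshold(genuine_scores, impostor_scores, max_far=0.25)
print("no false accepts allowed:", strict)
print("up to 25% false accepts allowed:", lenient)
```

With real data, the labeled scores would come from your own enrollment and verification recordings across the devices you expect users to have, and the acceptable false-accept limit is a decision for your scenario.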

We do post-processing to handle other sample rates (e.g. downsampling to 16 kHz), but the quality is not guaranteed, so we don't mention it in the docs. This is not a recent change. If possible, could you convert the data to 16 kHz to ensure the quality?
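Before converting anything, it can help to check what sample rate each recording actually has. The following is a minimal sketch using only Python's standard `wave` module, assuming the audio is stored as uncompressed PCM WAV files; the helper names `describe_wav` and `needs_resampling` are hypothetical and not part of any SDK.

```python
import wave

TARGET_RATE_HZ = 16_000  # the rate mentioned in the answer above


def describe_wav(path):
    """Read basic format information from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def needs_resampling(path, target_rate_hz=TARGET_RATE_HZ):
    """Return True if the file's sample rate differs from the target."""
    return describe_wav(path)["sample_rate_hz"] != target_rate_hz
```

Files flagged by `needs_resampling` could then be converted with an audio tool or library that applies a proper anti-aliasing filter, for example `scipy.signal.resample_poly`, before being sent for enrollment or verification.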

Regards,
Yutong
