RoyPappachan-6426 asked RoyPappachan-6426 answered

Azure Speech-to-text issue

Hello,

I am using the Speech SDK in my custom app. With speech-to-text, when I say "OPE-123" (O P E hyphen one two three), the result comes back as "OPE 123" instead of "OPE-123".
I created a Custom Speech resource on the S0 pricing tier, but there too I am not able to create pronunciation data for the above input.

The same issue occurs with longer identifiers such as "OHUB2004030005", "NQ01046967", "NQ01046223-01", "RWSTVA7TGEN0HM", etc.

How can I solve this issue?

azure-speech

It seems the hyphen is not working here. Let me check with the product group to see whether there is a workable plan. Thanks.

Regards,
Yutong


Many thanks. We cannot avoid this because these identifiers are our keys for opportunities, quotes, etc.

YutongTie-MSFT answered RoyPappachan-6426 rolled back

@RoyPappachan-6426

Hello Roy,

I have checked with the product group and tried this on my side.

If you say "dash" directly in English (US) (O P E dash one two three), you get back "OP E-123". Unfortunately, it inserts a space between the P and the E.

If you wish to convert the spoken word "hyphen" to "-", you have to enable dictation (see for instance: https://docs.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechconfig.enabledictation?view=azure-dotnet#Microsoft_CognitiveServices_Speech_SpeechConfig_EnableDictation). "hyphen" (or any other punctuation word) will then be replaced by the corresponding sign. As "dash" is not, strictly speaking, a punctuation sign, it is always replaced by "-". There is indeed an issue with the Display form, which wrongly inserts a space after the hyphen (OPE- 123) or, with "dash", between the P and the E.
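As a rough illustration of the dictation setup described above, here is a minimal Python sketch (the linked reference is for .NET; the Python Speech SDK exposes the same option as `enable_dictation()`). The function names, the `key` and `region` parameters, and the fixed listening window are assumptions made for this example only. It requests the detailed output format so that the Lexical and Display forms can be compared side by side, and it uses continuous recognition, since dictation mode is documented for continuous recognition.

```python
import json
import time


def forms_from_detailed_json(payload: str) -> list:
    """Extract (Lexical, Display) pairs from a detailed-format result JSON string."""
    data = json.loads(payload)
    # The detailed output format lists alternatives under "NBest".
    return [(alt.get("Lexical", ""), alt.get("Display", ""))
            for alt in data.get("NBest", [])]


def recognize_with_dictation(key: str, region: str, seconds: float = 10.0) -> list:
    """Listen on the default microphone for `seconds` with dictation enabled.

    `key` and `region` are placeholders for your own Speech resource values.
    """
    # Imported here so the JSON helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_recognition_language = "en-US"
    config.enable_dictation()  # spoken punctuation words become signs
    config.output_format = speechsdk.OutputFormat.Detailed

    recognizer = speechsdk.SpeechRecognizer(speech_config=config)
    collected = []
    # Each final result carries the detailed JSON in `evt.result.json`.
    recognizer.recognized.connect(
        lambda evt: collected.extend(forms_from_detailed_json(evt.result.json))
    )
    recognizer.start_continuous_recognition()
    time.sleep(seconds)
    recognizer.stop_continuous_recognition()
    return collected
```

Calling `recognize_with_dictation("<your-key>", "<your-region>")` and saying "O P E hyphen one two three" should return pairs that let you inspect how the Lexical form was rendered into the Display form.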


Lexical form: O P E hyphen one two three -> Display form: OPE- 123.
Lexical form: O P E dash one two three -> Display form: OP E-123.

We are checking on the Display form to see if we can make it come out as OPE - 123. Thanks.
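Until the Display form is corrected, one possible client-side stopgap is to clean up the recognized text in the app. The sketch below is not part of the Speech SDK. It assumes the whole utterance is a single identifier made of letters, digits, and hyphens, and the names `ID_PATTERN` and `normalize_identifier` are hypothetical. It removes the stray spaces and the trailing period, then checks that what remains looks like an identifier.

```python
import re
from typing import Optional

# Assumed identifier shape: alphanumeric groups joined by single hyphens.
ID_PATTERN = re.compile(r"[A-Z0-9]+(?:-[A-Z0-9]+)*")


def normalize_identifier(display_text: str) -> Optional[str]:
    """Collapse a Display-form utterance into one identifier, or return None."""
    # Drop all whitespace, then any trailing sentence period, then upper-case.
    candidate = re.sub(r"\s+", "", display_text).rstrip(".").upper()
    return candidate if ID_PATTERN.fullmatch(candidate) else None


print(normalize_identifier("OPE- 123."))   # OPE-123
print(normalize_identifier("OP E-123."))   # OPE-123
print(normalize_identifier("How are you?"))  # None
```

Because this discards every space, it is only suitable for fields where the user is expected to speak an identifier and nothing else; free-form dictation would need a narrower rule.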

Regards,
Yutong


@YutongTie-MSFT
Yes, you are right. It works as you described, and the same happens with combinations of letters and numbers.

For example, our quotation numbers start like "NQ234567". When we speak this, a space is added between "NQ" and "234567".

I tried adding these to the pronunciation data in Speech Studio, and the result is below.

05955-01 zero five nine five five dash zero one oh five nine five five oh one

NQ01046-01 n q zero one zero four six dash zero one nq oh one oh four six oh one

OPE-05051 o p e dash zero five zero five one ope oh five oh five one

I also have one more question. I tried to train and test this pronunciation data in Speech Studio, but it does not allow me to do so. It only allows me to deploy with model 20210527, so I am not sure whether the pronunciation data will be picked up correctly when we use it in our speech-to-text.

Thanks,
Roy






RoyPappachan-6426 answered

@YutongTie-MSFT ,

From the conversation above: our quotation numbers start like "NQ234567". When we speak this, a space is added between "NQ" and "234567".

Is this still an open issue, or is there a workaround to solve it?

Thanks,
Roy
