Text to Speech: imprecise Break times

mrx 21 Reputation points
2022-03-17T13:38:56.147+00:00

Requesting a 20-second pause in text-to-speech produces pauses of only 5 s and 10 s with the SSML below.
I would expect 20 s in both cases!

Tested on the website and through the Python API:
https://azure.microsoft.com/en-gb/services/cognitive-services/text-to-speech/#features

  <speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
    <voice name="en-US-SaraNeural">
      <mstts:express-as style="cheerful">
        <prosody pitch="0%" rate="1" volume="100">

        One. 
        <mstts:silence type="Sentenceboundary" value="20s"/>
        Two.

        One. 
        <break time="20s"/>
        Two.

        </prosody>
      </mstts:express-as>
    </voice>
  </speak>
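
To put numbers on the pauses rather than timing them by ear, the synthesized audio can be saved as a WAV file and scanned for its longest quiet stretch. The following is only a minimal sketch: it assumes 16-bit PCM output, and both the helper name `longest_silence_seconds` and the amplitude threshold of 500 are illustrative choices, not part of any Azure API.

```python
import array
import sys
import wave


def longest_silence_seconds(path: str, threshold: int = 500) -> float:
    """Return the longest run of quiet frames in a 16-bit PCM WAV, in seconds.

    A frame counts as quiet when every channel's absolute sample value is
    at or below `threshold` (an assumed, tunable value).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    longest = current = 0
    for i in range(0, len(samples), channels):
        frame = samples[i:i + channels]
        if all(abs(s) <= threshold for s in frame):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest / rate
```

Calling `longest_silence_seconds("output.wav")` on a saved synthesis result (the file name is hypothetical) gives the duration of the longest pause, which can then be compared with the requested 20 s.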
Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

1 additional answer

  1. mrx 21 Reputation points
    2022-03-18T09:18:14.36+00:00

    As you can see in my second comment, 4 × 5000 ms does not give 20 s but 10.71 s, which makes it useless.
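
The arithmetic behind that comparison can be sketched in Python. This is a hedged illustration only: the helper `chained_breaks` is hypothetical, the 5000 ms per-break size is taken from the "4 × 5000 ms" in the post rather than from a documented limit, and the 10.71 s figure is the measurement reported above.

```python
def chained_breaks(total_ms: int, max_break_ms: int = 5000) -> str:
    """Return consecutive SSML <break/> tags whose times sum to total_ms.

    max_break_ms is an assumed per-break size, not a documented service limit.
    """
    if total_ms < 0 or max_break_ms <= 0:
        raise ValueError("total_ms must be >= 0 and max_break_ms must be > 0")
    full, rest = divmod(total_ms, max_break_ms)
    times = [max_break_ms] * full + ([rest] if rest else [])
    return "".join(f'<break time="{t}ms"/>' for t in times)


requested_ms = 20_000
measured_ms = 10_710  # pause length reported in the post (10.71 s)

ssml = chained_breaks(requested_ms)
shortfall_ms = requested_ms - measured_ms

print(ssml)
print(f"requested {requested_ms} ms, measured {measured_ms} ms, "
      f"missing {shortfall_ms} ms")
```

With the defaults, the helper emits four `<break time="5000ms"/>` tags, so the nominal total is 20 000 ms; the measured pause falls 9290 ms short of that.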