63066220 asked 63066220 published

Azure Speech SDK Viseme Audio Offset

Hello!

I'm using Azure TTS Viseme events.
(https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?pivots=programming-language-csharp)
When I use viseme events with the Azure TTS Speech SDK, the audio offsets of the visemes seem to be wrong.
Some visemes arrive before the corresponding audio and some arrive after it.
The problem is most noticeable at the start of each sentence.

https://drive.google.com/file/d/1f3I5Lny2iJNv49bi-rG2dsZxcyygZ17u/view?usp=sharing

I have attached an image file (linked above) that summarizes the viseme problems I have identified so far.

Is there any way to align these viseme audio offsets with the actual audio?

azure-speech

Hi, we weren't able to reproduce this issue. Can you kindly share the SSML used? Thanks.


Hi, sorry for the late response.

This is the SSML I used with the Python Speech SDK:
<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-GuyNeural\">Hello. I'm researcher.</voice></speak>
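For reference, here is a rough sketch of how this SSML can be passed to the Python Speech SDK while collecting viseme events. It follows the pattern in the viseme how-to linked above; the helper names `ticks_to_ms` and `collect_visemes` and the environment variable names in the usage comment are only illustrative assumptions, not part of the SDK. The SDK reports `audio_offset` in 100-nanosecond ticks, so dividing by 10,000 gives milliseconds.

```python
TICKS_PER_MS = 10_000  # audio_offset is reported in 100-nanosecond ticks

SSML = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="en-US-GuyNeural">Hello. I\'m researcher.</voice></speak>'
)


def ticks_to_ms(ticks):
    """Convert an SDK audio offset (100 ns ticks) to milliseconds."""
    return ticks / TICKS_PER_MS


def collect_visemes(ssml, key, region):
    """Synthesize `ssml` and return ([(offset_ms, viseme_id), ...], audio_bytes).

    Hypothetical helper: requires the azure-cognitiveservices-speech package
    and a valid Speech resource key and region.
    """
    # Imported lazily so ticks_to_ms can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # audio_config=None keeps the synthesized audio in memory instead of playing it.
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=None
    )

    events = []
    synthesizer.viseme_received.connect(
        lambda evt: events.append((ticks_to_ms(evt.audio_offset), evt.viseme_id))
    )
    result = synthesizer.speak_ssml_async(ssml).get()
    return events, result.audio_data


# Usage (assumed environment variable names):
#   import os
#   events, audio = collect_visemes(
#       SSML, os.environ["SPEECH_KEY"], os.environ["SPEECH_REGION"]
#   )
```

Keeping the audio bytes together with the event list makes it easier to compare the reported offsets against the waveform afterwards.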

Thanks!


Below is what I got.

Viseme event received: audio offset: 50.0ms, viseme id: 0.
Viseme event received: audio offset: 50.0ms, viseme id: 12.
Viseme event received: audio offset: 212.5ms, viseme id: 4.
Viseme event received: audio offset: 275.0ms, viseme id: 14.
Viseme event received: audio offset: 362.5ms, viseme id: 8.
Viseme event received: audio offset: 537.0ms, viseme id: 0.
Viseme event received: audio offset: 1350.0ms, viseme id: 0.
Viseme event received: audio offset: 1350.0ms, viseme id: 2.
Viseme event received: audio offset: 1456.25ms, viseme id: 6.
Viseme event received: audio offset: 1562.5ms, viseme id: 21.
Viseme event received: audio offset: 1612.5ms, viseme id: 13.
Viseme event received: audio offset: 1662.5ms, viseme id: 6.
Viseme event received: audio offset: 1737.5ms, viseme id: 15.
Viseme event received: audio offset: 1825.0ms, viseme id: 5.
Viseme event received: audio offset: 1862.5ms, viseme id: 13.
Viseme event received: audio offset: 1925.0ms, viseme id: 19.
Viseme event received: audio offset: 1968.75ms, viseme id: 16.
Viseme event received: audio offset: 2012.5ms, viseme id: 1.
Viseme event received: audio offset: 2062.5ms, viseme id: 13.
Viseme event received: audio offset: 2137.0ms, viseme id: 0.
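To compare a log like this against the audio, it may help to turn the lines into numbers first. The following is only a small sketch: `parse_viseme_log` and `viseme_durations` are hypothetical helpers, and the sample is the first six lines of the output above (the "Hello." sentence). It treats each viseme as lasting until the next event, which is an assumption for inspection purposes and not something the service guarantees.

```python
import re

# First six events from the log above (the "Hello." sentence).
LOG = """\
Viseme event received: audio offset: 50.0ms, viseme id: 0.
Viseme event received: audio offset: 50.0ms, viseme id: 12.
Viseme event received: audio offset: 212.5ms, viseme id: 4.
Viseme event received: audio offset: 275.0ms, viseme id: 14.
Viseme event received: audio offset: 362.5ms, viseme id: 8.
Viseme event received: audio offset: 537.0ms, viseme id: 0.
"""

LINE_RE = re.compile(r"audio offset: ([\d.]+)ms, viseme id: (\d+)\.")


def parse_viseme_log(text):
    """Return a list of (offset_ms, viseme_id) tuples found in the log text."""
    return [(float(ms), int(vid)) for ms, vid in LINE_RE.findall(text)]


def viseme_durations(events):
    """Time from each event to the next one, in milliseconds.

    The last event has no successor, so the result is one item shorter.
    """
    return [nxt[0] - cur[0] for cur, nxt in zip(events, events[1:])]


events = parse_viseme_log(LOG)
for (offset_ms, viseme_id), duration in zip(events, viseme_durations(events)):
    print(f"viseme {viseme_id:>2} at {offset_ms:7.1f} ms, next event after {duration:6.1f} ms")
```

The resulting offsets can then be overlaid on the waveform in an audio editor to see where the events lead or lag the speech.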


0 Answers