Hello,
I am using viseme events.
When I synthesize the text "hello" and log the viseme events for that utterance, the event timings do not match the audio. These are the events I receive:
Audio offset: 50ms, viseme id: 0.
Audio offset: 50ms, viseme id: 12.
Audio offset: 237ms, viseme id: 4.
Audio offset: 300ms, viseme id: 14.
Audio offset: 387ms, viseme id: 8.
Audio offset: 512ms, viseme id: 0.
In the audio, the voice actually starts at 118 ms, but the first viseme events are reported at an audio offset of 50 ms. How can I solve this problem?
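
To make the mismatch concrete, here is a minimal Python sketch that measures the gap from the numbers above. It does not call any speech SDK; the names `VISEME_EVENTS`, `first_speech_offset_ms`, and `shift_events` are hypothetical helpers made up for this illustration. It assumes viseme id 0 means silence, and that the lag is the same for every event, which I have not verified.

```python
# Viseme events as logged above: (audio offset in ms, viseme id).
VISEME_EVENTS = [(50, 0), (50, 12), (237, 4), (300, 14), (387, 8), (512, 0)]

# Onset of audible speech measured in the output audio, in ms (from this post).
MEASURED_VOICE_START_MS = 118


def first_speech_offset_ms(events):
    """Return the offset of the first non-silence viseme.

    Assumption: viseme id 0 represents silence.
    """
    for offset_ms, viseme_id in events:
        if viseme_id != 0:
            return offset_ms
    raise ValueError("no non-silence viseme in events")


def shift_events(events, delta_ms):
    """Return a copy of the events with every offset moved by delta_ms.

    Assumption: the lag is constant across the whole utterance.
    """
    return [(offset_ms + delta_ms, viseme_id) for offset_ms, viseme_id in events]


# Gap between where speech is heard and where the first speech viseme is reported.
gap_ms = MEASURED_VOICE_START_MS - first_speech_offset_ms(VISEME_EVENTS)
print(gap_ms)  # 68

# Workaround only: delay every viseme by the measured gap.
shifted = shift_events(VISEME_EVENTS, gap_ms)
```

Shifting by a fixed 68 ms would only hide the symptom for this one utterance, so I would still like to know why the reported offset and the audio disagree.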