StephenCummings-3261 asked · ramr-msft commented

Am I right that the Long Audio API is very clunky and limited when it comes to voice selection? Am I right to eschew/forswear/bag it in favor of using the regular synthesizer to make short files and stitch them together?
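For reference, the "short files stitched together" approach could look roughly like the sketch below. It is a minimal illustration, not the Speech SDK's own method: `split_text`, `stitch_wavs`, and `text_to_long_wav` are hypothetical helper names, and `synthesize(chunk, path)` is a hypothetical callback standing in for whatever call writes one chunk to a WAV file with the regular synthesizer. The 1000-character limit is an arbitrary assumption, and the stitching assumes every part uses the same uncompressed WAV format.

```python
import os
import re
import wave


def split_text(text, max_chars=1000):
    """Group sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def stitch_wavs(part_paths, out_path):
    """Concatenate WAV files that share one audio format into out_path."""
    if not part_paths:
        raise ValueError("no parts to stitch")
    # Use the first part as the reference format for the output file.
    with wave.open(part_paths[0], "rb") as first:
        fmt = (first.getnchannels(), first.getsampwidth(), first.getframerate())
    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for path in part_paths:
            with wave.open(path, "rb") as part:
                part_fmt = (part.getnchannels(), part.getsampwidth(),
                            part.getframerate())
                if part_fmt != fmt:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(part.readframes(part.getnframes()))


def text_to_long_wav(text, synthesize, out_path, work_dir, max_chars=1000):
    """Synthesize each chunk via the hypothetical synthesize(chunk, path)
    callback, then stitch the parts into out_path. Returns the part paths."""
    paths = []
    for i, chunk in enumerate(split_text(text, max_chars)):
        path = os.path.join(work_dir, f"part_{i:04d}.wav")
        synthesize(chunk, path)
        paths.append(path)
    stitch_wavs(paths, out_path)
    return paths
```

Splitting at sentence boundaries keeps each seam at a natural pause, which tends to make the joins less audible than cutting at a fixed character count.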

Am I right that even if you configure a bunch of separate speech resources, each for a different region, you still can't access all of the neural voices through Long Audio? In my tests so far, the voices returned by get_voices() (I'm on Python) are, for each region that returns any, a freakily random set. In a resource configured for the 'centralindia' region I get no Hindi or other Indian voices. At the moment I need Hindi, Mandarin Chinese, Norwegian Bokmål, and English in various flavors.
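One way to replace the trial and error with a single pass is to collect the get_voices() result from each region and tabulate which regions list a voice for each needed language. The sketch below assumes each voice is a dict with a `"locale"` key such as `"hi-IN"`; that shape, the helper name `regions_covering`, and the locale prefixes are assumptions for illustration, so adjust them to whatever get_voices() actually returns.

```python
def regions_covering(voices_by_region, wanted_locales):
    """Map each wanted locale prefix to the sorted regions listing a match.

    voices_by_region: {region_name: [voice_dict, ...]} where each voice_dict
    is assumed to carry a "locale" key (an assumption about the data shape).
    wanted_locales: prefixes such as "hi", "zh-CN", "nb", "en".
    """
    coverage = {locale: set() for locale in wanted_locales}
    for region, voices in voices_by_region.items():
        for voice in voices:
            voice_locale = voice.get("locale", "").lower()
            for wanted in wanted_locales:
                # Prefix match so "en" covers en-US, en-GB, en-IN, and so on.
                if voice_locale.startswith(wanted.lower()):
                    coverage[wanted].add(region)
    return {locale: sorted(regions) for locale, regions in coverage.items()}
```

A locale that maps to an empty list is one that none of the queried regions offered, which is the case described above for Hindi in 'centralindia'.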

I should forget Long Audio, right? Or is there a secret set of steps to follow to get to full access to all the voices?

I was very happy with my initial results with the speech synthesizer SDK and am Microsoft-leaning, so I haven't investigated the competition yet. Does anyone know whether Polly or Google provides a simpler path to voice/language options when converting long text files to speech?



azure-speech

@StephenCummings-3261 Thanks for the question. The Long Audio API supports the following Public Neural Voices and Custom Neural Voices.
Can you please share the get_voices() output you are getting for the 'centralindia' region?
The Audio Content Creation platform makes high-quality audiobooks and enables you to visually control speech attributes in real-time – such as voice style, rate, pitch, volume, pronunciation and breaks. It allows you to quickly create more accurate, expressive and customized audio.

We are investigating the issue internally and will confirm once we know more.



Here's a text file with my results from get_voices() with the region set to centralindia: 120229-voices-centralindia.txt



Regarding the supported voices, I was already aware of the list you linked, and am glad that Long Audio is supposed to support them all. I guess it just doesn't work that way for me.

It would be much, much easier if you just published and kept updated a list of which voices go with which region, rather than making people try to find the voices they need by trial and error. (And how are these regions defined? There's a centralindia, but no northindia or southindia? Why?)

Of course, it would make much more sense if you just made all voices available everywhere—restrictions based on "regions" don't make any sense in a global economy.

ramr-msft replied to StephenCummings-3261:

@StephenCummings-3261 Thanks for the details. We have forwarded this to the product team to investigate.


0 Answers