I'm trying to use Microsoft Azure's Cognitive Services TTS to convert written Chinese into spoken audio. I have a Chinese-language learning app that includes a dictionary of 120,000+ Chinese words.
I was previously using TTS from Baidu, which let me dynamically reference audio with a URL that looked something like this:
https://tsn.baidu.com/text2audio?tex=战(zhan4)线(xian4)&lan=zh&spd=4&tok={my token that was re-generated every month}
This was useful for three reasons:
1) I could embed this into an HTML audio tag and play it with JavaScript only when (or if!) I ever needed it.
2) I did not need to store the audio anywhere; it was always available through the URL.
3) As the speech engine improved over the years, the quality of my app's audio improved with it.
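To make the old setup concrete, here is a rough sketch of how such a URL could be built on demand. The parameter names (tex, lan, spd, tok) are taken from the example URL above; the function name buildTtsUrl, the TtsParams type, and the token value are made up for illustration, and the real endpoint may expect additional parameters that I have left out.

```typescript
// Hypothetical helper: builds a text2audio URL like the one shown above.
// Only the four query parameters from the example URL are included.
interface TtsParams {
  text: string;   // text to speak, e.g. "战(zhan4)线(xian4)"
  token: string;  // access token (placeholder value in the usage below)
  speed?: number; // the "spd" parameter; 4 in the example URL
}

function buildTtsUrl({ text, token, speed = 4 }: TtsParams): string {
  const pairs: Array<[string, string]> = [
    ["tex", text],
    ["lan", "zh"],
    ["spd", String(speed)],
    ["tok", token],
  ];
  // Percent-encode each value so Chinese characters are safe in a query string.
  const query = pairs
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://tsn.baidu.com/text2audio?${query}`;
}

// Usage: the URL is only a string until something requests it, so no audio
// is generated or stored ahead of time. In a browser it could be assigned to
// the src of an audio element when the learner clicks a word.
const exampleUrl = buildTtsUrl({ text: "战(zhan4)线(xian4)", token: "MY_TOKEN" });
console.log(exampleUrl);
```

The point is that the app only ever held a string per word; the audio itself was produced by the service at request time.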
With Microsoft Azure, I've figured out how to create TTS audio, but it seems the result always has to be downloaded and saved before I can do anything with it.
Thus:
1) I have to create millions of audio files ahead of time and store them somewhere (Amazon S3), even though I may never need many of them.
2) I will never get improvements in voice quality unless I re-run a batch process to re-create all of the audio.
Am I missing something? Is it true that I can't create a URL that has Azure TTS convert the text to audio on the fly?