I'm currently developing a solution that uses the text-to-speech API with the "ro-RO-AlinaNeural" synthesis voice.
The solution produces an audio file that we store as a blob in Azure Blob Storage. Afterwards, we serve that stored audio file for every request to our application that requires the audio. This way we don't call the text-to-speech API over and over again with the same text.
The text that serves as input to the text-to-speech API is written by us and complies with the Microsoft code of conduct (https://docs.microsoft.com/en-us/legal/cognitive-services/speech-service/tts-code-of-conduct?context=/azure/cognitive-services/speech-service/context/context).
The text always has the same value, which is why we always reuse the produced audio file rather than calling the API for every request.
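To make the flow concrete, here is a minimal Python sketch of the caching logic described above. It is only an illustration, not our actual code: `cache_key`, `get_audio`, and the `synthesize` callable are hypothetical names, the dict-like `store` stands in for the Azure Blob Storage container, and the `.wav` suffix is an assumed output format.

```python
import hashlib
from typing import Callable, MutableMapping

# Voice name taken from the post; everything else below is hypothetical.
VOICE_NAME = "ro-RO-AlinaNeural"


def cache_key(voice: str, text: str) -> str:
    """Build a stable blob name from the voice and the input text."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return f"tts/{voice}/{digest}.wav"  # assumed naming scheme and format


def get_audio(
    text: str,
    synthesize: Callable[[str, str], bytes],
    store: MutableMapping[str, bytes],
    voice: str = VOICE_NAME,
) -> bytes:
    """Return the stored audio, calling the synthesizer only on a cache miss.

    `synthesize(voice, text)` stands in for the text-to-speech API call and
    `store` stands in for the blob container; both are assumptions.
    """
    key = cache_key(voice, text)
    if key not in store:
        # First request for this text: synthesize once and persist the result.
        store[key] = synthesize(voice, text)
    # Every later request is served from storage without calling the API.
    return store[key]
```

With this shape, the text-to-speech API is called once per distinct text and voice, and all later requests read the stored audio.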
I would like to understand whether this solution is fully compliant with Microsoft's usage conditions. Thanks.