sanm-7576 asked romungi-MSFT commented

Speech to text: specifying the source language doesn't work for other languages, only English text is returned


Hi, I am using Speech to Text and uploading a Hindi-language WAV file, but I am not getting the response in Hindi; I get English text instead. My code is below.

    var config = SpeechConfig.FromHost(new Uri("ws://**.io:5000/"));
    var fileFullPath = await ReadFilePath(file);
    var sourceLanguageConfig = SourceLanguageConfig.FromLanguage("hi-IN");
    using (var audioConfig = AudioConfig.FromWavFileInput(fileFullPath))
    using (var recognizer = new SpeechRecognizer(config, sourceLanguageConfig, audioConfig))
    {
        var result = await recognizer.RecognizeOnceAsync();

        if (result.Reason == ResultReason.RecognizedSpeech)
        {
            Console.WriteLine($"We recognized: {result.Text}");
        }
        else if (result.Reason == ResultReason.NoMatch)
        {
            Console.WriteLine($"NOMATCH: Speech could not be recognized.");
        }
        else if (result.Reason == ResultReason.Canceled)
        {
            var cancellation = CancellationDetails.FromResult(result);
            Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

            if (cancellation.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you update the subscription info?");
            }
        }
        var data = new Response()
        {
            Prediction = result.Text
        };
        return new JsonResult(data);
    }
Tags: azure-cognitive-services, azure-speech
romungi-MSFT answered sanm-7576 edited

@sanm-7576 I think in this case there are a couple of things you should check.

  1. Set the language with config.SpeechRecognitionLanguage = "hi-IN"; instead of using a source language config. The recognizer is then created with just SpeechRecognizer(config, audioConfig). Remove all references to the source language config.


  2. I think you are printing this to the console. The recognized text is essentially in Hindi, but the console can only display what the language pack installed on your machine supports, so the Hindi text may not show correctly there. If you print it to a file instead, the text should appear in Hindi. To verify this, I added the following at the beginning of the method to redirect all console output to a file:

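      // Redirect all console output (stdout and stderr) to out.txt.
      // StreamWriter writes UTF-8 by default, so the Hindi text is preserved in the file.
      FileStream filestream = new FileStream("out.txt", FileMode.Create);
      var streamwriter = new StreamWriter(filestream);
      streamwriter.AutoFlush = true;
      Console.SetOut(streamwriter);
      Console.SetError(streamwriter);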

The out.txt file should be in your Debug output folder. Here is the sample output for a phrase I spoke:


Say something ...
RECOGNIZED: Text=धन्यवाद।
DETAILED RESULTS:
Confidence: 0.4711725, Text: धन्यवाद।, LexicalForm: धन्यवाद, NormalizedForm: धन्यवाद, MaskedNormalizedForm: धन्यवाद।
Confidence: 0.4711725, Text: धन्यवाद सर, LexicalForm: धन्यवाद सर, NormalizedForm: धन्यवाद सर, MaskedNormalizedForm: धन्यवाद सर
Confidence: 0.4711725, Text: धंन्यवाद, LexicalForm: धंन्यवाद, NormalizedForm: धंन्यवाद, MaskedNormalizedForm: धंन्यवाद
Confidence: 0.4711725, Text: धन्यवाद दो, LexicalForm: धन्यवाद दो, NormalizedForm: धन्यवाद दो, MaskedNormalizedForm: धन्यवाद दो
Confidence: 0.4711725, Text: धन्यवाद है, LexicalForm: धन्यवाद है, NormalizedForm: धन्यवाद है, MaskedNormalizedForm: धन्यवाद है

Execution done. Your choice (0: Stop):
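If you would rather see the Hindi text directly instead of redirecting everything to out.txt, a minimal sketch (assuming .NET Core or later and a terminal whose font can render Devanagari) is to switch the console to UTF-8 through Console.OutputEncoding and, for logs, write the text with an explicit UTF-8 encoding. The class and method names (Utf8Output, UseUtf8Console, WriteRecognizedText) are hypothetical; only the System.Console, System.Text and System.IO calls are real .NET APIs.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class: not part of the Speech SDK.
public static class Utf8Output
{
    // Ask the console to emit UTF-8 so terminals that support it can render
    // Devanagari instead of '?' or box characters.
    public static void UseUtf8Console()
    {
        try
        {
            Console.OutputEncoding = Encoding.UTF8;
        }
        catch (IOException)
        {
            // No console attached (e.g. running as a service); keep the current encoding.
        }
    }

    // Write recognized text to a file as UTF-8 with a byte-order mark,
    // which helps Windows editors detect the encoding.
    public static void WriteRecognizedText(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    }
}
```

For example, calling Utf8Output.UseUtf8Console() at the start of Main and then Utf8Output.WriteRecognizedText("out.txt", result.Text) after recognition keeps the console readable while still producing a file you can open in any UTF-8 aware editor.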




Hi, as you suggested, I used config.SpeechRecognitionLanguage = "hi-IN"; and passed the Hindi WAV file, but I am still getting the response in English.


     public async Task<IActionResult> AudioFileToSpeech(IFormFile file)
     {
         var config = SpeechConfig.FromHost(new Uri("ws://xxxx:5000/"));
         var fileFullPath = await ReadFilePath(file);
         // Changed as suggested: set the recognition language on SpeechConfig.
         config.SpeechRecognitionLanguage = "hi-IN";
         using (var audioConfig = AudioConfig.FromWavFileInput(fileFullPath))
         using (var recognizer = new SpeechRecognizer(config, audioConfig))
         {
             var result = await recognizer.RecognizeOnceAsync();

             if (result.Reason == ResultReason.RecognizedSpeech)
             {
                 Console.WriteLine($"We recognized: {result.Text}");
             }
             // (snippet truncated in the original post)
         }
     }

Please help me convert speech to text in Hindi and other languages.
My expectation is that if a user uploads a WAV file in any language, or speaks into the microphone, the speech is converted to text in that same language.

romungi-MSFT answered romungi-MSFT commented

@sanm-7576 This is the method I am using to make the call. With the suggestions above and the snippet below, you should be able to print the output to a file.
For debugging, I would also ask you to try the same code with your actual Speech resource key and region, in case the container endpoint is what is failing.

     public static async Task RecognitionWithLanguageAndDetailedOutputAsync()
     {
         // Creates an instance of a speech config with specified subscription key and service region.
         // Replace with your own subscription key and service region (e.g., "westus") if using the Azure service API
         var config = SpeechConfig.FromSubscription("<your_key>", "<your_region>");
         config.SpeechRecognitionLanguage = "hi-IN";

         // Replace the language with your language in BCP-47 format, e.g., en-US.
         //var language = "en-US";
         config.OutputFormat = OutputFormat.Detailed;

         // Redirect console output to out.txt so the Hindi text can be checked in a UTF-8 file.
         FileStream filestream = new FileStream("out.txt", FileMode.Create);
         var streamwriter = new StreamWriter(filestream);
         streamwriter.AutoFlush = true;
         Console.SetOut(streamwriter);
         Console.SetError(streamwriter);

         // Creates a speech recognizer for the specified language, using microphone as audio input.
         // Requests detailed output format.
         //using (var recognizer = new SpeechRecognizer(config, language))
         using (var recognizer = new SpeechRecognizer(config))
         {
             // Starts recognizing.
             //Console.WriteLine($"Say something in {language} ...");
             Console.WriteLine($"Say something  ...");

             // Starts speech recognition, and returns after a single utterance is recognized. The end of a
             // single utterance is determined by listening for silence at the end or until a maximum of 15
             // seconds of audio is processed.  The task returns the recognition text as result.
             // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
             // shot recognition like command or query.
             // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
             var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);

             // Checks result.
             if (result.Reason == ResultReason.RecognizedSpeech)
             {
                 Console.WriteLine($"RECOGNIZED: Text={result.Text}");
                 Console.WriteLine("  DETAILED RESULTS:");

                 var detailedResults = result.Best();
                 foreach (var item in detailedResults)
                 {
                     Console.WriteLine($"    Confidence: {item.Confidence}, Text: {item.Text}, LexicalForm: {item.LexicalForm}, NormalizedForm: {item.NormalizedForm}, MaskedNormalizedForm: {item.MaskedNormalizedForm}");
                 }
             }
             else if (result.Reason == ResultReason.NoMatch)
             {
                 Console.WriteLine($"NOMATCH: Speech could not be recognized.");
             }
             else if (result.Reason == ResultReason.Canceled)
             {
                 var cancellation = CancellationDetails.FromResult(result);
                 Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                 if (cancellation.Reason == CancellationReason.Error)
                 {
                     Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                     Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                     Console.WriteLine($"CANCELED: Did you update the subscription info?");
                 }
             }
         }
     }

Thank you for the response. I ran the code above with SpeechRecognitionLanguage set to "hi-IN",
but I still didn't get a Hindi response; I got an English response instead.

Say something ...
RECOGNIZED: Text=vietnam siam amir and i'm saying hey
DETAILED RESULTS:
Confidence: 0.44765067, Text: vietnam siam amir and i'm saying hey, LexicalForm: vietnam siam amir and i'm saying hey, NormalizedForm: vietnam siam amir and i'm saying hey, MaskedNormalizedForm: vietnam siam amir and i'm saying hey
Confidence: 0.41012722, Text: vietnam siam amiran arms i am here, LexicalForm: vietnam siam amiran arms i am here, NormalizedForm: vietnam siam amiran arms i am here, MaskedNormalizedForm: vietnam siam amiran arms i am here


I observed one thing: if I pass the Cognitive Services subscription key and endpoint directly, it returns text in the selected language,
whereas if I call my Azure container, which internally uses the same subscription key and endpoint, it does not.
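To compare the container and the direct subscription endpoint without eyeballing the output, one option is to check whether the returned text contains any Devanagari characters (Unicode block U+0900 to U+097F), the script Hindi is written in. This is a sketch with a hypothetical helper name (ScriptCheck.ContainsDevanagari); it detects the script, not the language, so romanized Hindi would not be flagged.

```csharp
using System.Linq;

// Hypothetical diagnostic helper: not part of the Speech SDK.
public static class ScriptCheck
{
    private const char DevanagariStart = '\u0900';
    private const char DevanagariEnd = '\u097F';

    // True if at least one character of the text lies in the Devanagari block.
    public static bool ContainsDevanagari(string text) =>
        !string.IsNullOrEmpty(text) &&
        text.Any(c => c >= DevanagariStart && c <= DevanagariEnd);
}
```

For example, ScriptCheck.ContainsDevanagari(result.Text) would be false for the "vietnam siam amir and i'm saying hey" output above and true for "धन्यवाद।", so logging it next to each endpoint's result makes the difference visible at a glance.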


@sanm-7576 I think the issue in this case is with the container: the latest container (the latest tag is the en-US image) is being used instead of the container for the required locale. The documentation says to use the locale-specific container instead of latest, and it also lists all available formats of the container version tags.

I used the tag "2.12.0-amd64-hi-in", i.e. mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:2.12.0-amd64-hi-in, in the docker run command, and it produces the result in the correct language:

 Say something  ...
 RECOGNIZED: Text=धन्यवाद नमस्कार
   DETAILED RESULTS:
     Confidence: 0.9999999, Text: धन्यवाद नमस्कार, LexicalForm: धन्यवाद नमस्कार, NormalizedForm: धन्यवाद नमस्कार, MaskedNormalizedForm: धन्यवाद नमस्कार

Could you try and check the same?
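To avoid accidentally pulling latest again, the image reference can be built from a version and a locale. This is a minimal sketch assuming the tag pattern <version>-amd64-<locale> seen in "2.12.0-amd64-hi-in"; SpeechContainerImage.ForLocale is a hypothetical helper, and the tags actually published for other versions and locales should be checked against the documented tag list.

```csharp
using System;

// Hypothetical helper: builds a locale-specific speech-to-text image reference.
public static class SpeechContainerImage
{
    private const string Repository =
        "mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text";

    public static string ForLocale(string version, string locale)
    {
        if (string.IsNullOrWhiteSpace(version))
            throw new ArgumentException("A version such as \"2.12.0\" is required.", nameof(version));
        if (string.IsNullOrWhiteSpace(locale))
            throw new ArgumentException("A locale such as \"hi-IN\" is required.", nameof(locale));

        // Tags use the lowercase locale, e.g. "hi-IN" -> "hi-in".
        return $"{Repository}:{version}-amd64-{locale.ToLowerInvariant()}";
    }
}
```

ForLocale("2.12.0", "hi-IN") yields the same image reference used in the docker run command above, which keeps the locale of the container in step with the SpeechRecognitionLanguage set in code.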


