I am passing a Stream to the service but it never returns the text

Juan Morales Marañon 0 Reputation points Microsoft Employee
2024-01-09T18:58:32.7433333+00:00

speech-sdk-log.txt

I am trying to convert a live stream to text. This works fine with the microphone, but when I send the Stream it is not recognized. I already tried saving the same stream to a file, and recognition works fine from the file, but not from the live stream.

   AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] {"es-HN","es-MX","es-PA"});
    var config = SpeechConfig.FromSubscription(_speechKey, _speechRegion);
    //config.SpeechRecognitionLanguage = "es-MX";
    //config.EnableDictation();
    config.OutputFormat=OutputFormat.Detailed;
    //config.SetProperty(PropertyId.Speech_LogFilename, logFilex);

    var stopRecognition = new TaskCompletionSource<int>();

    byte channels = 1;
    byte bitsPerSample = 16;
    uint samplesPerSecond = 8000; // 8 kHz PCM; must match the actual sample rate of the incoming stream
    var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);

    var callback = new AudioInputCallback(Sentstream);

    using (var audioInput = AudioConfig.FromStreamInput(callback, audioFormat))
    {
        // Creates a speech recognizer using audio stream input.
        using (var recognizer = new SpeechRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
        {
            Thread.Sleep(5000);
            // Subscribes to events.
            recognizer.Recognizing += (s, e) =>
            {
                Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");

            };

            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                }
                else if (e.Result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
            };

            recognizer.Canceled += (s, e) =>
            {
                Console.WriteLine($"CANCELED: Reason={e.Reason}");

                if (e.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                    Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                    Console.WriteLine($"CANCELED: Did you update the subscription info?");
                }

                stopRecognition.TrySetResult(0);
            };

            recognizer.SessionStarted += (s, e) =>
            {
                Console.WriteLine("\n    Session started event.");
            };

            recognizer.SessionStopped += (s, e) =>
            {
                Console.WriteLine("\n    Session stopped event.");
                Console.WriteLine("\nStop recognition.");
                stopRecognition.TrySetResult(0);
            };

            // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // Waits for completion.
            // Use Task.WaitAny to keep the task rooted.
            Task.WaitAny(new[] { stopRecognition.Task });

            // Stops recognition.
            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }
     
        recorder.Stop();
        Sentstream.Close();
        ReceivedStream.Close();
        //var fileStream = File.Create(outputfilename2);
        //speakerStream.Seek(0, SeekOrigin.Begin);
        //speakerStream.Close();
        //speakerStream.CopyTo(fileStream);
       // fileStream.Close();
    }
    finally
    {
        // Unsubscribe to stop getting events
        EventLogger.OnMessage -= OnMessageEvent;
    }

    // See resulting logs on the console
    Console.WriteLine("Here are the logs we captured:");
    foreach (string message in eventMessages)
    {
        Console.Write(message);
    }            

}

Attached is the log file, where I can only see:

[560060]: 25446ms SPX_DBG_TRACE_VERBOSE: audio_stream_session.cpp:466 [06C09CF0]CSpxAudioStreamSession::SetFormat: format != nullptr

Can you please advise how to fix the problem and get real-time speech-to-text conversion?


1 answer

  1. navba-MSFT 17,110 Reputation points Microsoft Employee
    2024-01-10T05:53:58.05+00:00

    @Juan Morales Marañon Welcome to Microsoft Q&A Forum, and thank you for posting your query here!

    Plan 1:
    Could you please test with the below sample and check?

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    
    namespace SpeechToTextStream
    {
        class Program
        {
            static async Task Main(string[] args)
            {
                var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US" });
                var config = SpeechConfig.FromSubscription("965XXXXXXXXX0c8c8c2", "eastus");
                config.OutputFormat = OutputFormat.Detailed;
                config.SetProperty(PropertyId.Speech_LogFilename, "logFile.txt");
    
                byte channels = 1;
                byte bitsPerSample = 16;
                uint samplesPerSecond = 8000;
                var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);
    
                Stream myAudioStream = File.OpenRead(@"C:\myfile.wav");
                var callback = new MyPushAudioInputStreamCallback(myAudioStream);
                 var pullStream = AudioInputStream.CreatePullStream(callback, audioFormat);
                 var audioInput = AudioConfig.FromStreamInput(pullStream);
    
    
                var recognizer = new SpeechRecognizer(config, autoDetectSourceLanguageConfig, audioInput);
    
                Console.WriteLine("Processing the audio file...");
                var result = await recognizer.RecognizeOnceAsync();
    
                if (result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"We recognized: {result.Text}");
                }
                else if (result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = CancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you update the subscription info?");
                    }
                }
                Console.ReadLine();
            }
    
        }
    
        public class MyPushAudioInputStreamCallback : PullAudioInputStreamCallback
        {
            private Stream audioStream;
    
            public MyPushAudioInputStreamCallback(Stream audioStream)
            {
                this.audioStream = audioStream;
            }
    
            public override int Read(byte[] dataBuffer, uint size)
            {
                try
                {
                    // Returning 0 from Read signals end of stream to the SDK.
                    return audioStream.Read(dataBuffer, 0, (int)size);
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Error in Read: {ex.Message}");
                    return 0; // treat a read error as end of stream (negative values are not valid here)
                }
            }
    
            public override void Close()
            {
                audioStream.Close();
            }
        }
    }
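    One likely cause of the original problem is the pull callback returning 0 (or a negative value) while the live source is still producing audio: the SDK treats 0 as end of stream and stops reading. A minimal sketch of a blocking buffer that avoids this (pure .NET, no SDK dependency; the class and member names are illustrative, not part of the Speech SDK):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative helper: a capture thread writes PCM chunks into the buffer,
// while a pull-style Read drains it. The key point for live streams is that
// Read must block until data arrives and only return 0 once the source is
// truly finished, because 0 means "end of stream" to the recognizer.
public class BlockingAudioBuffer
{
    private readonly BlockingCollection<byte[]> _chunks = new BlockingCollection<byte[]>();
    private byte[] _current = Array.Empty<byte>();
    private int _offset;

    // Called from the capture thread whenever new PCM data arrives.
    public void Write(byte[] data) => _chunks.Add(data);

    // Call when the live source stops; subsequent reads will return 0.
    public void Complete() => _chunks.CompleteAdding();

    // Same shape as PullAudioInputStreamCallback.Read: fill up to `size`
    // bytes, blocking while the buffer is empty.
    public int Read(byte[] dataBuffer, uint size)
    {
        if (_offset >= _current.Length)
        {
            if (!_chunks.TryTake(out var next, Timeout.Infinite))
                return 0; // producer completed and buffer drained: end of stream
            _current = next;
            _offset = 0;
        }
        int count = Math.Min((int)size, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, dataBuffer, 0, count);
        _offset += count;
        return count;
    }
}
```

    A pull callback's Read would then simply delegate to `BlockingAudioBuffer.Read`, and the capture side calls `Write` on each audio packet and `Complete` when the call ends.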
    
    

    Plan 2:
    Check your Audio Format: The Speech SDK expects the audio data to be in a specific format: single-channel (mono) PCM audio data with a sample rate of 16 kHz and 16 bits per sample. If your audio data is in a different format, you'll need to convert it to this format before using it with the Speech SDK.
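    If the capture device delivers 8 kHz audio (as in the question's code), one option is to upsample to 16 kHz before feeding the SDK. A naive linear-interpolation sketch for 16-bit mono PCM (illustrative only; a dedicated resampler library would give better quality):

```csharp
using System;

// Naive 2x upsampler for 16-bit mono PCM: inserts one linearly
// interpolated sample between each pair of input samples (8 kHz -> 16 kHz).
public static class PcmUpsampler
{
    public static short[] Upsample8kTo16k(short[] input)
    {
        if (input.Length == 0) return Array.Empty<short>();
        var output = new short[input.Length * 2];
        for (int i = 0; i < input.Length; i++)
        {
            output[2 * i] = input[i];
            // Interpolate toward the next sample; repeat the last one at the end.
            short next = i + 1 < input.Length ? input[i + 1] : input[i];
            output[2 * i + 1] = (short)((input[i] + next) / 2);
        }
        return output;
    }
}
```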
    Plan 3:
    Check your Audio Quality: The quality of the audio data can significantly affect speech recognition results. Background noise, low volume, or a low bitrate can make the speech in the audio data difficult to recognize. You might want to check the quality of your audio data and, if necessary, use audio editing software to improve it.
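    A quick way to rule out near-silent input is to compute the RMS level of the captured PCM samples before sending them to the recognizer; a value close to zero means the stream is effectively silence. A small sketch (helper name is illustrative):

```csharp
using System;

// Computes the root-mean-square level of 16-bit PCM samples.
// Very low values suggest the captured audio is near-silent.
public static class AudioLevel
{
    public static double Rms(short[] samples)
    {
        if (samples.Length == 0) return 0;
        double sumSquares = 0;
        foreach (short s in samples)
            sumSquares += (double)s * s;
        return Math.Sqrt(sumSquares / samples.Length);
    }
}
```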
    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
