I am passing a Stream to the service but it never returns the text

Juan Morales Marañon 0 Reputation points Microsoft Employee
2024-01-09T18:58:32.7433333+00:00

speech-sdk-log.txt

I am trying to convert a live stream to text. This works fine with the microphone, but when I send the Stream it is not recognized. I already tried saving the same stream to a file, and recognition works fine from the file, but not from the live stream.

   AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] {"es-HN","es-MX","es-PA"});
    var config = SpeechConfig.FromSubscription(_speechKey, _speechRegion);
    //config.SpeechRecognitionLanguage = "es-MX";
    //config.EnableDictation();
    config.OutputFormat=OutputFormat.Detailed;
    //config.SetProperty(PropertyId.Speech_LogFilename, logFilex);

    var stopRecognition = new TaskCompletionSource<int>();

    byte channels = 1;
    byte bitsPerSample = 16;
    uint samplesPerSecond = 8000; // 8 kHz PCM; must match the actual sample rate of the incoming stream
    var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);

    var callback = new AudioInputCallback(Sentstream);

    using (var audioInput = AudioConfig.FromStreamInput(callback, audioFormat))
    {
        // Creates a speech recognizer using audio stream input.
        using (var recognizer = new SpeechRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
        {
            Thread.Sleep(5000);
            // Subscribes to events.
            recognizer.Recognizing += (s, e) =>
            {
                Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");

            };

            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                }
                else if (e.Result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
            };

            recognizer.Canceled += (s, e) =>
            {
                Console.WriteLine($"CANCELED: Reason={e.Reason}");

                if (e.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                    Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                    Console.WriteLine($"CANCELED: Did you update the subscription info?");
                }

                stopRecognition.TrySetResult(0);
            };

            recognizer.SessionStarted += (s, e) =>
            {
                Console.WriteLine("\n    Session started event.");
            };

            recognizer.SessionStopped += (s, e) =>
            {
                Console.WriteLine("\n    Session stopped event.");
                Console.WriteLine("\nStop recognition.");
                stopRecognition.TrySetResult(0);
            };

            // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // Waits for completion.
            // Use Task.WaitAny to keep the task rooted.
            Task.WaitAny(new[] { stopRecognition.Task });

            // Stops recognition.
            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }
     
        recorder.Stop();
        Sentstream.Close();
        ReceivedStream.Close();
        //var fileStream = File.Create(outputfilename2);
        //speakerStream.Seek(0, SeekOrigin.Begin);
        //speakerStream.Close();
        //speakerStream.CopyTo(fileStream);
       // fileStream.Close();
    }
    finally
    {
        // Unsubscribe to stop getting events
        EventLogger.OnMessage -= OnMessageEvent;
    }

    // See resulting logs on the console
    Console.WriteLine("Here are the logs we captured:");
    foreach (string message in eventMessages)
    {
        Console.Write(message);
    }            

}

Attached is the log file, where I can only see:

[560060]: 25446ms SPX_DBG_TRACE_VERBOSE: audio_stream_session.cpp:466 [06C09CF0]CSpxAudioStreamSession::SetFormat: format != nullptr

Can you please advise how to fix the problem and get real-time speech-to-text conversion?


1 answer

  1. navba-MSFT 17,110 Reputation points Microsoft Employee
    2024-01-10T05:53:58.05+00:00

    @Juan Morales Marañon Welcome to Microsoft Q&A Forum, and thank you for posting your query here!

    Plan 1:
    Could you please test with the below sample and check?

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    
    namespace SpeechToTextStream
    {
        class Program
        {
            static async Task Main(string[] args)
            {
                var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US" });
                var config = SpeechConfig.FromSubscription("965XXXXXXXXX0c8c8c2", "eastus");
                config.OutputFormat = OutputFormat.Detailed;
                config.SetProperty(PropertyId.Speech_LogFilename, "logFile.txt");
    
                byte channels = 1;
                byte bitsPerSample = 16;
                uint samplesPerSecond = 8000;
                var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);
    
                Stream myAudioStream = File.OpenRead(@"C:\myfile.wav");
                var callback = new MyPushAudioInputStreamCallback(myAudioStream);
                 var pullStream = AudioInputStream.CreatePullStream(callback, audioFormat);
                 var audioInput = AudioConfig.FromStreamInput(pullStream);
    
    
                var recognizer = new SpeechRecognizer(config, autoDetectSourceLanguageConfig, audioInput);
    
                Console.WriteLine("Processing the audio file...");
                var result = await recognizer.RecognizeOnceAsync();
    
                if (result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"We recognized: {result.Text}");
                }
                else if (result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = CancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you update the subscription info?");
                    }
                }
                Console.ReadLine();
            }
    
        }
    
        public class MyPushAudioInputStreamCallback : PullAudioInputStreamCallback
        {
            private Stream audioStream;
    
            public MyPushAudioInputStreamCallback(Stream audioStream)
            {
                this.audioStream = audioStream;
            }
    
            public override int Read(byte[] dataBuffer, uint size)
            {
                try
                {
                    // Returning 0 from Read signals end of stream to the SDK.
                    return audioStream.Read(dataBuffer, 0, (int)size);
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Error in Read: {ex.Message}");
                    return 0; // treat a read error as end of stream (negative values are not valid here)
                }
            }
    
            public override void Close()
            {
                audioStream.Close();
            }
        }
    }
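    One likely cause of the original problem is the pull callback returning 0 (or a negative value) while the live source is still producing audio: the SDK treats 0 as end of stream and stops reading. A minimal sketch of a blocking buffer that avoids this (pure .NET, no SDK dependency; the class and member names are illustrative, not part of the Speech SDK):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative helper: a capture thread writes PCM chunks into the buffer,
// while a pull-style Read drains it. The key point for live streams is that
// Read must block until data arrives and only return 0 once the source is
// truly finished, because 0 means "end of stream" to the recognizer.
public class BlockingAudioBuffer
{
    private readonly BlockingCollection<byte[]> _chunks = new BlockingCollection<byte[]>();
    private byte[] _current = Array.Empty<byte>();
    private int _offset;

    // Called from the capture thread whenever new PCM data arrives.
    public void Write(byte[] data) => _chunks.Add(data);

    // Call when the live source stops; subsequent reads will return 0.
    public void Complete() => _chunks.CompleteAdding();

    // Same shape as PullAudioInputStreamCallback.Read: fill up to `size`
    // bytes, blocking while the buffer is empty.
    public int Read(byte[] dataBuffer, uint size)
    {
        if (_offset >= _current.Length)
        {
            if (!_chunks.TryTake(out var next, Timeout.Infinite))
                return 0; // producer completed and buffer drained: end of stream
            _current = next;
            _offset = 0;
        }
        int count = Math.Min((int)size, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, dataBuffer, 0, count);
        _offset += count;
        return count;
    }
}
```

    A pull callback's Read would then simply delegate to `BlockingAudioBuffer.Read`, and the capture side calls `Write` on each audio packet and `Complete` when the call ends.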
    
    

    Plan 2:
    Check your Audio Format: The Speech SDK expects the audio data to be in a specific format: single-channel (mono) PCM audio data with a sample rate of 16 kHz and 16 bits per sample. If your audio data is in a different format, you'll need to convert it to this format before using it with the Speech SDK.
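    If the capture device delivers 8 kHz audio (as in the question's code), one option is to upsample to 16 kHz before feeding the SDK. A naive linear-interpolation sketch for 16-bit mono PCM (illustrative only; a dedicated resampler library would give better quality):

```csharp
using System;

// Naive 2x upsampler for 16-bit mono PCM: inserts one linearly
// interpolated sample between each pair of input samples (8 kHz -> 16 kHz).
public static class PcmUpsampler
{
    public static short[] Upsample8kTo16k(short[] input)
    {
        if (input.Length == 0) return Array.Empty<short>();
        var output = new short[input.Length * 2];
        for (int i = 0; i < input.Length; i++)
        {
            output[2 * i] = input[i];
            // Interpolate toward the next sample; repeat the last one at the end.
            short next = i + 1 < input.Length ? input[i + 1] : input[i];
            output[2 * i + 1] = (short)((input[i] + next) / 2);
        }
        return output;
    }
}
```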
    Plan 3:
    Check your Audio Quality: The quality of the audio data can significantly affect speech recognition results. Background noise, low volume, or a low bitrate can make the speech in the audio data difficult to recognize. You might want to check the quality of your audio data and, if necessary, use audio editing software to improve it.
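    A quick way to rule out near-silent input is to compute the RMS level of the captured PCM samples before sending them to the recognizer; a value close to zero means the stream is effectively silence. A small sketch (helper name is illustrative):

```csharp
using System;

// Computes the root-mean-square level of 16-bit PCM samples.
// Very low values suggest the captured audio is near-silent.
public static class AudioLevel
{
    public static double Rms(short[] samples)
    {
        if (samples.Length == 0) return 0;
        double sumSquares = 0;
        foreach (short s in samples)
            sumSquares += (double)s * s;
        return Math.Sqrt(sumSquares / samples.Length);
    }
}
```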
    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
