Question

Thamotharan-5143 asked:
How to increase speech recognition waiting time in Xamarin Forms?

I have implemented speech to text using https://github.com/aritchie/speechrecognition with the following code:

using (var cancelSrc = new CancellationTokenSource())
{
    string output = await CrossSpeechRecognition.Current.ListenUntilPause().ToTask(cancelSrc.Token);
}

It waits only 5 seconds if nothing is spoken, but I need to increase this timeout to 30 seconds.
How can I do it?


dotnet-xamarin

1 Answer

LeonLu-MSFT answered:

Hello,

Welcome to our Microsoft Q&A platform!

The Speechrecognition package is deprecated, and it does not provide a method to set the duration. Instead, we can use two buttons to control starting and stopping the recording. `myla` is a Label. When Button_Clicked_1 is clicked, the listener keeps returning phrases as pauses are observed; clicking Button_Clicked_2 stops it.

private void Button_Clicked_1(object sender, EventArgs e)
{
    listener = CrossSpeechRecognition
        .Current
        .ContinuousDictation()
        .Subscribe(phrase =>
        {
            // keeps returning phrases as pauses are observed
            myla.Text += phrase;
        });
}

private void Button_Clicked_2(object sender, EventArgs e)
{
    listener.Dispose();
}


Alternatively, you can use the Azure Speech Service to achieve this. If you prefer a plugin, search for the keywords "Xamarin Cognitive BingSpeech".
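As a rough sketch of the Azure route: with the Microsoft.CognitiveServices.Speech SDK you can raise the initial-silence timeout directly through a service property. The key and region below are placeholders, and the class/method names are mine, not from the plugin:

```csharp
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

public static class AzureSpeechSample
{
    public static async Task<string> RecognizeOnceAsync()
    {
        // Placeholder key/region -- replace with your own Speech resource values.
        var config = SpeechConfig.FromSubscription("<your-key>", "<your-region>");

        // Allow up to 30 seconds of silence before the service gives up
        // waiting for the user to start speaking.
        config.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "30000");

        using (var recognizer = new SpeechRecognizer(config))
        {
            var result = await recognizer.RecognizeOnceAsync();
            return result.Reason == ResultReason.RecognizedSpeech ? result.Text : string.Empty;
        }
    }
}
```

This sidesteps the plugin entirely, so the 5-second limit of the on-device recognizer no longer applies.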


====================Update======================

This is the plugin's source code for the Android platform:

https://github.com/aritchie/speechrecognition/tree/master/Plugin.SpeechRecognition/Platforms/Android

Because this plugin does not expose the method that creates the intent (see the code below), we cannot set the intent extras at application start:

https://github.com/aritchie/speechrecognition/blob/d4807cf30ad8c1d9d62c99fec2e650038f902e6a/Plugin.SpeechRecognition/Platforms/Android/SpeechRecognizerImpl.cs#L162
Instead, create an interface in the PCL:

public interface ISpeechRecognizer
{
    IObservable<string> ListenUntilTimesEnd(int Endtimes);
}


Then implement it in the Android project:

using Android.App;
using Android.Content;
using Android.OS;
using Android.Runtime;
using Android.Speech;
using Android.Views;
using Android.Widget;
using App120.Droid;
using Plugin.SpeechRecognition;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Text;
using Xamarin.Forms;
using Debug = System.Diagnostics.Debug;

[assembly: Dependency(typeof(SpeechRecognizerService))]
namespace App120.Droid
{
    public class SpeechRecognizerService : ISpeechRecognizer
    {
        protected Subject<bool> ListenSubject { get; } = new Subject<bool>();
        readonly object syncLock = new object();
        public IObservable<string> ListenUntilTimesEnd(int Endtimes) => Observable.Create<string>(ob =>
        {
            var final = "";
            var listener = new SpeechRecognitionListener
            {
                ReadyForSpeech = () => this.ListenSubject.OnNext(true),
                Error = ex => ob.OnError(new Exception("Failure in speech engine - " + ex)),
                PartialResults = sentence =>
                {
                    lock (this.syncLock)
                        final = sentence;
                },
                FinalResults = sentence =>
                {
                    lock (this.syncLock)
                        final = sentence;
                },
                EndOfSpeech = () =>
                {
                    lock (this.syncLock)
                    {
                        ob.OnNext(final);
                        ob.OnCompleted();
                        this.ListenSubject.OnNext(false);
                    }
                }
            };
            var speechRecognizer = SpeechRecognizer.CreateSpeechRecognizer(Android.App.Application.Context);
            speechRecognizer.SetRecognitionListener(listener);
            speechRecognizer.StartListening(this.CreateSpeechIntent(true, Endtimes));
            //speechRecognizer.StartListening(this.CreateSpeechIntent(false));

            return () =>
            {
                this.ListenSubject.OnNext(false);
                speechRecognizer.StopListening();
                speechRecognizer.Destroy();
            };
        });

        protected virtual Intent CreateSpeechIntent(bool partialResults,int Endtimes)
        {
            var intent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
            intent.PutExtra(RecognizerIntent.ExtraLanguagePreference, Java.Util.Locale.Default);
            intent.PutExtra(RecognizerIntent.ExtraLanguage, Java.Util.Locale.Default);
            intent.PutExtra(RecognizerIntent.ExtraLanguageModel, RecognizerIntent.LanguageModelFreeForm);
            intent.PutExtra(RecognizerIntent.ExtraCallingPackage, Android.App.Application.Context.PackageName);
            intent.PutExtra(RecognizerIntent.ExtraMaxResults, 1);
            intent.PutExtra("android.speech.extra.DICTATION_MODE", true);
            intent.PutExtra(RecognizerIntent.ExtraPartialResults, false);

            intent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, Endtimes);
            intent.PutExtra(RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, Endtimes);
            intent.PutExtra(RecognizerIntent.ExtraSpeechInputMinimumLengthMillis, 15000);
            intent.PutExtra(RecognizerIntent.ExtraPartialResults, partialResults);

            return intent;
        }
    }

    public class SpeechRecognitionListener : Java.Lang.Object, IRecognitionListener
    {
        public Action StartOfSpeech { get; set; }
        public Action EndOfSpeech { get; set; }
        public Action ReadyForSpeech { get; set; }
        public Action<SpeechRecognizerError> Error { get; set; }
        public Action<string> FinalResults { get; set; }
        public Action<string> PartialResults { get; set; }
        public Action<float> RmsChanged { get; set; }


        public void OnBeginningOfSpeech()
        {
            Debug.WriteLine("Beginning of Speech");
            this.StartOfSpeech?.Invoke();
        }


        public void OnBufferReceived(byte[] buffer) => Debug.WriteLine("Buffer Received");


        public void OnEndOfSpeech()
        {
            Debug.WriteLine("End of Speech");
            this.EndOfSpeech?.Invoke();
        }


        public void OnError(SpeechRecognizerError error)
        {
            Debug.WriteLine("Error: " + error);
            this.Error?.Invoke(error);
        }


        public void OnEvent(int eventType, Bundle @params) => Debug.WriteLine("OnEvent: " + eventType);


        public void OnReadyForSpeech(Bundle @params)
        {
            Debug.WriteLine("Ready for Speech");
            this.ReadyForSpeech?.Invoke();
        }


        public void OnPartialResults(Bundle bundle)
        {
            Debug.WriteLine("OnPartialResults");
            this.SendResults(bundle, this.PartialResults);
        }


        public void OnResults(Bundle bundle)
        {
            Debug.WriteLine("Speech Results");
            this.SendResults(bundle, this.FinalResults);
        }


        public void OnRmsChanged(float rmsdB)
        {
            Debug.WriteLine("RMS Changed: " + rmsdB);
            this.RmsChanged?.Invoke(rmsdB);
        }


        void SendResults(Bundle bundle, Action<string> action)
        {
            var matches = bundle.GetStringArrayList(SpeechRecognizer.ResultsRecognition);
            if (matches == null || matches.Count == 0)
            {
                Debug.WriteLine("Matches value is null in bundle");
                return;
            }

            if (Build.VERSION.SdkInt >= BuildVersionCodes.IceCreamSandwich && matches.Count > 1)
            {
                var scores = bundle.GetFloatArray(SpeechRecognizer.ConfidenceScores);
                var best = 0;
                for (var i = 0; i < scores.Length; i++)
                {
                    if (scores[best] < scores[i])
                        best = i;
                }
                var winner = matches[best];
                action?.Invoke(winner);
            }
            else
            {
                action?.Invoke(matches.First());
            }
        }
    }
}
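To call this from shared code, something like the following should work. This is a sketch: `ListenUntilTimesEnd` takes the silence timeout in milliseconds, the `DependencyService` lookup relies on the `[assembly: Dependency(...)]` attribute registered above, and `myla` is assumed to be a Label on the page as in the earlier example:

```csharp
using System;
using Xamarin.Forms;

public class SpeechPage : ContentPage
{
    IDisposable listener;

    void StartListening()
    {
        // Resolve the Android implementation registered via [assembly: Dependency(...)].
        var recognizer = DependencyService.Get<ISpeechRecognizer>();

        // Allow up to 30 seconds of silence before recognition ends.
        listener = recognizer
            .ListenUntilTimesEnd(30000)
            .Subscribe(phrase => Device.BeginInvokeOnMainThread(() =>
            {
                // Append the recognized phrase to the UI (assumes a Label named myla).
                myla.Text += phrase;
            }));
    }

    void StopListening() => listener?.Dispose();
}
```

Disposing the subscription tears down the recognizer via the cleanup delegate returned from `Observable.Create`, so `StopListening` is enough to stop the engine.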


Best Regards,

Leon Lu



If the response is helpful, please click "Accept Answer" and upvote it.

Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


7 comments

I have used this, but the speech recognition sound keeps playing every 2 seconds. I need to change that timing.


intent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, 1500);
intent.PutExtra(RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, 1500);
intent.PutExtra(RecognizerIntent.ExtraSpeechInputMinimumLengthMillis, 15000);

Is it possible to add these lines at app startup?


Hi, please give me an idea of how to auto-stop the listener if the user has not spoken anything for 60 seconds.


This project is open source, but it does not expose an interface to set `intent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, 1500)`, so we cannot add these lines at app startup.

However, we can create a DependencyService and copy the source code into our project. Please see my updated answer.


Have you had a chance to check my new answer? I am glad to help if you have any other questions.


It is not working. I tried your new answer with Endtimes = 600000, but it stops after 2 seconds when the user has not spoken anything; I get empty text after 2 or 3 seconds. Could you please help me?
