How to implement AcousticEchoCanceler between TextToSpeech and SpeechRecognizer

Question

Hi,

What I want to achieve: TextToSpeech speaks a paragraph while SpeechRecognizer listens for user input. If SpeechRecognizer detects any text, TextToSpeech has to stop.

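using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using Android.App;
using Android.Content;
using Android.Media;
using Android.Speech;

// SpeechRecognitionListener is my own IRecognitionListener wrapper (not shown)
// that exposes ReadyForSpeech, PartialResults and EndOfSpeech callbacks.
public class SpeechService : ISpeechRecognizer
{
    protected Subject<bool> ListenSubject { get; } = new Subject<bool>();
    readonly object syncLock = new object();
    readonly AudioManager audioManager =
        (AudioManager)Application.Context.GetSystemService(Context.AudioService);
    SpeechRecognizer speechRecognizer;
    int currentIndex;
    string final;
    bool stop;

    // Marking the member [Obsolete] stops the compiler warning about the
    // deprecated SetStreamMute call inside it.
    [Obsolete]
    public IObservable<string> Listen(Action action = null) => Observable.Create<string>(ob =>
    {
        // Reset per-session state; SpeechRecognizer must be used on the main thread.
        currentIndex = 0;
        final = null;
        stop = false;
        speechRecognizer = SpeechRecognizer.CreateSpeechRecognizer(Application.Context);
        var listener = new SpeechRecognitionListener();
        listener.ReadyForSpeech = () => this.ListenSubject.OnNext(true);
        listener.PartialResults = sentence =>
        {
            // Let the caller react (e.g. stop TextToSpeech) as soon as text is detected.
            action?.Invoke();
            lock (this.syncLock)
            {
                sentence = sentence.Trim();
                // A shorter hypothesis means the recognizer revised its result; start over.
                if (currentIndex > sentence.Length)
                    currentIndex = 0;

                // Text recognized since the previous partial result (not used further here).
                var newPart = sentence.Substring(currentIndex);
                currentIndex = sentence.Length;
                final = sentence;
            }
        };
        listener.EndOfSpeech = () =>
        {
            ob.OnNext(final);
            ob.OnCompleted();
            this.ListenSubject.OnNext(false);
        };
        speechRecognizer.SetRecognitionListener(listener);
        speechRecognizer.StartListening(this.CreateSpeechIntent(true));
        return () =>
        {
            audioManager.SetStreamMute(Stream.Notification, false);
            stop = true;
            speechRecognizer?.StopListening();
            speechRecognizer?.Destroy();
            this.ListenSubject.OnNext(false);
        };
    });

    protected virtual Intent CreateSpeechIntent(bool partialResults)
    {
        var intent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
        // The recognizer expects a language tag string such as "en-US", not a Locale object.
        var language = Java.Util.Locale.Default.ToLanguageTag();
        intent.PutExtra(RecognizerIntent.ExtraLanguagePreference, language);
        intent.PutExtra(RecognizerIntent.ExtraLanguage, language);
        intent.PutExtra(RecognizerIntent.ExtraLanguageModel, RecognizerIntent.LanguageModelFreeForm);
        intent.PutExtra(RecognizerIntent.ExtraCallingPackage, Application.Context.PackageName);
        intent.PutExtra(RecognizerIntent.ExtraPartialResults, partialResults);
        return intent;
    }
}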
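The partial-result bookkeeping in Listen (trim the hypothesis, reset currentIndex when the recognizer returns something shorter, then take the new suffix) does not depend on Android, so it can be pulled out and checked on its own. Below is a minimal sketch as plain C# top-level statements; NewPart is a hypothetical helper name, and, like the original lambda, it assumes the recognizer only appends to its hypothesis, which real recognizers do not guarantee.

```csharp
using System;

// Hypothetical helper mirroring the PartialResults handler in Listen:
// returns the text added since the previous partial result and updates
// currentIndex. Assumes earlier words are not rewritten between results.
static string NewPart(string sentence, ref int currentIndex)
{
    sentence = sentence.Trim();
    // A shorter hypothesis means the recognizer restarted or revised it.
    if (currentIndex > sentence.Length)
        currentIndex = 0;
    var newPart = sentence.Substring(currentIndex);
    currentIndex = sentence.Length;
    return newPart;
}

int index = 0;
Console.WriteLine(NewPart("hello", ref index));        // prints "hello"
Console.WriteLine(NewPart("hello world ", ref index)); // prints " world"
```

Because the helper keeps the original Substring logic, the second call returns the new words with their leading space; call Trim() on the result if you need the bare words.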

This is how I call the speech recognizer:

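using System;
using System.Reactive.Threading.Tasks;
using System.Threading;
using System.Threading.Tasks;
using ReactiveUI;
using Xamarin.Essentials;
using Xamarin.Forms;

public class MyViewModel : ReactiveObject
{
    CancellationTokenSource textToSpeechCancellationToken;
    string output;

    public MyViewModel()
    {
        // Constructors cannot be async, so the work is started from a separate method.
        _ = StartAsync();
    }

    async Task StartAsync()
    {
        speak("Xamarin is a Microsoft-owned San Francisco-based software company founded in May 2011 by the engineers that created Mono, Xamarin.Android and Xamarin.iOS, which are cross-platform implementations of the Common Language Infrastructure and Common Language Specifications.");
        Action actionAfterSpeechDetect = delegate
        {
            // Stop TextToSpeech as soon as the recognizer reports any text.
            if (textToSpeechCancellationToken != null && !textToSpeechCancellationToken.IsCancellationRequested)
            {
                textToSpeechCancellationToken.Cancel();
            }
        };
        using (var cancelSrc = new CancellationTokenSource())
        {
            output = await DependencyService.Get<ISpeechRecognizer>()
                .Listen(actionAfterSpeechDetect)
                .ToTask(cancelSrc.Token);
        }
    }

    public void speak(string text)
    {
        // Create the token source before starting the task so the
        // speech-detected callback never sees a null or stale token.
        textToSpeechCancellationToken = new CancellationTokenSource();
        var token = textToSpeechCancellationToken.Token;
        Task.Run(async () => await TextToSpeech.SpeakAsync(text, token));
    }
}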

With the above implementation, TextToSpeech stops even if I don't say anything, because the speech recognizer picks up text from the speaker output. So I would like to know how to make speech recognition ignore the phone's speaker output (TextToSpeech) in Xamarin.Android.

I found AcousticEchoCanceler, but I don't know how to use it with TextToSpeech and SpeechRecognizer.
Please help me.

Answer

Hello,

Welcome to our Microsoft Q&A platform!

According to the Android documentation for AcousticEchoCanceler, "AEC is used by voice communication applications (voice chat, video conferencing, SIP calls) where the presence of echo with significant delay in the signal received from the remote party is highly disturbing. AEC is often used in conjunction with noise suppression (NS)."

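AEC will not filter out the TTS output, so I'm afraid it is hard to combine it with TextToSpeech and SpeechRecognizer. An AcousticEchoCanceler is attached to an audio session (AcousticEchoCanceler.create takes an audio session ID, typically from your own AudioRecord), but SpeechRecognizer captures the microphone audio itself and does not give you its session ID, so there is nothing to attach the effect to. Instead, you could try using CONFIDENCE_SCORES to judge whether a recognition result is correct.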
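A minimal sketch of the confidence-score idea, written as plain C# so it runs outside Android. It assumes you read the candidates and scores in your IRecognitionListener's OnResults(Bundle results) with results.GetStringArrayList(SpeechRecognizer.ResultsRecognition) and results.GetFloatArray(SpeechRecognizer.ConfidenceScores). Per the Android documentation, the scores range from 0.0 to 1.0, use -1 for an unavailable score, and may be missing entirely; this sketch treats both cases as "not confident". PickConfidentResult and the 0.6 threshold are hypothetical and would need tuning on real devices.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the highest-scoring candidate whose confidence
// is at least minScore (expected to be in (0, 1]), or null if none qualifies.
// candidates/scores are assumed to come from the OnResults bundle
// (ResultsRecognition and ConfidenceScores). A null score array or a score
// of -1 ("unavailable") is treated as not confident.
static string PickConfidentResult(IList<string> candidates, float[] scores, float minScore)
{
    if (candidates == null || scores == null)
        return null;

    string best = null;
    float bestScore = 0f;
    int count = Math.Min(candidates.Count, scores.Length);
    for (int i = 0; i < count; i++)
    {
        if (scores[i] < minScore)
            continue; // too uncertain (or -1 = unavailable)
        // Keep the first candidate on ties.
        if (best == null || scores[i] > bestScore)
        {
            best = candidates[i];
            bestScore = scores[i];
        }
    }
    return best;
}

var candidates = new List<string> { "stop", "top" };
var scores = new[] { 0.92f, 0.40f };
Console.WriteLine(PickConfidentResult(candidates, scores, 0.6f)); // prints "stop"
```

This only reduces false triggers: if the recognizer hears the TTS audio clearly, it may still report it with a high score, so the threshold alone cannot tell the speaker output apart from the user.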

Best Regards,
Wenyan Zhang

