How to implement AcousticEchoCanceler between TextToSpeech and SpeechRecognizer

Thamotharan 21 Reputation points
2021-10-20T07:47:27.787+00:00

Hi,

My goal: TextToSpeech speaks a paragraph while SpeechRecognizer listens for user input at the same time. If SpeechRecognizer detects any speech, TextToSpeech has to stop.

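using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using Android.App;
using Android.Content;
using Android.Media;
using Android.Speech;

public class SpeechService : ISpeechRecognizer
{
    protected Subject<bool> ListenSubject { get; } = new Subject<bool>();
    readonly object syncLock = new object();

    // State used by Listen(); these fields were missing from the original snippet.
    SpeechRecognizer speechRecognizer;
    readonly AudioManager audioManager =
        (AudioManager)Application.Context.GetSystemService(Context.AudioService);
    int currentIndex;
    string final;
    bool stop;

    [Obsolete]
    public IObservable<string> Listen(Action action = null) => Observable.Create<string>(ob =>
    {
        currentIndex = 0;
        final = null;
        stop = false;

        speechRecognizer = SpeechRecognizer.CreateSpeechRecognizer(Application.Context);
        // SpeechRecognitionListener is the app's own IRecognitionListener wrapper (not shown).
        var listener = new SpeechRecognitionListener();
        listener.ReadyForSpeech = () => this.ListenSubject.OnNext(true);
        listener.PartialResults = sentence =>
        {
            // Notify the caller (e.g. to stop TextToSpeech) as soon as anything is heard.
            action?.Invoke();

            lock (this.syncLock)
            {
                sentence = sentence.Trim();
                // The recognizer may shorten its hypothesis; restart the delta in that case.
                if (currentIndex > sentence.Length)
                    currentIndex = 0;

                var newPart = sentence.Substring(currentIndex); // text added since the last partial result
                currentIndex = sentence.Length;
                final = sentence;
            }
        };  // the original was missing this semicolon
        listener.EndOfSpeech = () =>
        {
            ob.OnNext(final);
            ob.OnCompleted();
            this.ListenSubject.OnNext(false);
        };  // and this one
        speechRecognizer.SetRecognitionListener(listener);
        speechRecognizer.StartListening(this.CreateSpeechIntent(true));

        // Runs when the subscription is disposed.
        return () =>
        {
            audioManager.SetStreamMute(Android.Media.Stream.Notification, false);
            stop = true;
            speechRecognizer?.StopListening();
            speechRecognizer?.Destroy();
            this.ListenSubject.OnNext(false);
        };
    });

    protected virtual Intent CreateSpeechIntent(bool partialResults)
    {
        var intent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
        // EXTRA_LANGUAGE expects an IETF language tag string such as "en-US", not a Locale object.
        var language = Java.Util.Locale.Default.ToLanguageTag();
        intent.PutExtra(RecognizerIntent.ExtraLanguagePreference, language);
        intent.PutExtra(RecognizerIntent.ExtraLanguage, language);
        intent.PutExtra(RecognizerIntent.ExtraLanguageModel, RecognizerIntent.LanguageModelFreeForm);
        intent.PutExtra(RecognizerIntent.ExtraCallingPackage, Application.Context.PackageName);
        intent.PutExtra(RecognizerIntent.ExtraPartialResults, partialResults);
        return intent;
    }
}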
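The currentIndex / newPart bookkeeping in the PartialResults handler can be pulled out into a small class that is easy to test without a device. This is only a sketch that restates the same logic; PartialTranscriptTracker is a hypothetical name, not part of Android or Xamarin.

```csharp
using System;

// Hypothetical helper: same delta logic as the PartialResults handler in Listen().
public sealed class PartialTranscriptTracker
{
    readonly object syncLock = new object();
    int currentIndex;

    // Latest full transcript (plays the role of the 'final' field).
    public string Latest { get; private set; } = string.Empty;

    // Returns only the text that was not in the previous partial result.
    public string Update(string sentence)
    {
        lock (syncLock)
        {
            sentence = (sentence ?? string.Empty).Trim();

            // The recognizer may revise its hypothesis to something shorter; start over.
            if (currentIndex > sentence.Length)
                currentIndex = 0;

            string newPart = sentence.Substring(currentIndex);
            currentIndex = sentence.Length;
            Latest = sentence;
            return newPart.Trim();
        }
    }
}
```

Note that a revision which keeps the same length or grows (e.g. "hello world" becoming "hello words") is not detected; the tracker only resets when the transcript gets shorter, exactly like the original code.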

This is how I call the speech recognizer:

using System;
using System.Reactive.Threading.Tasks;
using System.Threading;
using System.Threading.Tasks;
using ReactiveUI;
using Xamarin.Essentials;
using Xamarin.Forms;

public class MyViewModel : ReactiveObject
{
    CancellationTokenSource textToSpeechCancellationToken;
    string output;

    public MyViewModel()
    {
        // A constructor cannot use await, so the async work lives in StartAsync().
        _ = StartAsync();
    }

    async Task StartAsync()
    {
        Speak("Xamarin is a Microsoft-owned San Francisco-based software company founded in May 2011 by the engineers that created Mono, Xamarin.Android and Xamarin.iOS, which are cross-platform implementations of the Common Language Infrastructure and Common Language Specifications.");

        // Stop TextToSpeech as soon as the recognizer reports any speech.
        Action actionAfterSpeechDetect = () =>
        {
            if (textToSpeechCancellationToken != null && !textToSpeechCancellationToken.IsCancellationRequested)
            {
                textToSpeechCancellationToken.Cancel();
            }
        };

        using (var cancelSrc = new CancellationTokenSource())
        {
            output = await DependencyService.Get<ISpeechRecognizer>()
                .Listen(actionAfterSpeechDetect)
                .ToTask(cancelSrc.Token);
        }
    }

    public void Speak(string text)
    {
        // Create the token source before speaking starts; inside Task.Run it could still be null
        // when the detection callback fires.
        textToSpeechCancellationToken = new CancellationTokenSource();
        _ = TextToSpeech.SpeakAsync(text, textToSpeechCancellationToken.Token);
    }
}
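If several recognizer callbacks can fire, the "stop speaking" step can be made explicit and idempotent. A minimal sketch, assuming one utterance per controller; BargeInController and its members are hypothetical names, not a Xamarin API.

```csharp
using System.Threading;

// Hypothetical helper: owns the CancellationTokenSource for one TTS utterance
// and cancels it at most once, even if several recognizer callbacks fire.
public sealed class BargeInController
{
    // Created up front, so a callback can never observe a null source.
    readonly CancellationTokenSource speechCts = new CancellationTokenSource();
    int stopped; // 0 = still speaking, 1 = stop already requested

    // Pass this to TextToSpeech.SpeakAsync(text, token).
    public CancellationToken SpeechToken => speechCts.Token;

    // Call from the recognizer callback; returns true only for the call that stopped speech.
    public bool OnUserSpeechDetected()
    {
        if (Interlocked.Exchange(ref stopped, 1) == 1)
            return false;
        speechCts.Cancel();
        return true;
    }
}
```

Usage would look like `_ = TextToSpeech.SpeakAsync(text, barge.SpeechToken);` followed by `Listen(() => barge.OnUserSpeechDetected())`.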

With this implementation, TextToSpeech stops even when I don't say anything, because the speech recognizer picks up the audio coming out of the phone's speaker. How can I make speech recognition ignore the speaker output (TextToSpeech) in Xamarin.Android?
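One app-level workaround (a swapped-in heuristic, not echo cancellation and not an Android API) is to compare what was recognized with the text TextToSpeech is currently reading, and ignore results that occur verbatim in it. EchoTextFilter and ShouldIgnore are hypothetical names; this is a sketch, not a tested solution.

```csharp
using System;
using System.Linq;

// Hypothetical heuristic: a recognition result is treated as echo when its words
// appear, in order and contiguously, in the text being spoken by TextToSpeech.
public static class EchoTextFilter
{
    // Lower-cases the text and splits it into words, ignoring punctuation.
    static string[] Words(string text) =>
        new string((text ?? string.Empty).ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray())
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    public static bool ShouldIgnore(string recognized, string spokenText)
    {
        string[] heard = Words(recognized);
        string[] spoken = Words(spokenText);
        if (heard.Length == 0)
            return true; // nothing usable was recognized

        // Slide the recognized phrase over the TTS text looking for an exact match.
        for (int start = 0; start + heard.Length <= spoken.Length; start++)
        {
            bool match = true;
            for (int i = 0; i < heard.Length && match; i++)
                match = heard[i] == spoken[start + i];
            if (match)
                return true; // the phrase occurs verbatim in the TTS text
        }
        return false;
    }
}
```

In the PartialResults handler, action would then be invoked only when ShouldIgnore returns false. The trade-offs: echo that the recognizer mishears will not match, and a user who genuinely repeats words from the paragraph will be ignored.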

I found AcousticEchoCanceler, but I don't know how to use it together with TextToSpeech and SpeechRecognizer.
Please help me.

Xamarin

1 answer

  1. Wenyan Zhang (Shanghai Wicresoft Co,.Ltd.) 26,751 Reputation points Microsoft Vendor
    2021-10-21T08:37:06.023+00:00

    Hello,

    Welcome to our Microsoft Q&A platform!

    According to the Android AcousticEchoCanceler documentation: "AEC is used by voice communication applications (voice chat, video conferencing, SIP calls) where the presence of echo with significant delay in the signal received from the remote party is highly disturbing. AEC is often used in conjunction with noise suppression (NS)."
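    To make the quoted description concrete: an echo canceller keeps an adaptive estimate of the echo path and subtracts the predicted echo of the loudspeaker (reference) signal from each microphone sample. The toy NLMS filter below only illustrates that idea; it is not Android's AcousticEchoCanceler implementation, and NlmsEchoCanceller is a hypothetical name. It shows why AEC needs the exact samples being played, aligned with the captured audio.

```csharp
using System;

// Toy NLMS adaptive filter illustrating the principle behind acoustic echo cancellation.
// NOT Android's AcousticEchoCanceler; it needs the reference signal sample by sample.
public sealed class NlmsEchoCanceller
{
    readonly double[] weights;  // estimated echo path (impulse response)
    readonly double[] history;  // recent reference samples, newest first
    readonly double stepSize;   // NLMS step size, 0 < stepSize < 2
    const double Epsilon = 1e-8; // avoids division by zero during silence

    public NlmsEchoCanceller(int taps, double stepSize)
    {
        weights = new double[taps];
        history = new double[taps];
        this.stepSize = stepSize;
    }

    // reference: sample sent to the loudspeaker; microphone: captured sample.
    // Returns the microphone sample with the estimated echo removed.
    public double Process(double reference, double microphone)
    {
        Array.Copy(history, 0, history, 1, history.Length - 1); // shift (overlap-safe)
        history[0] = reference;

        double echoEstimate = 0, energy = Epsilon;
        for (int i = 0; i < weights.Length; i++)
        {
            echoEstimate += weights[i] * history[i];
            energy += history[i] * history[i];
        }

        double error = microphone - echoEstimate;
        double gain = stepSize * error / energy; // normalized update
        for (int i = 0; i < weights.Length; i++)
            weights[i] += gain * history[i];
        return error;
    }
}
```

    With SpeechRecognizer, the app never sees the captured samples, so there is no place to run a filter like this.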

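    AEC will not filter out the TTS playback for SpeechRecognizer: AcousticEchoCanceler.Create(audioSession) attaches the effect to an AudioRecord session that your app owns, while SpeechRecognizer captures the microphone internally and does not expose its session. So I'm afraid it is hard to combine AEC with TextToSpeech and SpeechRecognizer. You could try using CONFIDENCE_SCORES (SpeechRecognizer.ConfidenceScores in Xamarin.Android) to judge whether a recognition result is reliable.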
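    A sketch of the confidence check, assuming the candidates come from results.GetStringArrayList(SpeechRecognizer.ResultsRecognition) and the scores from results.GetFloatArray(SpeechRecognizer.ConfidenceScores) in the results callback. Per the Android docs the scores are optional (the array may be missing) and -1 means "not available". ConfidenceFilter is a hypothetical helper, and the threshold is an assumption you would tune.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: picks the highest-scoring candidate at or above a threshold.
public static class ConfidenceFilter
{
    // Returns null when scores are missing or no candidate is confident enough.
    // A score of -1 means "not available" and is never treated as confident.
    public static string BestConfident(IList<string> candidates, float[] scores, float threshold)
    {
        if (candidates == null || scores == null)
            return null;

        string best = null;
        float bestScore = float.NegativeInfinity;
        int count = Math.Min(candidates.Count, scores.Length);
        for (int i = 0; i < count; i++)
        {
            float score = scores[i];
            if (score < 0f || score < threshold)
                continue;
            if (score > bestScore)
            {
                best = candidates[i];
                bestScore = score;
            }
        }
        return best;
    }
}
```

    Confidence measures how sure the recognizer is about the words, not where the sound came from, so clearly spoken TTS audio may still pass the threshold.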

    Best Regards,
    Wenyan Zhang

