Hello,
Welcome to Microsoft Q&A.
From your description, you need to determine which parts of the text have already been read, based on the playback progress of the audio stream generated by SpeechSynthesizer.
Here is a simple example:
SpeechPage.xaml
<Grid>
<StackPanel HorizontalAlignment="Center" VerticalAlignment="Center">
<!-- Shows the portion of the text that has been read so far (updated from code-behind). -->
<TextBlock x:Name="ReadTextBlock" HorizontalAlignment="Center" TextAlignment="Center"
FontSize="20" TextWrapping="Wrap" MaxWidth="300"/>
<!-- Starts synthesis and playback via Button_Click in SpeechPage.xaml.cs. -->
<Button Content="Start" Click="Button_Click" HorizontalAlignment="Center" Margin="0,15,0,0"/>
</StackPanel>
</Grid>
SpeechPage.xaml.cs
public sealed partial class SpeechPage : Page
{
    // Plays the synthesized audio stream.
    private readonly MediaPlayer _player = new MediaPlayer();
    // Produces the speech stream along with word/sentence boundary metadata.
    private readonly SpeechSynthesizer _synth = new SpeechSynthesizer();
    // Accumulates the words that have been spoken so far.
    private string readText = "";
    // The full text that will be synthesized and read aloud.
    private string totalText = "This is a simple example to test the progress of text reading";

    public SpeechPage()
    {
        this.InitializeComponent();
        // Emit boundary cues into the synthesized stream so reading progress
        // can be tracked through TimedMetadataTrack events during playback.
        _synth.Options.IncludeWordBoundaryMetadata = true;
        _synth.Options.IncludeSentenceBoundaryMetadata = true;
    }

    /// <summary>
    /// Synthesizes the text to an audio stream, wires up word-boundary cue
    /// handlers, and starts playback.
    /// </summary>
    private async void Button_Click(object sender, RoutedEventArgs e)
    {
        // Reset progress for a fresh playback run.
        readText = "";
        SpeechSynthesisStream synthesisStream = await _synth.SynthesizeTextToStreamAsync(totalText);
        // Create a media source from the stream:
        var mediaSource = MediaSource.CreateFromStream(synthesisStream, synthesisStream.ContentType);
        // Create a Media Playback Item
        var mediaPlaybackItem = new MediaPlaybackItem(mediaSource);
        RegisterForWordBoundaryEvents(mediaPlaybackItem);
        _player.Source = mediaPlaybackItem;
        _player.Play();
    }

    /// <summary>
    /// This function executes when a SpeechCue is hit and calls the functions to update the UI.
    /// </summary>
    /// <param name="timedMetadataTrack">The timedMetadataTrack associated with the event.</param>
    /// <param name="args">The arguments associated with the event.</param>
    private async void metadata_SpeechCueEntered(TimedMetadataTrack timedMetadataTrack, MediaCueEventArgs args)
    {
        // Check in case there are different tracks and the handler was used for more tracks.
        if (timedMetadataTrack.TimedMetadataKind == TimedMetadataKind.Speech)
        {
            // Pattern match instead of 'as' + null check.
            if (args.Cue is SpeechCue cue)
            {
                System.Diagnostics.Debug.WriteLine($"Hit Cue with start:{cue.StartPositionInInput} and end:{cue.EndPositionInInput}");
                System.Diagnostics.Debug.WriteLine($"Cue text:[{cue.Text}]");
                // Cue events arrive on a background thread; marshal UI updates
                // to the dispatcher thread.
                await Dispatcher.RunAsync(CoreDispatcherPriority.Normal,
                    () =>
                    {
                        // Skip a cue that spans the whole input (not an individual word).
                        if (cue.StartPositionInInput == 0 && cue.EndPositionInInput == totalText.Length)
                            return;
                        readText += cue.Text + " ";
                        ReadTextBlock.Text = readText.Trim();
                    });
            }
        }
    }

    /// <summary>
    /// Register for all boundary events and register a function to add any new events if they arise.
    /// </summary>
    /// <param name="mediaPlaybackItem">The Media Playback Item to add handlers to.</param>
    private void RegisterForWordBoundaryEvents(MediaPlaybackItem mediaPlaybackItem)
    {
        // If tracks were available at source resolution time, iterate through and register:
        for (int index = 0; index < mediaPlaybackItem.TimedMetadataTracks.Count; index++)
        {
            RegisterMetadataHandlerForWords(mediaPlaybackItem, index);
        }
        // Since the tracks are added later we will
        // monitor the tracks being added and subscribe to the ones of interest.
        mediaPlaybackItem.TimedMetadataTracksChanged += (MediaPlaybackItem sender, IVectorChangedEventArgs args) =>
        {
            if (args.CollectionChange == CollectionChange.ItemInserted)
            {
                RegisterMetadataHandlerForWords(sender, (int)args.Index);
            }
            else if (args.CollectionChange == CollectionChange.Reset)
            {
                for (int index = 0; index < sender.TimedMetadataTracks.Count; index++)
                {
                    RegisterMetadataHandlerForWords(sender, index);
                }
            }
        };
    }

    /// <summary>
    /// Register for just word boundary events.
    /// </summary>
    /// <param name="mediaPlaybackItem">The Media Playback Item to register handlers for.</param>
    /// <param name="index">Index of the timedMetadataTrack within the mediaPlaybackItem.</param>
    private void RegisterMetadataHandlerForWords(MediaPlaybackItem mediaPlaybackItem, int index)
    {
        var timedTrack = mediaPlaybackItem.TimedMetadataTracks[index];
        // Register for only word cues (the "SpeechWord" track id is produced by
        // IncludeWordBoundaryMetadata = true).
        if (timedTrack.Id == "SpeechWord")
        {
            timedTrack.CueEntered += metadata_SpeechCueEntered;
            // ApplicationPresented delivers cues to our handler without the
            // platform rendering them itself.
            mediaPlaybackItem.TimedMetadataTracks.SetPresentationMode((uint)index, TimedMetadataTrackPresentationMode.ApplicationPresented);
        }
    }
}
By setting SpeechSynthesizerOptions.IncludeWordBoundaryMetadata = true, the stream produced by SpeechSynthesizer.SynthesizeTextToStreamAsync() contains metadata describing the text being read. By handling the cues on the resulting TimedMetadataTrack, we can then determine which text is currently being read.
Microsoft provides a more complete sample, which you can refer to here.
Thanks.