Quickstart: Create a voice-first virtual assistant with the Speech SDK, UWP

Quickstarts are also available for speech recognition, speech synthesis, and speech translation.

In this article, you'll develop a C# Universal Windows Platform (UWP) application by using the Speech SDK. The program will connect to a previously authored and configured bot to enable a voice-first virtual assistant experience from the client application. The application is built with the Speech SDK NuGet Package and Microsoft Visual Studio 2019 (any edition).

Note

The Universal Windows Platform lets you develop apps that run on any device that supports Windows 10, including PCs, Xbox, Surface Hub, and other devices.

Prerequisites

This quickstart requires:

  • A Windows 10 PC with a working microphone
  • Microsoft Visual Studio 2019 (any edition)
  • An Azure Cognitive Services Speech subscription key and its service region
  • A bot that's been created and connected to the Direct Line Speech (Preview) channel, along with the bot's channel secret

Optional: Get started fast

This quickstart describes, step by step, how to build a client application that connects to your speech-enabled bot. If you prefer to dive right in, the complete, ready-to-compile source code used in this quickstart is available in the Speech SDK Samples under the quickstart folder.

Create a Visual Studio project

To create a Visual Studio project for Universal Windows Platform (UWP) development, you need to set up Visual Studio development options, create the project, select the target architecture, set up audio capture, and install the Speech SDK.

Set up Visual Studio development options

To start, make sure you're set up correctly in Visual Studio for UWP development:

  1. Open Visual Studio 2019 to display the Start window.

    Start window - Visual Studio

  2. Select Continue without code to go to the Visual Studio IDE.

  3. From the Visual Studio menu bar, select Tools > Get Tools and Features to open Visual Studio Installer and view the Modifying dialog box.

    Workloads tab, Modifying dialog box, Visual Studio Installer

  4. In the Workloads tab, under Windows, find the Universal Windows Platform development workload. If the check box next to that workload is already selected, close the Modifying dialog box, and go to step 6.

  5. Select the Universal Windows Platform development check box, select Modify, and then in the Before we get started dialog box, select Continue to install the UWP development workload. Installation of the new feature may take a while.

  6. Close Visual Studio Installer.

Create the project and select the target architecture

Next, create your project:

  1. In the Visual Studio menu bar, choose File > New > Project to display the Create a new project window.

    Create a new project - Visual Studio

  2. Find and select Blank App (Universal Windows). Make sure that you select the C# version of this project type (as opposed to Visual Basic).

  3. Select Next to display the Configure your new project screen.

    Configure your new project - Visual Studio

  4. In Project name, enter helloworld.

  5. In Location, navigate to and select or create the folder that will contain your project.

  6. Select Create to go to the New Universal Windows Platform Project window.

    New Universal Windows Platform Project dialog box - Visual Studio

  7. In Minimum version (the second drop-down box), choose Windows 10 Fall Creators Update (10.0; Build 16299), which is the minimum requirement for the Speech SDK.

  8. In Target version (the first drop-down box), choose a value identical to or later than the value in Minimum version.

  9. Select OK. You're returned to the Visual Studio IDE, with the new project created and visible in the Solution Explorer pane.

    helloworld project - Visual Studio

Now select your target platform architecture. In the Visual Studio toolbar, find the Solution Platforms drop-down box. (If you don't see it, choose View > Toolbars > Standard to display the toolbar containing Solution Platforms.) If you're running 64-bit Windows, choose x64 in the drop-down box. 64-bit Windows can also run 32-bit applications, so you can choose x86 if you prefer.

Note

The Speech SDK only supports Intel-compatible processors. ARM processors are currently not supported.

Set up audio capture

Then allow the project to capture audio input:

  1. In Solution Explorer, double-click Package.appxmanifest to open the package application manifest.

  2. Select the Capabilities tab.

    Capabilities tab, Package application manifest - Visual Studio

  3. Select the check box for the Microphone capability. (The manifest entry this adds is shown after these steps.)

  4. From the menu bar, choose File > Save Package.appxmanifest to save your changes.
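
When you save, Visual Studio writes the capability into the manifest XML. As a rough sketch, the Capabilities element should then look similar to the following (the default capabilities included by the Blank App template may vary slightly):

    <Capabilities>
        <Capability Name="internetClient" />
        <DeviceCapability Name="microphone" />
    </Capabilities>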

Install the Speech SDK

Finally, install the Speech SDK NuGet package, and reference the Speech SDK in your project (the project-file entry that results is shown after these steps):

  1. In Solution Explorer, right-click your solution, and choose Manage NuGet Packages for Solution to go to the NuGet - Solution window.

  2. Select Browse.

    Screenshot of Manage Packages for Solution dialog box

  3. In Package source, choose nuget.org.

  4. In the Search box, enter Microsoft.CognitiveServices.Speech, and then choose that package after it appears in the search results.

    Screenshot of Manage Packages for Solution dialog box

  5. In the package status pane next to the search results, select your helloworld project.

  6. Select Install.

  7. In the Preview Changes dialog box, select OK.

  8. In the License Acceptance dialog box, view the license, and then select I Accept. The package installation begins, and when installation is complete, the Output pane displays a message similar to the following text: Successfully installed 'Microsoft.CognitiveServices.Speech 1.6.0' to helloworld.
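
For reference, the installation adds a package reference to your helloworld.csproj; a minimal sketch of that entry follows (the version number reflects the release you actually installed):

    <ItemGroup>
        <PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.6.0" />
    </ItemGroup>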

Add sample code

Now add the XAML code that defines the user interface of the application, and add the C# code-behind implementation.

XAML code

First, you'll create the application's user interface by adding the XAML code:

  1. In Solution Explorer, open MainPage.xaml.

  2. In the designer's XAML view, replace the entire contents with the following code snippet:

    <Page
        x:Class="helloworld.MainPage"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:local="using:helloworld"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        mc:Ignorable="d"
        Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    
        <Grid>
            <StackPanel Orientation="Vertical" HorizontalAlignment="Center"  
                        Margin="20,50,0,0" VerticalAlignment="Center" Width="800">
                <Button x:Name="EnableMicrophoneButton" Content="Enable Microphone"  
                        Margin="0,0,10,0" Click="EnableMicrophone_ButtonClicked" 
                        Height="35"/>
                <Button x:Name="ListenButton" Content="Talk to your bot" 
                        Margin="0,10,10,0" Click="ListenButton_ButtonClicked" 
                        Height="35"/>
                <StackPanel x:Name="StatusPanel" Orientation="Vertical" 
                            RelativePanel.AlignBottomWithPanel="True" 
                            RelativePanel.AlignRightWithPanel="True" 
                            RelativePanel.AlignLeftWithPanel="True">
                    <TextBlock x:Name="StatusLabel" Margin="0,10,10,0" 
                               TextWrapping="Wrap" Text="Status:" FontSize="20"/>
                    <Border x:Name="StatusBorder" Margin="0,0,0,0">
                        <ScrollViewer VerticalScrollMode="Auto"  
                                      VerticalScrollBarVisibility="Auto" MaxHeight="200">
                            <!-- Use LiveSetting to enable screen readers to announce 
                                 the status update. -->
                            <TextBlock 
                                x:Name="StatusBlock" FontWeight="Bold" 
                                AutomationProperties.LiveSetting="Assertive"
                                MaxWidth="{Binding ElementName=Splitter, Path=ActualWidth}" 
                                Margin="10,10,10,20" TextWrapping="Wrap"  />
                        </ScrollViewer>
                    </Border>
                </StackPanel>
            </StackPanel>
            <MediaElement x:Name="mediaElement"/>
        </Grid>
    </Page>
    

The Design view is updated to show the application's user interface.

C# code-behind source

Then you add the code-behind source so that the application works as expected. The code-behind source includes:

  • using statements for the Speech and Speech.Dialog namespaces
  • A simple implementation to ensure microphone access, wired to a button handler
  • Basic UI helpers to present messages and errors in the application
  • A landing point for the initialization code path that will be populated later
  • A helper to play back text-to-speech (without streaming support)
  • An empty button handler to start listening that will be populated later

To add the code-behind source, follow these steps:

  1. In Solution Explorer, open the code-behind source file MainPage.xaml.cs. (It's grouped under MainPage.xaml.)

  2. Replace the file's contents with the following code snippet:

    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Dialog;
    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Text;
    using Windows.Foundation;
    using Windows.Storage.Streams;
    using Windows.UI.Xaml;
    using Windows.UI.Xaml.Controls;
    using Windows.UI.Xaml.Media;
    
    namespace helloworld
    {
        public sealed partial class MainPage : Page
        {
            private DialogServiceConnector connector;
    
            private enum NotifyType
            {
                StatusMessage,
                ErrorMessage
            };
    
            public MainPage()
            {
                this.InitializeComponent();
            }
    
            private async void EnableMicrophone_ButtonClicked(
                object sender, RoutedEventArgs e)
            {
                bool isMicAvailable = true;
                try
                {
                    var mediaCapture = new Windows.Media.Capture.MediaCapture();
                    var settings = 
                        new Windows.Media.Capture.MediaCaptureInitializationSettings();
                    settings.StreamingCaptureMode = 
                        Windows.Media.Capture.StreamingCaptureMode.Audio;
                    await mediaCapture.InitializeAsync(settings);
                }
                catch (Exception)
                {
                    isMicAvailable = false;
                }
                if (!isMicAvailable)
                {
                    await Windows.System.Launcher.LaunchUriAsync(
                        new Uri("ms-settings:privacy-microphone"));
                }
                else
                {
                    NotifyUser("Microphone was enabled", NotifyType.StatusMessage);
                }
            }
    
            private void NotifyUser(
                string strMessage, NotifyType type = NotifyType.StatusMessage)
            {
                // If called from the UI thread, then update immediately.
                // Otherwise, schedule a task on the UI thread to perform the update.
                if (Dispatcher.HasThreadAccess)
                {
                    UpdateStatus(strMessage, type);
                }
                else
                {
                    var task = Dispatcher.RunAsync(
                        Windows.UI.Core.CoreDispatcherPriority.Normal, 
                        () => UpdateStatus(strMessage, type));
                }
            }
    
            private void UpdateStatus(string strMessage, NotifyType type)
            {
                switch (type)
                {
                    case NotifyType.StatusMessage:
                        StatusBorder.Background = new SolidColorBrush(
                            Windows.UI.Colors.Green);
                        break;
                    case NotifyType.ErrorMessage:
                        StatusBorder.Background = new SolidColorBrush(
                            Windows.UI.Colors.Red);
                        break;
                }
                StatusBlock.Text += string.IsNullOrEmpty(StatusBlock.Text) 
                    ? strMessage : "\n" + strMessage;
    
                if (!string.IsNullOrEmpty(StatusBlock.Text))
                {
                    StatusBorder.Visibility = Visibility.Visible;
                    StatusPanel.Visibility = Visibility.Visible;
                }
                else
                {
                    StatusBorder.Visibility = Visibility.Collapsed;
                    StatusPanel.Visibility = Visibility.Collapsed;
                }
                // Raise an event if necessary to enable a screen reader 
                // to announce the status update.
                var peer = Windows.UI.Xaml.Automation.Peers.FrameworkElementAutomationPeer.FromElement(StatusBlock);
                if (peer != null)
                {
                    peer.RaiseAutomationEvent(
                        Windows.UI.Xaml.Automation.Peers.AutomationEvents.LiveRegionChanged);
                }
            }
    
            // Waits for and accumulates all audio associated with a given
            // PullAudioOutputStream and then plays it to the MediaElement. Long spoken
            // audio adds latency; a streaming playback solution (one that plays audio
            // as it continues to be received) should be used instead -- see the
            // samples for an example.
            private void SynchronouslyPlayActivityAudio(
                PullAudioOutputStream activityAudio)
            {
                var playbackStreamWithHeader = new MemoryStream();
                playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("RIFF"), 0, 4); // ChunkID
                playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // ChunkSize: max
                playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("WAVE"), 0, 4); // Format
                playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("fmt "), 0, 4); // Subchunk1ID
                playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 4); // Subchunk1Size: PCM
                playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // AudioFormat: PCM
                playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // NumChannels: mono
                playbackStreamWithHeader.Write(BitConverter.GetBytes(16000), 0, 4); // SampleRate: 16kHz
                playbackStreamWithHeader.Write(BitConverter.GetBytes(32000), 0, 4); // ByteRate
                playbackStreamWithHeader.Write(BitConverter.GetBytes(2), 0, 2); // BlockAlign
                playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 2); // BitsPerSample: 16-bit
                playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("data"), 0, 4); // Subchunk2ID
                playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // Subchunk2Size
    
                byte[] pullBuffer = new byte[2056];
    
                uint lastRead = 0;
                do
                {
                    lastRead = activityAudio.Read(pullBuffer);
                    playbackStreamWithHeader.Write(pullBuffer, 0, (int)lastRead);
                }
                while (lastRead == pullBuffer.Length);
    
                var task = Dispatcher.RunAsync(
                    Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
                {
                    mediaElement.SetSource(
                        playbackStreamWithHeader.AsRandomAccessStream(), "audio/wav");
                    mediaElement.Play();
                });
            }
    
            private void InitializeDialogServiceConnector()
            {
                // New code will go here
            }
    
            private async void ListenButton_ButtonClicked(
                object sender, RoutedEventArgs e)
            {
                // New code will go here
            }
        }
    }
    
  3. Add the following code snippet to the method body of InitializeDialogServiceConnector. This code creates the DialogServiceConnector with your subscription information.

    // create a DialogServiceConfig by providing a bot secret key 
    // and Cognitive Services subscription key
    // the RecoLanguage property is optional (default en-US); 
    // note that only en-US is supported in Preview
    const string channelSecret = "YourChannelSecret"; // Your channel secret
    const string speechSubscriptionKey = "YourSpeechSubscriptionKey"; // Your subscription key
    
    // Your subscription service region. 
    // Note: only a subset of regions are currently supported
    const string region = "YourServiceRegion"; 
    
    var botConfig = DialogServiceConfig.FromBotSecret(
        channelSecret, speechSubscriptionKey, region);
    botConfig.SetProperty(PropertyId.SpeechServiceConnection_RecoLanguage, "en-US");
    connector = new DialogServiceConnector(botConfig);
    

    Note

    Direct Line Speech (Preview) is currently available in a subset of Speech Services regions. Please refer to the list of supported regions for voice-first virtual assistants and ensure your resources are deployed in one of those regions.

    Note

    For information on configuring your bot and retrieving a channel secret, see the Bot Framework documentation for the Direct Line Speech channel.

  4. Replace the strings YourChannelSecret, YourSpeechSubscriptionKey, and YourServiceRegion with your own values for your bot, speech subscription, and region.

  5. Append the following code snippet to the end of the method body of InitializeDialogServiceConnector. This code sets up handlers for the events that DialogServiceConnector uses to report its bot activities, speech recognition results, and other information.

    // ActivityReceived is the main way your bot will communicate with the client 
    // and uses bot framework activities
    connector.ActivityReceived += async (sender, activityReceivedEventArgs) =>
    {
        NotifyUser(
            $"Activity received, hasAudio={activityReceivedEventArgs.HasAudio} activity={activityReceivedEventArgs.Activity}");
    
        if (activityReceivedEventArgs.HasAudio)
        {
            SynchronouslyPlayActivityAudio(activityReceivedEventArgs.Audio);
        }
    };
    
    // Canceled will be signaled when a turn is aborted or experiences an error condition
    connector.Canceled += (sender, canceledEventArgs) =>
    {
        NotifyUser($"Canceled, reason={canceledEventArgs.Reason}");
        if (canceledEventArgs.Reason == CancellationReason.Error)
        {
            NotifyUser(
                $"Error: code={canceledEventArgs.ErrorCode}, details={canceledEventArgs.ErrorDetails}");
        }
    };
    
    // Recognizing (not 'Recognized') will provide the intermediate recognized text 
    // while an audio stream is being processed
    connector.Recognizing += (sender, recognitionEventArgs) =>
    {
        NotifyUser($"Recognizing! in-progress text={recognitionEventArgs.Result.Text}");
    };
    
    // Recognized (not 'Recognizing') will provide the final recognized text 
    // once audio capture is completed
    connector.Recognized += (sender, recognitionEventArgs) =>
    {
        NotifyUser($"Final speech-to-text result: '{recognitionEventArgs.Result.Text}'");
    };
    
    // SessionStarted will notify when audio begins flowing to the service for a turn
    connector.SessionStarted += (sender, sessionEventArgs) =>
    {
        NotifyUser($"Now Listening! Session started, id={sessionEventArgs.SessionId}");
    };
    
    // SessionStopped will notify when a turn is complete and 
    // it's safe to begin listening again
    connector.SessionStopped += (sender, sessionEventArgs) =>
    {
        NotifyUser($"Listening complete. Session ended, id={sessionEventArgs.SessionId}");
    };
    
  6. Add the following code snippet to the body of the ListenButton_ButtonClicked method in the MainPage class. This code sets up DialogServiceConnector to listen, using the configuration and event handlers you established in the previous steps. (A sketch of the Microsoft.Bot.Schema alternative mentioned in the code comments appears after these steps.)

    if (connector == null)
    {
        InitializeDialogServiceConnector();
        // Optional step to speed up first interaction: if not called, 
        // connection happens automatically on first use
        var connectTask = connector.ConnectAsync();
    }
    
    try
    {
        // Start sending audio to your speech-enabled bot
        var listenTask = connector.ListenOnceAsync();
    
        // You can also send activities to your bot as JSON strings -- 
        // Microsoft.Bot.Schema can simplify this
        string speakActivity = 
            @"{""type"":""message"",""text"":""Greeting Message"", ""speak"":""Hello there!""}";
        await connector.SendActivityAsync(speakActivity);
    
    }
    catch (Exception ex)
    {
        NotifyUser($"Exception: {ex.ToString()}", NotifyType.ErrorMessage);
    }
    
  7. From the menu bar, choose File > Save All to save your changes.
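
The code comment in step 6 notes that Microsoft.Bot.Schema can simplify building activities. The following sketch shows what that alternative might look like; it assumes you add the Microsoft.Bot.Schema and Newtonsoft.Json NuGet packages, which this quickstart doesn't install:

    // Hypothetical alternative to the raw JSON string in step 6:
    // build the activity object, then serialize it before sending.
    var activity = new Microsoft.Bot.Schema.Activity
    {
        Type = "message",
        Text = "Greeting Message",
        Speak = "Hello there!"
    };
    string speakActivity = Newtonsoft.Json.JsonConvert.SerializeObject(activity);
    await connector.SendActivityAsync(speakActivity);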

Build and run the application

Now you are ready to build and test your application.

  1. From the menu bar, choose Build > Build Solution to build the application. The code should compile without errors.

  2. Choose Debug > Start Debugging (or press F5) to start the application. The helloworld window appears.

    Sample UWP virtual assistant application in C# - quickstart

  3. Select Enable Microphone, and when the access permission request pops up, select Yes.

    Microphone access permission request

  4. Select Talk to your bot, and speak an English phrase or sentence into your device's microphone. Your speech is transmitted to the Direct Line Speech channel and transcribed to text, which appears in the window.

Next steps

See also