Quickstart: Recognize speech in a UWP app by using the Speech SDK

Quickstarts are also available for text-to-speech, speech translation, and voice-first virtual assistant.

If desired, you can choose a different programming language or environment for this quickstart.

In this article, you develop a C# application for the Universal Windows Platform (UWP; Windows version 1709 or later) by using the Cognitive Services Speech SDK. The application transcribes speech to text in real time from your device's microphone. It is built with the Speech SDK NuGet package and Microsoft Visual Studio 2017 or later (any edition).

Note

The Universal Windows Platform lets you develop apps that run on any device that supports Windows 10, including PCs, Xbox, Surface Hub, and other devices.

Prerequisites

This quickstart requires a subscription key for the Speech Services (you enter it, along with its service region, in the Add sample code section below) and a Windows 10 PC with a working microphone.

Create a Visual Studio project

  1. Start Visual Studio 2017 or later.

  2. Make sure the Universal Windows Platform development workload is available. Choose Tools > Get Tools and Features from the Visual Studio menu bar to open the Visual Studio installer. If this workload is already enabled, close the dialog box.

    Screenshot of Visual Studio installer, with Workloads tab highlighted

    Otherwise, select the box next to Universal Windows Platform development, and select Modify at the lower-right corner of the dialog box. Installation of the new feature takes a moment.

  3. Create a blank Visual C# Universal Windows app. First, choose File > New > Project from the menu. In the New Project dialog box, expand Installed > Visual C# > Windows Universal in the left pane. Then select Blank App (Universal Windows). For the project name, enter helloworld.

    Screenshot of New Project dialog box

  4. The Speech SDK requires that your application be built for the Windows 10 Fall Creators Update or later. In the New Universal Windows Platform Project window that appears, choose Windows 10 Fall Creators Update (10.0; Build 16299) as the Minimum version. In the Target version box, select that version or any later version, and then select OK.

    Screenshot of the New Universal Windows Platform Project window
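
    Your choice is recorded in the application manifest. If you later open Package.appxmanifest in the XML editor, you should see a dependency similar to the following (the MaxVersionTested value shown here is an example; it reflects the target version you selected):

    <Dependencies>
      <TargetDeviceFamily Name="Windows.Universal" MinVersion="10.0.16299.0" MaxVersionTested="10.0.17763.0" />
    </Dependencies>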

  5. If you're running 64-bit Windows, you can switch your build platform to x64 by using the drop-down menu in the Visual Studio toolbar. (64-bit Windows can run 32-bit applications, so you can leave it set to x86 if you prefer.)

    Screenshot of Visual Studio toolbar, with x64 highlighted

    Note

    The Speech SDK supports only Intel-compatible processors. ARM is currently not supported.

  6. Install and reference the Speech SDK NuGet package. In Solution Explorer, right-click the solution, and select Manage NuGet Packages for Solution.

    Screenshot of Solution Explorer, with Manage NuGet Packages for Solution option highlighted

  7. In the upper-right corner, in the Package Source field, select nuget.org. Search for the Microsoft.CognitiveServices.Speech package, and install it into the helloworld project.

    Screenshot of Manage Packages for Solution dialog box

  8. Accept the displayed license to begin installation of the NuGet package.

    Screenshot of License Acceptance dialog box

  9. The following output line appears in the Package Manager console.

    Successfully installed 'Microsoft.CognitiveServices.Speech 1.5.0' to helloworld
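
    If you prefer to manage packages by hand, you can add the same reference directly to helloworld.csproj instead; a minimal sketch, assuming the 1.5.0 version shown above:

    <!-- Inside an existing <ItemGroup> in helloworld.csproj -->
    <PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.5.0" />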
    
  10. Because the application uses the microphone for speech input, add the Microphone capability to the project. In Solution Explorer, double-click Package.appxmanifest to edit your application manifest. Then switch to the Capabilities tab, select the box for the Microphone capability, and save your changes.

    Screenshot of Visual Studio application manifest, with Capabilities and Microphone highlighted
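
    Selecting the check box adds a device capability entry to the manifest. In the XML view of Package.appxmanifest, it should look similar to this (internetClient is enabled by default in new projects and is also required, because audio is sent to the service):

    <Capabilities>
      <Capability Name="internetClient" />
      <DeviceCapability Name="microphone" />
    </Capabilities>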

Add sample code

  1. The application's user interface is defined by using XAML. Open MainPage.xaml in Solution Explorer. In the designer's XAML view, insert the following XAML snippet into the Grid tag (between <Grid> and </Grid>).

    <StackPanel Orientation="Vertical" HorizontalAlignment="Center"  Margin="20,50,0,0" VerticalAlignment="Center" Width="800">
        <Button x:Name="EnableMicrophoneButton" Content="Enable Microphone"  Margin="0,0,10,0" Click="EnableMicrophone_ButtonClicked" Height="35"/>
        <Button x:Name="SpeechRecognitionButton" Content="Speech recognition with microphone input" Margin="0,10,10,0" Click="SpeechRecognitionFromMicrophone_ButtonClicked" Height="35"/>
        <StackPanel x:Name="StatusPanel" Orientation="Vertical" RelativePanel.AlignBottomWithPanel="True" RelativePanel.AlignRightWithPanel="True" RelativePanel.AlignLeftWithPanel="True">
            <TextBlock x:Name="StatusLabel" Margin="0,10,10,0" TextWrapping="Wrap" Text="Status:" FontSize="20"/>
            <Border x:Name="StatusBorder" Margin="0,0,0,0">
                <ScrollViewer VerticalScrollMode="Auto"  VerticalScrollBarVisibility="Auto" MaxHeight="200">
                    <!-- Use LiveSetting to enable screen readers to announce the status update. -->
                    <TextBlock x:Name="StatusBlock" FontWeight="Bold" AutomationProperties.LiveSetting="Assertive"
                    MaxWidth="{Binding ElementName=Splitter, Path=ActualWidth}" Margin="10,10,10,20" TextWrapping="Wrap"  />
                </ScrollViewer>
            </Border>
        </StackPanel>
    </StackPanel>
    
  2. Open the code-behind source file MainPage.xaml.cs (find it grouped under MainPage.xaml). Replace all the code in it with the following.

    using System;
    using System.Text;
    using Windows.UI.Xaml;
    using Windows.UI.Xaml.Controls;
    using Windows.UI.Xaml.Media;
    using Microsoft.CognitiveServices.Speech;
    
    namespace helloworld
    {
        /// <summary>
        /// An empty page that can be used on its own or navigated to within a Frame.
        /// </summary>
        public sealed partial class MainPage : Page
        {
            public MainPage()
            {
                this.InitializeComponent();
            }
    
            private async void EnableMicrophone_ButtonClicked(object sender, RoutedEventArgs e)
            {
                bool isMicAvailable = true;
                try
                {
                    var mediaCapture = new Windows.Media.Capture.MediaCapture();
                    var settings = new Windows.Media.Capture.MediaCaptureInitializationSettings();
                    settings.StreamingCaptureMode = Windows.Media.Capture.StreamingCaptureMode.Audio;
                    await mediaCapture.InitializeAsync(settings);
                }
                catch (Exception)
                {
                    isMicAvailable = false;
                }
                if (!isMicAvailable)
                {
                    await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-microphone"));
                }
                else
                {
                    NotifyUser("Microphone was enabled", NotifyType.StatusMessage);
                }
            }
    
            private async void SpeechRecognitionFromMicrophone_ButtonClicked(object sender, RoutedEventArgs e)
            {
                // Creates an instance of a speech config with specified subscription key and service region.
                // Replace with your own subscription key and service region (e.g., "westus").
                var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
                try
                {
                    // Creates a speech recognizer using microphone as audio input.
                    using (var recognizer = new SpeechRecognizer(config))
                    {
                        // Starts speech recognition, and returns after a single utterance is recognized. The end of a
                        // single utterance is determined by listening for silence at the end or until a maximum of 15
                        // seconds of audio is processed.  The task returns the recognition text as result.
                        // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
                        // shot recognition like command or query.
                        // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead (a sketch of that pattern follows these steps).
                        var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
    
                        // Checks result.
                        StringBuilder sb = new StringBuilder();
                        if (result.Reason == ResultReason.RecognizedSpeech)
                        {
                            sb.AppendLine($"RECOGNIZED: Text={result.Text}");
                        }
                        else if (result.Reason == ResultReason.NoMatch)
                        {
                            sb.AppendLine($"NOMATCH: Speech could not be recognized.");
                        }
                        else if (result.Reason == ResultReason.Canceled)
                        {
                            var cancellation = CancellationDetails.FromResult(result);
                            sb.AppendLine($"CANCELED: Reason={cancellation.Reason}");
    
                            if (cancellation.Reason == CancellationReason.Error)
                            {
                                sb.AppendLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                                sb.AppendLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                                sb.AppendLine($"CANCELED: Did you update the subscription info?");
                            }
                        }
    
                        // Update the UI
                        NotifyUser(sb.ToString(), NotifyType.StatusMessage);
                    }
                }
                catch(Exception ex)
                {
                    NotifyUser($"Enable Microphone First.\n {ex.ToString()}", NotifyType.ErrorMessage);
                }
            }
    
            private enum NotifyType
            {
                StatusMessage,
                ErrorMessage
            };
    
            private void NotifyUser(string strMessage, NotifyType type)
            {
                // If called from the UI thread, then update immediately.
                // Otherwise, schedule a task on the UI thread to perform the update.
                if (Dispatcher.HasThreadAccess)
                {
                    UpdateStatus(strMessage, type);
                }
                else
                {
                    var task = Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () => UpdateStatus(strMessage, type));
                }
            }
    
            private void UpdateStatus(string strMessage, NotifyType type)
            {
                switch (type)
                {
                    case NotifyType.StatusMessage:
                        StatusBorder.Background = new SolidColorBrush(Windows.UI.Colors.Green);
                        break;
                    case NotifyType.ErrorMessage:
                        StatusBorder.Background = new SolidColorBrush(Windows.UI.Colors.Red);
                        break;
                }
                StatusBlock.Text += string.IsNullOrEmpty(StatusBlock.Text) ? strMessage : "\n" + strMessage;
    
                // Collapse the StatusBlock if it has no text to conserve real estate.
                if (!string.IsNullOrEmpty(StatusBlock.Text))
                {
                    StatusBorder.Visibility = Visibility.Visible;
                    StatusPanel.Visibility = Visibility.Visible;
                }
                else
                {
                    StatusBorder.Visibility = Visibility.Collapsed;
                    StatusPanel.Visibility = Visibility.Collapsed;
                }
                // Raise an event if necessary to enable a screen reader to announce the status update.
                var peer = Windows.UI.Xaml.Automation.Peers.FrameworkElementAutomationPeer.FromElement(StatusBlock);
                if (peer != null)
                {
                    peer.RaiseAutomationEvent(Windows.UI.Xaml.Automation.Peers.AutomationEvents.LiveRegionChanged);
                }
            }
        }
    }
    
  3. In the SpeechRecognitionFromMicrophone_ButtonClicked handler in this file, replace the string YourSubscriptionKey with your subscription key.

  4. In the SpeechRecognitionFromMicrophone_ButtonClicked handler, replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).
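
    After both replacements, the line should look like this example (the key shown here is a made-up placeholder, not a working key):

    var config = SpeechConfig.FromSubscription("0123456789abcdef0123456789abcdef", "westus");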

  5. Save all changes to the project.
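
As the comments in the sample note, RecognizeOnceAsync() returns after a single utterance, so it suits single-shot commands or queries. For long-running dictation, the Speech SDK provides an event-based continuous-recognition API instead. The following is a minimal sketch of that pattern, not part of the quickstart sample: RecognizeContinuouslyAsync is a hypothetical helper name, while the recognizer members it uses (Recognized, Canceled, StartContinuousRecognitionAsync, and StopContinuousRecognitionAsync) are Speech SDK API. It reuses the NotifyUser helper from the sample above.

    // Hypothetical helper demonstrating continuous recognition.
    private async System.Threading.Tasks.Task RecognizeContinuouslyAsync(SpeechConfig config)
    {
        using (var recognizer = new SpeechRecognizer(config))
        {
            // Raised once per recognized utterance with the final text.
            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    NotifyUser($"RECOGNIZED: Text={e.Result.Text}", NotifyType.StatusMessage);
                }
            };

            // Raised when recognition is canceled, for example on an invalid key.
            recognizer.Canceled += (s, e) =>
            {
                NotifyUser($"CANCELED: Reason={e.Reason}", NotifyType.ErrorMessage);
            };

            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // Listen for 30 seconds, then stop. A real app would stop on a
            // button click or a similar signal rather than a fixed delay.
            await System.Threading.Tasks.Task.Delay(TimeSpan.FromSeconds(30)).ConfigureAwait(false);

            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }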

Build and run the app

  1. Build the application. From the menu bar, select Build > Build Solution. The code should compile without errors.

    Screenshot of Visual Studio application, with Build Solution option highlighted
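
    If you prefer the command line, you can also build the solution from a Developer Command Prompt for Visual Studio; a sketch, assuming the Debug configuration and the x64 platform selected earlier:

    msbuild helloworld.sln /t:Build /p:Configuration=Debug /p:Platform=x64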

  2. Start the application. From the menu bar, select Debug > Start Debugging, or press F5.

    Screenshot of Visual Studio application, with Start Debugging option highlighted

  3. A window pops up. Select Enable Microphone, and acknowledge the permission request that pops up.

    Screenshot of permission request

  4. Select Speech recognition with microphone input, and speak an English phrase or sentence into your device's microphone. Your speech is transmitted to the Speech Services and transcribed to text, which appears in the window.

    Screenshot of speech recognition user interface

Next steps

See also