In this quickstart, you'll use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.
After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:
Create a SpeechConfig object from your subscription key and region.
Create a SpeechRecognizer object using the SpeechConfig object from above.
Using the SpeechRecognizer object, start the recognition process for a single utterance.
Inspect the SpeechRecognitionResult returned.
If you prefer to jump right in, view or download all Speech SDK C# Samples on GitHub. Otherwise, let's get started.
Make sure that you have access to a microphone for audio capture
Open your project in Visual Studio
The first step is to make sure that you have your project open in Visual Studio.
Launch Visual Studio 2019.
Load your project and open Program.cs.
Start with some boilerplate code
Let's add some code that works as a skeleton for our project. Note that it includes an async method called RecognizeSpeechAsync().
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
namespace helloworld
{
class Program
{
public static async Task RecognizeSpeechAsync()
{
}
static void Main()
{
RecognizeSpeechAsync().Wait();
Console.WriteLine("Please press <Return> to continue.");
Console.ReadLine();
}
}
}
Create a Speech configuration
Before you can initialize a SpeechRecognizer object, you need to create a configuration that uses your subscription key and subscription region. Insert this code in the RecognizeSpeechAsync() method.
Note
This sample uses the FromSubscription() method to build the SpeechConfig. For a full list of available methods, see SpeechConfig Class.
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
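The note above mentions other factory methods. For instance, if you authenticate with an authorization token rather than a subscription key, the configuration could be built like this (a minimal sketch for illustration; it isn't one of this quickstart's steps):
var config = SpeechConfig.FromAuthorizationToken("YourAuthorizationToken", "YourServiceRegion");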
The Speech SDK defaults to recognizing US English (en-US). For information on choosing the source language, see Specify source language for speech to text.
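For example, to recognize German instead of the default, you could set the recognition language on the configuration before creating the recognizer (a minimal sketch; "de-DE" is only an illustrative locale):
config.SpeechRecognitionLanguage = "de-DE";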
Initialize a SpeechRecognizer
Now, let's create a SpeechRecognizer. This object is created inside of a using statement to ensure the proper release of unmanaged resources. Insert this code in the RecognizeSpeechAsync() method, right below your Speech configuration.
using (var recognizer = new SpeechRecognizer(config))
{
}
Recognize a phrase
From the SpeechRecognizer object, you're going to call the RecognizeOnceAsync() method. This method lets the Speech service know that you're sending a single phrase for recognition, and that recognition should stop once the phrase is identified.
Inside the using statement, add this code:
var result = await recognizer.RecognizeOnceAsync();
Display the recognition results (or errors)
When the recognition result is returned by the Speech service, you'll want to do something with it. We're going to keep it simple and print the result to the console.
Inside the using statement, below RecognizeOnceAsync(), add this code:
if (result.Reason == ResultReason.RecognizedSpeech)
{
Console.WriteLine($"We recognized: {result.Text}");
}
else if (result.Reason == ResultReason.NoMatch)
{
Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
else if (result.Reason == ResultReason.Canceled)
{
var cancellation = CancellationDetails.FromResult(result);
Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
if (cancellation.Reason == CancellationReason.Error)
{
Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
Console.WriteLine($"CANCELED: Did you update the subscription info?");
}
}
Check your code
At this point, your code should look like this:
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
namespace helloworld
{
class Program
{
public static async Task RecognizeSpeechAsync()
{
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
using (var recognizer = new SpeechRecognizer(config))
{
var result = await recognizer.RecognizeOnceAsync();
if (result.Reason == ResultReason.RecognizedSpeech)
{
Console.WriteLine($"We recognized: {result.Text}");
}
else if (result.Reason == ResultReason.NoMatch)
{
Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
else if (result.Reason == ResultReason.Canceled)
{
var cancellation = CancellationDetails.FromResult(result);
Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
if (cancellation.Reason == CancellationReason.Error)
{
Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
Console.WriteLine($"CANCELED: Did you update the subscription info?");
}
}
}
}
static void Main()
{
RecognizeSpeechAsync().Wait();
Console.WriteLine("Please press <Return> to continue.");
Console.ReadLine();
}
}
}
Build and run your app
Now you're ready to build your app and test speech recognition using the Speech service.
Compile the code - From the menu bar of Visual Studio, choose Build > Build Solution.
Start your app - From the menu bar, choose Debug > Start Debugging or press F5.
Start recognition - Speak an English phrase or sentence. Your speech is sent to the Speech service, transcribed as text, and displayed in the console.
The Speech SDK for Unity supports Windows Desktop (x86 and x64), Universal Windows Platform (x86, x64, ARM/ARM64), Android (x86, ARM32/64), and iOS (x64 simulator, ARM32 and ARM64).
Make sure that you have access to a microphone for audio capture
If you've already done this, great. Let's keep going.
Create a Unity project
Open Unity. If you're using Unity for the first time, the Unity Hub window appears. (You can also open Unity Hub directly to get to this window.)
Select New. The Create a new project with Unity window appears.
In Project Name, enter csharp-unity.
In Templates, if 3D isn't already selected, select it.
In Location, select or create a folder to save the project in.
Select Create.
After a bit of time, the Unity Editor window appears.
Add UI
Now let's add a minimal UI to our scene. This UI consists of a button to trigger speech recognition and a text field to display the result. The Hierarchy window shows the sample scene that Unity created with the new project.
At the top of the Hierarchy window, select Create > UI > Button.
This action creates three game objects that you can see in the Hierarchy window: a Button object, a Canvas object containing the button, and an EventSystem object.
In the Inspector window (by default on the right), set the Pos X and Pos Y properties to 0, so the button is centered in the middle of the canvas.
In the Hierarchy window, select Create > UI > Text to create a Text object.
In the Inspector window, set the Pos X and Pos Y properties to 0 and 120, and set the Width and Height properties to 240 and 120. These values ensure that the text field and the button don't overlap.
When you're done, the Scene view should look similar to this screenshot:
Add the sample code
To add the sample script code for the Unity project, follow these steps:
In the Project window, select Create > C# script to add a new C# script.
Name the script HelloWorld.
Double-click HelloWorld to edit the newly created script.
Note
To configure the code editor to be used by Unity for editing, select Edit > Preferences, and then go to the External Tools preferences. For more information, see the Unity User Manual.
Replace the existing script with the following code:
using UnityEngine;
using UnityEngine.UI;
using Microsoft.CognitiveServices.Speech;
#if PLATFORM_ANDROID
using UnityEngine.Android;
#endif
#if PLATFORM_IOS
using UnityEngine.iOS;
using System.Collections;
#endif
public class HelloWorld : MonoBehaviour
{
// Hook up the two properties below with a Text and Button object in your UI.
public Text outputText;
public Button startRecoButton;
private object threadLocker = new object();
private bool waitingForReco;
private string message;
private bool micPermissionGranted = false;
#if PLATFORM_ANDROID || PLATFORM_IOS
// Required to manifest microphone permission, cf.
// https://docs.unity3d.com/Manual/android-manifest.html
private Microphone mic;
#endif
public async void ButtonClick()
{
// Creates an instance of a speech config with specified subscription key and service region.
// Replace with your own subscription key and service region (e.g., "westus").
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
// Make sure to dispose the recognizer after use!
using (var recognizer = new SpeechRecognizer(config))
{
lock (threadLocker)
{
waitingForReco = true;
}
// Starts speech recognition, and returns after a single utterance is recognized. The end of a
// single utterance is determined by listening for silence at the end or until a maximum of 15
// seconds of audio is processed. The task returns the recognition text as result.
// Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
// shot recognition like command or query.
// For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
// Checks result.
string newMessage = string.Empty;
if (result.Reason == ResultReason.RecognizedSpeech)
{
newMessage = result.Text;
}
else if (result.Reason == ResultReason.NoMatch)
{
newMessage = "NOMATCH: Speech could not be recognized.";
}
else if (result.Reason == ResultReason.Canceled)
{
var cancellation = CancellationDetails.FromResult(result);
newMessage = $"CANCELED: Reason={cancellation.Reason} ErrorDetails={cancellation.ErrorDetails}";
}
lock (threadLocker)
{
message = newMessage;
waitingForReco = false;
}
}
}
void Start()
{
if (outputText == null)
{
UnityEngine.Debug.LogError("outputText property is null! Assign a UI Text element to it.");
}
else if (startRecoButton == null)
{
message = "startRecoButton property is null! Assign a UI Button to it.";
UnityEngine.Debug.LogError(message);
}
else
{
// Continue with normal initialization, Text and Button objects are present.
#if PLATFORM_ANDROID
// Request to use the microphone, cf.
// https://docs.unity3d.com/Manual/android-RequestingPermissions.html
message = "Waiting for mic permission";
if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
{
Permission.RequestUserPermission(Permission.Microphone);
}
#elif PLATFORM_IOS
if (!Application.HasUserAuthorization(UserAuthorization.Microphone))
{
Application.RequestUserAuthorization(UserAuthorization.Microphone);
}
#else
micPermissionGranted = true;
message = "Click button to recognize speech";
#endif
startRecoButton.onClick.AddListener(ButtonClick);
}
}
void Update()
{
#if PLATFORM_ANDROID
if (!micPermissionGranted && Permission.HasUserAuthorizedPermission(Permission.Microphone))
{
micPermissionGranted = true;
message = "Click button to recognize speech";
}
#elif PLATFORM_IOS
if (!micPermissionGranted && Application.HasUserAuthorization(UserAuthorization.Microphone))
{
micPermissionGranted = true;
message = "Click button to recognize speech";
}
#endif
lock (threadLocker)
{
if (startRecoButton != null)
{
startRecoButton.interactable = !waitingForReco && micPermissionGranted;
}
if (outputText != null)
{
outputText.text = message;
}
}
}
}
Find and replace the string YourSubscriptionKey with your Speech service subscription key.
Find and replace the string YourServiceRegion with the region associated with your subscription. For example, if you're using the free trial, the region is westus.
Save the changes to the script.
Now return to the Unity Editor and add the script as a component to one of your game objects:
In the Hierarchy window, select the Canvas object.
In the Inspector window, select the Add Component button.
In the drop-down list, search for the HelloWorld script we created above and add it. A Hello World (Script) section appears in the Inspector window, listing two uninitialized properties, Output Text and Start Reco Button. These Unity component properties match public properties of the HelloWorld class.
Select the Start Reco Button property's object picker (the small circle icon to the right of the property), and choose the Button object you created earlier.
Select the Output Text property's object picker, and choose the Text object you created earlier.
Note
The button also has a nested text object. Make sure you do not accidentally pick it for text output (or rename one of the text objects using the Name field in the Inspector window to avoid confusion).
Run the application in the Unity Editor
Now you're ready to run the application within the Unity Editor.
In the Unity Editor toolbar (below the menu bar), select the Play button (a right-pointing triangle).
Go to Game view, and wait for the Text object to display Click button to recognize speech. (It displays New Text when the application hasn't started or isn't ready to respond.)
Select the button and speak an English phrase or sentence into your computer's microphone. Your speech is transmitted to the Speech service and transcribed to text, which appears in the Game view.
Check the Console window for debug messages. If the Console window isn't showing, go to the menu bar and select Window > General > Console to display it.
When you're done recognizing speech, select the Play button in the Unity Editor toolbar to stop the application.
Additional options to run this application
This application can also be deployed as an Android app, a Windows stand-alone app, or a UWP application.
For more information, see our sample repository. The quickstart/csharp-unity folder describes the configuration for these additional targets.
In Solution Explorer, open the code-behind source file MainPage.xaml.cs. (It's grouped under MainPage.xaml.)
Replace the code with the following base code:
using System;
using System.Text;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media;
using Microsoft.CognitiveServices.Speech;
namespace helloworld
{
/// <summary>
/// An empty page that can be used on its own or navigated to within a Frame.
/// </summary>
public sealed partial class MainPage : Page
{
public MainPage()
{
this.InitializeComponent();
}
private async void EnableMicrophone_ButtonClicked(object sender, RoutedEventArgs e)
{
bool isMicAvailable = true;
try
{
var mediaCapture = new Windows.Media.Capture.MediaCapture();
var settings = new Windows.Media.Capture.MediaCaptureInitializationSettings();
settings.StreamingCaptureMode = Windows.Media.Capture.StreamingCaptureMode.Audio;
await mediaCapture.InitializeAsync(settings);
}
catch (Exception)
{
isMicAvailable = false;
}
if (!isMicAvailable)
{
await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-microphone"));
}
else
{
NotifyUser("Microphone was enabled", NotifyType.StatusMessage);
}
}
private async void SpeechRecognitionFromMicrophone_ButtonClicked(object sender, RoutedEventArgs e)
{
try
{
}
catch(Exception ex)
{
NotifyUser($"Enable Microphone First.\n {ex.ToString()}", NotifyType.ErrorMessage);
}
}
private enum NotifyType
{
StatusMessage,
ErrorMessage
};
private void NotifyUser(string strMessage, NotifyType type)
{
// If called from the UI thread, then update immediately.
// Otherwise, schedule a task on the UI thread to perform the update.
if (Dispatcher.HasThreadAccess)
{
UpdateStatus(strMessage, type);
}
else
{
var task = Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () => UpdateStatus(strMessage, type));
}
}
private void UpdateStatus(string strMessage, NotifyType type)
{
switch (type)
{
case NotifyType.StatusMessage:
StatusBorder.Background = new SolidColorBrush(Windows.UI.Colors.Green);
break;
case NotifyType.ErrorMessage:
StatusBorder.Background = new SolidColorBrush(Windows.UI.Colors.Red);
break;
}
StatusBlock.Text += string.IsNullOrEmpty(StatusBlock.Text) ? strMessage : "\n" + strMessage;
// Collapse the StatusBlock if it has no text to conserve real estate.
StatusBorder.Visibility = !string.IsNullOrEmpty(StatusBlock.Text) ? Visibility.Visible : Visibility.Collapsed;
if (!string.IsNullOrEmpty(StatusBlock.Text))
{
StatusBorder.Visibility = Visibility.Visible;
StatusPanel.Visibility = Visibility.Visible;
}
else
{
StatusBorder.Visibility = Visibility.Collapsed;
StatusPanel.Visibility = Visibility.Collapsed;
}
// Raise an event if necessary to enable a screen reader to announce the status update.
var peer = Windows.UI.Xaml.Automation.Peers.FrameworkElementAutomationPeer.FromElement(StatusBlock);
if (peer != null)
{
peer.RaiseAutomationEvent(Windows.UI.Xaml.Automation.Peers.AutomationEvents.LiveRegionChanged);
}
}
}
}
Create a Speech configuration
Before you can initialize a SpeechRecognizer object, you need to create a configuration that uses your subscription key and subscription region. Insert this code in the try block of the SpeechRecognitionFromMicrophone_ButtonClicked() handler.
Note
This sample uses the FromSubscription() method to build the SpeechConfig. For a full list of available methods, see SpeechConfig Class.
// Creates an instance of a speech config with specified subscription key and service region.
// Replace with your own subscription key and service region (e.g., "westus").
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
Initialize a SpeechRecognizer
Now, let's create a SpeechRecognizer. This object is created inside of a using statement to ensure the proper release of unmanaged resources. Insert this code in the same handler, right below your Speech configuration.
using (var recognizer = new SpeechRecognizer(config))
{
}
Recognize a phrase
From the SpeechRecognizer object, you're going to call the RecognizeOnceAsync() method. This method lets the Speech service know that you're sending a single phrase for recognition, and that recognition should stop once the phrase is identified.
Inside the using statement, add this code:
var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
Display the recognition results (or errors)
When the recognition result is returned by the Speech service, you'll want to do something with it. We're going to keep it simple and display the result in the status panel. Inside the using statement, below RecognizeOnceAsync(), add this code:
// Checks result.
StringBuilder sb = new StringBuilder();
if (result.Reason == ResultReason.RecognizedSpeech)
{
sb.AppendLine($"RECOGNIZED: Text={result.Text}");
}
else if (result.Reason == ResultReason.NoMatch)
{
sb.AppendLine($"NOMATCH: Speech could not be recognized.");
}
else if (result.Reason == ResultReason.Canceled)
{
var cancellation = CancellationDetails.FromResult(result);
sb.AppendLine($"CANCELED: Reason={cancellation.Reason}");
if (cancellation.Reason == CancellationReason.Error)
{
sb.AppendLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
sb.AppendLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
sb.AppendLine($"CANCELED: Did you update the subscription info?");
}
}
// Update the UI
NotifyUser(sb.ToString(), NotifyType.StatusMessage);
Build and run the application
Now you are ready to build and test your application.
From the menu bar, choose Build > Build Solution to build the application. The code should compile without errors now.
Choose Debug > Start Debugging (or press F5) to start the application. The helloworld window appears.
Select Enable Microphone, and when the access permission request pops up, select Yes.
Select Speech recognition with microphone input, and speak an English phrase or sentence into your device's microphone. Your speech is transmitted to the Speech service and transcribed to text, which appears in the window.
Make sure that you have access to a microphone for audio capture
If you've already done this, great. Let's keep going.
Add sample code for the common helloworld project
The common helloworld project contains platform-independent implementations for your cross-platform application. Now add the XAML code that defines the user interface of the application, and add the C# code-behind implementation.
In Solution Explorer, under the common helloworld project, open MainPage.xaml.
In the designer's XAML view, insert the following XAML snippet into the Grid tag between <StackLayout> and </StackLayout>:
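A minimal sketch of such a snippet, using the element names that the code-behind below expects (RecognitionText and the two button click handlers), might look like this:
<Button Text="Enable Microphone" Clicked="OnEnableMicrophoneButtonClicked"/>
<Button Text="Start Speech recognition" Clicked="OnRecognitionButtonClicked"/>
<Label x:Name="RecognitionText" Text="Recognition result"/>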
In Solution Explorer, open the code-behind source file MainPage.xaml.cs. It's grouped under MainPage.xaml.
Replace all the code in it with the following snippet:
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Xamarin.Forms;
using Microsoft.CognitiveServices.Speech;
namespace helloworld
{
public partial class MainPage : ContentPage
{
public MainPage()
{
InitializeComponent();
}
private async void OnRecognitionButtonClicked(object sender, EventArgs e)
{
try
{
// Creates an instance of a speech config with specified subscription key and service region.
// Replace with your own subscription key and service region (e.g., "westus").
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
// Creates a speech recognizer using microphone as audio input.
using (var recognizer = new SpeechRecognizer(config))
{
// Starts speech recognition, and returns after a single utterance is recognized. The end of a
// single utterance is determined by listening for silence at the end or until a maximum of 15
// seconds of audio is processed. The task returns the recognition text as result.
// Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
// shot recognition like command or query.
// For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
// Checks result.
StringBuilder sb = new StringBuilder();
if (result.Reason == ResultReason.RecognizedSpeech)
{
sb.AppendLine($"RECOGNIZED: Text={result.Text}");
}
else if (result.Reason == ResultReason.NoMatch)
{
sb.AppendLine($"NOMATCH: Speech could not be recognized.");
}
else if (result.Reason == ResultReason.Canceled)
{
var cancellation = CancellationDetails.FromResult(result);
sb.AppendLine($"CANCELED: Reason={cancellation.Reason}");
if (cancellation.Reason == CancellationReason.Error)
{
sb.AppendLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
sb.AppendLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
sb.AppendLine($"CANCELED: Did you update the subscription info?");
}
}
UpdateUI(sb.ToString());
}
}
catch (Exception ex)
{
UpdateUI("Exception: " + ex.ToString());
}
}
private async void OnEnableMicrophoneButtonClicked(object sender, EventArgs e)
{
bool micAccessGranted = await DependencyService.Get<IMicrophoneService>().GetPermissionsAsync();
if (!micAccessGranted)
{
UpdateUI("Please give access to microphone");
}
}
private void UpdateUI(String message)
{
Device.BeginInvokeOnMainThread(() =>
{
RecognitionText.Text = message;
});
}
}
}
In the source file's OnRecognitionButtonClicked handler, find the string YourSubscriptionKey, and replace it with your subscription key.
In the OnRecognitionButtonClicked handler, find the string YourServiceRegion, and replace it with the region associated with your subscription. (For example, use westus for the free trial subscription.)
Next, you need to create a Xamarin Service, which is used to query microphone permissions from different platform projects, such as UWP, Android, and iOS. To do that, add a new folder named Services under the helloworld project, and create a new C# source file under it. You can right-click the Services folder, and select Add > New Item > Code File. Rename the file IMicrophoneService.cs, and place all code from the following snippet in that file:
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
using System.Threading.Tasks;
namespace helloworld
{
public interface IMicrophoneService
{
Task<bool> GetPermissionsAsync();
void OnRequestPermissionsResult(bool isGranted);
}
}
Add sample code for the helloworld.Android project
Now add the C# code that defines the Android-specific part of the application.
In Solution Explorer, under the helloworld.Android project, open MainActivity.cs.
Replace all the code in it with the following snippet:
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
using System;
using Android.App;
using Android.Content.PM;
using Android.Runtime;
using Android.OS;
namespace helloworld.Droid
{
[Activity(Label = "helloworld", Icon = "@mipmap/icon", Theme = "@style/MainTheme", MainLauncher = true, ConfigurationChanges = ConfigChanges.ScreenSize | ConfigChanges.Orientation)]
public class MainActivity : global::Xamarin.Forms.Platform.Android.FormsAppCompatActivity
{
private const int RECORD_AUDIO = 1;
private IMicrophoneService micService;
internal static MainActivity Instance { get; private set; }
protected override void OnCreate(Bundle savedInstanceState)
{
Instance = this;
TabLayoutResource = Resource.Layout.Tabbar;
ToolbarResource = Resource.Layout.Toolbar;
base.OnCreate(savedInstanceState);
global::Xamarin.Forms.Forms.Init(this, savedInstanceState);
LoadApplication(new App());
Xamarin.Forms.DependencyService.Register<IMicrophoneService, MicrophoneService>();
micService = Xamarin.Forms.DependencyService.Get<IMicrophoneService>();
}
public override void OnRequestPermissionsResult(int requestCode, string[] permissions, [GeneratedEnum] Permission[] grantResults)
{
base.OnRequestPermissionsResult(requestCode, permissions, grantResults);
switch (requestCode)
{
case RECORD_AUDIO:
{
if (grantResults[0] == Permission.Granted)
{
micService.OnRequestPermissionsResult(true);
}
else
{
micService.OnRequestPermissionsResult(false);
}
}
break;
}
}
}
}
Next, add the Android-specific implementation of MicrophoneService by creating a new folder named Services under the helloworld.Android project. Then create a new C# source file under it, name it MicrophoneService.cs, and paste the following code snippet into it:
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
using System;
using System.Threading.Tasks;
using Android;
using Android.App;
using Android.OS;
using Android.Support.Design.Widget;
using Android.Support.V4.App;
namespace helloworld.Droid
{
class MicrophoneService : IMicrophoneService
{
public const int REQUEST_MIC = 1;
private string[] permissions = { Manifest.Permission.RecordAudio };
private TaskCompletionSource<bool> tcsPermissions;
public Task<bool> GetPermissionsAsync()
{
tcsPermissions = new TaskCompletionSource<bool>();
// Permissions are required only for Marshmallow and up
if ((int)Build.VERSION.SdkInt < 23)
{
tcsPermissions.TrySetResult(true);
}
else
{
var currentActivity = MainActivity.Instance;
if (ActivityCompat.CheckSelfPermission(currentActivity, Manifest.Permission.RecordAudio) != (int)Android.Content.PM.Permission.Granted)
{
RequestMicPermission();
}
else
{
tcsPermissions.TrySetResult(true);
}
}
return tcsPermissions.Task;
}
private void RequestMicPermission()
{
var currentActivity = MainActivity.Instance;
if (ActivityCompat.ShouldShowRequestPermissionRationale(currentActivity, Manifest.Permission.RecordAudio))
{
Snackbar.Make(currentActivity.FindViewById((Android.Resource.Id.Content)), "App requires microphone permission.", Snackbar.LengthIndefinite).SetAction("Ok", v =>
{
((Activity)currentActivity).RequestPermissions(permissions, REQUEST_MIC);
}).Show();
}
else
{
ActivityCompat.RequestPermissions(((Activity)currentActivity), permissions, REQUEST_MIC);
}
}
public void OnRequestPermissionsResult(bool isGranted)
{
tcsPermissions.TrySetResult(isGranted);
}
}
}
After that, open AndroidManifest.xml under the Properties folder. Add the following uses-permission setting for the microphone between <manifest> and </manifest>:
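At a minimum, the manifest needs the record-audio permission; the INTERNET permission is also required for the cloud Speech service if it isn't already declared. A sketch of the entry:
<uses-permission android:name="android.permission.RECORD_AUDIO" />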
Add sample code for the helloworld.iOS project
Now add the C# code that defines the iOS-specific part of the application, and create the Apple device-specific configurations for the helloworld.iOS project.
In Solution Explorer, under the helloworld.iOS project, open AppDelegate.cs.
Replace all the code in it with the following snippet:
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
using System;
using Foundation;
using UIKit;
namespace helloworld.iOS
{
// The UIApplicationDelegate for the application. This class is responsible for launching the
// User Interface of the application, as well as listening (and optionally responding) to
// application events from iOS.
[Register("AppDelegate")]
public partial class AppDelegate : global::Xamarin.Forms.Platform.iOS.FormsApplicationDelegate
{
//
// This method is invoked when the application has loaded and is ready to run. In this
// method you should instantiate the window, load the UI into it and then make the window
// visible.
//
// You have 17 seconds to return from this method, or iOS will terminate your application.
//
public override bool FinishedLaunching(UIApplication app, NSDictionary options)
{
global::Xamarin.Forms.Forms.Init();
LoadApplication(new App());
Xamarin.Forms.DependencyService.Register<IMicrophoneService, MicrophoneService>();
return base.FinishedLaunching(app, options);
}
}
}
Next, add the iOS-specific implementation of MicrophoneService by creating a new folder named Services under the helloworld.iOS project. Then create a new C# source file under it, name it MicrophoneService.cs, and paste the following code snippet into it:
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
using AVFoundation;
using System;
using System.Threading.Tasks;
namespace helloworld.iOS
{
class MicrophoneService : IMicrophoneService
{
private TaskCompletionSource<bool> tcsPermissions;
public Task<bool> GetPermissionsAsync()
{
tcsPermissions = new TaskCompletionSource<bool>();
RequestMicPermission();
return tcsPermissions.Task;
}
private void RequestMicPermission()
{
var session = AVAudioSession.SharedInstance();
session.RequestRecordPermission((granted) =>
{
Console.WriteLine($"Audio Permission: {granted}");
if (granted)
{
tcsPermissions.TrySetResult(granted);
}
else
{
tcsPermissions.TrySetResult(false);
Console.WriteLine("YOU MUST ENABLE MICROPHONE PERMISSION");
}
});
}
public void OnRequestPermissionsResult(bool isGranted)
{
tcsPermissions.TrySetResult(isGranted);
}
}
}
Open Info.plist under the helloworld.iOS project in the text editor. Add the following key value pair under the dict section:
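The pair is the standard microphone usage description that iOS requires before it grants microphone access; for example:
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone for speech recognition.</string>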
If you're building for an iPhone device, ensure that Bundle Identifier matches your device's provisioning profile app ID. Otherwise, the build will fail. With iPhoneSimulator, you can leave it as is.
If you're building on a Windows PC, establish a connection to the Mac device for building via Tools > iOS > Pair to Mac. Follow the instruction wizard provided by Visual Studio to enable the connection to the Mac device.
Add sample code for the helloworld.UWP project
Now add the C# code that defines the UWP-specific part of the application.
In Solution Explorer, under the helloworld.UWP project, open MainPage.xaml.cs.
Replace all the code in it with the following snippet:
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices.WindowsRuntime;
using Windows.Foundation;
using Windows.Foundation.Collections;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Controls.Primitives;
using Windows.UI.Xaml.Data;
using Windows.UI.Xaml.Input;
using Windows.UI.Xaml.Media;
using Windows.UI.Xaml.Navigation;
namespace helloworld.UWP
{
public sealed partial class MainPage
{
public MainPage()
{
this.InitializeComponent();
LoadApplication(new helloworld.App());
Xamarin.Forms.DependencyService.Register<IMicrophoneService, MicrophoneService>();
}
}
}
Next, add a UWP-specific implementation for MicrophoneService by creating the new folder Services under the helloworld.UWP project. After that, create a new C# source file under it. Rename the file MicrophoneService.cs. Copy and paste the following code snippet into that file:
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//
using System;
using System.Threading.Tasks;
namespace helloworld.UWP
{
class MicrophoneService : IMicrophoneService
{
public async Task<bool> GetPermissionsAsync()
{
bool isMicAvailable = true;
try
{
var mediaCapture = new Windows.Media.Capture.MediaCapture();
var settings = new Windows.Media.Capture.MediaCaptureInitializationSettings();
settings.StreamingCaptureMode = Windows.Media.Capture.StreamingCaptureMode.Audio;
await mediaCapture.InitializeAsync(settings);
}
catch (Exception)
{
isMicAvailable = false;
}
if (!isMicAvailable)
{
await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-microphone"));
}
return isMicAvailable;
}
public void OnRequestPermissionsResult(bool isGranted)
{
}
}
}
Next, double-click the Package.appxmanifest file under the helloworld.UWP project inside Visual Studio. Under Capabilities, make sure that Microphone is selected, and save the file.
Note: If you see the warning "Certificate file does not exist: helloworld.UWP_TemporaryKey.pfx", check the speech-to-text sample for more information.
From the menu bar, select File > Save All to save your changes.
Build and run the UWP application
Set helloworld.UWP as the startup project. Right-click the helloworld.UWP project, and select Build to build the application.
Select Debug > Start Debugging (or select F5) to start the application. The helloworld window appears.
Select Enable Microphone. When the access permission request appears, select Yes.
Select Start Speech recognition, and speak an English phrase or sentence into your device's microphone. Your speech is transmitted to the Speech service and transcribed to text, which appears in the window.
Build and run the Android and iOS applications
Building and running the Android and iOS applications on a device or simulator works in a similar way to UWP. Make sure all SDKs are installed correctly as required in the "Prerequisites" section of this article.
In this quickstart, you'll use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.
After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:
Create a SpeechConfig object from your subscription key and region.
Create a SpeechRecognizer object using the SpeechConfig object from above.
Using the SpeechRecognizer object, start the recognition process for a single utterance.
Inspect the SpeechRecognitionResult returned.
If you prefer to jump right in, view or download all Speech SDK C++ Samples on GitHub. Otherwise, let's get started.
Make sure that you have access to a microphone for audio capture
Add sample code
Create a C++ source file named helloworld.cpp, and paste the following code into it.
#include <iostream> // cin, cout
#include <speechapi_cxx.h>
using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
void recognizeSpeech() {
// Creates an instance of a speech config with specified subscription key and service region.
// Replace with your own subscription key and service region (e.g., "westus").
auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
// Creates a speech recognizer
auto recognizer = SpeechRecognizer::FromConfig(config);
cout << "Say something...\n";
// Starts speech recognition, and returns after a single utterance is recognized. The end of a
// single utterance is determined by listening for silence at the end or until a maximum of 15
// seconds of audio is processed. The task returns the recognition text as result.
// Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
// shot recognition like command or query.
// For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
auto result = recognizer->RecognizeOnceAsync().get();
// Checks result.
if (result->Reason == ResultReason::RecognizedSpeech) {
cout << "We recognized: " << result->Text << std::endl;
}
else if (result->Reason == ResultReason::NoMatch) {
cout << "NOMATCH: Speech could not be recognized." << std::endl;
}
else if (result->Reason == ResultReason::Canceled) {
auto cancellation = CancellationDetails::FromResult(result);
cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
if (cancellation->Reason == CancellationReason::Error) {
cout << "CANCELED: ErrorCode= " << (int)cancellation->ErrorCode << std::endl;
cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
cout << "CANCELED: Did you update the subscription info?" << std::endl;
}
}
}
int main(int argc, char **argv) {
setlocale(LC_ALL, "");
recognizeSpeech();
return 0;
}
In this new file, replace the string YourSubscriptionKey with your Speech service subscription key.
Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).
Note
The Speech SDK defaults to recognizing US English (en-US). For information on choosing the source language, see Specify source language for speech to text.
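For instance, to recognize German rather than the default, the language can be set on the config before the recognizer is created (a sketch; "de-DE" is just an example locale):
config->SetSpeechRecognitionLanguage("de-DE");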
Build the app
Note
Make sure to enter the commands below as a single command line. The easiest way to do that is to copy the command by using the Copy button next to each command, and then paste it at your shell prompt.
On an x64 (64-bit) system, run the following command to build the application.
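Assuming the Speech SDK was extracted to a folder referenced by a SPEECHSDK_ROOT environment variable (a placeholder; substitute the path you actually used), a g++ invocation along these lines should work:
g++ helloworld.cpp -o helloworld --std=c++14 -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" -L "$SPEECHSDK_ROOT/lib/x64" -lMicrosoft.CognitiveServices.Speech.core -lpthread
Then run the app, making sure the SDK's shared libraries are on the loader path:
export LD_LIBRARY_PATH="$SPEECHSDK_ROOT/lib/x64:$LD_LIBRARY_PATH"
./helloworld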
In the console window, a prompt appears, requesting that you say something. Speak an English phrase or sentence. Your speech is transmitted to the Speech service and transcribed to text, which appears in the same window.
Say something...
We recognized: What's the weather like?
Make sure that you have access to a microphone for audio capture
Add sample code
Create a C++ source file named helloworld.cpp, and paste the following code into it.
#include <iostream> // cin, cout
#include <MicrosoftCognitiveServicesSpeech/speechapi_cxx.h>
using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
void recognizeSpeech() {
// Creates an instance of a speech config with specified subscription key and service region.
// Replace with your own subscription key and service region (e.g., "westus").
auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
// Creates a speech recognizer
auto recognizer = SpeechRecognizer::FromConfig(config);
cout << "Say something...\n";
// Performs recognition. RecognizeOnceAsync() returns when the first utterance has been recognized,
// so it is suitable only for single shot recognition like command or query. For long-running
// recognition, use StartContinuousRecognitionAsync() instead.
auto result = recognizer->RecognizeOnceAsync().get();
// Checks result.
if (result->Reason == ResultReason::RecognizedSpeech) {
cout << "We recognized: " << result->Text << std::endl;
}
else if (result->Reason == ResultReason::NoMatch) {
cout << "NOMATCH: Speech could not be recognized." << std::endl;
}
else if (result->Reason == ResultReason::Canceled) {
auto cancellation = CancellationDetails::FromResult(result);
cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
if (cancellation->Reason == CancellationReason::Error) {
cout << "CANCELED: ErrorCode= " << (int)cancellation->ErrorCode << std::endl;
cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
cout << "CANCELED: Did you update the subscription info?" << std::endl;
}
}
}
int main(int argc, char **argv) {
setlocale(LC_ALL, "");
recognizeSpeech();
return 0;
}
In this new file, replace the string YourSubscriptionKey with your Speech service subscription key.
Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).
Note
The Speech SDK defaults to recognizing US English (en-US). For information on choosing the source language, see Specify source language for speech to text.
Build the app
Note
Make sure to enter the commands below as a single command line. The easiest way to do that is to copy the command by using the Copy button next to each command, and then paste it at your shell prompt.
Run the following command to build the application.
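Assuming the MicrosoftCognitiveServicesSpeech.framework was unpacked to a folder referenced by a SPEECHSDK_ROOT environment variable (a placeholder; substitute the path you actually used), a command along these lines should work, and you can then run the resulting binary:
g++ helloworld.cpp -o helloworld --std=c++14 -F "$SPEECHSDK_ROOT" -framework MicrosoftCognitiveServicesSpeech
./helloworld
If the framework isn't in a standard location, you may also need to point DYLD_FRAMEWORK_PATH at it before running.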
In the console window, a prompt appears, requesting that you say something. Speak an English phrase or sentence. Your speech is transmitted to the Speech service and transcribed to text, which appears in the same window.
Say something...
We recognized: What's the weather like?
Make sure that you have access to a microphone for audio capture
Add sample code
Open the source file helloworld.cpp.
Replace all the code with the following snippet:
#include <iostream>
#include <speechapi_cxx.h>
using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
void recognizeSpeech()
{
// Creates an instance of a speech config with specified subscription key and service region.
// Replace with your own subscription key and service region (e.g., "westus").
auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
// Creates a speech recognizer.
auto recognizer = SpeechRecognizer::FromConfig(config);
cout << "Say something...\n";
// Starts speech recognition, and returns after a single utterance is recognized. The end of a
// single utterance is determined by listening for silence at the end or until a maximum of 15
// seconds of audio is processed. The task returns the recognition text as result.
// Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
// shot recognition like command or query.
// For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
auto result = recognizer->RecognizeOnceAsync().get();
// Checks result.
if (result->Reason == ResultReason::RecognizedSpeech)
{
cout << "We recognized: " << result->Text << std::endl;
}
else if (result->Reason == ResultReason::NoMatch)
{
cout << "NOMATCH: Speech could not be recognized." << std::endl;
}
else if (result->Reason == ResultReason::Canceled)
{
auto cancellation = CancellationDetails::FromResult(result);
cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
if (cancellation->Reason == CancellationReason::Error)
{
cout << "CANCELED: ErrorCode= " << (int)cancellation->ErrorCode << std::endl;
cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
cout << "CANCELED: Did you update the subscription info?" << std::endl;
}
}
}
int wmain()
{
recognizeSpeech();
cout << "Please press a key to continue.\n";
cin.get();
return 0;
}
In the same file, replace the string YourSubscriptionKey with your subscription key.
Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).
From the menu bar, choose File > Save All.
Note
The Speech SDK defaults to recognizing US English (en-US). For information on choosing the source language, see Specify source language for speech to text.
Build and run the application
From the menu bar, select Build > Build Solution to build the application. The code should compile without errors now.
Choose Debug > Start Debugging (or press F5) to start the helloworld application.
Speak an English phrase or sentence. The application transmits your speech to the Speech service, which transcribes to text and sends it back to the application for display.
In this quickstart, you'll use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.
After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:
Create a SpeechConfig object from your subscription key and region.
Create a SpeechRecognizer object using the SpeechConfig object from above.
Using the SpeechRecognizer object, start the recognition process for a single utterance.
Inspect the SpeechRecognitionResult returned.
If you prefer to jump right in, view or download all Speech SDK Java Samples on GitHub. Otherwise, let's get started.
Make sure that you have access to a microphone for audio capture
Add sample code
To add a new empty class to your Java project, select File > New > Class.
In the New Java Class window, enter speechsdk.quickstart into the Package field, and Main into the Name field.
Replace all code in Main.java with the following snippet:
package speechsdk.quickstart;
import java.util.concurrent.Future;
import com.microsoft.cognitiveservices.speech.*;
/**
* Quickstart: recognize speech using the Speech SDK for Java.
*/
public class Main {
/**
* @param args Arguments are ignored in this sample.
*/
public static void main(String[] args) {
try {
// Replace below with your own subscription key
String speechSubscriptionKey = "YourSubscriptionKey";
// Replace below with your own service region (e.g., "westus").
String serviceRegion = "YourServiceRegion";
int exitCode = 1;
SpeechConfig config = SpeechConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
assert(config != null);
SpeechRecognizer reco = new SpeechRecognizer(config);
assert(reco != null);
System.out.println("Say something...");
Future<SpeechRecognitionResult> task = reco.recognizeOnceAsync();
assert(task != null);
SpeechRecognitionResult result = task.get();
assert(result != null);
if (result.getReason() == ResultReason.RecognizedSpeech) {
System.out.println("We recognized: " + result.getText());
exitCode = 0;
}
else if (result.getReason() == ResultReason.NoMatch) {
System.out.println("NOMATCH: Speech could not be recognized.");
}
else if (result.getReason() == ResultReason.Canceled) {
CancellationDetails cancellation = CancellationDetails.fromResult(result);
System.out.println("CANCELED: Reason=" + cancellation.getReason());
if (cancellation.getReason() == CancellationReason.Error) {
System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
System.out.println("CANCELED: Did you update the subscription info?");
}
}
reco.close();
System.exit(exitCode);
} catch (Exception ex) {
System.out.println("Unexpected exception: " + ex.getMessage());
assert(false);
System.exit(1);
}
}
}
Replace the string YourSubscriptionKey with your subscription key.
Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).
Save changes to the project.
Note
The Speech SDK defaults to recognizing US English (en-US). For information on choosing the source language, see Specify source language for speech to text.
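To recognize a language other than the default, the language can be set on the config before constructing the recognizer (a sketch; "de-DE" is just an example locale):
config.setSpeechRecognitionLanguage("de-DE");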
Build and run the app
Press F11, or select Run > Debug.
The next 15 seconds of speech input from your microphone will be recognized and logged in the console window.
Make sure that you have access to a microphone for audio capture
Create a user interface
Now we'll create a basic user interface for the application. Edit the layout for your main activity, activity_main.xml. Initially, the layout includes a title bar with your application's name, and a TextView that contains the text "Hello World!".
Select the TextView element. Change its ID attribute in the upper-right corner to hello.
From the palette in the upper left of the activity_main.xml window, drag a button into the empty space above the text.
In the button's attributes on the right, in the value for the onClick attribute, enter onSpeechButtonClicked. We'll write a method with this name to handle the button event. Change its ID attribute in the upper-right corner to button.
Use the magic wand icon at the top of the designer to infer layout constraints.
The text and graphical representation of your UI should now look like this:
Open the source file MainActivity.java. Replace all the code in this file with the following:
package com.microsoft.cognitiveservices.speech.samples.quickstart;
import android.support.v4.app.ActivityCompat;
import android.support.v7.app.AppCompatActivity;
import android.os.Bundle;
import android.util.Log;
import android.view.View;
import android.widget.TextView;
import com.microsoft.cognitiveservices.speech.ResultReason;
import com.microsoft.cognitiveservices.speech.SpeechConfig;
import com.microsoft.cognitiveservices.speech.SpeechRecognitionResult;
import com.microsoft.cognitiveservices.speech.SpeechRecognizer;
import java.util.concurrent.Future;
import static android.Manifest.permission.*;
public class MainActivity extends AppCompatActivity {
// Replace below with your own subscription key
private static String speechSubscriptionKey = "YourSubscriptionKey";
// Replace below with your own service region (e.g., "westus").
private static String serviceRegion = "YourServiceRegion";
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
// Note: we need to request the permissions
int requestCode = 5; // unique code for the permission request
ActivityCompat.requestPermissions(MainActivity.this, new String[]{RECORD_AUDIO, INTERNET}, requestCode);
}
public void onSpeechButtonClicked(View v) {
TextView txt = (TextView) this.findViewById(R.id.hello); // 'hello' is the ID of your text view
try {
SpeechConfig config = SpeechConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
assert(config != null);
SpeechRecognizer reco = new SpeechRecognizer(config);
assert(reco != null);
Future<SpeechRecognitionResult> task = reco.recognizeOnceAsync();
assert(task != null);
// Note: this will block the UI thread, so eventually, you want to
// register for the event (see full samples)
SpeechRecognitionResult result = task.get();
assert(result != null);
if (result.getReason() == ResultReason.RecognizedSpeech) {
txt.setText(result.toString());
}
else {
txt.setText("Error recognizing. Did you update the subscription info?" + System.lineSeparator() + result.toString());
}
reco.close();
} catch (Exception ex) {
Log.e("SpeechSDKDemo", "unexpected " + ex.getMessage());
assert(false);
}
}
}
The onCreate method includes code that requests microphone and internet permissions, and initializes the native platform binding. Configuring the native platform bindings is only required once. It should be done early during application initialization.
The method onSpeechButtonClicked is, as noted earlier, the button click handler. A button press triggers speech-to-text transcription.
In the same file, replace the string YourSubscriptionKey with your subscription key.
Also replace the string YourServiceRegion with the region associated with your subscription. For example, use westus for the free trial subscription.
To build the application, select Ctrl+F9, or select Build > Make Project from the menu bar.
To launch the application, select Shift+F10, or select Run > Run 'app'.
In the deployment target window that appears, select your Android device.
Select the button in the application to begin a speech recognition session. The next 15 seconds of English speech will be sent to the Speech service and transcribed. The result appears in the Android application, and in the logcat window in Android Studio.
In this quickstart, you'll use the Speech SDK to interactively recognize speech from a microphone input, and get the text transcription from captured audio. It's easy to integrate this feature into your apps or devices for common recognition tasks, such as transcribing conversations. It can also be used for more complex integrations, like using the Bot Framework with the Speech SDK to build voice assistants.
After satisfying a few prerequisites, recognizing speech from a microphone only takes four steps:
Create a SpeechConfig object from your subscription key and region.
Create a SpeechRecognizer object using the SpeechConfig object from above.
Using the SpeechRecognizer object, start the recognition process for a single utterance.
Inspect the SpeechRecognitionResult returned.
If you prefer to jump right in, view or download all Speech SDK Python Samples on GitHub. Otherwise, let's get started.
Make sure that you have access to a microphone for audio capture
Support and updates
Updates to the Speech SDK Python package are distributed via PyPI and announced in the Release notes.
If a new version is available, you can update to it with the command pip install --upgrade azure-cognitiveservices-speech.
Check which version is currently installed by inspecting the azure.cognitiveservices.speech.__version__ variable.
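For example, from a Python prompt or script:
import azure.cognitiveservices.speech as speechsdk
print(speechsdk.__version__)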
The Speech SDK defaults to recognizing US English (en-US). For information on choosing the source language, see Specify source language for speech to text.
import azure.cognitiveservices.speech as speechsdk
# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
print("Say something...")
# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed. The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()
# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
cancellation_details = result.cancellation_details
print("Speech Recognition canceled: {}".format(cancellation_details.reason))
if cancellation_details.reason == speechsdk.CancellationReason.Error:
print("Error details: {}".format(cancellation_details.error_details))
Install and use the Speech SDK with Visual Studio Code
Download and install a 64-bit version of Python, 3.5 or later, on your computer.
Open Visual Studio Code and install the Python extension. Select File > Preferences > Extensions from the menu. Search for Python.
Create a folder to store the project in, for example, by using Windows Explorer.
In Visual Studio Code, select the File icon. Then open the folder you created.
Create a new Python source file, speechsdk.py, by selecting the new file icon.
Copy, paste, and save the Python code to the newly created file.
Insert your Speech service subscription information.
If a Python interpreter is already selected, it's displayed on the left side of the status bar at the bottom of the window.
Otherwise, bring up a list of available Python interpreters. Open the command palette (Ctrl+Shift+P) and enter Python: Select Interpreter. Choose an appropriate one.
You can install the Speech SDK Python package from within Visual Studio Code. Do that if it's not installed yet for the Python interpreter you selected.
To install the Speech SDK package, open a terminal. Bring up the command palette again (Ctrl+Shift+P) and enter Terminal: Create New Integrated Terminal.
In the terminal that opens, enter the command python -m pip install azure-cognitiveservices-speech or the appropriate command for your system.
To run the sample code, right-click somewhere inside the editor. Select Run Python File in Terminal.
Speak a few words when you're prompted. The transcribed text displays shortly afterward.
If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. Don't worry, we have additional quickstart materials and code samples available on GitHub. Use the table to find the right sample for your programming language and platform/OS combination.