Quickstart: Recognize and convert speech to text

Reference documentation | Package (NuGet) | Additional Samples on GitHub

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK in the next section of this article, but first check the platform-specific installation instructions for any additional requirements.

Recognize speech from a microphone

Follow these steps to create a new console application and install the Speech SDK.

  1. Open a command prompt where you want the new project, and create a console application with the .NET CLI.

    dotnet new console
    
  2. Install the Speech SDK in your new project with the .NET CLI.

    dotnet add package Microsoft.CognitiveServices.Speech
    
  3. Replace the contents of Program.cs with the following code.

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    
    class Program 
    {
        static string YourSubscriptionKey = "YourSubscriptionKey";
        static string YourServiceRegion = "YourServiceRegion";
    
        static void OutputSpeechRecognitionResult(SpeechRecognitionResult speechRecognitionResult)
        {
            switch (speechRecognitionResult.Reason)
            {
                case ResultReason.RecognizedSpeech:
                    Console.WriteLine($"RECOGNIZED: Text={speechRecognitionResult.Text}");
                    break;
                case ResultReason.NoMatch:
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    break;
                case ResultReason.Canceled:
                    var cancellation = CancellationDetails.FromResult(speechRecognitionResult);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                    }
                    break;
            }
        }
    
        async static Task Main(string[] args)
        {
            var speechConfig = SpeechConfig.FromSubscription(YourSubscriptionKey, YourServiceRegion);        
            speechConfig.SpeechRecognitionLanguage = "en-US";
    
            using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
            using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    
            Console.WriteLine("Speak into your microphone.");
            var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
            OutputSpeechRecognitionResult(speechRecognitionResult);
        }
    }
    
  4. In Program.cs, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  5. To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US if you don't specify one. For details about how to identify one of multiple languages that might be spoken, see language identification.

Run your new console application to start speech recognition from a microphone:

dotnet run

Speak into your microphone when prompted. What you speak should be output as text:

Speak into your microphone.
RECOGNIZED: Text=I'm excited to try speech to text.

Here are some additional considerations:

  • This example uses the RecognizeOnceAsync operation to transcribe utterances of up to 30 seconds, or until silence is detected. For information about continuous recognition for longer audio, including multi-lingual conversations, see How to recognize speech. A minimal continuous-recognition sketch follows this list.
  • To recognize speech from an audio file, use FromWavFileInput instead of FromDefaultMicrophoneInput:
    using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
    
  • For compressed audio files such as MP4, install GStreamer and use PullAudioInputStream or PushAudioInputStream. For more information, see How to use compressed input audio.
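
If you need to transcribe more than a single utterance, the continuous recognition pattern looks roughly like the following. This is a minimal sketch rather than the full walkthrough in How to recognize speech; it assumes the same speechConfig and audioConfig as in the main example.

var stopRecognition = new TaskCompletionSource<int>();
using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Partial results arrive while you're still speaking.
speechRecognizer.Recognizing += (s, e) =>
    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");

// Final results arrive at the end of each utterance.
speechRecognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
    }
};

speechRecognizer.Canceled += (s, e) =>
{
    Console.WriteLine($"CANCELED: Reason={e.Reason}");
    stopRecognition.TrySetResult(0);
};

speechRecognizer.SessionStopped += (s, e) => stopRecognition.TrySetResult(0);

await speechRecognizer.StartContinuousRecognitionAsync();
await stopRecognition.Task; // Block until the session stops or is canceled.
await speechRecognizer.StopContinuousRecognitionAsync();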

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (NuGet) | Additional Samples on GitHub

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK in the next section of this article, but first check the platform-specific installation instructions for any additional requirements.

Recognize speech from a microphone

Follow these steps to create a new console application and install the Speech SDK.

  1. Create a new C++ console project in Visual Studio.

  2. Install the Speech SDK in your new project with the NuGet package manager.

    Install-Package Microsoft.CognitiveServices.Speech
    
  3. Replace the contents of main.cpp with the following code:

    #include <iostream> 
    #include <speechapi_cxx.h>
    
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Audio;
    
    auto YourSubscriptionKey = "YourSubscriptionKey";
    auto YourServiceRegion = "YourServiceRegion";
    
    int main()
    {
        auto speechConfig = SpeechConfig::FromSubscription(YourSubscriptionKey, YourServiceRegion);
        speechConfig->SetSpeechRecognitionLanguage("en-US");
    
        // To recognize speech from an audio file, use `FromWavFileInput` instead of `FromDefaultMicrophoneInput`:
        // auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
        auto audioConfig = AudioConfig::FromDefaultMicrophoneInput();
        auto recognizer = SpeechRecognizer::FromConfig(speechConfig, audioConfig);
    
        std::cout << "Speak into your microphone.\n";
        auto result = recognizer->RecognizeOnceAsync().get();
    
        if (result->Reason == ResultReason::RecognizedSpeech)
        {
            std::cout << "RECOGNIZED: Text=" << result->Text << std::endl;
        }
        else if (result->Reason == ResultReason::NoMatch)
        {
            std::cout << "NOMATCH: Speech could not be recognized." << std::endl;
        }
        else if (result->Reason == ResultReason::Canceled)
        {
            auto cancellation = CancellationDetails::FromResult(result);
            std::cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
            if (cancellation->Reason == CancellationReason::Error)
            {
                std::cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                std::cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                std::cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
            }
        }
    }
    
  4. In main.cpp, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  5. To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US if you don't specify one. For details about how to identify one of multiple languages that might be spoken, see language identification.

Build and run your new console application to start speech recognition from a microphone.

Speak into your microphone when prompted. What you speak should be output as text:

Speak into your microphone.
RECOGNIZED: Text=I'm excited to try speech to text.

Here are some additional considerations:

  • This example uses the RecognizeOnceAsync operation to transcribe utterances of up to 30 seconds, or until silence is detected. For information about continuous recognition for longer audio, including multi-lingual conversations, see How to recognize speech.
  • To recognize speech from an audio file, use FromWavFileInput instead of FromDefaultMicrophoneInput:
    auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
    
  • For compressed audio files such as MP4, install GStreamer and use PullAudioInputStream or PushAudioInputStream. For more information, see How to use compressed input audio. A minimal push-stream sketch follows this list.
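
The general shape of the push-stream approach is to wrap a push stream in a compressed-format descriptor and hand it to the recognizer. This is a minimal sketch, assuming GStreamer is installed and that speechConfig is created as in the main example; choose the AudioStreamContainerFormat that matches your file (ANY is the catch-all).

// Sketch: compressed input via a push stream (requires GStreamer).
auto format = AudioStreamFormat::GetCompressedFormat(AudioStreamContainerFormat::MP3);
auto pushStream = AudioInputStream::CreatePushStream(format);
auto audioConfig = AudioConfig::FromStreamInput(pushStream);
auto recognizer = SpeechRecognizer::FromConfig(speechConfig, audioConfig);
// Feed bytes from your own reader, then close the stream when done:
// pushStream->Write(buffer, bytesRead); ... pushStream->Close();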

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (Go) | Additional Samples on GitHub

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

Install the Speech SDK for Go. Check the platform-specific installation instructions for any additional requirements.

Recognize speech from a microphone

Follow these steps to create a new Go module.

  1. Open a command prompt where you want the new module, and create a new file named speech-recognition.go.

  2. Copy the following code into speech-recognition.go:

    package main
    
    import (
    	"bufio"
    	"fmt"
    	"os"
    
    	"github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    	"github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
    )
    
    func sessionStartedHandler(event speech.SessionEventArgs) {
    	defer event.Close()
    	fmt.Println("Session Started (ID=", event.SessionID, ")")
    }
    
    func sessionStoppedHandler(event speech.SessionEventArgs) {
    	defer event.Close()
    	fmt.Println("Session Stopped (ID=", event.SessionID, ")")
    }
    
    func recognizingHandler(event speech.SpeechRecognitionEventArgs) {
    	defer event.Close()
    	fmt.Println("Recognizing:", event.Result.Text)
    }
    
    func recognizedHandler(event speech.SpeechRecognitionEventArgs) {
    	defer event.Close()
    	fmt.Println("Recognized:", event.Result.Text)
    }
    
    func cancelledHandler(event speech.SpeechRecognitionCanceledEventArgs) {
    	defer event.Close()
    	fmt.Println("Received a cancellation: ", event.ErrorDetails)
        fmt.Println("Did you set the speech resource key and region values?")
    }
    
    func main() {
    	key := "YourSubscriptionKey"
    	region := "YourServiceRegion"
    
    	audioConfig, err := audio.NewAudioConfigFromDefaultMicrophoneInput()
    	if err != nil {
    		fmt.Println("Got an error: ", err)
    		return
    	}
    	defer audioConfig.Close()
    	speechConfig, err := speech.NewSpeechConfigFromSubscription(key, region)
    	if err != nil {
    		fmt.Println("Got an error: ", err)
    		return
    	}
    	defer speechConfig.Close()
    	speechRecognizer, err := speech.NewSpeechRecognizerFromConfig(speechConfig, audioConfig)
    	if err != nil {
    		fmt.Println("Got an error: ", err)
    		return
    	}
    	defer speechRecognizer.Close()
    	speechRecognizer.SessionStarted(sessionStartedHandler)
    	speechRecognizer.SessionStopped(sessionStoppedHandler)
    	speechRecognizer.Recognizing(recognizingHandler)
    	speechRecognizer.Recognized(recognizedHandler)
    	speechRecognizer.Canceled(cancelledHandler)
    	speechRecognizer.StartContinuousRecognitionAsync()
    	defer speechRecognizer.StopContinuousRecognitionAsync()
    	bufio.NewReader(os.Stdin).ReadBytes('\n')
    }
    
  3. In speech-recognition.go, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

Run the following commands to create a go.mod file that links to components hosted on GitHub:

go mod init speech-recognition
go get github.com/Microsoft/cognitive-services-speech-sdk-go

Now build and run the code:

go build
go run speech-recognition
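
The sample above transcribes continuously from the default microphone. To recognize speech from a WAV file instead, swap in the file-based audio configuration. Here's a minimal sketch using the Go SDK's NewAudioConfigFromWavFileInput constructor; the rest of main stays the same:

// Sketch: use a WAV file instead of the default microphone.
audioConfig, err := audio.NewAudioConfigFromWavFileInput("YourAudioFile.wav")
if err != nil {
	fmt.Println("Got an error: ", err)
	return
}
defer audioConfig.Close()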

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Additional Samples on GitHub

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

Before you can do anything, you need to install the Speech SDK. The sample in this quickstart works with the Java Runtime.

Recognize speech from a microphone

Follow these steps to create a new console application for speech recognition.

  1. Open a command prompt where you want the new project, and create a new file named SpeechRecognition.java.

  2. Copy the following code into SpeechRecognition.java:

    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
    
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;
    
    public class SpeechRecognition {
        private static String YourSubscriptionKey = "YourSubscriptionKey";
        private static String YourServiceRegion = "YourServiceRegion";
    
        public static void main(String[] args) throws InterruptedException, ExecutionException {
            SpeechConfig speechConfig = SpeechConfig.fromSubscription(YourSubscriptionKey, YourServiceRegion);
            speechConfig.setSpeechRecognitionLanguage("en-US");
            recognizeFromMicrophone(speechConfig);
        }
    
        public static void recognizeFromMicrophone(SpeechConfig speechConfig) throws InterruptedException, ExecutionException {
            AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
            SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    
            System.out.println("Speak into your microphone.");
            Future<SpeechRecognitionResult> task = speechRecognizer.recognizeOnceAsync();
            SpeechRecognitionResult speechRecognitionResult = task.get();
    
            if (speechRecognitionResult.getReason() == ResultReason.RecognizedSpeech) {
                System.out.println("RECOGNIZED: Text=" + speechRecognitionResult.getText());
            }
            else if (speechRecognitionResult.getReason() == ResultReason.NoMatch) {
                System.out.println("NOMATCH: Speech could not be recognized.");
            }
            else if (speechRecognitionResult.getReason() == ResultReason.Canceled) {
                CancellationDetails cancellation = CancellationDetails.fromResult(speechRecognitionResult);
                System.out.println("CANCELED: Reason=" + cancellation.getReason());
    
                if (cancellation.getReason() == CancellationReason.Error) {
                    System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                    System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                    System.out.println("CANCELED: Did you set the speech resource key and region values?");
                }
            }
    
            System.exit(0);
        }
    }
    
  3. In SpeechRecognition.java, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  4. To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US if you don't specify one. For details about how to identify one of multiple languages that might be spoken, see language identification.

Run your new console application to start speech recognition from a microphone:

java SpeechRecognition

Speak into your microphone when prompted. What you speak should be output as text:

Speak into your microphone.
RECOGNIZED: Text=I'm excited to try speech to text.

Here are some additional considerations:

  • This example uses the recognizeOnceAsync operation to transcribe utterances of up to 30 seconds, or until silence is detected. For information about continuous recognition for longer audio, including multi-lingual conversations, see How to recognize speech. A minimal continuous-recognition sketch follows this list.
  • To recognize speech from an audio file, use fromWavFileInput instead of fromDefaultMicrophoneInput:
    AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
    
  • For compressed audio files such as MP4, install GStreamer and use PullAudioInputStream or PushAudioInputStream. For more information, see How to use compressed input audio.
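
If you need to transcribe more than a single utterance, the continuous recognition pattern looks roughly like the following. This is a minimal sketch rather than the full walkthrough in How to recognize speech; it assumes the same speechConfig and audioConfig as in the main example, inside a method that declares throws InterruptedException, ExecutionException.

SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Partial results arrive while you're still speaking.
speechRecognizer.recognizing.addEventListener((s, e) ->
    System.out.println("RECOGNIZING: Text=" + e.getResult().getText()));

// Final results arrive at the end of each utterance.
speechRecognizer.recognized.addEventListener((s, e) -> {
    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
        System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
    }
});

speechRecognizer.startContinuousRecognitionAsync().get();
Thread.sleep(30000); // Let recognition run; use your own stop condition in practice.
speechRecognizer.stopContinuousRecognitionAsync().get();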

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (npm) | Additional Samples on GitHub | Library source code

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

Before you can do anything, you need to install the Speech SDK for JavaScript. To install the package, run npm install microsoft-cognitiveservices-speech-sdk. For guided installation instructions, see Set up the development environment.

Recognize speech from a file

Follow these steps to create a new console application for speech recognition.

  1. Open a command prompt where you want the new project, and create a new file named SpeechRecognition.js.

  2. Install the Speech SDK for JavaScript:

    npm install microsoft-cognitiveservices-speech-sdk
    
  3. Copy the following code into SpeechRecognition.js:

    const fs = require("fs");
    const sdk = require("microsoft-cognitiveservices-speech-sdk");
    const speechConfig = sdk.SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
    speechConfig.speechRecognitionLanguage = "en-US";
    
    function fromFile() {
        let audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("YourAudioFile.wav"));
        let speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    
        speechRecognizer.recognizeOnceAsync(result => {
            switch (result.reason) {
                case sdk.ResultReason.RecognizedSpeech:
                    console.log(`RECOGNIZED: Text=${result.text}`);
                    break;
                case sdk.ResultReason.NoMatch:
                    console.log("NOMATCH: Speech could not be recognized.");
                    break;
                case sdk.ResultReason.Canceled:
                    const cancellation = sdk.CancellationDetails.fromResult(result);
                    console.log(`CANCELED: Reason=${cancellation.reason}`);
    
                    if (cancellation.reason == sdk.CancellationReason.Error) {
                        console.log(`CANCELED: ErrorCode=${cancellation.ErrorCode}`);
                        console.log(`CANCELED: ErrorDetails=${cancellation.errorDetails}`);
                        console.log("CANCELED: Did you set the speech resource key and region values?");
                    }
                    break;
            }
            speechRecognizer.close();
        });
    }
    fromFile();
    
  4. In SpeechRecognition.js, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  5. In SpeechRecognition.js, replace YourAudioFile.wav with your own WAV file. This example only recognizes speech from a WAV file. For information about other audio formats, see How to use compressed input audio. This example supports audio of up to 30 seconds.

  6. To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US if you don't specify one. For details about how to identify one of multiple languages that might be spoken, see language identification.

Run your new console application to start speech recognition from a file:

node SpeechRecognition.js

The speech from the audio file should be output as text:

RECOGNIZED: Text=I'm excited to try speech to text.

This example uses the recognizeOnceAsync operation to transcribe utterances of up to 30 seconds, or until silence is detected. For information about continuous recognition for longer audio, including multi-lingual conversations, see How to recognize speech.
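
If you need to transcribe more than a single utterance, the continuous recognition pattern looks roughly like the following. This is a minimal sketch rather than the full walkthrough in How to recognize speech; it assumes the same speechConfig and audioConfig as in the main example.

const speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

// Partial results arrive while audio is still streaming in.
speechRecognizer.recognizing = (s, e) => {
    console.log(`RECOGNIZING: Text=${e.result.text}`);
};

// Final results arrive at the end of each utterance.
speechRecognizer.recognized = (s, e) => {
    if (e.result.reason == sdk.ResultReason.RecognizedSpeech) {
        console.log(`RECOGNIZED: Text=${e.result.text}`);
    }
};

speechRecognizer.canceled = (s, e) => {
    console.log(`CANCELED: Reason=${e.reason}`);
    speechRecognizer.stopContinuousRecognitionAsync();
};

speechRecognizer.startContinuousRecognitionAsync();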

Note

Recognizing speech from a microphone is not supported in Node.js. It's supported only in a browser-based JavaScript environment. For more information, see the React sample and the implementation of speech-to-text from a microphone on GitHub. The React sample shows design patterns for the exchange and management of authentication tokens. It also shows the capture of audio from a microphone or file for speech-to-text conversions.

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (Download) | Additional Samples on GitHub

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK for Objective-C is distributed as a framework bundle. The framework supports both Objective-C and Swift on both iOS and macOS.

The Speech SDK can be used in Xcode projects as a CocoaPod, or downloaded directly here and linked manually. This guide uses a CocoaPod. Install the CocoaPod dependency manager as described in its installation instructions.

Recognize speech from a microphone

Follow these steps to recognize speech in a macOS application.

  1. Clone the Azure-Samples/cognitive-services-speech-sdk repository to get the Recognize speech from a microphone in Objective-C on macOS sample project. The repository also has iOS samples.

  2. Navigate to the directory of the downloaded sample app (helloworld) in a terminal.

  3. Run the command pod install. This will generate a helloworld.xcworkspace Xcode workspace containing both the sample app and the Speech SDK as a dependency.

  4. Open the helloworld.xcworkspace workspace in Xcode.

  5. Open the file named AppDelegate.m and locate the buttonPressed method as shown here.

    - (void)buttonPressed:(NSButton *)button {
        // Creates an instance of a speech config with specified subscription key and service region.
        // Replace with your own subscription key and service region (for example, "westus").
        NSString *speechKey = @"YourSubscriptionKey";
        NSString *serviceRegion = @"YourServiceRegion";
    
        SPXAudioConfiguration *audioConfig = [[SPXAudioConfiguration alloc] initWithMicrophone:nil];
        SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:speechKey region:serviceRegion];
        SPXSpeechRecognizer *speechRecognizer = [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig language:@"en-US" audioConfiguration:audioConfig];
    
        NSLog(@"Speak into your microphone.");
    
        SPXSpeechRecognitionResult *speechResult = [speechRecognizer recognizeOnce];
    
        // Checks result.
        if (SPXResultReason_Canceled == speechResult.reason) {
            SPXCancellationDetails *details = [[SPXCancellationDetails alloc] initFromCanceledRecognitionResult:speechResult];
            NSLog(@"Speech recognition was canceled: %@. Did you set the speech resource key and region values?", details.errorDetails);
            [self.label setStringValue:([NSString stringWithFormat:@"Canceled: %@", details.errorDetails])];
        } else if (SPXResultReason_RecognizedSpeech == speechResult.reason) {
            NSLog(@"Speech recognition result received: %@", speechResult.text);
            [self.label setStringValue:(speechResult.text)];
        } else {
            NSLog(@"There was an error.");
            [self.label setStringValue:(@"Speech Recognition Error")];
        }
    }
    
  6. In AppDelegate.m, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  7. To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US if you don't specify one. For details about how to identify one of multiple languages that might be spoken, see language identification.

  8. Make the debug output visible (View > Debug Area > Activate Console).

  9. Build and run the example code by selecting Product > Run from the menu or selecting the Play button.

After you select the button in the app and say a few words, you should see the text you have spoken on the lower part of the screen. When you run the app for the first time, you should be prompted to give the app access to your computer's microphone.

Here are some additional considerations:

  • This example uses the recognizeOnce operation to transcribe utterances of up to 30 seconds, or until silence is detected. For information about continuous recognition for longer audio, including multi-lingual conversations, see How to recognize speech. A minimal continuous-recognition sketch follows this list.
  • To recognize speech from an audio file, use initWithWavFileInput instead of initWithMicrophone:
    SPXAudioConfiguration *audioConfig = [[SPXAudioConfiguration alloc] initWithWavFileInput:@"YourAudioFile.wav"];
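
If you need to transcribe more than a single utterance, the continuous recognition pattern looks roughly like the following. This is a minimal sketch rather than the full walkthrough in How to recognize speech; it assumes the same speechConfig and audioConfig as in buttonPressed above.

SPXSpeechRecognizer *speechRecognizer = [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig audioConfiguration:audioConfig];

// Final results arrive at the end of each utterance.
[speechRecognizer addRecognizedEventHandler:^(SPXSpeechRecognizer *recognizer, SPXSpeechRecognitionEventArgs *eventArgs) {
    NSLog(@"RECOGNIZED: Text=%@", eventArgs.result.text);
}];

[speechRecognizer startContinuousRecognition];
// ... later, when you want to stop listening:
[speechRecognizer stopContinuousRecognition];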
    

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (Download) | Additional Samples on GitHub

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK for Swift is distributed as a framework bundle. The framework supports both Objective-C and Swift on both iOS and macOS.

The Speech SDK can be used in Xcode projects as a CocoaPod, or downloaded directly here and linked manually. This guide uses a CocoaPod. Install the CocoaPod dependency manager as described in its installation instructions.

Recognize speech from a microphone

Follow these steps to recognize speech in a macOS application.

  1. Clone the Azure-Samples/cognitive-services-speech-sdk repository to get the Recognize speech from a microphone in Swift on macOS sample project. The repository also has iOS samples.

  2. Navigate to the directory of the downloaded sample app (helloworld) in a terminal.

  3. Run the command pod install. This will generate a helloworld.xcworkspace Xcode workspace containing both the sample app and the Speech SDK as a dependency.

  4. Open the helloworld.xcworkspace workspace in Xcode.

  5. Open the file named AppDelegate.swift and locate the applicationDidFinishLaunching and recognizeFromMic methods as shown here.

    import Cocoa
    
    @NSApplicationMain
    class AppDelegate: NSObject, NSApplicationDelegate {
        var label: NSTextField!
        var fromMicButton: NSButton!
    
        var sub: String!
        var region: String!
    
        @IBOutlet weak var window: NSWindow!
    
        func applicationDidFinishLaunching(_ aNotification: Notification) {
            print("loading")
            // load subscription information
            sub = "YourSubscriptionKey"
            region = "YourServiceRegion"
    
            label = NSTextField(frame: NSRect(x: 100, y: 50, width: 200, height: 200))
            label.textColor = NSColor.black
            label.lineBreakMode = .byWordWrapping
    
            label.stringValue = "Recognition Result"
            label.isEditable = false
    
            self.window.contentView?.addSubview(label)
    
            fromMicButton = NSButton(frame: NSRect(x: 100, y: 300, width: 200, height: 30))
            fromMicButton.title = "Recognize"
            fromMicButton.target = self
            fromMicButton.action = #selector(fromMicButtonClicked)
            self.window.contentView?.addSubview(fromMicButton)
        }
    
        @objc func fromMicButtonClicked() {
            DispatchQueue.global(qos: .userInitiated).async {
                self.recognizeFromMic()
            }
        }
    
        func recognizeFromMic() {
            var speechConfig: SPXSpeechConfiguration?
            do {
                try speechConfig = SPXSpeechConfiguration(subscription: sub, region: region)
            } catch {
                print("error \(error) happened")
                speechConfig = nil
            }
            speechConfig?.speechRecognitionLanguage = "en-US"
    
            let audioConfig = SPXAudioConfiguration()
    
            let reco = try! SPXSpeechRecognizer(speechConfiguration: speechConfig!, audioConfiguration: audioConfig)
    
            reco.addRecognizingEventHandler() {reco, evt in
                print("intermediate recognition result: \(evt.result.text ?? "(no result)")")
                self.updateLabel(text: evt.result.text, color: .gray)
            }
    
            updateLabel(text: "Listening ...", color: .gray)
            print("Listening...")
    
            let result = try! reco.recognizeOnce()
            print("recognition result: \(result.text ?? "(no result)"), reason: \(result.reason.rawValue)")
            updateLabel(text: result.text, color: .black)
    
            if result.reason != SPXResultReason.recognizedSpeech {
                let cancellationDetails = try! SPXCancellationDetails(fromCanceledRecognitionResult: result)
                print("cancelled: \(result.reason), \(cancellationDetails.errorDetails)")
                print("Did you set the speech resource key and region values?")
                updateLabel(text: "Error: \(cancellationDetails.errorDetails)", color: .red)
            }
        }
    
        func updateLabel(text: String?, color: NSColor) {
            DispatchQueue.main.async {
                self.label.stringValue = text!
                self.label.textColor = color
            }
        }
    }
    
  6. In AppDelegate.swift, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  7. To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US if you don't specify one. For details about how to identify one of multiple languages that might be spoken, see language identification.

  8. Make the debug output visible by selecting View > Debug Area > Activate Console.

  9. Build and run the example code by selecting Product > Run from the menu or selecting the Play button.

After you select the button in the app and say a few words, you should see the text you have spoken on the lower part of the screen. When you run the app for the first time, you should be prompted to give the app access to your computer's microphone.

This example uses the recognizeOnce operation to transcribe utterances of up to 30 seconds, or until silence is detected. For information about continuous recognition for longer audio, including multi-lingual conversations, see How to recognize speech.
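
If you need to transcribe more than a single utterance, the continuous recognition pattern looks roughly like the following. This is a minimal sketch rather than the full walkthrough in How to recognize speech; it assumes the same speechConfig and audioConfig as in recognizeFromMic above.

let reco = try! SPXSpeechRecognizer(speechConfiguration: speechConfig!, audioConfiguration: audioConfig)

// Final results arrive at the end of each utterance.
reco.addRecognizedEventHandler { reco, evt in
    print("RECOGNIZED: \(evt.result.text ?? "(no result)")")
}

try! reco.startContinuousRecognition()
// ... later, when you want to stop listening:
try! reco.stopContinuousRecognition()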

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (PyPi) | Additional Samples on GitHub

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK for Python is available as a Python Package Index (PyPI) module. The Speech SDK for Python is compatible with Windows, Linux, and macOS.

Install a version of Python from 3.7 to 3.10. First, check the platform-specific installation instructions for any additional requirements.

Recognize speech from a microphone

Follow these steps to create a new console application.

  1. Open a command prompt where you want the new project, and create a new file named speech_recognition.py.

  2. Run this command to install the Speech SDK:

    pip install azure-cognitiveservices-speech
    
  3. Copy the following code into speech_recognition.py:

    import azure.cognitiveservices.speech as speechsdk
    
    def recognize_from_microphone():
        speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="YourServiceRegion")
        speech_config.speech_recognition_language="en-US"
    
        audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    
        print("Speak into your microphone.")
        speech_recognition_result = speech_recognizer.recognize_once_async().get()
    
        if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("Recognized: {}".format(speech_recognition_result.text))
        elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
            print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
        elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = speech_recognition_result.cancellation_details
            print("Speech Recognition canceled: {}".format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")
    
    recognize_from_microphone()
    
  4. In speech_recognition.py, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  5. To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US if you don't specify one. For details about how to identify one of multiple languages that might be spoken, see language identification.

Run your new console application to start speech recognition from a microphone:

python speech_recognition.py

Speak into your microphone when prompted. What you speak should be output as text:

Speak into your microphone.
RECOGNIZED: Text=I'm excited to try speech to text.

Here are some additional considerations:

  • This example uses the recognize_once_async operation to transcribe utterances of up to 30 seconds, or until silence is detected. For information about continuous recognition for longer audio, including multi-lingual conversations, see How to recognize speech. A minimal continuous-recognition sketch follows this list.
  • To recognize speech from an audio file, use filename instead of use_default_microphone:
    audio_config = speechsdk.audio.AudioConfig(filename="YourAudioFile.wav")
    
  • For compressed audio files such as MP4, install GStreamer and use PullAudioInputStream or PushAudioInputStream. For more information, see How to use compressed input audio.
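
If you need to transcribe more than a single utterance, the continuous recognition pattern looks roughly like the following. This is a minimal sketch rather than the full walkthrough in How to recognize speech; it assumes the same speech_config and audio_config as in recognize_from_microphone above.

import time

speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Partial results arrive while you're still speaking; final results at the end of each utterance.
speech_recognizer.recognizing.connect(lambda evt: print("RECOGNIZING: {}".format(evt.result.text)))
speech_recognizer.recognized.connect(lambda evt: print("RECOGNIZED: {}".format(evt.result.text)))
speech_recognizer.canceled.connect(lambda evt: print("CANCELED: {}".format(evt)))

speech_recognizer.start_continuous_recognition()
time.sleep(30)  # Let recognition run; use your own stop condition in practice.
speech_recognizer.stop_continuous_recognition()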

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Speech-to-text REST API v3.0 reference | Speech-to-text REST API for short audio reference | Additional Samples on GitHub

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Recognize speech from a file

At a command prompt, run the following cURL command. Replace YourSubscriptionKey with your Speech resource key, replace YourServiceRegion with your Speech resource region, and replace YourAudioFile.wav with the path and name of your audio file.

key="YourSubscriptionKey"
region="YourServiceRegion"
audio_file=@'YourAudioFile.wav'

curl --location --request POST \
"https://$region.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US" \
--header "Ocp-Apim-Subscription-Key: $key" \
--header "Content-Type: audio/wav" \
--data-binary $audio_file

You should receive a response similar to what is shown here. The DisplayText should be the text that was recognized from your audio file.

{
    "RecognitionStatus": "Success",
    "DisplayText": "My voice is my passport, verify me.",
    "Offset": 6600000,
    "Duration": 32100000
}

For more information, see speech-to-text REST API for short audio.
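
The command above returns simple output, as shown. The short-audio endpoint also accepts a format query parameter; setting format=detailed should additionally return N-best alternates with confidence scores. A minimal variation of the same command:

curl --location --request POST \
"https://$region.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US&format=detailed" \
--header "Ocp-Apim-Subscription-Key: $key" \
--header "Content-Type: audio/wav" \
--data-binary $audio_file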

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

In this quickstart, you run an application to recognize and transcribe human speech (often called speech-to-text).

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

Follow these steps and see the Speech CLI quickstart for additional requirements for your platform.

  1. Install the Speech CLI via the .NET CLI by entering this command:

    dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
    
  2. Configure your Speech resource key and region by running the following commands. Replace SUBSCRIPTION-KEY with your Speech resource key, and replace REGION with your Speech resource region:

    spx config @key --set SUBSCRIPTION-KEY
    spx config @region --set REGION
    

Recognize speech from a microphone

Run the following command to start speech recognition from a microphone:

spx recognize --microphone --source en-US

Speak into the microphone, and you see your words transcribed to text in real time. The Speech CLI stops after a period of silence, after 30 seconds, or when you press Ctrl+C.

Connection CONNECTED...
RECOGNIZED: I'm excited to try speech to text.

Now that you've transcribed speech to text, here are some suggested modifications to try out:

  • To recognize speech from an audio file, use --file instead of --microphone. For compressed audio files such as MP4, install GStreamer and use --format. For more information, see How to use compressed input audio.
    spx recognize --file YourAudioFile.wav
    spx recognize --file YourAudioFile.mp4 --format any
    
  • To improve recognition accuracy of specific words or utterances, use a phrase list. Include a phrase list inline or from a text file along with the recognize command:
    spx recognize --microphone --phrases "Contoso;Jessie;Rehaan;"
    spx recognize --microphone --phrases @phrases.txt
    
  • To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US if you don't specify one.
    spx recognize --microphone --source es-ES
    
  • For continuous recognition of audio longer than 30 seconds, append --continuous:
    spx recognize --microphone --source es-ES --continuous
    

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Next steps