Quickstart: Convert text to speech

Reference documentation | Package (NuGet) | Additional Samples on GitHub

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK in the next section of this article, but first check the platform-specific installation instructions for any more requirements.

Synthesize to speaker output

Follow these steps to create a new console application and install the Speech SDK.

  1. Open a command prompt where you want the new project, and create a console application with the .NET CLI.

    dotnet new console
    
  2. Install the Speech SDK in your new project with the .NET CLI.

    dotnet add package Microsoft.CognitiveServices.Speech
    
  3. Replace the contents of Program.cs with the following code.

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    
    class Program 
    {
        static string YourSubscriptionKey = "YourSubscriptionKey";
        static string YourServiceRegion = "YourServiceRegion";
    
        static void OutputSpeechSynthesisResult(SpeechSynthesisResult speechSynthesisResult, string text)
        {
            switch (speechSynthesisResult.Reason)
            {
                case ResultReason.SynthesizingAudioCompleted:
                    Console.WriteLine($"Speech synthesized for text: [{text}]");
                    break;
                case ResultReason.Canceled:
                    var cancellation = SpeechSynthesisCancellationDetails.FromResult(speechSynthesisResult);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                    }
                    break;
                default:
                    break;
            }
        }
    
        async static Task Main(string[] args)
        {
            var speechConfig = SpeechConfig.FromSubscription(YourSubscriptionKey, YourServiceRegion);      
    
            // The language of the voice that speaks.
            speechConfig.SpeechSynthesisVoiceName = "en-US-JennyNeural"; 
    
            using (var speechSynthesizer = new SpeechSynthesizer(speechConfig))
            {
                // Get text from the console and synthesize to the default speaker.
                Console.WriteLine("Enter some text that you want to speak >");
                string text = Console.ReadLine();
    
                var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(text);
                OutputSpeechSynthesisResult(speechSynthesisResult, text);
            }
    
            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
    }
    
  4. In Program.cs, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  5. To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.

Run your new console application to start speech synthesis to the default speaker.

dotnet run

Enter some text that you want to speak. For example, type "I'm excited to try text to speech." Press the Enter key to hear the synthesized speech.

Enter some text that you want to speak >
I'm excited to try text to speech

This quickstart uses the SpeakTextAsync operation to synthesize a short block of text that you enter. You can also get text from files as described in the how-to guides.
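
For example, here's a minimal sketch, assuming the same key and region placeholders and an arbitrary output.wav file name, that writes the synthesized audio to a WAV file instead of the default speaker:

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    
    class Program
    {
        static async Task Main()
        {
            var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
            speechConfig.SpeechSynthesisVoiceName = "en-US-JennyNeural";
    
            // Write the synthesized audio to a WAV file instead of the default speaker.
            using var audioConfig = AudioConfig.FromWavFileOutput("output.wav");
            using var speechSynthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
    
            var result = await speechSynthesizer.SpeakTextAsync("I'm excited to try text to speech");
            Console.WriteLine($"Synthesis result: {result.Reason}");
        }
    }

If you need a different container or bitrate, SpeechConfig also provides SetSpeechSynthesisOutputFormat to pick another output format.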

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (NuGet) | Additional Samples on GitHub

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK in the next section of this article, but first check the platform-specific installation instructions for any more requirements.

Synthesize to speaker output

Follow these steps to create a new console application and install the Speech SDK.

  1. Create a new C++ console project in Visual Studio.

  2. Install the Speech SDK in your new project with the NuGet package manager.

    Install-Package Microsoft.CognitiveServices.Speech
    
  3. Replace the contents of main.cpp with the following code:

    #include <iostream> // cin, cout
    #include <speechapi_cxx.h>
    
    using namespace std;
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Audio;
    
    auto YourSubscriptionKey = "YourSubscriptionKey";
    auto YourServiceRegion = "YourServiceRegion";
    
    int main()
    {
        auto speechConfig = SpeechConfig::FromSubscription(YourSubscriptionKey, YourServiceRegion);
    
        // The language of the voice that speaks.
        speechConfig->SetSpeechSynthesisVoiceName("en-US-JennyNeural");
    
        auto synthesizer = SpeechSynthesizer::FromConfig(speechConfig);
    
        // Get text from the console and synthesize to the default speaker.
        cout << "Enter some text that you want to speak >" << std::endl;
        std::string text;
        getline(cin, text);
    
        auto result = synthesizer->SpeakTextAsync(text).get();
    
        // Checks result.
        if (result->Reason == ResultReason::SynthesizingAudioCompleted)
        {
            cout << "Speech synthesized to speaker for text [" << text << "]" << std::endl;
        }
        else if (result->Reason == ResultReason::Canceled)
        {
            auto cancellation = SpeechSynthesisCancellationDetails::FromResult(result);
            cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
            if (cancellation->Reason == CancellationReason::Error)
            {
                cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=[" << cancellation->ErrorDetails << "]" << std::endl;
                cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
            }
        }
    
        cout << "Press enter to exit..." << std::endl;
        cin.get();
    }           
    
  4. In main.cpp, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  5. To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.

Build and run your new console application in Visual Studio to start speech synthesis to the default speaker.

Enter some text that you want to speak. For example, type "I'm excited to try text to speech." Press the Enter key to hear the synthesized speech.

Enter some text that you want to speak >
I'm excited to try text to speech

This quickstart uses the SpeakTextAsync operation to synthesize a short block of text that you enter. You can also get text from files as described in the how-to guides.
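
For example, a minimal sketch along the same lines, assuming the same key and region placeholders and an arbitrary output.wav file name, that writes the synthesized audio to a WAV file instead of the default speaker:

    #include <iostream>
    #include <speechapi_cxx.h>
    
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Audio;
    
    int main()
    {
        auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
        speechConfig->SetSpeechSynthesisVoiceName("en-US-JennyNeural");
    
        // Write the synthesized audio to a WAV file instead of the default speaker.
        auto audioConfig = AudioConfig::FromWavFileOutput("output.wav");
        auto synthesizer = SpeechSynthesizer::FromConfig(speechConfig, audioConfig);
    
        auto result = synthesizer->SpeakTextAsync("I'm excited to try text to speech").get();
        std::cout << "Synthesis reason: " << (int)result->Reason << std::endl;
    }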

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (Go) | Additional Samples on GitHub

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

Install the Speech SDK for Go. Check the platform-specific installation instructions for any more requirements.

Synthesize to speaker output

Follow these steps to create a new Go module.

  1. Open a command prompt where you want the new module, and create a new file named speech-synthesis.go.

  2. Copy the following code into speech-synthesis.go:

    package main
    
    import (
    	"bufio"
    	"fmt"
    	"os"
    	"strings"
    	"time"
    
    	"github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    	"github.com/Microsoft/cognitive-services-speech-sdk-go/common"
    	"github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
    )
    
    func synthesizeStartedHandler(event speech.SpeechSynthesisEventArgs) {
    	defer event.Close()
    	fmt.Println("Synthesis started.")
    }
    
    func synthesizingHandler(event speech.SpeechSynthesisEventArgs) {
    	defer event.Close()
    	fmt.Printf("Synthesizing, audio chunk size %d.\n", len(event.Result.AudioData))
    }
    
    func synthesizedHandler(event speech.SpeechSynthesisEventArgs) {
    	defer event.Close()
    	fmt.Printf("Synthesized, audio length %d.\n", len(event.Result.AudioData))
    }
    
    func cancelledHandler(event speech.SpeechSynthesisEventArgs) {
    	defer event.Close()
    	fmt.Println("Received a cancellation.")
    }
    
    func main() {
    	key := "YourSubscriptionKey"
    	region := "YourServiceRegion"
    
    	audioConfig, err := audio.NewAudioConfigFromDefaultSpeakerOutput()
    	if err != nil {
    		fmt.Println("Got an error: ", err)
    		return
    	}
    	defer audioConfig.Close()
    	speechConfig, err := speech.NewSpeechConfigFromSubscription(key, region)
    	if err != nil {
    		fmt.Println("Got an error: ", err)
    		return
    	}
    	defer speechConfig.Close()
    
    	speechConfig.SetSpeechSynthesisVoiceName("en-US-JennyNeural")
    
    	speechSynthesizer, err := speech.NewSpeechSynthesizerFromConfig(speechConfig, audioConfig)
    	if err != nil {
    		fmt.Println("Got an error: ", err)
    		return
    	}
    	defer speechSynthesizer.Close()
    
    	speechSynthesizer.SynthesisStarted(synthesizeStartedHandler)
    	speechSynthesizer.Synthesizing(synthesizingHandler)
    	speechSynthesizer.SynthesisCompleted(synthesizedHandler)
    	speechSynthesizer.SynthesisCanceled(cancelledHandler)
    
    	for {
    		fmt.Printf("Enter some text that you want to speak, or enter empty text to exit.\n> ")
    		text, _ := bufio.NewReader(os.Stdin).ReadString('\n')
    		text = strings.TrimSuffix(text, "\n")
    		if len(text) == 0 {
    			break
    		}
    
    		task := speechSynthesizer.SpeakTextAsync(text)
    		var outcome speech.SpeechSynthesisOutcome
    		select {
    		case outcome = <-task:
    		case <-time.After(60 * time.Second):
    			fmt.Println("Timed out")
    			return
    		}
    		defer outcome.Close()
    		if outcome.Error != nil {
    			fmt.Println("Got an error: ", outcome.Error)
    			return
    		}
    
    		if outcome.Result.Reason == common.SynthesizingAudioCompleted {
    			fmt.Printf("Speech synthesized to speaker for text [%s].\n", text)
    		} else {
    			cancellation, _ := speech.NewCancellationDetailsFromSpeechSynthesisResult(outcome.Result)
    			fmt.Printf("CANCELED: Reason=%d.\n", cancellation.Reason)
    
    			if cancellation.Reason == common.Error {
    				fmt.Printf("CANCELED: ErrorCode=%d\nCANCELED: ErrorDetails=[%s]\nCANCELED: Did you set the speech resource key and region values?\n",
    					cancellation.ErrorCode,
    					cancellation.ErrorDetails)
    			}
    		}
    	}
    }                                                
    
  3. In speech-synthesis.go, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  4. To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.

Run the following commands to create a go.mod file that links to components hosted on GitHub:

go mod init speech-synthesis
go get github.com/Microsoft/cognitive-services-speech-sdk-go

Now build and run the code:

go build
go run speech-synthesis

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Additional Samples on GitHub

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

Before you can do anything, you need to install the Speech SDK. The sample in this quickstart works with the Java Runtime.

Synthesize to speaker output

Follow these steps to create a new console application for speech synthesis.

  1. Open a command prompt where you want the new project, and create a new file named SpeechSynthesis.java.

  2. Copy the following code into SpeechSynthesis.java:

    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.audio.*;
    
    import java.util.Scanner;
    import java.util.concurrent.ExecutionException;
    
    public class SpeechSynthesis {
        private static String YourSubscriptionKey = "YourSubscriptionKey";
        private static String YourServiceRegion = "YourServiceRegion";
    
        public static void main(String[] args) throws InterruptedException, ExecutionException {
            SpeechConfig speechConfig = SpeechConfig.fromSubscription(YourSubscriptionKey, YourServiceRegion);
    
            speechConfig.setSpeechSynthesisVoiceName("en-US-JennyNeural"); 
    
            SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
    
            // Get text from the console and synthesize to the default speaker.
            System.out.println("Enter some text that you want to speak >");
            String text = new Scanner(System.in).nextLine();
            if (text.isEmpty())
            {
                return;
            }
    
            SpeechSynthesisResult speechSynthesisResult = speechSynthesizer.SpeakTextAsync(text).get();

            if (speechSynthesisResult.getReason() == ResultReason.SynthesizingAudioCompleted) {
                System.out.println("Speech synthesized to speaker for text [" + text + "]");
            }
            else if (speechSynthesisResult.getReason() == ResultReason.Canceled) {
                SpeechSynthesisCancellationDetails cancellation = SpeechSynthesisCancellationDetails.fromResult(speechSynthesisResult);
                System.out.println("CANCELED: Reason=" + cancellation.getReason());
    
                if (cancellation.getReason() == CancellationReason.Error) {
                    System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                    System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                    System.out.println("CANCELED: Did you set the speech resource key and region values?");
                }
            }
    
            System.exit(0);
        }
    }
    
  3. In SpeechSynthesis.java, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  4. To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.

Run your new console application to start speech synthesis to the default speaker.

java SpeechSynthesis

Enter some text that you want to speak. For example, type "I'm excited to try text to speech." Press the Enter key to hear the synthesized speech.

Enter some text that you want to speak >
I'm excited to try text to speech

This quickstart uses the SpeakTextAsync operation to synthesize a short block of text that you enter. You can also get text from files as described in the how-to guides.
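
For example, a minimal sketch, assuming the same key and region placeholders and an arbitrary output.wav file name, that writes the synthesized audio to a WAV file instead of the default speaker:

    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.audio.*;
    
    public class SpeechSynthesisToFile {
        public static void main(String[] args) throws Exception {
            SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
            speechConfig.setSpeechSynthesisVoiceName("en-US-JennyNeural");
    
            // Write the synthesized audio to a WAV file instead of the default speaker.
            AudioConfig audioConfig = AudioConfig.fromWavFileOutput("output.wav");
            SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
    
            SpeechSynthesisResult result = speechSynthesizer.SpeakTextAsync("I'm excited to try text to speech").get();
            System.out.println("Synthesis reason: " + result.getReason());
    
            speechSynthesizer.close();
            System.exit(0);
        }
    }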

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (npm) | Additional Samples on GitHub | Library source code

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

Before you can do anything, you need to install the Speech SDK for JavaScript. If you just want the package name to install, run npm install microsoft-cognitiveservices-speech-sdk. For guided installation instructions, see Set up the development environment.

Synthesize to file output

Follow these steps to create a new console application for speech synthesis.

  1. Open a command prompt where you want the new project, and create a new file named SpeechSynthesis.js.

  2. Install the Speech SDK for JavaScript:

    npm install microsoft-cognitiveservices-speech-sdk
    
  3. Copy the following code into SpeechSynthesis.js:

    (function() {
    
        "use strict";
    
        var sdk = require("microsoft-cognitiveservices-speech-sdk");
        var readline = require("readline");
    
        var key = "YourSubscriptionKey";
        var region = "YourServiceRegion";
        var audioFile = "YourAudioFile.wav";
    
        const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
        const audioConfig = sdk.AudioConfig.fromAudioFileOutput(audioFile);
    
        // The language of the voice that speaks.
        speechConfig.speechSynthesisVoiceName = "en-US-JennyNeural"; 
    
        // Create the speech synthesizer.
        var synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);
    
        var rl = readline.createInterface({
          input: process.stdin,
          output: process.stdout
        });
    
        rl.question("Enter some text that you want to speak >\n> ", function (text) {
          rl.close();
          // Start the synthesizer and wait for a result.
          synthesizer.speakTextAsync(text,
              function (result) {
            if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
              console.log("synthesis finished.");
            } else {
              console.error("Speech synthesis canceled, " + result.errorDetails +
                  "\nDid you set the speech resource key and region values?");
            }
            synthesizer.close();
            synthesizer = null;
          },
              function (err) {
            console.trace("err - " + err);
            synthesizer.close();
            synthesizer = null;
          });
          console.log("Now synthesizing to: " + audioFile);
        });
    }());
    
  4. In SpeechSynthesis.js, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region. Optionally you can rename YourAudioFile.wav to another output filename.

  5. To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.

Run your new console application to start speech synthesis to a file:

node SpeechSynthesis.js

The provided text should be output to an audio file:

Enter some text that you want to speak >
> I'm excited to try text to speech
Now synthesizing to: YourAudioFile.wav
synthesis finished.

This quickstart uses the speakTextAsync operation to synthesize a short block of text that you enter. You can also get text from files as described in the how-to guides.
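
For example, here's a minimal sketch, assuming Node.js and the same key and region placeholders, that passes null instead of an audio config so the synthesized audio stays in memory on the result, and then writes the bytes out itself:

    (function() {
        "use strict";
    
        var sdk = require("microsoft-cognitiveservices-speech-sdk");
        var fs = require("fs");
    
        const speechConfig = sdk.SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
        speechConfig.speechSynthesisVoiceName = "en-US-JennyNeural";
    
        // Passing null for the audio config keeps the synthesized audio in memory on the result.
        var synthesizer = new sdk.SpeechSynthesizer(speechConfig, null);
    
        synthesizer.speakTextAsync("I'm excited to try text to speech",
            function (result) {
                if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
                    // result.audioData is an ArrayBuffer containing the synthesized audio.
                    fs.writeFileSync("output.wav", Buffer.from(result.audioData));
                    console.log("Wrote output.wav");
                } else {
                    console.error("Speech synthesis canceled, " + result.errorDetails);
                }
                synthesizer.close();
            },
            function (err) {
                console.trace("err - " + err);
                synthesizer.close();
            });
    }());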

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (Download) | Additional Samples on GitHub

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK for Objective-C is distributed as a framework bundle. The framework supports both Objective-C and Swift on both iOS and macOS.

The Speech SDK can be used in Xcode projects as a CocoaPod, or downloaded directly here and linked manually. This guide uses a CocoaPod. Install the CocoaPod dependency manager as described in its installation instructions.

Synthesize to speaker output

Follow these steps to synthesize speech in a macOS application.

  1. Clone the Azure-Samples/cognitive-services-speech-sdk repository to get the Synthesize audio in Objective-C on macOS using the Speech SDK sample project. The repository also has iOS samples.

  2. Navigate to the directory of the downloaded sample app (helloworld) in a terminal.

  3. Run the command pod install. This will generate a helloworld.xcworkspace Xcode workspace containing both the sample app and the Speech SDK as a dependency.

  4. Open the helloworld.xcworkspace workspace in Xcode.

  5. Open the file named AppDelegate.m and locate the buttonPressed method as shown here.

    - (void)buttonPressed:(NSButton *)button {
        // Creates an instance of a speech config with specified subscription key and service region.
        // Replace with your own subscription key and service region (e.g., "westus").
        NSString *speechKey = @"YourSubscriptionKey";
        NSString *serviceRegion = @"YourServiceRegion";
    
        SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:speechKey region:serviceRegion];
        speechConfig.speechSynthesisVoiceName = @"en-US-JennyNeural";
        SPXSpeechSynthesizer *speechSynthesizer = [[SPXSpeechSynthesizer alloc] init:speechConfig];
    
        NSLog(@"Start synthesizing...");
    
        SPXSpeechSynthesisResult *speechResult = [speechSynthesizer speakText:[self.textField stringValue]];
    
        // Checks result.
        if (SPXResultReason_Canceled == speechResult.reason) {
            SPXSpeechSynthesisCancellationDetails *details = [[SPXSpeechSynthesisCancellationDetails alloc] initFromCanceledSynthesisResult:speechResult];
            NSLog(@"Speech synthesis was canceled: %@. Did you set the speech resource key and region values?", details.errorDetails);
        } else if (SPXResultReason_SynthesizingAudioCompleted == speechResult.reason) {
            NSLog(@"Speech synthesis was completed");
        } else {
            NSLog(@"There was an error.");
        }
    }
    
  6. In AppDelegate.m, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  7. Optionally in AppDelegate.m, include a speech synthesis voice name as shown here:

    speechConfig.speechSynthesisVoiceName = @"en-US-JennyNeural";
    
  8. To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.

  9. Make the debug output visible (View > Debug Area > Activate Console).

  10. Build and run the example code by selecting Product -> Run from the menu or selecting the Play button.

After you input some text and select the button in the app, you should hear the synthesized audio played.

This quickstart uses the speakText operation to synthesize a short block of text that you enter. You can also get text from files as described in the how-to guides.
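
If you need more control over the voice, rate, or pronunciation than plain text allows, the synthesizer can also accept Speech Synthesis Markup Language (SSML). Here's a minimal, hedged sketch of swapping the speakText: call in buttonPressed for speakSsml: (the SSML string itself is only an illustrative example):

    NSString *ssml = @"<speak version='1.0' xml:lang='en-US'>"
                      "<voice name='en-US-JennyNeural'>"
                      "I'm excited to try text to speech"
                      "</voice></speak>";
    
    // speakSsml: takes markup instead of plain text; the result handling
    // stays the same as in the speakText: example above.
    SPXSpeechSynthesisResult *speechResult = [speechSynthesizer speakSsml:ssml];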

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (Download) | Additional Samples on GitHub

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK for Swift is distributed as a framework bundle. The framework supports both Objective-C and Swift on both iOS and macOS.

The Speech SDK can be used in Xcode projects as a CocoaPod, or downloaded directly here and linked manually. This guide uses a CocoaPod. Install the CocoaPod dependency manager as described in its installation instructions.

Synthesize to speaker output

Follow these steps to synthesize speech in a macOS application.

  1. Clone the Azure-Samples/cognitive-services-speech-sdk repository to get the Synthesize audio in Swift on macOS using the Speech SDK sample project. The repository also has iOS samples.

  2. Navigate to the directory of the downloaded sample app (helloworld) in a terminal.

  3. Run the command pod install. This will generate a helloworld.xcworkspace Xcode workspace containing both the sample app and the Speech SDK as a dependency.

  4. Open the helloworld.xcworkspace workspace in Xcode.

  5. Open the file named AppDelegate.swift and locate the applicationDidFinishLaunching and synthesize methods as shown here.

    import Cocoa
    
    @NSApplicationMain
    class AppDelegate: NSObject, NSApplicationDelegate, NSTextFieldDelegate {
        var textField: NSTextField!
        var synthesisButton: NSButton!
    
        var inputText: String!
    
        var sub: String!
        var region: String!
    
        @IBOutlet weak var window: NSWindow!
    
        func applicationDidFinishLaunching(_ aNotification: Notification) {
            print("loading")
            // load subscription information
            sub = "YourSubscriptionKey"
            region = "YourServiceRegion"
    
            inputText = ""
    
            textField = NSTextField(frame: NSRect(x: 100, y: 200, width: 200, height: 50))
            textField.textColor = NSColor.black
            textField.lineBreakMode = .byWordWrapping
    
            textField.placeholderString = "Type something to synthesize."
            textField.delegate = self
    
            self.window.contentView?.addSubview(textField)
    
            synthesisButton = NSButton(frame: NSRect(x: 100, y: 100, width: 200, height: 30))
            synthesisButton.title = "Synthesize"
            synthesisButton.target = self
            synthesisButton.action = #selector(synthesisButtonClicked)
            self.window.contentView?.addSubview(synthesisButton)
        }
    
        @objc func synthesisButtonClicked() {
            DispatchQueue.global(qos: .userInitiated).async {
                self.synthesize()
            }
        }
    
        func synthesize() {
            var speechConfig: SPXSpeechConfiguration?
            do {
                try speechConfig = SPXSpeechConfiguration(subscription: sub, region: region)
            } catch {
                print("error \(error) happened")
                speechConfig = nil
            }
    
            speechConfig?.speechSynthesisVoiceName = "en-US-JennyNeural";
    
            let synthesizer = try! SPXSpeechSynthesizer(speechConfig!)
            let result = try! synthesizer.speakText(inputText)
            if result.reason == SPXResultReason.canceled
            {
                let cancellationDetails = try! SPXSpeechSynthesisCancellationDetails(fromCanceledSynthesisResult: result)
                print("cancelled, error code: \(cancellationDetails.errorCode) detail: \(cancellationDetails.errorDetails!) ")
                print("Did you set the speech resource key and region values?");
                return
            }
        }
    
        func controlTextDidChange(_ obj: Notification) {
            let changedTextField = obj.object as! NSTextField
            inputText = changedTextField.stringValue
        }
    }
    
  6. In AppDelegate.swift, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  7. Optionally in AppDelegate.swift, include a speech synthesis voice name as shown here:

    speechConfig?.speechSynthesisVoiceName = "en-US-JennyNeural";
    
  8. To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.

  9. Make the debug output visible by selecting View > Debug Area > Activate Console.

  10. Build and run the example code by selecting Product -> Run from the menu or selecting the Play button.

After you input some text and select the button in the app, you should hear the synthesized audio played.

This quickstart uses the speakText operation to synthesize a short block of text that you enter. You can also get text from files as described in the how-to guides.
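
If you want more control over the voice or prosody than plain text allows, the synthesizer can also accept Speech Synthesis Markup Language (SSML). Here's a minimal, hedged sketch of replacing the speakText call in the synthesize method (the SSML string is only an illustrative example):

    let ssml = """
    <speak version='1.0' xml:lang='en-US'>
        <voice name='en-US-JennyNeural'>
            I'm excited to try text to speech
        </voice>
    </speak>
    """
    
    // speakSsml takes markup instead of plain text; cancellation handling
    // stays the same as in the speakText example above.
    let result = try! synthesizer.speakSsml(ssml)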

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Reference documentation | Package (PyPI) | Additional Samples on GitHub

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK for Python is available as a Python Package Index (PyPI) module. The Speech SDK for Python is compatible with Windows, Linux, and macOS.

Install a version of Python from 3.7 to 3.10. First check the platform-specific installation instructions for any more requirements.

Synthesize to speaker output

Follow these steps to create a new console application.

  1. Open a command prompt where you want the new project, and create a new file named speech_synthesis.py.

  2. Run this command to install the Speech SDK:

    pip install azure-cognitiveservices-speech
    
  3. Copy the following code into speech_synthesis.py:

    import azure.cognitiveservices.speech as speechsdk
    
    speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="YourServiceRegion")
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    
    # The language of the voice that speaks.
    speech_config.speech_synthesis_voice_name='en-US-JennyNeural'
    
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    
    # Get text from the console and synthesize to the default speaker.
    print("Enter some text that you want to speak >")
    text = input()
    
    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
    
    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text [{}]".format(text))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")
    
  4. In speech_synthesis.py, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.

  5. To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.

Run your new console application to start speech synthesis to the default speaker.

python speech_synthesis.py

Enter some text that you want to speak. For example, type "I'm excited to try text to speech." Press the Enter key to hear the synthesized speech.

Enter some text that you want to speak > 
I'm excited to try text to speech

This quickstart uses the speak_text_async operation to synthesize a short block of text that you enter. You can also get text from files as described in the how-to guides.
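
For example, here's a minimal sketch, assuming the same key and region placeholders and an arbitrary output.wav file name, that writes the synthesized audio to a WAV file instead of the default speaker:

    import azure.cognitiveservices.speech as speechsdk
    
    speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="YourServiceRegion")
    speech_config.speech_synthesis_voice_name = 'en-US-JennyNeural'
    
    # Write the synthesized audio to a WAV file instead of the default speaker.
    audio_config = speechsdk.audio.AudioOutputConfig(filename="output.wav")
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    
    speech_synthesis_result = speech_synthesizer.speak_text_async("I'm excited to try text to speech").get()
    print("Synthesis reason: {}".format(speech_synthesis_result.reason))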

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Text-to-speech REST API reference | Additional Samples on GitHub

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Synthesize to a file

At a command prompt, run the following cURL command. Replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region. Optionally, you can rename output.mp3 to another output file name.

key="YourSubscriptionKey"
region="YourServiceRegion"

curl --location --request POST "https://$region.tts.speech.microsoft.com/cognitiveservices/v1" \
--header "Ocp-Apim-Subscription-Key: $key" \
--header 'Content-Type: application/ssml+xml' \
--header 'X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3' \
--header 'User-Agent: curl' \
--data-raw '<speak version='\''1.0'\'' xml:lang='\''en-US'\''>
    <voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-JennyNeural'\''>
        my voice is my passport verify me
    </voice>
</speak>' > output.mp3

The provided text should be output to an audio file named output.mp3.

To change the speech synthesis language, replace en-US-JennyNeural with another supported voice. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set es-ES-ElviraNeural, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.
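
To see which voices are available in your region, you can query the voices list endpoint of the same service. A minimal sketch, assuming the same key and region variables as in the command above:

curl --location --request GET "https://$region.tts.speech.microsoft.com/cognitiveservices/voices/list" \
--header "Ocp-Apim-Subscription-Key: $key"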

For more information, see Text-to-speech REST API.

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

In this quickstart, you run an application that does text to speech synthesis.

Tip

To try the Speech service without writing any code, create a project in Speech Studio.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Speech resource in the Azure portal. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Set up the environment

Follow these steps and see the Speech CLI quickstart for additional requirements for your platform.

  1. Install the Speech CLI via the .NET CLI by entering this command:

    dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
    
  2. Configure your Speech resource key and region, by running the following commands. Replace SUBSCRIPTION-KEY with your Speech resource key, and replace REGION with your Speech resource region:

    spx config @key --set SUBSCRIPTION-KEY
    spx config @region --set REGION
    

Synthesize to speaker output

Run the following command for speech synthesis to the default speaker output. You can modify the text to be synthesized and the voice.

 spx synthesize --text "I'm excited to try text to speech" --voice "en-US-JennyNeural"

If you don't set a voice name, the default voice for en-US will speak. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set --voice "es-ES-ElviraNeural", the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.
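
You can also write the synthesized audio to a file instead of the speaker. A minimal sketch, assuming greetings.wav as an arbitrary output file name:

spx synthesize --text "I'm excited to try text to speech" --voice "en-US-JennyNeural" --audio output greetings.wav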

Run this command for information about additional speech synthesis options such as file input and output:

spx help synthesize

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.

Next steps