Text-to-speech REST API

The Speech Services allow you to convert text into synthesized speech and get a list of supported voices for a region using a set of REST APIs. Each available endpoint is associated with a region. A subscription key for the endpoint/region you plan to use is required.

The text-to-speech REST API supports neural and standard text-to-speech voices, each of which supports a specific language and dialect, identified by locale.

Important

Costs vary for standard, custom, and neural voices. For more information, see Pricing.

Before using this API, understand:

  • The text-to-speech REST API requires an Authorization header. This means that you need to complete a token exchange to access the service. For more information, see Authentication.

Authentication

Each request requires an authorization header. This table illustrates which headers are supported for each service:

Supported authorization headers | Speech-to-text | Text-to-speech
Ocp-Apim-Subscription-Key | Yes | No
Authorization: Bearer | Yes | Yes

When using the Ocp-Apim-Subscription-Key header, you're only required to provide your subscription key. For example:

'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY'

When using the Authorization: Bearer header, you're required to make a request to the issueToken endpoint. In this request, you exchange your subscription key for an access token that's valid for 10 minutes. The next sections show how to get a token and how to use it.

How to get an access token

To get an access token, you'll need to make a request to the issueToken endpoint using the Ocp-Apim-Subscription-Key and your subscription key.

These regions and endpoints are supported:

Region Token service endpoint
Australia East https://australiaeast.api.cognitive.microsoft.com/sts/v1.0/issueToken
Canada Central https://canadacentral.api.cognitive.microsoft.com/sts/v1.0/issueToken
Central US https://centralus.api.cognitive.microsoft.com/sts/v1.0/issueToken
East Asia https://eastasia.api.cognitive.microsoft.com/sts/v1.0/issueToken
East US https://eastus.api.cognitive.microsoft.com/sts/v1.0/issueToken
East US 2 https://eastus2.api.cognitive.microsoft.com/sts/v1.0/issueToken
France Central https://francecentral.api.cognitive.microsoft.com/sts/v1.0/issueToken
India Central https://centralindia.api.cognitive.microsoft.com/sts/v1.0/issueToken
Japan East https://japaneast.api.cognitive.microsoft.com/sts/v1.0/issueToken
Korea Central https://koreacentral.api.cognitive.microsoft.com/sts/v1.0/issueToken
North Central US https://northcentralus.api.cognitive.microsoft.com/sts/v1.0/issueToken
North Europe https://northeurope.api.cognitive.microsoft.com/sts/v1.0/issueToken
South Central US https://southcentralus.api.cognitive.microsoft.com/sts/v1.0/issueToken
Southeast Asia https://southeastasia.api.cognitive.microsoft.com/sts/v1.0/issueToken
UK South https://uksouth.api.cognitive.microsoft.com/sts/v1.0/issueToken
West Europe https://westeurope.api.cognitive.microsoft.com/sts/v1.0/issueToken
West US https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken
West US 2 https://westus2.api.cognitive.microsoft.com/sts/v1.0/issueToken
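
Every token endpoint in the table follows the same pattern, so the URL can be derived from the region identifier. The following is a minimal sketch under that assumption; the helper name token_endpoint is hypothetical. Note that the identifier is the host name prefix from the table (for example, centralindia for India Central), not the display name.

```python
# Sketch: derive the issueToken URL from a region identifier.
# Assumes every region follows the pattern shown in the table above.
TOKEN_URL_TEMPLATE = "https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"


def token_endpoint(region: str) -> str:
    """Return the issueToken URL for a region identifier such as 'westus'."""
    region = region.strip().lower()
    if not region.isalnum():
        raise ValueError(f"unexpected region identifier: {region!r}")
    return TOKEN_URL_TEMPLATE.format(region=region)


print(token_endpoint("westus"))
```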

Use these samples to create your access token request.

HTTP sample

This example is a simple HTTP request to get a token. Replace YOUR_SUBSCRIPTION_KEY with your Speech Service subscription key. If your subscription isn't in the West US region, replace the Host header with your region's host name.

POST /sts/v1.0/issueToken HTTP/1.1
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
Host: westus.api.cognitive.microsoft.com
Content-type: application/x-www-form-urlencoded
Content-Length: 0

The body of the response contains the access token in JSON Web Token (JWT) format.

PowerShell sample

This example is a simple PowerShell script to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your Speech Service subscription key. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to West US.

$FetchTokenHeader = @{
  'Content-type'='application/x-www-form-urlencoded';
  'Content-Length'= '0';
  'Ocp-Apim-Subscription-Key' = 'YOUR_SUBSCRIPTION_KEY'
}

$OAuthToken = Invoke-RestMethod -Method POST -Uri https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken `
 -Headers $FetchTokenHeader

# show the token received
$OAuthToken

cURL sample

cURL is a command-line tool available in Linux (and in the Windows Subsystem for Linux). This cURL command illustrates how to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your Speech Service subscription key. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to West US.

curl -v -X POST \
 "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken" \
 -H "Content-type: application/x-www-form-urlencoded" \
 -H "Content-Length: 0" \
 -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY"

C# sample

This C# class illustrates how to get an access token. Pass your Speech Service subscription key when you instantiate the class. If your subscription isn't in the West US region, change the value of FetchTokenUri to match the region for your subscription.

using System;
using System.Net.Http;
using System.Threading.Tasks;

public class Authentication
{
    public static readonly string FetchTokenUri =
        "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken";
    private string subscriptionKey;
    private string token;

    public Authentication(string subscriptionKey)
    {
        this.subscriptionKey = subscriptionKey;
        this.token = FetchTokenAsync(FetchTokenUri, subscriptionKey).Result;
    }

    public string GetAccessToken()
    {
        return this.token;
    }

    private async Task<string> FetchTokenAsync(string fetchUri, string subscriptionKey)
    {
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            UriBuilder uriBuilder = new UriBuilder(fetchUri);

            var result = await client.PostAsync(uriBuilder.Uri.AbsoluteUri, null);
            Console.WriteLine("Token Uri: {0}", uriBuilder.Uri.AbsoluteUri);
            return await result.Content.ReadAsStringAsync();
        }
    }
}

Python sample

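This Python function illustrates how to get an access token. Replace REPLACE_WITH_YOUR_KEY with your Speech Service subscription key. This example is currently set to West US.

# The requests module must be installed.
# Run pip install requests if necessary.
import requests

subscription_key = 'REPLACE_WITH_YOUR_KEY'


def get_token(subscription_key):
    # Exchange the subscription key for an access token.
    fetch_token_url = 'https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken'
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key
    }
    response = requests.post(fetch_token_url, headers=headers)
    # Raise an exception on 4xx/5xx instead of returning an error body as a token.
    response.raise_for_status()
    access_token = str(response.text)
    return access_token


# show the token received
print(get_token(subscription_key))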

How to use an access token

The access token should be sent to the service as the Authorization: Bearer <TOKEN> header. Each access token is valid for 10 minutes. You can get a new token at any time; however, to minimize network traffic and latency, we recommend using the same token for nine minutes.
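
One way to follow the nine-minute recommendation is to cache the token and refresh it only when it is older than that. This is a sketch, not part of the service's API: the class name CachedToken and its parameters are hypothetical, and the fetch argument stands for any function that returns a fresh token (such as a call to the issueToken endpoint).

```python
import time
from typing import Callable, Optional


class CachedToken:
    """Reuse one access token for up to nine minutes, then fetch a new one."""

    def __init__(self, fetch: Callable[[], str],
                 max_age_seconds: float = 9 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # function that requests a new token
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token

    def authorization_header(self) -> dict:
        return {"Authorization": "Bearer " + self.get()}


# Demonstration with a stand-in fetch function and a controllable clock.
calls = []
now = [0.0]


def fake_fetch() -> str:
    calls.append(1)
    return "token-%d" % len(calls)


cache = CachedToken(fake_fetch, clock=lambda: now[0])
first = cache.get()
now[0] = 8 * 60      # still within nine minutes: token is reused
reused = cache.get()
now[0] = 9 * 60      # nine minutes old: token is refreshed
refreshed = cache.get()
print(first, reused, refreshed)
```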

Here's a sample HTTP request to the text-to-speech REST API:

POST /cognitiveservices/v1 HTTP/1.1
Authorization: Bearer YOUR_ACCESS_TOKEN
Host: westus.tts.speech.microsoft.com
Content-type: application/ssml+xml
Content-Length: 199
Connection: Keep-Alive

// Message body here...

Get a list of voices

The voices/list endpoint allows you to get a full list of voices for a specific region/endpoint.

Regions and endpoints

Region Endpoint
Australia East https://australiaeast.tts.speech.microsoft.com/cognitiveservices/voices/list
Brazil South https://brazilsouth.tts.speech.microsoft.com/cognitiveservices/voices/list
Canada Central https://canadacentral.tts.speech.microsoft.com/cognitiveservices/voices/list
Central US https://centralus.tts.speech.microsoft.com/cognitiveservices/voices/list
East Asia https://eastasia.tts.speech.microsoft.com/cognitiveservices/voices/list
East US https://eastus.tts.speech.microsoft.com/cognitiveservices/voices/list
East US 2 https://eastus2.tts.speech.microsoft.com/cognitiveservices/voices/list
France Central https://francecentral.tts.speech.microsoft.com/cognitiveservices/voices/list
India Central https://centralindia.tts.speech.microsoft.com/cognitiveservices/voices/list
Japan East https://japaneast.tts.speech.microsoft.com/cognitiveservices/voices/list
Korea Central https://koreacentral.tts.speech.microsoft.com/cognitiveservices/voices/list
North Central US https://northcentralus.tts.speech.microsoft.com/cognitiveservices/voices/list
North Europe https://northeurope.tts.speech.microsoft.com/cognitiveservices/voices/list
South Central US https://southcentralus.tts.speech.microsoft.com/cognitiveservices/voices/list
Southeast Asia https://southeastasia.tts.speech.microsoft.com/cognitiveservices/voices/list
UK South https://uksouth.tts.speech.microsoft.com/cognitiveservices/voices/list
West Europe https://westeurope.tts.speech.microsoft.com/cognitiveservices/voices/list
West US https://westus.tts.speech.microsoft.com/cognitiveservices/voices/list
West US 2 https://westus2.tts.speech.microsoft.com/cognitiveservices/voices/list

Request headers

This table lists required and optional headers for text-to-speech requests.

Header | Description | Required / Optional
Authorization | An authorization token preceded by the word Bearer. For more information, see Authentication. | Required

Request body

A body isn't required for GET requests to this endpoint.

Sample request

This request only requires an authorization header.

GET /cognitiveservices/voices/list HTTP/1.1
Host: westus.tts.speech.microsoft.com
Authorization: Bearer [Base64 access_token]
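
The same request can be assembled in Python with the standard library. This is a sketch only: the function name build_voices_list_request is hypothetical, and the request is built but not sent, so no key or network access is needed to run it.

```python
import urllib.request


def build_voices_list_request(region: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the voices/list endpoint."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


request = build_voices_list_request("westus", "YOUR_ACCESS_TOKEN")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen with a valid token would send it; the response body is the JSON list described in the next section.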

Sample response

This response has been truncated to illustrate the structure of a response.

Note

Voice availability varies by region/endpoint.

[
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (ar-EG, Hoda)",
        "ShortName": "ar-EG-Hoda",
        "Gender": "Female",
        "Locale": "ar-EG"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (ar-SA, Naayf)",
        "ShortName": "ar-SA-Naayf",
        "Gender": "Male",
        "Locale": "ar-SA"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (bg-BG, Ivan)",
        "ShortName": "bg-BG-Ivan",
        "Gender": "Male",
        "Locale": "bg-BG"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (ca-ES, HerenaRUS)",
        "ShortName": "ca-ES-HerenaRUS",
        "Gender": "Female",
        "Locale": "ca-ES"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (cs-CZ, Jakub)",
        "ShortName": "cs-CZ-Jakub",
        "Gender": "Male",
        "Locale": "cs-CZ"
    },

    ...

]
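
Because the response is a JSON array of voice objects, it can be grouped by locale with a few lines of Python. The sketch below uses two entries copied from the truncated sample above; the function name voices_by_locale is hypothetical.

```python
import json
from collections import defaultdict

# Two entries copied from the truncated sample response above.
sample_body = """
[
    {"Name": "Microsoft Server Speech Text to Speech Voice (ar-EG, Hoda)",
     "ShortName": "ar-EG-Hoda", "Gender": "Female", "Locale": "ar-EG"},
    {"Name": "Microsoft Server Speech Text to Speech Voice (cs-CZ, Jakub)",
     "ShortName": "cs-CZ-Jakub", "Gender": "Male", "Locale": "cs-CZ"}
]
"""


def voices_by_locale(body: str) -> dict:
    """Group voice short names by their Locale field."""
    grouped = defaultdict(list)
    for voice in json.loads(body):
        grouped[voice["Locale"]].append(voice["ShortName"])
    return dict(grouped)


voices = voices_by_locale(sample_body)
print(voices)
```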

HTTP status codes

The HTTP status code for each response indicates success or common errors.

HTTP status code | Description | Possible reason
200 | OK | The request was successful.
400 | Bad Request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common issue is a header that is too long.
401 | Unauthorized | The request is not authorized. Check to make sure your subscription key or token is valid and in the correct region.
429 | Too Many Requests | You have exceeded the quota or rate of requests allowed for your subscription.
502 | Bad Gateway | Network or server-side issue. May also indicate invalid headers.

Convert text-to-speech

The v1 endpoint allows you to convert text to speech using Speech Synthesis Markup Language (SSML).

Regions and endpoints

These regions are supported for text-to-speech using the REST API. Make sure that you select the endpoint that matches your subscription region.

Standard and neural voices

Use this table to determine availability of standard and neural voices by region/endpoint:

Region Endpoint Standard Voices Neural Voices
Australia East https://australiaeast.tts.speech.microsoft.com/cognitiveservices/v1 Yes Yes
Canada Central https://canadacentral.tts.speech.microsoft.com/cognitiveservices/v1 Yes Yes
Central US https://centralus.tts.speech.microsoft.com/cognitiveservices/v1 Yes No
East Asia https://eastasia.tts.speech.microsoft.com/cognitiveservices/v1 Yes No
East US https://eastus.tts.speech.microsoft.com/cognitiveservices/v1 Yes Yes
East US 2 https://eastus2.tts.speech.microsoft.com/cognitiveservices/v1 Yes No
France Central https://francecentral.tts.speech.microsoft.com/cognitiveservices/v1 Yes No
India Central https://centralindia.tts.speech.microsoft.com/cognitiveservices/v1 Yes Yes
Japan East https://japaneast.tts.speech.microsoft.com/cognitiveservices/v1 Yes No
Korea Central https://koreacentral.tts.speech.microsoft.com/cognitiveservices/v1 Yes No
North Central US https://northcentralus.tts.speech.microsoft.com/cognitiveservices/v1 Yes No
North Europe https://northeurope.tts.speech.microsoft.com/cognitiveservices/v1 Yes No
South Central US https://southcentralus.tts.speech.microsoft.com/cognitiveservices/v1 Yes Yes
Southeast Asia https://southeastasia.tts.speech.microsoft.com/cognitiveservices/v1 Yes Yes
UK South https://uksouth.tts.speech.microsoft.com/cognitiveservices/v1 Yes Yes
West Europe https://westeurope.tts.speech.microsoft.com/cognitiveservices/v1 Yes Yes
West US https://westus.tts.speech.microsoft.com/cognitiveservices/v1 Yes No
West US 2 https://westus2.tts.speech.microsoft.com/cognitiveservices/v1 Yes Yes
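
The table can be encoded as a small lookup so that a client fails early when it asks for a neural voice in a region that lists none. This is a sketch: the region sets are transcribed from the table above and may go out of date, and the function name synthesis_endpoint is hypothetical.

```python
# Region identifiers transcribed from the table above (host name prefixes).
NEURAL_REGIONS = frozenset({
    "australiaeast", "canadacentral", "eastus", "centralindia",
    "southcentralus", "southeastasia", "uksouth", "westeurope", "westus2",
})
STANDARD_ONLY_REGIONS = frozenset({
    "centralus", "eastasia", "eastus2", "francecentral", "japaneast",
    "koreacentral", "northcentralus", "northeurope", "westus",
})


def synthesis_endpoint(region: str, neural: bool = False) -> str:
    """Return the v1 endpoint for a region, checking neural availability."""
    if region not in NEURAL_REGIONS | STANDARD_ONLY_REGIONS:
        raise ValueError(f"region not in the table: {region!r}")
    if neural and region not in NEURAL_REGIONS:
        raise ValueError(f"neural voices are not listed for {region!r}")
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"


print(synthesis_endpoint("westus2", neural=True))
```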

Custom voices

If you've created a custom voice font, use the endpoint that you've created. You can also use the endpoints listed below, replacing {deploymentId} with the deployment ID for your voice model.

Region Endpoint
Australia East https://australiaeast.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Canada Central https://canadacentral.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Central US https://centralus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
East Asia https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
East US https://eastus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
East US 2 https://eastus2.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
France Central https://francecentral.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
India Central https://centralindia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Japan East https://japaneast.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Korea Central https://koreacentral.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
North Central US https://northcentralus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
North Europe https://northeurope.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
South Central US https://southcentralus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Southeast Asia https://southeastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
UK South https://uksouth.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
West Europe https://westeurope.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
West US https://westus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
West US 2 https://westus2.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
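
Substituting the deployment ID is easiest to get right when the query string is built rather than pasted together by hand. A minimal sketch, assuming the URL pattern in the table above; the function name custom_voice_endpoint and the sample deployment ID are hypothetical.

```python
from urllib.parse import urlencode


def custom_voice_endpoint(region: str, deployment_id: str) -> str:
    """Build the custom voice endpoint URL for a region and deployment ID."""
    query = urlencode({"deploymentId": deployment_id})
    return f"https://{region}.voice.speech.microsoft.com/cognitiveservices/v1?{query}"


# "1234-abcd" is a made-up deployment ID used only for illustration.
print(custom_voice_endpoint("westus", "1234-abcd"))
```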

Request headers

This table lists required and optional headers for text-to-speech requests.

Header | Description | Required / Optional
Authorization | An authorization token preceded by the word Bearer. For more information, see Authentication. | Required
Content-Type | Specifies the content type for the provided text. Accepted value: application/ssml+xml. | Required
X-Microsoft-OutputFormat | Specifies the audio output format. For a complete list of accepted values, see audio outputs. | Required
User-Agent | The application name. The value provided must be less than 255 characters. | Required

Audio outputs

The X-Microsoft-OutputFormat header of each request specifies one of the following supported audio formats. Each format incorporates a bitrate and encoding type. The Speech Services support 24 kHz, 16 kHz, and 8 kHz audio outputs.

raw-16khz-16bit-mono-pcm
raw-8khz-8bit-mono-mulaw
riff-8khz-8bit-mono-alaw
riff-8khz-8bit-mono-mulaw
riff-16khz-16bit-mono-pcm
audio-16khz-128kbitrate-mono-mp3
audio-16khz-64kbitrate-mono-mp3
audio-16khz-32kbitrate-mono-mp3
raw-24khz-16bit-mono-pcm
riff-24khz-16bit-mono-pcm
audio-24khz-160kbitrate-mono-mp3
audio-24khz-96kbitrate-mono-mp3
audio-24khz-48kbitrate-mono-mp3

Note

If your selected voice and output format have different bit rates, the audio is resampled as necessary. However, 24 kHz voices do not support audio-16khz-16kbps-mono-siren and riff-16khz-16kbps-mono-siren output formats.
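
The format names listed above follow a regular container-rate-...-encoding pattern, so a client can read the sample rate and encoding out of the name, for example to configure an audio player. The sketch below assumes that naming pattern holds for every value in the list; the names OutputFormat and parse_output_format are hypothetical.

```python
from typing import NamedTuple


class OutputFormat(NamedTuple):
    container: str       # first field: "raw", "riff", or "audio"
    sample_rate_hz: int  # second field, converted from kHz
    encoding: str        # last field: "pcm", "mulaw", "alaw", or "mp3"


def parse_output_format(name: str) -> OutputFormat:
    """Split an X-Microsoft-OutputFormat value into its main parts."""
    parts = name.split("-")
    if len(parts) < 3 or not parts[1].endswith("khz"):
        raise ValueError(f"unrecognized output format: {name!r}")
    sample_rate_hz = int(parts[1][:-3]) * 1000
    return OutputFormat(parts[0], sample_rate_hz, parts[-1])


parsed = parse_output_format("riff-24khz-16bit-mono-pcm")
print(parsed)
```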

Request body

The body of each POST request is sent as Speech Synthesis Markup Language (SSML). SSML allows you to choose the voice and language of the synthesized speech returned by the text-to-speech service. For a complete list of supported voices, see language support.

Note

If using a custom voice, the body of a request can be sent as plain text (ASCII or UTF-8).

Sample request

This HTTP request uses SSML to specify the voice and language. The body cannot exceed 1,000 characters.

POST /cognitiveservices/v1 HTTP/1.1
X-Microsoft-OutputFormat: raw-16khz-16bit-mono-pcm
Content-Type: application/ssml+xml
Host: westus.tts.speech.microsoft.com
Content-Length: 225
Authorization: Bearer [Base64 access_token]

<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female'
    name='en-US-JessaRUS'>
        Microsoft Speech Service Text-to-Speech API
</voice></speak>
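
The sample request above can be assembled in Python as follows. This is a sketch under stated assumptions: the function names and the User-Agent value example-tts-client are hypothetical, the request is built but not sent, and the text is XML-escaped so that characters such as & do not break the SSML.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice_name: str = "en-US-JessaRUS",
               lang: str = "en-US", gender: str = "Female") -> str:
    """Wrap plain text in the speak/voice structure used by the sample."""
    return (
        "<speak version='1.0' xml:lang={lang}>"
        "<voice xml:lang={lang} xml:gender={gender} name={name}>"
        "{text}</voice></speak>"
    ).format(lang=quoteattr(lang), gender=quoteattr(gender),
             name=quoteattr(voice_name), text=escape(text))


def build_synthesis_request(region: str, access_token: str, ssml: str,
                            output_format: str = "raw-16khz-16bit-mono-pcm",
                            user_agent: str = "example-tts-client"):
    """Build (but do not send) the POST request with the required headers."""
    return urllib.request.Request(
        f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        data=ssml.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": output_format,
            "User-Agent": user_agent,
        },
        method="POST",
    )


ssml = build_ssml("Tom & Jerry say hello")
request = build_synthesis_request("westus", "YOUR_ACCESS_TOKEN", ssml)
print(ssml)
```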

See our quickstarts for language-specific examples.

HTTP status codes

The HTTP status code for each response indicates success or common errors.

HTTP status code | Description | Possible reason
200 | OK | The request was successful; the response body is an audio file.
400 | Bad Request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common issue is a header that is too long.
401 | Unauthorized | The request is not authorized. Check to make sure your subscription key or token is valid and in the correct region.
413 | Request Entity Too Large | The SSML input is longer than 1024 characters.
415 | Unsupported Media Type | It's possible that the wrong Content-Type was provided. Content-Type should be set to application/ssml+xml.
429 | Too Many Requests | You have exceeded the quota or rate of requests allowed for your subscription.
502 | Bad Gateway | Network or server-side issue. May also indicate invalid headers.
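
A client can turn the table above into a lookup that decides what to do with each response. The sketch below paraphrases the reasons from the table; the retry flag is an assumption of this example (treating 429 and 502 as worth retrying after a delay), not documented service behavior, and the names StatusAdvice and advise are hypothetical.

```python
from typing import NamedTuple


class StatusAdvice(NamedTuple):
    reason: str
    retry: bool  # assumption: whether retrying after a delay may help


ADVICE = {
    200: StatusAdvice("Success; the body contains audio.", False),
    400: StatusAdvice("Missing or invalid parameter, or a header that is too long.", False),
    401: StatusAdvice("Subscription key or token is invalid or for the wrong region.", False),
    413: StatusAdvice("The SSML input is too long.", False),
    415: StatusAdvice("Content-Type should be application/ssml+xml.", False),
    429: StatusAdvice("Quota or request rate exceeded.", True),
    502: StatusAdvice("Network or server-side issue, or invalid headers.", True),
}


def advise(status_code: int) -> StatusAdvice:
    """Look up a status code; unknown codes are reported as not listed."""
    return ADVICE.get(status_code,
                      StatusAdvice("Status code not listed in the table.", False))


print(advise(429))
```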

If the HTTP status is 200 OK, the body of the response contains an audio file in the requested format. This file can be played as it's transferred, saved to a buffer, or saved to a file.
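
Saving the audio as it is transferred amounts to copying the response body to a file in chunks. A minimal sketch: the function name save_audio is hypothetical, and an in-memory stream stands in for the HTTP response so the example runs without network access. Any object with a read(size) method, such as the one returned by urllib.request.urlopen, could be passed instead.

```python
import io
import os
import tempfile


def save_audio(stream, path: str, chunk_size: int = 8192) -> int:
    """Copy a binary stream to a file in chunks; return the bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Stand-in for a response body: 20,000 bytes of placeholder data.
fake_response = io.BytesIO(b"\x00\x01" * 10000)
target = os.path.join(tempfile.mkdtemp(), "speech.pcm")
bytes_written = save_audio(fake_response, target)
print(bytes_written, target)
```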

Next steps