The Speech service allows you to convert text into synthesized speech and get a list of supported voices for a region by using a REST API. In this article, you learn about authorization options, query options, how to structure a request, and how to interpret a response.
Tip
Use cases for the text to speech REST API are limited. Use it only in cases where you can't use the Speech SDK. For example, with the Speech SDK you can subscribe to events for more insights about the text to speech processing and results.
The text to speech REST API supports neural text to speech voices in many locales. Each available endpoint is associated with a region. An API key for the endpoint or region that you plan to use is required. Here are links to more information:
Costs vary for prebuilt neural voices (called Neural on the pricing page) and custom neural voices (called Custom Neural on the pricing page). For more information, see Speech service pricing.
Before you use the text to speech REST API, understand that you need to complete a token exchange as part of authentication to access the service. For more information, see Authentication.
Get a list of voices
You can use the tts.speech.microsoft.com/cognitiveservices/voices/list endpoint to get a full list of voices for a specific region or endpoint. Prefix the voices list endpoint with a region to get a list of voices for that region. For example, to get a list of voices for the westus region, use the https://westus.tts.speech.microsoft.com/cognitiveservices/voices/list endpoint. For a list of all supported regions, see the regions documentation.
Note
Voices and styles in preview are only available in three service regions: East US, West Europe, and Southeast Asia.
Request headers
This table lists required and optional headers for text to speech requests:
| Header | Description | Required or optional |
| --- | --- | --- |
| Ocp-Apim-Subscription-Key | Your Speech resource key. | Either this header or Authorization is required. |
| Authorization | An authorization token preceded by the word Bearer. For more information, see Authentication. | Either this header or Ocp-Apim-Subscription-Key is required. |
Request body
A body isn't required for GET requests to this endpoint.
Sample request
This request requires only an authorization header:
GET /cognitiveservices/voices/list HTTP/1.1
Host: westus.tts.speech.microsoft.com
Ocp-Apim-Subscription-Key: YOUR_RESOURCE_KEY
Here's an example curl command:
curl --location --request GET 'https://YOUR_RESOURCE_REGION.tts.speech.microsoft.com/cognitiveservices/voices/list' \
--header 'Ocp-Apim-Subscription-Key: YOUR_RESOURCE_KEY'
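The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the function names `build_voices_list_request` and `fetch_voices` are hypothetical, and the sketch assumes the region-prefixed host pattern described above.

```python
import json
import urllib.request


def build_voices_list_request(region: str, resource_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the voices list endpoint."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
    return urllib.request.Request(
        url,
        headers={"Ocp-Apim-Subscription-Key": resource_key},
        method="GET",
    )


def fetch_voices(region: str, resource_key: str):
    """Send the request and return the decoded JSON body (performs a network call)."""
    request = build_voices_list_request(region, resource_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_voices("westus", "YOUR_RESOURCE_KEY")` would issue the request. Building the request separately makes it possible to inspect the URL and headers without touching the network.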
Sample response
You should receive a response with a JSON body that includes all supported locales, voices, gender, styles, and other details. The WordsPerMinute property for each voice can be used to estimate the length of the output speech. This JSON example shows partial results to illustrate the structure of a response:
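As a rough illustration of how WordsPerMinute can be used, the following sketch estimates audio length from a whitespace word count. The function name `estimate_speech_seconds` is hypothetical, and the estimate ignores pauses, SSML prosody settings, and languages that don't separate words with spaces.

```python
def estimate_speech_seconds(text: str, words_per_minute) -> float:
    """Roughly estimate output length in seconds from a whitespace word count.

    words_per_minute may be a number or a numeric string, because the sketch
    assumes the value is taken directly from a voice entry in the JSON response.
    """
    rate = float(words_per_minute)
    if rate <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(text.split())
    return word_count * 60.0 / rate
```

For example, a 300-word passage read by a voice that reports 150 words per minute works out to about 120 seconds.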
The HTTP status code for each response indicates success or common errors.
| HTTP status code | Description | Possible reason |
| --- | --- | --- |
| 200 | OK | The request was successful. |
| 400 | Bad request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common reason is a header that's too long. |
| 401 | Unauthorized | The request isn't authorized. Make sure your resource key or token is valid and in the correct region. |
| 429 | Too many requests | You exceeded the quota or rate of requests allowed for your resource. |
| 502 | Bad gateway | There's a network or server-side problem. This status might also indicate invalid headers. |
Voices in preview are available in only these three regions: East US, West Europe, and Southeast Asia.
Custom neural voices
If you've created a custom neural voice font, use the endpoint that you've created. You can also use the following endpoints. Replace {deploymentId} with the deployment ID for your neural voice model.
The preceding regions are available for neural voice model hosting and real-time synthesis. Custom neural voice training is only available in some regions. But users can easily copy a neural voice model from these regions to other regions in the preceding list.
Long Audio API
The Long Audio API is available in multiple regions with unique endpoints:
This table lists required and optional headers for text to speech requests:
| Header | Description | Required or optional |
| --- | --- | --- |
| Authorization | An authorization token preceded by the word Bearer. For more information, see Authentication. | Required |
| Content-Type | Specifies the content type for the provided text. Accepted value: application/ssml+xml. | Required |
| X-Microsoft-OutputFormat | Specifies the audio output format. For a complete list of accepted values, see Audio outputs. | Required |
| User-Agent | The application name. The provided value must be fewer than 255 characters. | Required |
Request body
If you're using a custom neural voice, the body of a request can be sent as plain text (ASCII or UTF-8). Otherwise, the body of each POST request is sent as SSML. SSML allows you to choose the voice and language of the synthesized speech that the text to speech feature returns. For a complete list of supported voices, see Language and voice support for the Speech service.
Sample request
This HTTP request uses SSML to specify the voice and language. The audio length can't exceed 10 minutes: if the body is long enough that the resulting audio would run longer, the audio is truncated to 10 minutes.
POST /cognitiveservices/v1 HTTP/1.1
X-Microsoft-OutputFormat: riff-24khz-16bit-mono-pcm
Content-Type: application/ssml+xml
Host: westus.tts.speech.microsoft.com
Content-Length: <Length>
Authorization: Bearer [Base64 access_token]
User-Agent: <Your application name>
<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Male'
name='en-US-ChristopherNeural'>
I'm excited to try text to speech!
</voice></speak>
* For the Content-Length, you should use your own content length. In most cases, this value is calculated automatically.
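The sample request above can be sketched in Python with the standard library. This is an illustrative sketch under stated assumptions, not an official client: the helper names (`build_ssml`, `build_tts_request`, `synthesize`) and the default User-Agent value `my-tts-app` are hypothetical. The SSML mirrors the attributes used in the sample, and urllib fills in Content-Length automatically when the request is sent.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-ChristopherNeural", lang="en-US", gender="Male"):
    """Wrap plain text in SSML shaped like the sample request, escaping XML characters."""
    return (
        f"<speak version='1.0' xml:lang={quoteattr(lang)}>"
        f"<voice xml:lang={quoteattr(lang)} xml:gender={quoteattr(gender)} "
        f"name={quoteattr(voice)}>{escape(text)}</voice></speak>"
    )


def build_tts_request(region, access_token, ssml,
                      output_format="riff-24khz-16bit-mono-pcm",
                      user_agent="my-tts-app"):
    """Build (but do not send) the POST request with the four required headers."""
    if len(user_agent) >= 255:
        raise ValueError("User-Agent must be fewer than 255 characters")
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    return urllib.request.Request(
        url,
        data=ssml.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": output_format,
            "User-Agent": user_agent,
        },
        method="POST",
    )


def synthesize(region, access_token, text, out_path):
    """Send the request and save the returned audio bytes (performs a network call)."""
    request = build_tts_request(region, access_token, build_ssml(text))
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
```

Escaping the text before embedding it matters because characters such as `&` and `<` would otherwise produce invalid SSML.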
HTTP status codes
The HTTP status code for each response indicates success or common errors:
| HTTP status code | Description | Possible reason |
| --- | --- | --- |
| 200 | OK | The request was successful. The response body is an audio file. |
| 400 | Bad request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common reason is a header that's too long. |
| 401 | Unauthorized | The request isn't authorized. Make sure your Speech resource key or token is valid and in the correct region. |
| 415 | Unsupported media type | It's possible that the wrong Content-Type value was provided. Content-Type should be set to application/ssml+xml. |
| 429 | Too many requests | You exceeded the quota or rate of requests allowed for your resource. |
| 502 | Bad gateway | There's a network or server-side problem. This status might also indicate invalid headers. |
| 503 | Service Unavailable | There's a server-side problem for various reasons. |
If the HTTP status is 200 OK, the body of the response contains an audio file in the requested format. This file can be played as it's transferred, saved to a buffer, or saved to a file.
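One way a client might act on these status codes is sketched below. The hints paraphrase the table above. The choice to treat 429, 502, and 503 as retryable is an assumption made for illustration, not documented service guidance, and the names `describe_status` and `should_retry` are hypothetical.

```python
# Short hints paraphrased from the status code table for text to speech requests.
TTS_STATUS_HINTS = {
    200: "OK: the response body is an audio file.",
    400: "Bad request: a parameter is missing or invalid, or a header is too long.",
    401: "Unauthorized: check that the key or token is valid and in the correct region.",
    415: "Unsupported media type: set Content-Type to application/ssml+xml.",
    429: "Too many requests: the quota or request rate was exceeded.",
    502: "Bad gateway: network or server-side problem, or invalid headers.",
    503: "Service unavailable: server-side problem.",
}

# Assumption: these statuses may be transient, so a retry with a delay can help.
RETRYABLE_STATUSES = frozenset({429, 502, 503})


def describe_status(status_code: int) -> str:
    """Return a troubleshooting hint for a status code, or a generic fallback."""
    return TTS_STATUS_HINTS.get(status_code, f"Unexpected status code {status_code}.")


def should_retry(status_code: int) -> bool:
    """Return True when the sketch considers the status worth retrying."""
    return status_code in RETRYABLE_STATUSES
```

A caller could log `describe_status(code)` for any non-200 response and wait before retrying when `should_retry(code)` is true.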
Audio outputs
The supported streaming and nonstreaming audio formats are sent in each request as the X-Microsoft-OutputFormat header. Each format incorporates a bit rate and encoding type. The Speech service supports 48-kHz, 24-kHz, 16-kHz, and 8-kHz audio outputs. Each prebuilt neural voice model is available at 24kHz and high-fidelity 48kHz.
If you select a 48kHz output format, the high-fidelity 48kHz voice model is invoked. Sample rates other than 24kHz and 48kHz are obtained through upsampling or downsampling during synthesis. For example, 44.1kHz is downsampled from 48kHz.
If your selected voice and output format have different sample rates, the audio is resampled as necessary. You can decode the ogg-24khz-16bit-mono-opus format by using the Opus codec.
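The following sketch shows one way to read the sample rate out of an output format name and to check whether it matches one of the two rates at which prebuilt neural voice models are available. It assumes that format names embed the rate as a token such as `24khz`, which is inferred from the names shown in this article (riff-24khz-16bit-mono-pcm and ogg-24khz-16bit-mono-opus) rather than from a documented grammar. The function names are hypothetical.

```python
import re

# Assumed naming pattern: a "-<number>khz-" or "-<number>hz-" token inside the name.
_RATE_TOKEN = re.compile(r"-(\d+)(k?)hz-")

# Rates at which each prebuilt neural voice model is available, per the text above.
NATIVE_MODEL_RATES_HZ = (24000, 48000)


def sample_rate_hz(output_format: str) -> int:
    """Extract the sample rate in Hz from an X-Microsoft-OutputFormat value."""
    match = _RATE_TOKEN.search(output_format.lower())
    if match is None:
        raise ValueError(f"No sample rate found in {output_format!r}")
    value, kilo = match.groups()
    return int(value) * (1000 if kilo else 1)


def is_native_rate(output_format: str) -> bool:
    """Return True when the format's rate needs no upsampling or downsampling."""
    return sample_rate_hz(output_format) in NATIVE_MODEL_RATES_HZ
```

A check like this can help when choosing a format, since any other rate is produced by resampling.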
Authentication
Each request requires an authorization header. This table illustrates which headers are supported for each feature:
| Supported authorization header | Speech to text | Text to speech |
| --- | --- | --- |
| Ocp-Apim-Subscription-Key | Yes | Yes |
| Authorization: Bearer | Yes | Yes |
When you're using the Ocp-Apim-Subscription-Key header, only your resource key must be provided. For example:
When you're using the Authorization: Bearer header, you need to make a request to the issueToken endpoint. In this request, you exchange your resource key for an access token that's valid for 10 minutes.
Another option is to use Microsoft Entra authentication that also uses the Authorization: Bearer header, but with a token issued via Microsoft Entra ID. See Use Microsoft Entra authentication.
How to get an access token
To get an access token, you need to make a request to the issueToken endpoint by using Ocp-Apim-Subscription-Key and your resource key.
Replace <REGION_IDENTIFIER> with the identifier that matches the region of your subscription.
Use the following samples to create your access token request.
HTTP sample
This example is a simple HTTP request to get a token. Replace YOUR_SUBSCRIPTION_KEY with your resource key for the Speech service. If your subscription isn't in the West US region, replace the Host header with your region's host name.
The body of the response contains the access token in JSON Web Token (JWT) format.
PowerShell sample
This example is a simple PowerShell script to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your resource key for the Speech service. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to East US.
$FetchTokenHeader = @{
  'Content-type'='application/x-www-form-urlencoded';
  'Content-Length'= '0';
  'Ocp-Apim-Subscription-Key' = 'YOUR_SUBSCRIPTION_KEY'
}
# Exchange the resource key for an access token.
$OAuthToken = Invoke-RestMethod -Method POST -Uri https://eastus.api.cognitive.microsoft.com/sts/v1.0/issueToken -Headers $FetchTokenHeader
# show the token received
$OAuthToken
cURL sample
cURL is a command-line tool available in Linux (and in the Windows Subsystem for Linux). This cURL command illustrates how to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your resource key for the Speech service. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to West US.
C# sample
This C# class illustrates how to get an access token. Pass your resource key for the Speech service when you instantiate the class. If your subscription isn't in the East US region, change the value of FetchTokenUri to match the region for your subscription.
using System;
using System.Net.Http;
using System.Threading.Tasks;

public class Authentication
{
    public static readonly string FetchTokenUri =
        "https://eastus.api.cognitive.microsoft.com/sts/v1.0/issueToken";
    private string subscriptionKey;
    private string token;

    public Authentication(string subscriptionKey)
    {
        this.subscriptionKey = subscriptionKey;
        // Blocks until the token request completes.
        this.token = FetchTokenAsync(FetchTokenUri, subscriptionKey).Result;
    }

    public string GetAccessToken()
    {
        return this.token;
    }

    // Exchanges the resource key for an access token.
    private async Task<string> FetchTokenAsync(string fetchUri, string subscriptionKey)
    {
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            UriBuilder uriBuilder = new UriBuilder(fetchUri);
            var result = await client.PostAsync(uriBuilder.Uri.AbsoluteUri, null);
            Console.WriteLine("Token Uri: {0}", uriBuilder.Uri.AbsoluteUri);
            return await result.Content.ReadAsStringAsync();
        }
    }
}
Python sample
# The requests module must be installed.
# Run pip install requests if necessary.
import requests

subscription_key = 'REPLACE_WITH_YOUR_KEY'


def get_token(subscription_key):
    # Exchange the resource key for an access token.
    fetch_token_url = 'https://eastus.api.cognitive.microsoft.com/sts/v1.0/issueToken'
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key
    }
    response = requests.post(fetch_token_url, headers=headers)
    access_token = str(response.text)
    print(access_token)
    return access_token
How to use an access token
The access token should be sent to the service as the Authorization: Bearer <TOKEN> header. Each access token is valid for 10 minutes. You can get a new token at any time, but to minimize network traffic and latency, we recommend using the same token for nine minutes.
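The nine-minute reuse recommendation can be sketched as a small cache. This is an illustrative sketch: the `TokenCache` class is hypothetical, and `fetch_token` stands for any zero-argument function that returns a fresh token, for example a wrapper around a request to the issueToken endpoint. The clock is injectable so the refresh logic can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse an access token for up to nine minutes before fetching a new one."""

    def __init__(self,
                 fetch_token: Callable[[], str],
                 max_age_seconds: float = 9 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_token = fetch_token
        self._max_age = max_age_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached token, refreshing it once it reaches the maximum age."""
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token

    def authorization_header(self) -> dict:
        """Return the header in the Authorization: Bearer <TOKEN> form."""
        return {"Authorization": f"Bearer {self.get()}"}
```

Refreshing at nine minutes rather than ten leaves a margin so that a token doesn't expire while a request is in flight.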
Here's a sample HTTP request to the Speech to text REST API for short audio:
POST /cognitiveservices/v1 HTTP/1.1
Authorization: Bearer YOUR_ACCESS_TOKEN
Host: westus.stt.speech.microsoft.com
Content-type: application/ssml+xml
Content-Length: 199
Connection: Keep-Alive
// Message body here...
Use Microsoft Entra authentication
To use Microsoft Entra authentication with the Speech to text REST API for short audio, you need to create an access token.
The steps to obtain the access token consisting of Resource ID and Microsoft Entra access token are the same as when using the Speech SDK.
Follow the steps in Use Microsoft Entra authentication:
Create an AI Services resource for Speech
Configure the Speech resource for Microsoft Entra authentication
Get a Microsoft Entra access token
Get the Speech resource ID
After you obtain the resource ID and the Microsoft Entra access token, construct the actual access token in this format: