Speech to text containers with Docker

The Speech to text container transcribes real-time speech or batch audio recordings with intermediate results. In this article, you learn how to download, install, and run a speech to text container.

For more information about prerequisites, validating that a container is running, running multiple containers on the same host, and running disconnected containers, see Install and run Speech containers with Docker.

Container images

The Speech to text container image for all supported versions and locales can be found on the Microsoft Container Registry (MCR) syndicate. It resides within the azure-cognitive-services/speechservices/ repository and is named speech-to-text.

A screenshot of the search connectors and triggers dialog.

The fully qualified container image name is, mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text. Either append a specific version or append :latest to get the most recent version.

Version Path
Latest mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:latest

The latest tag pulls the latest image for the en-US locale.
4.6.0 mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:4.6.0-amd64-mr-in

All tags, except for latest, are in the following format and are case sensitive:

<major>.<minor>.<patch>-<platform>-<locale>-<prerelease>

The tags are also available in JSON format for your convenience. The body includes the container path and list of tags. The tags aren't sorted by version, but "latest" is always included at the end of the list as shown in this snippet:

{
  "name": "azure-cognitive-services/speechservices/speech-to-text",
  "tags": [
    "2.10.0-amd64-ar-ae",
    "2.10.0-amd64-ar-bh",
    "2.10.0-amd64-ar-eg",
    "2.10.0-amd64-ar-iq",
    "2.10.0-amd64-ar-jo",
    <--redacted for brevity-->
    "latest"
  ]
}

Get the container image with docker pull

You need the prerequisites including required hardware. Also see the recommended allocation of resources for each Speech container.

Use the docker pull command to download a container image from Microsoft Container Registry:

docker pull mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:latest

Important

The latest tag pulls the latest image for the en-US locale. For additional versions and locales, see speech to text container images.

Run the container with docker run

Use the docker run command to run the container.

The following table represents the various docker run parameters and their corresponding descriptions:

Parameter Description
{ENDPOINT_URI} The endpoint is required for metering and billing. For more information, see billing arguments.
{API_KEY} The API key is required. For more information, see billing arguments.

When you run the speech to text container, configure the port, memory, and CPU according to the speech to text container requirements and recommendations.

Here's an example docker run command with placeholder values. You must specify the ENDPOINT_URI and API_KEY values:

docker run --rm -it -p 5000:5000 --memory 8g --cpus 4 \
mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text \
Eula=accept \
Billing={ENDPOINT_URI} \
ApiKey={API_KEY}

This command:

  • Runs a speech-to-text container from the container image.
  • Allocates 4 CPU cores and 8 GB of memory.
  • Exposes TCP port 5000 and allocates a pseudo-TTY for the container.
  • Automatically removes the container after it exits. The container image is still available on the host computer.

For more information about docker run with Speech containers, see Install and run Speech containers with Docker.

Use the container

Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method.

Important

When you use the Speech service with containers, be sure to use host authentication. If you configure the key and region, requests will go to the public Speech service. Results from the Speech service might not be what you expect. Requests from disconnected containers will fail.

Instead of using this Azure-cloud initialization config:

var config = SpeechConfig.FromSubscription(...);

Use this config with the container host:

var config = SpeechConfig.FromHost(
    new Uri("ws://localhost:5000"));

Instead of using this Azure-cloud initialization config:

auto speechConfig = SpeechConfig::FromSubscription(...);

Use this config with the container host:

auto speechConfig = SpeechConfig::FromHost("ws://localhost:5000");

Instead of using this Azure-cloud initialization config:

speechConfig, err := speech.NewSpeechConfigFromSubscription(...)

Use this config with the container host:

speechConfig, err := speech.NewSpeechConfigFromHost("ws://localhost:5000")

Instead of using this Azure-cloud initialization config:

SpeechConfig speechConfig = SpeechConfig.fromSubscription(...);

Use this config with the container host:

SpeechConfig speechConfig = SpeechConfig.fromHost("ws://localhost:5000");

Instead of using this Azure-cloud initialization config:

const speechConfig = sdk.SpeechConfig.fromSubscription(...);

Use this config with the container host:

const speechConfig = sdk.SpeechConfig.fromHost("ws://localhost:5000");

Instead of using this Azure-cloud initialization config:

SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:...];

Use this config with the container host:

SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithHost:"ws://localhost:5000"];

Instead of using this Azure-cloud initialization config:

let speechConfig = SPXSpeechConfiguration(subscription: "", region: "");

Use this config with the container host:

let speechConfig = SPXSpeechConfiguration(host: "ws://localhost:5000");

Instead of using this Azure-cloud initialization config:

speech_config = speechsdk.SpeechConfig(
    subscription=speech_key, region=service_region)

Use this config with the container endpoint:

speech_config = speechsdk.SpeechConfig(
    host="ws://localhost:5000")

When you use the Speech CLI in a container, include the --host ws://localhost:5000/ option. You must also specify --key none to ensure that the CLI doesn't try to use a Speech key for authentication. For information about how to configure the Speech CLI, see Get started with the Azure AI Speech CLI.

Try the speech to text quickstart using host authentication instead of key and region.

Next steps