Install and run Speech containers with Docker

Article
01/22/2024

By using containers, you can use a subset of the Speech service features in your own environment. In this article, you learn how to download, install, and run a Speech container.

Note

Disconnected container pricing and commitment tiers vary from standard containers. For more information, see Speech service pricing.

Prerequisites

You must meet the following prerequisites before you use Speech service containers. If you don't have an Azure subscription, create a free account before you begin. You need:

Docker installed on a host computer. Docker must be configured to allow the containers to connect with and send billing data to Azure.
- On Windows, Docker must also be configured to support Linux containers.
- You should have a basic understanding of Docker concepts.
A Speech service resource with the free (F0) or standard (S) pricing tier.

Billing arguments

Speech containers aren't licensed to run without being connected to Azure for metering. You must configure your container to communicate billing information with the metering service always.

Three primary parameters for all Azure AI containers are required. The Microsoft Software License Terms must be present with a value of accept. An Endpoint URI and API key are also needed.

Queries to the container are billed at the pricing tier of the Azure resource that's used for the ApiKey parameter.

The docker run command starts the container when all three of the following options are provided with valid values:

Option	Description
`ApiKey`	The API key of the Speech resource that's used to track billing information. The `ApiKey` value is used to start the container and is available on the Azure portal's Keys page of the corresponding Speech resource. Go to the Keys page, and select the Copy to clipboard icon.
`Billing`	The endpoint of the Speech resource that's used to track billing information. The endpoint is available on the Azure portal Overview page of the corresponding Speech resource. Go to the Overview page, hover over the endpoint, and a Copy to clipboard icon appears. Copy and use the endpoint where needed.
`Eula`	Indicates that you accepted the license for the container. The value of this option must be set to accept.

Important

These subscription keys are used to access your Azure AI services API. Don't share your keys. Store them securely. For example, use Azure Key Vault. We also recommend that you regenerate these keys regularly. Only one key is necessary to make an API call. When you regenerate the first key, you can use the second key for continued access to the service.

The container needs the billing argument values to run. These values allow the container to connect to the billing endpoint. The container reports usage about every 10 to 15 minutes. If the container doesn't connect to Azure within the allowed time window, the container continues to run but doesn't serve queries until the billing endpoint is restored. The connection is attempted 10 times at the same time interval of 10 to 15 minutes. If it can't connect to the billing endpoint within the 10 tries, the container stops serving requests. For an example of the information sent to Microsoft for billing, see the Azure AI container FAQ in the Azure AI services documentation.

For more information about these options, see Configure containers.

Container requirements and recommendations

The following table describes the minimum and recommended allocation of resources for each Speech container:

Container	Minimum	Recommended	Speech Model
Speech to text	4 core, 4-GB memory	8 core, 8-GB memory	+4 to 8 GB memory
Custom speech to text	4 core, 4-GB memory	8 core, 8-GB memory	+4 to 8 GB memory
Speech language identification	1 core, 1-GB memory	1 core, 1-GB memory	n/a
Neural text to speech	6 core, 12-GB memory	8 core, 16-GB memory	n/a

Each core must be at least 2.6 gigahertz (GHz) or faster.

Core and memory correspond to the --cpus and --memory settings, which are used as part of the docker run command.

Note

The minimum and recommended allocations are based on Docker limits, not the host machine resources. For example, speech to text containers memory map portions of a large language model. We recommend that the entire file should fit in memory. You need to add an additional 4 to 8 GB to load the speech models (see the previous table). Also, the first run of either container might take longer because models are being paged into memory.

Host computer requirements and recommendations

The host is an x64-based computer that runs the Docker container. It can be a computer on your premises or a Docker hosting service in Azure, such as:

Azure Kubernetes Service.
Azure Container Instances.
A Kubernetes cluster deployed to Azure Stack. For more information, see Deploy Kubernetes to Azure Stack.

Note

Containers support compressed audio input to the Speech SDK by using GStreamer. To install GStreamer in a container, follow Linux instructions for GStreamer in Use codec compressed audio input with the Speech SDK.

Advanced Vector Extension support

The host is the computer that runs the Docker container. The host must support Advanced Vector Extensions (AVX2). You can check for AVX2 support on Linux hosts with the following command:

grep -q avx2 /proc/cpuinfo && echo AVX2 supported || echo No AVX2 support detected

Warning

The host computer is required to support AVX2. The container will not function correctly without AVX2 support.

Run the container

Use the docker run command to run the container. Once running, the container continues to run until you stop the container.

Take note the following best practices with the docker run command:

Line-continuation character: The Docker commands in the following sections use the back slash, \, as a line continuation character. Replace or remove this character based on your host operating system's requirements.
Argument order: Don't change the order of the arguments unless you're familiar with Docker containers.

You can use the docker images command to list your downloaded container images. The following command lists the ID, repository, and tag of each downloaded container image, formatted as a table:

docker images --format "table {{.ID}}\t{{.Repository}}\t{{.Tag}}"

Here's an example result:

IMAGE ID         REPOSITORY                TAG
<image-id>       <repository-path/name>    <tag-name>

Validate that a container is running

There are several ways to validate that the container is running. Locate the External IP address and exposed port of the container in question, and open your favorite web browser. Use the various request URLs that follow to validate the container is running.

The example request URLs listed here are http://localhost:5000, but your specific container might vary. Make sure to rely on your container's External IP address and exposed port.

Request URL	Purpose
`http://localhost:5000/`	The container provides a home page.
`http://localhost:5000/ready`	Requested with GET, this URL provides a verification that the container is ready to accept a query against the model. This request can be used for Kubernetes liveness and readiness probes.
`http://localhost:5000/status`	Also requested with GET, this URL verifies if the api-key used to start the container is valid without causing an endpoint query. This request can be used for Kubernetes liveness and readiness probes.
`http://localhost:5000/swagger`	The container provides a full set of documentation for the endpoints and a Try it out feature. With this feature, you can enter your settings into a web-based HTML form and make the query without having to write any code. After the query returns, an example CURL command is provided to demonstrate the HTTP headers and body format that's required.

Stop the container

To shut down the container, in the command-line environment where the container is running, select Ctrl+C.

Run multiple containers on the same host

If you intend to run multiple containers with exposed ports, make sure to run each container with a different exposed port. For example, run the first container on port 5000 and the second container on port 5001.

You can have this container and a different Azure AI container running on the HOST together. You also can have multiple containers of the same Azure AI container running.

Host URLs

Note

Use a unique port number if you're running multiple containers.

Protocol	Host URL	Containers
WS	`ws://localhost:5000`	Speech to text Custom speech to text
HTTP	`http://localhost:5000`	Neural text to speech Speech language identification

For more information on using WSS and HTTPS protocols, see Container security in the Azure AI services documentation.

Troubleshooting

When you start or run the container, you might experience issues. Use an output mount and enable logging. Doing so allows the container to generate log files that are helpful when you troubleshoot issues.

Tip

For more troubleshooting information and guidance, see Azure AI containers frequently asked questions (FAQ) in the Azure AI services documentation.

Logging settings

Speech containers come with ASP.NET Core logging support. Here's an example of the neural-text-to-speech container started with default logging to the console:

docker run --rm -it -p 5000:5000 --memory 12g --cpus 6 \
mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech \
Eula=accept \
Billing={ENDPOINT_URI} \
ApiKey={API_KEY} \
Logging:Console:LogLevel:Default=Information

For more information about logging, see Configure Speech containers and usage records in the Azure AI services documentation.

Microsoft diagnostics container

If you're having trouble running an Azure AI container, you can try using the Microsoft diagnostics container. Use this container to diagnose common errors in your deployment environment that might prevent Azure AI containers from functioning as expected.

To get the container, use the following docker pull command:

docker pull mcr.microsoft.com/azure-cognitive-services/diagnostic

Then run the container. Replace {ENDPOINT_URI} with your endpoint, and replace {API_KEY} with your key to your resource:

docker run --rm mcr.microsoft.com/azure-cognitive-services/diagnostic \
eula=accept \
Billing={ENDPOINT_URI} \
ApiKey={API_KEY}

The container tests for network connectivity to the billing endpoint.

Run disconnected containers

To run disconnected containers (not connected to the internet), you must submit this request form and wait for approval. For more information about applying and purchasing a commitment plan to use containers in disconnected environments, see Use containers in disconnected environments in the Azure AI services documentation.

Next steps

Review configure containers for configuration settings.
Learn how to use Speech service containers with Kubernetes and Helm.
Deploy and run containers on Azure Container Instance
Use more Azure AI containers.

Install and run Speech containers with Docker

Prerequisites

Billing arguments

Container requirements and recommendations

Host computer requirements and recommendations

Advanced Vector Extension support

Run the container

Validate that a container is running

Stop the container

Run multiple containers on the same host

Host URLs

Troubleshooting

Logging settings

Microsoft diagnostics container

Run disconnected containers

Next steps

Feedback

Additional resources