
Install and run Speech Service containers

Speech containers enable customers to build one speech application architecture that is optimized to take advantage of both robust cloud capabilities and edge locality.

The two speech containers are speech-to-text and text-to-speech.

Function | Features
Speech-to-text | Transcribes continuous real-time speech or batch audio recordings into text with intermediate results.
Text-to-speech | Converts text to natural-sounding speech, with plain text input or Speech Synthesis Markup Language (SSML).

    If you don't have an Azure subscription, create a free account before you begin.


    You must meet the following prerequisites before using Speech containers:

    Required | Purpose
    Docker Engine | You need the Docker Engine installed on a host computer. Docker provides packages that configure the Docker environment on macOS, Windows, and Linux. For a primer on Docker and container basics, see the Docker overview. Docker must be configured to allow the containers to connect with and send billing data to Azure. On Windows, Docker must also be configured to support Linux containers.
    Familiarity with Docker | You should have a basic understanding of Docker concepts, like registries, repositories, containers, and container images, as well as knowledge of basic docker commands.
    Speech resource | To use these containers, you must have a Speech Azure resource to get the associated billing key and billing endpoint URI. Both values are available on the Azure portal's Speech Overview and Keys pages and are required to start the container.

    {BILLING_KEY}: the resource key

    {BILLING_ENDPOINT_URI}: the endpoint URI, for example https://westus.api.cognitive.microsoft.com/sts/v1.0

    Request access to the container registry

    You must first complete and submit the Cognitive Services Speech Containers Request form to request access to the container.

    The form requests information about you, your company, and the user scenario for which you'll use the container. After you submit the form, the Azure Cognitive Services team reviews it to ensure that you meet the criteria for access to the private container registry.


    In the form, you must use an email address that's associated with either a Microsoft Account (MSA) or an Azure Active Directory (Azure AD) account.

    If your request is approved, you'll receive an email with instructions that describe how to obtain your credentials and access the private container registry.

    Use the Docker CLI to authenticate the private container registry

    You can authenticate with the private container registry for Cognitive Services Containers in any of several ways, but the recommended method from the command line is to use the Docker CLI.

    Use the docker login command, as shown in the following example, to log in to containerpreview.azurecr.io, the private container registry for Cognitive Services Containers. Replace <username> with the user name and <password> with the password that's provided in the credentials you received from the Azure Cognitive Services team.

    docker login containerpreview.azurecr.io -u <username> -p <password>

    If you've secured your credentials in a text file, you can use the cat command to pipe the contents of that file to the docker login command, as shown in the following example. Replace <passwordFile> with the path and name of the text file that contains the password, and <username> with the user name that's provided in your credentials.

    cat <passwordFile> | docker login containerpreview.azurecr.io -u <username> --password-stdin
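The same login can also be scripted. The following Python sketch is one possible approach, assuming Docker is installed and on the PATH. The helper names build_login_command and docker_login are hypothetical and are not part of Docker or Azure tooling. The password is read from a file and passed on standard input, so it never appears in the process arguments.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Registry named in this article.
REGISTRY = "containerpreview.azurecr.io"


def build_login_command(username: str, registry: str = REGISTRY) -> list[str]:
    """Return the argument list for `docker login ... --password-stdin`."""
    if not username:
        raise ValueError("username must not be empty")
    return ["docker", "login", registry, "-u", username, "--password-stdin"]


def docker_login(username: str, password_file: str, registry: str = REGISTRY) -> int:
    """Run docker login, feeding the password file contents on stdin.

    Returns the docker process exit code (0 means the login succeeded).
    """
    password = Path(password_file).read_text().strip()
    result = subprocess.run(
        build_login_command(username, registry),
        input=password,
        text=True,
        check=False,
    )
    return result.returncode
```

A call such as docker_login("<username>", "<passwordFile>") mirrors the cat pipeline shown earlier.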

    The host computer

    The host is an x64-based computer that runs the Docker container. It can be a computer on your premises or a Docker hosting service in Azure.

    Advanced Vector Extension support

    The host is the computer that runs the Docker container. The host must support Advanced Vector Extensions (AVX2). You can check for this support on Linux hosts with the following command:

    grep -q avx2 /proc/cpuinfo && echo AVX2 supported || echo No AVX2 support detected
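If the check needs to run from a script, the same test can be sketched in Python. This is a minimal illustration, assuming an x86 Linux host where /proc/cpuinfo lists CPU features on lines that start with "flags". The function names cpu_flags and supports_avx2 are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only "flags" lines carry the feature list on x86 Linux.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_avx2(cpuinfo_text: str) -> bool:
    """Return True if the avx2 flag is present."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX2 supported" if supports_avx2(path.read_text()) else "No AVX2 support detected")
    else:
        print("/proc/cpuinfo not found; this check only applies to Linux hosts")
```

Unlike the grep command, which matches the substring anywhere in the file, this sketch matches avx2 only as a whole flag name.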

    Container requirements and recommendations

    The following table describes the minimum and recommended CPU cores and memory to allocate for each Speech container.

    Container | Minimum | Recommended
    cognitive-services-speech-to-text | 2 cores, 2 GB memory | 4 cores, 4 GB memory
    cognitive-services-text-to-speech | 1 core, 0.5 GB memory | 2 cores, 1 GB memory
    • Each core must run at 2.6 gigahertz (GHz) or faster.

    Core and memory correspond to the --cpus and --memory settings, which are used as part of the docker run command.

    Note: The minimum and recommended values are based on Docker limits, not on the host machine's resources. For example, speech-to-text containers memory-map portions of a large language model, and it is recommended that the entire file fit in memory, which requires an additional 4-6 GB. Also, the first run of either container may take longer, because models are being paged into memory.
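To show how the table maps onto docker run settings, here is a small Python sketch that turns a container name and a sizing level into --cpus and --memory arguments. The REQUIREMENTS dictionary copies the figures from the table. The helper name resource_flags is hypothetical, and memory is written in Docker's m (MiB) unit on the assumption that 1 GB in the table means 1024 MiB.

```python
from __future__ import annotations

# (CPU cores, memory in MiB) per container, copied from the requirements table.
# Assumption: GB in the table is treated as GiB, so 0.5 GB becomes 512m.
REQUIREMENTS = {
    "cognitive-services-speech-to-text": {
        "minimum": (2, 2048),
        "recommended": (4, 4096),
    },
    "cognitive-services-text-to-speech": {
        "minimum": (1, 512),
        "recommended": (2, 1024),
    },
}


def resource_flags(container: str, level: str = "recommended") -> list[str]:
    """Return the --cpus/--memory arguments for a container and sizing level."""
    cpus, memory_mib = REQUIREMENTS[container][level]
    return ["--cpus", str(cpus), "--memory", f"{memory_mib}m"]


if __name__ == "__main__":
    print(resource_flags("cognitive-services-speech-to-text"))
```

These limits do not include the additional 4-6 GB mentioned in the note for the speech-to-text model file, so the host itself needs more memory than the container limit alone suggests.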

    Get the container image with docker pull

    Container images for Speech are available in the following repositories.

    Container | Repository
    cognitive-services-speech-to-text | containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text:latest
    cognitive-services-text-to-speech | containerpreview.azurecr.io/microsoft/cognitive-services-text-to-speech:latest


    You can use the docker images command to list your downloaded container images. For example, the following command lists the ID, repository, and tag of each downloaded container image, formatted as a table:

    docker images --format "table {{.ID}}\t{{.Repository}}\t{{.Tag}}"
    IMAGE ID            REPOSITORY              TAG
    ebbee78a6baa       <container-name>         latest

    Language locale is in container tag

    The latest tag pulls the en-us locale and the jessarus voice.

    Speech to text locales

    All tags, except for latest, are in the following format, where <culture> indicates the locale of the container:


    The following tag is an example of the format:


    The following table lists the supported locales for speech-to-text in the 1.1.1 version of the container:

    Language locale | Tags
    Chinese | zh-cn
    English | en-us
    French | fr-ca
    German | de-de
    Italian | it-it
    Japanese | ja-jp
    Korean | ko-kr
    Portuguese | pt-br
    Spanish | es-es

    Text to speech locales

    All tags, except for latest, are in the following format, where <culture> indicates the locale and <voice> indicates the voice of the container:


    The following tag is an example of the format:


    The following table lists the supported locales for text-to-speech in the 1.1.0 version of the container:

    Language locale | Tags | Supported voices
    Chinese | zh-cn | huihuirus, yaoyao-apollo
    English | en-au | catherine
    English | en-gb | george-apollo
    English | en-in | heera-apollo
    English | en-us | jessarus
    French | fr-ca | caroline
    French | fr-fr | hortenserus
    German | de-de | hedda
    Italian | it-it | cosimo-apollo
    Japanese | ja-jp | ayumi-apollo, ichiro-apollo
    Korean | ko-kr | heamirus
    Portuguese | pt-br | daniel-apollo
    Spanish | es-es | elenarus
    Spanish | es-mx | hildarus

    Docker pull for the speech containers


    docker pull containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text:latest


    docker pull containerpreview.azurecr.io/microsoft/cognitive-services-text-to-speech:latest

    How to use the container

    Once the container is on the host computer, use the following process to work with the container.

    1. Run the container with the required billing settings. During the preview, these settings must be valid but aren't used to bill you. More examples of the docker run command are available.
    2. Query the container's prediction endpoint.

    Run the container with docker run

    Use the docker run command to run either of the two containers. The command uses the following parameters:

    During the preview, the billing settings must be valid to start the container, but you aren't billed for usage.

    Placeholder | Value
    {BILLING_KEY} | This key is used to start the container, and is available on the Azure portal's Speech Keys page.
    {BILLING_ENDPOINT_URI} | The billing endpoint URI value is available on the Azure portal's Speech Overview page.

    Replace these parameters with your own values in the following example docker run commands.


    docker run --rm -it -p 5000:5000 --memory 2g --cpus 1 \
    containerpreview.azurecr.io/microsoft/cognitive-services-text-to-speech \
    Eula=accept \
    Billing={BILLING_ENDPOINT_URI} \
    ApiKey={BILLING_KEY}


    docker run --rm -it -p 5000:5000 --memory 2g --cpus 2 \
    containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text \
    Eula=accept \
    Billing={BILLING_ENDPOINT_URI} \
    ApiKey={BILLING_KEY}

    Each of these commands:

    • Runs a Speech container from the container image
    • Allocates 2 gigabytes (GB) of memory and either 1 CPU core (text-to-speech) or 2 CPU cores (speech-to-text)
    • Exposes TCP port 5000 and allocates a pseudo-TTY for the container
    • Automatically removes the container after it exits. The container image is still available on the host computer.


    The Eula, Billing, and ApiKey options must be specified to run the container; otherwise, the container won't start. For more information, see Billing.
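When the command is generated by a script, it can help to fail early if a billing placeholder was left unreplaced. The following Python sketch assembles the docker run argument list described in this section. The function name build_run_command and its parameters are hypothetical, and the check for a leading "{" is just one simple way to catch an unfilled placeholder.

```python
from __future__ import annotations

import shlex

# Registry path used by the example commands in this article.
REGISTRY_PATH = "containerpreview.azurecr.io/microsoft"


def build_run_command(
    container: str,
    billing_endpoint_uri: str,
    billing_key: str,
    cpus: int = 2,
    memory: str = "2g",
    host_port: int = 5000,
) -> list[str]:
    """Assemble `docker run` arguments with Eula, Billing, and ApiKey set."""
    for name, value in (("Billing", billing_endpoint_uri), ("ApiKey", billing_key)):
        # Reject empty values and unreplaced {PLACEHOLDER} strings.
        if not value or value.startswith("{"):
            raise ValueError(f"{name} must be replaced with a real value")
    return [
        "docker", "run", "--rm", "-it",
        "-p", f"{host_port}:5000",
        "--memory", memory,
        "--cpus", str(cpus),
        f"{REGISTRY_PATH}/{container}",
        # Passing Eula=accept means you accept the container license.
        "Eula=accept",
        f"Billing={billing_endpoint_uri}",
        f"ApiKey={billing_key}",
    ]


if __name__ == "__main__":
    command = build_run_command(
        "cognitive-services-speech-to-text",
        "https://westus.api.cognitive.microsoft.com/sts/v1.0",
        "your-billing-key",  # illustrative value, not a real key
    )
    print(shlex.join(command))
```

The resulting list can be passed to subprocess.run, or printed with shlex.join to review the command before running it.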

    Query the container's prediction endpoint

    Container | Endpoint
    Speech-to-text | ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1
    Text-to-speech | http://localhost:5000/speech/synthesize/cognitiveservices/v1


    The container provides websocket-based query endpoint APIs that are accessed through the Speech SDK.

    By default, the Speech SDK uses online speech services. To use the container, you need to change the initialization method. See the examples below.

    For C#

    Change from using this Azure-cloud initialization call:

    var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    to this call using the container endpoint:

    var config = SpeechConfig.FromEndpoint(
        new Uri("ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1"),
        "YourSubscriptionKey");

    For Python

    Change from using this Azure-cloud initialization call:

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    to this call using the container endpoint:

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, endpoint="ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1")


    The container also provides REST endpoint APIs. Reference documentation and samples are available for them.

    Validate that a container is running

    There are several ways to validate that the container is running.

    Request | Purpose
    http://localhost:5000/ | The container provides a home page.
    http://localhost:5000/status | Requested with GET, to validate that the container is running without causing an endpoint query. This request can be used for Kubernetes liveness and readiness probes.
    http://localhost:5000/swagger | The container provides a full set of documentation for the endpoints and a Try it now feature. With this feature, you can enter your settings into a web-based HTML form and make the query without having to write any code. After the query returns, an example cURL command is provided to demonstrate the required HTTP headers and body format.
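The status request can be automated with a few lines of Python. The sketch below uses only the standard library and assumes that a healthy container answers GET /status with HTTP 200; the article does not state the exact response, so treat that as an assumption. The function name container_is_running and the opener parameter (which allows a stand-in for urlopen in tests) are hypothetical.

```python
import urllib.error
import urllib.request


def container_is_running(
    base_url: str = "http://localhost:5000",
    timeout: float = 5.0,
    opener=urllib.request.urlopen,
) -> bool:
    """Return True if GET <base_url>/status answers with HTTP 200.

    Assumption: the container reports readiness with a 200 response.
    Connection errors and HTTP error statuses are treated as "not running".
    """
    url = base_url.rstrip("/") + "/status"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("running" if container_is_running() else "not reachable")
```

Because the request goes to /status rather than to a recognition or synthesis endpoint, it doesn't cause an endpoint query.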


    Stop the container

    To shut down the container, press Ctrl+C in the command-line environment where the container is running.


    When you run the container, it uses stdout and stderr to output information that helps you troubleshoot issues that occur while the container is starting or running.


    The Speech containers send billing information to Azure, using a Speech resource on your Azure account.

    Queries to the container are billed at the pricing tier of the Azure resource that's used for the <ApiKey>.

    Azure Cognitive Services containers aren't licensed to run without being connected to the billing endpoint for metering. You must enable the containers to communicate billing information with the billing endpoint at all times. Cognitive Services containers don't send customer data, such as the image or text that's being analyzed, to Microsoft.

    Connect to Azure

    The container needs the billing argument values to run. These values allow the container to connect to the billing endpoint. The container reports usage about every 10 to 15 minutes. If the container doesn't connect to Azure within the allowed time window, the container continues to run but doesn't serve queries until the connection to the billing endpoint is restored. The connection is attempted 10 times, at the same interval of 10 to 15 minutes. If the container can't connect to the billing endpoint within the 10 tries, it stops running.
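As a rough worked example of these figures, ten attempts spaced 10 to 15 minutes apart span about 100 to 150 minutes before the container stops. The Python sketch below only does this arithmetic, under the assumption that each attempt is preceded by one full interval; it does not describe how the container itself is implemented, and the function name retry_window_minutes is hypothetical.

```python
def retry_window_minutes(attempts=10, interval_min=10, interval_max=15):
    """Return the (shortest, longest) total time in minutes for all attempts.

    Assumption: every attempt is preceded by one full reporting interval.
    """
    return attempts * interval_min, attempts * interval_max


if __name__ == "__main__":
    shortest, longest = retry_window_minutes()
    print(f"Container stops after roughly {shortest} to {longest} minutes offline")
```

Under that assumption, a billing endpoint outage shorter than about an hour and a half would interrupt query serving but would not stop the container.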

    Billing arguments

    For the docker run command to start the container, all three of the following options must be specified with valid values:

    Option | Description
    ApiKey | The API key of the Cognitive Services resource that's used to track billing information. The value of this option must be set to an API key for the provisioned resource that's specified in Billing.
    Billing | The endpoint of the Cognitive Services resource that's used to track billing information. The value of this option must be set to the endpoint URI of a provisioned Azure resource.
    Eula | Indicates that you accepted the license for the container. The value of this option must be set to accept.

    For more information about these options, see Configure containers.

    Blog posts

    Developer samples

    Developer samples are available at our GitHub repository.

    View webinar

    Join the webinar to learn about:

    • How to deploy Cognitive Services to any machine using Docker
    • How to deploy Cognitive Services to AKS


    In this article, you learned the concepts and workflow for downloading, installing, and running Speech containers. In summary:

    • Speech provides two Linux containers for Docker, encapsulating speech-to-text and text-to-speech.
    • Container images are downloaded from the private container registry in Azure.
    • Container images run in Docker.
    • You can use either the REST API or the SDK to call operations in Speech containers by specifying the host URI of the container.
    • You must specify billing information when instantiating a container.


    Cognitive Services containers are not licensed to run without being connected to Azure for metering. Customers need to enable the containers to communicate billing information with the metering service at all times. Cognitive Services containers do not send customer data (for example, the image or text that is being analyzed) to Microsoft.

    Next steps