Install and run Speech Service containers

Speech containers enable customers to build one speech application architecture that is optimized to take advantage of both robust cloud capabilities and edge locality.

The two Speech containers are speech-to-text and text-to-speech.

Function | Features | Latest
Speech-to-text | Transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | 1.2.0
Text-to-speech | Converts text to natural-sounding speech with plain text input or Speech Synthesis Markup Language (SSML). | 1.2.0

If you don't have an Azure subscription, create a free account before you begin.

Prerequisites

You must meet the following prerequisites before using Speech containers:

Required | Purpose
Docker Engine | You need the Docker Engine installed on a host computer. Docker provides packages that configure the Docker environment on macOS, Windows, and Linux. For a primer on Docker and container basics, see the Docker overview. Docker must be configured to allow the containers to connect with and send billing data to Azure. On Windows, Docker must also be configured to support Linux containers.
Familiarity with Docker | You should have a basic understanding of Docker concepts, such as registries, repositories, containers, and container images, as well as knowledge of basic docker commands.
Speech resource | To use these containers, you must have an Azure Speech resource to get the associated API key and endpoint URI. Both values are available on the Azure portal's Speech Overview and Keys pages, and both are required to start the container. {API_KEY}: one of the two available resource keys on the Keys page. {ENDPOINT_URI}: the endpoint as provided on the Overview page.

Request access to the container registry

You must first complete and submit the Cognitive Services Speech Containers Request form to request access to the container.

The form requests information about you, your company, and the user scenario for which you'll use the container. After you submit the form, the Azure Cognitive Services team reviews it to ensure that you meet the criteria for access to the private container registry.

Important

In the form, you must use an email address that's associated with either a Microsoft Account (MSA) or an Azure Active Directory (Azure AD) account.

If your request is approved, you'll receive an email with instructions that describe how to obtain your credentials and access the private container registry.

Use the Docker CLI to authenticate with the private container registry

You can authenticate with the private container registry for Cognitive Services containers in any of several ways, but the recommended method from the command line is to use the Docker CLI.

Use the docker login command, as shown in the following example, to log in to containerpreview.azurecr.io, the private container registry for Cognitive Services containers. Replace <username> with the user name and <password> with the password provided in the credentials you received from the Azure Cognitive Services team.

    docker login containerpreview.azurecr.io -u <username> -p <password>
    

If you've secured your credentials in a text file, you can pipe the contents of that file to the docker login command by using the cat command, as shown in the following example. Replace <passwordFile> with the path and name of the text file that contains the password, and <username> with the user name provided in your credentials.

    cat <passwordFile> | docker login containerpreview.azurecr.io -u <username> --password-stdin
    

The host computer

The host is an x64-based computer that runs the Docker container. It can be a computer on your premises or a Docker hosting service in Azure.

Advanced Vector Extension support

The host must support Advanced Vector Extensions (AVX2). You can check for this support on Linux hosts with the following command:

    grep -q avx2 /proc/cpuinfo && echo AVX2 supported || echo No AVX2 support detected
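The same check can be scripted, for example as part of a deployment preflight. The following is a minimal sketch, not part of the product: the function names are illustrative, and it assumes a Linux x64 host whose /proc/cpuinfo lists CPU features on lines that start with "flags". Unlike the grep command, it matches avx2 only as a whole flag.

```python
def cpu_supports_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at "flags : ..." lines and compare whole whitespace-separated flags.
        if sep and key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def host_supports_avx2(path: str = "/proc/cpuinfo") -> bool:
    """Read the CPU description file; a missing or unreadable file counts as 'not detected'."""
    try:
        with open(path, encoding="utf-8") as handle:
            return cpu_supports_avx2(handle.read())
    except OSError:
        return False
```

Calling host_supports_avx2() on the host returns a boolean that a script can use to stop early before pulling the container images.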
    

Container requirements and recommendations

The following table describes the minimum and recommended CPU cores and memory to allocate for each Speech container.

Container | Minimum | Recommended
cognitive-services-speech-to-text | 2 cores, 2 GB memory | 4 cores, 4 GB memory
cognitive-services-text-to-speech | 1 core, 0.5 GB memory | 2 cores, 1 GB memory

• Each core must be at least 2.6 gigahertz (GHz) or faster.

Core and memory correspond to the --cpus and --memory settings, which are used as part of the docker run command.

Note: The minimum and recommended values are based on Docker limits, not on the host machine resources. For example, speech-to-text containers memory-map portions of a large language model, and it is recommended that the entire file fit in memory, which requires an additional 4 to 6 GB. Also, the first run of either container may take longer, because models are being paged into memory.

Get the container image with docker pull

Container images for Speech are available in the following repositories:

Container | Repository
cognitive-services-speech-to-text | containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text:latest
cognitive-services-text-to-speech | containerpreview.azurecr.io/microsoft/cognitive-services-text-to-speech:latest

Tip

You can use the docker images command to list your downloaded container images. For example, the following command lists the ID, repository, and tag of each downloaded container image, formatted as a table:

    docker images --format "table {{.ID}}\t{{.Repository}}\t{{.Tag}}"
    
    IMAGE ID         REPOSITORY                TAG
    <image-id>       <repository-path/name>    <tag-name>
    

Language locale is in container tag

The latest tag pulls the en-us locale and the jessarus voice.

Speech-to-text locales

All tags except latest use the following format, where <culture> indicates the locale of the container:

    <major>.<minor>.<patch>-<platform>-<culture>-<prerelease>
    

The following tag is an example of the format:

    1.2.0-amd64-en-us-preview
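Because the culture itself contains a hyphen, splitting a tag on "-" is ambiguous. The following sketch shows one way to take a speech-to-text tag apart with a regular expression. The names are illustrative, and the pattern assumes that the culture always has the two-letter language and two-letter region shape seen in the locales this article lists, and that the platform and prerelease parts contain no hyphens.

```python
import re
from typing import NamedTuple


class SpeechToTextTag(NamedTuple):
    version: str
    platform: str
    culture: str
    prerelease: str


# <major>.<minor>.<patch>-<platform>-<culture>-<prerelease>
# Assumption: culture looks like "en-us"; platform and prerelease have no hyphens.
_STT_TAG = re.compile(
    r"^(?P<version>\d+\.\d+\.\d+)"
    r"-(?P<platform>[a-z0-9]+)"
    r"-(?P<culture>[a-z]{2}-[a-z]{2})"
    r"-(?P<prerelease>[a-z0-9]+)$"
)


def parse_speech_to_text_tag(tag: str) -> SpeechToTextTag:
    """Split a versioned speech-to-text tag into its parts; 'latest' is rejected."""
    match = _STT_TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognized speech-to-text tag: {tag!r}")
    return SpeechToTextTag(**match.groupdict())
```

For the example tag, parse_speech_to_text_tag("1.2.0-amd64-en-us-preview") yields version 1.2.0, platform amd64, culture en-us, and prerelease preview.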
    

The following table lists the supported locales for speech-to-text in the 1.2.0 version of the container:

Language locale | Tags
Chinese | zh-cn
English | en-us, en-gb, en-au, en-in
French | fr-ca, fr-fr
German | de-de
Italian | it-it
Japanese | ja-jp
Korean | ko-kr
Portuguese | pt-br
Spanish | es-es, es-mx

Text-to-speech locales

All tags except latest use the following format, where <culture> indicates the locale and <voice> indicates the voice of the container:

    <major>.<minor>.<patch>-<platform>-<culture>-<voice>-<prerelease>
    

The following tag is an example of the format:

    1.2.0-amd64-en-us-jessarus-preview
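When you want a locale or voice other than the default, the tag can be assembled from its parts. The following is a minimal sketch with illustrative function names. The default version, platform, and prerelease values are taken from the example tag above, and the sketch does not verify that a given combination is actually published in the registry.

```python
# Repository path as listed in this article for the text-to-speech container.
TTS_REPOSITORY = "containerpreview.azurecr.io/microsoft/cognitive-services-text-to-speech"


def text_to_speech_tag(culture: str, voice: str, version: str = "1.2.0",
                       platform: str = "amd64", prerelease: str = "preview") -> str:
    """Compose <major>.<minor>.<patch>-<platform>-<culture>-<voice>-<prerelease>."""
    return f"{version}-{platform}-{culture}-{voice}-{prerelease}"


def text_to_speech_image(culture: str, voice: str, **tag_parts: str) -> str:
    """Return the full image reference (repository plus tag) to pass to docker pull."""
    return f"{TTS_REPOSITORY}:{text_to_speech_tag(culture, voice, **tag_parts)}"
```

For example, text_to_speech_image("en-us", "jessarus") produces the repository path followed by the example tag shown above.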
    

The following table lists the supported locales for text-to-speech in the 1.2.0 version of the container:

Language locale | Tags | Supported voices
Chinese | zh-cn | huihuirus, kangkang-apollo, yaoyao-apollo
English | en-au | catherine, hayleyrus
English | en-gb | george-apollo, hazelrus, susan-apollo
English | en-in | heera-apollo, priyarus, ravi-apollo
English | en-us | jessarus, benjaminrus, jessa24krus, zirarus, guy24krus
French | fr-ca | caroline, harmonierus
French | fr-fr | hortenserus, julie-apollo, paul-apollo
German | de-de | hedda, heddarus, stefan-apollo
Italian | it-it | cosimo-apollo, luciarus
Japanese | ja-jp | ayumi-apollo, harukarus, ichiro-apollo
Korean | ko-kr | heamirus
Portuguese | pt-br | daniel-apollo, heloisarus
Spanish | es-es | elenarus, laura-apollo, pablo-apollo
Spanish | es-mx | hildarus, raul-apollo

Docker pull for the Speech containers

Speech-to-text

    docker pull containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text:latest
    

Text-to-speech

    docker pull containerpreview.azurecr.io/microsoft/cognitive-services-text-to-speech:latest
    

How to use the container

Once the container is on the host computer, use the following process to work with the container.

1. Run the container with the required billing settings. More examples of the docker run command are available.
2. Query the container's prediction endpoint.

Run the container with docker run

Use the docker run command to run either of the two containers. The command uses the following parameters:

During the preview, the billing settings must be valid to start the container, but you aren't billed for usage.

Placeholder | Value
{API_KEY} | This key is used to start the container and is available on the Azure portal's Speech Keys page.
{ENDPOINT_URI} | The billing endpoint URI value is available on the Azure portal's Speech Overview page.

Replace these parameters with your own values in the following example docker run commands.

Text-to-speech

    docker run --rm -it -p 5000:5000 --memory 2g --cpus 1 \
    containerpreview.azurecr.io/microsoft/cognitive-services-text-to-speech \
    Eula=accept \
    Billing={ENDPOINT_URI} \
    ApiKey={API_KEY}
    

Speech-to-text

    docker run --rm -it -p 5000:5000 --memory 2g --cpus 2 \
    containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text \
    Eula=accept \
    Billing={ENDPOINT_URI} \
    ApiKey={API_KEY}
    

This command:

• Runs a Speech container from the container image
• Allocates 2 CPU cores and 2 gigabytes (GB) of memory (the text-to-speech example allocates 1 CPU core)
• Exposes TCP port 5000 and allocates a pseudo-TTY for the container
• Automatically removes the container after it exits. The container image is still available on the host computer.

Important

The Eula, Billing, and ApiKey options must be specified to run the container; otherwise, the container won't start. For more information, see Billing.
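If you launch the container from a script, it can help to assemble the argument list in one place and fail early when a required option is missing. The following is a minimal sketch that mirrors the speech-to-text example command. The function name and the accept_eula parameter are illustrative and are not part of any Docker or Azure API.

```python
# Image path as listed in this article for the speech-to-text container.
SPEECH_TO_TEXT_IMAGE = "containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text"


def docker_run_args(image: str, endpoint_uri: str, api_key: str, accept_eula: bool,
                    cpus: int = 2, memory: str = "2g", host_port: int = 5000) -> list:
    """Build the docker run argument list; all three billing options are mandatory."""
    if not accept_eula:
        raise ValueError("the container license must be accepted (Eula=accept)")
    if not endpoint_uri or not api_key:
        raise ValueError("Billing (endpoint URI) and ApiKey are both required")
    return [
        "docker", "run", "--rm", "-it",
        "-p", f"{host_port}:5000",   # map a host port to the container's port 5000
        "--memory", memory,
        "--cpus", str(cpus),
        image,
        "Eula=accept",
        f"Billing={endpoint_uri}",
        f"ApiKey={api_key}",
    ]
```

The resulting list can be passed to subprocess.run(args, check=True). Passing a list rather than a single shell string avoids quoting problems with the key and the endpoint URI.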

Query the container's prediction endpoint

Container | Endpoint
Speech-to-text | ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1
Text-to-speech | http://localhost:5000/speech/synthesize/cognitiveservices/v1

Speech-to-text

The container provides WebSocket-based query endpoint APIs that are accessed through the Speech SDK.

By default, the Speech SDK uses the online speech services. To use the container, you need to change the initialization method. See the examples below.

For C#

Change from using this Azure-cloud initialization call:

    var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    

to this call using the container endpoint:

    var config = SpeechConfig.FromEndpoint(
        new Uri("ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1"),
        "YourSubscriptionKey");
    

For Python

Change from using this Azure-cloud initialization call:

    speech_config = speechsdk.SpeechConfig(
        subscription=speech_key, region=service_region)
    

to this call using the container endpoint:

    speech_config = speechsdk.SpeechConfig(
        subscription=speech_key, endpoint="ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1")
    

Text-to-speech

The container provides REST endpoint APIs. API reference documentation and samples are available.
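As an illustration, a request to the text-to-speech endpoint could be assembled as follows. This is a sketch under stated assumptions, not the documented container API: it assumes the container accepts the same SSML body and the same Content-Type and X-Microsoft-OutputFormat headers as the cloud text-to-speech REST API, and the voice name you pass is a placeholder that must match a voice your container image supports. The function names are illustrative.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Endpoint as listed in this article for the text-to-speech container.
TTS_ENDPOINT = "http://localhost:5000/speech/synthesize/cognitiveservices/v1"


def build_tts_request(text: str, voice_name: str, lang: str = "en-US",
                      output_format: str = "riff-16khz-16bit-mono-pcm",
                      endpoint: str = TTS_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries an SSML document."""
    # escape/quoteattr keep characters such as & and < from breaking the XML.
    ssml = (
        f"<speak version='1.0' xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )
    return urllib.request.Request(
        endpoint,
        data=ssml.encode("utf-8"),
        method="POST",
        headers={
            # Assumed to match the cloud REST API headers.
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": output_format,
        },
    )


def synthesize(text: str, voice_name: str, timeout: float = 30.0) -> bytes:
    """Send the request to a running container and return the audio bytes."""
    request = build_tts_request(text, voice_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes it possible to inspect the SSML and headers without a running container.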

Validate that a container is running

There are several ways to validate that the container is running. Locate the external IP address and exposed port of the container in question, and open a web browser. Use the request URLs below to validate that the container is running. The example request URLs use http://localhost:5000, but your container's address may differ; substitute its external IP address and exposed port.

Request URL | Purpose
http://localhost:5000/ | The container provides a home page.
http://localhost:5000/status | Requested with an HTTP GET, to validate that the container is running without causing an endpoint query. This request can be used for Kubernetes liveness and readiness probes.
http://localhost:5000/swagger | The container provides a full set of documentation for the endpoints and a Try it out feature. With this feature, you can enter your settings into a web-based HTML form and make the query without having to write any code. After the query returns, an example curl command is provided to demonstrate the required HTTP headers and body format.
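The status check can also be automated. The following is a minimal sketch with an illustrative function name. It assumes that a running container answers the GET request on /status with an HTTP 2xx response, and it treats connection errors, timeouts, and HTTP error responses as "not running".

```python
import urllib.error
import urllib.request


def container_is_running(base_url: str = "http://localhost:5000",
                         timeout: float = 5.0) -> bool:
    """Send an HTTP GET to <base_url>/status and report whether it succeeded."""
    url = base_url.rstrip("/") + "/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Refused connections, timeouts, and 4xx/5xx responses all land here.
        return False
```

A startup script could call container_is_running() in a loop with a short sleep until it returns True, which also allows for the slower first run while models are paged into memory.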

[Image: the container's home page]

Stop the container

To shut down the container, press Ctrl+C in the command-line environment where the container is running.

Troubleshooting

When you run the container, it uses stdout and stderr to output information that helps you troubleshoot issues that occur while the container starts or runs.

Billing

The Speech containers send billing information to Azure by using a Speech resource on your Azure account.

Queries to the container are billed at the pricing tier of the Azure resource that's used for the <ApiKey>.

Azure Cognitive Services containers aren't licensed to run without being connected to the billing endpoint for metering. You must enable the containers to communicate billing information with the billing endpoint at all times. Cognitive Services containers don't send customer data, such as the image or text that's being analyzed, to Microsoft.

Connect to Azure

The container needs the billing argument values to run. These values allow the container to connect to the billing endpoint. The container reports usage about every 10 to 15 minutes. If the container doesn't connect to Azure within the allowed time window, the container continues to run but doesn't serve queries until the connection to the billing endpoint is restored. The connection is attempted 10 times at the same interval of 10 to 15 minutes. If the container can't connect to the billing endpoint within the 10 tries, it stops running.

Billing arguments

For the docker run command to start the container, all three of the following options must be specified with valid values:

Option | Description
ApiKey | The API key of the Cognitive Services resource that's used to track billing information. The value of this option must be set to an API key for the provisioned resource that's specified in Billing.
Billing | The endpoint of the Cognitive Services resource that's used to track billing information. The value of this option must be set to the endpoint URI of a provisioned Azure resource.
Eula | Indicates that you accepted the license for the container. The value of this option must be set to accept.

For more information about these options, see Configure containers.

Blog posts

Developer samples

Developer samples are available at our GitHub repository.

View webinar

Join the webinar to learn about:

• How to deploy Cognitive Services to any machine using Docker
• How to deploy Cognitive Services to AKS

Summary

In this article, you learned concepts and workflow for downloading, installing, and running Speech containers. In summary:

• Speech provides two Linux containers for Docker, encapsulating speech-to-text and text-to-speech.
• Container images are downloaded from the private container registry in Azure.
• Container images run in Docker.
• You can use either the REST API or the SDK to call operations in Speech containers by specifying the host URI of the container.
• You're required to provide billing information when instantiating a container.

Important

Cognitive Services containers are not licensed to run without being connected to Azure for metering. Customers need to enable the containers to communicate billing information with the metering service at all times. Cognitive Services containers do not send customer data (for example, the image or text that is being analyzed) to Microsoft.

Next steps