Use Speech service containers with Kubernetes and Helm

One option to manage your Speech containers on-premises is to use Kubernetes and Helm. Using Kubernetes and Helm to define the speech-to-text and text-to-speech container images, we'll create a Kubernetes package. This package will be deployed to a Kubernetes cluster on-premises. Finally, we'll explore how to test the deployed services and various configuration options. For more information about running Docker containers without Kubernetes orchestration, see install and run Speech service containers.

Prerequisites

The following prerequisites must be met before using Speech containers on-premises:

| Required | Purpose |
|---|---|
| Azure Account | If you don't have an Azure subscription, create a free account before you begin. |
| Container Registry access | In order for Kubernetes to pull the docker images into the cluster, it will need access to the container registry. |
| Kubernetes CLI | The Kubernetes CLI is required for managing the shared credentials from the container registry. Kubernetes is also needed before Helm, which is the Kubernetes package manager. |
| Helm CLI | Install the Helm CLI, which is used to install a helm chart (container package definition). |
| Speech resource | In order to use these containers, you must have a Speech Azure resource to get the associated billing key and billing endpoint URI. Both values are available on the Azure portal's Speech Overview and Keys pages and are required to start the container. {API_KEY}: resource key. {ENDPOINT_URI}: endpoint URI, for example https://westus.api.cognitive.microsoft.com/sts/v1.0 |

Refer to the Speech service container host computer details as a reference. This helm chart automatically calculates CPU and memory requirements based on the number of decoders (concurrent requests) that the user specifies. It also adjusts them based on whether optimizations for audio/text input are enabled. By default, the helm chart uses two concurrent requests and disables optimization.

| Service | CPU / Container | Memory / Container |
|---|---|---|
| Speech-to-Text | One decoder requires a minimum of 1,150 millicores. If optimizeForAudioFile is enabled, then 1,950 millicores are required. (default: two decoders) | Required: 2 GB, Limited: 4 GB |
| Text-to-Speech | One concurrent request requires a minimum of 500 millicores. If optimizeForTurboMode is enabled, then 1,000 millicores are required. (default: two concurrent requests) | Required: 1 GB, Limited: 2 GB |
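
The CPU figures above can be turned into a rough sizing estimate. The following Python sketch assumes that CPU scales linearly with the number of concurrent requests, which may not match the chart's actual formula exactly. The function and constant names are hypothetical and are not part of the Helm chart.

```python
# Per-request CPU minimums in millicores, copied from the table above.
CPU_MILLICORES = {
    "speechToText": {"default": 1150, "optimized": 1950},
    "textToSpeech": {"default": 500, "optimized": 1000},
}


def estimate_cpu_millicores(service, concurrent_requests=2, optimized=False):
    """Estimate the minimum CPU for one container, assuming linear scaling."""
    if concurrent_requests < 1:
        raise ValueError("concurrent_requests must be at least 1")
    mode = "optimized" if optimized else "default"
    return CPU_MILLICORES[service][mode] * concurrent_requests
```

Under this assumption, the chart defaults (two concurrent requests, no optimization) come to 2,300 millicores for speech-to-text and 1,000 millicores for text-to-speech.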

Connect to the Kubernetes cluster

The host computer is expected to have an available Kubernetes cluster. See this tutorial on deploying a Kubernetes cluster for a conceptual understanding of how to deploy a Kubernetes cluster to a host computer.

Sharing Docker credentials with the Kubernetes cluster

To allow the Kubernetes cluster to docker pull the configured image(s) from the containerpreview.azurecr.io container registry, you need to transfer the docker credentials into the cluster. Execute the kubectl create command below to create a docker-registry secret based on the credentials provided from the container registry access prerequisite.

From your command-line interface of choice, run the following command. Be sure to replace the <username>, <password>, and <email-address> with the container registry credentials.

kubectl create secret docker-registry mcr \
    --docker-server=containerpreview.azurecr.io \
    --docker-username=<username> \
    --docker-password=<password> \
    --docker-email=<email-address>

Note

If you already have access to the containerpreview.azurecr.io container registry, you could create a Kubernetes secret using the generic subcommand instead. Consider the following command, which reads your Docker configuration JSON file.

kubectl create secret generic mcr \
    --from-file=.dockerconfigjson="$HOME/.docker/config.json" \
    --type=kubernetes.io/dockerconfigjson

The following output is printed to the console when the secret has been successfully created.

secret "mcr" created

To verify that the secret has been created, execute the kubectl get command with the secrets argument.

kubectl get secrets

Executing kubectl get secrets prints all the configured secrets.

NAME    TYPE                              DATA    AGE
mcr     kubernetes.io/dockerconfigjson    1       30s
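
For reference, a secret of type kubernetes.io/dockerconfigjson stores a Docker configuration document under the .dockerconfigjson key. The Python sketch below builds such a document for illustration only. The helper name and the placeholder credentials are hypothetical; in practice, kubectl create secret docker-registry produces the secret for you.

```python
import base64
import json


def build_dockerconfigjson(server, username, password, email):
    """Return the JSON text stored under the .dockerconfigjson key."""
    # The auth field is the base64 encoding of "username:password".
    token = f"{username}:{password}".encode("utf-8")
    auth = base64.b64encode(token).decode("ascii")
    config = {
        "auths": {
            server: {
                "username": username,
                "password": password,
                "email": email,
                "auth": auth,
            }
        }
    }
    return json.dumps(config)
```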

Configure Helm chart values for deployment

Visit the Microsoft Helm Hub for all the publicly available helm charts offered by Microsoft. From the Microsoft Helm Hub, you'll find the Cognitive Services Speech On-Premises Chart. The Cognitive Services Speech On-Premises chart is the one we'll install, but we must first create a config-values.yaml file with explicit configurations. Let's start by adding the Microsoft repository to our Helm instance.

helm repo add microsoft https://microsoft.github.io/charts/repo

Next, we'll configure our Helm chart values. Copy and paste the following YAML into a file named config-values.yaml. For more information on customizing the Cognitive Services Speech On-Premises Helm Chart, see customize helm charts. Replace the # {ENDPOINT_URI} and # {API_KEY} comments with your own values.

# These settings are deployment specific and users can provide customizations

# speech-to-text configurations
speechToText:
  enabled: true
  numberOfConcurrentRequest: 3
  optimizeForAudioFile: true
  image:
    registry: containerpreview.azurecr.io
    repository: microsoft/cognitive-services-speech-to-text
    tag: latest
    pullSecrets:
      - mcr # Or an existing secret
    args:
      eula: accept
      billing: # {ENDPOINT_URI}
      apikey: # {API_KEY}

# text-to-speech configurations
textToSpeech:
  enabled: true
  numberOfConcurrentRequest: 3
  optimizeForTurboMode: true
  image:
    registry: containerpreview.azurecr.io
    repository: microsoft/cognitive-services-text-to-speech
    tag: latest
    pullSecrets:
      - mcr # Or an existing secret
    args:
      eula: accept
      billing: # {ENDPOINT_URI}
      apikey: # {API_KEY}

Important

If the billing and apikey values are not provided, the services will expire after 15 minutes. Likewise, verification will fail because the services will not be available.
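
Because missing billing or apikey values only surface later as expired services, it can help to check the values before installing. The sketch below assumes config-values.yaml has already been parsed into a Python dictionary (for example with a YAML library, where a value left as a bare comment parses as empty). The function name is hypothetical, and the check treats a service as enabled unless it is explicitly disabled.

```python
REQUIRED_ARGS = ("eula", "billing", "apikey")
SERVICES = ("speechToText", "textToSpeech")


def find_missing_args(values):
    """Return 'service.image.args.name' entries that are enabled but unset."""
    missing = []
    for service in SERVICES:
        section = values.get(service) or {}
        if not section.get("enabled", True):
            continue  # a disabled service needs no billing arguments
        args = (section.get("image") or {}).get("args") or {}
        for name in REQUIRED_ARGS:
            if not args.get(name):
                missing.append(f"{service}.image.args.{name}")
    return missing
```

An empty list means every enabled service has all three arguments set.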

The Kubernetes package (Helm chart)

The Helm chart contains the configuration of which docker image(s) to pull from the containerpreview.azurecr.io container registry.

A Helm chart is a collection of files that describe a related set of Kubernetes resources. A single chart might be used to deploy something simple, like a memcached pod, or something complex, like a full web app stack with HTTP servers, databases, caches, and so on.

The provided Helm charts pull the docker images of both Speech services, text-to-speech and speech-to-text, from the containerpreview.azurecr.io container registry.

Install the Helm chart on the Kubernetes cluster

To install the helm chart, we'll need to execute the helm install command, replacing <config-values.yaml> with the appropriate path and file name. The microsoft/cognitive-services-speech-onpremise Helm chart referenced below is available on the Microsoft Helm Hub.

helm install onprem-speech microsoft/cognitive-services-speech-onpremise \
    --version 0.1.1 \
    --values <config-values.yaml> 

Here is an example output you might expect to see from a successful install execution:

NAME:   onprem-speech
LAST DEPLOYED: Tue Jul  2 12:51:42 2019
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME                             READY  STATUS             RESTARTS  AGE
speech-to-text-7664f5f465-87w2d  0/1    Pending            0         0s
speech-to-text-7664f5f465-klbr8  0/1    ContainerCreating  0         0s
text-to-speech-56f8fb685b-4jtzh  0/1    ContainerCreating  0         0s
text-to-speech-56f8fb685b-frwxf  0/1    Pending            0         0s

==> v1/Service
NAME            TYPE          CLUSTER-IP    EXTERNAL-IP  PORT(S)       AGE
speech-to-text  LoadBalancer  10.0.252.106  <pending>    80:31811/TCP  1s
text-to-speech  LoadBalancer  10.0.125.187  <pending>    80:31247/TCP  0s

==> v1beta1/PodDisruptionBudget
NAME                                MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
speech-to-text-poddisruptionbudget  N/A            20%              0                    1s
text-to-speech-poddisruptionbudget  N/A            20%              0                    1s

==> v1beta2/Deployment
NAME            READY  UP-TO-DATE  AVAILABLE  AGE
speech-to-text  0/2    2           0          0s
text-to-speech  0/2    2           0          0s

==> v2beta2/HorizontalPodAutoscaler
NAME                       REFERENCE                  TARGETS        MINPODS  MAXPODS  REPLICAS  AGE
speech-to-text-autoscaler  Deployment/speech-to-text  <unknown>/50%  2        10       0         0s
text-to-speech-autoscaler  Deployment/text-to-speech  <unknown>/50%  2        10       0         0s


NOTES:
cognitive-services-speech-onpremise has been installed!
Release is named onprem-speech

The Kubernetes deployment can take several minutes to complete. To confirm that both pods and services are properly deployed and available, execute the following command:

kubectl get all

You should expect to see something similar to the following output:

NAME                                  READY     STATUS    RESTARTS   AGE
pod/speech-to-text-7664f5f465-87w2d   1/1       Running   0          34m
pod/speech-to-text-7664f5f465-klbr8   1/1       Running   0          34m
pod/text-to-speech-56f8fb685b-4jtzh   1/1       Running   0          34m
pod/text-to-speech-56f8fb685b-frwxf   1/1       Running   0          34m

NAME                     TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
service/kubernetes       ClusterIP      10.0.0.1       <none>           443/TCP        3h
service/speech-to-text   LoadBalancer   10.0.252.106   52.162.123.151   80:31811/TCP   34m
service/text-to-speech   LoadBalancer   10.0.125.187   65.52.233.162    80:31247/TCP   34m

NAME                             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/speech-to-text   2         2         2            2           34m
deployment.apps/text-to-speech   2         2         2            2           34m

NAME                                        DESIRED   CURRENT   READY     AGE
replicaset.apps/speech-to-text-7664f5f465   2         2         2         34m
replicaset.apps/text-to-speech-56f8fb685b   2         2         2         34m

NAME                                                            REFERENCE                   TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/speech-to-text-autoscaler   Deployment/speech-to-text   1%/50%    2         10        2          34m
horizontalpodautoscaler.autoscaling/text-to-speech-autoscaler   Deployment/text-to-speech   0%/50%    2         10        2          34m
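
To pick out the external addresses programmatically, the JSON form of the service list (for example, from kubectl get services -o json) can be parsed. The following sketch is a hypothetical helper that reads that JSON text. It relies only on the standard Service fields status.loadBalancer.ingress and spec.ports, and it skips services that have no external IP assigned yet.

```python
import json


def external_endpoints(services_json):
    """Map each service name to (external IP, first port) when an IP is assigned."""
    endpoints = {}
    for item in json.loads(services_json).get("items", []):
        ingress = item.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        ports = item.get("spec", {}).get("ports") or []
        if ingress and ports and ingress[0].get("ip"):
            name = item["metadata"]["name"]
            endpoints[name] = (ingress[0]["ip"], ports[0]["port"])
    return endpoints
```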

Verify Helm deployment with Helm tests

The installed Helm charts define Helm tests, which serve as a convenience for verification. These tests validate service readiness. To verify both speech-to-text and text-to-speech services, we'll execute the Helm test command.

helm test onprem-speech

Important

These tests will fail if the pod status is not Running or if the deployment is not listed under the AVAILABLE column. Be patient, as this can take over ten minutes to complete.

These tests will output various status results:

RUNNING: speech-to-text-readiness-test
PASSED: speech-to-text-readiness-test
RUNNING: text-to-speech-readiness-test
PASSED: text-to-speech-readiness-test

As an alternative to executing the helm tests, you could collect the external IP addresses and corresponding ports from the kubectl get all command. Using the IP and port, open a web browser and navigate to http://<external-ip>:<port>/swagger/index.html to view the API swagger page(s).
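
The swagger check can also be scripted instead of done in a browser. The sketch below uses only the Python standard library. The function names are hypothetical, and treating an HTTP 200 response as "ready" is an assumption rather than a documented contract of the containers.

```python
import urllib.request


def swagger_url(external_ip, port=80):
    """Build the swagger page URL for a deployed Speech service."""
    return f"http://{external_ip}:{port}/swagger/index.html"


def is_reachable(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False
```

For example, is_reachable(swagger_url("<external-ip>")) could be polled until the service has its external IP and the pods are running.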

Customize Helm charts

Helm charts are hierarchical. Being hierarchical allows for chart inheritance. It also caters to the concept of specificity, where settings that are more specific override inherited rules.
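
As an illustration of this specificity rule, the sketch below merges two nested dictionaries so that the more specific values win over inherited defaults. It is a simplified model written for this article, not Helm's actual implementation, and the function name is hypothetical.

```python
def merge_values(defaults, overrides):
    """Return a new dict where overrides take precedence over defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them whole.
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Keys that appear only in the defaults are kept, so an override file needs to list only the settings that differ.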

Speech (umbrella chart)

Values in the top-level "umbrella" chart override the corresponding sub-chart values. Therefore, all on-premises customized values should be added here.

| Parameter | Description | Default |
|---|---|---|
| speechToText.enabled | Whether the speech-to-text service is enabled. | true |
| speechToText.verification.enabled | Whether the helm test capability for the speech-to-text service is enabled. | true |
| speechToText.verification.image.registry | The docker image registry that helm test uses to test the speech-to-text service. Helm creates a separate pod inside the cluster for testing and pulls the test-use image from this registry. | docker.io |
| speechToText.verification.image.repository | The docker image repository that helm test uses to test the speech-to-text service. The helm test pod uses this repository to pull the test-use image. | antsu/on-prem-client |
| speechToText.verification.image.tag | The docker image tag used with helm test for the speech-to-text service. The helm test pod uses this tag to pull the test-use image. | latest |
| speechToText.verification.image.pullByHash | Whether the test-use docker image is pulled by hash. If true, speechToText.verification.image.hash should be added, with a valid image hash value. | false |
| speechToText.verification.image.arguments | The arguments used to execute the test-use docker image. The helm test pod passes these arguments to the container when running helm test. | "./speech-to-text-client" "./audio/whatstheweatherlike.wav" "--expect=What's the weather like" "--host=$(SPEECH_TO_TEXT_HOST)" "--port=$(SPEECH_TO_TEXT_PORT)" |
| textToSpeech.enabled | Whether the text-to-speech service is enabled. | true |
| textToSpeech.verification.enabled | Whether the helm test capability for the text-to-speech service is enabled. | true |
| textToSpeech.verification.image.registry | The docker image registry that helm test uses to test the text-to-speech service. Helm creates a separate pod inside the cluster for testing and pulls the test-use image from this registry. | docker.io |
| textToSpeech.verification.image.repository | The docker image repository that helm test uses to test the text-to-speech service. The helm test pod uses this repository to pull the test-use image. | antsu/on-prem-client |
| textToSpeech.verification.image.tag | The docker image tag used with helm test for the text-to-speech service. The helm test pod uses this tag to pull the test-use image. | latest |
| textToSpeech.verification.image.pullByHash | Whether the test-use docker image is pulled by hash. If true, textToSpeech.verification.image.hash should be added, with a valid image hash value. | false |
| textToSpeech.verification.image.arguments | The arguments used to execute the test-use docker image. The helm test pod passes these arguments to the container when running helm test. | "./text-to-speech-client" "--input='What's the weather like'" "--host=$(TEXT_TO_SPEECH_HOST)" "--port=$(TEXT_TO_SPEECH_PORT)" |

Speech-to-Text (sub-chart: charts/speechToText)

To override a sub-chart value from the "umbrella" chart, add the prefix speechToText. to the parameter to make it more specific. For example, speechToText.numberOfConcurrentRequest overrides numberOfConcurrentRequest.

| Parameter | Description | Default |
|---|---|---|
| enabled | Whether the speech-to-text service is enabled. | false |
| numberOfConcurrentRequest | The number of concurrent requests for the speech-to-text service. This chart automatically calculates CPU and memory resources, based on this value. | 2 |
| optimizeForAudioFile | Whether the service needs to optimize for audio input via audio files. If true, this chart will allocate more CPU resource to the service. | false |
| image.registry | The speech-to-text docker image registry. | containerpreview.azurecr.io |
| image.repository | The speech-to-text docker image repository. | microsoft/cognitive-services-speech-to-text |
| image.tag | The speech-to-text docker image tag. | latest |
| image.pullSecrets | The image secrets for pulling the speech-to-text docker image. | |
| image.pullByHash | Whether the docker image is pulled by hash. If true, image.hash is required. | false |
| image.hash | The speech-to-text docker image hash. Only used when image.pullByHash: true. | |
| image.args.eula (required) | Indicates you've accepted the license. The only valid value is accept. | |
| image.args.billing (required) | The billing endpoint URI value is available on the Azure portal's Speech Overview page. | |
| image.args.apikey (required) | Used to track billing information. | |
| service.type | The Kubernetes service type of the speech-to-text service. See the Kubernetes service types instructions for more details and verify cloud provider support. | LoadBalancer |
| service.port | The port of the speech-to-text service. | 80 |
| service.annotations | The speech-to-text annotations for the service metadata. Annotations are key-value pairs, for example: annotations: { some/annotation1: value1, some/annotation2: value2 } | |
| service.autoScaler.enabled | Whether the Horizontal Pod Autoscaler is enabled. If true, the speech-to-text-autoscaler will be deployed in the Kubernetes cluster. | true |
| service.podDisruption.enabled | Whether the Pod Disruption Budget is enabled. If true, the speech-to-text-poddisruptionbudget will be deployed in the Kubernetes cluster. | true |
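
The dotted parameter names in these tables correspond to nested keys in the values file. As a rough illustration, the following hypothetical helper expands dotted names, with the speechToText. prefix already applied, into the nested structure that would appear in config-values.yaml. It is a sketch only and does not handle Helm's escaping rules for keys that themselves contain dots.

```python
def expand_dotted(flat):
    """Expand {'a.b.c': 1} into {'a': {'b': {'c': 1}}}."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```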

Sentiment analysis (sub-chart: charts/speechToText)

Starting with v2.2.0 of the speech-to-text container and v0.2.0 of the Helm chart, the following parameters are used for sentiment analysis using the Text Analytics API.

| Parameter | Description | Values | Default |
|---|---|---|---|
| textanalytics.enabled | Whether the text-analytics service is enabled. | true/false | false |
| textanalytics.image.registry | The text-analytics docker image registry. | valid docker image registry | |
| textanalytics.image.repository | The text-analytics docker image repository. | valid docker image repository | |
| textanalytics.image.tag | The text-analytics docker image tag. | valid docker image tag | |
| textanalytics.image.pullSecrets | The image secrets for pulling the text-analytics docker image. | valid secrets name | |
| textanalytics.image.pullByHash | Whether the docker image is pulled by hash. If true, image.hash is required as well. | true/false | false |
| textanalytics.image.hash | The text-analytics docker image hash. Only used with image.pullByHash: true. | valid docker image hash | |
| textanalytics.image.args.eula | One of the required arguments of the text-analytics container, which indicates you've accepted the license. The value of this option must be accept. | accept, if you want to use the container | |
| textanalytics.image.args.billing | One of the required arguments of the text-analytics container, which specifies the billing endpoint URI. The billing endpoint URI value is available on the Azure portal's Speech Overview page. | valid billing endpoint URI | |
| textanalytics.image.args.apikey | One of the required arguments of the text-analytics container, which is used to track billing information. | valid apikey | |
| textanalytics.cpuRequest | The requested CPU for the text-analytics container. | int | 3000m |
| textanalytics.cpuLimit | The CPU limit for the text-analytics container. | | 8000m |
| textanalytics.memoryRequest | The requested memory for the text-analytics container. | | 3Gi |
| textanalytics.memoryLimit | The memory limit for the text-analytics container. | | 8Gi |
| textanalytics.service.sentimentURISuffix | The sentiment analysis URI suffix. The whole URI has the format "http://<service>:<port>/<sentimentURISuffix>". | | text/analytics/v3.0-preview/sentiment |
| textanalytics.service.type | The type of the text-analytics service in Kubernetes. See Kubernetes service types. | valid Kubernetes service type | LoadBalancer |
| textanalytics.service.port | The port of the text-analytics service. | int | 50085 |
| textanalytics.service.annotations | The annotations users can add to the text-analytics service metadata. For instance: annotations: { some/annotation1: value1, some/annotation2: value2 } | annotations, one per line | |
| textanalytics.service.autoScaler.enabled | Whether the Horizontal Pod Autoscaler is enabled. If enabled, text-analytics-autoscaler will be deployed in the Kubernetes cluster. | true/false | true |
| textanalytics.service.podDisruption.enabled | Whether the Pod Disruption Budget is enabled. If enabled, text-analytics-poddisruptionbudget will be deployed in the Kubernetes cluster. | true/false | true |

Text-to-Speech (sub-chart: charts/textToSpeech)

To override a sub-chart value from the "umbrella" chart, add the prefix textToSpeech. to the parameter to make it more specific. For example, textToSpeech.numberOfConcurrentRequest overrides numberOfConcurrentRequest.

| Parameter | Description | Default |
|---|---|---|
| enabled | Whether the text-to-speech service is enabled. | false |
| numberOfConcurrentRequest | The number of concurrent requests for the text-to-speech service. This chart automatically calculates CPU and memory resources, based on this value. | 2 |
| optimizeForTurboMode | Whether the service needs to optimize for text input via text files. If true, this chart will allocate more CPU resource to the service. | false |
| image.registry | The text-to-speech docker image registry. | containerpreview.azurecr.io |
| image.repository | The text-to-speech docker image repository. | microsoft/cognitive-services-text-to-speech |
| image.tag | The text-to-speech docker image tag. | latest |
| image.pullSecrets | The image secrets for pulling the text-to-speech docker image. | |
| image.pullByHash | Whether the docker image is pulled by hash. If true, image.hash is required. | false |
| image.hash | The text-to-speech docker image hash. Only used when image.pullByHash: true. | |
| image.args.eula (required) | Indicates you've accepted the license. The only valid value is accept. | |
| image.args.billing (required) | The billing endpoint URI value is available on the Azure portal's Speech Overview page. | |
| image.args.apikey (required) | Used to track billing information. | |
| service.type | The Kubernetes service type of the text-to-speech service. See the Kubernetes service types instructions for more details and verify cloud provider support. | LoadBalancer |
| service.port | The port of the text-to-speech service. | 80 |
| service.annotations | The text-to-speech annotations for the service metadata. Annotations are key-value pairs, for example: annotations: { some/annotation1: value1, some/annotation2: value2 } | |
| service.autoScaler.enabled | Whether the Horizontal Pod Autoscaler is enabled. If true, the text-to-speech-autoscaler will be deployed in the Kubernetes cluster. | true |
| service.podDisruption.enabled | Whether the Pod Disruption Budget is enabled. If true, the text-to-speech-poddisruptionbudget will be deployed in the Kubernetes cluster. | true |

Next steps

For more details, see installing applications with Helm in Azure Kubernetes Service (AKS).