Get started with the Azure Speech CLI

In this article, you'll learn how to use the Speech CLI, a command-line interface, to access Speech services like speech to text, text to speech, and speech translation without writing code. The Speech CLI is production ready and can be used to automate simple workflows in the Speech service using .bat or shell scripts.

This article assumes that you have working knowledge of the command prompt, terminal, or PowerShell.

Download and install

Follow these steps to install the Speech CLI on Windows:

  1. Install the Microsoft Visual C++ Redistributable for Visual Studio 2019 for your platform. Installing it for the first time may require a restart.

  2. Install the .NET Core 3.1 SDK.

  3. Install the Speech CLI using NuGet by entering this command:

    dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
    

Type spx to see help for the Speech CLI.

Note

As an alternative to NuGet, you can download and extract the Speech CLI for Windows as a zip file.

Font limitations

On Windows, the Speech CLI can only show fonts available to the command prompt on the local computer. Windows Terminal supports all fonts produced interactively by the Speech CLI.

If you output to a file, a text editor like Notepad or a web browser like Microsoft Edge can also show all fonts.

Create subscription config

To start using the Speech CLI, you need to enter your Speech subscription key and region identifier. Get these credentials by following the steps in Try the Speech service for free. Once you have your subscription key and region identifier (for example, eastus or westus), run the following commands.

spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION

Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.
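
For scripted setups, the two config commands above can be wrapped in a small shell function. This is a minimal sketch, not part of the Speech CLI itself: the function name configure_spx and the environment variable names SPEECH_KEY and SPEECH_REGION are assumptions chosen for illustration. Only the two spx config commands come from this article.

```shell
# Store Speech credentials for later spx calls.
# configure_spx, SPEECH_KEY, and SPEECH_REGION are hypothetical names
# chosen for this sketch; they are not defined by the Speech CLI.
configure_spx() {
    key="${SPEECH_KEY:-}"
    region="${SPEECH_REGION:-}"

    # Refuse to store empty values.
    if [ -z "$key" ] || [ -z "$region" ]; then
        echo "Set SPEECH_KEY and SPEECH_REGION first." >&2
        return 1
    fi

    # Same commands as shown above, with the values quoted.
    spx config @key --set "$key" &&
    spx config @region --set "$region"
}
```

After setting both variables in your shell, run configure_spx once. Reading the key from the environment keeps it out of the script file.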

Basic usage

This section shows a few basic SPX commands that are often useful for first-time testing and experimentation. Start by viewing the help built in to the tool by running the following command.

spx

You can search help topics by keyword. For example, enter the following command to see a list of Speech CLI usage examples:

spx help find --topics "examples"

Enter the following command to see options for the recognize command:

spx help recognize

The help output lists additional help commands in its right column. You can enter these commands to get detailed help about subcommands.

Speech to text (speech recognition)

Let's use the Speech CLI to convert speech to text (speech recognition) using your system's default microphone. After entering the command, SPX will begin listening for audio on the current active input device, and stop when you press ENTER. The recorded speech is then recognized and converted to text in the console output.

Important

If you are using a Docker container, --microphone will not work.

Run this command:

spx recognize --microphone

With the Speech CLI, you can also recognize speech from an audio file.

spx recognize --file /path/to/file.wav

Tip

If you're recognizing speech from an audio file in a Docker container, make sure that the audio file is located in a directory that you mounted into the container.
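
Because the Speech CLI can be driven from shell scripts, the file-based command above extends naturally to a folder of recordings. The following is a rough sketch under stated assumptions: recognize_dir is a hypothetical helper name, and the only Speech CLI syntax it relies on is spx recognize --file as shown above.

```shell
# Recognize every .wav file in a directory, one spx call per file.
# recognize_dir is a hypothetical helper name used for this sketch.
recognize_dir() {
    dir="${1:-.}"
    for wav in "$dir"/*.wav; do
        # If the directory has no .wav files, the pattern stays
        # unexpanded; skip it instead of passing it to spx.
        [ -e "$wav" ] || continue
        echo "== $wav =="
        spx recognize --file "$wav"
    done
}
```

Call it with a directory path, for example recognize_dir ./recordings, or with no argument to process the current directory.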

Don't forget, if you get stuck or want to learn more about the Speech CLI's recognition options, just type:

spx help recognize

Text to speech (speech synthesis)

The following command takes text as input and outputs the synthesized speech to the current active output device (for example, your computer speakers).

spx synthesize --text "Testing synthesis using the Speech CLI" --speakers

You can also save the synthesized output to a file. In this example, we'll create a file named my-sample.wav in the directory where the command is run.

spx synthesize --text "Enjoy using the Speech CLI." --audio output my-sample.wav
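
To synthesize several phrases in one go, the save-to-file command above can be placed in a loop. This is a sketch only: synthesize_lines and the line-N.wav naming scheme are assumptions made for illustration, and the loop uses just the --text and --audio output options shown above.

```shell
# Synthesize each non-empty line of a text file to its own .wav file.
# synthesize_lines and the line-N.wav file names are assumptions
# made for this sketch.
synthesize_lines() {
    input="$1"
    n=0
    # The "|| [ -n "$line" ]" part also handles a final line that has
    # no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        n=$((n + 1))
        # Redirect stdin so spx cannot consume the rest of the input file.
        spx synthesize --text "$line" --audio output "line-$n.wav" </dev/null
    done < "$input"
}
```

Running synthesize_lines phrases.txt would issue one spx synthesize call per non-empty line and write line-1.wav, line-2.wav, and so on to the current directory.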

These examples presume that you're testing in English. However, we support speech synthesis in many languages. You can pull down a full list of voices with this command, or by visiting the language support page.

spx synthesize --voices

Here's how you use one of the voices you've discovered.

spx synthesize --text "Bienvenue chez moi." --voice fr-CA-Caroline --speakers

Don't forget, if you get stuck or want to learn more about the Speech CLI's synthesis options, just type:

spx help synthesize

Speech to text translation

With the Speech CLI, you can also do speech to text translation. Run this command to capture audio from your default microphone, and output the translation as text. Keep in mind that you need to supply the source and target language with the translate command.

spx translate --microphone --source en-US --target ru-RU

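When translating into multiple languages, separate language codes with ;. In shells such as bash and PowerShell, an unquoted ; ends the command, so enclose the list in quotes.

spx translate --microphone --source en-US --target "ru-RU;fr-FR;es-ES"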

If you want to save the output of your translation, use the --output flag. In this example, you'll also read from a file.

spx translate --file /some/file/path/input.wav --source en-US --target ru-RU --output file /some/file/path/russian_translation.txt
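
When the target languages vary between runs, a script can build the ;-separated list itself. The helper below is a hedged sketch: translate_file is a hypothetical name, and it combines only the --file, --source, and --target options shown in this section. The joined list is passed in quotes so that the shell does not treat ; as a command separator.

```shell
# Translate one audio file into one or more target languages.
# translate_file is a hypothetical helper name used for this sketch.
# Usage: translate_file INPUT SOURCE_LANG TARGET_LANG [TARGET_LANG...]
translate_file() {
    input="$1"
    source_lang="$2"
    shift 2

    # Join the remaining arguments with ";" (no leading separator).
    targets=""
    for t in "$@"; do
        targets="${targets:+$targets;}$t"
    done

    # Quote the list so ";" reaches spx instead of ending the command.
    spx translate --file "$input" --source "$source_lang" --target "$targets"
}
```

For example, translate_file input.wav en-US ru-RU fr-FR es-ES passes ru-RU;fr-FR;es-ES to spx as a single --target value.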

Note

See the language and locale article for a list of all supported languages with their corresponding locale codes.

Don't forget, if you get stuck or want to learn more about the Speech CLI's translation options, just type:

spx help translate

Next steps