Get started with the Azure Speech CLI
In this article, you'll learn how to use the Speech CLI, a command-line interface, to access Speech services like speech to text, text to speech, and speech translation without writing code. The Speech CLI is production ready and can be used to automate simple workflows in the Speech service, using .bat or shell scripts.
This article assumes that you have working knowledge of the command prompt, terminal, or PowerShell.
Download and install
Follow these steps to install the Speech CLI on Windows:
On Windows, you need the Microsoft Visual C++ Redistributable for Visual Studio 2019 for your platform. Installing this for the first time may require a restart.
Install .NET Core 3.1 SDK.
Install the Speech CLI using NuGet by entering this command:
dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
Enter spx to see help for the Speech CLI.
As an alternative to NuGet, you can download and extract the Speech CLI for Windows as a zip file.
On Windows, the Speech CLI can only show fonts available to the command prompt on the local computer. Windows Terminal supports all fonts produced interactively by the Speech CLI.
If you output to a file, a text editor like Notepad or a web browser like Microsoft Edge can also show all fonts.
Create subscription config
To start using the Speech CLI, you need to enter your Speech subscription key and region identifier.
Get these credentials by following steps in Try the Speech service for free.
Once you have your subscription key and region identifier (for example, westus), run the following commands.
spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION
Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.
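If you script this step, you may prefer to read the credentials from environment variables instead of typing them into the command. The following is a minimal sketch, not an official recipe. The variable names SPEECH_KEY and SPEECH_REGION, the SPX override, and the function name configure_spx are assumptions made for illustration; only the two spx config commands come from this article.

```shell
#!/bin/sh
# SPX names the executable to call. Override it (for example SPX=echo)
# to print the commands instead of running them. This is an illustrative
# convention, not a Speech CLI feature.
SPX="${SPX:-spx}"

# Store the key and region taken from the hypothetical SPEECH_KEY and
# SPEECH_REGION environment variables.
configure_spx() {
    if [ -z "${SPEECH_KEY:-}" ] || [ -z "${SPEECH_REGION:-}" ]; then
        echo "Set SPEECH_KEY and SPEECH_REGION first" >&2
        return 1
    fi
    # Stop at the first failure so a bad key does not go unnoticed.
    "$SPX" config @key --set "$SPEECH_KEY" &&
    "$SPX" config @region --set "$SPEECH_REGION"
}
```

After exporting both variables, call configure_spx once. Running it with SPX=echo first lets you check the exact commands before anything is stored.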
This section shows a few basic SPX commands that are often useful for first-time testing and experimentation. Start by viewing the help built into the tool by running the following command.
spx
You can search help topics by keyword. For example, enter the following command to see a list of Speech CLI usage examples:
spx help find --topics "examples"
Enter the following command to see options for the recognize command:
spx help recognize
Additional help commands are listed in the right column of the output. You can enter these commands to get detailed help about subcommands.
Speech to text (speech recognition)
Let's use the Speech CLI to convert speech to text (speech recognition) using your system's default microphone. After entering the command, SPX will begin listening for audio on the current active input device, and stop when you press ENTER. The recorded speech is then recognized and converted to text in the console output.
If you are using a Docker container, --microphone will not work.
Run this command:
spx recognize --microphone
With the Speech CLI, you can also recognize speech from an audio file.
spx recognize --file /path/to/file.wav
If you're recognizing speech from an audio file in a Docker container, make sure that the audio file is located in the directory that you mounted into the container.
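Because recognition from a file is a single command, it is straightforward to repeat it over a folder of recordings. The sketch below is one possible way to do that. The function name recognize_dir and the SPX override are hypothetical helpers; the only Speech CLI syntax used is spx recognize --file, as shown above.

```shell
#!/bin/sh
# SPX names the executable to call; set SPX=echo for a dry run
# (illustrative convention, not a Speech CLI feature).
SPX="${SPX:-spx}"

# Run recognition on every .wav file in a directory, one call per file.
recognize_dir() {
    dir="$1"
    for wav in "$dir"/*.wav; do
        # When nothing matches, the pattern itself is left in $wav; skip it.
        [ -e "$wav" ] || continue
        echo "== $wav"
        "$SPX" recognize --file "$wav"
    done
}
```

For example, recognize_dir ./recordings prints a header line for each file followed by whatever the Speech CLI writes to the console for it.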
Don't forget, if you get stuck or want to learn more about the Speech CLI's recognition options, just type:
spx help recognize
Text to speech (speech synthesis)
Running the following command will take text as input, and output the synthesized speech to the current active output device (for example, your computer speakers).
spx synthesize --text "Testing synthesis using the Speech CLI" --speakers
You can also save the synthesized output to a file. In this example, we'll create a file named my-sample.wav in the directory where the command is run.
spx synthesize --text "Enjoy using the Speech CLI." --audio output my-sample.wav
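The same command can be looped to turn a list of sentences into separate audio files. The following sketch assumes a plain text file with one sentence per line. The function name synthesize_lines, the numbered file naming scheme, and the SPX override are illustrative choices, not part of the Speech CLI.

```shell
#!/bin/sh
# SPX names the executable to call; set SPX=echo for a dry run
# (illustrative convention, not a Speech CLI feature).
SPX="${SPX:-spx}"

# Synthesize each non-empty line of a text file to its own numbered .wav file.
# $1 is the input text file, $2 is the prefix for the output file names.
synthesize_lines() {
    input="$1"
    prefix="$2"
    n=0
    # The extra test keeps a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        n=$((n + 1))
        # Redirect stdin so the command cannot consume the rest of the input file.
        "$SPX" synthesize --text "$line" --audio output "$prefix-$n.wav" </dev/null
    done < "$input"
}
```

Calling synthesize_lines sentences.txt clip would request clip-1.wav, clip-2.wav, and so on, one file per non-empty line.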
These examples presume that you're testing in English. However, we support speech synthesis in many languages. You can pull down a full list of voices with this command, or by visiting the language support page.
spx synthesize --voices
Here's how you use one of the voices you've discovered.
spx synthesize --text "Bienvenue chez moi." --voice fr-CA-Caroline --speakers
Don't forget, if you get stuck or want to learn more about the Speech CLI's synthesis options, just type:
spx help synthesize
Speech to text translation
With the Speech CLI, you can also do speech to text translation. Run this command to capture audio from your default microphone, and output the translation as text. Keep in mind that you need to supply the source and target languages with the --source and --target options.
spx translate --microphone --source en-US --target ru-RU
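When translating into multiple languages, separate language codes with a semicolon (;). Quote the list so that your shell does not treat the semicolons as command separators.
spx translate --microphone --source en-US --target "ru-RU;fr-FR;es-ES"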
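In bash and PowerShell, an unquoted semicolon ends the current command, so a multi-language target list is easy to break when it is assembled in a script. The small helper below is a sketch of one way to build the list safely. The function name join_targets is hypothetical and is not part of the Speech CLI.

```shell
#!/bin/sh
# Join any number of language codes with ';' for use as a --target value.
join_targets() {
    joined=""
    for code in "$@"; do
        if [ -z "$joined" ]; then
            joined="$code"
        else
            joined="$joined;$code"
        fi
    done
    printf '%s\n' "$joined"
}
```

You could then pass the result as a single quoted argument, for example --target "$(join_targets ru-RU fr-FR es-ES)".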
If you want to save the output of your translation, use the --output flag. In this example, you'll also read from a file.
spx translate --file /some/file/path/input.wav --source en-US --target ru-RU --output file /some/file/path/russian_translation.txt
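If you want a separate text file for each target language, one option is to run the file-based command once per language. The sketch below assumes that layout. The function name translate_each, the output naming scheme (input name plus language code), and the SPX override are illustrative assumptions; the spx translate options are the ones used in this section.

```shell
#!/bin/sh
# SPX names the executable to call; set SPX=echo for a dry run
# (illustrative convention, not a Speech CLI feature).
SPX="${SPX:-spx}"

# Translate one audio file into each target language and write one
# text file per language next to the input file.
# Usage: translate_each INPUT.wav SOURCE_LANG TARGET_LANG...
translate_each() {
    wav="$1"
    source_lang="$2"
    shift 2
    for target in "$@"; do
        # input.wav + ru-RU -> input.ru-RU.txt
        result_file="${wav%.wav}.$target.txt"
        "$SPX" translate --file "$wav" --source "$source_lang" \
            --target "$target" --output file "$result_file"
    done
}
```

Running one language per call costs extra requests compared with a single multi-target command, but it keeps each translation in its own file.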
See the language and locale article for a list of all supported languages with their corresponding locale codes.
Don't forget, if you get stuck or want to learn more about the Speech CLI's translation options, just type:
spx help translate