Quickstart: Create captions with speech to text
Reference documentation | Package (NuGet) | Additional Samples on GitHub
In this quickstart, you run a console app to create captions with speech to text.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK in the next section of this article, but first check the SDK installation guide for any more requirements.
You must also install GStreamer for compressed input audio.
Create captions from speech
Follow these steps to create a new console application and install the Speech SDK.
Open a command prompt where you want the new project, and create a console application with the .NET CLI.
dotnet new console
Install the Speech SDK in your new project with the .NET CLI.
dotnet add package Microsoft.CognitiveServices.Speech
Copy the scenarios/csharp/dotnetcore/captioning/ sample files from GitHub into your project directory. Overwrite the local copy of Program.cs with the file that you copy from GitHub.
Build the project with the .NET CLI.
dotnet build
Run the application with your preferred command line arguments. See usage and arguments for the available options. Here is an example:
dotnet run --key YourSubscriptionKey --region YourServiceRegion --input caption.this.mp4 --format any --output caption.output.txt --srt --recognizing --threshold 5 --profanity mask --phrases "Contoso;Jessie;Rehaan"
Replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region, such as westus or northeurope. Make sure that the paths specified by --input and --output are valid. Otherwise you must change the paths.
Important
Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.
The output file with complete captions is written to caption.output.txt. Intermediate results are shown in the console:
00:00:00,180 --> 00:00:01,600
Welcome to
00:00:00,180 --> 00:00:01,820
Welcome to applied
00:00:00,180 --> 00:00:02,420
Welcome to applied mathematics
00:00:00,180 --> 00:00:02,930
Welcome to applied mathematics course
00:00:00,180 --> 00:00:03,100
Welcome to applied Mathematics course 2
00:00:00,180 --> 00:00:03,230
Welcome to applied Mathematics course 201.
Usage and arguments
Usage: captioning --key <key> --region <region> --input <input file>
Connection options include:
- --key: Your Speech resource key.
- --region REGION: Your Speech resource region. Examples: westus, northeurope
Input options include:
- --input FILE: Input audio from file. The default input is the microphone.
- --format FORMAT: Use compressed audio format. Valid only with --file. Valid values are alaw, any, flac, mp3, mulaw, and ogg_opus. The default value is any. To use a wav file, don't specify the format. This option is not available with the JavaScript captioning sample. For compressed audio files such as MP4, install GStreamer and see How to use compressed input audio.
Language options include:
- --languages LANG1,LANG2: Enable language identification for specified languages. For example: en-US,ja-JP. This option is only available with the C++, C#, and Python captioning samples. For more information, see Language identification.
Recognition options include:
- --recognizing: Output Recognizing event results. The default output is Recognized event results only. These are always written to the console, never to an output file. The --quiet option overrides this. For more information, see Get speech recognition results.
Accuracy options include:
- --phrases PHRASE1;PHRASE2: You can specify a list of phrases to be recognized, such as Contoso;Jessie;Rehaan. For more information, see Improve recognition with phrase list.
Output options include:
- --help: Show this help and stop.
- --output FILE: Output captions to the specified file. This flag is required.
- --srt: Output captions in SRT (SubRip Text) format. The default format is WebVTT (Web Video Text Tracks). For more information about SRT and WebVTT caption file formats, see Caption output format.
- --quiet: Suppress console output, except errors.
- --profanity OPTION: Valid values: raw, remove, mask. For more information, see Profanity filter concepts.
- --threshold NUMBER: Set the stable partial result threshold. The default value with this code example is 3. For more information, see Get partial results concepts.
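The --srt flag changes only the presentation of each cue, not its text or timing. As a rough illustration of the difference between the two formats (a hedged Python sketch, not code from the captioning sample), the same cue renders as follows:

```python
def format_cue(index, start_ms, end_ms, text, srt=False):
    """Format one caption cue in SRT or WebVTT style.

    SRT timestamps use a comma before the milliseconds and each cue is
    numbered; WebVTT uses a period and no per-cue number.
    """
    def stamp(ms, sep):
        hours, rem = divmod(ms, 3_600_000)
        minutes, rem = divmod(rem, 60_000)
        seconds, millis = divmod(rem, 1_000)
        return f"{hours:02}:{minutes:02}:{seconds:02}{sep}{millis:03}"

    sep = "," if srt else "."
    lines = [stamp(start_ms, sep) + " --> " + stamp(end_ms, sep), text]
    if srt:
        lines.insert(0, str(index))  # SRT cues are numbered
    return "\n".join(lines)

# SRT form of the final caption from the example output above:
print(format_cue(1, 180, 3230, "Welcome to applied Mathematics course 201.", srt=True))
# WebVTT form of the same cue:
print(format_cue(1, 180, 3230, "Welcome to applied Mathematics course 201."))
```

In a real SRT file, numbered cues are separated by blank lines; a WebVTT file additionally starts with a single WEBVTT header line.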
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (NuGet) | Additional Samples on GitHub
In this quickstart, you run a console app to create captions with speech to text.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK in the next section of this article, but first check the SDK installation guide for any more requirements.
You must also install GStreamer for compressed input audio.
Create captions from speech
Follow these steps to create a new console application and install the Speech SDK.
Download or copy the scenarios/cpp/windows/captioning/ sample files from GitHub into a local directory.
Open the captioning.sln solution file in Visual Studio.
Install the Speech SDK in your project with the NuGet package manager.
Install-Package Microsoft.CognitiveServices.Speech
Open Project > Properties > General. Set Configuration to All configurations. Set C++ Language Standard to ISO C++17 Standard (/std:c++17).
Open Build > Configuration Manager.
- On a 64-bit Windows installation, set Active solution platform to x64.
- On a 32-bit Windows installation, set Active solution platform to x86.
Open Project > Properties > Debugging. Enter your preferred command line arguments at Command Arguments. See usage and arguments for the available options. Here is an example:
--key YourSubscriptionKey --region YourServiceRegion --input caption.this.mp4 --format any --output caption.output.txt --srt --recognizing --threshold 5 --profanity mask --phrases "Contoso;Jessie;Rehaan"
Replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region, such as westus or northeurope. Make sure that the paths specified by --input and --output are valid. Otherwise you must change the paths.
Important
Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.
Build and run the console application. The output file with complete captions is written to caption.output.txt. Intermediate results are shown in the console:
00:00:00,180 --> 00:00:01,600
Welcome to
00:00:00,180 --> 00:00:01,820
Welcome to applied
00:00:00,180 --> 00:00:02,420
Welcome to applied mathematics
00:00:00,180 --> 00:00:02,930
Welcome to applied mathematics course
00:00:00,180 --> 00:00:03,100
Welcome to applied Mathematics course 2
00:00:00,180 --> 00:00:03,230
Welcome to applied Mathematics course 201.
Usage and arguments
Usage: captioning --key <key> --region <region> --input <input file>
Connection options include:
- --key: Your Speech resource key.
- --region REGION: Your Speech resource region. Examples: westus, northeurope
Input options include:
- --input FILE: Input audio from file. The default input is the microphone.
- --format FORMAT: Use compressed audio format. Valid only with --file. Valid values are alaw, any, flac, mp3, mulaw, and ogg_opus. The default value is any. To use a wav file, don't specify the format. This option is not available with the JavaScript captioning sample. For compressed audio files such as MP4, install GStreamer and see How to use compressed input audio.
Language options include:
- --languages LANG1,LANG2: Enable language identification for specified languages. For example: en-US,ja-JP. This option is only available with the C++, C#, and Python captioning samples. For more information, see Language identification.
Recognition options include:
- --recognizing: Output Recognizing event results. The default output is Recognized event results only. These are always written to the console, never to an output file. The --quiet option overrides this. For more information, see Get speech recognition results.
Accuracy options include:
- --phrases PHRASE1;PHRASE2: You can specify a list of phrases to be recognized, such as Contoso;Jessie;Rehaan. For more information, see Improve recognition with phrase list.
Output options include:
- --help: Show this help and stop.
- --output FILE: Output captions to the specified file. This flag is required.
- --srt: Output captions in SRT (SubRip Text) format. The default format is WebVTT (Web Video Text Tracks). For more information about SRT and WebVTT caption file formats, see Caption output format.
- --quiet: Suppress console output, except errors.
- --profanity OPTION: Valid values: raw, remove, mask. For more information, see Profanity filter concepts.
- --threshold NUMBER: Set the stable partial result threshold. The default value with this code example is 3. For more information, see Get partial results concepts.
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (Go) | Additional Samples on GitHub
In this quickstart, you run a console app to create captions with speech to text.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
Check whether there are any platform-specific installation steps.
You must also install GStreamer for compressed input audio.
Create captions from speech
Follow these steps to create a new Go module and install the Speech SDK.
Download or copy the scenarios/go/captioning/ sample files from GitHub into a local directory.
Open a command prompt in the same directory as captioning.go.
Run the following commands to create a go.mod file that links to the Speech SDK components hosted on GitHub:
go mod init captioning
go get github.com/Microsoft/cognitive-services-speech-sdk-go
Build the Go module.
go build
Run the application with your preferred command line arguments. See usage and arguments for the available options. Here is an example:
go run captioning --key YourSubscriptionKey --region YourServiceRegion --input caption.this.mp4 --format any --output caption.output.txt --srt --recognizing --threshold 5 --profanity mask --phrases "Contoso;Jessie;Rehaan"
Replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region, such as westus or northeurope. Make sure that the paths specified by --input and --output are valid. Otherwise you must change the paths.
Important
Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.
The output file with complete captions is written to caption.output.txt. Intermediate results are shown in the console:
00:00:00,180 --> 00:00:01,600
Welcome to
00:00:00,180 --> 00:00:01,820
Welcome to applied
00:00:00,180 --> 00:00:02,420
Welcome to applied mathematics
00:00:00,180 --> 00:00:02,930
Welcome to applied mathematics course
00:00:00,180 --> 00:00:03,100
Welcome to applied Mathematics course 2
00:00:00,180 --> 00:00:03,230
Welcome to applied Mathematics course 201.
Usage and arguments
Usage: go run captioning.go helper.go --key <key> --region <region> --input <input file>
Connection options include:
- --key: Your Speech resource key.
- --region REGION: Your Speech resource region. Examples: westus, northeurope
Input options include:
- --input FILE: Input audio from file. The default input is the microphone.
- --format FORMAT: Use compressed audio format. Valid only with --file. Valid values are alaw, any, flac, mp3, mulaw, and ogg_opus. The default value is any. To use a wav file, don't specify the format. This option is not available with the JavaScript captioning sample. For compressed audio files such as MP4, install GStreamer and see How to use compressed input audio.
Language options include:
- --languages LANG1,LANG2: Enable language identification for specified languages. For example: en-US,ja-JP. This option is only available with the C++, C#, and Python captioning samples. For more information, see Language identification.
Recognition options include:
- --recognizing: Output Recognizing event results. The default output is Recognized event results only. These are always written to the console, never to an output file. The --quiet option overrides this. For more information, see Get speech recognition results.
Accuracy options include:
- --phrases PHRASE1;PHRASE2: You can specify a list of phrases to be recognized, such as Contoso;Jessie;Rehaan. For more information, see Improve recognition with phrase list.
Output options include:
- --help: Show this help and stop.
- --output FILE: Output captions to the specified file. This flag is required.
- --srt: Output captions in SRT (SubRip Text) format. The default format is WebVTT (Web Video Text Tracks). For more information about SRT and WebVTT caption file formats, see Caption output format.
- --quiet: Suppress console output, except errors.
- --profanity OPTION: Valid values: raw, remove, mask. For more information, see Profanity filter concepts.
- --threshold NUMBER: Set the stable partial result threshold. The default value with this code example is 3. For more information, see Get partial results concepts.
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Additional Samples on GitHub
In this quickstart, you run a console app to create captions with speech to text.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
Before you can do anything, you need to install the Speech SDK. The sample in this quickstart works with the Java Runtime.
- Install Apache Maven
- Create a new pom.xml file in the root of your project, and copy the following into it:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.microsoft.cognitiveservices.speech.samples</groupId>
    <artifactId>quickstart-eclipse</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <repositories>
        <repository>
            <id>maven-cognitiveservices-speech</id>
            <name>Microsoft Cognitive Services Speech Maven Repository</name>
            <url>https://azureai.azureedge.net/maven/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>com.microsoft.cognitiveservices.speech</groupId>
            <artifactId>client-sdk</artifactId>
            <version>1.23.0</version>
        </dependency>
    </dependencies>
</project>
- Install the Speech SDK and dependencies.
mvn clean dependency:copy-dependencies
- You must also install GStreamer for compressed input audio.
Create captions from speech
Follow these steps to create a new console application and install the Speech SDK.
Copy the scenarios/java/jre/captioning/ sample files from GitHub into your project directory. The pom.xml file that you created in environment setup must also be in this directory.
Open a command prompt and run this command to compile the project files.
javac Captioning.java -cp ".;target\dependency\*"
Run the application with your preferred command line arguments. See usage and arguments for the available options. Here is an example:
java -cp ".;target\dependency\*" Captioning --key YourSubscriptionKey --region YourServiceRegion --input caption.this.mp4 --format any --output caption.output.txt --srt --recognizing --threshold 5 --profanity mask --phrases "Contoso;Jessie;Rehaan"
Replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region, such as westus or northeurope. Make sure that the paths specified by --input and --output are valid. Otherwise you must change the paths.
Important
Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.
The output file with complete captions is written to caption.output.txt. Intermediate results are shown in the console:
00:00:00,180 --> 00:00:01,600
Welcome to
00:00:00,180 --> 00:00:01,820
Welcome to applied
00:00:00,180 --> 00:00:02,420
Welcome to applied mathematics
00:00:00,180 --> 00:00:02,930
Welcome to applied mathematics course
00:00:00,180 --> 00:00:03,100
Welcome to applied Mathematics course 2
00:00:00,180 --> 00:00:03,230
Welcome to applied Mathematics course 201.
Usage and arguments
Usage: java -cp ".;target\dependency\*" Captioning --key <key> --region <region> --input <input file>
Connection options include:
- --key: Your Speech resource key.
- --region REGION: Your Speech resource region. Examples: westus, northeurope
Input options include:
- --input FILE: Input audio from file. The default input is the microphone.
- --format FORMAT: Use compressed audio format. Valid only with --file. Valid values are alaw, any, flac, mp3, mulaw, and ogg_opus. The default value is any. To use a wav file, don't specify the format. This option is not available with the JavaScript captioning sample. For compressed audio files such as MP4, install GStreamer and see How to use compressed input audio.
Language options include:
- --languages LANG1,LANG2: Enable language identification for specified languages. For example: en-US,ja-JP. This option is only available with the C++, C#, and Python captioning samples. For more information, see Language identification.
Recognition options include:
- --recognizing: Output Recognizing event results. The default output is Recognized event results only. These are always written to the console, never to an output file. The --quiet option overrides this. For more information, see Get speech recognition results.
Accuracy options include:
- --phrases PHRASE1;PHRASE2: You can specify a list of phrases to be recognized, such as Contoso;Jessie;Rehaan. For more information, see Improve recognition with phrase list.
Output options include:
- --help: Show this help and stop.
- --output FILE: Output captions to the specified file. This flag is required.
- --srt: Output captions in SRT (SubRip Text) format. The default format is WebVTT (Web Video Text Tracks). For more information about SRT and WebVTT caption file formats, see Caption output format.
- --quiet: Suppress console output, except errors.
- --profanity OPTION: Valid values: raw, remove, mask. For more information, see Profanity filter concepts.
- --threshold NUMBER: Set the stable partial result threshold. The default value with this code example is 3. For more information, see Get partial results concepts.
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (npm) | Additional Samples on GitHub | Library source code
In this quickstart, you run a console app to create captions with speech to text.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
Before you can do anything, you need to install the Speech SDK for JavaScript. If you just want the package name to install, run npm install microsoft-cognitiveservices-speech-sdk. For guided installation instructions, see the SDK installation guide.
Create captions from speech
Follow these steps to create a Node.js console application and install the Speech SDK.
Copy the scenarios/javascript/node/captioning/ sample files from GitHub into your project directory.
Open a command prompt in the same directory as Captioning.js.
Install the Speech SDK for JavaScript:
npm install microsoft-cognitiveservices-speech-sdk
Run the application with your preferred command line arguments. See usage and arguments for the available options. Here is an example:
node captioning.js --key YourSubscriptionKey --region YourServiceRegion --input caption.this.wav --output caption.output.txt --srt --recognizing --threshold 5 --profanity mask --phrases "Contoso;Jessie;Rehaan"
Replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region, such as westus or northeurope. Make sure that the paths specified by --input and --output are valid. Otherwise you must change the paths.
Note
The Speech SDK for JavaScript does not support compressed input audio. You must use a WAV file as shown in the example.
Important
Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.
The output file with complete captions is written to caption.output.txt. Intermediate results are shown in the console:
00:00:00,180 --> 00:00:01,600
Welcome to
00:00:00,180 --> 00:00:01,820
Welcome to applied
00:00:00,180 --> 00:00:02,420
Welcome to applied mathematics
00:00:00,180 --> 00:00:02,930
Welcome to applied mathematics course
00:00:00,180 --> 00:00:03,100
Welcome to applied Mathematics course 2
00:00:00,180 --> 00:00:03,230
Welcome to applied Mathematics course 201.
Usage and arguments
Usage: node captioning.js --key <key> --region <region> --input <input file>
Connection options include:
- --key: Your Speech resource key.
- --region REGION: Your Speech resource region. Examples: westus, northeurope
Input options include:
- --input FILE: Input audio from file. The default input is the microphone.
- --format FORMAT: Use compressed audio format. Valid only with --file. Valid values are alaw, any, flac, mp3, mulaw, and ogg_opus. The default value is any. To use a wav file, don't specify the format. This option is not available with the JavaScript captioning sample. For compressed audio files such as MP4, install GStreamer and see How to use compressed input audio.
Language options include:
- --languages LANG1,LANG2: Enable language identification for specified languages. For example: en-US,ja-JP. This option is only available with the C++, C#, and Python captioning samples. For more information, see Language identification.
Recognition options include:
- --recognizing: Output Recognizing event results. The default output is Recognized event results only. These are always written to the console, never to an output file. The --quiet option overrides this. For more information, see Get speech recognition results.
Accuracy options include:
- --phrases PHRASE1;PHRASE2: You can specify a list of phrases to be recognized, such as Contoso;Jessie;Rehaan. For more information, see Improve recognition with phrase list.
Output options include:
- --help: Show this help and stop.
- --output FILE: Output captions to the specified file. This flag is required.
- --srt: Output captions in SRT (SubRip Text) format. The default format is WebVTT (Web Video Text Tracks). For more information about SRT and WebVTT caption file formats, see Caption output format.
- --quiet: Suppress console output, except errors.
- --profanity OPTION: Valid values: raw, remove, mask. For more information, see Profanity filter concepts.
- --threshold NUMBER: Set the stable partial result threshold. The default value with this code example is 3. For more information, see Get partial results concepts.
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Reference documentation | Package (Download) | Additional Samples on GitHub
The Speech SDK for Objective-C does support getting speech recognition results for captioning, but we haven't yet included a guide here. Please select another programming language to get started and learn about the concepts, or see the Objective-C reference and samples linked from the beginning of this article.
Reference documentation | Package (Download) | Additional Samples on GitHub
The Speech SDK for Swift does support getting speech recognition results for captioning, but we haven't yet included a guide here. Please select another programming language to get started and learn about the concepts, or see the Swift reference and samples linked from the beginning of this article.
Reference documentation | Package (PyPi) | Additional Samples on GitHub
In this quickstart, you run a console app to create captions with speech to text.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
The Speech SDK for Python is available as a Python Package Index (PyPI) module. The Speech SDK for Python is compatible with Windows, Linux, and macOS.
- You must install the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, 2019, or 2022 for your platform. Installing this package for the first time might require a restart.
- On Linux, you must use the x64 target architecture.
- Install a version of Python from 3.7 to 3.10. First check the SDK installation guide for any more requirements.
- You must also install GStreamer for compressed input audio.
Create captions from speech
Follow these steps to create a new console application.
Download or copy the scenarios/python/console/captioning/ sample files from GitHub into a local directory.
Open a command prompt in the same directory as captioning.py.
Run this command to install the Speech SDK:
pip install azure-cognitiveservices-speech
Run the application with your preferred command line arguments. See usage and arguments for the available options. Here is an example:
python captioning.py --key YourSubscriptionKey --region YourServiceRegion --input caption.this.mp4 --format any --output caption.output.txt --srt --recognizing --threshold 5 --profanity mask --phrases "Contoso;Jessie;Rehaan"
Replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region, such as westus or northeurope. Make sure that the paths specified by --input and --output are valid. Otherwise you must change the paths.
Important
Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.
The output file with complete captions is written to caption.output.txt. Intermediate results are shown in the console:
00:00:00,180 --> 00:00:01,600
Welcome to
00:00:00,180 --> 00:00:01,820
Welcome to applied
00:00:00,180 --> 00:00:02,420
Welcome to applied mathematics
00:00:00,180 --> 00:00:02,930
Welcome to applied mathematics course
00:00:00,180 --> 00:00:03,100
Welcome to applied Mathematics course 2
00:00:00,180 --> 00:00:03,230
Welcome to applied Mathematics course 201.
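The Speech SDK reports each result's offset and duration in 100-nanosecond ticks, which a captioning app converts into timestamps like those shown above. A minimal sketch of that conversion (an illustration, not the sample's actual helper function):

```python
TICKS_PER_MS = 10_000  # the Speech SDK reports offsets/durations in 100-ns ticks

def ticks_to_srt_time(ticks):
    """Convert a tick offset to an SRT timestamp such as 00:00:00,180."""
    ms = ticks // TICKS_PER_MS
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"

# An offset of 1,800,000 ticks is 180 ms:
print(ticks_to_srt_time(1_800_000))  # 00:00:00,180
```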
Usage and arguments
Usage: python captioning.py --key <key> --region <region> --input <input file>
Connection options include:
- --key: Your Speech resource key.
- --region REGION: Your Speech resource region. Examples: westus, northeurope
Input options include:
- --input FILE: Input audio from file. The default input is the microphone.
- --format FORMAT: Use compressed audio format. Valid only with --file. Valid values are alaw, any, flac, mp3, mulaw, and ogg_opus. The default value is any. To use a wav file, don't specify the format. This option is not available with the JavaScript captioning sample. For compressed audio files such as MP4, install GStreamer and see How to use compressed input audio.
Language options include:
- --languages LANG1,LANG2: Enable language identification for specified languages. For example: en-US,ja-JP. This option is only available with the C++, C#, and Python captioning samples. For more information, see Language identification.
Recognition options include:
- --recognizing: Output Recognizing event results. The default output is Recognized event results only. These are always written to the console, never to an output file. The --quiet option overrides this. For more information, see Get speech recognition results.
Accuracy options include:
- --phrases PHRASE1;PHRASE2: You can specify a list of phrases to be recognized, such as Contoso;Jessie;Rehaan. For more information, see Improve recognition with phrase list.
Output options include:
- --help: Show this help and stop.
- --output FILE: Output captions to the specified file. This flag is required.
- --srt: Output captions in SRT (SubRip Text) format. The default format is WebVTT (Web Video Text Tracks). For more information about SRT and WebVTT caption file formats, see Caption output format.
- --quiet: Suppress console output, except errors.
- --profanity OPTION: Valid values: raw, remove, mask. For more information, see Profanity filter concepts.
- --threshold NUMBER: Set the stable partial result threshold. The default value with this code example is 3. For more information, see Get partial results concepts.
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
In this quickstart, you run a console app to create captions with speech to text.
Prerequisites
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Set up the environment
Follow these steps and see the Speech CLI quickstart for additional requirements for your platform.
Install the Speech CLI via the .NET CLI by entering this command:
dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
Configure your Speech resource key and region by running the following commands. Replace SUBSCRIPTION-KEY with your Speech resource key, and replace REGION with your Speech resource region:
spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION
You must also install GStreamer for compressed input audio.
Create captions from speech
With the Speech CLI, you can output both SRT (SubRip Text) and WebVTT (Web Video Text Tracks) captions from any type of media that contains audio.
To recognize audio from a file and output both WebVTT (vtt) and SRT (srt) captions, follow these steps.
Make sure that you have an input file named caption.this.mp4 in the path.
Run the following command to output captions from the video file:
spx recognize --file caption.this.mp4 --format any --output vtt file - --output srt file - --output each file - @output.each.detailed --property SpeechServiceResponse_StablePartialResultThreshold=5 --profanity masked --phrases "Contoso;Jessie;Rehaan"
The SRT and WebVTT captions are output to the console as shown here:
1
00:00:00,180 --> 00:00:03,230
Welcome to applied Mathematics course 201.

WEBVTT

00:00:00.180 --> 00:00:03.230
Welcome to applied Mathematics course 201.
{ "ResultId": "561a0ea00cc14bb09bd294357df3270f", "Duration": "00:00:03.0500000" }
Usage and arguments
Here are details about the optional arguments from the previous command:
- --file caption.this.mp4 --format any: Input audio from file. The default input is the microphone. For compressed audio files such as MP4, install GStreamer and see How to use compressed input audio.
- --output vtt file - and --output srt file -: Outputs WebVTT and SRT captions to standard output. For more information about SRT and WebVTT caption file formats, see Caption output format. For more information about the --output argument, see Speech CLI output options.
- @output.each.detailed: Outputs event results with text, offset, and duration. For more information, see Get speech recognition results.
- --property SpeechServiceResponse_StablePartialResultThreshold=5: You can request that the Speech service return fewer Recognizing events that are more accurate. In this example, the Speech service must affirm recognition of a word at least five times before returning the partial results to you. For more information, see Get partial results concepts.
- --profanity masked: You can specify whether to mask, remove, or show profanity in recognition results. For more information, see Profanity filter concepts.
- --phrases "Contoso;Jessie;Rehaan": You can specify a list of phrases to be recognized, such as Contoso, Jessie, and Rehaan. For more information, see Improve recognition with phrase list.
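The effect of a stable partial result threshold can be pictured with a toy model: treat a word in a Recognizing hypothesis as stable only after it has appeared unchanged across several consecutive updates. The sketch below (Python, purely illustrative; the service's internal logic is not public) shows the idea:

```python
def stable_prefix(hypotheses, threshold):
    """Return the words of the latest hypothesis that stayed unchanged
    across the last `threshold` consecutive hypotheses. This is a toy
    model of a stable-partial-result threshold, not the Speech
    service's actual algorithm."""
    recent = [h.split() for h in hypotheses[-threshold:]]
    if len(recent) < threshold:
        return []  # not enough updates yet to call anything stable
    stable = []
    for words in zip(*recent):  # compare the hypotheses position by position
        if all(w == words[0] for w in words):
            stable.append(words[0])
        else:
            break
    return stable

updates = ["welcome", "welcome to", "welcome to applied", "welcome to applied math"]
print(stable_prefix(updates, 3))  # only "welcome to" survived the last 3 updates
```

A higher threshold holds back more of each partial result, so captions update less often but flicker less.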
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Next steps