
Get started with speech translation

One of the core features of the Speech service is the ability to recognize human speech and translate it to other languages. In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech translation. This quickstart covers topics including:

  • Translating speech-to-text
  • Translating speech to multiple target languages
  • Performing direct speech-to-speech translation

Skip to samples on GitHub

If you want to skip straight to sample code, see the C# quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and a Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, follow the instructions under the Get the Speech SDK section of the About the Speech SDK article.

Import dependencies

To run the examples in this article, include the following using statements at the top of the Program.cs file.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;

Sensitive data and environment variables

The example source code in this article depends on environment variables for storing sensitive data, such as the Speech resource subscription key and region. The Program class contains two static readonly string values that are assigned from the host machine's environment variables, namely SPEECH__SUBSCRIPTION__KEY and SPEECH__SERVICE__REGION. Both of these fields are at the class scope, making them accessible within method bodies of the class. For more information on environment variables, see environment variables and application configuration.

public class Program
{
    static readonly string SPEECH__SUBSCRIPTION__KEY =
        Environment.GetEnvironmentVariable(nameof(SPEECH__SUBSCRIPTION__KEY));
    
    static readonly string SPEECH__SERVICE__REGION =
        Environment.GetEnvironmentVariable(nameof(SPEECH__SERVICE__REGION));

    static Task Main() => Task.CompletedTask;
}

Create a speech translation configuration

To call the Speech service using the Speech SDK, you need to create a SpeechTranslationConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

Tip

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

There are a few ways that you can initialize a SpeechTranslationConfig:

  • With a subscription: pass in a key and the associated region.
  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.

Let's take a look at how a SpeechTranslationConfig is created using a key and region. Get these credentials by following the steps in Try the Speech service for free.

public class Program
{
    static readonly string SPEECH__SUBSCRIPTION__KEY =
        Environment.GetEnvironmentVariable(nameof(SPEECH__SUBSCRIPTION__KEY));
    
    static readonly string SPEECH__SERVICE__REGION =
        Environment.GetEnvironmentVariable(nameof(SPEECH__SERVICE__REGION));

    static Task Main() => TranslateSpeechAsync();

    static async Task TranslateSpeechAsync()
    {
        var translationConfig =
            SpeechTranslationConfig.FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    }
}

Change source language

One common task of speech translation is specifying the input (or source) language. Let's take a look at how you would change the input language to Italian. In your code, interact with the SpeechTranslationConfig instance, assigning to the SpeechRecognitionLanguage property.

static async Task TranslateSpeechAsync()
{
    var translationConfig =
        SpeechTranslationConfig.FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    // Source (input) language
    translationConfig.SpeechRecognitionLanguage = "it-IT";
}

The SpeechRecognitionLanguage property expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.

Add translation language

Another common task of speech translation is to specify target translation languages; at least one is required, but multiples are supported. The following code snippet sets both French and German as translation language targets.

static async Task TranslateSpeechAsync()
{
    var translationConfig =
        SpeechTranslationConfig.FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    translationConfig.SpeechRecognitionLanguage = "it-IT";
    
    // Translate to languages. See: https://aka.ms/speech/sttt-languages
    translationConfig.AddTargetLanguage("fr");
    translationConfig.AddTargetLanguage("de");
}

With every call to AddTargetLanguage, a new target translation language is specified. In other words, when speech is recognized from the source language, each target translation is available as part of the resulting translation operation.

Initialize a translation recognizer

After you've created a SpeechTranslationConfig, the next step is to initialize a TranslationRecognizer. When you initialize a TranslationRecognizer, you'll need to pass it your translationConfig. The configuration object provides the credentials that the Speech service requires to validate your request.

If you're recognizing speech using your device's default microphone, here's what the TranslationRecognizer should look like:

static async Task TranslateSpeechAsync()
{
    var translationConfig =
        SpeechTranslationConfig.FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    var fromLanguage = "en-US";
    var toLanguages = new List<string> { "it", "fr", "de" };
    translationConfig.SpeechRecognitionLanguage = fromLanguage;
    toLanguages.ForEach(translationConfig.AddTargetLanguage);

    using var recognizer = new TranslationRecognizer(translationConfig);
}

If you want to specify the audio input device, then you'll need to create an AudioConfig and provide the audioConfig parameter when initializing your TranslationRecognizer.

First, you'll reference the AudioConfig object as follows:

static async Task TranslateSpeechAsync()
{
    var translationConfig =
        SpeechTranslationConfig.FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    var fromLanguage = "en-US";
    var toLanguages = new List<string> { "it", "fr", "de" };
    translationConfig.SpeechRecognitionLanguage = fromLanguage;
    toLanguages.ForEach(translationConfig.AddTargetLanguage);

    using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using var recognizer = new TranslationRecognizer(translationConfig, audioConfig);
}

If you want to provide an audio file instead of using a microphone, you'll still need to provide an audioConfig. However, when you create an AudioConfig, instead of calling FromDefaultMicrophoneInput, you'll call FromWavFileInput and pass the filename parameter.

static async Task TranslateSpeechAsync()
{
    var translationConfig =
        SpeechTranslationConfig.FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    var fromLanguage = "en-US";
    var toLanguages = new List<string> { "it", "fr", "de" };
    translationConfig.SpeechRecognitionLanguage = fromLanguage;
    toLanguages.ForEach(translationConfig.AddTargetLanguage);

    using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
    using var recognizer = new TranslationRecognizer(translationConfig, audioConfig);
}

Translate speech

To translate speech, the Speech SDK relies on a microphone or an audio file input. Speech recognition occurs before speech translation. After all objects have been initialized, call the recognize-once function and get the result.

static async Task TranslateSpeechAsync()
{
    var translationConfig =
        SpeechTranslationConfig.FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    var fromLanguage = "en-US";
    var toLanguages = new List<string> { "it", "fr", "de" };
    translationConfig.SpeechRecognitionLanguage = fromLanguage;
    toLanguages.ForEach(translationConfig.AddTargetLanguage);

    using var recognizer = new TranslationRecognizer(translationConfig);

    Console.Write($"Say something in '{fromLanguage}' and ");
    Console.WriteLine($"we'll translate into '{string.Join("', '", toLanguages)}'.\n");
    
    var result = await recognizer.RecognizeOnceAsync();
    if (result.Reason == ResultReason.TranslatedSpeech)
    {
        Console.WriteLine($"Recognized: \"{result.Text}\":");
        foreach (var (language, translation) in result.Translations)
        {
            Console.WriteLine($"Translated into '{language}': {translation}");
        }
    }
}

For more information about speech-to-text, see the basics of speech recognition.

Synthesize translations

After a successful speech recognition and translation, the result contains all the translations in a dictionary. The Translations dictionary key is the target translation language, and the value is the translated text. Recognized speech can be translated, then synthesized in a different language (speech-to-speech).

Event-based synthesis

The TranslationRecognizer object exposes a Synthesizing event. The event fires several times and provides a mechanism to retrieve the synthesized audio from the translation recognition result. If you're translating to multiple languages, see manual synthesis. Specify the synthesis voice by assigning a VoiceName, provide an event handler for the Synthesizing event, and get the audio. The following example saves the translated audio as a .wav file.

Important

Event-based synthesis only works with a single translation; do not add multiple target translation languages. Additionally, the VoiceName should be the same language as the target translation language; for example, "de" could map to "de-DE-Hedda".

static async Task TranslateSpeechAsync()
{
    var translationConfig =
        SpeechTranslationConfig.FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    var fromLanguage = "en-US";
    var toLanguage = "de";
    translationConfig.SpeechRecognitionLanguage = fromLanguage;
    translationConfig.AddTargetLanguage(toLanguage);

    // See: https://aka.ms/speech/sdkregion#standard-and-neural-voices
    translationConfig.VoiceName = "de-DE-Hedda";

    using var recognizer = new TranslationRecognizer(translationConfig);

    recognizer.Synthesizing += (_, e) =>
    {
        var audio = e.Result.GetAudio();
        Console.WriteLine($"Audio synthesized: {audio.Length:#,0} byte(s) {(audio.Length == 0 ? "(Complete)" : "")}");

        if (audio.Length > 0)
        {
            File.WriteAllBytes("YourAudioFile.wav", audio);
        }
    };

    Console.Write($"Say something in '{fromLanguage}' and ");
    Console.WriteLine($"we'll translate into '{toLanguage}'.\n");

    var result = await recognizer.RecognizeOnceAsync();
    if (result.Reason == ResultReason.TranslatedSpeech)
    {
        Console.WriteLine($"Recognized: \"{result.Text}\"");
        Console.WriteLine($"Translated into '{toLanguage}': {result.Translations[toLanguage]}");
    }
}

Manual synthesis

The Translations dictionary can be used to synthesize audio from the translation text. Iterate through each translation and synthesize it. When creating a SpeechSynthesizer instance, the SpeechConfig object needs to have its SpeechSynthesisVoiceName property set to the desired voice. The following example translates to five languages, and each translation is then synthesized to an audio file in the corresponding neural voice.

static async Task TranslateSpeechAsync()
{
    var translationConfig =
        SpeechTranslationConfig.FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    var fromLanguage = "en-US";
    var toLanguages = new List<string> { "de", "en", "it", "pt", "zh-Hans" };
    translationConfig.SpeechRecognitionLanguage = fromLanguage;
    toLanguages.ForEach(translationConfig.AddTargetLanguage);

    using var recognizer = new TranslationRecognizer(translationConfig);

    Console.Write($"Say something in '{fromLanguage}' and ");
    Console.WriteLine($"we'll translate into '{string.Join("', '", toLanguages)}'.\n");

    var result = await recognizer.RecognizeOnceAsync();
    if (result.Reason == ResultReason.TranslatedSpeech)
    {
        // See: https://aka.ms/speech/sdkregion#standard-and-neural-voices
        var languageToVoiceMap = new Dictionary<string, string>
        {
            ["de"] = "de-DE-KatjaNeural",
            ["en"] = "en-US-AriaNeural",
            ["it"] = "it-IT-ElsaNeural",
            ["pt"] = "pt-BR-FranciscaNeural",
            ["zh-Hans"] = "zh-CN-XiaoxiaoNeural"
        };

        Console.WriteLine($"Recognized: \"{result.Text}\"");

        foreach (var (language, translation) in result.Translations)
        {
            Console.WriteLine($"Translated into '{language}': {translation}");

            var speechConfig =
                SpeechConfig.FromSubscription(
                    SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
            speechConfig.SpeechSynthesisVoiceName = languageToVoiceMap[language];

            using var audioConfig = AudioConfig.FromWavFileOutput($"{language}-translation.wav");
            using var synthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
            
            await synthesizer.SpeakTextAsync(translation);
        }
    }
}

For more information about speech synthesis, see the basics of speech synthesis.

One of the core features of the Speech service is the ability to recognize human speech and translate it to other languages. In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech translation. This quickstart covers topics including:

  • Translating speech-to-text
  • Translating speech to multiple target languages
  • Performing direct speech-to-speech translation

Skip to samples on GitHub

If you want to skip straight to sample code, see the C++ quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and a Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, follow the instructions under the Get the Speech SDK section of the About the Speech SDK article.

Import dependencies

To run the examples in this article, include the following #include and using statements at the top of the C++ code file.

#include <iostream> // cin, cout
#include <fstream>
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <speechapi_cxx.h>

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;
using namespace Microsoft::CognitiveServices::Speech::Translation;

Sensitive data and environment variables

The example source code in this article depends on environment variables for storing sensitive data, such as the Speech resource subscription key and region. The C++ code file contains two string values that are assigned from the host machine's environment variables, namely SPEECH__SUBSCRIPTION__KEY and SPEECH__SERVICE__REGION. Both of these values are declared at the global scope, making them accessible within the function bodies of the code file. For more information on environment variables, see environment variables and application configuration.

auto SPEECH__SUBSCRIPTION__KEY = getenv("SPEECH__SUBSCRIPTION__KEY");
auto SPEECH__SERVICE__REGION = getenv("SPEECH__SERVICE__REGION");
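Note that getenv returns a null pointer when a variable is unset, which would make the later FromSubscription call fail in an unhelpful way. As an illustrative sketch (this helper is not part of the article's sample or the SDK), you could guard the lookup:

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Read an environment variable, failing loudly if it is missing.
// Illustrative helper only; the article's sample assumes both
// variables are already set in the host environment.
std::string RequireEnv(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr) {
        throw std::runtime_error(
            std::string("Missing environment variable: ") + name);
    }
    return std::string(value);
}
```

A config could then be built from `RequireEnv("SPEECH__SUBSCRIPTION__KEY")` and `RequireEnv("SPEECH__SERVICE__REGION")`, turning a missing credential into a clear error at startup.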

Create a speech translation configuration

To call the Speech service using the Speech SDK, you need to create a SpeechTranslationConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

Tip

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

There are a few ways that you can initialize a SpeechTranslationConfig:

  • With a subscription: pass in a key and the associated region.
  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.

Let's take a look at how a SpeechTranslationConfig is created using a key and region. Get these credentials by following the steps in Try the Speech service for free.

auto SPEECH__SUBSCRIPTION__KEY = getenv("SPEECH__SUBSCRIPTION__KEY");
auto SPEECH__SERVICE__REGION = getenv("SPEECH__SERVICE__REGION");

void translateSpeech() {
    auto config =
        SpeechTranslationConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
}

int main(int argc, char** argv) {
    setlocale(LC_ALL, "");
    translateSpeech();
    return 0;
}

Change source language

One common task of speech translation is specifying the input (or source) language. Let's take a look at how you would change the input language to Italian. In your code, interact with the SpeechTranslationConfig instance, calling the SetSpeechRecognitionLanguage method.

void translateSpeech() {
    auto translationConfig =
        SpeechTranslationConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    // Source (input) language
    translationConfig->SetSpeechRecognitionLanguage("it-IT");
}

SetSpeechRecognitionLanguage expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.
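The expected shape is a lowercase language code, a hyphen, and a region designator, as in "it-IT" or "en-US". As a rough illustration (this helper is hypothetical and not an SDK API; the authoritative values are the Locale column of the supported languages table), a quick shape check might look like:

```cpp
#include <regex>
#include <string>

// Rough sanity check for language-locale strings such as "it-IT" or "en-US".
// Illustrative only: passing this check does not guarantee the locale is
// actually supported by the Speech service.
bool LooksLikeLocale(const std::string& value) {
    static const std::regex pattern("^[a-z]{2,3}-[A-Z][A-Za-z]+$");
    return std::regex_match(value, pattern);
}
```

Note that a bare language code such as "fr" fails this check; the source language requires the full language-locale form, while target translation languages (covered next) accept shorter codes.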

Add translation language

Another common task of speech translation is to specify target translation languages; at least one is required, but multiples are supported. The following code snippet sets both French and German as translation language targets.

void translateSpeech() {
    auto translationConfig =
        SpeechTranslationConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    translationConfig->SetSpeechRecognitionLanguage("it-IT");

    // Translate to languages. See: https://aka.ms/speech/sttt-languages
    translationConfig->AddTargetLanguage("fr");
    translationConfig->AddTargetLanguage("de");
}

With every call to AddTargetLanguage, a new target translation language is specified. In other words, when speech is recognized from the source language, each target translation is available as part of the resulting translation operation.
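As a mental model (this sketch is hypothetical and does not reflect the SDK's internals), you can think of the config as accumulating a set of target codes, with one entry per target in the eventual Translations result:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical mental model of target-language accumulation: each Add call
// records one language code, and the recognizer later produces one
// translation per recorded target.
class TargetLanguages {
public:
    void Add(const std::string& language) { targets_.insert(language); }
    std::vector<std::string> List() const {
        return std::vector<std::string>(targets_.begin(), targets_.end());
    }
private:
    std::set<std::string> targets_;
};
```

In the real SDK, the accumulated targets surface as the keys of the result's Translations map, shown later in this article.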

Initialize a translation recognizer

After you've created a SpeechTranslationConfig, the next step is to initialize a TranslationRecognizer. When you initialize a TranslationRecognizer, you'll need to pass it your translationConfig. The configuration object provides the credentials that the Speech service requires to validate your request.

If you're recognizing speech using your device's default microphone, here's what the TranslationRecognizer should look like:

void translateSpeech() {
    auto translationConfig =
        SpeechTranslationConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    auto fromLanguage = "en-US";
    auto toLanguages = { "it", "fr", "de" };
    translationConfig->SetSpeechRecognitionLanguage(fromLanguage);
    for (auto language : toLanguages) {
        translationConfig->AddTargetLanguage(language);
    }

    auto recognizer = TranslationRecognizer::FromConfig(translationConfig);
}

If you want to specify the audio input device, then you'll need to create an AudioConfig and provide the audioConfig parameter when initializing your TranslationRecognizer.

First, you'll reference the AudioConfig object as follows:

void translateSpeech() {
    auto translationConfig =
        SpeechTranslationConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    auto fromLanguage = "en-US";
    auto toLanguages = { "it", "fr", "de" };
    translationConfig->SetSpeechRecognitionLanguage(fromLanguage);
    for (auto language : toLanguages) {
        translationConfig->AddTargetLanguage(language);
    }

    auto audioConfig = AudioConfig::FromDefaultMicrophoneInput();
    auto recognizer = TranslationRecognizer::FromConfig(translationConfig, audioConfig);
}

If you want to provide an audio file instead of using a microphone, you'll still need to provide an audioConfig. However, when you create an AudioConfig, instead of calling FromDefaultMicrophoneInput, you'll call FromWavFileInput and pass the filename parameter.

void translateSpeech() {
    auto translationConfig =
        SpeechTranslationConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    auto fromLanguage = "en-US";
    auto toLanguages = { "it", "fr", "de" };
    translationConfig->SetSpeechRecognitionLanguage(fromLanguage);
    for (auto language : toLanguages) {
        translationConfig->AddTargetLanguage(language);
    }

    auto audioConfig = AudioConfig::FromWavFileInput("YourAudioFile.wav");
    auto recognizer = TranslationRecognizer::FromConfig(translationConfig, audioConfig);
}

Translate speech

To translate speech, the Speech SDK relies on a microphone or an audio file input. Speech recognition occurs before speech translation. After all objects have been initialized, call the recognize-once function and get the result.

void translateSpeech() {
    auto translationConfig =
        SpeechTranslationConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    string fromLanguage = "en-US";
    string toLanguages[3] = { "it", "fr", "de" };
    translationConfig->SetSpeechRecognitionLanguage(fromLanguage);
    for (auto language : toLanguages) {
        translationConfig->AddTargetLanguage(language);
    }

    auto recognizer = TranslationRecognizer::FromConfig(translationConfig);
    cout << "Say something in '" << fromLanguage << "' and we'll translate...\n";

    auto result = recognizer->RecognizeOnceAsync().get();
    if (result->Reason == ResultReason::TranslatedSpeech)
    {
        cout << "Recognized: \"" << result->Text << "\"" << std::endl;
        for (auto pair : result->Translations)
        {
            auto language = pair.first;
            auto translation = pair.second;
            cout << "Translated into '" << language << "': " << translation << std::endl;
        }
    }
}

For more information about speech-to-text, see the basics of speech recognition.

Synthesize translations

After a successful speech recognition and translation, the result contains all the translations in a dictionary. The Translations dictionary key is the target translation language, and the value is the translated text. Recognized speech can be translated, then synthesized in a different language (speech-to-speech).

Event-based synthesis

The TranslationRecognizer object exposes a Synthesizing event. The event fires several times and provides a mechanism to retrieve the synthesized audio from the translation recognition result. If you're translating to multiple languages, see manual synthesis. Specify the synthesis voice by calling SetVoiceName, provide an event handler for the Synthesizing event, and get the audio. The following example saves the translated audio as a .wav file.

Important

Event-based synthesis only works with a single translation; do not add multiple target translation languages. Additionally, the voice set with SetVoiceName should be the same language as the target translation language; for example, "de" could map to "de-DE-Hedda".
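Since voice names follow the language-locale-voice pattern, the "same language" requirement amounts to the voice name starting with the target language code. As an illustrative check (hypothetical helper, not an SDK API):

```cpp
#include <string>

// Illustrative check that a synthesis voice such as "de-DE-Hedda" begins
// with the target translation language code, e.g. "de". Not an SDK API;
// shown only to make the naming convention concrete.
bool VoiceMatchesTarget(const std::string& voiceName, const std::string& target) {
    return voiceName.rfind(target + "-", 0) == 0;
}
```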

void translateSpeech() {
    auto translationConfig =
        SpeechTranslationConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    auto fromLanguage = "en-US";
    auto toLanguage = "de";
    translationConfig->SetSpeechRecognitionLanguage(fromLanguage);
    translationConfig->AddTargetLanguage(toLanguage);

    // See: https://aka.ms/speech/sdkregion#standard-and-neural-voices
    translationConfig->SetVoiceName("de-DE-Hedda");

    auto recognizer = TranslationRecognizer::FromConfig(translationConfig);
    recognizer->Synthesizing.Connect([](const TranslationSynthesisEventArgs& e)
        {
            auto audio = e.Result->Audio;
            auto size = audio.size();
            cout << "Audio synthesized: " << size << " byte(s)" << (size == 0 ? "(COMPLETE)" : "") << std::endl;

            if (size > 0) {
                ofstream file("translation.wav", ios::out | ios::binary);
                auto audioData = audio.data();
                file.write((const char*)audioData, sizeof(audio[0]) * size);
                file.close();
            }
        });

    cout << "Say something in '" << fromLanguage << "' and we'll translate...\n";

    auto result = recognizer->RecognizeOnceAsync().get();
    if (result->Reason == ResultReason::TranslatedSpeech)
    {
        cout << "Recognized: \"" << result->Text << "\"" << std::endl;
        for (auto pair : result->Translations)
        {
            auto language = pair.first;
            auto translation = pair.second;
            cout << "Translated into '" << language << "': " << translation << std::endl;
        }
    }
}

Manual synthesis

The Translations dictionary can be used to synthesize audio from the translation text. Iterate through each translation and synthesize it. When creating a SpeechSynthesizer instance, the SpeechConfig object needs to have the desired voice set via SetSpeechSynthesisVoiceName. The following example translates to five languages, and each translation is then synthesized to an audio file in the corresponding neural voice.

void translateSpeech() {
    auto translationConfig =
        SpeechTranslationConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    auto fromLanguage = "en-US";
    auto toLanguages = { "de", "en", "it", "pt", "zh-Hans" };
    translationConfig->SetSpeechRecognitionLanguage(fromLanguage);
    for (auto language : toLanguages) {
        translationConfig->AddTargetLanguage(language);
    }

    auto recognizer = TranslationRecognizer::FromConfig(translationConfig);

    cout << "Say something in '" << fromLanguage << "' and we'll translate...\n";

    auto result = recognizer->RecognizeOnceAsync().get();
    if (result->Reason == ResultReason::TranslatedSpeech)
    {
        // See: https://aka.ms/speech/sdkregion#standard-and-neural-voices
        map<string, string> languageToVoiceMap;
        languageToVoiceMap["de"] = "de-DE-KatjaNeural";
        languageToVoiceMap["en"] = "en-US-AriaNeural";
        languageToVoiceMap["it"] = "it-IT-ElsaNeural";
        languageToVoiceMap["pt"] = "pt-BR-FranciscaNeural";
        languageToVoiceMap["zh-Hans"] = "zh-CN-XiaoxiaoNeural";

        cout << "Recognized: \"" << result->Text << "\"" << std::endl;
        for (auto pair : result->Translations)
        {
            auto language = pair.first;
            auto translation = pair.second;
            cout << "Translated into '" << language << "': " << translation << std::endl;

            auto speech_config =
                SpeechConfig::FromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
            speech_config->SetSpeechSynthesisVoiceName(languageToVoiceMap[language]);

            auto audio_config = AudioConfig::FromWavFileOutput(language + "-translation.wav");
            auto synthesizer = SpeechSynthesizer::FromConfig(speech_config, audio_config);

            synthesizer->SpeakTextAsync(translation).get();
        }
    }
}

For more information about speech synthesis, see the basics of speech synthesis.

One of the core features of the Speech service is the ability to recognize human speech and translate it to other languages. In this quickstart you learn how to use the Speech SDK in your apps and products to perform high-quality speech translation. This quickstart covers topics including:

  • Translating speech-to-text
  • Translating speech to multiple target languages
  • Performing direct speech-to-speech translation

Skip to samples on GitHub

If you want to skip straight to sample code, see the Java quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, follow the instructions under the Get the Speech SDK section of the About the Speech SDK article.

Import dependencies

To run the examples in this article, include the following import statements at the top of the *.java code file.

package speech;

import java.io.*;
import java.util.*;
import java.util.concurrent.*;
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.*;
import com.microsoft.cognitiveservices.speech.translation.*;

Sensitive data and environment variables

The example source code in this article depends on environment variables for storing sensitive data, such as the Speech resource subscription key and region. The Java code file contains two static final String values that are assigned from the host machine's environment variables, namely SPEECH__SUBSCRIPTION__KEY and SPEECH__SERVICE__REGION. Both of these fields are at the class scope, making them accessible within method bodies of the class. For more information on environment variables, see environment variables and application configuration.

public class App {

    static final String SPEECH__SUBSCRIPTION__KEY = System.getenv("SPEECH__SUBSCRIPTION__KEY");
    static final String SPEECH__SERVICE__REGION = System.getenv("SPEECH__SERVICE__REGION");

    public static void main(String[] args) { }
}

Create a speech translation configuration

To call the Speech service using the Speech SDK, you need to create a SpeechTranslationConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

Tip

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

There are a few ways that you can initialize a SpeechTranslationConfig:

  • With a subscription: pass in a key and the associated region.
  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.

Let's take a look at how a SpeechTranslationConfig is created using a key and region. Get these credentials by following the steps in Try the Speech service for free.

public class App {

    static final String SPEECH__SUBSCRIPTION__KEY = System.getenv("SPEECH__SUBSCRIPTION__KEY");
    static final String SPEECH__SERVICE__REGION = System.getenv("SPEECH__SERVICE__REGION");

    public static void main(String[] args) {
        try {
            translateSpeech();
            System.exit(0);
        } catch (Exception ex) {
            System.out.println(ex);
            System.exit(1);
        }
    }

    static void translateSpeech() {
        SpeechTranslationConfig config = SpeechTranslationConfig.fromSubscription(
            SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    }
}

Change source language

One common task of speech translation is specifying the input (or source) language. Let's take a look at how you would change the input language to Italian. In your code, interact with the SpeechTranslationConfig instance, calling the setSpeechRecognitionLanguage method.

static void translateSpeech() {
    SpeechTranslationConfig translationConfig = SpeechTranslationConfig.fromSubscription(
        SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    // Source (input) language
    translationConfig.setSpeechRecognitionLanguage("it-IT");
}

The setSpeechRecognitionLanguage function expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.

Add translation language

Another common task of speech translation is to specify target translation languages; at least one is required, but multiples are supported. The following code snippet sets both French and German as translation language targets.

static void translateSpeech() {
    SpeechTranslationConfig translationConfig = SpeechTranslationConfig.fromSubscription(
        SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    translationConfig.setSpeechRecognitionLanguage("it-IT");

    // Translate to languages. See, https://aka.ms/speech/sttt-languages
    translationConfig.addTargetLanguage("fr");
    translationConfig.addTargetLanguage("de");
}

With every call to addTargetLanguage, a new target translation language is specified. In other words, when speech is recognized from the source language, each target translation is available as part of the resulting translation operation.

Initialize a translation recognizer

After you've created a SpeechTranslationConfig, the next step is to initialize a TranslationRecognizer. When you initialize a TranslationRecognizer, you'll need to pass it your translationConfig. The configuration object provides the credentials that the Speech service requires to validate your request.

If you're recognizing speech using your device's default microphone, here's what the TranslationRecognizer should look like:

static void translateSpeech() {
    SpeechTranslationConfig translationConfig = SpeechTranslationConfig.fromSubscription(
        SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    String fromLanguage = "en-US";
    String[] toLanguages = { "it", "fr", "de" };
    translationConfig.setSpeechRecognitionLanguage(fromLanguage);
    for (String language : toLanguages) {
        translationConfig.addTargetLanguage(language);
    }

    try (TranslationRecognizer recognizer = new TranslationRecognizer(translationConfig)) {
    }
}

If you want to specify the audio input device, then you'll need to create an AudioConfig and provide the audioConfig parameter when initializing your TranslationRecognizer.

First, you'll reference the AudioConfig object as follows:

static void translateSpeech() {
    SpeechTranslationConfig translationConfig = SpeechTranslationConfig.fromSubscription(
        SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    String fromLanguage = "en-US";
    String[] toLanguages = { "it", "fr", "de" };
    translationConfig.setSpeechRecognitionLanguage(fromLanguage);
    for (String language : toLanguages) {
        translationConfig.addTargetLanguage(language);
    }

    AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
    try (TranslationRecognizer recognizer = new TranslationRecognizer(translationConfig, audioConfig)) {
        
    }
}

If you want to provide an audio file instead of using a microphone, you'll still need to provide an audioConfig. However, when you create an AudioConfig, instead of calling fromDefaultMicrophoneInput, you'll call fromWavFileInput and pass the filename parameter.

static void translateSpeech() {
    SpeechTranslationConfig translationConfig = SpeechTranslationConfig.fromSubscription(
        SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    String fromLanguage = "en-US";
    String[] toLanguages = { "it", "fr", "de" };
    translationConfig.setSpeechRecognitionLanguage(fromLanguage);
    for (String language : toLanguages) {
        translationConfig.addTargetLanguage(language);
    }

    AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
    try (TranslationRecognizer recognizer = new TranslationRecognizer(translationConfig, audioConfig)) {
        
    }
}

Translate speech

To translate speech, the Speech SDK relies on a microphone or an audio file input. Speech recognition occurs before speech translation. After all objects have been initialized, call the recognize-once function and get the result.

static void translateSpeech() throws ExecutionException, InterruptedException {
    SpeechTranslationConfig translationConfig = SpeechTranslationConfig.fromSubscription(
        SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    String fromLanguage = "en-US";
    String[] toLanguages = { "it", "fr", "de" };
    translationConfig.setSpeechRecognitionLanguage(fromLanguage);
    for (String language : toLanguages) {
        translationConfig.addTargetLanguage(language);
    }

    try (TranslationRecognizer recognizer = new TranslationRecognizer(translationConfig)) {
        System.out.printf("Say something in '%s' and we'll translate...", fromLanguage);

        TranslationRecognitionResult result = recognizer.recognizeOnceAsync().get();
        if (result.getReason() == ResultReason.TranslatedSpeech) {
            System.out.printf("Recognized: \"%s\"\n", result.getText());
            for (Map.Entry<String, String> pair : result.getTranslations().entrySet()) {
                System.out.printf("Translated into '%s': %s\n", pair.getKey(), pair.getValue());
            }
        }
    }
}

For more information about speech-to-text, see the basics of speech recognition.

Synthesize translations

After a successful speech recognition and translation, the result contains all the translations in a dictionary. The getTranslations function returns a dictionary whose keys are the target translation languages and whose values are the translated text. Recognized speech can be translated, then synthesized in a different language (speech-to-speech).

Event-based synthesis

The TranslationRecognizer object exposes a synthesizing event. The event fires several times and provides a mechanism to retrieve the synthesized audio from the translation recognition result. If you're translating to multiple languages, see manual synthesis. Specify the synthesis voice by assigning a setVoiceName, provide an event handler for the synthesizing event, and get the audio. The following example saves the translated audio as a .wav file.

Important

The event-based synthesis only works with a single translation; do not add multiple target translation languages. Additionally, the setVoiceName should be the same language as the target translation language; for example, "de" could map to "de-DE-Hedda".

static void translateSpeech() throws ExecutionException, FileNotFoundException, InterruptedException, IOException {
    SpeechTranslationConfig translationConfig = SpeechTranslationConfig.fromSubscription(
        SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);

    String fromLanguage = "en-US";
    String toLanguage = "de";
    translationConfig.setSpeechRecognitionLanguage(fromLanguage);
    translationConfig.addTargetLanguage(toLanguage);

    // See: https://aka.ms/speech/sdkregion#standard-and-neural-voices
    translationConfig.setVoiceName("de-DE-Hedda");

    try (TranslationRecognizer recognizer = new TranslationRecognizer(translationConfig)) {
        recognizer.synthesizing.addEventListener((s, e) -> {
            byte[] audio = e.getResult().getAudio();
            int size = audio.length;
            System.out.println("Audio synthesized: " + size + " byte(s)" + (size == 0 ? "(COMPLETE)" : ""));

            if (size > 0) {
                try (FileOutputStream file = new FileOutputStream("translation.wav")) {
                    file.write(audio);
                } catch (IOException ex) {
                    ex.printStackTrace();
                }
            }
        });

        System.out.printf("Say something in '%s' and we'll translate...", fromLanguage);

        TranslationRecognitionResult result = recognizer.recognizeOnceAsync().get();
        if (result.getReason() == ResultReason.TranslatedSpeech) {
            System.out.printf("Recognized: \"%s\"\n", result.getText());
            for (Map.Entry<String, String> pair : result.getTranslations().entrySet()) {
                String language = pair.getKey();
                String translation = pair.getValue();
                System.out.printf("Translated into '%s': %s\n", language, translation);
            }
        }
    }
}

Manual synthesis

The getTranslations function returns a dictionary that can be used to synthesize audio from the translation text. Iterate through each translation, and synthesize it. When creating a SpeechSynthesizer instance, the SpeechConfig object needs to have its setSpeechSynthesisVoiceName property set to the desired voice. The following example translates to five languages, and each translation is then synthesized to an audio file in the corresponding neural voice.

static void translateSpeech() throws ExecutionException, InterruptedException {
    SpeechTranslationConfig translationConfig = SpeechTranslationConfig.fromSubscription(
        SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
    
    String fromLanguage = "en-US";
    String[] toLanguages = { "de", "en", "it", "pt", "zh-Hans" };
    translationConfig.setSpeechRecognitionLanguage(fromLanguage);
    for (String language : toLanguages) {
        translationConfig.addTargetLanguage(language);
    }

    try (TranslationRecognizer recognizer = new TranslationRecognizer(translationConfig)) {
        System.out.printf("Say something in '%s' and we'll translate...", fromLanguage);

        TranslationRecognitionResult result = recognizer.recognizeOnceAsync().get();
        if (result.getReason() == ResultReason.TranslatedSpeech) {
            // See: https://aka.ms/speech/sdkregion#standard-and-neural-voices
            Map<String, String> languageToVoiceMap = new HashMap<String, String>();
            languageToVoiceMap.put("de", "de-DE-KatjaNeural");
            languageToVoiceMap.put("en", "en-US-AriaNeural");
            languageToVoiceMap.put("it", "it-IT-ElsaNeural");
            languageToVoiceMap.put("pt", "pt-BR-FranciscaNeural");
            languageToVoiceMap.put("zh-Hans", "zh-CN-XiaoxiaoNeural");

            System.out.printf("Recognized: \"%s\"\n", result.getText());
            for (Map.Entry<String, String> pair : result.getTranslations().entrySet()) {
                String language = pair.getKey();
                String translation = pair.getValue();
                System.out.printf("Translated into '%s': %s\n", language, translation);

                SpeechConfig speechConfig =
                    SpeechConfig.fromSubscription(SPEECH__SUBSCRIPTION__KEY, SPEECH__SERVICE__REGION);
                speechConfig.setSpeechSynthesisVoiceName(languageToVoiceMap.get(language));

                AudioConfig audioConfig = AudioConfig.fromWavFileOutput(language + "-translation.wav");
                try (SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig, audioConfig)) {
                    synthesizer.SpeakTextAsync(translation).get();
                }
            }
        }
    }
}

For more information about speech synthesis, see the basics of speech synthesis.

One of the core features of the Speech service is the ability to recognize human speech and translate it to other languages. In this quickstart you learn how to use the Speech SDK in your apps and products to perform high-quality speech translation. This quickstart covers topics including:

  • Translating speech-to-text
  • Translating speech to multiple target languages
  • Performing direct speech-to-speech translation

Skip to samples on GitHub

If you want to skip straight to sample code, see the JavaScript quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK for JavaScript. Depending on your platform, use the following instructions:

Additionally, depending on the target environment, use one of the following:

Download and extract the Speech SDK for JavaScript microsoft.cognitiveservices.speech.sdk.bundle.js file, and place it in a folder accessible to your HTML file.

<script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>

Tip

If you're targeting a web browser and using the <script> tag, the sdk prefix is not needed. The sdk prefix is an alias used to name the require module.

Create a translation configuration

To call the translation service using the Speech SDK, you need to create a SpeechTranslationConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration. There are a few ways that you can initialize a SpeechTranslationConfig:

  • With a subscription: pass in a key and the associated region.
  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.

Let's take a look at how a SpeechTranslationConfig is created using a key and region. Get these credentials by following the steps in Try the Speech service for free.

const speechTranslationConfig = SpeechTranslationConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

Initialize a translator

After you've created a SpeechTranslationConfig, the next step is to initialize a TranslationRecognizer. When you initialize a TranslationRecognizer, you'll need to pass it your speechTranslationConfig. This provides the credentials that the translation service requires to validate your request.

If you're translating speech provided through your device's default microphone, here's what the TranslationRecognizer should look like:

const recognizer = new TranslationRecognizer(speechTranslationConfig);

If you want to specify the audio input device, then you'll need to create an AudioConfig and provide the audioConfig parameter when initializing your TranslationRecognizer.

Tip

Learn how to get the device ID for your audio input device. Reference the AudioConfig object as follows:

const audioConfig = AudioConfig.fromDefaultMicrophoneInput();
const recognizer = new TranslationRecognizer(speechTranslationConfig, audioConfig);

If you want to provide an audio file instead of using a microphone, you'll still need to provide an audioConfig. However, this can only be done when targeting Node.js. When you create an AudioConfig, instead of calling fromDefaultMicrophoneInput, you'll call fromWavFileInput and pass the filename parameter.

const audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
const recognizer = new TranslationRecognizer(speechTranslationConfig, audioConfig);

Translate speech

The TranslationRecognizer class for the Speech SDK for JavaScript exposes a few methods that you can use for speech translation.

  • Single-shot translation (async) - Performs translation in a non-blocking (asynchronous) mode. This will translate a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.
  • Continuous translation (async) - Asynchronously initiates a continuous translation operation. The user registers to events and handles various application states. To stop asynchronous continuous translation, call stopContinuousRecognitionAsync.

Note

Learn more about how to choose a speech recognition mode.

Specify a target language

To translate, you must specify both a source language and at least one target language. You can choose a source language using a locale listed in the speech translation table. Find your options for translated language at the same link. Your options for target languages differ when you want to view text, or want to hear synthesized translated speech. To translate from English to German, modify the translation config object:

speechTranslationConfig.speechRecognitionLanguage = "en-US";
speechTranslationConfig.addTargetLanguage("de");

Single-shot recognition

Here's an example of asynchronous single-shot translation using recognizeOnceAsync:

recognizer.recognizeOnceAsync(result => {
    // Interact with result
});

You'll need to write some code to handle the result. This sample evaluates the result.reason for a translation to German:

recognizer.recognizeOnceAsync(
  function (result) {
    let translation = result.translations.get("de");
    window.console.log(translation);
    recognizer.close();
  },
  function (err) {
    window.console.log(err);
    recognizer.close();
});

Your code can also handle updates provided while the translation is processing. You can use these updates to provide visual feedback about the translation progress. See this JavaScript Node.js example for sample code that shows updates provided during the translation process. The following code also displays details produced during the translation process.

recognizer.recognizing = function (s, e) {
    var str = ("(recognizing) Reason: " + SpeechSDK.ResultReason[e.result.reason] +
            " Text: " +  e.result.text +
            " Translation:");
    str += e.result.translations.get("de");
    console.log(str);
};
recognizer.recognized = function (s, e) {
    var str = "\r\n(recognized)  Reason: " + SpeechSDK.ResultReason[e.result.reason] +
            " Text: " + e.result.text +
            " Translation:";
    str += e.result.translations.get("de");
    str += "\r\n";
    console.log(str);
};

Continuous translation

Continuous translation is a bit more involved than single-shot recognition. It requires you to subscribe to the recognizing, recognized, and canceled events to get the recognition results. To stop translation, you must call stopContinuousRecognitionAsync. Here's an example of how continuous translation is performed on an audio input file.

Let's start by defining the input and initializing a TranslationRecognizer:

const recognizer = new TranslationRecognizer(speechTranslationConfig);

We'll subscribe to the events sent from the TranslationRecognizer.

  • recognizing: Signal for events containing intermediate translation results.
  • recognized: Signal for events containing final translation results (indicating a successful translation attempt).
  • sessionStopped: Signal for events indicating the end of a translation session (operation).
  • canceled: Signal for events containing canceled translation results (indicating a translation attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).

recognizer.recognizing = (s, e) => {
    console.log(`TRANSLATING: Text=${e.result.text}`);
};
recognizer.recognized = (s, e) => {
    if (e.result.reason == ResultReason.TranslatedSpeech) {
        console.log(`TRANSLATED: Text=${e.result.text}`);
    }
    else if (e.result.reason == ResultReason.NoMatch) {
        console.log("NOMATCH: Speech could not be translated.");
    }
};
recognizer.canceled = (s, e) => {
    console.log(`CANCELED: Reason=${e.reason}`);
    if (e.reason == CancellationReason.Error) {
        console.log(`"CANCELED: ErrorCode=${e.errorCode}`);
        console.log(`"CANCELED: ErrorDetails=${e.errorDetails}`);
        console.log("CANCELED: Did you update the subscription info?");
    }
    recognizer.stopContinuousRecognitionAsync();
};
recognizer.sessionStopped = (s, e) => {
    console.log("\n    Session stopped event.");
    recognizer.stopContinuousRecognitionAsync();
};

With everything set up, we can call startContinuousRecognitionAsync.

// Starts continuous recognition. Uses stopContinuousRecognitionAsync() to stop recognition.
recognizer.startContinuousRecognitionAsync();
// Something later can call this to stop recognition:
// recognizer.stopContinuousRecognitionAsync();

Choose a source language

A common task for speech translation is specifying the input (or source) language. Let's take a look at how you would change the input language to Italian. In your code, find your SpeechTranslationConfig, then add the following line directly below it.

speechTranslationConfig.speechRecognitionLanguage = "it-IT";

The speechRecognitionLanguage property expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.

Choose one or more target languages

The Speech SDK can translate to multiple target languages in parallel. The target languages available are somewhat different from the source language list, and you specify target languages using a language code rather than a locale. See the list of language codes for text targets in the speech translation table on the language support page. You can also find details about translation into synthesized languages there.

The following code adds German as a target language:

speechTranslationConfig.addTargetLanguage("de");

Since multiple target language translations are possible, your code must specify the target language when examining the result. The following code gets translation results for German.

recognizer.recognized = function (s, e) {
    var str = "\r\n(recognized)  Reason: " +
            sdk.ResultReason[e.result.reason] +
            " Text: " + e.result.text + " Translations:";
    var language = "de";
    str += " [" + language + "] " + e.result.translations.get(language);
    str += "\r\n";
    // show str somewhere
};


Skip to samples on GitHub

If you want to skip straight to sample code, see the Python quickstart samples on GitHub.


Import dependencies

To run the examples in this article, include the following import statements at the top of your Python code file.

import os
import azure.cognitiveservices.speech as speechsdk

Sensitive data and environment variables

The example source code in this article depends on environment variables for storing sensitive data, such as the Speech resource subscription key and region. The Python code file contains two values that are assigned from the host machine's environment variables, namely SPEECH__SUBSCRIPTION__KEY and SPEECH__SERVICE__REGION. Both of these variables are at the global scope, making them accessible within the function definitions of the code file. For more information on environment variables, see environment variables and application configuration.

speech_key, service_region = os.environ['SPEECH__SUBSCRIPTION__KEY'], os.environ['SPEECH__SERVICE__REGION']
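A bare `os.environ[...]` lookup like the one above raises a `KeyError` whose message is just the variable name. As a hedged sketch (not part of the quickstart), you could wrap the lookup in a hypothetical helper that fails fast with a clearer hint:

```python
import os

def get_required_env(name: str) -> str:
    """Return the value of an environment variable, raising a RuntimeError
    that names the missing variable instead of a bare KeyError."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f'Environment variable {name} is not set; see the '
            'environment variables and application configuration article.')
    return value
```

For example, `speech_key = get_required_env('SPEECH__SUBSCRIPTION__KEY')` would then produce an actionable error message on a machine where the variable was never exported.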

Create a speech translation configuration

To call the Speech service using the Speech SDK, you need to create a SpeechTranslationConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

Tip

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

There are a few ways that you can initialize a SpeechTranslationConfig:

  • With a subscription: pass in a key and the associated region.
  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.

Let's take a look at how a SpeechTranslationConfig is created using a key and region. Get these credentials by following the steps in Try the Speech service for free.

from_language, to_language = 'en-US', 'de'

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, region=service_region)

Change source language

One common task of speech translation is specifying the input (or source) language. Let's take a look at how you would change the input language to Italian. In your code, interact with the SpeechTranslationConfig instance, assigning to the speech_recognition_language property.

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, region=service_region)

    # Source (input) language
    translation_config.speech_recognition_language = from_language

The speech_recognition_language property expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.
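The SDK does not validate this string locally for you. As an illustrative (hypothetical) helper, a quick regular-expression sanity check can catch a bare language code like "de" being passed where a locale like "it-IT" is expected:

```python
import re

# Hypothetical helper, not part of the Speech SDK: check that a string
# looks like a language-locale code ("it-IT", "en-US") rather than a
# bare language code ("de"), before assigning speech_recognition_language.
_LOCALE_RE = re.compile(r'^[a-z]{2,3}-[A-Za-z]{2,8}$')

def looks_like_locale(value: str) -> bool:
    return bool(_LOCALE_RE.match(value))
```

This is only a format check; whether the locale is actually supported is determined by the list of supported locales/languages on the language support page.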

Add translation language

Another common task of speech translation is to specify target translation languages; at least one is required, but multiple are supported. The following code snippet sets both French and German as translation language targets.

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, region=service_region)

    translation_config.speech_recognition_language = "it-IT"

    # Translate to languages. See https://aka.ms/speech/sttt-languages
    translation_config.add_target_language("fr")
    translation_config.add_target_language("de")

With every call to add_target_language, a new target translation language is specified. In other words, when speech is recognized from the source language, each target translation is available as part of the resulting translation operation.

Initialize a translation recognizer

After you've created a SpeechTranslationConfig, the next step is to initialize a TranslationRecognizer. When you initialize a TranslationRecognizer, you'll need to pass it your translation_config. The configuration object provides the credentials that the Speech service requires to validate your request.

If you're recognizing speech using your device's default microphone, here's what the TranslationRecognizer should look like:

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, region=service_region)

    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)

    recognizer = speechsdk.translation.TranslationRecognizer(
            translation_config=translation_config)

If you want to specify the audio input device, then you'll need to create an AudioConfig and provide the audio_config parameter when initializing your TranslationRecognizer.

First, you'll reference the AudioConfig object as follows:

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, region=service_region)

    translation_config.speech_recognition_language = from_language
    for lang in to_languages:
        translation_config.add_target_language(lang)

    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    recognizer = speechsdk.translation.TranslationRecognizer(
            translation_config=translation_config, audio_config=audio_config)

If you want to provide an audio file instead of using a microphone, you'll still need to provide an audio_config. However, when you create the AudioConfig, instead of calling with use_default_microphone=True, you'll call with filename="path-to-file.wav", providing the filename parameter.

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, region=service_region)

    translation_config.speech_recognition_language = from_language
    for lang in to_languages:
        translation_config.add_target_language(lang)

    audio_config = speechsdk.audio.AudioConfig(filename="path-to-file.wav")
    recognizer = speechsdk.translation.TranslationRecognizer(
            translation_config=translation_config, audio_config=audio_config)

Translate speech

To translate speech, the Speech SDK relies on a microphone or an audio file input. Speech recognition occurs before speech translation. After all objects have been initialized, call the recognize-once function and get the result.

import os
import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = os.environ['SPEECH__SUBSCRIPTION__KEY'], os.environ['SPEECH__SERVICE__REGION']
from_language, to_language = 'en-US', 'de'

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, region=service_region)

    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)

    recognizer = speechsdk.translation.TranslationRecognizer(
            translation_config=translation_config)
    
    print('Say something...')
    result = recognizer.recognize_once()
    print(get_result_text(reason=result.reason, result=result))

def get_result_text(reason, result):
    reason_format = {
        speechsdk.ResultReason.TranslatedSpeech:
            f'RECOGNIZED "{from_language}": {result.text}\n' +
            f'TRANSLATED into "{to_language}": {result.translations[to_language]}',
        speechsdk.ResultReason.RecognizedSpeech: f'Recognized: "{result.text}"',
        speechsdk.ResultReason.NoMatch: f'No speech could be recognized: {result.no_match_details}',
        speechsdk.ResultReason.Canceled: f'Speech Recognition canceled: {result.cancellation_details}'
    }
    return reason_format.get(reason, 'Unable to recognize speech')

translate_speech_to_text()
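The get_result_text helper above maps result reasons to messages with a dictionary and dict.get for the fallback, rather than an if/elif chain. One caveat worth knowing: every value in the dictionary literal is formatted eagerly, even for reasons that did not occur, so the pattern only suits cheap formatting. A minimal standalone sketch of the same dispatch idea, using stand-in names rather than SDK types:

```python
from enum import Enum, auto

class Reason(Enum):  # stand-in for speechsdk.ResultReason
    TRANSLATED = auto()
    NO_MATCH = auto()
    CANCELED = auto()

def describe(reason) -> str:
    """Map a result reason to a display message via dict dispatch."""
    messages = {
        Reason.TRANSLATED: 'Translation succeeded',
        Reason.NO_MATCH: 'No speech could be recognized',
        Reason.CANCELED: 'Speech recognition canceled',
    }
    # dict.get supplies the default branch an if/elif chain would need
    return messages.get(reason, 'Unable to recognize speech')
```

Any reason not listed in the dictionary, including an unexpected value, falls through to the default message.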

For more information about speech-to-text, see the basics of speech recognition.

Synthesize translations

After a successful speech recognition and translation, the result contains all the translations in a dictionary. The translations dictionary key is the target translation language, and the value is the translated text. Recognized speech can be translated, then synthesized in a different language (speech-to-speech).
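In other words, result.translations behaves like a mapping from target language code to translated text. As a sketch using a plain dict in the same shape (the sample values here are illustrative, not real service output), this is how you might format all translations at once:

```python
# Sample data shaped like result.translations: language code -> translated text.
# These values are made up for illustration.
sample_translations = {
    'de': 'Wie ist das Wetter?',
    'fr': 'Quel temps fait-il ?',
}

def format_translations(translations) -> list:
    """Render each (language, text) pair as a tagged display line."""
    return [f'[{lang}] {text}' for lang, text in sorted(translations.items())]
```

Calling `format_translations(sample_translations)` yields one tagged line per target language, in sorted language-code order.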

Event-based synthesis

The TranslationRecognizer object exposes a Synthesizing event. The event fires several times and provides a mechanism to retrieve the synthesized audio from the translation recognition result. If you're translating into multiple languages, see manual synthesis. Specify the synthesis voice by assigning a voice_name, then provide an event handler for the Synthesizing event to get the audio. The following example saves the translated audio as a .wav file.

Important

Event-based synthesis only works with a single translation; do not add multiple target translation languages. Additionally, the voice_name should match the target translation language; for example, "de" could map to "de-DE-Hedda".

import os
import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = os.environ['SPEECH__SUBSCRIPTION__KEY'], os.environ['SPEECH__SERVICE__REGION']
from_language, to_language = 'en-US', 'de'

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, region=service_region)

    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)

    # See: https://aka.ms/speech/sdkregion#standard-and-neural-voices
    translation_config.voice_name = "de-DE-Hedda"

    recognizer = speechsdk.translation.TranslationRecognizer(
            translation_config=translation_config)

    def synthesis_callback(evt):
        size = len(evt.result.audio)
        print(f'Audio synthesized: {size} byte(s) {"(COMPLETED)" if size == 0 else ""}')

        if size > 0:
            with open('translation.wav', 'wb+') as audio_file:
                audio_file.write(evt.result.audio)

    recognizer.synthesizing.connect(synthesis_callback)

    print(f'Say something in "{from_language}" and we\'ll translate into "{to_language}".')

    result = recognizer.recognize_once()
    print(get_result_text(reason=result.reason, result=result))

def get_result_text(reason, result):
    reason_format = {
        speechsdk.ResultReason.TranslatedSpeech:
            f'Recognized "{from_language}": {result.text}\n' +
            f'Translated into "{to_language}": {result.translations[to_language]}',
        speechsdk.ResultReason.RecognizedSpeech: f'Recognized: "{result.text}"',
        speechsdk.ResultReason.NoMatch: f'No speech could be recognized: {result.no_match_details}',
        speechsdk.ResultReason.Canceled: f'Speech Recognition canceled: {result.cancellation_details}'
    }
    return reason_format.get(reason, 'Unable to recognize speech')

translate_speech_to_text()

Manual synthesis

The translations dictionary can be used to synthesize audio from the translation text. Iterate through each translation, and synthesize it. When creating a SpeechSynthesizer instance, the SpeechConfig object needs to have its speech_synthesis_voice_name property set to the desired voice. The following example translates to five languages, and each translation is then synthesized to an audio file in the corresponding neural voice.

import os
import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = os.environ['SPEECH__SUBSCRIPTION__KEY'], os.environ['SPEECH__SERVICE__REGION']
from_language, to_languages = 'en-US', [ 'de', 'en', 'it', 'pt', 'zh-Hans' ]

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key, region=service_region)

    translation_config.speech_recognition_language = from_language
    for lang in to_languages:
        translation_config.add_target_language(lang)

    recognizer = speechsdk.translation.TranslationRecognizer(
            translation_config=translation_config)
    
    print('Say something...')
    result = recognizer.recognize_once()
    synthesize_translations(result=result)

def synthesize_translations(result):
    language_to_voice_map = {
        "de": "de-DE-KatjaNeural",
        "en": "en-US-AriaNeural",
        "it": "it-IT-ElsaNeural",
        "pt": "pt-BR-FranciscaNeural",
        "zh-Hans": "zh-CN-XiaoxiaoNeural"
    }
    print(f'Recognized: "{result.text}"')

    for language in result.translations:
        translation = result.translations[language]
        print(f'Translated into "{language}": {translation}')

        speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
        speech_config.speech_synthesis_voice_name = language_to_voice_map.get(language)
        
        audio_config = speechsdk.audio.AudioOutputConfig(filename=f'{language}-translation.wav')
        speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
        speech_synthesizer.speak_text_async(translation).get()

translate_speech_to_text()

For more information about speech synthesis, see the basics of speech synthesis.

One of the core features of the Speech service is the ability to recognize human speech and translate it to other languages. This quickstart translates speech from the microphone into text in another language.


Download and install

Note

On Windows, you need the Microsoft Visual C++ Redistributable for Visual Studio 2019 for your platform. Installing this for the first time may require you to restart Windows.

Follow these steps to install the Speech CLI on Windows:

  1. Download the Speech CLI zip archive, then extract it.
  2. Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

In the command prompt, change directory to this location, and then type spx to see help for the Speech CLI.

Note

On Windows, the Speech CLI can only show fonts available to the command prompt on the local computer. Windows Terminal supports all fonts produced interactively by the Speech CLI. If you output to a file, a text editor like Notepad or a web browser like Microsoft Edge can also show all fonts.

Note

PowerShell does not check the local directory when looking for a command. In PowerShell, change directory to the location of spx and call the tool by entering .\spx. If you add this directory to your path, PowerShell and the Windows command prompt will find spx from any directory without including the .\ prefix.

Create subscription config

To start using the Speech CLI, you need to enter your Speech subscription key and region identifier. Get these credentials by following the steps in Try the Speech service for free. Once you have your subscription key and region identifier (for example, eastus or westus), run the following commands.

spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION

Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.

Set source and target language

This command calls the Speech CLI to translate speech from the microphone from Italian to French:

 spx translate --microphone --source it-IT --target fr

Next steps