Get started with speech-to-text

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the C# quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

If you just want the package name to get started, run Install-Package Microsoft.CognitiveServices.Speech in the NuGet console.

For platform-specific installation instructions, see the following links:

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. Create a SpeechConfig by using your key and region. See the Find keys and region page to find your key-region pair.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program 
{
    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
    }
}

There are a few other ways that you can initialize a SpeechConfig, as shown in the sketch after this list:

  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.
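
As a hedged illustration, here is a minimal sketch of each alternative. The factory methods are part of SpeechConfig, but the endpoint and host URIs and the token value below are placeholders, not values from this article:

// A minimal sketch of the alternative initializers. The URIs and token are
// placeholder values; replace them with your own.
var configFromEndpoint = SpeechConfig.FromEndpoint(
    new Uri("wss://<paste-your-region>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"),
    "<paste-your-subscription-key>");

var configFromHost = SpeechConfig.FromHost(new Uri("wss://<paste-your-region>.stt.speech.microsoft.com"));

var configFromToken = SpeechConfig.FromAuthorizationToken("<paste-your-authorization-token>", "<paste-your-region>");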

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

Recognize from microphone

To recognize speech using your device microphone, create an AudioConfig using FromDefaultMicrophoneInput(). Then initialize a SpeechRecognizer, passing your audioConfig and speechConfig.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program 
{
    async static Task FromMic(SpeechConfig speechConfig)
    {
        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        Console.WriteLine("Speak into your microphone.");
        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        await FromMic(speechConfig);
    }
}

如果你想使用特定的音频输入设备,则需要在 If you want to use a specific audio input device, you need to specify the device ID in the AudioConfig. Learn how to get the device ID for your audio input device.
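
As a minimal sketch, assuming you already have a device ID string ("<your-device-id>" is a placeholder):

// Pass a specific device ID instead of using the default microphone.
// "<your-device-id>" is a placeholder value.
using var audioConfig = AudioConfig.FromMicrophoneInput("<your-device-id>");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);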

Recognize from file

If you want to recognize speech from an audio file instead of a microphone, you still need to create an AudioConfig. However, when you create the AudioConfig, instead of calling FromDefaultMicrophoneInput(), you call FromWavFileInput() and pass the file path.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program 
{
    async static Task FromFile(SpeechConfig speechConfig)
    {
        using var audioConfig = AudioConfig.FromWavFileInput("PathToFile.wav");
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        await FromFile(speechConfig);
    }
}

Recognize from in-memory stream

For many use-cases, it is likely your audio data will be coming from blob storage, or otherwise already be in-memory as a byte[] or similar raw data structure. The following example uses a PushAudioInputStream, which is essentially an abstracted memory stream, to recognize speech. The sample code does the following:

  • Writes raw audio data (PCM) to the PushAudioInputStream using the Write() function, which accepts a byte[].
  • Reads a .wav file using a BinaryReader for demonstration purposes; if you already have audio data in a byte[], you can skip directly to writing the content to the input stream.
  • The default format is 16-bit, 16 kHz mono PCM. To customize the format, pass an AudioStreamFormat object to CreatePushStream() using the static function AudioStreamFormat.GetWaveFormatPCM(sampleRate, (byte)bitRate, (byte)channels).
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program 
{
    async static Task FromStream(SpeechConfig speechConfig)
    {
        var reader = new BinaryReader(File.OpenRead("PathToFile.wav"));
        using var audioInputStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(audioInputStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        byte[] readBytes;
        do
        {
            readBytes = reader.ReadBytes(1024);
            audioInputStream.Write(readBytes, readBytes.Length);
        } while (readBytes.Length > 0);

        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        await FromStream(speechConfig);
    }
}

Using a push stream as input assumes that the audio data is raw PCM with no headers. The API will still work in certain cases if the header has not been skipped, but for the best results consider implementing logic to read off the headers, so that the byte[] starts at the start of the audio data.
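
For example, a minimal sketch that skips a canonical 44-byte WAV header before pushing the remaining bytes. This assumes a fixed-size header; a production implementation should parse the RIFF chunks instead:

// Assumes a canonical 44-byte WAV header; real files can vary.
using var reader = new BinaryReader(File.OpenRead("PathToFile.wav"));
reader.ReadBytes(44); // discard the header so the stream starts at the PCM data

byte[] readBytes;
do
{
    readBytes = reader.ReadBytes(1024);
    audioInputStream.Write(readBytes, readBytes.Length);
} while (readBytes.Length > 0);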

Error handling

The previous examples simply get the recognized text from result.Text, but to handle errors and other responses, you'll need to write some code to handle the result. The following code evaluates the result.Reason property and:

  • Prints the recognition result: ResultReason.RecognizedSpeech
  • If there is no recognition match, inform the user: ResultReason.NoMatch
  • If an error is encountered, print the error message: ResultReason.Canceled
switch (result.Reason)
{
    case ResultReason.RecognizedSpeech:
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
        break;
    case ResultReason.NoMatch:
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled:
        var cancellation = CancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
        break;
}

Continuous recognition

The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.

In contrast, continuous recognition is used when you want to control when to stop recognizing. It requires you to subscribe to the Recognizing, Recognized, and Canceled events to get the recognition results. To stop recognition, you must call StopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Start by defining the input and initializing a SpeechRecognizer:

using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

Then create a TaskCompletionSource<int> to manage the state of speech recognition.

var stopRecognition = new TaskCompletionSource<int>();

Next, subscribe to the events sent from the SpeechRecognizer.

  • Recognizing: Signal for events containing intermediate recognition results.
  • Recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • SessionStopped: Signal for events indicating the end of a recognition session (operation).
  • Canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
recognizer.Recognizing += (s, e) =>
{
    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
};

recognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
    }
    else if (e.Result.Reason == ResultReason.NoMatch)
    {
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
    }
};

recognizer.Canceled += (s, e) =>
{
    Console.WriteLine($"CANCELED: Reason={e.Reason}");

    if (e.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
        Console.WriteLine($"CANCELED: Did you update the subscription info?");
    }

    stopRecognition.TrySetResult(0);
};

recognizer.SessionStopped += (s, e) =>
{
    Console.WriteLine("\n    Session stopped event.");
    stopRecognition.TrySetResult(0);
};

With everything set up, call StartContinuousRecognitionAsync to start recognizing.

await recognizer.StartContinuousRecognitionAsync();

// Waits for completion. Use Task.WaitAny to keep the task rooted.
Task.WaitAny(new[] { stopRecognition.Task });

// make the following call at some point to stop recognition.
// await recognizer.StopContinuousRecognitionAsync();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the EnableDictation method on your SpeechConfig.

speechConfig.EnableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to Italian. In your code, find your SpeechConfig, then add this line directly below it.

speechConfig.SpeechRecognitionLanguage = "it-IT";

The SpeechRecognitionLanguage property expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with AddPhrase.

Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

var phraseList = PhraseListGrammar.FromRecognizer(recognizer);
phraseList.AddPhrase("Supercalifragilisticexpialidocious");

If you need to clear your phrase list:

phraseList.Clear();

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the C++ quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. Create a SpeechConfig by using your key and region. See the Find keys and region page to find your key-region pair.

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;

auto config = SpeechConfig::FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");

There are a few other ways that you can initialize a SpeechConfig, as shown in the sketch after this list:

  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.
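
As a hedged illustration, a minimal sketch of each alternative; the factory methods are part of SpeechConfig, but the URIs and token below are placeholders:

// A minimal sketch of the alternative initializers. The URIs and token are
// placeholder values; replace them with your own.
auto configFromEndpoint = SpeechConfig::FromEndpoint("wss://<paste-your-region>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1", "<paste-your-subscription-key>");

auto configFromHost = SpeechConfig::FromHost("wss://<paste-your-region>.stt.speech.microsoft.com");

auto configFromToken = SpeechConfig::FromAuthorizationToken("<paste-your-authorization-token>", "<paste-your-region>");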

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

Recognize from microphone

To recognize speech using your device microphone, create an AudioConfig using FromDefaultMicrophoneInput(). Then initialize a SpeechRecognizer, passing your audioConfig and config.

using namespace Microsoft::CognitiveServices::Speech::Audio;

auto audioConfig = AudioConfig::FromDefaultMicrophoneInput();
auto recognizer = SpeechRecognizer::FromConfig(config, audioConfig);

cout << "Speak into your microphone." << std::endl;
auto result = recognizer->RecognizeOnceAsync().get();
cout << "RECOGNIZED: Text=" << result->Text << std::endl;

If you want to use a specific audio input device, you need to specify the device ID in the AudioConfig. Learn how to get the device ID for your audio input device.
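
A minimal sketch, assuming you already have a device ID string ("<your-device-id>" is a placeholder):

// "<your-device-id>" is a placeholder for a real device ID.
auto audioConfig = AudioConfig::FromMicrophoneInput("<your-device-id>");
auto recognizer = SpeechRecognizer::FromConfig(config, audioConfig);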

Recognize from file

If you want to recognize speech from an audio file instead of using a microphone, you still need to create an AudioConfig. However, when you create the AudioConfig, instead of calling FromDefaultMicrophoneInput(), you call FromWavFileInput() and pass the file path.

using namespace Microsoft::CognitiveServices::Speech::Audio;

auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);

auto result = recognizer->RecognizeOnceAsync().get();
cout << "RECOGNIZED: Text=" << result->Text << std::endl;

Recognize speech

The Recognizer class for the Speech SDK for C++ exposes a few methods that you can use for speech recognition.

Single-shot recognition

Single-shot recognition asynchronously recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed. Here's an example of asynchronous single-shot recognition using RecognizeOnceAsync:

auto result = recognizer->RecognizeOnceAsync().get();

You'll need to write some code to handle the result. This sample evaluates the result->Reason:

  • Prints the recognition result: ResultReason::RecognizedSpeech
  • If there is no recognition match, inform the user: ResultReason::NoMatch
  • If an error is encountered, print the error message: ResultReason::Canceled
switch (result->Reason)
{
    case ResultReason::RecognizedSpeech:
        cout << "We recognized: " << result->Text << std::endl;
        break;
    case ResultReason::NoMatch:
        cout << "NOMATCH: Speech could not be recognized." << std::endl;
        break;
    case ResultReason::Canceled:
        {
            auto cancellation = CancellationDetails::FromResult(result);
            cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
            if (cancellation->Reason == CancellationReason::Error) {
                cout << "CANCELED: ErrorCode= " << (int)cancellation->ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                cout << "CANCELED: Did you update the subscription info?" << std::endl;
            }
        }
        break;
    default:
        break;
}

Continuous recognition

Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the Recognizing, Recognized, and Canceled events to get the recognition results. To stop recognition, you must call StopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Let's start by defining the input and initializing a SpeechRecognizer:

auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);

Next, let's create a variable to manage the state of speech recognition. To start, we'll declare a promise<void>, since at the start of recognition we can safely assume that it's not finished.

promise<void> recognitionEnd;

We'll subscribe to the events sent from the SpeechRecognizer.

  • Recognizing: Signal for events containing intermediate recognition results.
  • Recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • SessionStopped: Signal for events indicating the end of a recognition session (operation).
  • Canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
recognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
    {
        cout << "Recognizing:" << e.Result->Text << std::endl;
    });

recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
    {
        if (e.Result->Reason == ResultReason::RecognizedSpeech)
        {
            cout << "RECOGNIZED: Text=" << e.Result->Text << std::endl;
        }
        else if (e.Result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Speech could not be recognized." << std::endl;
        }
    });

recognizer->Canceled.Connect([&recognitionEnd](const SpeechRecognitionCanceledEventArgs& e)
    {
        cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
        if (e.Reason == CancellationReason::Error)
        {
            cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << "\n"
                 << "CANCELED: ErrorDetails=" << e.ErrorDetails << "\n"
                 << "CANCELED: Did you update the subscription info?" << std::endl;

            recognitionEnd.set_value(); // Notify to stop recognition.
        }
    });

recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
    {
        cout << "Session stopped.";
        recognitionEnd.set_value(); // Notify to stop recognition.
    });

With everything set up, we can call StartContinuousRecognitionAsync to start recognizing.

// Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
recognizer->StartContinuousRecognitionAsync().get();

// Waits for recognition end.
recognitionEnd.get_future().get();

// Stops recognition.
recognizer->StopContinuousRecognitionAsync().get();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the EnableDictation method on your SpeechConfig.

config->EnableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to German. In your code, find your SpeechConfig, then add this line directly below it.

config->SetSpeechRecognitionLanguage("de-DE");

SetSpeechRecognitionLanguage takes a string as an argument. You can provide any value in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with AddPhrase.

Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

auto phraseListGrammar = PhraseListGrammar::FromRecognizer(recognizer);
phraseListGrammar->AddPhrase("Supercalifragilisticexpialidocious");

If you need to clear your phrase list:

phraseListGrammar->Clear();

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the Go quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK for Go.

Speech-to-text from microphone

Use the following code sample to run speech recognition from your default device microphone. Replace the variables subscription and region with your subscription key and region. Running the script will start a recognition session on your default microphone and output text.

import (
    "bufio"
    "fmt"
    "os"

    "github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func sessionStartedHandler(event speech.SessionEventArgs) {
    defer event.Close()
    fmt.Println("Session Started (ID=", event.SessionID, ")")
}

func sessionStoppedHandler(event speech.SessionEventArgs) {
    defer event.Close()
    fmt.Println("Session Stopped (ID=", event.SessionID, ")")
}

func recognizingHandler(event speech.SpeechRecognitionEventArgs) {
    defer event.Close()
    fmt.Println("Recognizing:", event.Result.Text)
}

func recognizedHandler(event speech.SpeechRecognitionEventArgs) {
    defer event.Close()
    fmt.Println("Recognized:", event.Result.Text)
}

func cancelledHandler(event speech.SpeechRecognitionCanceledEventArgs) {
    defer event.Close()
    fmt.Println("Received a cancellation: ", event.ErrorDetails)
}

func main() {
    subscription :=  "YOUR_SUBSCRIPTION_KEY"
    region := "YOUR_SUBSCRIPTIONKEY_REGION"

    audioConfig, err := audio.NewAudioConfigFromDefaultMicrophoneInput()
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer audioConfig.Close()
    config, err := speech.NewSpeechConfigFromSubscription(subscription, region)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer config.Close()
    speechRecognizer, err := speech.NewSpeechRecognizerFromConfig(config, audioConfig)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer speechRecognizer.Close()
    speechRecognizer.SessionStarted(sessionStartedHandler)
    speechRecognizer.SessionStopped(sessionStoppedHandler)
    speechRecognizer.Recognizing(recognizingHandler)
    speechRecognizer.Recognized(recognizedHandler)
    speechRecognizer.Canceled(cancelledHandler)
    speechRecognizer.StartContinuousRecognitionAsync()
    defer speechRecognizer.StopContinuousRecognitionAsync()
    bufio.NewReader(os.Stdin).ReadBytes('\n')
}

See the reference docs for detailed information on the SpeechConfig and SpeechRecognizer classes.

Speech-to-text from audio file

Use the following sample to run speech recognition from an audio file. Replace the variables subscription and region with your subscription key and region. Additionally, replace the variable file with a path to a .wav file. Running the script will recognize speech from the file and output the text result.

import (
    "fmt"
    "time"

    "github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func main() {
    subscription :=  "YOUR_SUBSCRIPTION_KEY"
    region := "YOUR_SUBSCRIPTIONKEY_REGION"
    file := "path/to/file.wav"

    audioConfig, err := audio.NewAudioConfigFromWavFileInput(file)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer audioConfig.Close()
    config, err := speech.NewSpeechConfigFromSubscription(subscription, region)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer config.Close()
    speechRecognizer, err := speech.NewSpeechRecognizerFromConfig(config, audioConfig)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer speechRecognizer.Close()
    speechRecognizer.SessionStarted(func(event speech.SessionEventArgs) {
        defer event.Close()
        fmt.Println("Session Started (ID=", event.SessionID, ")")
    })
    speechRecognizer.SessionStopped(func(event speech.SessionEventArgs) {
        defer event.Close()
        fmt.Println("Session Stopped (ID=", event.SessionID, ")")
    })

    task := speechRecognizer.RecognizeOnceAsync()
    var outcome speech.SpeechRecognitionOutcome
    select {
    case outcome = <-task:
    case <-time.After(5 * time.Second):
        fmt.Println("Timed out")
        return
    }
    defer outcome.Close()
    if outcome.Error != nil {
        fmt.Println("Got an error: ", outcome.Error)
    }
    fmt.Println("Got a recognition!")
    fmt.Println(outcome.Result.Text)
}

See the reference docs for detailed information on the SpeechConfig and SpeechRecognizer classes.

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the Java quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. Create a SpeechConfig by using your key and region. See the Find keys and region page to find your key-region pair.

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class Program {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
    }
}

There are a few other ways that you can initialize a SpeechConfig, as shown in the sketch after this list:

  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.
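
As a hedged illustration, a minimal sketch of each alternative; the factory methods are part of SpeechConfig, but the URIs and token below are placeholders:

// A minimal sketch of the alternative initializers. The URIs and token are
// placeholder values; replace them with your own.
SpeechConfig configFromEndpoint = SpeechConfig.fromEndpoint(
    java.net.URI.create("wss://<paste-your-region>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"),
    "<paste-your-subscription-key>");

SpeechConfig configFromHost = SpeechConfig.fromHost(
    java.net.URI.create("wss://<paste-your-region>.stt.speech.microsoft.com"));

SpeechConfig configFromToken = SpeechConfig.fromAuthorizationToken(
    "<paste-your-authorization-token>", "<paste-your-region>");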

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

Recognize from microphone

To recognize speech using your device microphone, create an AudioConfig using fromDefaultMicrophoneInput(). Then initialize a SpeechRecognizer, passing your audioConfig and config.

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class Program {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        fromMic(speechConfig);
    }

    public static void fromMic(SpeechConfig speechConfig) throws InterruptedException, ExecutionException {
        AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
        SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        System.out.println("Speak into your microphone.");
        Future<SpeechRecognitionResult> task = recognizer.recognizeOnceAsync();
        SpeechRecognitionResult result = task.get();
        System.out.println("RECOGNIZED: Text=" + result.getText());
    }
}

If you want to use a specific audio input device, you need to specify the device ID in the AudioConfig. Learn how to get the device ID for your audio input device.
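
A minimal sketch, assuming you already have a device ID string ("<your-device-id>" is a placeholder):

// "<your-device-id>" is a placeholder for a real device ID.
AudioConfig audioConfig = AudioConfig.fromMicrophoneInput("<your-device-id>");
SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig);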

Recognize from file

If you want to recognize speech from an audio file instead of using a microphone, you still need to create an AudioConfig. However, when you create the AudioConfig, instead of calling fromDefaultMicrophoneInput(), call fromWavFileInput() and pass the file path.

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class Program {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        fromFile(speechConfig);
    }

    public static void fromFile(SpeechConfig speechConfig) throws InterruptedException, ExecutionException {
        AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
        SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig);
        
        Future<SpeechRecognitionResult> task = recognizer.recognizeOnceAsync();
        SpeechRecognitionResult result = task.get();
        System.out.println("RECOGNIZED: Text=" + result.getText());
    }
}

Error handling

The previous examples simply get the recognized text using result.getText(), but to handle errors and other responses, you'll need to write some code to handle the result. The following example evaluates result.getReason() and:

  • Prints the recognition result: ResultReason.RecognizedSpeech
  • If there is no recognition match, inform the user: ResultReason.NoMatch
  • If an error is encountered, print the error message: ResultReason.Canceled
switch (result.getReason()) {
    case ResultReason.RecognizedSpeech:
        System.out.println("We recognized: " + result.getText());
        exitCode = 0;
        break;
    case ResultReason.NoMatch:
        System.out.println("NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled: {
            CancellationDetails cancellation = CancellationDetails.fromResult(result);
            System.out.println("CANCELED: Reason=" + cancellation.getReason());

            if (cancellation.getReason() == CancellationReason.Error) {
                System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                System.out.println("CANCELED: Did you update the subscription info?");
            }
        }
        break;
}

Continuous recognition

The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.

In contrast, continuous recognition is used when you want to control when to stop recognizing. It requires you to subscribe to the recognizing, recognized, and canceled events to get the recognition results. To stop recognition, you must call stopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Let's start by defining the input and initializing a SpeechRecognizer:

AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
SpeechRecognizer recognizer = new SpeechRecognizer(config, audioConfig);

Next, let's create a variable to manage the state of speech recognition. To start, we'll declare a Semaphore at the class scope.

private static Semaphore stopTranslationWithFileSemaphore;

We'll subscribe to the events sent from the SpeechRecognizer.

  • recognizing: Signal for events containing intermediate recognition results.
  • recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • sessionStopped: Signal for events indicating the end of a recognition session (operation).
  • canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
// First initialize the semaphore.
stopTranslationWithFileSemaphore = new Semaphore(0);

recognizer.recognizing.addEventListener((s, e) -> {
    System.out.println("RECOGNIZING: Text=" + e.getResult().getText());
});

recognizer.recognized.addEventListener((s, e) -> {
    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
        System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
    }
    else if (e.getResult().getReason() == ResultReason.NoMatch) {
        System.out.println("NOMATCH: Speech could not be recognized.");
    }
});

recognizer.canceled.addEventListener((s, e) -> {
    System.out.println("CANCELED: Reason=" + e.getReason());

    if (e.getReason() == CancellationReason.Error) {
        System.out.println("CANCELED: ErrorCode=" + e.getErrorCode());
        System.out.println("CANCELED: ErrorDetails=" + e.getErrorDetails());
        System.out.println("CANCELED: Did you update the subscription info?");
    }

    stopTranslationWithFileSemaphore.release();
});

recognizer.sessionStopped.addEventListener((s, e) -> {
    System.out.println("\n    Session stopped event.");
    stopTranslationWithFileSemaphore.release();
});

With everything set up, we can call startContinuousRecognitionAsync.

// Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
recognizer.startContinuousRecognitionAsync().get();

// Waits for completion.
stopTranslationWithFileSemaphore.acquire();

// Stops recognition.
recognizer.stopContinuousRecognitionAsync().get();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the enableDictation method on your SpeechConfig.

config.enableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to French. In your code, find your SpeechConfig, then add this line directly below it.

config.setSpeechRecognitionLanguage("fr-FR");

setSpeechRecognitionLanguage takes a string as an argument. You can provide any value in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with addPhrase.

Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

PhraseListGrammar phraseList = PhraseListGrammar.fromRecognizer(recognizer);
phraseList.addPhrase("Supercalifragilisticexpialidocious");

If you need to clear your phrase list:

phraseList.clear();

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the JavaScript quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK for JavaScript. Depending on your platform, use the following instructions:

Additionally, depending on the target environment, use one of the following:

Download and extract the Speech SDK for JavaScript microsoft.cognitiveservices.speech.sdk.bundle.js file, and place it in a folder accessible to your HTML file.

<script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>

Tip

If you're targeting a web browser and using the <script> tag, the sdk prefix is not needed when referencing classes. The sdk prefix is an alias used to name the require module.
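
As a hedged illustration of the two styles (assuming the browser bundle exposes its classes on a global SpeechSDK object):

// Node.js or a bundler: reference classes through the imported sdk alias.
const sdk = require("microsoft-cognitiveservices-speech-sdk");
const nodeConfig = sdk.SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");

// Browser via the <script> tag: no sdk prefix; the bundle exposes a global
// object (assumed here to be SpeechSDK) that the classes hang off.
// const browserConfig = SpeechSDK.SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");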

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. Create a SpeechConfig using your key and region. See the Find keys and region page to find your key-region pair.

const speechConfig = sdk.SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");

There are a few other ways that you can initialize a SpeechConfig, as shown in the sketch after this list:

  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.
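
As a hedged illustration, a minimal sketch of each alternative; the factory methods are part of SpeechConfig, but the URLs and token below are placeholders:

// A minimal sketch of the alternative initializers. The URLs and token are
// placeholder values; replace them with your own.
const configFromEndpoint = sdk.SpeechConfig.fromEndpoint(
    new URL("wss://<paste-your-region>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"),
    "<paste-your-subscription-key>");

const configFromHost = sdk.SpeechConfig.fromHost(
    new URL("wss://<paste-your-region>.stt.speech.microsoft.com"));

const configFromToken = sdk.SpeechConfig.fromAuthorizationToken(
    "<paste-your-authorization-token>", "<paste-your-region>");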

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

Recognize from microphone (Browser only)

To recognize speech using your device microphone, create an AudioConfig using fromDefaultMicrophoneInput(). Then initialize a SpeechRecognizer, passing your speechConfig and audioConfig.

const sdk = require("microsoft-cognitiveservices-speech-sdk");
const speechConfig = sdk.SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");

function fromMic() {
    let audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
    let recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    
    console.log('Speak into your microphone.');
    recognizer.recognizeOnceAsync(result => {
        console.log(`RECOGNIZED: Text=${result.text}`);
    });
}
fromMic();

If you want to use a specific audio input device, you need to specify the device ID in the AudioConfig. Learn how to get the device ID for your audio input device.
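
A minimal sketch, assuming you already have a device ID string ("<your-device-id>" is a placeholder):

// "<your-device-id>" is a placeholder for a real device ID.
const audioConfig = sdk.AudioConfig.fromMicrophoneInput("<your-device-id>");
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);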

Recognize from file

To recognize speech from an audio file in a browser-based JavaScript environment, you use the fromWavFileInput() function to create an AudioConfig. The function fromWavFileInput() expects a JavaScript File object as a parameter.

const sdk = require("microsoft-cognitiveservices-speech-sdk");
const speechConfig = sdk.SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");

function fromFile() {
    // wavByteContent should be a byte array of the raw wav content
    let file = new File([wavByteContent]);
    let audioConfig = sdk.AudioConfig.fromWavFileInput(file);
    let recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    
    recognizer.recognizeOnceAsync(result => {
        console.log(`RECOGNIZED: Text=${result.text}`);
    });
}
fromFile();

Error handling

The previous examples simply get the recognized text from result.text, but to handle errors and other responses, you'll need to write some code to handle the result. The following code evaluates the result.reason property and:

  • Prints the recognition result: ResultReason.RecognizedSpeech
  • If there is no recognition match, inform the user: ResultReason.NoMatch
  • If an error is encountered, print the error message: ResultReason.Canceled
switch (result.reason) {
    case ResultReason.RecognizedSpeech:
        console.log(`RECOGNIZED: Text=${result.text}`);
        break;
    case ResultReason.NoMatch:
        console.log("NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled:
        const cancellation = CancellationDetails.fromResult(result);
        console.log(`CANCELED: Reason=${cancellation.reason}`);

        if (cancellation.reason == CancellationReason.Error) {
            console.log(`CANCELED: ErrorCode=${cancellation.ErrorCode}`);
            console.log(`CANCELED: ErrorDetails=${cancellation.errorDetails}`);
            console.log("CANCELED: Did you update the subscription info?");
        }
        break;
    }

Continuous recognition

The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.

In contrast, continuous recognition is used when you want to control when to stop recognizing. It requires you to subscribe to the recognizing, recognized, and canceled events to get the recognition results. To stop recognition, you must call stopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Start by defining the input and initializing a SpeechRecognizer:

const recognizer = new sdk.SpeechRecognizer(speechConfig);

Next, subscribe to the events sent from the SpeechRecognizer.

  • recognizing: Signal for events containing intermediate recognition results.
  • recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • sessionStopped: Signal for events indicating the end of a recognition session (operation).
  • canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
recognizer.recognizing = (s, e) => {
    console.log(`RECOGNIZING: Text=${e.result.text}`);
};

recognizer.recognized = (s, e) => {
    if (e.result.reason == ResultReason.RecognizedSpeech) {
        console.log(`RECOGNIZED: Text=${e.result.text}`);
    }
    else if (e.result.reason == ResultReason.NoMatch) {
        console.log("NOMATCH: Speech could not be recognized.");
    }
};

recognizer.canceled = (s, e) => {
    console.log(`CANCELED: Reason=${e.reason}`);

    if (e.reason == CancellationReason.Error) {
        console.log(`CANCELED: ErrorCode=${e.errorCode}`);
        console.log(`CANCELED: ErrorDetails=${e.errorDetails}`);
        console.log("CANCELED: Did you update the subscription info?");
    }

    recognizer.stopContinuousRecognitionAsync();
};

recognizer.sessionStopped = (s, e) => {
    console.log("\n    Session stopped event.");
    recognizer.stopContinuousRecognitionAsync();
};

With everything set up, call startContinuousRecognitionAsync to start recognizing.

recognizer.startContinuousRecognitionAsync();

// make the following call at some point to stop recognition.
// recognizer.stopContinuousRecognitionAsync();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the enableDictation method on your SpeechConfig.

speechConfig.enableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to Italian. In your code, find your SpeechConfig, then add this line directly below it.

speechConfig.speechRecognitionLanguage = "it-IT";

The speechRecognitionLanguage property expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.

提高识别准确度Improve recognition accuracy

可以通过多种方式使用语音 SDK 来提高识别准确度。There are a few ways to improve recognition accuracy with the Speech SDK. 让我们看一下短语列表。Let's take a look at Phrase Lists. 短语列表用于标识音频数据中的已知短语,如人的姓名或特定位置。Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. 可以将单个词或完整短语添加到短语列表。Single words or complete phrases can be added to a Phrase List. 在识别期间,如果音频中包含整个短语的完全匹配项,则使用短语列表中的条目。During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. 如果找不到与短语完全匹配的项,则不会对识别提供帮助。If an exact match to the phrase is not found, recognition is not assisted.

重要

短语列表功能仅以英语提供。The Phrase List feature is only available in English.

若要使用短语列表,请首先创建一个 PhraseListGrammar 对象,然后使用 addPhrase 添加特定的单词和短语。To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with addPhrase.

PhraseListGrammar 所做的任何更改都将在下一次识别或重新连接到语音服务之后生效。Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

const phraseList = sdk.PhraseListGrammar.fromRecognizer(recognizer);
phraseList.addPhrase("Supercalifragilisticexpialidocious");

如果需要清除短语列表:If you need to clear your phrase list:

phraseList.clear();

提高识别精确度的其他方式Other options to improve recognition accuracy

短语列表只是提高识别准确度的一种方式。Phrase lists are only one option to improve recognition accuracy. 也可执行以下操作:You can also:

可以使用适用于 Swift 和 Objective-C 的语音 SDK 将语音转录为文本。You can transcribe speech into text using the Speech SDK for Swift and Objective-C.

先决条件Prerequisites

以下示例假定你有 Azure 帐户和语音服务订阅。The following samples assume that you have an Azure account and Speech service subscription. 如果你没有帐户和订阅,可以免费试用语音服务If you don't have an account and subscription, try the Speech service for free.

安装语音 SDK 和示例Install Speech SDK and samples

认知服务语音 SDK 包含适用于 iOS 和 Mac 的、以 Swift 和 Objective-C 编写的示例。The Cognitive Services Speech SDK contains samples written in Swift and Objective-C for iOS and Mac. 单击链接可查看每个示例的安装说明:Click a link to see installation instructions for each sample:

我们还提供了在线的适用于 Objective-C 的语音 SDK 参考We also provide an online Speech SDK for Objective-C Reference.

语音服务的核心功能之一是能够识别并转录人类语音(通常称为语音转文本)。One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). 本快速入门介绍如何在应用和产品中使用语音 SDK 来执行高质量的语音转文本转换。In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

跳转到 GitHub 上的示例Skip to samples on GitHub

如果要直接跳到示例代码,请参阅 GitHub 上的 Python 快速入门示例If you want to skip straight to sample code, see the Python quickstart samples on GitHub.

先决条件Prerequisites

本文假设:This article assumes:

安装和导入语音 SDKInstall and import the Speech SDK

你需要先安装语音 SDK,然后才能执行任何操作。Before you can do anything, you'll need to install the Speech SDK.

pip install azure-cognitiveservices-speech

如果使用的是 macOS 且你遇到安装问题,则可能需要先运行此命令。If you're on macOS and run into install issues, you may need to run this command first.

python3 -m pip install --upgrade pip

安装语音 SDK 后,将其导入到 Python 项目中。After the Speech SDK is installed, import it into your Python project.

import azure.cognitiveservices.speech as speechsdk

创建语音配置Create a speech configuration

若要使用语音 SDK 调用语音服务,需要创建 SpeechConfigTo call the Speech service using the Speech SDK, you need to create a SpeechConfig. 此类包含有关你的订阅的信息,例如你的密钥和关联的区域、终结点、主机或授权令牌。This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. 使用密钥和区域创建 SpeechConfigCreate a SpeechConfig using your key and region. 请参阅查找密钥和区域页面,查找密钥区域对。See the Find keys and region page to find your key-region pair.

speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")

可以通过以下其他几种方法初始化 SpeechConfigThere are a few other ways that you can initialize a SpeechConfig:

  • 使用终结点:传入语音服务终结点。With an endpoint: pass in a Speech service endpoint. 密钥或授权令牌是可选的。A key or authorization token is optional.
  • 使用主机:传入主机地址。With a host: pass in a host address. 密钥或授权令牌是可选的。A key or authorization token is optional.
  • 使用授权令牌:传入授权令牌和关联的区域。With an authorization token: pass in an authorization token and the associated region.
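
下面是一个演示这几种初始化方式的最小示例(其中的终结点、主机和令牌值均为占位符假设,并非来自本文):Here's a minimal sketch of these initialization options (the endpoint, host, and token values below are placeholder assumptions, not from this article):

import azure.cognitiveservices.speech as speechsdk

# 使用终结点;密钥可选。With an endpoint; a key is optional.
config_from_endpoint = speechsdk.SpeechConfig(endpoint="<paste-your-endpoint>", subscription="<paste-your-subscription-key>")

# 使用主机地址;密钥可选。With a host address; a key is optional.
config_from_host = speechsdk.SpeechConfig(host="<paste-your-host>", subscription="<paste-your-subscription-key>")

# 使用授权令牌和关联的区域。With an authorization token and the associated region.
config_from_token = speechsdk.SpeechConfig(auth_token="<paste-your-auth-token>", region="<paste-your-region>")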

备注

无论你是要执行语音识别、语音合成、翻译,还是意向识别,都需要创建一个配置。Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

从麦克风识别Recognize from microphone

若要使用设备麦克风识别语音,只需创建 SpeechRecognizer(无需传递 AudioConfig),并传递 speech_configTo recognize speech using your device microphone, simply create a SpeechRecognizer without passing an AudioConfig, and pass your speech_config.

import azure.cognitiveservices.speech as speechsdk

def from_mic():
    speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    
    print("Speak into your microphone.")
    result = speech_recognizer.recognize_once_async().get()
    print(result.text)

from_mic()

如果你想使用特定的音频输入设备,则需要在 AudioConfig 中指定设备 ID,并将其传递给 SpeechRecognizer 构造函数的 audio_config 参数。If you want to use a specific audio input device, you need to specify the device ID in an AudioConfig, and pass it to the SpeechRecognizer constructor's audio_config param. 了解如何获取音频输入设备的设备 IDLearn how to get the device ID for your audio input device.
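
下面是一个最小示例(其中 device_name 的值为假设的占位符):Here's a minimal sketch (the device_name value is a hypothetical placeholder):

audio_config = speechsdk.audio.AudioConfig(device_name="<device-id>")  # 指定特定的音频输入设备。Specify a particular audio input device.
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)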

从文件识别Recognize from file

如果要从音频文件(而不是使用麦克风)识别语音,请创建 AudioConfig 并使用 filename 参数。If you want to recognize speech from an audio file instead of using a microphone, create an AudioConfig and use the filename parameter.

import azure.cognitiveservices.speech as speechsdk

def from_file():
    speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")
    audio_input = speechsdk.AudioConfig(filename="your_file_name.wav")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
    
    result = speech_recognizer.recognize_once_async().get()
    print(result.text)

from_file()

错误处理Error handling

前面的示例只从 result.text 获取已识别的文本,但要处理错误和其他响应,需要编写一些代码来处理结果。The previous examples simply get the recognized text from result.text, but to handle errors and other responses, you'll need to write some code to handle the result. 以下代码评估 result.reason 属性并:The following code evaluates the result.reason property and:

  • 输出识别结果:speechsdk.ResultReason.RecognizedSpeechPrints the recognition result: speechsdk.ResultReason.RecognizedSpeech
  • 如果没有识别匹配项,请通知用户:speechsdk.ResultReason.NoMatch If there is no recognition match, inform the user: speechsdk.ResultReason.NoMatch
  • 如果遇到错误,则输出错误消息:speechsdk.ResultReason.CanceledIf an error is encountered, print the error message: speechsdk.ResultReason.Canceled
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

连续识别Continuous recognition

前面的示例使用单步识别,可识别单个言语。The previous examples use single-shot recognition, which recognizes a single utterance. 单个言语的结束是通过检测结尾处的静音,或在处理了最长 15 秒的音频后确定的。The end of a single utterance is determined by listening for silence at the end, or until a maximum of 15 seconds of audio is processed.

与此相反,当你想控制何时停止识别时,需要使用连续识别。In contrast, continuous recognition is used when you want to control when to stop recognizing. 它要求你连接到 EventSignal 以获取识别结果;若要停止识别,必须调用 stop_continuous_recognition() 或 stop_continuous_recognition_async()。It requires you to connect to the EventSignal to get the recognition results, and to stop recognition, you must call stop_continuous_recognition() or stop_continuous_recognition_async(). 下面是有关如何对音频输入文件执行连续识别的示例。Here's an example of how continuous recognition is performed on an audio input file.

首先,我们将定义输入并初始化一个 SpeechRecognizerLet's start by defining the input and initializing a SpeechRecognizer:

audio_config = speechsdk.audio.AudioConfig(filename="your_file_name.wav")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

接下来,让我们创建一个变量来管理语音识别的状态。Next, let's create a variable to manage the state of speech recognition. 首先,我们将此设置为“False”,因为在开始识别时,我们可以放心地假定该操作尚未完成。To start, we'll set this to False, since at the start of recognition we can safely assume that it's not finished.

done = False

现在,我们将创建一个回调,以在接收到 evt 时停止连续识别。Now, we're going to create a callback to stop continuous recognition when an evt is received. 需谨记以下几点。There are a few things to keep in mind.

  • 接收到 evt 时,系统将输出 evt 消息。When an evt is received, the evt message is printed.
  • 接收到 evt 后,系统将调用 stop_continuous_recognition() 来停止识别。After an evt is received, stop_continuous_recognition() is called to stop recognition.
  • 识别状态将更改为 TrueThe recognition state is changed to True.
def stop_cb(evt):
    # 注意:此代码段假定位于一个封闭函数内;nonlocal 在模块级别无效。
    # Note: this snippet assumes it lives inside an enclosing function; nonlocal is invalid at module level.
    print('CLOSING on {}'.format(evt))
    speech_recognizer.stop_continuous_recognition()
    nonlocal done
    done = True

此代码示例演示如何将回调连接到从 SpeechRecognizer 发送的事件。This code sample shows how to connect callbacks to events sent from the SpeechRecognizer.

  • recognizing:事件信号,包含中间识别结果。recognizing: Signal for events containing intermediate recognition results.
  • recognized:事件信号,包含最终识别结果(指示成功的识别尝试)。recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • session_started:事件信号,指示识别会话的开始(操作)。session_started: Signal for events indicating the start of a recognition session (operation).
  • session_stopped:事件信号,指示识别会话的结束(操作)。session_stopped: Signal for events indicating the end of a recognition session (operation).
  • canceled:事件信号,包含已取消的识别结果(指示因直接取消请求或者传输/协议失败而取消的识别尝试)。canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))

speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)

完成所有设置后,可以调用 start_continuous_recognition()With everything set up, we can call start_continuous_recognition().

import time  # 轮询识别状态所需。Needed to poll the recognition state.

speech_recognizer.start_continuous_recognition()
while not done:
    time.sleep(.5)
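
下面是一个最小的端到端示例,它将上述代码片段组合到一个函数中(其中的密钥、区域和文件名均为占位符);nonlocal done 只有在这样的函数作用域内才能按预期工作。Here's a minimal end-to-end sketch that assembles the snippets above into a single function (the key, region, and file name are placeholders); nonlocal done works as intended only inside a function scope like this.

import time
import azure.cognitiveservices.speech as speechsdk

def recognize_continuous_from_file():
    speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")
    audio_config = speechsdk.audio.AudioConfig(filename="your_file_name.wav")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        # 在 session_stopped 或 canceled 事件到达时停止连续识别。
        # Stop continuous recognition when a session_stopped or canceled event arrives.
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)

recognize_continuous_from_file()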

听写模式Dictation mode

使用连续识别时,可以使用相应的“启用听写”功能启用听写处理。When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. 此模式会使语音配置实例将句子结构的口头描述(如标点符号)解释为相应的文本。This mode will cause the speech config instance to interpret spoken descriptions of sentence structures, such as punctuation. 例如,言语“你居住在城镇吗问号”会被解释为文本“你居住在城镇吗?”。For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

若要启用听写模式,请在 SpeechConfig 上使用 enable_dictation() 方法。To enable dictation mode, use the enable_dictation() method on your SpeechConfig.

speech_config.enable_dictation()

更改源语言Change source language

语音识别的常见任务是指定输入(或源)语言。A common task for speech recognition is specifying the input (or source) language. 让我们看看如何将输入语言更改为德语。Let's take a look at how you would change the input language to German. 在代码中找到 SpeechConfig,并直接在其下方添加此行。In your code, find your SpeechConfig, then add this line directly below it.

speech_config.speech_recognition_language="de-DE"

speech_recognition_language 属性需要语言区域设置格式字符串。The speech_recognition_language property expects a language-locale format string. 可以提供受支持的区域设置/语言的列表中的任何值。You can provide any value in the list of supported locales/languages.

提高识别准确度Improve recognition accuracy

可以通过多种方式使用语音 SDK 来提高识别的准确度。There are a few ways to improve recognition accuracy with the Speech SDK. 让我们看一下短语列表。Let's take a look at Phrase Lists. 短语列表用于标识音频数据中的已知短语,如人的姓名或特定位置。Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. 可以将单个词或完整短语添加到短语列表。Single words or complete phrases can be added to a Phrase List. 在识别期间,如果音频中包含整个短语的完全匹配项,则使用短语列表中的条目。During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. 如果找不到与短语完全匹配的项,则不会对识别提供帮助。If an exact match to the phrase is not found, recognition is not assisted.

重要

短语列表功能仅以英语提供。The Phrase List feature is only available in English.

若要使用短语列表,请首先创建一个 PhraseListGrammar 对象,然后使用 addPhrase 添加特定的单词和短语。To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with addPhrase.

PhraseListGrammar 所做的任何更改都将在下一次识别或重新连接到语音服务之后生效。Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

phrase_list_grammar = speechsdk.PhraseListGrammar.from_recognizer(speech_recognizer)
phrase_list_grammar.addPhrase("Supercalifragilisticexpialidocious")

如果需要清除短语列表:If you need to clear your phrase list:

phrase_list_grammar.clear()

提高识别精确度的其他方式Other options to improve recognition accuracy

短语列表只是提高识别准确度的一种方式。Phrase lists are only one option to improve recognition accuracy. 也可执行以下操作:You can also:

本快速入门介绍如何使用语音服务和 cURL 将语音转换为文本。In this quickstart, you learn how to convert speech to text using the Speech service and cURL.

若要深入了解语音转文本的概念,请参阅概述一文。For a high-level look at Speech-to-Text concepts, see the overview article.

先决条件Prerequisites

本文假定你有 Azure 帐户和语音服务订阅。This article assumes that you have an Azure account and Speech service subscription. 如果你没有帐户和订阅,可以免费试用语音服务If you don't have an account and subscription, try the Speech service for free.

将语音转换为文本Convert speech to text

请在命令提示符处运行以下命令。At a command prompt, run the following command. 需要将以下值插入到命令中。You will need to insert the following values into the command.

  • 语音服务订阅密钥。Your Speech service subscription key.
  • 你的语音服务区域。Your Speech service region.
  • 输入的音频文件路径。The input audio file path. 可以使用文本转语音来生成音频文件。You can generate audio files using text-to-speech.
curl --location --request POST 'https://INSERT_REGION_HERE.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary '@INSERT_AUDIO_FILE_PATH_HERE'

应收到类似于下面的响应。You should receive a response like the following one.

{
    "RecognitionStatus": "Success",
    "DisplayText": "My voice is my passport, verify me.",
    "Offset": 6600000,
    "Duration": 32100000
}

有关详细信息,请参阅语音转文本 REST API 参考For more information see the speech-to-text REST API reference.

语音服务的核心功能之一是能够识别并转录人类语音(通常称为语音转文本)。One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). 本快速入门介绍如何在应用和产品中使用语音 CLI 来执行高质量的语音转文本转换。In this quickstart, you learn how to use the Speech CLI in your apps and products to perform high-quality speech-to-text conversion.

下载并安装Download and install

备注

在 Windows 上,需要安装适用于平台的 Microsoft Visual C++ Redistributable for Visual Studio 2019On Windows, you need the Microsoft Visual C++ Redistributable for Visual Studio 2019 for your platform. 首次安装时,可能需要重启 Windows。Installing this for the first time may require you to restart Windows.

按照以下步骤在 Windows 上安装语音 CLI:Follow these steps to install the Speech CLI on Windows:

  1. 下载语音 CLI zip 存档然后提取它。Download the Speech CLI zip archive, then extract it.
  2. 转到从下载中提取的根目录 spx-zips,并提取所需的子目录(spx-net471 用于 .NET Framework 4.7,spx-netcore-win-x64 用于 x64 CPU 上的 .NET Core 3.0)。Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

在命令提示符中,将目录更改到此位置,然后键入 spx 查看语音 CLI 的帮助。In the command prompt, change directory to this location, and then type spx to see help for the Speech CLI.

备注

在 Windows 上,语音 CLI 只能显示本地计算机上命令提示符适用的字体。On Windows, the Speech CLI can only show fonts available to the command prompt on the local computer. Windows 终端支持通过语音 CLI 以交互方式生成的所有字体。Windows Terminal supports all fonts produced interactively by the Speech CLI. 如果输出到文件,文本编辑器(例如记事本)或 web 浏览器(例如 Microsoft Edge)也可以显示所有字体。If you output to a file, a text editor like Notepad or a web browser like Microsoft Edge can also show all fonts.

备注

查找命令时,PowerShell 不会检查本地目录。PowerShell does not check the local directory when looking for a command. 在 PowerShell 中,将目录更改为 spx 的位置,并通过输入 .\spx 调用工具。In PowerShell, change directory to the location of spx and call the tool by entering .\spx. 如果将此目录添加到路径,则 PowerShell 和 Windows 命令提示符会从不包含 .\ 前缀的任何目录中查找 spx。If you add this directory to your path, PowerShell and the Windows command prompt will find spx from any directory without including the .\ prefix.

创建订阅配置Create subscription config

若要开始使用语音 CLI,需要输入语音订阅密钥和区域标识符。To start using the Speech CLI, you need to enter your Speech subscription key and region identifier. 按照免费试用语音服务中的步骤获取这些凭据。Get these credentials by following steps in Try the Speech service for free. 获得订阅密钥和区域标识符(例如 eastus、westus)后,运行以下命令。Once you have your subscription key and region identifier (for example, eastus or westus), run the following commands.

spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION

现在会存储订阅身份验证,用于将来的 SPX 请求。Your subscription authentication is now stored for future SPX requests. 如果需要删除这些已存储值中的任何一个,请运行 spx config @region --clearspx config @key --clearIf you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.
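
例如,若要同时清除这两个已存储的值:For example, to clear both stored values:

spx config @key --clear
spx config @region --clear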

从麦克风将语音转换为文本Speech-to-text from microphone

插上并打开电脑麦克风,同时关闭任何可能会使用麦克风的应用。Plug in and turn on your PC microphone, and turn off any apps that might also use the microphone. 某些计算机具有内置麦克风,其他计算机则需要配置蓝牙设备。Some computers have a built-in microphone, while others require configuration of a Bluetooth device.

现在,可以运行语音 CLI 来识别来自麦克风的语音。Now you're ready to run the Speech CLI to recognize speech from your microphone. 在命令行中,更改为包含语音 CLI 二进制文件的目录,然后运行以下命令。From the command line, change to the directory that contains the Speech CLI binary file, and run the following command.

spx recognize --microphone

备注

语音 CLI 默认为英语。The Speech CLI defaults to English. 你可以从“语音转文本”表中选择不同语言。You can choose a different language from the Speech-to-text table. 例如,添加 --source de-DE 以识别德语语音。For example, add --source de-DE to recognize German speech.
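
例如,若要从麦克风识别德语语音:For example, to recognize German speech from your microphone:

spx recognize --microphone --source de-DE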

对麦克风说话,随后可以看到字词会实时转录为文本。Speak into the microphone, and you see transcription of your words into text in real-time. 当你停止说话一段时间或按 Ctrl+C 时,语音 CLI 将停止。The Speech CLI will stop after a period of silence, or when you press Ctrl+C.

从音频文件将语音转换为文本Speech-to-text from audio file

语音 CLI 可以识别多种文件格式和自然语言的语音。The Speech CLI can recognize speech in many file formats and natural languages. 在此示例中,可以使用包含英语语音的 WAV 文件(16kHz 或 8kHz,16 位,mono PCM)。In this example, you can use any WAV file (16kHz or 8kHz, 16-bit, and mono PCM) that contains English speech. 如果需要快速示例,请下载 whatstheweatherlike.wav 文件,并将其复制到语音 CLI 二进制文件所在的目录中。Or if you want a quick sample, download the whatstheweatherlike.wav file and copy it to the same directory as the Speech CLI binary file.

现在可以运行语音 CLI 来识别音频文件中找到的语音,方法是运行以下命令。Now you're ready to run the Speech CLI to recognize speech found in the audio file by running the following command.

spx recognize --file whatstheweatherlike.wav

备注

语音 CLI 默认为英语。The Speech CLI defaults to English. 你可以从“语音转文本”表中选择不同语言。You can choose a different language from the Speech-to-text table. 例如,添加 --source de-DE 以识别德语语音。For example, add --source de-DE to recognize German speech.

语音 CLI 将在屏幕上显示语音的文本转录。The Speech CLI will show a text transcription of the speech on the screen.

后续步骤Next steps