Get started with speech-to-text

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the C# quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

If you just want the package name to get started, run Install-Package Microsoft.CognitiveServices.Speech in the NuGet console.

For platform-specific installation instructions, see the following links:

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. Create a SpeechConfig by using your key and region. See the Find keys and region page to find your key-region pair.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program 
{
    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
    }
}

There are a few other ways that you can initialize a SpeechConfig; a short sketch follows this list:

  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.
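
For illustration, here's a minimal sketch of those alternatives; the endpoint, host, and token values are placeholders, not real service addresses:

var configFromEndpoint = SpeechConfig.FromEndpoint(new Uri("<paste-your-endpoint>"), "<paste-your-subscription-key>");
var configFromHost = SpeechConfig.FromHost(new Uri("<paste-your-host>"));
var configFromToken = SpeechConfig.FromAuthorizationToken("<paste-your-authorization-token>", "<paste-your-region>");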

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

Recognize from microphone

To recognize speech using your device microphone, create an AudioConfig using FromDefaultMicrophoneInput(). Then initialize a SpeechRecognizer, passing your audioConfig and speechConfig.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program 
{
    async static Task FromMic(SpeechConfig speechConfig)
    {
        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        Console.WriteLine("Speak into your microphone.");
        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        await FromMic(speechConfig);
    }
}

If you want to use a specific audio input device, you need to specify the device ID in the AudioConfig. Learn how to get the device ID for your audio input device.
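
As a sketch, assuming you already have a device ID (the value shown is a placeholder):

using var audioConfig = AudioConfig.FromMicrophoneInput("<device-id>");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);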

Recognize from file

If you want to recognize speech from an audio file instead of a microphone, you still need to create an AudioConfig. However, when you create the AudioConfig, instead of calling FromDefaultMicrophoneInput(), you call FromWavFileInput() and pass the file path.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program 
{
    async static Task FromFile(SpeechConfig speechConfig)
    {
        using var audioConfig = AudioConfig.FromWavFileInput("PathToFile.wav");
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        await FromFile(speechConfig);
    }
}

Recognize from in-memory stream

For many use cases, it's likely that your audio data comes from blob storage, or is otherwise already in memory as a byte[] or similar raw data structure. The following example uses a PushAudioInputStream to recognize speech, which is essentially an abstracted memory stream. The sample code does the following:

  • Writes raw audio data (PCM) to the PushAudioInputStream using the Write() function, which accepts a byte[].
  • Reads a .wav file using a BinaryReader for demonstration purposes, but if you already have audio data in a byte[], you can skip directly to writing the content to the input stream.
  • The default format is 16-bit, 16-kHz mono PCM. To customize the format, you can pass an AudioStreamFormat object to CreatePushStream() using the static function AudioStreamFormat.GetWaveFormatPCM(sampleRate, (byte)bitsPerSample, (byte)channels).

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program 
{
    async static Task FromStream(SpeechConfig speechConfig)
    {
        using var reader = new BinaryReader(File.OpenRead("PathToFile.wav"));
        using var audioInputStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(audioInputStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        byte[] readBytes;
        do
        {
            readBytes = reader.ReadBytes(1024);
            audioInputStream.Write(readBytes, readBytes.Length);
        } while (readBytes.Length > 0);

        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        await FromStream(speechConfig);
    }
}

Using a push stream as input assumes that the audio data is raw PCM (that is, that any headers are skipped). The API will still work in certain cases if the header has not been skipped, but for the best results consider implementing logic to read off the headers, so that the byte[] starts at the start of the audio data.
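
For example, here's a sketch of the read loop above that skips a 44-byte header before pushing audio. The fixed 44-byte size is an assumption (a canonical RIFF/WAV header); real files can carry extra chunks, so production code should parse the chunk structure instead.

using var reader = new BinaryReader(File.OpenRead("PathToFile.wav"));
reader.ReadBytes(44); // assumption: canonical 44-byte WAV header; parse chunks for robustness

byte[] readBytes;
do
{
    readBytes = reader.ReadBytes(1024);
    audioInputStream.Write(readBytes, readBytes.Length);
} while (readBytes.Length > 0);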

Error handling

The previous examples simply get the recognized text from result.Text, but to handle errors and other responses, you'll need to write some code to handle the result. The following code evaluates the result.Reason property and:

  • Prints the recognition result: ResultReason.RecognizedSpeech
  • If there is no recognition match, informs the user: ResultReason.NoMatch
  • If an error is encountered, prints the error message: ResultReason.Canceled

switch (result.Reason)
{
    case ResultReason.RecognizedSpeech:
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
        break;
    case ResultReason.NoMatch:
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled:
        var cancellation = CancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
        break;
}

Continuous recognition

The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.

In contrast, continuous recognition is used when you want to control when to stop recognizing. It requires you to subscribe to the Recognizing, Recognized, and Canceled events to get the recognition results. To stop recognition, you must call StopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Start by defining the input and initializing a SpeechRecognizer:

using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

Then create a TaskCompletionSource<int> to manage the state of speech recognition.

var stopRecognition = new TaskCompletionSource<int>();

Next, subscribe to the events sent from the SpeechRecognizer:

  • Recognizing: Signal for events containing intermediate recognition results.
  • Recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • SessionStopped: Signal for events indicating the end of a recognition session (operation).
  • Canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).

recognizer.Recognizing += (s, e) =>
{
    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
};

recognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
    }
    else if (e.Result.Reason == ResultReason.NoMatch)
    {
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
    }
};

recognizer.Canceled += (s, e) =>
{
    Console.WriteLine($"CANCELED: Reason={e.Reason}");

    if (e.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
        Console.WriteLine($"CANCELED: Did you update the subscription info?");
    }

    stopRecognition.TrySetResult(0);
};

recognizer.SessionStopped += (s, e) =>
{
    Console.WriteLine("\n    Session stopped event.");
    stopRecognition.TrySetResult(0);
};

With everything set up, call StartContinuousRecognitionAsync to start recognizing.

await recognizer.StartContinuousRecognitionAsync();

// Waits for completion. Use Task.WaitAny to keep the task rooted.
Task.WaitAny(new[] { stopRecognition.Task });

// make the following call at some point to stop recognition.
// await recognizer.StopContinuousRecognitionAsync();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the EnableDictation method on your SpeechConfig.

speechConfig.EnableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to Italian. In your code, find your SpeechConfig, then add this line directly below it.

speechConfig.SpeechRecognitionLanguage = "it-IT";

The SpeechRecognitionLanguage property expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with AddPhrase.

Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

var phraseList = PhraseListGrammar.FromRecognizer(recognizer);
phraseList.AddPhrase("Supercalifragilisticexpialidocious");

If you need to clear your phrase list:

phraseList.Clear();

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the C++ quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. Create a SpeechConfig by using your key and region. See the Find keys and region page to find your key-region pair.

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;

auto config = SpeechConfig::FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");

There are a few other ways that you can initialize a SpeechConfig; a short sketch follows this list:

  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.
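
For illustration, here's a minimal sketch of those alternatives; the endpoint, host, and token values are placeholders, not real service addresses:

auto configFromEndpoint = SpeechConfig::FromEndpoint("<paste-your-endpoint>", "<paste-your-subscription-key>");
auto configFromHost = SpeechConfig::FromHost("<paste-your-host>");
auto configFromToken = SpeechConfig::FromAuthorizationToken("<paste-your-authorization-token>", "<paste-your-region>");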

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

Recognize from microphone

To recognize speech using your device microphone, create an AudioConfig using FromDefaultMicrophoneInput(). Then initialize a SpeechRecognizer, passing your audioConfig and config.

using namespace Microsoft::CognitiveServices::Speech::Audio;

auto audioConfig = AudioConfig::FromDefaultMicrophoneInput();
auto recognizer = SpeechRecognizer::FromConfig(config, audioConfig);

cout << "Speak into your microphone." << std::endl;
auto result = recognizer->RecognizeOnceAsync().get();
cout << "RECOGNIZED: Text=" << result->Text << std::endl;

If you want to use a specific audio input device, you need to specify the device ID in the AudioConfig. Learn how to get the device ID for your audio input device.
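
As a sketch, assuming you already have a device ID (the value shown is a placeholder):

auto audioConfig = AudioConfig::FromMicrophoneInput("<device-id>");
auto recognizer = SpeechRecognizer::FromConfig(config, audioConfig);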

Recognize from file

If you want to recognize speech from an audio file instead of using a microphone, you still need to create an AudioConfig. However, when you create the AudioConfig, instead of calling FromDefaultMicrophoneInput(), you call FromWavFileInput() and pass the file path.

using namespace Microsoft::CognitiveServices::Speech::Audio;

auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);

auto result = recognizer->RecognizeOnceAsync().get();
cout << "RECOGNIZED: Text=" << result->Text << std::endl;

Recognize speech

The Recognizer class for the Speech SDK for C++ exposes a few methods that you can use for speech recognition.

Single-shot recognition

Single-shot recognition asynchronously recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed. Here's an example of asynchronous single-shot recognition using RecognizeOnceAsync:

auto result = recognizer->RecognizeOnceAsync().get();

You'll need to write some code to handle the result. This sample evaluates result->Reason:

  • Prints the recognition result: ResultReason::RecognizedSpeech
  • If there is no recognition match, informs the user: ResultReason::NoMatch
  • If an error is encountered, prints the error message: ResultReason::Canceled

switch (result->Reason)
{
    case ResultReason::RecognizedSpeech:
        cout << "We recognized: " << result->Text << std::endl;
        break;
    case ResultReason::NoMatch:
        cout << "NOMATCH: Speech could not be recognized." << std::endl;
        break;
    case ResultReason::Canceled:
        {
            auto cancellation = CancellationDetails::FromResult(result);
            cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
            if (cancellation->Reason == CancellationReason::Error) {
                cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                cout << "CANCELED: Did you update the subscription info?" << std::endl;
            }
        }
        break;
    default:
        break;
}

Continuous recognition

Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the Recognizing, Recognized, and Canceled events to get the recognition results. To stop recognition, you must call StopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Let's start by defining the input and initializing a SpeechRecognizer:

auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);

Next, let's create a variable to manage the state of speech recognition. To start, we'll declare a promise<void>, since at the start of recognition we can safely assume that it's not finished.

promise<void> recognitionEnd;

We'll subscribe to the events sent from the SpeechRecognizer:

  • Recognizing: Signal for events containing intermediate recognition results.
  • Recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • SessionStopped: Signal for events indicating the end of a recognition session (operation).
  • Canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).

recognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
    {
        cout << "Recognizing:" << e.Result->Text << std::endl;
    });

recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
    {
        if (e.Result->Reason == ResultReason::RecognizedSpeech)
        {
            cout << "RECOGNIZED: Text=" << e.Result->Text << std::endl;
        }
        else if (e.Result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Speech could not be recognized." << std::endl;
        }
    });

recognizer->Canceled.Connect([&recognitionEnd](const SpeechRecognitionCanceledEventArgs& e)
    {
        cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
        if (e.Reason == CancellationReason::Error)
        {
            cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << "\n"
                 << "CANCELED: ErrorDetails=" << e.ErrorDetails << "\n"
                 << "CANCELED: Did you update the subscription info?" << std::endl;

            recognitionEnd.set_value(); // Notify to stop recognition.
        }
    });

recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
    {
        cout << "Session stopped." << std::endl;
        recognitionEnd.set_value(); // Notify to stop recognition.
    });

With everything set up, we can call StartContinuousRecognitionAsync to start recognizing.

// Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
recognizer->StartContinuousRecognitionAsync().get();

// Waits for recognition end.
recognitionEnd.get_future().get();

// Stops recognition.
recognizer->StopContinuousRecognitionAsync().get();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the EnableDictation method on your SpeechConfig.

config->EnableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to German. In your code, find your SpeechConfig, then add this line directly below it.

config->SetSpeechRecognitionLanguage("de-DE");

SetSpeechRecognitionLanguage takes a string as an argument. You can provide any value in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with AddPhrase.

Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

auto phraseListGrammar = PhraseListGrammar::FromRecognizer(recognizer);
phraseListGrammar->AddPhrase("Supercalifragilisticexpialidocious");

If you need to clear your phrase list:

phraseListGrammar->Clear();

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the Go quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK for Go.

Speech-to-text from microphone

Use the following code sample to run speech recognition from your default device microphone. Replace the variables subscription and region with your subscription key and region. Running the script will start a recognition session on your default microphone and output text.

import (
    "bufio"
    "fmt"
    "os"

    "github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func sessionStartedHandler(event speech.SessionEventArgs) {
    defer event.Close()
    fmt.Println("Session Started (ID=", event.SessionID, ")")
}

func sessionStoppedHandler(event speech.SessionEventArgs) {
    defer event.Close()
    fmt.Println("Session Stopped (ID=", event.SessionID, ")")
}

func recognizingHandler(event speech.SpeechRecognitionEventArgs) {
    defer event.Close()
    fmt.Println("Recognizing:", event.Result.Text)
}

func recognizedHandler(event speech.SpeechRecognitionEventArgs) {
    defer event.Close()
    fmt.Println("Recognized:", event.Result.Text)
}

func cancelledHandler(event speech.SpeechRecognitionCanceledEventArgs) {
    defer event.Close()
    fmt.Println("Received a cancellation: ", event.ErrorDetails)
}

func main() {
    subscription := "YOUR_SUBSCRIPTION_KEY"
    region := "YOUR_SUBSCRIPTIONKEY_REGION"

    audioConfig, err := audio.NewAudioConfigFromDefaultMicrophoneInput()
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer audioConfig.Close()
    config, err := speech.NewSpeechConfigFromSubscription(subscription, region)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer config.Close()
    speechRecognizer, err := speech.NewSpeechRecognizerFromConfig(config, audioConfig)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer speechRecognizer.Close()
    speechRecognizer.SessionStarted(sessionStartedHandler)
    speechRecognizer.SessionStopped(sessionStoppedHandler)
    speechRecognizer.Recognizing(recognizingHandler)
    speechRecognizer.Recognized(recognizedHandler)
    speechRecognizer.Canceled(cancelledHandler)
    speechRecognizer.StartContinuousRecognitionAsync()
    defer speechRecognizer.StopContinuousRecognitionAsync()
    bufio.NewReader(os.Stdin).ReadBytes('\n')
}

See the reference docs for detailed information on the SpeechConfig and SpeechRecognizer classes.

Speech-to-text from audio file

Use the following sample to run speech recognition from an audio file. Replace the variables subscription and region with your subscription key and region. Additionally, replace the variable file with a path to a .wav file. Running the script will recognize speech from the file and output the text result.

import (
    "fmt"
    "time"

    "github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func main() {
    subscription := "YOUR_SUBSCRIPTION_KEY"
    region := "YOUR_SUBSCRIPTIONKEY_REGION"
    file := "path/to/file.wav"

    audioConfig, err := audio.NewAudioConfigFromWavFileInput(file)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer audioConfig.Close()
    config, err := speech.NewSpeechConfigFromSubscription(subscription, region)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer config.Close()
    speechRecognizer, err := speech.NewSpeechRecognizerFromConfig(config, audioConfig)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer speechRecognizer.Close()
    speechRecognizer.SessionStarted(func(event speech.SessionEventArgs) {
        defer event.Close()
        fmt.Println("Session Started (ID=", event.SessionID, ")")
    })
    speechRecognizer.SessionStopped(func(event speech.SessionEventArgs) {
        defer event.Close()
        fmt.Println("Session Stopped (ID=", event.SessionID, ")")
    })

    task := speechRecognizer.RecognizeOnceAsync()
    var outcome speech.SpeechRecognitionOutcome
    select {
    case outcome = <-task:
    case <-time.After(5 * time.Second):
        fmt.Println("Timed out")
        return
    }
    defer outcome.Close()
    if outcome.Error != nil {
        fmt.Println("Got an error: ", outcome.Error)
        return
    }
    fmt.Println("Got a recognition!")
    fmt.Println(outcome.Result.Text)
}

See the reference docs for detailed information on the SpeechConfig and SpeechRecognizer classes.

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the Java quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. Create a SpeechConfig by using your key and region. See the Find keys and region page to find your key-region pair.

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class Program {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
    }
}

There are a few other ways that you can initialize a SpeechConfig; a short sketch follows this list:

  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.
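
For illustration, here's a minimal sketch of those alternatives. The values are placeholders; this assumes java.net.URI is imported, and the enclosing method declares URISyntaxException.

SpeechConfig configFromEndpoint = SpeechConfig.fromEndpoint(new URI("<paste-your-endpoint>"), "<paste-your-subscription-key>");
SpeechConfig configFromHost = SpeechConfig.fromHost(new URI("<paste-your-host>"));
SpeechConfig configFromToken = SpeechConfig.fromAuthorizationToken("<paste-your-authorization-token>", "<paste-your-region>");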

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

Recognize from microphone

To recognize speech using your device microphone, create an AudioConfig using fromDefaultMicrophoneInput(). Then initialize a SpeechRecognizer, passing your audioConfig and config.

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class Program {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        fromMic(speechConfig);
    }

    public static void fromMic(SpeechConfig speechConfig) throws InterruptedException, ExecutionException {
        AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
        SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        System.out.println("Speak into your microphone.");
        Future<SpeechRecognitionResult> task = recognizer.recognizeOnceAsync();
        SpeechRecognitionResult result = task.get();
        System.out.println("RECOGNIZED: Text=" + result.getText());
    }
}

If you want to use a specific audio input device, you need to specify the device ID in the AudioConfig. Learn how to get the device ID for your audio input device.
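
As a sketch, assuming you already have a device ID (the value shown is a placeholder):

AudioConfig audioConfig = AudioConfig.fromMicrophoneInput("<device-id>");
SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig);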

Recognize from file

If you want to recognize speech from an audio file instead of using a microphone, you still need to create an AudioConfig. However, when you create the AudioConfig, instead of calling fromDefaultMicrophoneInput(), call fromWavFileInput() and pass the file path.

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class Program {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        fromFile(speechConfig);
    }

    public static void fromFile(SpeechConfig speechConfig) throws InterruptedException, ExecutionException {
        AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
        SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig);
        
        Future<SpeechRecognitionResult> task = recognizer.recognizeOnceAsync();
        SpeechRecognitionResult result = task.get();
        System.out.println("RECOGNIZED: Text=" + result.getText());
    }
}

Error handling

The previous examples simply get the recognized text using result.getText(), but to handle errors and other responses, you'll need to write some code to handle the result. The following example evaluates result.getReason() and:

  • Prints the recognition result: ResultReason.RecognizedSpeech
  • If there is no recognition match, informs the user: ResultReason.NoMatch
  • If an error is encountered, prints the error message: ResultReason.Canceled

switch (result.getReason()) {
    case RecognizedSpeech:
        System.out.println("We recognized: " + result.getText());
        exitCode = 0;
        break;
    case NoMatch:
        System.out.println("NOMATCH: Speech could not be recognized.");
        break;
    case Canceled: {
            CancellationDetails cancellation = CancellationDetails.fromResult(result);
            System.out.println("CANCELED: Reason=" + cancellation.getReason());

            if (cancellation.getReason() == CancellationReason.Error) {
                System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                System.out.println("CANCELED: Did you update the subscription info?");
            }
        }
        break;
}

Continuous recognition

The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.

In contrast, continuous recognition is used when you want to control when to stop recognizing. It requires you to subscribe to the recognizing, recognized, and canceled events to get the recognition results. To stop recognition, you must call stopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Let's start by defining the input and initializing a SpeechRecognizer:

AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
SpeechRecognizer recognizer = new SpeechRecognizer(config, audioConfig);

Next, let's create a variable to manage the state of speech recognition. To start, we'll declare a Semaphore at the class scope.

private static Semaphore stopTranslationWithFileSemaphore;

We'll subscribe to the events sent from the SpeechRecognizer:

  • recognizing: Signal for events containing intermediate recognition results.
  • recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • sessionStopped: Signal for events indicating the end of a recognition session (operation).
  • canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).

// First initialize the semaphore.
stopTranslationWithFileSemaphore = new Semaphore(0);

recognizer.recognizing.addEventListener((s, e) -> {
    System.out.println("RECOGNIZING: Text=" + e.getResult().getText());
});

recognizer.recognized.addEventListener((s, e) -> {
    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
        System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
    }
    else if (e.getResult().getReason() == ResultReason.NoMatch) {
        System.out.println("NOMATCH: Speech could not be recognized.");
    }
});

recognizer.canceled.addEventListener((s, e) -> {
    System.out.println("CANCELED: Reason=" + e.getReason());

    if (e.getReason() == CancellationReason.Error) {
        System.out.println("CANCELED: ErrorCode=" + e.getErrorCode());
        System.out.println("CANCELED: ErrorDetails=" + e.getErrorDetails());
        System.out.println("CANCELED: Did you update the subscription info?");
    }

    stopTranslationWithFileSemaphore.release();
});

recognizer.sessionStopped.addEventListener((s, e) -> {
    System.out.println("\n    Session stopped event.");
    stopTranslationWithFileSemaphore.release();
});

With everything set up, we can call startContinuousRecognitionAsync.

// Starts continuous recognition. Uses stopContinuousRecognitionAsync() to stop recognition.
recognizer.startContinuousRecognitionAsync().get();

// Waits for completion.
stopTranslationWithFileSemaphore.acquire();

// Stops recognition.
recognizer.stopContinuousRecognitionAsync().get();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the enableDictation method on your SpeechConfig.

config.enableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to French. In your code, find your SpeechConfig, then add this line directly below it.

config.setSpeechRecognitionLanguage("fr-FR");

setSpeechRecognitionLanguage takes a string as an argument. You can provide any value in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with addPhrase.

Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

PhraseListGrammar phraseList = PhraseListGrammar.fromRecognizer(recognizer);
phraseList.addPhrase("Supercalifragilisticexpialidocious");

If you need to clear your phrase list:

phraseList.clear();

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the JavaScript quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK for JavaScript. Depending on your platform, use the following instructions:

Additionally, depending on the target environment, use one of the following:

Download and extract the Speech SDK for JavaScript microsoft.cognitiveservices.speech.sdk.bundle.js file, and place it in a folder accessible to your HTML file.

<script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>

Tip

If you're targeting a web browser and using the <script> tag, the sdk prefix is not needed when referencing classes. The sdk prefix is an alias used to name the require module.

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. Create a SpeechConfig using your key and region. See the Find keys and region page to find your key-region pair.

const speechConfig = sdk.SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");

There are a few other ways that you can initialize a SpeechConfig; a short sketch follows this list:

  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.
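
For illustration, here's a minimal sketch of those alternatives; the endpoint, host, and token values are placeholders, not real service addresses:

const configFromEndpoint = sdk.SpeechConfig.fromEndpoint(new URL("<paste-your-endpoint>"), "<paste-your-subscription-key>");
const configFromHost = sdk.SpeechConfig.fromHost(new URL("<paste-your-host>"));
const configFromToken = sdk.SpeechConfig.fromAuthorizationToken("<paste-your-authorization-token>", "<paste-your-region>");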

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

Recognize from microphone (Browser only)

To recognize speech using your device microphone, create an AudioConfig using fromDefaultMicrophoneInput(). Then initialize a SpeechRecognizer, passing your speechConfig and audioConfig.

const sdk = require("microsoft-cognitiveservices-speech-sdk");
const speechConfig = sdk.SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");

function fromMic() {
    let audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
    let recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    
    console.log('Speak into your microphone.');
    recognizer.recognizeOnceAsync(result => {
        console.log(`RECOGNIZED: Text=${result.text}`);
    });
}
fromMic();

If you want to use a specific audio input device, you need to specify the device ID in the AudioConfig. Learn how to get the device ID for your audio input device.
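
As a sketch, assuming you already have a device ID (the value shown is a placeholder):

const audioConfig = sdk.AudioConfig.fromMicrophoneInput("<device-id>");
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);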

Recognize from file

To recognize speech from an audio file in a browser-based JavaScript environment, use the fromWavFileInput() function to create an AudioConfig. The function fromWavFileInput() expects a JavaScript File object as a parameter.

const sdk = require("microsoft-cognitiveservices-speech-sdk");
const speechConfig = sdk.SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");

function fromFile() {
    // wavByteContent should be a byte array of the raw wav content
    let file = new File([wavByteContent], "YourAudioFile.wav");
    let audioConfig = sdk.AudioConfig.fromWavFileInput(file);
    let recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    
    recognizer.recognizeOnceAsync(result => {
        console.log(`RECOGNIZED: Text=${result.text}`);
    });
}
fromFile();

Error handling

The previous examples simply get the recognized text from result.text, but to handle errors and other responses, you'll need to write some code to handle the result. The following code evaluates the result.reason property and:

  • Prints the recognition result: ResultReason.RecognizedSpeech
  • If there is no recognition match, informs the user: ResultReason.NoMatch
  • If an error is encountered, prints the error message: ResultReason.Canceled

switch (result.reason) {
    case ResultReason.RecognizedSpeech:
        console.log(`RECOGNIZED: Text=${result.text}`);
        break;
    case ResultReason.NoMatch:
        console.log("NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled:
        const cancellation = CancellationDetails.fromResult(result);
        console.log(`CANCELED: Reason=${cancellation.reason}`);

        if (cancellation.reason == CancellationReason.Error) {
            console.log(`CANCELED: ErrorCode=${cancellation.errorCode}`);
            console.log(`CANCELED: ErrorDetails=${cancellation.errorDetails}`);
            console.log("CANCELED: Did you update the subscription info?");
        }
        break;
}

Continuous recognition

The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.

In contrast, continuous recognition is used when you want to control when to stop recognizing. It requires you to subscribe to the recognizing, recognized, and canceled events to get the recognition results. To stop recognition, you must call stopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Start by defining the input and initializing a SpeechRecognizer:

const recognizer = new sdk.SpeechRecognizer(speechConfig);

Next, subscribe to the events sent from the SpeechRecognizer:

  • recognizing: Signal for events containing intermediate recognition results.
  • recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • sessionStopped: Signal for events indicating the end of a recognition session (operation).
  • canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).

recognizer.recognizing = (s, e) => {
    console.log(`RECOGNIZING: Text=${e.result.text}`);
};

recognizer.recognized = (s, e) => {
    if (e.result.reason == ResultReason.RecognizedSpeech) {
        console.log(`RECOGNIZED: Text=${e.result.text}`);
    }
    else if (e.result.reason == ResultReason.NoMatch) {
        console.log("NOMATCH: Speech could not be recognized.");
    }
};

recognizer.canceled = (s, e) => {
    console.log(`CANCELED: Reason=${e.reason}`);

    if (e.reason == CancellationReason.Error) {
        console.log(`CANCELED: ErrorCode=${e.errorCode}`);
        console.log(`CANCELED: ErrorDetails=${e.errorDetails}`);
        console.log("CANCELED: Did you update the subscription info?");
    }

    recognizer.stopContinuousRecognitionAsync();
};

recognizer.sessionStopped = (s, e) => {
    console.log("\n    Session stopped event.");
    recognizer.stopContinuousRecognitionAsync();
};

With everything set up, call startContinuousRecognitionAsync to start recognizing.

recognizer.startContinuousRecognitionAsync();

// make the following call at some point to stop recognition.
// recognizer.stopContinuousRecognitionAsync();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the enableDictation method on your SpeechConfig.

speechConfig.enableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to Italian. In your code, find your SpeechConfig, then add this line directly below it.

speechConfig.speechRecognitionLanguage = "it-IT";

The speechRecognitionLanguage property expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with addPhrase.

PhraseListGrammar 的任何變更將會在下一次辨識時或重新連線至語音服務之後生效。Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

const phraseList = sdk.PhraseListGrammar.fromRecognizer(recognizer);
phraseList.addPhrase("Supercalifragilisticexpialidocious");

If you need to clear your phrase list:

phraseList.clear();

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

You can transcribe speech into text using the Speech SDK for Swift and Objective-C.

Prerequisites

The following samples assume that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK and samples

The Cognitive Services Speech SDK contains samples written in Swift and Objective-C for iOS and Mac. Click a link to see installation instructions for each sample:

We also provide an online Speech SDK reference for Objective-C.

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech SDK in your apps and products to perform high-quality speech-to-text conversion.

Skip to samples on GitHub

If you want to skip straight to sample code, see the Python quickstart samples on GitHub.

Prerequisites

This article assumes:

Install and import the Speech SDK

Before you can do anything, you'll need to install the Speech SDK.

pip install azure-cognitiveservices-speech

If you're on macOS and run into install issues, you may need to run this command first.

python3 -m pip install --upgrade pip

After the Speech SDK is installed, import it into your Python project.

import azure.cognitiveservices.speech as speechsdk

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token. Create a SpeechConfig using your key and region. See the Find keys and region page to find your key-region pair.

speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")

There are a few other ways that you can initialize a SpeechConfig:

  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

Recognize from microphone

To recognize speech using your device microphone, simply create a SpeechRecognizer without passing an AudioConfig, and pass your speech_config.

import azure.cognitiveservices.speech as speechsdk

def from_mic():
    speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    
    print("Speak into your microphone.")
    result = speech_recognizer.recognize_once_async().get()
    print(result.text)

from_mic()

If you want to use a specific audio input device, you need to specify the device ID in an AudioConfig and pass it to the SpeechRecognizer constructor's audio_config parameter, as sketched below. Learn how to get the device ID for your audio input device.
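A minimal sketch, assuming you've already looked up your device ID (the <device-id> value is a placeholder):

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")
# <device-id> is a placeholder; substitute the ID of your audio input device.
audio_config = speechsdk.audio.AudioConfig(device_name="<device-id>")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)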

Recognize from file

If you want to recognize speech from an audio file instead of using a microphone, create an AudioConfig and use the filename parameter.

import azure.cognitiveservices.speech as speechsdk

def from_file():
    speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")
    audio_input = speechsdk.AudioConfig(filename="your_file_name.wav")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
    
    result = speech_recognizer.recognize_once_async().get()
    print(result.text)

from_file()

Error handling

The previous examples simply get the recognized text from result.text, but to handle errors and other responses, you'll need to write some code to handle the result. The following code evaluates the result.reason property and:

  • Prints the recognition result: speechsdk.ResultReason.RecognizedSpeech
  • If there is no recognition match, informs the user: speechsdk.ResultReason.NoMatch
  • If an error is encountered, prints the error message: speechsdk.ResultReason.Canceled
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

Continuous recognition

The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end, or until a maximum of 15 seconds of audio is processed.

In contrast, continuous recognition is used when you want to control when to stop recognizing. It requires you to connect to EventSignal to get the recognition results, and to stop recognition, you must call stop_continuous_recognition() or stop_continuous_recognition_async(). Here's an example of how continuous recognition is performed on an audio input file.

Let's start by defining the input and initializing a SpeechRecognizer:

audio_config = speechsdk.audio.AudioConfig(filename="your_file_name.wav")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

Next, let's create a variable to manage the state of speech recognition. To start, we'll set this to False, since at the start of recognition, we can safely assume that it's not finished.

done = False

Now, we're going to create a callback to stop continuous recognition when an evt is received. There are a few things to keep in mind.

  • When an evt is received, the evt message is printed.
  • After an evt is received, stop_continuous_recognition() is called to stop recognition.
  • The recognition state is changed to True.
def stop_cb(evt):
    print('CLOSING on {}'.format(evt))
    speech_recognizer.stop_continuous_recognition()
    # `nonlocal` assumes this callback is defined inside an enclosing function;
    # at module (top) level, use `global done` instead.
    nonlocal done
    done = True

This code sample shows how to connect callbacks to events sent from the SpeechRecognizer.

  • recognizing: Signal for events containing intermediate recognition results.
  • recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • session_started: Signal for events indicating the start of a recognition session (operation).
  • session_stopped: Signal for events indicating the end of a recognition session (operation).
  • canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))

speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)

With everything set up, we can call start_continuous_recognition().

import time  # required for the polling loop below

speech_recognizer.start_continuous_recognition()
while not done:
    time.sleep(.5)

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode causes the recognizer to interpret spoken descriptions of sentence structure, such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the enable_dictation() method on your SpeechConfig.

speech_config.enable_dictation()
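Note that, as with other SpeechConfig settings, you should enable dictation before constructing the recognizer, since the recognizer picks up its configuration at creation time. A minimal sketch:

speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")
speech_config.enable_dictation()  # enable before creating the recognizer
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)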

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to German. In your code, find your SpeechConfig, then add this line directly below it.

speech_config.speech_recognition_language="de-DE"

speech_recognition_language is a parameter that takes a string as an argument. You can provide any value in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with addPhrase.

PhraseListGrammar 的任何變更將會在下一次辨識時或重新連線至語音服務之後生效。Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

phrase_list_grammar = speechsdk.PhraseListGrammar.from_recognizer(speech_recognizer)
phrase_list_grammar.addPhrase("Supercalifragilisticexpialidocious")

If you need to clear your phrase list:

phrase_list_grammar.clear()
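Putting the pieces together, here is a minimal end-to-end sketch that biases a one-shot, from-file recognition toward an uncommon phrase (the file name and phrase are placeholders):

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<paste-your-subscription-key>", region="<paste-your-region>")
audio_config = speechsdk.AudioConfig(filename="your_file_name.wav")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Bias recognition toward a phrase that's likely to appear in the audio.
phrase_list_grammar = speechsdk.PhraseListGrammar.from_recognizer(speech_recognizer)
phrase_list_grammar.addPhrase("Supercalifragilisticexpialidocious")

result = speech_recognizer.recognize_once_async().get()
print(result.text)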

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

In this quickstart, you learn how to convert speech to text using the Speech service and cURL.

For a high-level look at speech-to-text concepts, see the overview article.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Convert speech to text

At a command prompt, run the following command. You will need to insert the following values into the command.

  • Your Speech service subscription key.
  • Your Speech service region.
  • The input audio file path. You can generate audio files using text-to-speech.
curl --location --request POST 'https://INSERT_REGION_HERE.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_AUDIO_FILE_PATH_HERE'
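
Note the @ prefix, which tells cURL to upload the contents of the file rather than the literal path string. For example, with a resource in the westus region and a local file named whatstheweatherlike.wav (both illustrative), the command would look like this:

curl --location --request POST 'https://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US' \
--header 'Ocp-Apim-Subscription-Key: <paste-your-subscription-key>' \
--header 'Content-Type: audio/wav' \
--data-binary @'whatstheweatherlike.wav'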

You should receive a response like the following one. (Offset and Duration are expressed in 100-nanosecond units.)

{
    "RecognitionStatus": "Success",
    "DisplayText": "My voice is my passport, verify me.",
    "Offset": 6600000,
    "Duration": 32100000
}

For more information, see the speech-to-text REST API reference.

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech-to-text). In this quickstart, you learn how to use the Speech CLI in your apps and products to perform high-quality speech-to-text conversion.

Download and install

Note

On Windows, you need the Microsoft Visual C++ Redistributable for Visual Studio 2019 for your platform. Installing this for the first time may require you to restart Windows.

Follow these steps to install the Speech CLI on Windows:

  1. Download the Speech CLI zip archive, then extract it.
  2. Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

In the command prompt, change directory to this location, and then type spx to see help for the Speech CLI.
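
For example (the folder path below is illustrative; use wherever you extracted the files):

cd c:\spx-netcore-win-x64
spx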

Note

On Windows, the Speech CLI can only show fonts available to the command prompt on the local computer. Windows Terminal supports all fonts produced interactively by the Speech CLI. If you output to a file, a text editor like Notepad or a web browser like Microsoft Edge can also show all fonts.

Note

PowerShell does not check the local directory when looking for a command. In PowerShell, change directory to the location of spx and call the tool by entering .\spx. If you add this directory to your path, PowerShell and the Windows command prompt will find spx from any directory without including the .\ prefix.

Create subscription config

To start using the Speech CLI, you need to enter your Speech subscription key and region identifier. Get these credentials by following the steps in Try the Speech service for free. Once you have your subscription key and region identifier (for example, eastus or westus), run the following commands.

spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION

Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.

Speech-to-text from microphone

Plug in and turn on your PC microphone, and turn off any apps that might also use the microphone. Some computers have a built-in microphone, while others require configuration of a Bluetooth device.

Now you're ready to run the Speech CLI to recognize speech from your microphone. From the command line, change to the directory that contains the Speech CLI binary file, and run the following command.

spx recognize --microphone

Note

The Speech CLI defaults to English. You can choose a different language from the speech-to-text table. For example, add --source de-DE to recognize German speech.
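
For example, to recognize German speech from your microphone:

spx recognize --microphone --source de-DE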

Speak into the microphone, and you'll see a transcription of your words into text in real time. The Speech CLI will stop after a period of silence, or when you press Ctrl+C.

Speech-to-text from audio file

The Speech CLI can recognize speech in many file formats and natural languages. In this example, you can use any WAV file (16 kHz or 8 kHz, 16-bit, mono PCM) that contains English speech. Or, if you want a quick sample, download the whatstheweatherlike.wav file and copy it to the same directory as the Speech CLI binary file.

Now you're ready to run the Speech CLI to recognize speech found in the audio file by running the following command.

spx recognize --file whatstheweatherlike.wav

Note

The Speech CLI defaults to English. You can choose a different language from the speech-to-text table. For example, add --source de-DE to recognize German speech.

The Speech CLI will show a text transcription of the speech on the screen.

Next steps