Speech-to-text REST API

As an alternative to the Speech SDK, the Speech service allows you to convert speech to text using a REST API. Each accessible endpoint is associated with a region. Your application requires a subscription key for the endpoint you plan to use. The REST API is very limited, and it should only be used in cases where the Speech SDK cannot be used.

Before using the speech-to-text REST API, understand:

  • Requests that use the REST API and transmit audio directly can only contain up to 60 seconds of audio.
  • The speech-to-text REST API only returns final results. Partial results are not provided.

If sending longer audio is a requirement for your application, consider using the Speech SDK or a file-based REST API, like batch transcription.

Authentication

Each request requires an authorization header. This table illustrates which headers are supported for each service:

Supported authorization headers | Speech-to-text | Text-to-speech
Ocp-Apim-Subscription-Key | Yes | No
Authorization: Bearer | Yes | Yes

When using the Ocp-Apim-Subscription-Key header, you're only required to provide your subscription key. For example:

'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY'

When using the Authorization: Bearer header, you're required to make a request to the issueToken endpoint. In this request, you exchange your subscription key for an access token that's valid for 10 minutes. In the next few sections, you'll learn how to get a token and how to use it.

How to get an access token

To get an access token, you'll need to make a request to the issueToken endpoint using the Ocp-Apim-Subscription-Key header and your subscription key.

The issueToken endpoint has this format:

https://<REGION_IDENTIFIER>.api.cognitive.microsoft.com/sts/v1.0/issueToken

Replace <REGION_IDENTIFIER> with the identifier matching the region of your subscription from this table:

Geography | Region | Region identifier
Americas | Central US | centralus
Americas | East US | eastus
Americas | East US 2 | eastus2
Americas | North Central US | northcentralus
Americas | South Central US | southcentralus
Americas | West Central US | westcentralus
Americas | West US | westus
Americas | West US 2 | westus2
Americas | Canada Central | canadacentral
Americas | Brazil South | brazilsouth
Asia Pacific | East Asia | eastasia
Asia Pacific | Southeast Asia | southeastasia
Asia Pacific | Australia East | australiaeast
Asia Pacific | Central India | centralindia
Asia Pacific | Japan East | japaneast
Asia Pacific | Japan West | japanwest
Asia Pacific | Korea Central | koreacentral
Europe | North Europe | northeurope
Europe | West Europe | westeurope
Europe | France Central | francecentral
Europe | UK South | uksouth

Use these samples to create your access token request.

HTTP sample

This example is a simple HTTP request to get a token. Replace YOUR_SUBSCRIPTION_KEY with your Speech service subscription key. If your subscription isn't in the West US region, replace the Host header with your region's host name.

POST /sts/v1.0/issueToken HTTP/1.1
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
Host: westus.api.cognitive.microsoft.com
Content-type: application/x-www-form-urlencoded
Content-Length: 0

The body of the response contains the access token in JSON Web Token (JWT) format.

PowerShell sample

This example is a simple PowerShell script to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your Speech service subscription key. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to West US.

$FetchTokenHeader = @{
  'Content-type'='application/x-www-form-urlencoded';
  'Content-Length'= '0';
  'Ocp-Apim-Subscription-Key' = 'YOUR_SUBSCRIPTION_KEY'
}

$OAuthToken = Invoke-RestMethod -Method POST -Uri https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken -Headers $FetchTokenHeader

# show the token received
$OAuthToken

cURL sample

cURL is a command-line tool available in Linux (and in the Windows Subsystem for Linux). This cURL command illustrates how to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your Speech service subscription key. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to West US.

curl -v -X POST \
 "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken" \
 -H "Content-type: application/x-www-form-urlencoded" \
 -H "Content-Length: 0" \
 -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY"

C# sample

This C# class illustrates how to get an access token. Pass your Speech service subscription key when you instantiate the class. If your subscription isn't in the West US region, change the value of FetchTokenUri to match the region for your subscription.

public class Authentication
{
    public static readonly string FetchTokenUri =
        "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken";
    private string subscriptionKey;
    private string token;

    public Authentication(string subscriptionKey)
    {
        this.subscriptionKey = subscriptionKey;
        this.token = FetchTokenAsync(FetchTokenUri, subscriptionKey).Result;
    }

    public string GetAccessToken()
    {
        return this.token;
    }

    private async Task<string> FetchTokenAsync(string fetchUri, string subscriptionKey)
    {
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            UriBuilder uriBuilder = new UriBuilder(fetchUri);

            var result = await client.PostAsync(uriBuilder.Uri.AbsoluteUri, null);
            Console.WriteLine("Token Uri: {0}", uriBuilder.Uri.AbsoluteUri);
            return await result.Content.ReadAsStringAsync();
        }
    }
}

Python sample

# The requests module must be installed.
# Run pip install requests if necessary.
import requests

subscription_key = 'REPLACE_WITH_YOUR_KEY'


def get_token(subscription_key):
    fetch_token_url = 'https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken'
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key
    }
    response = requests.post(fetch_token_url, headers=headers)
    access_token = str(response.text)
    print(access_token)
    return access_token

How to use an access token

The access token should be sent to the service as the Authorization: Bearer <TOKEN> header. Each access token is valid for 10 minutes. You can get a new token at any time, but to minimize network traffic and latency, we recommend using the same token for nine minutes.

Here's a sample HTTP request to the text-to-speech REST API:

POST /cognitiveservices/v1 HTTP/1.1
Authorization: Bearer YOUR_ACCESS_TOKEN
Host: westus.stt.speech.microsoft.com
Content-type: application/ssml+xml
Content-Length: 199
Connection: Keep-Alive

// Message body here...
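
Because tokens expire after 10 minutes, a common pattern is to cache the token and refresh it shortly before expiry. The following is a minimal Python sketch of that pattern; it assumes the get_token function from the Python sample above returns the token string, and the nine-minute refresh interval follows the recommendation in this section.

import time

_cached_token = None
_token_fetched_at = 0.0

def get_cached_token(subscription_key):
    # Reuse the same token for up to nine minutes, then fetch a new one.
    global _cached_token, _token_fetched_at
    if _cached_token is None or time.monotonic() - _token_fetched_at > 9 * 60:
        _cached_token = get_token(subscription_key)
        _token_fetched_at = time.monotonic()
    return _cached_token

# Usage: add the token to the request headers.
# headers = {'Authorization': 'Bearer ' + get_cached_token(subscription_key)}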

Regions and endpoints

The endpoint for the REST API has this format:

https://<REGION_IDENTIFIER>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1

Replace <REGION_IDENTIFIER> with the identifier matching the region of your subscription from this table:

Geography | Region | Region identifier
Americas | Central US | centralus
Americas | East US | eastus
Americas | East US 2 | eastus2
Americas | North Central US | northcentralus
Americas | South Central US | southcentralus
Americas | West Central US | westcentralus
Americas | West US | westus
Americas | West US 2 | westus2
Americas | Canada Central | canadacentral
Americas | Brazil South | brazilsouth
Asia Pacific | East Asia | eastasia
Asia Pacific | Southeast Asia | southeastasia
Asia Pacific | Australia East | australiaeast
Asia Pacific | Central India | centralindia
Asia Pacific | Japan East | japaneast
Asia Pacific | Japan West | japanwest
Asia Pacific | Korea Central | koreacentral
Europe | North Europe | northeurope
Europe | West Europe | westeurope
Europe | France Central | francecentral
Europe | UK South | uksouth

Note

The language parameter must be appended to the URL to avoid receiving a 4xx HTTP error. For example, the language set to US English using the West US endpoint is: https://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US.

Query parameters

These parameters may be included in the query string of the REST request.

Parameter | Description | Required / Optional
language | Identifies the spoken language that is being recognized. See Supported languages. | Required
format | Specifies the result format. Accepted values are simple and detailed. Simple results include RecognitionStatus, DisplayText, Offset, and Duration. Detailed responses include four different representations of display text. The default setting is simple. | Optional
profanity | Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks; removed, which removes all profanity from the result; or raw, which includes the profanity in the result. The default setting is masked. | Optional
cid | When using the Custom Speech portal to create custom models, you can use custom models via their Endpoint ID found on the Deployment page. Use the Endpoint ID as the argument to the cid query string parameter. | Optional

Request headers

This table lists required and optional headers for speech-to-text requests.

Header | Description | Required / Optional
Ocp-Apim-Subscription-Key | Your Speech service subscription key. | Either this header or Authorization is required.
Authorization | An authorization token preceded by the word Bearer. For more information, see Authentication. | Either this header or Ocp-Apim-Subscription-Key is required.
Pronunciation-Assessment | Specifies the parameters for showing pronunciation scores in recognition results, which assess the pronunciation quality of speech input, with indicators of accuracy, fluency, completeness, and so on. This parameter is a base64-encoded JSON string containing multiple detailed parameters. See Pronunciation assessment parameters for how to build this header. | Optional
Content-type | Describes the format and codec of the provided audio data. Accepted values are audio/wav; codecs=audio/pcm; samplerate=16000 and audio/ogg; codecs=opus. | Required
Transfer-Encoding | Specifies that chunked audio data is being sent, rather than a single file. Only use this header if chunking audio data. | Optional
Expect | If using chunked transfer, send Expect: 100-continue. The Speech service acknowledges the initial request and awaits additional data. | Required if sending chunked audio data.
Accept | If provided, it must be application/json. The Speech service provides results in JSON. Some request frameworks provide an incompatible default value. It is good practice to always include Accept. | Optional, but recommended.
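
Putting the query parameters and headers together, the following is a minimal Python sketch of a recognition request using the requests library. The file name audio.wav, the westus region, and the simple key-based authentication are assumptions for illustration; substitute your own values.

import requests

region = 'westus'  # assumed region for this sketch
url = f'https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1'

params = {
    'language': 'en-US',   # required
    'format': 'detailed',  # optional: simple (default) or detailed
    'profanity': 'masked'  # optional: masked (default), removed, or raw
}

headers = {
    'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY',
    'Content-type': 'audio/wav; codecs=audio/pcm; samplerate=16000',
    'Accept': 'application/json'
}

# Read the whole WAV file into memory; see Chunked transfer below for streaming uploads.
with open('audio.wav', 'rb') as audio_file:
    response = requests.post(url, params=params, headers=headers, data=audio_file.read())

print(response.status_code)
print(response.json())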

Audio formats

Audio is sent in the body of the HTTP POST request. It must be in one of the formats in this table:

Format | Codec | Bit rate | Sample rate
WAV | PCM | 256 kbps | 16 kHz, mono
OGG | OPUS | 256 kbps | 16 kHz, mono
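
If you're unsure whether a WAV file matches the required format, the standard-library wave module can be used to inspect it. The sketch below is for illustration only; the file name audio.wav is an assumption.

import wave

# Inspect a WAV file's parameters before sending it to the service.
with wave.open('audio.wav', 'rb') as wav:
    print('Channels:', wav.getnchannels())      # expected: 1 (mono)
    print('Sample rate:', wav.getframerate())   # expected: 16000 Hz
    print('Sample width:', wav.getsampwidth())  # expected: 2 bytes (16-bit PCM, 256 kbps at 16 kHz)
    print('Compression:', wav.getcomptype())    # expected: 'NONE' (uncompressed PCM)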

Note

The above formats are supported through the REST API and WebSocket in the Speech service. The Speech SDK currently supports the WAV format with PCM codec as well as other formats.

Pronunciation assessment parameters

This table lists required and optional parameters for pronunciation assessment.

Parameter | Description | Required / Optional
ReferenceText | The text that the pronunciation will be evaluated against. | Required
GradingSystem | The point system for score calibration. Accepted values are FivePoint and HundredMark. The default setting is FivePoint. | Optional
Granularity | The evaluation granularity. Accepted values are Phoneme, which shows scores at the full-text, word, and phoneme levels; Word, which shows scores at the full-text and word levels; and FullText, which shows the score at the full-text level only. The default setting is Phoneme. | Optional
Dimension | Defines the output criteria. Accepted values are Basic, which shows the accuracy score only, and Comprehensive, which shows scores on more dimensions (for example, fluency score and completeness score at the full-text level, and error type at the word level). Check Response parameters to see definitions of the different score dimensions and word error types. The default setting is Basic. | Optional
EnableMiscue | Enables miscue calculation. When this is enabled, the pronounced words are compared to the reference text and marked with omission/insertion based on the comparison. Accepted values are False and True. The default setting is False. | Optional
ScenarioId | A GUID indicating a customized point system. | Optional

Below is an example JSON containing the pronunciation assessment parameters:

{
  "ReferenceText": "Good morning.",
  "GradingSystem": "HundredMark",
  "Granularity": "FullText",
  "Dimension": "Comprehensive"
}

The following sample code shows how to build the pronunciation assessment parameters into the Pronunciation-Assessment header:

var pronAssessmentParamsJson = $"{{\"ReferenceText\":\"Good morning.\",\"GradingSystem\":\"HundredMark\",\"Granularity\":\"FullText\",\"Dimension\":\"Comprehensive\"}}";
var pronAssessmentParamsBytes = Encoding.UTF8.GetBytes(pronAssessmentParamsJson);
var pronAssessmentHeader = Convert.ToBase64String(pronAssessmentParamsBytes);
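
For reference, an equivalent sketch in Python, using only the standard-library json and base64 modules and the same example parameters as above:

import base64
import json

pron_assessment_params = {
    'ReferenceText': 'Good morning.',
    'GradingSystem': 'HundredMark',
    'Granularity': 'FullText',
    'Dimension': 'Comprehensive'
}

# Base64-encode the JSON so it can be sent as the Pronunciation-Assessment header value.
pron_assessment_header = base64.b64encode(
    json.dumps(pron_assessment_params).encode('utf-8')).decode('ascii')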

We strongly recommend streaming (chunked) uploading while posting the audio data, which can significantly reduce the latency. See sample code in different programming languages for how to enable streaming.
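
In Python, passing a generator as the request body causes the requests library to send the data with Transfer-Encoding: chunked. The sketch below illustrates this, reusing url, params, and headers from the earlier request sketch; the file name audio.wav and the 1024-byte chunk size are assumptions for illustration.

def audio_chunks(path, chunk_size=1024):
    # Yield the audio file in small chunks so requests uses chunked transfer encoding.
    with open(path, 'rb') as audio_file:
        while True:
            chunk = audio_file.read(chunk_size)
            if not chunk:
                break
            yield chunk

# response = requests.post(url, params=params, headers=headers, data=audio_chunks('audio.wav'))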

Note

The pronunciation assessment feature is currently available only in the westus, eastasia, and centralindia regions, and only for the en-US language.

Sample request

The sample below includes the hostname and required headers. It's important to note that the service also expects audio data, which is not included in this sample. As mentioned earlier, chunking is recommended but not required.

POST speech/recognition/conversation/cognitiveservices/v1?language=en-US&format=detailed HTTP/1.1
Accept: application/json;text/xml
Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
Host: westus.stt.speech.microsoft.com
Transfer-Encoding: chunked
Expect: 100-continue

To enable pronunciation assessment, you can add the following header. See Pronunciation assessment parameters for how to build this header.

Pronunciation-Assessment: eyJSZWZlcm...

HTTP status codes

The HTTP status code for each response indicates success or common errors.

HTTP status code | Description | Possible reason
100 | Continue | The initial request has been accepted. Proceed with sending the rest of the data. (Used with chunked transfer.)
200 | OK | The request was successful; the response body is a JSON object.
400 | Bad request | Language code not provided, not a supported language, invalid audio file, etc.
401 | Unauthorized | Subscription key or authorization token is invalid in the specified region, or invalid endpoint.
403 | Forbidden | Missing subscription key or authorization token.
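
One way to translate these codes into Python exceptions might look like the following sketch. The check_response helper is hypothetical, and response is assumed to be a requests response object such as the one from the earlier sketch.

def check_response(response):
    # Map the common status codes from the table above to simple outcomes.
    if response.status_code == 200:
        return response.json()
    if response.status_code == 400:
        raise ValueError('Bad request: check the language parameter and audio format.')
    if response.status_code in (401, 403):
        raise PermissionError('Check the subscription key, token, region, and endpoint.')
    response.raise_for_status()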

Chunked transfer

Chunked transfer (Transfer-Encoding: chunked) can help reduce recognition latency. It allows the Speech service to begin processing the audio file while it is transmitted. The REST API does not provide partial or interim results.

This code sample shows how to send audio in chunks. Only the first chunk should contain the audio file's header. request is an HttpWebRequest object connected to the appropriate REST endpoint. audioFile is the path to an audio file on disk.

var request = (HttpWebRequest)HttpWebRequest.Create(requestUri);
request.SendChunked = true;
request.Accept = @"application/json;text/xml";
request.Method = "POST";
request.ProtocolVersion = HttpVersion.Version11;
request.Host = host;
request.ContentType = @"audio/wav; codecs=audio/pcm; samplerate=16000";
request.Headers["Ocp-Apim-Subscription-Key"] = "YOUR_SUBSCRIPTION_KEY";
request.AllowWriteStreamBuffering = false;

using (var fs = new FileStream(audioFile, FileMode.Open, FileAccess.Read))
{
    // Open a request stream and write 1024 byte chunks in the stream one at a time.
    byte[] buffer = null;
    int bytesRead = 0;
    using (var requestStream = request.GetRequestStream())
    {
        // Read 1024 raw bytes from the input audio file.
        buffer = new Byte[checked((uint)Math.Min(1024, (int)fs.Length))];
        while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) != 0)
        {
            requestStream.Write(buffer, 0, bytesRead);
        }

        requestStream.Flush();
    }
}

Response parameters

Results are provided as JSON. The simple format includes these top-level fields.

Parameter | Description
RecognitionStatus | Status, such as Success for successful recognition. See the next table.
DisplayText | The recognized text after capitalization, punctuation, inverse text normalization (conversion of spoken text to shorter forms, such as "200" for "two hundred" or "Dr. Smith" for "doctor smith"), and profanity masking. Present only on success.
Offset | The time (in 100-nanosecond units) at which the recognized speech begins in the audio stream.
Duration | The duration (in 100-nanosecond units) of the recognized speech in the audio stream.

The RecognitionStatus field may contain these values:

Status | Description
Success | The recognition was successful and the DisplayText field is present.
NoMatch | Speech was detected in the audio stream, but no words from the target language were matched. Usually means the recognition language is a different language from the one the user is speaking.
InitialSilenceTimeout | The start of the audio stream contained only silence, and the service timed out waiting for speech.
BabbleTimeout | The start of the audio stream contained only noise, and the service timed out waiting for speech.
Error | The recognition service encountered an internal error and could not continue. Try again if possible.

Note

If the audio consists only of profanity, and the profanity query parameter is set to removed, the service does not return a speech result.

The detailed format includes additional forms of recognized results. When using the detailed format, DisplayText is provided as Display for each result in the NBest list.

The object in the NBest list can include:

Parameter | Description
Confidence | The confidence score of the entry, from 0.0 (no confidence) to 1.0 (full confidence).
Lexical | The lexical form of the recognized text: the actual words recognized.
ITN | The inverse-text-normalized ("canonical") form of the recognized text, with phone numbers, numbers, abbreviations ("doctor smith" to "dr smith"), and other transformations applied.
MaskedITN | The ITN form with profanity masking applied, if requested.
Display | The display form of the recognized text, with punctuation and capitalization added. This parameter is the same as DisplayText provided when the format is set to simple.
AccuracyScore | Pronunciation accuracy of the speech. Accuracy indicates how closely the phonemes match a native speaker's pronunciation. Word-level and full-text-level accuracy scores are aggregated from phoneme-level accuracy scores.
FluencyScore | Fluency of the given speech. Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words.
CompletenessScore | Completeness of the speech, determined by calculating the ratio of pronounced words to the reference text input.
PronScore | Overall score indicating the pronunciation quality of the given speech. This is aggregated from AccuracyScore, FluencyScore, and CompletenessScore with weighting.
ErrorType | This value indicates whether a word is omitted, inserted, or badly pronounced, compared to ReferenceText. Possible values are None (meaning no error on this word), Omission, Insertion, and Mispronunciation.

Sample responses

A typical response for simple recognition:

{
  "RecognitionStatus": "Success",
  "DisplayText": "Remind me to buy 5 pencils.",
  "Offset": "1236645672289",
  "Duration": "1236645672289"
}
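
As a minimal sketch, a response like this could be consumed in Python as follows; response is assumed to be a requests response object from the earlier request sketch.

result = response.json()  # parse the JSON body shown above
if result.get('RecognitionStatus') == 'Success':
    print(result['DisplayText'])
else:
    print('Recognition was not successful:', result.get('RecognitionStatus'))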

A typical response for detailed recognition:

{
  "RecognitionStatus": "Success",
  "Offset": "1236645672289",
  "Duration": "1236645672289",
  "NBest": [
      {
        "Confidence" : "0.87",
        "Lexical" : "remind me to buy five pencils",
        "ITN" : "remind me to buy 5 pencils",
        "MaskedITN" : "remind me to buy 5 pencils",
        "Display" : "Remind me to buy 5 pencils.",
      }
  ]
}
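
When using the detailed format, one way to pick the top hypothesis is to sort the NBest list by Confidence. This is a minimal sketch; result is the parsed JSON from the earlier sketch, and note that Confidence appears as a string in the sample, so it is converted to a float.

nbest = result.get('NBest', [])
if nbest:
    # Choose the entry with the highest confidence score.
    best = max(nbest, key=lambda entry: float(entry['Confidence']))
    print(best['Display'])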

A typical response for recognition with pronunciation assessment:

{
  "RecognitionStatus": "Success",
  "Offset": "400000",
  "Duration": "11000000",
  "NBest": [
      {
        "Confidence" : "0.87",
        "Lexical" : "good morning",
        "ITN" : "good morning",
        "MaskedITN" : "good morning",
        "Display" : "Good morning.",
        "PronScore" : 84.4,
        "AccuracyScore" : 100.0,
        "FluencyScore" : 74.0,
        "CompletenessScore" : 100.0,
        "Words": [
            {
              "Word" : "Good",
              "AccuracyScore" : 100.0,
              "ErrorType" : "None",
              "Offset" : 500000,
              "Duration" : 2700000
            },
            {
              "Word" : "morning",
              "AccuracyScore" : 100.0,
              "ErrorType" : "None",
              "Offset" : 5300000,
              "Duration" : 900000
            }
        ]
      }
  ]
}

Next steps