Use pronunciation assessment

04/07/2024

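In this article, you learn how to evaluate pronunciation with speech to text through the Speech SDK. Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio.

Use pronunciation assessment in streaming mode

Pronunciation assessment supports uninterrupted streaming mode. With the Speech SDK, recording time can be unlimited. As long as you don't stop recording, the evaluation process doesn't finish, and you can conveniently pause and resume evaluation.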
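In streaming mode, for example with continuous recognition, the Speech SDK typically returns several recognition results, one per recognized segment. Each result carries its own pronunciation assessment JSON. The following Python sketch assumes you have already collected those JSON strings, for example in a `recognized` event handler. How you collect them is up to your application. The sketch gathers the word-level entries from `NBest[0].Words` across segments. The function names are hypothetical. The unweighted mean is only an illustration and is not how the service computes its own full-text scores.

```python
import json
from typing import Any


def collect_words(segment_results: list[str]) -> list[dict[str, Any]]:
    """Concatenate NBest[0].Words from several pronunciation assessment JSON strings.

    Each string is assumed to be one recognized segment's JSON result, shaped
    like the JSON result example in this article.
    """
    words: list[dict[str, Any]] = []
    for raw in segment_results:
        nbest = json.loads(raw).get("NBest") or []
        if not nbest:
            continue  # skip segments that carry no recognition hypothesis
        words.extend(nbest[0].get("Words", []))
    return words


def mean_word_accuracy(words: list[dict[str, Any]]) -> float:
    """Unweighted mean of word-level AccuracyScore values (illustration only)."""
    scores = [
        w["PronunciationAssessment"]["AccuracyScore"]
        for w in words
        if "AccuracyScore" in w.get("PronunciationAssessment", {})
    ]
    return sum(scores) / len(scores) if scores else 0.0
```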

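For information about the availability of pronunciation assessment, see supported languages and available regions.

As a baseline, usage of pronunciation assessment costs the same as speech to text for pay-as-you-go or commitment tier pricing. If you purchase a commitment tier for speech to text, the spend for pronunciation assessment goes toward meeting the commitment. For more information, see Pricing.

To learn how to use pronunciation assessment in streaming mode in your own application, see the sample code.

Set configuration parameters

Note

Pronunciation assessment isn't available with the Speech SDK for Go. You can read about the concepts in this guide, but select another programming language for your solution.

In the SpeechRecognizer, you can specify the language that you're learning or practicing to improve your pronunciation. The default locale is en-US. To learn how to specify the learning language for pronunciation assessment in your own application, see the sample code.

Tip

If you aren't sure which locale to set for a language that has multiple locales, try each locale separately. For instance, for Spanish, try es-ES and es-MX. Use whichever locale scores higher for your scenario.

You must create a PronunciationAssessmentConfig object. You can call EnableProsodyAssessment and EnableContentAssessmentWithTopic to enable prosody and content assessment. For more information, see Configuration methods.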

var pronunciationAssessmentConfig = new PronunciationAssessmentConfig( 
    referenceText: "", 
    gradingSystem: GradingSystem.HundredMark,  
    granularity: Granularity.Phoneme,  
    enableMiscue: false); 
pronunciationAssessmentConfig.EnableProsodyAssessment(); 
pronunciationAssessmentConfig.EnableContentAssessmentWithTopic("greeting");

auto pronunciationConfig = PronunciationAssessmentConfig::Create("", PronunciationAssessmentGradingSystem::HundredMark, PronunciationAssessmentGranularity::Phoneme, false); 
pronunciationConfig->EnableProsodyAssessment(); 
pronunciationConfig->EnableContentAssessmentWithTopic("greeting");

PronunciationAssessmentConfig pronunciationConfig = new PronunciationAssessmentConfig("", 
    PronunciationAssessmentGradingSystem.HundredMark, PronunciationAssessmentGranularity.Phoneme, false); 
pronunciationConfig.enableProsodyAssessment(); 
pronunciationConfig.enableContentAssessmentWithTopic("greeting");

pronunciation_config = speechsdk.PronunciationAssessmentConfig( 
    reference_text="", 
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark, 
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme, 
    enable_miscue=False) 
pronunciation_config.enable_prosody_assessment() 
pronunciation_config.enable_content_assessment_with_topic("greeting")

var pronunciationAssessmentConfig = new sdk.PronunciationAssessmentConfig(
    "",                                                   // referenceText
    sdk.PronunciationAssessmentGradingSystem.HundredMark, // gradingSystem
    sdk.PronunciationAssessmentGranularity.Phoneme,       // granularity
    false);                                               // enableMiscue
pronunciationAssessmentConfig.enableProsodyAssessment(); 
pronunciationAssessmentConfig.enableContentAssessmentWithTopic("greeting");

SPXPronunciationAssessmentConfiguration *pronunciationConfig = 
[[SPXPronunciationAssessmentConfiguration alloc] init:@"" gradingSystem:SPXPronunciationAssessmentGradingSystem_HundredMark granularity:SPXPronunciationAssessmentGranularity_Phoneme enableMiscue:false]; 
[pronunciationConfig enableProsodyAssessment]; 
[pronunciationConfig enableContentAssessmentWithTopic:@"greeting"];

let pronAssessmentConfig = try! SPXPronunciationAssessmentConfiguration("", 
    gradingSystem: .hundredMark, 
    granularity: .phoneme, 
    enableMiscue: false) 
pronAssessmentConfig.enableProsodyAssessment() 
pronAssessmentConfig.enableContentAssessment(withTopic: "greeting")

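This table lists some of the key configuration parameters for pronunciation assessment.

Parameter	Description
`ReferenceText`	The text that the pronunciation is evaluated against. The `ReferenceText` parameter is optional. Set the reference text if you want to run a scripted assessment for the reading language learning scenario. Don't set the reference text if you want to run an unscripted assessment. For pricing differences between scripted and unscripted assessment, see Pricing.
`GradingSystem`	The point system for score calibration. `FivePoint` gives a 0-5 floating point score, and `HundredMark` gives a 0-100 floating point score. Default: `FivePoint`.
`Granularity`	Determines the lowest level of evaluation granularity. Scores are returned for levels greater than or equal to the minimal value. Accepted values are `Phoneme`, which shows scores at the full text, word, syllable, and phoneme levels; `Word`, which shows scores at the full text and word levels; and `FullText`, which shows scores at the full text level only. The full reference text can be a word, a sentence, or a paragraph, depending on your input reference text. Default: `Phoneme`.
`EnableMiscue`	Enables miscue calculation when the pronounced words are compared to the reference text. Enabling miscue is optional. If this value is `True`, the `ErrorType` result value can be set to `Omission` or `Insertion` based on the comparison. Values are `False` and `True`. Default: `False`. To enable miscue calculation, set `EnableMiscue` to `True`. You can refer to the code snippets above the table.
`ScenarioId`	A GUID for a customized point system.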

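Configuration methods

The following table lists some of the optional methods that you can set for the PronunciationAssessmentConfig object.

Note

Content and prosody assessments are only available in the en-US locale.

To explore the content and prosody assessments, upgrade to Speech SDK version 1.35.0 or later.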

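Method	Description
`EnableProsodyAssessment`	Enables prosody assessment for your pronunciation evaluation. This feature assesses aspects like stress, intonation, speaking speed, and rhythm, and provides insights into the naturalness and expressiveness of your speech. Enabling prosody assessment is optional. If this method is called, the `ProsodyScore` result value is returned.
`EnableContentAssessmentWithTopic`	Enables content assessment. Content assessment is part of the unscripted assessment for the speaking language learning scenario. By providing a description, you can improve the assessment's understanding of the specific topic being spoken about. For example, in C# call `pronunciationAssessmentConfig.EnableContentAssessmentWithTopic("greeting");`. You can replace "greeting" with your own text that describes the topic. The description has no length limit, and currently only the `en-US` locale is supported.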

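Get pronunciation assessment results

When speech is recognized, you can request the pronunciation assessment results as SDK objects or as a JSON string.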

using (var speechRecognizer = new SpeechRecognizer(
    speechConfig,
    audioConfig))
{
    pronunciationAssessmentConfig.ApplyTo(speechRecognizer);
    var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();

    // The pronunciation assessment result as a Speech SDK object
    var pronunciationAssessmentResult =
        PronunciationAssessmentResult.FromResult(speechRecognitionResult);

    // The pronunciation assessment result as a JSON string
    var pronunciationAssessmentResultJson = speechRecognitionResult.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
}

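With the Speech SDK for C++, word, syllable, and phoneme results aren't available through SDK objects. Word, syllable, and phoneme results are only available in the JSON string.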

auto speechRecognizer = SpeechRecognizer::FromConfig(
    speechConfig,
    audioConfig);

pronunciationAssessmentConfig->ApplyTo(speechRecognizer);
auto speechRecognitionResult = speechRecognizer->RecognizeOnceAsync().get();

// The pronunciation assessment result as a Speech SDK object
auto pronunciationAssessmentResult =
    PronunciationAssessmentResult::FromResult(speechRecognitionResult);

// The pronunciation assessment result as a JSON string
auto pronunciationAssessmentResultJson = speechRecognitionResult->Properties.GetProperty(PropertyId::SpeechServiceResponse_JsonResult);

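To learn how to specify the learning language for pronunciation assessment in your own application, see the sample code.

For Android application development, the word, syllable, and phoneme results are available through SDK objects with the Speech SDK for Java. The results are also available in the JSON string. For Java Runtime (JRE) application development, the word, syllable, and phoneme results are only available in the JSON string.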

SpeechRecognizer speechRecognizer = new SpeechRecognizer(
    speechConfig,
    audioConfig);

pronunciationAssessmentConfig.applyTo(speechRecognizer);
Future<SpeechRecognitionResult> future = speechRecognizer.recognizeOnceAsync();
SpeechRecognitionResult speechRecognitionResult = future.get(30, TimeUnit.SECONDS);

// The pronunciation assessment result as a Speech SDK object
PronunciationAssessmentResult pronunciationAssessmentResult =
    PronunciationAssessmentResult.fromResult(speechRecognitionResult);

// The pronunciation assessment result as a JSON string
String pronunciationAssessmentResultJson = speechRecognitionResult.getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult);

speechRecognizer.close();
speechConfig.close();
audioConfig.close();
pronunciationAssessmentConfig.close();
speechRecognitionResult.close();

var speechRecognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

pronunciationAssessmentConfig.applyTo(speechRecognizer);

speechRecognizer.recognizeOnceAsync(
    (speechRecognitionResult: SpeechSDK.SpeechRecognitionResult) => {
        // The pronunciation assessment result as a Speech SDK object
        var pronunciationAssessmentResult = SpeechSDK.PronunciationAssessmentResult.fromResult(speechRecognitionResult);

        // The pronunciation assessment result as a JSON string
        var pronunciationAssessmentResultJson = speechRecognitionResult.properties.getProperty(SpeechSDK.PropertyId.SpeechServiceResponse_JsonResult);

        speechRecognizer.close();
    },
    (error: string) => {
        // Called if recognition fails
        console.error(error);
        speechRecognizer.close();
    });

speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_config)

pronunciation_assessment_config.apply_to(speech_recognizer)
speech_recognition_result = speech_recognizer.recognize_once()

# The pronunciation assessment result as a Speech SDK object
pronunciation_assessment_result = speechsdk.PronunciationAssessmentResult(speech_recognition_result)

# The pronunciation assessment result as a JSON string
pronunciation_assessment_result_json = speech_recognition_result.properties.get(speechsdk.PropertyId.SpeechServiceResponse_JsonResult)
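For the SDKs where word, syllable, and phoneme results are only available in the JSON string, you need to parse that string yourself. The following minimal Python sketch assumes the JSON layout shown in the JSON result example later in this article. `WordScore` and `parse_word_scores` are hypothetical names and aren't part of the Speech SDK. Fields that are missing, such as phonemes when a coarser granularity is requested, fall back to empty lists or defaults.

```python
import json
from typing import NamedTuple


class WordScore(NamedTuple):
    word: str
    accuracy: float
    error_type: str
    phonemes: list[tuple[str, float]]  # (phoneme, AccuracyScore) pairs


def parse_word_scores(result_json: str) -> list[WordScore]:
    """Extract word- and phoneme-level scores from a pronunciation assessment JSON string."""
    nbest = json.loads(result_json).get("NBest") or [{}]
    scores: list[WordScore] = []
    for w in nbest[0].get("Words", []):
        pa = w.get("PronunciationAssessment", {})
        # Phoneme entries aren't expected when Granularity is Word or FullText.
        phonemes = [
            (p["Phoneme"], p.get("PronunciationAssessment", {}).get("AccuracyScore", 0.0))
            for p in w.get("Phonemes", [])
        ]
        scores.append(
            WordScore(w["Word"], pa.get("AccuracyScore", 0.0), pa.get("ErrorType", "None"), phonemes)
        )
    return scores
```

For example, call `parse_word_scores(pronunciation_assessment_result_json)` with the string that the preceding Python snippet obtains.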

SPXSpeechRecognizer* speechRecognizer =
        [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig
                                              audioConfiguration:audioConfig];

[pronunciationAssessmentConfig applyToRecognizer:speechRecognizer];

SPXSpeechRecognitionResult *speechRecognitionResult = [speechRecognizer recognizeOnce];

// The pronunciation assessment result as a Speech SDK object
SPXPronunciationAssessmentResult* pronunciationAssessmentResult = [[SPXPronunciationAssessmentResult alloc] init:speechRecognitionResult];

// The pronunciation assessment result as a JSON string
NSString* pronunciationAssessmentResultJson = [speechRecognitionResult.properties getPropertyByName:SPXSpeechServiceResponseJsonResult];

let speechRecognizer = try! SPXSpeechRecognizer(speechConfiguration: speechConfig, audioConfiguration: audioConfig)

try! pronAssessmentConfig.apply(to: speechRecognizer)

let speechRecognitionResult = try? speechRecognizer.recognizeOnce()

// The pronunciation assessment result as a Speech SDK object
let pronunciationAssessmentResult = SPXPronunciationAssessmentResult(speechRecognitionResult!)

// The pronunciation assessment result as a JSON string
let pronunciationAssessmentResultJson = speechRecognitionResult!.properties?.getPropertyBy(SPXPropertyId.speechServiceResponseJsonResult)

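Result parameters

Depending on whether you're using scripted or unscripted assessment, you get different pronunciation assessment results. Scripted assessment is for the reading language learning scenario. Unscripted assessment is for the speaking language learning scenario.

Note

For pricing differences between scripted and unscripted assessment, see Pricing.

Scripted assessment results

This table lists some of the key pronunciation assessment results for the scripted assessment, or reading scenario.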

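Parameter	Description	Granularity
`AccuracyScore`	Pronunciation accuracy of the speech. Accuracy indicates how closely the phonemes match a native speaker's pronunciation. Syllable, word, and full text accuracy scores are aggregated from the phoneme-level accuracy score and refined with assessment objectives.	Phoneme level, Syllable level (en-US only), Word level, Full Text level
`FluencyScore`	Fluency of the given speech. Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words.	Full Text level
`CompletenessScore`	Completeness of the speech, calculated as the ratio of pronounced words to the words in the input reference text.	Full Text level
`ProsodyScore`	Prosody of the given speech. Prosody indicates how natural the given speech is, including stress, intonation, speaking speed, and rhythm.	Full Text level
`PronScore`	Overall score of the pronunciation quality of the given speech. `PronScore` is a weighted aggregate of `AccuracyScore`, `FluencyScore`, and `CompletenessScore`.	Full Text level
`ErrorType`	The error type compared to the reference text. Options include whether a word is omitted, inserted, or improperly inserted with a break. The value also indicates a missing break at punctuation, and whether a word is badly pronounced or monotonically rising, falling, or flat on the utterance. Possible values are `None` (no error on this word), `Omission`, `Insertion`, `Mispronunciation`, `UnexpectedBreak`, `MissingBreak`, and `Monotone`. The error type can be `Mispronunciation` when the pronunciation `AccuracyScore` for a word is below 60.	Word level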
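`ErrorType` is reported per word. The following minimal Python sketch assumes you already have the word entries (`NBest[0].Words` in the JSON result) as dictionaries. It tallies the error types and lists words whose accuracy falls below the 60-point mark mentioned in the table. The helper names are hypothetical. The 60 cutoff assumes the `HundredMark` grading system. This is only a client-side check and doesn't replace the `ErrorType` that the service returns.

```python
from collections import Counter
from typing import Any

# Cutoff taken from the ErrorType description; assumes HundredMark (0-100) scores.
MISPRONUNCIATION_CUTOFF = 60.0


def summarize_error_types(words: list[dict[str, Any]]) -> Counter:
    """Count ErrorType values over word entries (NBest[0].Words in the JSON result)."""
    return Counter(
        w.get("PronunciationAssessment", {}).get("ErrorType", "None") for w in words
    )


def low_accuracy_words(
    words: list[dict[str, Any]], cutoff: float = MISPRONUNCIATION_CUTOFF
) -> list[str]:
    """Words whose AccuracyScore is below the cutoff.

    Words without an AccuracyScore default to the cutoff itself, so they're skipped.
    """
    return [
        w["Word"]
        for w in words
        if w.get("PronunciationAssessment", {}).get("AccuracyScore", cutoff) < cutoff
    ]
```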

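Unscripted assessment results

This table lists some of the key pronunciation assessment results for the unscripted assessment, or speaking scenario.

VocabularyScore, GrammarScore, and TopicScore parameters roll up to the combined content assessment.

Note

Content and prosody assessments are only available in the en-US locale.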

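Response parameter	Description	Granularity
`AccuracyScore`	Pronunciation accuracy of the speech. Accuracy indicates how closely the phonemes match a native speaker's pronunciation. Syllable, word, and full text accuracy scores are aggregated from the phoneme-level accuracy score and refined with assessment objectives.	Phoneme level, Syllable level (en-US only), Word level, Full Text level
`FluencyScore`	Fluency of the given speech. Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words.	Full Text level
`ProsodyScore`	Prosody of the given speech. Prosody indicates how natural the given speech is, including stress, intonation, speaking speed, and rhythm.	Full Text level
`VocabularyScore`	Proficiency in lexical usage. It evaluates how effectively the speaker uses words, how appropriate the words are within the given context for expressing ideas accurately, and the level of lexical complexity.	Full Text level
`GrammarScore`	Correctness in using grammar and variety of sentence patterns. Grammatical errors are jointly evaluated by lexical accuracy, grammatical accuracy, and diversity of sentence structures.	Full Text level
`TopicScore`	Level of understanding of and engagement with the topic. It provides insight into the speaker's ability to express thoughts and ideas effectively and to engage with the topic.	Full Text level
`PronScore`	Overall score of the pronunciation quality of the given speech. This value is a weighted aggregate of `AccuracyScore`, `FluencyScore`, and `CompletenessScore`.	Full Text level
`ErrorType`	Indicates whether a word is badly pronounced, improperly inserted with a break, or missing a break at punctuation. It also indicates whether the pronunciation is monotonically rising, falling, or flat on the utterance. Possible values are `None` (no error on this word), `Mispronunciation`, `UnexpectedBreak`, `MissingBreak`, and `Monotone`.	Word level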

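The following table describes the prosody assessment results in more detail:

Field	Description
`ProsodyScore`	Prosody score of the entire utterance.
`Feedback`	Feedback at the word level, including `Break` and `Intonation`.
`Break`
`ErrorTypes`	Error types related to breaks, including `UnexpectedBreak` and `MissingBreak`. The current version doesn't provide the break error type. You need to set thresholds on the fields `UnexpectedBreak – Confidence` and `MissingBreak – confidence` to decide whether there's an unexpected break or a missing break before the word.
`UnexpectedBreak`	Indicates an unexpected break before the word.
`MissingBreak`	Indicates a missing break before the word.
`Thresholds`	The suggested threshold for both confidence scores is 0.75. If the value of `UnexpectedBreak – Confidence` is larger than 0.75, there's an unexpected break. If the value of `MissingBreak – confidence` is larger than 0.75, there's a missing break. Although 0.75 is the recommended value, it's better to adjust the thresholds for your own scenario. If you want different detection sensitivity for these two breaks, you can assign different thresholds to the `UnexpectedBreak - Confidence` and `MissingBreak - Confidence` fields.
`Intonation`	Indicates intonation in speech.
`ErrorTypes`	Error types related to intonation. Currently only `Monotone` is supported. If `Monotone` is present in the `ErrorTypes` field, the utterance is detected as monotonic. Monotone is detected on the whole utterance, but the tag is assigned to every word, so all the words in the same utterance share the same monotone detection information.
`Monotone`	Indicates monotonic speech.
`Thresholds (Monotone Confidence)`	The field `Monotone - SyllablePitchDeltaConfidence` is reserved for user-customized monotone detection. If you're unsatisfied with the provided monotone decision, adjust the thresholds on these fields to customize the detection to your preferences.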
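Because break error types must be decided on the client, the thresholding that this table describes can be sketched in Python as follows. The function takes the two confidence values as plain numbers. This article doesn't show the exact JSON path to `UnexpectedBreak – Confidence` and `MissingBreak – Confidence`, so extracting them is left to you. `BreakThresholds` and `classify_breaks` are hypothetical names. 0.75 is the suggested default from the table, and you can tune it separately for each break type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakThresholds:
    unexpected: float = 0.75  # suggested starting value from the table
    missing: float = 0.75


def classify_breaks(
    unexpected_confidence: float,
    missing_confidence: float,
    thresholds: BreakThresholds = BreakThresholds(),
) -> list[str]:
    """Return the break error types for one word ("larger than" the threshold, as in the table)."""
    errors: list[str] = []
    if unexpected_confidence > thresholds.unexpected:
        errors.append("UnexpectedBreak")
    if missing_confidence > thresholds.missing:
        errors.append("MissingBreak")
    return errors
```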

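JSON result example

The following example shows the scripted pronunciation assessment results for the spoken word "hello" as a JSON string.

The phoneme alphabet is IPA.
The syllables are returned alongside the phonemes for the same word.
You can use the Offset and Duration values to align syllables with their corresponding phonemes. For example, the starting offset (11700000) of the second syllable loʊ aligns with the third phoneme, l. The offset represents the time at which the recognized speech begins in the audio stream, measured in 100-nanosecond units. To learn more about Offset and Duration, see response properties.
There are five NBestPhonemes, which correspond to the number of spoken phonemes requested.
Within Phonemes, the most likely spoken phoneme was ə instead of the expected phoneme ɛ. The expected phoneme ɛ only received a confidence score of 47. Other potential matches received confidence scores of 52, 17, and 2.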

{
    "Id": "bbb42ea51bdb46d19a1d685e635fe173",
    "RecognitionStatus": 0,
    "Offset": 7500000,
    "Duration": 13800000,
    "DisplayText": "Hello.",
    "NBest": [
        {
            "Confidence": 0.975003,
            "Lexical": "hello",
            "ITN": "hello",
            "MaskedITN": "hello",
            "Display": "Hello.",
            "PronunciationAssessment": {
                "AccuracyScore": 100,
                "FluencyScore": 100,
                "CompletenessScore": 100,
                "PronScore": 100
            },
            "Words": [
                {
                    "Word": "hello",
                    "Offset": 7500000,
                    "Duration": 13800000,
                    "PronunciationAssessment": {
                        "AccuracyScore": 99.0,
                        "ErrorType": "None"
                    },
                    "Syllables": [
                        {
                            "Syllable": "hɛ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 91.0
                            },
                            "Offset": 7500000,
                            "Duration": 4100000
                        },
                        {
                            "Syllable": "loʊ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 100.0
                            },
                            "Offset": 11700000,
                            "Duration": 9600000
                        }
                    ],
                    "Phonemes": [
                        {
                            "Phoneme": "h",
                            "PronunciationAssessment": {
                                "AccuracyScore": 98.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "h",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "oʊ",
                                        "Score": 52.0
                                    },
                                    {
                                        "Phoneme": "ə",
                                        "Score": 35.0
                                    },
                                    {
                                        "Phoneme": "k",
                                        "Score": 23.0
                                    },
                                    {
                                        "Phoneme": "æ",
                                        "Score": 20.0
                                    }
                                ]
                            },
                            "Offset": 7500000,
                            "Duration": 3500000
                        },
                        {
                            "Phoneme": "ɛ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 47.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "ə",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "l",
                                        "Score": 52.0
                                    },
                                    {
                                        "Phoneme": "ɛ",
                                        "Score": 47.0
                                    },
                                    {
                                        "Phoneme": "h",
                                        "Score": 17.0
                                    },
                                    {
                                        "Phoneme": "æ",
                                        "Score": 2.0
                                    }
                                ]
                            },
                            "Offset": 11100000,
                            "Duration": 500000
                        },
                        {
                            "Phoneme": "l",
                            "PronunciationAssessment": {
                                "AccuracyScore": 100.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "l",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "oʊ",
                                        "Score": 46.0
                                    },
                                    {
                                        "Phoneme": "ə",
                                        "Score": 5.0
                                    },
                                    {
                                        "Phoneme": "ɛ",
                                        "Score": 3.0
                                    },
                                    {
                                        "Phoneme": "u",
                                        "Score": 1.0
                                    }
                                ]
                            },
                            "Offset": 11700000,
                            "Duration": 1100000
                        },
                        {
                            "Phoneme": "oʊ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 100.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "oʊ",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "d",
                                        "Score": 29.0
                                    },
                                    {
                                        "Phoneme": "t",
                                        "Score": 24.0
                                    },
                                    {
                                        "Phoneme": "n",
                                        "Score": 22.0
                                    },
                                    {
                                        "Phoneme": "l",
                                        "Score": 18.0
                                    }
                                ]
                            },
                            "Offset": 12900000,
                            "Duration": 8400000
                        }
                    ]
                }
            ]
        }
    ]
}
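The Offset-based alignment described before this example can be sketched in Python. The sketch assumes a word dictionary shaped like `NBest[0].Words[0]` above. It assigns each phoneme to the last syllable that starts at or before the phoneme's Offset, and it converts 100-nanosecond ticks to seconds. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import Any

TICKS_PER_SECOND = 10_000_000  # Offset and Duration use 100-nanosecond units


def ticks_to_seconds(ticks: int) -> float:
    return ticks / TICKS_PER_SECOND


def group_phonemes_by_syllable(word: dict[str, Any]) -> list[tuple[str, list[str]]]:
    """Assign each phoneme to the last syllable whose Offset is <= the phoneme's Offset."""
    syllables = sorted(word.get("Syllables", []), key=lambda s: s["Offset"])
    starts = [s["Offset"] for s in syllables]
    groups: list[tuple[str, list[str]]] = [(s["Syllable"], []) for s in syllables]
    for phoneme in word.get("Phonemes", []):
        index = bisect_right(starts, phoneme["Offset"]) - 1
        if index >= 0:  # phonemes that start before the first syllable are ignored
            groups[index][1].append(phoneme["Phoneme"])
    return groups
```

With the "hello" result above, h and ɛ fall under hɛ, and l and oʊ fall under loʊ.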

You can get pronunciation assessment scores for:

Full text
Words
Syllable groups
Phonemes in SAPI or IPA format

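Supported features per locale

The following table summarizes which features each locale supports. For more specifics, see the following sections.

Phoneme alphabet	IPA	SAPI
Phoneme name	`en-US`	`en-US`, `zh-CN`
Syllable group	`en-US`	`en-US`
Spoken phoneme	`en-US`	`en-US`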
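If your application changes its behavior by locale, you can encode this table as a lookup. The following Python sketch is a snapshot of the table above only. Feature keys such as `"syllable_group"` are hypothetical names. Check the current documentation, because locale support can change.

```python
# Snapshot of the "Supported features per locale" table; keys are hypothetical names.
FEATURE_LOCALES: dict[tuple[str, str], frozenset[str]] = {
    ("phoneme_name", "IPA"): frozenset({"en-US"}),
    ("phoneme_name", "SAPI"): frozenset({"en-US", "zh-CN"}),
    ("syllable_group", "IPA"): frozenset({"en-US"}),
    ("syllable_group", "SAPI"): frozenset({"en-US"}),
    ("spoken_phoneme", "IPA"): frozenset({"en-US"}),
    ("spoken_phoneme", "SAPI"): frozenset({"en-US"}),
}


def is_supported(feature: str, alphabet: str, locale: str) -> bool:
    """True if the table lists the locale for this feature and phoneme alphabet."""
    return locale in FEATURE_LOCALES.get((feature, alphabet.upper()), frozenset())
```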

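Syllable groups

Pronunciation assessment can provide syllable-level assessment results. A word is typically pronounced syllable by syllable rather than phoneme by phoneme, so grouping by syllable is more legible and better aligned with speaking habits.

Pronunciation assessment supports syllable groups only in en-US, with both IPA and SAPI.

The following table compares example phonemes with the corresponding syllables.

Sample word	Phonemes	Syllables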
technological	teknələdʒɪkl	tek·nə·lɑ·dʒɪkl
hello	hɛloʊ	hɛ·loʊ
luck	lʌk	lʌk
photosynthesis	foʊtəsɪnθəsɪs	foʊ·tə·sɪn·θə·sɪs

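To request syllable-level results along with phonemes, set the granularity configuration parameter to Phoneme.

Phoneme alphabet format

Pronunciation assessment supports phoneme names in en-US with IPA, and in en-US and zh-CN with SAPI.

For locales that support phoneme names, the phoneme name is provided together with the score. Phoneme names help identify which phonemes were pronounced accurately or inaccurately. For other locales, you can only get the phoneme score.

The following table compares example SAPI phonemes with the corresponding IPA phonemes.

Sample word	SAPI phonemes	IPA phonemes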
hello	h eh l ow	h ɛ l oʊ
luck	l ah k	l ʌ k
photosynthesis	f ow t ax s ih n th ax s ih s	f oʊ t ə s ɪ n θ ə s ɪ s
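In the rows above, each SAPI phoneme corresponds to exactly one IPA phoneme. The following Python sketch encodes only the phonemes that appear in this table. It's a partial mapping and raises `KeyError` for anything else. To get IPA output from the service, set the phoneme alphabet to IPA instead of converting on the client.

```python
# Partial SAPI-to-IPA mapping, built only from the sample words in the table above.
SAPI_TO_IPA = {
    "h": "h", "eh": "ɛ", "l": "l", "ow": "oʊ",  # hello
    "ah": "ʌ", "k": "k",                       # luck
    "f": "f", "t": "t", "ax": "ə", "s": "s",   # photosynthesis
    "ih": "ɪ", "n": "n", "th": "θ",
}


def sapi_to_ipa(sapi_phonemes: str) -> str:
    """Convert space-separated SAPI phonemes; raises KeyError for phonemes not in the table."""
    return " ".join(SAPI_TO_IPA[p] for p in sapi_phonemes.split())
```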

To request IPA phonemes, set the phoneme alphabet to IPA. If you don't specify the alphabet, the phonemes are in SAPI format by default.

pronunciationAssessmentConfig.PhonemeAlphabet = "IPA";

auto pronunciationAssessmentConfig = PronunciationAssessmentConfig::CreateFromJson("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\"}");

PronunciationAssessmentConfig pronunciationAssessmentConfig = PronunciationAssessmentConfig.fromJson("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\"}");

pronunciation_assessment_config = speechsdk.PronunciationAssessmentConfig(json_string="{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\"}")

var pronunciationAssessmentConfig = SpeechSDK.PronunciationAssessmentConfig.fromJSON("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\"}");

pronunciationAssessmentConfig.phonemeAlphabet = @"IPA";

pronunciationAssessmentConfig?.phonemeAlphabet = "IPA"
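The JSON strings in the preceding snippets are escaped by hand. In Python, you can build the same configuration with `json.dumps`, which avoids escaping mistakes. `build_pa_config_json` is a hypothetical helper in this minimal sketch. Its keys are the ones used in this article's JSON snippets, and `nBestPhonemeCount` comes from the spoken phoneme section. The defaults mirror these snippets rather than the service defaults. For example, the service default for `GradingSystem` is `FivePoint`.

```python
import json
from typing import Optional


def build_pa_config_json(
    reference_text: str = "",
    grading_system: str = "HundredMark",
    granularity: str = "Phoneme",
    phoneme_alphabet: Optional[str] = "IPA",
    nbest_phoneme_count: Optional[int] = None,
) -> str:
    """Build a pronunciation assessment JSON configuration string (hypothetical helper).

    Defaults mirror the snippets in this article, not the service defaults.
    """
    config: dict[str, object] = {
        "referenceText": reference_text,  # empty string, as in the unscripted snippets
        "gradingSystem": grading_system,
        "granularity": granularity,
    }
    if phoneme_alphabet is not None:
        config["phonemeAlphabet"] = phoneme_alphabet  # omit to get SAPI phonemes
    if nbest_phoneme_count is not None:
        config["nBestPhonemeCount"] = nbest_phoneme_count
    # ensure_ascii=False keeps non-ASCII reference text readable in the output
    return json.dumps(config, ensure_ascii=False)
```

For example: `speechsdk.PronunciationAssessmentConfig(json_string=build_pa_config_json("good morning"))`.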

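Assess spoken phonemes

With spoken phonemes, you can get confidence scores that indicate how likely it is that the spoken phonemes matched the expected phonemes.

Pronunciation assessment supports spoken phonemes in en-US, with both IPA and SAPI.

For example, to obtain the complete spoken sound for the word Hello, you can concatenate, for each expected phoneme, the spoken phoneme candidate with the highest confidence score. In the following assessment result, when you speak the word hello, the expected IPA phonemes are h ɛ l oʊ, but the actual spoken phonemes are h ə l oʊ. In this example, there are five possible candidates for each expected phoneme. The assessment result shows that the most likely spoken phoneme was ə instead of the expected phoneme ɛ. The expected phoneme ɛ only received a confidence score of 47. Other potential matches received confidence scores of 52, 17, and 2.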

{
    "Id": "bbb42ea51bdb46d19a1d685e635fe173",
    "RecognitionStatus": 0,
    "Offset": 7500000,
    "Duration": 13800000,
    "DisplayText": "Hello.",
    "NBest": [
        {
            "Confidence": 0.975003,
            "Lexical": "hello",
            "ITN": "hello",
            "MaskedITN": "hello",
            "Display": "Hello.",
            "PronunciationAssessment": {
                "AccuracyScore": 100,
                "FluencyScore": 100,
                "CompletenessScore": 100,
                "PronScore": 100
            },
            "Words": [
                {
                    "Word": "hello",
                    "Offset": 7500000,
                    "Duration": 13800000,
                    "PronunciationAssessment": {
                        "AccuracyScore": 99.0,
                        "ErrorType": "None"
                    },
                    "Syllables": [
                        {
                            "Syllable": "hɛ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 91.0
                            },
                            "Offset": 7500000,
                            "Duration": 4100000
                        },
                        {
                            "Syllable": "loʊ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 100.0
                            },
                            "Offset": 11700000,
                            "Duration": 9600000
                        }
                    ],
                    "Phonemes": [
                        {
                            "Phoneme": "h",
                            "PronunciationAssessment": {
                                "AccuracyScore": 98.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "h",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "oʊ",
                                        "Score": 52.0
                                    },
                                    {
                                        "Phoneme": "ə",
                                        "Score": 35.0
                                    },
                                    {
                                        "Phoneme": "k",
                                        "Score": 23.0
                                    },
                                    {
                                        "Phoneme": "æ",
                                        "Score": 20.0
                                    }
                                ]
                            },
                            "Offset": 7500000,
                            "Duration": 3500000
                        },
                        {
                            "Phoneme": "ɛ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 47.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "ə",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "l",
                                        "Score": 52.0
                                    },
                                    {
                                        "Phoneme": "ɛ",
                                        "Score": 47.0
                                    },
                                    {
                                        "Phoneme": "h",
                                        "Score": 17.0
                                    },
                                    {
                                        "Phoneme": "æ",
                                        "Score": 2.0
                                    }
                                ]
                            },
                            "Offset": 11100000,
                            "Duration": 500000
                        },
                        {
                            "Phoneme": "l",
                            "PronunciationAssessment": {
                                "AccuracyScore": 100.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "l",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "oʊ",
                                        "Score": 46.0
                                    },
                                    {
                                        "Phoneme": "ə",
                                        "Score": 5.0
                                    },
                                    {
                                        "Phoneme": "ɛ",
                                        "Score": 3.0
                                    },
                                    {
                                        "Phoneme": "u",
                                        "Score": 1.0
                                    }
                                ]
                            },
                            "Offset": 11700000,
                            "Duration": 1100000
                        },
                        {
                            "Phoneme": "oʊ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 100.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "oʊ",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "d",
                                        "Score": 29.0
                                    },
                                    {
                                        "Phoneme": "t",
                                        "Score": 24.0
                                    },
                                    {
                                        "Phoneme": "n",
                                        "Score": 22.0
                                    },
                                    {
                                        "Phoneme": "l",
                                        "Score": 18.0
                                    }
                                ]
                            },
                            "Offset": 12900000,
                            "Duration": 8400000
                        }
                    ]
                }
            ]
        }
    ]
}
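To reconstruct what was actually said, pick the highest-scoring candidate in `NBestPhonemes` for each expected phoneme, as described before this result. The following minimal Python sketch assumes a word dictionary shaped like `NBest[0].Words[0]` above. It falls back to the expected phoneme when no candidates were requested. The function names are hypothetical.

```python
from typing import Any


def spoken_phonemes(word: dict[str, Any]) -> list[str]:
    """For each expected phoneme, take the NBestPhonemes candidate with the highest Score."""
    spoken: list[str] = []
    for p in word.get("Phonemes", []):
        candidates = p.get("PronunciationAssessment", {}).get("NBestPhonemes", [])
        if candidates:
            spoken.append(max(candidates, key=lambda c: c["Score"])["Phoneme"])
        else:
            spoken.append(p["Phoneme"])  # no candidates requested: keep the expected phoneme
    return spoken


def mismatches(word: dict[str, Any]) -> list[tuple[str, str]]:
    """(expected, spoken) pairs where the top candidate differs from the expected phoneme."""
    return [
        (p["Phoneme"], s)
        for p, s in zip(word.get("Phonemes", []), spoken_phonemes(word))
        if p["Phoneme"] != s
    ]
```

For the "hello" result above, this yields h ə l oʊ, with a single mismatch of ɛ spoken as ə.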

To indicate whether to get confidence scores, and for how many potential spoken phonemes, set the NBestPhonemeCount parameter to an integer value such as 5.

pronunciationAssessmentConfig.NBestPhonemeCount = 5;

auto pronunciationAssessmentConfig = PronunciationAssessmentConfig::CreateFromJson("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\",\"nBestPhonemeCount\":5}");

PronunciationAssessmentConfig pronunciationAssessmentConfig = PronunciationAssessmentConfig.fromJson("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\",\"nBestPhonemeCount\":5}");

pronunciation_assessment_config = speechsdk.PronunciationAssessmentConfig(json_string="{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\",\"nBestPhonemeCount\":5}")

var pronunciationAssessmentConfig = SpeechSDK.PronunciationAssessmentConfig.fromJSON("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\",\"nBestPhonemeCount\":5}");

pronunciationAssessmentConfig.nbestPhonemeCount = 5;

pronunciationAssessmentConfig?.nbestPhonemeCount = 5

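Related content

Learn about the quality benchmark.
Try out pronunciation assessment in Speech Studio.
Check out an easy-to-deploy pronunciation assessment demo.
Watch the video demo of pronunciation assessment.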
