Pronunciation Assessment Results in Japanese Have Phonemes Empty

Glen Wang 20 Reputation points
2024-04-23T04:09:14.57+00:00

I'm experimenting with Azure Speech-to-Text (STT) pronunciation assessment for Japanese in Speech Studio.

However, in the JSON output the "Phoneme" value for every Japanese character is an empty string (""), yet each syllable and each phoneme still has an AccuracyScore.

By contrast, for Mandarin the pronunciation assessment correctly returns a phoneme for every Chinese character.

Here is an example from a Japanese pronunciation assessment (the entry for the word 写真):

```json
{
    "Word": "写真",
    "Offset": 4300000,
    "Duration": 7700000,
    "PronunciationAssessment": {
        "AccuracyScore": 90,
        "ErrorType": "None"
    },
    "Syllables": [
        {
            "Syllable": "",
            "PronunciationAssessment": {
                "AccuracyScore": 97
            },
            "Offset": 4300000,
            "Duration": 7700000
        }
    ],
    "Phonemes": [
        {
            "Phoneme": "",
            "PronunciationAssessment": {
                "AccuracyScore": 100
            },
            "Offset": 4300000,
            "Duration": 3100000
        },
        {
            "Phoneme": "",
            "PronunciationAssessment": {
                "AccuracyScore": 100
            },
            "Offset": 7500000,
            "Duration": 1300000
        },
        {
            "Phoneme": "",
            "PronunciationAssessment": {
                "AccuracyScore": 75
            },
            "Offset": 8900000,
            "Duration": 700000
        },
        {
            "Phoneme": "",
            "PronunciationAssessment": {
                "AccuracyScore": 100
            },
            "Offset": 9700000,
            "Duration": 1300000
        },
        {
            "Phoneme": "",
            "PronunciationAssessment": {
                "AccuracyScore": 100
            },
            "Offset": 11100000,
            "Duration": 900000
        }
    ]
}
```
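
To check this pattern across a whole result rather than by eye, a short Python sketch (standard library only) can load the phoneme list for the 写真 entry and count the empty names. The embedded JSON is trimmed to the fields the sketch reads, and `summarize_phonemes` is a hypothetical helper, not part of the Speech SDK.

```python
import json

# The 写真 word entry from the post, trimmed to the fields this sketch reads.
WORD_JSON = """
{
  "Word": "写真",
  "Phonemes": [
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100}, "Offset": 4300000, "Duration": 3100000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100}, "Offset": 7500000, "Duration": 1300000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 75}, "Offset": 8900000, "Duration": 700000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100}, "Offset": 9700000, "Duration": 1300000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100}, "Offset": 11100000, "Duration": 900000}
  ]
}
"""


def summarize_phonemes(word: dict) -> dict:
    """Hypothetical helper: count empty phoneme names and collect the scores."""
    phonemes = word.get("Phonemes", [])
    # An entry counts as "empty" when its Phoneme name is missing or "".
    empty = [p for p in phonemes if p.get("Phoneme", "") == ""]
    scores = [p["PronunciationAssessment"]["AccuracyScore"] for p in phonemes]
    return {
        "word": word.get("Word"),
        "total": len(phonemes),
        "empty_names": len(empty),
        "scores": scores,
    }


if __name__ == "__main__":
    print(summarize_phonemes(json.loads(WORD_JSON)))
```

For this entry all five phoneme names are empty while all five scores are present, which matches what the question describes.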

1 answer

  1. navba-MSFT 17,125 Reputation points Microsoft Employee
    2024-04-23T05:26:22.5933333+00:00

    @Glen Wang Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!

    The Azure Speech Service’s Pronunciation Assessment feature does support the Japanese locale:

    More info here: [https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=pronunciation-assessment](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=pronunciation-assessment)

    However, according to the documentation, phoneme names are currently returned only in the International Phonetic Alphabet (IPA) format for English (en-US) and in the Speech Application Programming Interface (SAPI) format for English (en-US) and Chinese (zh-CN).
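
If client code needs to decide in advance whether to display phoneme names, one option is a small lookup table built from the documented support described above. The table and the helpers `phoneme_alphabets` and `expects_phoneme_names` are hypothetical illustrations hard-coded from this answer (April 2024); the language-support page linked above remains the source of truth and may have changed since.

```python
# Hypothetical table, hard-coded from the documentation as of April 2024:
# IPA phoneme names for en-US, SAPI phoneme names for en-US and zh-CN.
SUPPORTED_PHONEME_ALPHABETS = {
    "en-US": frozenset({"IPA", "SAPI"}),
    "zh-CN": frozenset({"SAPI"}),
}


def phoneme_alphabets(locale: str) -> frozenset:
    """Return the alphabets in which phoneme names are documented for `locale`.

    An empty result (e.g. for ja-JP) means the "Phoneme" fields should be
    treated as unavailable, even if per-phoneme scores are present.
    """
    return SUPPORTED_PHONEME_ALPHABETS.get(locale, frozenset())


def expects_phoneme_names(locale: str) -> bool:
    """True when at least one phoneme alphabet is documented for `locale`."""
    return bool(phoneme_alphabets(locale))


if __name__ == "__main__":
    for loc in ("en-US", "zh-CN", "ja-JP"):
        print(loc, sorted(phoneme_alphabets(loc)), expects_phoneme_names(loc))
```

A UI could use `expects_phoneme_names` to hide the phoneme-name column for Japanese and show only the scores.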

    This means that while the service can assess Japanese pronunciation and still returns per-phoneme accuracy scores, offsets and durations, phoneme names are only available in IPA (for en-US) or SAPI (for en-US and zh-CN). There is currently no phoneme alphabet for Japanese, which is likely why the “Phoneme” fields in your JSON output are empty strings for Japanese text.
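
Even without names, the per-phoneme scores and timings can still point to the weakest part of a word. The sketch below assumes that Offset and Duration are in 100-nanosecond ticks (10,000 ticks per millisecond), as in Speech SDK results; `weakest_segment` is a hypothetical helper, and the phoneme list is copied from the question.

```python
# Offset/Duration in Speech results are in 100-ns ticks: 10,000 ticks = 1 ms.
TICKS_PER_MS = 10_000

# Phoneme entries for 写真, copied from the question.
PHONEMES = [
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100}, "Offset": 4300000, "Duration": 3100000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100}, "Offset": 7500000, "Duration": 1300000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 75}, "Offset": 8900000, "Duration": 700000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100}, "Offset": 9700000, "Duration": 1300000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100}, "Offset": 11100000, "Duration": 900000},
]


def weakest_segment(phonemes: list) -> tuple:
    """Hypothetical helper: return (index, score, start_ms, end_ms) of the
    lowest-scoring phoneme. On ties, the earliest phoneme wins."""
    if not phonemes:
        raise ValueError("no phoneme entries")
    idx, worst = min(
        enumerate(phonemes),
        key=lambda item: item[1]["PronunciationAssessment"]["AccuracyScore"],
    )
    start_ms = worst["Offset"] / TICKS_PER_MS
    end_ms = (worst["Offset"] + worst["Duration"]) / TICKS_PER_MS
    return idx, worst["PronunciationAssessment"]["AccuracyScore"], start_ms, end_ms


if __name__ == "__main__":
    i, score, start_ms, end_ms = weakest_segment(PHONEMES)
    print(f"phoneme #{i + 1}: score {score}, {start_ms:.0f}-{end_ms:.0f} ms")
```

With the values from the question this reports the third phoneme (score 75) between 890 ms and 960 ms. It does not say which kana that segment corresponds to; that mapping would have to come from elsewhere.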

    More info here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-pronunciation-assessment?pivots=programming-language-csharp#phoneme-alphabet-format

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
