Pronunciation Assessment Results in Japanese Have Phonemes Empty

Question

I'm experimenting with Azure Speech STT pronunciation assessment for Japanese, in Speech Studio.

However, the JSON output shows that the "Syllable" and "Phoneme" values for every Japanese word are empty strings (""). Yet each syllable and phoneme still has an AccuracyScore.

But in Mandarin, the pronunciation assessment correctly displays phonemes for every Chinese character.

Here is an example from the Japanese pronunciation assessment output:


                    {
                        "Word": "写真",
                        "Offset": 4300000,
                        "Duration": 7700000,
                        "PronunciationAssessment": {
                            "AccuracyScore": 90,
                            "ErrorType": "None"
                        },
                        "Syllables": [
                            {
                                "Syllable": "",
                                "PronunciationAssessment": {
                                    "AccuracyScore": 97
                                },
                                "Offset": 4300000,
                                "Duration": 7700000
                            }
                        ],
                        "Phonemes": [
                            {
                                "Phoneme": "",
                                "PronunciationAssessment": {
                                    "AccuracyScore": 100
                                },
                                "Offset": 4300000,
                                "Duration": 3100000
                            },
                            {
                                "Phoneme": "",
                                "PronunciationAssessment": {
                                    "AccuracyScore": 100
                                },
                                "Offset": 7500000,
                                "Duration": 1300000
                            },
                            {
                                "Phoneme": "",
                                "PronunciationAssessment": {
                                    "AccuracyScore": 75
                                },
                                "Offset": 8900000,
                                "Duration": 700000
                            },
                            {
                                "Phoneme": "",
                                "PronunciationAssessment": {
                                    "AccuracyScore": 100
                                },
                                "Offset": 9700000,
                                "Duration": 1300000
                            },
                            {
                                "Phoneme": "",
                                "PronunciationAssessment": {
                                    "AccuracyScore": 100
                                },
                                "Offset": 11100000,
                                "Duration": 900000
                            }
                        ]
                    }

Answer

@Glen Wang Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

The Azure Speech service's Pronunciation Assessment feature does support the Japanese locale:

More info here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=pronunciation-assessment

However, as per the documentation, it currently supports phoneme names in the International Phonetic Alphabet (IPA) format for English (en-US) and the Speech Application Programming Interface (SAPI) format for English (en-US) and Chinese (zh-CN).

This means that while the service can assess pronunciation in Japanese, it returns phoneme names only in the IPA format (for English) or the SAPI format (for English and Chinese). It does not currently provide phoneme names for Japanese. This is likely why the "Phoneme" fields in your JSON output are empty for Japanese text, even though the accuracy scores are populated.

More info here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-pronunciation-assessment?pivots=programming-language-csharp#phoneme-alphabet-format
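
Even with empty phoneme names, the per-phoneme scores and timings in your output are still usable by position. The following is only a minimal sketch, not an official sample: it uses Python's standard `json` module on a trimmed copy of the word entry from your question. The helper name `summarize_phonemes` is hypothetical, and the conversion to milliseconds assumes that `Offset` and `Duration` are expressed in 100-nanosecond ticks.

```python
import json

# Trimmed copy of the "写真" word entry from the question; field names are
# taken from that output, nothing here is fetched from the service.
WORD_JSON = """
{
  "Word": "写真",
  "PronunciationAssessment": {"AccuracyScore": 90, "ErrorType": "None"},
  "Phonemes": [
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100},
     "Offset": 4300000, "Duration": 3100000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100},
     "Offset": 7500000, "Duration": 1300000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 75},
     "Offset": 8900000, "Duration": 700000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100},
     "Offset": 9700000, "Duration": 1300000},
    {"Phoneme": "", "PronunciationAssessment": {"AccuracyScore": 100},
     "Offset": 11100000, "Duration": 900000}
  ]
}
"""

TICKS_PER_MS = 10_000  # assumption: offsets/durations are 100-ns ticks


def summarize_phonemes(word):
    """Hypothetical helper: list each phoneme by position with score and timing.

    An empty "Phoneme" string is reported as None so callers can tell
    "name not provided" apart from a real phoneme label.
    """
    rows = []
    for index, phoneme in enumerate(word.get("Phonemes", [])):
        rows.append({
            "index": index,
            "name": phoneme.get("Phoneme") or None,
            "score": phoneme["PronunciationAssessment"]["AccuracyScore"],
            "start_ms": phoneme["Offset"] / TICKS_PER_MS,
            "duration_ms": phoneme["Duration"] / TICKS_PER_MS,
        })
    return rows


if __name__ == "__main__":
    for row in summarize_phonemes(json.loads(WORD_JSON)):
        print(row)
```

Identifying phonemes by index and time range rather than by name lets you locate the weakest segment of a word (the third phoneme in this example) until phoneme names become available for Japanese.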

Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
