Implement language identification

Article
02/13/2024

Language identification is used to identify the language spoken in audio by comparing it against a list of supported languages.

Language identification (LID) use cases include:

Speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.
Speech translation when you need to identify the language in an audio source and then translate it to another language.

For speech recognition, the initial latency is higher with language identification. Include this optional feature only when you need it.

Set configuration options

Whether you use language identification with speech to text or with speech translation, some concepts and configuration options are common to both.

Define a list of candidate languages that you expect in the audio.
Decide whether to use at-start or continuous language identification.

Then you make a recognize once or continuous recognition request to the Speech service.

Important

The language identification APIs are simplified in Speech SDK version 1.25 and later. The SpeechServiceConnection_SingleLanguageIdPriority and SpeechServiceConnection_ContinuousLanguageIdPriority properties have been removed. A single property, SpeechServiceConnection_LanguageIdMode, replaces them. You no longer need to prioritize between low latency and high accuracy. For continuous speech recognition or translation, you only need to choose whether to run at-start or continuous language identification.

This article provides code snippets to illustrate the concepts. Links to complete samples for each use case are provided.

Candidate languages

You provide candidate languages with the AutoDetectSourceLanguageConfig object. You expect that at least one of the candidates is in the audio. You can include up to four languages for at-start LID or up to 10 languages for continuous LID. The Speech service returns one of the provided candidate languages even if none of them is in the audio. For example, if fr-FR (French) and en-US (English) are provided as candidates, but German is spoken, the service returns either fr-FR or en-US.

You must provide the full locale with a dash (-) separator, but language identification only uses one locale per base language. Don't include multiple locales for the same language, for example, en-US and en-GB.

var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });

auto autoDetectSourceLanguageConfig = 
    AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });

auto_detect_source_language_config = \
    speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])

AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE", "zh-CN"));

var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE", "zh-CN"]);

NSArray *languages = @[@"en-US", @"de-DE", @"zh-CN"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
    [[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];

For more information, see supported languages.
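The candidate-list rules above (full locales, one locale per base language, at most four candidates for at-start LID or 10 for continuous LID) can be checked locally before a list is passed to AutoDetectSourceLanguageConfig. The following is a minimal sketch in plain Python. The function name validate_candidate_languages and its continuous parameter are hypothetical and not part of the Speech SDK, and the check only mirrors the limits stated in this article.

```python
def validate_candidate_languages(languages, continuous=False):
    """Pre-check a candidate list against the rules stated in this article.

    Hypothetical helper, not part of the Speech SDK. It performs no call to
    the Speech service and does not verify that a locale is supported.
    """
    # Limits taken from the article: 4 for at-start LID, 10 for continuous LID.
    limit = 10 if continuous else 4
    if not languages:
        raise ValueError("Provide at least one candidate language.")
    if len(languages) > limit:
        raise ValueError(
            f"Too many candidates: {len(languages)} (limit is {limit}).")

    seen_base = {}  # base language (lowercase) -> first locale that used it
    for locale in languages:
        base, sep, region = locale.partition("-")
        # A full locale needs a base language, a dash, and a region part.
        if not (base and sep and region):
            raise ValueError(f"'{locale}' is not a full locale such as 'en-US'.")
        key = base.lower()
        if key in seen_base:
            raise ValueError(
                f"'{locale}' and '{seen_base[key]}' share the base language '{key}'.")
        seen_base[key] = locale
    return list(languages)
```

For example, validate_candidate_languages(["en-US", "de-DE", "zh-CN"]) returns the list unchanged, while ["en-US", "en-GB"] raises a ValueError because both locales share the base language en.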

At-start and continuous language identification

Speech supports both at-start and continuous language identification (LID).

Note

Continuous language identification is only supported with Speech SDKs in C#, C++, Java (for speech to text only), JavaScript (for speech to text only), and Python.

At-start LID identifies the language once within the first few seconds of audio. Use at-start LID if the language in the audio doesn't change. With at-start LID, a single language is detected and returned in less than 5 seconds.
Continuous LID can identify multiple languages during the audio. Use continuous LID if the language in the audio could change. Continuous LID doesn't support changing languages within the same sentence. For example, if you're primarily speaking Spanish and insert some English words, it doesn't detect the language change per word.

You implement at-start LID or continuous LID by calling methods for recognize once or continuous recognition. Continuous LID is only supported with continuous recognition.

Recognize once or continuous

Language identification is completed with recognition objects and operations. Make a request to the Speech service for recognition of audio.

Note

Don't confuse recognition with identification. Recognition can be used with or without language identification.

Either call the "recognize once" method, or call the start and stop continuous recognition methods. You choose from:

Recognize once with at-start LID. Continuous LID isn't supported for recognize once.
Use continuous recognition with at-start LID.
Use continuous recognition with continuous LID.

The SpeechServiceConnection_LanguageIdMode property is only required for continuous LID. Without it, the Speech service defaults to at-start LID. The supported values are AtStart for at-start LID or Continuous for continuous LID.

// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
var result = await recognizer.RecognizeOnceAsync();

// Start and stop continuous recognition with At-start LID
await recognizer.StartContinuousRecognitionAsync();
await recognizer.StopContinuousRecognitionAsync();

// Start and stop continuous recognition with Continuous LID
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
await recognizer.StartContinuousRecognitionAsync();
await recognizer.StopContinuousRecognitionAsync();

// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
auto result = recognizer->RecognizeOnceAsync().get();

// Start and stop continuous recognition with At-start LID
recognizer->StartContinuousRecognitionAsync().get();
recognizer->StopContinuousRecognitionAsync().get();

// Start and stop continuous recognition with Continuous LID
speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
recognizer->StartContinuousRecognitionAsync().get();
recognizer->StopContinuousRecognitionAsync().get();

// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();

// Start and stop continuous recognition with At-start LID
recognizer.startContinuousRecognitionAsync().get();
recognizer.stopContinuousRecognitionAsync().get();

// Start and stop continuous recognition with Continuous LID
speechConfig.setProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
recognizer.startContinuousRecognitionAsync().get();
recognizer.stopContinuousRecognitionAsync().get();

# Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
result = recognizer.recognize_once()

# Start and stop continuous recognition with At-start LID
recognizer.start_continuous_recognition()
recognizer.stop_continuous_recognition()

# Start and stop continuous recognition with Continuous LID
speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')
recognizer.start_continuous_recognition()
recognizer.stop_continuous_recognition()
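The three supported combinations of recognition method and LID type can also be written as a small lookup that yields the value for SpeechServiceConnection_LanguageIdMode. This is a sketch in plain Python for illustration only. The function language_id_mode and its string arguments ("once", "continuous", "at-start") are hypothetical names, not part of the Speech SDK; only the returned values AtStart and Continuous come from this article.

```python
def language_id_mode(recognition, lid):
    """Return the LanguageIdMode value for a recognition/LID combination.

    Hypothetical helper, not part of the Speech SDK.
    recognition: "once" or "continuous"
    lid:         "at-start" or "continuous"
    """
    if recognition not in ("once", "continuous"):
        raise ValueError(f"Unknown recognition method: {recognition!r}")
    if lid not in ("at-start", "continuous"):
        raise ValueError(f"Unknown LID type: {lid!r}")

    if lid == "continuous":
        # Continuous LID is only supported with continuous recognition.
        if recognition == "once":
            raise ValueError("Continuous LID isn't supported for recognize once.")
        return "Continuous"

    # At-start LID is the service default, so setting this value is optional.
    return "AtStart"
```

The returned string could then be passed as the value argument of speech_config.set_property, as in the snippet above. Because AtStart is the default, the property only needs to be set when the function returns Continuous.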

Use speech to text

You use speech to text recognition when you need to identify the language in an audio source and then transcribe it to text. For more information, see Speech to text overview.

Note

Speech to text recognition with at-start language identification is supported with Speech SDKs in C#, C++, Python, Java, JavaScript, and Objective-C. Speech to text recognition with continuous language identification is only supported with Speech SDKs in C#, C++, Java, JavaScript, and Python.

Currently, for speech to text recognition with continuous language identification, you must create a SpeechConfig from the wss://{region}.stt.speech.microsoft.com/speech/universal/v2 endpoint string, as shown in the code examples. In a future SDK release, you won't need to set it.

See more examples of speech to text recognition with language identification on GitHub.

Recognize once
Continuous recognition

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey","YourServiceRegion");

var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(
        new string[] { "en-US", "de-DE", "zh-CN" });

using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using (var recognizer = new SpeechRecognizer(
    speechConfig,
    autoDetectSourceLanguageConfig,
    audioConfig))
{
    var speechRecognitionResult = await recognizer.RecognizeOnceAsync();
    var autoDetectSourceLanguageResult =
        AutoDetectSourceLanguageResult.FromResult(speechRecognitionResult);
    var detectedLanguage = autoDetectSourceLanguageResult.Language;
}

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

var region = "YourServiceRegion";
// Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
var endpointString = $"wss://{region}.stt.speech.microsoft.com/speech/universal/v2";
var endpointUrl = new Uri(endpointString);

var config = SpeechConfig.FromEndpoint(endpointUrl, "YourSubscriptionKey");

// Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
config.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");

var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });

var stopRecognition = new TaskCompletionSource<int>();
using (var audioInput = AudioConfig.FromWavFileInput(@"en-us_zh-cn.wav"))
{
    using (var recognizer = new SpeechRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
    {
        // Subscribes to events.
        recognizer.Recognizing += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizingSpeech)
            {
                Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
                var autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.FromResult(e.Result);
                Console.WriteLine($"DETECTED: Language={autoDetectSourceLanguageResult.Language}");
            }
        };

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                var autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.FromResult(e.Result);
                Console.WriteLine($"DETECTED: Language={autoDetectSourceLanguageResult.Language}");
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };

        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"CANCELED: Reason={e.Reason}");

            if (e.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
            }

            stopRecognition.TrySetResult(0);
        };

        recognizer.SessionStarted += (s, e) =>
        {
            Console.WriteLine("\n    Session started event.");
        };

        recognizer.SessionStopped += (s, e) =>
        {
            Console.WriteLine("\n    Session stopped event.");
            Console.WriteLine("\nStop recognition.");
            stopRecognition.TrySetResult(0);
        };

        // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopRecognition.Task });

        // Stops recognition.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
    }
}

See more examples of speech to text recognition with language identification on GitHub.

Recognize once
Continuous recognition

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;

auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey","YourServiceRegion");

auto autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });

auto recognizer = SpeechRecognizer::FromConfig(
    speechConfig,
    autoDetectSourceLanguageConfig
    );

auto speechRecognitionResult = recognizer->RecognizeOnceAsync().get();
auto autoDetectSourceLanguageResult =
    AutoDetectSourceLanguageResult::FromResult(speechRecognitionResult);
auto detectedLanguage = autoDetectSourceLanguageResult->Language;


// Creates an instance of a speech config with specified subscription key and service region.
// Note: For multi-lingual speech recognition with language id, it only works with speech v2 endpoint,
// you must use FromEndpoint api in order to use the speech v2 endpoint.

// Replace YourServiceRegion with your region, for example "westus", and
// replace YourSubscriptionKey with your own speech key.
string speechv2Endpoint = "wss://YourServiceRegion.stt.speech.microsoft.com/speech/universal/v2";
auto speechConfig = SpeechConfig::FromEndpoint(speechv2Endpoint, "YourSubscriptionKey");

// Set the mode of input language detection to either "AtStart" (the default) or "Continuous".
// Please refer to the documentation of Language ID for more information.
// https://aka.ms/speech/lid?pivots=programming-language-cpp
speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");

// Define the set of languages to detect
auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "zh-CN" });

// Creates a speech recognizer using file as audio input.
// Replace with your own audio file name.
auto audioInput = AudioConfig::FromWavFileInput("en-us_zh-cn.wav");
auto recognizer = SpeechRecognizer::FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioInput);

// promise for synchronization of recognition end.
promise<void> recognitionEnd;

// Subscribes to events.
recognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
    {
        auto lidResult = AutoDetectSourceLanguageResult::FromResult(e.Result);
        cout << "Recognizing in " << lidResult->Language << ": Text =" << e.Result->Text << std::endl;
    });

recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
    {
        if (e.Result->Reason == ResultReason::RecognizedSpeech)
        {
            auto lidResult = AutoDetectSourceLanguageResult::FromResult(e.Result);
            cout << "RECOGNIZED in " << lidResult->Language << ": Text=" << e.Result->Text << "\n"
                << "  Offset=" << e.Result->Offset() << "\n"
                << "  Duration=" << e.Result->Duration() << std::endl;
        }
        else if (e.Result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Speech could not be recognized." << std::endl;
        }
    });

recognizer->Canceled.Connect([&recognitionEnd](const SpeechRecognitionCanceledEventArgs& e)
    {
        cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;

        if (e.Reason == CancellationReason::Error)
        {
            cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << "\n"
                << "CANCELED: ErrorDetails=" << e.ErrorDetails << "\n"
                << "CANCELED: Did you update the subscription info?" << std::endl;

            recognitionEnd.set_value(); // Notify to stop recognition.
        }
    });

recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
    {
        cout << "Session stopped.";
        recognitionEnd.set_value(); // Notify to stop recognition.
    });

// Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
recognizer->StartContinuousRecognitionAsync().get();

// Waits for recognition end.
recognitionEnd.get_future().get();

// Stops recognition.
recognizer->StopContinuousRecognitionAsync().get();

See more examples of speech to text recognition with language identification on GitHub.

Recognize once
Continuous recognition

AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE"));

SpeechRecognizer recognizer = new SpeechRecognizer(
    speechConfig,
    autoDetectSourceLanguageConfig,
    audioConfig);

Future<SpeechRecognitionResult> future = recognizer.recognizeOnceAsync();
SpeechRecognitionResult result = future.get(30, TimeUnit.SECONDS);
AutoDetectSourceLanguageResult autoDetectSourceLanguageResult =
    AutoDetectSourceLanguageResult.fromResult(result);
String detectedLanguage = autoDetectSourceLanguageResult.getLanguage();

recognizer.close();
speechConfig.close();
autoDetectSourceLanguageConfig.close();
audioConfig.close();
result.close();

// Shows how to do continuous speech recognition on a multilingual audio file with continuous language detection. Here, we assume the
// spoken language in the file can alternate between English (US), Spanish (Mexico) and German.
// If specified, speech recognition will use the custom model associated with the detected language.
public static void continuousRecognitionFromFileWithContinuousLanguageDetectionWithCustomModels() throws InterruptedException, ExecutionException, IOException
{
    // Continuous language detection with speech recognition requires the application to set a V2 endpoint URL.
    // Replace the service (Azure) region with your own service region (e.g. "westus").
    String v2EndpointUrl = "wss://" + "YourServiceRegion" + ".stt.speech.microsoft.com/speech/universal/v2";

    // Creates an instance of a speech config with specified endpoint URL and subscription key. Replace with your own subscription key.
    SpeechConfig speechConfig = SpeechConfig.fromEndpoint(URI.create(v2EndpointUrl), "YourSubscriptionKey");

    // Change the default from at-start language detection to continuous language detection, since the spoken language in the audio
    // may change.
    speechConfig.setProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");

    // Define a set of expected spoken languages in the audio, with an optional custom model endpoint ID associated with each.
    // Update the below with your own languages. Please see https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support
    // for all supported languages.
    // Update the below with your own custom model endpoint IDs, or omit it if you want to use the standard model.
    List<SourceLanguageConfig> sourceLanguageConfigs = new ArrayList<SourceLanguageConfig>();
    sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("en-US", "YourEnUsCustomModelID"));
    sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("es-MX", "YourEsMxCustomModelID"));
    sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("de-DE"));

    // Creates an instance of AutoDetectSourceLanguageConfig with the above 3 source language configurations.
    AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs(sourceLanguageConfigs);

    // We provide a WAV file with English and Spanish utterances as an example. Replace with your own multilingual audio file name.
    AudioConfig audioConfig = AudioConfig.fromWavFileInput( "es-mx_en-us.wav");

    // Creates a speech recognizer using file as audio input and the AutoDetectSourceLanguageConfig
    SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, autoDetectSourceLanguageConfig, audioConfig);

    // Semaphore used to signal the call to stop continuous recognition (following either a session ended or a cancelled event)
    final Semaphore doneSemaphone = new Semaphore(0);

    // Subscribes to events.

    /* Uncomment this to see intermediate recognition results. Since this is verbose and the WAV file is long, it is commented out by default in this sample.
    speechRecognizer.recognizing.addEventListener((s, e) -> {
        AutoDetectSourceLanguageResult autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.fromResult(e.getResult());
        String language = autoDetectSourceLanguageResult.getLanguage();
        System.out.println(" RECOGNIZING: Text = " + e.getResult().getText());
        System.out.println(" RECOGNIZING: Language = " + language);
    });
    */

    speechRecognizer.recognized.addEventListener((s, e) -> {
        AutoDetectSourceLanguageResult autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.fromResult(e.getResult());
        String language = autoDetectSourceLanguageResult.getLanguage();
        if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
            System.out.println(" RECOGNIZED: Text = " + e.getResult().getText());
            System.out.println(" RECOGNIZED: Language = " + language);
        }
        else if (e.getResult().getReason() == ResultReason.NoMatch) {
            if (language == null || language.isEmpty() || language.toLowerCase().equals("unknown")) {
                System.out.println(" NOMATCH: Speech Language could not be detected.");
            }
            else {
                System.out.println(" NOMATCH: Speech could not be recognized.");
            }
        }
    });

    speechRecognizer.canceled.addEventListener((s, e) -> {
        System.out.println(" CANCELED: Reason = " + e.getReason());
        if (e.getReason() == CancellationReason.Error) {
            System.out.println(" CANCELED: ErrorCode = " + e.getErrorCode());
            System.out.println(" CANCELED: ErrorDetails = " + e.getErrorDetails());
            System.out.println(" CANCELED: Did you update the subscription info?");
        }
        doneSemaphone.release();
    });

    speechRecognizer.sessionStarted.addEventListener((s, e) -> {
        System.out.println("\n Session started event.");
    });

    speechRecognizer.sessionStopped.addEventListener((s, e) -> {
        System.out.println("\n Session stopped event.");
        doneSemaphone.release();
    });

    // Starts continuous recognition and wait for processing to end
    System.out.println(" Recognizing from WAV file... please wait");
    speechRecognizer.startContinuousRecognitionAsync().get();
    doneSemaphone.tryAcquire(30, TimeUnit.SECONDS);

    // Stop continuous recognition
    speechRecognizer.stopContinuousRecognitionAsync().get();

    // These objects must be closed in order to dispose underlying native resources
    speechRecognizer.close();
    speechConfig.close();
    audioConfig.close();
    for (SourceLanguageConfig sourceLanguageConfig : sourceLanguageConfigs)
    {
        sourceLanguageConfig.close();
    }
    autoDetectSourceLanguageConfig.close();
}

See more examples of speech to text recognition with language identification on GitHub.

Recognize once
Continuous recognition

auto_detect_source_language_config = \
        speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE"])
speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, 
        auto_detect_source_language_config=auto_detect_source_language_config, 
        audio_config=audio_config)
result = speech_recognizer.recognize_once()
auto_detect_source_language_result = speechsdk.AutoDetectSourceLanguageResult(result)
detected_language = auto_detect_source_language_result.language

import azure.cognitiveservices.speech as speechsdk
import time
import json

speech_key, service_region = "YourSubscriptionKey","YourServiceRegion"
weatherfilename="en-us_zh-cn.wav"

# Currently the v2 endpoint is required. In a future SDK release you won't need to set it. 
endpoint_string = "wss://{}.stt.speech.microsoft.com/speech/universal/v2".format(service_region)
speech_config = speechsdk.SpeechConfig(subscription=speech_key, endpoint=endpoint_string)
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

# Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')

auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "de-DE", "zh-CN"])

speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, 
    auto_detect_source_language_config=auto_detect_source_language_config,
    audio_config=audio_config)

done = False

def stop_cb(evt):
    """callback that signals to stop continuous recognition upon receiving an event `evt`"""
    print('CLOSING on {}'.format(evt))
    global done  # `done` is defined at module level, so `nonlocal` would be a SyntaxError here
    done = True

# Connect callbacks to the events fired by the speech recognizer
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
# stop continuous recognition on either session stopped or canceled events
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)

# Start continuous speech recognition
speech_recognizer.start_continuous_recognition()
while not done:
    time.sleep(.5)

speech_recognizer.stop_continuous_recognition()

NSArray *languages = @[@"en-US", @"de-DE", @"zh-CN"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
        [[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];
SPXSpeechRecognizer* speechRecognizer = \
        [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig
                           autoDetectSourceLanguageConfiguration:autoDetectSourceLanguageConfig
                                              audioConfiguration:audioConfig];
SPXSpeechRecognitionResult *result = [speechRecognizer recognizeOnce];
SPXAutoDetectSourceLanguageResult *languageDetectionResult = [[SPXAutoDetectSourceLanguageResult alloc] init:result];
NSString *detectedLanguage = [languageDetectionResult language];

var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE"]);
var speechRecognizer = SpeechSDK.SpeechRecognizer.FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
speechRecognizer.recognizeOnceAsync((result: SpeechSDK.SpeechRecognitionResult) => {
        var languageDetectionResult = SpeechSDK.AutoDetectSourceLanguageResult.fromResult(result);
        var detectedLanguage = languageDetectionResult.language;
},
{});

Speech to text custom models

Note

Language detection with custom models can only be used with real-time speech to text and speech translation. Batch transcription only supports language detection for default base models.

This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.

var sourceLanguageConfigs = new SourceLanguageConfig[]
{
    SourceLanguageConfig.FromLanguage("en-US"),
    SourceLanguageConfig.FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR")
};
var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromSourceLanguageConfigs(
        sourceLanguageConfigs);

std::vector<std::shared_ptr<SourceLanguageConfig>> sourceLanguageConfigs;
sourceLanguageConfigs.push_back(
    SourceLanguageConfig::FromLanguage("en-US"));
sourceLanguageConfigs.push_back(
    SourceLanguageConfig::FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));

auto autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig::FromSourceLanguageConfigs(
        sourceLanguageConfigs);

List<SourceLanguageConfig> sourceLanguageConfigs = new ArrayList<SourceLanguageConfig>();
sourceLanguageConfigs.add(
    SourceLanguageConfig.fromLanguage("en-US"));
sourceLanguageConfigs.add(
    SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));

AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs(
        sourceLanguageConfigs);

en_language_config = speechsdk.languageconfig.SourceLanguageConfig("en-US")
fr_language_config = speechsdk.languageconfig.SourceLanguageConfig("fr-FR", "The Endpoint Id for custom model of fr-FR")
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    sourceLanguageConfigs=[en_language_config, fr_language_config])

SPXSourceLanguageConfiguration* enLanguageConfig = [[SPXSourceLanguageConfiguration alloc]init:@"en-US"];
SPXSourceLanguageConfiguration* frLanguageConfig = \
        [[SPXSourceLanguageConfiguration alloc]initWithLanguage:@"fr-FR"
                                                     endpointId:@"The Endpoint Id for custom model of fr-FR"];
NSArray *languageConfigs = @[enLanguageConfig, frLanguageConfig];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
        [[SPXAutoDetectSourceLanguageConfiguration alloc]initWithSourceLanguageConfigurations:languageConfigs];

var enLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("en-US");
var frLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR");
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs([enLanguageConfig, frLanguageConfig]);

Run speech translation

Use speech translation when you need to identify the language in an audio source and then translate it to another language. For more information, see Speech translation overview.

Note

Speech translation with language identification is only supported with Speech SDKs in C#, C++, JavaScript, and Python. Currently, for speech translation with language identification, you must create a SpeechConfig from the wss://{region}.stt.speech.microsoft.com/speech/universal/v2 endpoint string, as shown in the code examples. In a future SDK release, you won't need to set it.

See more examples of speech translation with language identification on GitHub.

Recognize once
Continuous recognition

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;

public static async Task RecognizeOnceSpeechTranslationAsync()
{
    var region = "YourServiceRegion";
    // Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
    var endpointString = $"wss://{region}.stt.speech.microsoft.com/speech/universal/v2";
    var endpointUrl = new Uri(endpointString);

    var speechTranslationConfig = SpeechTranslationConfig.FromEndpoint(endpointUrl, "YourSubscriptionKey");

    // Source language is required, but currently ignored. 
    string fromLanguage = "en-US";
    speechTranslationConfig.SpeechRecognitionLanguage = fromLanguage;

    speechTranslationConfig.AddTargetLanguage("de");
    speechTranslationConfig.AddTargetLanguage("fr");

    var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });

    using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();

    using (var recognizer = new TranslationRecognizer(
        speechTranslationConfig, 
        autoDetectSourceLanguageConfig,
        audioConfig))
    {

        Console.WriteLine("Say something or read from file...");
        var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);

        if (result.Reason == ResultReason.TranslatedSpeech)
        {
            var lidResult = result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

            Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={result.Text}");
            foreach (var element in result.Translations)
            {
                Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
            }
        }
    }
}

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;

public static async Task MultiLingualTranslation()
{
    var region = "YourServiceRegion";
    // Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
    var endpointString = $"wss://{region}.stt.speech.microsoft.com/speech/universal/v2";
    var endpointUrl = new Uri(endpointString);

    var config = SpeechTranslationConfig.FromEndpoint(endpointUrl, "YourSubscriptionKey");

    // Source language is required, but currently ignored. 
    string fromLanguage = "en-US";
    config.SpeechRecognitionLanguage = fromLanguage;

    config.AddTargetLanguage("de");
    config.AddTargetLanguage("fr");

    // Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
    config.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
    var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });

    var stopTranslation = new TaskCompletionSource<int>();
    using (var audioInput = AudioConfig.FromWavFileInput(@"en-us_zh-cn.wav"))
    {
        using (var recognizer = new TranslationRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
        {
            recognizer.Recognizing += (s, e) =>
            {
                var lidResult = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

                Console.WriteLine($"RECOGNIZING in '{lidResult}': Text={e.Result.Text}");
                foreach (var element in e.Result.Translations)
                {
                    Console.WriteLine($"    TRANSLATING into '{element.Key}': {element.Value}");
                }
            };

            recognizer.Recognized += (s, e) => {
                if (e.Result.Reason == ResultReason.TranslatedSpeech)
                {
                    var lidResult = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

                    Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={e.Result.Text}");
                    foreach (var element in e.Result.Translations)
                    {
                        Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
                    }
                }
                else if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                    Console.WriteLine($"    Speech not translated.");
                }
                else if (e.Result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
            };

            recognizer.Canceled += (s, e) =>
            {
                Console.WriteLine($"CANCELED: Reason={e.Reason}");

                if (e.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                    Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                    Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                }

                stopTranslation.TrySetResult(0);
            };

            recognizer.SpeechStartDetected += (s, e) => {
                Console.WriteLine("\nSpeech start detected event.");
            };

            recognizer.SpeechEndDetected += (s, e) => {
                Console.WriteLine("\nSpeech end detected event.");
            };

            recognizer.SessionStarted += (s, e) => {
                Console.WriteLine("\nSession started event.");
            };

            recognizer.SessionStopped += (s, e) => {
                Console.WriteLine("\nSession stopped event.");
                Console.WriteLine($"\nStop translation.");
                stopTranslation.TrySetResult(0);
            };

            // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
            Console.WriteLine("Start translation...");
            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            Task.WaitAny(new[] { stopTranslation.Task });
            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }
}

See more examples of speech translation with language identification on GitHub.

Recognize once
Continuous recognition

auto region = "YourServiceRegion";
// Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
auto endpointString = std::format("wss://{}.stt.speech.microsoft.com/speech/universal/v2", region);
auto config = SpeechTranslationConfig::FromEndpoint(endpointString, "YourSubscriptionKey");

auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE" });

// Sets source and target languages.
// The source language will be detected by the language detection feature.
// However, SpeechRecognitionLanguage still needs to be set to a locale string, although it isn't used as the source language.
// This will be fixed in a future version of the Speech SDK.
auto fromLanguage = "en-US";
config->SetSpeechRecognitionLanguage(fromLanguage);
config->AddTargetLanguage("de");
config->AddTargetLanguage("fr");

// Creates a translation recognizer using microphone as audio input.
auto recognizer = TranslationRecognizer::FromConfig(config, autoDetectSourceLanguageConfig);
cout << "Say something...\n";

// Starts translation, and returns after a single utterance is recognized. The end of a
// single utterance is determined by listening for silence at the end or until a maximum of 15
// seconds of audio is processed. The task returns the recognized text as well as the translation.
// Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
// shot recognition like command or query.
// For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
auto result = recognizer->RecognizeOnceAsync().get();

// Checks result.
if (result->Reason == ResultReason::TranslatedSpeech)
{
    cout << "RECOGNIZED: Text=" << result->Text << std::endl;

    for (const auto& it : result->Translations)
    {
        cout << "TRANSLATED into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
    }
}
else if (result->Reason == ResultReason::RecognizedSpeech)
{
    cout << "RECOGNIZED: Text=" << result->Text << " (text could not be translated)" << std::endl;
}
else if (result->Reason == ResultReason::NoMatch)
{
    cout << "NOMATCH: Speech could not be recognized." << std::endl;
}
else if (result->Reason == ResultReason::Canceled)
{
    auto cancellation = CancellationDetails::FromResult(result);
    cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

    if (cancellation->Reason == CancellationReason::Error)
    {
        cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
        cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
        cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
    }
}

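#include <format>
#include <future>
#include <iostream>
#include <string>
#include <speechapi_cxx.h>

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;
using namespace Microsoft::CognitiveServices::Speech::Translation;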

void MultiLingualTranslation()
{
    auto region = "YourServiceRegion";
    // Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
    auto endpointString = std::format("wss://{}.stt.speech.microsoft.com/speech/universal/v2", region);
    auto config = SpeechTranslationConfig::FromEndpoint(endpointString, "YourSubscriptionKey");

    // Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
    config->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
    auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });

    promise<void> recognitionEnd;
    // Source language is required, but currently ignored. 
    auto fromLanguage = "en-US";
    config->SetSpeechRecognitionLanguage(fromLanguage);
    config->AddTargetLanguage("de");
    config->AddTargetLanguage("fr");

    auto audioInput = AudioConfig::FromWavFileInput("whatstheweatherlike.wav");
    auto recognizer = TranslationRecognizer::FromConfig(config, autoDetectSourceLanguageConfig, audioInput);

    recognizer->Recognizing.Connect([](const TranslationRecognitionEventArgs& e)
        {
            std::string lidResult = e.Result->Properties.GetProperty(PropertyId::SpeechServiceConnection_AutoDetectSourceLanguageResult);

            cout << "Recognizing in Language = "<< lidResult << ":" << e.Result->Text << std::endl;
            for (const auto& it : e.Result->Translations)
            {
                cout << "  Translated into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
            }
        });

    recognizer->Recognized.Connect([](const TranslationRecognitionEventArgs& e)
        {
            if (e.Result->Reason == ResultReason::TranslatedSpeech)
            {
                std::string lidResult = e.Result->Properties.GetProperty(PropertyId::SpeechServiceConnection_AutoDetectSourceLanguageResult);
                cout << "RECOGNIZED in Language = " << lidResult << ": Text=" << e.Result->Text << std::endl;
            }
            else if (e.Result->Reason == ResultReason::RecognizedSpeech)
            {
                cout << "RECOGNIZED: Text=" << e.Result->Text << " (text could not be translated)" << std::endl;
            }
            else if (e.Result->Reason == ResultReason::NoMatch)
            {
                cout << "NOMATCH: Speech could not be recognized." << std::endl;
            }

            for (const auto& it : e.Result->Translations)
            {
                cout << "  Translated into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
            }
        });

    recognizer->Canceled.Connect([&recognitionEnd](const TranslationRecognitionCanceledEventArgs& e)
        {
            cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
            if (e.Reason == CancellationReason::Error)
            {
                cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << e.ErrorDetails << std::endl;
                cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;

                recognitionEnd.set_value();
            }
        });

    recognizer->Synthesizing.Connect([](const TranslationSynthesisEventArgs& e)
        {
            auto size = e.Result->Audio.size();
            cout << "Translation synthesis result: size of audio data: " << size
                << (size == 0 ? "(END)" : "");
        });

    recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
        {
            cout << "Session stopped.";
            recognitionEnd.set_value();
        });

    // Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
    recognizer->StartContinuousRecognitionAsync().get();
    recognitionEnd.get_future().get();
    recognizer->StopContinuousRecognitionAsync().get();
}

See more examples of speech translation with language identification on GitHub.

Recognize once
Continuous recognition

import azure.cognitiveservices.speech as speechsdk
import time
import json

speech_key, service_region = "YourSubscriptionKey","YourServiceRegion"
weatherfilename="en-us_zh-cn.wav"

# set up translation parameters: source language and target languages
# Currently the v2 endpoint is required. In a future SDK release you won't need to set it. 
endpoint_string = "wss://{}.stt.speech.microsoft.com/speech/universal/v2".format(service_region)
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key,
    endpoint=endpoint_string,
    speech_recognition_language='en-US',
    target_languages=('de', 'fr'))
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

# Specify the AutoDetectSourceLanguageConfig, which defines the candidate languages
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])

# Creates a translation recognizer using an audio file as input.
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config, 
    audio_config=audio_config,
    auto_detect_source_language_config=auto_detect_source_language_config)

# Starts translation, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed. The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = recognizer.recognize_once()

# Check the result
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print("""Recognized: {}
    German translation: {}
    French translation: {}""".format(
        result.text, result.translations['de'], result.translations['fr']))
elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
    detectedSrcLang = result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
    print("Detected Language: {}".format(detectedSrcLang))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    print("Translation canceled: {}".format(result.cancellation_details.reason))
    if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(result.cancellation_details.error_details))
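
The candidate list passed to AutoDetectSourceLanguageConfig is bounded: this article states a limit of four languages for at-start LID and 10 languages for continuous LID. A minimal sketch of a pre-flight check follows. The names validate_candidates and MAX_CANDIDATES are hypothetical and introduced here for illustration; the mode strings mirror the AtStart and Continuous values of the SpeechServiceConnection_LanguageIdMode property.

```python
# Limits as stated in this article: 4 candidates for at-start LID, 10 for continuous LID.
MAX_CANDIDATES = {"AtStart": 4, "Continuous": 10}


def validate_candidates(languages, mode="AtStart"):
    """Return a de-duplicated candidate list, or raise ValueError if it breaks a limit."""
    if mode not in MAX_CANDIDATES:
        raise ValueError(f"unknown LanguageIdMode: {mode!r}")
    # dict.fromkeys keeps the first occurrence of each locale and preserves order.
    unique = list(dict.fromkeys(languages))
    if not unique:
        raise ValueError("provide at least one candidate language")
    if len(unique) > MAX_CANDIDATES[mode]:
        raise ValueError(
            f"{mode} LID accepts at most {MAX_CANDIDATES[mode]} candidates, got {len(unique)}"
        )
    return unique


# The validated list can then be passed as the languages argument of the config.
candidates = validate_candidates(["en-US", "de-DE", "zh-CN"])
print(candidates)
```

Running the check before constructing the recognizer surfaces an oversized list locally instead of at request time.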

import azure.cognitiveservices.speech as speechsdk
import time
import json

speech_key, service_region = "YourSubscriptionKey","YourServiceRegion"
weatherfilename="en-us_zh-cn.wav"

# Currently the v2 endpoint is required. In a future SDK release you won't need to set it. 
endpoint_string = "wss://{}.stt.speech.microsoft.com/speech/universal/v2".format(service_region)
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key,
    endpoint=endpoint_string,
    speech_recognition_language='en-US',
    target_languages=('de', 'fr'))
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

# Set the LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
translation_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')

# Specify the AutoDetectSourceLanguageConfig, which defines the candidate languages
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])

# Creates a translation recognizer using an audio file as input.
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config, 
    audio_config=audio_config,
    auto_detect_source_language_config=auto_detect_source_language_config)

def result_callback(event_type, evt):
    """callback to display a translation result"""
    print("{}: {}\n\tTranslations: {}\n\tResult Json: {}".format(
        event_type, evt, evt.result.translations.items(), evt.result.json))

done = False

def stop_cb(evt):
    """callback that signals to stop continuous recognition upon receiving an event `evt`"""
    print('CLOSING on {}'.format(evt))
    global done
    done = True

# connect callback functions to the events fired by the recognizer
recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
# event for intermediate results
recognizer.recognizing.connect(lambda evt: result_callback('RECOGNIZING', evt))
# event for final result
recognizer.recognized.connect(lambda evt: result_callback('RECOGNIZED', evt))
# cancellation event
recognizer.canceled.connect(lambda evt: print('CANCELED: {} ({})'.format(evt, evt.reason)))

# stop continuous recognition on either session stopped or canceled events
recognizer.session_stopped.connect(stop_cb)
recognizer.canceled.connect(stop_cb)

def synthesis_callback(evt):
    """
    callback for the synthesis event
    """
    print('SYNTHESIZING {}\n\treceived {} bytes of audio. Reason: {}'.format(
        evt, len(evt.result.audio), evt.result.reason))
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("RECOGNIZED: {}".format(evt.result.properties))
        if evt.result.properties.get(speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult) is None:
            print("Unable to detect any language")
        else:
            detectedSrcLang = evt.result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
            jsonResult = evt.result.properties[speechsdk.PropertyId.SpeechServiceResponse_JsonResult]
            detailResult = json.loads(jsonResult)
            startOffset = detailResult['Offset']
            duration = detailResult['Duration']
            if duration >= 0:
                endOffset = duration + startOffset
            else:
                endOffset = 0
            print("Detected language = " + detectedSrcLang + ", startOffset = " + str(startOffset) + " ticks, endOffset = " + str(endOffset) + " ticks, Duration = " + str(duration) + " ticks (1 tick = 100 nanoseconds).")
            global language_detected
            language_detected = True

# connect callback to the synthesis event
recognizer.synthesizing.connect(synthesis_callback)

# start translation
recognizer.start_continuous_recognition()

while not done:
    time.sleep(.5)

recognizer.stop_continuous_recognition()
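
The synthesis callback in the sample above reads Offset and Duration from the JSON result. As a sketch, assuming those values are expressed in 100-nanosecond ticks (the unit the Speech SDK reference uses for result offsets and durations), the hypothetical helper below converts them to seconds. The JSON payload in the usage lines is invented sample data, not service output.

```python
import json

TICKS_PER_SECOND = 10_000_000  # assumption: 1 tick = 100 nanoseconds


def segment_bounds_seconds(json_result: str):
    """Return (start, end) in seconds from a result JSON with Offset and Duration ticks."""
    detail = json.loads(json_result)
    start_ticks = detail["Offset"]
    duration_ticks = detail["Duration"]
    return (start_ticks / TICKS_PER_SECOND,
            (start_ticks + duration_ticks) / TICKS_PER_SECOND)


# Invented sample payload for illustration only.
sample = '{"Offset": 5000000, "Duration": 12500000}'
start, end = segment_bounds_seconds(sample)
print(f"segment from {start:.2f}s to {end:.2f}s")
```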

Run and use a container

Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of a key and region.

When you run language ID in a container, use the SourceLanguageRecognizer object instead of SpeechRecognizer or TranslationRecognizer.

For more information about containers, see the language identification speech containers how-to guide.
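
The container setup described in this section can be sketched in Python as follows: the configuration is created from the container host URL rather than a key and region, and a SourceLanguageRecognizer performs the identification. The helper names, the ws://localhost:5003 host, and the file name are illustrative assumptions; substitute the host and port that your container actually exposes. The Speech SDK is imported inside the function so that the URL helper can be used without the SDK installed.

```python
def container_host_url(hostname: str, port: int) -> str:
    """Build the websocket host URL of a Speech container (hypothetical helper)."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"ws://{hostname}:{port}"


def identify_language_in_container(host_url, wav_path, languages):
    """Run one language identification pass against a container host.

    Returns the detected locale string, or None if no speech was recognized.
    """
    # Imported here so that container_host_url works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    # Host URL instead of subscription key and region.
    speech_config = speechsdk.SpeechConfig(host=host_url)
    auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=languages)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)

    # SourceLanguageRecognizer replaces SpeechRecognizer or TranslationRecognizer.
    recognizer = speechsdk.SourceLanguageRecognizer(
        speech_config=speech_config,
        auto_detect_source_language_config=auto_detect_config,
        audio_config=audio_config)

    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.properties.get(
            speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult)
    return None
```

For example, identify_language_in_container(container_host_url("localhost", 5003), "sample.wav", ["en-US", "de-DE"]) would return the detected locale string, or None when nothing is recognized.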

Implement speech to text batch transcription

To identify languages with the Batch transcription REST API, use the languageIdentification property in the body of your Transcriptions_Create request.

Warning

Batch transcription supports language identification only for default base models. If both language identification and a custom model are specified in the transcription request, the service falls back to the base models for the specified candidate languages. This fallback might produce unexpected recognition results.

If your speech to text scenario requires both language identification and custom models, use real-time speech to text instead of batch transcription.

The following example shows the usage of the languageIdentification property with four candidate languages. For more information about request properties, see Create a batch transcription.

{
    <...>

    "properties": {
        <...>

        "languageIdentification": {
            "candidateLocales": [
                "en-US",
                "ja-JP",
                "zh-CN",
                "hi-IN"
            ]
        },
        <...>
    }
}
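
As a sketch of how such a request body might be assembled in Python, the hypothetical helper below fills the elided parts with the contentUrls, locale, and displayName fields of a Transcriptions_Create request. The helper name, the display name, and the audio URL are placeholders introduced for illustration; check the Transcriptions_Create reference for the full set of properties.

```python
import json


def build_transcription_body(content_urls, locale, candidate_locales,
                             display_name="LID batch transcription"):
    """Return a Transcriptions_Create request body with language identification enabled."""
    return {
        "contentUrls": list(content_urls),
        "locale": locale,
        "displayName": display_name,
        "properties": {
            "languageIdentification": {
                "candidateLocales": list(candidate_locales),
            },
        },
    }


body = build_transcription_body(
    ["https://example.com/audio/sample.wav"],  # placeholder URL
    "en-US",
    ["en-US", "ja-JP", "zh-CN", "hi-IN"],
)
print(json.dumps(body, indent=4))
```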
