Lower speech synthesis latency using the Speech SDK

Article
01/22/2024

Synthesis latency is critical to your applications. This article introduces best practices to lower latency and deliver the best performance to your end users.

Normally, we measure latency by first byte latency and finish latency, as follows:

Latency	Description	SpeechSynthesisResult property key
first byte latency	Indicates the time delay between the start of the synthesis task and receipt of the first chunk of audio data.	SpeechServiceResponse_SynthesisFirstByteLatencyMs
finish latency	Indicates the time delay between the start of the synthesis task and receipt of all the synthesized audio data.	SpeechServiceResponse_SynthesisFinishLatencyMs

The Speech SDK puts the latency durations in the Properties collection of SpeechSynthesisResult. The following C# sample code shows these values.

var result = await synthesizer.SpeakTextAsync(text);
Console.WriteLine($"first byte latency: \t{result.Properties.GetProperty(PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs)} ms");
Console.WriteLine($"finish latency: \t{result.Properties.GetProperty(PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs)} ms");
// you can also get the result id, and send to us when you need help for diagnosis
var resultId = result.ResultId;

In C++, read the same latency values from the property bag of SpeechSynthesisResult as follows.

auto result = synthesizer->SpeakTextAsync(text).get();
auto firstByteLatency = std::stoi(result->Properties.GetProperty(PropertyId::SpeechServiceResponse_SynthesisFirstByteLatencyMs));
auto finishedLatency = std::stoi(result->Properties.GetProperty(PropertyId::SpeechServiceResponse_SynthesisFinishLatencyMs));
// you can also get the result id, and send to us when you need help for diagnosis
auto resultId = result->ResultId;

In Java, read the same latency values from the property bag of SpeechSynthesisResult as follows.

SpeechSynthesisResult result = synthesizer.SpeakTextAsync(text).get();
System.out.println("first byte latency: \t" + result.getProperties().getProperty(PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs) + " ms.");
System.out.println("finish latency: \t" + result.getProperties().getProperty(PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs) + " ms.");
// you can also get the result id, and send to us when you need help for diagnosis
String resultId = result.getResultId();

In Python, read the same latency values from the property bag of SpeechSynthesisResult as follows.

result = synthesizer.speak_text_async(text).get()
first_byte_latency = int(result.properties.get_property(speechsdk.PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs))
finished_latency = int(result.properties.get_property(speechsdk.PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs))
# you can also get the result id, and send to us when you need help for diagnosis
result_id = result.result_id

Latency	Description	SPXSpeechSynthesisResult property key
first byte latency	Indicates the time delay between the start of synthesis and receipt of the first audio chunk.	SPXSpeechServiceResponseSynthesisFirstByteLatencyMs
finish latency	Indicates the time delay between the start of synthesis and receipt of all the synthesized audio.	SPXSpeechServiceResponseSynthesisFinishLatencyMs

In Objective-C, the Speech SDK measures the latencies and puts them in the property bag of SPXSpeechSynthesisResult. Use the following code to read them.

SPXSpeechSynthesisResult *speechResult = [speechSynthesizer speakText:text];
// property values are returned as strings; convert them to integers
int firstByteLatency = [[speechResult.properties getPropertyById:SPXSpeechServiceResponseSynthesisFirstByteLatencyMs] intValue];
int finishedLatency = [[speechResult.properties getPropertyById:SPXSpeechServiceResponseSynthesisFinishLatencyMs] intValue];
// you can also get the result id, and send to us when you need help for diagnosis
NSString *resultId = speechResult.resultId;

In most cases, the first byte latency is lower than the finish latency. The first byte latency is independent of text length, whereas the finish latency increases with text length.
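The relationship between the two latencies can be illustrated with a toy model. The sketch below assumes a fixed startup cost followed by audio produced at a constant rate per character; the function name `simulate_latency` and all numbers (`startup_ms`, `ms_per_char`, `chunk_chars`) are hypothetical and are not measurements of the Speech service.

```python
def simulate_latency(text, startup_ms=150.0, ms_per_char=4.0, chunk_chars=25):
    """Return (first_byte_ms, finish_ms) under a toy constant-rate model.

    All parameters are made-up illustration values, not service figures.
    """
    # The first chunk covers at most chunk_chars characters of the input.
    first_chunk = min(len(text), chunk_chars)
    first_byte_ms = startup_ms + first_chunk * ms_per_char
    # The whole audio is ready only after every character has been processed.
    finish_ms = startup_ms + len(text) * ms_per_char
    return first_byte_ms, finish_ms


short = simulate_latency("a" * 50)
long = simulate_latency("a" * 500)
print("short text:", short)
print("long text: ", long)
```

In this model, once the text is longer than one chunk, the first byte latency stays the same while the finish latency grows linearly with the text length.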

Ideally, we want to minimize the user-experienced latency (the latency before the user hears sound) to one network round trip time plus the first audio chunk latency of the speech synthesis service.

Streaming

Streaming is critical to lowering latency. Client code can start playback as soon as the first audio chunk is received. In a service scenario, you can forward the audio chunks immediately to your clients instead of waiting for the whole audio.
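The difference between forwarding chunks immediately and waiting for the whole audio can be sketched without the SDK. In the example below, `fake_synthesis`, `relay_streaming`, `relay_buffered`, and `run` are hypothetical stand-ins for a synthesis source and a service relay; the log only records the order of events.

```python
from typing import Callable, Iterable, Iterator, List


def fake_synthesis(n_chunks: int, log: List[str]) -> Iterator[bytes]:
    """Stand-in for a synthesizer: yields 4-byte chunks one at a time."""
    for i in range(n_chunks):
        log.append(f"produced {i}")
        yield bytes([i]) * 4


def relay_streaming(chunks: Iterable[bytes], send: Callable[[bytes], None]) -> None:
    """Forward each chunk to the client as soon as it arrives."""
    for chunk in chunks:
        send(chunk)


def relay_buffered(chunks: Iterable[bytes], send: Callable[[bytes], None]) -> None:
    """Wait for the whole audio, then send it in one piece."""
    send(b"".join(chunks))


def run(relay, n_chunks: int = 3) -> List[str]:
    """Run a relay against the fake source and return the event log."""
    log: List[str] = []
    relay(fake_synthesis(n_chunks, log),
          lambda data: log.append(f"sent {len(data)} bytes"))
    return log


streaming_log = run(relay_streaming)
buffered_log = run(relay_buffered)
print(streaming_log)
print(buffered_log)
```

With the streaming relay, the first send happens right after the first chunk is produced; with the buffered relay, nothing is sent until the last chunk exists.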

You can use the PullAudioOutputStream, PushAudioOutputStream, Synthesizing event, and AudioDataStream of the Speech SDK to enable streaming.
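As one possible illustration, the following Python sketch uses the Synthesizing event to forward each audio chunk as it arrives. It assumes the `azure-cognitiveservices-speech` package and that each event exposes its chunk as `evt.result.audio_data`; the helper names `make_chunk_handler` and `synthesize_streaming`, and the `key` and `region` parameters, are hypothetical. The last lines exercise the handler offline with a stand-in event object, so no service call is made.

```python
from types import SimpleNamespace
from typing import Callable, List


def make_chunk_handler(on_chunk: Callable[[bytes], None]):
    """Build a Synthesizing event handler that forwards each audio chunk."""
    def handler(evt) -> None:
        # Assumption: evt.result.audio_data holds the bytes for this event.
        on_chunk(evt.result.audio_data)
    return handler


def synthesize_streaming(text: str, key: str, region: str,
                         on_chunk: Callable[[bytes], None]):
    """Synthesize text and pass audio chunks to on_chunk as they arrive.

    The SDK is imported lazily so the handler can be tested without it.
    """
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # audio_config=None: no playback on the default speaker.
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=None)
    synthesizer.synthesizing.connect(make_chunk_handler(on_chunk))
    return synthesizer.speak_text_async(text).get()


# Offline check of the handler with stand-in events (no service call).
received: List[bytes] = []
handler = make_chunk_handler(received.append)
handler(SimpleNamespace(result=SimpleNamespace(audio_data=b"\x00\x01")))
handler(SimpleNamespace(result=SimpleNamespace(audio_data=b"\x02")))
print(received)
```

In a service scenario, `on_chunk` could write to the client connection, so the client receives audio while synthesis is still in progress.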