What is Conversation Transcription?

Conversation Transcription is an advanced feature of the Speech Services that combines real-time speech recognition, speaker identification, and diarization. It is well suited to transcribing in-person meetings: because it can distinguish speakers, it tells you who said what and when, so participants can focus on the meeting and quickly follow up on next steps. This feature also improves accessibility. With transcription, you can actively engage participants with hearing impairments.

Conversation Transcription delivers accurate recognition with customizable speech models that you can tailor to understand industry-specific and company-specific vocabulary. Additionally, you can pair Conversation Transcription with the Speech Devices SDK to optimize the experience for multi-microphone devices.

Note

Currently, Conversation Transcription is recommended for small meetings. If you'd like to extend Conversation Transcription to large meetings at scale, please contact us.

This diagram illustrates the hardware, software, and services that work together with Conversation Transcription.

[Figure: Conversation Transcription diagram]

Important

A circular seven-microphone array with a specific geometry configuration is required. For specification and design details, see Microsoft Speech Device SDK Microphone. To learn more or purchase a development kit, see Get Microsoft Speech Device SDK.

Get started with Conversation Transcription

There are three steps to take to get started with Conversation Transcription.

  1. Collect voice samples from users.
  2. Generate user profiles using the user voice samples.
  3. Use the Speech SDK to identify users (speakers) and transcribe speech.

Collect user voice samples

The first step is to collect audio recordings from each user. User speech should be recorded in a quiet environment without background noise. The recommended length for each audio sample is between 30 seconds and two minutes. Longer audio samples improve accuracy when identifying speakers. Audio must be mono channel with a 16 kHz sample rate.
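The requirements above can be checked before a recording is submitted. The following is a minimal sketch using Python's standard `wave` module, assuming the samples are stored as WAV files. The function names `check_voice_sample` and `make_silent_wav` are hypothetical helpers introduced for illustration, and the 16-bit sample width used by the test helper is an assumption rather than a documented requirement.

```python
import io
import wave

# Limits taken from the guidance above.
MIN_SECONDS = 30.0        # recommended minimum length
MAX_SECONDS = 120.0       # recommended maximum length
REQUIRED_RATE_HZ = 16000  # required sample rate
REQUIRED_CHANNELS = 1     # required channel count (mono)


def check_voice_sample(wav_file):
    """Return a list of problems found in a WAV voice sample (empty if none).

    `wav_file` is a path or a binary file-like object, as accepted by wave.open.
    """
    with wave.open(wav_file, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / float(rate)

    problems = []
    if channels != REQUIRED_CHANNELS:
        problems.append(f"audio must be mono, found {channels} channels")
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sample rate must be 16 kHz, found {rate} Hz")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(
            f"recommended length is 30 to 120 seconds, found {seconds:.1f} s"
        )
    return problems


def make_silent_wav(seconds, rate=16000, channels=1):
    """Build an in-memory WAV file of silence, for trying out the check.

    Uses 16-bit samples, which is an assumption made for this helper only.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for problem in check_voice_sample(make_silent_wav(5)):
        print(problem)
```

The length check reports a deviation from the recommendation, while the channel and sample rate checks reflect hard requirements; a caller may want to treat the two differently.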

Beyond this guidance, how you record and store the audio is up to you; a secure database is recommended. The next section reviews how this audio is used to generate user profiles, which the Speech SDK uses to recognize speakers.

Generate user profiles

Next, you'll need to send the audio recordings you've collected to the Signature Generation Service to validate the audio and generate user profiles. The Signature Generation Service is a set of REST APIs that allow you to generate and retrieve user profiles.

To create a user profile, you'll need to use the GenerateVoiceSignature API. Specification details and sample code are available.
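As a rough illustration of what calling a REST API such as GenerateVoiceSignature might involve, the sketch below builds an HTTP POST request that carries one voice sample. It only constructs the request and does not send it. Everything specific here is an assumption and not taken from the API specification: the endpoint URL is a placeholder supplied by the caller, the `Ocp-Apim-Subscription-Key` header name and the `application/octet-stream` content type are guesses based on common REST conventions, and `build_signature_request` is a hypothetical helper name. Consult the API specification for the real values and the response format.

```python
import urllib.request


def build_signature_request(endpoint_url, subscription_key, wav_bytes):
    """Build (but do not send) a POST request carrying one voice sample.

    All protocol details below are assumptions for illustration only.
    """
    headers = {
        # Assumed header name for the subscription key.
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumed content type for raw audio bytes.
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(
        endpoint_url, data=wav_bytes, headers=headers, method="POST"
    )


if __name__ == "__main__":
    # Placeholder URL and key; replace with values from the API specification.
    request = build_signature_request(
        "https://example.invalid/GenerateVoiceSignature",
        "YOUR_SUBSCRIPTION_KEY",
        b"RIFF...",  # stand-in for the bytes of a validated WAV sample
    )
    print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect or log the request before any audio leaves your system.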

Note

Conversation Transcription is currently available in "en-US" and "zh-CN" in the following regions: centralus and eastasia.
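An application can guard against unsupported configurations before it tries to connect. The following is a small sketch that encodes the languages and regions listed in the note above; the function name `is_supported` is hypothetical, and the case-insensitive comparison is a convenience chosen for this example, not documented service behavior.

```python
# Languages and regions listed in the note above.
SUPPORTED_LOCALES = {"en-us", "zh-cn"}
SUPPORTED_REGIONS = {"centralus", "eastasia"}


def is_supported(locale, region):
    """Return True if the locale and region pair appears in the lists above."""
    return (
        locale.lower() in SUPPORTED_LOCALES
        and region.lower() in SUPPORTED_REGIONS
    )


if __name__ == "__main__":
    print(is_supported("en-US", "centralus"))
    print(is_supported("fr-FR", "centralus"))
```

Because availability can change, these sets should be treated as a snapshot of the note above rather than a permanent list.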

Transcribe and identify speakers

Conversation Transcription expects multichannel audio streams and user profiles as inputs to generate transcriptions and identify speakers. Audio and user profile data are sent to the Conversation Transcription service using the Speech Devices SDK. As previously mentioned, a circular seven-microphone array and the Speech Devices SDK are required to use Conversation Transcription.

Note

For specification and design details, see Microsoft Speech Device SDK Microphone. To learn more or purchase a development kit, see Get Microsoft Speech Device SDK.

To learn how to use Conversation Transcription with the Speech Devices SDK, see How to use conversation transcription.

Quick start with a sample app

The Microsoft Speech Device SDK has a quickstart sample app that covers all device-related samples, and Conversation Transcription is one of them. You can find it, along with its source code for reference, in Speech Device SDK Android quickstart with sample app.

Next steps