Speaker Recognition API - Preview

Speaker Recognition APIs are cloud-based APIs that provide advanced AI algorithms for speaker verification and speaker identification. Speaker Recognition is divided into these two categories: speaker verification and speaker identification.

Speaker Verification

Voice has unique characteristics that can be associated with an individual. Applications can use voice as an additional factor for verification in scenarios such as call centers and web services.

Speaker Verification APIs serve as an intelligent tool to help verify users using both their voice and their spoken passphrases.

Enrollment

Enrollment for speaker verification is text-dependent, which means speakers need to choose a specific passphrase to use during both the enrollment and verification phases.

In the speaker enrollment phase, the speaker's voice is recorded saying a specific phrase. Voice features are extracted to form a unique voice signature while the chosen phrase is recognized. Together, the voice signature and the recognized phrase make up the speaker enrollment data that is used to verify the speaker. The speaker enrollment data is stored in a secured system. The customer controls how long the data is retained. Customers can create, update, and remove enrollment data for individual speakers through API calls. When the subscription is deleted, all the speaker enrollment data associated with the subscription is also deleted.
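The enrollment lifecycle described above (create, update, remove, customer-controlled retention, and deletion along with the subscription) can be sketched from the customer's point of view. This is only an illustrative in-memory model, not the service's API: `Enrollment`, `EnrollmentStore`, and every method and field name are hypothetical, and the fixed retention period is an assumed way of expressing the customer's retention choice.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Enrollment:
    """Hypothetical record for one speaker's enrollment data."""
    speaker_id: str
    passphrase: str          # the recognized phrase chosen by the speaker
    voice_signature: bytes   # placeholder for the extracted voice features
    created_at: datetime


@dataclass
class EnrollmentStore:
    """In-memory stand-in for the secured store, keyed by subscription."""
    retention: timedelta  # assumed customer-chosen retention period
    _data: dict = field(default_factory=dict)

    def upsert(self, subscription_id: str, enrollment: Enrollment) -> None:
        # Create or update the enrollment data for one speaker.
        speakers = self._data.setdefault(subscription_id, {})
        speakers[enrollment.speaker_id] = enrollment

    def remove(self, subscription_id: str, speaker_id: str) -> bool:
        # Remove one speaker's data; True if something was deleted.
        speakers = self._data.get(subscription_id, {})
        return speakers.pop(speaker_id, None) is not None

    def delete_subscription(self, subscription_id: str) -> int:
        # Deleting a subscription deletes all enrollment data tied to it.
        return len(self._data.pop(subscription_id, {}))

    def purge_expired(self, now: datetime) -> int:
        # Drop records older than the retention period.
        removed = 0
        for speakers in self._data.values():
            expired = [sid for sid, e in speakers.items()
                       if now - e.created_at > self.retention]
            for sid in expired:
                del speakers[sid]
                removed += 1
        return removed

    def count(self, subscription_id: str) -> int:
        return len(self._data.get(subscription_id, {}))
```

Keying the store by subscription first makes the "delete everything with the subscription" rule a single operation, which is why the sketch nests speakers under subscriptions rather than using one flat table.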

Customers should ensure they have received the appropriate permissions from the users for speaker verification.

Verification

In the verification phase, the customer calls the speaker verification API with the ID associated with the individual to be verified. The service extracts voice features and the passphrase from the input speech recording. It then compares them against the corresponding elements of the enrollment data for the speaker the customer is seeking to verify and determines whether they match. The response returns "accept" or "reject" along with a confidence level. The customer then determines how to use the result to help decide whether this person is the enrolled speaker.

The threshold confidence level should be set based on the scenario and on the other verification factors that are being used. We recommend that you experiment with the confidence level and consider the appropriate setting for each application. The APIs are not intended to determine whether the audio comes from a live person or from an imitation or recording of an enrolled speaker.
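As a rough illustration of how a customer might apply a threshold to the verification response, the sketch below treats a speaker as verified only when the result is "accept" and the confidence meets a chosen minimum. The ordered `Confidence` levels, the function names, and the example policy in `threshold_for` are assumptions made for illustration; the text above only says that a confidence level is returned and that the threshold should depend on the scenario and on other factors in use.

```python
from enum import IntEnum


class Confidence(IntEnum):
    # Hypothetical ordered confidence levels; the actual labels may differ.
    LOW = 1
    NORMAL = 2
    HIGH = 3


def is_verified(result: str, confidence: Confidence, threshold: Confidence) -> bool:
    """Accept only an "accept" result at or above the chosen threshold."""
    return result.lower() == "accept" and confidence >= threshold


def threshold_for(other_factors_passed: int) -> Confidence:
    """Example policy (an assumption): demand higher confidence when
    voice is the only verification factor."""
    return Confidence.HIGH if other_factors_passed == 0 else Confidence.NORMAL
```

For instance, `is_verified("accept", Confidence.NORMAL, threshold_for(1))` passes because another factor has already succeeded, while the same response with `threshold_for(0)` does not. Neither check says anything about whether the audio was live or a recording.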

The service does not retain the speech recording or the extracted voice features that are sent to the service during the verification phase.

For more details about speaker verification, refer to the Speaker - Verification API.

Speaker Identification

Applications can use voice to identify "who is speaking" given a group of enrolled speakers. Speaker Identification APIs can be used in scenarios such as meeting productivity, personalization, and call center transcription.

Enrollment

Enrollment for speaker identification is text-independent, which means that there are no restrictions on what the speaker says in the audio. No passphrase is required.

In the enrollment phase, the speaker's voice is recorded, and voice features are extracted to form a unique voice signature. The speech audio and the extracted features are stored in a secured system. The customer controls how long the data is retained. Customers can create, update, and remove this enrollment data for individual speakers through API calls. When the subscription is deleted, all the speaker enrollment data associated with the subscription is also deleted.

Customers should ensure they have received the appropriate permissions from the users for speaker identification.

Identification

In the identification phase, the speaker identification service extracts voice features from the input speech recording. It then compares the features against the enrollment data of the specified list of speakers. When a match is found with an enrolled speaker, the response returns the ID of that speaker with a confidence level. When the voice matches none of the enrolled speakers, the response returns "reject".

The threshold confidence level should be set based on the scenario. We recommend that you experiment with the confidence level and consider the appropriate setting for each application. The APIs are not intended to determine whether the audio comes from a live person or from an imitation or recording of an enrolled speaker.
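The identification outcome described above (a speaker ID with a confidence level, or "reject") could be handled on the customer side roughly as follows. The response shape used here, a dictionary with `speaker_id` and `confidence` keys or a `result` of "reject", is a hypothetical stand-in, as are the confidence labels and the `resolve_speaker` helper; check the actual API reference for the real field names.

```python
from typing import Optional

# Hypothetical ordering of confidence labels; the actual labels may differ.
CONFIDENCE_RANK = {"low": 1, "normal": 2, "high": 3}


def resolve_speaker(response: dict, roster: dict,
                    min_confidence: str = "normal") -> Optional[str]:
    """Map an identification response to a display name, or None.

    `roster` maps enrolled speaker IDs to display names and plays the
    role of the specified list of speakers.
    """
    if response.get("result") == "reject":
        return None  # no enrolled speaker matched
    label = str(response.get("confidence", "")).lower()
    if CONFIDENCE_RANK.get(label, 0) < CONFIDENCE_RANK[min_confidence]:
        return None  # matched, but below the application's threshold
    return roster.get(response.get("speaker_id"))
```

Returning `None` for both "reject" and below-threshold matches keeps the caller's logic simple; an application that needs to tell the two cases apart could return a richer result instead.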

The service does not retain the speech recording or the extracted voice features that are sent to the service during the identification phase.

For more details about speaker identification, refer to the Speaker - Identification API.