
Speaker Recognition API - Preview

The Speaker Recognition APIs are cloud-based APIs that provide advanced AI algorithms for two categories of task: speaker verification and speaker identification.

Speaker Verification

Voice has unique characteristics that can be associated with an individual. Applications can use voice as an additional verification factor in scenarios such as call centers and web services.

The Speaker Verification APIs serve as an intelligent tool that helps verify users by both their voice and a spoken passphrase.

Enrollment

Enrollment for speaker verification is text-dependent: speakers choose a specific passphrase and must use it in both the enrollment and verification phases.

In the speaker enrollment phase, the speaker's voice is recorded saying a specific phrase. Voice features are extracted to form a unique voice signature, and the chosen phrase is recognized. Together, this speaker enrollment data is used to verify the speaker. The enrollment data is stored in a secured system, and the Customer controls how long it is retained. Customers can create, update, and remove enrollment data for individual speakers through API calls. When the subscription is deleted, all speaker enrollment data associated with it is also deleted.
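The lifecycle described above can be modeled with a small in-memory store. It covers per-speaker create, update, and remove operations, plus the rule that deleting a subscription deletes all of its enrollment data. This is a hypothetical sketch for reasoning about those semantics only. The classes `EnrollmentRecord` and `EnrollmentStore`, their methods, and the idea of keying data by a subscription string are illustrative assumptions. They are not the Speaker Recognition API, whose actual operations are listed in the API reference.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EnrollmentRecord:
    """Hypothetical enrollment data for one speaker (verification flavor)."""
    passphrase: str               # phrase recognized at enrollment
    voice_signature: list[float]  # extracted voice features (made-up representation)


@dataclass
class EnrollmentStore:
    """Toy model of enrollment data grouped by subscription; not the Azure API."""
    _data: dict[str, dict[str, EnrollmentRecord]] = field(default_factory=dict)

    def create(self, subscription: str, speaker_id: str, record: EnrollmentRecord) -> None:
        speakers = self._data.setdefault(subscription, {})
        if speaker_id in speakers:
            raise KeyError(f"speaker {speaker_id!r} is already enrolled")
        speakers[speaker_id] = record

    def update(self, subscription: str, speaker_id: str, record: EnrollmentRecord) -> None:
        speakers = self._data.get(subscription, {})
        if speaker_id not in speakers:
            raise KeyError(f"speaker {speaker_id!r} is not enrolled")
        speakers[speaker_id] = record

    def remove(self, subscription: str, speaker_id: str) -> None:
        # Customer-initiated removal of one speaker's data; KeyError if absent.
        del self._data[subscription][speaker_id]

    def delete_subscription(self, subscription: str) -> None:
        # Deleting the subscription deletes every speaker's enrollment data with it.
        self._data.pop(subscription, None)

    def get(self, subscription: str, speaker_id: str) -> Optional[EnrollmentRecord]:
        return self._data.get(subscription, {}).get(speaker_id)
```

Retention (how long records stay before `remove` is called) is left entirely to the caller, mirroring the statement that the Customer controls it.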

Customers should ensure they have obtained the appropriate permissions from users before performing speaker verification.

Verification

In the verification phase, the Customer calls the speaker verification API with the ID associated with the individual to be verified. The service extracts voice features and the passphrase from the input speech recording. It then compares them against the corresponding elements of that speaker's enrollment data and determines whether they match. The response returns "accept" or "reject" along with a confidence level. The Customer then determines how to use the result to help decide whether this person is the enrolled speaker.
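The sketch below makes the comparison step concrete. It mimics text-dependent verification with a swapped-in technique: an exact match on the recognized passphrase plus cosine similarity between made-up feature vectors. The service's real feature extraction and scoring are not described in this document. The function `verify`, the `accept_at` cut-off, and the `"low"`/`"normal"`/`"high"` labels are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(enrolled_phrase: str, enrolled_signature: list[float],
           spoken_phrase: str, features: list[float],
           accept_at: float = 0.8) -> tuple[str, str]:
    """Toy 1:1 verification: both the passphrase and the voice must match.

    Returns ("accept" or "reject", confidence label). The threshold and the
    labels are illustrative assumptions, not the service's values.
    """
    # Text-dependent: a different phrase is rejected regardless of the voice.
    if spoken_phrase.strip().lower() != enrolled_phrase.strip().lower():
        return "reject", "high"
    score = cosine_similarity(enrolled_signature, features)
    result = "accept" if score >= accept_at else "reject"
    # Distance from the decision boundary stands in for a confidence level.
    margin = abs(score - accept_at)
    if margin >= 0.15:
        confidence = "high"
    elif margin >= 0.05:
        confidence = "normal"
    else:
        confidence = "low"
    return result, confidence
```

For example, `verify("open sesame", [1.0, 0.0], "Open Sesame", [1.0, 1.0])` rejects with a `"normal"` label. The phrase matches, but the voice score (about 0.71) falls short of 0.8.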

The confidence threshold should be set based on the scenario and on the other verification factors being used. We recommend experimenting with the confidence level to find the appropriate setting for each application. The APIs are not intended to determine whether the audio comes from a live person, an imitation, or a recording of an enrolled speaker.
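The response carries a verdict plus a confidence level, and the threshold should depend on the scenario and on the other factors in use. One way to express that on the client side is a per-scenario policy table. Everything here is hypothetical and meant for the Customer to tune experimentally:

- the scenario names;
- the `"low" < "normal" < "high"` ordering;
- the rule that a second factor (for example a PIN) relaxes the required level.

None of it is service behavior. The policy also cannot detect replayed or imitated audio, which the APIs are not intended to do.

```python
# Hypothetical ordering of confidence labels; adjust to what the API actually returns.
LEVELS = {"low": 0, "normal": 1, "high": 2}

# Hypothetical minimum confidence per scenario: with voice as the only extra factor,
# and when another factor (for example a PIN) has also passed.
POLICY = {
    "call_center_balance_inquiry": {"voice_only": "normal", "with_other_factor": "low"},
    "web_password_reset": {"voice_only": "high", "with_other_factor": "normal"},
}


def accept_speaker(scenario: str, result: str, confidence: str,
                   other_factor_passed: bool) -> bool:
    """Decide whether to treat the caller as the enrolled speaker."""
    if result.lower() != "accept":
        return False
    rule = POLICY[scenario]
    required = rule["with_other_factor"] if other_factor_passed else rule["voice_only"]
    return LEVELS[confidence.lower()] >= LEVELS[required]
```

Keeping the thresholds in data rather than code makes it easy to adjust them per application as experiments come in.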

The service does not retain the speech recording or the extracted voice features sent to the service during the verification phase.

For more details about speaker verification, see the API reference: Speaker - Verification.

Speaker Identification

Given a group of enrolled speakers, applications can use voice to identify "who is speaking". The Speaker Identification APIs can be used in scenarios such as meeting productivity, personalization, and call center transcription.

Enrollment

Enrollment for speaker identification is text-independent: there are no restrictions on what the speaker says in the audio, and no passphrase is required.

In the enrollment phase, the speaker's voice is recorded and voice features are extracted to form a unique voice signature. The speech audio and extracted features are stored in a secured system, and the Customer controls how long they are retained. Customers can create, update, and remove this enrollment data for individual speakers through API calls. When the subscription is deleted, all speaker enrollment data associated with it is also deleted.

Customers should ensure they have obtained the appropriate permissions from users before performing speaker identification.

Identification

In the identification phase, the speaker identification service extracts voice features from the input speech recording and compares them against the enrollment data of a specified list of speakers. When a match is found with an enrolled speaker, the response returns that speaker's ID with a confidence level. When none of the enrolled speakers matches, the response returns "reject".
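Identification is a one-to-many search over the speakers named in the request. The sketch below illustrates that shape with a swapped-in technique: cosine similarity between made-up feature vectors, keeping the best candidate only if it clears a threshold. It is not the service's algorithm. The function `identify`, the `threshold` default, and the use of `None` to stand in for the documented "reject" response are assumptions.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(features: list[float], candidates: dict[str, list[float]],
             threshold: float = 0.8) -> tuple[Optional[str], float]:
    """Toy 1:N identification over a specified candidate list.

    Returns (speaker_id, score) for the best-scoring candidate if it clears
    `threshold`; otherwise (None, best_score), where None plays the role of "reject".
    """
    best_id: Optional[str] = None
    best_score = -1.0
    for speaker_id, signature in candidates.items():
        score = cosine_similarity(features, signature)
        if score > best_score:  # on ties, the first candidate seen wins
            best_id, best_score = speaker_id, score
    if best_id is None or best_score < threshold:
        return None, best_score
    return best_id, best_score
```

Restricting the search to the candidate list passed in mirrors the point that the comparison is made only against "the specified list of speakers", not every enrolled profile.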

The confidence threshold should be set based on the scenario. We recommend experimenting with the confidence level to find the appropriate setting for each application. The APIs are not intended to determine whether the audio comes from a live person, an imitation, or a recording of an enrolled speaker.
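Experimenting with the threshold usually means replaying labeled trials and counting two kinds of error at each candidate value. A false accept matches the wrong person; a false reject turns away an enrolled speaker. The sketch assumes you have collected numeric scores for trials whose true outcome you know. The scores in the usage lines are made up, and the service may expose only coarse confidence levels rather than raw scores. The functions `error_rates` and `sweep` are hypothetical helpers.

```python
def error_rates(trials: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at `threshold`.

    Each trial is (score, is_genuine); is_genuine is True when the audio really
    came from the enrolled speaker being matched.
    """
    genuine = [score for score, is_genuine in trials if is_genuine]
    impostor = [score for score, is_genuine in trials if not is_genuine]
    false_accepts = sum(1 for score in impostor if score >= threshold)
    false_rejects = sum(1 for score in genuine if score < threshold)
    far = false_accepts / len(impostor) if impostor else 0.0
    frr = false_rejects / len(genuine) if genuine else 0.0
    return far, frr


def sweep(trials: list[tuple[float, bool]], thresholds: list[float]) -> None:
    """Print the trade-off so a threshold can be chosen per scenario."""
    for t in thresholds:
        far, frr = error_rates(trials, t)
        print(f"threshold={t:.2f}  FAR={far:.2f}  FRR={frr:.2f}")


# Made-up scores for illustration only: (score, is_genuine).
trials = [(0.95, True), (0.85, True), (0.72, True),
          (0.60, False), (0.78, False), (0.40, False)]
sweep(trials, [0.6, 0.7, 0.8, 0.9])
```

Raising the threshold trades false accepts for false rejects. Where a wrong match is costly, the threshold would typically be pushed toward a low false-accept rate.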

The service does not retain the speech recording or the extracted voice features sent to the service during the identification phase.

For more details about speaker identification, see the API reference: Speaker - Identification.