
Speaker Recognition

The Azure Cognitive Services Speaker Recognition service provides algorithms that verify and identify speakers by their unique voice characteristics. Speaker Recognition is used to answer the question "who is speaking?"

Voice has unique characteristics that can be associated with an individual. We provide Speaker Verification APIs and Speaker Identification APIs for the two major applications of Speaker Recognition technology.

Speaker Verification

Speaker verification can be either text-dependent or text-independent. Text-dependent verification means speakers must choose a passphrase and use that same passphrase during both the enrollment and verification phases. Verifying both the speech content and the voice signature enables a multi-factor verification scenario. Text-independent verification means speakers can speak in everyday language during the enrollment and verification phases.

Text-Dependent Speaker Verification

In the enrollment phase, the speaker's voice is recorded while they say a passphrase chosen from a set of predefined phrases. Voice features are extracted from the recording to form a unique voice signature, and the chosen passphrase is recognized. Together, the voice signature and the passphrase are used to verify the speaker.

In the verification phase, the speech recording and the ID associated with the individual to be verified are sent to the speaker verification API. The service extracts voice features and the passphrase from the input recording, then compares both against the enrollment profile of the corresponding speaker.

The response returns "Accept" or "Reject" along with a similarity score ranging from 0 to 1. The "Accept" or "Reject" decision combines the speaker verification result with the speech recognition result, whereas the similarity score measures only voice similarity. The service returns "Accept" when the speech recognition result matches the enrollment phrase and the voice similarity score is greater than or equal to 0.5. However, the final decision should take into account your scenario and any other verification factors you use. We recommend that you experiment with your own data and, where appropriate, set your own threshold to override the default "Accept" or "Reject" response.
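The default decision rule described above can be reproduced on the client so that you can substitute your own threshold. The following is a minimal sketch, assuming you already have the recognized text, the enrolled phrase, and the similarity score from the API response; the function names and the lowercase/strip-punctuation normalization are illustrative assumptions, not part of the service.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so phrases compare on words only (assumed rule)."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def text_dependent_decision(recognized_text: str, enrolled_phrase: str,
                            similarity: float, threshold: float = 0.5) -> str:
    """Hypothetical helper: Accept only if the phrase matches AND the voice is similar enough.

    With threshold=0.5 this mirrors the documented default; pass a different
    threshold to override it after calibrating on your own data.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be in [0, 1]")
    phrase_ok = normalize(recognized_text) == normalize(enrolled_phrase)
    voice_ok = similarity >= threshold
    return "Accept" if phrase_ok and voice_ok else "Reject"


# Same phrase, score 0.62: accepted at the default threshold, rejected at 0.7.
print(text_dependent_decision("My voice is my passport, verify me.",
                              "My voice is my passport verify me.", 0.62))
print(text_dependent_decision("My voice is my passport, verify me.",
                              "My voice is my passport verify me.", 0.62, threshold=0.7))
```

Keeping the phrase check and the voice check as separate booleans makes it easy to log which factor caused a rejection.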

In the current version of the text-dependent speaker verification API, speakers can choose from the following 10 English phrases:

  • I am going to make him an offer he cannot refuse.
  • Houston we have had a problem.
  • My voice is my passport verify me.
  • Apple juice tastes funny after toothpaste.
  • You can get in without your password.
  • You can activate security system now.
  • My voice is stronger than passwords.
  • My password is not your business.
  • My name is unknown to you.
  • Be yourself everyone else is already taken.
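Because only these 10 phrases are supported, an application might check a user's chosen phrase before starting enrollment. The sketch below is a hypothetical client-side pre-check; the service performs its own phrase recognition, and the normalization rule (case and punctuation ignored) is an assumption.

```python
import re

# The 10 phrases listed above, copied verbatim.
SUPPORTED_PHRASES = [
    "I am going to make him an offer he cannot refuse.",
    "Houston we have had a problem.",
    "My voice is my passport verify me.",
    "Apple juice tastes funny after toothpaste.",
    "You can get in without your password.",
    "You can activate security system now.",
    "My voice is stronger than passwords.",
    "My password is not your business.",
    "My name is unknown to you.",
    "Be yourself everyone else is already taken.",
]


def normalize(text: str) -> str:
    """Lowercase and strip punctuation (assumed comparison rule)."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


_NORMALIZED = {normalize(p) for p in SUPPORTED_PHRASES}


def is_supported_phrase(text: str) -> bool:
    """Hypothetical pre-check: True if text is one of the supported passphrases."""
    return normalize(text) in _NORMALIZED


print(is_supported_phrase("houston, we have had a problem"))  # True
print(is_supported_phrase("open sesame"))                     # False
```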

You can create your own passphrases by sending separate requests to the text-independent speaker verification API and the speech-to-text API. By combining the speaker verification result with the speech recognition result, you can determine the speaker's identity.

The APIs are not intended to determine whether the audio comes from a live person, an imitation, or a recording of an enrolled speaker. Generating random phrases for the speaker to read is considered an effective way to prevent replay attacks.
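Combining a custom passphrase with a random challenge can be sketched as follows: generate a fresh phrase, have the speaker read it, then require both that the speech-to-text transcript matches the challenge and that the text-independent similarity score clears a threshold. This is a minimal sketch under stated assumptions: the word list, function names, and normalization are hypothetical, and the transcript and score are assumed to come from the two separate API requests described above.

```python
import re
import secrets

# Hypothetical vocabulary; in practice choose words the speech-to-text model recognizes reliably.
WORDS = ["apple", "river", "seven", "purple", "garden", "rocket",
         "window", "silver", "tiger", "cloud", "orange", "piano"]


def normalize(text: str) -> str:
    """Lowercase and strip punctuation (assumed comparison rule)."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def make_challenge(n_words: int = 4) -> str:
    """Pick n_words distinct words using the OS-backed secure random generator."""
    rng = secrets.SystemRandom()
    return " ".join(rng.sample(WORDS, n_words))


def verify_challenge(challenge: str, transcript: str,
                     similarity: float, threshold: float = 0.5) -> bool:
    """Accept only if the fresh challenge was read AND the voice matches the enrolled profile.

    transcript: text returned by a speech-to-text request on the audio (assumed input).
    similarity: score from a text-independent verification request on the same audio.
    """
    return normalize(transcript) == normalize(challenge) and similarity >= threshold


challenge = make_challenge()
print("Please read:", challenge)
```

A previously captured recording is unlikely to contain a freshly generated word sequence, which is why the phrase check adds protection against simple replay.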

Text-Independent Speaker Verification

Speaker verification can also be text-independent, which means there are no restrictions on what the speaker says in the audio.

In the enrollment phase, voice features are extracted from the speaker's audio to form a unique voice signature.

In the verification phase, the audio and the ID associated with the individual to be verified are sent to the speaker verification API. The service extracts voice features from the input recording, then compares them against the voice signature in the enrollment profile of the corresponding speaker.

响应返回 "接受" 或 "拒绝",其相似性分数范围从0到1。The response returns "Accept" or "Reject" with a similarity score ranging from 0 to 1. 当相似性分数大于或等于0.5 时,将返回 "Accept" 响应。The "Accept" response is returned when the similarity score is greater or equal to 0.5. 但是,结果应根据方案和正在使用的其他验证因素来确定。However, the result should be determined based on the scenario and other verification factors that are being used. 建议你对自己的数据进行试验,并根据需要确定阈值以替代 "接受" 或 "拒绝" 响应。We recommend you experiment on your own data and determine your threshold to override "Accept" or "Reject" response as appropriate.

The APIs are not intended to determine whether the audio comes from a live person, an imitation, or a recording of an enrolled speaker.

Speaker Identification

Speaker identification is the task of determining the identity of an unknown voice among a set of candidate speakers. Given a list of candidate IDs, the Speaker Identification API returns a list of "best matches" ranked by similarity score. The Speaker Identification API is text-independent: it does not compare what was said during enrollment with what is said during identification.

Text-Independent Speaker Identification

Enrollment for speaker identification is text-independent, which means there are no restrictions on what the speaker says in the audio, and no passphrase is required. In the enrollment phase, the speaker's voice is recorded, and voice features are extracted to form a unique voice signature.

In the identification phase, the speaker identification service extracts voice features from the input speech recording. It then compares these features against the voice signatures in the enrollment data of a specified list of speakers (up to 50 candidate speakers per request). The response includes one identified ID and the five top-ranked IDs, each with a similarity score ranging from 0 to 1. The identified ID is the best-matched speaker, as determined by similarity score. If no candidate speaker has a similarity score greater than or equal to 0.5, the response returns a string of zeros as the identified ID to indicate that no match was found. However, the final decision should take into account your scenario and any other factors you use. We recommend that you experiment with your own data and, where appropriate, set your own threshold to override the default "match or no match" decision.
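To apply your own threshold, you can re-derive the match decision from the top-ranked IDs and scores in the response. The following is a minimal sketch, assuming the response has already been parsed into `(profile_id, score)` pairs; the helper names are hypothetical, and the exact format of the all-zero "no match" ID is an assumption that you should confirm against a real response.

```python
from typing import List, Sequence, Tuple

MAX_CANDIDATES = 50  # documented per-request limit on candidate speakers
# Assumed format of the all-zero ID that signals "no match"; verify against the API.
NO_MATCH_ID = "00000000-0000-0000-0000-000000000000"


def validate_candidates(profile_ids: Sequence[str]) -> None:
    """Hypothetical client-side check before sending an identification request."""
    if not 1 <= len(profile_ids) <= MAX_CANDIDATES:
        raise ValueError(f"provide between 1 and {MAX_CANDIDATES} candidate IDs")


def decide(top_ranked: List[Tuple[str, float]], threshold: float = 0.5) -> str:
    """Return the best-scoring ID if it clears the threshold, else the no-match ID."""
    best_id, best_score = max(top_ranked, key=lambda pair: pair[1])
    return best_id if best_score >= threshold else NO_MATCH_ID


top = [("alice", 0.71), ("bob", 0.44), ("carol", 0.20)]
print(decide(top))                 # alice
print(decide(top, threshold=0.8))  # the no-match ID
```

Raising the threshold trades more "no match" results for fewer misidentifications, which may suit scenarios where a wrong identity is costlier than asking the speaker to try again.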

The APIs are not intended to determine whether the audio comes from a live person, an imitation, or a recording of an enrolled speaker.
