
Get started with Speaker Recognition

In this quickstart, you learn basic design patterns for Speaker Recognition using the Speech SDK, including:

  • Text-dependent and text-independent verification
  • Speaker identification to identify a voice sample among a group of voices
  • Deleting voice profiles

For a high-level look at Speaker Recognition concepts, see the overview article.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Important

Speaker Recognition is currently only supported in Azure Speech resources created in the westus region.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the corresponding installation instructions.

Import dependencies

To run the examples in this article, include the following using statements at the top of your script.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. In this example, you create a SpeechConfig using a subscription key and region. You also create some basic boilerplate code to use for the rest of this article, which you modify for different customizations.

Note that the region is set to westus, as it is the only supported region for the service.

public class Program 
{
    static async Task Main(string[] args)
    {
        // replace with your own subscription key 
        string subscriptionKey = "YourSubscriptionKey";
        string region = "westus";
        var config = SpeechConfig.FromSubscription(subscriptionKey, region);
    }
}

Text-dependent verification

Speaker Verification is the act of confirming that a speaker matches a known, or enrolled, voice. The first step is to enroll a voice profile, so that the service has something to compare future voice samples against. In this example, you enroll the profile using a text-dependent strategy, which requires a specific passphrase to use for both enrollment and verification. See the reference docs for a list of supported passphrases.

Start by creating the following function in your Program class to enroll a voice profile.

public static async Task VerificationEnroll(SpeechConfig config, Dictionary<string, string> profileMapping)
{
    using (var client = new VoiceProfileClient(config))
    using (var profile = await client.CreateProfileAsync(VoiceProfileType.TextDependentVerification, "en-us"))
    {
        using (var audioInput = AudioConfig.FromDefaultMicrophoneInput())
        {
            Console.WriteLine($"Enrolling profile id {profile.Id}.");
            // give the profile a human-readable display name
            profileMapping.Add(profile.Id, "Your Name");

            VoiceProfileEnrollmentResult result = null;
            while (result is null || result.RemainingEnrollmentsCount > 0)
            {
                Console.WriteLine("Speak the passphrase, \"My voice is my passport, verify me.\"");
                result = await client.EnrollProfileAsync(profile, audioInput);
                Console.WriteLine($"Remaining enrollments needed: {result.RemainingEnrollmentsCount}");
                Console.WriteLine("");
            }
            
            if (result.Reason == ResultReason.EnrolledVoiceProfile)
            {
                await SpeakerVerify(config, profile, profileMapping);
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = VoiceProfileEnrollmentCancellationDetails.FromResult(result);
                Console.WriteLine($"CANCELED {profile.Id}: ErrorCode={cancellation.ErrorCode} ErrorDetails={cancellation.ErrorDetails}");
            }
        }
    }
}

In this function, await client.CreateProfileAsync() is what actually creates the new voice profile. After it is created, you specify how to input audio samples, using AudioConfig.FromDefaultMicrophoneInput() in this example to capture audio from your default input device. Next, you enroll audio samples in a while loop that tracks the number of samples remaining, and required, for enrollment. In each iteration, client.EnrollProfileAsync(profile, audioInput) prompts you to speak the passphrase into your microphone and adds the sample to the voice profile.

After enrollment is completed, you call await SpeakerVerify(config, profile, profileMapping) to verify against the profile you just created. Add another function to define SpeakerVerify.

public static async Task SpeakerVerify(SpeechConfig config, VoiceProfile profile, Dictionary<string, string> profileMapping)
{
    var speakerRecognizer = new SpeakerRecognizer(config, AudioConfig.FromDefaultMicrophoneInput());
    var model = SpeakerVerificationModel.FromProfile(profile);

    Console.WriteLine("Speak the passphrase to verify: \"My voice is my passport, please verify me.\"");
    var result = await speakerRecognizer.RecognizeOnceAsync(model);
    Console.WriteLine($"Verified voice profile for speaker {profileMapping[result.ProfileId]}, score is {result.Score}");
}

In this function, you pass the VoiceProfile object you just created to initialize a model to verify against. Next, await speakerRecognizer.RecognizeOnceAsync(model) prompts you to speak the passphrase again, but this time it validates it against your voice profile and returns a similarity score ranging from 0.0 to 1.0. The result object also returns Accept or Reject, based on whether or not the passphrase matches.
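If you want to act on the outcome in code rather than just print the score, you can branch on the result. The following is a minimal sketch: the 0.8 threshold is a hypothetical value you should tune for your scenario, and it assumes SpeakerRecognitionResult exposes a Reason property, with ResultReason.RecognizedSpeaker indicating success.

var result = await speakerRecognizer.RecognizeOnceAsync(model);
// RecognizedSpeaker means the service accepted the sample as a match
if (result.Reason == ResultReason.RecognizedSpeaker && result.Score >= 0.8)
{
    Console.WriteLine($"Accepted profile {result.ProfileId} with score {result.Score}");
}
else
{
    Console.WriteLine($"Rejected, reason: {result.Reason}");
}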

Next, modify your Main() function to call the new functions you created. Additionally, note that you create a Dictionary<string, string> to pass by reference through your function calls. The reason for this is that the service does not allow storing a human-readable name with a created VoiceProfile; it stores only an ID number, for privacy purposes. In the VerificationEnroll function, you add to this dictionary an entry with the newly created ID, along with a text name. In application development scenarios where you need to display a human-readable name, you must store this mapping somewhere yourself; the service cannot store it.

static async Task Main(string[] args)
{
    string subscriptionKey = "YourSubscriptionKey";
    string region = "westus";
    var config = SpeechConfig.FromSubscription(subscriptionKey, region);

    // persist profileMapping if you want to store a record of who the profile is
    var profileMapping = new Dictionary<string, string>();
    await VerificationEnroll(config, profileMapping);

    Console.ReadLine();
}

Run the script, and you are prompted to speak the phrase "My voice is my passport, verify me" three times for enrollment, and one additional time for verification. The result returned is the similarity score, which you can use to create your own custom thresholds for verification.

Enrolling profile id 87-2cef-4dff-995b-dcefb64e203f.
Speak the passphrase, "My voice is my passport, verify me."
Remaining enrollments needed: 2

Speak the passphrase, "My voice is my passport, verify me."
Remaining enrollments needed: 1

Speak the passphrase, "My voice is my passport, verify me."
Remaining enrollments needed: 0

Speak the passphrase to verify: "My voice is my passport, verify me."
Verified voice profile for speaker Your Name, score is 0.915581

Text-independent verification

In contrast to text-dependent verification, text-independent verification:

  • Does not require a certain passphrase to be spoken; anything can be spoken
  • Does not require three audio samples, but does require 20 seconds of total audio

VerificationEnroll 函数进行一些简单的更改,以便切换到独立于文本的验证。Make a couple simple changes to your VerificationEnroll function to switch to text-independent verification. 首先,将验证类型更改为 VoiceProfileType.TextIndependentVerificationFirst, you change the verification type to VoiceProfileType.TextIndependentVerification. 接下来,更改 while 循环以跟踪 result.RemainingEnrollmentsSpeechLength,这将继续提示你说话,直到捕获 20 秒的音频。Next, change the while loop to track result.RemainingEnrollmentsSpeechLength, which will continue to prompt you to speak until 20 seconds of audio have been captured.

public static async Task VerificationEnroll(SpeechConfig config, Dictionary<string, string> profileMapping)
{
    using (var client = new VoiceProfileClient(config))
    using (var profile = await client.CreateProfileAsync(VoiceProfileType.TextIndependentVerification, "en-us"))
    {
        using (var audioInput = AudioConfig.FromDefaultMicrophoneInput())
        {
            Console.WriteLine($"Enrolling profile id {profile.Id}.");
            // give the profile a human-readable display name
            profileMapping.Add(profile.Id, "Your Name");

            VoiceProfileEnrollmentResult result = null;
            while (result is null || result.RemainingEnrollmentsSpeechLength > TimeSpan.Zero)
            {
                Console.WriteLine("Continue speaking to add to the profile enrollment sample.");
                result = await client.EnrollProfileAsync(profile, audioInput);
                Console.WriteLine($"Remaining enrollment audio time needed: {result.RemainingEnrollmentsSpeechLength}");
                Console.WriteLine("");
            }
            
            if (result.Reason == ResultReason.EnrolledVoiceProfile)
            {
                await SpeakerVerify(config, profile, profileMapping);
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = VoiceProfileEnrollmentCancellationDetails.FromResult(result);
                Console.WriteLine($"CANCELED {profile.Id}: ErrorCode={cancellation.ErrorCode} ErrorDetails={cancellation.ErrorDetails}");
            }
        }
    }
}

Run the program again, and speak anything during the verification phase, since a passphrase is not required. Again, the similarity score is returned.

Enrolling profile id 4tt87d4-f2d3-44ae-b5b4-f1a8d4036ee9.
Continue speaking to add to the profile enrollment sample.
Remaining enrollment audio time needed: 00:00:15.3200000

Continue speaking to add to the profile enrollment sample.
Remaining enrollment audio time needed: 00:00:09.8100008

Continue speaking to add to the profile enrollment sample.
Remaining enrollment audio time needed: 00:00:05.1900000

Continue speaking to add to the profile enrollment sample.
Remaining enrollment audio time needed: 00:00:00.8700000

Continue speaking to add to the profile enrollment sample.
Remaining enrollment audio time needed: 00:00:00

Speak the passphrase to verify: "My voice is my passport, verify me."
Verified voice profile for speaker Your Name, score is 0.849409

Speaker identification

Speaker Identification is used to determine who is speaking from a given group of enrolled voices. The process is very similar to text-independent verification, with the main difference being the ability to verify against multiple voice profiles at once, rather than verifying against a single profile.

Create a function IdentificationEnroll to enroll multiple voice profiles. The enrollment process for each profile is the same as the enrollment process for text-independent verification, and requires 20 seconds of audio for each profile. This function accepts a list of strings profileNames, and will create a new voice profile for each name in the list. The function returns a list of VoiceProfile objects, which you use in the next function for identifying a speaker.

public static async Task<List<VoiceProfile>> IdentificationEnroll(SpeechConfig config, List<string> profileNames, Dictionary<string, string> profileMapping)
{
    List<VoiceProfile> voiceProfiles = new List<VoiceProfile>();
    using (var client = new VoiceProfileClient(config))
    {
        foreach (string name in profileNames)
        {
            using (var audioInput = AudioConfig.FromDefaultMicrophoneInput())
            {
                var profile = await client.CreateProfileAsync(VoiceProfileType.TextIndependentIdentification, "en-us");
                Console.WriteLine($"Creating voice profile for {name}.");
                profileMapping.Add(profile.Id, name);

                VoiceProfileEnrollmentResult result = null;
                while (result is null || result.RemainingEnrollmentsSpeechLength > TimeSpan.Zero)
                {
                    Console.WriteLine($"Continue speaking to add to the profile enrollment sample for {name}.");
                    result = await client.EnrollProfileAsync(profile, audioInput);
                    Console.WriteLine($"Remaining enrollment audio time needed: {result.RemainingEnrollmentsSpeechLength}");
                    Console.WriteLine("");
                }
                voiceProfiles.Add(profile);
            }
        }
    }
    return voiceProfiles;
}

Create the following function SpeakerIdentification to submit an identification request. The main difference in this function compared to a speaker verification request is the use of SpeakerIdentificationModel.FromProfiles(), which accepts a list of VoiceProfile objects.

public static async Task SpeakerIdentification(SpeechConfig config, List<VoiceProfile> voiceProfiles, Dictionary<string, string> profileMapping) 
{
    var speakerRecognizer = new SpeakerRecognizer(config, AudioConfig.FromDefaultMicrophoneInput());
    var model = SpeakerIdentificationModel.FromProfiles(voiceProfiles);

    Console.WriteLine("Speak some text to identify who it is from your list of enrolled speakers.");
    var result = await speakerRecognizer.RecognizeOnceAsync(model);
    Console.WriteLine($"The most similar voice profile is {profileMapping[result.ProfileId]} with similarity score {result.Score}");
}

Main() 函数更改为以下函数。Change your Main() function to the following. 创建字符串 profileNames 的列表,将这些字符串传递到 IdentificationEnroll() 函数。You create a list of strings profileNames, which you pass to your IdentificationEnroll() function. 这会提示你为此列表中的每个名称创建新的语音配置文件,因此可以添加更多名称,为好友或同事创建其他配置文件。This will prompt you to create a new voice profile for each name in this list, so you can add more names to create additional profiles for friends or colleagues.

static async Task Main(string[] args)
{
    // replace with your own subscription key 
    string subscriptionKey = "YourSubscriptionKey";
    string region = "westus";
    var config = SpeechConfig.FromSubscription(subscriptionKey, region);

    // persist profileMapping if you want to store a record of who the profile is
    var profileMapping = new Dictionary<string, string>();
    var profileNames = new List<string>() { "Your name", "A friend's name" };
    
    var enrolledProfiles = await IdentificationEnroll(config, profileNames, profileMapping);
    await SpeakerIdentification(config, enrolledProfiles, profileMapping);

    foreach (var profile in enrolledProfiles)
    {
        profile.Dispose();
    }
    Console.ReadLine();
}

Run the script, and you are prompted to speak to enroll voice samples for the first profile. After the enrollment is completed, you are prompted to repeat this process for each name in the list profileNames. After each enrollment is finished, you are prompted to have anyone speak, and the service attempts to identify this person from among your enrolled voice profiles.

This example returns only the closest match and its similarity score, but you can get the full response that includes the top five similarity scores by adding string json = result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult) to your SpeakerIdentification function.
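For example, the end of SpeakerIdentification could look like the following sketch, which prints the raw JSON alongside the top match:

    var result = await speakerRecognizer.RecognizeOnceAsync(model);
    Console.WriteLine($"The most similar voice profile is {profileMapping[result.ProfileId]} with similarity score {result.Score}");
    // The raw JSON response includes the top five most similar profiles and their scores
    string json = result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
    Console.WriteLine(json);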

Changing audio input type

The examples in this article use the default device microphone as input for audio samples. However, in scenarios where you need to use audio files instead of microphone input, simply change any instance of AudioConfig.FromDefaultMicrophoneInput() to AudioConfig.FromWavFileInput("path/to/your/file.wav") to switch to a file input. You can also have mixed inputs, using a microphone for enrollment and files for verification, for example.
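For instance, a file-based enrollment input would look like the following sketch; the .wav path is a placeholder for your own file.

using (var audioInput = AudioConfig.FromWavFileInput("path/to/your/file.wav"))
{
    // use audioInput exactly as you would the microphone-based AudioConfig
    var result = await client.EnrollProfileAsync(profile, audioInput);
}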

Deleting voice profile enrollments

To delete an enrolled profile, use the DeleteProfileAsync() function on the VoiceProfileClient object. The following example function shows how to delete a voice profile from a known voice profile ID.

public static async Task DeleteProfile(SpeechConfig config, string profileId) 
{
    using (var client = new VoiceProfileClient(config))
    {
        var profile = new VoiceProfile(profileId);
        await client.DeleteProfileAsync(profile);
    }
}
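As a usage sketch, you could combine this with the identification example's Main() function and delete each profile once you're finished with it, assuming you no longer need the enrollments:

    foreach (var profile in enrolledProfiles)
    {
        await DeleteProfile(config, profile.Id);
        profile.Dispose();
    }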

In this quickstart, you learn basic design patterns for Speaker Recognition using the Speech SDK, including:

  • Text-dependent and text-independent verification
  • Speaker identification to identify a voice sample among a group of voices
  • Deleting voice profiles

For a high-level look at Speaker Recognition concepts, see the overview article.

Skip to samples on GitHub

If you want to skip straight to sample code, see the C++ quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Important

Speaker Recognition is currently only supported in Azure Speech resources created in the westus region.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the corresponding installation instructions.

Import dependencies

To run the examples in this article, add the following statements at the top of your .cpp file.

#include <iostream>
#include <stdexcept>
// Note: Install the NuGet package Microsoft.CognitiveServices.Speech.
#include <speechapi_cxx.h>

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;

// Note: Change the locale if desired.
auto profile_locale = "en-us";
auto audio_config = Audio::AudioConfig::FromDefaultMicrophoneInput();
auto ticks_per_second = 10000000;

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

shared_ptr<SpeechConfig> GetSpeechConfig()
{
    char* subscription_key = nullptr;
    char* region = nullptr;
    size_t sz = 0;
    _dupenv_s(&subscription_key, &sz, "SPEECH_SUBSCRIPTION_KEY");
    _dupenv_s(&region, &sz, "SPEECH_REGION");
    if (subscription_key == nullptr) {
        throw std::invalid_argument("Please set the environment variable SPEECH_SUBSCRIPTION_KEY.");
    }
    if (region == nullptr) {
        throw std::invalid_argument("Please set the environment variable SPEECH_REGION.");
    }
    auto config = SpeechConfig::FromSubscription(subscription_key, region);
    free(subscription_key);
    free(region);
    return config;
}

Text-dependent verification

Speaker Verification is the act of confirming that a speaker matches a known, or enrolled, voice. The first step is to enroll a voice profile, so that the service has something to compare future voice samples against. In this example, you enroll the profile using a text-dependent strategy, which requires a specific passphrase to use for both enrollment and verification. See the reference docs for a list of supported passphrases.

TextDependentVerification function

Start by creating the TextDependentVerification function.

void TextDependentVerification(shared_ptr<VoiceProfileClient> client, shared_ptr<SpeakerRecognizer> recognizer)
{
    std::cout << "Text Dependent Verification:\n\n";
    // Create the profile.
    auto profile = client->CreateProfileAsync(VoiceProfileType::TextDependentVerification, profile_locale).get();
    std::cout << "Created profile ID: " << profile->GetId() << "\n";
    AddEnrollmentsToTextDependentProfile(client, profile);
    SpeakerVerify(profile, recognizer);
    // Delete the profile.
    client->DeleteProfileAsync(profile);
}

This function creates a VoiceProfile object with the CreateProfileAsync method. Note that there are three types of VoiceProfile:

  • TextIndependentIdentification
  • TextDependentVerification
  • TextIndependentVerification

In this case, you pass VoiceProfileType::TextDependentVerification to CreateProfileAsync.

You then call two helper functions that you'll define next, AddEnrollmentsToTextDependentProfile and SpeakerVerify. Finally, call DeleteProfileAsync to clean up the profile.

AddEnrollmentsToTextDependentProfile function

Define the following function to enroll a voice profile.

void AddEnrollmentsToTextDependentProfile(shared_ptr<VoiceProfileClient> client, shared_ptr<VoiceProfile> profile)
{
    shared_ptr<VoiceProfileEnrollmentResult> enroll_result = nullptr;
    while (enroll_result == nullptr || enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsCount) > 0)
    {
        std::cout << "Please say the passphrase, \"My voice is my passport, verify me.\"\n";
        enroll_result = client->EnrollProfileAsync(profile, audio_config).get();
        std::cout << "Remaining enrollments needed: " << enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsCount) << ".\n";
    }
    std::cout << "Enrollment completed.\n\n";
}

In this function, you enroll audio samples in a while loop that tracks the number of samples remaining, and required, for enrollment. In each iteration, EnrollProfileAsync prompts you to speak the passphrase into your microphone and adds the sample to the voice profile.

SpeakerVerify function

Define SpeakerVerify as follows.

void SpeakerVerify(shared_ptr<VoiceProfile> profile, shared_ptr<SpeakerRecognizer> recognizer)
{
    shared_ptr<SpeakerVerificationModel> model = SpeakerVerificationModel::FromProfile(profile);
    std::cout << "Speak the passphrase to verify: \"My voice is my passport, verify me.\"\n";
    shared_ptr<SpeakerRecognitionResult> result = recognizer->RecognizeOnceAsync(model).get();
    std::cout << "Verified voice profile for speaker: " << result->ProfileId << ". Score is: " << result->GetScore() << ".\n\n";
}

In this function, you create a SpeakerVerificationModel object with the SpeakerVerificationModel::FromProfile method, passing in the VoiceProfile object you created earlier.

Next, SpeakerRecognizer::RecognizeOnceAsync prompts you to speak the passphrase again, but this time it validates it against your voice profile and returns a similarity score ranging from 0.0 to 1.0. The SpeakerRecognitionResult object also returns Accept or Reject, based on whether or not the passphrase matches.
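If you want to branch on that outcome in code, the following is a minimal sketch; it assumes the C++ SpeakerRecognitionResult exposes a Reason member like other SDK results, with ResultReason::RecognizedSpeaker indicating success.

shared_ptr<SpeakerRecognitionResult> result = recognizer->RecognizeOnceAsync(model).get();
// RecognizedSpeaker means the service accepted the sample as a match
if (result->Reason == ResultReason::RecognizedSpeaker)
{
    std::cout << "Accepted with score: " << result->GetScore() << "\n";
}
else
{
    std::cout << "Not accepted.\n";
}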

Text-independent verification

In contrast to text-dependent verification, text-independent verification:

  • Does not require a certain passphrase to be spoken; anything can be spoken
  • Does not require three audio samples, but does require 20 seconds of total audio

TextIndependentVerification function

Start by creating the TextIndependentVerification function.

void TextIndependentVerification(shared_ptr<VoiceProfileClient> client, shared_ptr<SpeakerRecognizer> recognizer)
{
    std::cout << "Text Independent Verification:\n\n";
    // Create the profile.
    auto profile = client->CreateProfileAsync(VoiceProfileType::TextIndependentVerification, profile_locale).get();
    std::cout << "Created profile ID: " << profile->GetId() << "\n";
    AddEnrollmentsToTextIndependentProfile(client, profile);
    SpeakerVerify(profile, recognizer);
    // Delete the profile.
    client->DeleteProfileAsync(profile);
}

TextDependentVerification 函数一样,此函数使用 CreateProfileAsync 方法创建 VoiceProfile 对象。Like the TextDependentVerification function, this function creates a VoiceProfile object with the CreateProfileAsync method.

In this case, you pass VoiceProfileType::TextIndependentVerification to CreateProfileAsync.

You then call two helper functions: AddEnrollmentsToTextIndependentProfile, which you'll define next, and SpeakerVerify, which you defined already. Finally, call DeleteProfileAsync to clean up the profile.

AddEnrollmentsToTextIndependentProfile function

Define the following function to enroll a voice profile.

void AddEnrollmentsToTextIndependentProfile(shared_ptr<VoiceProfileClient> client, shared_ptr<VoiceProfile> profile)
{
    shared_ptr<VoiceProfileEnrollmentResult> enroll_result = nullptr;
    while (enroll_result == nullptr || enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsSpeechLength) > 0)
    {
        std::cout << "Continue speaking to add to the profile enrollment sample.\n";
        enroll_result = client->EnrollProfileAsync(profile, audio_config).get();
        std::cout << "Remaining audio time needed: " << enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsSpeechLength) / ticks_per_second << " seconds.\n";
    }
    std::cout << "Enrollment completed.\n\n";
}

In this function, you enroll audio samples in a while loop that tracks the number of seconds of audio remaining, and required, for enrollment. In each iteration, EnrollProfileAsync prompts you to speak into your microphone and adds the sample to the voice profile.

Speaker identification

Speaker Identification is used to determine who is speaking from a given group of enrolled voices. The process is very similar to text-independent verification, with the main difference being the ability to verify against multiple voice profiles at once, rather than verifying against a single profile.

TextIndependentIdentification function

Start by creating the TextIndependentIdentification function.

void TextIndependentIdentification(shared_ptr<VoiceProfileClient> client, shared_ptr<SpeakerRecognizer> recognizer)
{
    std::cout << "Speaker Identification:\n\n";
    // Create the profile.
    auto profile = client->CreateProfileAsync(VoiceProfileType::TextIndependentIdentification, profile_locale).get();
    std::cout << "Created profile ID: " << profile->GetId() << "\n";
    AddEnrollmentsToTextIndependentProfile(client, profile);
    SpeakerIdentify(profile, recognizer);
    // Delete the profile.
    client->DeleteProfileAsync(profile);
}

TextDependentVerificationTextIndependentVerification 函数一样,此函数使用 CreateProfileAsync 方法创建 VoiceProfile 对象。Like the TextDependentVerification and TextIndependentVerification functions, this function creates a VoiceProfile object with the CreateProfileAsync method.

In this case, you pass VoiceProfileType::TextIndependentIdentification to CreateProfileAsync.

You then call two helper functions: AddEnrollmentsToTextIndependentProfile, which you defined already, and SpeakerIdentify, which you'll define next. Finally, call DeleteProfileAsync to clean up the profile.

SpeakerIdentify function

Define the SpeakerIdentify function as follows.

void SpeakerIdentify(shared_ptr<VoiceProfile> profile, shared_ptr<SpeakerRecognizer> recognizer)
{
    shared_ptr<SpeakerIdentificationModel> model = SpeakerIdentificationModel::FromProfiles({ profile });
    // Note: We need at least four seconds of audio after pauses are subtracted.
    std::cout << "Please speak for at least ten seconds to identify who it is from your list of enrolled speakers.\n";
    shared_ptr<SpeakerRecognitionResult> result = recognizer->RecognizeOnceAsync(model).get();
    std::cout << "The most similar voice profile is: " << result->ProfileId << " with similarity score: " << result->GetScore() << ".\n\n";
}

In this function, you create a SpeakerIdentificationModel object with the SpeakerIdentificationModel::FromProfiles method. SpeakerIdentificationModel::FromProfiles accepts a list of VoiceProfile objects. In this case, you just pass in the VoiceProfile object you created earlier. However, if you want, you can pass in multiple VoiceProfile objects, each enrolled with audio samples from a different voice.

Next, SpeakerRecognizer::RecognizeOnceAsync prompts you to speak again. This time it compares your voice to the enrolled voice profiles and returns the most similar voice profile.

main function

Finally, define the main function as follows.

int main()
{
    auto speech_config = GetSpeechConfig();
    auto client = VoiceProfileClient::FromConfig(speech_config);
    auto recognizer = SpeakerRecognizer::FromConfig(speech_config, audio_config);
    TextDependentVerification(client, recognizer);
    TextIndependentVerification(client, recognizer);
    TextIndependentIdentification(client, recognizer);
    std::cout << "End of quickstart.\n";
}

This function simply calls the functions you defined previously. First, though, it creates a VoiceProfileClient object and a SpeakerRecognizer object.

auto speech_config = GetSpeechConfig();
auto client = VoiceProfileClient::FromConfig(speech_config);
auto recognizer = SpeakerRecognizer::FromConfig(speech_config, audio_config);

The VoiceProfileClient is used to create, enroll, and delete voice profiles. The SpeakerRecognizer is used to validate speech samples against one or more enrolled voice profiles.

Changing audio input type

The examples in this article use the default device microphone as input for audio samples. However, in scenarios where you need to use audio files instead of microphone input, simply change the following line:

auto audio_config = Audio::AudioConfig::FromDefaultMicrophoneInput();

to:

auto audio_config = Audio::AudioConfig::FromWavFileInput("path/to/your/file.wav");

Or replace any use of audio_config with Audio::AudioConfig::FromWavFileInput. You can also have mixed inputs, using a microphone for enrollment and files for verification, for example.

In this quickstart, you learn basic design patterns for Speaker Recognition using the Speech SDK, including:

  • Text-dependent and text-independent verification
  • Speaker identification to identify a voice sample among a group of voices
  • Deleting voice profiles

For a high-level look at Speaker Recognition concepts, see the overview article.

Skip to samples on GitHub

If you want to skip straight to sample code, see the JavaScript quickstart samples on GitHub.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Important

Speaker Recognition is currently only supported in Azure Speech resources created in the westus region.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK for JavaScript. Depending on your platform, use the corresponding installation instructions.

Additionally, depending on the target environment, use one of the following:

Download and extract the Speech SDK for JavaScript microsoft.cognitiveservices.speech.sdk.bundle.js file, and place it in a folder accessible to your HTML file.

<script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>

Tip

If you're targeting a web browser and using the <script> tag, the sdk prefix is not needed. The sdk prefix is an alias used to name the require module.
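For instance, a page that uses the bundle might look like the following sketch; it assumes the bundle exposes a global object named SpeechSDK, so adjust the name to match how your page loads it.

<script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>
<script>
  // With the bundle loaded via a script tag, no require() call is needed.
  var speechConfig = SpeechSDK.SpeechConfig.fromSubscription("YourSubscriptionKey", "westus");
</script>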

Import dependencies

To run the examples in this article, add the following statements at the top of your .js file.

"use strict";

/* To run this sample, install:
npm install microsoft-cognitiveservices-speech-sdk
*/
var sdk = require("microsoft-cognitiveservices-speech-sdk");
var fs = require("fs");

// Note: Change the locale if desired.
const profile_locale = "en-us";

/* Note: passphrase_files and verify_file should contain paths to audio files that contain \"My voice is my passport, verify me.\"
You can obtain these files from:
https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/fa6428a0837779cbeae172688e0286625e340942/quickstart/javascript/node/speaker-recognition/verification
*/ 
const passphrase_files = ["myVoiceIsMyPassportVerifyMe01.wav", "myVoiceIsMyPassportVerifyMe02.wav", "myVoiceIsMyPassportVerifyMe03.wav"];
const verify_file = "myVoiceIsMyPassportVerifyMe04.wav";
/* Note: identify_file should contain a path to an audio file that uses the same voice as the other files, but contains different speech. You can obtain this file from:
https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/fa6428a0837779cbeae172688e0286625e340942/quickstart/javascript/node/speaker-recognition/identification
*/
const identify_file = "aboutSpeechSdk.wav";

const key_var = 'SPEECH_SUBSCRIPTION_KEY';
if (!process.env[key_var]) {
    throw new Error('please set/export the following environment variable: ' + key_var);
}
var subscription_key = process.env[key_var];

const region_var = 'SPEECH_REGION';
if (!process.env[region_var]) {
    throw new Error('please set/export the following environment variable: ' + region_var);
}
var region = process.env[region_var];

const ticks_per_second = 10000000;

These statements import the required libraries and get your Speech service subscription key and region from your environment variables. They also specify paths to audio files that you will use in the following tasks.

Create a helper function

Add the following helper function to read audio files into streams for use by the Speech service.

/* From: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/fa6428a0837779cbeae172688e0286625e340942/quickstart/javascript/node/speaker-recognition/verification/dependent-verification.js#L8
*/
function GetAudioConfigFromFile (file)
{
    let pushStream = sdk.AudioInputStream.createPushStream();
    fs.createReadStream(file).on("data", function(arrayBuffer) {
        pushStream.write(arrayBuffer.buffer);
    }).on("end", function() {
        pushStream.close();
    });
    return sdk.AudioConfig.fromStreamInput(pushStream);
}

In this function, you use the AudioInputStream.createPushStream and AudioConfig.fromStreamInput methods to create an AudioConfig object. This AudioConfig object represents an audio stream. You will use several of these AudioConfig objects during the following tasks.

Text-dependent verification

Speaker Verification is the act of confirming that a speaker matches a known, or enrolled, voice. The first step is to enroll a voice profile, so that the service has something to compare future voice samples against. In this example, you enroll the profile using a text-dependent strategy, which requires a specific passphrase to use for both enrollment and verification. See the reference docs for a list of supported passphrases.

TextDependentVerification function

Start by creating the TextDependentVerification function.

async function TextDependentVerification(client, speech_config)
{
    console.log ("Text Dependent Verification:\n");
    var profile = null;
    try {
        // Create the profile.
        profile = await new Promise ((resolve, reject) => {
            client.createProfileAsync (sdk.VoiceProfileType.TextDependentVerification, profile_locale, result => { resolve(result); }, error => { reject(error); });
        });
        console.log ("Created profile ID: " + profile.profileId);
        await AddEnrollmentsToTextDependentProfile(client, profile, passphrase_files);
        const audio_config = GetAudioConfigFromFile(verify_file);
        const recognizer = new sdk.SpeakerRecognizer(speech_config, audio_config);
        await SpeakerVerify(profile, recognizer);
    }
    catch (error) {
        console.log ("Error:\n" + error);
    }
    finally {
        if (profile !== null) {
            console.log ("Deleting profile ID: " + profile.profileId);
            await new Promise ((resolve, reject) => {
                client.deleteProfileAsync (profile, result => { resolve(result); }, error => { reject(error); });
            });
        }
    }
}

This function creates a VoiceProfile object with the VoiceProfileClient.createProfileAsync method. Note that there are three types of VoiceProfile:

  • TextIndependentIdentification
  • TextDependentVerification
  • TextIndependentVerification

In this case, you pass VoiceProfileType.TextDependentVerification to VoiceProfileClient.createProfileAsync.

You then call two helper functions that you'll define next, AddEnrollmentsToTextDependentProfile and SpeakerVerify. Finally, call VoiceProfileClient.deleteProfileAsync to remove the profile.

AddEnrollmentsToTextDependentProfile function

Define the following function to enroll a voice profile.

async function AddEnrollmentsToTextDependentProfile(client, profile, audio_files)
{
    for (var i = 0; i < audio_files.length; i++) {
        console.log ("Adding enrollment to text dependent profile...");
        const audio_config = GetAudioConfigFromFile (audio_files[i]);
        const result = await new Promise ((resolve, reject) => {
            client.enrollProfileAsync (profile, audio_config, result => { resolve(result); }, error => { reject(error); });
        });
        if (result.reason === sdk.ResultReason.Canceled) {
            throw(JSON.stringify(sdk.VoiceProfileEnrollmentCancellationDetails.fromResult(result)));
        }
        else {
            console.log ("Remaining enrollments needed: " + result.privDetails["remainingEnrollmentsCount"] + ".");
        }
    };
    console.log ("Enrollment completed.\n");
}

In this function, you call the GetAudioConfigFromFile function you defined earlier to create AudioConfig objects from audio samples. These audio samples contain a passphrase, such as "My voice is my passport, verify me." You then enroll these audio samples using the VoiceProfileClient.enrollProfileAsync method.

SpeakerVerify function

Define SpeakerVerify as follows.

async function SpeakerVerify(profile, recognizer)
{
    const model = sdk.SpeakerVerificationModel.fromProfile(profile);
    const result = await new Promise ((resolve, reject) => {
        recognizer.recognizeOnceAsync (model, result => { resolve(result); }, error => { reject(error); });
    });
    console.log ("Verified voice profile for speaker: " + result.profileId + ". Score is: " + result.score + ".\n");
}

In this function, you create a SpeakerVerificationModel object with the SpeakerVerificationModel.fromProfile method, passing in the VoiceProfile object you created earlier.

Next, you call the SpeakerRecognizer.recognizeOnceAsync method to validate an audio sample that contains the same passphrase as the audio samples you enrolled previously. SpeakerRecognizer.recognizeOnceAsync returns a SpeakerRecognitionResult object, whose score property contains a similarity score ranging from 0.0 to 1.0. The SpeakerRecognitionResult object also contains a reason property of type ResultReason. If the verification was successful, the reason property should have the value RecognizedSpeaker.
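As a sketch, SpeakerVerify could check the reason before reporting the score, along these lines:

async function SpeakerVerify(profile, recognizer)
{
    const model = sdk.SpeakerVerificationModel.fromProfile(profile);
    const result = await new Promise ((resolve, reject) => {
        recognizer.recognizeOnceAsync (model, result => { resolve(result); }, error => { reject(error); });
    });
    // RecognizedSpeaker indicates the verification succeeded
    if (result.reason === sdk.ResultReason.RecognizedSpeaker) {
        console.log ("Verified voice profile for speaker: " + result.profileId + ". Score is: " + result.score + ".\n");
    } else {
        console.log ("Verification failed with reason: " + result.reason + ".\n");
    }
}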

Text-independent verification

In contrast to text-dependent verification, text-independent verification:

  • Does not require a certain passphrase to be spoken; anything can be spoken
  • Does not require three audio samples, but does require 20 seconds of total audio

TextIndependentVerification function

Start by creating the TextIndependentVerification function.

async function TextIndependentVerification(client, speech_config)
{
    console.log ("Text Independent Verification:\n");
    var profile = null;
    try {
        // Create the profile.
        profile = await new Promise ((resolve, reject) => {
            client.createProfileAsync (sdk.VoiceProfileType.TextIndependentVerification, profile_locale, result => { resolve(result); }, error => { reject(error); });
        });
        console.log ("Created profile ID: " + profile.profileId);
        await AddEnrollmentsToTextIndependentProfile(client, profile, [identify_file]);
        const audio_config = GetAudioConfigFromFile(passphrase_files[0]);
        const recognizer = new sdk.SpeakerRecognizer(speech_config, audio_config);
        await SpeakerVerify(profile, recognizer);
    }
    catch (error) {
        console.log ("Error:\n" + error);
    }
    finally {
        if (profile !== null) {
            console.log ("Deleting profile ID: " + profile.profileId);
            await new Promise ((resolve, reject) => {
                client.deleteProfileAsync (profile, result => { resolve(result); }, error => { reject(error); });
            });
        }
    }
}

TextDependentVerification 函数一样,此函数使用 VoiceProfileClient.createProfileAsync 方法创建 VoiceProfile 对象。Like the TextDependentVerification function, this function creates a VoiceProfile object with the VoiceProfileClient.createProfileAsync method.

In this case, you pass VoiceProfileType.TextIndependentVerification to createProfileAsync.

You then call two helper functions: AddEnrollmentsToTextIndependentProfile, which you'll define next, and SpeakerVerify, which you defined already. Finally, call VoiceProfileClient.deleteProfileAsync to remove the profile.

AddEnrollmentsToTextIndependentProfile function

Define the following function to enroll a voice profile.

async function AddEnrollmentsToTextIndependentProfile(client, profile, audio_files)
{
    for (var i = 0; i < audio_files.length; i++) {
        console.log ("Adding enrollment to text independent profile...");
        const audio_config = GetAudioConfigFromFile (audio_files[i]);
        const result = await new Promise ((resolve, reject) => {
            client.enrollProfileAsync (profile, audio_config, result => { resolve(result); }, error => { reject(error); });
        });
        if (result.reason === sdk.ResultReason.Canceled) {
            throw(JSON.stringify(sdk.VoiceProfileEnrollmentCancellationDetails.fromResult(result)));
        }
        else {
            console.log ("Remaining audio time needed: " + (result.privDetails["remainingEnrollmentsSpeechLength"] / ticks_per_second) + " seconds.");
        }
    }
    console.log ("Enrollment completed.\n");
}

In this function, you call the GetAudioConfigFromFile function you defined earlier to create AudioConfig objects from audio samples. You then enroll these audio samples using the VoiceProfileClient.enrollProfileAsync method.

Speaker identification

Speaker Identification is used to determine who is speaking from a given group of enrolled voices. The process is similar to text-independent verification, with the main difference being the ability to verify against multiple voice profiles at once, rather than verifying against a single profile.

TextIndependentIdentification function

Start by creating the TextIndependentIdentification function.

async function TextIndependentIdentification(client, speech_config)
{
    console.log ("Text Independent Identification:\n");
    var profile = null;
    try {
        // Create the profile.
        profile = await new Promise ((resolve, reject) => {
            client.createProfileAsync (sdk.VoiceProfileType.TextIndependentIdentification, profile_locale, result => { resolve(result); }, error => { reject(error); });
        });
        console.log ("Created profile ID: " + profile.profileId);
        await AddEnrollmentsToTextIndependentProfile(client, profile, [identify_file]);
        const audio_config = GetAudioConfigFromFile(passphrase_files[0]);
        const recognizer = new sdk.SpeakerRecognizer(speech_config, audio_config);
        await SpeakerIdentify(profile, recognizer);
    }
    catch (error) {
        console.log ("Error:\n" + error);
    }
    finally {
        if (profile !== null) {
            console.log ("Deleting profile ID: " + profile.profileId);
            await new Promise ((resolve, reject) => {
                client.deleteProfileAsync (profile, result => { resolve(result); }, error => { reject(error); });
            });
        }
    }
}

TextDependentVerificationTextIndependentVerification 函数一样,此函数使用 VoiceProfileClient.createProfileAsync 方法创建 VoiceProfile 对象。Like the TextDependentVerification and TextIndependentVerification functions, this function creates a VoiceProfile object with the VoiceProfileClient.createProfileAsync method.

In this case, you pass VoiceProfileType.TextIndependentIdentification to VoiceProfileClient.createProfileAsync.

You then call two helper functions: AddEnrollmentsToTextIndependentProfile, which you defined already, and SpeakerIdentify, which you'll define next. Finally, call VoiceProfileClient.deleteProfileAsync to remove the profile.

SpeakerIdentify function

Define the SpeakerIdentify function as follows.

async function SpeakerIdentify(profile, recognizer)
{
    const model = sdk.SpeakerIdentificationModel.fromProfiles([profile]);
    const result = await new Promise ((resolve, reject) => {
        recognizer.recognizeOnceAsync (model, result => { resolve(result); }, error => { reject(error); });
    });
    console.log ("The most similar voice profile is: " + result.profileId + " with similarity score: " + result.score + ".\n");
}

In this function, you create a SpeakerIdentificationModel object with the SpeakerIdentificationModel.fromProfiles method, passing in the VoiceProfile object you created earlier.

Next, you call the SpeakerRecognizer.recognizeOnceAsync method and pass in an audio sample. SpeakerRecognizer.recognizeOnceAsync tries to identify the voice for this audio sample based on the VoiceProfile objects you used to create the SpeakerIdentificationModel. It returns a SpeakerRecognitionResult object, whose profileId property identifies the matching VoiceProfile, if any, while the score property contains a similarity score ranging from 0.0 to 1.0.

main function

Finally, define the main function as follows.

async function main() {
    const speech_config = sdk.SpeechConfig.fromSubscription(subscription_key, region);
    const client = new sdk.VoiceProfileClient(speech_config);

    await TextDependentVerification(client, speech_config);
    await TextIndependentVerification(client, speech_config);
    await TextIndependentIdentification(client, speech_config);
    console.log ("End of quickstart.");
}
main();

This function creates a VoiceProfileClient object, which is used to create, enroll, and delete voice profiles. Then it calls the functions you defined previously.

In this quickstart, you learn basic design patterns for Speaker Recognition using the Speech SDK, including:

  • Text-dependent and text-independent verification
  • Speaker identification to identify a voice sample among a group of voices
  • Deleting voice profiles

For a high-level look at Speaker Recognition concepts, see the overview article.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Important

Speaker Recognition is currently only supported in Azure Speech resources created in the westus region.

Text-dependent verification

Speaker Verification is the act of confirming that a speaker matches a known, or enrolled, voice. The first step is to enroll a voice profile, so that the service has something to compare future voice samples against. In this example, you enroll the profile using a text-dependent strategy, which requires a specific passphrase to use for both enrollment and verification. See the reference docs for a list of supported passphrases.

Start by creating a voice profile. You will need to insert your Speech service subscription key and endpoint into each of the curl commands in this article.

# Note: Change the locale if needed.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/json' \
--data-raw '{
    '\''locale'\'':'\''en-us'\''
}'
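For reference, a filled-in version of the command above might look like the following sketch. The endpoint shown is an assumption based on the standard Cognitive Services pattern for a westus resource; confirm the actual value on your resource's Keys and Endpoint page in the Azure portal.

curl --location --request POST 'https://westus.api.cognitive.microsoft.com/speaker/verification/v2.0/text-dependent/profiles' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/json' \
--data-raw '{
    '\''locale'\'':'\''en-us'\''
}'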

Note that there are three types of voice profile:

  • 依赖于文本的验证Text-dependent verification
  • 独立于文本的验证Text-independent verification
  • 独立于文本的识别Text-independent identification

在本例中,创建依赖于文本的验证语音配置文件。In this case, you create a text-dependent verification voice profile. 应该会收到以下响应。You should receive the following response.

{
    "remainingEnrollmentsCount": 3,
    "locale": "en-us",
    "createdDateTime": "2020-09-29T14:54:29.683Z",
    "enrollmentStatus": "Enrolling",
    "modelVersion": null,
    "profileId": "714ce523-de76-4220-b93f-7c1cc1882d6e",
    "lastUpdatedDateTime": null,
    "enrollmentsCount": 0,
    "enrollmentsLength": 0.0,
    "enrollmentSpeechLength": 0.0
}

Next, you enroll the voice profile. For the --data-binary parameter value, specify an audio file on your computer that contains one of the supported passphrases, such as "my voice is my passport, verify me." You can record such an audio file with an app such as Windows Voice Recorder, or you can generate it using text-to-speech.
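
As a sketch of the text-to-speech option, the following assumes the standard text-to-speech REST endpoint for a westus resource; the voice name and output format shown are illustrative choices, not requirements.

# A sketch, assuming a westus text-to-speech endpoint; the voice name and output
# format are illustrative. Writes a WAV file containing the passphrase.
curl --location --request POST 'https://westus.tts.speech.microsoft.com/cognitiveservices/v1' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/ssml+xml' \
--header 'X-Microsoft-OutputFormat: riff-16khz-16bit-mono-pcm' \
--data-raw '<speak version="1.0" xml:lang="en-US"><voice name="en-US-AriaNeural">My voice is my passport, verify me.</voice></speak>' \
--output passphrase.wav

With a recording in hand, submit it to the enrollment endpoint: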

curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles/INSERT_PROFILE_ID_HERE/enrollments' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'

You should receive the following response.

{
    "remainingEnrollmentsCount": 2,
    "passPhrase": "my voice is my passport verify me",
    "profileId": "714ce523-de76-4220-b93f-7c1cc1882d6e",
    "enrollmentStatus": "Enrolling",
    "enrollmentsCount": 1,
    "enrollmentsLength": 3.5,
    "enrollmentsSpeechLength": 2.88,
    "audioLength": 3.5,
    "audioSpeechLength": 2.88
}

This response tells you that you need to enroll two more audio samples.
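
To supply the remaining samples, repeat the same enrollment call; a minimal sketch, assuming two more recordings of the passphrase saved locally (the file names are placeholders):

# Repeat the enrollment call once per remaining sample; file names are placeholders.
for f in passphrase2.wav passphrase3.wav; do
  curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles/INSERT_PROFILE_ID_HERE/enrollments' \
  --header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
  --header 'Content-Type: audio/wav' \
  --data-binary @"$f"
done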

After you have enrolled a total of three audio samples, you should receive the following response.

{
    "remainingEnrollmentsCount": 0,
    "passPhrase": "my voice is my passport verify me",
    "profileId": "714ce523-de76-4220-b93f-7c1cc1882d6e",
    "enrollmentStatus": "Enrolled",
    "enrollmentsCount": 3,
    "enrollmentsLength": 10.5,
    "enrollmentsSpeechLength": 8.64,
    "audioLength": 3.5,
    "audioSpeechLength": 2.88
}

Now you are ready to verify an audio sample against the voice profile. This audio sample should contain the same passphrase as the samples you used to enroll the voice profile.

curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles/INSERT_PROFILE_ID_HERE/verify' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'

You should receive the following response.

{
    "recognitionResult": "Accept",
    "score": 1.0
}

Accept means the passphrase matched and the verification succeeded. The response also contains a similarity score from 0.0 to 1.0.
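
If you're scripting these calls, you can pull individual fields out of the JSON with a tool such as jq; a sketch, assuming jq is installed:

# A sketch, assuming jq is installed: extract the decision and score from the response.
curl --silent --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles/INSERT_PROFILE_ID_HERE/verify' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE' | jq -r '"\(.recognitionResult) (score: \(.score))"'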

To finish, delete the voice profile.

curl --location --request DELETE \
'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles/INSERT_PROFILE_ID_HERE' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE'

There is no response body.
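
If you want confirmation that the delete succeeded, you can ask curl to print the status line with -i; an empty-bodied success status (2xx) indicates the profile was deleted.

# Add -i to print the HTTP status line along with the (empty) response body.
curl -i --location --request DELETE \
'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles/INSERT_PROFILE_ID_HERE' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE'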

Text-independent verification

In contrast to text-dependent verification, text-independent verification:

  • Does not require a specific passphrase to be spoken; anything can be spoken
  • Does not require three audio samples, but does require 20 seconds of total audio

Start by creating a text-independent verification profile.

curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-independent/profiles' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/json' \
--data-raw '{
    "locale": "en-us"
}'

You should receive the following response.

{
    "remainingEnrollmentsSpeechLength": 20.0,
    "locale": "en-us",
    "createdDateTime": "2020-09-29T16:08:52.409Z",
    "enrollmentStatus": "Enrolling",
    "modelVersion": null,
    "profileId": "3f85dca9-ffc9-4011-bf21-37fad2beb4d2",
    "lastUpdatedDateTime": null,
    "enrollmentsCount": 0,
    "enrollmentsLength": 0.0,
    "enrollmentSpeechLength": 0.0
}

Next, enroll the voice profile. This time, rather than submitting three audio samples, you submit audio samples that contain a total of 20 seconds of audio.

curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE/enrollments' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'
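
If a single recording is shorter than 20 seconds, repeat the call with additional recordings until remainingEnrollmentsSpeechLength in the response reaches zero; a sketch, assuming jq is installed and using placeholder file names:

# Submit recordings until the service reports no remaining speech requirement.
for f in clip1.wav clip2.wav clip3.wav; do
  curl --silent --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE/enrollments' \
  --header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
  --header 'Content-Type: audio/wav' \
  --data-binary @"$f" | jq '.remainingEnrollmentsSpeechLength'
done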

Once you have submitted enough audio samples, you should receive the following response.

{
    "remainingEnrollmentsSpeechLength": 0.0,
    "profileId": "3f85dca9-ffc9-4011-bf21-37fad2beb4d2",
    "enrollmentStatus": "Enrolled",
    "enrollmentsCount": 1,
    "enrollmentsLength": 33.16,
    "enrollmentsSpeechLength": 29.21,
    "audioLength": 33.16,
    "audioSpeechLength": 29.21
}

Now you are ready to verify an audio sample against the voice profile. Again, this audio sample does not need to contain a passphrase; it can contain any speech, as long as it includes at least four seconds of audio in total.

curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE/verify' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'

You should receive the following response.

{
    "recognitionResult": "Accept",
    "score": 0.9196669459342957
}

Accept means the verification succeeded. The response also contains a similarity score from 0.0 to 1.0.

To finish, delete the voice profile.

curl --location --request DELETE 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE'

There is no response body.

Speaker identification

Speaker identification is used to determine who is speaking from a given group of enrolled voices. The process is similar to text-independent verification; the main difference is that you can verify against multiple voice profiles at once rather than against a single profile.

Start by creating a text-independent identification profile.

# Note: Change the locale if needed.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/json' \
--data-raw '{
    "locale": "en-us"
}'

You should receive the following response.

{
    "remainingEnrollmentsSpeechLength": 20.0,
    "locale": "en-us",
    "createdDateTime": "2020-09-22T17:25:48.642Z",
    "enrollmentStatus": "Enrolling",
    "modelVersion": null,
    "profileId": "de99ab38-36c8-4b82-b137-510907c61fe8",
    "lastUpdatedDateTime": null,
    "enrollmentsCount": 0,
    "enrollmentsLength": 0.0,
    "enrollmentSpeechLength": 0.0
}

Next, you enroll the voice profile. Again, you need to submit audio samples that contain a total of 20 seconds of audio. These samples do not need to contain a passphrase.

curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE/enrollments' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'

Once you have submitted enough audio samples, you should receive the following response.

{
    "remainingEnrollmentsSpeechLength": 0.0,
    "profileId": "de99ab38-36c8-4b82-b137-510907c61fe8",
    "enrollmentStatus": "Enrolled",
    "enrollmentsCount": 2,
    "enrollmentsLength": 36.69,
    "enrollmentsSpeechLength": 31.95,
    "audioLength": 33.16,
    "audioSpeechLength": 29.21
}

Now you are ready to identify an audio sample using the voice profile. The identify command accepts a comma-delimited list of possible voice profile IDs. In this case, you pass in just the ID of the voice profile you created previously. If you want, however, you can pass in multiple voice profile IDs, where each profile is enrolled with audio samples from a different voice; see the sketch after the response below.

curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles/identifySingleSpeaker?profileIds=INSERT_PROFILE_ID_HERE' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'

You should receive the following response.

Success:
{
    "identifiedProfile": {
        "profileId": "de99ab38-36c8-4b82-b137-510907c61fe8",
        "score": 0.9083486
    },
    "profilesRanking": [
        {
            "profileId": "de99ab38-36c8-4b82-b137-510907c61fe8",
            "score": 0.9083486
        }
    ]
}

The response contains the ID of the voice profile that most closely matches the audio sample you submitted. It also contains a list of candidate voice profiles ranked in order of similarity.
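
If you enroll additional profiles, you can identify against several at once by listing their IDs; a sketch with two placeholder profile IDs:

# Identify against multiple candidate profiles; both profile IDs are placeholders.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles/identifySingleSpeaker?profileIds=INSERT_PROFILE_ID_1,INSERT_PROFILE_ID_2' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'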

To finish, delete the voice profile.

curl --location --request DELETE \
'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE'

There is no response body.

Next steps

  • See the Speaker Recognition reference documentation for details on classes and functions.

  • See C# and C++ samples on GitHub.