Get started with Speaker Recognition
In this quickstart, you learn basic design patterns for Speaker Recognition using the Speech SDK, including:
- Text-dependent and text-independent verification
- Speaker identification to identify a voice sample among a group of voices
- Deleting voice profiles
For a high-level look at Speaker Recognition concepts, see the overview article. For a list of supported platforms, see the Reference section in the left navigation pane.
Prerequisites
This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.
Important
Microsoft limits access to Speaker Recognition. Apply to use it through the Azure Cognitive Services Speaker Recognition Limited Access Review. After approval, you can access the Speaker Recognition APIs.
Install the Speech SDK
Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:
Import dependencies
To run the examples in this article, include the following using statements at the top of your code file.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
Create a speech configuration
To call the Speech service using the Speech SDK, you need to create a SpeechConfig. In this example, you create a SpeechConfig using a subscription key and region. You also create some basic boilerplate code to use for the rest of this article, which you modify for different customizations.
public class Program
{
static async Task Main(string[] args)
{
// replace with your own subscription key
string subscriptionKey = "YourSubscriptionKey";
// replace with your own subscription region
string region = "YourSubscriptionRegion";
var config = SpeechConfig.FromSubscription(subscriptionKey, region);
}
}
Text-dependent verification
Speaker Verification is the act of confirming that a speaker matches a known, or enrolled, voice. The first step is to enroll a voice profile, so that the service has something to compare future voice samples against. In this example, you enroll the profile using a text-dependent strategy, which requires a specific passphrase to use for both enrollment and verification. See the reference docs for a list of supported passphrases.
Start by creating the following function in your Program class to enroll a voice profile.
public static async Task VerificationEnroll(SpeechConfig config, Dictionary<string, string> profileMapping)
{
using (var client = new VoiceProfileClient(config))
using (var profile = await client.CreateProfileAsync(VoiceProfileType.TextDependentVerification, "en-us"))
using (var phraseResult = await client.GetActivationPhrasesAsync(VoiceProfileType.TextDependentVerification, "en-us"))
{
using (var audioInput = AudioConfig.FromDefaultMicrophoneInput())
{
Console.WriteLine($"Enrolling profile id {profile.Id}.");
// give the profile a human-readable display name
profileMapping.Add(profile.Id, "Your Name");
VoiceProfileEnrollmentResult result = null;
while (result is null || result.RemainingEnrollmentsCount > 0)
{
Console.WriteLine($"Speak the passphrase, \"${phraseResult.Phrases[0]}\"");
result = await client.EnrollProfileAsync(profile, audioInput);
Console.WriteLine($"Remaining enrollments needed: {result.RemainingEnrollmentsCount}");
Console.WriteLine("");
}
if (result.Reason == ResultReason.EnrolledVoiceProfile)
{
await SpeakerVerify(config, profile, profileMapping);
}
else if (result.Reason == ResultReason.Canceled)
{
var cancellation = VoiceProfileEnrollmentCancellationDetails.FromResult(result);
Console.WriteLine($"CANCELED {profile.Id}: ErrorCode={cancellation.ErrorCode} ErrorDetails={cancellation.ErrorDetails}");
}
}
}
}
In this function, await client.CreateProfileAsync() is what actually creates the new voice profile. After it's created, you specify how you'll input audio samples, using AudioConfig.FromDefaultMicrophoneInput() in this example to capture audio from your default input device. Next, you enroll audio samples in a while loop that tracks the number of samples remaining, and required, for enrollment. In each iteration, client.EnrollProfileAsync(profile, audioInput) prompts you to speak the passphrase into your microphone and adds the sample to the voice profile.
After enrollment is completed, you call await SpeakerVerify(config, profile, profileMapping) to verify against the profile you just created. Add another function to define SpeakerVerify.
public static async Task SpeakerVerify(SpeechConfig config, VoiceProfile profile, Dictionary<string, string> profileMapping)
{
var speakerRecognizer = new SpeakerRecognizer(config, AudioConfig.FromDefaultMicrophoneInput());
var model = SpeakerVerificationModel.FromProfile(profile);
Console.WriteLine("Speak the passphrase to verify: \"My voice is my passport, please verify me.\"");
var result = await speakerRecognizer.RecognizeOnceAsync(model);
Console.WriteLine($"Verified voice profile for speaker {profileMapping[result.ProfileId]}, score is {result.Score}");
}
In this function, you pass the VoiceProfile object you just created to initialize a model to verify against. Next, await speakerRecognizer.RecognizeOnceAsync(model) prompts you to speak the passphrase again, but this time it validates it against your voice profile and returns a similarity score ranging from 0.0 to 1.0. The result object also returns Accept or Reject, based on whether or not the passphrase matches.
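For example, you could extend SpeakerVerify to branch on the result's Reason before acting on the score. This is a minimal sketch that reuses the result returned by RecognizeOnceAsync; it assumes SpeakerRecognitionCancellationDetails follows the same pattern as the enrollment cancellation details shown earlier.
// Sketch: branch on the result reason before acting on the score.
if (result.Reason == ResultReason.RecognizedSpeaker)
{
    Console.WriteLine($"Accepted. Score: {result.Score}");
}
else if (result.Reason == ResultReason.Canceled)
{
    // Assumption: this mirrors the enrollment cancellation pattern shown earlier.
    var cancellation = SpeakerRecognitionCancellationDetails.FromResult(result);
    Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode} ErrorDetails={cancellation.ErrorDetails}");
}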
Next, modify your Main() function to call the new functions you created. Note that you create a Dictionary<string, string> to pass by reference through your function calls. The service doesn't allow you to store a human-readable name with a created VoiceProfile; for privacy purposes, it stores only an ID. In the VerificationEnroll function, you add an entry to this dictionary with the newly created ID and a text name. In application development scenarios where you need to display a human-readable name, you must store this mapping somewhere yourself; the service can't store it.
static async Task Main(string[] args)
{
// replace with your own subscription key
string subscriptionKey = "YourSubscriptionKey";
// replace with your own subscription region
string region = "YourSubscriptionRegion";
var config = SpeechConfig.FromSubscription(subscriptionKey, region);
// persist profileMapping if you want to store a record of who the profile is
var profileMapping = new Dictionary<string, string>();
await VerificationEnroll(config, profileMapping);
Console.ReadLine();
}
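If you want the profile-to-name mapping to survive between runs, you could serialize the dictionary yourself, since the service won't store it for you. The following is a minimal sketch using System.Text.Json; the file name is a placeholder.
// Requires: using System.IO; using System.Text.Json;
// Save the mapping after enrollment...
File.WriteAllText("profileMapping.json", JsonSerializer.Serialize(profileMapping));
// ...and restore it on the next run.
var restored = JsonSerializer.Deserialize<Dictionary<string, string>>(
    File.ReadAllText("profileMapping.json"));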
Run the script, and you are prompted to speak the phrase "My voice is my passport, verify me" three times for enrollment, and one additional time for verification. The result returned is the similarity score, which you can use to create your own custom thresholds for verification.
Enrolling profile id 87-2cef-4dff-995b-dcefb64e203f.
Speak the passphrase, "My voice is my passport, verify me."
Remaining enrollments needed: 2
Speak the passphrase, "My voice is my passport, verify me."
Remaining enrollments needed: 1
Speak the passphrase, "My voice is my passport, verify me."
Remaining enrollments needed: 0
Speak the passphrase to verify: "My voice is my passport, verify me."
Verified voice profile for speaker Your Name, score is 0.915581
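Because the service returns only a similarity score, the acceptance policy is up to your application. For example, inside SpeakerVerify you could apply your own cutoff to the returned score; the 0.8 value below is an arbitrary placeholder.
// Apply a custom acceptance threshold to the similarity score.
// The 0.8 cutoff is a placeholder; tune it for your scenario.
const double threshold = 0.8;
bool accepted = result.Score >= threshold;
Console.WriteLine(accepted ? "Speaker verified." : "Speaker rejected.");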
Text-independent verification
In contrast to text-dependent verification, text-independent verification:
- Does not require a certain passphrase to be spoken; anything can be spoken
- Does not require three audio samples, but does require 20 seconds of total audio
Make a couple of simple changes to your VerificationEnroll function to switch to text-independent verification. First, change the verification type to VoiceProfileType.TextIndependentVerification. Next, change the while loop to track result.RemainingEnrollmentsSpeechLength, which continues to prompt you to speak until 20 seconds of audio have been captured.
public static async Task VerificationEnroll(SpeechConfig config, Dictionary<string, string> profileMapping)
{
using (var client = new VoiceProfileClient(config))
using (var profile = await client.CreateProfileAsync(VoiceProfileType.TextIndependentVerification, "en-us"))
using (var phraseResult = await client.GetActivationPhrasesAsync(VoiceProfileType.TextIndependentVerification, "en-us"))
{
using (var audioInput = AudioConfig.FromDefaultMicrophoneInput())
{
Console.WriteLine($"Enrolling profile id {profile.Id}.");
// give the profile a human-readable display name
profileMapping.Add(profile.Id, "Your Name");
VoiceProfileEnrollmentResult result = null;
while (result is null || result.RemainingEnrollmentsSpeechLength > TimeSpan.Zero)
{
Console.WriteLine($"Speak the activation phrase, \"${phraseResult.Phrases[0]}\"");
result = await client.EnrollProfileAsync(profile, audioInput);
Console.WriteLine($"Remaining enrollment audio time needed: {result.RemainingEnrollmentsSpeechLength}");
Console.WriteLine("");
}
if (result.Reason == ResultReason.EnrolledVoiceProfile)
{
await SpeakerVerify(config, profile, profileMapping);
}
else if (result.Reason == ResultReason.Canceled)
{
var cancellation = VoiceProfileEnrollmentCancellationDetails.FromResult(result);
Console.WriteLine($"CANCELED {profile.Id}: ErrorCode={cancellation.ErrorCode} ErrorDetails={cancellation.ErrorDetails}");
}
}
}
}
Run the program again. When verification completes, the similarity score is returned.
Enrolling profile id 4tt87d4-f2d3-44ae-b5b4-f1a8d4036ee9.
Speak the activation phrase, "<FIRST ACTIVATION PHRASE>"
Remaining enrollment audio time needed: 00:00:15.3200000
Speak the activation phrase, "<FIRST ACTIVATION PHRASE>"
Remaining enrollment audio time needed: 00:00:09.8100008
Speak the activation phrase, "<FIRST ACTIVATION PHRASE>"
Remaining enrollment audio time needed: 00:00:05.1900000
Speak the activation phrase, "<FIRST ACTIVATION PHRASE>"
Remaining enrollment audio time needed: 00:00:00.8700000
Speak the activation phrase, "<FIRST ACTIVATION PHRASE>"
Remaining enrollment audio time needed: 00:00:00
Speak the passphrase to verify: "My voice is my passport, verify me."
Verified voice profile for speaker Your Name, score is 0.849409
Speaker identification
Speaker Identification is used to determine who is speaking from a given group of enrolled voices. The process is similar to text-independent verification, with the main difference that you can verify against multiple voice profiles at once, rather than against a single profile.
Create a function IdentificationEnroll to enroll multiple voice profiles. The enrollment process for each profile is the same as the enrollment process for text-independent verification, and requires 20 seconds of audio for each profile. This function accepts a list of strings profileNames, and will create a new voice profile for each name in the list. The function returns a list of VoiceProfile objects, which you use in the next function for identifying a speaker.
public static async Task<List<VoiceProfile>> IdentificationEnroll(SpeechConfig config, List<string> profileNames, Dictionary<string, string> profileMapping)
{
List<VoiceProfile> voiceProfiles = new List<VoiceProfile>();
using (var client = new VoiceProfileClient(config))
using (var phraseResult = await client.GetActivationPhrasesAsync(VoiceProfileType.TextIndependentVerification, "en-us"))
{
foreach (string name in profileNames)
{
using (var audioInput = AudioConfig.FromDefaultMicrophoneInput())
{
var profile = await client.CreateProfileAsync(VoiceProfileType.TextIndependentIdentification, "en-us");
Console.WriteLine($"Creating voice profile for {name}.");
profileMapping.Add(profile.Id, name);
VoiceProfileEnrollmentResult result = null;
while (result is null || result.RemainingEnrollmentsSpeechLength > TimeSpan.Zero)
{
Console.WriteLine($"Speak the activation phrase, \"${phraseResult.Phrases[0]}\" to add to the profile enrollment sample for {name}.");
result = await client.EnrollProfileAsync(profile, audioInput);
Console.WriteLine($"Remaining enrollment audio time needed: {result.RemainingEnrollmentsSpeechLength}");
Console.WriteLine("");
}
voiceProfiles.Add(profile);
}
}
}
return voiceProfiles;
}
Create the following function SpeakerIdentification to submit an identification request. The main difference in this function compared to a speaker verification request is the use of SpeakerIdentificationModel.FromProfiles(), which accepts a list of VoiceProfile objects.
public static async Task SpeakerIdentification(SpeechConfig config, List<VoiceProfile> voiceProfiles, Dictionary<string, string> profileMapping)
{
var speakerRecognizer = new SpeakerRecognizer(config, AudioConfig.FromDefaultMicrophoneInput());
var model = SpeakerIdentificationModel.FromProfiles(voiceProfiles);
Console.WriteLine("Speak some text to identify who it is from your list of enrolled speakers.");
var result = await speakerRecognizer.RecognizeOnceAsync(model);
Console.WriteLine($"The most similar voice profile is {profileMapping[result.ProfileId]} with similarity score {result.Score}");
}
Change your Main() function to the following. You create a list of strings profileNames, which you pass to your IdentificationEnroll() function. This will prompt you to create a new voice profile for each name in this list, so you can add more names to create additional profiles for friends or colleagues.
static async Task Main(string[] args)
{
// replace with your own subscription key
string subscriptionKey = "YourSubscriptionKey";
// replace with your own subscription region
string region = "YourSubscriptionRegion";
var config = SpeechConfig.FromSubscription(subscriptionKey, region);
// persist profileMapping if you want to store a record of who the profile is
var profileMapping = new Dictionary<string, string>();
var profileNames = new List<string>() { "Your name", "A friend's name" };
var enrolledProfiles = await IdentificationEnroll(config, profileNames, profileMapping);
await SpeakerIdentification(config, enrolledProfiles, profileMapping);
foreach (var profile in enrolledProfiles)
{
profile.Dispose();
}
Console.ReadLine();
}
Run the script, and you are prompted to speak to enroll voice samples for the first profile. After the enrollment is completed, you are prompted to repeat this process for each name in the list profileNames. After all enrollments are finished, you are prompted to have anyone speak, and the service attempts to identify this person from among your enrolled voice profiles.
This example returns only the closest match and its similarity score, but you can get the full response, which includes the top five similarity scores, by adding the following line to your SpeakerIdentification function.
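// Retrieve the raw service response, which includes the ranked similarity scores.
string json = result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
Console.WriteLine(json);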
Changing audio input type
The examples in this article use the default device microphone as input for audio samples. In scenarios where you need to use audio files instead of microphone input, change any instance of AudioConfig.FromDefaultMicrophoneInput() to AudioConfig.FromWavFileInput("path/to/your/file.wav") to switch to a file input, as shown below. You can also have mixed inputs: for example, a microphone for enrollment and files for verification.
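For example, to enroll from a WAV file instead of the microphone (the path is a placeholder):
// Use a WAV file as the audio source instead of the default microphone.
using (var audioInput = AudioConfig.FromWavFileInput("path/to/your/file.wav"))
{
    result = await client.EnrollProfileAsync(profile, audioInput);
}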
Deleting voice profile enrollments
To delete an enrolled profile, use the DeleteProfileAsync() function on the VoiceProfileClient object. The following example function shows how to delete a voice profile from a known voice profile ID.
public static async Task DeleteProfile(SpeechConfig config, string profileId)
{
using (var client = new VoiceProfileClient(config))
{
var profile = new VoiceProfile(profileId);
await client.DeleteProfileAsync(profile);
}
}
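You might call this function from Main() with an ID that you saved in profileMapping during enrollment; the ID shown here is a placeholder.
// The profile ID shown is a placeholder; use an ID you saved during enrollment.
await DeleteProfile(config, "your-profile-id");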
In this quickstart, you learn basic design patterns for Speaker Recognition using the Speech SDK, including:
- Text-dependent and text-independent verification
- Speaker identification to identify a voice sample among a group of voices
- Deleting voice profiles
For a high-level look at Speaker Recognition concepts, see the overview article. For a list of supported platforms, see the Reference section in the left navigation pane.
Skip to samples on GitHub
If you want to skip straight to sample code, see the C++ quickstart samples on GitHub.
Prerequisites
This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.
Important
Microsoft limits access to Speaker Recognition. Apply to use it through the Azure Cognitive Services Speaker Recognition Limited Access Review. After approval, you can access the Speaker Recognition APIs.
Install the Speech SDK
Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:
Import dependencies
To run the examples in this article, add the following statements at the top of your .cpp file.
#include <iostream>
#include <stdexcept>
// Note: Install the NuGet package Microsoft.CognitiveServices.Speech.
#include <speechapi_cxx.h>
using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
// Note: Change the locale if desired.
auto profile_locale = "en-us";
auto audio_config = Audio::AudioConfig::FromDefaultMicrophoneInput();
auto ticks_per_second = 10000000;
Create a speech configuration
To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.
shared_ptr<SpeechConfig> GetSpeechConfig()
{
auto subscription_key = "PASTE_YOUR_SPEECH_SUBSCRIPTION_KEY_HERE";
auto region = "PASTE_YOUR_SPEECH_ENDPOINT_REGION_HERE";
auto config = SpeechConfig::FromSubscription(subscription_key, region);
return config;
}
Text-dependent verification
Speaker Verification is the act of confirming that a speaker matches a known, or enrolled, voice. The first step is to enroll a voice profile, so that the service has something to compare future voice samples against. In this example, you enroll the profile using a text-dependent strategy, which requires a specific passphrase to use for both enrollment and verification. See the reference docs for a list of supported passphrases.
TextDependentVerification function
Start by creating the TextDependentVerification function.
void TextDependentVerification(shared_ptr<VoiceProfileClient> client, shared_ptr<SpeakerRecognizer> recognizer)
{
std::cout << "Text Dependent Verification:\n\n";
// Create the profile.
auto profile = client->CreateProfileAsync(VoiceProfileType::TextDependentVerification, profile_locale).get();
std::cout << "Created profile ID: " << profile->GetId() << "\n";
AddEnrollmentsToTextDependentProfile(client, profile);
SpeakerVerify(profile, recognizer);
// Delete the profile.
client->DeleteProfileAsync(profile);
}
This function creates a VoiceProfile object with the CreateProfileAsync method. Note there are three types of VoiceProfile:
- TextIndependentIdentification
- TextDependentVerification
- TextIndependentVerification
In this case you pass VoiceProfileType::TextDependentVerification to CreateProfileAsync.
You then call two helper functions that you'll define next, AddEnrollmentsToTextDependentProfile and SpeakerVerify. Finally, call DeleteProfileAsync to clean up the profile.
AddEnrollmentsToTextDependentProfile function
Define the following function to enroll a voice profile.
void AddEnrollmentsToTextDependentProfile(shared_ptr<VoiceProfileClient> client, shared_ptr<VoiceProfile> profile)
{
shared_ptr<VoiceProfileEnrollmentResult> enroll_result = nullptr;
while (enroll_result == nullptr || enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsCount) > 0)
{
std::cout << "Please say the passphrase, \"My voice is my passport, verify me.\"\n";
enroll_result = client->EnrollProfileAsync(profile, audio_config).get();
std::cout << "Remaining enrollments needed: " << enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsCount) << ".\n";
}
std::cout << "Enrollment completed.\n\n";
}
In this function, you enroll audio samples in a while loop that tracks the number of samples remaining, and required, for enrollment. In each iteration, EnrollProfileAsync prompts you to speak the passphrase into your microphone, and adds the sample to the voice profile.
SpeakerVerify function
Define SpeakerVerify as follows.
void SpeakerVerify(shared_ptr<VoiceProfile> profile, shared_ptr<SpeakerRecognizer> recognizer)
{
shared_ptr<SpeakerVerificationModel> model = SpeakerVerificationModel::FromProfile(profile);
std::cout << "Speak the passphrase to verify: \"My voice is my passport, verify me.\"\n";
shared_ptr<SpeakerRecognitionResult> result = recognizer->RecognizeOnceAsync(model).get();
std::cout << "Verified voice profile for speaker: " << result->ProfileId << ". Score is: " << result->GetScore() << ".\n\n";
}
In this function, you create a SpeakerVerificationModel object with the SpeakerVerificationModel::FromProfile method, passing in the VoiceProfile object you created earlier.
Next, SpeakerRecognizer::RecognizeOnceAsync prompts you to speak the passphrase again, but this time it validates it against your voice profile and returns a similarity score ranging from 0.0 to 1.0. The SpeakerRecognitionResult object also returns Accept or Reject, based on whether or not the passphrase matches.
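As a minimal sketch, assuming SpeakerRecognitionResult exposes a Reason property like other SDK result types, you could branch on the outcome before using the score:
// Sketch: check the result reason before acting on the similarity score.
if (result->Reason == ResultReason::RecognizedSpeaker)
{
    std::cout << "Verification accepted. Score: " << result->GetScore() << "\n";
}
else
{
    std::cout << "Verification was not accepted.\n";
}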
Text-independent verification
In contrast to text-dependent verification, text-independent verification doesn't require a specific passphrase; anything can be spoken. It doesn't require three audio samples, but it does require 20 seconds of total audio.
TextIndependentVerification function
Start by creating the TextIndependentVerification function.
void TextIndependentVerification(shared_ptr<VoiceProfileClient> client, shared_ptr<SpeakerRecognizer> recognizer)
{
std::cout << "Text Independent Verification:\n\n";
// Create the profile.
auto profile = client->CreateProfileAsync(VoiceProfileType::TextIndependentVerification, profile_locale).get();
std::cout << "Created profile ID: " << profile->GetId() << "\n";
AddEnrollmentsToTextIndependentProfile(client, profile);
SpeakerVerify(profile, recognizer);
// Delete the profile.
client->DeleteProfileAsync(profile);
}
Like the TextDependentVerification function, this function creates a VoiceProfile object with the CreateProfileAsync method.
In this case you pass VoiceProfileType::TextIndependentVerification to CreateProfileAsync.
You then call two helper functions: AddEnrollmentsToTextIndependentProfile, which you'll define next, and SpeakerVerify, which you defined already. Finally, call DeleteProfileAsync to clean up the profile.
AddEnrollmentsToTextIndependentProfile
Define the following function to enroll a voice profile.
void AddEnrollmentsToTextIndependentProfile(shared_ptr<VoiceProfileClient> client, shared_ptr<VoiceProfile> profile)
{
shared_ptr<VoiceProfileEnrollmentResult> enroll_result = nullptr;
while (enroll_result == nullptr || enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsSpeechLength) > 0)
{
std::cout << "Continue speaking to add to the profile enrollment sample.\n";
enroll_result = client->EnrollProfileAsync(profile, audio_config).get();
std::cout << "Remaining audio time needed: " << enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsSpeechLength) / ticks_per_second << " seconds.\n";
}
std::cout << "Enrollment completed.\n\n";
}
In this function, you enroll audio samples in a while loop that tracks the number of seconds of audio remaining, and required, for enrollment. In each iteration, EnrollProfileAsync prompts you to speak into your microphone, and adds the sample to the voice profile.
Speaker identification
Speaker Identification is used to determine who is speaking from a given group of enrolled voices. The process is similar to text-independent verification, with the main difference that you can verify against multiple voice profiles at once, rather than against a single profile.
TextIndependentIdentification function
Start by creating the TextIndependentIdentification function.
void TextIndependentIdentification(shared_ptr<VoiceProfileClient> client, shared_ptr<SpeakerRecognizer> recognizer)
{
std::cout << "Speaker Identification:\n\n";
// Create the profile.
auto profile = client->CreateProfileAsync(VoiceProfileType::TextIndependentIdentification, profile_locale).get();
std::cout << "Created profile ID: " << profile->GetId() << "\n";
AddEnrollmentsToTextIndependentProfile(client, profile);
SpeakerIdentify(profile, recognizer);
// Delete the profile.
client->DeleteProfileAsync(profile);
}
Like the TextDependentVerification and TextIndependentVerification functions, this function creates a VoiceProfile object with the CreateProfileAsync method.
In this case you pass VoiceProfileType::TextIndependentIdentification to CreateProfileAsync.
You then call two helper functions: AddEnrollmentsToTextIndependentProfile, which you defined already, and SpeakerIdentify, which you'll define next. Finally, call DeleteProfileAsync to clean up the profile.
SpeakerIdentify function
Define the SpeakerIdentify function as follows.
void SpeakerIdentify(shared_ptr<VoiceProfile> profile, shared_ptr<SpeakerRecognizer> recognizer)
{
shared_ptr<SpeakerIdentificationModel> model = SpeakerIdentificationModel::FromProfiles({ profile });
// Note: We need at least four seconds of audio after pauses are subtracted.
std::cout << "Please speak for at least ten seconds to identify who it is from your list of enrolled speakers.\n";
shared_ptr<SpeakerRecognitionResult> result = recognizer->RecognizeOnceAsync(model).get();
std::cout << "The most similar voice profile is: " << result->ProfileId << " with similarity score: " << result->GetScore() << ".\n\n";
}
In this function, you create a SpeakerIdentificationModel object with the SpeakerIdentificationModel::FromProfiles method. SpeakerIdentificationModel::FromProfiles accepts a list of VoiceProfile objects. In this case, you'll just pass in the VoiceProfile object you created earlier. However, if you want, you can pass in multiple VoiceProfile objects, each enrolled with audio samples from a different voice.
Next, SpeakerRecognizer::RecognizeOnceAsync prompts you to speak again. This time it compares your voice to the enrolled voice profiles and returns the most similar voice profile.
Main function
Finally, define the main function as follows.
int main()
{
auto speech_config = GetSpeechConfig();
auto client = VoiceProfileClient::FromConfig(speech_config);
auto recognizer = SpeakerRecognizer::FromConfig(speech_config, audio_config);
TextDependentVerification(client, recognizer);
TextIndependentVerification(client, recognizer);
TextIndependentIdentification(client, recognizer);
std::cout << "End of quickstart.\n";
}
This function simply calls the functions you defined previously. First, though, it creates a VoiceProfileClient object and a SpeakerRecognizer object.
auto speech_config = GetSpeechConfig();
auto client = VoiceProfileClient::FromConfig(speech_config);
auto recognizer = SpeakerRecognizer::FromConfig(speech_config, audio_config);
The VoiceProfileClient is used to create, enroll and delete voice profiles. The SpeakerRecognizer is used to validate speech samples against one or more enrolled voice profiles.
Changing audio input type
The examples in this article use the default device microphone as input for audio samples. However, in scenarios where you need to use audio files instead of microphone input, simply change the following line:
auto audio_config = Audio::AudioConfig::FromDefaultMicrophoneInput();
to:
auto audio_config = Audio::AudioConfig::FromWavFileInput("path/to/your/file.wav");
Or replace any use of audio_config with Audio::AudioConfig::FromWavFileInput. You can also mix inputs: for example, use a microphone for enrollment and a file for verification, as in the following sketch.
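A minimal sketch of mixed inputs, assuming the client, profile, and speech_config objects from the earlier examples are in scope; the file path is a placeholder.
// Enroll from the default microphone...
auto mic_config = Audio::AudioConfig::FromDefaultMicrophoneInput();
auto enroll_result = client->EnrollProfileAsync(profile, mic_config).get();
// ...but verify from a WAV file.
auto file_config = Audio::AudioConfig::FromWavFileInput("path/to/your/file.wav");
auto file_recognizer = SpeakerRecognizer::FromConfig(speech_config, file_config);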
In this quickstart, you learn basic design patterns for Speaker Recognition using the Speech SDK, including:
- Text-dependent and text-independent verification
- Speaker identification to identify a voice sample among a group of voices
- Deleting voice profiles
For a high-level look at Speaker Recognition concepts, see the overview article. For a list of supported platforms, see the Reference section in the left navigation pane.
Skip to samples on GitHub
If you want to skip straight to sample code, see the JavaScript quickstart samples on GitHub.
Prerequisites
This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.
Important
Speaker Recognition is currently only supported in Azure Speech resources created in the westus region.
Install the Speech SDK
Before you can do anything, you'll need to install the Speech SDK for JavaScript. Depending on your platform, use the following instructions:
Additionally, depending on the target environment, use one of the following:
Download and extract the Speech SDK for JavaScript microsoft.cognitiveservices.speech.sdk.bundle.js file, and place it in a folder accessible to your HTML file.
<script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>
Tip
If you're targeting a web browser and using the <script> tag, the sdk prefix is not needed. The sdk prefix is an alias used to name the require module.
Import dependencies
To run the examples in this article, add the following statements at the top of your .js file.
"use strict";
/* To run this sample, install:
npm install microsoft-cognitiveservices-speech-sdk
*/
var sdk = require("microsoft-cognitiveservices-speech-sdk");
var fs = require("fs");
// Note: Change the locale if desired.
const profile_locale = "en-us";
/* Note: passphrase_files and verify_file should contain paths to audio files that contain "My voice is my passport, verify me."
You can obtain these files from:
https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/fa6428a0837779cbeae172688e0286625e340942/quickstart/javascript/node/speaker-recognition/verification
*/
const passphrase_files = ["myVoiceIsMyPassportVerifyMe01.wav", "myVoiceIsMyPassportVerifyMe02.wav", "myVoiceIsMyPassportVerifyMe03.wav"];
const verify_file = "myVoiceIsMyPassportVerifyMe04.wav";
/* Note: identify_file should contain a path to an audio file that uses the same voice as the other files, but contains different speech. You can obtain this file from:
https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/fa6428a0837779cbeae172688e0286625e340942/quickstart/javascript/node/speaker-recognition/identification
*/
const identify_file = "aboutSpeechSdk.wav";
var subscription_key = 'PASTE_YOUR_SPEECH_SUBSCRIPTION_KEY_HERE';
var region = 'PASTE_YOUR_SPEECH_ENDPOINT_REGION_HERE';
const ticks_per_second = 10000000;
These statements import the required libraries and set your Speech service subscription key and region. They also specify paths to audio files that you use in the following tasks.
Create helper function
Add the following helper function to read audio files into streams for use by the Speech service.
/* From: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/fa6428a0837779cbeae172688e0286625e340942/quickstart/javascript/node/speaker-recognition/verification/dependent-verification.js#L8
*/
function GetAudioConfigFromFile (file)
{
let pushStream = sdk.AudioInputStream.createPushStream();
fs.createReadStream(file).on("data", function(arrayBuffer) {
pushStream.write(arrayBuffer.buffer);
}).on("end", function() {
pushStream.close();
});
return sdk.AudioConfig.fromStreamInput(pushStream);
}
In this function, you use the AudioInputStream.createPushStream and AudioConfig.fromStreamInput methods to create an AudioConfig object. This AudioConfig object represents an audio stream. You will use several of these AudioConfig objects during the following tasks.
Text-dependent verification
Speaker Verification is the act of confirming that a speaker matches a known, or enrolled, voice. The first step is to enroll a voice profile, so that the service has something to compare future voice samples against. In this example, you enroll the profile using a text-dependent strategy, which requires a specific passphrase to use for both enrollment and verification. See the reference docs for a list of supported passphrases.
TextDependentVerification function
Start by creating the TextDependentVerification function.
async function TextDependentVerification(client, speech_config)
{
console.log ("Text Dependent Verification:\n");
var profile = null;
try {
// Create the profile.
profile = await new Promise ((resolve, reject) => {
client.createProfileAsync (sdk.VoiceProfileType.TextDependentVerification, profile_locale, result => { resolve(result); }, error => { reject(error); });
});
console.log ("Created profile ID: " + profile.profileId);
await AddEnrollmentsToTextDependentProfile(client, profile, passphrase_files);
const audio_config = GetAudioConfigFromFile(verify_file);
const recognizer = new sdk.SpeakerRecognizer(speech_config, audio_config);
await SpeakerVerify(profile, recognizer);
}
catch (error) {
console.log ("Error:\n" + error);
}
finally {
if (profile !== null) {
console.log ("Deleting profile ID: " + profile.profileId);
await new Promise ((resolve, reject) => {
client.deleteProfileAsync (profile, result => { resolve(result); }, error => { reject(error); });
});
}
}
}
This function creates a VoiceProfile object with the VoiceProfileClient.createProfileAsync method. Note there are three types of VoiceProfile:
- TextIndependentIdentification
- TextDependentVerification
- TextIndependentVerification
In this case, you pass VoiceProfileType.TextDependentVerification to VoiceProfileClient.createProfileAsync.
You then call two helper functions that you'll define next, AddEnrollmentsToTextDependentProfile and SpeakerVerify. Finally, call VoiceProfileClient.deleteProfileAsync to remove the profile.
AddEnrollmentsToTextDependentProfile function
Define the following function to enroll a voice profile.
async function AddEnrollmentsToTextDependentProfile(client, profile, audio_files)
{
for (var i = 0; i < audio_files.length; i++) {
console.log ("Adding enrollment to text dependent profile...");
const audio_config = GetAudioConfigFromFile (audio_files[i]);
const result = await new Promise ((resolve, reject) => {
client.enrollProfileAsync (profile, audio_config, result => { resolve(result); }, error => { reject(error); });
});
if (result.reason === sdk.ResultReason.Canceled) {
throw(JSON.stringify(sdk.VoiceProfileEnrollmentCancellationDetails.fromResult(result)));
}
else {
console.log ("Remaining enrollments needed: " + result.privDetails["remainingEnrollmentsCount"] + ".");
}
};
console.log ("Enrollment completed.\n");
}
In this function, you call the GetAudioConfigFromFile function you defined earlier to create AudioConfig objects from audio samples. These audio samples contain a passphrase such as "My voice is my passport, verify me." You then enroll these audio samples using the VoiceProfileClient.enrollProfileAsync method.
SpeakerVerify function
Define SpeakerVerify as follows.
async function SpeakerVerify(profile, recognizer)
{
const model = sdk.SpeakerVerificationModel.fromProfile(profile);
const result = await new Promise ((resolve, reject) => {
recognizer.recognizeOnceAsync (model, result => { resolve(result); }, error => { reject(error); });
});
console.log ("Verified voice profile for speaker: " + result.profileId + ". Score is: " + result.score + ".\n");
}
In this function, you create a SpeakerVerificationModel object with the SpeakerVerificationModel.FromProfile method, passing in the VoiceProfile object you created earlier.
Next, you call the SpeakerRecognizer.recognizeOnceAsync method to validate an audio sample that contains the same passphrase as the audio samples you enrolled previously. SpeakerRecognizer.recognizeOnceAsync returns a SpeakerRecognitionResult object, whose score property contains a similarity score ranging from 0.0 to 1.0. The SpeakerRecognitionResult object also contains a reason property of type ResultReason. If the verification was successful, the reason property should have the value RecognizedSpeaker.
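For example, you could extend SpeakerVerify to check the reason on the result returned by recognizeOnceAsync before using the score:
// Sketch: confirm the result reason before using the similarity score.
if (result.reason === sdk.ResultReason.RecognizedSpeaker) {
    console.log ("Verification accepted. Score: " + result.score + ".");
} else {
    console.log ("Verification was rejected or canceled.");
}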
Text-independent verification
In contrast to text-dependent verification, text-independent verification:
- Does not require a certain passphrase to be spoken; anything can be spoken
- Does not require three audio samples, but does require 20 seconds of total audio
TextIndependentVerification function
Start by creating the TextIndependentVerification function.
async function TextIndependentVerification(client, speech_config)
{
console.log ("Text Independent Verification:\n");
var profile = null;
try {
// Create the profile.
profile = await new Promise ((resolve, reject) => {
client.createProfileAsync (sdk.VoiceProfileType.TextIndependentVerification, profile_locale, result => { resolve(result); }, error => { reject(error); });
});
console.log ("Created profile ID: " + profile.profileId);
await AddEnrollmentsToTextIndependentProfile(client, profile, [identify_file]);
const audio_config = GetAudioConfigFromFile(passphrase_files[0]);
const recognizer = new sdk.SpeakerRecognizer(speech_config, audio_config);
await SpeakerVerify(profile, recognizer);
}
catch (error) {
console.log ("Error:\n" + error);
}
finally {
if (profile !== null) {
console.log ("Deleting profile ID: " + profile.profileId);
await new Promise ((resolve, reject) => {
client.deleteProfileAsync (profile, result => { resolve(result); }, error => { reject(error); });
});
}
}
}
Like the TextDependentVerification function, this function creates a VoiceProfile object with the VoiceProfileClient.createProfileAsync method.
In this case, you pass VoiceProfileType.TextIndependentVerification to createProfileAsync.
You then call two helper functions: AddEnrollmentsToTextIndependentProfile, which you'll define next, and SpeakerVerify, which you defined already. Finally, call VoiceProfileClient.deleteProfileAsync to remove the profile.
AddEnrollmentsToTextIndependentProfile
Define the following function to enroll a voice profile.
async function AddEnrollmentsToTextIndependentProfile(client, profile, audio_files)
{
for (var i = 0; i < audio_files.length; i++) {
console.log ("Adding enrollment to text independent profile...");
const audio_config = GetAudioConfigFromFile (audio_files[i]);
const result = await new Promise ((resolve, reject) => {
client.enrollProfileAsync (profile, audio_config, result => { resolve(result); }, error => { reject(error); });
});
if (result.reason === sdk.ResultReason.Canceled) {
throw(JSON.stringify(sdk.VoiceProfileEnrollmentCancellationDetails.fromResult(result)));
}
else {
console.log ("Remaining audio time needed: " + (result.privDetails["remainingEnrollmentsSpeechLength"] / ticks_per_second) + " seconds.");
}
}
console.log ("Enrollment completed.\n");
}
In this function, you call the GetAudioConfigFromFile function you defined earlier to create AudioConfig objects from audio samples. You then enroll these audio samples using the VoiceProfileClient.enrollProfileAsync method.
Speaker identification
Speaker Identification is used to determine who is speaking from a given group of enrolled voices. The process is similar to text-independent verification, with the main difference that you can verify against multiple voice profiles at once, rather than against a single profile.
TextIndependentIdentification function
Start by creating the TextIndependentIdentification function.
async function TextIndependentIdentification(client, speech_config)
{
console.log ("Text Independent Identification:\n");
var profile = null;
try {
// Create the profile.
profile = await new Promise ((resolve, reject) => {
client.createProfileAsync (sdk.VoiceProfileType.TextIndependentIdentification, profile_locale, result => { resolve(result); }, error => { reject(error); });
});
console.log ("Created profile ID: " + profile.profileId);
await AddEnrollmentsToTextIndependentProfile(client, profile, [identify_file]);
const audio_config = GetAudioConfigFromFile(passphrase_files[0]);
const recognizer = new sdk.SpeakerRecognizer(speech_config, audio_config);
await SpeakerIdentify(profile, recognizer);
}
catch (error) {
console.log ("Error:\n" + error);
}
finally {
if (profile !== null) {
console.log ("Deleting profile ID: " + profile.profileId);
await new Promise ((resolve, reject) => {
client.deleteProfileAsync (profile, result => { resolve(result); }, error => { reject(error); });
});
}
}
}
Like the TextDependentVerification and TextIndependentVerification functions, this function creates a VoiceProfile object with the VoiceProfileClient.createProfileAsync method.
In this case, you pass VoiceProfileType.TextIndependentIdentification to VoiceProfileClient.createProfileAsync.
You then call two helper functions: AddEnrollmentsToTextIndependentProfile, which you defined already, and SpeakerIdentify, which you'll define next. Finally, call VoiceProfileClient.deleteProfileAsync to remove the profile.
SpeakerIdentify function
Define the SpeakerIdentify function as follows.
async function SpeakerIdentify(profile, recognizer)
{
const model = sdk.SpeakerIdentificationModel.fromProfiles([profile]);
const result = await new Promise ((resolve, reject) => {
recognizer.recognizeOnceAsync (model, result => { resolve(result); }, error => { reject(error); });
});
console.log ("The most similar voice profile is: " + result.profileId + " with similarity score: " + result.score + ".\n");
}
In this function, you create a SpeakerIdentificationModel object with the SpeakerIdentificationModel.fromProfiles method, passing in the VoiceProfile object you created earlier.
Next, you call the SpeakerRecognizer.recognizeOnceAsync method and pass in an audio sample.
SpeakerRecognizer.recognizeOnceAsync tries to identify the voice for this audio sample based on the VoiceProfile objects you used to create the SpeakerIdentificationModel. It returns a SpeakerRecognitionResult object, whose profileId property identifies the matching VoiceProfile, if any, while the score property contains a similarity score ranging from 0.0 to 1.0.
Main function
Finally, define the main function as follows.
async function main() {
const speech_config = sdk.SpeechConfig.fromSubscription(subscription_key, region);
const client = new sdk.VoiceProfileClient(speech_config);
await TextDependentVerification(client, speech_config);
await TextIndependentVerification(client, speech_config);
await TextIndependentIdentification(client, speech_config);
console.log ("End of quickstart.");
}
main();
This function creates a VoiceProfileClient object, which is used to create, enroll, and delete voice profiles. Then it calls the functions you defined previously.
In this quickstart, you learn basic design patterns for Speaker Recognition using the Speech SDK, including:
- Text-dependent and text-independent verification
- Speaker identification to identify a voice sample among a group of voices
- Deleting voice profiles
For a high-level look at Speaker Recognition concepts, see the overview article. For a list of supported platforms, see the Reference section in the left navigation pane.
Prerequisites
This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.
Important
Speaker Recognition is currently only supported in Azure Speech resources created in the westus region.
Text-dependent verification
Speaker Verification is the act of confirming that a speaker matches a known, or enrolled, voice. The first step is to enroll a voice profile, so that the service has something to compare future voice samples against. In this example, you enroll the profile using a text-dependent strategy, which requires a specific passphrase to use for both enrollment and verification. See the reference docs for a list of supported passphrases.
Start by creating a voice profile. You need to insert your Speech service subscription key and endpoint into each of the curl commands in this article.
# Note Change locale if needed.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/json' \
--data-raw '{"locale":"en-us"}'
Note there are three types of voice profile:
- Text-dependent verification
- Text-independent verification
- Text-independent identification
In this case, you create a text-dependent verification voice profile. You should receive the following response.
{
"remainingEnrollmentsCount": 3,
"locale": "en-us",
"createdDateTime": "2020-09-29T14:54:29.683Z",
"enrollmentStatus": "Enrolling",
"modelVersion": null,
"profileId": "714ce523-de76-4220-b93f-7c1cc1882d6e",
"lastUpdatedDateTime": null,
"enrollmentsCount": 0,
"enrollmentsLength": 0.0,
"enrollmentSpeechLength": 0.0
}
Next, you enroll the voice profile. For the --data-binary parameter value, specify an audio file on your computer that contains one of the supported passphrases, such as "my voice is my passport, verify me." You can record such an audio file with an app such as Windows Voice Recorder, or you can generate it using text-to-speech.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles/INSERT_PROFILE_ID_HERE/enrollments' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'
You should receive the following response.
{
"remainingEnrollmentsCount": 2,
"passPhrase": "my voice is my passport verify me",
"profileId": "714ce523-de76-4220-b93f-7c1cc1882d6e",
"enrollmentStatus": "Enrolling",
"enrollmentsCount": 1,
"enrollmentsLength": 3.5,
"enrollmentsSpeechLength": 2.88,
"audioLength": 3.5,
"audioSpeechLength": 2.88
}
This response tells you that you need to enroll two more audio samples.
After you have enrolled a total of three audio samples, you should receive the following response.
{
"remainingEnrollmentsCount": 0,
"passPhrase": "my voice is my passport verify me",
"profileId": "714ce523-de76-4220-b93f-7c1cc1882d6e",
"enrollmentStatus": "Enrolled",
"enrollmentsCount": 3,
"enrollmentsLength": 10.5,
"enrollmentsSpeechLength": 8.64,
"audioLength": 3.5,
"audioSpeechLength": 2.88
}
Now you are ready to verify an audio sample against the voice profile. This audio sample should contain the same passphrase as the samples you used to enroll the voice profile.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles/INSERT_PROFILE_ID_HERE/verify' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'
You should receive the following response.
{
"recognitionResult": "Accept",
"score": 1.0
}
The recognitionResult value of Accept means the passphrase matched and the verification succeeded. The response also contains a similarity score ranging from 0.0 to 1.0.
To finish, delete the voice profile.
curl --location --request DELETE \
'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-dependent/profiles/INSERT_PROFILE_ID_HERE' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE'
There's no response body.
Text-independent verification
In contrast to text-dependent verification, text-independent verification:
- Does not require a certain passphrase to be spoken; anything can be spoken
- Does not require three audio samples, but does require 20 seconds of total audio
Start by creating a text-independent verification profile.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-independent/profiles' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/json' \
--data-raw '{"locale":"en-us"}'
You should receive the following response.
{
"remainingEnrollmentsSpeechLength": 20.0,
"locale": "en-us",
"createdDateTime": "2020-09-29T16:08:52.409Z",
"enrollmentStatus": "Enrolling",
"modelVersion": null,
"profileId": "3f85dca9-ffc9-4011-bf21-37fad2beb4d2",
"lastUpdatedDateTime": null,
"enrollmentsCount": 0,
"enrollmentsLength": 0.0,
"enrollmentSpeechLength": 0.0
}
Next, enroll the voice profile. Again, rather than submitting three audio samples, you need to submit audio samples that contain a total of 20 seconds of audio.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE/enrollments' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'
Once you have submitted enough audio samples, you should receive the following response.
{
"remainingEnrollmentsSpeechLength": 0.0,
"profileId": "3f85dca9-ffc9-4011-bf21-37fad2beb4d2",
"enrollmentStatus": "Enrolled",
"enrollmentsCount": 1,
"enrollmentsLength": 33.16,
"enrollmentsSpeechLength": 29.21,
"audioLength": 33.16,
"audioSpeechLength": 29.21
}
Now you are ready to verify an audio sample against the voice profile. Again, this audio sample does not need to contain a passphrase. It can contain any speech, as long as it contains a total of at least four seconds of audio.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE/verify' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'
You should receive the following response.
{
"recognitionResult": "Accept",
"score": 0.9196669459342957
}
The recognitionResult value of Accept means the verification succeeded. The response also contains a similarity score ranging from 0.0 to 1.0.
To finish, delete the voice profile.
curl --location --request DELETE 'INSERT_ENDPOINT_HERE/speaker/verification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE'
There's no response body.
Speaker identification
Speaker Identification is used to determine who is speaking from a given group of enrolled voices. The process is similar to text-independent verification, with the main difference that you can verify against multiple voice profiles at once, rather than against a single profile.
Start by creating a text-independent identification profile.
# Note Change locale if needed.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/json' \
--data-raw '{"locale":"en-us"}'
You should receive the following response.
{
"remainingEnrollmentsSpeechLength": 20.0,
"locale": "en-us",
"createdDateTime": "2020-09-22T17:25:48.642Z",
"enrollmentStatus": "Enrolling",
"modelVersion": null,
"profileId": "de99ab38-36c8-4b82-b137-510907c61fe8",
"lastUpdatedDateTime": null,
"enrollmentsCount": 0,
"enrollmentsLength": 0.0,
"enrollmentSpeechLength": 0.0
}
Next, you enroll the voice profile. Again, you need to submit audio samples that contain a total of 20 seconds of audio. These samples do not need to contain a passphrase.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE/enrollments' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'
Once you have submitted enough audio samples, you should receive the following response.
{
"remainingEnrollmentsSpeechLength": 0.0,
"profileId": "de99ab38-36c8-4b82-b137-510907c61fe8",
"enrollmentStatus": "Enrolled",
"enrollmentsCount": 2,
"enrollmentsLength": 36.69,
"enrollmentsSpeechLength": 31.95,
"audioLength": 33.16,
"audioSpeechLength": 29.21
}
Now you are ready to identify an audio sample using the voice profile. The identify command accepts a comma-delimited list of possible voice profile IDs. In this case, you'll just pass in the ID of the voice profile you created previously. However, if you want, you can pass in multiple voice profile IDs, where each voice profile is enrolled with audio samples from a different voice; an example follows the response below.
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles/identifySingleSpeaker?profileIds=INSERT_PROFILE_ID_HERE' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'
You should receive the following response.
Success:
{
"identifiedProfile": {
"profileId": "de99ab38-36c8-4b82-b137-510907c61fe8",
"score": 0.9083486
},
"profilesRanking": [
{
"profileId": "de99ab38-36c8-4b82-b137-510907c61fe8",
"score": 0.9083486
}
]
}
The response contains the ID of the voice profile that most closely matches the audio sample you submitted. It also contains a list of candidate voice profiles, ranked in order of similarity.
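For example, to identify among several enrolled profiles, pass multiple profile IDs in the comma-delimited list (the IDs shown are placeholders):
curl --location --request POST 'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles/identifySingleSpeaker?profileIds=PROFILE_ID_1,PROFILE_ID_2' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_FILE_PATH_HERE'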
To finish, delete the voice profile.
curl --location --request DELETE \
'INSERT_ENDPOINT_HERE/speaker/identification/v2.0/text-independent/profiles/INSERT_PROFILE_ID_HERE' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE'
There's no response body.
Next steps
See the Speaker Recognition reference documentation for detail on classes and functions.