Text Independent - Create Enrollment
Enroll Profile
Adds an enrollment to an existing profile.
The first enrollment must be a predefined activation phrase, which can be listed using the /phrases/{locale} API.
If the minimum number of requested enrollment audios is reached, a voice print is created.
Any further enrollment will be used to improve the voice print.
Limitations:
- Minimum audio input length per request is 1 second.
- Maximum audio input length per request is 120 seconds.
- Minimum total effective speech length (excluding silence and other non-speech frames) for creating a voice print is 20 seconds. This limitation can be disabled by setting ignoreMinLength to true.
- Maximum total audio input length allowed for creating a voice print is 300 seconds.
- Minimum audio signal-to-noise ratio (SNR) is 2 dB.
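The length limits can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function names are hypothetical, it assumes the audio is a PCM WAV file held in memory, and it assumes the 300-second limit applies to previously accepted audio plus the new file. Effective speech length and SNR cannot be derived from the file length alone, so they are not checked here.

```python
import io
import wave
from typing import List

MIN_REQUEST_SECONDS = 1.0    # minimum audio input length per request
MAX_REQUEST_SECONDS = 120.0  # maximum audio input length per request
MAX_TOTAL_SECONDS = 300.0    # maximum total audio input length for a voice print


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an in-memory PCM WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_request_audio(wav_bytes: bytes, accepted_seconds: float = 0.0) -> List[str]:
    """Return a list of length-limit violations (empty if none were found).

    accepted_seconds is the audio length already accepted for the profile
    (hypothetical parameter; the caller has to track or look up this value).
    """
    duration = wav_duration_seconds(wav_bytes)
    problems = []
    if duration < MIN_REQUEST_SECONDS:
        problems.append(f"audio is {duration:.2f} s; minimum per request is 1 s")
    if duration > MAX_REQUEST_SECONDS:
        problems.append(f"audio is {duration:.2f} s; maximum per request is 120 s")
    # Assumption: the total limit counts accepted audio plus this request.
    if accepted_seconds + duration > MAX_TOTAL_SECONDS:
        problems.append("total audio would exceed 300 s")
    return problems
```

An empty list only means the length limits are satisfied; the service may still reject the audio for other reasons.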
POST {endpoint}/speaker-recognition/identification/text-independent/profiles/{profileId}/enrollments?api-version=2021-09-05
POST {endpoint}/speaker-recognition/identification/text-independent/profiles/{profileId}/enrollments?api-version=2021-09-05&ignoreMinLength={ignoreMinLength}
URI Parameters
| Name | In | Required | Type | Description |
|---|---|---|---|---|
| endpoint | path | True | | Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com). |
| profileId | path | True | | Unique identifier for profile id (guid). |
| api-version | query | True | | Specifies the version of the operation to use for this request. |
| ignoreMinLength | query | | | If true, a voice print will be created immediately for this profile regardless of how much speech is supplied or stored. Default is false. |
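The URI template and its parameters can be assembled as in the sketch below. The helper name `build_enrollment_url` is hypothetical, and serializing the boolean as lowercase `true`/`false` is an assumption rather than something this page specifies.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_VERSION = "2021-09-05"


def build_enrollment_url(
    endpoint: str,
    profile_id: str,
    ignore_min_length: Optional[bool] = None,
) -> str:
    """Build the Create Enrollment URL for a text-independent profile."""
    path = (
        "/speaker-recognition/identification/text-independent"
        f"/profiles/{quote(profile_id, safe='')}/enrollments"
    )
    query = {"api-version": API_VERSION}
    # ignoreMinLength is optional; it is left out unless explicitly set.
    if ignore_min_length is not None:
        # Assumption: the service accepts lowercase "true"/"false".
        query["ignoreMinLength"] = "true" if ignore_min_length else "false"
    return endpoint.rstrip("/") + path + "?" + urlencode(query)
```

Called with the endpoint and profile id from the sample request on this page, the helper reproduces the sample request URL.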
Request Header
Media Types: "audio/wav; codecs=audio/pcm"
| Name | Required | Type | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | True | | |
Request Body
Media Types: "audio/wav; codecs=audio/pcm"
| Name | Type | Description |
|---|---|---|
| audioData | | Binary audio file. Supported formats are audio/wav; codecs=audio/pcm. Supports audio up to 5 MB. |
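A request carrying the subscription key header and the binary WAV body might be prepared as follows. This is a sketch using Python's standard `urllib.request`; the function name is hypothetical, and interpreting "5 MB" as 5 × 1024 × 1024 bytes is an assumption. The request is only constructed here, not sent.

```python
import urllib.request

# Assumption: "5 MB" is read as 5 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 5 * 1024 * 1024


def build_enrollment_request(
    url: str, subscription_key: str, wav_bytes: bytes
) -> urllib.request.Request:
    """Prepare (but do not send) a Create Enrollment POST request."""
    if len(wav_bytes) > MAX_BODY_BYTES:
        raise ValueError("audio exceeds the 5 MB request body limit")
    return urllib.request.Request(
        url,
        data=wav_bytes,  # raw binary audio as the request body
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "audio/wav; codecs=audio/pcm",
        },
    )
```

The prepared request can then be passed to `urllib.request.urlopen`, which raises `urllib.error.HTTPError` for failure status codes.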
Responses
| Name | Type | Description |
|---|---|---|
| 201 Created | TiEnrollmentInfo | Created |
| Other Status Codes | SpeakerErrorInfo | Failure. Headers: x-ms-error-code |
Security
Ocp-Apim-Subscription-Key
Type: apiKey
In: header
Examples
Successful Query
Sample Request
POST https://westus.api.cognitive.microsoft.com/speaker-recognition/identification/text-independent/profiles/49a36324-fc4b-4387-aa06-090cfbf0064f/enrollments?api-version=2021-09-05
Ocp-Apim-Subscription-Key: {API key}
"{binary file data}"
Sample Response
Content-Type: application/json
{
  "profileId": "49a36324-fc4b-4387-aa06-090cfbf0064f",
  "enrollmentStatus": "Enrolling",
  "enrollmentsCount": 1,
  "enrollmentsLengthInSec": 1.83,
  "enrollmentsSpeechLengthInSec": 1.35,
  "remainingEnrollmentsSpeechLengthInSec": 18.65,
  "audioLengthInSec": 1.83,
  "audioSpeechLengthInSec": 1.35
}
Content-Type: application/json
x-ms-error-code: Error Code
{
  "error": {
    "code": "Error Code",
    "message": "Error Message"
  }
}
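The two response shapes above can be handled along these lines. This is an illustrative sketch: the function name and the wording of the summary strings are hypothetical, and it assumes every non-201 response carries the error body shown above.

```python
import json


def summarize_enrollment_response(status_code: int, body: str) -> str:
    """Turn a Create Enrollment response into a one-line summary."""
    payload = json.loads(body)
    if status_code == 201:
        # Success body: enrollment info for the profile.
        return (
            f"{payload['enrollmentStatus']}: "
            f"{payload['enrollmentsCount']} enrollment(s), "
            f"{payload['remainingEnrollmentsSpeechLengthInSec']} s of speech still needed"
        )
    # Assumption: any other status code returns the error body shape.
    error = payload["error"]
    return f"error {error['code']}: {error['message']}"
```

For the sample response, `remainingEnrollmentsSpeechLengthInSec` is 18.65, which matches the 20-second minimum speech length minus the 1.35 seconds of speech accepted so far.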
Definitions
| Name | Description |
|---|---|
| Error | |
| SpeakerErrorInfo | Speaker error message |
| TiEnrollmentInfo | Speaker profile enrollment info |
| TrainingStatusType | Status representing the current state of the profile enrollment. |
Error
| Name | Type | Description |
|---|---|---|
| code | | |
| message | | |
SpeakerErrorInfo
Speaker error message
| Name | Type | Description |
|---|---|---|
| error | Error | |
TiEnrollmentInfo
Speaker profile enrollment info
| Name | Type | Description |
|---|---|---|
| audioLengthInSec | | Length of this enrollment audio in seconds. |
| audioSpeechLengthInSec | | Length of pure speech in this enrollment audio (the amount of audio left after removing silence and non-speech segments) in seconds. |
| enrollmentStatus | TrainingStatusType | Status representing the current state of the profile enrollment. |
| enrollmentsCount | | Number of enrollment audios accepted for this profile. |
| enrollmentsLengthInSec | | Total length of enrollment audios accepted for this profile in seconds. |
| enrollmentsSpeechLengthInSec | | Sum of pure speech (the amount of audio left after removing silence and non-speech segments) across all profile enrollments in seconds. |
| profileId | | Unique identifier for profile id (guid). |
| remainingEnrollmentsSpeechLengthInSec | | Amount of pure speech (the amount of audio left after removing silence and non-speech segments) still needed to complete profile enrollment in seconds. |
TrainingStatusType
Status representing the current state of the profile enrollment. Available values are:
- Enrolling: the profile has no voice print and is not ready for recognition requests.
- Training: the voice print of the profile is being created and can’t be used for recognition at the moment.
- Enrolled: the profile has a voice print and is ready for recognition requests.
| Name | Type | Description |
|---|---|---|
| Enrolled | | |
| Enrolling | | |
| Training | | |
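On the client side, the three status values could be modeled as an enumeration. This is a sketch only: the Python class and the helper `ready_for_recognition` are hypothetical and simply mirror the value descriptions above.

```python
from enum import Enum


class TrainingStatusType(str, Enum):
    """Client-side mirror of the enrollmentStatus values."""

    ENROLLING = "Enrolling"  # no voice print yet
    TRAINING = "Training"    # voice print is being created
    ENROLLED = "Enrolled"    # voice print exists


def ready_for_recognition(enrollment_status: str) -> bool:
    """Return True only when the profile has a usable voice print.

    Raises ValueError for a status string that is not one of the three values.
    """
    return TrainingStatusType(enrollment_status) is TrainingStatusType.ENROLLED
```

A caller would typically keep submitting enrollment audio while the status is Enrolling and wait while it is Training.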