Text Dependent - Create Enrollment

Enroll Profile
Adds an enrollment to an existing profile. Once the minimum number of enrollment audios is reached, a voice print is created. If a voice print already exists, it is recreated from all existing enrollment audios, including the new one.

Limitations:

  • Minimum audio input length per request is 1 second
  • Maximum audio input length per request is 10 seconds
  • Minimum number of enrollments for creating a voiceprint is 3
  • Maximum number of enrollments for creating a voiceprint is 50
  • Minimum audio signal-to-noise ratio (SNR) is 2 dB
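
The per-request length limits above can be checked locally before uploading. The following is a minimal sketch, assuming the audio is a PCM WAV payload readable by Python's standard `wave` module; the function names are hypothetical and not part of the service, and the service remains the authority on what it accepts.

```python
import io
import wave

MIN_SECONDS = 1.0   # minimum audio length per request, from the list above
MAX_SECONDS = 10.0  # maximum audio length per request, from the list above


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV payload in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_length_acceptable(wav_bytes: bytes) -> bool:
    """Hypothetical pre-flight check against the per-request length limits."""
    return MIN_SECONDS <= wav_duration_seconds(wav_bytes) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV, used here only to exercise the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()
```

This sketch covers only duration. It does not estimate SNR or count enrollments, both of which are reported or enforced by the service.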

Constraints:

  • First enrollment must match an existing passphrase.
  • All enrollments after the first must use the same passphrase as the first enrollment.

POST {endpoint}/speaker-recognition/verification/text-dependent/profiles/{profileId}/enrollments?api-version=2021-09-05

URI Parameters

Name | In | Required | Type | Description
endpoint | path | True | string | Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com).
profileId | path | True | string (uuid) | Unique identifier of the profile (GUID).
api-version | query | True | string | Specifies the version of the operation to use for this request.
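
As an illustration of how the three URI parameters combine, here is a minimal sketch that assembles the request URL from the template above. The helper name `build_enrollment_url` is hypothetical; only the path and the `api-version` value come from this page.

```python
from urllib.parse import quote, urlencode


def build_enrollment_url(endpoint: str, profile_id: str,
                         api_version: str = "2021-09-05") -> str:
    """Assemble the Create Enrollment URL from its three URI parameters."""
    base = endpoint.rstrip("/")  # tolerate a trailing slash on the endpoint
    path = (
        "/speaker-recognition/verification/text-dependent/profiles/"
        f"{quote(profile_id, safe='')}/enrollments"
    )
    query = urlencode({"api-version": api_version})
    return f"{base}{path}?{query}"
```

Called with the endpoint `https://westus.api.cognitive.microsoft.com` and the profile ID from the sample request, this should reproduce the sample URL shown under Examples.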

Request Header

Media Types: "audio/wav; codecs=audio/pcm"

Name | Required | Type | Description
Ocp-Apim-Subscription-Key | True | string |

Request Body

Media Types: "audio/wav; codecs=audio/pcm"

Name | Type | Description
audioData | object | Binary audio file. The supported format is audio/wav; codecs=audio/pcm. Audio up to 5 MB is supported.
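
The header and body requirements above can be put together as follows. This is a sketch under stated assumptions: it uses Python's standard `urllib.request`, the helper name is hypothetical, and reading "5MB" as 5 × 1024 × 1024 bytes is an assumption, since the page does not say how the limit is measured. The request is only constructed here, not sent.

```python
import urllib.request

# Assumption: "5MB" is interpreted as 5 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 5 * 1024 * 1024


def build_enrollment_request(url: str, subscription_key: str,
                             wav_bytes: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a Create Enrollment POST request."""
    if len(wav_bytes) > MAX_BODY_BYTES:
        raise ValueError("audio payload exceeds the documented 5 MB limit")
    return urllib.request.Request(
        url,
        data=wav_bytes,  # raw binary WAV content as the request body
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "audio/wav; codecs=audio/pcm",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Any HTTP client works equally well, provided it sends the same header, media type, and binary body.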

Responses

Name Type Description
201 Created

Created

Other Status Codes

Failure

Headers

  • x-ms-error-code: string

Security

Ocp-Apim-Subscription-Key

Type: apiKey
In: header

Examples

Successful Query

Sample Request

POST https://westus.api.cognitive.microsoft.com/speaker-recognition/verification/text-dependent/profiles/49a36324-fc4b-4387-aa06-090cfbf0064f/enrollments?api-version=2021-09-05
Ocp-Apim-Subscription-Key: {API key}
"{binary file data}"

Sample Response

Content-Type: application/json
{
  "profileId": "49a36324-fc4b-4387-aa06-090cfbf0064f",
  "enrollmentStatus": "Enrolling",
  "enrollmentsCount": 1,
  "enrollmentsLengthInSec": 1.83,
  "enrollmentsSpeechLengthInSec": 1.35,
  "remainingEnrollmentsCount": 2,
  "passPhrase": "my voice is my passport verify me",
  "audioLengthInSec": 1.83,
  "audioSpeechLengthInSec": 1.35
}
Content-Type: application/json
x-ms-error-code: Error Code
{
  "error": {
    "code": "Error Code",
    "message": "Error Message"
  }
  }
}
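
The two response shapes above can be handled with a small helper. This is a minimal sketch: `EnrollmentError` and `parse_enrollment_response` are hypothetical names, and it assumes, as the Responses section indicates, that 201 carries the enrollment info and any other status carries an `error` object.

```python
import json


class EnrollmentError(Exception):
    """Hypothetical exception wrapping the service's error object."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_enrollment_response(status: int, body: str) -> dict:
    """Return the enrollment info on 201, otherwise raise EnrollmentError."""
    payload = json.loads(body)
    if status == 201:
        return payload
    error = payload.get("error", {})
    raise EnrollmentError(error.get("code", "Unknown"), error.get("message", ""))
```

On success, `remainingEnrollmentsCount` in the returned dictionary indicates how many more enrollment audios are needed.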

Definitions

Error
SpeakerErrorInfo

Speaker error message

TdEnrollmentInfo

Text-Dependent Speaker profile enrollment info

TrainingStatusType

Status representing the current state of the profile. Available values are:

  • Enrolling: profile has no voice print and is not ready for recognition requests.
  • Training: the profile's voice print is being created and can’t be used for recognition at the moment.
  • Enrolled: profile has a voice print and is ready for recognition requests.

Error

Name Type Description
code
  • string
message
  • string

SpeakerErrorInfo

Speaker error message

Name Type Description
error

TdEnrollmentInfo

Text-Dependent Speaker profile enrollment info

Name Type Description
audioLengthInSec
  • number

Length of this enrollment audio in seconds.

audioSpeechLengthInSec
  • number

Pure speech length of this enrollment audio in seconds (the amount of audio remaining after silence and non-speech segments are removed).

enrollmentStatus

Status representing the current state of the profile. Available values are:

  • Enrolling: profile has no voice print and is not ready for recognition requests.
  • Training: the profile's voice print is being created and can’t be used for recognition at the moment.
  • Enrolled: profile has a voice print and is ready for recognition requests.

enrollmentsCount
  • integer

Number of enrollment audios accepted for this profile.

enrollmentsLengthInSec
  • number

Total length of enrollment audios accepted for this profile in seconds.

enrollmentsSpeechLengthInSec
  • number

Total pure speech (the amount of audio remaining after silence and non-speech segments are removed) across all enrollments for this profile, in seconds.

passPhrase
  • string

Passphrase associated with this enrollment.

profileId
  • string

Unique identifier of the profile (GUID).

remainingEnrollmentsCount
  • integer

Number of enrollment audios needed to complete profile enrollment.

TrainingStatusType

Status representing the current state of the profile. Available values are:

  • Enrolling: profile has no voice print and is not ready for recognition requests.
  • Training: the profile's voice print is being created and can’t be used for recognition at the moment.
  • Enrolled: profile has a voice print and is ready for recognition requests.

Name Type Description
Enrolled
  • string
Enrolling
  • string
Training
  • string
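
Client code can model the three status values as an enumeration. The sketch below is illustrative only: the class and function names are hypothetical, and only the string values come from the definition above.

```python
from enum import Enum


class TrainingStatus(str, Enum):
    """Client-side mirror of the TrainingStatusType values."""

    ENROLLING = "Enrolling"  # no voice print yet
    TRAINING = "Training"    # voice print is being created
    ENROLLED = "Enrolled"    # voice print exists


def is_ready_for_recognition(status: str) -> bool:
    """True only when the profile has a usable voice print."""
    return TrainingStatus(status) is TrainingStatus.ENROLLED
```

An unrecognized status string raises `ValueError` from the enum lookup, which surfaces unexpected values instead of silently treating them as not ready.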