Get a speaker profile ID for the personal voice (preview)

Article
02/07/2024

Note

Personal voice for text to speech is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

To use personal voice in your application, you need to get a speaker profile ID. The speaker profile ID is used to generate synthesized audio with the text input provided.

You create a speaker profile ID based on the speaker's verbal consent statement and an audio prompt (a clean human voice sample between 50 and 90 seconds long). The user's voice characteristics are encoded in the speakerProfileId property that's used for text to speech. For more information, see use personal voice in your application.
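Before you upload an audio prompt, it can help to check its length locally. The following is a minimal sketch that uses only the Python standard library. It assumes that the prompt is an uncompressed WAV file and that the 50 to 90 second range applies to the file you check. The function names and the `make_silent_wav` helper are hypothetical and exist only for illustration.

```python
import io
import wave

# Range taken from the article text: a voice sample between 50 and 90 seconds.
MIN_SECONDS = 50.0
MAX_SECONDS = 90.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_prompt_length(seconds: float) -> bool:
    """Check the duration against the 50 to 90 second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    duration = wav_duration_seconds(make_silent_wav(60))
    print(duration, is_valid_prompt_length(duration))
```

To check a real recording, pass its path to `wav_duration_seconds` instead of the in-memory buffer.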

Note

The personal voice ID and speaker profile ID aren't the same. You can choose the personal voice ID, but the speaker profile ID is generated by the service. The personal voice ID is used to manage the personal voice. The speaker profile ID is used for text to speech.

You provide the audio files from a publicly accessible URL (PersonalVoices_Create) or upload the audio files (PersonalVoices_Post).

Create personal voice from a file

In this scenario, the audio files must be available locally.

To create a personal voice and get the speaker profile ID, use the PersonalVoices_Post operation of the custom voice API. Construct the request body according to the following instructions:

Set the required projectId property. See create a project.
Set the required consentId property. See add user consent.
Set the required audiodata property. You can specify one or more audio files in the same request.

Make an HTTP POST request using the URI as shown in the following PersonalVoices_Post example.

Replace YourResourceKey with your Speech resource key.
Replace YourResourceRegion with your Speech resource region.
Replace JessicaPersonalVoiceId with a personal voice ID of your choice. The case-sensitive ID is used in the personal voice's URI and can't be changed later.

curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourResourceKey" -F 'projectId="ProjectId"' -F 'consentId="JessicaConsentId"' -F 'audiodata=@"D:\PersonalVoiceTest\CNVSample001.wav"' -F 'audiodata=@"D:\PersonalVoiceTest\CNVSample002.wav"' "https://YourResourceRegion.api.cognitive.microsoft.com/customvoice/personalvoices/JessicaPersonalVoiceId?api-version=2023-12-01-preview"
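If you prefer to build the same request in code, the following sketch assembles an equivalent multipart/form-data POST with the Python standard library. It only constructs the request and doesn't send it. The helper names are hypothetical, and the `audio/wav` part content type is an assumption rather than a documented requirement.

```python
import uuid
from urllib.request import Request

API_VERSION = "2023-12-01-preview"  # version used in the curl example


def personal_voice_url(region: str, voice_id: str) -> str:
    """Build the personal voice URI from the region and the chosen voice ID."""
    return (
        f"https://{region}.api.cognitive.microsoft.com/customvoice/"
        f"personalvoices/{voice_id}?api-version={API_VERSION}"
    )


def encode_multipart(fields, files, boundary: str) -> bytes:
    """Encode text fields and file parts as a multipart/form-data body.

    fields: list of (name, value); files: list of (name, filename, bytes).
    """
    chunks = []
    for name, value in fields:
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    for name, filename, content in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: audio/wav\r\n\r\n"  # assumed part content type
        )
        chunks.append(header.encode("utf-8") + content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks)


def build_post_request(region, key, voice_id, project_id, consent_id, audio_files):
    """Return a urllib Request; audio_files is a list of (filename, bytes)."""
    boundary = uuid.uuid4().hex
    body = encode_multipart(
        [("projectId", project_id), ("consentId", consent_id)],
        # One audiodata part per file, like the repeated -F flags in curl.
        [("audiodata", name, data) for name, data in audio_files],
        boundary,
    )
    return Request(
        personal_voice_url(region, voice_id),
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, read each audio file as bytes, pass the result of `build_post_request` to `urllib.request.urlopen`, and then read the response body.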

You should receive a response body in the following format:

{
  "id": "JessicaPersonalVoiceId",
  "speakerProfileId": "3059912f-a3dc-49e3-bdd0-02e449df1fe3",
  "projectId": "ProjectId",
  "consentId": "JessicaConsentId",
  "status": "NotStarted",
  "createdDateTime": "2023-04-01T05:30:00.000Z",
  "lastActionDateTime": "2023-04-02T10:15:30.000Z"
}

Use the speakerProfileId property to integrate personal voice in your text to speech application. For more information, see use personal voice in your application.
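As a small illustration, the following sketch reads the speakerProfileId value from a response body shaped like the preceding example. The function name is hypothetical, and the error handling is only one possible choice.

```python
import json


def speaker_profile_id(response_body: str) -> str:
    """Extract speakerProfileId from a personal voice response body."""
    data = json.loads(response_body)
    profile_id = data.get("speakerProfileId")
    if not profile_id:
        raise ValueError("response does not contain a speakerProfileId")
    return profile_id


if __name__ == "__main__":
    sample = """{
      "id": "JessicaPersonalVoiceId",
      "speakerProfileId": "3059912f-a3dc-49e3-bdd0-02e449df1fe3",
      "status": "NotStarted"
    }"""
    print(speaker_profile_id(sample))
```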

The response header contains the Operation-Location property. Use this URI to get details about the PersonalVoices_Post operation. Here's an example of the response header:

Operation-Location: https://eastus.api.cognitive.microsoft.com/customvoice/operations/1321a2c0-9be4-471d-83bb-bc3be4f96a6f?api-version=2023-12-01-preview
Operation-Id: 1321a2c0-9be4-471d-83bb-bc3be4f96a6f
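The following sketch shows one way to work with the Operation-Location value in code: it extracts the operation ID from the URI and prepares a GET request for the operation details. The function names are hypothetical, and sending the resource key in the Ocp-Apim-Subscription-Key header on this request is an assumption based on the other examples in this article.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def operation_id_from_location(operation_location: str) -> str:
    """Return the last path segment of the Operation-Location URI."""
    path = urlsplit(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


def build_operation_request(operation_location: str, key: str) -> Request:
    """Prepare (but don't send) a GET request for the operation details."""
    return Request(
        operation_location,
        method="GET",
        headers={"Ocp-Apim-Subscription-Key": key},  # assumed auth header
    )


if __name__ == "__main__":
    location = (
        "https://eastus.api.cognitive.microsoft.com/customvoice/operations/"
        "1321a2c0-9be4-471d-83bb-bc3be4f96a6f?api-version=2023-12-01-preview"
    )
    print(operation_id_from_location(location))
```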

Create personal voice from a URL

In this scenario, the audio files must already be stored in an Azure Blob Storage container.

To create a personal voice and get the speaker profile ID, use the PersonalVoices_Create operation of the custom voice API. Construct the request body according to the following instructions:

Set the required projectId property. See create a project.
Set the required consentId property. See add user consent.
Set the required audios property. Within the audios property, set the following properties:
- Set the required containerUrl property to the URL of the Azure Blob Storage container that contains the audio files. Use a shared access signature (SAS) for a container with both read and list permissions.
- Set the required extensions property to the extensions of the audio files.
- Optionally, set the prefix property to set a prefix for the blob name.
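The properties above can be assembled programmatically. The following sketch builds the JSON request body and leaves out the optional prefix property when no prefix is given. The function name and the check for an empty extensions list are illustrative assumptions, not service rules.

```python
import json
from typing import Optional, Sequence


def build_create_body(
    project_id: str,
    consent_id: str,
    container_url: str,
    extensions: Sequence[str],
    prefix: Optional[str] = None,
) -> str:
    """Return the JSON body for a create-from-URL request."""
    if not extensions:
        # The extensions property is required, so reject an empty list early.
        raise ValueError("at least one audio file extension is required")
    audios = {"containerUrl": container_url, "extensions": list(extensions)}
    if prefix is not None:
        audios["prefix"] = prefix  # optional blob name prefix
    body = {"projectId": project_id, "consentId": consent_id, "audios": audios}
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(
        build_create_body(
            "ProjectId",
            "JessicaConsentId",
            "https://contoso.blob.core.windows.net/voicecontainer?mySasToken",
            [".wav"],
            prefix="jessica/",
        )
    )
```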

Make an HTTP PUT request using the URI as shown in the following PersonalVoices_Create example.

Replace YourResourceKey with your Speech resource key.
Replace YourResourceRegion with your Speech resource region.
Replace JessicaPersonalVoiceId with a personal voice ID of your choice. The case-sensitive ID is used in the personal voice's URI and can't be changed later.

curl -v -X PUT -H "Ocp-Apim-Subscription-Key: YourResourceKey" -H "Content-Type: application/json" -d '{
  "projectId": "ProjectId",
  "consentId": "JessicaConsentId",
  "audios": {
    "containerUrl": "https://contoso.blob.core.windows.net/voicecontainer?mySasToken",
    "prefix": "jessica/",
    "extensions": [
      ".wav"
    ]
  }
}' "https://YourResourceRegion.api.cognitive.microsoft.com/customvoice/personalvoices/JessicaPersonalVoiceId?api-version=2023-12-01-preview"
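For comparison, the following sketch prepares the same PUT request with the Python standard library. It constructs the request object without sending it. The function name is hypothetical, and the placeholder values mirror the curl example.

```python
import json
from urllib.request import Request

API_VERSION = "2023-12-01-preview"  # version used in the curl example


def build_put_request(region: str, key: str, voice_id: str, body: dict) -> Request:
    """Prepare a JSON PUT request for the given personal voice ID."""
    url = (
        f"https://{region}.api.cognitive.microsoft.com/customvoice/"
        f"personalvoices/{voice_id}?api-version={API_VERSION}"
    )
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_put_request(
        "YourResourceRegion",
        "YourResourceKey",
        "JessicaPersonalVoiceId",
        {
            "projectId": "ProjectId",
            "consentId": "JessicaConsentId",
            "audios": {
                "containerUrl": "https://contoso.blob.core.windows.net/voicecontainer?mySasToken",
                "prefix": "jessica/",
                "extensions": [".wav"],
            },
        },
    )
    print(request.get_method(), request.full_url)
```

To send the request, pass the returned object to `urllib.request.urlopen`.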

You should receive a response body in the following format:

{
  "id": "JessicaPersonalVoiceId",
  "speakerProfileId": "3059912f-a3dc-49e3-bdd0-02e449df1fe3",
  "projectId": "ProjectId",
  "consentId": "JessicaConsentId",
  "status": "NotStarted",
  "createdDateTime": "2023-04-01T05:30:00.000Z",
  "lastActionDateTime": "2023-04-02T10:15:30.000Z"
}

Use the speakerProfileId property to integrate personal voice in your text to speech application. For more information, see use personal voice in your application.

The response header contains the Operation-Location property. Use this URI to get details about the PersonalVoices_Create operation. Here's an example of the response header:

Operation-Location: https://eastus.api.cognitive.microsoft.com/customvoice/operations/1321a2c0-9be4-471d-83bb-bc3be4f96a6f?api-version=2023-12-01-preview
Operation-Id: 1321a2c0-9be4-471d-83bb-bc3be4f96a6f
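Because the example response reports a status of NotStarted, an application might wait until the status changes before using the personal voice. The following sketch shows a simple polling loop. The terminal status names Succeeded and Failed are assumptions not confirmed by this article, and `fetch_status` is a hypothetical callable that you supply to return the current status string.

```python
import time
from typing import Callable

# Assumed terminal values; the article only shows "NotStarted".
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_terminal_status(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated status sequence in place of real service calls.
    simulated = iter(["NotStarted", "Running", "Succeeded"])
    print(wait_for_terminal_status(lambda: next(simulated), sleep=lambda s: None))
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.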

Next steps

Use personal voice in your application.
