Need to reduce the time taken to transcribe an audio call using Azure Speech to Text

Question

Hi,

I am trying to convert audio files to text using Azure Speech to Text with diarization (using the Batch transcription API). The audio files are in WAV format and range from roughly 500 KB to 3 MB in size. However, transcribing each file takes around 4 minutes on average. If I were to transcribe a very high volume of audio files, for example 2,000 files, it would take around a week, which is very time-consuming.

Is there a way to reduce the time taken?

PS: Is 4 minutes per audio call the standard time for files of this size?

Answer

@ParikshitSamvatsar-6931 Are you sending a single audio file with each call to the API? You can include more than one file URL in a request to transcribe multiple files. Please review the guidance for more details.

To take full advantage of Batch Transcription's ability to efficiently transcribe a large number of audio files, we recommend always sending multiple files per request or pointing to a Blob Storage container that holds the audio files to transcribe. The service will transcribe the files concurrently, reducing the turnaround time. Sending multiple files in a single request is simple and straightforward.
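As a rough back-of-the-envelope check, the numbers in the question (about 4 minutes per file, 2,000 files) add up to roughly 5.6 days when files are submitted and awaited one at a time. The sketch below compares that with an idealized model in which the service works on `concurrency` files at once. The concurrency values and the assumption that every file still takes about 4 minutes are illustrative only, not documented service behavior.

```python
import math


def sequential_minutes(num_files: int, minutes_per_file: float) -> float:
    """Wall-clock time if each file is submitted and awaited one at a time."""
    return num_files * minutes_per_file


def concurrent_minutes(num_files: int, minutes_per_file: float, concurrency: int) -> float:
    """Idealized wall-clock time if files are processed in waves of `concurrency`.

    Assumption: every file takes the same time and the service really runs
    `concurrency` files in parallel; actual scheduling is up to the service.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    waves = math.ceil(num_files / concurrency)
    return waves * minutes_per_file


if __name__ == "__main__":
    files, per_file = 2000, 4.0
    seq = sequential_minutes(files, per_file)
    print(f"sequential: {seq:.0f} min = {seq / 60 / 24:.1f} days")
    for c in (10, 100):  # hypothetical concurrency levels
        print(f"concurrency {c}: {concurrent_minutes(files, per_file, c):.0f} min")
```

Under these assumptions, 2,000 files drop from 8,000 minutes sequentially to 800 minutes with 10 files in flight and 80 minutes with 100, which is why batching files matters far more than the size of any single file.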

Example request body that sends multiple files in a single request:

{
  "contentUrls": [
    "",
    "",
    ""
  ],
  "properties": {
    "wordLevelTimestampsEnabled": true
  },
  "locale": "en-US",
  "displayName": "Transcription of file using default model for en-US"
}
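A minimal Python sketch of sending a request body like the one above, using only the standard library. It assumes the v3.1 REST endpoint `https://<region>.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions` and a Speech resource key passed in the `Ocp-Apim-Subscription-Key` header; the region, key, and audio URLs are placeholders. Because the question needs speaker separation, the sketch also sets `diarizationEnabled`, which is an assumption to check against the API version you target.

```python
import json
import urllib.request

# Placeholder values: replace with your own Speech resource region and key.
REGION = "westus"
SPEECH_KEY = "<your-speech-resource-key>"
ENDPOINT = f"https://{REGION}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"


def build_transcription_body(content_urls, locale="en-US", display_name="Batch transcription"):
    """Build one batch-transcription request body that covers many audio files."""
    if not content_urls:
        raise ValueError("at least one audio URL is required")
    return {
        "contentUrls": list(content_urls),
        "properties": {
            "wordLevelTimestampsEnabled": True,
            # Assumption: v3.x property that turns on speaker diarization.
            "diarizationEnabled": True,
        },
        "locale": locale,
        "displayName": display_name,
    }


def submit_transcription(body):
    """POST the body to the batch transcription endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": SPEECH_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

In v3.1 the reply describes the created transcription job; its `self` URL can then be polled until the job's `status` becomes `Succeeded` or `Failed`, so one poll loop covers every file in the request.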

The batch transcription API also supports up to 2,000 simultaneous jobs and up to 300 requests per minute, which can help you transcribe all your files much faster. You can look up the API limits in the Speech service quotas and limits documentation.
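For 2,000 files, one way to stay within the limits quoted above is to group the URLs into a handful of requests and space out submissions on the client side. In the sketch below, `files_per_request` is a hypothetical value that you should set from the documented per-request limit, and the `paced` helper only enforces the 300-requests-per-minute figure quoted in this answer; it is a client-side convenience, not part of the Azure API.

```python
import time
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_urls(urls: Sequence[str], files_per_request: int) -> List[List[str]]:
    """Split audio URLs into groups; each group becomes one transcription request."""
    if files_per_request < 1:
        raise ValueError("files_per_request must be at least 1")
    return [list(urls[i:i + files_per_request]) for i in range(0, len(urls), files_per_request)]


def paced(items: Iterable[T], max_per_minute: int = 300) -> Iterator[T]:
    """Yield items no faster than `max_per_minute`, sleeping between them as needed."""
    interval = 60.0 / max_per_minute
    next_slot = time.monotonic()
    for item in items:
        wait = next_slot - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        # Schedule the next allowed submission one interval after this one.
        next_slot = max(next_slot, time.monotonic()) + interval
        yield item


if __name__ == "__main__":
    urls = [f"https://example.com/calls/call{i}.wav" for i in range(2000)]  # hypothetical URLs
    batches = chunk_urls(urls, files_per_request=100)  # hypothetical batch size
    print(len(batches), "requests")
    for batch in paced(batches):
        # Replace this print with the POST of one request body per batch.
        print("would submit", len(batch), "files")
```

Each group returned by `chunk_urls` would be sent as its own request body in the same shape as the JSON example above, so 2,000 files become 20 requests rather than 2,000.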
