question

16818492 asked 16818492 commented

Need to reduce the time taken to transcribe an audio call using Azure Speech to Text

Hi,

I am trying to convert audio files to text with Azure Speech to Text, including diarization, using the Batch Transcription API. The audio files are in WAV format and range from roughly 500 KB to 3 MB. However, converting each file takes around 4 minutes on average. At that rate, transcribing a high volume of files is very time consuming; for example, 2000 audio files would take around a week.

Is there a way to reduce the time taken?

PS: Is 4 minutes per audio call the standard time for files of this size?

azure-cognitive-services azure-speech azure-text-analytics

1 Answer

romungi-MSFT answered 16818492 commented

@ParikshitSamvatsar-6931 Are you sending a single audio file with each call to the API? You can pass more than one file URL in a request to transcribe multiple files. Please review the guidance for more details.

To take full advantage of Batch Transcription's ability to efficiently transcribe a large number of audio files, we recommend always sending multiple files per request or pointing to a Blob Storage container with the audio files to transcribe. The service transcribes the files concurrently, which reduces the turnaround time. Using multiple files in a single request is simple and straightforward.

Example request body that sends multiple files in a single request:

 {
   "contentUrls": [
     "<URL to an audio file 1 to transcribe>",
     "<URL to an audio file 2 to transcribe>",
     "<URL to an audio file 3 to transcribe>"
   ],
   "properties": {
     "wordLevelTimestampsEnabled": true
   },
   "locale": "en-US",
   "displayName": "Transcription of file using default model for en-US"
 }
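For anyone calling the REST API directly from Python rather than through the generated sample client, the request body above can be built from any hand-picked list of audio URLs. The following is only a minimal sketch: the helper names `build_transcription_body` and `submit_transcription` are hypothetical, the `diarizationEnabled` property and the v3.0 endpoint form are assumptions based on the v3.0 REST API, and the region and key are placeholders you would need to supply and verify for your own resource.

```python
import json
import urllib.request


def build_transcription_body(content_urls, locale="en-US",
                             display_name="Batch transcription",
                             diarization=False):
    """Return the JSON-serialisable body for one transcription request."""
    if not content_urls:
        raise ValueError("at least one audio URL is required")
    properties = {"wordLevelTimestampsEnabled": True}
    if diarization:
        # Assumption: v3.0 exposes diarization through this property.
        properties["diarizationEnabled"] = True
    return {
        "contentUrls": list(content_urls),  # any selected files, not a whole container
        "properties": properties,
        "locale": locale,
        "displayName": display_name,
    }


def submit_transcription(body, region, subscription_key):
    """POST the body to the transcriptions endpoint and return the parsed reply.

    Assumption: the endpoint follows the v3.0 form below; check the
    documentation for the API version your resource uses.
    """
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.0/transcriptions")
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `build_transcription_body` with a list of selected SAS URLs gives a dictionary with the same shape as the JSON above; `submit_transcription` is only needed if you are not using the sample's client library.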

The Batch Transcription API also supports up to 2000 simultaneous jobs and up to 300 requests per minute, which could help you transcribe all your files much faster. You can look up the limits of the API here.
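To put the numbers from the question in perspective, here is a rough sketch of how a list of audio URLs could be split into a small number of requests. The helper names and the figure of 100 files per request are purely illustrative assumptions, not service recommendations, and the estimate covers only the one-file-at-a-time case described in the question; actual turnaround with concurrent processing depends on the service.

```python
def chunk_urls(urls, files_per_request):
    """Split a list of audio URLs into groups, one group per request."""
    if files_per_request < 1:
        raise ValueError("files_per_request must be at least 1")
    return [urls[i:i + files_per_request]
            for i in range(0, len(urls), files_per_request)]


def sequential_minutes(file_count, minutes_per_file):
    """Total minutes if files are transcribed strictly one after another."""
    return file_count * minutes_per_file


# Placeholder URLs standing in for 2000 real audio files.
all_urls = [f"https://example.invalid/audio/{n}.wav" for n in range(2000)]

groups = chunk_urls(all_urls, files_per_request=100)  # 100 is an arbitrary choice
total = sequential_minutes(len(all_urls), minutes_per_file=4)

print(f"{len(groups)} requests instead of {len(all_urls)}")
print(f"one-at-a-time estimate: {total} minutes, about {total / 60 / 24:.1f} days")
```

With 2000 files at 4 minutes each, the one-at-a-time total is 8000 minutes, or about 5.6 days, which matches the "around a week" figure in the question. Grouping the files means far fewer requests, and the service can then work on the files in each request concurrently.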


Hello @romungi-MSFT, for Speech to Text batch transcription I am using Python and following this sample code:

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/python/from-blob/python-client/main.py

The issue is that I am unable to put multiple audio files into a single request as you mentioned above.

136898-image.png


As you can see in the code above, the transcription definition does not accept multiple audio file URIs. It accepts only a single URI at a time, which prevents me from implementing something like the following:

136880-image.png




I am aware that, as you mentioned, I can transcribe all of the audio files in a single request by pointing directly to a container. However, I am looking for the flexibility to put selected audio files of my choice into a single request, rather than pointing to a whole container.

Looking forward to your response.
Thanks in advance.
