Need to reduce the time taken to transcribe an audio call using Azure Speech to Text

16818492 1 Reputation point
2021-08-17T06:31:32.043+00:00

Hi,

I am trying to convert audio files to text using Azure Speech to Text, including diarization (using the Batch Transcription API). The audio files are in WAV format and range in size from roughly 500 KB to 3 MB. However, converting each file takes around 4 minutes on average. Transcribing a very high volume of audio files would be very time-consuming: for example, 2000 audio files would take around a week.

Is there a way to reduce the time taken?

PS: Is 4 minutes per audio call the standard time for files of the size mentioned?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. romungi-MSFT 42,311 Reputation points Microsoft Employee
    2021-08-17T11:28:00.743+00:00

    @ParikshitSamvatsar-6931 Are you sending a single audio file with each call to the API? You can pass more than one file URL in a request to transcribe multiple files. Please review the guidance for more details.

    To take full advantage of Batch Transcription's ability to efficiently transcribe a large number of audio files, we recommend always sending multiple files per request or pointing to a Blob Storage container with the audio files to transcribe. The service will transcribe the files concurrently, reducing the turnaround time. Using multiple files in a single request is simple and straightforward.

    Request body for sending multiple files in a single request:

    {  
      "contentUrls": [  
        "<URL to an audio file 1 to transcribe>",  
        "<URL to an audio file 2 to transcribe>",  
        "<URL to an audio file 3 to transcribe>"  
      ],  
      "properties": {  
        "wordLevelTimestampsEnabled": true  
      },  
      "locale": "en-US",  
      "displayName": "Transcription of file using default model for en-US"  
    }  
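
If the list of audio URLs comes from code rather than being typed by hand, the same request body could be assembled programmatically. The following is only a minimal sketch: the helper name `build_transcription_request` and its parameters are hypothetical, and the `diarizationEnabled` property is an assumption that should be checked against the API version you are using.

```python
import json


def build_transcription_request(content_urls, locale="en-US",
                                display_name="Batch transcription",
                                diarization=False):
    """Build a request body shaped like the JSON example above.

    Hypothetical helper for illustration; it only builds the payload
    and does not call the service.
    """
    urls = list(content_urls)
    if not urls:
        raise ValueError("at least one audio URL is required")

    properties = {"wordLevelTimestampsEnabled": True}
    if diarization:
        # Assumption: property name for diarization; verify for your API version.
        properties["diarizationEnabled"] = True

    return {
        "contentUrls": urls,
        "properties": properties,
        "locale": locale,
        "displayName": display_name,
    }


if __name__ == "__main__":
    # Placeholder URLs; replace with real, accessible audio file URLs.
    body = build_transcription_request(
        ["<URL to an audio file 1 to transcribe>",
         "<URL to an audio file 2 to transcribe>"],
        diarization=True,
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of the create-transcription request.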
    

    The Batch Transcription API also supports up to 2000 simultaneous jobs and up to 300 requests per minute. This could help you transcribe all your files much faster. You can look up the limits of the API here.
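
To make the arithmetic concrete, here is a rough sketch of how 2000 file URLs might be grouped into multi-file requests. The group size of 100 files per request and the example URLs are assumptions chosen for illustration, not service recommendations; `chunked` is a hypothetical helper.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical URLs standing in for 2000 uploaded audio files.
urls = [f"https://example.invalid/audio/call_{i:04d}.wav" for i in range(2000)]

# Assumed group size of 100 files per request.
batches = list(chunked(urls, 100))
request_count = len(batches)

# One file at a time, at about 4 minutes each, processed back to back.
sequential_days = 2000 * 4 / 60 / 24

print(f"{request_count} requests instead of {len(urls)}")
print(f"sequential estimate: {sequential_days:.1f} days")
```

Under these assumptions the whole workload fits in 20 requests, well below the 300 requests per minute limit, while processing the files one by one at about 4 minutes each works out to roughly 5.6 days, which matches the "around a week" estimate in the question.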
