Speech services

Blob Storage

Solution Idea

With Speech Services, you can transcribe every call. Index the transcriptions for full-text search, or apply Text Analytics to detect sentiment, language, and key phrases. If your call center recordings involve specialized terminology, such as product names or IT jargon, create a custom language model to teach Speech Services the vocabulary. A custom acoustic model helps Speech Services understand speakers even with background noise or poor phone connections.

For more information, read how batch transcription works with Speech Services.


Architecture diagram

Data flow

  1. Adapt a model for your domain and deploy that model
  2. Upload your recordings to a blob container
  3. Send a POST request to the batch transcription endpoint
  4. Speech Services schedules the transcription job
  5. Stereo files are split into two channels
  6. Mono files undergo diarization to distinguish between speakers
  7. Download the transcription using the transcription ID
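
As a rough illustration of steps 2, 3, and 7, the sketch below builds (but does not send) a batch transcription request using only the Python standard library. The `v3.1` path, the `Ocp-Apim-Subscription-Key` header, and the `contentUrls`, `locale`, `displayName`, `model`, and `diarizationEnabled` fields reflect one version of the Speech to text REST API and should be checked against the current reference before use. The function names, region, key, and blob URL are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: the v3.1 REST API; other versions use a different path.
API_VERSION = "v3.1"


def build_transcription_request(region, key, content_urls, locale="en-US",
                                display_name="Call center batch",
                                model_url=None, diarize=False):
    """Build the POST request from step 3 without sending it."""
    endpoint = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/speechtotext/{API_VERSION}/transcriptions"
    )
    body = {
        # URLs of the recordings uploaded to the blob container (step 2).
        "contentUrls": list(content_urls),
        "locale": locale,
        "displayName": display_name,
        # Diarization is relevant for mono files (step 6).
        "properties": {"diarizationEnabled": diarize},
    }
    if model_url:
        # Optional reference to the custom model deployed in step 1.
        body["model"] = {"self": model_url}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def transcription_id(self_url):
    """Extract the transcription ID (step 7) from the 'self' URL
    that the service is assumed to return for the created job."""
    return self_url.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    request = build_transcription_request(
        "westus",                      # hypothetical region
        "YOUR_SPEECH_KEY",             # hypothetical key
        ["https://example.blob.core.windows.net/calls/call-001.wav"],
        diarize=True,
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would submit the job. The response is expected to contain a `self` URL, and `transcription_id` pulls the ID from it so that the results can be looked up once the job finishes.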


Next steps