Speech Services

Solution Idea

Speech Services lets you transcribe every call recording. You can then index the transcriptions for full-text search, or apply Text Analytics to detect sentiment, language, and key phrases. If your call center recordings involve specialized terminology, such as product names or IT jargon, create a custom language model to teach Speech Services the vocabulary. A custom acoustic model helps Speech Services understand speakers despite background noise or poor phone connections.
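To make the full-text search idea concrete, here is a minimal sketch of an in-memory inverted index over transcribed calls. It is only a stand-in for a real search service and does not reflect any Azure API; the call IDs, transcript text, and the function names `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each token to the set of call IDs whose transcript contains it."""
    index = defaultdict(set)
    for call_id, text in transcripts.items():
        for token in tokenize(text):
            index[token].add(call_id)
    return index


def search(index, query):
    """Return the call IDs that contain every token in the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Start from the first token's matches, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical transcripts keyed by hypothetical call IDs.
transcripts = {
    "call-001": "My router keeps dropping the connection.",
    "call-002": "I want to upgrade my router to the new model.",
    "call-003": "The invoice amount looks wrong this month.",
}
index = build_index(transcripts)
```

Calling `search(index, "router upgrade")` returns only the calls whose transcripts contain both words. A production system would more likely push the transcripts to a dedicated search service instead of keeping an index in memory.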

For more information, read how batch transcription works with Speech Services.

Architecture

Architecture diagram: the batch transcription pipeline, from model adaptation through downloading the transcription. The numbered steps are listed under Data Flow.

Data Flow

  1. Adapt a model to your domain and deploy it.
  2. Upload your recordings to a blob container.
  3. Send a POST request to the batch transcription endpoint.
  4. Speech Services schedules the transcription job.
  5. Stereo files are split into two channels.
  6. Mono files undergo diarization to distinguish between speakers.
  7. Download the transcription using the transcription ID.
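
Steps 3 and 7 can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive client: the endpoint path and the body fields (`contentContainerUrl`, `locale`, `displayName`, `model`, `properties.diarizationEnabled`) follow the v3.1 Speech to text REST API as I understand it and should be checked against the current reference. The region, key, container URL, and both function names are hypothetical. The code only builds the request; it does not send it.

```python
import json
import urllib.request


def build_transcription_request(region, key, container_url, *,
                                locale="en-US", mono=True, model_url=None,
                                display_name="Call center batch"):
    """Build (but do not send) a POST request that creates a batch transcription."""
    body = {
        # Assumed field: SAS URL of the blob container holding the recordings.
        "contentContainerUrl": container_url,
        "locale": locale,
        "displayName": display_name,
        # Request diarization for mono files; stereo files are split by channel.
        "properties": {"diarizationEnabled": mono},
    }
    if model_url:
        # Assumed field: reference to a custom model adapted in step 1.
        body["model"] = {"self": model_url}
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.1/transcriptions")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def transcription_id_from_self(self_url):
    """Extract the transcription ID, assumed to be the last path segment of the 'self' URL."""
    return self_url.rstrip("/").rsplit("/", 1)[-1]
```

Passing the returned object to `urllib.request.urlopen` would submit the job. The response is expected to carry a `self` URL, and the ID extracted from it is what you use later to poll the job status and download the result files.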