Migrate from Media Indexer and Media Indexer 2 to Video Indexer

The Azure Media Indexer and Azure Media Indexer 2 Preview media processors are being retired. For the retirement dates, see this legacy components topic. Azure Media Services Video Indexer replaces these legacy media processors.

Azure Media Services Video Indexer is built on Azure Media Analytics, Azure Cognitive Search, and Cognitive Services (such as the Face API, Microsoft Translator, the Computer Vision API, and Custom Speech Service). It enables you to extract insights from your videos by using Video Indexer video and audio models. To see what scenarios Video Indexer can be used in, what features it offers, and how to get started, see Video Indexer video and audio models.

You can extract insights from your video and audio files by using the Azure Media Services v3 analyzer presets or directly by using the Video Indexer APIs. Currently, there is an overlap between features offered by the Video Indexer APIs and the Media Services v3 APIs.


To understand when you would want to use Video Indexer vs. Media Services analyzer presets, check out the comparison document.

This article discusses the steps for migrating from the Azure Media Indexer and Azure Media Indexer 2 to Azure Media Services Video Indexer.

Migration options

  • If you require a solution that provides speech-to-text transcription for any media file format in a closed caption file format (VTT, SRT, or TTML), as well as additional audio insights such as keywords, topic inferencing, acoustic events, speaker diarization, entity extraction, and translation, then update your applications to use the Azure Video Indexer capabilities through the Video Indexer v2 REST API or the Azure Media Services v3 Audio Analyzer preset.

  • If you require speech-to-text capabilities, then use the Cognitive Services Speech API directly.

Getting started with Video Indexer

The following section points you to relevant links: How can I get started with Video Indexer?
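
As a rough illustration of what calling the Video Indexer v2 REST API can look like, the sketch below only builds the URL for an upload-and-index request; it does not send anything. The host name, the path layout, and the query parameter names (accessToken, name, videoUrl) are assumptions based on the public API and should be checked against the current API reference. The function name build_upload_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the Video Indexer v2 REST API; verify before use.
API_BASE = "https://api.videoindexer.ai"


def build_upload_url(location, account_id, access_token, name, video_url):
    """Build the URL for a POST request that uploads a video for indexing.

    All path segments and query parameter names are assumptions; the
    access token must be obtained separately from the authorization API.
    """
    query = urlencode({
        "accessToken": access_token,
        "name": name,
        "videoUrl": video_url,  # publicly reachable URL of the media file
    })
    return f"{API_BASE}/{location}/Accounts/{account_id}/Videos?{query}"


if __name__ == "__main__":
    # Placeholder values only; substitute your own account details.
    print(build_upload_url(
        "trial", "<account-id>", "<access-token>",
        "interview", "https://example.com/interview.mp4",
    ))
```

An application would send a POST to the resulting URL and then poll the video's index until processing finishes; both steps are omitted here.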

Getting started with Media Services v3 APIs

Azure Media Services v3 API enables you to extract insights from your video and audio files through the Azure Media Services v3 analyzer presets.

AudioAnalyzerPreset enables you to extract multiple audio insights from an audio or video file. The output includes a VTT or TTML file for the audio transcript and a JSON file (with all the additional audio insights). The audio insights include keywords, speaker indexing, and speech sentiment analysis. AudioAnalyzerPreset also supports language detection for specific languages. For detailed information, see Transforms.
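
To make the preset more concrete, here is a minimal sketch of the JSON body that a Transform using AudioAnalyzerPreset might carry when created through the Media Services v3 REST API. The property names (@odata.type, audioLanguage, outputs, preset) follow the v3 Transform schema as best understood here and should be verified against the Transforms reference. The helper names are hypothetical, and the API version is deliberately left as a caller-supplied parameter rather than guessed.

```python
import json


def build_audio_analyzer_transform(audio_language=None, description="Audio insights"):
    """Return a Transform request body with a single AudioAnalyzerPreset output.

    If audio_language is omitted, the property is left out of the body so
    the service can apply its own language handling (assumed behavior).
    """
    preset = {"@odata.type": "#Microsoft.Media.AudioAnalyzerPreset"}
    if audio_language:
        preset["audioLanguage"] = audio_language  # for example "en-US"
    return {
        "properties": {
            "description": description,
            "outputs": [{"preset": preset}],
        }
    }


def build_transform_url(subscription_id, resource_group, account_name,
                        transform_name, api_version):
    """Build the Azure Resource Manager URL for a PUT request on the Transform."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Media/mediaServices/{account_name}"
        f"/transforms/{transform_name}?api-version={api_version}"
    )


if __name__ == "__main__":
    print(json.dumps(build_audio_analyzer_transform("en-US"), indent=2))
```

Submitting a Job against this Transform would then produce the transcript and JSON insight files described above; authentication and job submission are outside the scope of this sketch.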

Get started

To get started see:

Getting started with Cognitive Services Speech Services

Azure Cognitive Services provides a speech-to-text service that transcribes audio streams to text in real time, which your applications, tools, or devices can then consume or display. You can also customize speech-to-text with your own acoustic model, language model, or pronunciation model. For more information, see Cognitive Services speech-to-text.


The speech-to-text service does not take video file formats and only takes certain audio formats.
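
Because of this restriction, a video file usually needs its audio track extracted before it can be sent to the speech-to-text service. The sketch below shows one possible way to do that with the ffmpeg command-line tool, which is an assumption here: it is not part of Azure and must be installed separately. The target format (16 kHz, mono, 16-bit PCM WAV) is also an assumption about what the service accepts; check the supported audio formats before relying on it.

```python
import subprocess


def build_extract_audio_command(video_path, wav_path, sample_rate=16000):
    """Return an ffmpeg command that writes the audio track as mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video file
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_extract_audio_command(video_path, wav_path), check=True)
```

Calling extract_audio("input.mp4", "audio.wav") would leave a WAV file that can then be passed to the Speech service client of your choice.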

For more information about the speech-to-text service and how to get started, see What is speech-to-text?

Known differences from deprecated services

You will find that Video Indexer, Azure Media Services v3 AudioAnalyzerPreset, and Cognitive Services Speech Services are more reliable and produce better quality output than the retired Azure Media Indexer 1 and Azure Media Indexer 2 processors.

Some known differences include:

  • Cognitive Services Speech Services does not support keyword extraction. However, Video Indexer and Media Services v3 AudioAnalyzerPreset both offer a more robust set of keywords in JSON file format.

Need help?

You can open a support ticket by navigating to New support request.

Next steps