What is Video Indexer? (preview)

Video Indexer is a cloud application built using Azure Media Analytics, Cognitive Services (such as the Face API, Microsoft Translator, the Computer Vision API, and Custom Speech Service), and Azure Search. It enables you to extract the following insights from your videos using artificial intelligence technologies:

  • Automatic language detection: Video Indexer can automatically detect the language of the video. Automatic language detection currently supports English, Spanish, French, German, Italian, Chinese (Simplified), Japanese, and Russian. If the language can't be detected, Video Indexer falls back to English.
  • Audio transcription: Video Indexer has speech-to-text functionality that produces a transcript of the spoken words. Supported languages include English, Spanish, French, German, Italian, Chinese (Simplified), Portuguese (Brazilian), Japanese, and Russian, with more to come.
  • Face tracking and identification: Face technologies enable detection of faces in a video. The detected faces are matched against a celebrity database to evaluate which celebrities are present in the video. Customers can also label faces that do not match a celebrity. Video Indexer builds a face model based on those labels and can recognize those faces in videos submitted in the future.
  • Speaker indexing: Video Indexer identifies which speaker spoke which words, and when.
  • Visual text recognition: Video Indexer extracts text that is displayed in the video.
  • Voice activity detection: Video Indexer separates voice activity from background noise.
  • Scene detection: Video Indexer performs visual analysis on the video to determine when the scene changes.
  • Keyframe extraction: Video Indexer automatically detects keyframes in a video.
  • Sentiment analysis: Video Indexer performs sentiment analysis on the text extracted using speech-to-text and optical character recognition, and provides that information in the form of positive, negative, or neutral sentiments, along with timecodes.
  • Translation: Video Indexer can translate the audio transcript from one language to another. The following languages are supported: English, Spanish, French, German, Italian, Chinese (Simplified), Portuguese (Brazilian), Japanese, and Russian. After translation, captions in the other languages are available in the video player.
  • Visual content moderation: This technology enables detection of adult and/or racy material present in the video and can be used for content filtering.
  • Keyword extraction: Video Indexer extracts keywords based on the transcript of the spoken words and the text recognized by the visual text recognizer.
  • Labels: Video Indexer provides labels for visual objects such as cat, dog, table, and car, as well as for actions such as standing, running, or flying.
  • Brands: Video Indexer extracts business brands based on the transcript of the spoken words and the text recognized by the visual text recognizer.

After Video Indexer finishes processing and analyzing a video, you can review, curate, search, and publish the video insights.

Whether you are a content manager or a developer, the Video Indexer service can address your needs. Content managers can use the Video Indexer web portal to consume the service without writing a single line of code; see Get started using the Video Indexer portal. Developers can use the APIs to process content at scale; see Use Video Indexer REST API. The service also lets customers use widgets to publish video streams and extracted insights in their own applications; see Embed visual widgets in your application.

You can sign up for the service using an existing Azure Active Directory (AAD), LinkedIn, Facebook, Google, or Microsoft account (MSA). For more information, see getting started.
The following are a few scenarios where Video Indexer can be useful:

  • Search – Insights extracted from the video can be used to enhance the search experience across a video library. For example, indexing spoken words and faces makes it possible to find the moments in a video where a particular person spoke certain words, or where two people were seen together. Search based on such insights applies to news agencies, educational institutions, broadcasters, entertainment content owners, enterprise line-of-business (LOB) apps, and, in general, any industry with a video library that users need to search.

  • Monetization – Video Indexer can help increase the value of videos. For example, industries that rely on ad revenue (such as news media and social media) can deliver more relevant ads by passing the extracted insights to the ad server as additional signals. A sports shoe ad is more relevant in the middle of a football match than in the middle of a swimming competition.

  • User engagement – Video insights can be used to improve user engagement by surfacing the relevant video moments to users. For example, consider an educational video that explains spheres for the first 30 minutes and pyramids for the next 30 minutes. A student reading about pyramids would benefit more if the video started at the 30-minute mark.

For more information, see this blog.

Next steps

You're ready to get started with Video Indexer. For more information, see the following articles: