Experimental preset for content-aware encoding

In order to prepare content for delivery by adaptive bitrate streaming, video needs to be encoded at multiple bit-rates (high to low). In order to ensure graceful degradation of quality, as the bitrate is lowered so is the resolution of the video. This results in a so-called encoding ladder – a table of resolutions and bitrates; see the Media Services built-in encoding presets.

Overview

Interest in moving beyond a one-preset-fits-all-videos approach increased after Netflix published their blog in December 2015. Since then, multiple solutions for content-aware encoding have been released in the marketplace; see this article for an overview. The idea is to be aware of the content – to customize or tune the encoding ladder to the complexity of the individual video. At each resolution, there is a bitrate beyond which any increase in quality is not perceptive – the encoder operates at this optimal bitrate value. The next level of optimization is to select the resolutions based on the content – for example, a video of a PowerPoint presentation does not benefit from going below 720p. Going further, the encoder can be tasked to optimize the settings for each shot within the video. Netflix described such an approach in 2018.

In early 2017, Microsoft released the Adaptive Streaming preset to address the problem of the variability in the quality and resolution of the source videos. Our customers had a varying mix of content, some at 1080p, others at 720p, and a few at SD and lower resolutions. Furthermore, not all source content was high quality mezzanines from film or TV studios. The Adaptive Streaming preset addresses these problems by ensuring that the bitrate ladder never exceeds the resolution or the average bitrate of the input mezzanine.

The experimental content-aware encoding preset extends that mechanism, by incorporating custom logic that lets the encoder seek the optimal bitrate value for a given resolution, but without requiring extensive computational analysis. The net result is that this new preset produces an output that has lower bitrate than the Adaptive Streaming preset, but at a higher quality. See the following sample graphs that show the comparison using quality metrics like PSNR and VMAF. The source was created by concatenating short clips of high complexity shots from movies and TV shows, intended to stress the encoder. By definition, this preset produces results that vary from content to content – it also means that for some content, there may not be significant reduction in bitrate or improvement in quality.

Rate-distortion (RD) curve using PSNR

Figure 1: Rate-distortion (RD) curve using PSNR metric for high complexity source

Rate-distortion (RD) curve using VMAF

Figure 2: Rate-distortion (RD) curve using VMAF metric for high complexity source

The preset currently is tuned for high complexity, high quality source videos (movies, TV shows). Work is in progress to adapt to low complexity content (for example, PowerPoint presentations), as well as poorer quality videos. This preset also uses the same set of resolutions as the Adaptive Streaming preset. Microsoft is working on methods to select the minimal set of resolutions based on the content. As follows are results for another category of source content, where the encoder was able to determine that the input was of poor quality (many compression artifacts because of the low bitrate). Note that with the experimental preset, the encoder decided to produce just one output layer – at a low enough bitrate so that most clients would be able to play the stream without stalling.

RD curve using PSNR

Figure 3: RD curve using PSNR for low quality input (at 1080p)

RD curve using VMAF

Figure 4: RD curve using VMAF for low quality input (at 1080p)

Use the experimental preset

You can create transforms that use this preset as follows. If using a tutorial such as this, you can update the code as follows:

TransformOutput[] output = new TransformOutput[]
{
   new TransformOutput
   {
      // The preset for the Transform is set to one of Media Services built-in sample presets.
      // You can customize the encoding settings by changing this to use "StandardEncoderPreset" class.
      Preset = new BuiltInStandardEncoderPreset()
      {
         // This sample uses the new experimental preset for content-aware encoding
         PresetName = EncoderNamedPreset.ContentAwareEncodingExperimental
      }
   }
};

Note

The prefix "experimental" is used here to signal that the underlying algorithms are still evolving. There can and will be changes over time to the logic used for generating bitrate ladders, with the goal of converging on an algorithm that is robust, and adapts to a wide variety of input conditions. Encoding jobs using this preset will still be billed based on output minutes, and the output asset can be delivered from our streaming endpoints in protocols such as DASH and HLS.

Next steps

Now that you have learned about this new option of optimizing your videos, we invite you to try it out. You can send us feedback using the links at the end of this article, or engage us more directly at amsved@microsoft.com.