Building a rich and extensible media platform

Windows provides a broad set of technologies for consumers to experience video and audio and for developers to tap into these technologies through rich APIs. This post goes into depth on both of these aspects of the Windows media platform, which has been substantially improved for both desktop and Metro style apps. The landscape for media playback has changed significantly since Windows 7 was released, with an increased focus on streaming, and the desire for content owners to offer playback of their content on a broader array of devices, all while significantly reducing the battery power required for playback. With these new capabilities, which are part of both Windows 8 and Windows RT of course, we worked to provide industry-leading support for consumers and developers. This post was authored by Scott Manchester, group program manager for our Media Platform and Technologies team. –Steven

Engaging with rich media—whether watching a movie, video chatting, or playing music—is one of the most prevalent and enjoyable things we do on our PCs today. I’d like to talk a little bit about the work we’ve done in Windows 8 to make a rich variety of multimedia activities possible, and to extend those capabilities to third party developers through an extensible media platform.

We had three goals in mind when designing the Windows 8 media platform:

  1. Maximize performance. We wanted media playback to be fast and responsive, enabling the full power of the hardware while maximizing battery life on each PC.
  2. Simplify development and extensibility. We wanted to provide a platform that could be easily extended and tailored for a given application, setting the stage for innovative custom media apps on Windows.
  3. Enable a breadth of scenarios. A high performance, high efficiency, extensible platform can then enable a wide range of music, video, communications, and other multimedia apps.

With these three goals in mind, we set out to reimagine the media experience on the Windows platform.

Faster, more responsive media experiences

Performance is a key aspect of any user experience, but it is especially critical in multimedia scenarios. Videos need to play in real time, voice communication needs to feel instantaneous, and all of these tasks need to minimize the drain on your battery.

We measure performance by the time, computing resources, and memory that a given task takes on a system. We aimed to minimize all of those metrics. Our goals for media performance were focused on audio and video playback, transcoding, encoding, and capture.

Efficient video decoding

To get better battery life or just reduce power consumption for all media scenarios, we continue to work with partners in the silicon chip industry to enable new and faster experiences. With Windows 8 running on a Windows 8 certified PC, video decoding for common media formats will be offloaded to a dedicated hardware subsystem for media. This allows us to significantly lower CPU usage, resulting in smoother video playback and a longer battery life, as the dedicated media hardware is much more efficient than the CPU at media decoding. This improves all scenarios that require video decoding, including playback, transcoding, encoding, and capture scenarios.

The figure below shows a comparison of the average CPU utilization between Windows 7 and Windows 8 during playback of 720p VC1/H.264 video clips and webcam capture preview.

Windows 7 and Windows 8 CPU usage compared. WMV Decode on Windows 7: 32%, on Windows 8: 14%; H.264 Decode on Windows 7: 30%, on Windows 8: 13%; Capture Preview on Windows 7: 27%, on Windows 8: 8%.
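To put the chart's numbers in perspective, here is a small sketch that turns the quoted percentages into relative reductions. The figures come from the chart description above. The helper name `relative_reduction` is only illustrative.

```python
# CPU utilization figures quoted in the chart description (percent of CPU).
# Each entry is (Windows 7, Windows 8).
CPU_USAGE = {
    "WMV decode": (32, 14),
    "H.264 decode": (30, 13),
    "Capture preview": (27, 8),
}


def relative_reduction(before: float, after: float) -> float:
    """Return the fractional drop in CPU usage going from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    for scenario, (win7, win8) in CPU_USAGE.items():
        drop = relative_reduction(win7, win8)
        print(f"{scenario}: {win7}% -> {win8}% ({drop:.0%} lower)")
```

By this measure, the two decode scenarios use a bit less than half the CPU they did on Windows 7, and capture preview uses under a third.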

In addition to video offload, the improvements to webcam capture are made possible by the move from a DirectShow Capture API to the new, far more optimized Windows 8 Media Foundation Capture API. We’ve also improved software encoders for H.264 and VC-1 content so that encoding using the CPU (when it makes sense) is both fast and power-efficient.

Maximizing battery life during audio playback

Another example of the media performance improvements we’ve made in Windows 8 is in maximizing battery life (or just reducing power consumption) during audio playback. In addition to enabling offload of the audio pipeline (similar to the offload of video described above), we’ve radically improved the audio playback pipeline to be more efficient during steady-state playback. By batching up large chunks of audio data and doing all the processing for that chunk at one time, the CPU can stay asleep for over 100 times longer (over 1 second vs. 10ms), which can result in dramatically increased battery life during audio playback.
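The arithmetic behind the batching claim can be sketched as follows. The 10ms and 1 second periods come from the paragraph above. The audio format (48 kHz, stereo, 16-bit PCM) is an assumption chosen only to make the buffer sizes concrete.

```python
def wakeups_per_second(buffer_ms: int) -> float:
    """How often the CPU must wake to refill a buffer of the given duration."""
    return 1000.0 / buffer_ms


def buffer_bytes(buffer_ms: int,
                 sample_rate_hz: int = 48_000,   # assumed format, not from the post
                 channels: int = 2,              # assumed stereo
                 bytes_per_sample: int = 2) -> int:  # assumed 16-bit PCM
    """Size of one buffer of uncompressed PCM audio, in bytes."""
    frames = sample_rate_hz * buffer_ms // 1000
    return frames * channels * bytes_per_sample


if __name__ == "__main__":
    for period_ms in (10, 1000):
        print(period_ms, "ms:",
              wakeups_per_second(period_ms), "wake-ups/s,",
              buffer_bytes(period_ms), "bytes per buffer")
```

Under these assumptions, moving from 10ms to 1 second buffers cuts wake-ups from 100 per second to 1, at the cost of holding about 100 times more audio data in memory at once.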

Of course, this approach isn’t perfect for all scenarios since the increased buffering introduces additional delay. In the communications section below, we’ll talk more about these tradeoffs and how the media stack adapts to optimize for each scenario.

Audio and video offloading are just a couple of examples of the ways we’ve optimized the media stack in Windows 8 to provide lower CPU utilization, lower memory utilization, and better battery life for Desktop and Metro style apps.

Supporting a rich set of media scenarios

Performance is a critical aspect of the platform, but it is only as important as the features that shine because of it. In Windows 8, those features include support for modern video formats, low-latency communication streams, and a seamless connection to external media devices.

Platform tradeoffs

One of the challenges in developing a single media platform that serves different scenarios is that the platform has competing goals. For example, communication scenarios require low latency, whereas the quality and performance of audio/video encoding and playback benefit from buffering, which results in higher latency. In the next several sections, we’ll touch on these challenges in the context of some of the scenarios we’ve worked to enable in Windows 8, including:

  • Communications (e.g. Skype, Lync, etc.)
  • Video playback and modern format support
  • Auto-orientation of video
  • Playback of premium content
  • Seamless audio transitions
  • Bringing the media experience to additional screens
  • Emerging media capabilities

Simplifying development and extensibility

One common theme across these experiences is the extensibility that we’ve incorporated into the multimedia platform. Because users have a wide range of use cases, media formats, codecs, protection mechanisms, and processing, we provided our developers with the ability to customize and tailor their offerings to create great apps and websites on Windows.

As we discuss some of the media scenarios in the next several sections, we'll also cover some of the work we’ve done to make those scenarios extensible by developers and third-party partners. Let’s dive deeper into the scenarios we’ve targeted for Windows 8.


Real-time communication on PCs, especially on mobile devices, has seen huge growth over the last decade. Windows users are using services like Skype and Lync to make several billion minutes of voice and video calls per day. TeleGeography estimates that international Skype-to-Skype calls (including video calls) grew 48 percent in 2011, to 145 billion minutes. We’ve made a significant investment in improving the experience of video and audio calling on all Windows 8 PCs. To achieve this goal, we focused our efforts in two areas:

  • Enable built-in low-latency media capture and rendering. Low latency is essential for communications apps, so Windows builds low-latency media capture and playback into the OS.
  • Support HD cameras to enhance the video communication experience. High-definition video makes your communication experience more real and enjoyable, so Windows supports HD camera devices.

Enabling low latency

When you communicate with another person, you expect near-instant responses. For this reason, communications systems generally try to minimize the end-to-end delay (also referred to as latency). In designing audio and video systems for playback, buffering is often used as both a protection against glitches caused by processing spikes or network traffic, and to reduce power consumption. However, this buffering introduces a delay into the audio and video, which is perceived as latency by the audience. In engineering Windows 8, we designed the media platform to support both playback-optimized and communication-optimized scenarios. The media infrastructure can switch between a playback mode (high buffering, more tolerant of varying conditions) and a communications-optimized mode (low delay).

According to the TIA/EIA 920 standard, the one-way audio latency that can be attributed to just the media processing pipeline cannot exceed 100ms in order to achieve a usable real-time communication experience. With this metric in mind, we designed a test environment to measure the end-to-end latency of the pipeline, shown in the following diagram:

Illustration of latency between sender and receiver. Includes camera latency on capture device; latency in capture pipeline including capture source, encoder, and network sink; playback pipeline latency including network source, decoder, video processor and renderer; and rendering device latency in the display or audio speaker.

There are many components to optimize to get low latency

In the case of video communication, the end-to-end or “glass-to-glass” pipeline latency is measured as the time it takes for a video frame to be captured by the camera device, encoded to a supported video format, streamed over the network loopback interfaces, decoded, and finally rendered by the display.

Looking at the figure below, you can see the result obtained for capturing and rendering PCM audio when the media pipeline is in low latency mode. The first set of spikes corresponds to the original spoken words at the transmitter and the second set shows those words at the receiver. The delay between the two is 65ms, well below the 100ms goal.

Graph showing 65 millisecond delay between sender and receiver of audio transmission

End-to-end pipeline latency of PCM audio: Low latency mode

The next chart shows a comparison of the pipeline latency of playback and communication-optimized mode when a video frame is captured, encoded (in H.264 format), streamed, decoded, and then displayed at various resolutions. The goal of 145ms overall latency (as deemed by TIA/EIA 920 for usable real-time video calling) is shown by the green line on the chart.

Comparison shown for VGA, SVGA, 720p and 1080p. In all cases, playback mode is over 500ms but in Low latency mode is close to 100ms, under the goal of 145ms.

Video frames are captured at a rate of 30 frames per second and encoded into H.264

In playback mode, the average latency of the pipeline is about 575ms. This delay is necessary for a smooth playback experience when consuming video, but unacceptable for real-time video communication. In low latency mode, on the other hand, the measured latency is well under the target goal at each of the measured video resolutions.
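The comparison against the latency goals can be written down as a simple check. The goals (100ms for audio, 145ms for video) and the measured values (65ms, about 575ms, and roughly 100ms) are the ones quoted in this section. The `Measurement` type is a hypothetical helper for illustration, not part of any Windows API.

```python
from dataclasses import dataclass

AUDIO_GOAL_MS = 100  # one-way audio pipeline goal cited in the text
VIDEO_GOAL_MS = 145  # overall video latency goal cited in the text


@dataclass(frozen=True)
class Measurement:
    """A measured pipeline latency paired with the goal it is judged against."""
    label: str
    latency_ms: float
    goal_ms: float

    @property
    def headroom_ms(self) -> float:
        """Positive when under the goal, negative when over it."""
        return self.goal_ms - self.latency_ms

    @property
    def meets_goal(self) -> bool:
        return self.latency_ms <= self.goal_ms


MEASUREMENTS = [
    Measurement("PCM audio, low latency mode", 65, AUDIO_GOAL_MS),
    Measurement("H.264 video, playback mode (average)", 575, VIDEO_GOAL_MS),
    # The chart description only says "close to 100ms", so 100 is approximate.
    Measurement("H.264 video, low latency mode (approx.)", 100, VIDEO_GOAL_MS),
]

if __name__ == "__main__":
    for m in MEASUREMENTS:
        verdict = "meets goal" if m.meets_goal else "misses goal"
        print(f"{m.label}: {m.latency_ms}ms, {verdict}, headroom {m.headroom_ms}ms")
```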

Supporting HD video calling

Another example of the work we have done to improve communication on Windows 8 PCs is OS support for HD cameras. New class drivers will work transparently with applications to provide support for HD video features. In addition, all of the hardware acceleration for video decoding discussed previously will be utilized for communication scenarios.

Windows 8 will offer a consistent, high-quality, hardware-accelerated, power efficient media communication experience on PCs designed for Windows 8. We have made significant investments in the media platform to improve pipeline latency, and with added support for H.264 cameras, users will be able to communicate with friends and family in high-fidelity HD video.

Video and audio support for Metro style apps

Our main goal for native media format support for Metro style apps was to ensure users and app developers could count on a consistently great playback experience across a wide variety of PC form factors, with modern formats used in mainstream scenarios such as:

  • HTML5-based entertainment on the web
  • Home movies captured using popular smartphones, point-and-shoot cameras, or AVC-HD cameras
  • Streaming music, movies, and TV shows from popular services

The tables below show the video and audio formats that have built-in support for Metro style apps. Formats recommended for use by Metro style apps are a reflection of deep partnerships with hardware manufacturers for predictable hardware acceleration across PC form factors and predictable end-to-end scenario performance beyond playback such as capture, streaming, and transcoding.

Media file and stream formats

Windows 8 has excellent support for MPEG-4, typically composed of H.264 video and AAC audio. Several popular codecs, including DivX and Xvid, implement the MPEG-4 Part 2 standard, so many of these files play great in Metro style apps. The same is true for modern MOV files, which are based on the MPEG-4 Part 12 standard, such as videos captured on iOS devices. Fragmented MPEG-4 and 2K/4K resolutions are now possible. We have previously talked about MPEG-2 and DVD playback, which is available in Windows 8 Media Center.

During the development of Windows 7 we talked quite a bit about codec support natively in Windows and the formats available through extensibility. Since then, the environment around codecs has consistently moved towards a smaller set of well-defined and broadly-supported formats, particularly H.264 for video. Due to factors such as intellectual property and hardware support, this makes a great deal of sense. Even browsers are making this transition with HTML5. But we also recognize that some individuals have preferred formats for a variety of reasons, and we wanted to make sure Windows 8 app developers could choose to use the formats they prefer. Formats popular among the enthusiast community or with specific developers, such as FLAC, MKV, and OGG, can have their own codecs packaged as part of a Metro style app, since the Windows 8 media platform is highly extensible.

Auto-orientation of video

With the proliferation of video recording in traditional cameras, smartphones, and tablets, users can capture video while holding their device in either portrait or landscape mode – there is no “right-side-up” any longer, thanks to modern touch-based interfaces. Many of us have experienced the frustration of recording a video and realizing the camera was sideways or upside down only after viewing it on the PC. Since the video scan pattern is fixed, videos may not be oriented properly when viewed.

To overcome this problem, cameras are beginning to author orientation metadata in mainstream file formats such as MP4 and ASF when saving recorded video to storage.

Image appears sideways without metadata support, but appears correctly with metadata support

To ensure a terrific viewing experience of personal videos from Windows PCs, we’ve made the following improvements to address this problem:

  • Orientation metadata is now supported in MP4 and ASF (VC-1, WMV) videos.
  • Videos with orientation metadata are auto-rotated during playback.
  • The thumbnail for a video with orientation metadata is auto-rotated.
  • Metro style apps with video capture capabilities can easily read and author orientation metadata.
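Conceptually, auto-rotation means reading a rotation value from the file and applying it before display. The sketch below works on a tiny frame represented as rows of pixels. It assumes the metadata is a clockwise rotation in multiples of 90 degrees; the function names are illustrative and are not Windows APIs.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of pixel values


def display_size(width: int, height: int, rotation_degrees: int) -> Tuple[int, int]:
    """Return (width, height) as shown on screen after applying the rotation."""
    if rotation_degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    # Quarter turns swap the dimensions; half and full turns do not.
    if rotation_degrees % 180 == 90:
        return height, width
    return width, height


def rotate_frame(frame: Frame, rotation_degrees: int) -> Frame:
    """Rotate a frame clockwise by a multiple of 90 degrees."""
    if rotation_degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    turns = (rotation_degrees // 90) % 4
    rotated = [list(row) for row in frame]
    for _ in range(turns):
        # One clockwise quarter turn: reverse the row order, then transpose.
        rotated = [list(row) for row in zip(*rotated[::-1])]
    return rotated
```

For example, a 1920x1080 clip tagged with a 90 degree rotation would be presented as 1080x1920, which is what lets a portrait recording appear upright.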

Premium content

Another area where we’ve invested heavily for Windows 8 is in allowing seamless playback of premium content. Although most of the video content consumed initially on the Internet was user generated, much of the growth in the Internet video space can now be attributed to “premium content,” which includes online movie purchases through on-demand streaming video, as well as the ad-supported TV offerings. According to IHS Screen Digest, 3.4 billion paid movies will be streamed online in the US in 2012—over double the number watched in 2011, and over a billion more movies than were consumed via DVD and Blu-Ray combined.

Premium video content has many of the same requirements as any other video content, but it also requires two substantial platform features in order to deliver the best experience: adaptive bitrate streaming and content protection.

Adaptive bitrate streaming

Adaptive bitrate streaming provides a smoother, more responsive video playback experience by enabling the PC to adapt to the most appropriate bitrate under varying networking and resource utilization conditions. As a result, startup and seek times can be significantly improved because the first few frames can be delivered at a lower bitrate to reduce buffering time and increase responsiveness. If network or device conditions change, the PC can negotiate a lower or higher bitrate to minimize buffering or increase video quality.
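A minimal sketch of the selection logic described above follows. It is not the algorithm used by any particular client: the bitrate ladder, the 0.8 safety factor, and the function name `choose_bitrate` are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def choose_bitrate(available_kbps: Sequence[int],
                   measured_kbps: Optional[float],
                   safety: float = 0.8) -> int:
    """Pick a stream bitrate from the available encodings.

    With no bandwidth measurement yet (startup or right after a seek), start
    at the lowest bitrate so the first frames arrive quickly. Otherwise pick
    the highest bitrate that fits within a safety margin of the measured
    bandwidth, falling back to the lowest one if nothing fits.
    """
    ladder = sorted(available_kbps)
    if not ladder:
        raise ValueError("at least one bitrate is required")
    if measured_kbps is None:
        return ladder[0]
    budget = measured_kbps * safety
    fitting = [rate for rate in ladder if rate <= budget]
    return fitting[-1] if fitting else ladder[0]


if __name__ == "__main__":
    ladder = [350, 700, 1500, 3000]  # hypothetical encodings, in kbps
    for bandwidth in (None, 200, 2000, 10000):
        print(bandwidth, "->", choose_bitrate(ladder, bandwidth))
```

Re-running the choice whenever the bandwidth estimate changes gives the "negotiate a lower or higher bitrate" behavior in its simplest form.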

Through the extensibility of the Media Foundation Platform in Windows 8, apps can have custom media sources and adaptive bitrate media sources to support new formats. Custom media sources and streaming protocols can also take advantage of hardware offload and content protection.

The Windows Azure Media Services team is using our extensibility model to build the Smooth Streaming Client SDK for Metro style apps. Smooth Streaming is Microsoft’s initiative to deliver high quality multi-bitrate content and enable Video-on-demand, Live, Linear TV, and Download-and-Play.

Content protection

Most premium Internet video content services choose to apply content protection, which is often a requirement from the content owners (e.g. movie studios or TV networks). To enable the playback of protected content in Metro style apps, Microsoft is making available the PlayReady Client SDK for premium content services. PlayReady supports download as well as streaming, and the above-mentioned IIS Smooth Streaming Client SDK integrates seamlessly with the PlayReady Client SDK to allow services to easily build protected streaming experiences.

We recognize that there are other content protection technologies being used today in the industry. Just like with adaptive streaming, the Media Foundation extensibility model allows for third parties to integrate their custom content protection systems with built-in hardware-accelerated video decoding. If a service needs to use a custom streaming format or content protection system, it can integrate its own technology without having to compromise on decoding quality or battery runtime.

In summary, Windows 8 will enable a wider offering of premium content services for customers to choose from and enjoy on their Windows 8 devices, providing a great streaming and downloaded experience as well as great battery life when watching premium HD video content.

Seamless audio transitions

As Windows 8 enables a multitude of media scenarios, we wanted to make sure that transitioning between these scenarios was as seamless and fluid as possible. Users often run into overlapping audio-based activities – for example, while listening to a music streaming service, they attempt to watch a video clip. We wanted to provide a clean, uncluttered audio experience that would make it easier and simpler for you to listen to the content you want, when you want it.

In Windows 8, instead of mixing all audio content and sending the resulting (often incoherent) stream to the speakers, Windows can pause a stream when a second stream is played and when it makes sense to do so. In most cases, Windows prioritizes audio coming from the app that is in the foreground. When you move the app to the background, the system quiets the stream. An example is a game app where you likely don’t want to listen to game audio when you’ve switched away from the game. However, there are cases where this is not the desired behavior – for example, if you’re listening to music in the background while checking email or surfing the web. To enable these scenarios and to allow you to hear background audio when it makes sense, we’ve introduced stream types that reflect the type of audio being played.

Below is a list of different stream types, along with an example of the type of content expected for each stream.

Audio category | Example streams | Background capable?
Background capable media | Local and streaming audio playlists |
Foreground only media | Movies, games |
 | Skype, Voice-over-IP, live chatting |
 | Alarms, ringing notifications |
Game media | Background music played by a game |
Game effects | Gun shots, explosions, characters talking, all non-music sounds |
Sound effects | Button confirmation sounds, beeps, dings |
 | Default audio type, and recommended for all audio media that does not need to continue playing in the background. |
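The foreground and background behavior described above can be modeled in a few lines. This is a deliberately simplified sketch covering only the two categories whose behavior is clear from their names. The enum and function are hypothetical and do not correspond to a Windows API.

```python
from enum import Enum


class AudioCategory(Enum):
    # Only the two categories whose background behavior the text makes explicit.
    BACKGROUND_CAPABLE_MEDIA = "Background capable media"
    FOREGROUND_ONLY_MEDIA = "Foreground only media"


def is_audible(category: AudioCategory, app_in_foreground: bool) -> bool:
    """Simplified policy: foreground apps are heard; background apps are heard
    only if their stream is declared background capable."""
    if app_in_foreground:
        return True
    return category is AudioCategory.BACKGROUND_CAPABLE_MEDIA


if __name__ == "__main__":
    # A game moved to the background goes quiet; a music playlist keeps playing.
    print(is_audible(AudioCategory.FOREGROUND_ONLY_MEDIA, app_in_foreground=False))
    print(is_audible(AudioCategory.BACKGROUND_CAPABLE_MEDIA, app_in_foreground=False))
```

The real system also decides when to pause one stream because another has started, which this model leaves out.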


Bringing the media experience to additional screens

In Windows 7, we announced Play To, which you can use to stream media files to supported external devices from Windows Explorer and Windows Media Player. In Windows 8, Play To makes it even easier and simpler to share personal media collections and HTML5 media with Play-To-enabled devices at home. Our focus for Play To was to create rich social experiences built around personal content – like sharing photos with family and friends, streaming music for a party, or watching user-generated videos from the Internet. The experience has been designed from the ground up to integrate tightly with HTML5 from existing websites and your personal media collections, whether they’re stored in the local library of a Windows PC or tablet, on another home PC or network-attached media server, or on a web server in the cloud.

Play To is now easier to discover and will deliver a consistent, high quality experience from a multitude of Metro style apps. A few of the improved user experiences include:

  • Improved setup: On home networks (or HomeGroup) where you’ve allowed sharing, Play To devices are automatically discovered and installed on your PC.
  • Improved device experience: Metro style apps work only with Windows certified Play To receivers. These devices are validated to support modern media formats, are DLNA standards-compliant, and have great performance (including the updated Xbox 360 available later this year). The desktop experience first introduced in Windows 7 has been added to the Explorer Ribbon and will continue to support all DLNA DMR devices.
  • Easier discovery: Play To is accessible from the Devices charm, making it easy to initiate from any app that supports Play To. Just swipe in from the right edge (or point your mouse to the top-right corner), select the Devices charm, and then select the device you want to stream to.
  • Integrated into Metro style IE: IE allows you to stream HTML5 music, video, and photos from the web to your devices.
  • Works with the new Music, Video, and Photo apps: Apps can stream photos from a variety of sources and personal music and video collections.

Video on a tablet PC with Devices pane, playing the same video on a second screen

Play To from the Videos app

We have also focused heavily on making it easy for developers to use Play To in their apps and websites – the functionality is available to all Metro style apps via the Play To contract. The Xbox 360 will support Play To in an update later this year.

Emerging media capabilities

Windows is enabling support for new content types for consumption and increased flexibility for content creation and communication. Stereo 3D, accessibility, and DSP effects are three examples of how we are enabling great multimedia experiences on Windows 8.

Experiencing stereo 3D video

Over the last few years, the Stereo 3D (S3D) market has evolved from hype to finished consumer products. S3D provides a 3D viewing experience by displaying two overlapping copies of a video (captured from different angles), which appear as a single 3D video when viewed with 3D glasses. Our goal is to enable a viable S3D ecosystem for Windows by enabling key gaming and video playback scenarios on a platform that abstracts away the specifics of the 3D technology from the end-user’s PC.

In Windows 8, S3D support is available on DirectX 10 or higher GPUs with compatible drivers. An S3D-compatible display is needed to see S3D content. We wanted to make sure that Windows would support a wide range of display technologies with a consistent user experience, and make it easy for software and hardware developers to build on our platform. As a result, the specifics of each S3D display technology are largely hidden by the graphics drivers, and a consistent set of APIs is available to apps using stereo 3D.

The Windows 8 media platform provides support for standards-compliant media formats for S3D video. H.264 video with frame-packing metadata represented as Supplemental Enhancement Information (SEI) is the typical format being adopted for online delivery, and is therefore the desirable S3D video format in Windows 8. The frame-packing formats that we support natively in the platform include both side-by-side and top-and-bottom arrangements, as in the illustration below.

2 images displayed side by side, and 2 images stacked vertically

Windows 8 supports a range of stereo 3D input formats, including side-by-side and top-bottom.
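To illustrate what frame packing means, the sketch below splits one packed frame into its two views. It assumes the left-eye view occupies the left half (side-by-side) or the top half (top-bottom). In real streams the metadata states which view is which, so treat that ordering, and the function name, as assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of pixel values


def split_stereo_frame(frame: Frame, packing: str) -> Tuple[Frame, Frame]:
    """Split a frame-packed stereo image into (left_view, right_view)."""
    height = len(frame)
    width = len(frame[0]) if frame else 0
    if packing == "side-by-side":
        if width % 2:
            raise ValueError("side-by-side frames need an even width")
        half = width // 2
        left = [row[:half] for row in frame]
        right = [row[half:] for row in frame]
    elif packing == "top-bottom":
        if height % 2:
            raise ValueError("top-bottom frames need an even height")
        half = height // 2
        left = [list(row) for row in frame[:half]]
        right = [list(row) for row in frame[half:]]
    else:
        raise ValueError(f"unsupported packing: {packing!r}")
    return left, right
```

Each view has half the resolution of the packed frame along one axis, which is the tradeoff these arrangements make to fit two views into an ordinary video stream.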

Delivering accessible media experiences in the web platform

Media accessibility is an important part of the Windows promise to our customers, especially for users with accessibility needs.

Subtitles provide interpretive or additional information to viewers who prefer a written transcript, those who need to see a translation in a different language, or those who need to see a transcript due to limited hearing ability.

Still video image with subtitles, video controls, and subtitle options: Off / English / German

Video playback in Windows 8 with subtitles

The web community has worked together through W3C to specify the best ways to deliver the subtitling experience through all modern web platforms. These include the following:

  • The <track> element can carry subtitles and closed captions for the HTML5 video tag. This feature is now incorporated into Windows 8. Subtitle support is now available through the video tag in IE10 and in apps using HTML.
  • User controls are available on the default media controls of the video tag.
  • There is native support for the WebVTT and SMPTE-TT formats that are commonly found in the web community and with partners in the TV and broadcasting industries.
  • The Windows 8 media platform provides support for multiple audio tracks within a media source. Users can switch audio tracks to their preferred language, and tracks can also be used for audio descriptions for sight-impaired users. Metro style apps can now easily switch between audio tracks or even play multiple audio tracks simultaneously, for instance, a normal audio track plus an audio description.

Standard video controls, plus language options: English, Hindi, Chinese, Polish

Video playback in Windows 8 with multiple audio tracks
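For readers who have not seen a subtitle file, here is a small sketch that generates a minimal WebVTT document, one of the two formats named above. A file like this is what a <track> element would point to. The helper names and the sample cue text are made up for illustration.

```python
from typing import Iterable, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, subtitle text)


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_webvtt(cues: Iterable[Cue]) -> str:
    """Build a minimal WebVTT file: the header line, then one block per cue."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_webvtt([(1.0, 4.0, "Hello"), (4.5, 7.25, "Welcome back")]))
```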

Adding effects to the media pipeline

The Windows 8 media platform has been designed to adapt easily. One way that we’ve done this is by allowing effects (often referred to as digital signal processing, or DSP) to be added to the pipeline. We’ve included several built-in effects, like image stabilization and horizontal flipping (which is useful for webcam preview), and we’ve also made it easy for applications to plug in to the Media Foundation pipeline with custom effects. In addition, we’ve made sure that media data can pass through the pipeline efficiently, thus minimizing the performance and power impact of adding DSPs.
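The idea of chaining effects can be sketched independently of Media Foundation. Below, a frame is a grid of grayscale values and each effect is a plain function. Horizontal flipping mirrors the built-in effect mentioned above, while `invert` stands in for a hypothetical custom effect. None of these names are Media Foundation APIs.

```python
from typing import Callable, List, Sequence

Frame = List[List[int]]          # rows of 8-bit grayscale pixel values
Effect = Callable[[Frame], Frame]


def horizontal_flip(frame: Frame) -> Frame:
    """Mirror each row, as you would for a webcam preview."""
    return [row[::-1] for row in frame]


def invert(frame: Frame) -> Frame:
    """Hypothetical custom effect: invert 8-bit pixel intensities."""
    return [[255 - pixel for pixel in row] for row in frame]


def run_pipeline(frame: Frame, effects: Sequence[Effect]) -> Frame:
    """Pass the frame through each effect in order and return the result."""
    for effect in effects:
        frame = effect(frame)
    return frame


if __name__ == "__main__":
    source = [[0, 10], [20, 30]]
    print(run_pipeline(source, [horizontal_flip, invert]))
```

Because every effect takes a frame and returns a frame, adding or removing one does not require changes to the rest of the chain, which is the property that makes a pipeline easy to extend.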


The Windows 8 media platform is designed to deliver a fluid and responsive media experience with great battery life. We’ve engineered Windows to give you a great user experience across a broad set of scenarios, including voice communication, audio and video playback, and streaming content. As media applications continue to evolve, the media platform in Windows will enable these experiences to shine across all Windows 8 PCs.

I’ll close now with a video that walks you through some of the highlights of the new media platform.

