Hardware accelerating everything: Windows 8 graphics
With Windows 8 we set out to enable all applications to have the beautiful and high-performance graphics enabled by modern graphics hardware. This work builds on the well-established foundations of DirectX graphics, which have been providing an increasing breadth of APIs and capabilities. In Windows 7, we expanded the capabilities of DirectX to provide a common hardware-accelerated graphics platform for a broader range of applications. Whereas previously DirectX mainly provided 3-D graphics, we added functionality for what we call “mainstream” graphics. Mainstream uses center on the typical desktop applications most people find themselves using every day, including web browsers, email, calendars, and productivity applications. Windows 7 added two new components to DirectX: Direct2D for two-dimensional graphics (shapes, bitmaps, etc.) and DirectWrite for handling text. Both of these additions focused not only on performance but also on delivering high-quality 2-D rendering. With these additions, DirectX became a hardware-accelerated graphics platform for all types of applications. Indeed, we showed what a typical application could achieve by using DirectX when Internet Explorer 9 brought hardware-accelerated graphics to the web. WinRT brings these capabilities to the full range of new Windows 8 applications. In this post, authored by Rob Copeland, the group program manager on our Graphics team, we look at the details behind the scenes in enabling this new class of graphical application. --Steven
In computer graphics, high performance is a guiding principle. In the early days of personal computing, discrete, add-on graphics cards were mostly focused on specialized applications such as CAD/CAM and gaming. Even early on, there was a view that all of this graphics horsepower could be used for more: notably a better user interface and experience. One of the first graphics cards for a PC was called a “Windows Accelerator” from S3 Graphics, which focused on the user experience by moving windows around the screen faster. As graphics hardware evolved, so, too, did the methods that developers use to interact with that hardware.
DirectX is the part of Windows that provides a common application programming interface, or API, that allows developers to use the graphics hardware in the PC to draw text, shapes, and three-dimensional scenes, and display them on the screen. DirectX has also evolved over time in both capabilities and performance characteristics. In the early years, DirectX was focused mainly on games. As applications evolved to provide richer and more graphically-intense user experiences, many of them started to use DirectX as a way to get better performance and richer visuals.
Enter Windows 8
When we started to plan the work we’d undertake for graphics in Windows 8, we knew that we would be creating a new, visually rich way for users to interact with apps and with Windows itself. We also knew that we’d be building a new platform for creating Metro style apps, and that we’d be targeting a more diverse set of hardware than ever before. While we had a great graphics platform to start with, there was more work to do in order to support those efforts. We came up with four main goals:
- Ensure that all Metro style experiences are rendered smoothly and quickly.
- Provide a hardware-accelerated platform for all Metro style apps.
- Add new capabilities to DirectX to enable stunning visual experiences.
- Support the widest diversity of graphics hardware ever.
While each of these focuses on a different aspect of building Windows 8, they all depend on great performance and capabilities from the graphics platform.
Planning for performance
Graphics performance on Windows depends on both the operating system and the hardware system, which consists of the CPU, the GPU (graphics processing unit), and the associated display driver. To ensure that we could deliver a great experience for new Metro style apps, we needed to make sure that both the software platform and the hardware system would deliver great performance.
In the past we’ve used many different benchmarks and apps to measure the performance of DirectX. These have been largely focused on 3D games. While games are still very important, we knew that many of these existing ways to measure graphics performance did not tell us everything we needed to know for graphics-intensive, 2D, mainstream apps.
So we created new scenario-focused tests and metrics to track our progress. The metrics we use are as follows:
1. Frame rate
We express frame rate in frames per second (FPS). This metric is widely reported for gaming benchmarks, and is equally important for video content and other apps. When something is animating on the screen, a rate of 60 FPS makes the animation appear smooth. We target that rate because most computer screens refresh at 60 hertz. With that frame rate, Windows can provide very smooth animations with “stick to your finger” touch interactions.
2. Glitch count
While frame rate is an important metric, it doesn't tell the whole story. For example, running a benchmark for 10 minutes and getting 60 FPS on average sounds perfect, but it doesn’t tell us how low the frame rate might have dropped during the test. If the frame rate dips to 10 FPS momentarily during demanding parts, the animations will stutter. The glitch count metric counts the total number of times that rendering a frame took more than 1/60 of a second, resulting in a reduced frame rate. It also looks at the number of consecutive frames missed. The goal here is to have no missed frames during animations.
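To make the difference between average frame rate and glitch count concrete, here is a minimal sketch in Python. It is not the test harness we use; the function name, the trace values, and the idea of feeding it a list of per-frame render times in milliseconds are all assumptions made for illustration.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.67 ms per frame at 60 Hz


def summarize_frames(frame_times_ms):
    """Return (average FPS, glitch count, longest run of back-to-back glitches).

    A "glitch" here is any frame whose render time exceeded 1/60 of a second.
    """
    total_ms = sum(frame_times_ms)
    average_fps = 1000.0 * len(frame_times_ms) / total_ms
    glitches = 0
    longest_run = 0
    current_run = 0
    for frame_ms in frame_times_ms:
        if frame_ms > FRAME_BUDGET_MS:
            glitches += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
        else:
            current_run = 0
    return average_fps, glitches, longest_run


# Hypothetical trace: 57 fast frames followed by a short stutter.
trace = [10.0] * 57 + [100.0, 100.0, 100.0]
fps, glitches, run = summarize_frames(trace)
print(f"average FPS: {fps:.1f}, glitches: {glitches}, longest run: {run}")
```

In this made-up trace the average stays above 60 FPS even though three frames in a row each took 100 milliseconds, which is exactly the kind of stutter that an average alone hides.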
3. Time to first frame
Most people expect their apps to launch quickly, so initializing DirectX needs to be fast. “Time to first frame” tells us how much time it takes from the moment you tap or click to launch an app until you see the first frame of the app on the screen. To measure this, we created simple apps to help analyze and optimize the graphics system for the time it takes to initialize a graphics device, allocate the required memory, and so on. This helps us ensure that the work to set up DirectX takes very little time.
4. Memory utilization
The more memory our graphics components use, the less memory is available for apps. By ensuring that most of the system’s memory is available for apps, you get the best app performance, and more apps can run at the same time. Apps use a mix of system memory and GPU memory. GPU memory is mostly used for rendering operations such as drawing images, geometric shapes, and text. Additionally, there are graphics operations that use the CPU and therefore use system memory.
In order to characterize memory utilization, we measure the memory used by the system for the following scenarios:
- The app is idle. That is, it is not doing any work and is not rendering or displaying new information to the screen.
- The app is displaying information to the screen. This represents the base memory cost of a simple drawing.
- Texture creation. This represents the memory used for creating a large number of image objects on the GPU.
- Vertex buffer creation. This represents the memory overhead of creating geometric shapes.
- GPU data upload. This measures memory overhead involved in uploading data to the GPU.
Measuring memory usage across many types of apps and these various scenarios has helped us further optimize DirectX and the display drivers.
5. CPU utilization
Most graphics operations utilize the CPU in addition to the GPU. For example, when an app is figuring out what it’s going to draw, it typically does these calculations on the CPU. CPU utilization is important to understand because the higher the percentage of the CPU used by a task, the fewer cycles the CPU can devote to other tasks. For good graphics performance and overall system responsiveness, it is important to effectively balance work between the CPU and the GPU.
These benchmarks and metrics help us ensure that the experiences and apps are smooth and have great performance. They play a big role in our understanding of mainstream apps. Of course, we still utilize industry benchmarks, games, and other ways to measure our overall performance.
Hardware accelerating mainstream graphics
There are many ways to look at mainstream graphics. To ensure that our work would give users the right performance and the right experiences, we studied many examples of both Metro style and desktop apps to understand how they used the graphics hardware. In particular, Internet Explorer 9, Windows Live Mail, and Windows Live Messenger make excellent use of DirectX. Because these apps have done great work utilizing DirectX, they're good examples of what other apps might do. This led to a number of investments to ensure mainstream apps were fast and looked great.
Improving text performance
Text is by far the most frequently used graphical element in Windows, so improving text rendering performance goes a long way towards creating a better experience. Web pages, email programs, instant messaging, and other reading apps all benefit from high-quality and high-performance text display.
The Metro style design language is typographically rich and a number of Metro style experiences are focused on providing an excellent reading experience. DirectWrite enables great typographic quality, super-fast processing of font data for rendering, and provides industry-leading global text support. We’ve continued to improve text performance in Windows 8 by optimizing our default text rendering in Metro style apps to deliver better performance and efficiency, while maintaining typographic quality and global text support.
The bar chart below illustrates the performance improvements that result from this work. It includes measurements for the following text scenarios:
- Rendering a screen full of reading-size text formatted as paragraphs as you would find in a web page or Word document
- Rendering a screen full of small chunks of text at reading sizes as you would find in user interface controls such as button labels or menus
- Rendering a screen full of small chunks of heading-sized text as you would see in titles & headings in Metro style apps and as headlines on blog posts and news articles on the web.
The most noticeable performance improvement can be seen when scrolling through a long document on a touch screen. The reduction in time required to render the characters frees up CPU cycles to handle other tasks like processing high-frequency touch input, or displaying more complex document layouts.
Improving geometry rendering performance
Along with text, we also made dramatic performance improvements for 2D geometry rendering. Geometry rendering is the core graphics technology that is used to create things like tables, charts, graphs, diagrams, and user interface elements, as shown in the example below. For Windows 8, our improvements in this area have primarily focused on delivering high-performance implementations of HTML5 Canvas and SVG technologies for use in Metro style apps, and webpages viewed with Internet Explorer 10.
The Weather app in Windows 8 uses geometry to display a graph of historical temperature and precipitation data
When Direct2D draws geometry, it takes instructions from the app about what to draw in the form of 2D figures (e.g. rectangles, ellipses, and paths), the size and location of the figures, and specifics about the style of rendering, including brush color and stroke style. Then it converts those instructions into a set of triangles and commands that it sends to Direct3D to generate the desired output. We call this conversion process tessellation.
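As a rough illustration of what tessellation means, the following Python sketch turns a filled ellipse into a fan of triangles and checks how much of the true area the triangles cover. It is only a sketch of the general idea, not how Direct2D's tessellator is implemented; the function names and the segment count are assumptions.

```python
import math


def tessellate_ellipse(cx, cy, rx, ry, segments=32):
    """Approximate a filled ellipse with a fan of triangles around its center."""
    rim = [
        (cx + rx * math.cos(2.0 * math.pi * i / segments),
         cy + ry * math.sin(2.0 * math.pi * i / segments))
        for i in range(segments)
    ]
    center = (cx, cy)
    # Each triangle joins the center to two neighboring points on the rim.
    return [(center, rim[i], rim[(i + 1) % segments]) for i in range(segments)]


def triangle_area(a, b, c):
    """Unsigned triangle area from the 2D cross product."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


triangles = tessellate_ellipse(0.0, 0.0, 40.0, 20.0, segments=64)
covered = sum(triangle_area(*t) for t in triangles)
exact = math.pi * 40.0 * 20.0
print(len(triangles), covered / exact)
```

More segments give a closer fit but more triangles to compute, which is why the CPU cost of this conversion step matters.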
To improve geometry rendering performance in Windows 8, we focused on reducing the CPU cost associated with tessellation in two ways.
First, we optimized our implementation of tessellation when rendering simple geometries like rectangles, lines, rounded rectangles, and ellipses. Below is a chart showing the impact of these improvements.
Second, to improve performance when rendering irregular geometry (e.g. geographical borders on a map), we use a new graphics hardware feature called Target Independent Rasterization, or TIR.
TIR enables Direct2D to spend fewer CPU cycles on tessellation, so it can give drawing instructions to the GPU more quickly and efficiently, without sacrificing visual quality. TIR is available in new GPU hardware designed for Windows 8 that supports DirectX 11.1.
Below is a chart showing the performance improvement for rendering anti-aliased geometry from a variety of SVG files on a DirectX 11.1 GPU supporting TIR:
We worked closely with our graphics hardware partners to design TIR. Dramatic improvements were made possible because of that partnership. DirectX 11.1 hardware is already on the market today and we’re working with our partners to make sure more TIR-capable products will be broadly available.
Improving image rendering performance
Images are widely used in a variety of scenarios including displaying user interfaces, webpages, and other app content. Websites commonly use JPEGs for pictures and PNG and GIF files to efficiently store user interface elements such as button graphics.
Working with digital photographs is also a very common activity on Windows. The number of digital photographs that Windows customers view and manipulate on their PCs continues to grow at an incredible rate.
We’ve made several performance improvements for working with images and photographs using the JPEG, GIF, and PNG formats.
For JPEG, improvements include:
- Faster image decoding by expanding SIMD usage on all CPU architectures
- Faster Huffman decoding and encoding
For PNG, improvements include:
- Faster image decoding by expanding SIMD usage on all CPU architectures
- Faster image encoding and decoding by optimizing our zlib implementation
In addition, we’ve improved pixel format conversion as well as image scaling. This results in faster decoding and rendering of images for all apps.
The video below uses a test app to measure the decoding and rendering time for a set of images. Windows 8 takes 40% less time than Windows 7 to render 64 images (4.38 seconds vs. 7.28 seconds).
Rendering and displaying
As we evolve DirectX to support more mainstream scenarios, another area we’ve invested in is optimizing how apps render and display their content. There are some big differences in how a 3D game draws its content and how a mainstream app such as Internet Explorer draws its content. For example, consider the video of the game below. In games like this, the entire scene changes rapidly. As the “camera” moves around the vehicle, the clouds move across the sky, and smoke billows up from the engine, the app must redraw the entire scene in each frame in order to achieve a life-like and engaging experience.
Now consider the webpage below. It has both a text article and a video. While the video plays, the browser must update the portion of the window containing the video but not the text. Additionally, if the user scrolls the page up, then we only need to render the new text at the bottom of the page. The rest of the text has already been rendered and simply needs to be moved.
To improve apps that don’t need to redraw the entire screen for each frame, we optimized how DirectX deals with redrawing just portions of the screen and how it scrolls. This work not only improves app efficiency and performance, but since it reduces redundant drawing and reduces the number of times graphics data needs to be copied in memory, it also reduces power consumption, thus increasing battery life.
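The scrolling case can be sketched in a few lines of Python. This is a simplified model, not the DirectX presentation API: the screen is a list of already rendered rows, and `scroll_up` and `render_row` are hypothetical names. The point is that existing rows are only moved, and only the newly exposed rows are rendered.

```python
def scroll_up(framebuffer, rows, render_row, first_visible_row):
    """Scroll content up by `rows`, rendering only the newly exposed rows.

    framebuffer: list of already rendered rows currently on screen.
    render_row: callable that renders one document row (the costly step).
    first_visible_row: document index of the top row before scrolling.
    Returns the new framebuffer and the number of rows that were rendered.
    """
    height = len(framebuffer)
    if not 0 <= rows <= height:
        raise ValueError("this sketch only handles scrolls of at most one screen")
    kept = framebuffer[rows:]  # already rendered, so these rows are only moved
    first_new = first_visible_row + height
    fresh = [render_row(i) for i in range(first_new, first_new + rows)]
    return kept + fresh, len(fresh)


render_calls = []


def render_row(index):
    """Stand-in for real drawing work; records which rows were rendered."""
    render_calls.append(index)
    return f"row {index}"


screen = [render_row(i) for i in range(10)]  # initial full draw of 10 rows
render_calls.clear()
screen, rendered = scroll_up(screen, 3, render_row, first_visible_row=0)
print(rendered, screen[0], screen[-1])
```

Scrolling by three rows in this model renders three rows instead of ten; the other seven are reused as they are.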
Making the entire platform great
All of these changes help Windows render experiences very quickly and smoothly. While we’ve talked mostly about features in DirectX, the great thing is that all of this work contributes to making our entire platform hardware-accelerated by default. Since we built the Metro style platform on top of DirectX, all apps take full advantage of the graphics hardware on the system, regardless of the programming language and framework the developer chooses.
Creating stunning visual experiences with Direct2D and Direct3D
Stylistic effects applied to images are becoming more common in modern user experiences. They can help highlight an area of an app, draw your attention to a specific part of the screen, or just make things look better. As we planned the graphics capabilities for Windows 8, we wanted to make it really easy for developers to apply these types of effects in their apps. We looked at two main areas where image processing would be useful:
- User interface images
The Metro style experience uses dynamic visuals. We wanted to enable Metro style apps to do image processing in real-time. This can range from 3D transition effects to perspective transforms, blurs, and highlights on user interface elements.
- Photographs
Apps that deal with photographs often want a rich set of image processing features. Effects such as adjusting exposure, brightness, and contrast, applying vibrancy and clarity, working with advanced curves, and applying lens corrections all allow these apps to enhance your digital memories.
To enable these types of experiences, we added “Direct2D Effects,” a new set of APIs that enable high-quality, hardware-accelerated effects to be applied to any image. Direct2D Effects have the following benefits:
- They provide optimal-quality renderings of image effects to suit the needs of a wide variety of apps.
- The effects are hardware-accelerated and work on a wide variety of graphics hardware.
- A simple API enables great effects with minimal programming.
- They provide many built-in effects.
- They support large image sizes and up to 32 bits per channel.
- Custom effects can be combined with built-in effects or other custom effects.
Direct2D Effects power some of the new user experiences in Windows 8. For example, when tapping on a tile on the Start screen, the tile uses the 3D perspective transform effect to “tilt” in the right direction. They also power the rest of our platform. For example, SVG filter effects and CSS 3D transforms are implemented using Direct2D Effects.
Direct3D 11.1 as a common foundation
While adding new features like Direct2D Effects is a great way to help developers deliver new experiences, we also looked at ways to make it easier to use existing DirectX features.
Over years of development, we've added various different features to DirectX. Hardware acceleration of video decoding came alongside programmable shaders in Direct3D 9. In Windows 7, we added Direct2D and built it on top of Direct3D 10. At that time, we also created DirectCompute, a new system for high-performance computation on the GPU that became part of Direct3D 11. One result of all these updates is that DirectX has a very comprehensive set of features around graphics and GPU computation, but as a side effect, it has also become increasingly difficult to create an app that uses video, 2D graphics, 3D graphics, text, and DirectCompute together.
In Windows 8, the new Direct3D 11.1 API is the foundation for hardware acceleration of 2D graphics and text, image processing, 3D graphics and computation, and video. The new API makes it much simpler to mix different types of content in a single scene because that single API now manages all of the GPU resources associated with rendering. This also reduces memory usage by eliminating the redundancy involved in creating multiple graphics device-management objects in app code. In addition, Direct3D 11.1 provides a uniform way for apps to access the various capabilities of different graphics hardware. It provides mechanisms for the app to determine what features are available and then use only those capabilities. This enables apps to make maximum use of the GPU’s capabilities, whether the GPU was designed for long battery life on a tablet, or high-end gaming on a desktop PC.
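The "determine what is available, then use only that" pattern might look like the following Python sketch. The tuples mirror Direct3D feature level names such as 9_3 and 11_1, but the functions, the sample GPUs, and the policy of choosing a geometry path from the feature level are hypothetical and are not part of the Direct3D API.

```python
def pick_feature_level(requested_levels, supported_levels):
    """Return the first requested level the hardware supports, or None.

    requested_levels is ordered from most preferred to least preferred.
    """
    for level in requested_levels:
        if level in supported_levels:
            return level
    return None


def choose_geometry_path(feature_level):
    """Hypothetical app policy: rely on TIR only at level 11.1 or higher."""
    if feature_level is not None and feature_level >= (11, 1):
        return "target-independent rasterization"
    return "CPU tessellation"


preferred = [(11, 1), (11, 0), (10, 1), (10, 0), (9, 3)]
tablet_gpu = {(9, 1), (9, 2), (9, 3)}                      # made-up low-power part
desktop_gpu = {(9, 3), (10, 0), (10, 1), (11, 0), (11, 1)}  # made-up high-end part

for name, gpu in (("tablet", tablet_gpu), ("desktop", desktop_gpu)):
    level = pick_feature_level(preferred, gpu)
    print(name, level, choose_geometry_path(level))
```

The same app code runs on both machines; only the selected level, and therefore the rendering path, differs.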
Diverse graphics hardware
Historically, the expectations for each successive release of Windows have been that both the graphics platform and the graphics hardware capabilities will become richer and higher in performance. This is still true, as the graphics hardware industry continues to develop faster, more powerful GPUs. But in Windows 7, we started to see an inflection point in these assumptions, as the diversity of the hardware broadened with the introduction of mobile, low-power devices.
With Windows 8, this trend towards diverse hardware types is continuing and accelerating, both with new, high-performance graphics cards, and with an increasingly wide range of low-power mobile devices. The diversity of the hardware for Windows 8 will span a broader range than ever before, from graphics hardware that consumes on the order of 1 watt in always-connected tablets all the way up to high-end systems with multiple graphics cards that use a total of 1,000 watts or more. This broadening diversity brings with it new design considerations.
Our goal remains to provide visually compelling, high-performance experiences. With highly mobile devices, the primary power source is a battery, so we also need to maximize battery life. To meet both the performance and power consumption requirements of these new form factors, many of our graphics hardware partners have employed new GPU architectures.
One of the graphics architectures commonly used in low-power system designs to achieve performance along with great battery life is called “tile-based rendering.” The general concept of a tile-based rendering approach is to have a very high performance (but small) memory cache that the graphics engine uses for rendering. The GPU then renders the screen in sections (or tiles) by repeatedly processing the same set of commands on each tile, rather than the whole screen at once. The intent is to minimize operations that use memory off-chip, therefore keeping power consumption low and performance high. Repeatedly accessing memory off-chip is expensive both in terms of time and power consumption.
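The tile-based approach can be modeled with a toy software renderer in Python. This is a conceptual sketch only, with hypothetical function names: the "commands" are filled rectangles, the small per-tile list stands in for the on-chip cache, and the write counters stand in for off-chip memory traffic. Real GPUs are far more sophisticated.

```python
def render_tiled(width, height, tile_size, commands):
    """Render tile by tile, replaying every command on each tile.

    commands: list of (x0, y0, x1, y1, color) filled rectangles, half-open.
    Returns the framebuffer and the number of pixel writes to off-chip memory.
    """
    framebuffer = [[0] * width for _ in range(height)]
    offchip_writes = 0
    for ty in range(0, height, tile_size):
        for tx in range(0, width, tile_size):
            tw = min(tile_size, width - tx)
            th = min(tile_size, height - ty)
            tile = [[0] * tw for _ in range(th)]  # small, fast on-chip cache
            for x0, y0, x1, y1, color in commands:
                # Clip each rectangle to the current tile before drawing.
                for y in range(max(y0, ty), min(y1, ty + th)):
                    for x in range(max(x0, tx), min(x1, tx + tw)):
                        tile[y - ty][x - tx] = color
            # The finished tile goes to off-chip memory exactly once.
            for row in range(th):
                framebuffer[ty + row][tx:tx + tw] = tile[row]
            offchip_writes += tw * th
    return framebuffer, offchip_writes


def render_immediate(width, height, commands):
    """Render the whole screen at once; every drawn pixel goes off-chip."""
    framebuffer = [[0] * width for _ in range(height)]
    offchip_writes = 0
    for x0, y0, x1, y1, color in commands:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                framebuffer[y][x] = color
                offchip_writes += 1
    return framebuffer, offchip_writes


# Three overlapping layers on a made-up 16x16 screen with 8x8 tiles.
commands = [(0, 0, 16, 16, 1), (2, 2, 14, 14, 2), (4, 4, 12, 12, 3)]
tiled_fb, tiled_writes = render_tiled(16, 16, 8, commands)
immediate_fb, immediate_writes = render_immediate(16, 16, commands)
print(tiled_fb == immediate_fb, tiled_writes, immediate_writes)
```

Both paths produce the same image, but in this model the tiled path touches off-chip memory once per pixel no matter how many layers overlap, at the cost of replaying the command list for every tile.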
To increase the efficiency of these tile-based architectures, we added a number of flags, hints, and new APIs that can minimize the number of times the tiles are rendered. We have incorporated the use of these into the Metro style app development platform to ensure greater efficiency in apps running on graphics hardware that uses a tile-based rendering architecture.
Another way for graphics hardware to reduce power consumption while still achieving great performance is to perform graphics rendering calculations using fewer bits of precision. This allows the GPU to more efficiently structure its data so that it can process more data simultaneously, thus reducing the power needed. For Windows 8, we added new mechanisms for apps to specify the amount of precision needed in their graphical calculations. For example, when doing custom blending of multiple images where the image data is 8 bits per component, the blending computations could be done with 10 bits of precision rather than the default of 32 bits. The reduced precision doesn’t impact image quality, but does reduce power consumption.
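To see why narrower math need not change the picture, consider an even 50/50 blend of two 8-bit components. The Python sketch below compares a floating-point version with an integer version whose intermediate value never needs more than 10 bits. It models the idea only; it is an assumption-laden illustration, not how a GPU or the DirectX precision mechanisms work, and other blend weights may need different handling.

```python
import math


def blend_float(a, b):
    """50/50 blend of two 8-bit components using floating-point math."""
    return math.floor((a + b) / 2.0 + 0.5)  # round half up


def blend_narrow(a, b):
    """Same blend using integer math whose intermediate fits in 10 bits."""
    total = a + b + 1  # at most 255 + 255 + 1 = 511
    assert total < (1 << 10), "intermediate must fit in 10 bits"
    return total >> 1


# Exhaustively compare all 65,536 input pairs.
mismatches = sum(
    1
    for a in range(256)
    for b in range(256)
    if blend_float(a, b) != blend_narrow(a, b)
)
print("mismatching pairs:", mismatches)
```

For this particular operation the two versions agree on every possible input, so the wider arithmetic buys nothing for 8-bit image data.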
Great performance, smoothly rendered
As you can see, we’ve done a lot of work to enable a very fast and smoothly animated user experience in Windows 8. From new ways to measure our progress, to optimizations for mainstream uses of our graphics platform, and new hardware features, we’ve created the best Windows graphics platform yet. And of course, we continue to push the envelope on immersive, three-dimensional gaming, with great performance and new features such as stereoscopic 3D.
From high-end gaming rigs to light-weight, always-connected tablets, Windows 8 supports the broadest range of graphics hardware ever in a single operating system. We hope this post has helped explain some ways in which this work enables a whole new set of rich experiences.
- Rob Copeland
P.S. Thanks to Sriram Subramanian, Dan McLachlan, Kam VedBrat, Steve Lim, and Jianye Lu, for their substantial contributions to this blog post.