Improving Audio Glitch Resilience in Windows 7
Delivering excellent audio playback on a PC is one of those “much harder than it looks” technical challenges. Unlike dedicated audio / video devices, PCs have a lot going on during playback of audio and the playback happens on an incredible array of hardware and software. Many of you might be familiar with “glitches” that occasionally happen. In this post, Kristin Carr, a program manager on our Devices and Media team, describes some of the engineering in Windows 7 to improve this area representing the work of a number of folks across the team. One lesson I learned early in the product cycle is that we don’t say “glitch-free” but rather “glitch-resilient” and hopefully that will make sense as you read this. --Steven
Have you ever used your PC to play an MP3 or a DVD? If you answered yes, you’re among the overwhelming majority of PC customers who use their computer for audio and video applications, encompassing everything from watching a movie to playing a game to viewing a YouTube clip. But you may have also had an experience where your audio or video wasn’t quite perfect – perhaps the video was a bit choppy or the audio stuttered. We call this a ‘glitch’ – a perceived discontinuity in your audio or video that interrupts the playback experience. In this blog post, we’ll be focusing on audio glitching: we’ll examine the ecosystem challenges that can cause glitches, and we’ll discuss the work we’ve been doing to improve the Windows 7 experience.
What Causes Glitching?
In previous posts, we’ve touched on a variety of ecosystem initiatives and challenges that we’ve undertaken for Windows 7, including application compatibility, accessibility, and system performance, among others. Tracing the root cause of audio glitching leads us to a similar place: because Windows runs on a huge variety of hardware configurations and multitasks between dozens of applications, it is challenging to ensure that all of the programs and drivers running on your computer will work together in exactly the way you expect.
Audio is especially sensitive. In order for you to hear music from your speakers, data needs to be delivered to your audio hardware approximately every 10 milliseconds, or 30 times in the blink of an eye! The challenge is that your PC is usually doing a lot of other things at the same time you’re listening to music, such as streaming that YouTube video or downloading that new song, and many of these other tasks have complex timing requirements as well. As you can imagine, it doesn’t take much – a slow network driver or a graphics driver that requires plenty of CPU time – to prevent your audio from reaching your ears in a continuous fashion.
So what are we doing to address this challenge? The answer is ‘lots!’ – and the remainder of this blog post will be devoted to discussing these things:
- Gathering data in order to characterize the problem
- Developing a systematic method to detect and analyze glitches
- Getting these tests and tools widely deployed, both at Microsoft and by our Windows partners
- Engaging with partners to detect, diagnose and fix glitching issues
Who Experiences Glitching?
In studying this during the Windows 7 development cycle, our first order of business was to gather data. We‘d heard reports of audio glitching, but we didn’t know the exact scope of the problem. How often do users hear their audio glitching? Are there certain machines that were worse than others? With these questions in mind, we set out to understand our problem space a bit better.
We gathered data by using the telemetry infrastructure built into Windows, which allows our users to report back to Microsoft with performance data and other statistics that help us improve the OS. For each machine that opted to contribute data to Microsoft, we measured the number of times that the underlying audio hardware was being starved for data (i.e., when the user might hear a glitch). This data was grouped into “sessions,” each of which represents the data collected on a single machine for a single day or the data collected between machine reboots, whichever is shorter.
Let’s dive into some of the results. First, let’s look at the overall rate of audio glitching:
Figure 1: Distribution of Glitch Counts per Session
The chart above shows data from external (non-Microsoft) RC users. Approximately 80% of sessions showed no glitching at all, but 4.3% showed 10 or more glitches, which indicates that audio glitching affects a significant number of users.
Once we figured out how often glitching occurs, we started looking into why it occurs. First, we broke the data down by laptop/desktop form factor:
Figure 2: Glitching Likelihood by Form Factor
From this data, we noticed that laptops were almost twice as likely to experience audio glitching. As a result, we’ve made sure to address and target mobile PCs as well as mobile scenarios (for example, playing music while running on battery) for better coverage in our glitching tests and diagnostic tools.
Next, we looked at glitching likelihood by PC manufacturer:
Figure 3: Glitching Likelihood by PC Manufacturer (Mfr)
This data showed that certain manufacturers were more likely to be susceptible to audio glitching than others. As a result, we made sure to spread our testing efforts across a wide spectrum of machines and manufacturers. In addition, we are using this data to work with manufacturers to see if we can identify components or specific causes that would result in higher glitch incidents.
Finally, we looked at glitching on a wide variety of PC models:
Figure 4: Breakdown of All Glitch Sessions by PC Model
In the chart above, we examined all of the sessions that had at least one glitch, and we looked for any correlation with the PC make and model as shown in the table above (actual machine names have been anonymized). The first thing to notice is that Machine A is responsible for more than three times as much audio glitching as any of the other machines on the list. This data confirmed earlier reports of audio glitching on this particular machine, which we traced to a graphics card that shipped in a faulty configuration. As a result, we were able to work with the manufacturer to improve the configuration.
This chart also helps to show how widespread the issue is. There were hundreds of PC models that showed evidence of glitching – in fact, it seemed difficult to find a single PC model for which audio glitching did not ever occur. On the other hand, most individual machines didn’t show any problems at all. The conclusion that we drew was that audio glitching was not caused by any one hardware configuration, but was dependent on all the different hardware and driver permutations a user could possibly encounter on their machine. It was clear that no machine was immune, and in order to improve the experience, we were going to need a far-reaching, system-wide solution to this problem.
Developing Tools to Diagnose Glitching
Once we had data on when and why glitching occurs, the Windows Devices & Media Performance team developed a comprehensive suite of tests that were centered around media playback scenarios and were designed to assess how well a PC performed at that scenario. During media playback, these tests recorded thousands of statistics about the system’s performance, including CPU load, the activity of all components on the system and their corresponding interactions, and whether glitching occurred, among other things. We intentionally covered a huge range of scenarios and configurations, including laptops running on battery power, hardware under stress, hundreds of media content types, and many more. The goal was to exercise each PC in a wide variety of user scenarios in order to uncover and isolate audio glitches.
In addition, the Devices & Media Performance team created a graphical tool to highlight glitches as well as the CPU activities that occurred before and during an audio glitch, which allows us to quickly diagnose any glitching problems that we uncover. For example, in the figure shown below, we can see a visual representation of when glitches occurred, and we can display related measurements that occurred at the time of the glitching in order to easily pinpoint any suspicious behavior.
Figure 5: Example Graphical View of Audio Glitch Troubleshooting
In this case, you can see four audio glitches (shown by red vertical lines in the top panel). Two panels down, we have displayed calls to the CPU that took longer than 3ms (called long ISRs/DPCs). In this example, you can see a direct correlation between audio glitches and long ISRs and DPCs, which are procedure calls executed by the operating system that have the potential to hog the CPU and produce audio glitches. From here, we can track down the components responsible for these calls in order to reduce or eliminate the glitching. This figure shows additional information than what we used to diagnose the particular problem discussed above; however, this information and the many other measurements are available to diagnose other glitches and media performance issues from across a wide range of sources.
Putting the Tools to Work
Armed with these tests and tools, our next step was to deploy them on as many systems as possible. As part of this effort, we are participating in a Windows-wide initiative to help OEMs test their PCs at or before ship time. Hundreds of OEM machines get shipped to Microsoft for use in our Windows lab where we run thousands of tests in order to validate and ensure the best user experience. What this means is that if we notice that a particular machine or configuration might be susceptible to glitching, we can work with the OEM to try to fix the problem before the consumer ever sees their new PC.
By running these tests and analyzing the results with our new tools, we’ve been able to find hundreds of potential issues that would result in audio glitches. In some cases, this analysis resulted in changes to the Windows code. In other cases, we have identified components developed by our partners that can lead to audio glitching.
Engaging with Windows Partners
Since the issues we identify with these tools often involve components from many different partners, an important aspect of this work is engaging with these partners. Until now, it has been almost impossible for manufacturers to know how their components will affect the system as a whole, but by making these tests and tools available, we are attempting enable these partners to see how their components interact and what the final impact on users will be.
As part of this effort, we have been working to ensure that our partners can take full advantage of these new tools and tests. We’ve talked with OEMs, ODMs (original design manufacturers, who traditionally assemble the PC for the OEM), hardware manufacturers, and software vendors. We’ve given presentations and tutorials, written whitepapers, and held video conference workshops. Our goal has been to make it as easy as possible to create glitch-resilient software and hardware.
In summary, this effort includes:
- Sharing audio glitching telemetry data with our partners. Our partners have had very little concrete data on the prevalence of audio glitching. With the data we are now collecting, we can help them to diagnose problems and improve their products.
- Running our suite of audio and video performance tests on the hundreds of machines that OEMs send us and communicating the results to our partners. By assessing as many systems as possible and providing these results, we begin to tackle the causes of audio glitching.
- Providing the tools and support that enable our partners to understand how their components are interacting with everything else on a PC and enable them to more easily address the subtle issues that can result in audio glitching.
Ultimately, we and all of our Windows partners share a common customer (you!); by working with our partners, calling attention to these issues, and providing more insight into the root causes of audio glitching, we are continue to improve the audio experience for everyone.