May 2013

Volume 28 Number 05

# DirectX Factor - Exploring Filters in XAudio2

By Charles Petzold | May 2013

In the pantheon of notable waveforms, the simple sine curve reigns supreme. Just by looking at it, you can see its quintessential smoothly undulating nature—slowing down as it reaches its peaks, almost stopping as it crests, and then progressively picking up speed, reaching maximum velocity as it crosses the horizontal axis to start another slowdown.

This visual impression is confirmed by a deeper mathematical analysis. The instantaneous velocity of the sine curve at any point is the slope of the tangent line to the curve. Graph those velocities, and you get another sine curve, offset from the original by a quarter cycle. Do it again using this second curve, and that’s a sine curve showing acceleration, offset from the original by half a cycle, as shown in Figure 1.

Figure 1 A Sine Curve, Its Velocity (in Violet) and Acceleration (in Aqua)

In calculus terms, the sine curve is the negative of its own second derivative. From basic physics, we know that force is proportional to acceleration, which means that in any physical process where the restoring force is proportional to displacement in the opposite direction, motion is described by sine curves. Springs have this characteristic: The more you stretch them, the greater the force in the opposite direction. But many other substances found in nature have an intrinsic springiness as well, including the compression and rarefaction of air.

Consider the plucking of a taut string, the tapping of a stretched animal skin, the vibration of air within a pipe. These processes all involve springy objects that vibrate with the characteristic motion of a sine curve. More commonly, this sine curve is supplemented by additional sine curves whose frequencies are integral multiples of the fundamental frequency. The sine curves in this assemblage are known as harmonics.

By itself, a single sine curve is audibly quite boring. But put a few of them together in a harmonic relationship and the sound gets much more interesting. In real life, very often the frequencies of these harmonics are not exact integer multiples of the fundamental frequency, and are more correctly referred to as overtones. It’s this combination of overtones—including how they change over time—that defines a musical instrument’s characteristic sound, or timbre.

A little fragment of a particular waveform can be graphed as a function of time, as shown on the top in Figure 2. If this waveform repeats every 4 ms, it has a frequency of 250 Hz, which is close to middle C on the piano.

Figure 2 A Waveform in Time Domain (Top) and Frequency Domain (Bottom)

With some Fourier analysis, we can separate this waveform into its constituent sine curves, and represent it in a somewhat different manner, as shown on the bottom in Figure 2. This graph shows the relative amplitudes of these constituent sine curves arranged by frequency. In signal-processing lingo, Figure 2 shows the equivalence of the time-domain representation of a waveform on the top and the frequency-domain representation on the bottom.

In real life, a representation of a sound in its frequency domain would encompass the entire audio spectrum from 20 Hz to 20,000 Hz, and constantly change over the course of time.

## Filter Basics

The frequency-domain representation allows us to think of sound as a collection of sine waves of various frequencies, and that’s often useful in understanding audio processing.

A very common type of audio processing involves amplifying or attenuating certain ranges of frequencies in the audio spectrum, thus altering the harmonic composition of the sound. This is a tool known as a filter. In analog signal processing, filters are circuits; in digital signal processing, they’re algorithms.

The most common filter types are called low-pass, high-pass and band-pass; the terms refer to the frequencies the filters let through. The low-pass filter emphasizes lower frequencies by attenuating higher frequencies. Similarly, the high-pass filter attenuates lower frequencies. Both the low-pass and high-pass filters are defined by a particular cutoff frequency that indicates where the attenuation begins. The band-pass filter doesn’t have a cutoff frequency, but a center frequency serves a similar purpose. Frequencies outside of a range around that center frequency are attenuated.

Most filters can’t simply block all sine waves above or below a particular frequency. Instead, a sine wave with a particular frequency is attenuated based on its distance from the cutoff or center frequency with a roll-off effect. The slope of this roll-off is governed by a property of the filter known as Q, which stands for quality. A filter with a higher Q has a steeper roll-off.

The Q factor is easiest to interpret with a band-pass filter. Figure 3 shows the effect of a band-pass filter applied to a range of frequencies. The center frequency is marked at f0, and two other frequencies are marked f1 and f2 where the band-pass filter attenuates the amplitude to 70.7 percent of the f0 amplitude.

Figure 3 Bandwidth in a Band-Pass Filter

Why 70.7 percent? The power of a waveform is calculated as the square of the waveform’s amplitude, and f1 and f2 indicate the frequencies where the waveform has been attenuated to half its original power. Because power is amplitude squared, the amplitude at those points is the square root of 1/2, or 0.707.

Q is calculated by dividing the center frequency by the difference between the two half-power frequencies:

Q = f0 / (f2 – f1)

However, the difference between f2 and f0 is not the same as the difference between f0 and f1. Instead, the ratios are the same: f2 / f0 equals f0 / f1, which means f0 is the geometric mean of f1 and f2. If f2 is double f1, the bandwidth is an octave, and it’s fairly easy to calculate that Q equals the square root of 2, or approximately 1.414.

The ratio between f2 and f1 is known as the filter’s bandwidth, and is often specified in octaves. For a bandwidth B in octaves, you can calculate Q like so:

Q = 2^(B/2) / (2^B – 1)

As the bandwidth decreases, Q increases and the roll-off is steeper.

I said that f1 and f2 are the frequencies at which the filter attenuates the signal to half the power it has at f0. Half power is also expressed as -3 decibels. The decibel is a logarithmic scale roughly approximating the human perception of loudness. The difference in decibels between two power levels P1 and P0 is:

dB = 10·log10(P1/P0)

If P1 is half P0, the base-10 logarithm of 0.5 is -0.301, and 10 times that is approximately -3. When dealing with amplitudes, decibels are calculated as:

dB = 20·log10(A1/A0)

The base-10 logarithm of 0.707 is -0.15, and 20 times that is also -3.

Every doubling of the amplitude corresponds to an increase of about 6 decibels, which is why the 16-bit samples of audio CDs are sometimes said to have a dynamic range of 96 decibels.

## Applying a Filter

If you’re using XAudio2 in a Windows 8 program to generate sounds or modify the sounds of existing music files, applying a filter to those sounds is as simple as initializing three fields of an XAUDIO2_FILTER_PARAMETERS structure and calling a method named SetFilterParameters.

As you’ve seen in recent installments of this column, a program creates one or more instances of IXAudio2SourceVoice to define the waveforms themselves, and a single instance of IXAudio2MasteringVoice to effectively combine all the source voices into a single audio stream. Later in this article you’ll also see how to create instances of IXAudio2SubmixVoice to control the processing and mixing of sounds on their way to the mastering voice. Both source voices and submix voices support the SetFilterParameters method, but only if the voices are created with the XAUDIO2_VOICE_USEFILTER flag.

A call to SetFilterParameters requires a pointer to an XAUDIO2_FILTER_PARAMETERS structure, which has three fields:

Type: set to one of the members of the XAUDIO2_FILTER_TYPE enumeration, which includes members for low-pass, high-pass and band-pass filters, as well as a notch (or band-reject) filter, and single-pole low-pass and high-pass filters, which (as you’ll see shortly) are simpler filters.

Frequency: set to 2·sin(π·f0/fs), where f0 is the cutoff frequency and fs is the sampling frequency. XAudio2 requires that f0 be no greater than fs/6, which means the value set to this field is no greater than 1.

OneOverQ: 1 divided by the desired Q factor, greater than zero and no greater than 1.5. Thus, Q cannot be less than 2/3, which corresponds to a bandwidth of 2 octaves.

I haven’t yet shown you graphs, similar to Figure 3, that illustrate how the low-pass and high-pass filters attenuate frequencies. Sometimes such graphs simply show a roll-off effect and thus can be dangerously deceptive if the actual filter doesn’t quite work like that. Such is the case with XAudio2 filters. For the low-pass, high-pass, band-pass, and notch filters, XAudio2 implements a type of digital filter known as a biquad, which involves a fairly simple algorithm but does not create a simple roll-off effect for low-pass and high-pass filters. (If you’re interested in the algorithm, follow the links in the Wikipedia article on “Digital biquad filter” at bit.ly/Yoeeq1.)

Biquad filters tend to resonate at the center frequency of a band-pass filter, and near the cutoff frequencies of the low-pass and high-pass filters. This means that the filter not only attenuates some frequencies, but amplifies others. To use these filters intelligently, you must be aware of this effect. Fortunately, this amplification is fairly easy to predict. For the band-pass filter, the amplitude of a sine wave at the center frequency is increased by a factor equal to Q. For the low-pass and high-pass filters, the maximum amplification near the cutoff frequency is equal to Q for higher values of Q, but somewhat greater than Q for lower values.

Figure 4 shows the effects of all the XAudio2 filter types set for a frequency of 261.6 Hz (middle C) and a Q of 1.414. The horizontal axis is logarithmic with a range of 3 octaves above and below middle C. The vertical axis shows the resultant amplitude for sine curves at those frequencies. The horizontal black line at an amplitude of 1 is for no filter. All the other lines are identified with different colors.

Figure 4 The Effect of Filters for a Q of 1.414

For example, the low-pass filter not only lets through frequencies below the cutoff frequency, but amplifies them, and this amplification increases as you get closer to the cutoff frequency. The high-pass filter has the opposite effect.

Figure 5 is similar to Figure 4 but for a Q of 4.318, which is associated with a bandwidth of 1/3 octave. Notice that the vertical axis is different to accommodate the increased amplification.

Figure 5 The Effect of Filters for a Q of 4.318

If you want to use a low-pass or high-pass filter that won’t amplify at all, stick to the one-pole filters. These are very simple filters governed solely by a cutoff frequency; they don’t use the Q setting. They function much like the simple bass and treble controls on a car stereo. But if you want to use the more sophisticated filters, your program must compensate for any amplification by the filter.

If you’d rather implement your own filters, you can do that as well by creating an XAudio2 Audio Processing Object (XAPO), which is a class that gets access to an audio stream and can implement effects.

## Watching the Volume

To allow me (and you) to experiment with filters, I created a Windows 8 project named AudioFilterDemo that’s included in the downloadable code for this article. Figure 6 shows it running.

Figure 6 The AudioFilterDemo Program

The three oscillators toward the top are all independently controllable, with a frequency range encompassing 3 octaves on either side of middle C. Each frequency slider uses a logarithmic scale, adjustable in 10 divisions between adjacent notes, an increment known as 10 cents.

The filter has a frequency slider as well as a slider for Q. All the frequency sliders have tooltips that identify the note and its frequency. Figure 7 shows the method that sets the filter on the three waveform source voices whenever there’s a change in the controls.

Figure 7 AudioFilterDemo Setting XAudio2 Filter Parameters

```cpp
void MainPage::SetFilterParameters()
{
    if (pSawtoothOscillator != nullptr)
    {
        XAUDIO2_FILTER_PARAMETERS filterParameters;

        if (currentFilterType != -1)
        {
            double cutoffFrequency =
                440 * pow(2, (filterFrequencySlider->Value - 69) / 12);

            filterParameters.Type = XAUDIO2_FILTER_TYPE(currentFilterType);
            filterParameters.Frequency =
                float(2 * sin(3.14 * cutoffFrequency / 44100));
            filterParameters.OneOverQ = float(1 / filterQSlider->Value);
        }
        else
        {
            // Documentation:
            // "acoustically equivalent to the filter being fully bypassed"
            filterParameters.Type = LowPassFilter;
            filterParameters.Frequency = 1.0f;
            filterParameters.OneOverQ = 1.0f;
        }

        pSawtoothOscillator->GetVoice()->SetFilterParameters(
            &filterParameters, XAUDIO2_COMMIT_ALL);
        pSquareWaveOscillator->GetVoice()->SetFilterParameters(
            &filterParameters, XAUDIO2_COMMIT_ALL);
        pSineWaveOscillator->GetVoice()->SetFilterParameters(
            &filterParameters, XAUDIO2_COMMIT_ALL);
    }
}
```

The bottom panel is a volume meter scaled in decibels. This allows you to see the resultant volume for a particular waveform and filter settings. The program makes no adjustments to volume other than through user settings. If this meter goes into the red, it means that the program is generating a sound that’s too loud, and that waveform is being clipped before going into the sound system on your computer.

The volume meter is based on a predefined effects class. Figure 8 shows the code I used to create an instance of that effect. The program then sets a timer and calls GetEffectParameters to obtain the peak levels of the output sound since the last time GetEffectParameters was called.

Figure 8 Creating a Volume Meter Effect

```cpp
// Create volume meter effect
IUnknown * pVolumeMeterAPO;
XAudio2CreateVolumeMeter(&pVolumeMeterAPO);

// Reference the effect with two structures
XAUDIO2_EFFECT_DESCRIPTOR effectDescriptor;
effectDescriptor.pEffect = pVolumeMeterAPO;
effectDescriptor.InitialState = true;
effectDescriptor.OutputChannels = 2;

XAUDIO2_EFFECT_CHAIN effectChain;
effectChain.EffectCount = 1;
effectChain.pEffectDescriptors = &effectDescriptor;

// Set the effect on the mastering voice
pMasteringVoice->SetEffectChain(&effectChain);

// Release the local reference to the effect
pVolumeMeterAPO->Release();
```

One interesting exercise in this program is to play either a square wave or sawtooth wave through a band-pass filter with a Q of at least 4 or so. As you change the filter frequency, you can hear the individual overtones of the waveform. A square wave has only odd harmonics, but a sawtooth wave has both odd and even harmonics.

## The Graphic Equalizer

Time was, every well-equipped home audio setup included a graphic equalizer containing a row of vertical slide potentiometers controlling a bank of band-pass filters. In a graphic equalizer, each band-pass filter covers two-thirds or one-third of an octave in the total audio spectrum. Among professional sound engineers, graphic equalizers are used to adjust for the acoustic response of a room by boosting or cutting various frequencies. Home users often arrange the sliders in a “smile” pattern as imitated in Figure 9, boosting both the low and high ends and leaving the middle range softer so as to not inordinately interfere with conversation.

Figure 9 The GraphicEqualizer Program

The GraphicEqualizer program allows you to load an MP3 or WMA file from your Windows 8 Music Library and play it through a 1/3-octave graphic equalizer. The program contains 26 vertical sliders, each of which is associated with a band-pass filter. As you can see, each slider is labeled with the center frequency for that band. In theory, each center frequency should be higher than the previous one by a factor of the cube root of 2 (about 1.26), but lots of rounding is employed to keep the numbers sensible. Based on a photograph of a 1/3-octave graphic equalizer I found on Wikipedia, I labeled the 26 sliders starting with 20 Hz, 25, 31.5, 40 and up through 6.3 kHz, stopping short of the 7,350 Hz limit for a 44,100 Hz sampling rate.

Most graphic equalizers have a separate band of potentiometers for left and right channels, but I decided to forgo that amenity.

You’ve seen how a single filter can be applied to a particular IXAudio2SourceVoice instance, but the GraphicEqualizer program needs to apply 26 filters to a source voice. This is accomplished by creating 26 instances of IXAudio2SubmixVoice corresponding to these filters (plus a couple more), and creating what’s called in XAudio2 an “audio processing graph,” as shown in Figure 10. Each box is an instance of one of the three interfaces that derive from IXAudio2Voice, and the box identifies the variable name used in the GraphicEqualizer program.

Figure 10 The Audio Processing Graph Used in the GraphicEqualizer Program

An IXAudio2SubmixVoice instance can’t generate sound on its own. That privilege is reserved for source voices. But it can get input from a source voice (or another submix voice); apply a volume, filter or effect; and pass the result on to one or more submix voices, or to the mastering voice.

At the top of Figure 10 is the source voice that generates the music from a music file. At the bottom is the mastering voice that sends the result out to the computer’s sound hardware. In between are all the submix voices.

It’s a push model: Whenever you create a source voice or submix voice, you can indicate the destination voice (or voices) you want to receive the output of that voice. Later on, you can change the destination of that output with a call to SetOutputVoices. If you specify NULL in either case, the output goes to the mastering voice.

You indicate where you want the output of a voice to go with a pointer to an XAUDIO2_VOICE_SENDS structure, which contains two fields:

• SendCount, of type unsigned integer
• pSends, which is a pointer to zero or more XAUDIO2_SEND_DESCRIPTOR structures

SendCount indicates the number of XAUDIO2_SEND_DESCRIPTOR structures pointed to by pSends. It can be zero to indicate that the voice’s output doesn’t go anywhere. The XAUDIO2_SEND_DESCRIPTOR structure also has two fields:

• Flags, which can be 0 or XAUDIO2_SEND_USEFILTER
• pOutputVoice, a pointer to the destination IXAudio2Voice
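Putting those structures together, routing a voice’s output looks roughly like this. (This is a sketch, not code from the program; it assumes an existing source voice pSourceVoice and submix voice pEqualizerInput, and omits error handling.)

```cpp
// Route the source voice's output to a single submix voice,
// enabling the per-send filter along the way
XAUDIO2_SEND_DESCRIPTOR sendDescriptor;
sendDescriptor.Flags = XAUDIO2_SEND_USEFILTER;
sendDescriptor.pOutputVoice = pEqualizerInput;

XAUDIO2_VOICE_SENDS voiceSends;
voiceSends.SendCount = 1;
voiceSends.pSends = &sendDescriptor;

pSourceVoice->SetOutputVoices(&voiceSends);
```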

The two IXAudio2SubmixVoice instances that feed into the bank of 26 submix voices, and the one that consolidates the output from those 26 voices, aren’t strictly needed to build the graphic equalizer, but they simplify the structure of the program. Whenever the program creates a new source voice—which happens whenever the user loads in a new music file—it just needs to direct the source voice’s output to the pSourceVoiceOutput instance.

The program also has a CheckBox button to bypass the equalizer. To disconnect the equalizer from the audio processing graph, all that’s necessary is to call SetOutputVoices on pSourceVoiceOutput with a NULL pointer, indicating that the output should go directly to the mastering voice. Restoring the equalizer involves a few lines of code to redirect the output from pSourceVoiceOutput to pEqualizerInput.

There are a couple of ways to define the filters that comprise a graphic equalizer. One approach is for each slider to change the Q of that filter—in effect making the filter more restrictive as you increase the slider. But I decided to keep the Q factor of each filter constant at a 1/3-octave bandwidth, or 4.318, and use the slider to vary the volume of that submix voice. Graphic equalizers usually allow switching the bank of sliders between a ±6 dB range and a ±12 dB range, but I decided on a ±24 dB range for more extreme effects.

When an equalizer slider is in its center position, the corresponding submix voice has a default volume of 1. Normally that would mean that a sound just passes through the submix voice unaltered. However, applying a filter with a Q of 4.318 in a submix voice causes the amplitude to increase by a factor of 4.318 at the center frequency. To compensate for that, the program sets the volume of the submix voice pEqualizerOutput to 1 divided by Q.

With all the sliders set in their center positions, clicking the CheckBox to switch the equalizer in and out of the audio graph causes no change in volume. The sound does change a little—undoubtedly resulting from the way the various band-pass filters overlap—but the overall volume does not.

The equalizer sliders have their Minimum properties set to -24 and Maximum set to 24, corresponding to the gain in decibels. When the slider value changes, the volume for the corresponding submix voice is set in the ValueChanged event handler, like so:

```cpp
Slider^ slider = dynamic_cast<Slider^>(sender);
int bandIndex = (int)slider->Tag;
float amplitude = float(pow(10, args->NewValue / 20));
pEqualizerBands[bandIndex]->SetVolume(amplitude, 0);
```

That amplitude calculation is an inverse of the decibel calculation shown earlier. The resultant amplitude ranges from about 0.06 (at -24 dB) to about 16 (at 24 dB). If you keep in mind that each change of 6 dB is a halving or doubling of the center amplitude of 1, these ranges make sense. But if you crank up all the sliders to their maximum settings, the overall amplitude increases by a factor of 16, and the result is likely to be clipped and distorted.

In other words, the program implicitly assumes that the user maintains a balanced approach to life, and will reduce some sliders while increasing others.

Charles Petzold is a longtime contributor to MSDN Magazine and the author of “Programming Windows, 6th edition” (O’Reilly Media, 2012), a book about writing applications for Windows 8. His Web site is charlespetzold.com.

Thanks to the following technical expert for reviewing this article: Richard Fricks (Microsoft)