Audio for games

Learn how to develop and incorporate music and sounds into your DirectX game, and how to process the audio signals to create dynamic and positional sounds.

For audio programming, we recommend using either the XAudio2 library in DirectX or the Windows Runtime audio graph APIs. We use XAudio2 here. XAudio2 is a low-level audio library that provides a signal processing and mixing foundation for games, and it supports a variety of formats.

You can also implement simple sound and music playback with Microsoft Media Foundation. Microsoft Media Foundation is designed for the playback of media files and streams, both audio and video, but can also be used in games, and is particularly useful for cinematic scenes or non-interactive components of your game.

Concepts at a glance

Here are a few audio programming concepts we use in this section.

  • Signals are the basic unit of sound programming, analogous to pixels in graphics. The digital signal processors (DSPs) that process them are like the pixel shaders of game audio. They can transform signals, or combine them, or filter them. By programming to the DSPs, you can alter your game's sound effects and music with as little or as much complexity as you need.
  • Voices are the submixed composites of two or more signals. There are three types of XAudio2 voice objects: source, submix, and mastering voices. Source voices operate on audio data provided by the client. Source and submix voices send their output to one or more submix or mastering voices. Submix and mastering voices mix the audio from all voices feeding them, and operate on the result. Mastering voices write audio data to an audio device.
  • Mixing is the process of combining several discrete voices, such as the sound effects and the background audio that are played back in a scene, into a single stream. Submixing is the process of combining several discrete signals, such as the component sounds of an engine noise, and creating a voice.
  • Audio formats. Music and sound effects can be stored in a variety of digital formats for your game. There are uncompressed formats, like WAV, and compressed formats, like MP3 and OGG. The more a sample is compressed, the worse its fidelity. Compression is typically expressed as a bit rate: the lower the bit rate, the lossier the compression. Fidelity can vary across compression schemes and bit rates, so experiment with them to find what works best for your game.
  • Sample rate and quality. Sounds can be sampled at different rates, and sounds sampled at a lower rate have poorer fidelity. The sample rate for CD quality is 44.1 kHz (44100 Hz). If you don't need high fidelity for a sound, you can choose a lower sample rate. Higher rates may be appropriate for professional audio applications, but you probably don't need them unless your game demands professional-fidelity sound.
  • Sound emitters (or sources). In XAudio2, sound emitters are locations that emit a sound, be it a mere blip of background noise or a snarling rock track played by an in-game jukebox. You specify emitters by world coordinates.
  • Sound listeners. A sound listener is often the player, or perhaps an AI entity in a more advanced game, that processes the sounds received from an emitter. You can submix that sound into the audio stream for playback to the player, or you can use it to take a specific in-game action, like awakening an AI guard marked as a listener.
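To make the mixing and submixing concepts above more concrete, here is a minimal, library-independent sketch in standard C++. It is not XAudio2 code: `MixInto` and `ClampMix` are hypothetical helper names, and the sketch assumes mono float samples in the nominal range [-1, 1]. In an actual XAudio2 game, submix and mastering voices do this work for you.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an XAudio2 API): adds `input`, scaled by `gain`,
// into `bus`. The bus must already be at least as long as the input.
void MixInto(std::vector<float>& bus, const std::vector<float>& input, float gain)
{
    assert(bus.size() >= input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
        bus[i] += input[i] * gain;
}

// Hypothetical helper: clamps the final mix to [-1, 1] so that loud sums
// do not exceed the nominal sample range.
void ClampMix(std::vector<float>& bus)
{
    for (float& sample : bus)
        sample = std::max(-1.0f, std::min(1.0f, sample));
}
```

Under these assumptions, submixing corresponds to calling `MixInto` once per component signal on a shared buffer (for example, the parts of an engine noise). Mixing corresponds to calling `MixInto` on a master buffer for that result and for the background music, followed by `ClampMix`.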

Design considerations

Audio is a tremendously important part of game design and development. Many gamers can recall a mediocre game elevated to legendary status just because of a memorable soundtrack, or great voice work and sound mixing, or overall stellar audio production. Music and sound define a game's personality, and establish the main motif that defines the game and makes it stand apart from other similar games. The effort you spend designing and developing your game's audio profile will be well worth it.

Positional 3D audio can add a level of immersion beyond that provided by 3D graphics. If you are developing a complex game that simulates a world, or which demands a cinematic style, consider using 3D positional audio techniques to really draw the player in.
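As a rough illustration of what positional audio involves, the sketch below derives a volume for an emitter from its distance to the listener, using simple inverse-distance attenuation. This is an assumption chosen for illustration, not the model that X3DAudio uses. `Vec3`, `Distance`, `DistanceGain`, and `referenceDistance` are hypothetical names. In an XAudio2 game, X3DAudio would normally supply the volume and pitch values for your voices.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical world-space position type.
struct Vec3 { float x, y, z; };

// Straight-line distance between two world positions.
float Distance(const Vec3& a, const Vec3& b)
{
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Inverse-distance attenuation: full volume (1.0) within referenceDistance
// of the listener, and referenceDistance / distance beyond it.
float DistanceGain(const Vec3& emitter, const Vec3& listener, float referenceDistance)
{
    assert(referenceDistance > 0.0f);
    const float d = Distance(emitter, listener);
    return d <= referenceDistance ? 1.0f : referenceDistance / d;
}
```

With a reference distance of 1, an emitter 5 units from the listener would play at roughly one fifth of its original volume. You could apply the returned gain to the emitter's samples before mixing them.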

DirectX audio development roadmap

XAudio2 conceptual resources

XAudio2 is the audio mixing library for DirectX, and is primarily intended for developing high-performance audio engines for games. For game developers who want to add sound effects and background music to their modern games, XAudio2 offers an audio graph and mixing engine with low latency and support for dynamic buffers, synchronous sample-accurate playback, and implicit source rate conversion.
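To give a sense of what source rate conversion means, the following sketch resamples mono float samples from one sample rate to another by linear interpolation. It is only an illustration under that assumption and does not represent how XAudio2 performs its conversion internally. `ResampleLinear` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: converts mono samples from sourceRate to targetRate
// (both in Hz) by linearly interpolating between neighboring input samples.
std::vector<float> ResampleLinear(const std::vector<float>& in,
                                  unsigned sourceRate, unsigned targetRate)
{
    assert(sourceRate > 0 && targetRate > 0);
    if (in.empty())
        return {};

    // The number of output samples scales with the ratio of the two rates.
    const std::size_t outCount = static_cast<std::size_t>(
        static_cast<unsigned long long>(in.size()) * targetRate / sourceRate);
    std::vector<float> out(outCount);

    // Distance, in input samples, between consecutive output samples.
    const double step = static_cast<double>(sourceRate) / targetRate;
    for (std::size_t i = 0; i < outCount; ++i)
    {
        const double pos = i * step;
        const std::size_t i0 = static_cast<std::size_t>(pos);
        // Reuse the last input sample when there is no next one.
        const std::size_t i1 = (i0 + 1 < in.size()) ? i0 + 1 : i0;
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        out[i] = in[i0] + (in[i1] - in[i0]) * frac;
    }
    return out;
}
```

With XAudio2, you would not normally write this yourself, because the engine converts source sample rates implicitly. The sketch only shows the kind of work that implies.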

Topic Description

Introduction to XAudio2

This topic provides a list of the audio programming features supported by XAudio2.

Getting Started with XAudio2

This topic provides information on key XAudio2 concepts, XAudio2 versions, and the RIFF audio format.

Common Audio Programming Concepts

This topic provides an overview of common audio concepts with which an audio developer should be familiar.

XAudio2 Voices

This topic contains an overview of XAudio2 voices, which are used to submix, operate on, and master audio data.

XAudio2 Callbacks

This topic covers the XAudio2 callbacks, which are used to prevent breaks in the audio playback.

XAudio2 Audio Graphs

This topic covers the XAudio2 audio processing graphs, which take a set of audio streams from the client as input, process them, and deliver the final result to an audio device.

XAudio2 Audio Effects

This topic covers XAudio2 audio effects, which take incoming audio data and perform some operation on the data (such as a reverb effect) before passing it on.

Streaming Audio Data with XAudio2

This topic covers audio streaming with XAudio2.

X3DAudio

This topic covers X3DAudio, an API used in conjunction with XAudio2 to create the illusion of a sound coming from a point in 3D space.

XAudio2 Programming Reference

This section contains the complete reference for the XAudio2 APIs.

XAudio2 "how to" resources

Topic Description

How to: Initialize XAudio2

Learn how to initialize XAudio2 for audio playback by creating an instance of the XAudio2 engine, and creating a mastering voice.

How to: Load Audio Data Files in XAudio2

Learn how to populate the structures required to play audio data in XAudio2.

How to: Play a Sound with XAudio2

Learn how to play previously loaded audio data in XAudio2.

How to: Use Submix Voices

Learn how to set groups of voices to send their output to the same submix voice.

How to: Use Source Voice Callbacks

Learn how to use XAudio2 source voice callbacks.

How to: Use Engine Callbacks

Learn how to use XAudio2 engine callbacks.

How to: Build a Basic Audio Processing Graph

Learn how to create an audio processing graph, constructed from a single mastering voice and a single source voice.

How to: Dynamically Add or Remove Voices From an Audio Graph

Learn how to add or remove submix voices from a graph that has been created following the steps in How to: Build a Basic Audio Processing Graph.

How to: Create an Effect Chain

Learn how to apply an effect chain to a voice to allow custom processing of the audio data for that voice.

How to: Create an XAPO

Learn how to implement IXAPO to create an XAudio2 audio processing object (XAPO).

How to: Add Run-time Parameter Support to an XAPO

Learn how to add run-time parameter support to an XAPO by implementing the IXAPOParameters interface.

How to: Use an XAPO in XAudio2

Learn how to use an effect implemented as an XAPO in an XAudio2 effect chain.

How to: Use XAPOFX in XAudio2

Learn how to use one of the effects included in XAPOFX in an XAudio2 effect chain.

How to: Stream a Sound from Disk

Learn how to stream audio data in XAudio2 by creating a separate thread to read an audio buffer, and to use callbacks to control that thread.

How to: Integrate X3DAudio with XAudio2

Learn how to use X3DAudio to provide the volume and pitch values for XAudio2 voices as well as the parameters for the XAudio2 built-in reverb effect.

How to: Group Audio Methods as an Operation Set

Learn how to use XAudio2 operation sets to make a group of method calls take effect at the same time.

Debugging Audio Glitches in XAudio2

Learn how to set the debug logging level for XAudio2.

Media Foundation resources

Media Foundation (MF) is a media platform for streaming audio and video playback. You can use the Media Foundation APIs to stream audio and video encoded and compressed with a variety of algorithms. It is not designed for real-time gameplay scenarios; instead, it provides powerful tools and broad codec support for more linear capture and presentation of audio and video components.

Topic Description

About Media Foundation

This section contains general information about the Media Foundation APIs, and the tools available to support them.

Media Foundation: Essential Concepts

This topic introduces some concepts that you will need to understand before writing a Media Foundation application.

Media Foundation Architecture

This section describes the general design of Microsoft Media Foundation, as well as the media primitives and processing pipeline it uses.

Audio/Video Capture

This topic describes how to use Microsoft Media Foundation to perform audio and video capture.

Audio/Video Playback

This topic describes how to implement audio/video playback in your app.

Supported Media Formats in Media Foundation

This topic lists the media formats that Microsoft Media Foundation supports natively. (Third parties can support additional formats by writing custom plug-ins.)

Encoding and File Authoring

This topic describes how to use Microsoft Media Foundation to perform audio and video encoding, and author media files.

Windows Media Codecs

This topic describes how to use the features of the Windows Media Audio and Video codecs to produce and consume compressed data streams.

Media Foundation Programming Reference

This section contains reference information for the Media Foundation APIs.

Media Foundation SDK Samples

This section lists sample apps that demonstrate how to use Media Foundation.

Windows Runtime XAML media types

If you are using DirectX-XAML interop, you can incorporate the Windows Runtime XAML media APIs into your UWP apps using DirectX with C++ for simpler game scenarios.

Topic Description

Windows.UI.Xaml.Controls.MediaElement

XAML element that represents an object that contains audio, video, or both.

Audio, video, and camera

Learn how to incorporate basic audio and video in your Universal Windows Platform (UWP) app.

MediaElement

Learn how to play a locally stored media file in your UWP app.

MediaElement

Learn how to stream a media file with low latency in your UWP app.

Media casting

Learn how to use the Play To contract to stream media from your UWP app to another device.

Reference