June 2016

Volume 31 Number 6

[Modern Apps]

Playing with Audio in the UWP

By Frank La Vigne

The Universal Windows Platform (UWP) has rich APIs for recording audio and video. However, the feature set doesn’t stop at recording. With just a few lines of code, developers can apply special effects to audio in real time. Effects such as reverb and echo are built into the API and are quite easy to implement. In this article, I’ll explore some of the basics of audio recording and applying special effects. I’ll create a UWP app that can record audio, save it, and apply various filters and special effects.

Setting up the Project to Record Audio

Recording audio requires the app to have permission to access the microphone, and that requires modifying the app’s manifest file. In Solution Explorer, double-click on the Package.appxmanifest file. It’s always located in the root of the project.

Once the app manifest file editor window opens, click on the Capabilities tab. In the Capabilities list box, check the Microphone capability. This will allow your app access to the end user’s microphone. Without this, your app will throw an exception when you try to access the microphone.
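Checking the box simply adds a device capability entry to the manifest XML. If you open Package.appxmanifest in a text editor, you should see something along these lines (the exact layout of the surrounding elements may vary by project template):

```xml
<Capabilities>
  <DeviceCapability Name="microphone" />
</Capabilities>
```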

Recording Audio

Before you start adding special effects to audio, you first want to be able to record audio. This is fairly straightforward. First, add a class to your project to encapsulate all the audio recording code. You’ll call this class AudioRecorder. It’ll have public methods to start and stop recording, as well as to play the audio clip you just recorded. To do this, you’ll need to add some members to your class. The first of these will be MediaCapture, which provides capabilities for capturing audio, video and images from a capture device, such as a microphone or webcam:

private MediaCapture _mediaCapture;

You’ll also want to add an InMemoryRandomAccessStream to capture the input from the microphone into memory:

private InMemoryRandomAccessStream _memoryBuffer;

In order to keep track of the state of your recording, you’ll add a publicly accessible Boolean property to your class:

public bool IsRecording { get; set; }

Recording the audio requires you to check whether you’re already recording; if you are, the code throws an exception. Otherwise, you’ll need to initialize your memory stream, delete the previous recording file and start recording.
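The Record method shown in Figure 1 calls two private helpers, Initialize and DeleteExistingFile, which aren’t listed in the article. Their bodies might look something like the following sketch; the implementation details here are assumptions based on the surrounding code:

```csharp
// Sketch only: reset the in-memory stream before each new recording.
private Task Initialize()
{
  _memoryBuffer?.Dispose();
  _memoryBuffer = new InMemoryRandomAccessStream();
  return Task.CompletedTask;
}

// Sketch only: delete the previously saved clip, if any, so the new
// recording replaces it. _fileName is set later by SaveAudioToFile.
private async Task DeleteExistingFile()
{
  if (string.IsNullOrEmpty(_fileName))
  {
    return; // Nothing has been saved yet.
  }
  StorageFolder storageFolder = Package.Current.InstalledLocation;
  IStorageItem existingFile = await storageFolder.TryGetItemAsync(_fileName);
  if (existingFile != null)
  {
    await existingFile.DeleteAsync();
  }
}
```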

Because the MediaCapture class provides multiple functions, you’ll have to specify that you want to capture audio. You’ll create an instance of MediaCaptureInitializationSettings to do just that. The code then creates an instance of a MediaCapture object and passes the MediaCaptureInitializationSettings to the InitializeAsync method, as shown in Figure 1.

Figure 1 Creating an Instance of a MediaCapture Object

public async void Record()
{
  if (IsRecording)
  {
    throw new InvalidOperationException("Recording already in progress!");
  }
  await Initialize();
  await DeleteExistingFile();
  MediaCaptureInitializationSettings settings =
    new MediaCaptureInitializationSettings
    {
      StreamingCaptureMode = StreamingCaptureMode.Audio
    };
  _mediaCapture = new MediaCapture();
  await _mediaCapture.InitializeAsync(settings);
  await _mediaCapture.StartRecordToStreamAsync(
    MediaEncodingProfile.CreateMp3(AudioEncodingQuality.Auto), _memoryBuffer);
  IsRecording = true;
}

Finally, you’ll tell the MediaCapture object to start recording, passing along parameters specifying that it should record in MP3 format and where to store the data.

Stopping the recording requires far fewer lines of code:

public async void StopRecording()
{
  await _mediaCapture.StopRecordAsync();
  IsRecording = false;
  SaveAudioToFile();
}

The StopRecording method does three things: it tells the MediaCapture object to stop recording, sets the recording state to false and saves the audio stream data to an MP3 file on disk.

Saving Audio Data to Disk

Once the captured audio data is in the InMemoryRandomAccessStream, you want to save the contents onto disk, as shown in Figure 2. Saving audio data from an in-memory stream requires you to copy the contents over to another stream and then push that data onto disk. Using the utilities in the Windows.ApplicationModel.Package namespace, you’re able to get the path to your app’s install directory. (During development, this will be in the \bin\x86\Debug directory of the project.) This is where you want the file to be saved. You could easily modify the code to save elsewhere or have the user pick where to save the file.

Figure 2 Saving Audio Data to Disk

private async void SaveAudioToFile()
{
  IRandomAccessStream audioStream = _memoryBuffer.CloneStream();
  StorageFolder storageFolder = Package.Current.InstalledLocation;
  StorageFile storageFile = await storageFolder.CreateFileAsync(
    DEFAULT_AUDIO_FILENAME, CreationCollisionOption.GenerateUniqueName);
  this._fileName = storageFile.Name;
  using (IRandomAccessStream fileStream =
    await storageFile.OpenAsync(FileAccessMode.ReadWrite))
  {
    await RandomAccessStream.CopyAndCloseAsync(
      audioStream.GetInputStreamAt(0), fileStream.GetOutputStreamAt(0));
    await audioStream.FlushAsync();
    audioStream.Dispose();
  }
}

Playing Audio

Now that you have your audio data inside an in-memory buffer and on disk, you have two choices to play from: memory and disk.

The code for playing the audio from memory is quite simple. You create a new instance of the MediaElement control, set its source to the in-memory buffer, pass it a MIME type and then call the Play method.

public void Play()
{
  MediaElement playbackMediaElement = new MediaElement();
  playbackMediaElement.SetSource(_memoryBuffer, "MP3");
  playbackMediaElement.Play();
}

Playing from disk requires a little extra code, as opening files is an asynchronous task. In order to have the UI thread communicate with a task running on another thread, you’ll need to use the CoreDispatcher. The CoreDispatcher sends messages between the thread a given piece of code is running on and the UI thread. With it, code can get the UI context from another thread. For an excellent description of CoreDispatcher, read David Crook’s blog post on the subject at bit.ly/1SbJ6up.

Aside from the extra steps to handle the asynchronous code, the method resembles the previous one that uses the in-memory buffer:

public async Task PlayFromDisk(CoreDispatcher dispatcher)
{
  await dispatcher.RunAsync(CoreDispatcherPriority.Normal, async () =>
  {
    MediaElement playbackMediaElement = new MediaElement();
    StorageFolder storageFolder = Package.Current.InstalledLocation;
    StorageFile storageFile = await storageFolder.GetFileAsync(this._fileName);
    IRandomAccessStream stream = await storageFile.OpenAsync(FileAccessMode.Read);
    playbackMediaElement.SetSource(stream, storageFile.FileType);
    playbackMediaElement.Play();
  });
}

Building the UI

With the AudioRecorder class complete, the only thing left to do is build out the interface for the app. The interface for this project is quite simple, as all you need is a button to record and a button to play back the recorded audio, as shown in Figure 3. Accordingly, the XAML is simple: a TextBlock and a StackPanel with two buttons:

<Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
  <Grid.RowDefinitions>
    <RowDefinition Height="43"/>
    <RowDefinition Height="*"/>
  </Grid.RowDefinitions>
  <TextBlock FontSize="24">Audio in UWP</TextBlock>
  <StackPanel HorizontalAlignment="Center" Grid.Row="1">
    <Button Name="btnRecord" Click="btnRecord_Click">Record</Button>
    <Button Name="btnPlay" Click="btnPlay_Click">Play</Button>
  </StackPanel>
</Grid>

Figure 3 AudioRecorder UI

In the code-behind class, you create a member variable of type AudioRecorder. This will be the object your app uses to record and play back audio:

AudioRecorder _audioRecorder;

You’ll instantiate the AudioRecorder class in the constructor of your app’s MainPage:

public MainPage()
{
  this.InitializeComponent();
  this._audioRecorder = new AudioRecorder();
}

The btnRecord button actually toggles the starting and stopping of audio recording. In order to keep the user informed of the current state of the AudioRecorder, the btnRecord_Click method changes the content of the btnRecord button, as well as starts and stops recording.
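The btnRecord_Click handler itself isn’t listed in the article, but based on that description it could be sketched like this (the button labels here are assumptions):

```csharp
// Sketch: toggle recording and keep the button label in sync with
// the AudioRecorder's state.
private void btnRecord_Click(object sender, RoutedEventArgs e)
{
  if (_audioRecorder.IsRecording)
  {
    _audioRecorder.StopRecording();
    btnRecord.Content = "Record";          // Back to the idle label.
  }
  else
  {
    _audioRecorder.Record();
    btnRecord.Content = "Stop Recording";  // Recording is in progress.
  }
}
```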

You have two options for the event handler for the btnPlay button: to play from the in-memory buffer or play from a file stored on disk.

To play from the in-memory buffer, the code is straightforward:

private void btnPlay_Click(object sender, RoutedEventArgs e)
{
  this._audioRecorder.Play();
}

As I mentioned previously, playing the file from disk happens asynchronously. This means that the task will run on a different thread than the UI thread. The OS scheduler will determine what thread the task will execute on at run time. Passing the Dispatcher object to the PlayFromDisk method allows the thread to get access to the UI context of the UI thread:

private async void btnPlay_Click(object sender, RoutedEventArgs e)
{
  await this._audioRecorder.PlayFromDisk(Dispatcher);
}

Applying Special Effects

Now that you have your app recording and playing back audio, the time has come to explore some of the lesser-known features in the UWP: real-time audio special effects. Included within the APIs in the Windows.Media.Audio namespace are a number of special effects that can add an additional touch to apps.

For this project, you’ll place all the special effects code into its own class. However, before you create the new class, you’ll make one last modification to the AudioRecorder class by adding the following method:

public async Task<StorageFile> GetStorageFile(CoreDispatcher dispatcher)
{
  StorageFolder storageFolder = Package.Current.InstalledLocation;
  StorageFile storageFile = await storageFolder.GetFileAsync(this._fileName);
  return storageFile;
}

The GetStorageFile method returns a StorageFile object for the saved audio file. This is how the special effects class will access the audio data.

Introducing the AudioGraph

The AudioGraph class is central to advanced audio scenarios in the UWP. An AudioGraph can route audio data from input nodes to output nodes through various mixing nodes. The full extent and power of the AudioGraph lies beyond the scope of this article, but it’s something I plan to dive more deeply into in future articles. For now, the important point is that every node in an audio graph can have multiple audio effects applied to it. For more information on AudioGraph, be sure to read the article on the Windows Dev Center at bit.ly/1VCIBfD.

First, you’ll want to add a class called AudioEffects to your project and add the following members:

private AudioGraph _audioGraph;
private AudioFileInputNode _fileInputNode;
private AudioDeviceOutputNode _deviceOutputNode;

In order to create an instance of the AudioGraph class, you need to create an AudioGraphSettings object, which contains the configuration settings for the AudioGraph. You then call the AudioGraph.CreateAsync method, passing in these configuration settings. The CreateAsync method returns a CreateAudioGraphResult object, which provides access to the created audio graph and a status value indicating whether the audio graph creation succeeded or failed.

You also need to create an output node to play the audio. To do so, call the CreateDeviceOutputNodeAsync method on the AudioGraph class and set the member variable to the DeviceOutputNode property of the CreateAudioDeviceOutputNodeResult. All the code to initialize the AudioGraph and the AudioDeviceOutputNode resides in the InitializeAudioGraph method here:

public async Task InitializeAudioGraph()
{
  AudioGraphSettings settings = new AudioGraphSettings(AudioRenderCategory.Media);
  CreateAudioGraphResult result = await AudioGraph.CreateAsync(settings);
  this._audioGraph = result.Graph;
  CreateAudioDeviceOutputNodeResult outputDeviceNodeResult =
    await this._audioGraph.CreateDeviceOutputNodeAsync();
  _deviceOutputNode = outputDeviceNodeResult.DeviceOutputNode;
}

Playing audio from an AudioGraph object is easy; simply call the Play method. Because the AudioGraph is a private member of your AudioEffects class, you’ll need to wrap a public method around it to make it accessible:

public void Play()
{
  this._audioGraph.Start();
}

Now that you have the output device node created on the AudioGraph, you need to create an input node from the audio file stored on disk. You’ll also need to add an outgoing connection to the FileInputNode. In this case, you want the outgoing node to be your audio output device. That’s exactly what you do in the LoadFileIntoGraph method:

public async Task LoadFileIntoGraph(StorageFile audioFile)
{
  CreateAudioFileInputNodeResult audioFileInputResult =
    await this._audioGraph.CreateFileInputNodeAsync(audioFile);
  _fileInputNode = audioFileInputResult.FileInputNode;
  _fileInputNode.AddOutgoingConnection(_deviceOutputNode);
  CreateAndAddEchoEffect();
}

You’ll also notice a reference to the CreateAndAddEchoEffect method, which I’ll discuss next.

Adding the Audio Effect

There are four built-in audio effects in the audio graph API: echo, reverb, equalizer and limiter. In this case, you want to add an echo to the recorded sound. Adding this effect is as easy as creating an EchoEffectDefinition object and setting the properties of the effect. Once created, you need to add the effect definition to a node. In this case, you want to add the effect to the _fileInputNode, which contains the audio data recorded and saved to disk:

private void CreateAndAddEchoEffect()
{
  EchoEffectDefinition echoEffectDefinition = new EchoEffectDefinition(this._audioGraph);
  echoEffectDefinition.Delay = 100.0f;
  echoEffectDefinition.WetDryMix = 0.7f;
  echoEffectDefinition.Feedback = 0.5f;
  _fileInputNode.EffectDefinitions.Add(echoEffectDefinition);
}

Putting It All Together

Now that you have the AudioEffect class completed, you can use it from the UI. First, you’ll add a button to your app’s main page:

<Button Content="Play with Special Effect" Click="btnSpecialEffectPlay_Click" />

And inside the click event handler, you get the file where the audio data is stored, create an instance of the AudioEffects class and pass the audio data file to it. Once that’s all done, all you need to do to play the sound is call the Play method:

private async void btnSpecialEffectPlay_Click(object sender, RoutedEventArgs e)
{
  var storageFile = await this._audioRecorder.GetStorageFile(Dispatcher);
  AudioEffects effects = new AudioEffects();
  await effects.InitializeAudioGraph();
  await effects.LoadFileIntoGraph(storageFile);
  effects.Play();
}

Run the app and click Record to record a short clip. To hear it as it was recorded, click the Play button. To hear the same audio with an echo added to it, click Play with Special Effect.

Wrapping Up

The UWP not only has rich support for capturing audio, but it also has some superb features to apply special effects to media in real time. Included with the platform are several effects that can be applied to audio. Among these are echo, reverb, equalizer and limiter. These effects can be applied individually or in any number of combinations. The only limit is your imagination.
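For example, a reverb could be stacked on the same file input node used earlier simply by adding a second effect definition to it. This is a sketch only, following the same pattern as CreateAndAddEchoEffect; the parameter values are illustrative, not tuned:

```csharp
// Sketch: add a reverb definition alongside the echo on the same node.
private void CreateAndAddReverbEffect()
{
  ReverbEffectDefinition reverbEffectDefinition =
    new ReverbEffectDefinition(this._audioGraph);
  reverbEffectDefinition.WetDryMix = 50;  // Percentage of processed signal in the mix
  reverbEffectDefinition.ReverbGain = 1;  // Output gain of the reverb
  reverbEffectDefinition.DecayTime = 2;   // Seconds for the reverb tail to fade
  _fileInputNode.EffectDefinitions.Add(reverbEffectDefinition);
}
```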


Frank La Vigne is a technology evangelist on the Microsoft Technology and Civic Engagement team, where he helps users leverage technology in order to create a better community. He blogs regularly at FranksWorld.com and has a YouTube channel called Frank’s World TV (youtube.com/FranksWorldTV).

Thanks to the following technical experts for reviewing this article: Drew Batchelor and Jose Luis Manners

