Unlocking the power of HTML5

By Giorgio Sardo | Sr. Technical Evangelist – HTML5 and Internet Explorer

Sounds form the background of our life. Today the HTML5 <audio> element enables Web developers to embed sounds in their applications. The flexibility of the control coupled with the integration with the rest of the platform allows several scenarios, from simple sound effects to background audio to gaming experiences to more sophisticated audio engines.

This blog post walks through some of the best practices for using the <audio> tag in your Web applications, and includes useful tips from real-world sites.

Adding an audio element to your page

The very first step is to add the audio element to your page. You can do this by declaring an <audio> tag in your markup, by instantiating a new audio element in the JavaScript code, or by embedding the audio stream in the page:

The first approach allows you to initialize the audio components during the page load. The second approach gives you more flexibility and better management of the network flow, as it defers the loading of the audio clip to a specific time during the application lifecycle. The third approach (less recommended) consists of embedding the audio files as data URIs in the page, reducing the number of requests to the server.
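As a minimal sketch of these three options, the markup form is shown below as a comment and the two script forms as small functions. The file name sample.mp3, the element id, and the helper names createClip and createEmbeddedClip are placeholders chosen for this illustration; they are not part of any library.

```javascript
// Approach 1: declare the element in markup (shown as a comment for reference).
//   <audio id="clip1" src="sample.mp3" controls></audio>

// Approach 2: create the element from script. `doc` is the page's `document`;
// it is passed in so that nothing is created until the function is called.
function createClip(src, doc) {
  const audio = doc.createElement("audio");
  audio.src = src; // the browser can start fetching once src is set
  return audio;
}

// Approach 3: embed the clip as a data URI. `base64Data` is assumed to be
// the Base64-encoded content of an mp3 file.
function createEmbeddedClip(base64Data, doc) {
  const audio = doc.createElement("audio");
  audio.src = "data:audio/mpeg;base64," + base64Data;
  return audio;
}
```

In a page, you would call createClip("sample.mp3", document) at the point in the application lifecycle where the sound is first needed, and then call play() on the returned element.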

Note that you can play an audio element generated by JavaScript even if it has not actually been added to the DOM tree (like in the code snippet above). However, adding the audio element to the page will allow you to display the default control bar.

Although not covered in this post, you can support more than one audio file format. Also, if you are hosting the audio files on your server, remember to register the MIME type for mp3 files (“audio/mpeg”) on the server side. Here, for instance, is the setting on Internet Information Services (IIS).

Preloading the audio before playing

Once you have your audio element, you can choose the best preloading strategy. The HTML5 <audio> specification describes a preload property with three possible values:

  • “none”: hints to the user agent that either the author does not expect the user to need the media resource, or that the server wants to minimize unnecessary traffic. If your scenario is a podcast blog with an audio file for each post, this option works particularly well, as it reduces the initial preload bandwidth. Once the user plays the file (either through the default visual controls or the JavaScript methods load() or play()), the browser will start fetching the audio stream.
  • “metadata”: hints to the user agent that the author does not expect the user to need the media resource, but that fetching the resource metadata (dimensions, duration, etc.) is reasonable. This option is recommended if you are building an audio player control and you need basic information about the audio clip, but don’t need to play it yet.
  • “auto”: hints to the user agent that the user agent can put the user's needs first without risk to the server, up to and including optimistically downloading the entire resource. If you are building a game, this approach is probably the best one, as it allows you to preload all the audio clips before actually starting the game experience.

Note that when you set the src property of the audio element programmatically, the browser will set the preload property – unless otherwise specified – to “auto.” For this reason, if your scenario needs a different value, make sure to specify it in a line of code before setting the src.
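The ordering described above can be sketched as follows. This is only an illustration: the helper name createClipWithPreload and the file name in the usage note are hypothetical.

```javascript
// Create an audio element with an explicit preload hint.
// The hint is assigned BEFORE src, so the browser sees it
// before it decides how much of the clip to fetch.
function createClipWithPreload(src, preload, doc) {
  const audio = doc.createElement("audio");
  audio.preload = preload; // "none", "metadata" or "auto"
  audio.src = src;         // set the source last
  return audio;
}
```

For the podcast scenario mentioned earlier, a call such as createClipWithPreload("post1.mp3", "none", document) would avoid fetching the clip until the user asks for it.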

You can preview the impact over the network of these three options by running this page using the F12 Developer Tools (Network Tab). For debugging purposes, you can simulate new calls and disable the local cache by checking the “Always refresh from server” option.




While this property is great for the initialization phase, you might also need to know when the browser has actually downloaded enough of the audio clip to be ready to play it. You can get this information by listening to the “canplaythrough” event; the user agent fires this event once it estimates that, if playback were to start now, the media resource could be rendered at the current playback rate all the way to its end without having to stop for further buffering.
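A minimal sketch of listening for this event is shown below. The helper name whenReady is hypothetical; the readyState check covers the case where the clip is already buffered before the listener is attached.

```javascript
// Call onReady(audio) once the browser estimates that the clip
// can play through to the end without stopping to buffer.
function whenReady(audio, onReady) {
  // readyState 4 is HAVE_ENOUGH_DATA: the clip is already playable.
  if (audio.readyState >= 4) {
    onReady(audio);
    return;
  }
  const handler = () => {
    // The event can fire more than once; react only the first time.
    audio.removeEventListener("canplaythrough", handler);
    onReady(audio);
  };
  audio.addEventListener("canplaythrough", handler);
}
```

A typical use would be whenReady(clip, (c) => c.play()), so that playback starts only when it is unlikely to stall.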


Another frequent request for scenarios with audio is the ability to loop a sound clip. With the HTML5 <audio> element, you can do this using the “loop” property; this setting will loop your clip forever, or until the user or the application pauses it (for example, by calling pause()).

Another approach to looping an audio file is to programmatically call the play() method when the audio clip ends; doing so also lets you control the delay between one loop and the next.

Note that any play() call executed on the audio element before the sound has actually ended won’t have any effect. If you want to “cancel and restart” the current sound, you will need to reset the currentTime.
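The looping options and the restart trick can be sketched as three small helpers. The function names and the optional schedule parameter are hypothetical and exist only for this illustration.

```javascript
// Option 1: let the browser loop the clip until it is paused.
function loopForever(audio) {
  audio.loop = true;
}

// Option 2: replay the clip manually when it ends, after delayMs.
// Leave audio.loop off here: with loop on, "ended" is not fired.
// `schedule` defaults to setTimeout and can be swapped for testing.
function loopWithDelay(audio, delayMs, schedule = setTimeout) {
  audio.addEventListener("ended", () => {
    schedule(() => audio.play(), delayMs);
  });
}

// Cancel the current playback and start again from the beginning.
function restart(audio) {
  audio.currentTime = 0; // rewind first; play() alone would be ignored mid-clip
  audio.play();
}
```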

Multiple audio tags

If your scenario needs the same audio file to be played several times concurrently (that is, with overlapping sounds), you can achieve this result by creating multiple audio tags pointing to the same file. Obviously the same approach also works if you are using different audio files at the same time. As we explained earlier in this post, you can add those either programmatically or by declaring them in the markup.

The following code snippet shows how to load and play multiple audio files using markup. The audio samples all have the same length; when they reach the end, they loop from the beginning. As you play them in Internet Explorer 9, you will notice that they stay automatically synchronized across the various loops. You will also notice that the combination of these 5 sounds plays like the audio file used in the previous demo (“sample.mp3”).

While this approach is very simple and straightforward, in most scenarios developers prefer to create the audio clips programmatically. The following code snippet shows how to add 3 audio clips dynamically using code. As you play them together, you will get the C Major chord!
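One way to sketch this in script is shown below. The helper names createClips and playAll, the createAudio factory parameter, and the file names in the usage note are assumptions made for the example.

```javascript
// Create one audio element per source file.
// `createAudio` is a factory for new audio elements; in a page it
// would typically be () => new Audio().
function createClips(sources, createAudio) {
  return sources.map((src) => {
    const audio = createAudio();
    audio.preload = "auto"; // fetch the whole clip up front
    audio.src = src;
    return audio;
  });
}

// Start every clip in the list, one after the other.
function playAll(clips) {
  clips.forEach((clip) => clip.play());
}
```

With three hypothetical files holding the notes C, E and G, the chord would be played with playAll(createClips(["c.mp3", "e.mp3", "g.mp3"], () => new Audio())).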

This code pattern works in any browser that supports the <audio> element and will allow you to build very compelling scenarios!

It’s important to keep in mind that as your application or game becomes more complex you might eventually reach two limits: the number of audio elements you can preload on the same page and the number of audio elements you can play at the same time.

These numbers depend on the browser and the capabilities of your PC. In my experience, Internet Explorer 9 can handle dozens of concurrent audio elements with no issues. Other browsers don’t do as well: you might encounter noticeable delays and distortions as you play multiple files in a loop.

Synchronization strategies

Depending on the characteristics of the network, you should always consider the delay involved between adding the tag, getting the content and being ready to play it. In particular, when you are handling multiple files, each file might be ready to play earlier or later. Here, for instance, is a capture with the 3 files used previously, loaded from the local host.

As you can see in the Timings column, different files might be ready at different times.

A very common synchronization strategy is to preload all the files first. Once they are all ready, you can quickly iterate through a loop and start playing them.
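This strategy can be sketched with a simple counter. The helper name playWhenAllReady and the onProgress callback are hypothetical; the sketch assumes the listeners are attached before the clips finish loading.

```javascript
// Wait until every clip has fired "canplaythrough", then start them
// all in one quick loop. onProgress(readyCount, total) is optional.
function playWhenAllReady(clips, onProgress) {
  let readyCount = 0;
  clips.forEach((clip) => {
    const onReady = () => {
      // Count each clip only once, even if the event fires again.
      clip.removeEventListener("canplaythrough", onReady);
      readyCount += 1;
      if (onProgress) onProgress(readyCount, clips.length);
      if (readyCount === clips.length) {
        clips.forEach((c) => c.play());
      }
    };
    clip.addEventListener("canplaythrough", onReady);
  });
}
```

The onProgress callback is a natural place to update a loading indicator while the clips are being fetched.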

Let’s bring everything together now! The following demo simulates a piano playing Frère Jacques (also known as Brother John, Brother Peter…or Fra Martino). The page starts fetching all the notes, showing the progress as they get preloaded on the client. Once they are all ready, the song starts and keeps playing in a loop.

Run live demo

Audio in real world sites

Now that we’ve seen the common patterns to handle multiple audio files, I’d like to highlight a few Web sites as examples of best practice uses of the tag.

Pirates Love Daisies: www.pirateslovedaisies.com

In another blog post I talked about Pirates Love Daisies, an awesome HTML5 game built by Grant Skinner. In addition to great game play and compelling visual effects, Grant’s team also developed a sophisticated audio library that plays several audio samples throughout the game. The main logic is encapsulated within the AudioManager class. As suggested earlier, before actually starting the game the site preloads all the audio clips and displays the cumulative progress in the initial loading screen. The site also handles network timeouts and errors that occur while downloading an audio file.
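The snippet below is not the game's AudioManager code, which is not reproduced in this post. It is only a rough sketch of how a preload step could account for errors and timeouts, assuming the clip's src is already set; the helper name preloadWithTimeout and the status strings are invented for this example.

```javascript
// Report how the preload of one clip ended: "ready", "error" or "timeout".
// onDone(status, clip) is called exactly once.
function preloadWithTimeout(clip, timeoutMs, onDone) {
  let settled = false;
  const finish = (status) => {
    if (settled) return; // ignore anything after the first outcome
    settled = true;
    clearTimeout(timer);
    clip.removeEventListener("canplaythrough", onReady);
    clip.removeEventListener("error", onError);
    onDone(status, clip);
  };
  const onReady = () => finish("ready");
  const onError = () => finish("error");
  // Give up if the clip is still not playable after timeoutMs.
  const timer = setTimeout(() => finish("timeout"), timeoutMs);
  clip.addEventListener("canplaythrough", onReady);
  clip.addEventListener("error", onError);
}
```

A loader could call this for every clip and decide, for example, to start the game without a sound that failed to load rather than blocking the loading screen.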

Grant is currently working on a Sound Library project that will allow developers to use their sound engine’s logic with any other application. Looking forward to that!

Firework (by Mike Tompkins): www.beautyoftheweb.com/firework

The Firework demo is particularly interesting, as it allows you to interact with several audio tracks at the same time, dynamically changing the volume of each track. Moreover, as you interact with the audio channels, the interface dynamically reacts to different inputs or settings.

This time the audio tags have been declared in the HTML markup (there are just 6 tracks). The progress is tracked programmatically by listening to the canplaythrough event. Once all audio files are ready to play, a loop goes through the list and starts playing.

The developers in this case also made the decision to start with the volume set to 0 and to increase it dynamically to 1 as soon as the experience is ready to play. Depending on the quality of your audio card and drivers, this little trick reduces the likelihood of hearing an initial “knock” noise when the audio starts.
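A volume ramp of this kind could be sketched as follows. This is not the demo's actual code: the helper names stepVolume and fadeIn, the step size and the interval are assumptions, and the gradual ramp is only one possible way to go from 0 to 1.

```javascript
// Raise the volume of every clip by `step`, never going above 1
// (assigning a volume outside the 0..1 range throws in the browser).
// Returns true once all clips have reached full volume.
function stepVolume(clips, step) {
  let allAtFull = true;
  clips.forEach((clip) => {
    // Round to two decimals to avoid floating-point drift.
    const next = Math.round((clip.volume + step) * 100) / 100;
    clip.volume = Math.min(1, next);
    if (clip.volume < 1) allAtFull = false;
  });
  return allAtFull;
}

// Start all clips muted, then ramp them up to full volume.
function fadeIn(clips, step = 0.1, intervalMs = 50) {
  clips.forEach((clip) => {
    clip.volume = 0;
    clip.play();
  });
  const timer = setInterval(() => {
    if (stepVolume(clips, step)) clearInterval(timer);
  }, intervalMs);
}
```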

BeatKeep: www.beatkeep.net

The last scenario is probably the most complicated of the examples shown here. In this example, you can build your own songs using a beat machine and playing several audio clips in a loop. In this application, it’s critical to have perfect synchronization of the audio channels and an agile buffering system to load multiple clips.

The beat machine gives you full control over the tempo and the time signature. Thanks to sophisticated timer logic and a binding model, the end result is a very smooth experience!


I encourage you to try all the samples and applications in this post using Internet Explorer 9 or other browsers and let us know what your experience was like! You can download all the sample code used in this article here.

If you want to learn more about the audio and video controls, I recommend that you watch the half-hour session from MIX “5 Things You Need To Know To Start Using <video> and <audio> Today” or read these interesting articles on MSDN.

Thanks to DoubleDominant for the audio clips used in this blog post and to Grant Skinner and Archetype for their great HTML5 experiences.