Writing a network audio driver in Windows CE – Part 2 - Timing

[Revision 10/20/05 - minor changes to make content clearer]


This blog will discuss timing, the first bullet point in my first blog about networked audio drivers.

In an audio driver that plays directly to local hardware (LAD), the mechanism for timing is usually relatively simple. The driver writes to hardware buffers, stops writing when they are full, and gets signaled when there are free buffers again. A networked audio driver (NAD), on the other hand, sends packets down the network stack it is using, which then go over some medium to a different device, where they get played.

Note that some of the issues discussed below also apply to a LAD, but they become more visible, or simply bigger, in NADs, because NADs might have higher local CPU usage, less information about the sink device, less control over the sink device, greater latency and performance issues due to the network they stream over, etc.

One common system for NADs is to establish a connection through some sort of handshake, in which the codec type, codec settings and device settings are negotiated, and then initiate streaming, which is a one-way communication from the source device to the sink device.

Using the above mechanism, the sink’s timing implementation is just to play the sounds at the correct speed, and buffer everything else it receives until it is needed. The source implementation is more interesting due to the challenges below:

1. It needs to send data fast enough so that the sink has enough data to play at any one time.

2. It must not send data so fast that the sink’s buffers get overrun or the latency due to excessive buffering becomes strongly noticeable.

3. If there are no built-in throttling mechanisms in the network technology it is using, it must send data to the network at a speed that will not flood the network.

If 1 is not followed, the sink will not have enough data to play, which will cause either skips or quality degradation if the sink attempts some skip mitigation technique. A simple way for the source to figure out how fast it needs to send data is to follow the bit rate (kbit/s) of the audio data it is streaming (and possibly adjust the data rate if there is extra data per packet). Deciding what to do when this data rate cannot be sustained, because of performance problems in the source device or the network, presents more design options.
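As a rough illustration of the data-rate arithmetic above, here is a minimal Python sketch. The function names, the 128 kbit/s stream, the 400-byte payload and the 12-byte per-packet header are all made-up values for the example, not figures from any particular codec or network stack.

```python
def pcm_bytes_per_second(sample_rate_hz, channels, bits_per_sample):
    """Data rate of uncompressed PCM audio, in bytes per second."""
    return sample_rate_hz * channels * bits_per_sample // 8


def wire_bytes_per_second(bitrate_kbps, payload_bytes, header_bytes):
    """Bytes per second the source must push to the network for a stream
    encoded at bitrate_kbps, when every packet carries payload_bytes of
    audio plus header_bytes of extra per-packet data (hypothetical sizes)."""
    audio_bytes_per_second = bitrate_kbps * 1000 / 8
    packets_per_second = audio_bytes_per_second / payload_bytes
    return audio_bytes_per_second + packets_per_second * header_bytes


# CD-quality stereo PCM: 44.1 kHz, 2 channels, 16 bits per sample.
cd_rate = pcm_bytes_per_second(44100, 2, 16)     # 176400 bytes/s

# A 128 kbit/s stream in 400-byte payloads with a 12-byte header each.
wire_rate = wire_bytes_per_second(128, 400, 12)  # 16480.0 bytes/s
```

The second figure is the one the source has to sustain on the network; the extra 480 bytes per second come entirely from the per-packet headers.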

If the issue is permanent, probably the best solution is to renegotiate the streaming rate between the devices to lower quality and thus lower kbit/s, or use a different codec.

But sometimes performance issues can be intermittent and allow for solutions that do not decrease quality. For example, assume the music is being streamed from a device where at some point a higher priority application starts using most of the CPU cycles for a short period of time, temporarily starving the NAD, which will cause a skip on the sink side. In the case where the skip is inevitable, the driver might choose to make the skip longer, to fill all local buffers and/or temporarily decrease system load before starting to stream again, so as to avoid multiple skips while the system is busy. This creates a better user experience: much as one skip is annoying, 100 skips in 10 seconds are considerably worse.
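One possible way to express the "one longer skip instead of many short ones" idea is a small gate that, once the driver has been starved, refuses to resume streaming until the local buffers have refilled to some level. The sketch below is only an illustration in Python; `ResumeGate`, its method names and the 8192-byte threshold are hypothetical and not part of any CE sample driver.

```python
class ResumeGate:
    """After an underrun, hold off sending until enough data is buffered
    locally, so that streaming does not restart only to skip again."""

    def __init__(self, resume_threshold_bytes):
        self.resume_threshold = resume_threshold_bytes
        self.holding = False

    def on_underrun(self):
        # The driver was starved and the sink is going to skip anyway.
        self.holding = True

    def may_send(self, local_buffered_bytes):
        # Leave the holding state only once the local buffers have refilled.
        if self.holding and local_buffered_bytes >= self.resume_threshold:
            self.holding = False
        return not self.holding


gate = ResumeGate(resume_threshold_bytes=8192)
gate.on_underrun()
gate.may_send(4096)   # False: keep the skip going and keep refilling
gate.may_send(8192)   # True: buffers are full enough, resume streaming
```

A real driver would probably also want a timeout, so that a permanently overloaded system falls back to renegotiating the stream instead of staying silent.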

On point 2, to avoid sink side overrun, the source side needs to ensure that:

· The data is not consistently sent faster than the sink can consume it;

o If the source consistently sends more data than the sink can use, even by a very small amount, it will eventually fill the buffers on the sink side completely, and the sink will have to start using some mechanism to throw data away.

· There are no bursts of data sent that are larger than the buffers on the sink side.

o For instance, for efficiency reasons the source side might decide to send n milliseconds’ worth of audio data to the sink at a time. The amount of data in n milliseconds must never be bigger than the size of the buffers on the sink side.
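Both conditions above reduce to simple arithmetic. The following Python sketch shows one way to check them; the function names and all the numbers are invented for the example, and it assumes the source actually knows (or can estimate) the sink's buffer size, which is not always the case.

```python
def seconds_until_overrun(send_bytes_per_s, consume_bytes_per_s, free_bytes):
    """Time until a steady rate surplus fills the sink's free buffer space.
    Returns None when the source is not faster than the sink."""
    surplus = send_bytes_per_s - consume_bytes_per_s
    if surplus <= 0:
        return None
    return free_bytes / surplus


def burst_fits(burst_ms, bytes_per_second, sink_buffer_bytes):
    """True if burst_ms worth of audio fits in the sink's buffers."""
    burst_bytes = bytes_per_second * burst_ms / 1000
    return burst_bytes <= sink_buffer_bytes


# A source that is 100 bytes/s too fast fills 5000 free bytes in 50 s.
seconds_until_overrun(1100, 1000, 5000)   # 50.0

# 100 ms of CD-quality PCM (17640 bytes) does not fit in a 16 KB buffer,
# but 50 ms (8820 bytes) does.
burst_fits(100, 176400, 16384)            # False
burst_fits(50, 176400, 16384)             # True
```

The first function makes the point that even a tiny consistent surplus only delays the overrun; it never avoids it.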

For latency, the sink-side buffering can also be important. For example, if all the source can do is send audio data to the sink, with no control messages along the lines of “flush buffer”, and n milliseconds of data are buffered, several user scenarios can be affected:

· It will take n milliseconds to stop playing from the time the source stops sending data;

· If the user decides to change the data that is being streamed (e.g. skip to another track mid-track), it will take n milliseconds for the change to be reflected in the sink;

· If there are sounds created in real time (e.g. system sounds), those will only be played n milliseconds later;

· If video is being played concurrently with the sound, the sound and video will become unsynchronized.

If (n + network latency time) is small enough this is quite imperceptible to the user, but otherwise this can be a serious user experience issue, so it is important to design buffering appropriately.
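To put a number on (n + network latency time), the buffered amount can be converted to milliseconds from the stream's data rate. This is a small Python sketch with hypothetical names and example values only.

```python
def buffered_latency_ms(buffered_bytes, bytes_per_second, network_latency_ms=0.0):
    """Delay between the source sending a sound and the sink playing it:
    the playback time of what is already buffered, plus network latency."""
    return 1000.0 * buffered_bytes / bytes_per_second + network_latency_ms


# 35280 bytes of CD-quality PCM (176400 bytes/s) is 200 ms of audio;
# with 20 ms of network latency the user hears changes 220 ms late.
buffered_latency_ms(35280, 176400, network_latency_ms=20.0)   # 220.0
```

Running the calculation the other way (a latency budget in milliseconds converted to bytes) gives an upper bound on how much the source should let the sink buffer.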

I will not discuss 3 here, since there is a lot of published information on network timing, there is significant research in this area, and most networking technologies supply some sort of load control/balancing (for some examples, see how TCP does this at the IETF website).

Unfortunately, I don’t know of any sample audio drivers in CE which have timing mechanisms built in other than interrupts, so the writer of a NAD would have to implement timing themselves.

There are of course many ways this could be done, but a couple of possible ways would be:

· Make minor changes to the wavdev sample driver, and add a middle layer between the stack and the driver that blocks the driver on writes for the amount of time needed to maintain the appropriate data rate.

o The middle layer would know how fast the data should go to the network, how much has been sent at any one point, and the time elapsed;

o If the audio driver tries to send a data packet before any more data is meant to be sent to the sink, it is blocked until the appropriate time;

o If it tries to send data at or after the appropriate time it is allowed to do so without blocking.

o Some allowance for buffering should be made, so instead of one hard number for data sent in time n, it might be necessary to use 2 or more thresholds.

· Change wavdev (or write an audio driver from scratch) to do its own timing on writes (sends):

o This solution does not need any extra layers for timing, but involves more changes to the sample driver;

o Most of the changes explained above for the extra layer that blocks writes still apply, but the driver itself needs to stop and trigger its writes.
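The blocking logic described in the list above can be sketched in a few lines. The Python below is only a model of the idea, not CE driver code: `SendPacer`, `lead_bytes` and `max_lag` are hypothetical names, and the two thresholds are one possible reading of "2 or more thresholds" (how far ahead of schedule the source may run, and how far behind it may fall before it stops trying to catch up). The clock and sleep functions are passed in so the logic can be exercised without real waiting.

```python
import time


class SendPacer:
    """Blocks writes so the long-run send rate follows bytes_per_second."""

    def __init__(self, bytes_per_second, lead_bytes=0, max_lag=0.5,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = bytes_per_second
        self.lead_bytes = lead_bytes  # how far ahead of schedule we may run
        self.max_lag = max_lag        # seconds behind schedule we tolerate
        self.clock = clock
        self.sleep = sleep
        self.start = None             # time origin of the send schedule
        self.sent = 0                 # bytes handed to the network so far

    def _due_time(self):
        # Earliest time at which the next write is allowed to go out.
        return self.start + max(0, self.sent - self.lead_bytes) / self.rate

    def write(self, send, data):
        now = self.clock()
        if self.start is None:
            self.start = now
        due = self._due_time()
        if now - due > self.max_lag:
            # Fell too far behind (e.g. CPU starvation): shift the schedule
            # forward so the catch-up burst is bounded by max_lag.
            self.start += (now - due) - self.max_lag
        elif due > now:
            # Too early: block the caller until the appropriate time.
            self.sleep(due - now)
        send(data)
        self.sent += len(data)
```

With a rate of 1000 bytes/s and no lead, three 500-byte writes issued back to back go out at 0 s, 0.5 s and 1.0 s. The same class covers both options in the list: as a separate middle layer it wraps the call into the network stack, and in the second option the driver would keep equivalent state itself.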

The choice of which way to implement this of course depends on the technologies being used, and other design trade-offs.


Author Note: I apologize for posting this one later than the date mentioned in my initial blog.

[Author: Thais Melo]