Voice Performance

Windows Mobile: Not Supported
Windows Embedded CE: Supported


Properly handling samples in the audio driver is key to achieving good sound quality and performance with audio sessions managed by RTC. These sessions can be a core part of a Voice over IP (VoIP) architecture.

The following best practices are useful to consider when optimizing voice performance:

  • A driver's audio capture characteristics have a greater influence on overall VoIP quality than a driver's audio rendering characteristics. These characteristics can have an effect on performance that is more important than, and even independent of, your device's raw processing power.
  • The smallest captured sound sample used in an RTC Client API system is about 20 milliseconds (ms) long. The actual sample size depends on the audio codec in use and is set in the sample's WAVEHDR structure.
  • The new media stack in RTC 1.5 does not rely on the audio driver firing an interrupt whenever an audio capture packet is available. Instead, the media stack polls for available capture packets by checking whether WHDR_DONE is set on any submitted capture wave header.
  • The overall latency for a VoIP receiving device is the sum of the times spent capturing the audio data, encoding it, sending it over the network, decoding it, and rendering it.
  • If the driver reports captured audio only after many samples have filled the DMA buffer, the capture time is the sum of those sample durations, and the total end-to-end time can exceed the acceptable latency limit for a phone call, which is about 200 ms.
    You can achieve low-latency audio capture by indicating the completion of small WAVEHDR structures as soon as each one is filled.
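
The practices above can be illustrated with a minimal, portable C sketch. It is not driver code and does not call the wave API. SimWaveHdr and SIM_WHDR_DONE are hypothetical stand-ins for WAVEHDR and WHDR_DONE, every function name is invented for illustration, and the 200 ms budget is the approximate figure quoted above. The sketch shows how a frame size follows from the audio format, how a polling consumer detects a completed header, and how batching completions inflates capture latency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the WAVEHDR fields discussed above. On a real
   target, use WAVEHDR and WHDR_DONE from the platform headers instead. */
#define SIM_WHDR_DONE 0x00000001u

typedef struct {
    uint32_t buffer_length;   /* capacity in bytes (cf. dwBufferLength)      */
    uint32_t bytes_recorded;  /* bytes captured so far (cf. dwBytesRecorded) */
    uint32_t flags;           /* status bits (cf. dwFlags)                   */
} SimWaveHdr;

#define LATENCY_BUDGET_MS 200u  /* approximate limit quoted in the text */

/* Bytes of PCM audio in one capture frame lasting frame_ms milliseconds. */
uint32_t frame_bytes(uint32_t samples_per_sec, uint32_t channels,
                     uint32_t bits_per_sample, uint32_t frame_ms)
{
    return samples_per_sec * channels * (bits_per_sample / 8u) * frame_ms / 1000u;
}

/* Driver side: mark a header complete as soon as it is full, rather than
   waiting for the whole DMA buffer to fill. */
void driver_complete(SimWaveHdr *hdr)
{
    assert(hdr != NULL);
    hdr->bytes_recorded = hdr->buffer_length;
    hdr->flags |= SIM_WHDR_DONE;
}

/* Consumer side: poll the submitted headers and return the index of the
   first one marked done, or -1 if none is ready yet. */
int poll_for_done(const SimWaveHdr *hdrs, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (hdrs[i].flags & SIM_WHDR_DONE) {
            return (int)i;
        }
    }
    return -1;
}

/* Capture delay when completion is signaled only after frames_batched
   frames have accumulated. */
uint32_t capture_latency_ms(uint32_t frame_ms, uint32_t frames_batched)
{
    return frame_ms * frames_batched;
}

/* End-to-end latency: the sum of the five stages listed above. */
uint32_t total_latency_ms(uint32_t capture, uint32_t encode, uint32_t network,
                          uint32_t decode, uint32_t render)
{
    return capture + encode + network + decode + render;
}
```

For example, assuming 8 kHz, mono, 16-bit PCM, frame_bytes(8000, 1, 16, 20) gives 320 bytes per 20 ms frame. With illustrative stage times of 10 ms to encode, 60 ms on the network, 10 ms to decode, and 20 ms to render, completing each header as it fills gives a total of 120 ms. Batching eight frames before completion raises capture alone to 160 ms and the total to 260 ms, which exceeds LATENCY_BUDGET_MS.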

See Also

Other Resources

Using the RTC Client API