Microsoft Edge Codec Capabilities

[Some information relates to pre-released product which may be substantially modified before it's commercially released. Microsoft makes no warranties, express or implied, with respect to the information provided here.]

This section describes the codec capabilities supported by the Microsoft Edge ORTC API implementation.

Note  Microsoft Edge does not currently support header extensions or retransmission [RFC4588].

 

Audio codecs that are mandatory-to-implement in WebRTC (as noted in [RTCWEB-AUDIO]) include:

Note  The Microsoft Edge ORTC implementation supports the G.711, G.722, DTMF, CN, Opus and SILK [SILK] audio codecs.

 

Video codecs that are mandatory-to-implement in WebRTC (as noted in [RTCWEB-VIDEO]) include:

Note  Microsoft Edge currently supports the H.264UC video codec [MS-H264PF], which is based on H.264/SVC [RFC6190]. Work on support for H.264/AVC is in progress.

 

The capabilities of codecs supported by Microsoft Edge are described below.

Audio Codecs

G.711

The G.711 audio codec does not include any parameters or options in its capabilities. Since it is an audio codec there is no support for multi-stream transport or temporal or spatial scalability. Only mono operation is supported (numChannels set to 1). As noted in [RFC3551] Section 6, G.711 has been assigned static payload types for the PCMU and PCMA variants, which Microsoft Edge provides as the preferred payload type.

The Dominant Speaker History Notification (dsh) application RTCP feedback message allows a mixer to provide dominant speaker history. The packet format is described in [MS-RTP] Section 2.2.12.3, and RTCP feedback messages supported by Microsoft Edge (including dsh) are described in [MS-SDPEXT] Section 3.1.5.30.2. This application RTCP feedback message is only useful when communicating with a Skype for Business mixer. In other situations, this proprietary feedback message should not be configured within the RTCRtpParameters passed as an argument to send() or receive().

Example

{
  "name":"PCMU",
  "kind":"audio",
  "clockRate":8000,
  "preferredPayloadType":0,
  "numChannels":1,
  "rtcpFeedback":[{
    "type":"x-message",
    "parameter":"app send:dsh recv:dsh"
  }],
  "parameters":{},
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
},
{
  "name":"PCMA",
  "kind":"audio",
  "clockRate":8000,
  "preferredPayloadType":8,
  "numChannels":1,
  "rtcpFeedback":[{
    "type":"x-message",
    "parameter":"app send:dsh recv:dsh"
  }],
  "parameters":{},
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
}
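
The guidance on the dsh feedback message can be illustrated with a small helper. This is a minimal sketch, not part of the ORTC API: withoutDsh is a hypothetical name, and the sample object simply copies the PCMU capability shown above. It returns a copy of a codec capability whose rtcpFeedback no longer advertises dsh, which could serve as a starting point for parameters when the peer is not a Skype for Business mixer.

```javascript
// Hypothetical helper (not part of the ORTC API): returns a copy of a codec
// capability with the proprietary "dsh" RTCP feedback entry removed.
function withoutDsh(codec) {
  const rtcpFeedback = (codec.rtcpFeedback || []).filter(
    (fb) => !(fb.type === "x-message" && /\bdsh\b/.test(fb.parameter || ""))
  );
  return { ...codec, rtcpFeedback };
}

// Sample input copied from the PCMU capability shown above.
const pcmu = {
  name: "PCMU",
  kind: "audio",
  clockRate: 8000,
  preferredPayloadType: 0,
  numChannels: 1,
  rtcpFeedback: [{ type: "x-message", parameter: "app send:dsh recv:dsh" }],
  parameters: {},
  options: {},
  maxTemporalLayers: 0,
  maxSpatialLayers: 0,
  svcMultiStreamSupport: false
};

const interopPcmu = withoutDsh(pcmu);
console.log(interopPcmu.rtcpFeedback.length); // 0
```

The input object is left untouched, so the original capability can still be used unchanged when talking to a Skype for Business mixer.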

G.722

The G.722 audio codec does not include any parameters or options in its capabilities. Since it is an audio codec there is no support for multi-stream transport or temporal or spatial scalability. As noted in [RFC3551] Section 6, G.722 has been assigned a static payload type, which Microsoft Edge provides as the preferred payload type.

While G.722-stereo is described in [MS-RTP] Section 2.2.1.1 and [MS-SDPEXT] Section 3.5.1.3, the G.722 implementation in Microsoft Edge only supports mono (numChannels set to 1).

Example

{
  "name":"G722",
  "kind":"audio",
  "clockRate":8000,
  "preferredPayloadType":9,
  "numChannels":1,
  "rtcpFeedback":[{
    "type":"x-message",
    "parameter":"app send:dsh recv:dsh"
  }],
  "parameters":{},
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
}
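
The distinction between static and dynamic payload types used throughout this section can be checked mechanically. The sketch below assumes the static assignments named in the text (PCMU 0, PCMA 8, G722 9, and CN 13 at a clockRate of 8000, the last covered in the Comfort Noise section) and the RTP dynamic range of 96 to 127. The function names are hypothetical.

```javascript
// Static payload types named in this document (PCMU, PCMA, G722 per [RFC3551];
// CN at a clockRate of 8000 per [RFC3389]).
const STATIC_PAYLOAD_TYPES = new Map([
  ["PCMU", 0],
  ["PCMA", 8],
  ["G722", 9],
  ["CN", 13]
]);

// RTP dynamic payload types occupy the range 96-127.
function isDynamicPayloadType(pt) {
  return Number.isInteger(pt) && pt >= 96 && pt <= 127;
}

// Hypothetical check: does a capability carry the kind of payload type the
// text leads us to expect (static where one is assigned, dynamic otherwise)?
function hasExpectedPayloadType(codec) {
  const isStatic =
    STATIC_PAYLOAD_TYPES.has(codec.name) &&
    !(codec.name === "CN" && codec.clockRate !== 8000);
  return isStatic
    ? codec.preferredPayloadType === STATIC_PAYLOAD_TYPES.get(codec.name)
    : isDynamicPayloadType(codec.preferredPayloadType);
}

const g722Ok = hasExpectedPayloadType({ name: "G722", clockRate: 8000, preferredPayloadType: 9 });
const cn16kOk = hasExpectedPayloadType({ name: "CN", clockRate: 16000, preferredPayloadType: 118 });
console.log(g722Ok, cn16kOk); // true true
```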

Comfort Noise

The Comfort Noise (CN) audio codec does not include any parameters or options in its capabilities and settings. Since it is an audio codec, it does not support multi-stream transport or temporal or spatial scalability. Only mono operation is supported (numChannels set to 1). clockRate values of 8000 and 16000 are supported. As noted in [RFC3389], CN has been assigned a static payload type for a clockRate value of 8000, which Microsoft Edge provides as the preferred payload type. For a clockRate value of 16000, Microsoft Edge assigns a dynamic payload type.

When CN is configured for use along with the G.711, G.722 and SILK codecs, the CN clockRate is set to 8000. Since the Microsoft Edge implementation of Opus does not support Discontinuous Transmission (DTX), it is possible (though not advisable if interoperability is desired) to use CN along with Opus, in which case the CN clockRate would be set to 16000.

Example

{
  "name":"CN",
  "kind":"audio",
  "clockRate":8000,
  "preferredPayloadType":13,
  "numChannels":1,
  "rtcpFeedback":[],
  "parameters":{},
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
},
{
  "name":"CN",
  "kind":"audio",
  "clockRate":16000,
  "preferredPayloadType":118,
  "numChannels":1,
  "rtcpFeedback":[],
  "parameters":{},
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
}
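
The pairing rule for CN clock rates can be sketched as a lookup over the capability list. This is an illustration only: pickComfortNoise is a hypothetical helper, and the sample array holds trimmed copies of the two CN capabilities shown above.

```javascript
// Hypothetical helper: choose the CN capability whose clockRate matches the
// pairing described above (8000 for G.711, G.722 and SILK; 16000 for Opus).
function pickComfortNoise(codecs, primaryName) {
  const wantedRate = primaryName.toLowerCase() === "opus" ? 16000 : 8000;
  return codecs.find((c) => c.name === "CN" && c.clockRate === wantedRate) || null;
}

// Trimmed copies of the two CN capabilities shown above.
const cnCodecs = [
  { name: "CN", kind: "audio", clockRate: 8000, preferredPayloadType: 13, numChannels: 1 },
  { name: "CN", kind: "audio", clockRate: 16000, preferredPayloadType: 118, numChannels: 1 }
];

console.log(pickComfortNoise(cnCodecs, "SILK").preferredPayloadType); // 13
console.log(pickComfortNoise(cnCodecs, "OPUS").preferredPayloadType); // 118
```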

DTMF

The DTMF audio codec includes only a single parameter in its capabilities, as noted below. Since it is a mono audio codec, only a single channel is supported and it does not support multi-stream transport or temporal or spatial scalability. As noted in [RFC4733], DTMF utilizes a dynamic payload type, so Microsoft Edge assigns a preferred payload type in the dynamic range.

As noted in [MS-DTMF], DTMF cannot be used with the Microsoft implementation of Redundant Audio Data [MS-RTPRADEX]. The Microsoft Edge implementation of DTMF supports events 0-15 described in [RFC4733].

The single capability and setting supported by DTMF is defined in [RFC4733] Section 2.4:

| Property Name | Values | Notes |
|---|---|---|
| events | DOMString | An indication of what telephony events are supported (or configured). Events are listed as one or more comma-separated elements. Each element can be either a single integer providing the value of an event code or an integer followed by a hyphen and a larger integer, presenting a range of consecutive event code values. The list does not have to be sorted. No white space is allowed in the argument. The union of all of the individual event codes and event code ranges designates the complete set of event numbers supported by the implementation. |

 

Example

{
  "name":"telephone-event",
  "kind":"audio",
  "clockRate":8000,
  "preferredPayloadType":101,
  "numChannels":1,
  "rtcpFeedback":[],
  "parameters":{
    "events":"0-15"
  },
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
}
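
The events syntax described above (comma-separated single codes or ranges, unsorted, no white space) can be expanded into an explicit list of event codes. The following is a minimal sketch. parseTelephoneEvents is a hypothetical helper name, and the strictness choices (rejecting white space and ranges that are not ascending) follow the wording of the table rather than any Microsoft Edge behavior.

```javascript
// Parses a telephone-event "events" value such as "0-15" or "13,0-2"
// into a sorted array of unique event codes. Throws on malformed input.
function parseTelephoneEvents(events) {
  const codes = new Set();
  for (const element of events.split(",")) {
    // Either a single integer or "<integer>-<larger integer>", no white space.
    const match = /^(\d+)(?:-(\d+))?$/.exec(element);
    if (!match) throw new Error(`Invalid events element: "${element}"`);
    const first = Number(match[1]);
    const last = match[2] === undefined ? first : Number(match[2]);
    if (match[2] !== undefined && last <= first) {
      throw new Error(`Range must be ascending: "${element}"`);
    }
    for (let code = first; code <= last; code++) codes.add(code);
  }
  return [...codes].sort((a, b) => a - b);
}

const dtmfEvents = parseTelephoneEvents("0-15");
console.log(dtmfEvents.length); // 16
console.log(parseTelephoneEvents("13,0-2,2")); // [ 0, 1, 2, 13 ]
```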

Opus

The Microsoft Edge implementation of Opus does not support any parameters or options in Opus capabilities. While [OPUS-RTP] Section 7 describes Opus settings such as stereo, useinbandfec and usedtx, the Microsoft Edge implementation of Opus currently supports only mono operation, does not support Discontinuous Transmission (DTX) and does not support inband FEC. While it is possible to configure external FEC (RED) for use with Opus to enhance robustness (a distance of 3 is supported, versus a distance of 1 for inband FEC), this will not interoperate with other Opus implementations. Interoperability issues may also be experienced if CN is configured to provide voice activity detection.

Since it is an audio codec, Opus does not support multi-stream transport or temporal or spatial scalability. While a numChannels value of 2 is supported, currently the numChannels value does not affect Opus codec operation (only mono is supported, regardless of the value of numChannels). Since Opus utilizes a dynamic payload type Microsoft Edge assigns a preferred payload type in the dynamic range. Currently clockRate can only be set to 48000.

Example

{
  "name":"OPUS",
  "kind":"audio",
  "clockRate":48000,
  "preferredPayloadType":106,
  "numChannels":1,
  "rtcpFeedback":[{
    "type":"x-message",
    "parameter":"app send:dsh recv:dsh"
  }],
  "parameters":{},
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
}

SILK

For the SILK audio codec, Microsoft Edge does not support any parameters or options within the capabilities. Since it is an audio codec, SILK does not support multi-stream transport or temporal or spatial scalability. Only mono operation is supported (numChannels set to 1), along with clockRate values of 8000 and 16000. Since SILK utilizes a dynamic payload type Microsoft Edge assigns a preferred payload type in the dynamic range.

SILK settings are described in [MS-SDPEXT] Section 3.1.5.3. This includes a discussion of 8000 and 16000 clock rates, as well as the usedtx and useinbandfec settings. However, neither inband FEC nor DTX is configurable in Microsoft Edge, although external FEC (RED) and CN may be configured.

Example

{
  "name":"SILK",
  "kind":"audio",
  "clockRate":8000,
  "preferredPayloadType":103,
  "numChannels":1,
  "rtcpFeedback":[{
    "type":"x-message",
    "parameter":"app send:dsh recv:dsh"
  }],
  "parameters":{},
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
},
{
  "name":"SILK",
  "kind":"audio",
  "clockRate":16000,
  "preferredPayloadType":104,
  "numChannels":1,
  "rtcpFeedback":[{
    "type":"x-message",
    "parameter":"app send:dsh recv:dsh"
  }],
  "parameters":{},
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
}

RED

For use with audio codecs, Microsoft Edge supports Redundant Audio Data (RED) [RFC2198] with extensions described in [MS-RTPRADEX]. Since it is only for use with audio, RED does not support multi-stream transport or temporal or spatial scalability. Only a single channel is supported. RED can be used to protect the G.711, G.722, SILK, CN and Opus payload types. When RED is configured, it is used to protect all configured audio codecs using only a single dynamically allocated RED payload type. There are no configuration parameters to set.

Note  Microsoft Edge Interop Note: As noted in [MS-RTPRADEX], within Microsoft's implementation of RED, only a single block of redundant audio data is supported, along with a block of primary data. The primary audio block and redundant audio block MUST use the same codec. In addition to advertising RED as an audio codec, Microsoft Edge capabilities also include RED as a mechanism for carriage of Forward Error Correction (FEC), since FEC is provided in the redundant audio data block, with a maximum distance of 3 (e.g. "fecMechanisms":["RED"]). Since the FEC mechanism utilized with RED is proprietary, RED should not be configured within RTCRtpParameters when interoperation with other WebRTC implementations is desired.

 

Example

{
  "name":"RED",
  "kind":"audio",
  "clockRate":8000,
  "preferredPayloadType":97,
  "numChannels":1,
  "rtcpFeedback":[],
  "parameters":{},
  "options":{},
  "maxTemporalLayers":0,
  "maxSpatialLayers":0,
  "svcMultiStreamSupport":false
}
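
The RED guidance above amounts to two simple rules: RED applies only to certain audio payload types, and it should be left out when interoperating with other WebRTC implementations. The sketch below expresses both. The helper names and the interop flag are hypothetical, and the codec names assume the spellings used in the capability examples in this document.

```javascript
// Codec names that RED can protect according to the text above
// (G.711 appears in capabilities as PCMU and PCMA).
const RED_PROTECTABLE = new Set(["PCMU", "PCMA", "G722", "SILK", "CN", "OPUS"]);

function redCanProtect(codecName) {
  return RED_PROTECTABLE.has(codecName.toUpperCase());
}

// Hypothetical helper: returns a new codec list, dropping RED when the caller
// asks for interoperability with other WebRTC implementations.
function audioCodecsFor(codecs, { interop }) {
  return interop
    ? codecs.filter((c) => c.name.toUpperCase() !== "RED")
    : codecs.slice();
}

// Trimmed sample capabilities using payload types from the examples above.
const sampleAudio = [
  { name: "PCMU", preferredPayloadType: 0 },
  { name: "RED", preferredPayloadType: 97 },
  { name: "telephone-event", preferredPayloadType: 101 }
];

const interopNames = audioCodecsFor(sampleAudio, { interop: true }).map((c) => c.name);
console.log(interopNames); // [ 'PCMU', 'telephone-event' ]
console.log(redCanProtect("SILK"), redCanProtect("telephone-event")); // true false
```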

Video Codecs

H.264UC

Microsoft Edge supports the H.264UC [MS-H264PF] video codec, along with an end-to-end Forward Error Correction (FEC) scheme. For both H.264UC and FEC, Microsoft Edge assigns a preferred payload type in the dynamic range.

Microsoft Edge returns the example RTCRtpCapabilities object below in response to a call to getCapabilities("video"). H.264UC supports a maximum of 3 temporal layers using multi-stream transport, along with spatial simulcast.

Note  The H.264UC codec utilizes MRST transport, along with packetization-mode 1. Temporal scalability (a maximum of 3 layers) and spatial simulcast (a maximum of 2 layers) are automatically enabled and the encoding is dynamically adjusted based on network conditions. As a result, rtcpFeedback, parameters, options, maxTemporalLayers, maxSpatialLayers and svcMultiStreamSupport SHOULD be configured as they are provided in the complete RTCRtpCapabilities object shown below.

 

Example

{
  "codecs":[
    {
      "name":"X-H264UC",
      "kind":"video",
      "clockRate":90000,
      "preferredPayloadType":122,
      "numChannels":1,
      "rtcpFeedback":[{
        "type":"x-message",
        "parameter":"app send:src,x-pli recv:src,x-pli"
      }],
      "parameters":{
        "packetization-mode":"1",
        "mst-mode":"NI-TC"
      },
      "options":{},
      "maxTemporalLayers":3,
      "maxSpatialLayers":0,
      "svcMultiStreamSupport":true
    },
    {
      "name":"x-ulpfecuc",
      "kind":"video",
      "clockRate":90000,
      "preferredPayloadType":123,
      "numChannels":1,
      "rtcpFeedback":[],
      "parameters":{},
      "options":{},
      "maxTemporalLayers":0,
      "maxSpatialLayers":0,
      "svcMultiStreamSupport":false
    }
  ],
  "headerExtensions":[],
  "fecMechanisms":[]
}

Note  Microsoft Edge Interop Note: H.264UC supports an end-to-end Forward Error Correction (FEC) scheme known as "ULPFECUC", described in [MS-SDPEXT] Section 3.1.5.3. Configuring "ULPFECUC" is recommended whenever H.264UC is configured, since it improves resilience against packet loss. For resilience, it is also recommended that the x-pli and src RTCP feedback messages be configured along with H.264UC. The x-pli RTCP feedback message is described in [MS-RTP] Section 2.2.12.1; it differs from the standard PLI message defined in [RFC4585] Section 6.3.1 in that it contains only one Feedback Control Information (FCI) field. There are no configurable settings for "ULPFECUC", and when using it, the fecMechanisms attribute is left empty, as in the example above. The "src" proprietary RTCP feedback message advertised as a capability corresponds to the Video Source Request (VSR) RTCP feedback message described in [MS-RTP] Section 2.2.12.2.
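
Because the H.264UC values are meant to be configured as they are provided, one way to build codec parameters is to copy each capability verbatim. The sketch below assumes the RTCRtpCodecParameters field names from the ORTC specification (name, payloadType, clockRate, numChannels, rtcpFeedback, parameters). capsToCodecParameters is a hypothetical helper, and the sample object is a trimmed copy of the capabilities example above. In a browser, the input would come from getCapabilities("video").

```javascript
// Hypothetical helper: copies every codec capability into an
// RTCRtpCodecParameters-shaped object, leaving feedback and parameters as-is.
function capsToCodecParameters(caps) {
  return caps.codecs.map((c) => ({
    name: c.name,
    payloadType: c.preferredPayloadType,
    clockRate: c.clockRate,
    numChannels: c.numChannels,
    rtcpFeedback: c.rtcpFeedback.map((fb) => ({ ...fb })),
    parameters: { ...c.parameters }
  }));
}

// Trimmed copy of the RTCRtpCapabilities example shown above.
const videoCaps = {
  codecs: [
    {
      name: "X-H264UC",
      kind: "video",
      clockRate: 90000,
      preferredPayloadType: 122,
      numChannels: 1,
      rtcpFeedback: [{ type: "x-message", parameter: "app send:src,x-pli recv:src,x-pli" }],
      parameters: { "packetization-mode": "1", "mst-mode": "NI-TC" }
    },
    {
      name: "x-ulpfecuc",
      kind: "video",
      clockRate: 90000,
      preferredPayloadType: 123,
      numChannels: 1,
      rtcpFeedback: [],
      parameters: {}
    }
  ],
  headerExtensions: [],
  fecMechanisms: []
};

const videoCodecParams = capsToCodecParameters(videoCaps);
console.log(videoCodecParams.map((c) => `${c.name}:${c.payloadType}`));
// [ 'X-H264UC:122', 'x-ulpfecuc:123' ]
```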

 

H.264

Microsoft Edge does not yet support H.264. The ORTC API defines the following capabilities for H.264, as noted in [RFC6184] Section 8.1 and [RTCWEB-VIDEO] Section 6.2.

| Property Name | Values | Notes |
|---|---|---|
| profile-level-id | unsigned long | This parameter, defined in [RFC6184] Section 8.1, is mandatory to support, as noted in [RTCWEB-VIDEO] Section 6.2. |
| packetization-mode | sequence<unsigned short> | A sequence of unsigned shorts, each ranging from 0 to 2, indicating supported packetization-mode values. As noted in [RTCWEB-VIDEO] Section 6.2, support for packetization-mode 1 is mandatory. |
| max-mbps, max-smbps, max-fs, max-cpb, max-dpb, max-br | unsigned long long | As noted in [RTCWEB-VIDEO] Section 6.2, these optional parameters allow the implementation to specify that they can support certain features of H.264 at higher rates and values than those signalled with profile-level-id. |
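
As a rough illustration of what profile-level-id carries, the value can be split into the three one-byte fields laid out in [RFC6184] Section 8.1: profile_idc, profile-iop and level_idc. The helper below is a sketch under that assumption. parseProfileLevelId is a hypothetical name, and "42e01f" is only an illustrative input, not a value reported by Microsoft Edge. Both the numeric form used in the table and the six-digit hexadecimal string form used in SDP are accepted.

```javascript
// Splits an H.264 profile-level-id into profile_idc, profile-iop and
// level_idc. Accepts an unsigned integer or a 6-digit hexadecimal string.
function parseProfileLevelId(value) {
  let n = value;
  if (typeof value === "string") {
    if (!/^[0-9a-fA-F]{6}$/.test(value)) {
      throw new RangeError(`Invalid profile-level-id: "${value}"`);
    }
    n = parseInt(value, 16);
  }
  if (!Number.isInteger(n) || n < 0 || n > 0xffffff) {
    throw new RangeError(`Invalid profile-level-id: ${value}`);
  }
  return {
    profileIdc: (n >> 16) & 0xff, // first byte
    profileIop: (n >> 8) & 0xff,  // second byte (constraint flags)
    levelIdc: n & 0xff            // third byte (level number times ten)
  };
}

const parsedProfile = parseProfileLevelId("42e01f");
console.log(parsedProfile); // { profileIdc: 66, profileIop: 224, levelIdc: 31 }
```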

 

VP8

For VP8, Microsoft Edge assigns a preferred payload type in the dynamic range. Simulcast is not currently supported for VP8. The following capabilities are supported for VP8:

| Property Name | Values | Notes |
|---|---|---|
| max-fr | unsigned long | Receiver. Indicates the maximum frame rate in frames per second that the decoder is capable of decoding. |
| max-fs | unsigned long long | Receiver. Indicates the maximum frame size in macroblocks that the decoder is capable of decoding. |

 

The following VP8 sender settings are supported:

| Property Name | Values | Notes |
|---|---|---|
| max-fr | unsigned long | Receiver. Indicates the maximum frame rate in frames per second that the decoder is capable of decoding. |
| max-fs | unsigned long long | Receiver. Indicates the maximum frame size in macroblocks that the decoder is capable of decoding. |
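
Since max-fs is expressed in macroblocks rather than pixels, comparing it with a resolution takes a small conversion. The sketch below assumes 16x16-pixel macroblocks, with partial blocks at the frame edges counted as whole blocks. fitsVp8Limits and the sample limit values are hypothetical and are not values advertised by Microsoft Edge.

```javascript
// Number of 16x16 macroblocks needed to cover a frame of the given size.
function frameSizeInMacroblocks(width, height) {
  return Math.ceil(width / 16) * Math.ceil(height / 16);
}

// Hypothetical check of a stream against advertised VP8 receiver limits.
// A limit that is absent is treated as "no restriction".
function fitsVp8Limits(stream, limits) {
  const fs = frameSizeInMacroblocks(stream.width, stream.height);
  const fsOk = limits["max-fs"] === undefined || fs <= limits["max-fs"];
  const frOk = limits["max-fr"] === undefined || stream.frameRate <= limits["max-fr"];
  return fsOk && frOk;
}

// Illustrative limits only.
const vp8Limits = { "max-fr": 30, "max-fs": 3600 };

console.log(frameSizeInMacroblocks(1280, 720)); // 3600
console.log(fitsVp8Limits({ width: 1280, height: 720, frameRate: 30 }, vp8Limits)); // true
console.log(fitsVp8Limits({ width: 1920, height: 1080, frameRate: 30 }, vp8Limits)); // false
```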

 

RTX

Microsoft Edge supports the following capability that is defined for RTX:

| Property Name | Values | Notes |
|---|---|---|
| rtxTime | unsigned long | Sender. The default time in milliseconds (measured from the time a packet was first sent) that the sender keeps an RTP packet in its buffers available for retransmission. |

 

The RTX codec supports one setting:

| Property Name | Values | Notes |
|---|---|---|
| apt | payloadType | Sender/receiver. The associated payload type of the original stream being retransmitted. There will be an "rtx" entry in RTCRtpParameters.codecs for each media codec that can be retransmitted, each with its own apt parameter. |
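
The relationship between media codecs and their "rtx" entries can be sketched as follows. This is an illustration under stated assumptions, not Microsoft Edge behavior: withRtxEntries is a hypothetical helper, the sample codecs and payload types are invented for the example, and each "rtx" entry is given the next unused payload type in the dynamic range (96 to 127).

```javascript
// Hypothetical helper: returns a new codec list with one "rtx" entry appended
// per media codec, each pointing at its media codec through the apt parameter.
function withRtxEntries(codecs) {
  const used = new Set(codecs.map((c) => c.payloadType));
  let candidate = 96; // start of the dynamic payload type range

  function nextFreePayloadType() {
    while (used.has(candidate)) candidate++;
    if (candidate > 127) throw new RangeError("No dynamic payload types left");
    used.add(candidate);
    return candidate;
  }

  const result = codecs.map((c) => ({ ...c }));
  for (const codec of codecs) {
    if (codec.name === "rtx") continue; // never retransmit a retransmission entry
    result.push({
      name: "rtx",
      payloadType: nextFreePayloadType(),
      clockRate: codec.clockRate,
      parameters: { apt: codec.payloadType }
    });
  }
  return result;
}

// Invented sample codecs for illustration.
const mediaCodecs = [
  { name: "VP8", payloadType: 100, clockRate: 90000, parameters: {} },
  { name: "H264", payloadType: 96, clockRate: 90000, parameters: {} }
];

const codecsWithRtx = withRtxEntries(mediaCodecs);
console.log(codecsWithRtx.filter((c) => c.name === "rtx").map((c) => [c.payloadType, c.parameters.apt]));
// [ [ 97, 100 ], [ 98, 96 ] ]
```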