Additional Design Considerations (Compact 2013)

3/26/2014

This topic provides design considerations for implementing a DirectShow decoder filter. Use the sections in this topic as follows:

  • To avoid deadlocks, you must set up an appropriate Threading Model.
  • To avoid jittery playback, you must ensure Starvation Avoidance.
  • To manage data flow, you must support Seeking, Passing Through, and Flushing.
  • To support data disassembly and reassembly, you can use a Scatter/gather Technique.
  • To use MPEG-4 as your encoding format, it is helpful to read some MPEG-4-Specific Notes.

Threading Model

A decoder must process samples on a separate thread so that asynchronous calls from the Filter Graph Manager can occur on the main thread. Even while it is busy processing samples, the decoder must be able to respond to calls from the Filter Graph Manager; otherwise, multiple commands can attempt to access the same resource and cause what is known as a deadlock. In DirectShow, the thread on which the Filter Graph Manager calls are made is the application thread. The threads that process and pass the media samples downstream are the streaming threads. Because calls from the Filter Graph Manager can occur at any time, the application thread and the streaming threads must be synchronized. The synchronization mechanism is described in the Application and Streaming Threads section.

Application and Streaming Threads

The DirectShow base classes already define and use synchronization mechanisms to ensure mutual exclusion between the application thread and the streaming threads. These mechanisms consist of critical section objects and related classes. Your code can use the same mechanisms wherever they are needed. For more information, see The Streaming and Application Threads.
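
For example, the following sketch shows a typical locking pattern for a decoder filter. CMyDecoderFilter, m_csReceive, and DecodeSample are hypothetical names used only for illustration; CBaseFilter, CCritSec, and CAutoLock are DirectShow base classes, and m_pLock is the filter lock that CBaseFilter already provides.

// A minimal sketch, not a complete filter. The filter lock serializes state
// changes made on the application thread; a separate streaming lock protects
// the data that the streaming thread touches while it decodes.
STDMETHODIMP CMyDecoderFilter::Stop()
{
    CAutoLock lockFilter(m_pLock);          // filter state lock (application thread)

    // Also take the streaming lock so that Stop cannot complete while the
    // streaming thread is in the middle of decoding a sample.
    CAutoLock lockStreaming(&m_csReceive);

    return CBaseFilter::Stop();
}

HRESULT CMyDecoderFilter::DecodeSample(IMediaSample *pSample)
{
    // Streaming thread: take only the streaming lock, never the filter lock,
    // so that calls on the application thread are not blocked during decoding.
    CAutoLock lockStreaming(&m_csReceive);

    // Decode pSample and deliver it downstream (not shown).
    return S_OK;
}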

Demultiplexers use the COutputQueue class on each output pin to manage a queue and a streaming thread that pushes out samples. A decoder can therefore expect to receive calls to its IMemInputPin::Receive or IMemInputPin::ReceiveMultiple method on that thread. Because this thread is dedicated to streaming and is separate from the application thread where the calls from the Filter Graph Manager occur, the decoder does not have to create a different thread to receive samples. It may seem intuitive to handle the decoding asynchronously on yet another thread because the operation is CPU intensive, but because the demultiplexer uses COutputQueue, the upstream thread that fetches the next sample is already decoupled from the call into the decoder. It is therefore acceptable to keep the caller’s thread busy during decoding.

However, to push out the decoded samples, the decoder must use a separate thread. Instead of implementing all of the code to maintain a queue for holding the decoded frames and the associated thread, you can use the same COutputQueue base class in the decoder to easily handle the job.
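
The following sketch shows one way that the decoder’s output pin might create and use a COutputQueue. CDecoderOutputPin, m_pOutputQueue, and DeliverDecodedSample are hypothetical names, and the constructor arguments are illustrative only; choose values that match your design (see Queue Depth and Thread Priority later in this topic).

// A minimal sketch, assuming the output pin derives from CBaseOutputPin.
// Active is called when the filter graph starts streaming.
HRESULT CDecoderOutputPin::Active()
{
    if (!m_pOutputQueue)
    {
        HRESULT hr = S_OK;
        m_pOutputQueue = new COutputQueue(
            m_Connected,              // downstream input pin that receives the samples
            &hr,
            FALSE,                    // bAuto: do not ask the downstream pin
            TRUE,                     // bQueue: always use a separate streaming thread
            1,                        // lBatchSize: deliver samples one at a time
            FALSE,                    // bBatchExact
            2,                        // lListSize: keep the queue shallow (see Queue Depth)
            THREAD_PRIORITY_NORMAL);  // do not exceed normal priority (see Thread Priority)
        if (!m_pOutputQueue || FAILED(hr))
        {
            delete m_pOutputQueue;
            m_pOutputQueue = NULL;
            return FAILED(hr) ? hr : E_OUTOFMEMORY;
        }
    }
    return CBaseOutputPin::Active();
}

// Called on the decoder's streaming thread after a frame has been decoded.
// COutputQueue::Receive hands the sample to the queue's own thread, which
// pushes it downstream.
HRESULT CDecoderOutputPin::DeliverDecodedSample(IMediaSample *pSample)
{
    return m_pOutputQueue->Receive(pSample);
}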

The following figure shows the threads in a typical filter graph. All of the threads shown in the figure are separate threads. Each thread is shown crossing filter boundaries to represent the function calls that are made from one filter into the next filter in the context of that same thread.

DirectShow Streaming Threads

Some platforms may have slightly different componentization for the decoders and renderers. As a result, the final stage shown in the previous figure will vary accordingly. However, despite the different componentization, there must still be separate streaming threads for decoding and rendering, because the decoder’s streaming thread continues to run as long as samples come in without any regard to the timestamp, whereas the renderer renders only when the timestamp on the sample becomes current.

Starvation Avoidance

Starvation refers to a lack of data to process in any part of the playback pipeline. When starvation occurs, an event is sent to the Filter Graph Manager that pauses the playback so that more data can be buffered and playback can resume. This pause causes jittery playback and a bad user experience. The following sections describe conditions that can cause starvation at different stages in the playback pipeline.

Queue Depth

Queue depth determines the number of samples that the queue can hold. A common tendency when implementing a DirectShow filter is to create a deep working queue to hold samples in case the incoming sample rate cannot keep up with the playback rate. In practice, however, an upstream filter can buffer enough samples to ensure smooth playback, whether that upstream filter is the source filter or, if present, the buffering stream filter. If the decoder’s queue is too deep, the decoder can hold too many samples while the upstream filter waits for a free buffer to fill, which results in starvation.

To ensure the best performance, set the queue depth so that the decoder holds the minimum number of samples required. For example, a simple audio decoder should keep a queue depth of only two samples. While one sample is being decoded, the queue is free to receive an incoming sample.

Remember that the sample that has been decoded should be released as soon as possible. However, some decoders need to hold on to a few additional samples because of the encoding format of the media. For example, a video decoder might use a group of pictures method in which the decoding of a sample depends on a previous sample. In another scenario, the sample might be only a part of the full frame. In that case, the decoder must collect all of the samples that belong to the frame before it decodes the whole frame. For more information, see the discussion about scatter/gather techniques later in this topic.

Thread Priority

If the upstream thread that is responsible for buffering does not receive enough CPU cycles, starvation can result. In other words, if the downstream filters consume data faster than the upstream thread can buffer it, the buffer runs out of samples and starvation occurs. Therefore, make sure that you do not run the streaming threads in the decoder at higher than normal priority.

Bandwidth

If the source filter has insufficient bandwidth to read the file or receive the stream, starvation will inevitably occur. For example, starvation can occur when a device attempts to play a high bitrate file over a relatively slow network connection. In this situation, there is not much that the decoder can do to avoid starvation.

In this example, when starvation occurs, the buffering component in the pipeline sends an EC_STARVATION notification to the Filter Graph Manager, and the Filter Graph Manager pauses the playback to let the buffers fill up so the user experiences a longer pause instead of frequent shorter glitches. However, to ensure a good user experience, the bitrate of the media must be lower than the available bandwidth at the source.
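
For reference, the following sketch shows how a buffering filter (not the decoder) might report starvation. CMyBufferingFilter and ReportStarvation are hypothetical names; NotifyEvent is the CBaseFilter helper that forwards events to the Filter Graph Manager.

// A minimal sketch: the buffering component, not the decoder, reports
// starvation so that the Filter Graph Manager can pause the graph while
// more data is buffered.
void CMyBufferingFilter::ReportStarvation()
{
    NotifyEvent(EC_STARVATION, 0, 0);
}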

Seeking, Passing Through, and Flushing

While the filter graph is running, you can manage data flow through the pipeline and reduce latency by using the operations in the following list:

  • Seek. Moves the source filter to a new position in the file and generates samples from this new position.
  • Pass-through. Allows data to move through the pipeline unchanged.
  • Flush. Instructs all downstream filters to clear the pipeline and discard any samples. You can use the flush operation to erase existing samples before a seek operation generates new samples.

Playback filter graphs handle seek requests from the Filter Graph Manager through the IMediaSeeking interface or, for older applications, the IMediaPosition interface.

Important

IMediaPosition is a deprecated interface. You should use this interface only if an application requires a time format of decimal seconds stored as a double.

These interfaces are queried on each filter’s output pin. Each filter in the filter graph has to either handle the call or pass it on to the upstream filter’s output pin. A decoder does not handle these calls, but it must be able to pass them upstream. DirectShow provides the CPosPassThru base class for this task.

To implement CPosPassThru, the decoder creates an instance of the class, passing a pointer to its input pin (which is connected to the upstream filter’s output pin), and then forwards any IUnknown::QueryInterface call for IID_IMediaSeeking or IID_IMediaPosition that is made on its output pin to that CPosPassThru instance, as shown in the following example code.

// Sample code showing use of CPosPassThru
// Assuming that m_pPosPassThru is a member of type CPosPassThru *
// m_pPosPassThru must be deleted in the COutputPin destructor

STDMETHODIMP COutputPin::NonDelegatingQueryInterface(REFIID riid, void **ppv)
{
    if (riid == IID_IMediaSeeking || riid == IID_IMediaPosition)
    {
        // If the CPosPassThru instance has not been created, create it now.
        if (!m_pPosPassThru)
        {
            HRESULT hr = S_OK;
            m_pPosPassThru = new CPosPassThru(this,  &hr, GetPin(0));
            if (!m_pPosPassThru)
            {
                return E_OUTOFMEMORY;
            }
            else if (FAILED(hr))
            {
                delete m_pPosPassThru;
                m_pPosPassThru = NULL;
                return hr;
            }
        }

        // Forward the seeking call to the pass-through object.
        return m_pPosPassThru->NonDelegatingQueryInterface(riid, ppv);
    }
    else
    {
        // Other interfaces (not shown); for example, forward them to the
        // base class, assuming COutputPin derives from CBaseOutputPin.
        return CBaseOutputPin::NonDelegatingQueryInterface(riid, ppv);
    }
}

Every decoder must also respond to flush calls. Flushing refers to discarding all streaming data in the pipeline and entering a state where incoming samples are not accepted. Flushing is necessary to support seeking. Flushing starts and ends with calls to IPin::BeginFlush and IPin::EndFlush on the input pin. CBaseInputPin already has the necessary code to set the m_bFlushing flag. The decoder must override at least IPin::BeginFlush to discard the samples in progress. It also must check for the flag within its IMemInputPin::Receive or IMemInputPin::ReceiveMultiple method and discard the samples as long as the flag is set. For more information about how to use this method to begin a flush operation, see CBaseInputPin::BeginFlush.
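
The following sketch shows one way that the decoder’s input pin might handle a flush. CDecoderInputPin, DiscardPendingSamples, DecodeAndDeliver, and m_pOutputQueue are hypothetical names; CBaseInputPin provides m_bFlushing and the m_pLock critical section, and COutputQueue::BeginFlush passes the flush downstream and discards any queued samples.

// A minimal sketch of flush handling in a decoder, assuming the decoded
// samples are pushed through a COutputQueue on the output pin.
STDMETHODIMP CDecoderInputPin::BeginFlush()
{
    CAutoLock lock(m_pLock);

    // CBaseInputPin::BeginFlush sets m_bFlushing so that Receive rejects
    // any samples that arrive while the flush is in progress.
    HRESULT hr = CBaseInputPin::BeginFlush();
    if (FAILED(hr))
    {
        return hr;
    }

    // Discard the samples that the decoder is currently holding (hypothetical helper).
    DiscardPendingSamples();

    // Pass the flush downstream; the queue also frees any queued samples.
    m_pOutputQueue->BeginFlush();
    return S_OK;
}

STDMETHODIMP CDecoderInputPin::Receive(IMediaSample *pSample)
{
    {
        CAutoLock lock(m_pLock);
        if (m_bFlushing)
        {
            // Discard incoming samples for as long as the flag is set.
            return S_FALSE;
        }
    }

    return DecodeAndDeliver(pSample);   // normal decode path (hypothetical helper)
}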

Scatter/gather Technique

Scatter/gather refers to a two-part technique. The first part, the scatter operation, breaks a large sample into multiple parts and sends the parts separately. In the second part, the gather operation, the decoder filter merges the parts to re-create the large sample. Theoretically, an upstream filter can send the parts out of order, in which case the receiver must reorder them.

Incoming frames in the filter graph may be spread across multiple fixed-size IMediaSample buffers. Most demultiplexers in Windows Embedded Compact 2013 ensure that a whole frame is contained within one sample before handing it off to the decoder. When you use one of these demultiplexers, the decoder does not have to handle scatter/gather.

However, the MPEG-2 demultiplexer does require a scatter/gather implementation. Because of the nature of MPEG-2 video encoding, the full frame size may not be known at the container level without actually parsing the sample payload. Instead of parsing the encoded bytes, the MPEG-2 demultiplexer just sends fixed-size IMediaSample buffers to the decoder. If you implement an MPEG-2 decoder, it must be able to gather the video samples.

If the underlying decoding hardware of the device supports scatter/gather, it is easy for the decoder filter to reassemble the samples. If it does not support scatter/gather, the MPEG-2 decoder filter must reconstruct the sample by copying the received parts of the sample to memory and merging them before providing the sample to the hardware. The decoder filter determines the length of the complete sample by parsing the sample header and it can determine the length of a partial sample by calling IMediaSample::GetActualDataLength.
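
The following sketch shows one way to gather MPEG-2 fragments into a contiguous buffer, assuming the decoder has already parsed the sample header to learn the full frame size and has allocated m_pFrameBuffer accordingly. CMpeg2Decoder, m_pFrameBuffer, m_cbFrameSize, m_cbCollected, and DecodeFrame are hypothetical names; GetPointer and GetActualDataLength are IMediaSample methods.

// A minimal gather sketch: append each partial sample to a reassembly buffer
// until the whole frame has been collected.
HRESULT CMpeg2Decoder::GatherFragment(IMediaSample *pSample)
{
    BYTE *pFragment = NULL;
    HRESULT hr = pSample->GetPointer(&pFragment);
    if (FAILED(hr))
    {
        return hr;
    }

    // Length of this partial sample.
    long cbFragment = pSample->GetActualDataLength();
    if (m_cbCollected + cbFragment > m_cbFrameSize)
    {
        return E_UNEXPECTED;   // more data than the parsed frame size allows
    }

    // Append the fragment to the reassembly buffer.
    memcpy(m_pFrameBuffer + m_cbCollected, pFragment, cbFragment);
    m_cbCollected += cbFragment;

    // When the whole frame has been gathered, hand it to the decoding hardware.
    if (m_cbCollected == m_cbFrameSize)
    {
        hr = DecodeFrame(m_pFrameBuffer, m_cbFrameSize);   // hypothetical helper
        m_cbCollected = 0;
    }
    return hr;
}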

MPEG-4-Specific Notes

The following notes can be helpful when you implement a decoder that uses MPEG-4 with Windows Embedded Compact 2013:

  • The MPEG-4 demultiplexer ensures that every sample fits in a buffer by requesting, on the output allocator, a buffer size equal to the largest sample in the media.

  • MPEG-4 seeking always starts from an I-Frame, also known as a key frame or an intra-encoded frame. If a seek is requested to a non-I-Frame, the demultiplexer finds the closest I-Frame and seeks to that location. This process might cause a staggered seek experience if the media that is being played is encoded with very sparse I-Frames.

    Note

    The decoding of these frames does not depend on information from other frames.

See Also

Concepts

DirectShow Decoder Filter for Windows Embedded Compact