March 2010

Volume 25 Number 03

IIS Smooth Streaming - Enhancing Silverlight Video Experiences with Contextual Data

By Jit Ghosh | March 2010

There are two primary requirements for enabling a glitch-free viewing experience in Web-based, high-definition digital video delivery. First, the video provider needs to support high video delivery bit rates over the network. Second, the client computer needs to support continuous availability of processing capacity to decode the video at its fullest resolution.

The reality, however, is that network bandwidth for connected home computers can fluctuate significantly over time, and in certain parts of the world high bandwidth comes at a very high premium or is unavailable to many consumers. Along with that, the processing capacity of the client computer can vary, depending on CPU load at any given moment. As a result, consumers are susceptible to degradation in the quality of their viewing experience when a video stutters or freezes while the player is waiting to buffer enough data to show the next set of video frames, or waiting for the CPU cycles to decode those frames.

Adaptive streaming is a video delivery approach that addresses the problem of smooth content delivery and decoding. With adaptive streaming, video content is encoded at a range of bit rates and made available through a specialized streaming server. An adaptive streaming player constantly monitors various resource utilization metrics on the client computer and uses that information to compute the bit rate that the client can most efficiently decode and display given the current resource constraints.

The player requests chunks of video encoded at that currently appropriate bit rate, and the streaming server responds with content from the video sources encoded at that bit rate. As a result, when resource conditions degrade, the player can continue displaying the video without any significant disruptions, with only a slight degradation in overall resolution, until an improvement or further degradation in conditions causes a different bit rate to be requested.

This kind of a continuous collaboration between the player and the server requires a special implementation of processing logic on both the streaming server and the client runtime implementing the player. Internet Information Server (IIS) Smooth Streaming is the server-side implementation of adaptive streaming over HTTP from Microsoft. The client-side implementation is provided as an extension to Microsoft Silverlight.

The IIS Smooth Streaming Player Development Kit is a Silverlight library that lets applications consume content being streamed over IIS Smooth Streaming. It also provides a rich API that offers programmatic access to various aspects of the Smooth Streaming logic.

In this article I will walk you through the basics of Smooth Streaming, and explain how you can use the IIS Smooth Streaming Player Development Kit to build rich user experiences around video. Specifically, I will look at using the Player Development Kit to consume a stream, with a close examination of the client-side data model for streams and tracks. I will show you how to consume additional data streams, such as closed captions and animations, and merge external data streams with an existing presentation. You’ll see how to schedule external clips such as advertisements within a presentation, handle variable playback rates and build composite manifests that lend to robust editing scenarios.

How Smooth Streaming Works

You can encode video for Smooth Streaming by using one of the supplied profiles in Expression Encoder 3.0. For one source video file, several files are created in the destination folder. Figure 1 shows the files created for a source video named FighterPilot.wmv.


Figure 1 Files Generated for Smooth Streaming by Expression Encoder

Each of the files with an .ismv extension contains the video encoded at a specific bit rate. For example, FighterPilot_331.ismv contains the video encoded at a bit rate of 331 kbps, while FighterPilot_2056.ismv contains the video encoded at 2,056 kbps (roughly 2 Mbps).

For each bit rate, the video content is broken into two-second fragments, and the .ismv files store these fragments in a file format called Protected Interoperable File Format (PIFF). Note that you can have additional audio tracks (or just audio in case the presentation is audio only) encoded in similar files that have an .isma extension.

Getting the Smooth Streaming Environment

To try out the examples discussed in this article, you will need to prepare a Smooth Streaming environment on your development machines.

The server-side ingredient is straightforward: you’ll need to download and install IIS Media Services 3.0 for IIS 7 from iis.net/media using the Microsoft Web Platform Installer.

You will need a copy of Microsoft Expression Encoder 3.0 to prepare video for Smooth Streaming. While there is a free evaluation version of Expression Encoder 3.0, that version does not include support for Smooth Streaming. You will need a licensed installation of Expression Encoder to create your own video.

The FighterPilot.ism file is a server manifest, which is structured in Synchronized Multimedia Integration Language (SMIL) format and contains a mapping of quality levels and bit rates to the .ismv and .isma files. This mapping in the server manifest is used by the server to access the right disk files to create the next fragment of content encoded at the right bit rate, before responding to a client side request. Figure 2 shows an excerpt of a server manifest file.

Figure 2 Sample Server Manifest

<smil xmlns="https://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta name="clientManifestRelativePath"
      content="FighterPilot.ismc" />
  </head>
  <body>
    <switch>
      <video src="FighterPilot_2962.ismv"
        systemBitrate="2962000">
        <param name="trackID"
          value="2" valuetype="data" />
      </video>
      <video src="FighterPilot_2056.ismv"
        systemBitrate="2056000">
        <param name="trackID"
          value="2" valuetype="data" />
      </video>
      ...
      <audio src="FighterPilot_2962.ismv"
        systemBitrate="64000">
        <param name="trackID"
          value="1" valuetype="data" />
      </audio>
    </switch>
  </body>
</smil>

The server manifest also contains a mapping to a client manifest file (identified by the extension .ismc), which in my example is FighterPilot.ismc. The client manifest contains all the information that the Silverlight client will need to access the various media and data streams, as well as metadata about those streams, such as quality levels, available bit rates, timing information, codec initialization data and so on. The client-side logic will use this metadata to sample and decode the fragments and request bit rate switches based on prevailing local conditions.

At run time, the presentation begins with the client requesting the client manifest from the server. Once the client receives the manifest, it checks to see what bit rates are available and requests fragments of content starting at the lowest available bit rate. In response, the server prepares and sends the fragments by reading the data from the disk file encoded at that bit rate (using the mapping in the server manifest). The content is then displayed on the client.

The client gradually requests higher bit rates as allowed by the resource-monitoring logic, and eventually reaches the highest allowable bit rate as determined by the prevailing resource conditions. This interchange continues until the client’s monitoring logic senses a change in resource conditions resulting in a different lower desired bit rate. Subsequent client requests are for media encoded at the new bit rate, and the server again responds accordingly. This goes on until the presentation completes or is stopped.
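
For illustration, the client builds each fragment request from the URL template published in the client manifest (shown later in Figure 3). Assuming a hypothetical host path and time offset, a request for a two-second video fragment encoded at 331 kbps might look like this:

GET https://localhost/SmoothStreaming/Media/FighterPilot/FighterPilot.ism/QualityLevels(331000)/Fragments(video=40040000)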

Smooth Streaming with Silverlight

Getting video to play in Silverlight is a fairly uncomplicated effort. At a fundamental level, all you really need to do is add an instance of the MediaElement type to your XAML file, set the appropriate properties to control the MediaElement behavior, and make sure that the MediaElement.Source property points to a valid media source URI. For example, this XAML will play the FighterPilot.wmv video automatically as soon as the Silverlight page is launched, in a 640x360 rectangle:

<MediaElement AutoPlay="True" 
  Source="https://localhost/Media/FighterPilot.wmv" 
  Width="640" Height="360" />

The System.Windows.Controls.MediaElement type also exposes an API that allows you to control the behavior of the play experience in code and to build a player complete with standard controls like Play, Pause, Seek and so on. This approach works great with either progressively downloaded or HTTP streamed media as long as the container format and the encoding used is one that the Silverlight runtime has built-in support for.
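
As a minimal sketch (assuming a MediaElement named player and a few buttons wired up in XAML, which are not part of the article’s sample), the transport controls map directly onto that API:

private void PlayButton_Click(object sender, RoutedEventArgs e) {
  player.Play();
}

private void PauseButton_Click(object sender, RoutedEventArgs e) {
  player.Pause();
}

private void SeekButton_Click(object sender, RoutedEventArgs e) {
  //jump 30 seconds into the presentation if the media supports seeking
  if (player.CanSeek)
    player.Position = TimeSpan.FromSeconds(30);
}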

What about file formats or codecs that are not supported out of the box by Silverlight? The MediaStreamSource (MSS) type enables an extensibility mechanism that allows you to take control of the media file parsing and decoding process by introducing your own custom parser and decoder into the Silverlight media pipeline. To do this, you need to implement a concrete type extending the abstract System.Windows.Media.MediaStreamSource, and then pass an instance of it to MediaElement using the MediaElement.SetSource method.

The MSS implementation will need to handle every aspect of the media consumption process short of the actual rendering—from receiving the media stream from a remote location, to parsing the container and associated metadata, to sampling individual audio and video samples and passing them to MediaElement for rendering.
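
You will not need a custom MSS for Smooth Streaming (the Player Development Kit supplies that logic, as described next), but for context, a concrete implementation overrides the abstract members of the Silverlight MediaStreamSource contract and is handed to the player via SetSource. The following is a bare skeleton, not a working parser:

public class MyCustomSource : System.Windows.Media.MediaStreamSource {
  //parse container metadata and report the available streams
  protected override void OpenMediaAsync() { /* ... */ }
  //hand individual audio/video samples to the pipeline on demand
  protected override void GetSampleAsync(MediaStreamType mediaStreamType) { /* ... */ }
  protected override void SeekAsync(long seekToTime) { /* ... */ }
  protected override void SwitchMediaStreamAsync(
    MediaStreamDescription mediaStreamDescription) { /* ... */ }
  protected override void GetDiagnosticAsync(
    MediaStreamSourceDiagnosticKind diagnosticKind) { /* ... */ }
  protected override void CloseMedia() { /* ... */ }
}

//usage: mediaElement.SetSource(new MyCustomSource());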

Because the logic required to decode Smooth Streaming was not built into Silverlight, the first version of Smooth Streaming (part of IIS Media Services 2.0) was accompanied by a custom MSS implementation that handled all of the communication, parsing, and sampling logic, and also implemented the machine and network state-monitoring functionality.

For the most part, this approach worked well for Smooth Streaming, but there were a few shortcomings. The MSS is essentially a black box in that the only API it exposes directly is to facilitate interchange of raw audio and video samples between itself and a MediaElement. As a Silverlight developer, you do not have a direct way to interface with the MSS while in action. If the content being consumed had additional data like embedded text, animation or secondary camera angles, or if the streaming solution allowed for finer-grained control over the streams like variable playback rates, there was no way for you to programmatically access that additional data in a structured way because you were limited to interfacing with the fixed API set that MediaElement always exposes.

For Smooth Streaming, this poses a challenge. As you will see later in this article, the Smooth Streaming manifests and wire/file formats are pretty rich in terms of the additional content and metadata that can be carried, and with the MSS approach you could not get at that information. You need a Silverlight API that offers more control over and access to the Smooth Streaming solution.

IIS Smooth Streaming Player Development Kit

And that brings me to the IIS Smooth Streaming Player Development Kit. The Player Development Kit consists of a single assembly named Microsoft.Web.Media.SmoothStreaming.dll. At its heart is a type named Microsoft.Web.Media.SmoothStreaming.SmoothStreamingMediaElement (SSME). Using SSME in your code is almost identical to the way you would use a regular MediaElement:

<UserControl x:Class="SSPlayer.Page"
  xmlns="https://schemas.microsoft.com/winfx/2006/xaml/presentation" 
  xmlns:x="https://schemas.microsoft.com/winfx/2006/xaml" 
  xmlns:ss="clr-namespace:Microsoft.Web.Media.SmoothStreaming;assembly=Microsoft.Web.Media.SmoothStreaming">
  <Grid x:Name="LayoutRoot" Background="White">
    <ss:SmoothStreamingMediaElement AutoPlay="True" 
      Width="640" Height="360"
      SmoothStreamingSource="https://localhost/SmoothStreaming/Media/FighterPilot/FighterPilot.ism/manifest"/>
  </Grid>
</UserControl>

The SmoothStreamingSource property points SSME to a valid Smooth Streaming presentation. In general, the SSME API is a superset of the MediaElement API; this property is one of the few differences. SSME exposes the Source property just like MediaElement does, but SSME also exposes the SmoothStreamingSource property to attach to smooth streams. If you are authoring players that need the ability to consume both smooth streams and the other formats traditionally supported by MediaElement, you can safely use SSME, but you will likely need to author some code to set the right property to attach to the media source. Something like this:

private void SetMediaSource(string MediaSourceUri, 
  SmoothStreamingMediaElement ssme) {

  if (MediaSourceUri.Contains(".ism"))
    ssme.SmoothStreamingSource = new Uri(MediaSourceUri); 
  else
    ssme.Source = new Uri(MediaSourceUri); 
}

The other major difference to keep in mind is that SSME does not expose a SetSource overload that accepts a MediaStreamSource type. If you need to use a custom MSS, you should do that through MediaElement. 

Streams and Tracks

The Smooth Streaming client manifest contains rich metadata about the presentation and it can be useful to have programmatic access to that metadata inside your player application. SSME exposes parts of this metadata through a well-defined API in an arrangement of streams and tracks within each stream.

A stream represents the overall metadata for tracks of a specific type—video, audio, text, advertisements and so on. The stream also acts as a container for multiple tracks of the same underlying type. In a client manifest (see Figure 3), each StreamIndex entry represents a stream. There can be multiple streams in the presentation, as depicted by the multiple StreamIndex entries. There can also be multiple streams of the same type. In such cases, the stream name can be used to disambiguate among multiple occurrences of the same type.

Figure 3 Excerpt from a Client Manifest

<SmoothStreamingMedia MajorVersion="2" MinorVersion="0" 
  Duration="1456860000">
  <StreamIndex Type="video" Chunks="73" QualityLevels="8" 
    MaxWidth="1280" MaxHeight="720" 
    DisplayWidth="1280" DisplayHeight="720"
    Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Index="0" Bitrate="2962000" FourCC="WVC1" 
      MaxWidth="1280" MaxHeight="720"
      CodecPrivateData="250000010FD37E27F1678A27F859E80490825A645A64400000010E5A67F840" />
    <QualityLevel Index="1" Bitrate="2056000" FourCC="WVC1" 
      MaxWidth="992" MaxHeight="560" 
      CodecPrivateData="250000010FD37E1EF1178A1EF845E8049081BEBE7D7CC00000010E5A67F840" />
    ...
    <c n="0" d="20020000" />
    <c n="1" d="20020000" />
    ...
    <c n="71" d="20020000" />
    <c n="72" d="15010001" />
  </StreamIndex>
  <StreamIndex Type="audio" Index="0" FourCC="WMAP" 
    Chunks="73" QualityLevels="1" 
    Url="QualityLevels({bitrate})/Fragments(audio={start time})">
    <QualityLevel Bitrate="64000" SamplingRate="44100" Channels="2" 
      BitsPerSample="16" PacketSize="2973" AudioTag="354" 
      CodecPrivateData="1000030000000000000000000000E00042C0" />
    <c n="0" d="21246187" />
    <c n="1" d="19620819" />
    ...
    <c n="71" d="19504762" />
    <c n="72" d="14900906" />
  </StreamIndex>
  <StreamIndex Type="text" Name="ClosedCaptions" Subtype="CAPT" 
    TimeScale="10000000" ParentStreamIndex="video" 
    ManifestOutput="TRUE" QualityLevels="1" Chunks="2" 
    Url="QualityLevels({bitrate},{CustomAttributes})/Fragments(ClosedCaptions={start time})">
    <QualityLevel Index="0" Bitrate="1000" 
      CodecPrivateData="" FourCC=""/> 
    <c n="0" t="100000000">
      <f>...</f> 
    </c>
    <c n="1" t="150000000">
      <f>...</f>
    </c>
  </StreamIndex>
  ...
</SmoothStreamingMedia>

The StreamInfo type represents the stream in your Silverlight code. Once SSME downloads the client manifest, it raises the SmoothStreamingMediaElement.ManifestReady event. At this point the SmoothStreamingMediaElement.AvailableStreams collection property contains a StreamInfo instance for each StreamIndex entry in the client manifest.

For a given video stream in the client manifest, the video track is broken into many fragments of two-second duration, and each c element in the manifest represents metadata for the fragment. In this case, the fragments in the track are contiguous and define the entire duration of the video track without any breaks in between—in other words, the stream is not sparse.

For a closed caption stream, the track includes only two fragments, each with individual timing information (the t attribute on the c element). Further, the ParentStreamIndex attribute is set to “video,” parenting the closed caption stream with the video stream. This causes the closed caption stream to align with the timing information from the video stream—the closed caption stream starts and ends exactly with its parent video stream, and the first caption is displayed 10 seconds into the video stream while the second is displayed 15 seconds into the video. A stream in which the timeline is based on a parent stream and the fragments are non-contiguous is called a sparse stream.

A track is a timed sequence of fragments of content of a specific type—video, audio or text. Each track is represented using an instance of a TrackInfo type, and all the tracks in a stream are made available through the StreamInfo.AvailableTracks collection property.

Each track in a client manifest is uniquely identified via a QualityLevel. A QualityLevel is identified by the associated bit rate, and is exposed through the TrackInfo.Bitrate property. For example, a video stream in a client manifest may have several QualityLevels, each with a unique bit rate. Each represents a unique track of the same video content, encoded at the bit rate specified by the QualityLevel.
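
As a simple illustration of this object model, once ManifestReady fires you can walk the available streams and log each track’s bit rate (System.Diagnostics.Debug is used here just for output):

ssme.ManifestReady += (s, e) => {
  foreach (StreamInfo stream in ssme.AvailableStreams) {
    System.Diagnostics.Debug.WriteLine(
      "Stream: " + stream.Type + " / " + stream.Name);
    foreach (TrackInfo track in stream.AvailableTracks) {
      //for a video stream, each track maps to a QualityLevel/bit rate
      System.Diagnostics.Debug.WriteLine("  Track bit rate: " + track.Bitrate);
    }
  }
};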

Custom Attributes and Manifest Output

Custom attributes are a way to add additional stream- or track-specific information to the manifest. Custom attributes are specified using a CustomAttribute element, which can contain multiple data elements expressed as key/value pairs. Each data element is expressed as an Attribute element, with Key and Value attributes specifying the data element key and the data element value. In cases where distinct quality levels do not apply, such as multiple tracks within a stream with the same track name and bit rate, a custom attribute can also be used to disambiguate tracks from each other. Figure 4 shows an example of custom attribute usage.

Figure 4 Using Custom Attributes in the Client Manifest

<StreamIndex Type="video" Chunks="12" QualityLevels="2" 
  MaxWidth="1280" MaxHeight="720" 
  DisplayWidth="1280" DisplayHeight="720" 
  Url="QualityLevels({bitrate})/Fragments(video={start time})">
  <CustomAttributes>
    <Attribute Key="CameraAngle" Value="RoofCam"/>
    <Attribute Key="AccessLevel" Value="PaidSubscription"/>
  </CustomAttributes>
  <QualityLevel Index="0" Bitrate="2962000" FourCC="WVC1" 
    MaxWidth="1280" MaxHeight="720"
    CodecPrivateData="250000010FD37E27F1678A27F859E80490825A645A64400000010E5A67F840">
    <CustomAttributes>
      <Attribute Key="hardwareProfile" Value="10000" />
    </CustomAttributes>
  </QualityLevel>
...
</StreamIndex>

Custom attributes added to a manifest do not affect any SSME behavior automatically. They are a way for the production workflow to introduce custom data into the manifest that your player code can receive and act upon. For example, in Figure 4, you may want to look for the AccessLevel custom attribute key in the video stream custom attributes collection, and expose that video stream only to paying subscribers as instructed in the attribute’s value.

The StreamInfo.CustomAttributes collection property exposes a dictionary of string key/value pairs for all custom attributes applied at the stream level (as direct CustomAttribute child elements to the StreamIndex element). The TrackInfo.CustomAttributes property exposes the same for all custom attributes applied at the track level (as direct children to the QualityLevel element).
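
For instance, a sketch of the subscription check described above (the subscriber object is hypothetical, and the code assumes the CustomAttributes dictionary exposes standard string key/value lookup) could read the stream-level attribute like this:

StreamInfo videoStream = ssme.GetStreamInfoForStreamType("video");
string accessLevel;
if (videoStream.CustomAttributes.TryGetValue("AccessLevel", out accessLevel) &&
  accessLevel == "PaidSubscription" &&
  subscriber.AccessLevel != "Premium") {
  //hypothetical handling: hide this stream from non-paying users
}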

When the ManifestOutput attribute on the stream (the StreamIndex element) is set to TRUE, the client manifest can actually contain the data representing each fragment for the tracks within the stream. Figure 5 shows an example.

Figure 5 Manifest Output

<StreamIndex Type="text" Name="ClosedCaptions" Subtype="CAPT" 
  TimeScale="10000000" ParentStreamIndex="video" 
  ManifestOutput="TRUE" QualityLevels="1" Chunks="6" 
  Url="QualityLevels({bitrate},{CustomAttributes})/Fragments(ClosedCaptions={start time})"> 
  <QualityLevel Index="0" Bitrate="1000" CodecPrivateData="" FourCC=""/> 
  <c n="0" t="100000000">
    <f>PENhcHRpb24gSWQ9IntERTkwRkFDRC1CQzAxLTQzZjItQTRFQy02QTAxQTQ5QkFGQkJ9IiAKICAgICAgICBBY3Rp</f>
  </c>
  <c n="1" t="150000000">
    <f>PENhcHRpb24gSWQ9IntERTkwRkFDRC1CQzAxLTQzZjItQTRFQy02QTAxQTQ5QkFGQkJ9IiAKICAgI</f>
  </c>
...
</StreamIndex>

Note the nested content within the f elements—each represents caption item data to be displayed at the time specified by the containing chunk. The client manifest specification requires that the data be represented as a base64-encoded string version of the original data item.

The TrackInfo.TrackData collection property contains a list of TimelineEvent instances—one for each f element corresponding to the track. For each TimelineEvent entry, TimelineEvent.EventTime represents the time point in the sequence and the TimelineEvent.EventData provides the base64-encoded text string. TrackInfo also supports Bitrate, CustomAttributes, Index, Name and ParentStream properties.

Selecting Streams and Tracks

There are many interesting ways you can use the streams and tracks metadata and API in your application code.

It may be useful to have the ability to select specific tracks within a stream and filter out the rest. A common scenario is a graded viewing experience based on a subscriber’s level of access, where for a basic or free level you serve the low-resolution version of the content, and expose the high-definition version only to premium-level subscribers:

if (subscriber.AccessLevel != "Premium") {
  StreamInfo videoStream = 
    ssme.GetStreamInfoForStreamType("video");
  List<TrackInfo> allowedTracks = 
    videoStream.AvailableTracks.Where((ti) => 
    ti.Bitrate < 1000000).ToList();
  ssme.SelectTracksForStream(
    videoStream, allowedTracks, false);
}

GetStreamInfoForStreamType accepts a stream type literal and returns the matching StreamInfo instance. A LINQ query on StreamInfo.AvailableTracks retrieves a list of tracks that offer a bit rate of less than 1 Mbps—in other words, a standard-definition video for non-premium subscribers. The SelectTracksForStream method can then be used to filter down the list of tracks in that stream to only the tracks you want to expose.

The last parameter to SelectTracksForStream, when set to true, indicates to SSME that any data stored in the look-ahead buffers should be cleaned out immediately. To get the current selected list of tracks at any time, you can use the StreamInfo.SelectedTracks property, while the StreamInfo.AvailableTracks property continues to expose all the available tracks.

Remember that Smooth Streaming allows multiple streams of the same type to coexist in the client manifest. In the current beta of the IIS Smooth Streaming Player Development Kit, the GetStreamInfoForStreamType method returns the first occurrence of a stream of the specified type in case there are multiple streams of that type, which may not be what you desire. However, there is nothing stopping you from bypassing this method and instead using a query on the AvailableStreams collection directly to get the right StreamInfo. The following snippet shows a LINQ query that gets a text stream named “ticker”:

StreamInfo tickerStream = 
  ssme.AvailableStreams.Where((stm) => 
  stm.Type == "text" && 
  stm.Name == "ticker").FirstOrDefault();

Using Text Streams

An audio/video presentation may need to display additional content that is timed along the primary video sequence at specific time points. Examples could be closed captions, advertisements, news alerts, overlay animations and so on. A text stream is a convenient place to expose such content.

One approach to include a text stream in your presentation would be to mux in the text tracks alongside the video tracks during the video encoding, so that the content chunks for the text track are delivered from the server, appropriately timed with the video.

Another option is to utilize the manifest output feature discussed earlier to author the text content into the client manifest itself. Let’s take a closer look at this second approach.

To start, you need to prepare a client manifest with the text streams. In a production media workflow, there can be many different ways to inject such content into the manifest during or after encoding, and the data could be coming from several different sources, like ad-serving platforms and caption generators. But, for this example, I am going to use a simple XML data file as the data source, use some LINQ over XML queries to manufacture the text streams, and insert them into an existing client manifest.

The structure of the data does not need to be complex. (You can find the full file in the code download for this article. I will show excerpts here for illustration.) The data file begins with a Tracks element, then contains two ContentTrack elements. Each ContentTrack entry will ultimately result in one distinct text stream in the client manifest. The first ContentTrack element is for the captions:

<ContentTrack Name="ClosedCaptions" Subtype="CAPT">

The second is for animations:

<ContentTrack Name="Animations" Subtype="DATA">

Each ContentTrack contains multiple Event elements, with the time attributes specifying the time points on the video’s timeline when these text events need to occur. The Event elements in turn contain the actual caption events defined in XML, or the XAML for the animation as CDATA sections:

<Event time="00:00:10"> 
  <![CDATA[<Caption Id="{DE90FACD-BC01-43f2-A4EC-6A01A49BAFBB}" 
    Action="ADD">
    Test Caption 1
  </Caption>]]>
</Event>
<Event time="00:00:15"> 
  <![CDATA[<Caption Id="{DE90FACD-BC01-43f2-A4EC-6A01A49BAFBB}" 
    Action="REMOVE"/>] ]> 
</Event>

Note that for each added closed caption event, there is a corresponding event that indicates the time point when the previously added caption needs to be removed. The Caption element contained within the CDATA section for a closed caption event defines an Action attribute with a value of Add or Remove to indicate appropriate action.

My LINQ over XML code transforms the XML data into appropriate entries for a client manifest, and inserts them into an existing client manifest file. You can find an example in the code download for this article, but note that the data format demonstrated is not a part of the Smooth Streaming Player Development Kit or the Smooth Streaming specification, nor is it prescriptive in any way. You can define whatever data structure suits the needs of your application, as long as you can transform it into the appropriate format required by the Smooth Streaming client manifest specification, which includes encoding the text content in the CDATA sections to a base64 format.
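
As a rough sketch of the core of such a transform (assuming the Tracks/ContentTrack/Event data file shown earlier and a hypothetical file name; the output element and attribute names follow the client manifest excerpts in this article), each CDATA payload is base64-encoded into an f element inside a timed c element:

XDocument data = XDocument.Load("ContentTracks.xml"); //hypothetical data file
long timeScale = 10000000; //100-nanosecond units, matching the client manifest

IEnumerable<XElement> textStreams =
  from track in data.Root.Elements("ContentTrack")
  select new XElement("StreamIndex",
    new XAttribute("Type", "text"),
    new XAttribute("Name", (string)track.Attribute("Name")),
    new XAttribute("Subtype", (string)track.Attribute("Subtype")),
    new XAttribute("TimeScale", timeScale),
    new XAttribute("ParentStreamIndex", "video"),
    new XAttribute("ManifestOutput", "TRUE"),
    new XAttribute("QualityLevels", 1),
    new XAttribute("Chunks", track.Elements("Event").Count()),
    new XAttribute("Url", "QualityLevels({bitrate},{CustomAttributes})/Fragments(" +
      (string)track.Attribute("Name") + "={start time})"),
    new XElement("QualityLevel",
      new XAttribute("Index", 0), new XAttribute("Bitrate", 1000),
      new XAttribute("CodecPrivateData", ""), new XAttribute("FourCC", "")),
    track.Elements("Event").Select((ev, i) =>
      new XElement("c",
        new XAttribute("n", i),
        new XAttribute("t", (long)(TimeSpan.Parse(
          (string)ev.Attribute("time")).TotalSeconds * timeScale)),
        new XElement("f",
          Convert.ToBase64String(Encoding.UTF8.GetBytes(ev.Value.Trim()))))));

//the resulting StreamIndex elements are then inserted as children of the
//SmoothStreamingMedia root element in the existing client manifest (.ismc)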

Once the transformation is executed, the resulting client manifest file will contain the text streams as shown in Figure 6.

Figure 6 Client Manifest Excerpt with Text Content Streams

<SmoothStreamingMedia MajorVersion="2" MinorVersion="0" 
  Duration="1456860000">
  <StreamIndex Type="video" Chunks="73" QualityLevels="8" 
    MaxWidth="1280" MaxHeight="720" 
    DisplayWidth="1280" DisplayHeight="720"
    Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Index="0" Bitrate="2962000" FourCC="WVC1" 
      MaxWidth="1280" MaxHeight="720"
      CodecPrivateData="250000010FD37E27F1678A27F859E80490825A645A64400000010E5A67F840" />
    <QualityLevel Index="1" Bitrate="2056000" FourCC="WVC1" 
      MaxWidth="992" MaxHeight="560" 
      CodecPrivateData="250000010FD37E1EF1178A1EF845E8049081BEBE7D7CC00000010E5A67F840" />
    ...
    <c n="0" d="20020000" />
    <c n="1" d="20020000" />
    ...
    <c n="71" d="20020000" />
    <c n="72" d="15010001" />
  </StreamIndex>
  <StreamIndex Type="audio" Index="0" FourCC="WMAP" 
    Chunks="73" QualityLevels="1" 
    Url="QualityLevels({bitrate})/Fragments(audio={start time})">
    <QualityLevel Bitrate="64000" SamplingRate="44100" Channels="2" 
      BitsPerSample="16" PacketSize="2973" AudioTag="354" 
      CodecPrivateData="1000030000000000000000000000E00042C0" />
    <c n="0" d="21246187" />
    <c n="1" d="19620819" />
    ...
    <c n="71" d="19504762" />
    <c n="72" d="14900906" />
  </StreamIndex>
  <StreamIndex Type="text" Name="ClosedCaptions" Subtype="CAPT" 
    TimeScale="10000000" ParentStreamIndex="video" 
    ManifestOutput="TRUE" QualityLevels="1" Chunks="2" 
    Url="QualityLevels({bitrate},{CustomAttributes})/Fragments(ClosedCaptions={start time})">
    <QualityLevel Index="0" Bitrate="1000" 
      CodecPrivateData="" FourCC=""/> 
    <c n="0" t="100000000">
      <f>...</f> 
    </c>
    <c n="1" t="150000000">
      <f>...</f>
    </c>
  </StreamIndex>
  ...
</SmoothStreamingMedia>

The video and the audio streams already existed in the client manifest shown in Figure 6, and I added the two text streams, named ClosedCaptions and Animations, respectively. Note that each stream uses the video stream as its parent and sets ManifestOutput to true. The former is because the text streams are sparse in nature and parenting them to the video stream ensures correct timing of each text content entry (the c elements) along the video stream’s timeline. The latter is to ensure that the SSME reads the actual data (the base64-encoded strings within the f elements) from the manifest itself.

TimelineEvent and TimelineMarker

Now let’s look at making use of the additional text content in SSME. SSME exposes the additional text streams as StreamInfo instances in the AvailableStreams property, with each StreamInfo containing the track data as a TrackInfo instance. The TrackInfo.TrackData collection property will contain as many instances of the TimelineEvent type as there are text events in each text track. The TimelineEvent.EventData property exposes a byte array representing the string content (decoded from its base64-encoded format), while the TimelineEvent.EventTime property exposes the time point where this event needs to occur.

When you start playing the presentation, as these events are reached, SSME raises the TimelineEventReached event. Figure 7 shows a sample of handling the closed caption and animation tracks that were added to the client manifest in Figure 6.

Figure 7 Handling the TimelineEventReached Event

ssme.TimelineEventReached += 
  new EventHandler<TimelineEventArgs>((s, e) => { 
  //if closed caption event
  if (e.Track.ParentStream.Name == "ClosedCaptions" && 
    e.Track.ParentStream.Subtype == "CAPT") {

    //base64 decode the content and load the XML fragment
    XElement xElem = XElement.Parse(
      Encoding.UTF8.GetString(e.Event.EventData,
      0, e.Event.EventData.Length));

    //if we are adding a caption
    if (xElem.Attribute("Action") != null && 
      xElem.Attribute("Action").Value == "ADD") {

      //remove the text block if it exists
      UIElement captionTextBlock = MediaElementContainer.Children.
      Where((uie) => uie is FrameworkElement && 
        (uie as FrameworkElement).Name == (xElem.Attribute("Id").Value)).
        FirstOrDefault() as UIElement;
        if(captionTextBlock != null)
          MediaElementContainer.Children.Remove(captionTextBlock);

      //add a TextBlock 
      MediaElementContainer.Children.Add(new TextBlock() {
        Name = xElem.Attribute("Id").Value,
        Text = xElem.Value,
        HorizontalAlignment = HorizontalAlignment.Center,
        VerticalAlignment = VerticalAlignment.Bottom,
        Margin = new Thickness(0, 0, 0, 20),
        Foreground = new SolidColorBrush(Colors.White),
        FontSize = 22
      });
    }
    //if we are removing a caption
    else if (xElem.Attribute("Action") != null && 
      xElem.Attribute("Action").Value == "REMOVE") {

      //remove the TextBlock
      MediaElementContainer.Children.Remove(
        MediaElementContainer.Children.Where(
        (uie) => uie is FrameworkElement && 
        (uie as FrameworkElement).Name == 
        (xElem.Attribute("Id").Value)).FirstOrDefault() 
        as UIElement);
    }
  }

  //Logic for animation event
  ...
});

As each TimelineEvent is handled, you either insert a TextBlock into the UI to display a caption or load the animation XAML string and start the animation (see the downloadable code for details of the animation-handling logic).

Note that because the text content is base64-encoded, it is decoded back to its original form before use. Also note that the code checks the Action attribute on the Caption element to decide whether it is adding a caption to the UI or removing an existing caption. For animation events, you can rely on an animation’s own completion handler to remove it from the UI.

Figure 8 shows a screenshot of a caption being displayed and an ellipse being animated overlaid on a playing video. While this approach works well, there is one issue you need to consider before using this technique. The current release of SSME handles TimelineEvents at two-second boundaries. To understand this better, let’s say you had a closed caption timed at the 15.5-second time point along the video timeline. SSME would raise the TimelineEventReached event for this closed caption at the closest previous time point that is a multiple of 2—in other words, at approximately 14 seconds.


Figure 8 Content Overlay Using Text Content Streams and TimelineEvents

If your scenario demands greater accuracy and you can’t position your content chunks close to two-second boundaries, using the TimelineEventReached event to handle the content tracks may not be the right approach. You can, however, use the TimelineMarker class (as used in the standard MediaElement type) to add markers to your timeline that can raise the MarkerReached event at any granularity you may need. The code download for this article includes the outline of an AddAndHandleMarkers method that adds TimelineMarkers for each content event and responds to them in the MarkerReached event handler.
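
A minimal sketch of such an AddAndHandleMarkers method (assuming TimelineEvent.EventTime is a TimeSpan, and storing the decoded event text as the marker payload) might look like this:

private void AddAndHandleMarkers(StreamInfo captionStream) {
  //create a marker for each content event in the first track
  foreach (TimelineEvent te in captionStream.AvailableTracks[0].TrackData) {
    ssme.Markers.Add(new TimelineMarker {
      Time = te.EventTime, //assumes EventTime is a TimeSpan
      Type = "Caption",
      Text = Encoding.UTF8.GetString(te.EventData, 0, te.EventData.Length)
    });
  }

  ssme.MarkerReached += (s, e) => {
    if (e.Marker.Type == "Caption") {
      //parse and display (or remove) the caption as in Figure 7
      XElement xElem = XElement.Parse(e.Marker.Text);
      //...
    }
  };
}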

Merging External Manifests

Earlier you saw an example of adding additional streams of content to a client manifest. That approach works well if you have access to the client manifest, but you may encounter situations where direct access to the client manifest to make the necessary additions is not possible. You may also encounter situations where the additional content streams are conditionally dependent on other factors (for example, closed captions in different languages for different locales). Adding the data for all possible conditions to the client manifest causes SSME to spend more time parsing and loading the manifest.

SSME solves this problem by allowing you to merge external manifest files at run time into the original client manifest, giving you the ability to bring in additional data streams and act upon the data as shown before, without having to modify the original client manifest.

Here is an example of manifest merging:

ssme.ManifestMerge += new 
  SmoothStreamingMediaElement.ManifestMergeHandler((sender) => {
  object ParsedExternalManifest = null;
  //URI of the right external manifest based on current locale
  //for example, for the en-US locale this expands to
  //https://localhost/SmoothStreaming/Media/FighterPilot/en-US/CC.xml
  string UriString = 
    string.Format(
    "https://localhost/SmoothStreaming/Media/FighterPilot/{0}/CC.xml", 
    CultureInfo.CurrentCulture.Name);
  //parse the external manifest - timeout in 3 secs
  ssme.ParseExternalManifest(new Uri(UriString), 3000, 
    out ParsedExternalManifest);
  //merge the external manifest
  ssme.MergeExternalManifest(ParsedExternalManifest); 
});

This code snippet checks the prevailing locale and uses an appropriate external manifest file (named CC.xml and stored in a folder named for the locale’s language identifier) that contains closed captions in the right language for that locale. The ParseExternalManifest method accepts a URI pointing to the location of the external manifest and returns the parsed manifest as an object through its third, out parameter. The second parameter accepts a timeout value, allowing you to avoid blocking for too long on the network call.

The MergeExternalManifest method accepts the parsed manifest object returned from the previous call and does the actual merging. Following this, the streams and tracks from any merged external manifest are made available anywhere else in your player code as StreamInfo and TrackInfo instances, and can be acted upon as shown earlier.

It is important to note that the calls to ParseExternalManifest and MergeExternalManifest can only be made in the ManifestMerge event handler. Any calls to these methods outside the scope of this event handler raise an InvalidOperationException.

Keep in mind that external manifests need to have an extension with an associated MIME type registered on the Web server from which they are served. Using a common extension such as .xml is a good idea because the content is XML anyway. If the external manifest files are served from the same Web server that is acting as your Smooth Streaming server, you should refrain from using the .ismc extension, because the IIS Media Services handler prevents .ismc files from being accessed directly and ParseExternalManifest will fail to download the external manifest.

As far as the structure of an external manifest goes, it needs to be identical to a regular client manifest: a top-level SmoothStreamingMedia element, with appropriate StreamIndex child elements to represent your data.
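
For example, an external manifest carrying a closed caption stream (mirroring the text stream structure from Figure 6; the values here are purely illustrative) might look like this:

<?xml version="1.0" encoding="utf-8"?>
<SmoothStreamingMedia MajorVersion="2" MinorVersion="0">
  <StreamIndex Type="text" Name="ClosedCaptions" Subtype="CAPT"
    TimeScale="10000000" ParentStreamIndex="video"
    ManifestOutput="TRUE" QualityLevels="1" Chunks="2"
    Url="QualityLevels({bitrate},{CustomAttributes})/Fragments(ClosedCaptions={start time})">
    <QualityLevel Index="0" Bitrate="1000" CodecPrivateData="" FourCC=""/>
    <c n="0" t="100000000">
      <f>...base64-encoded caption data...</f>
    </c>
    <c n="1" t="150000000">
      <f>...base64-encoded caption data...</f>
    </c>
  </StreamIndex>
</SmoothStreamingMedia>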

Clip Scheduling

You may face the need to insert additional video clips into a presentation at specific time points. Advertisement videos, breaking news or filler clips in a presentation are just a few examples. The problem can be viewed in two parts. First, acquiring the necessary content data and determining where in the timeline to insert it. Second, actually scheduling and playing the clips. SSME incorporates functionality that makes both of these tasks fairly straightforward to implement.

You can continue to use the approach of a text stream inserted into the client manifest, as illustrated in the previous sections, to make the clip data available to your code. Here is a sample data source used for clip schedule information:

<ContentTrack Name="AdClips" Subtype="DATA">
  <Event time="00:00:04">
    <![CDATA[<Clip Id="{89F92331-8501-41ac-B78A-F83F6DD4CB40}" 
    Uri="https://localhost/SmoothStreaming/Media/Robotica/Robotica_1080.ism/manifest" 
    ClickThruUri="https://msdn.microsoft.com/en-us/robotics/default.aspx" 
    Duration="00:00:20" />] ]>
  </Event>
  <Event time="00:00:10">
    <![CDATA[<Clip Id="{3E5169F0-A08A-4c31-BBAD-5ED51C2BAD21}" 
    Uri="https://localhost/ProgDownload/Amazon_1080.wmv" 
    ClickThruUri="https://en.wikipedia.org/wiki/Amazon_Rainforest" 
    Duration="00:00:25"/>] ]>
  </Event>     
</ContentTrack>

For each clip to be scheduled there is a URI for the content, a URI for a Web page the user can navigate to as a click-through on the clip, and a playback duration for the clip. The time attribute on the Event element specifies where in the timeline the clip is scheduled.

You can transform this data and add the corresponding text stream into the client manifest, using the same approach of a LINQ to XML query as outlined in the previous section. As before, the text stream is exposed to the code as a StreamInfo instance. You can then use the clip scheduling API on the SSME to utilize this information to schedule these clips. Figure 9 shows a method that schedules the clips based on this information.

Figure 9 Scheduling Clips

private void ScheduleClips() {
  //get the clip data stream
  StreamInfo siAdClips = ssme.AvailableStreams.Where(
    si => si.Name == "AdClips").FirstOrDefault();

  //if we have tracks
  if (siAdClips != null && siAdClips.AvailableTracks.Count > 0) {

    //for each event in that track
    foreach (TimelineEvent te in 
      siAdClips.AvailableTracks[0].TrackData) {

      //parse the inner XML fragment
      XElement xeClipData = XElement.Parse(
        Encoding.UTF8.GetString(te.EventData, 0, 
        te.EventData.Length));

      //schedule the clip
      ssme.ScheduleClip(new ClipInformation {
        ClickThroughUrl = new Uri(
        xeClipData.Attribute("ClickThruUri").Value),
        ClipUrl = new Uri(xeClipData.Attribute("Uri").Value),
        IsSmoothStreamingSource = 
        xeClipData.Attribute("Uri").Value.ToUpper().Contains("ism"), 
        Duration = TimeSpan.Parse(xeClipData.Attribute("Duration").Value)
        },
        te.EventTime, true, //pause the timeline
        null);
    }
    //set the Clip MediaElement style
    ssme.ClipMediaElementStyle = 
      this.Resources["ClipStyle"] as Style;
  }
}

The ScheduleClip method on SSME does the actual scheduling. For each clip you want to schedule, a new instance of the ClipInformation type is inserted into the schedule with the appropriate properties derived from the clip data.

Note that clips can be either Smooth Streaming sources or other sources as supported by the Silverlight MediaElement. It is important to set the ClipInformation.IsSmoothStreamingSource property correctly to make sure the right player component is used to play the clip.

The second parameter to ScheduleClip is the time when you want the clip to play. The third parameter is used to indicate whether you want the timeline to stop progressing while the clip is playing. The last parameter is used to pass in any user data that will be made available with the various clip-related event handlers.

Sometimes clips need to be scheduled in a sequence where start-time information is applied only to the first clip, and subsequent clips are chained so that all the scheduled clips play out in one continuous sequence. The ScheduleClip method facilitates this feature as well, as shown in Figure 10.

Figure 10 Using the ClipContext to Chain Scheduled Clips

private void ScheduleClips() {
  StreamInfo siAdClips = ssme.AvailableStreams.Where(
  si => si.Name == "AdClips").FirstOrDefault();

  if (siAdClips != null && siAdClips.AvailableTracks.Count > 0) {
    ClipContext clipCtx = null;
    foreach (
      TimelineEvent te in siAdClips.AvailableTracks[0].TrackData) {
      XElement xeClipData = 
        XElement.Parse(Encoding.UTF8.GetString(te.EventData, 0,
        te.EventData.Length));

      //if this is the first clip to be scheduled
      if (clipCtx == null) {
        clipCtx = ssme.ScheduleClip(new ClipInformation {
          ClickThroughUrl = new Uri(
          xeClipData.Attribute("ClickThruUri").Value),
          ClipUrl = new Uri(xeClipData.Attribute("Uri").Value),
          IsSmoothStreamingSource = 
          xeClipData.Attribute("Uri").Value.ToUpper().Contains("ism"), 
          Duration = TimeSpan.Parse(
          xeClipData.Attribute("Duration").Value)
        },
        te.EventTime, //pass in the start time for the clip
        true, null);
      }
      else { //subsequent clips
        clipCtx = ssme.ScheduleClip(new ClipInformation {
          ClickThroughUrl = new Uri(
          xeClipData.Attribute("ClickThruUri").Value),
          ClipUrl = new Uri(xeClipData.Attribute("Uri").Value),
          IsSmoothStreamingSource = 
          xeClipData.Attribute("Uri").Value.ToUpper().Contains("ism"),
          Duration = TimeSpan.Parse(
          xeClipData.Attribute("Duration").Value)
        },
        clipCtx, //clip context for the previous clip to chain
        true, null);
      }
    }
    ssme.ClipMediaElementStyle = 
      this.Resources["ClipStyle"] as Style;
  }
}

I only use an absolute time to schedule the first clip, when there is no ClipContext (in other words, the clipCtx variable is null). Each subsequent call to ScheduleClip returns a ClipContext instance that represents the scheduled state of the clip. The ScheduleClip method has an overload that accepts a ClipContext instance instead of a scheduled start time for a clip, and that schedules the clip to start right after the previously scheduled clip (represented by the passed-in ClipContext). 

When the scheduled clips play, SSME hides the main video and introduces a MediaElement to play the scheduled clip. In the event that you want to customize this MediaElement, you can set the ClipMediaElementStyle property on SSME to a desired XAML style.

There are also several events of interest that are raised by SSME while a scheduled clip is playing. The ClipProgressUpdate event can be handled to track the progress of the clip. ClipPlaybackEventArgs.Progress is of the enumeration type ClipProgress, which represents the clip’s progress in quartiles. The ClipProgressUpdate event is raised only at the start and end of the clip and at time points denoting 25 percent, 50 percent and 75 percent of the clip’s duration. Note that the ClipContext.HasQuartileEvents boolean property indicates whether the quartile events will be raised for a clip. In certain cases, like when the duration of a clip is not known, quartile progress events may not be raised.

The ClipClickThrough event is raised when the viewer clicks on a clip while viewing it. If a click-through destination was provided for this clip, ClipEventArgs.ClipContext.ClipInformation.ClickThroughUrl exposes it, and you can use a technique of your choice (such as interacting with the browser to open a pop-up window) to open the Web resource targeted by the click-through URL.

You can also use the ClipError event and the ClipStateChanged event to handle any error conditions and state changes for the clip, respectively. 
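
As an illustrative sketch, handlers for the click-through and progress events might look like the following (opening a new browser window via HtmlPage is just one possible choice):

ssme.ClipClickThrough += (s, e) => {
  Uri target = e.ClipContext.ClipInformation.ClickThroughUrl;
  if (target != null)
    //open the click-through destination in a new browser window
    System.Windows.Browser.HtmlPage.Window.Navigate(target, "_blank");
};

ssme.ClipProgressUpdate += (s, e) => {
  //Progress reports the clip's position in quartiles
  System.Diagnostics.Debug.WriteLine("Clip progress: " + e.Progress);
};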

Playback Speed and Direction

SSME enables playing content at varying speeds and direction. The SmoothStreamingMediaElement.SupportedPlaybackRates property returns a list of supported playback speeds as double values, where 1.0 denotes the default playback speed. In the current public beta, this list contains the additional values of 0.5, 4.0, 8.0, -4.0 and -8.0. The positive values enable playback at half, 4x and 8x speeds, and the negative values enable reverse play (rewind) at 4x and 8x speeds.

The SmoothStreamingMediaElement.SetPlaybackRate method can be called to set the playback speed at any point during playback. SetPlaybackRate accepts the desired playback speed as its only parameter.

Note that controlling playback speed only works for Smooth Streaming content—so if you are using SSME to play content that is being progressively downloaded or streamed using some other technique, SetPlaybackRate will raise an exception.
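
A small sketch (the buttons here are hypothetical) checks SupportedPlaybackRates before switching speeds:

private void FastForwardButton_Click(object sender, RoutedEventArgs e) {
  //only request rates the stream actually supports
  if (ssme.SupportedPlaybackRates.Contains(4.0))
    ssme.SetPlaybackRate(4.0);
}

private void RewindButton_Click(object sender, RoutedEventArgs e) {
  if (ssme.SupportedPlaybackRates.Contains(-4.0))
    ssme.SetPlaybackRate(-4.0);
}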

Smooth Stream Edits Using Composite Manifests

Sometimes you may need to combine portions from multiple Smooth Streaming presentations into a single composite presentation. The most common scenario involves rough-cut editors, which let users specify mark-in and mark-out time points against a master source to produce clips, and then play several such clips from potentially different master sources in a linear fashion as a single presentation.

The composite manifest feature of SSME allows you to accomplish this by creating a separate manifest document that contains clip segments, where each clip segment defines a portion of a complete presentation bounded by the begin and end time points of the clip. The biggest benefit of using this approach is the ability to create different edits on existing presentations without the need to transcode the source material.

A composite manifest always ends with the extension .csm. To consume such a manifest you simply set the SmoothStreamingSource property to a valid URL pointing to a composite manifest file:

ssme.SmoothStreamingSource = new Uri("https://localhost/SmoothStreaming/Media/MyCompositeSample.csm");

Figure 11 shows an excerpt from a composite manifest. (The entire file is included in the code download for this article.)

Figure 11 Sample Composite Manifest

<?xml version="1.0" encoding="utf-8"?>
<SmoothStreamingMedia MajorVersion="2" MinorVersion="0" Duration="269000000">
<Clip Url="https://localhost/SmoothStreaming/Media/AmazingCaves/Amazing_Caves_1080.ism/manifest" 
  ClipBegin="81000000" ClipEnd="250000000">
<StreamIndex Type="video" Chunks="9" QualityLevels="3"
  MaxWidth="992" MaxHeight="560"
  DisplayWidth="992" DisplayHeight="560"
  Url="QualityLevels({bitrate})/Fragments(video={start time})">
  <QualityLevel Index="0" Bitrate="2056000" FourCC="WVC1"
    MaxWidth="992" MaxHeight="560"
    CodecPrivateData="250000010FD37E1EF1178A1EF845E8049081BEBE7D7CC00000010E5A67F840" 
  />
  <QualityLevel Index="1" Bitrate="1427000" FourCC="WVC1"
    MaxWidth="768" MaxHeight="432"
    CodecPrivateData="250000010FCB6C17F0D78A17F835E8049081AB8BD718400000010E5A67F840" 
  />
  <QualityLevel Index="2" Bitrate="991000" FourCC="WVC1"
    MaxWidth="592" MaxHeight="332"
    CodecPrivateData="250000010FCB5E1270A58A127829680490811E3DF8F8400000010E5A67F840" 
  />
  <c t="80130000" />
  <c t="100150000" />
  <c t="120170000" />
  <c t="140190000" />
  <c t="160210000" />
  <c t="180230000" />
  <c t="200250000" />
  <c t="220270000" />
  <c t="240290000" d="20020000" />
</StreamIndex>
<StreamIndex Type="audio" Index="0" FourCC="WMAP"
  Chunks="10" QualityLevels="1" 
  Url="QualityLevels({bitrate})/Fragments(audio={start time})">
  <QualityLevel Bitrate="64000" SamplingRate="44100"
    Channels="2" BitsPerSample="16" PacketSize="2973"
    AudioTag="354" CodecPrivateData="1000030000000000000000000000E00042C0" />
  <c t="63506576" />
  <c t="81734240" />
  <c t="102632199" />
  <c t="121672562" />
  <c t="142106122" />
  <c t="162075283" />
  <c t="181580045" />
  <c t="202478004" />
  <c t="222447165" />
  <c t="241313378" d="20143311" />
</StreamIndex>
</Clip>
<Clip Url="https://localhost/SmoothStreaming/Media/CoralReef/Coral_Reef_Adventure_1080.ism/manifest" 
  ClipBegin="102000000" ClipEnd="202000000">
<StreamIndex Type="video" Chunks="6" QualityLevels="3"
  MaxWidth="992" MaxHeight="560"
  DisplayWidth="992" DisplayHeight="560"
  Url="QualityLevels({bitrate})/Fragments(video={start time})">
...
</Clip>
</SmoothStreamingMedia>

This manifest contains two Clip elements, each defining a clip (also called an edit) from an existing Smooth Streaming presentation. The Url attribute points to an existing Smooth Streaming presentation, and the ClipBegin and ClipEnd attributes contain the beginning and ending time values that provide the bounds of the clip. The Duration attribute on the top-level SmoothStreamingMedia element needs to be the exact sum of the durations of all the clips in the manifest—you can sum the differences between the ClipEnd and ClipBegin values of each Clip entry to get the total manifest duration.
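
For example, in Figure 11 the two clips contribute (250000000 - 81000000) + (202000000 - 102000000) = 169000000 + 100000000 = 269000000 units, which is exactly the Duration value declared on the SmoothStreamingMedia element.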

Each Clip element contains the video and audio StreamIndex entries and their child QualityLevel entries, mirroring the client manifest (.ismc) files of the source presentations. The chunk metadata (c) entries for each StreamIndex entry, however, can be limited to only those chunks that are required to satisfy the ClipBegin and ClipEnd boundaries. In other words, the ClipBegin value needs to be greater than or equal to the start time (t attribute) value of the first c entry for the stream, and the ClipEnd value needs to be less than or equal to the sum of the start time and the duration (d attribute) values of the last c entry for that stream.

Note that, in your client manifest, chunks may be defined in an indexed (n attribute) fashion with durations specified. However, for the composite manifest, the chunks need to be defined using their start times (which can be easily calculated by summing the durations of the preceding chunks). Also note that the Chunks attribute on each StreamIndex entry needs to reflect the number of chunks in the clip, but all the other attributes mirror the entries in the source client manifest. 
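
To make the conversion concrete: with the two-second video chunks from Figure 6 (d="20020000"), the chunk indexed n="2" in the source manifest would be written as t="40040000" in the composite manifest, the sum of the durations of chunks 0 and 1.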

Live Streams

SSME can play both on-demand and live streams. To play a live Smooth Streaming video stream using SSME, you can set the SmoothStreamingSource property on SSME to a live publishing point URL:

ssme.SmoothStreamingSource = new Uri("https://localhost/SmoothStreaming/Media/FighterPilotLive.isml/manifest");

To know if SSME is playing a live stream, you can check the IsLive property, which is set to True if the content is a live source, and False otherwise.

Note that the setup and delivery of Smooth Streaming live video requires specialized infrastructure. A detailed discussion of setting up a live streaming server environment is beyond the scope of this article.

Wrapping Up

IIS Smooth Streaming is a state-of-the-art adaptive streaming platform from Microsoft. As you’ve seen, the Smooth Streaming PDK (and in particular the SmoothStreamingMediaElement type) is an essential ingredient for authoring Silverlight clients that can consume both on-demand and live streams. The PDK offers extensive control over the client-side behavior of smooth streams, and allows you to write rich and immersive experiences that go beyond just audio/video streams, letting you easily combine data streams with your media in a meaningful way.

A detailed treatment of Smooth Streaming is beyond the scope of this article. You are encouraged to find more details at iis.net/media. For more guidance on media programming in Silverlight and the Silverlight MediaElement type, you can visit silverlight.net/getstarted.


 

Jit Ghosh is an architect evangelist on the Developer Evangelism team at Microsoft, advising customers in the media industry on building cutting-edge digital media solutions. Ghosh co-authored the book “Silverlight Recipes” (Apress, 2009). You can read his blog at blogs.msdn.com/jitghosh.

Thanks to the following technical expert for reviewing this article: Vishal Sood