Transforms - Create Or Update

Create or Update Transform
Creates a new Transform or updates an existing one.

PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Media/mediaServices/{accountName}/transforms/{transformName}?api-version=2018-07-01

URI Parameters

Name In Required Type Description
subscriptionId
path True
  • string

The unique identifier for a Microsoft Azure subscription.

resourceGroupName
path True
  • string

The name of the resource group within the Azure subscription.

accountName
path True
  • string

The Media Services account name.

transformName
path True
  • string

The Transform name.

api-version
query True
  • string

The version of the API to be used with the client request.
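
As a minimal sketch, the request URI can be assembled from the parameters above. The helper name `transform_url` is hypothetical and not part of any Azure SDK, and percent-encoding each path segment is an assumption made for safety rather than something this reference states.

```python
from urllib.parse import quote, urlencode


def transform_url(subscription_id, resource_group, account_name,
                  transform_name, api_version="2018-07-01"):
    """Build the PUT URI for a Transform (hypothetical helper)."""
    # Each path parameter is encoded on its own so that reserved characters
    # in a name cannot change the shape of the path.
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.Media/mediaServices/{quote(account_name, safe='')}"
        f"/transforms/{quote(transform_name, safe='')}"
    )
    # api-version is the only query parameter listed for this operation.
    query = urlencode({"api-version": api_version})
    return f"https://management.azure.com{path}?{query}"


print(transform_url(
    "00000000-0000-0000-0000-000000000000",
    "contosoresources",
    "contosomedia",
    "createdTransform",
))
```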

Request Body

Name Required Type Description
properties.description
  • string

An optional verbose description of the Transform.

properties.outputs True

An array of one or more TransformOutputs that the Transform should generate.
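
The request body described above might be built as follows. This is only a sketch: `transform_body` is a hypothetical helper, and leaving `onError` and `relativePriority` out of each output (rather than sending `null`) is an assumption that the service applies its documented defaults.

```python
import json


def transform_body(outputs, description=None):
    """Wrap TransformOutputs in the request body shape (hypothetical helper)."""
    # properties.outputs is required and must hold one or more entries.
    if not outputs:
        raise ValueError("properties.outputs must contain at least one TransformOutput")
    properties = {"outputs": list(outputs)}
    # properties.description is optional, so it is only sent when provided.
    if description is not None:
        properties["description"] = description
    return {"properties": properties}


output = {
    "preset": {
        "@odata.type": "#Microsoft.Media.BuiltInStandardEncoderPreset",
        "presetName": "AdaptiveStreaming",
    }
}
body = transform_body([output], description="Example Transform to illustrate create and update.")
print(json.dumps(body, indent=2))
```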

Responses

Name Type Description
200 OK

OK

201 Created

Created

Other Status Codes

Detailed error information.

Examples

Create or update a Transform

Sample Request

PUT https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/contosoresources/providers/Microsoft.Media/mediaServices/contosomedia/transforms/createdTransform?api-version=2018-07-01
{
  "properties": {
    "description": "Example Transform to illustrate create and update.",
    "created": "0001-01-01T00:00:00-05:00",
    "lastModified": "0001-01-01T00:00:00-05:00",
    "outputs": [
      {
        "relativePriority": null,
        "onError": null,
        "preset": {
          "@odata.type": "#Microsoft.Media.BuiltInStandardEncoderPreset",
          "presetName": "AdaptiveStreaming"
        }
      }
    ]
  }
}

Sample Response

{
  "name": "createdTransform",
  "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/contosoresources/providers/Microsoft.Media/mediaservices/contosomedia/transforms/createdTransform",
  "type": "Microsoft.Media/mediaservices/transforms",
  "properties": {
    "created": "2018-08-08T16:29:57.9828393-04:00",
    "description": "Example Transform to illustrate create and update.",
    "lastModified": "2018-08-08T16:29:57.9828393-04:00",
    "outputs": [
      {
        "onError": "StopProcessingJob",
        "relativePriority": "Normal",
        "preset": {
          "@odata.type": "#Microsoft.Media.BuiltInStandardEncoderPreset",
          "presetName": "AdaptiveStreaming"
        }
      }
    ]
  }
}
{
  "name": "createdTransform",
  "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/contosoresources/providers/Microsoft.Media/mediaservices/contosomedia/transforms/createdTransform",
  "type": "Microsoft.Media/mediaservices/transforms",
  "properties": {
    "created": "2018-08-08T16:29:57.9828393-04:00",
    "description": "Example Transform to illustrate create and update.",
    "lastModified": "2018-08-08T16:29:57.9998038-04:00",
    "outputs": [
      {
        "onError": "StopProcessingJob",
        "relativePriority": "Normal",
        "preset": {
          "@odata.type": "#Microsoft.Media.BuiltInStandardEncoderPreset",
          "presetName": "AdaptiveStreaming"
        }
      }
    ]
  }
}

Definitions

AacAudio

Describes Advanced Audio Codec (AAC) audio encoding settings.

AacAudioProfile

The encoding profile to be used when encoding audio with AAC.

AnalysisResolution

Specifies the maximum resolution at which your video is analyzed. The default behavior is "SourceResolution," which will keep the input video at its original resolution when analyzed. Using "StandardDefinition" will resize input videos to standard definition while preserving the appropriate aspect ratio. It will only resize if the video is of higher resolution. For example, a 1920x1080 input would be scaled to 640x360 before processing. Switching to "StandardDefinition" will reduce the time it takes to process high resolution video. It may also reduce the cost of using this component (see https://azure.microsoft.com/en-us/pricing/details/media-services/#analytics for details). However, faces that end up being too small in the resized video may not be detected.

ApiError

The API error.

Audio

Defines the common properties for all audio codecs.

AudioAnalyzerPreset

The Audio Analyzer preset applies a pre-defined set of AI-based analysis operations, including speech transcription. Currently, the preset supports processing of content with a single audio track.

AudioOverlay

Describes the properties of an audio overlay.

BuiltInStandardEncoderPreset

Describes a built-in preset for encoding the input video with the Standard Encoder.

CopyAudio

A codec flag, which tells the encoder to copy the input audio bitstream.

CopyVideo

A codec flag, which tells the encoder to copy the input video bitstream without re-encoding.

Deinterlace

Describes the de-interlacing settings.

DeinterlaceMode

The deinterlacing mode. Defaults to AutoPixelAdaptive.

DeinterlaceParity

The field parity for de-interlacing. Defaults to Auto.

EncoderNamedPreset

The built-in preset to be used for encoding videos.

EntropyMode

The entropy mode to be used for this layer. If not specified, the encoder chooses the mode that is appropriate for the profile and level.

FaceDetectorPreset

Describes all the settings to be used when analyzing a video in order to detect all the faces present.

Filters

Describes all the filtering operations, such as de-interlacing and rotation, that are to be applied to the input media before encoding.

H264Complexity

Tells the encoder how to choose its encoding settings. The default value is Balanced.

H264Layer

Describes the settings to be used when encoding the input video into a desired output bitrate layer with the H.264 video codec.

H264Video

Describes all the properties for encoding a video with the H.264 codec.

H264VideoProfile

We currently support Baseline, Main, High, High422, High444. Default is Auto.

Image

Describes the basic properties for generating thumbnails from the input video.

ImageFormat

Describes the properties for an output image file.

InsightsType

Defines the type of insights that you want the service to generate. The allowed values are 'AudioInsightsOnly', 'VideoInsightsOnly', and 'AllInsights'. The default is AllInsights. If you set this to AllInsights and the input is audio only, then only audio insights are generated. Similarly, if the input is video only, then only video insights are generated. Avoid AudioInsightsOnly if you expect some of your inputs to be video only, and avoid VideoInsightsOnly if you expect some of your inputs to be audio only, because Jobs fail under those conditions.

JpgFormat

Describes the settings for producing JPEG thumbnails.

JpgImage

Describes the properties for producing a series of JPEG images from the input video.

JpgLayer

Describes the settings to produce a JPEG image from the input video.

Mp4Format

Describes the properties for an output ISO MP4 file.

MultiBitrateFormat

Describes the properties for producing a collection of GOP aligned multi-bitrate files. The default behavior is to produce one output file for each video layer which is muxed together with all the audios. The exact output files produced can be controlled by specifying the outputFiles collection.

ODataError

Information about an error.

OnErrorType

A Transform can define more than one output. This property defines what the service should do when one output fails: either continue to produce the other outputs, or stop them. The overall Job state will not reflect failures of outputs that are specified with 'ContinueJob'. The default is 'StopProcessingJob'.

OutputFile

Represents an output file produced.

PngFormat

Describes the settings for producing PNG thumbnails.

PngImage

Describes the properties for producing a series of PNG images from the input video.

PngLayer

Describes the settings to produce a PNG image from the input video.

Priority

Sets the relative priority of the TransformOutputs within a Transform. This sets the priority that the service uses for processing TransformOutputs. The default priority is Normal.

Rectangle

Describes the properties of a rectangular window applied to the input media before processing it.

Rotation

The rotation, if any, to be applied to the input video before it is encoded. The default is Auto.

StandardEncoderPreset

Describes all the settings to be used when encoding the input video with the Standard Encoder.

StretchMode

The resizing mode - how the input video will be resized to fit the desired output resolution(s). The default is AutoSize.

Transform

A Transform encapsulates the rules or instructions for generating desired outputs from input media, such as by transcoding or by extracting insights. After the Transform is created, it can be applied to input media by creating Jobs.

TransformOutput

Describes the properties of a TransformOutput, which are the rules to be applied while generating the desired output.

TransportStreamFormat

Describes the properties for generating an MPEG-2 Transport Stream (ISO/IEC 13818-1) output video file(s).

Video

Describes the basic properties for encoding the input video.

VideoAnalyzerPreset

A video analyzer preset that extracts insights (rich metadata) from both audio and video, and outputs a JSON format file.

VideoOverlay

Describes the properties of a video overlay.

AacAudio

Describes Advanced Audio Codec (AAC) audio encoding settings.

Name Type Description
@odata.type string:
  • #Microsoft.Media.AacAudio

The discriminator for derived types.

bitrate
  • integer

The bitrate, in bits per second, of the output encoded audio.

channels
  • integer

The number of channels in the audio.

label
  • string

An optional label for the codec. The label can be used to control muxing behavior.

profile

The encoding profile to be used when encoding audio with AAC.

samplingRate
  • integer

The sampling rate to use for encoding in hertz.

AacAudioProfile

The encoding profile to be used when encoding audio with AAC.

Name Type Description
AacLc
  • string

Specifies that the output audio is to be encoded into AAC Low Complexity profile (AAC-LC).

HeAacV1
  • string

Specifies that the output audio is to be encoded into HE-AAC v1 profile.

HeAacV2
  • string

Specifies that the output audio is to be encoded into HE-AAC v2 profile.

AnalysisResolution

Specifies the maximum resolution at which your video is analyzed. The default behavior is "SourceResolution," which will keep the input video at its original resolution when analyzed. Using "StandardDefinition" will resize input videos to standard definition while preserving the appropriate aspect ratio. It will only resize if the video is of higher resolution. For example, a 1920x1080 input would be scaled to 640x360 before processing. Switching to "StandardDefinition" will reduce the time it takes to process high resolution video. It may also reduce the cost of using this component (see https://azure.microsoft.com/en-us/pricing/details/media-services/#analytics for details). However, faces that end up being too small in the resized video may not be detected.

Name Type Description
SourceResolution
  • string
StandardDefinition
  • string
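
The resizing arithmetic in the description can be illustrated with a short sketch. The 360-line target is an assumption inferred from the single 1920x1080 to 640x360 example; the service's exact rule is not given here, and `standard_definition_size` is a hypothetical name.

```python
def standard_definition_size(width, height, target_height=360):
    """Estimate the analyzed size under "StandardDefinition" (assumed rule)."""
    # Inputs that are not of higher resolution are left untouched.
    if height <= target_height:
        return width, height
    # Scale the width by the same factor as the height to keep the aspect ratio.
    return round(width * target_height / height), target_height


print(standard_definition_size(1920, 1080))
print(standard_definition_size(320, 240))
```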

ApiError

The API error.

Name Type Description
error

The error properties.

Audio

Defines the common properties for all audio codecs.

Name Type Description
@odata.type string:
  • #Microsoft.Media.Audio

The discriminator for derived types.

bitrate
  • integer

The bitrate, in bits per second, of the output encoded audio.

channels
  • integer

The number of channels in the audio.

label
  • string

An optional label for the codec. The label can be used to control muxing behavior.

samplingRate
  • integer

The sampling rate to use for encoding in hertz.

AudioAnalyzerPreset

The Audio Analyzer preset applies a pre-defined set of AI-based analysis operations, including speech transcription. Currently, the preset supports processing of content with a single audio track.

Name Type Description
@odata.type string:
  • #Microsoft.Media.AudioAnalyzerPreset

The discriminator for derived types.

audioLanguage
  • string

The language for the audio payload in the input, using the BCP-47 format of 'language tag-region' (for example, 'en-US'). The supported languages are English ('en-US' and 'en-GB'), Spanish ('es-ES' and 'es-MX'), French ('fr-FR'), Italian ('it-IT'), Japanese ('ja-JP'), Portuguese ('pt-BR'), Chinese ('zh-CN'), German ('de-DE'), Arabic ('ar-EG' and 'ar-SY'), Russian ('ru-RU'), Hindi ('hi-IN'), and Korean ('ko-KR'). If you know the language of your content, it is recommended that you specify it. If the language isn't specified or is set to null, automatic language detection chooses the first language detected and processes the whole file in that language. This language detection feature currently supports English, Chinese, French, German, Italian, Japanese, Spanish, Russian, and Portuguese. It does not currently support dynamically switching between languages after the first language is detected. Automatic detection works best with audio recordings with clearly discernible speech. If automatic detection fails to find the language, transcription falls back to 'en-US'.
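
A client could check `audioLanguage` against the list above before sending the request. The sketch below assumes the list in this description is complete for this API version; `audio_analyzer_preset` is a hypothetical helper.

```python
# Language tags copied from the audioLanguage description.
SUPPORTED_AUDIO_LANGUAGES = {
    "en-US", "en-GB", "es-ES", "es-MX", "fr-FR", "it-IT", "ja-JP", "pt-BR",
    "zh-CN", "de-DE", "ar-EG", "ar-SY", "ru-RU", "hi-IN", "ko-KR",
}


def audio_analyzer_preset(audio_language=None):
    """Build an AudioAnalyzerPreset body fragment (hypothetical helper)."""
    preset = {"@odata.type": "#Microsoft.Media.AudioAnalyzerPreset"}
    # Omitting audioLanguage leaves the choice to automatic language detection.
    if audio_language is not None:
        if audio_language not in SUPPORTED_AUDIO_LANGUAGES:
            raise ValueError(f"unsupported audioLanguage: {audio_language!r}")
        preset["audioLanguage"] = audio_language
    return preset


print(audio_analyzer_preset("en-GB"))
```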

AudioOverlay

Describes the properties of an audio overlay.

Name Type Description
@odata.type string:
  • #Microsoft.Media.AudioOverlay

The discriminator for derived types.

audioGainLevel
  • number

The gain level of audio in the overlay. The value should be in the range [0, 1.0]. The default is 1.0.

end
  • string

The position in the input video at which the overlay ends. The value should be in ISO 8601 duration format. For example, PT30S ends the overlay 30 seconds into the input video. If not specified, the overlay is applied until the end of the input video if inputLoop is true. If inputLoop is false, the overlay lasts as long as the duration of the overlay media.

fadeInDuration
  • string

The duration over which the overlay fades in onto the input video. The value should be in ISO 8601 duration format. If not specified the default behavior is to have no fade in (same as PT0S).

fadeOutDuration
  • string

The duration over which the overlay fades out of the input video. The value should be in ISO 8601 duration format. If not specified the default behavior is to have no fade out (same as PT0S).

inputLabel
  • string

The label of the job input which is to be used as an overlay. The Input must specify exactly one file. You can specify an image file in JPG or PNG formats, or an audio file (such as a WAV, MP3, WMA or M4A file), or a video file. See https://aka.ms/mesformats for the complete list of supported audio and video file formats.

start
  • string

The start position, with reference to the input video, at which the overlay starts. The value should be in ISO 8601 duration format. For example, PT05S starts the overlay 5 seconds into the input video. If not specified, the overlay starts from the beginning of the input video.
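
The AudioOverlay properties above could be assembled like this. It is a sketch only: both helper names are hypothetical, durations are limited to whole seconds for simplicity, and `PT5S` is emitted where this page writes the equivalent `PT05S`.

```python
def seconds_to_duration(seconds):
    """Format whole seconds as an ISO 8601 duration, for example 30 -> 'PT30S'."""
    if seconds < 0:
        raise ValueError("duration must not be negative")
    return f"PT{int(seconds)}S"


def audio_overlay(input_label, start=None, end=None, gain=1.0,
                  fade_in=None, fade_out=None):
    """Build an AudioOverlay body fragment (hypothetical helper)."""
    # audioGainLevel is documented as a value in the range [0, 1.0].
    if not 0.0 <= gain <= 1.0:
        raise ValueError("audioGainLevel must be in the range [0, 1.0]")
    overlay = {
        "@odata.type": "#Microsoft.Media.AudioOverlay",
        "inputLabel": input_label,
        "audioGainLevel": gain,
    }
    optional = {
        "start": start,
        "end": end,
        "fadeInDuration": fade_in,
        "fadeOutDuration": fade_out,
    }
    # Unset durations are omitted so the documented defaults apply.
    for key, seconds in optional.items():
        if seconds is not None:
            overlay[key] = seconds_to_duration(seconds)
    return overlay


print(audio_overlay("music", start=5, end=30, gain=0.5))
```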

BuiltInStandardEncoderPreset

Describes a built-in preset for encoding the input video with the Standard Encoder.

Name Type Description
@odata.type string:
  • #Microsoft.Media.BuiltInStandardEncoderPreset

The discriminator for derived types.

presetName

The built-in preset to be used for encoding videos.

CopyAudio

A codec flag, which tells the encoder to copy the input audio bitstream.

Name Type Description
@odata.type string:
  • #Microsoft.Media.CopyAudio

The discriminator for derived types.

label
  • string

An optional label for the codec. The label can be used to control muxing behavior.

CopyVideo

A codec flag, which tells the encoder to copy the input video bitstream without re-encoding.

Name Type Description
@odata.type string:
  • #Microsoft.Media.CopyVideo

The discriminator for derived types.

label
  • string

An optional label for the codec. The label can be used to control muxing behavior.

Deinterlace

Describes the de-interlacing settings.

Name Type Description
mode

The deinterlacing mode. Defaults to AutoPixelAdaptive.

parity

The field parity for de-interlacing. Defaults to Auto.

DeinterlaceMode

The deinterlacing mode. Defaults to AutoPixelAdaptive.

Name Type Description
AutoPixelAdaptive
  • string

Apply automatic pixel adaptive de-interlacing on each frame in the input video.

Off
  • string

Disables de-interlacing of the source video.

DeinterlaceParity

The field parity for de-interlacing. Defaults to Auto.

Name Type Description
Auto
  • string

Automatically detect the order of fields.

BottomFieldFirst
  • string

Apply bottom field first processing of input video.

TopFieldFirst
  • string

Apply top field first processing of input video.

EncoderNamedPreset

The built-in preset to be used for encoding videos.

Name Type Description
AACGoodQualityAudio
  • string

Produces a single MP4 file containing only stereo audio encoded at 192 kbps.

AdaptiveStreaming
  • string

Produces a set of GOP aligned MP4 files with H.264 video and stereo AAC audio. Auto-generates a bitrate ladder based on the input resolution and bitrate. The auto-generated preset will never exceed the input resolution and bitrate. For example, if the input is 720p at 3 Mbps, output will remain 720p at best, and will start at rates lower than 3 Mbps. The output will have video and audio in separate MP4 files, which is optimal for adaptive streaming.

ContentAwareEncodingExperimental
  • string

Exposes an experimental preset for content-aware encoding. Given any input content, the service attempts to automatically determine the optimal number of layers, appropriate bitrate and resolution settings for delivery by adaptive streaming. The underlying algorithms will continue to evolve over time. The output will contain MP4 files with video and audio interleaved.

H264MultipleBitrate1080p
  • string

Produces a set of 8 GOP-aligned MP4 files, ranging from 6000 kbps to 400 kbps, and stereo AAC audio. Resolution starts at 1080p and goes down to 360p.

H264MultipleBitrate720p
  • string

Produces a set of 6 GOP-aligned MP4 files, ranging from 3400 kbps to 400 kbps, and stereo AAC audio. Resolution starts at 720p and goes down to 360p.

H264MultipleBitrateSD
  • string

Produces a set of 5 GOP-aligned MP4 files, ranging from 1600 kbps to 400 kbps, and stereo AAC audio. Resolution starts at 480p and goes down to 360p.

H264SingleBitrate1080p
  • string

Produces an MP4 file where the video is encoded with H.264 codec at 6750 kbps and a picture height of 1080 pixels, and the stereo audio is encoded with AAC-LC codec at 64 kbps.

H264SingleBitrate720p
  • string

Produces an MP4 file where the video is encoded with H.264 codec at 4500 kbps and a picture height of 720 pixels, and the stereo audio is encoded with AAC-LC codec at 64 kbps.

H264SingleBitrateSD
  • string

Produces an MP4 file where the video is encoded with H.264 codec at 2200 kbps and a picture height of 480 pixels, and the stereo audio is encoded with AAC-LC codec at 64 kbps.
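
To illustrate how the single-bitrate presets relate to picture height, the sketch below picks the largest preset that does not exceed the input height. This selection policy is purely illustrative and is not something the service does for you; `pick_single_bitrate_preset` is a hypothetical helper. The heights and bitrates are taken from the table above.

```python
# (preset name, picture height in pixels, video bitrate in kbps)
SINGLE_BITRATE_PRESETS = [
    ("H264SingleBitrate1080p", 1080, 6750),
    ("H264SingleBitrate720p", 720, 4500),
    ("H264SingleBitrateSD", 480, 2200),
]


def pick_single_bitrate_preset(input_height):
    """Return the largest preset whose height fits the input (illustrative policy)."""
    for name, height, _bitrate_kbps in SINGLE_BITRATE_PRESETS:
        if input_height >= height:
            return name
    # Inputs smaller than 480 lines fall back to the SD preset.
    return SINGLE_BITRATE_PRESETS[-1][0]


preset = {
    "@odata.type": "#Microsoft.Media.BuiltInStandardEncoderPreset",
    "presetName": pick_single_bitrate_preset(900),
}
print(preset)
```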

EntropyMode

The entropy mode to be used for this layer. If not specified, the encoder chooses the mode that is appropriate for the profile and level.

Name Type Description
Cabac
  • string

Context Adaptive Binary Arithmetic Coder (CABAC) entropy encoding.

Cavlc
  • string

Context Adaptive Variable Length Coder (CAVLC) entropy encoding.

FaceDetectorPreset

Describes all the settings to be used when analyzing a video in order to detect all the faces present.

Name Type Description
@odata.type string:
  • #Microsoft.Media.FaceDetectorPreset

The discriminator for derived types.

resolution

Specifies the maximum resolution at which your video is analyzed. The default behavior is "SourceResolution," which will keep the input video at its original resolution when analyzed. Using "StandardDefinition" will resize input videos to standard definition while preserving the appropriate aspect ratio. It will only resize if the video is of higher resolution. For example, a 1920x1080 input would be scaled to 640x360 before processing. Switching to "StandardDefinition" will reduce the time it takes to process high resolution video. It may also reduce the cost of using this component (see https://azure.microsoft.com/en-us/pricing/details/media-services/#analytics for details). However, faces that end up being too small in the resized video may not be detected.

Filters

Describes all the filtering operations, such as de-interlacing and rotation, that are to be applied to the input media before encoding.

Name Type Description
crop

The parameters for the rectangular window with which to crop the input video.

deinterlace

The de-interlacing settings.

overlays Overlay[]:

The properties of overlays to be applied to the input video. These could be audio, image or video overlays.

rotation

The rotation, if any, to be applied to the input video before it is encoded. The default is Auto.

H264Complexity

Tells the encoder how to choose its encoding settings. The default value is Balanced.

Name Type Description
Balanced
  • string

Tells the encoder to use settings that achieve a balance between speed and quality.

Quality
  • string

Tells the encoder to use settings that are optimized to produce higher quality output at the expense of slower overall encode time.

Speed
  • string

Tells the encoder to use settings that are optimized for faster encoding. Quality is sacrificed to decrease encoding time.

H264Layer

Describes the settings to be used when encoding the input video into a desired output bitrate layer with the H.264 video codec.

Name Type Description
@odata.type string:
  • #Microsoft.Media.H264Layer

The discriminator for derived types.

adaptiveBFrame
  • boolean

Whether or not adaptive B-frames are to be used when encoding this layer. If not specified, the encoder will turn it on whenever the video profile permits its use.

bFrames
  • integer

The number of B-frames to be used when encoding this layer. If not specified, the encoder chooses an appropriate number based on the video profile and level.

bitrate
  • integer

The average bitrate in bits per second at which to encode the input video when generating this layer. This is a required field.

bufferWindow
  • string

The VBV buffer window length. The value should be in ISO 8601 format. The value should be in the range [0.1-100] seconds. The default is 5 seconds (for example, PT5S).

entropyMode

The entropy mode to be used for this layer. If not specified, the encoder chooses the mode that is appropriate for the profile and level.

frameRate
  • string

The frame rate (in frames per second) at which to encode this layer. The value can be in the form of M/N, where M and N are integers (for example, 30000/1001), or in the form of a number (for example, 30 or 29.97). The encoder enforces constraints on allowed frame rates based on the profile and level. If it is not specified, the encoder will use the same frame rate as the input video.

height
  • string

The height of the output video for this layer. The value can be absolute (in pixels) or relative (in percentage). For example, 50% means the output video has half as many pixels in height as the input.

label
  • string

The alphanumeric label for this layer, which can be used in multiplexing different video and audio layers, or in naming the output file.

level
  • string

We currently support Level up to 6.2. The value can be Auto, or a number that matches the H.264 profile. If not specified, the default is Auto, which lets the encoder choose the Level that is appropriate for this layer.

maxBitrate
  • integer

The maximum bitrate (in bits per second), at which the VBV buffer should be assumed to refill. If not specified, defaults to the same value as bitrate.

profile

We currently support Baseline, Main, High, High422, High444. Default is Auto.

referenceFrames
  • integer

The number of reference frames to be used when encoding this layer. If not specified, the encoder determines an appropriate number based on the encoder complexity setting.

slices
  • integer

The number of slices to be used when encoding this layer. If not specified, the default is zero, which means that the encoder will use a single slice for each frame.

width
  • string

The width of the output video for this layer. The value can be absolute (in pixels) or relative (in percentage). For example, 50% means the output video has half as many pixels in width as the input.
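
The string forms accepted by `frameRate`, `height`, and `width` can be interpreted on the client side roughly as follows. Both helpers are hypothetical, and rounding a relative size to the nearest pixel is an assumption, since the encoder's rounding rule is not stated.

```python
from fractions import Fraction


def parse_frame_rate(value):
    """Parse 'M/N' or a plain number such as '30' or '29.97' into an exact Fraction."""
    # Fraction accepts both 'numerator/denominator' and decimal strings.
    return Fraction(value.strip())


def resolve_dimension(value, source_pixels):
    """Resolve an absolute ('720') or relative ('50%') size against the input size."""
    text = value.strip()
    if text.endswith("%"):
        # Relative sizes are a percentage of the corresponding input dimension.
        return round(source_pixels * float(text[:-1]) / 100)
    return int(text)


print(float(parse_frame_rate("30000/1001")))
print(resolve_dimension("50%", 1080))
```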

H264Video

Describes all the properties for encoding a video with the H.264 codec.

Name Type Description
@odata.type string:
  • #Microsoft.Media.H264Video

The discriminator for derived types.

complexity

Tells the encoder how to choose its encoding settings. The default value is Balanced.

keyFrameInterval
  • string

The distance between two key frames, thereby defining a group of pictures (GOP). The value should be a non-zero integer in the range [1, 30] seconds, specified in ISO 8601 format. The default is 2 seconds (PT2S).

label
  • string

An optional label for the codec. The label can be used to control muxing behavior.

layers

The collection of output H.264 layers to be produced by the encoder.

sceneChangeDetection
  • boolean

Whether or not the encoder should insert key frames at scene changes. If not specified, the default is false. This flag should be set to true only when the encoder is being configured to produce a single output video.

stretchMode

The resizing mode - how the input video will be resized to fit the desired output resolution(s). The default is AutoSize.
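
A `keyFrameInterval` value can be checked against the documented range before submission. The parser below is a sketch that handles only the time part of an ISO 8601 duration (hours, minutes, seconds), which is an assumption about the forms clients typically send; both function names are hypothetical.

```python
import re

# Matches durations such as PT2S, PT5M30S, or PT1H (time components only).
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def duration_seconds(text):
    """Convert a simple ISO 8601 duration to seconds."""
    match = _DURATION.match(text)
    # A bare 'PT' matches the pattern but carries no component, so reject it.
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def valid_key_frame_interval(text):
    """Check for a whole number of seconds in the range [1, 30]."""
    seconds = duration_seconds(text)
    return seconds.is_integer() and 1 <= seconds <= 30


print(valid_key_frame_interval("PT2S"))
print(valid_key_frame_interval("PT45S"))
```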

H264VideoProfile

We currently support Baseline, Main, High, High422, High444. Default is Auto.

Name Type Description
Auto
  • string

Tells the encoder to automatically determine the appropriate H.264 profile.

Baseline
  • string

Baseline profile.

High
  • string

High profile.

High422
  • string

High 4:2:2 profile.

High444
  • string

High 4:4:4 predictive profile.

Main
  • string

Main profile.

Image

Describes the basic properties for generating thumbnails from the input video.

Name Type Description
@odata.type string:
  • #Microsoft.Media.Image

The discriminator for derived types.

keyFrameInterval
  • string

The distance between two key frames, thereby defining a group of pictures (GOP). The value should be a non-zero integer in the range [1, 30] seconds, specified in ISO 8601 format. The default is 2 seconds (PT2S).

label
  • string

An optional label for the codec. The label can be used to control muxing behavior.

range
  • string

The position in the input video at which to stop generating thumbnails. The value can be an absolute timestamp (ISO 8601, for example, PT5M30S to stop at 5 minutes and 30 seconds), a frame count (for example, 300 to stop at the 300th frame), or a relative value (for example, 100%).

start
  • string

The position in the input video from which to start generating thumbnails. The value can be an absolute timestamp (ISO 8601, for example, PT05S), a frame count (for example, 10 for the 10th frame), or a relative value (for example, 1%). Also supports a macro {Best}, which tells the encoder to select the best thumbnail from the first few seconds of the video.

step
  • string

The intervals at which thumbnails are generated. The value can be an absolute timestamp (ISO 8601, for example, PT05S for one image every 5 seconds), a frame count (for example, 30 for every 30 frames), or a relative value (for example, 1%).

stretchMode

The resizing mode - how the input video will be resized to fit the desired output resolution(s). The default is AutoSize.
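
The `start`, `step`, and `range` strings each accept several forms. A client-side classifier might look like the sketch below; the function name and the returned labels are hypothetical, the timestamp check is deliberately shallow, and `{Best}` is documented only for `start`.

```python
def classify_position(value):
    """Label a thumbnail position string by the form it uses (hypothetical helper)."""
    text = value.strip()
    if text == "{Best}":
        return "macro"
    if text.endswith("%"):
        float(text[:-1])  # raises ValueError when the percentage is not numeric
        return "relative"
    if text.isdigit():
        return "frame_count"
    if text.startswith("PT"):
        # Only the ISO 8601 time-duration prefix is checked here.
        return "timestamp"
    raise ValueError(f"unrecognized position value: {value!r}")


for sample in ("PT05S", "10", "1%", "{Best}"):
    print(sample, classify_position(sample))
```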

ImageFormat

Describes the properties for an output image file.

Name Type Description
@odata.type string:
  • #Microsoft.Media.ImageFormat

The discriminator for derived types.

filenamePattern
  • string

The pattern of the file names for the generated output files. The following macros are supported in the file name: {Basename} - The base name of the input video {Extension} - The appropriate extension for this format. {Label} - The label assigned to the codec/layer. {Index} - A unique index for thumbnails. Only applicable to thumbnails. {Bitrate} - The audio/video bitrate. Not applicable to thumbnails. {Codec} - The type of the audio/video codec. Any unsubstituted macros will be collapsed and removed from the filename.
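
The macro substitution described for `filenamePattern` can be approximated as follows. This is a sketch of the documented behavior, not the encoder's implementation: `expand_filename` is a hypothetical helper, "collapsed and removed" is interpreted as replacement with an empty string, and the `{Extension}` value is assumed to include the leading dot.

```python
import re

# The macros listed in the filenamePattern description.
_MACRO = re.compile(r"\{(Basename|Extension|Label|Index|Bitrate|Codec)\}")


def expand_filename(pattern, values):
    """Substitute known macros; macros without a value are dropped."""
    def substitute(match):
        return str(values.get(match.group(1), ""))
    return _MACRO.sub(substitute, pattern)


print(expand_filename(
    "{Basename}_{Label}_{Bitrate}{Extension}",
    {"Basename": "movie", "Label": "HD", "Bitrate": 3400, "Extension": ".mp4"},
))
```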

InsightsType

Defines the type of insights that you want the service to generate. The allowed values are 'AudioInsightsOnly', 'VideoInsightsOnly', and 'AllInsights'. The default is AllInsights. If you set this to AllInsights and the input is audio only, then only audio insights are generated. Similarly, if the input is video only, then only video insights are generated. Avoid AudioInsightsOnly if you expect some of your inputs to be video only, and avoid VideoInsightsOnly if you expect some of your inputs to be audio only, because Jobs fail under those conditions.

Name Type Description
AllInsights
  • string

Generate both audio and video insights. Fails if either audio or video Insights fail.

AudioInsightsOnly
  • string

Generate audio only insights. Ignore video even if present. Fails if no audio is present.

VideoInsightsOnly
  • string

Generate video only insights. Ignore audio if present. Fails if no video is present.
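
The rules above can be summarized as a small decision function. It is a sketch of the documented outcomes only: the function name is hypothetical, a `ValueError` stands in for a failed Job, and the result for an input with neither audio nor video is not specified by this page.

```python
def insights_generated(insights_type, has_audio, has_video):
    """Return which insights would be produced, or raise where the Job would fail."""
    if insights_type == "AllInsights":
        # Only the insights for the streams actually present are generated.
        streams = (("audio", has_audio), ("video", has_video))
        return [kind for kind, present in streams if present]
    if insights_type == "AudioInsightsOnly":
        if not has_audio:
            raise ValueError("AudioInsightsOnly fails if no audio is present")
        return ["audio"]
    if insights_type == "VideoInsightsOnly":
        if not has_video:
            raise ValueError("VideoInsightsOnly fails if no video is present")
        return ["video"]
    raise ValueError(f"unknown InsightsType: {insights_type!r}")


print(insights_generated("AllInsights", has_audio=True, has_video=False))
```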

JpgFormat

Describes the settings for producing JPEG thumbnails.

Name Type Description
@odata.type string:
  • #Microsoft.Media.JpgFormat

The discriminator for derived types.

filenamePattern
  • string

The pattern of the file names for the generated output files. The following macros are supported in the file name: {Basename} - The base name of the input video {Extension} - The appropriate extension for this format. {Label} - The label assigned to the codec/layer. {Index} - A unique index for thumbnails. Only applicable to thumbnails. {Bitrate} - The audio/video bitrate. Not applicable to thumbnails. {Codec} - The type of the audio/video codec. Any unsubstituted macros will be collapsed and removed from the filename.

JpgImage

Describes the properties for producing a series of JPEG images from the input video.

Name Type Description
@odata.type string:
  • #Microsoft.Media.JpgImage

The discriminator for derived types.

keyFrameInterval
  • string

The distance between two key frames, thereby defining a group of pictures (GOP). The value should be a non-zero integer in the range [1, 30] seconds, specified in ISO 8601 format. The default is 2 seconds (PT2S).

label
  • string

An optional label for the codec. The label can be used to control muxing behavior.

layers

A collection of output JPEG image layers to be produced by the encoder.

range
  • string

The position in the input video at which to stop generating thumbnails. The value can be an absolute timestamp (ISO 8601, for example, PT5M30S to stop at 5 minutes and 30 seconds), a frame count (for example, 300 to stop at the 300th frame), or a relative value (for example, 100%).

start
  • string

The position in the input video from which to start generating thumbnails. The value can be an absolute timestamp (ISO 8601, for example, PT05S), a frame count (for example, 10 for the 10th frame), or a relative value (for example, 1%). Also supports a macro {Best}, which tells the encoder to select the best thumbnail from the first few seconds of the video.

step
  • string

The intervals at which thumbnails are generated. The value can be an absolute timestamp (ISO 8601, for example, PT05S for one image every 5 seconds), a frame count (for example, 30 for every 30 frames), or a relative value (for example, 1%).

stretchMode

The resizing mode - how the input video will be resized to fit the desired output resolution(s). The default is AutoSize.

JpgLayer

Describes the settings to produce a JPEG image from the input video.

Name Type Description
@odata.type string:
  • #Microsoft.Media.JpgLayer

The discriminator for derived types.

height
  • string

The height of the output video for this layer. The value can be absolute (in pixels) or relative (in percentage). For example, 50% means the output video has half as many pixels in height as the input.

label
  • string

The alphanumeric label for this layer, which can be used in multiplexing different video and audio layers, or in naming the output file.

quality
  • integer

The compression quality of the JPEG output. Range is from 0-100 and the default is 70.

width
  • string

The width of the output video for this layer. The value can be absolute (in pixels) or relative (in percentage). For example, 50% means the output video has half as many pixels in width as the input.
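
As a minimal sketch of how JpgImage and JpgLayer fit together, the following Python builds a thumbnail codec entry as a plain dictionary and serializes it to JSON. The property names come from the tables above; the specific values (a 5-second start and step, a 50% layer, and the `thumb` label) are illustrative assumptions, not recommended settings.

```python
import json

# One JPEG layer at half the input resolution; 70 is the documented default quality.
jpg_layer = {
    "@odata.type": "#Microsoft.Media.JpgLayer",
    "label": "thumb",  # hypothetical label, chosen for this example
    "width": "50%",
    "height": "50%",
    "quality": 70,
}

# Start at 5 seconds, take one image every 5 seconds, stop at 5 minutes 30 seconds.
jpg_image = {
    "@odata.type": "#Microsoft.Media.JpgImage",
    "start": "PT05S",
    "step": "PT05S",
    "range": "PT5M30S",
    "stretchMode": "AutoSize",
    "layers": [jpg_layer],
}

jpg_image_json = json.dumps(jpg_image, indent=2)
print(jpg_image_json)
```

The resulting object would typically appear as one entry in the codecs list of a StandardEncoderPreset.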

Mp4Format

Describes the properties for an output ISO MP4 file.

Name Type Description
@odata.type string:
  • #Microsoft.Media.Mp4Format

The discriminator for derived types.

filenamePattern
  • string

The pattern of the file names for the generated output files. The following macros are supported in the file name: {Basename} - The base name of the input video. {Extension} - The appropriate extension for this format. {Label} - The label assigned to the codec/layer. {Index} - A unique index for thumbnails; only applicable to thumbnails. {Bitrate} - The audio/video bitrate; not applicable to thumbnails. {Codec} - The type of the audio/video codec. Any unsubstituted macros will be collapsed and removed from the filename.

outputFiles

The list of output files to produce. Each entry in the list is a set of audio and video layer labels to be muxed together.

MultiBitrateFormat

Describes the properties for producing a collection of GOP-aligned multi-bitrate files. The default behavior is to produce one output file for each video layer, muxed together with all the audio tracks. The exact output files produced can be controlled by specifying the outputFiles collection.

Name Type Description
@odata.type string:
  • #Microsoft.Media.MultiBitrateFormat

The discriminator for derived types.

filenamePattern
  • string

The pattern of the file names for the generated output files. The following macros are supported in the file name: {Basename} - The base name of the input video. {Extension} - The appropriate extension for this format. {Label} - The label assigned to the codec/layer. {Index} - A unique index for thumbnails; only applicable to thumbnails. {Bitrate} - The audio/video bitrate; not applicable to thumbnails. {Codec} - The type of the audio/video codec. Any unsubstituted macros will be collapsed and removed from the filename.

outputFiles

The list of output files to produce. Each entry in the list is a set of audio and video layer labels to be muxed together.

ODataError

Information about an error.

Name Type Description
code
  • string

A language-independent error name.

details

The error details.

message
  • string

The error message.

target
  • string

The target of the error (for example, the name of the property in error).

OnErrorType

A Transform can define more than one output. This property defines what the service should do when one output fails: either continue to produce the other outputs, or stop them. The overall Job state will not reflect failures of outputs that are specified with 'ContinueJob'. The default is 'StopProcessingJob'.

Name Type Description
ContinueJob
  • string

Tells the service that if this TransformOutput fails, then allow any other TransformOutput to continue.

StopProcessingJob
  • string

Tells the service that if this TransformOutput fails, then any other incomplete TransformOutputs can be stopped.

OutputFile

Represents an output file produced.

Name Type Description
labels
  • string[]

The list of labels that describe how the encoder should multiplex video and audio into an output file. For example, if the encoder is producing two video layers with labels v1 and v2, and one audio layer with label a1, then an array like '[v1, a1]' tells the encoder to produce an output file with the video track represented by v1 and the audio track represented by a1.
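
The labels example above can be sketched in Python as follows. The helper name `mp4_format` and the filename pattern are assumptions made for illustration; the property names (`filenamePattern`, `outputFiles`, `labels`) come from the Mp4Format and OutputFile tables.

```python
def mp4_format(filename_pattern, label_sets):
    """Build an Mp4Format object with one OutputFile per set of labels."""
    return {
        "@odata.type": "#Microsoft.Media.Mp4Format",
        "filenamePattern": filename_pattern,
        "outputFiles": [{"labels": list(labels)} for labels in label_sets],
    }

# Two video layers (v1, v2) and one audio layer (a1):
# produce one file per video layer, each muxed with the same audio track.
fmt = mp4_format(
    "{Basename}-{Label}{Extension}",  # hypothetical pattern
    [["v1", "a1"], ["v2", "a1"]],
)
print(fmt["outputFiles"])
```

Each inner list names the layers that end up in one output file, so the sketch describes two files that share the audio track a1.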

PngFormat

Describes the settings for producing PNG thumbnails.

Name Type Description
@odata.type string:
  • #Microsoft.Media.PngFormat

The discriminator for derived types.

filenamePattern
  • string

The pattern of the file names for the generated output files. The following macros are supported in the file name: {Basename} - The base name of the input video. {Extension} - The appropriate extension for this format. {Label} - The label assigned to the codec/layer. {Index} - A unique index for thumbnails; only applicable to thumbnails. {Bitrate} - The audio/video bitrate; not applicable to thumbnails. {Codec} - The type of the audio/video codec. Any unsubstituted macros will be collapsed and removed from the filename.
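
To make the macro behavior concrete, here is a rough local approximation in Python. It is not the service's implementation: the function name `expand_filename_pattern`, the sample values, and the choice to replace unsubstituted macros with an empty string are all assumptions made for illustration.

```python
import re

def expand_filename_pattern(pattern, values):
    """Substitute {Macro} tokens; drop any macro that has no supplied value."""
    def substitute(match):
        # Macros without a value are replaced with the empty string.
        return str(values.get(match.group(1), ""))
    return re.sub(r"\{(\w+)\}", substitute, pattern)

# Hypothetical values for a single PNG thumbnail.
name = expand_filename_pattern(
    "Thumbnail-{Basename}-{Index}{Extension}",
    {"Basename": "ignite", "Index": 3, "Extension": ".png"},
)
print(name)
```

In this sketch, a pattern that mentions {Bitrate} in a thumbnail context would simply lose that token, which mirrors the statement that unsubstituted macros are removed from the filename.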

PngImage

Describes the properties for producing a series of PNG images from the input video.

Name Type Description
@odata.type string:
  • #Microsoft.Media.PngImage

The discriminator for derived types.

keyFrameInterval
  • string

The distance between two key frames, thereby defining a group of pictures (GOP). The value should be a non-zero integer in the range [1, 30] seconds, specified in ISO 8601 format. The default is 2 seconds (PT2S).

label
  • string

An optional label for the codec. The label can be used to control muxing behavior.

layers

A collection of output PNG image layers to be produced by the encoder.

range
  • string

The position in the input video at which to stop generating thumbnails. The value can be an absolute timestamp in ISO 8601 format (for example, PT5M30S to stop at 5 minutes and 30 seconds), a frame count (for example, 300 to stop at the 300th frame), or a relative value (for example, 100%).

start
  • string

The position in the input video at which to start generating thumbnails. The value can be an absolute timestamp in ISO 8601 format (for example, PT05S), a frame count (for example, 10 for the 10th frame), or a relative value (for example, 1%). The {Best} macro is also supported; it tells the encoder to select the best thumbnail from the first few seconds of the video.

step
  • string

The interval at which thumbnails are generated. The value can be an absolute timestamp in ISO 8601 format (for example, PT05S for one image every 5 seconds), a frame count (for example, 30 for one image every 30 frames), or a relative value (for example, 1%).

stretchMode

The resizing mode - how the input video will be resized to fit the desired output resolution(s). The default is AutoSize.

PngLayer

Describes the settings to produce a PNG image from the input video.

Name Type Description
@odata.type string:
  • #Microsoft.Media.PngLayer

The discriminator for derived types.

height
  • string

The height of the output video for this layer. The value can be absolute (in pixels) or relative (in percentage). For example, 50% means the output video has half as many pixels in height as the input.

label
  • string

The alphanumeric label for this layer, which can be used in multiplexing different video and audio layers, or in naming the output file.

width
  • string

The width of the output video for this layer. The value can be absolute (in pixels) or relative (in percentage). For example, 50% means the output video has half as many pixels in width as the input.

Priority

Sets the relative priority of the TransformOutputs within a Transform. This sets the priority that the service uses for processing TransformOutputs. The default priority is Normal.

Name Type Description
High
  • string

Used for TransformOutputs that should take precedence over others.

Low
  • string

Used for TransformOutputs that can be generated after Normal and High priority TransformOutputs.

Normal
  • string

Used for TransformOutputs that can be generated at Normal priority.

Rectangle

Describes the properties of a rectangular window applied to the input media before processing it.

Name Type Description
height
  • string

The height of the rectangular region in pixels. This can be an absolute pixel value (for example, 100), or relative to the size of the video (for example, 50%).

left
  • string

The number of pixels from the left margin. This can be an absolute pixel value (for example, 100), or relative to the size of the video (for example, 50%).

top
  • string

The number of pixels from the top margin. This can be an absolute pixel value (for example, 100), or relative to the size of the video (for example, 50%).

width
  • string

The width of the rectangular region in pixels. This can be an absolute pixel value (for example, 100), or relative to the size of the video (for example, 50%).
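
Because each Rectangle property accepts either an absolute pixel value or a percentage, a client may want to preview what a given rectangle means for a known input size. The sketch below is a local helper only: the function names are hypothetical, and it assumes that `left` and `width` are relative to the video width, that `top` and `height` are relative to the video height, and that fractional results are truncated. The service's own rounding is not described here.

```python
from fractions import Fraction

def resolve(value, full_size):
    """Convert '100' or '50%' into a pixel count relative to full_size."""
    text = value.strip()
    if text.endswith("%"):
        # Exact rational arithmetic avoids floating-point surprises.
        return int(full_size * Fraction(text[:-1]) / 100)
    return int(text)

def resolve_rectangle(rect, video_width, video_height):
    """Resolve all four Rectangle properties to pixel values."""
    return {
        "left": resolve(rect["left"], video_width),
        "top": resolve(rect["top"], video_height),
        "width": resolve(rect["width"], video_width),
        "height": resolve(rect["height"], video_height),
    }

crop = {"left": "10%", "top": "10%", "width": "50%", "height": "100"}
pixels = resolve_rectangle(crop, 1920, 1080)
print(pixels)
```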

Rotation

The rotation, if any, to be applied to the input video before it is encoded. The default is Auto.

Name Type Description
Auto
  • string

Automatically detect and rotate as needed.

None
  • string

Do not rotate the video. If the output format supports it, any metadata about rotation is kept intact.

Rotate0
  • string

Do not rotate the video but remove any metadata about the rotation.

Rotate180
  • string

Rotate 180 degrees clockwise.

Rotate270
  • string

Rotate 270 degrees clockwise.

Rotate90
  • string

Rotate 90 degrees clockwise.

StandardEncoderPreset

Describes all the settings to be used when encoding the input video with the Standard Encoder.

Name Type Description
@odata.type string:
  • #Microsoft.Media.StandardEncoderPreset

The discriminator for derived types.

codecs Codec[]:

The list of codecs to be used when encoding the input video.

filters

One or more filtering operations that are applied to the input media before encoding.

formats Format[]:

The list of outputs to be produced by the encoder.
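
A StandardEncoderPreset pairs codecs with formats. As a hedged illustration, the Python below assembles a preset that produces PNG thumbnails, using only types defined on this page (PngImage, PngLayer, PngFormat). The concrete values and the filename pattern are assumptions chosen for the example.

```python
import json

preset = {
    "@odata.type": "#Microsoft.Media.StandardEncoderPreset",
    "codecs": [
        {
            "@odata.type": "#Microsoft.Media.PngImage",
            "start": "PT05S",   # begin 5 seconds into the input
            "step": "PT05S",    # one image every 5 seconds
            "range": "100%",    # continue to the end of the input
            "layers": [
                {
                    "@odata.type": "#Microsoft.Media.PngLayer",
                    "width": "50%",
                    "height": "50%",
                }
            ],
        }
    ],
    "formats": [
        {
            "@odata.type": "#Microsoft.Media.PngFormat",
            "filenamePattern": "{Basename}-{Index}{Extension}",  # hypothetical pattern
        }
    ],
}

preset_json = json.dumps(preset, indent=2)
print(preset_json)
```

The object would be supplied as the preset of a TransformOutput.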

StretchMode

The resizing mode - how the input video will be resized to fit the desired output resolution(s). The default is AutoSize.

Name Type Description
AutoFit
  • string

Pad the output (with either letterbox or pillar box) to honor the output resolution, while ensuring that the active video region in the output has the same aspect ratio as the input. For example, if the input is 1920x1080 and the encoding preset asks for 1280x1280, then the output will be at 1280x1280, which contains an inner rectangle of 1280x720 at an aspect ratio of 16:9, and letterbox regions 280 pixels high at the top and bottom.

AutoSize
  • string

Override the output resolution, and change it to match the display aspect ratio of the input, without padding. For example, if the input is 1920x1080 and the encoding preset asks for 1280x1280, then the value in the preset is overridden, and the output will be at 1280x720, which maintains the input aspect ratio of 16:9.

None
  • string

Strictly respect the output resolution without considering the pixel aspect ratio or display aspect ratio of the input video.
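
The arithmetic behind the AutoSize and AutoFit examples can be checked with a short Python sketch. This is an approximation for reasoning about the modes, not the encoder's algorithm: the function names are hypothetical, square pixels are assumed, and any rounding the encoder applies (for example, to even dimensions) is ignored.

```python
from fractions import Fraction

def fit_inside(in_w, in_h, out_w, out_h):
    """Largest size with the input's aspect ratio that fits in out_w x out_h."""
    scale = min(Fraction(out_w, in_w), Fraction(out_h, in_h))
    return int(in_w * scale), int(in_h * scale)

def auto_size(in_w, in_h, out_w, out_h):
    """AutoSize: override the requested resolution to match the input aspect ratio."""
    return fit_inside(in_w, in_h, out_w, out_h)

def auto_fit(in_w, in_h, out_w, out_h):
    """AutoFit: keep the requested resolution and pad around the active region."""
    inner_w, inner_h = fit_inside(in_w, in_h, out_w, out_h)
    return {
        "output": (out_w, out_h),
        "active": (inner_w, inner_h),
        "pad_left_right": (out_w - inner_w) // 2,
        "pad_top_bottom": (out_h - inner_h) // 2,
    }

print(auto_size(1920, 1080, 1280, 1280))
print(auto_fit(1920, 1080, 1280, 1280))
```

For the 1920x1080 input and 1280x1280 request, the active region is 1280x720 in both modes. AutoFit keeps the 1280x1280 frame and pads 280 pixels above and below the active region, with no padding at the sides.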

Transform

A Transform encapsulates the rules or instructions for generating desired outputs from input media, such as by transcoding or by extracting insights. After the Transform is created, it can be applied to input media by creating Jobs.

Name Type Description
id
  • string

Fully qualified resource ID for the resource.

name
  • string

The name of the resource.

properties.created
  • string

The UTC date and time when the Transform was created, in 'YYYY-MM-DDThh:mm:ssZ' format.

properties.description
  • string

An optional verbose description of the Transform.

properties.lastModified
  • string

The UTC date and time when the Transform was last updated, in 'YYYY-MM-DDThh:mm:ssZ' format.

properties.outputs

An array of one or more TransformOutputs that the Transform should generate.

type
  • string

The type of the resource.
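
As a sketch of how a client might assemble the Create Or Update request for a Transform, the Python below builds the request URL from the documented URI template and a body containing `properties.description` and `properties.outputs`. It does not send the request or handle authentication. The function name `transform_url` is hypothetical, and the identifiers are the placeholder values from the sample request.

```python
import json
from urllib.parse import quote

def transform_url(subscription_id, resource_group, account_name, transform_name,
                  api_version="2018-07-01"):
    """Fill in the documented URI template for the Transform resource."""
    path = "/".join([
        "subscriptions", quote(subscription_id, safe=""),
        "resourceGroups", quote(resource_group, safe=""),
        "providers", "Microsoft.Media",
        "mediaServices", quote(account_name, safe=""),
        "transforms", quote(transform_name, safe=""),
    ])
    return f"https://management.azure.com/{path}?api-version={api_version}"

url = transform_url(
    "00000000-0000-0000-0000-000000000000",
    "contosoresources",
    "contosomedia",
    "createdTransform",
)

body = json.dumps({
    "properties": {
        "description": "Example Transform to illustrate create and update.",
        "outputs": [
            {
                "preset": {
                    "@odata.type": "#Microsoft.Media.BuiltInStandardEncoderPreset",
                    "presetName": "AdaptiveStreaming",
                }
            }
        ],
    }
})

print("PUT", url)
print(body)
```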

TransformOutput

Describes the properties of a TransformOutput, which are the rules to be applied while generating the desired output.

Name Type Description
onError

A Transform can define more than one output. This property defines what the service should do when one output fails: either continue to produce the other outputs, or stop them. The overall Job state will not reflect failures of outputs that are specified with 'ContinueJob'. The default is 'StopProcessingJob'.

preset Preset:

Preset that describes the operations that will be used to modify, transcode, or extract insights from the source file to generate the output.

relativePriority

Sets the relative priority of the TransformOutputs within a Transform. This sets the priority that the service uses for processing TransformOutputs. The default priority is Normal.
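
The following Python is one possible way to construct a TransformOutput while guarding against misspelled enumeration values. The helper name `transform_output` is an assumption; the allowed values are the ones listed under OnErrorType and Priority, and the defaults match the documented defaults.

```python
ON_ERROR_VALUES = ("ContinueJob", "StopProcessingJob")
PRIORITY_VALUES = ("Low", "Normal", "High")

def transform_output(preset, on_error="StopProcessingJob", relative_priority="Normal"):
    """Build a TransformOutput, rejecting values outside the documented enums."""
    if on_error not in ON_ERROR_VALUES:
        raise ValueError(f"onError must be one of {ON_ERROR_VALUES}")
    if relative_priority not in PRIORITY_VALUES:
        raise ValueError(f"relativePriority must be one of {PRIORITY_VALUES}")
    return {
        "onError": on_error,
        "relativePriority": relative_priority,
        "preset": preset,
    }

# An output whose failure should not stop the other outputs of the Transform.
output = transform_output(
    {
        "@odata.type": "#Microsoft.Media.BuiltInStandardEncoderPreset",
        "presetName": "AdaptiveStreaming",
    },
    on_error="ContinueJob",
    relative_priority="High",
)
print(output)
```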

TransportStreamFormat

Describes the properties for generating MPEG-2 Transport Stream (ISO/IEC 13818-1) output video file(s).

Name Type Description
@odata.type string:
  • #Microsoft.Media.TransportStreamFormat

The discriminator for derived types.

filenamePattern
  • string

The pattern of the file names for the generated output files. The following macros are supported in the file name: {Basename} - The base name of the input video. {Extension} - The appropriate extension for this format. {Label} - The label assigned to the codec/layer. {Index} - A unique index for thumbnails; only applicable to thumbnails. {Bitrate} - The audio/video bitrate; not applicable to thumbnails. {Codec} - The type of the audio/video codec. Any unsubstituted macros will be collapsed and removed from the filename.

outputFiles

The list of output files to produce. Each entry in the list is a set of audio and video layer labels to be muxed together.

Video

Describes the basic properties for encoding the input video.

Name Type Description
@odata.type string:
  • #Microsoft.Media.Video

The discriminator for derived types.

keyFrameInterval
  • string

The distance between two key frames, thereby defining a group of pictures (GOP). The value should be a non-zero integer in the range [1, 30] seconds, specified in ISO 8601 format. The default is 2 seconds (PT2S).

label
  • string

An optional label for the codec. The label can be used to control muxing behavior.

stretchMode

The resizing mode - how the input video will be resized to fit the desired output resolution(s). The default is AutoSize.
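
Since keyFrameInterval is a whole number of seconds in the range [1, 30] expressed as an ISO 8601 duration, a small helper can produce valid values. The function name `key_frame_interval` is hypothetical, and the sketch covers only the seconds-only form shown in the description (such as PT2S).

```python
def key_frame_interval(seconds=2):
    """Format a GOP length in whole seconds as an ISO 8601 duration."""
    if not isinstance(seconds, int) or not 1 <= seconds <= 30:
        raise ValueError("keyFrameInterval must be an integer from 1 to 30 seconds")
    return f"PT{seconds}S"

video = {
    "@odata.type": "#Microsoft.Media.Video",
    "keyFrameInterval": key_frame_interval(2),  # the documented default
    "stretchMode": "AutoSize",
}
print(video)
```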

VideoAnalyzerPreset

A video analyzer preset that extracts insights (rich metadata) from both audio and video, and outputs a JSON format file.

Name Type Description
@odata.type string:
  • #Microsoft.Media.VideoAnalyzerPreset

The discriminator for derived types.

audioLanguage
  • string

The language for the audio payload in the input, using the BCP-47 format of 'language tag-region' (for example, 'en-US'). The supported languages are English ('en-US' and 'en-GB'), Spanish ('es-ES' and 'es-MX'), French ('fr-FR'), Italian ('it-IT'), Japanese ('ja-JP'), Portuguese ('pt-BR'), Chinese ('zh-CN'), German ('de-DE'), Arabic ('ar-EG' and 'ar-SY'), Russian ('ru-RU'), Hindi ('hi-IN'), and Korean ('ko-KR'). If you know the language of your content, it is recommended that you specify it. If the language isn't specified or is set to null, automatic language detection chooses the first language detected and uses it for the duration of the file. Automatic detection currently supports English, Chinese, French, German, Italian, Japanese, Spanish, Russian, and Portuguese. It does not currently support dynamically switching between languages after the first language is detected. Automatic detection works best with audio recordings that contain clearly discernible speech. If automatic detection fails to find the language, transcription falls back to 'en-US'.

insightsToExtract

Defines the type of insights that you want the service to generate. The allowed values are 'AudioInsightsOnly', 'VideoInsightsOnly', and 'AllInsights'. The default is AllInsights. If you set this to AllInsights and the input is audio only, then only audio insights are generated. Similarly, if the input is video only, then only video insights are generated. It is recommended that you not use AudioInsightsOnly if you expect some of your inputs to be video only, or VideoInsightsOnly if you expect some of your inputs to be audio only. Jobs in such conditions would fail with an error.
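
A VideoAnalyzerPreset can be sketched in Python as shown below. The helper name `video_analyzer_preset` is an assumption. Omitting `audioLanguage` when no language is given reflects the description above, which says automatic detection is used when the language isn't specified.

```python
INSIGHT_TYPES = ("AudioInsightsOnly", "VideoInsightsOnly", "AllInsights")

def video_analyzer_preset(audio_language=None, insights="AllInsights"):
    """Build a VideoAnalyzerPreset; leave audioLanguage out to rely on auto-detection."""
    if insights not in INSIGHT_TYPES:
        raise ValueError(f"insightsToExtract must be one of {INSIGHT_TYPES}")
    preset = {
        "@odata.type": "#Microsoft.Media.VideoAnalyzerPreset",
        "insightsToExtract": insights,
    }
    if audio_language is not None:
        preset["audioLanguage"] = audio_language
    return preset

# Known language, all insights (safe for audio-only and video-only inputs alike).
preset = video_analyzer_preset(audio_language="en-US")
print(preset)
```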

VideoOverlay

Describes the properties of a video overlay.

Name Type Description
@odata.type string:
  • #Microsoft.Media.VideoOverlay

The discriminator for derived types.

audioGainLevel
  • number

The gain level of audio in the overlay. The value should be in the range [0, 1.0]. The default is 1.0.

cropRectangle

An optional rectangular window used to crop the overlay image or video.

end
  • string

The position in the input video at which the overlay ends. The value should be in ISO 8601 duration format. For example, PT30S to end the overlay at 30 seconds into the input video. If not specified, the overlay is applied until the end of the input video if inputLoop is true. If inputLoop is false, the overlay lasts as long as the duration of the overlay media.

fadeInDuration
  • string

The duration over which the overlay fades in onto the input video. The value should be in ISO 8601 duration format. If not specified, the default behavior is to have no fade in (same as PT0S).

fadeOutDuration
  • string

The duration over which the overlay fades out of the input video. The value should be in ISO 8601 duration format. If not specified, the default behavior is to have no fade out (same as PT0S).

inputLabel
  • string

The label of the job input which is to be used as an overlay. The Input must specify exactly one file. You can specify an image file in JPG or PNG formats, or an audio file (such as a WAV, MP3, WMA or M4A file), or a video file. See https://aka.ms/mesformats for the complete list of supported audio and video file formats.

opacity
  • number

The opacity of the overlay. This is a value in the range [0, 1.0]. The default is 1.0, which means the overlay is opaque.

position

The location in the input video where the overlay is applied.

start
  • string

The start position, with reference to the input video, at which the overlay starts. The value should be in ISO 8601 format. For example, PT05S to start the overlay at 5 seconds into the input video. If not specified, the overlay starts from the beginning of the input video.
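
To illustrate the VideoOverlay properties together, the Python sketch below builds an overlay object and checks the two numeric ranges stated above. The helper name `video_overlay`, the `logo` input label, and the timing values are assumptions made for the example.

```python
def video_overlay(input_label, start=None, end=None, fade_in=None, fade_out=None,
                  opacity=1.0, audio_gain_level=1.0):
    """Build a VideoOverlay; optional properties are omitted when not given."""
    for name, value in (("opacity", opacity), ("audioGainLevel", audio_gain_level)):
        if not 0 <= value <= 1.0:
            raise ValueError(f"{name} must be in the range [0, 1.0]")
    overlay = {
        "@odata.type": "#Microsoft.Media.VideoOverlay",
        "inputLabel": input_label,
        "opacity": opacity,
        "audioGainLevel": audio_gain_level,
    }
    optional = {
        "start": start,
        "end": end,
        "fadeInDuration": fade_in,
        "fadeOutDuration": fade_out,
    }
    overlay.update({key: value for key, value in optional.items() if value is not None})
    return overlay

# A semi-transparent overlay shown from 5 to 30 seconds, with 1-second fades.
overlay = video_overlay(
    "logo",  # hypothetical job input label
    start="PT05S",
    end="PT30S",
    fade_in="PT1S",
    fade_out="PT1S",
    opacity=0.8,
)
print(overlay)
```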