MinibatchSource class

Definition

MinibatchSource(deserializers, max_samples=cntk.io.INFINITELY_REPEAT, max_sweeps=cntk.io.INFINITELY_REPEAT, randomization_window_in_chunks=cntk.io.DEFAULT_RANDOMIZATION_WINDOW_IN_CHUNKS, randomization_window_in_samples=0, randomization_seed=0, trace_level=cntk.logging.get_trace_level(), multithreaded_deserializer=None, frame_mode=False, truncation_length=0, randomize=True)
Parameters
deserializers
a single deserializer or a list of deserializers

The deserializers to be used in the composite reader.

max_samples
int, defaults to cntk.io.INFINITELY_REPEAT

The maximum number of input samples (not 'label samples') the reader can produce. After this number has been reached, the reader returns empty minibatches on subsequent calls to next_minibatch. max_samples and max_sweeps are mutually exclusive; an exception is raised if both have non-default values. Important: the distinction between input samples and label samples is described in the CNTK documentation.

max_sweeps
int, defaults to cntk.io.INFINITELY_REPEAT

The maximum number of sweeps over the input dataset. After this number has been reached, the reader returns empty minibatches on subsequent calls to next_minibatch. max_samples and max_sweeps are mutually exclusive; an exception is raised if both have non-default values.

randomization_window_in_chunks
int, defaults to cntk.io.DEFAULT_RANDOMIZATION_WINDOW_IN_CHUNKS

Size of the randomization window in chunks; a non-zero value enables randomization. randomization_window_in_chunks and randomization_window_in_samples are mutually exclusive; an exception is raised if both have non-zero values.

randomization_window_in_samples
int, defaults to 0

Size of the randomization window in samples; a non-zero value enables randomization. randomization_window_in_chunks and randomization_window_in_samples are mutually exclusive; an exception is raised if both have non-zero values.

randomization_seed
int, defaults to 0

initial randomization seed value (incremented every sweep when the input data is re-randomized).

trace_level
an instance of cntk.logging.TraceLevel

the output verbosity level, defaults to the current logging verbosity level given by get_trace_level.

multithreaded_deserializer
bool

Specifies whether deserialization is done on a single thread or on multiple threads. Defaults to None, which is effectively "auto" (multithreading is disabled unless an ImageDeserializer is present in the deserializers list). False and True explicitly turn multithreading off and on.

frame_mode
bool, defaults to False

Switches frame mode on and off. If frame mode is enabled, the input data is processed as individual frames, ignoring all sequence information. This option cannot be used for BPTT: an exception is raised if frame mode is enabled and the truncation length is non-zero.

truncation_length
int, defaults to 0

Truncation length in samples; a non-zero value enables truncation. Truncation is only applicable for BPTT and cannot be used in frame mode: an exception is raised if frame mode is enabled and the truncation length is non-zero.

randomize
bool, defaults to True

Enables or disables randomization. Use randomization_window_in_chunks or randomization_window_in_samples to specify the randomization range.
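The mutual-exclusion rules described for max_samples/max_sweeps and for frame_mode/truncation_length can be summarized in a short sketch. This is not CNTK's own validation code: the function name check_source_args, the INFINITELY_REPEAT placeholder object, and the choice of ValueError are assumptions made for illustration only.

```python
# Placeholder standing in for cntk.io.INFINITELY_REPEAT (assumption: the real
# constant's value is not reproduced here, only its role as the default).
INFINITELY_REPEAT = object()


def check_source_args(max_samples=INFINITELY_REPEAT,
                      max_sweeps=INFINITELY_REPEAT,
                      frame_mode=False,
                      truncation_length=0):
    """Hypothetical helper: reject the combinations the parameter docs rule out."""
    # max_samples and max_sweeps may not both be set to non-default values.
    if max_samples is not INFINITELY_REPEAT and max_sweeps is not INFINITELY_REPEAT:
        raise ValueError("max_samples and max_sweeps are mutually exclusive")
    # Frame mode discards sequence information, so it cannot be combined
    # with a non-zero BPTT truncation length.
    if frame_mode and truncation_length != 0:
        raise ValueError("frame_mode cannot be used with a non-zero truncation_length")


check_source_args(max_sweeps=1)          # accepted: only one limit is set
check_source_args(truncation_length=20)  # accepted: truncation without frame mode
```

Passing both max_samples and max_sweeps, or frame_mode=True together with a non-zero truncation_length, raises in this sketch, mirroring the documented behavior.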

Methods

get_checkpoint_state

Gets the checkpoint state of the MinibatchSource.

get_checkpoint_state()
Parameters
self
Returns

A dict that has the checkpoint state of the MinibatchSource

next_minibatch

Reads a minibatch that contains data for all input streams. The minibatch size is specified as a number of samples and/or sequences for the primary input stream; a value of 0 for either means unspecified. If the size is specified in terms of both sequences and samples, the smaller of the two is taken. An empty map is returned when the MinibatchSource has no more data to return.

next_minibatch(minibatch_size_in_samples, input_map=None, device=None, num_data_partitions=None, partition_index=None)
Parameters
minibatch_size_in_samples
int

The number of samples to retrieve for the next minibatch. Must be > 0. Important: a full description of this parameter is given in the CNTK documentation.

input_map
dict

mapping of Variable to StreamInformation which will be used to convert the returned data.

device
DeviceDescriptor, defaults to None

CNTK DeviceDescriptor

num_data_partitions
int, defaults to None

Used for distributed training, indicates into how many partitions the source should split the data.

partition_index
int, defaults to None

Used for distributed training, indicates data from which partition to take.

Returns

A mapping of StreamInformation to MinibatchData if input_map was not specified. Otherwise, the returned value is a mapping of Variable to MinibatchData. When the maximum number of sweeps or samples is exhausted, the return value is an empty dict.
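Because next_minibatch returns an empty dict once the source is exhausted, a reading loop can simply stop on the first empty result. The following is a minimal sketch of that pattern. The helper name drain is hypothetical, and the code assumes that each returned MinibatchData object exposes a num_samples attribute.

```python
def drain(source, minibatch_size_in_samples, input_map=None):
    """Hypothetical helper: read until the source is exhausted.

    Returns a (number of minibatches, number of samples) tuple.
    Only terminates for a finite source (max_samples or max_sweeps set).
    """
    num_minibatches = 0
    num_samples = 0
    while True:
        minibatch = source.next_minibatch(minibatch_size_in_samples,
                                          input_map=input_map)
        if not minibatch:
            # Empty dict: the sample or sweep limit has been reached.
            break
        num_minibatches += 1
        # Streams may carry different sample counts; count the largest one.
        # Assumption: MinibatchData has a num_samples attribute.
        num_samples += max(data.num_samples for data in minibatch.values())
    return num_minibatches, num_samples
```

With a source created using the default limits (both set to INFINITELY_REPEAT), this loop would never terminate, so it is only meaningful when max_samples or max_sweeps has been set.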

restore_from_checkpoint

Restores the MinibatchSource state from the specified checkpoint.

restore_from_checkpoint(checkpoint)
Parameters
checkpoint
dict

checkpoint to restore from
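get_checkpoint_state and restore_from_checkpoint are meant to be used as a pair. A minimal sketch of the round trip follows; the helper name read_twice_from_same_position is hypothetical, and the expectation that the second read resumes from the saved position is an assumption based on the method descriptions above.

```python
def read_twice_from_same_position(source, minibatch_size_in_samples):
    """Hypothetical helper: save the position, read, rewind, and read again."""
    # Capture the current state as a dict.
    state = source.get_checkpoint_state()
    first = source.next_minibatch(minibatch_size_in_samples)
    # Rewind the source to the saved state before reading again.
    source.restore_from_checkpoint(state)
    second = source.next_minibatch(minibatch_size_in_samples)
    return first, second
```

In a training loop, the same state dict would typically be stored alongside the model checkpoint so that data reading can resume where training stopped.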

stream_info

Gets the description of the stream with the given name. Raises an exception if no stream, or more than one stream, has this name.

stream_info(name)
Parameters
name
str

stream name to fetch

Returns

The StreamInformation for the given stream name.
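One use of stream_info is to build the input_map argument of next_minibatch, which maps each Variable to the StreamInformation of the stream that feeds it. The sketch below shows one way to do this; the helper name build_input_map and the stream names in the usage comment are hypothetical.

```python
def build_input_map(source, variables_by_stream_name):
    """Hypothetical helper: map each variable to its stream's StreamInformation.

    variables_by_stream_name is a dict such as
    {"features": features_var, "labels": labels_var}, where the keys are
    stream names known to the source (these names are illustrative only).
    """
    return {variable: source.stream_info(name)
            for name, variable in variables_by_stream_name.items()}
```

The resulting dict can be passed as input_map to next_minibatch, in which case the returned minibatch is keyed by the variables rather than by StreamInformation objects.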

stream_infos

Describes the streams this minibatch source produces.

stream_infos()
Parameters
self
Returns

A list of instances of StreamInformation

Attributes

current_position

Gets the current position in the minibatch source.

Parameters
getter
cntk.cntk_py.Dictionary

minibatch position on the global timeline.

setter
cntk.cntk_py.Dictionary

position returned by the getter

is_distributed

Whether the minibatch source is running distributed.

streams

Describes the streams this minibatch source produces.

Returns

A dict mapping input names to instances of StreamInformation