CNTK FAQ

On this page we answer some of the most common questions we receive about CNTK and related subjects.

What is CNTK?

CNTK, the Microsoft Cognitive Toolkit, is a framework for deep learning. A Computational Network defines the function to be learned as a directed graph where each leaf node consists of an input value or parameter, and each non-leaf node represents a matrix or tensor operation upon its children. The beauty of CNTK is that once a computational network has been described, all the computation required to learn the network parameters is taken care of automatically. There is no need to derive gradients analytically or to code the interactions between variables for backpropagation.
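
As a minimal sketch of that idea in the CNTK Python API (the variable names and shapes here are purely illustrative):

```python
import numpy as np
import cntk as C

# Leaf nodes: an input value and a learnable parameter.
x = C.input_variable(2)
w = C.parameter(shape=(2, 1), init=C.glorot_uniform())

# Non-leaf nodes: tensor operations on their children.
z = C.reduce_sum(C.times(x, w))

# The gradient of z with respect to w is derived automatically;
# no hand-written backpropagation code is needed.
grads = z.grad({x: np.asarray([[1.0, 2.0]], dtype=np.float32)}, wrt=[w])
print(grads)
```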

Why did Microsoft develop CNTK?

We first created CNTK for ourselves. CNTK was developed for the fastest training on the biggest data sets, and many of Microsoft's critical services run on models trained with CNTK. The results were so positive that we wanted to share the toolkit with the world.

How can I give feedback?

Give us feedback through these channels.

Training deep learning models can be time-intensive. Can CNTK help with this?

For mission-critical AI research, we believe efficiency and performance are important criteria. CNTK was designed for peak performance not only on CPUs but also in single-GPU, multi-GPU, and multi-machine, multi-GPU scenarios. Additionally, Microsoft's 1-bit SGD compression and Block Momentum techniques dramatically reduce communication costs, enabling highly scalable parallel training on a large number of GPUs spanning multiple machines.
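
As a hedged sketch of how this looks in the Python API, a data-parallel learner with 1-bit quantization might be set up roughly as follows (this assumes a CNTK build with 1-bit SGD enabled, and that `model`, `loss`, and `metric` have already been defined):

```python
import cntk as C

# Wrap an ordinary learner so gradients are exchanged across workers
# with 1-bit quantization, dramatically reducing communication cost.
local_learner = C.learners.momentum_sgd(
    model.parameters,
    lr=C.learning_rate_schedule(0.01, C.UnitType.minibatch),
    momentum=C.momentum_schedule(0.9))

distributed_learner = C.train.distributed.data_parallel_distributed_learner(
    local_learner, num_quantization_bits=1)

# Block Momentum is an alternative for very large numbers of workers:
# distributed_learner = C.train.distributed.block_momentum_distributed_learner(
#     local_learner, block_size=10000)

trainer = C.Trainer(model, (loss, metric), [distributed_learner])
# ... run the usual training loop on each worker, then:
C.train.distributed.Communicator.finalize()
```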

Is CNTK flexible enough for my own network?

In addition to a wide variety of built-in computation nodes, CNTK provides a plug-in architecture that allows users to define their own computation nodes. So if your workload requires special customization, CNTK makes that easy to do. Readers are also fully customizable, allowing support for arbitrary input formats.
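
As a sketch of what a user-defined node can look like in the Python API, following the UserFunction extension pattern (the sigmoid here is purely illustrative):

```python
import numpy as np
import cntk as C
from cntk.ops.functions import UserFunction

class MySigmoid(UserFunction):
    """A user-defined computation node with its own forward and backward pass."""
    def __init__(self, arg, name='MySigmoid'):
        super(MySigmoid, self).__init__([arg], name=name)

    def forward(self, argument, device=None, outputs_to_retain=None):
        sigmoid_x = 1.0 / (1.0 + np.exp(-argument))
        return sigmoid_x, sigmoid_x              # (state for backward, output)

    def backward(self, state, root_gradients):
        sigmoid_x = state
        return root_gradients * sigmoid_x * (1.0 - sigmoid_x)

    def infer_outputs(self):
        return [C.output_variable(self.inputs[0].shape, self.inputs[0].dtype,
                                  self.inputs[0].dynamic_axes)]

# The custom node is wired into a graph like any built-in op.
x = C.input_variable(3)
y = C.user_function(MySigmoid(x))
```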

What are the key training algorithms supported by CNTK?

Today CNTK supports the following algorithms:

  • Feed Forward
  • CNN
  • RNN
  • LSTM
  • Sequence-to-Sequence

Who are the people behind CNTK?

CNTK is developed by Microsoft's Technology and Research division. Additionally, CNTK receives major contributions from nearly all of Microsoft's production teams.

When did work begin on CNTK?

The development of CNTK has been underway since late 2014.

Is CNTK only optimized for speech recognition training?

No. CNTK is used in production for speech recognition as well as for image and text training.

How can I use CNTK?

Using CNTK is easy and straightforward. Here are some ways to get started.

Why does CNTK randomize the mini-batches after each epoch?

Doing so prevents the same samples from always appearing together in a mini-batch, which typically improves validation accuracy.

Can the built-in readers be used to train a network model using multiple input files?

Yes. See the description in Understanding and Extending Readers and look for the section describing how to "compose several data deserializers".
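
As a rough sketch in the Python API, composing two deserializers over separate input files might look like this (file names, stream names, and shapes are placeholders):

```python
from cntk.io import MinibatchSource, CTFDeserializer, StreamDef, StreamDefs

# One deserializer per input file; each exposes its own named streams.
features_src = CTFDeserializer('features.ctf', StreamDefs(
    features=StreamDef(field='x', shape=784, is_sparse=False)))
labels_src = CTFDeserializer('labels.ctf', StreamDefs(
    labels=StreamDef(field='y', shape=10, is_sparse=False)))

# A single MinibatchSource can be composed from several deserializers.
reader = MinibatchSource([features_src, labels_src], randomize=True)
```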

How are sequences handled in CNTK?

See this article Working with Sequences.
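
Very briefly, a sequence input and a recurrence over it can be declared along these lines in the Python API (dimensions are illustrative):

```python
import cntk as C

# A variable whose samples are variable-length sequences of 100-dim vectors.
seq_input = C.sequence.input_variable(100)

# An LSTM applied step by step along the sequence axis.
model = C.layers.Sequential([
    C.layers.Recurrence(C.layers.LSTM(200)),
    C.sequence.last,          # keep only the final step of each sequence
    C.layers.Dense(10)
])(seq_input)
```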

With dropout, is the chosen set of hidden units omitted for the entire minibatch, given that parameter updates take place only after minibatchSize samples?

Typically a different set of hidden units is set to 0 for each sample in the same minibatch. For recurrent neural networks, some practitioners constrain the mask so that the same set of hidden units is dropped across all time steps of the same sequence.

The dropout documentation talks about hidden units. Does dropout also apply to the units of convolutional layers when a model has several of them?

A convolutional layer is also a hidden layer as long as it is not the final output layer, so dropout can be applied to its units as well.

Is there a way to specify a different dropout rate for different layers?

In CNTK, you need to indicate dropout explicitly. For example, if your original model has h2=W1*h1 and you want to apply dropout to h1, you need to change it to h2=W1*Dropout(h1). Currently, in BrainScript CNTK only allows you to specify one dropout rate for all dropout nodes in the same model; however, it does allow you to change the dropout rate across epochs. In the Python API you can specify a different dropout rate for each layer, as sketched below.
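
For instance, a model with a different dropout rate per layer might be written roughly like this in the Python API (rates and layer sizes are illustrative):

```python
import cntk as C

features = C.input_variable(784)

model = C.layers.Sequential([
    C.layers.Dense(512, activation=C.relu),
    C.layers.Dropout(0.2),    # lower rate after the first layer
    C.layers.Dense(512, activation=C.relu),
    C.layers.Dropout(0.5),    # higher rate deeper in the network
    C.layers.Dense(10)
])(features)
```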

As an aside, we are not big fans of using different values for different dropout rates or model initialization values. Doing so significantly increases the number of hyperparameters to be tuned, and if a production model depends on such tuning, it suggests the model lacks engineering stability.

Can we apply weight constraints instead of an L2 regularization term?

Yes. You can compute any quantity based on the weights, add it to your core criterion, and use the combined criterion as your training objective.
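
For example, an L2 penalty on the weights can be folded into the training criterion along these lines (a sketch; `z`, `labels`, and the parameter `W` are assumed to be defined elsewhere):

```python
import cntk as C

# Core criterion plus a hand-built penalty computed from the weights.
cross_entropy = C.cross_entropy_with_softmax(z, labels)
l2_penalty = 0.001 * C.reduce_sum(C.square(W))
training_criterion = cross_entropy + l2_penalty
```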

Can the Python API read models trained with BrainScript?

Yes, you can load models trained with BrainScript.
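
A minimal sketch (the model path is a placeholder):

```python
import cntk as C

# Load a model that was trained and saved with BrainScript.
model = C.load_model('path/to/brainscript_trained.model')

# Inspect and evaluate it like any model built from Python.
print([arg.name for arg in model.arguments])
print([out.name for out in model.outputs])
```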