01/16/2019

July 2017

Volume 32 Number 7

[Machine Learning]

Introduction to the Microsoft CNTK v2.0 Library

Disclaimer: CNTK version 2.0 is in Release Candidate mode. All information is subject to change.

The Microsoft Cognitive Toolkit (CNTK) is a powerful, open source library that can be used to create machine learning prediction models. In particular, CNTK can create deep neural networks, which are at the forefront of artificial intelligence efforts such as Cortana and self-driving automobiles.

CNTK version 2.0 is much, much different from version 1. At the time I’m writing this article, version 2.0 is in Release Candidate mode. By the time you read this, there will likely be some minor changes to the code base, but I’m confident they won’t affect the demo code presented here very much.

In this article, I’ll explain how to install CNTK v2.0, and how to create, train and make predictions with a simple neural network. A good way to see where this article is headed is to take a look at the screenshot in Figure 1.

Figure 1 CNTK v2.0 in Action

The CNTK library is written in C++ for performance reasons, but v2.0 has a new Python language API, which is now the preferred way to use the library. I invoke the iris_demo.py program by typing the following in an ordinary Windows 10 command shell:

> python iris_demo.py 2>nul

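The 2>nul at the end of the command isn’t an argument to the program. It’s a shell redirection that sends the standard error stream to the null device, which suppresses error messages. I do this only to avoid displaying the approximately 12 lines of CNTK build information that would otherwise be shown.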

The goal of the demo program is to create a neural network that can predict the species of an iris flower, using the well-known Iris Data Set. The raw data items look like this:

5.0 3.5 1.3 0.3 setosa
5.5 2.6 4.4 1.2 versicolor
6.7 3.1 5.6 2.4 virginica

There are 150 data items, 50 of each of three species: setosa, versicolor and virginica. The first four values on each line are the predictor values, often called attributes or features. The item-to-predict is often called the class or the label. The first two feature values are a flower’s sepal length and width (a sepal is a leaf-like structure). The next two values are the petal length and width.

Neural networks work only with numeric values, so the data files used by the demo encode species as setosa = (1,0,0), versicolor = (0,1,0) and virginica = (0,0,1).
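This scheme is commonly called one-hot encoding. A minimal sketch of the mapping follows; the helper name encode_species is hypothetical and isn't part of the demo program:

```python
# Hypothetical helper, not part of iris_demo.py: one-hot encode a species name.
SPECIES = ["setosa", "versicolor", "virginica"]

def encode_species(name):
    # Return a tuple with 1 in the species' position and 0 elsewhere.
    return tuple(1 if s == name else 0 for s in SPECIES)

print(encode_species("versicolor"))  # (0, 1, 0)
```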

The demo program creates a 4-2-3 neural network; that is, a network with four input nodes for the feature values, two hidden processing nodes and three output nodes for the label values. The number of input and output nodes for a neural network classifier is determined by the structure of your data, but the number of hidden processing nodes is a free parameter and must be determined by trial and error.

You can think of a neural network as a complex mathematical prediction equation. Neural network training is the process of determining the constants that define the equation. Training is an iterative process and the demo performs 5,000 training iterations, using 120 of the 150 iris data items.

After training, the prediction model is applied to the 30 iris data items that were held out of the training process. The model had a classification error of 0.0667, which means that the model incorrectly predicted the species of 0.0667 * 30 = 2 flower items and, therefore, correctly predicted 28 items. The classification error on a holdout test set is a very rough estimate of how well you’d expect the model to do when presented with a set of new, previously unseen data items.

Next, the demo program uses the trained neural network to predict the species of a flower with features (6.9, 3.1, 4.6, 1.3). The prediction is computed and displayed in terms of probabilities: (0.263, 0.682, 0.055). Notice that the three values sum to 1.0. Because the middle value, 0.682, is the largest, the prediction maps to (0,1,0), which in turn maps to versicolor.

The remainder of the output shown in Figure 1 displays the values of the constants that define the neural network prediction model. I’ll explain where those values come from, and what they can be used for, shortly.

This article makes no particular assumptions about your knowledge of neural networks, or CNTK or Python. Regardless of your background, you should be able to follow along without too much trouble. The complete source code for the iris_demo.py program is presented in this article, and is also available in the accompanying download.

Installing CNTK v2.0

There are several ways to install CNTK, but I’ll describe the simplest approach. The first step is to install a CNTK-compatible version of Anaconda onto your Windows machine.

At the time I wrote this article, CNTK v2.0 RC1 required Anaconda (with Python 3), version 4.1.1, 64-bit, which contains Python version 3.5.2 and NumPy 1.11.1. So I went to the Anaconda Download site (which you can easily find with an Internet search), then to the archives page and found a self-extracting executable installer file named Anaconda3-4.1.1-Windows-x86_64.exe and double-clicked on it.

The CNTK library and documentation are hosted on GitHub at github.com/Microsoft/CNTK. I strongly advise you to review the current CNTK system requirements, especially the version of Anaconda, before trying to install CNTK.

The Anaconda install process is very slick and I accepted all the default installation options. You might want to take note of the Anaconda installation location because CNTK will go there, too.

By far the easiest way to install CNTK is indirectly, by using the Python pip utility program. In other words, you don’t need to go to the CNTK site to install it, though you do need to go to the CNTK installation directions to determine the correct installation URL. In my case that URL was: https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0rc1-cp35-cp35m-win_amd64.whl

The URL you’ll want to use will definitely be different by the time you read this article. If you’re new to Python, you can think of a .WHL file (pronounced “wheel”) as somewhat similar to a Windows .MSI installer file. Notice the CPU-Only part of the URL. If you have a machine with a supported GPU, you can use it with a dual CPU-GPU version of CNTK.

Once you determine the correct URL, all you have to do is launch an ordinary Windows command shell and type:

> pip install <url>

Installation is very quick, and files are placed in the Anaconda directory tree. Any install errors will be immediately and painfully obvious. To check that installation succeeded, type the following at a command prompt, and you should see the CNTK version displayed:

> python -c "import cntk; print(cntk.__version__)"

Understanding Neural Networks

CNTK operates at a relatively low level. To understand how to use CNTK to create a neural network prediction model, you have to understand the basic mechanics of neural networks. The diagram in Figure 2 corresponds to the demo program.

Figure 2 Neural Network Input-Output Mechanism

The network input layer has four nodes and holds the sepal length and width (6.9, 3.1) and the petal length and width (4.6, 1.3) of a flower of an unknown species. The eight arrows connecting each of the four input nodes to the two hidden processing nodes represent numeric constants called weights. If nodes are 0-base indexed with [0] at the top, then the input-to-hidden weight from input[0] to hidden[0] is 0.6100 and so on.

Similarly, the six arrows connecting the two hidden nodes to the three output nodes are hidden-to-output weights. The two small arrows pointing into the two hidden nodes are special weights called biases. Similarly, the three output nodes each have a bias value.

The first step in the neural network input-output mechanism is to compute the values of the hidden nodes. The value in each hidden node is the hyperbolic tangent of the sum of products of input values and associated weights, plus the bias. For example:

hidden[0] = tanh( (6.9)(0.6100) +
                  (3.1)(0.7152) +
                  (4.6)(-1.0855) +
                  (1.3)(-1.0687) + 0.1468 )
          = tanh(0.1903)
          = 0.1882

The value of the hidden[1] node is calculated in the same way. The hyperbolic tangent function, abbreviated tanh, is called the hidden layer activation function. The tanh function accepts any value, from negative infinity to positive infinity, and returns a value between -1.0 and +1.0. There are several choices of activation functions supported by CNTK. The three most common are tanh, logistic sigmoid and rectified linear unit (ReLU).
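The hidden[0] calculation can be reproduced with a few lines of plain Python. This is only a sketch that uses the input values, weights and bias from the worked example; because the displayed weights are rounded to four decimals, the final value can differ from the one in the text in the last digit or so:

```python
import math

# Input values and input-to-hidden[0] weights taken from the worked example.
inputs = [6.9, 3.1, 4.6, 1.3]
weights_to_h0 = [0.6100, 0.7152, -1.0855, -1.0687]
bias_h0 = 0.1468

# Sum of products plus bias, then the tanh activation.
pre_activation = sum(x * w for x, w in zip(inputs, weights_to_h0)) + bias_h0
hidden0 = math.tanh(pre_activation)

print("%.4f" % pre_activation)  # 0.1903
print("%.4f" % hidden0)
```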

Computing the output node values is similar to the process used to compute hidden nodes, but a different activation function, called softmax, is used. The first step is to compute the sum of products plus bias for all three output nodes:

pre-output[0] = (0.1882)(3.2200) + (0.9999)(-0.8545) + 0.1859
              = -0.0625
pre-output[1] = (0.1882)(-0.7311) + (0.9999)(0.3553) + 0.6735
              = 0.8912
pre-output[2] = (0.1882)(-4.1944) + (0.9999)(0.0244) + (-0.8595)
              = -1.6246

The softmax value of one of a set of three values is the exp function applied to the value, divided by the sum of the exp function applied to all three values. So the final output node values are computed as:

output[0] = exp(-0.0625) / (exp(-0.0625) + exp(0.8912) + exp(-1.6246))
          = 0.263
output[1] = exp(0.8912) / (exp(-0.0625) + exp(0.8912) + exp(-1.6246))
          = 0.682
output[2] = exp(-1.6246) / (exp(-0.0625) + exp(0.8912) + exp(-1.6246))
          = 0.055

The purpose of softmax is to coerce the preliminary output values so they sum to 1.0 and can be interpreted as probabilities.
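The softmax step can be sketched in plain Python as follows. The function below is illustrative and isn't CNTK's implementation; subtracting the maximum value before calling exp is a common numerical safeguard that I've added, and it doesn't change the result:

```python
import math

def softmax(values):
    # Subtracting the maximum before exp() avoids overflow for large inputs
    # and does not change the result.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

pre_output = [-0.0625, 0.8912, -1.6246]  # values from the worked example
output = softmax(pre_output)
print(["%.3f" % v for v in output])  # ['0.263', '0.682', '0.055']
```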

OK, but where do the values of the weights and biases come from? To get the values of the weights and biases, you must train the network using a set of data that has known input values and known, correct, output values. The idea is to use an optimization algorithm that finds the values for the weights and biases that minimize the difference between the computed output values and the correct output values.

Demo Program Structure

The overall structure of the demo program is shown in Figure 3.

Figure 3 Demo Program Structure

# iris_demo.py
import cntk as C
...
def my_print(arr, dec):
def create_reader(path, is_training, input_dim,
  output_dim):
def save_weights(fn, ihWeights, hBiases,
  hoWeights, oBiases):
def do_demo():
def main():
  print("\nBegin Iris demo (CNTK 2.0) \n")
  np.random.seed(0)
  do_demo()  # all the work is done in do_demo()
if __name__ == "__main__":
  main()

The demo program has a function named main that acts as an entry point. The main function sets the seed of the global random number generator to 0 so that results will be reproducible, and then calls function do_demo that does all the work.

Helper function my_print displays a numeric vector using a specified number of decimals. The point here is that CNTK is just a library, and you must mix program-defined Python code with calls to the various CNTK functions. Helper function create_reader returns a special CNTK object that can be used to read data from a data file that uses the special CTF (CNTK text format) formatting protocol.

Helper function save_weights accepts a filename, a matrix of input-to-hidden weights, an array of hidden node biases, a matrix of hidden-to-output weights, and an array of output node biases, and writes those values to a text file so they can be used by other systems.

The complete listing for the demo program, with a few minor edits, is presented in Figure 4. I use an indent of two spaces instead of the more common four, to save space. Also, all normal error-checking code has been removed.

Figure 4 Complete Demo Program

# iris_demo.py
# Anaconda 4.1.1 (Python 3.5, NumPy 1.11.1)
# CNTK 2.0 RC1
# Use a one-hidden layer simple NN with 2 hidden nodes
# to classify the Iris Dataset.
# This version uses the built-in Reader functions and
# data files that use the CTF format.
# trainData_cntk.txt - 120 items (40 each class)
# testData_cntk.txt - remaining 30 items
import numpy as np
import cntk as C
from cntk import Trainer  # to train the NN
from cntk.learners import sgd, learning_rate_schedule, \
  UnitType
from cntk.ops import *  # input_variable() def
from cntk.logging import ProgressPrinter
from cntk.initializer import glorot_uniform
from cntk.layers import default_options, Dense
from cntk.io import CTFDeserializer, MinibatchSource, \
  StreamDef, StreamDefs, INFINITELY_REPEAT
# =====
def my_print(arr, dec):
  # print an array of float/double with dec decimals
  fmt = "%." + str(dec) + "f" # like %.4f
  for i in range(0, len(arr)):
    print(fmt % arr[i] + '  ', end='')
  print("\n")
def create_reader(path, is_training, input_dim, output_dim):
  return MinibatchSource(CTFDeserializer(path, StreamDefs(
    features = StreamDef(field='attribs', shape=input_dim,
      is_sparse=False),
    labels = StreamDef(field='species', shape=output_dim,
      is_sparse=False)
  )), randomize = is_training,
    max_sweeps = INFINITELY_REPEAT if is_training else 1)
def save_weights(fn, ihWeights, hBiases,
  hoWeights, oBiases):
  f = open(fn, 'w')
  for vals in ihWeights:
    for v in vals:
      f.write("%s\n" % v)
  for v in hBiases:
    f.write("%s\n" % v)
  for vals in hoWeights:
    for v in vals:
      f.write("%s\n" % v)
  for v in oBiases:
    f.write("%s\n" % v)
  f.close()
def do_demo():
  # create NN, train, test, predict
  input_dim = 4
  hidden_dim = 2
  output_dim = 3
  train_file = "trainData_cntk.txt"
  test_file = "testData_cntk.txt"
  input_Var = C.ops.input(input_dim, np.float32)
  label_Var = C.ops.input(output_dim, np.float32)
  print("Creating a 4-2-3 tanh softmax NN for Iris data ")
  with default_options(init = glorot_uniform()):
    hLayer = Dense(hidden_dim, activation=C.ops.tanh,
      name='hidLayer')(input_Var) 
    oLayer = Dense(output_dim, activation=C.ops.softmax,
      name='outLayer')(hLayer)
  nnet = oLayer
  # ----------------------------------
  print("Creating a cross entropy mini-batch Trainer \n")
  ce = C.cross_entropy_with_softmax(nnet, label_Var)
  pe = C.classification_error(nnet, label_Var)
  fixed_lr = 0.05
  lr_per_batch = learning_rate_schedule(fixed_lr,
    UnitType.minibatch)
  learner = C.sgd(nnet.parameters, lr_per_batch)
  trainer = C.Trainer(nnet, (ce, pe), [learner])
  max_iter = 5000  # 5000 maximum training iterations
  batch_size = 5   # mini-batch size  5
  progress_freq = 1000  # print error every n minibatches
  reader_train = create_reader(train_file, True, input_dim,
    output_dim)
  my_input_map = {
    input_Var : reader_train.streams.features,
    label_Var : reader_train.streams.labels
  }
  pp = ProgressPrinter(progress_freq)
  print("Starting training \n")
  for i in range(0, max_iter):
    currBatch = reader_train.next_minibatch(batch_size,
      input_map = my_input_map)
    trainer.train_minibatch(currBatch)
    pp.update_with_trainer(trainer)
  print("\nTraining complete")
  # ----------------------------------
  print("\nEvaluating test data \n")
  reader_test = create_reader(test_file, False, input_dim,
    output_dim)
  numTestItems = 30
  allTest = reader_test.next_minibatch(numTestItems,
    input_map = my_input_map)
  test_error = trainer.test_minibatch(allTest)
  print("Classification error on the 30 test items = %f"
    % test_error)
  # ----------------------------------
  # make a prediction for an unknown flower
  # first train versicolor = 7.0,3.2,4.7,1.4,0,1,0
  unknown = np.array([[6.9, 3.1, 4.6, 1.3]],
    dtype=np.float32)
  print("\nPredicting Iris species for input features: ")
  my_print(unknown[0], 1)  # 1 decimal
  predicted = nnet.eval( {input_Var: unknown} )
  print("Prediction is: ")
  my_print(predicted[0], 3)  # 3 decimals
  # ---------------------------------
  print("\nTrained model input-to-hidden weights: \n")
  print(hLayer.hidLayer.W.value)
  print("\nTrained model hidden node biases: \n")
  print(hLayer.hidLayer.b.value)
  print("\nTrained model hidden-to-output weights: \n")
  print(oLayer.outLayer.W.value)
  print("\nTrained model output node biases: \n")
  print(oLayer.outLayer.b.value)
  save_weights("weights.txt", hLayer.hidLayer.W.value,
    hLayer.hidLayer.b.value, oLayer.outLayer.W.value,
    oLayer.outLayer.b.value)
  return 0  # success
def main():
  print("\nBegin Iris demo (CNTK 2.0) \n")
  np.random.seed(0)
  do_demo()  # all the work is done in do_demo()
if __name__ == "__main__":
  main()
# end script

The demo program begins by importing the required Python packages and modules. I’ll describe the modules as they’re used in the demo code.

Setting Up the Data

There are two basic ways to read data for use by CNTK functions. You can format your files using the special CTF format and then use built-in CNTK reader functions, or you can use data in non-CTF format and write a custom reader function. The demo program uses the CTF data format approach. File trainData_cntk.txt looks like:

|attribs 5.1 3.5 1.4 0.2 |species 1 0 0
...
|attribs 7.0 3.2 4.7 1.4 |species 0 1 0
...
|attribs 6.9 3.1 5.4 2.1 |species 0 0 1

You specify the feature (predictor) values by using the “|” character followed by a string identifier, and the label values in the same way. You can use whatever you like for identifiers.
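CTF lines can also be produced programmatically. Here is a minimal sketch, assuming whitespace-delimited raw lines like the ones shown at the start of this article; the helper name to_ctf is hypothetical and isn't part of the demo program:

```python
# Hypothetical helper, not part of iris_demo.py: convert one raw data line
# such as "5.0 3.5 1.3 0.3 setosa" into a CTF line.
ONE_HOT = {
    "setosa": "1 0 0",
    "versicolor": "0 1 0",
    "virginica": "0 0 1",
}

def to_ctf(raw_line):
    parts = raw_line.split()
    features, species = parts[:4], parts[4]
    return "|attribs " + " ".join(features) + " |species " + ONE_HOT[species]

print(to_ctf("5.5 2.6 4.4 1.2 versicolor"))
# |attribs 5.5 2.6 4.4 1.2 |species 0 1 0
```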

To create the training data, I go to the Wikipedia entry for Fisher’s Iris Data, copy and paste all 150 items into Notepad, select the first 40 of each species, and then do a bit of edit-replace. I use the leftover 10 of each species in the same way to create the testData_cntk.txt file. The create_reader function that uses the data files is defined as:

def create_reader(path, is_training, input_dim, output_dim):
  return MinibatchSource(CTFDeserializer(path, StreamDefs(
    features = StreamDef(field='attribs', shape=input_dim,
      is_sparse=False),
    labels = StreamDef(field='species', shape=output_dim,
      is_sparse=False)
  )), randomize = is_training,
    max_sweeps = INFINITELY_REPEAT if is_training else 1)

You can think of this function as boilerplate for CTF files. The only thing you’ll need to edit is the string identifiers (“attribs” and “species” here) used to identify features and labels.

Creating a Neural Network

The definition of function do_demo begins with:

def do_demo():
  input_dim = 4
  hidden_dim = 2
  output_dim = 3
  train_file = "trainData_cntk.txt"
  test_file = "testData_cntk.txt"
  input_Var = C.ops.input(input_dim, np.float32)
  label_Var = C.ops.input(output_dim, np.float32)
...

The meanings and values of the first five variables should be clear to you. Variables input_Var and label_Var are created using the built-in function named input, located in the cntk.ops package. You can think of these variables as numeric matrices, plus some special properties needed by CNTK.

The neural network is created with these statements:

print("Creating a 4-2-3 tanh softmax NN for Iris data ")
with default_options(init = glorot_uniform()):
  hLayer = Dense(hidden_dim, activation=C.ops.tanh,
    name='hidLayer')(input_Var) 
  oLayer = Dense(output_dim, activation=C.ops.softmax,
    name='outLayer')(hLayer)
nnet = oLayer

The Dense function creates a fully connected layer of nodes. You pass in the number of nodes and an activation function. The name parameter is optional in general, but is needed if you want to extract the weights and biases associated with a layer. Notice that you don’t pass the input values for a layer as an argument to Dense. Instead, Dense returns a function object, and you apply that function to the object holding the input values by placing it in a second set of parentheses.

When creating a neural network layer, you should specify how the values for the associated weights and biases are initialized, using the init parameter to the Dense function or, as the demo does, by wrapping the layer definitions in a default_options block. The demo initializes weights and biases using the Glorot (also called Xavier) initialization mini-algorithm implemented in function glorot_uniform. There are several alternative initialization functions in the cntk.initializer module.
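To give a feel for what Glorot initialization does, here’s a NumPy sketch of the usual uniform form of the algorithm. It’s an assumption on my part that CNTK’s glorot_uniform follows this form with its default settings, and the function name glorot_uniform_sketch is hypothetical:

```python
import numpy as np

def glorot_uniform_sketch(fan_in, fan_out, rng):
    # Draw weights uniformly from [-limit, +limit], where the limit
    # shrinks as the number of connected nodes grows.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    vals = rng.uniform(-limit, limit, size=(fan_in, fan_out))
    return vals.astype(np.float32)

rng = np.random.RandomState(0)
ih_weights = glorot_uniform_sketch(4, 2, rng)  # input-to-hidden, 4 x 2
print(ih_weights.shape)  # (4, 2)
```

For the 4-2-3 demo network, the input-to-hidden limit works out to sqrt(6 / (4 + 2)) = 1.0, so under this assumption the initial weights would fall between -1.0 and +1.0.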

The statement nnet = oLayer creates an alias for the output layer named oLayer. The idea is that the output layer represents a single layer, but also the output of the entire neural network.

Training the Neural Network

After training and test data have been set up, and a neural network has been created, the next step is to train the network. The demo program creates a trainer with these statements:

print("Creating a cross entropy mini-batch Trainer \n")
ce = C.cross_entropy_with_softmax(nnet, label_Var)
pe = C.classification_error(nnet, label_Var)
fixed_lr = 0.05
lr_per_batch = learning_rate_schedule(fixed_lr,
  UnitType.minibatch)
learner = C.sgd(nnet.parameters, lr_per_batch)
trainer = C.Trainer(nnet, (ce, pe), [learner])

The most common approach for measuring training error is to use what’s called cross-entropy error, also known as log loss. The main alternative to cross-entropy error for numeric problems similar to the Iris demo is the squared_error function.
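To make the two error measures concrete, here’s a plain Python sketch that applies both to the demo’s prediction for a versicolor flower. The functions are illustrative only; CNTK’s built-in versions operate on its own objects and may differ in details:

```python
import math

def cross_entropy(predicted, target):
    # Only the term for the correct class is non-zero when the target
    # is one-hot encoded.
    return -sum(t * math.log(p) for p, t in zip(predicted, target) if t > 0)

def squared_error(predicted, target):
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

predicted = [0.263, 0.682, 0.055]   # network output from the demo
target = [0, 1, 0]                  # versicolor

print("%.4f" % cross_entropy(predicted, target))   # 0.3827
print("%.4f" % squared_error(predicted, target))   # 0.1733
```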

After training has completed, you’re more interested in classification accuracy than in cross-entropy error—you want to know how many correct predictions the model makes. The demo uses the built-in classification_error function.

There are several optimization algorithms that can be used to minimize error during training. The most basic is called stochastic gradient descent (SGD), which is often loosely called back-propagation (strictly speaking, back-propagation is the technique that computes the gradients SGD uses). Alternative algorithms supported by CNTK include SGD with momentum, Nesterov and Adam (adaptive moment estimation).

The mini-batch form of SGD reads in one subset of the training items at a time, calculates the gradients of the error with respect to each weight and bias, and then adjusts all weight and bias values by a small increment whose size is scaled by the learning rate. Training is often highly sensitive to the values used for the learning rate. After a CNTK trainer object has been created, the demo prepares training with these statements:

max_iter = 5000 
batch_size = 5 
progress_freq = 1000
reader_train = create_reader(train_file, True,
  input_dim, output_dim)
my_input_map = {
  input_Var : reader_train.streams.features,
  label_Var : reader_train.streams.labels
}
pp = ProgressPrinter(progress_freq)

The SGD algorithm is iterative, so you must specify a maximum number of iterations. Note that the value for the mini-batch size should be between 1 and the number of items in the training data.

The reader object for the trainer object is created by a call to create_reader. The True argument that’s passed to create_reader tells the function that the reader is going to be used for training data rather than test data and, therefore, that the data items should be processed in random order, which is important to avoid training stagnation.

The my_input_map object is a Python dictionary with two items. It’s used to tell the reader object where the feature data resides (input_Var) and where the label data resides (label_Var). Although you can print whatever information you wish inside the main training loop, the built-in ProgressPrinter object is a very convenient way to monitor training. Training is performed with these statements:

print("Starting training \n")
for i in range(0, max_iter):
  currBatch = reader_train.next_minibatch(batch_size,
    input_map = my_input_map)
  trainer.train_minibatch(currBatch)
  pp.update_with_trainer(trainer)
print("\nTraining complete")

In each training iteration, the next_minibatch function pulls a batch (5 in the demo) of training items, and the train_minibatch function uses SGD to update the current values of weights and biases.
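The core of a mini-batch SGD update is small enough to sketch in NumPy. The example below is not the Iris network; it’s a toy problem with a single weight, chosen so the gradient is easy to write by hand. The learning rate, batch size and iteration count mirror the demo, and the name sgd_step is hypothetical:

```python
import numpy as np

def sgd_step(weights, gradient, learning_rate):
    # Move each weight a small step against its gradient.
    return weights - learning_rate * gradient

# Toy problem (not the Iris network): fit y = 2*x with one weight,
# using squared error averaged over a mini-batch.
rng = np.random.RandomState(0)
x_all = rng.uniform(-1.0, 1.0, size=120)
y_all = 2.0 * x_all

w = np.array([0.0])
learning_rate = 0.05
batch_size = 5
for i in range(5000):
    idx = rng.randint(0, len(x_all), size=batch_size)  # random mini-batch
    x, y = x_all[idx], y_all[idx]
    # Gradient of mean((w*x - y)^2) with respect to w.
    gradient = np.array([np.mean(2.0 * (w[0] * x - y) * x)])
    w = sgd_step(w, gradient, learning_rate)

print("%.3f" % w[0])  # 2.000
```

In the demo, CNTK computes the gradients for all 19 weights and biases automatically, so you never write the gradient expression yourself.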

Testing the Network

After a neural network has been trained, you should use the trained model on the holdout test data. The idea is that given enough training time and combinations of learning rate and batch size, you can eventually get close to 100 percent accuracy on your training data. However, excessive training can over-fit and lead to a model that predicts very poorly on new data.

print("\nEvaluating test data \n")
reader_test = create_reader(test_file, False, input_dim,
  output_dim)
numTestItems = 30
allTest = reader_test.next_minibatch(numTestItems,
  input_map = my_input_map)
test_error = trainer.test_minibatch(allTest)
print("Classification error on the 30 test items = %f"
  % test_error)

The next_minibatch function examines all 30 test items at once. Notice that you can reuse the my_input_map object for the test data because the mapping to input_Var and label_Var is the same as for the training data.

Making Predictions

Ultimately, the purpose of a neural network model is to make predictions for new, previously unseen data.

unknown = np.array([[6.9, 3.1, 4.6, 1.3]],
  dtype=np.float32)
print("\nPredicting Iris species for input features: ")
my_print(unknown[0], 1)  # 1 decimal
predicted = nnet.eval( {input_Var: unknown} )
print("Prediction is: ")
my_print(predicted[0], 3)  # 3 decimals

The variable named unknown is an array-of-arrays-style NumPy matrix, which is required by a CNTK neural network. The eval function accepts input values and runs them through the trained model using the neural network input-output process. The resulting three probabilities (0.263, 0.682, 0.055) are then displayed.

In some situations it’s useful to iterate through all test items and use the eval function to see exactly which items were incorrectly predicted. You can also write code that uses the numpy.argmax function to determine the largest value in the output probabilities and explicitly print “correct” or “wrong.”
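Here’s a minimal sketch of that idea using numpy.argmax. The predicted probabilities and labels below are hypothetical placeholder values for three items; in a real program they would come from calls to nnet.eval and from the test file:

```python
import numpy as np

# Hypothetical predicted probabilities and one-hot labels for three items;
# in the demo these would come from nnet.eval and the test file.
predicted = np.array([[0.263, 0.682, 0.055],
                      [0.900, 0.080, 0.020],
                      [0.100, 0.300, 0.600]], dtype=np.float32)
actual = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0]], dtype=np.float32)

species = ["setosa", "versicolor", "virginica"]
pred_idx = np.argmax(predicted, axis=1)   # index of the largest probability
true_idx = np.argmax(actual, axis=1)      # index of the 1 in each label

for p, t in zip(pred_idx, true_idx):
    verdict = "correct" if p == t else "wrong"
    print("predicted %-10s actual %-10s %s" % (species[p], species[t], verdict))

error = np.mean(pred_idx != true_idx)
print("classification error = %.4f" % error)  # 0.3333
```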

Exporting Weights and Biases

The demo program concludes by fetching the trained model’s weights and biases, displaying them to the shell and saving them to a text file. The idea is that you can train a neural network using CNTK, then use the trained model weights and biases in another system, such as a C# program, to make predictions.

The weights and bias values for the hidden layer are displayed like this:

print("\nTrained model input-to-hidden weights: \n")
print(hLayer.hidLayer.W.value)
print("\nTrained model hidden node biases: \n")
print(hLayer.hidLayer.b.value)

Recall that a CNTK network layer is a named object (hLayer), but that an optional name property was passed in when the layer was created (hidLayer). The tersely named W property of a named layer returns an array-of-arrays-style matrix holding the input-to-hidden weights. Similarly, the b property gives you the biases. The weights and biases for the output layer are obtained in the same way:

print("\nTrained model hidden-to-output weights: \n")
print(oLayer.outLayer.W.value)
print("\nTrained model output node biases: \n")
print(oLayer.outLayer.b.value)

The values of the (4 * 2) + (2 * 3) = 14 weights, and the (2 + 3) = 5 biases, are saved to a text file, and function do_demo concludes, like so:

...
  save_weights("weights.txt", hLayer.hidLayer.W.value,
    hLayer.hidLayer.b.value, oLayer.outLayer.W.value,
    oLayer.outLayer.b.value)
  return 0  # success

The program-defined save_weights function writes one value per line. The order in which the values are written (input-to-hidden weights, then hidden biases, then hidden-to-output weights, then output biases) is arbitrary, so any system that uses the values from the weights file must use the same order.
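A consuming program has to split the flat list of values back into the four groups. The sketch below shows one way to do that in Python. The load_weights function is hypothetical, and it assumes the input-to-hidden matrix was written as four rows of two values and the hidden-to-output matrix as two rows of three values:

```python
import os
import tempfile

def load_weights(fn, input_dim=4, hidden_dim=2, output_dim=3):
    # Hypothetical counterpart to save_weights: read one value per line and
    # split them in the same order they were written.
    with open(fn) as f:
        vals = [float(line) for line in f if line.strip()]
    pos = 0
    def take(n):
        nonlocal pos
        chunk = vals[pos:pos + n]
        pos += n
        return chunk
    ih = [take(hidden_dim) for _ in range(input_dim)]    # 4 rows of 2
    hb = take(hidden_dim)
    ho = [take(output_dim) for _ in range(hidden_dim)]   # 2 rows of 3
    ob = take(output_dim)
    return ih, hb, ho, ob

# Round-trip check with placeholder values 0.0 .. 18.0 (not trained weights).
path = os.path.join(tempfile.mkdtemp(), "weights.txt")
with open(path, "w") as f:
    for v in range(19):
        f.write("%s\n" % float(v))
ih, hb, ho, ob = load_weights(path)
print(len(ih), len(hb), len(ho), len(ob))  # 4 2 2 3
```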

Wrapping Up

If you’re new to neural networks, the number of decisions you have to make when using CNTK might seem a bit overwhelming. You need to decide how many hidden nodes to use, pick a hidden layer activation function, a learning optimization algorithm, a training error function, a training weight-initialization algorithm, a batch size, a learning rate and a maximum number of iterations.

However, in most cases, you can use the demo program presented in this article as a template, and experiment mostly with the number of hidden nodes, the maximum number of iterations, and the learning rate. In other words, you can safely use tanh hidden layer activation, cross-entropy for training error, Glorot initialization for weights and biases, and a training mini-batch size that is roughly 5 percent to 10 percent of the number of training items. The one exception to this is that instead of using the SGD training optimization algorithm, even though it’s the most commonly used, I suggest using the Adam algorithm.

Once you become familiar with CNTK basics, you can use the library to build very powerful, advanced, deep neural network architectures such as convolutional neural networks (CNNs) for image recognition and long short-term memory recurrent neural networks (LSTM RNNs) for the analysis of natural language data.

Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products, including Internet Explorer and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.

Thanks to the following Microsoft technical experts who reviewed this article: Chris Lee and Sayan Pathak
