rxNeuralNet: Neural Net


Neural networks for regression modeling and for Binary and multi-class classification.


  rxNeuralNet(formula = NULL, data, type = c("binary", "multiClass",
    "regression"), numHiddenNodes = 100, numIterations = 100,
    optimizer = sgd(), netDefinition = NULL, initWtsDiameter = 0.1,
    maxNorm = 0, acceleration = c("sse", "gpu"), miniBatchSize = 1,
    normalize = "auto", mlTransforms = NULL, mlTransformVars = NULL,
    rowSelection = NULL, transforms = NULL, transformObjects = NULL,
    transformFunc = NULL, transformVars = NULL, transformPackages = NULL,
    transformEnvir = NULL, blocksPerRead = rxGetOption("blocksPerRead"),
    reportProgress = rxGetOption("reportProgress"), verbose = 1,
    computeContext = rxGetOption("computeContext"),
    ensemble = ensembleControl(), ...)



The formula as described in rxFormula. Interaction terms and F() are not currently supported in the MicrosoftML.


A data source object or a character string specifying a .xdf file or a data frame object.


A character string denoting Fast Tree type:

  • "binary" for the default binary classification neural network.
  • "multiClass" for multi-class classification neural network.
  • "regression" for a regression neural network.


The default number of hidden nodes in the neural net. The default value is 100.


The number of iterations on the full training set. The default value is 100.


A list specifying either the sgd or adaptive optimization algorithm. This list can be created using sgd or adaDeltaSgd. The default value is sgd.


The Net# definition of the structure of the neural network. For more information about the Net# language, see Reference Guide


Sets the initial weights diameter that specifies the range from which values are drawn for the initial learning weights. The weights are initialized randomly from within this range. The default value is 0.1.


Specifies an upper bound to constrain the norm of the incoming weight vector at each hidden unit. This can be very important in maxout neural networks as well as in cases where training produces unbounded weights.


Specifies the type of hardware acceleration to use. Possible values are "sse" and "gpu". For GPU acceleration, it is recommended to use a miniBatchSize greater than one. If you want to use the GPU acceleration, there are additional manual setup steps are required:

  • Download and install NVidia CUDA Toolkit 6.5 (CUDA Toolkit ).
  • Download and install NVidia cuDNN v2 Library (cudnn Library ).
  • Find the libs directory of the MicrosoftRML package by calling system.file("mxLibs/x64", package = "MicrosoftML").
  • Copy cublas64_65.dll, cudart64_65.dll and cusparse64_65.dll from the CUDA Toolkit 6.5 into the libs directory of the MicrosoftML package.
  • Copy cudnn64_65.dll from the cuDNN v2 Library into the libs directory of the MicrosoftML package.


Sets the mini-batch size. Recommended values are between 1 and 256. This parameter is only used when the acceleration is GPU. Setting this parameter to a higher value improves the speed of training, but it might negatively affect the accuracy. The default value is 1.


Specifies the type of automatic normalization used:

  • "auto": if normalization is needed, it is performed automatically. This is the default choice.
  • "no": no normalization is performed.
  • "yes": normalization is performed.
  • "warn": if normalization is needed, a warning message is displayed, but normalization is not performed.
    Normalization rescales disparate data ranges to a standard scale. Feature scaling insures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values in an interval [a, b] where -1 <= a <= 0and 0 <= b <= 1 and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.


Specifies a list of MicrosoftML transforms to be performed on the data before training or NULL if no transforms are to be performed. See featurizeText, categorical, and categoricalHash, for transformations that are supported. These transformations are performed after any specified R transformations. The default value is NULL.


Specifies a character vector of variable names to be used in mlTransforms or NULL if none are to be used. The default value is NULL.


Specifies the rows (observations) from the data set that are to be used by the model with the name of a logical variable from the data set (in quotes) or with a logical expression using variables in the data set. For example, rowSelection = "old" will only use observations in which the value of the variable old is TRUE. rowSelection = (age > 20) & (age < 65) & (log(income) > 10) only uses observations in which the value of the age variable is between 20 and 65 and the value of the log of the income variable is greater than 10. The row selection is performed after processing any data transformations (see the arguments transforms or transformFunc). As with all expressions, rowSelection can be defined outside of the function call using the expression function.


An expression of the form list(name = expression, ``...) that represents the first round of variable transformations. As with all expressions, transforms (or rowSelection) can be defined outside of the function call using the expression function.


A named list that contains objects that can be referenced by transforms, transformsFunc, and rowSelection.


The variable transformation function. See rxTransform for details.


A character vector of input data set variables needed for the transformation function. See rxTransform for details.


A character vector specifying additional R packages (outside of those specified in rxGetOption("transformPackages")) to be made available and preloaded for use in variable transformation functions. For exmple, those explicitly defined in RevoScaleR functions via their transforms and transformFunc arguments or those defined implicitly via their formula or rowSelection arguments. The transformPackages argument may also be NULL, indicating that no packages outside rxGetOption("transformPackages") are preloaded.


A user-defined environment to serve as a parent to all environments developed internally and used for variable data transformation. If transformEnvir = NULL, a new "hash" environment with parent baseenv() is used instead.


Specifies the number of blocks to read for each chunk of data read from the data source.


An integer value that specifies the level of reporting on the row processing progress:

  • 0: no progress is reported.
  • 1: the number of processed rows is printed and updated.
  • 2: rows processed and timings are reported.
  • 3: rows processed and all timings are reported.


An integer value that specifies the amount of output wanted. If 0, no verbose output is printed during calculations. Integer values from 1 to 4 provide increasing amounts of information.


Sets the context in which computations are executed, specified with a valid RxComputeContext. Currently local and RxInSqlServer compute contexts are supported.


Control parameters for ensembling.


Additional arguments to be passed directly to the Microsoft Compute Engine.


A neural network is a class of prediction models inspired by the human brain. A neural network can be represented as a weighted directed graph. Each node in the graph is called a neuron. The neurons in the graph are arranged in layers, where neurons in one layer are connected by a weighted edge (weights can be 0 or positive numbers) to neurons in the next layer. The first layer is called the input layer, and each neuron in the input layer corresponds to one of the features. The last layer of the function is called the output layer. So in the case of binary neural networks it contains two output neurons, one for each class, whose values are the probabilities of belonging to each class. The remaining layers are called hidden layers. The values of the neurons in the hidden layers and in the output layer are set by calculating the weighted sum of the values of the neurons in the previous layer and applying an activation function to that weighted sum. A neural network model is defined by the structure of its graph (namely, the number of hidden layers and the number of neurons in each hidden layer), the choice of activation function, and the weights on the graph edges. The neural network algorithm tries to learn the optimal weights on the edges based on the training data.

Although neural networks are widely known for use in deep learning and modeling complex problems such as image recognition, they are also easily adapted to regression problems. Any class of statistical models can be considered a neural network if they use adaptive weights and can approximate non-linear functions of their inputs. Neural network regression is especially suited to problems where a more traditional regression model cannot fit a solution.


  • rxNeuralNet: an rxNeuralNet object with the trained model.
  • NeuralNet: a learner specification object of class maml for the Neural Net trainer.


This algorithm is single-threaded and will not attempt to load the entire dataset into memory.


Microsoft Corporation Microsoft Technical Support


Wikipedia: Artificial neural network

See Also

rxFastTrees, rxFastForest, rxFastLinear, rxLogisticRegression, rxOneClassSvm, featurizeText, categorical, categoricalHash, rxPredict.mlModel.


 # Estimate a binary neural net
 rxNeuralNet1 <- rxNeuralNet(isCase ~ age + parity + education + spontaneous + induced,
                   transforms = list(isCase = case == 1),
                   data = infert)

 # Score to a data frame
 scoreDF <- rxPredict(rxNeuralNet1, data = infert, 
     extraVarsToWrite = "isCase",
     outData = NULL) # return a data frame

 # Compute and plot the Radio Operator Curve and AUC
 roc1 <- rxRoc(actualVarName = "isCase", predVarNames = "Probability", data = scoreDF) 

 # Regression neural net

 # Create an xdf file with the attitude data
 myXdf <- tempfile(pattern = "tempAttitude", fileext = ".xdf")
 rxDataStep(attitude, myXdf, rowsPerRead = 50, overwrite = TRUE)
 myXdfDS <- RxXdfData(file = myXdf)

 attitudeForm <- rating ~ complaints + privileges + learning + 
     raises + critical + advance

 # Estimate a regression neural net 
 res2 <- rxNeuralNet(formula = attitudeForm,  data = myXdfDS, 
     type = "regression")

 # Score to data frame
 scoreOut2 <- rxPredict(res2, data = myXdfDS, 
     extraVarsToWrite = "rating")

 # Plot the rating versus the score with a regression line
 rxLinePlot(rating~Score, type = c("p","r"), data = scoreOut2)

 # Clean up   

 # Multi-class neural net
 multiNN <- rxNeuralNet(
     formula = Species~Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
     type = "multiClass", data = iris)
 scoreMultiDF <- rxPredict(multiNN, data = iris, 
     extraVarsToWrite = "Species", outData = NULL)    
 # Print the first rows of the data frame with scores
 # Compute % of incorrect predictions
 badPrediction = scoreMultiDF$Species != scoreMultiDF$PredictedLabel
 # Look at the observations with incorrect predictions