Neural networks for regression modeling and for binary and multi-class classification.
rxNeuralNet(formula = NULL, data, type = c("binary", "multiClass",
"regression"), numHiddenNodes = 100, numIterations = 100,
optimizer = sgd(), netDefinition = NULL, initWtsDiameter = 0.1,
maxNorm = 0, acceleration = c("sse", "gpu"), miniBatchSize = 1,
normalize = "auto", mlTransforms = NULL, mlTransformVars = NULL,
rowSelection = NULL, transforms = NULL, transformObjects = NULL,
transformFunc = NULL, transformVars = NULL, transformPackages = NULL,
transformEnvir = NULL, blocksPerRead = rxGetOption("blocksPerRead"),
reportProgress = rxGetOption("reportProgress"), verbose = 1,
computeContext = rxGetOption("computeContext"),
ensemble = ensembleControl(), ...)
formula: The formula as described in rxFormula. Interaction terms and F() are not currently supported in MicrosoftML.
data: A data source object or a character string specifying a .xdf file or a data frame object.
type: A character string denoting the model type:
"binary": for the default binary classification neural network.
"multiClass": for a multi-class classification neural network.
"regression": for a regression neural network.
numHiddenNodes: The number of hidden nodes in the neural net. The default value is 100.
numIterations: The number of iterations on the full training set. The default value is 100.
optimizer: A list specifying either the sgd or adaptive optimization algorithm. This list can be created using sgd or adaDeltaSgd. The default value is sgd.
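For illustration, a minimal sketch of selecting the adaptive optimizer (assuming adaDeltaSgd() can be called with its default arguments, and reusing the infert example shown later on this page):
# Sketch: train a binary neural net with the adaptive (ADADELTA) optimizer.
model1 <- rxNeuralNet(isCase ~ age + parity + spontaneous,
                      transforms = list(isCase = case == 1),
                      data = infert, type = "binary",
                      optimizer = adaDeltaSgd())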
netDefinition: The Net# definition of the structure of the neural network. For more information about the Net# language, see the Reference Guide.
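As an illustrative sketch only (the Net# layer names, sizes, and activation keywords below are assumptions based on the Net# reference guide, not taken from this page):
# Sketch: a hypothetical two-hidden-layer Net# definition for the iris data.
myNetDefinition <- "
  input Data auto;
  hidden Layer1 [8] sigmoid from Data all;
  hidden Layer2 [8] sigmoid from Layer1 all;
  output Result auto sigmoid from Layer2 all;
"
netModel <- rxNeuralNet(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                        data = iris, type = "multiClass",
                        netDefinition = myNetDefinition)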
initWtsDiameter: Sets the initial weights diameter that specifies the range from which values are drawn for the initial learning weights. The weights are initialized randomly from within this range. The default value is 0.1.
maxNorm: Specifies an upper bound to constrain the norm of the incoming weight vector at each hidden unit. This can be very important in maxout neural networks as well as in cases where training produces unbounded weights.
acceleration: Specifies the type of hardware acceleration to use. Possible values are "sse" and "gpu". For GPU acceleration, it is recommended to use a miniBatchSize greater than one. If you want to use GPU acceleration, additional manual setup steps are required:
Download and install the NVIDIA CUDA Toolkit (CUDA Toolkit).
Download and install the NVIDIA cuDNN library (cudnn Library).
Copy the required CUDA and cuDNN libraries into the libs directory of the MicrosoftML package, which can be located by calling system.file("mxLibs/x64", package = "MicrosoftML").
miniBatchSize: Sets the mini-batch size. Recommended values are between 1 and 256. This parameter is only used when the acceleration is GPU. Setting this parameter to a higher value improves the speed of training, but it might negatively affect the accuracy. The default value is 1.
normalize: Specifies the type of automatic normalization used:
"auto": if normalization is needed, it is performed automatically. This is the default choice.
"no": no normalization is performed.
"yes": normalization is performed.
"warn": if normalization is needed, a warning message is displayed, but normalization is not performed.
If normalization is performed, a MaxMin normalizer is used. It normalizes values in an interval [a, b] where -1 <= a <= 0, 0 <= b <= 1, and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.
mlTransforms: Specifies a list of MicrosoftML transforms to be performed on the data before training, or NULL if no transforms are to be performed. See featurizeText, categorical, and categoricalHash for transformations that are supported. These transformations are performed after any specified R transformations. The default value is NULL.
mlTransformVars: Specifies a character vector of variable names to be used in mlTransforms, or NULL if none are to be used. The default value is NULL.
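For illustration, a sketch of applying a MicrosoftML transform before training (the categorical() call and its vars argument are assumptions drawn from that transform's own help page):
# Sketch: one-hot encode the education factor via an ML transform.
catModel <- rxNeuralNet(isCase ~ age + parity + education,
                        transforms = list(isCase = case == 1),
                        mlTransforms = list(categorical(vars = "education")),
                        data = infert, type = "binary")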
rowSelection: Specifies the rows (observations) from the data set that are to be used by the model, with the name of a logical variable from the data set (in quotes) or with a logical expression using variables in the data set. For example, rowSelection = "old" only uses observations in which the value of the variable old is TRUE. rowSelection = (age > 20) & (age < 65) & (log(income) > 10) only uses observations in which the value of the age variable is between 20 and 65 and the value of the log of the income variable is greater than 10. The row selection is performed after processing any data transformations (see the arguments transforms or transformFunc). As with all expressions, rowSelection can be defined outside of the function call using the expression function.
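A minimal sketch, assuming the built-in attitude data set and an arbitrary cut-off chosen only for illustration:
# Sketch: fit the regression only on rows where learning exceeds 50.
subsetModel <- rxNeuralNet(rating ~ complaints + privileges + learning,
                           rowSelection = (learning > 50),
                           data = attitude, type = "regression")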
transforms: An expression of the form list(name = expression, ...) that represents the first round of variable transformations. As with all expressions, transforms (or rowSelection) can be defined outside of the function call using the expression function.
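For instance, a brief sketch of defining the transformation list outside the call with expression() (variable names taken from the infert example later on this page):
# Sketch: build the transforms expression once and pass it by name.
myTransforms <- expression(list(isCase = case == 1))
exprModel <- rxNeuralNet(isCase ~ age + parity + spontaneous,
                         transforms = myTransforms,
                         data = infert, type = "binary")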
transformObjects: A named list that contains objects that can be referenced by transforms, transformFunc, and rowSelection.
transformFunc: The variable transformation function. See rxTransform for details.
transformVars: A character vector of input data set variables needed for the transformation function. See rxTransform for details.
transformPackages: A character vector specifying additional R packages (outside of those specified in rxGetOption("transformPackages")) to be made available and preloaded for use in variable transformation functions, for example, those explicitly defined in RevoScaleR functions via their transforms and transformFunc arguments or those defined implicitly via their formula or rowSelection arguments. The transformPackages argument may also be NULL, indicating that no packages outside rxGetOption("transformPackages") are preloaded.
transformEnvir: A user-defined environment to serve as a parent to all environments developed internally and used for variable data transformation. If transformEnvir = NULL, a new "hash" environment with parent baseenv() is used instead.
blocksPerRead: Specifies the number of blocks to read for each chunk of data read from the data source.
reportProgress: An integer value that specifies the level of reporting on the row processing progress:
0: no progress is reported.
1: the number of processed rows is printed and updated.
2: rows processed and timings are reported.
3: rows processed and all timings are reported.
verbose: An integer value that specifies the amount of output wanted. If 0, no verbose output is printed during calculations. Integer values from 1 to 4 provide increasing amounts of information.
computeContext: Sets the context in which computations are executed, specified with a valid RxComputeContext. Currently local and RxInSqlServer compute contexts are supported.
ensemble: Control parameters for ensembling.
...: Additional arguments to be passed directly to the Microsoft Compute Engine.
A neural network is a class of prediction models inspired by the human brain. A neural network can be represented as a weighted directed graph. Each node in the graph is called a neuron. The neurons in the graph are arranged in layers, where neurons in one layer are connected by a weighted edge (weights can be 0 or positive numbers) to neurons in the next layer. The first layer is called the input layer, and each neuron in the input layer corresponds to one of the features. The last layer is called the output layer; in the case of a binary classification neural network, it contains two output neurons, one for each class, whose values are the probabilities of belonging to each class. The remaining layers are called hidden layers.
The values of the neurons in the hidden layers and in the output layer are set by calculating the weighted sum of the values of the neurons in the previous layer and applying an activation function to that weighted sum. A neural network model is defined by the structure of its graph (namely, the number of hidden layers and the number of neurons in each hidden layer), the choice of activation function, and the weights on the graph edges. The neural network algorithm tries to learn the optimal weights on the edges based on the training data.
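To make the weighted-sum-plus-activation step concrete, here is a small base R sketch (plain R arithmetic, not part of the MicrosoftML API) that computes the values of one hidden layer from the previous layer:
# Sketch: forward step for one layer of a toy network.
set.seed(1)
x <- c(0.2, 0.7, 0.1)                    # values of the 3 input neurons (features)
W <- matrix(rnorm(4 * 3), nrow = 4)      # edge weights into 4 hidden neurons
b <- rep(0, 4)                           # bias terms
sigmoid <- function(z) 1 / (1 + exp(-z)) # activation function
hidden <- sigmoid(W %*% x + b)           # weighted sum, then activation
hidden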
Although neural networks are widely known for use in deep learning and modeling complex problems such as image recognition, they are also easily adapted to regression problems. Any class of statistical models can be considered a neural network if it uses adaptive weights and can approximate non-linear functions of its inputs. Neural network regression is especially suited to problems where a more traditional regression model cannot fit a solution.
rxNeuralNet: an rxNeuralNet object with the trained model.
NeuralNet: a learner specification object of class maml for the Neural Net trainer.
This algorithm is single-threaded and will not attempt to load the entire dataset into memory.
Microsoft Corporation Microsoft Technical Support
Wikipedia: Artificial neural network
rxFastTrees, rxFastForest, rxFastLinear, rxLogisticRegression, rxOneClassSvm, featurizeText, categorical, categoricalHash, rxPredict.mlModel.
# Estimate a binary neural net
rxNeuralNet1 <- rxNeuralNet(isCase ~ age + parity + education + spontaneous + induced,
transforms = list(isCase = case == 1),
data = infert)
# Score to a data frame
scoreDF <- rxPredict(rxNeuralNet1, data = infert,
extraVarsToWrite = "isCase",
outData = NULL) # return a data frame
# Compute and plot the Receiver Operating Characteristic (ROC) curve and AUC
roc1 <- rxRoc(actualVarName = "isCase", predVarNames = "Probability", data = scoreDF)
plot(roc1)
rxAuc(roc1)
#########################################################################
# Regression neural net
# Create an xdf file with the attitude data
myXdf <- tempfile(pattern = "tempAttitude", fileext = ".xdf")
rxDataStep(attitude, myXdf, rowsPerRead = 50, overwrite = TRUE)
myXdfDS <- RxXdfData(file = myXdf)
attitudeForm <- rating ~ complaints + privileges + learning +
raises + critical + advance
# Estimate a regression neural net
res2 <- rxNeuralNet(formula = attitudeForm, data = myXdfDS,
type = "regression")
# Score to data frame
scoreOut2 <- rxPredict(res2, data = myXdfDS,
extraVarsToWrite = "rating")
# Plot the rating versus the score with a regression line
rxLinePlot(rating ~ Score, type = c("p", "r"), data = scoreOut2)
# Clean up
file.remove(myXdf)
#############################################################################
# Multi-class neural net
multiNN <- rxNeuralNet(
formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
type = "multiClass", data = iris)
scoreMultiDF <- rxPredict(multiNN, data = iris,
extraVarsToWrite = "Species", outData = NULL)
# Print the first rows of the data frame with scores
head(scoreMultiDF)
# Compute % of incorrect predictions
badPrediction <- scoreMultiDF$Species != scoreMultiDF$PredictedLabel
sum(badPrediction) * 100 / nrow(scoreMultiDF)
# Look at the observations with incorrect predictions
scoreMultiDF[badPrediction,]