rxNaiveBayes: Parallel External Memory Algorithm for Naive Bayes Classifiers

Description

Fit Naive Bayes Classifiers on an .xdf file or data frame for small or large data using parallel external memory algorithm.

Usage

  rxNaiveBayes(formula, data, smoothingFactor = 0,   ...  )

Arguments

formula

formula as described in rxFormula.

data

either a data source object, a character string specifying a .xdf file, or a data frame object.

smoothingFactor

a positive smoothing factor to account for cases not present in the training data. It avoids modeling issues by preventing zero conditional probability estimates.

...

additional arguments to be passed directly to rxSummary such as byTerm, rowSelection, pweights, fweights, transforms, transformObjects, transformFunc, transformVars, transformPackages, transformEnvir, useSparseCube, removeZeroCounts, blocksPerRead, reportProgress, verbose, xdfCompressionLevel.

Value

an "rxNaiveBayes" object containing the following components:

  • apriori - a vector of prior class probabilities for the dependent variable.

  • tables - a list of tables, one for each predictor variable.

    • For a categorical variable, the table contains the conditional probabilities of the variable given the target class.
    • For a numeric variable, the table contains the mean and standard deviation of the variable given the target class.
  • levels - the levels of the dependent variable.

  • call - the matched call.

Author(s)

Microsoft Corporation Microsoft Technical Support

References

Naive Bayes classifier https://en.wikipedia.org/wiki/Naive_Bayes_classifier .

See Also

rxPredict.rxNaiveBayes.

Examples


 # multi-class classification with an .xdf file
 claimsXdf <- file.path(rxGetOption("sampleDataDir"),"claims.xdf")
 claims.nb <- rxNaiveBayes(type ~ age + cost, data = claimsXdf)
 claims.nb

 # prediction
 claims.nb.pred <- rxPredict(claims.nb, claimsXdf)
 claims.nb.pred
 table(claims.nb.pred[["type_Pred"]], rxDataStep(claimsXdf)[["type"]])