# rxFormula: Formula Syntax for RevoScaleR Analysis Functions

## Description

Highlights of the similarities and differences in formulas between
**RevoScaleR** and standard R functions.

## Details

The formula syntax used by the **RevoScaleR** analysis functions is similar,
but not identical, to regular R formula syntax. The most important differences
are:

* With the exception of rxSummary, dot (`.`) explanatory variable expansion is not supported.

* In-line variable transformations that produce multiple columns, such as poly and bs, are not supported.

* The original order of the explanatory variables is maintained; that is,
the main effects are not forced to precede the interaction terms. (See the
`keep.order = TRUE` setting in terms.formula for more information.)

A formula typically consists of a *response*, which in most
**RevoScaleR** functions can be a single variable or multiple variables
combined using cbind; the `"~"` operator; and one or more *predictors*,
usually separated by the `"+"` operator. The rxSummary function
typically requires a formula with no response.
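The parts of a formula can be inspected with base R. The following sketch uses placeholder variable names and does not call **RevoScaleR**; it shows a two-sided formula with a cbind response and a one-sided formula of the kind rxSummary typically expects:

```r
# Two-sided formula: a multi-variable response combined with cbind()
f <- cbind(y1, y2) ~ x1 + x2
lhs <- f[[2]]                                # the response expression
rhs_terms <- attr(terms(f), "term.labels")   # the predictors
has_response <- attr(terms(f), "response")   # 1 when a response is present

# One-sided formula (no response)
g <- ~ x1 + x2
g_has_response <- attr(terms(g), "response") # 0: no response

print(lhs)
print(rhs_terms)
```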

Interactions are indicated using the `":"` operator. The interaction of
two categorical variables results in a categorical variable containing the
full set of combinations of the two categories and adds a coefficient to the
model for each category. The interaction of two continuous variables is the
same as the multiplication of the two variables. The interaction of a
continuous and a categorical variable adds a coefficient for the continuous
variable for each level of the categorical variable. The asterisk operator
`"*"` between categorical variables adds the variables and all of their
interactions (every subset of the variables) to the model.
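The interaction rules above can be illustrated with the standard R functions model.matrix, interaction, and terms. This is a base R sketch on a small hypothetical data set; **RevoScaleR** computes its own design internally, and details such as which reference levels are dropped may differ:

```r
# Small hypothetical data set: two continuous and two categorical columns
df <- data.frame(
  x = c(1, 2, 3, 4),
  z = c(2, 3, 5, 7),
  g = factor(c("a", "b", "a", "b")),
  h = factor(c("u", "u", "v", "v"))
)

# categorical:categorical: one category per combination of g and h
gh_levels <- levels(interaction(df$g, df$h))

# continuous:continuous is the elementwise product of the two columns
mm_cont <- model.matrix(~ x:z, data = df)
product_matches <- isTRUE(all.equal(unname(mm_cont[, "x:z"]), df$x * df$z))

# continuous:categorical gives one slope column for x per level of g
mm_mixed <- model.matrix(~ x:g, data = df)

# g * h * k expands to every subset: main effects and all interactions
star_terms <- attr(terms(~ g * h * k), "term.labels")

print(gh_levels)
print(colnames(mm_mixed))
print(star_terms)
```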

In **RevoScaleR**, predictors must be single-column variables.
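In standard R, poly returns a matrix, so a single formula term such as `poly(x, 2)` expands to several columns; this is the kind of term excluded above. The sketch below is base R only. Precomputing each power as its own single-column variable is shown as one possible workaround, which is an assumption of this example rather than documented **RevoScaleR** behavior:

```r
x <- 1:10

# poly() returns a matrix: one formula term would become two columns
p <- poly(x, degree = 2)
poly_cols <- ncol(p)

# Assumed workaround: store each power as a separate single-column variable
dat <- data.frame(x = x, x_sq = x^2)
single_col <- is.null(dim(dat$x_sq))

print(poly_cols)
print(single_col)
```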

**RevoScaleR** formulas support two formula functions for managing
categorical variables:

* `F(x, low, high, exclude)` creates a categorical variable from the continuous variable `x`. The additional arguments `low`, `high`, and `exclude` can be included to specify the value of the lowest category, the value of the highest category, and how to handle values outside the specified range. For each integer in the range from `low` to `high` inclusive, **RevoScaleR** creates a level and assigns values greater than or equal to an integer n, but less than n+1, to n's level. If `x` is already a factor, `F(x, low, high, exclude)` can be used to limit the range of levels used; in this case `low` and `high` represent the indexes of the factor levels and must be integers in the range from 1 to the number of levels.

* `N(x)` creates a continuous variable from the categorical variable `x`. Note, however, that the value of this function is equivalent to the factor codes and has no relation to any numeric values within the levels of the factor. To obtain those numeric values, use the construction `as.numeric(levels(x))[x]`.
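The binning rule for F and the factor-code behavior of N can be approximated in base R. In the sketch below, `bin_integer_levels` is a hypothetical helper, not part of **RevoScaleR**; it sends out-of-range values to `NA` and does not model the `exclude` argument. The second half shows why factor codes (what `N(x)` returns) differ from the numeric values stored in the levels:

```r
# Hypothetical helper approximating F(x, low, high) binning in base R:
# each integer n in low:high becomes a level, and values in [n, n + 1)
# are assigned to level n. Out-of-range values become NA here;
# this sketch does NOT model RevoScaleR's exclude argument.
bin_integer_levels <- function(x, low = floor(min(x)), high = floor(max(x))) {
  factor(floor(x), levels = low:high)
}

age <- c(29.5, 30, 30.9, 31.2, 45.99, 46)
binned <- bin_integer_levels(age, low = 30, high = 45)

# Factor codes vs. level values: levels are sorted as strings ("10", "20", "5")
codes_factor <- factor(c("10", "20", "10", "5"))
codes <- as.integer(codes_factor)                        # factor codes, as N(x) returns
values <- as.numeric(levels(codes_factor))[codes_factor] # numeric values of the levels

print(binned)
print(codes)
print(values)
```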

## Author(s)

Microsoft Corporation `Microsoft Technical Support`

## See Also

rxTransform, rxCrossTabs, rxCube, rxLinMod, rxLogit, rxSummary.

## Examples

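```r
# Load RevoScaleR (attached by default in Microsoft R distributions)
library(RevoScaleR)

# These two lines set up the sample data used by the examples
sampleDataDir <- rxGetOption("sampleDataDir")
censusWorkers <- file.path(sampleDataDir, "CensusWorkers.xdf")

# Summarize age as a factor (one level per integer age) alongside sex
rxSummary(~ F(age) + sex, data = censusWorkers)

# Limit the age factor to levels 30 through 45; exclude controls out-of-range ages
rxSummary(~ F(age, low = 30, high = 45, exclude = FALSE), data = censusWorkers)

# Cross-tabulate incwage over every combination of age and sex
rxCube(incwage ~ F(age) : sex, data = censusWorkers)
```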