Histogram plot for a variable in an .xdf file or data frame
rxHistogram(formula, data, pweights = NULL, fweights = NULL,
    numBreaks = NULL, startVal = NULL, endVal = NULL,
    levelsToDrop = NULL, levelsToKeep = NULL, rowSelection = NULL,
    transforms = NULL, transformObjects = NULL, transformFunc = NULL,
    transformVars = NULL, transformPackages = NULL, transformEnvir = NULL,
    blocksPerRead = rxGetOption("blocksPerRead"), histType = "Counts",
    title = NULL, subtitle = NULL, xTitle = NULL, yTitle = NULL,
    xNumTicks = NULL, yNumTicks = NULL,
    xAxisMinMax = NULL, yAxisMinMax = NULL,
    fillColor = "cyan", lineColor = "black", lineStyle = "solid",
    lineWidth = 1, plotAreaColor = "gray90", gridColor = "white",
    gridLineWidth = 1, gridLineStyle = "solid", maxNumPanels = 100,
    reportProgress = rxGetOption("reportProgress"), print = TRUE, ...)
formula describing the data to plot. It should take the form ~x | g1 + g2, where g1 and g2 are optional conditioning factor variables and x is the name of a variable or an on-the-fly factorization F(x). Other expressions of x are not supported.
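The three formula shapes described above can be sketched as follows. This is a minimal illustration on a made-up data frame: the column names age, sex, and state are assumptions chosen for the example, and the rxHistogram calls are guarded so the snippet does nothing harmful where RevoScaleR is not installed.

```r
# Hypothetical data frame; all column names are illustrative assumptions.
df <- data.frame(
  age   = rep(20:49, times = 2),
  sex   = factor(rep(c("Male", "Female"), each = 30)),
  state = factor(rep(c("CT", "IN", "WA"), times = 20))
)

# The formula shapes the argument accepts.
f_plain       <- ~ age                # a single variable, no conditioning
f_factorized  <- ~ F(age)             # on-the-fly factorization of an integer variable
f_conditioned <- ~ age | sex + state  # one panel per sex/state combination

# Only plot when RevoScaleR is available.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  RevoScaleR::rxHistogram(f_plain, data = df)
  RevoScaleR::rxHistogram(f_factorized, data = df)
  RevoScaleR::rxHistogram(f_conditioned, data = df)
}
```

The variable on the left of the bar is the one being counted; everything on the right only splits the plot into panels.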
either an RxXdfData object, a character string specifying the .xdf file, or a data frame containing the variable to plot.
character string specifying the variable to use as probability weights for the observations.
character string specifying the variable to use as frequency weights for the observations.
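As a rough illustration of frequency weights, the sketch below uses a hypothetical pre-aggregated data frame in which the column n (an assumed name) records how many observations share each score. The base R lines show the equivalent expanded data; the guarded call shows how the weight column would be named in fweights.

```r
# Hypothetical pre-aggregated data: each row is a distinct score and
# the number of observations that had it.
agg <- data.frame(score = 1:5, n = c(3L, 7L, 12L, 6L, 2L))

# Base R view of what a frequency weight means: a row with n = 7
# counts as seven identical observations.
expanded <- rep(agg$score, times = agg$n)
countsPerScore <- table(expanded)

# Only plot when RevoScaleR is available.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  RevoScaleR::rxHistogram(~ F(score), data = agg, fweights = "n")
}
```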
number of breaks to use to cut numeric data, including the upper and lower bounds.
low value used for cutting numeric data.
high value used for cutting numeric data.
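The arithmetic behind the three binning arguments can be sketched in base R. Because the break count includes both bounds, 21 breaks between 0 and 100000 imply 20 bins of width 5000. The data frame and its incwage column are made up for the example, and the seq() call only illustrates evenly spaced breaks; it is not a claim about how rxHistogram computes them internally.

```r
startVal  <- 0
endVal    <- 100000
numBreaks <- 21   # counts both the lower and the upper bound

# Evenly spaced breaks implied by the three values (illustration only).
breaks   <- seq(startVal, endVal, length.out = numBreaks)
numBins  <- length(breaks) - 1
binWidth <- diff(breaks)[1]

# Hypothetical data to plot.
set.seed(42)
df <- data.frame(incwage = runif(500, min = 0, max = 100000))

# Only plot when RevoScaleR is available.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  RevoScaleR::rxHistogram(~ incwage, data = df,
      startVal = startVal, endVal = endVal, numBreaks = numBreaks)
}
```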
levels to exclude if the histogram variable is a factor.
levels to keep if the histogram variable is a factor.
name of a logical variable in the data set (in quotes) or a logical expression using variables in the data set to specify row selection. For example, rowSelection = "old" will use only observations in which the value of the variable old is TRUE. rowSelection = (age > 20) & (age < 65) & (log(income) > 10) will use only observations in which the value of the age variable is between 20 and 65 and the value of the log of the income variable is greater than 10. The row selection is performed after processing any data transformations (see the arguments transforms or transformFunc). As with all expressions, rowSelection can be defined outside of the function call using the expression function.
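A small sketch of defining the row selection outside the call, using the age and income condition from the text. The data frame is made up, and passing the stored expression by name is an assumption based on the sentence above rather than a verified call pattern. The base R eval() line simply shows which rows the condition keeps.

```r
# Hypothetical data; the column names follow the example in the text.
df <- data.frame(
  age    = c(18, 25, 40, 52, 70),
  income = c(15000, 30000, 60000, 20000, 45000)
)

# Define the selection once, outside any function call.
keepRows <- expression((age > 20) & (age < 65) & (log(income) > 10))

# Base R check of which rows satisfy the condition.
selected <- eval(keepRows[[1]], envir = df)

# Only plot when RevoScaleR is available.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  RevoScaleR::rxHistogram(~ age, data = df, rowSelection = keepRows)
}
```

Here only the second and third rows pass all three tests, so only those two observations would contribute to the histogram.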
an expression of the form list(name = expression, ...) representing the first round of variable transformations. As with all expressions, transforms (or rowSelection) can be defined outside of the function call using the expression function.
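The list(name = expression, ...) form can be sketched as below. The data frame and the derived variable name logIncome are assumptions for illustration. The eval() line shows in base R what the transformation produces; the guarded call assumes the derived variable can then be named in the formula.

```r
# Hypothetical data.
df <- data.frame(income = c(15000, 30000, 60000, 20000, 45000))

# A transformation defined outside the call: one new variable, logIncome.
newVars <- expression(list(logIncome = log(income)))

# Base R view of the values the transformation would create.
computed <- eval(newVars[[1]], envir = df)

# Only plot when RevoScaleR is available.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  RevoScaleR::rxHistogram(~ logIncome, data = df, transforms = newVars)
}
```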
a named list containing objects that can be referenced by transforms, transformFunc, and rowSelection.
variable transformation function. See rxTransform for details.
character vector of input data set variables needed for the transformation function. See rxTransform for details.
character vector defining additional R packages (outside of those specified in rxGetOption("transformPackages")) to be made available and preloaded for use in variable transformation functions, e.g., those explicitly defined in RevoScaleR functions via their transforms and transformFunc arguments or those defined implicitly via their formula or rowSelection arguments. The transformPackages argument may also be NULL, indicating that no packages outside rxGetOption("transformPackages") will be preloaded.
user-defined environment to serve as a parent to all environments developed internally and used for variable data transformation. If transformEnvir = NULL, a new "hash" environment with parent baseenv() is used instead.
number of blocks to read for each chunk of data read from the data source.
character string specifying the type of histogram: "Counts" or "Percent".
main title for the plot. Alternatively, main can be used.
subtitle (at the bottom) for the plot. Alternatively, sub can be used.
title for the X axis. Alternatively, xlab can be used.
title for the Y axis. Alternatively, ylab can be used.
number of tick marks on X axis (ignored for factor variables).
number of tick marks on Y axis.
numeric vector of length 2 containing a minimum and maximum value for the X axis. Alternatively, xlim can be used.
numeric vector of length 2 containing a minimum and maximum value for the Y axis. Alternatively, ylim can be used.
fill color for histogram. Use colors() to see color names.
line color for border of histogram.
line style for border of histogram: "blank", "solid", "dashed", "dotted", "dotdash", "longdash", or "twodash".
line width for border of histogram. Alternatively, lwd can be used.
background color for the plot area.
color for grid lines.
line width for grid lines.
line style for grid lines.
integer specifying the maximum number of panels to plot. The number of panels is the product of the number of levels of each conditioning variable. If the number of panels exceeds maxNumPanels, an error is given and the plot is not drawn. If maxNumPanels is NULL, it is ignored.
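The panel count can be checked in advance with base R. The sketch below uses a made-up data frame with two conditioning factors (sex with 2 levels, state with 3 levels), so the product is 6 panels; the column names are assumptions for the example.

```r
# Hypothetical data frame with two conditioning factors.
df <- data.frame(
  age   = rep(20:49, times = 2),
  sex   = factor(rep(c("Male", "Female"), each = 30)),
  state = factor(rep(c("CT", "IN", "WA"), times = 20))
)

# Panels = product of the number of levels of each conditioning variable.
numPanels <- nlevels(df$sex) * nlevels(df$state)

# The plot is drawn only if this does not exceed maxNumPanels.
panelLimit  <- 100
withinLimit <- numPanels <= panelLimit

# Only plot when RevoScaleR is available.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  RevoScaleR::rxHistogram(~ age | sex + state, data = df,
      maxNumPanels = panelLimit)
}
```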
integer value with options:
0: no progress is reported.
1: the number of processed rows is printed and updated.
2: rows processed and timings are reported.
3: rows processed and all timings are reported.
If TRUE, the plot is printed. If FALSE, and the lattice package is loaded, a lattice plot object is returned invisibly and can be printed later.
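A minimal sketch of deferring the plot, assuming both RevoScaleR and lattice are installed. The data frame is made up, and the object name p is arbitrary.

```r
# Hypothetical data.
df <- data.frame(age = rep(20:49, times = 2))

# Only run when both packages are available.
if (requireNamespace("RevoScaleR", quietly = TRUE) &&
    requireNamespace("lattice", quietly = TRUE)) {
  library(lattice)  # the text requires lattice to be loaded

  # Build the plot without drawing it; the object is returned invisibly.
  p <- RevoScaleR::rxHistogram(~ age, data = df, print = FALSE)

  # Draw it later, for example after opening a graphics device.
  print(p)
}
```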
additional arguments to be passed directly to the underlying lattice plotting function (barchart or xyplot).
rxHistogram calls rxCube to perform computations and uses the lattice graphics package (barchart or xyplot) to create the plot. The function will attempt to bin continuous data in reasonable intervals. For faster computation (using a bin for every integer value), use the F() function around the variable. Descriptive argument names are used to facilitate quick and easy plotting and self-documenting code for new R users.
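To make "a bin for every integer value" concrete, the base R lines below count a small made-up integer vector with one bin per integer between its minimum and maximum. That the bins span the full low-to-high range is an assumption about how F() behaves; the guarded calls contrast the F() form with interval binning via numBreaks.

```r
# Hypothetical integer data, e.g. an hour of the day.
hour <- c(8L, 9L, 9L, 12L, 12L, 12L, 17L)

# One bin per integer from the lowest to the highest observed value.
perInteger <- table(factor(hour, levels = min(hour):max(hour)))

# Only plot when RevoScaleR is available.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  hourData <- data.frame(hour = hour)
  # A bin for every integer value.
  RevoScaleR::rxHistogram(~ F(hour), data = hourData)
  # Interval binning with an approximate number of breaks.
  RevoScaleR::rxHistogram(~ hour, data = hourData, numBreaks = 5)
}
```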
An object of class "trellis". It is automatically printed within the function.
Microsoft Technical Support
# Examples using airline data
airlineData <- file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.xdf")

# Use the F() function to quickly compute bins for each integer level
rxHistogram(~F(CRSDepTime), data = airlineData)

# Specify the approximate number of breaks
rxHistogram(~CRSDepTime, numBreaks=11, data = airlineData)

# Examples using census data subsample
censusWorkers <- file.path(rxGetOption("sampleDataDir"), "CensusWorkers")

# Create panels for each of the 3 states
rxHistogram(~ sex | state, data = censusWorkers)

# Repeat, printing x axis labels at an angle, and all panels in a row
rxHistogram(~ sex | state, scales = list(x = list(rot = 30)),
    data = censusWorkers, layout = c(3,1))

# Create panels for age for each sex for each state
rxHistogram(~ age | sex + state, data = censusWorkers)

# Specify how wage income should be broken into bins
rxHistogram(~ incwage | state + sex, title="Wage Income Up To 100,000",
    endVal = 100000, numBreaks=21, data = censusWorkers)

# Show panels for each state on a separate page
numCols <- 1
numRows <- 2
## Not run:
par(ask=TRUE) # Set ask to pause between each plot
## End(Not run)
rxHistogram(~ age | sex + state, data = censusWorkers,
    layout=c(numCols, numRows))

# Create a jpeg file for each page, named myplot001.jpeg, etc
## Not run:
jpeg(file="myplot
rxHistogram(~ age | sex + state, data = censusWorkers,
    blocksPerRead=6, layout=c(numCols, numRows))
dev.off()
## End(Not run)