rxHistogram: Histogram


Histogram plot for a variable in an .xdf file or data frame


  rxHistogram(formula, data, pweights = NULL, fweights = NULL, numBreaks = NULL,
              startVal = NULL, endVal = NULL, levelsToDrop = NULL,
              levelsToKeep = NULL, rowSelection = NULL, transforms = NULL,
              transformObjects = NULL, transformFunc = NULL, transformVars = NULL, 
              transformPackages = NULL, transformEnvir = NULL,
              blocksPerRead = rxGetOption("blocksPerRead"), 
              histType = "Counts", 
              title = NULL, subtitle = NULL, xTitle = NULL, yTitle = NULL,
              xNumTicks = NULL, yNumTicks = NULL, xAxisMinMax = NULL,
              yAxisMinMax = NULL, fillColor = "cyan", lineColor = "black",
              lineStyle = "solid", lineWidth = 1, plotAreaColor = "gray90",
              gridColor = "white", gridLineWidth = 1, gridLineStyle = "solid",
              maxNumPanels = 100, reportProgress = rxGetOption("reportProgress"),
              print = TRUE, ...) 



formula: formula describing the data to plot. It should take the form of ~x|g1 + g2 where g1 and g2 are optional conditioning factor variables and x is the name of a variable or an on-the-fly factorization F(x). Other expressions of x are not supported.


data: either an RxXdfData object, a character string specifying the .xdf file, or a data frame containing the variable to plot.


pweights: character string specifying the variable to use as probability weights for the observations.


fweights: character string specifying the variable to use as frequency weights for the observations.
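
To illustrate what frequency weights mean for a histogram, the base R sketch below compares plain counts with counts in which each row contributes its weight. The data frame toy and its column n are made up for illustration, and the sketch only mirrors the idea; it does not show how rxHistogram computes its bins internally.

```r
# Hypothetical toy data: each row stands for `n` identical observations
toy <- data.frame(
  grade = factor(c("A", "B", "A", "C")),
  n     = c(3, 1, 2, 4)
)

# Unweighted: every row counts once
unweighted <- table(toy$grade)

# Frequency-weighted: every row counts n times
weighted <- xtabs(n ~ grade, data = toy)

print(unweighted)
print(weighted)

# With RevoScaleR attached, the corresponding call would look roughly like:
# rxHistogram(~ grade, data = toy, fweights = "n")
```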


numBreaks: number of breaks to use to cut numeric data, including the upper and lower bounds.


startVal: low value used for cutting numeric data.


endVal: high value used for cutting numeric data.
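
Because numBreaks counts the break points, including both bounds, a value of 11 corresponds to 10 intervals. The base R sketch below assumes equally spaced breaks between startVal and endVal; the exact break placement inside rxHistogram is not stated here and may differ. The vector x is made up for illustration.

```r
startVal  <- 0
endVal    <- 100
numBreaks <- 11

# Equally spaced break points, both bounds included (an assumption of this sketch)
breaks <- seq(startVal, endVal, length.out = numBreaks)

# Hypothetical values to bin
x    <- c(5, 15, 15, 42, 99)
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

numIntervals <- length(breaks) - 1  # one fewer interval than break points
print(numIntervals)
print(table(bins))
```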


levelsToDrop: levels to exclude if the histogram variable is a factor.


levelsToKeep: levels to keep if the histogram variable is a factor.


rowSelection: name of a logical variable in the data set (in quotes) or a logical expression using variables in the data set to specify row selection. For example, rowSelection = "old" will use only observations in which the value of the variable old is TRUE. rowSelection = (age > 20) & (age < 65) & (log(income) > 10) will use only observations in which the value of the age variable is between 20 and 65 and the value of the log of the income variable is greater than 10. The row selection is performed after processing any data transformations (see the arguments transforms or transformFunc). As with all expressions, rowSelection can be defined outside of the function call using the expression function.
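
The row-selection expression from the example above can be tried out on an ordinary data frame. The sketch below is a base R emulation with made-up data (the data frame people is hypothetical); it shows which rows such an expression would keep, not how RevoScaleR evaluates it chunk by chunk.

```r
# Hypothetical toy data; column names follow the example in the text
people <- data.frame(
  age    = c(18, 30, 45, 70),
  income = c(15000, 40000, 90000, 30000)
)

# A row-selection expression defined outside of any function call
sel <- expression((age > 20) & (age < 65) & (log(income) > 10))

# Evaluate the expression against the data to see which rows it keeps
keep <- eval(sel[[1]], people)
print(people[keep, ])
```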


transforms: an expression of the form list(name = expression, ...) representing the first round of variable transformations. As with all expressions, transforms (or rowSelection) can be defined outside of the function call using the expression function.


transformObjects: a named list containing objects that can be referenced by transforms, transformFunc, and rowSelection.
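
As a rough picture of how transforms and transformObjects fit together, the base R sketch below evaluates a list(name = expression) transformation with both the data columns and an extra named object in scope. The names toy, objs, divisor, and incwageK are made up for illustration, and the evaluation shown is an emulation rather than RevoScaleR's own mechanism.

```r
# Hypothetical toy data
toy <- data.frame(incwage = c(10000, 25000, 60000))

# Objects the transformation refers to (the role transformObjects plays)
objs <- list(divisor = 1000)

# A transforms-style expression: list(name = expression, ...)
xform <- quote(list(incwageK = incwage / divisor))

# Evaluate with the data columns and the extra objects both visible
newVars <- eval(xform, c(as.list(toy), objs))
toy$incwageK <- newVars$incwageK
print(toy)

# With RevoScaleR attached, the corresponding call would look roughly like:
# rxHistogram(~ incwageK, data = toy,
#             transforms = list(incwageK = incwage / divisor),
#             transformObjects = list(divisor = 1000))
```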


transformFunc: variable transformation function. See rxTransform for details.


transformVars: character vector of input data set variables needed for the transformation function. See rxTransform for details.
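
A transformation function, following the convention described under rxTransform, takes the current chunk of data as a list of vectors and returns a list. The sketch below defines such a function and exercises it on a small list in plain R; the function name addLogWage and the toy values are made up for illustration.

```r
# Hypothetical transformation function: receives a list of vectors,
# adds a derived variable, and returns the list
addLogWage <- function(dataList) {
  dataList$logWage <- log(dataList$incwage)
  dataList
}

# Emulate one chunk of data by converting a small data frame to a list
chunk  <- as.list(data.frame(incwage = c(1, exp(1), exp(2))))
result <- addLogWage(chunk)
print(result$logWage)

# With RevoScaleR attached, the function would be supplied roughly like this,
# with transformVars naming the input columns it needs:
# rxHistogram(~ logWage, data = censusWorkers,
#             transformFunc = addLogWage, transformVars = "incwage")
```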


transformPackages: character vector defining additional R packages (outside of those specified in rxGetOption("transformPackages")) to be made available and preloaded for use in variable transformation functions, e.g., those explicitly defined in RevoScaleR functions via their transforms and transformFunc arguments or those defined implicitly via their formula or rowSelection arguments. The transformPackages argument may also be NULL, indicating that no packages outside rxGetOption("transformPackages") will be preloaded.


transformEnvir: user-defined environment to serve as a parent to all environments developed internally and used for variable data transformation. If transformEnvir = NULL, a new "hash" environment with parent baseenv() is used instead.


blocksPerRead: number of blocks to read for each chunk of data read from the data source.


histType: character string specifying "Counts" or "Percent".
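
The difference between the two histType settings can be seen on a single factor in base R. This sketch uses made-up data and assumes a single panel; how percentages are normalized across conditioning panels is not stated here.

```r
# Hypothetical factor values
sex <- factor(c("Male", "Female", "Female", "Male", "Female"))

# "Counts": number of observations per bar
counts <- table(sex)

# "Percent": each bar as a share of all observations
percent <- 100 * counts / sum(counts)

print(counts)
print(percent)
```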


title: main title for the plot. Alternatively main can be used.


subtitle: subtitle (at the bottom) for the plot. Alternatively sub can be used.


xTitle: title for the X axis. Alternatively xlab can be used.


yTitle: title for the Y axis. Alternatively ylab can be used.


xNumTicks: number of tick marks on X axis (ignored for factor variables).


yNumTicks: number of tick marks on Y axis.


xAxisMinMax: numeric vector of length 2 containing a minimum and maximum value for the X axis. Alternatively xlim can be used.


yAxisMinMax: numeric vector of length 2 containing a minimum and maximum value for the Y axis. Alternatively ylim can be used.


fillColor: fill color for histogram. Use colors to see color names.


lineColor: line color for border of histogram.


lineStyle: line style for border of histogram: "blank", "solid", "dashed", "dotted", "dotdash", "longdash", or "twodash".


lineWidth: line width for border of histogram. Alternatively lwd can be used.


plotAreaColor: background color for the plot area.


gridColor: color for grid lines.


gridLineWidth: line width for grid lines.


gridLineStyle: line style for grid lines.


maxNumPanels: integer specifying the maximum number of panels to plot. The number of panels is determined by the product of the number of levels of each conditioning variable. If the number of panels exceeds maxNumPanels, an error is given and the plot is not drawn. If maxNumPanels is NULL, it is ignored.
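
The panel count described above can be estimated before plotting. The base R sketch below multiplies the number of levels of each conditioning factor and compares the result with the limit; the data frame toy and the variable condVars are made up for illustration.

```r
# Hypothetical data with two conditioning factors
toy <- data.frame(
  sex   = factor(c("Male", "Female", "Male")),
  state = factor(c("Connecticut", "Indiana", "Washington"))
)

condVars     <- c("sex", "state")
maxNumPanels <- 100

# Product of the number of levels of each conditioning variable
numPanels <- prod(vapply(toy[condVars], nlevels, integer(1)))

print(numPanels)
print(numPanels <= maxNumPanels)  # TRUE means the plot would stay within the limit
```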


reportProgress: integer value with options:

  • 0: no progress is reported.
  • 1: the number of processed rows is printed and updated.
  • 2: rows processed and timings are reported.
  • 3: rows processed and all timings are reported.


print: logical. If TRUE, the plot is printed. If FALSE, and the lattice package is loaded, a lattice plot object is returned invisibly and can be printed later.
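
Holding on to the plot object is useful when the plot should be adjusted or drawn later. The sketch below uses lattice directly, with made-up data, to show the general behavior of a "trellis" object: assigning it draws nothing, update() returns a modified copy, and print() draws it. The commented rxHistogram call is an assumption about how the same pattern would look with RevoScaleR attached.

```r
library(lattice)

# Assigning a lattice plot stores a "trellis" object without drawing it
p <- histogram(~ x, data = data.frame(x = c(1, 2, 2, 3, 3, 3)))
print(class(p))

# The stored object can be modified before it is drawn
p2 <- update(p, main = "Toy histogram")

# Draw it when ready (left commented so the sketch opens no graphics device):
# print(p2)

# With RevoScaleR attached, the same pattern would look roughly like:
# p <- rxHistogram(~ F(CRSDepTime), data = airlineData, print = FALSE)
# print(p)
```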


...: additional arguments to be passed directly to the underlying barchart or xyplot function.


rxHistogram calls rxCube to perform computations and uses the lattice graphics package (barchart or xyplot) to create the plot. The rxHistogram function will attempt to bin continuous data into reasonable intervals. For faster computation (using a bin for every integer value), use the F() function around the variable. Descriptive argument names are used to facilitate quick and easy plotting and self-documenting code for new R users.
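
To make "a bin for every integer value" concrete, the base R sketch below builds a factor with one level per integer between the smallest and largest value, including integers that never occur. This is only an approximation of the idea behind F(), written for integer-valued input; the actual behavior of F() (for example, with non-integer values) is not described here.

```r
# Hypothetical integer-valued data
x <- c(3, 5, 5, 8)

# One level per integer from min to max (an assumption about what F() amounts to)
fx <- factor(x, levels = seq(min(x), max(x)))

# Empty bins (4, 6, 7) are kept, so the bars stay evenly spaced
print(table(fx))
```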


An object of class "trellis". When print = TRUE, it is printed automatically within the function.


Microsoft Corporation Microsoft Technical Support

See Also

rxLinePlot, rxCube, histogram.


 # Examples using airline data
 airlineData <- file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.xdf")
 # Use the F() function to quickly compute bins for each integer level
 rxHistogram(~F(CRSDepTime), data = airlineData)
 # Specify the approximate number of breaks
 rxHistogram(~CRSDepTime, numBreaks=11, data = airlineData)

 # Examples using census data subsample
 censusWorkers <- file.path(rxGetOption("sampleDataDir"), "CensusWorkers.xdf")
 # Create panels for each of the 3 states
 rxHistogram(~ sex | state, data = censusWorkers)
 # Repeat, printing x axis labels at an angle, and all panels in a row
 rxHistogram(~ sex | state, scales = list(x = list(rot = 30)), 
     data = censusWorkers, layout = c(3,1))
 # Create panels for age for each sex for each state
 rxHistogram(~ age | sex + state, data = censusWorkers)
 # Specify how wage income should be broken into bins
 rxHistogram(~ incwage | state + sex, title="Wage Income Up To 100,000", 
   endVal = 100000, numBreaks=21, data = censusWorkers)

 # Show panels for each state on a separate page
 numCols <- 1
 numRows <- 2
 ## Not run:

par(ask=TRUE) # Set ask to pause between each plot
## End(Not run) 

 rxHistogram(~ age | sex + state, data = censusWorkers, layout=c(numCols, numRows)) 

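 # Create a jpeg file for each page, named myplot001.jpeg, etc
 ## Not run:

jpeg(filename = "myplot%03d.jpeg") # %03d numbers the pages: 001, 002, ...
rxHistogram(~ age | sex + state, data = censusWorkers, 
  blocksPerRead=6, layout=c(numCols, numRows)) 
dev.off() # Close the device so the files are written
## End(Not run)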