rxLorenz: Lorenz Curve and Gini Coefficient

Description

Compute and plot an empirical Lorenz curve from a variable in a data set, optionally specifying a separate variable from which to compute the y-values for the curve. Compute the Gini coefficient from the Lorenz curve data. Appropriate for big data sets because the data are binned and the computations are performed in one pass, rather than sorting the data as part of the computation.

Usage

  rxLorenz(orderVarName, valueVarName = orderVarName, data, numBreaks = 1000,
    pweights = NULL, fweights = NULL, blocksPerRead = 1, 
    reportProgress = 1, verbose = 0)

  ## S3 method for class 'rxLorenz':
  rxGini(x)

  ## S3 method for class 'rxLorenz':
  plot(x, title = NULL, subtitle = NULL,
    xTitle = NULL, yTitle = NULL, lineColor = NULL,
    lineStyle = "solid", lineWidth = 2, equalityGridLine = TRUE,
    equalityColor = "grey75", equalityStyle = NULL, equalityWidth = 2, ...)

Arguments

orderVarName

A character string with the name of the variable to use in computing approximate quantiles.

valueVarName

A character string with the name of the variable to use to compute the mean values per quantile. Can be the same as orderVarName.

data

data frame, character string containing an .xdf file name (with path), or RxDataSource-class object representing a data set containing the order and value variables.

numBreaks

integer specifying the number of breaks to use in computing approximate quantiles.

pweights

character string specifying the variable to use as probability weights for the observations.

fweights

character string specifying the variable to use as frequency weights for the observations.

blocksPerRead

number of blocks to read for each chunk of data read from the data source.

reportProgress

integer value with options:

  • 0: no progress is reported.
  • 1: the number of processed rows is printed and updated.
  • 2: rows processed and timings are reported.
  • 3: rows processed and all timings are reported.

verbose

integer value. If 0, no additional output is printed. If 1, additional information is printed as summary statistics are computed.

x

output object from rxLorenz function.

title

main title for the plot.

subtitle

subtitle (at the bottom) for the plot.

xTitle

title for the X axis.

yTitle

title for the Y axis.

lineColor

character or integer vector specifying line color for the Lorenz curve. See colors for a list of available colors.

lineStyle

line style for line plot: "blank", "solid", "dashed", "dotted", "dotdash", "longdash", or "twodash". Specify "blank" for no line, or set type to "p".

lineWidth

a positive number specifying the line width for the Lorenz curve. The interpretation is device-specific.

equalityGridLine

logical value. If TRUE, a diagonal grid line will be drawn representing complete equality.

equalityColor

character or integer vector specifying line color for the equality grid line. If NULL, the color of other grid lines will be used.

equalityStyle

line style for the equality grid line: "blank", "solid", "dashed", "dotted", "dotdash", "longdash", or "twodash". If NULL, the style of other grid lines will be used.

equalityWidth

a positive number specifying the line width for the equality grid line. If NULL, the width of other grid lines will be used.

...

Additional arguments to be passed to xyplot.

Details

rxLorenz computes the cumulative percentage values of the variable specified in valueVarName for groups binned by orderVarName. The size of the bins is determined by numBreaks.

When plotted, the cumulative percentage values are plotted against the quantile percentages.
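The binning idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the RevoScaleR implementation: binnedLorenz is a hypothetical helper, equal-width bins on the order variable are an assumption, and the 0 to 100 scaling of percent and cumVals is also an assumption.

```r
# Hypothetical helper (not part of RevoScaleR): bin observations on the order
# variable and accumulate counts and value sums per bin, without sorting rows.
binnedLorenz <- function(orderVar, valueVar = orderVar, numBreaks = 1000) {
  lo <- min(orderVar)
  hi <- max(orderVar)
  if (lo == hi) {
    # A constant order variable collapses into a single bin
    bin <- rep(1L, length(orderVar))
  } else {
    # Equal-width bin edges (an assumption made for this sketch)
    breaks <- seq(lo, hi, length.out = numBreaks + 1)
    bin <- findInterval(orderVar, breaks, rightmost.closed = TRUE)
  }
  counts <- as.numeric(table(bin))            # observations per occupied bin
  sums <- as.numeric(rowsum(valueVar, bin))   # value totals per occupied bin
  data.frame(
    percent = 100 * cumsum(counts) / sum(counts),  # cumulative % of observations
    cumVals = 100 * cumsum(sums) / sum(sums)       # cumulative % of the value total
  )
}

# Made-up toy data: four incomes, one per bin
toy <- data.frame(income = c(1, 2, 3, 4))
binnedLorenz(toy$income, numBreaks = 4)
```

Plotting cumVals against percent from such a table gives the Lorenz curve; only the per-bin counts and sums need to be kept in memory, which is what makes a single pass over the data possible.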

The Gini coefficient is computed by estimating the ratio of the area between the line of equality and the Lorenz curve to the total area under the line of equality (using trapezoidal integration). The Gini coefficient can range from 0 to 1, with 0 representing perfect equality and values near 1 representing extreme inequality.
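The area ratio can be written out in a few lines of base R. The sketch below assumes Lorenz points given as cumulative percentages and uses a hypothetical helper, giniFromLorenz; it illustrates the trapezoidal calculation and is not the code behind rxGini.

```r
# Hypothetical helper (not part of RevoScaleR): Gini coefficient from Lorenz
# curve points, using trapezoidal integration.
giniFromLorenz <- function(percent, cumVals) {
  # Rescale both axes to proportions and prepend the origin (0, 0)
  x <- c(0, percent / max(percent))
  y <- c(0, cumVals / max(cumVals))
  # Trapezoidal area under the Lorenz curve
  areaUnderCurve <- sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
  # The area under the line of equality is 1/2, so the ratio
  # (1/2 - areaUnderCurve) / (1/2) simplifies to 1 - 2 * areaUnderCurve
  1 - 2 * areaUnderCurve
}

# Points on the line of equality
giniFromLorenz(percent = c(25, 50, 75, 100), cumVals = c(25, 50, 75, 100))

# 99 percent of observations hold none of the total
giniFromLorenz(percent = c(99, 100), cumVals = c(0, 100))
```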

Precision can be increased by increasing numBreaks.
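The effect of the number of breaks can be illustrated with a base-R experiment. Everything here is an assumption made for illustration: the helpers trapezoidGini and binnedGini are hypothetical, the bins are equal-width, the data are made up, and the reference value comes from sorting all observations rather than from RevoScaleR.

```r
# Hypothetical helper: Gini from Lorenz points on a 0-1 scale, by trapezoids
trapezoidGini <- function(x, y) {
  x <- c(0, x)
  y <- c(0, y)
  1 - 2 * sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
}

# Hypothetical helper: equal-width binning (an assumption), then integration
binnedGini <- function(values, numBreaks) {
  breaks <- seq(min(values), max(values), length.out = numBreaks + 1)
  bin <- findInterval(values, breaks, rightmost.closed = TRUE)
  counts <- as.numeric(table(bin))          # observations per occupied bin
  sums <- as.numeric(rowsum(values, bin))   # value totals per occupied bin
  trapezoidGini(cumsum(counts) / sum(counts), cumsum(sums) / sum(sums))
}

# Made-up, deterministic, skewed data
income <- (1:1000)^2

# Reference value: one Lorenz point per sorted observation
sorted <- sort(income)
sortedGini <- trapezoidGini(seq_along(sorted) / length(sorted),
                            cumsum(sorted) / sum(sorted))

# Compare binned approximations for several numbers of breaks
breaksToTry <- c(5, 50, 1000)
approxGini <- sapply(breaksToTry, function(b) binnedGini(income, b))
data.frame(numBreaks = breaksToTry, approxGini,
           error = sortedGini - approxGini)
```

In this sketch the binned estimate ignores inequality within each bin, so it falls below the sorted reference, and the gap shrinks as the number of breaks grows.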

Value

rxLorenz returns a data frame of class "rxLorenz" containing two variables: cumVals and percent. It also may have a "description" attribute containing the value variable name or description.

rxGini returns a numeric vector of length one containing the approximate Gini coefficient.

Author(s)

Microsoft Corporation Microsoft Technical Support

See Also

rxPredict, rxLogit, rxGlm, rxLinePlot, rxQuantile, rxRoc.

Examples

 ########################################################################
 # Example using simple data frames for extreme distributions
 ########################################################################
 # Lorenz curve for complete equality
 testData <- data.frame(income = rep(100, times=10))
 lorenzOut1 <- rxLorenz("income", data = testData, numBreaks = 100)
 plot(lorenzOut1)
 rxGini(lorenzOut1)

 # Extreme inequality
 testData <- data.frame(income = c(rep(0, times=99), 100))
 lorenzOut2 <- rxLorenz("income", data = testData, numBreaks = 100)
 plot(lorenzOut2, equalityWidth = 3, equalityColor = "black")
 rxGini(lorenzOut2)

 ########################################################################
 # Example using xdf file from sample data
 ########################################################################

 censusWorkers <- file.path(rxGetOption("sampleDataDir"), "CensusWorkers") 

 # Compute Lorenz data using probability weights
 lorenzOut <- rxLorenz(orderVarName = "incwage", data = censusWorkers,
   pweights = "perwt")

 # Plot the Lorenz Curve
 lorenzPlot <- plot(lorenzOut, 
   title = "Lorenz Curve for Workers from Three States",
   subtitle = "Data Source: 5 Percent Sample of U.S. 2000 Census",
   lineWidth = 3, equalityColor = "black", equalityStyle = "longdash")

 # Compute the Gini Coefficient
 giniCoef <- rxGini(lorenzOut)