rxOpen-methods: Managing RevoScaleR Data Source Objects

Description

These functions open, close, and query the state of RevoScaleR data source objects, and read or write their data one block at a time.

Usage

  rxOpen(src, mode = "r")
  rxClose(src, mode = "r")
  rxIsOpen(src, mode = "r")
  rxReadNext(src)
  rxWriteNext(from, to, ...)

Arguments

from

data frame object.

src

RxDataSource object.

to

RxDataSource object.

mode

character string specifying the mode in which to open the file: "r" for reading or "w" for writing.

...

any other arguments to be passed on.

Value

For rxOpen and rxClose, a logical indicating whether the operation was successful. For rxIsOpen, a logical indicating whether or not the RxDataSource is open for the specified mode. For rxReadNext, either a data frame or a list depending upon the value of the returnDataFrame property within src.

Author(s)

Microsoft Corporation, Microsoft Technical Support

See Also

rxNewDataSource, RxXdfData.

Examples


 ds <- RxXdfData(file.path(rxGetOption("sampleDataDir"), "claims.xdf"))
 # ds contains only one block of data
 rxOpen(ds) # must open the file before rxReadNext
 rxReadNext(ds) # get the first block
 rxReadNext(ds) # no blocks remain, so this read returns no rows
 rxClose(ds)

 # Use a data source to compute means by processing the data in chunks
 # Data processing functions: for each chunk, compute sums of columns and
 # number of rows, then update the results computed from previous chunks
 processData <- function(dframe)
   list(sumCols = colSums(dframe), numRows = nrow(dframe))
 updateResults <- function(x, y)
   list(sumCols = x$sumCols + y$sumCols, numRows = x$numRows + y$numRows)

 # Create data source
 censusWorkers <- file.path(rxGetOption("sampleDataDir"), "CensusWorkers.xdf")
 ds <- RxXdfData(censusWorkers, varsToKeep = c("age", "incwage"),
                 blocksPerRead = 2)

 # Process data and update results
 rxOpen(ds)
 resList <- processData(rxReadNext(ds))
 while(TRUE)
 {
     chunk <- rxReadNext(ds)
     # A read that returns zero rows means the data source is exhausted
     if (nrow(chunk) == 0)
         break
     resList <- updateResults(resList, processData(chunk))
 }
 rxClose(ds)

 # Compute the means of the variables from the accumulated results
 varMeans <- resList$sumCols / resList$numRows
 varMeans
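
The accumulate-and-update pattern used above can be checked without RevoScaleR. The following is a minimal sketch in base R, assuming a small in-memory data frame (`workers`, with hypothetical values) in place of CensusWorkers.xdf and using `split` to imitate successive `rxReadNext` calls. It is an illustration of why the chunked means match the one-pass means, not part of the RevoScaleR API.

```r
# Hypothetical data standing in for the age and incwage columns
workers <- data.frame(
  age     = c(25, 31, 47, 52, 38, 29, 60, 44, 35, 41),
  incwage = c(18000, 42000, 56000, 61000, 39000,
              27000, 48000, 53000, 36000, 45000)
)

# Same helper functions as in the example above
processData <- function(dframe)
  list(sumCols = colSums(dframe), numRows = nrow(dframe))
updateResults <- function(x, y)
  list(sumCols = x$sumCols + y$sumCols, numRows = x$numRows + y$numRows)

# Split the rows into chunks of at most 4 rows, imitating block-wise reads
chunkId <- ceiling(seq_len(nrow(workers)) / 4)
chunks <- split(workers, chunkId)

# Summarize each chunk, then fold the summaries into one running result
resList <- Reduce(updateResults, lapply(chunks, processData))
chunkedMeans <- resList$sumCols / resList$numRows

# The chunked means should agree with the means computed in one pass
all.equal(chunkedMeans, colMeans(workers))
```

Because only column sums and row counts are carried between chunks, memory use depends on the chunk size rather than on the total number of rows.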