rx_spark_list_data

Usage

revoscalepy.rx_spark_list_data(show_description: bool,
    compute_context: revoscalepy.computecontext.RxComputeContext.RxComputeContext = None)

Description

Use these functions to manage the objects cached in the Spark memory system. These functions are only applicable when using RxSpark compute context.

Arguments

show_description

Bool value, indicating whether or not to print out the detail to console.

compute_context

RxSpark compute context object.

Returns

List of all objects cached in Spark memory system for rxSparkListData.

Example

from revoscalepy import RxOrcData, rx_spark_connect, rx_spark_list_data, rx_lin_mod, rx_spark_cache_data
rx_spark_connect()
col_info = {"DayOfWeek": {"type": "factor"}}
df = RxOrcData(file = "/share/sample_data/AirlineDemoSmallOrc", column_info = col_info)
df = rx_spark_cache_data(df, True)

# After the first run, a Spark data object is added into the list
rx_lin_mod("ArrDelay ~ DayOfWeek", data = df)
rx_spark_list_data(True)