Create a linear regression model using revoscalepy in Python

This Python quickstart shows how to create a linear regression model on a local Machine Learning Server, using functions from the revoscalepy library and built-in sample data.

The steps in this quickstart run on a Python command line, using Machine Learning Server in the default local compute context. In this context, all operations run locally: data is fetched from the data source, and the model is fitted in your current Python environment.

The revoscalepy library for Python contains objects, transformations, and algorithms similar to those provided for the RevoScaleR package for the R language. With revoscalepy, you can write a Python script that creates a compute context, moves data between compute contexts, transforms data, and trains predictive models using popular algorithms such as logistic and linear regression, decision trees, and more.

Note

For the SQL Server version of this tutorial, see Use Python with revoscalepy to create a model (SQL Server).

Start Python

  • On Windows, go to C:\Program Files\Microsoft\ML Server\PYTHON_SERVER, and double-click Python.exe.
  • On Linux, enter mlserver-python at the command line.

Import libraries and functions

Paste in the following statements to import libraries and functions.

import revoscalepy
import os
import pandas

from revoscalepy import RxComputeContext, RxXdfData
from revoscalepy import rx_lin_mod, rx_predict, rx_summary
from revoscalepy import RxOptions, rx_import

from pandas import Categorical

Create a data source object

The data is retrieved locally from a sample .xdf file included in your installation. In this step, set the file path and then create a data source object to load the data. The sample data contains airline delays for multiple airlines over a given time period.

### Set the location
sample_data_path = RxOptions.get_option("sampleDataDir")

### Create the data source object
data_source = RxXdfData(os.path.join(sample_data_path, "AirlineDemoSmall.xdf"))

Create a linear regression model

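In a linear regression, you model the relationship between a dependent variable and one or more independent variables. In this step, the model regresses arrival delay (ArrDelay) on the day of the week (DayOfWeek), so it estimates the average delay for each day of the week.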

linmod_local = revoscalepy.rx_lin_mod("ArrDelay ~ DayOfWeek", data = data_source)
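DayOfWeek is categorical, so this model's fitted value for each day is the mean arrival delay observed on that day. The pure-Python sketch below illustrates why. It uses hypothetical toy data instead of AirlineDemoSmall.xdf and a hypothetical helper, fit_categorical_ols. With one indicator column per day, the normal equations reduce to dividing each day's total delay by its row count. This is not how rx_lin_mod is implemented. rx_lin_mod may also report its coefficients under a different coding (for example, relative to a reference level), but the fitted values are the same.

```python
# A minimal sketch of ordinary least squares with one categorical predictor.
# The data and the fit_categorical_ols helper are hypothetical illustrations.
from collections import defaultdict

# Hypothetical (DayOfWeek, ArrDelay) pairs; the values are made up.
rows = [
    ("Monday", 12.0), ("Monday", 4.0), ("Monday", 8.0),
    ("Tuesday", -3.0), ("Tuesday", 5.0),
    ("Friday", 20.0), ("Friday", 10.0), ("Friday", 15.0), ("Friday", 15.0),
]


def fit_categorical_ols(rows):
    """Fit ArrDelay ~ DayOfWeek with one indicator column per level (no intercept).

    Each row has exactly one indicator set to 1, so the columns of X are
    orthogonal and X^T X is diagonal (row counts per level). X^T y holds the
    sum of ArrDelay per level, so solving X^T X b = X^T y divides sum by count.
    """
    levels = sorted({day for day, _ in rows})
    xtx_diag = defaultdict(float)  # diagonal of X^T X: number of rows per level
    xty = defaultdict(float)       # X^T y: total ArrDelay per level
    for day, delay in rows:
        xtx_diag[day] += 1.0
        xty[day] += delay
    return {level: xty[level] / xtx_diag[level] for level in levels}


coefficients = fit_categorical_ols(rows)
for level, coef in coefficients.items():
    print(f"{level}: {coef:.2f}")
```

Each coefficient here is simply the mean delay for that day in the toy data.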

Predict delays

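Using a prediction function, you can predict the expected arrival delay for each flight based on its day of the week. Here, rx_import reads the .xdf data into memory so that rx_predict can score it.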

predict = revoscalepy.rx_predict(linmod_local, data = revoscalepy.rx_import(input_data = data_source))
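The model's only predictor is categorical, so every row with the same DayOfWeek receives the same prediction: an expected delay, not a probability. The sketch below shows that arithmetic under a hypothetical reference-level coding. The intercept values, the offsets, and the predict_delay helper are all hypothetical. They are not the output or internals of rx_predict.

```python
# A minimal sketch of turning linear-model coefficients into predictions
# for a categorical predictor. All coefficient values are hypothetical.

# Hypothetical reference-level coding: the intercept is the expected ArrDelay
# for the reference level ("Monday"), and each level adds its own offset.
intercept = 8.0
offsets = {"Monday": 0.0, "Tuesday": -7.0, "Friday": 7.0}


def predict_delay(day):
    """Return the expected ArrDelay for a row whose DayOfWeek is `day`."""
    # Only one day indicator is 1 for a given row, so the prediction is
    # the intercept plus that day's offset.
    return intercept + offsets[day]


new_rows = ["Friday", "Monday", "Tuesday", "Friday"]
predictions = [predict_delay(day) for day in new_rows]
print(predictions)
```

Both Friday rows receive the same predicted delay, because nothing else in the model distinguishes them.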

Summarize data

In this last step, extract summary statistics from the sample dataset and print the output to the console. The rx_summary function returns the mean, standard deviation, and minimum and maximum values.

### Create an object to store summary data
summary = revoscalepy.rx_summary("ArrDelay ~ DayOfWeek", data = data_source)

### Send the output to the console
print(summary)
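To build intuition for the summary output, the sketch below computes the same kinds of statistics in plain Python, grouped by day of the week: mean, standard deviation, minimum, maximum, and count. The data and the summarize_by_group helper are hypothetical. The sketch uses the sample standard deviation (n - 1 denominator) from Python's statistics module, which may differ in detail from how rx_summary computes its statistics.

```python
# A minimal sketch of per-group summary statistics in plain Python.
# The data and the summarize_by_group helper are hypothetical.
import statistics
from collections import defaultdict

# Hypothetical (DayOfWeek, ArrDelay) pairs; the values are made up.
rows = [
    ("Monday", 12.0), ("Monday", 4.0), ("Monday", 8.0),
    ("Tuesday", -3.0), ("Tuesday", 5.0),
    ("Friday", 20.0), ("Friday", 10.0), ("Friday", 15.0), ("Friday", 15.0),
]


def summarize_by_group(rows):
    """Return mean, standard deviation, min, max, and count of ArrDelay per day."""
    groups = defaultdict(list)
    for day, delay in rows:
        groups[day].append(delay)
    summary = {}
    for day in sorted(groups):
        values = groups[day]
        summary[day] = {
            "mean": statistics.mean(values),
            "std_dev": statistics.stdev(values),  # sample std dev; needs >= 2 values
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
    return summary


stats_by_day = summarize_by_group(rows)
for day, stats in stats_by_day.items():
    print(day, stats)
```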

Next steps

The ability to switch compute context to a different machine or platform is a powerful capability. To see how this works, continue with the SQL Server version of this tutorial: Use Python with revoscalepy to create a model (SQL Server).

You can also review linear modeling for RevoScaleR. For linear models, the Python implementation in revoscalepy is very similar to the R implementation in RevoScaleR.

See Also