Create a linear regression model using revoscalepy in Python
This Python quickstart shows how to create a linear regression model on a local Machine Learning Server, using functions from the revoscalepy library and built-in sample data.
Steps in this quickstart are executed on a Python command line, using Machine Learning Server in the default local compute context. In this context, all operations run locally: data is fetched from the data source, and model fitting runs in your current Python environment.
The revoscalepy library for Python contains objects, transformations, and algorithms similar to those provided by the RevoScaleR package for the R language. With revoscalepy, you can write a Python script that creates a compute context, moves data between compute contexts, transforms data, and trains predictive models using popular algorithms such as logistic and linear regression, decision trees, and more.
For the SQL Server version of this tutorial, see Use Python with revoscalepy to create a model (SQL Server).
Start Python
- On Windows, go to C:\Program Files\Microsoft\ML Server\PYTHON_SERVER, and double-click Python.exe.
- On Linux, enter mlserver-python at the command line.
Import libraries and functions
Paste in the following statements to import libraries and functions.
import revoscalepy
import os
import pandas
from revoscalepy import RxComputeContext, RxXdfData
from revoscalepy import rx_lin_mod, rx_predict, rx_summary
from revoscalepy import RxOptions, rx_import
from pandas import Categorical
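If the imports fail, a likely cause is that the session was started with a different Python interpreter than the one installed with Machine Learning Server. As a minimal sketch using only the standard library, you can check whether the packages are visible to the current interpreter. The helper name `module_available` is hypothetical and not part of revoscalepy.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found by this interpreter."""
    # find_spec returns None when the module is not on the import path.
    return importlib.util.find_spec(name) is not None


# Check the two third-party packages this quickstart imports.
for module_name in ("revoscalepy", "pandas"):
    status = "found" if module_available(module_name) else "missing"
    print(module_name, status)
```

A "missing" result for revoscalepy suggests starting Python from the Machine Learning Server location described earlier.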
Create a data source object
The data is retrieved locally from a sample .xdf file included in your installation. In this step, set the file path and then create a data source object to load the data. The sample data provides airline delays over a given time period, for multiple airlines.
### Set the location
sample_data_path = RxOptions.get_option("sampleDataDir")

### Create the data source object
data_source = RxXdfData(os.path.join(sample_data_path, "AirlineDemoSmall.xdf"))
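Because the data source object is built from a file path, it can be useful to confirm that the file exists before going further. The following is a small sketch using only the standard library. The helper `resolve_sample_file` is hypothetical, and it assumes you pass in the directory returned by the sampleDataDir option.

```python
import os


def resolve_sample_file(sample_dir, file_name):
    """Join a directory and file name, and fail early if the file is missing."""
    path = os.path.join(sample_dir, file_name)
    if not os.path.isfile(path):
        raise FileNotFoundError("Sample file not found: " + path)
    return path
```

In this quickstart, you could call it with sample_data_path and "AirlineDemoSmall.xdf", and then pass the returned path to RxXdfData.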
Create a linear regression model
In a linear regression, you model the relationship between a dependent variable and one or more independent variables. In this step, the arrival delay (ArrDelay) is modeled as a function of the day of the week (DayOfWeek).
linmod_local = revoscalepy.rx_lin_mod("ArrDelay ~ DayOfWeek", data = data_source)
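With a single categorical predictor such as DayOfWeek, an ordinary least squares fit reproduces the mean of the response within each category. The sketch below illustrates that idea in plain Python. The delay values are invented toy numbers, not taken from AirlineDemoSmall.xdf, and `fit_group_means` is a hypothetical helper rather than a revoscalepy function.

```python
from collections import defaultdict
from statistics import mean


def fit_group_means(rows):
    """Return the mean response per category.

    For a model with one categorical predictor, these means are the
    least-squares fitted values for each category.
    """
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}


# Toy (day, arrival delay in minutes) pairs, invented for illustration.
toy_rows = [
    ("Monday", 10.0), ("Monday", 14.0),
    ("Tuesday", 8.0), ("Tuesday", 12.0),
    ("Friday", 20.0), ("Friday", 10.0),
]

fitted = fit_group_means(toy_rows)
print(fitted)
```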
Using a prediction function, you can predict the expected arrival delay for each day of the week.
predict = revoscalepy.rx_predict(linmod_local, data = revoscalepy.rx_import(input_data = data_source))
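Conceptually, prediction from a model with one categorical predictor is a lookup: an intercept for a reference category plus an offset for each remaining category. The sketch below shows that arithmetic in plain Python. The coefficient values and the choice of "Sunday" as the reference category are assumptions made for illustration, and may differ from what rx_lin_mod reports. `predict_delay` is a hypothetical helper.

```python
def predict_delay(intercept, offsets, day):
    """Return intercept + offset for the day.

    Days without an entry in offsets are treated as the reference
    category, so their offset is 0.
    """
    return intercept + offsets.get(day, 0.0)


# Hypothetical coefficients with "Sunday" as the reference category.
intercept = 10.0
offsets = {"Monday": 2.0, "Friday": 5.0}

days = ["Sunday", "Monday", "Friday"]
predictions = [predict_delay(intercept, offsets, day) for day in days]
print(predictions)
```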
In this last step, extract summary statistics from the sample dataset and then print the output to the console. The rx_summary function returns mean, standard deviation, and min-max values.
### Create an object to store summary data
summary = revoscalepy.rx_summary("ArrDelay ~ DayOfWeek", data = data_source)

### Send the output to the console
print(summary)
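To make the reported statistics concrete, the sketch below computes the mean, standard deviation, minimum, and maximum for a short list of toy values using only the standard library. The numbers are invented, `summarize` is a hypothetical helper, and the use of the sample (n - 1) standard deviation is an assumption rather than a statement about how rx_summary computes it.

```python
from statistics import mean, stdev


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "mean": mean(values),
        "std_dev": stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


# Toy arrival delays in minutes, invented for illustration.
toy_delays = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

stats = summarize(toy_delays)
print(stats)
```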
The ability to switch compute context to a different machine or platform is a powerful capability. To see how this works, continue with the SQL Server version of this tutorial: Use Python with revoscalepy to create a model (SQL Server).
You can also review linear modeling for RevoScaleR. For linear models, the Python implementation in revoscalepy is very similar to the R implementation in RevoScaleR.