Python 教程:通过 SQL 机器学习使用线性回归来预测雪橇租赁Python tutorial: Predict ski rental with linear regression with SQL machine learning

适用于:Applies to: 是SQL Server 2017 (14.x)SQL Server 2017 (14.x)yesSQL Server 2017 (14.x)SQL Server 2017 (14.x) 及更高版本 是Azure SQL 托管实例Azure SQL Managed InstanceYesAzure SQL 托管实例Azure SQL Managed Instance适用于:Applies to: 是SQL Server 2017 (14.x)SQL Server 2017 (14.x)yesSQL Server 2017 (14.x)SQL Server 2017 (14.x) and later 是Azure SQL 托管实例Azure SQL Managed InstanceYesAzure SQL 托管实例Azure SQL Managed Instance

在这个由四个部分组成的教程系列中,你将在 SQL Server 机器学习服务中或大数据群集上使用 Python 和线性回归来预测雪橇租赁次数。In this four-part tutorial series, you will use Python and linear regression in SQL Server Machine Learning Services or on Big Data Clusters to predict the number of ski rentals. 本教程使用 Azure Data Studio 中的 Python 笔记本The tutorial use a Python notebook in Azure Data Studio.

在这个由四个部分组成的教程系列中,你将在 SQL Server 机器学习服务中使用 Python 和线性回归来预测雪橇租赁次数。In this four-part tutorial series, you will use Python and linear regression in SQL Server Machine Learning Services to predict the number of ski rentals. 本教程使用 Azure Data Studio 中的 Python 笔记本The tutorial use a Python notebook in Azure Data Studio.

在这个由四个部分组成的教程系列中,你将在 Azure SQL Server 托管实例机器学习服务中使用 Python 和线性回归来预测雪橇租赁次数。In this four-part tutorial series, you will use Python and linear regression in Azure SQL Managed Instance Machine Learning Services to predict the number of ski rentals. 本教程使用 Azure Data Studio 中的 Python 笔记本The tutorial use a Python notebook in Azure Data Studio.

假设你有一家雪橇租赁公司,你希望预测未来某个日期的雪橇租赁次数。Imagine you own a ski rental business and you want to predict the number of rentals that you'll have on a future date. 此信息可帮助你准备好库存、人员和设施。This information will help you get your stock, staff, and facilities ready.

在本系列的第一部分,你将设置必备组件。In the first part of this series, you'll get set up with the prerequisites. 在第二和第三部分,你将在某个笔记本中开发一些 Python 脚本,以准备数据并训练机器学习模型。In parts two and three, you'll develop some Python scripts in a notebook to prepare your data and train a machine learning model. 然后,在第三部分,你将使用 T-SQL 存储过程在数据库中运行这些 Python 脚本。Then, in part three, you'll run those Python scripts inside the database using T-SQL stored procedures.

本文将指导如何进行以下操作:In this article, you'll learn how to:

  • 导入示例数据库Import a sample database

第二部分中,你将了解如何将数据从数据库加载到 Python 数据帧中,并在 Python 中准备数据。In part two, you'll learn how to load the data from a database into a Python data frame, and prepare the data in Python.

第三部分介绍如何在 Python 中定型线性回归模型。In part three, you'll learn how to train a linear regression model in Python.

第四部分中,你将了解如何将模型存储到数据库中,然后根据你在第二和第三部分中开发的 Python 脚本来创建存储过程。In part four, you'll learn how to store the model in a database, and then create stored procedures from the Python scripts you developed in parts two and three. 存储过程将在服务器上运行,以便基于新数据进行预测。The stored procedures will run on the server to make predictions based on new data.

先决条件Prerequisites

  • Python IDE - 本教程在 Azure Data Studio 中使用 Python 笔记本。Python IDE - This tutorial uses a Python notebook in Azure Data Studio. 有关详细信息,请参阅如何使用 Azure Data Studio 中的笔记本For more information, see How to use notebooks in Azure Data Studio.

  • SQL 查询工具 - 本教程假定使用的是 Azure Data StudioSQL query tool - This tutorial assumes you're using Azure Data Studio.

  • 附加 Python 包 - 在本教程系列中的示例所使用的以下 Python 包中,有些可能是默认未安装的:Additional Python packages - The examples in this tutorial series use the following Python packages that may not be installed by default:

    • pandaspandas
    • pyodbcpyodbc
    • sklearnsklearn

    若要安装这些包:To install these packages:

    1. 在 Azure Data Studio 笔记本中,选择“管理包”。In your Azure Data Studio notebook, select Manage Packages.
    2. 在“管理包”窗格中,选择“添加新包”选项卡。In the Manage Packages pane, select the Add new tab.
    3. 对于以下每个包,输入包名称,单击“搜索”,然后单击“安装”。For each of the following packages, enter the package name, click Search, then click Install.

    作为替代方法,你可以打开“命令提示符”,更改为在 Azure Data Studio 中使用的 Python 版本的安装路径(例如 cd %LocalAppData%\Programs\Python\Python37-32),然后针对每个包运行 pip installAs an alternative, you can open a Command Prompt, change to the installation path for the version of Python you use in Azure Data Studio (for example, cd %LocalAppData%\Programs\Python\Python37-32), then run pip install for each package.

还原示例数据库Restore the sample database

本教程中使用的示例数据库已保存到 .bak 数据库备份文件,以供下载和使用。The sample database used in this tutorial has been saved to a .bak database backup file for you to download and use.

备注

如果在大数据群集上使用机器学习服务,请了解如何将数据库还原成 SQL Server 大数据群集主实例If you are using Machine Learning Services on Big Data Clusters, see how to Restore a database into the SQL Server big data cluster master instance.

  1. 下载文件 TutorialDB.bakDownload the file TutorialDB.bak.

  2. 使用以下详细信息,按 Azure Data Studio 中从备份文件还原数据库中的说明操作:Follow the directions in Restore a database from a backup file in Azure Data Studio, using these details:

    • 从下载的 TutorialDB.bak 文件中导入Import from the TutorialDB.bak file you downloaded
    • 将目标数据库命名为“TutorialDB”Name the target database "TutorialDB"
  3. 可以通过查询 dbo.rental_data 表来验证是否存在还原的数据集:You can verify that the restored database exists by querying the dbo.rental_data table:

    USE TutorialDB;
    SELECT * FROM [dbo].[rental_data];
    
  1. 下载文件 TutorialDB.bakDownload the file TutorialDB.bak.

  2. 按照 SQL Server Management Studio 中的将数据库还原到托管实例中的说明,使用以下详细信息:Follow the directions in Restore a database to a Managed Instance in SQL Server Management Studio, using these details:

    • 从下载的 TutorialDB.bak 文件中导入Import from the TutorialDB.bak file you downloaded
    • 将目标数据库命名为“TutorialDB”Name the target database "TutorialDB"
  3. 可以通过查询 dbo.rental_data 表来验证是否存在还原的数据集:You can verify that the restored database exists by querying the dbo.rental_data table:

    USE TutorialDB;
    SELECT * FROM [dbo].[rental_data];
    

清理资源Clean up resources

如果不打算继续学习本教程,请删除 TutorialDB 数据库。If you're not going to continue with this tutorial, delete the TutorialDB database.

后续步骤Next steps

在本系列教程的第一部分中,你已完成以下步骤:In part one of this tutorial series, you completed these steps:

  • 安装必备组件Installed the prerequisites
  • 导入示例数据库Import a sample database

若要从 TutorialDB 数据库中准备数据,请按照本教程系列的第二部分进行操作:To prepare the data from the TutorialDB database, follow part two of this tutorial series: