Tutorial: Python data analytics for SQL developers

APPLIES TO: yesSQL Server (Windows only) noAzure SQL Database noAzure SQL Data Warehouse noParallel Data Warehouse

In this tutorial for SQL programmers, learn about Python integration by building and deploying a Python-based machine learning solution using a NYCTaxi_sample database on SQL Server. You'll use T-SQL, SQL Server Management Studio, and a database engine instance with Machine Learning Services and Python language support.

This tutorial introduces you to Python functions used in a data modeling workflow. Steps include data exploration, building and training a binary classification model, and model deployment. You'll use sample data from the New York City Taxi and Limosine Commission, and the model you will build predicts whether a trip is likely to result in a tip based on the time of day, distance travelled, and pick-up location.

All of the Python code used in this tutorial is wrapped in stored procedures that you create and run in Management Studio.

Note

This tutorial is available in both R and Python. For the R version, see In-database analytics for R developers.

Overview

The process of building a machine learning solution is a complex one that can involve multiple tools, and the coordination of subject matter experts across several phases:

  • obtaining and cleaning data
  • exploring the data and building features useful for modeling
  • training and tuning the model
  • deployment to production

Development and testing of the actual code is best performed using a dedicated development environment. However, after the script is fully tested, you can easily deploy it to SQL Server using Transact-SQL stored procedures in the familiar environment of Management Studio. Wrapping external code in stored procedures is the primary mechanism for operationalizing code in SQL Server.

Whether you are a SQL programmer new to Python, or a Python developer new to SQL, this multi-part tutorial introduces a typical workflow for conducting in-database analytics with Python and SQL Server.

After the model has been saved to the database, you can call the model for predictions from Transact-SQL by using stored procedures.

Prerequisites

All tasks can be done using Transact-SQL stored procedures in Management Studio.

This tutorial assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume you know Python. As such, all Python code is provided.

Next steps