Free ebook: Data Science with Microsoft SQL Server 2016

We’re happy to announce the availability of a new free ebook, Data Science with Microsoft SQL Server 2016 (ISBN 9781509304318), by Buck Woody, Danielle Dean, Debraj GuhaThakurta, Gagan Bansal, Matt Conners, and Wee-Hyong Tok. Enjoy!


The world around us, every business and nearly every industry, is being transformed by technology. This disruption is driven, in part, by the intersection of three trends: a massive explosion of data, intelligence from machine learning and advanced analytics, and the economics and agility of cloud computing.

While databases power nearly every aspect of business today, they were not originally designed with this disruption in mind. Traditional databases were about recording and retrieving transactions, such as orders and payments, reliably, securely, and efficiently. They were designed to enable reliable, secure, mission-critical transactional applications at small to medium scale, in on-premises datacenters.

Databases built to get ahead of today’s disruptions do very fast analyses of live data in-memory as transactions are being recorded or queried. They support very low-latency advanced analytics and machine learning, such as forecasting and predictive models, on the same data, so that applications can easily embed data-driven intelligence. They can be offered as a fully managed service in the cloud, which in turn makes it easy to build and deploy intelligent Software as a Service (SaaS) apps.

They also provide innovative security features built for a world where a majority of data is accessible over the Internet. They support 24×7 high availability, efficient management, and database administration across platforms. They therefore enable mission-critical intelligent applications to be built and managed both in the cloud and on-premises. They are exciting harbingers of a new world of ambient intelligence.

SQL Server 2016 was built for this new world, and to help businesses get ahead of today’s disruptions. It supports hybrid transactional/analytical processing, advanced analytics and machine learning, mobile BI, data integration, Always Encrypted query processing capabilities, and in-memory transactions with persistence. It integrates advanced analytics into the database, providing revolutionary capabilities to build intelligent, high-performance transactional applications.

Imagine a core enterprise application built with a database such as SQL Server. What if you could embed intelligence, i.e., advanced analytics algorithms plus data transformations, within the database itself, to make every transaction intelligent in real time? That’s now possible for the first time with R and machine learning built into SQL Server 2016. By combining the performance of SQL Server in-memory OLTP technology and in-memory columnstores with R and machine learning, applications can get extraordinary analytical performance in production, as well as the throughput, parallelism, security, reliability, compliance certifications, and manageability of an industrial-strength database engine.

This book is the first to truly describe how you can create intelligent applications leveraging SQL Server and R. It is an exciting book that will empower every developer to unleash the power of data-driven intelligence in their organization.

Joseph Sirosh
Corporate Vice President
Data Group, Microsoft


R is one of the most popular, powerful data analytics languages and environments in use by data scientists. Actionable business data is often stored in Relational Database Management Systems (RDBMS), and one of the most widely used RDBMS is Microsoft SQL Server. Much more than a database server, it’s a rich ecosystem with advanced analytic capabilities. Microsoft SQL Server R Services combines these environments, allowing direct interaction between the data on the RDBMS and the R language, all while preserving the security and safety the RDBMS provides. In this book, you’ll learn how Microsoft has combined these two environments and how a data scientist can use this new capability, and you’ll work through practical, hands-on examples of using SQL Server R Services to create real-world solutions.

How this book is organized

This book breaks down into three primary sections: an introduction to SQL Server R Services and SQL Server in general; a description and explanation of how a data scientist works in this new environment (useful, given that many data scientists work in “silos,” and this new way of working brings them into the business development process); and practical, hands-on examples of working through real-world solutions. You can either review the examples or work through them as you read the chapters.

Who this book is for

This book is intended for a technical audience, specifically data scientists, who are assumed to be familiar with the R language and environment. We do, however, briefly introduce data science and the R language and point to many resources for learning those disciplines, which puts this book within the reach of database administrators, developers, and other data professionals. Although we do not cover the totality of SQL Server in this book, we provide references and explain some concepts in case you are not familiar with SQL Server, as is often the case with data scientists.

About the authors

Wee-Hyong Tok is a senior data scientist lead at Microsoft in the Algorithms and Data Science group. Wee-Hyong has decades of database systems experience, spanning academia and industry, including deep experience driving and shipping products and services with distributed engineering teams from Asia and the United States. Before joining Microsoft, Tok worked on in-database analytics, demonstrating how association rule mining can be integrated into a relational database management system, Predator-Miner, which makes it possible for users to express data-mining operations using SQL queries and provides opportunities for better query optimization and processing.

Tok has been instrumental in driving data-mining boot camps in Asia and was honored as a Microsoft SQL Server Most Valuable Professional for several consecutive years because of his active contributions to the database community throughout Asia. He has coauthored several books, including the first book on Azure Machine Learning, Predictive Analytics with Microsoft Azure Machine Learning, and has also published more than 20 peer-reviewed academic papers and journal articles. He has a Ph.D. in computer science from the National University of Singapore.

Buck Woody works on the Microsoft Machine Learning and Data Science Team, using data and technology to educate others on solving business and science problems. With more than 30 years of professional and practical experience in computer data technologies, he is also a popular speaker at many conferences around the world. Buck is the author of more than 650 articles and 7 books on databases and machine learning technologies. In addition, he teaches database courses and sits on the Data Science Board at the University of Washington, and specializes in data analysis techniques.

Debraj GuhaThakurta is a senior data scientist at Microsoft in the Algorithms and Data Science group. His work focuses on the use of different platforms and toolkits, such as Microsoft’s Cortana Intelligence Suite, Microsoft R Server, SQL Server, Hadoop, and Spark, for creating scalable and operationalized analytical processes for business problems. Debraj has extensive industry experience in the biopharma and financial forecasting domains. He has a Ph.D. in chemistry and biophysics, and post-doctoral research experience in machine learning applications in bioinformatics. He has published more than 25 peer-reviewed papers, book chapters, and patents.

Danielle Dean is a senior data scientist lead at Microsoft in the Algorithms and Data Science group. She leads a team of data scientists and engineers on end-to-end analytics projects that use Microsoft's Cortana Intelligence Suite for applications ranging from automating the ingestion of data to analyzing and implementing algorithms, creating web services of these implementations, and integrating them into customer solutions or building end-user dashboards and visualizations. Danielle holds a Ph.D. in quantitative psychology from the University of North Carolina at Chapel Hill, where she studied the application of multilevel event history models to understand the timing and processes leading to events between dyads within social networks.

Gagan Bansal is a data scientist leading the development of financial forecasting capabilities in Cortana Analytics at Microsoft. Gagan joined Microsoft from Yahoo Labs, where he was a lead engineer building and deploying large-scale user modeling and scoring pipelines on both grid (Hadoop) and stream scoring systems for display-ad targeting applications. Prior to Yahoo!, he worked on social targeting in online advertising at 33Across. Before that, he worked for another startup, where he was involved in the development of real-time video processing algorithms for advertising in sports broadcasts. Gagan obtained his master’s degree in computer science from Johns Hopkins University, where he worked on pedestrian detection in videos for his thesis. Before that, he earned a bachelor’s degree in computer science from the Indian Institute of Technology, Delhi. Gagan enjoys working on problems related to machine learning, large-scale data processing, computer systems, and image processing.

Matt Conners is a senior data sciences program manager in Microsoft’s Algorithms and Data Sciences group. He is focused on the forecasting domain, working with customers, partners, and data scientists to operationalize machine learning financial forecasting solutions. He has extensive business operations and industry domain experience, with more than 20 years of financial technology experience across sales, marketing, business operations, securities, and banking. He has an undergraduate degree in economics, and master’s degrees in finance and statistics.