Million Song Dataset in Azure SQL DB / SQL Server

Importing and using the Million Song Dataset in Azure SQL DB or SQL Server (2017+) to build a recommendation service for songs.

Getting Started

Prerequisites

First, deploy an Azure SQL database or SQL Server (2017+). This sample runs correctly on both SQL Server for Windows and SQL Server for Linux. (Note that on Linux you will need to adjust the paths, as the scripts assume Windows.)

Next, download and copy the following files to a folder on your computer. The sample scripts assume this folder is C:\MSD on Windows; modify the paths in the scripts to match your own folder and OS.

Quickstart

Clone this repo (or download a ZIP file) and move the *.FMT files to C:\MSD (or to a path of your choice, provided you update the references to that path in the .SQL scripts accordingly).
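If you keep the files somewhere other than C:\MSD, the path references in the scripts can be updated with a small helper instead of by hand. The following is a minimal Python sketch, not part of this repo: the function name `retarget_paths`, the sample line, and the target folder D:\Data\MSD are all hypothetical. It only swaps the root folder and does not convert backslashes to forward slashes, so Linux paths would still need manual adjustment.

```python
import re


def retarget_paths(sql_text: str, new_root: str, old_root: str = r"C:\MSD") -> str:
    """Replace every reference to old_root in a script's text with new_root."""
    # Match the old root case-insensitively (Windows paths ignore case) and
    # refuse to match when it is only the prefix of a longer folder name.
    pattern = re.compile(re.escape(old_root) + r"(?![A-Za-z0-9_])", re.IGNORECASE)
    # A function replacement keeps backslashes in new_root literal.
    return pattern.sub(lambda _match: new_root, sql_text)


# Hypothetical script fragment, for illustration only.
sample = r"FROM 'C:\MSD\example.txt' WITH (FORMATFILE = 'c:\msd\example.fmt')"
print(retarget_paths(sample, r"D:\Data\MSD"))
```

To apply it to a real script, read the file's text, pass it through the function, and write the result back (ideally to a copy, so the original stays untouched).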

If you are using Azure SQL, the files need to be copied to Azure Blob Storage so that they can be imported as described here:

Examples of Bulk Access to Data in Azure Blob Storage

Then execute the .SQL scripts in sequence. Note that importing the data (1_ImportSourceTables.SQL) can take a few minutes, depending on the performance of your computer.
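Running the scripts "in sequence" can be made explicit by sorting them on their numeric prefix. The sketch below assumes every script follows the `<number>_<name>.SQL` pattern seen in 1_ImportSourceTables.SQL; that pattern, the function name `order_scripts`, and the other file names shown are assumptions for illustration, not a listing of this repo.

```python
import re

# Leading digits, an underscore, then anything ending in .sql (any case).
_PREFIX = re.compile(r"^(\d+)_.*\.sql$", re.IGNORECASE)


def order_scripts(names):
    """Return the numbered .SQL script names sorted by their leading number."""
    numbered = []
    for name in names:
        match = _PREFIX.match(name)
        if match:  # skip files without a numeric prefix (README, *.FMT, ...)
            numbered.append((int(match.group(1)), name))
    # Sorting on the integer avoids the lexicographic trap where "10_" < "2_".
    return [name for _, name in sorted(numbered)]


# Only 1_ImportSourceTables.SQL is named in this README; the rest are hypothetical.
print(order_scripts([
    "10_Hypothetical.SQL",
    "2_Hypothetical.SQL",
    "1_ImportSourceTables.SQL",
    "README.md",
]))
```

In practice the input could be `os.listdir()` of the cloned folder, with each returned script then executed in your SQL client of choice.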

Graph data in SQL Server and Azure SQL

Please refer to these documentation links for more details on the new functionality:

Dataset Citations

Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.

The Echo Nest Taste profile subset, the official user data collection for the Million Song Dataset, available here.

More information about the data set and sources

More information about the Million Song Dataset and its subsets and derivative datasets is available at: