Million Song Dataset in Azure SQL DB / SQL Server
Importing and using the Million Song Dataset in Azure SQL DB or SQL Server (2017+) to build a recommendation service for songs.
Getting Started
Prerequisites
First, deploy an Azure SQL database or a SQL Server (2017+) instance. This sample works correctly on both SQL Server for Windows and SQL Server for Linux. (Note that on Linux you will need to adjust paths accordingly, as the scripts assume Windows.)
Next, download and copy the following files to a folder on your computer. The sample scripts assume this folder is C:\MSD on Windows; please modify accordingly based on your paths and OS versions.
- Unique songs
- User taste profiles (please un-zip this file manually in the same folder)
- Known mismatches of song IDs - this data is used to correct known data quality issues
Quickstart
Clone this repo (or download a ZIP file), then move the *.FMT files to C:\MSD (or the path of your choice, provided you modify the references to that path in the .SQL scripts accordingly).
If you are using Azure SQL, the files need to be copied to Azure Blob Storage so that they can be imported as described here:
Examples of Bulk Access to Data in Azure Blob Storage
Then proceed to execute the .SQL scripts in sequence! Do note that importing the data (1_ImportSourceTables.SQL) can take a few minutes depending on the performance of your computer.
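If you keep the files somewhere other than C:\MSD, or run SQL Server on Linux, the path references in the .SQL scripts need to be adjusted before you execute them. The following is a minimal Python sketch of one way to do that, not part of this repo: the function name `retarget_paths` and the target folder `/var/opt/msd` are hypothetical, and it assumes the paths in the scripts contain no spaces.

```python
import re


def retarget_paths(sql_text, old_base=r"C:\MSD", new_base="/var/opt/msd"):
    """Return sql_text with paths under old_base rewritten to new_base.

    Hypothetical helper: old_base defaults to the folder the scripts
    assume; new_base is an illustrative Linux folder, not a required one.
    """
    # Treat the new base as POSIX-style if it uses forward slashes only.
    posix = "/" in new_base and "\\" not in new_base

    # Match the old base (case-insensitively) plus the rest of the path,
    # stopping at a quote or whitespace. The lookahead avoids touching
    # unrelated folders such as C:\MSDbackup.
    pattern = re.compile(
        re.escape(old_base) + r"(?!\w)(?P<rest>[^'\"\s]*)",
        re.IGNORECASE,
    )

    def swap(match):
        rest = match.group("rest")
        if posix:
            # Convert Windows separators in the remainder of the path.
            rest = rest.replace("\\", "/")
        return new_base + rest

    return pattern.sub(swap, sql_text)
```

To apply it, read each .SQL script as text, pass the contents through `retarget_paths`, and write the result to a copy of the script. Review the output before executing it, since a plain text substitution cannot tell a file path from a comment that happens to mention the same folder.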
Graph data in SQL Server and Azure SQL
Please refer to these documentation links for more details on the new functionality:
- An overview of Graph data in SQL Server
- Architecture details for Graph data in SQL Server
- The official sample for Graph data in SQL Server
Dataset Citations
Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.
The Echo Nest Taste profile subset, the official user data collection for the Million Song Dataset, available here.
More information about the data set and sources
More information about the Million Song Dataset and its subsets and derivative datasets is available at: