Sample data in Azure blob containers, SQL Server, and Hive tables

The following articles describe how to sample data that is stored in one of three Azure locations: Azure blob containers, SQL Server, and Hive tables.

This sampling task is a step in the Team Data Science Process (TDSP).

Why sample data?

If the dataset you plan to analyze is large, it's usually a good idea to down-sample it to a smaller but still representative and more manageable size. Down-sampling makes data understanding, exploration, and feature engineering easier. Within the data science process, the role of sampling is to enable fast prototyping of the data processing functions and machine learning models.
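As a rough illustration of down-sampling, the sketch below draws a uniform random sample from an in-memory list of rows using only the Python standard library. The function name `down_sample`, the `fraction` and `seed` parameters, and the synthetic `rows` data are all assumptions made for this example and do not come from the articles referenced above. Data held in blob storage, SQL Server, or Hive would normally be sampled with the tools specific to each store.

```python
import random


def down_sample(rows, fraction, seed=0):
    """Return a uniform random sample holding about `fraction` of `rows`.

    Hypothetical helper for illustration only.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Keep at least one row, but never ask for more rows than exist.
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    return rng.sample(rows, k)  # sampling without replacement


# Synthetic stand-in for a large dataset: 100,000 (id, value) rows.
rows = [(i, i % 7) for i in range(100_000)]

# Keep roughly 1% of the rows for fast prototyping.
sample = down_sample(rows, fraction=0.01)
print(len(sample))
```

Fixing the seed keeps the sample stable across runs, which helps when comparing prototype models on the same subset. Uniform sampling is only one option: if some groups in the data are rare, a stratified sample may be more representative.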