Machine learning and data science tools on Azure Data Science Virtual Machines

Azure Data Science Virtual Machines (DSVMs) have a rich set of tools and libraries for machine learning. These resources are available in popular languages, such as Python, R, and Julia.

The DSVM supports these machine-learning tools and libraries:

Azure Machine Learning SDK for Python

For a full reference, visit Azure Machine Learning SDK for Python.

Category Value
What is it? You can use the Azure Machine Learning cloud service to develop and deploy machine-learning models. You can use the Python SDK to track your models as you build, train, scale, and manage them. Deploy models as containers, and run them in the cloud, on-premises, or on Azure IoT Edge.
Supported editions Windows (conda environment: AzureML), Linux (conda environment: py36)
Typical uses General machine-learning platform
How is it configured or installed? Installed with GPU support
How to use or run it As a Python SDK and in the Azure CLI. Activate the AzureML conda environment on the Windows edition, or the py36 conda environment on the Linux edition.
Link to samples Find sample Jupyter notebooks in the AzureML directory, under notebooks.
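
As an illustration of the "How to use or run it" row, the following sketch checks that the expected conda environment is active, and then logs one metric with the SDK. It's a minimal example under stated assumptions, not a reference implementation. The helper names (expected_conda_env, in_expected_env, log_demo_metric) and the experiment name dsvm-demo are hypothetical. CONDA_DEFAULT_ENV is the variable that conda sets when you activate an environment. The azureml.core calls assume version 1 of the Python SDK and a workspace config.json file on disk.

```python
import os
import platform

# Environment names taken from the "Supported editions" row above.
AZUREML_ENVS = {"Windows": "AzureML", "Linux": "py36"}


def expected_conda_env(system=None):
    """Return the conda environment that holds the SDK on this edition."""
    system = system or platform.system()
    try:
        return AZUREML_ENVS[system]
    except KeyError:
        raise ValueError(f"No DSVM edition listed for {system!r}") from None


def in_expected_env(system=None, environ=None):
    """True if the active conda environment matches the expected one."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == expected_conda_env(system)


def log_demo_metric(experiment_name="dsvm-demo"):
    """Log one metric to an experiment (assumes SDK v1 and a config.json)."""
    # Imported lazily so the helpers above work without the SDK installed.
    from azureml.core import Experiment, Workspace

    ws = Workspace.from_config()  # reads the workspace config.json
    run = Experiment(workspace=ws, name=experiment_name).start_logging()
    run.log("accuracy", 0.9)  # placeholder value, not a measured result
    run.complete()
```

Call in_expected_env() first, and call log_demo_metric() only if it returns True and you have a workspace configured.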

H2O

Category Value
What is it? An open-source AI platform that supports distributed, fast, in-memory, scalable machine learning.
Supported editions Linux
Typical uses General-purpose distributed, scalable machine learning
How is it configured or installed? H2O is installed in /dsvm/tools/h2o.
How to use or run it Connect to the VM with X2Go. Start a new terminal, and run java -jar /dsvm/tools/h2o/current/h2o.jar. Then, start a web browser and connect to http://localhost:54321.
Link to samples Find samples on the VM in Jupyter, under the h2o directory.
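
After you start h2o.jar, you might want to confirm that the web interface is reachable before you open a browser. The following sketch uses only the Python standard library. The function name h2o_is_up is hypothetical, and the check only confirms that some HTTP server answers at the address; it doesn't verify that the server is H2O.

```python
import urllib.error
import urllib.request

H2O_URL = "http://localhost:54321"  # address given in the table above


def h2o_is_up(url=H2O_URL, timeout=2.0):
    """Return True if an HTTP server answers at `url`, else False."""
    # Bypass any configured proxy, because the target is on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server replied, even if with an error status
    except OSError:
        return False  # refused, unreachable, or timed out
```

For example, call h2o_is_up() in a loop with a short sleep until it returns True, and then connect to http://localhost:54321.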

DSVMs also include several other machine-learning libraries, such as the popular scikit-learn package that's part of the Anaconda Python distribution for DSVMs. For a list of the packages available in Python, R, and Julia, run the respective package managers.
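
For Python, the usual package-manager commands are conda list and pip list. If you prefer to check from inside a Python session, the following sketch lists the packages in the active environment with the standard library's importlib.metadata module (Python 3.8 or later). The function names are hypothetical. For R and Julia, the comparable calls are installed.packages() and Pkg.status().

```python
from importlib import metadata


def installed_python_packages():
    """Return sorted (name, version) pairs for the active environment."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            found.add((name, dist.metadata.get("Version")))
    return sorted(found, key=lambda pair: pair[0].lower())


def is_installed(package):
    """True if `package` (for example "scikit-learn") is installed."""
    try:
        metadata.version(package)
        return True
    except metadata.PackageNotFoundError:
        return False
```

For example, is_installed("scikit-learn") reports whether scikit-learn is available in the environment that you activated.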

LightGBM

Category Value
What is it? A fast, distributed, high-performance gradient-boosting (GBDT, GBRT, GBM, or MART) framework based on decision tree algorithms. It's used for ranking, classification, and many other machine-learning tasks.
Supported editions Windows, Linux
Typical uses General-purpose gradient-boosting framework
How is it configured or installed? On Windows, LightGBM is installed as a Python package. On Linux, the command-line executable is located in /opt/LightGBM/lightgbm, and the R and Python packages are also installed.
Link to samples LightGBM guide
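
On Linux, you can drive the command-line executable with a configuration file. The following sketch renders such a file and builds the matching command. The file names (train.txt, train.conf, model.txt) and the helper names are hypothetical. The parameter names are standard LightGBM options, but the values are illustrative and not tuned.

```python
LIGHTGBM_BIN = "/opt/LightGBM/lightgbm"  # Linux path from the table above


def render_config(params):
    """Render a dict as LightGBM's 'key = value' config-file text."""
    return "\n".join(f"{key} = {value}" for key, value in params.items()) + "\n"


def lightgbm_train_command(config_path, binary=LIGHTGBM_BIN):
    """Build the argument list for a command-line training run."""
    return [binary, f"config={config_path}"]


# Hypothetical file names and illustrative, untuned values.
params = {
    "task": "train",
    "objective": "binary",
    "data": "train.txt",
    "num_trees": 100,
    "learning_rate": 0.1,
    "output_model": "model.txt",
}
config_text = render_config(params)
command = lightgbm_train_command("train.conf")
```

To run the training, write config_text to train.conf, and then pass command to subprocess.run(command, check=True).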

Rattle

Category Value
What is it? A graphical user interface for data mining that uses R.
Supported editions Windows, Linux
Typical uses General UI data-mining tool for R
How to use or run it As a UI tool. On Windows, start a command prompt, run R, and then inside R, run rattle(). On Linux, connect with X2Go, start a terminal, run R, and then inside R, run rattle().
Link to samples Rattle

Vowpal Wabbit

Category Value
What is it? A fast, open-source, out-of-core learning system library
Supported editions Windows, Linux
Typical uses General machine-learning library
How is it configured or installed? Windows: MSI installer; Linux: apt-get
How to use or run it As an on-path command-line tool (C:\Program Files\VowpalWabbit\vw.exe on Windows, /usr/bin/vw on Linux)
Link to samples Vowpal Wabbit samples
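
Because vw is an on-path command-line tool, a common workflow is to write your data in the Vowpal Wabbit text format, and then call the tool. The following sketch formats examples as "label | name:value" lines and builds a training command. The helper names, feature names, and values are hypothetical. The -d (data file) and -f (output model) options are standard vw flags.

```python
VW_RESERVED = set(" :|")  # characters that can't appear in a feature name


def to_vw_line(label, features):
    """Format one example as 'label | name:value name:value ...'."""
    parts = []
    for name, value in features.items():
        if not name or VW_RESERVED & set(name):
            raise ValueError(f"Invalid feature name: {name!r}")
        parts.append(f"{name}:{value}")
    return f"{label} | " + " ".join(parts)


def vw_train_command(data_path, model_path, binary="vw"):
    """Build a training command; -d names the data, -f the output model."""
    return [binary, "-d", data_path, "-f", model_path]


# Made-up examples for illustration only.
examples = [
    (1, {"height": 1.5, "length": 2.0}),
    (-1, {"height": 0.3, "length": 0.8}),
]
lines = [to_vw_line(label, feats) for label, feats in examples]
```

Write lines to a file, one example per line, and pass the result of vw_train_command to subprocess.run to train a model.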

Weka

Category Value
What is it? A collection of machine-learning algorithms for data-mining tasks. You can either apply the algorithms directly, or call them from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
Supported editions Windows, Linux
Typical uses General machine-learning tool
How to use or run it On Windows, search for Weka on the Start menu. On Linux, sign in with X2Go, and then go to Applications > Development > Weka.
Link to samples Weka samples
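
To apply the algorithms directly in the Weka UI, you first need data in a format that Weka can open. The following sketch writes a small dataset as ARFF, Weka's native text format. The ARFF format isn't described in this article, so treat the example as an assumption about your workflow. The function name to_arff and the sample data are hypothetical.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    `attributes` is a list of (name, type) pairs, where type is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append(f"@attribute {name} {kind}")
        else:
            values = ",".join(kind)
            lines.append(f"@attribute {name} {{{values}}}")
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("Row length does not match attribute count")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Made-up data for illustration only.
attributes = [("petal_length", "numeric"), ("species", ["setosa", "virginica"])]
rows = [(1.4, "setosa"), (5.1, "virginica")]
arff_text = to_arff("flowers", attributes, rows)
```

Save arff_text to a file with the .arff extension, and then open the file from the Weka UI.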

XGBoost

Category Value
What is it? A fast, portable, and distributed gradient-boosting (GBDT, GBRT, or GBM) library for Python, R, Java, Scala, C++, and more. It runs on a single machine, and on Apache Hadoop and Spark.
Supported editions Windows, Linux
Typical uses General machine-learning library
How is it configured or installed? Installed with GPU support
How to use or run it As a Python library (2.7 and 3.6+), R package, and on-path command-line tool (C:\dsvm\tools\xgboost\bin\xgboost.exe for Windows and /dsvm/tools/xgboost/xgboost for Linux)
Link to samples Samples are included on the VM, in /dsvm/tools/xgboost/demo on Linux, and C:\dsvm\tools\xgboost\demo on Windows.
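
As a minimal sketch of the Python library route, the following example trains a small binary classifier on generated data. The helper names and the toy data are hypothetical, and the parameter values are illustrative. The example runs on the CPU: the setting that selects the GPU differs between XGBoost releases, so it's left out here. The numpy and xgboost imports sit inside the training function so that the data helper works without them.

```python
def make_toy_data(n=20):
    """Return (rows, labels): one feature x, with label 1 when x >= n / 2."""
    rows = [[float(i)] for i in range(n)]
    labels = [1 if i >= n / 2 else 0 for i in range(n)]
    return rows, labels


def train_xgboost(rows, labels, rounds=10):
    """Train a small binary classifier and return the fitted Booster."""
    # Imported lazily so make_toy_data works without these packages.
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(np.array(rows), label=np.array(labels))
    params = {"objective": "binary:logistic", "max_depth": 2}
    return xgb.train(params, dtrain, num_boost_round=rounds)


rows, labels = make_toy_data()
```

On the VM, call train_xgboost(rows, labels) from an environment where xgboost is installed, and then use the returned Booster's predict method on a new DMatrix.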