Discussions around JupyterHub Infrastructure at JupyterCon

One of the main discussions I have been having at JupyterCon has been around the architectural models for deploying JupyterHub within academic institutions, classes or groups.

The following is a quick overview of the potential architectural solutions for JupyterHub. These are explained in more detail in the following blog post: https://aka.ms/archjupyterhub

[Image: JupyterHub architecture options diagram]

Starting at the bottom, we have a typical on-premises scenario: VMs, physical machines or even a container-based installation of Jupyter Notebooks and JupyterHub, connecting to local data resources. Users authenticate via local LDAP, Active Directory or external auth providers such as Microsoft O365 or Google Auth.
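
As a rough illustration of the local LDAP option, a jupyterhub_config.py fragment might look like the sketch below. It assumes the separately installed jupyterhub-ldapauthenticator package, and the server address and DN template are placeholder values that you would swap for your own directory details.

```python
# jupyterhub_config.py (fragment; JupyterHub loads this file itself)
# Assumes the jupyterhub-ldapauthenticator package is installed.
c = get_config()  # noqa: F821 (supplied by JupyterHub when it loads the file)

# Hand login over to the LDAP authenticator.
c.JupyterHub.authenticator_class = "ldapauthenticator.LDAPAuthenticator"

# Placeholder values: replace with your institution's directory details.
c.LDAPAuthenticator.server_address = "ldap.example.edu"
c.LDAPAuthenticator.bind_dn_template = [
    "uid={username},ou=people,dc=example,dc=edu",
]
```

This is only a starting point; a real deployment would also need decisions on TLS, allowed users and how each user's notebook server is spawned.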

On the left-hand side we have various scenarios for authentication and provisioning, starting at the top with an LDAP/Shibboleth-connected JupyterHub or VM that allows users to single sign-on. At Microsoft we have a dedicated Data Science VM image for Windows and Linux, which comes preinstalled with all the necessary tools and JupyterHub.

The next option is using Azure Active Directory to provide single sign-on to domain-controlled VMs or containers.
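
Where the Azure Active Directory sign-in is handled by JupyterHub itself, one possible approach is the community OAuthenticator package. The fragment below is a sketch under that assumption: the environment variable names and callback URL are placeholders, and it presumes an app registration already exists in your Azure AD tenant.

```python
# jupyterhub_config.py (fragment; JupyterHub loads this file itself)
# Assumes the oauthenticator package is installed and an Azure AD
# app registration has already been created for the hub.
import os

c = get_config()  # noqa: F821 (supplied by JupyterHub when it loads the file)

# Use the Azure AD OAuth authenticator for login.
c.JupyterHub.authenticator_class = "oauthenticator.azuread.AzureAdOAuthenticator"

# Placeholder environment variable names: keep secrets out of the file.
c.AzureAdOAuthenticator.tenant_id = os.environ["AAD_TENANT_ID"]
c.AzureAdOAuthenticator.client_id = os.environ["AAD_CLIENT_ID"]
c.AzureAdOAuthenticator.client_secret = os.environ["AAD_CLIENT_SECRET"]

# Placeholder hostname: must match the redirect URI in the app registration.
c.AzureAdOAuthenticator.oauth_callback_url = (
    "https://hub.example.edu/hub/oauth_callback"
)
```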

The final scenario is using Windows Integrated Authentication for single sign-on to VMs or container-based machines.

One of the key benefits of utilising Azure is that we support all operating systems and container management services, allowing you to build and run data science applications on Windows and Linux. We also provide a dedicated Data Science Virtual Machine, fully configured with all the appropriate tools and services for your experiments.

Utilizing services such as Azure AD Connect and Azure ExpressRoute allows you to create a hybrid data center, connecting services and data hosted in the Azure cloud directly to the data and services on premises.

The final option, shown on the right-hand side, is Microsoft Azure Notebooks. This is a free service that provides Jupyter notebooks, along with supporting packages for R, Python and F#, as a service. This means you can just log in using your Microsoft Account and get going, since no installation or setup is necessary. Typical usage includes schools and instruction, giving webinars, learning languages, sharing ideas, etc. The service is provided by the Python team at Microsoft, which is part of the Data Group.

If you're interested in understanding the various architecture scenarios in more depth, see the blog post linked above.

[Image: how students and academics use notebooks, data and services]

Students and academics generally utilise notebooks and the associated data and services in the following ways.

Typically, students and academics undertaking data science courses access on-premises data, services and devices. They usually utilise BYOD devices or lab-based managed equipment.

Cloud access offers the Microsoft Data Science VM, containers and services such as HDInsight, CNTK, TensorFlow, Spark and Machine Learning, along with the thousands of community-based projects and solutions in use.

Source control is essential for collaboration and sharing of output, so solutions such as VSTS and GitHub become essential in the distribution of, and collaboration on, notebooks. Services such as https://mybinder.org can then be used to launch and share notebooks hosted in GitHub repos.
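
To make the Binder step concrete, here is a small sketch that builds a mybinder.org launch link for a notebook stored in a GitHub repo. The function name and the example owner, repo and path are all made up for illustration; the URL layout follows the pattern that mybinder.org generates for GitHub repositories.

```python
from urllib.parse import quote


def binder_url(owner, repo, ref="HEAD", notebook_path=None):
    """Build a mybinder.org launch URL for a GitHub repository.

    owner, repo and ref identify the GitHub repository and the branch,
    tag or commit to launch. notebook_path optionally opens one notebook.
    """
    # Percent-encode each part so slashes in branch names or paths
    # do not change the structure of the URL.
    parts = [quote(part, safe="") for part in (owner, repo, ref)]
    url = "https://mybinder.org/v2/gh/" + "/".join(parts)
    if notebook_path:
        url += "?filepath=" + quote(notebook_path, safe="")
    return url


# Hypothetical repository, used only to show the call.
print(binder_url("example-org", "example-notebooks", "main", "lessons/intro.ipynb"))
```

A link built this way can be dropped into a course page or README so students can open the notebook without installing anything locally.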

Microsoft provides a number of FREE cloud services which are suitable for data science and Jupyter users, including Microsoft Cognitive Services https://aka.ms/cognitive, Azure Notebooks https://Notebooks.azure.com and Azure Machine Learning Studio https://studio.azureml.net/. Accessing data on and off premises is also key to your experiments and assignments. Microsoft provides Azure data stores and various mechanisms for sharing your models and experiments with the world, such as the Cortana Intelligence Gallery https://gallery.cortanaintelligence.com/.

Other Resources

Data Science VM - https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-provision-vm

Data Science Docker Images - https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/02/13/deep-learning-using-cntk-caffe-keras-theanotorch-tensorflow-on-docker-with-microsoft-azure-batch-shipyard/

TensorFlow on Azure - https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/03/27/azure-gpu-tensorflow-step-by-step-setup/