Cloud computing guide for researchers – Getting started with AI and Machine Learning
We are on the verge of the biggest revolution in computing since Alan Turing formalised the concepts of computation. Today software is hand-made, relying on highly-skilled craftspeople to intricately tell computers how to do useful work. A Cambrian explosion in computing is imminent, as artificial intelligence and machine learning are enabling computers to learn from data. But what is the reality beyond the hype?
There is a huge range of techniques, tools, and technologies available for you to use now, without needing a specialist PhD in AI or machine learning. AI offers great opportunities to accelerate and improve research projects across domains such as chemistry, engineering, environmental and earth sciences, genomics, humanities, physics, and social sciences. The Microsoft AI platform is designed to be accessible to researchers in every field - https://microsoft.com/ai. Here we show you how to get started with artificial intelligence (AI) and machine learning (ML), but this is only the tip of the iceberg of what you can achieve.
AI and ML made easy
Microsoft Azure provides powerful support for AI and machine learning for experts and beginners alike, be it TensorFlow or CNTK, Apache Spark or Databricks, scaling out on the latest GPUs. Because the Microsoft AI platform is so extensive, it can be daunting to know where to start. So here are some suggestions to get you going and build your confidence to explore the full power of the Microsoft AI platform.
- We love Jupyter notebooks, so try out Jupyter Notebooks-as-a-Service on Azure. The notebooks are free, executable, and shareable over the web. You can organize your notebooks and datasets in one centralized location, and libraries are saved automatically and can be viewed from any device, anywhere. Notebooks are a great way to do research reproducibly. You can start from scratch or upload your existing notebooks to Azure at https://notebooks.azure.com. For example, here is a notebook that Matthew Johnson at Microsoft Research uses to teach deep learning at the University of Cambridge - https://aka.ms/jupyterdeeplearning
- Azure offers a huge range of virtual machines (VMs) for you to use in the cloud. There are a number of pre-built images (VMs with software pre-installed), and one of our favourites is the Data Science Virtual Machine, available for Linux and Windows. You can also create a basic VM (e.g. Ubuntu Linux) and install whatever software you like by following these instructions. We suggest starting with a small VM (A1) while you get to grips with the basics. Once you are familiar with the environment, you can easily resize to a bigger VM (many cores and large memory) with a restart. There is even a Deep Learning Virtual Machine with TensorFlow, Caffe2, CNTK, and Chainer pre-installed, along with Anaconda Python, Jupyter notebooks, and more tools, that you can run on the most powerful GPUs.
- Explore Azure Machine Learning, a complete end-to-end, easy-to-use, web-based system for experimenting with your own machine learning algorithms. It makes it easy to test and deploy machine learning models, including with your own Python and R code using standard libraries such as scikit-learn. There are many walkthroughs and tutorials to get you started. See https://studio.azureml.net/
- Image classification is a key computer vision technique with many uses, and there are some excellent tools to help you build a full pipeline to create a dedicated classifier for your project. It is not easy, though, and requires a high degree of expertise. The Custom Vision Service makes this much more straightforward, using a state-of-the-art deep neural network behind the scenes to train a model on your own input image data. You can build a remarkably accurate classifier in just a few minutes by following this short tutorial - /en-us/azure/cognitive-services/custom-vision-service/getting-started-build-a-classifier. (One useful dataset to play with is the Oxford-IIIT Pet Dataset - https://www.robots.ox.ac.uk/~vgg/data/pets/)
- When you do need to build your own machine learning pipeline and train across many machines, it can take a bit of work to create the infrastructure, even in the cloud. Azure Batch AI provides a streamlined way of spinning up a cluster of GPU machines to train your model with whatever AI framework you prefer. It makes it easy to: provision clusters of GPUs or CPUs on demand; install software in a container or with a script; scale automatically or manually to manage costs; use low-priority virtual machines for learning and experimentation; and mount shared storage volumes for training and output data.
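The notebooks suggestion above describes notebooks as a way to do research reproducibly. One habit that helps is a first cell that fixes the random seed and records the environment the results came from. The sketch below is plain Python under our own assumptions: the seed value and the fields recorded are illustrative choices, not an Azure Notebooks feature.

```python
import json
import platform
import random
import sys
from datetime import datetime, timezone

# Illustrative seed value: any fixed integer works, what matters is recording it.
SEED = 42


def set_seed(seed: int) -> None:
    """Seed Python's built-in RNG so reruns of the notebook draw the same numbers.

    NumPy and deep learning frameworks keep their own generators,
    so seed those as well if you use them.
    """
    random.seed(seed)


def environment_record(seed: int) -> dict:
    """Collect details a reader needs to rerun this notebook."""
    return {
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


set_seed(SEED)
first_draws = [random.random() for _ in range(3)]

set_seed(SEED)
second_draws = [random.random() for _ in range(3)]

# Same seed, same draws: shuffles and train/test splits in later cells repeat too.
assert first_draws == second_draws
print(json.dumps(environment_record(SEED), indent=2))
```

Printing the record into the notebook output means it travels with the notebook when you share it.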
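The Custom Vision tutorial has you upload images and tag them by class. As a minimal sketch (plain Python, not the Custom Vision SDK), here is one way to derive breed labels from the Oxford-IIIT Pet Dataset's file names so the images can be grouped by tag before uploading. It assumes the dataset's `<breed>_<index>.jpg` naming convention, in which cat breeds start with an upper-case letter and dog breeds with a lower-case one; the sample file names are illustrative.

```python
from collections import defaultdict
from pathlib import Path


def label_from_filename(filename: str) -> tuple[str, str]:
    """Return (breed, species) for an Oxford-IIIT Pet image file name.

    Assumes names like "Abyssinian_12.jpg" or "american_bulldog_3.jpg":
    the breed is everything before the last underscore, and cat breeds
    are capitalised while dog breeds are not.
    """
    stem = Path(filename).stem            # e.g. "american_bulldog_3"
    breed = stem.rsplit("_", 1)[0]        # drop the trailing image index
    species = "cat" if breed[0].isupper() else "dog"
    return breed, species


def group_by_breed(filenames: list[str]) -> dict[str, list[str]]:
    """Group image file names by breed, e.g. to upload one tag at a time."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        breed, _ = label_from_filename(name)
        groups[breed].append(name)
    return dict(groups)


sample = ["Abyssinian_1.jpg", "Abyssinian_2.jpg", "american_bulldog_10.jpg", "Bengal_7.jpg"]
print(group_by_breed(sample))
```

The same function could drive a coarser two-tag (cat vs dog) classifier by grouping on `species` instead of `breed`.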
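Low-priority virtual machines, mentioned in the Batch AI suggestion above, are cheaper because Azure can pre-empt (reclaim) them, so a training job running on them should save progress regularly and resume where it left off. Below is a minimal sketch of that checkpoint-and-resume pattern in plain Python; it is not Batch AI-specific. The checkpoint path, the JSON format, and the toy one-parameter "model" are all assumptions for illustration. A real job would save framework checkpoints (model weights and optimiser state) to the mounted shared storage.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical location: on a cluster this would be a path on the mounted shared storage.
CHECKPOINT = Path(tempfile.gettempdir()) / "toy_training_checkpoint.json"


def save_checkpoint(state: dict, path: Path) -> None:
    """Write to a temporary file, then rename, so readers never see a half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)  # atomic on POSIX when both paths are on the same file system


def load_checkpoint(path: Path) -> dict:
    """Resume from the last checkpoint, or start fresh if there is none."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "weight": 0.0}


def train(total_steps: int, path: Path, checkpoint_every: int = 10,
          stop_at: Optional[int] = None) -> dict:
    """Toy training loop: fit w in y = w * x to the point (1, 3) by gradient descent.

    `stop_at` simulates a pre-emption: the loop returns early, and any progress
    made since the last checkpoint is lost.
    """
    state = load_checkpoint(path)
    x, y, lr = 1.0, 3.0, 0.1
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] == stop_at:
            return state
        grad = 2 * (state["weight"] * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
        state["weight"] -= lr * grad
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state


CHECKPOINT.unlink(missing_ok=True)      # start from scratch for this demo
train(100, CHECKPOINT, stop_at=35)      # "pre-empted" after step 35; checkpoint holds step 30
final = train(100, CHECKPOINT)          # a replacement node resumes from step 30
print(final["step"], final["weight"])
```

The checkpoint interval is a trade-off: saving more often loses less work on pre-emption but spends more time writing to shared storage.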
Machine Learning Services
When you are spending hours working on your AI and ML research, you want a smooth workflow and tools that make it easy. The Machine Learning Workbench, a downloadable desktop application and command-line interface for Windows and macOS, does just that. Its built-in data preparation learns your data preparation steps as you perform them. Project management, run history, and notebook integration bolster your productivity. Take advantage of the best open source frameworks, including TensorFlow, Cognitive Toolkit, Spark ML, and scikit-learn. You can then use the Experimentation Service to easily scale up on virtual machines or scale out using Spark clusters. Proactively manage model performance, identify the best model, and promote it using data-driven insights. Collaborate and share solutions using popular Git repositories.
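The Workbench keeps run history for you. To make the idea concrete, here is a minimal plain-Python sketch of what "run history" and "identify the best model" amount to: each run's hyperparameters and metrics are appended to a log, and the best run is picked by a validation metric. The file name, the metric name, and the metric values are made-up assumptions for illustration; this is not the Workbench or Experimentation Service API.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical run-history file, one JSON object per line.
HISTORY = Path(tempfile.gettempdir()) / "run_history.jsonl"


def log_run(path: Path, run_id: str, params: dict, metrics: dict) -> None:
    """Append one run (its hyperparameters and resulting metrics) as a JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"run_id": run_id, "params": params, "metrics": metrics}) + "\n")


def load_runs(path: Path) -> list[dict]:
    """Read every logged run back in the order it was recorded."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(runs: list[dict], metric: str, higher_is_better: bool = True) -> dict:
    """Pick the run with the best value of `metric`, skipping runs that did not record it."""
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run recorded metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: r["metrics"][metric])


HISTORY.unlink(missing_ok=True)  # start a fresh history for this demo
# Made-up metric values, for illustration only.
log_run(HISTORY, "run-1", {"learning_rate": 0.1}, {"val_accuracy": 0.81})
log_run(HISTORY, "run-2", {"learning_rate": 0.01}, {"val_accuracy": 0.87})
log_run(HISTORY, "run-3", {"learning_rate": 0.001}, {"val_accuracy": 0.84})
print(best_run(load_runs(HISTORY), "val_accuracy")["run_id"])
```

Selecting on a validation metric rather than training loss is what keeps "best model" from simply meaning "most overfitted model".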
Running and developing models is streamlined, but versioning them, and tracking which ones are used in production versus experimentally, can be a real headache. Azure Machine Learning Model Management lets you: version models; track models in production; deploy models to production through the AzureML Compute Environment with Azure Container Service and Kubernetes; create Docker containers with the models and test them locally; and automate model retraining.
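To make the versioning vocabulary concrete, here is a toy, in-memory model registry in plain Python. It is a hypothetical sketch, not the Azure Machine Learning Model Management API: versions are numbered in registration order, each is fingerprinted with a SHA-256 hash of its serialized bytes, and at most one version per model is marked as production. The model name and byte strings are placeholders.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelVersion:
    name: str
    version: int
    sha256: str                   # fingerprint of the serialized model bytes
    stage: str = "experimental"   # "experimental", "production" or "archived"


@dataclass
class ModelRegistry:
    """Toy registry: versions each model and records which version serves production."""
    versions: dict[str, list[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, model_bytes: bytes) -> ModelVersion:
        history = self.versions.setdefault(name, [])
        digest = hashlib.sha256(model_bytes).hexdigest()
        # Re-registering identical bytes returns the existing version, not a duplicate.
        for mv in history:
            if mv.sha256 == digest:
                return mv
        mv = ModelVersion(name, len(history) + 1, digest)
        history.append(mv)
        return mv

    def promote(self, name: str, version: int) -> None:
        """Mark one version as production; any previous production version is archived."""
        for mv in self.versions[name]:
            if mv.stage == "production":
                mv.stage = "archived"
        self.versions[name][version - 1].stage = "production"

    def production(self, name: str) -> Optional[ModelVersion]:
        """Return the version currently in production, if any."""
        return next((mv for mv in self.versions[name] if mv.stage == "production"), None)


registry = ModelRegistry()
v1 = registry.register("pet-classifier", b"weights-v1")
v2 = registry.register("pet-classifier", b"weights-v2")
registry.promote("pet-classifier", v1.version)
registry.promote("pet-classifier", v2.version)  # v2 replaces v1 in production
print(registry.production("pet-classifier").version, v1.stage)  # prints: 2 archived
```

Hashing the serialized model means re-registering an unchanged file does not create a spurious new version. A real registry would also store the file itself, the data it was trained on, and the metrics that justified promotion.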
You can also use AI Tools for Visual Studio Code (on macOS, Linux, and Windows) to develop across desktop, cloud, and edge devices. This extension to VS Code lets you build, test, and deploy deep learning / AI solutions. It integrates seamlessly with Azure Machine Learning, notably through a run history view that details the performance of previous training runs and custom metrics. A samples explorer view lets you browse and bootstrap new projects with Microsoft Cognitive Toolkit (previously known as CNTK), Google TensorFlow, and other deep learning frameworks. Finally, an explorer for compute targets enables you to submit jobs to train models on remote environments such as Azure Virtual Machines or Linux servers with GPUs.
AI School and LearnAnalytics@Microsoft
This only scratches the surface of what is possible with the Microsoft AI platform. We've created AI School so that you can dig deeper. Have fun exploring the art of the possible, and see how easy it is, at https://aischool.microsoft.com/learning-paths
We also have specialist resources for data scientists at LearnAnalytics@MS, including webinars, on-demand videos, and a catalogue of classroom (in-person) training.
Need access to Microsoft Azure?
There are several ways you can get access to Microsoft Azure for your research. Your university may already make Azure available to you, so your first port of call is your research computing department. There are also other ways for you to start experimenting with the cloud: