The Joy (and Hard Work) of Machine Learning
This blog post is authored by Joseph Sirosh.
Few people appreciate the enormous potential of machine learning (ML) in enterprise applications. I was lucky enough to get a taste of its potential benefits just a few months into my first job. It was 1995 and credit card issuers were beginning to adopt neural network models to detect credit card fraud in real time. When a credit card is used, transaction data from the point-of-sale system is sent to the card issuer's credit authorization system, where a neural network scores the transaction for the probability of fraud. If the probability is high, the transaction is declined in real time. I was a scientist building such models and one of my first model deliveries was for a South American bank. When the model was deployed, the bank identified over a million dollars of previously undetected fraud on the very first day. This was a big eye-opener. In the years since, I have seen ML deliver huge value in diverse applications such as demand forecasting, failure and anomaly detection, ad targeting, online recommendations and virtual assistants like Cortana. By embedding ML into their enterprise systems, organizations can improve customer experience, reduce the risk of systemic failures, grow revenue and realize significant cost savings.
However, building ML systems is slow and error-prone. Even though we are able to analyze very large data sets these days and deploy at very high transaction rates, several bottlenecks remain:
- ML system development requires deep expertise. Even though the core principles of ML are now accessible to a wider audience, talented data scientists are as hard to hire today as they were two decades ago.
- Practitioners are forced to use a variety of tools to collect, clean, merge and analyze data. These tools have a steep learning curve and are not integrated. Commercial ML software is expensive to deploy and maintain.
- Building and verifying models requires considerable experimentation. Data scientists often find themselves limited by compute and storage because they need to run a large number of experiments that generate considerable new data.
- Software tools do not support scalable experimentation or methods for organizing experiment runs. The act of collaborating with a team on experiments, sharing derived variables, scripts, etc. is manual and ad-hoc, without tools support. Evaluating and debugging statistical models remains a challenge.
Data scientists work around these limitations by writing custom programs and by doing undifferentiated heavy lifting as they perform their ML experiments. But it gets harder in the deployment phase. Deploying ML models in a mission-critical business process such as real-time fraud prevention or ad targeting requires sophisticated engineering:
- Typically, ML models that have been developed offline now have to be re-implemented in a language such as C++, C# or Java.
- The transaction data pipelines have to be plumbed. Data transformations and variables used in the offline models have to be re-coded and compiled.
- These re-implementations inevitably introduce bugs, requiring verification that the models work as originally designed.
- A custom container for the model has to be built, with appropriate monitors, metrics and logging.
- Advanced deployments require A/B testing frameworks to evaluate alternative models side-by-side. One needs mechanisms to switch models in or out, preferably without recompiling and deploying the entire application.
- One has to validate that the candidate production model works as originally designed through statistical tests.
- The automated decisions made by the system and the business outcomes have to be logged for refining the ML models and for monitoring.
- The service has to be designed for high availability, disaster recovery and geo proximity to end points.
- When the service has to be scaled to meet higher transaction rates and/or low latency, more work is required to provision new hardware, deploy the service to new machines and scale out.
All of these steps are time-consuming and engineering-intensive, and they are expensive in terms of both infrastructure and manpower. The end-to-end engineering and maintenance of a production ML application requires a highly skilled team that few organizations can build and sustain.
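To make one of the items above concrete, here is a minimal sketch of the kind of plumbing a team would otherwise hand-build for side-by-side model evaluation: a router that sends a fixed share of traffic to a challenger model, lets either model be swapped at run time, and logs every decision. The `ModelRouter` class, its method names and the two toy models are hypothetical, invented purely for illustration; they are not part of any particular product.

```python
import hashlib
from typing import Callable, Dict

# A "model" here is any callable mapping a transaction dict to a fraud probability.
Model = Callable[[dict], float]


class ModelRouter:
    """Routes each transaction to a champion or challenger model (hypothetical design)."""

    def __init__(self, champion: Model, challenger: Model, challenger_share: float = 0.1):
        self.models: Dict[str, Model] = {"champion": champion, "challenger": challenger}
        self.challenger_share = challenger_share
        self.decision_log = []  # stand-in for a durable log store

    def swap(self, name: str, model: Model) -> None:
        """Replace a model at run time, without rebuilding the application."""
        self.models[name] = model

    def _arm(self, card_id: str) -> str:
        # Hash the card id so the same card always lands in the same arm.
        digest = hashlib.sha256(card_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        return "challenger" if bucket < self.challenger_share else "champion"

    def score(self, txn: dict) -> float:
        arm = self._arm(txn["card_id"])
        probability = self.models[arm](txn)
        # Log the decision so business outcomes can be joined back later.
        self.decision_log.append(
            {"card_id": txn["card_id"], "arm": arm, "probability": probability}
        )
        return probability


# Toy stand-in models that flag large amounts (illustrative only, not real fraud logic).
def champion_model(txn: dict) -> float:
    return 0.9 if txn["amount"] > 1000 else 0.1


def challenger_model(txn: dict) -> float:
    return 0.8 if txn["amount"] > 500 else 0.05


router = ModelRouter(champion_model, challenger_model, challenger_share=0.5)
p = router.score({"card_id": "card-123", "amount": 750.0})
```

Hashing the card identifier, rather than drawing a random number per call, keeps each card in the same arm across transactions, which makes the two arms easier to compare afterwards. Even this toy version leaves out monitoring, statistical validation and high availability, which is where most of the engineering effort goes.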
Microsoft Azure ML was designed to solve these problems:
- It’s a fully managed cloud service with no software to install, no hardware to manage, no OS versions or development environments to grapple with.
- Armed with nothing but a browser, data scientists can log on to Azure and start developing ML models from any location, from any device. They can host a practically unlimited number of files on Azure storage.
- ML Studio, an integrated development environment for ML, lets you set up experiments as simple data flow graphs, with an easy-to-use drag, drop and connect paradigm. Data scientists can avoid programming for a large number of common tasks, allowing them to focus on experiment design and iteration.
- Many sample experiments are provided to make it easy to get started.
- A collection of best-of-breed algorithms developed by Microsoft Research comes built in, as does support for custom R code – over 350 open source R packages can be used securely within Azure ML.
- Data flow graphs can have several parallel paths which automatically run in parallel, allowing scientists to execute complex experiments and make side-by-side comparisons without the usual computational constraints.
- Experiments are readily sharable, so others can pick up on your work and continue where you left off.
Azure ML also makes it simple to create production deployments at scale in the cloud. Pre-trained ML models can be incorporated into a scoring workflow and, with a few clicks, a new cloud-hosted REST API can be created. This REST API has been engineered to respond with low latency. No reimplementation or porting is required – a key benefit over traditional data analytics software. Data from anywhere on the internet – laptops, websites, mobile devices, wearables and connected machines – can be sent to the newly created API to get back predictions. For example, a data scientist can create a fraud detection API that takes transaction information as input and returns a low/medium/high risk indicator as output. Such an API would then be “live” on the cloud, ready to accept calls from any software that a developer chooses to call it from. The API backend scales elastically, so that when transaction rates spike, the Azure ML service can automatically handle the load. There are virtually no limits on the number of ML APIs that a data scientist can create and deploy – and all this without any dependency on engineering. For engineering and IT, it becomes simple to integrate a new ML model using those REST APIs, and testing multiple models side-by-side before deployment becomes easy, allowing dramatically better agility at low cost. Azure provides mechanisms to scale and manage APIs in production, including mechanisms to measure availability, latency, and performance. Building robust, highly available, reliable ML systems and managing the production deployment is therefore dramatically faster, cheaper and easier for the enterprise, with huge business benefits.
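The fraud detection example above can be sketched in a few lines to show the shape of such an API: transaction information goes in as JSON, and a low/medium/high risk indicator comes back. This is only an illustration of the request and response contract, not the Azure ML API itself. The field names, the thresholds, the `toy_fraud_model` stand-in and the `handle_request` function are all assumptions made for this sketch.

```python
import json

# Hypothetical cut-offs; real thresholds would be tuned against business costs.
MEDIUM_THRESHOLD = 0.3
HIGH_THRESHOLD = 0.7


def risk_band(probability: float) -> str:
    """Map a fraud probability to a low/medium/high indicator."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= HIGH_THRESHOLD:
        return "high"
    if probability >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def toy_fraud_model(txn: dict) -> float:
    """Stand-in for a trained model: larger, foreign transactions look riskier."""
    score = min(txn["amount"] / 5000.0, 1.0) * 0.6
    if txn.get("country") != txn.get("home_country"):
        score += 0.4
    return min(score, 1.0)


def handle_request(body: str, model=toy_fraud_model) -> str:
    """Take a JSON transaction and return a JSON risk indicator."""
    txn = json.loads(body)
    probability = model(txn)
    return json.dumps({"risk": risk_band(probability)})


# Example call with made-up transaction fields.
request_body = json.dumps({"amount": 4200.0, "country": "BR", "home_country": "US"})
response = handle_request(request_body)
```

Because the caller only sees JSON in and JSON out, the model behind `handle_request` can be retrained or replaced without any change to the calling application, which is the decoupling the paragraph above describes.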
We believe Azure ML is a game changer. It makes the incredible potential of ML accessible both to startups and large enterprises. Startups are now able to use the same capabilities that were previously available to only the most sophisticated businesses. Larger enterprises are able to unleash the latent value in their big data to generate significantly more revenue and efficiencies. Above all, the speed of iteration and experimentation that is now possible will allow for rapid innovation and pave the way for intelligence in cloud-connected devices all around us.
When I started my career in 1995, it took a large organization to build and deploy credit card fraud detection systems. With tools like Azure ML and the power of the cloud, a single talented data scientist can accomplish the same feat.