Machine Learning operations maturity model

Azure Machine Learning

The purpose of this maturity model is to help clarify the Machine Learning Operations (MLOps) principles and practices. The maturity model shows the continuous improvement in the creation and operation of a production-level machine learning application environment. You can use it as a metric for establishing the progressive requirements needed to measure the maturity of a machine learning production environment and its associated processes.

Maturity model

The MLOps maturity model helps clarify the Development Operations (DevOps) principles and practices necessary to run a successful MLOps environment. It's intended to identify gaps in an existing organization's attempt to implement such an environment. It's also a way to show you how to grow your MLOps capability in increments rather than overwhelm you with the requirements of a fully mature environment. Use it as a guide to:

  • Estimate the scope of the work for new engagements.

  • Establish realistic success criteria.

  • Identify deliverables you'll hand over at the conclusion of the engagement.

As with most maturity models, the MLOps maturity model qualitatively assesses people/culture, processes/structures, and objects/technology. As the maturity level increases, the probability increases that incidents or errors will lead to improvements in the quality of the development and production processes.

The MLOps maturity model encompasses five levels of technical capability:

Level 0: No MLOps

  Highlights:
    • Difficult to manage the full machine learning model lifecycle
    • Teams are disparate and releases are painful
    • Most systems exist as "black boxes," with little feedback during or after deployment
  Technology:
    • Manual builds and deployments
    • Manual testing of model and application
    • No centralized tracking of model performance
    • Training of model is manual

Level 1: DevOps but no MLOps

  Highlights:
    • Releases are less painful than No MLOps, but rely on the data team for every new model
    • Still limited feedback on how well a model performs in production
    • Difficult to trace/reproduce results
  Technology:
    • Automated builds
    • Automated tests for application code

Level 2: Automated Training

  Highlights:
    • Training environment is fully managed and traceable
    • Easy to reproduce model
    • Releases are manual, but low friction
  Technology:
    • Automated model training
    • Centralized tracking of model training performance
    • Model management

Level 3: Automated Model Deployment

  Highlights:
    • Releases are low friction and automatic
    • Full traceability from deployment back to original data
    • Entire environment managed: train > test > production
  Technology:
    • Integrated A/B testing of model performance for deployment
    • Automated tests for all code
    • Centralized tracking of model training performance

Level 4: Full MLOps Automated Operations

  Highlights:
    • Full system automated and easily monitored
    • Production systems provide information on how to improve and, in some cases, automatically improve with new models
    • Approaching a zero-downtime system
  Technology:
    • Automated model training and testing
    • Verbose, centralized metrics from deployed model

The tables that follow identify the detailed characteristics for each level of process maturity. The model will continue to evolve. This version was last updated in January 2020.

Level 0: No MLOps

People:
  • Data scientists: siloed, not in regular communication with the larger team
  • Data engineers (if they exist): siloed, not in regular communication with the larger team
  • Software engineers: siloed, receive the model remotely from the other team members

Model Creation:
  • Data gathered manually
  • Compute is likely not managed
  • Experiments aren't predictably tracked
  • End result might be a single model file manually handed off with inputs/outputs

Model Release:
  • Manual process
  • Scoring script might be manually created well after experiments, not version controlled (see the sketch after this table)
  • Release handled by data scientist or data engineer alone

Application Integration:
  • Heavily reliant on data scientist expertise to implement
  • Manual releases each time
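At this level, the scoring script is often a single hand-written file, created after experimentation and kept outside version control. A minimal sketch of such a script is below, following the init()/run() entry-point convention that the Azure Machine Learning inference server invokes; the file name, model format, and input schema are illustrative assumptions.

```python
# score.py - minimal sketch of a manually written scoring script. The
# init()/run() entry points follow the convention the Azure Machine Learning
# inference server calls; file name, model format, and input schema are
# illustrative assumptions.
import json
import os

import joblib

model = None


def init():
    # AZUREML_MODEL_DIR points at the deployed model's files at runtime.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")
    model = joblib.load(model_path)


def run(raw_data):
    # Expects a JSON body such as {"data": [[<feature values>]]}.
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return {"predictions": predictions.tolist()}
```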

Level 1: DevOps but no MLOps

People:
  • Data scientists: siloed, not in regular communication with the larger team
  • Data engineers (if they exist): siloed, not in regular communication with the larger team
  • Software engineers: siloed, receive the model remotely from the other team members

Model Creation:
  • Data pipeline gathers data automatically
  • Compute might or might not be managed
  • Experiments aren't predictably tracked
  • End result might be a single model file manually handed off with inputs/outputs

Model Release:
  • Manual process
  • Scoring script might be manually created well after experiments, likely version controlled
  • Model is handed off to software engineers

Application Integration:
  • Basic integration tests exist for the model
  • Heavily reliant on data scientist expertise to implement the model
  • Releases automated
  • Application code has unit tests (see the sketch after this table)
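The unit tests introduced at this level typically exercise the application code around the model rather than the model itself. A minimal sketch using pytest, assuming the hypothetical score.py from the previous level is now under version control:

```python
# test_score.py - sketch of the kind of unit test that appears at this level.
# It stubs the model so the test covers the request/response glue, not model
# quality. The score module and its layout are assumptions carried over from
# the earlier sketch.
import json

import numpy as np

import score  # the hypothetical scoring module sketched at level 0


def test_run_returns_predictions_for_valid_payload(monkeypatch):
    class StubModel:
        def predict(self, data):
            # Deterministic stand-in for a trained model.
            return np.zeros(len(data), dtype=int)

    # Replace the module-level model so no trained artifact is needed.
    monkeypatch.setattr(score, "model", StubModel())

    payload = json.dumps({"data": [[0.1, 0.2, 0.3]]})
    result = score.run(payload)

    assert result == {"predictions": [0]}
```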

Level 2: Automated Training

People:
  • Data scientists: working directly with data engineers to convert experimentation code into repeatable scripts/jobs
  • Data engineers: working with data scientists
  • Software engineers: siloed, receive the model remotely from the other team members

Model Creation:
  • Data pipeline gathers data automatically
  • Compute managed
  • Experiment results tracked (see the training sketch after this table)
  • Both training code and resulting models are version controlled

Model Release:
  • Manual release
  • Scoring script is version controlled with tests
  • Release managed by the software engineering team

Application Integration:
  • Basic integration tests exist for the model
  • Heavily reliant on data scientist expertise to implement the model
  • Application code has unit tests
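One way to realize automated, tracked training is a version-controlled training script that logs parameters, metrics, and a registered model version on every run. The sketch below uses MLflow, a tracking API that Azure Machine Learning workspaces integrate with; the dataset, label column, hyperparameters, and model name are assumptions.

```python
# train.py - sketch of a repeatable training job with centralized tracking.
# Data path, label column, hyperparameters, and the registered model name are
# illustrative assumptions.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def main(data_path: str = "training_data.csv") -> None:
    df = pd.read_csv(data_path)
    X_train, X_test, y_train, y_test = train_test_split(
        df.drop(columns=["label"]), df["label"], random_state=42
    )

    with mlflow.start_run():
        params = {"C": 1.0, "max_iter": 200}
        mlflow.log_params(params)

        model = LogisticRegression(**params).fit(X_train, y_train)
        mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

        # Registration gives the model a tracked, incrementing version;
        # assumes the tracking URI points at a registry-capable server,
        # such as an Azure Machine Learning workspace.
        mlflow.sklearn.log_model(
            model, "model", registered_model_name="churn-classifier"
        )


if __name__ == "__main__":
    main()
```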

Level 3: Automated Model Deployment

People:
  • Data scientists: working directly with data engineers to convert experimentation code into repeatable scripts/jobs
  • Data engineers: working with data scientists and software engineers to manage inputs/outputs
  • Software engineers: working with data engineers to automate model integration into application code

Model Creation:
  • Data pipeline gathers data automatically
  • Compute managed
  • Experiment results tracked
  • Both training code and resulting models are version controlled

Model Release:
  • Automatic release
  • Scoring script is version controlled with tests
  • Release managed by continuous delivery (CI/CD) pipeline (see the traffic-split sketch after this table)

Application Integration:
  • Unit and integration tests for each model release
  • Less reliant on data scientist expertise to implement the model
  • Application code has unit/integration tests
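A common way to realize integrated A/B testing during deployment is to split live traffic between the current model and a candidate behind one managed online endpoint. A sketch with the azure-ai-ml (v2) SDK follows; the endpoint name, deployment names, and workspace details are assumptions.

```python
# ab_rollout.py - sketch of shifting a slice of traffic to a candidate model
# behind an Azure Machine Learning managed online endpoint. Endpoint and
# deployment names and workspace details are assumptions.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Route 90% of requests to the current model ("blue") and 10% to the
# candidate ("green"), so the two can be compared on live traffic before
# a full cutover.
endpoint = ml_client.online_endpoints.get(name="churn-endpoint")
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```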

Level 4: Full MLOps Automated Retraining

People:
  • Data scientists: working directly with data engineers to convert experimentation code into repeatable scripts/jobs; working with software engineers to identify markers for data engineers
  • Data engineers: working with data scientists and software engineers to manage inputs/outputs
  • Software engineers: working with data engineers to automate model integration into application code; implementing post-deployment metrics gathering

Model Creation:
  • Data pipeline gathers data automatically
  • Retraining triggered automatically based on production metrics (see the sketch after this table)
  • Compute managed
  • Experiment results tracked
  • Both training code and resulting models are version controlled

Model Release:
  • Automatic release
  • Scoring script is version controlled with tests
  • Release managed by continuous integration and continuous delivery (CI/CD) pipeline

Application Integration:
  • Unit and integration tests for each model release
  • Less reliant on data scientist expertise to implement the model
  • Application code has unit/integration tests
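The defining capability at this level is retraining triggered by production metrics rather than by a person. A minimal, implementation-agnostic sketch of such a trigger follows; the metric source, threshold, and job-submission call are assumptions, and in practice this check typically runs on a schedule inside the CI/CD system.

```python
# retrain_trigger.py - hedged sketch of a metric-driven retraining trigger,
# one way to realize "retraining triggered automatically based on production
# metrics." Metric source, threshold, and job submission are assumptions.
ACCURACY_FLOOR = 0.85  # assumed service-level threshold


def latest_production_accuracy() -> float:
    # Stub: a real implementation reads the centralized metrics store fed by
    # the deployed model (for example, accuracy on recently labeled traffic).
    return 0.82


def submit_retraining_job() -> None:
    # Stub: a real implementation submits the version-controlled training
    # pipeline, whose output still flows through automated tests and release.
    print("retraining job submitted")


def check_and_retrain() -> None:
    # Degraded production metrics trigger retraining without a human in the
    # loop; a healthy metric leaves the deployed model in place.
    if latest_production_accuracy() < ACCURACY_FLOOR:
        submit_retraining_job()


if __name__ == "__main__":
    check_and_retrain()
```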

Next steps