ModelOps: The next-generation approach to managing your models


What is ModelOps?

More and more companies are using advanced machine learning models in their businesses, and as a result, they face many challenges, including how to manage these models throughout their lifecycles.

This is where ModelOps comes in. It is a practice that focuses on scaling and deploying machine learning models in production environments. It involves a combination of people, processes, and technology to ensure that the models remain accurate and reliable over time.

ModelOps enables organizations to manage their models throughout their lifecycle, from development and testing to deployment, monitoring, and maintenance.

This helps companies reduce the risk of model failure, optimize model performance, and improve the overall success of their machine learning projects. ModelOps is also used to identify opportunities to improve the value, quality, and efficiency of machine learning models at every stage of their lifecycle, and to prevent value decay.

In practice, ModelOps includes activities such as initial model deployment, ongoing automated monitoring, evaluation and performance check-ins, and continuous retraining and redeployment.

PwC uses simulation for their ModelOps

To implement ModelOps, PwC used AnyLogic to develop a simulation model that would track their model portfolio. This allowed them to:

  • Reflect the dynamics of a development team.
  • Visualize the model portfolio.
  • Show the value and cost of adding different ModelOps functionalities.
  • Integrate actual customer data to quantify outcomes.
  • Serve as a decision-making tool that helps clients experiment with and test various probabilities.

The overall solution

To manage their portfolio of models, PwC devised a four-part solution that included the model lifecycle, the models themselves, usage dynamics, and ModelOps capabilities. All of these are crucial, so let’s explore them in more detail now.

The model lifecycle

This is a process flow describing the typical lifecycle of an AI or ML model, starting with an idea backlog, followed by model building and deployment, production, and eventual retirement or obsolescence.

The focus is on activities during production, whether the model is online serving predictions or offline being retrained.

The development team, represented as resources within the simulation, works through these activities in order of priority. The top priority is fixing issues that have taken models offline, because the goal is to keep models online and operational as much of the time as possible. Regular retraining comes second, with each model following its own schedule. The third and fourth priorities are deploying new models and building entirely new ones, respectively.
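To make the priority ordering concrete, here is a minimal Python sketch of how a developer resource might pick its next task from a shared queue. It is not PwC's AnyLogic logic: the task names, the numeric priority levels, and the `TaskQueue` class are hypothetical, and ties within a priority level are assumed to be broken by arrival order.

```python
import heapq
from itertools import count

# Hypothetical priority levels (lower number = handled first), following the
# ordering described above: repairs, scheduled retraining, deployment, building.
PRIORITY = {"fix_issue": 1, "retrain": 2, "deploy": 3, "build": 4}


class TaskQueue:
    """Work queue shared by the development team (illustrative sketch)."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier tasks first within a level

    def add(self, kind, model_id):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._arrival), kind, model_id))

    def next_task(self):
        """Return the highest-priority (kind, model_id), or None if there is no work."""
        if not self._heap:
            return None
        _, _, kind, model_id = heapq.heappop(self._heap)
        return kind, model_id


queue = TaskQueue()
queue.add("build", "model-7")
queue.add("retrain", "model-2")
queue.add("fix_issue", "model-5")
queue.add("deploy", "model-3")

# Drain the queue in the order a free developer would pick up the work.
order = []
while (task := queue.next_task()) is not None:
    order.append(task)
print(order)
```

A heap keeps each pick at O(log n) even with a large backlog; in a discrete-event simulation the same queue would simply be consulted whenever a developer becomes free.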

The process flow of a typical lifecycle of an AI/ML model, from the initial generation of new models to their retirement

The models

In the simulation, each model is represented as an agent with a statechart that tracks its operational status. Once in production, the model spends most of its time online making predictions.

Retraining is usually scheduled periodically and does not necessarily take the model offline. However, there are two scenarios in which a model may go offline: an issue that requires a developer to repair it and bring it back online, or a drift event that causes performance decay.

The simulation also tracks each model's statistics, which is essential for seeing how long it has been online, how many predictions it has made, and the total value and costs it has generated.
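The statechart and statistics described above can be sketched in plain Python as a transition table plus a small agent class. This is not AnyLogic code: the state names, event names, and the `ModelAgent` class are hypothetical, and scheduled retraining is deliberately not modeled as a state because, as noted above, it does not necessarily take the model offline.

```python
from dataclasses import dataclass

# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("online", "issue"): "offline_issue",    # needs a developer to repair it
    ("online", "drift"): "offline_drift",    # drift event causing performance decay
    ("offline_issue", "repaired"): "online",
    ("offline_drift", "retrained"): "online",
}


@dataclass
class ModelAgent:
    """One ML model in the portfolio, with its state and running statistics."""
    name: str
    state: str = "online"
    hours_online: float = 0.0
    predictions: int = 0
    total_value: float = 0.0
    total_cost: float = 0.0

    def handle(self, event: str) -> None:
        # Events with no transition from the current state are ignored.
        self.state = TRANSITIONS.get((self.state, event), self.state)

    def advance(self, hours: float, rate: int, value: float, cost: float) -> None:
        """Accumulate statistics for `hours` of simulated time.

        `rate` is predictions per hour; `value` and `cost` are per prediction.
        Only an online model serves predictions.
        """
        if self.state != "online":
            return
        n = int(hours * rate)
        self.hours_online += hours
        self.predictions += n
        self.total_value += n * value
        self.total_cost += n * cost


agent = ModelAgent("churn-model")
agent.advance(hours=10, rate=100, value=0.5, cost=0.25)
agent.handle("issue")                                     # goes offline
agent.advance(hours=5, rate=100, value=0.5, cost=0.25)    # no predictions while offline
agent.handle("repaired")                                  # back online
```

Keeping transitions in a table rather than in `if` chains makes it easy to add states, such as retirement, without touching the statistics logic.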

An example of a model agent statechart, defining behavior through states and transitions

The simulation aims to demonstrate the potential value and impact of implementing ModelOps practices and to show the value and cost generated by the model portfolio.

Three components make up that total value and cost: user base, prediction volume, and value and cost per prediction. Multiplying these factors for each model and aggregating the results across all models yields the total net value of the model portfolio.

The simulation, therefore, helps businesses understand the value and impact of ModelOps practices and their associated costs.
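Reading the three components as factors multiplied per model, with net value per prediction taken as value minus cost, the aggregation can be sketched as follows. The `ModelEconomics` class, its field names, and the numbers used are hypothetical placeholders, not figures from PwC's model.

```python
from dataclasses import dataclass


@dataclass
class ModelEconomics:
    """Per-model inputs for one simulated period (illustrative values only)."""
    users: int                    # current user base
    predictions_per_user: float   # prediction volume per user
    value_per_prediction: float
    cost_per_prediction: float

    def net_value(self) -> float:
        # user base x prediction volume x (value - cost) per prediction
        volume = self.users * self.predictions_per_user
        return volume * (self.value_per_prediction - self.cost_per_prediction)


def portfolio_net_value(models):
    """Aggregate the net value of every model in the portfolio."""
    return sum(m.net_value() for m in models)


portfolio = [
    ModelEconomics(users=200, predictions_per_user=10, value_per_prediction=0.75, cost_per_prediction=0.25),
    ModelEconomics(users=100, predictions_per_user=4, value_per_prediction=1.5, cost_per_prediction=0.5),
]
total = portfolio_net_value(portfolio)
```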

The value and cost generated by each model depend on three factors: user base × prediction volume × value and cost per prediction, aggregated across all models to give the total net value of the model portfolio

In the simulation, these three components interact, and one important element linking them is the model's predictive power, which decays from day one. To capture this, a decay factor gradually degrades the performance of each model. Because this decay affects all three parts of the business calculation, the simulated value and cost behave more realistically over time.
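The article does not say what shape the decay takes. The sketch below assumes exponential decay with a hypothetical half-life, reset to zero by retraining, and applies the same factor to all three business components; both choices, and the function names, are illustrative assumptions rather than PwC's formulation.

```python
def decay_factor(days_since_training: float, half_life_days: float) -> float:
    """Fraction of the model's original predictive power that remains.

    Assumes exponential decay with a hypothetical half-life; retraining
    would reset `days_since_training` to zero.
    """
    return 0.5 ** (days_since_training / half_life_days)


def decayed_inputs(users, predictions_per_user, value_per_prediction, factor):
    """Apply the same decay factor to all three business components (assumption)."""
    return users * factor, predictions_per_user * factor, value_per_prediction * factor


factor = decay_factor(days_since_training=60, half_life_days=30)
users, volume, value = decayed_inputs(1000, 20, 0.5, factor)
```

Because the factor multiplies all three components, their product falls with the cube of the factor, which illustrates how quickly an unmaintained model can lose value under these assumptions.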

Usage dynamics: prediction volume, model performance, and user base

The usage dynamics part of the calculation uses a Bass diffusion model to represent the interplay of model performance, user adoption, and usage volume. Each model has its own Bass diffusion model for its user base, with the decay factor built in, so that decay affects the rate at which each model is adopted and used.
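In its standard form, the Bass model grows the number of adopters N out of a market of M potential users at rate dN/dt = (p + qN/M)(M - N), where p captures innovation (adoption independent of other users) and q captures imitation (word of mouth). The sketch below discretizes this into daily steps and, as a hypothetical way to include the decay factor, scales each day's adoption by a `performance` multiplier. The parameter values are illustrative, and the article does not describe PwC's exact formulation.

```python
def simulate_bass(market_size, p, q, days, performance=lambda day: 1.0):
    """Discrete-time Bass diffusion of one model's user base.

    market_size: potential users M
    p: coefficient of innovation
    q: coefficient of imitation
    performance: hypothetical hook returning the model's decay factor on a
                 given day (1.0 = no decay); it scales that day's adoption.
    Returns the cumulative number of adopters at the end of each day.
    """
    adopters = 0.0
    history = []
    for day in range(days):
        potential = market_size - adopters
        adoption_rate = (p + q * adopters / market_size) * potential
        adopters += performance(day) * adoption_rate  # one-day Euler step
        history.append(adopters)
    return history


full = simulate_bass(1000, p=0.03, q=0.38, days=100)
slowed = simulate_bass(1000, p=0.03, q=0.38, days=100, performance=lambda day: 0.5)
print(round(full[29]), round(slowed[29]))
```

With a one-day step, the user base stays below the market size as long as p + q is less than 1, which holds for the illustrative values used here.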

Each model's user base is directly affected by its predictive power, and the user bases of all models are aggregated into a total user base for the whole model portfolio.

Prediction volume is also affected by the decay factor, which is in turn influenced by how much of the time the model is online. If a model performs poorly and is rarely available, users may stop using it.

A Bass diffusion model illustrating user base and user adoption, with the interaction between potential users, current users, and other parameters

ModelOps capabilities

In this simulation, PwC demonstrated five capabilities of ModelOps as illustrated below.

The five ModelOps capabilities: automation, monitoring, model lineage, autoscaling, and model rollback

Each capability is modeled as a stock-and-flow, allowing users to simulate different maturity levels or levels of investment for it. This makes it possible to compare, for example, elementary automation with higher levels of investment.

PwC also uses a flow rate to reflect that building this functionality takes time. While a capability is being built, some resources are diverted to it, and its benefits are realized only after a delay, just as they would be in practice.
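A capability such as automation can be sketched as a single stock (its maturity level) filled by a constant build-rate inflow. The function names, the rule that a fixed number of developers is diverted while building, and the linear benefit in `retraining_hours` are all hypothetical assumptions chosen to illustrate the delay and resource trade-off described above, not PwC's model.

```python
def simulate_capability(target_level, build_rate, days, team_size, builders=1):
    """Stock-and-flow sketch of building one ModelOps capability.

    Stock: maturity level of the capability, from 0.0 up to target_level (<= 1.0).
    Inflow: build_rate maturity units per day until the target is reached.
    While building, `builders` developers are diverted from model work.
    Returns one (level, developers_available_for_models) pair per day.
    """
    level = 0.0
    trajectory = []
    for _ in range(days):
        building = level < target_level
        inflow = min(build_rate, target_level - level) if building else 0.0
        level += inflow  # the stock integrates its inflow
        available = team_size - (builders if building else 0)
        trajectory.append((level, available))
    return trajectory


def retraining_hours(base_hours, level, max_reduction=0.5):
    """Hypothetical benefit: higher automation maturity cuts retraining effort linearly."""
    return base_hours * (1 - max_reduction * level)


traj = simulate_capability(target_level=1.0, build_rate=0.25, days=5, team_size=5)
```

A lower `build_rate` stretches out the period in which developers are diverted and benefits are only partly realized, which is the trade-off the flow rate is meant to expose.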

A ModelOps capability (automation) modeled as a stock-and-flow

Further development of ModelOps

ModelOps is a crucial approach for managing machine learning and AI models at scale, integrating data science with DevOps for smooth and reliable management throughout their lifecycle. It enables more effective and sustainable models for real-world problems, streamlines the complex process of model development, and reduces costs through automation. By implementing ModelOps strategies, companies can stay competitive, deliver value to customers, and address real-world problems efficiently and sustainably.

Watch the video below from PwC’s presentation at the AnyLogic Conference 2022 to explore this topic in more detail and view the simulation model demo.

Don’t forget to subscribe to our monthly newsletter to stay up to date with AnyLogic blog posts and our Model of the Month.
