Model drift and performance degradation
Model drift is generally about the degradation of ML model performance because of changes in data and relationships between input and output (I/O) variables, known as model decay, as well. Model drift can be addressed by continuous learning to adapt to the latest changes in datasets or environments in near real time. One of the important aspects of FL is realizing a continuous learning framework by updating an ML model instantly whenever the learning happens in the local distributed environment anytime, in a consistent manner. That way, FL could resolve the situation often seen in enterprise AI applications where the intelligence is useless by the time it is delivered for production.
We will now touch on how models could get degraded or stop working, and then some of the current efforts of model operations (ModelOps) to continuously improve the performance of models and achieve sustainable AI operations.
How models can stop working
Any AI and ML model with fixed parameters, or weights, generated from the training data and adjusted to the test data can perform fairly well when deployed in an environment where the model receives data similar to the training and test data. If an autonomous driving model is well trained with data recorded during sunny daytime, the model can drive vehicles safely on sunny days because it is doing what it has been trained to do. On a rainy night, however, nobody should be in or near the vehicle if it is autonomously driven: the model is fed with totally unfamiliar, dark, and blurry images; its decisions will not be reliable at all. In such a situation, the model’s decision will be far off the track, hence the name model drift. Again, model drift is not likely to happen if the model is deployed in an environment similar to the training and testing environment and if the environment does not change significantly over time. But in many business situations, that assumption does not always hold, and model drift becomes a serious issue.
There are two types of model drifts: data drift and concept drift. Data drift happens when input data to a deployed model is significantly different from the data the model has been trained with. In other words, changes in data distribution are the cause of data drift. The aforementioned diurnal autonomous vehicle model not performing well in the nighttime is an example of data drift. Another example would be an ice-cream sale prediction model trained in California being deployed in New Zealand; seasonality in the southern hemisphere is opposite to that in the northern hemisphere, and the estimated sales of ice cream will be low for summer and high for winter, on the contrary to the actual sales volume.
Concept drift, on the other hand, is a result of changes in how variables correlate with each other. In the terminology of statistics, this implies that the data-generating process has been altered. And this is what Google Flu Trends (GFT) suffered from, as the author of The Undercover Economist put it in the following Financial Times article: https://www.ft.com/content/21a6e7d8-b479-11e3-a09a-00144feabdc0#axzz30qfdzLCB.
Prior to the period, search queries were meaningfully correlated with the spread of flu as mainly people who suspected that they were infected typed those words in the browser, and therefore the model worked successfully. This may no longer have been the case in 2013 since people in other categories, such as those who were precautious about a potential pandemic or those who were just curious, were searching for those words, and they may have been led to do so by Google’s recommendations. This concept drift likely made GFT overestimate the spread vis-à -vis medical reports provided by the Centers for Disease Control and Prevention (CDC).
Either by data or by concept, model drift causes model performance degradation, and it occurs because of our focus on correlation. The ground truth in data science parlance does not mean something like the universal truth in hard science such as physics and chemistry—that is, causation. It is merely a true statement about how variables in given data correlate with each other in a particular environment, and it provides no guarantee that the correlation holds when the environment changes or differs. This is to say that what we estimate as the ground truth can vary over time and locations, just like the ground has been reshaped by seismic events throughout history and geography.
Continuous monitoring – the price of letting causation go
In a survey commissioned by Redis Labs (https://venturebeat.com/business/redis-survey-finds-ai-is-stressing-it-infrastructure-to-breaking-point/), about half of the respondents cited model reliability (48%), model performance (44%), accuracy over time (57%), and latency of running the model (51%) as the top challenges for getting models deployed. Given the risk and concern of model drift, AI and ML model stakeholders need to work on two additional tasks after deployment. First, model performance must be continuously monitored to detect model drift. Both data drift and concept drift can take place gradually or suddenly. Once model drift is detected, the model needs to be retrained with new training data, and when concept drift occurs, even the use of a new model architecture may be necessary to upgrade the model.
In order to address these requirements, a new ML principle called Continuous Delivery for Machine Learning (CD4ML) has been proposed. In the framework of CD4ML, a model is coded and trained with training data in the first step. The model is then tested with a separate dataset and evaluated based on some metrics, and more often than not, the best model is selected from multiple candidates. Next, the selected model is productionized with a further test to make sure that the model performs well after the deployment, and once it passes the test, it is deployed. Here, the monitoring process starts. When model drift is observed, the model will be retrained with new data or given a new architecture, depending on the severity of the drift. If you are familiar with software engineering, you might have noticed that CD4ML is the adoption of continuous integration/continuous delivery (CI/CD) in the field of ML. In a similar vein, ModelOps, an AI and ML operational framework stemming from the development-operations (DevOps) software engineering framework is gaining popularity. ModelOps bridges ML operations (MLOps: the integration of data engineering and data science) and application engineering; it can be seen as the enabler of CD4ML.
The third factor of the Triple-A mindset for big data lets us focus on correlation and has helped in building AI and ML models rapidly over the last decade. Finding correlation is much easier than discovering causation. For many AI and ML models that have been telling us what we need to know from people’s Google search patterns over years, we have to check if it still works today. And so do we tomorrow.
That is why FL is one of the important approaches for continuous learning. When creating and operating an FL system, it is also important to develop the system with ModelOps functionalities, as the critical role of FL is to keep improving models constantly from various learning environments in a collaborative manner. It is even possible to realize a crowdsourced learning framework with FL so that people in the platform can take the desired ML model to adapt and train it locally and return an updated model to the FL server with an aggregator. With an advanced model aggregation framework to filter out poisonous ML models that could potentially degrade the current models, FL can consistently integrate other learnings, and thus realize a sustainable continuous learning operation that is key for the platform with ModelOps functionalities.