Data registration and versioning
It is vital to register and version the data in the workspace before starting ML training, because this lets us trace any experiment or ML model back to the data used to train it. With versioned data, we can backtrack at any point to replicate a model's training, or to explain a model's behavior on inference or test data. For these reasons, we will register the processed data and version it for use in our ML pipeline. We will do this in the Azure Machine Learning workspace using the Azure Machine Learning SDK, starting with a connection to the workspace as follows:
from azureml.core import Workspace

# Identify the Azure Machine Learning workspace to connect to
subscription_id = '---insert your subscription ID here----'
resource_group = 'Learn_MLOps'
workspace_name = 'MLOps_WS'
workspace = Workspace(subscription_id, resource_group, workspace_name)
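If you would rather not hard-code these three values in a notebook or script, one option is to read them from environment variables. The following is only a minimal sketch using the Python standard library. The variable names (AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZURE_WORKSPACE_NAME) and the helper load_workspace_settings are assumptions made for illustration and are not defined by Azure or the Azure Machine Learning SDK.

```python
import os

# Hypothetical environment variable names chosen for this sketch;
# they are not defined by Azure or by the Azure Machine Learning SDK.
REQUIRED_VARS = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_WORKSPACE_NAME",
)


def load_workspace_settings(env=None):
    """Return (subscription_id, resource_group, workspace_name) from env.

    `env` is any mapping of names to values; it defaults to os.environ.
    Raises KeyError listing every setting that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)
```

The returned tuple can be unpacked into subscription_id, resource_group, and workspace_name before they are passed to Workspace, which keeps the subscription ID out of source control.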
Fetch your subscription ID, resource_group, and workspace_name from the Azure...