Machine Learning Engineering with MLflow: Manage the end-to-end machine learning life cycle with MLflow

Chapter 1: Introducing MLflow

MLflow is an open source platform for the machine learning (ML) life cycle, with a focus on reproducibility, training, and deployment. It is based on an open interface design and can work with any language or platform, with clients in Python, Java, and R, and it is accessible through a REST API. Scalability is another important benefit that ML developers can leverage with MLflow.

In this chapter, we will look at how MLflow works with the help of examples and sample code. This will build the foundation needed to use MLflow to engineer an end-to-end ML project in the rest of the book.

Specifically, we will look at the following sections in this chapter:

  • What is MLflow?
  • Getting started with MLflow
  • Exploring MLflow modules

Technical requirements

For this chapter, you will need the following prerequisites:

  • The latest version of Docker installed on your machine. If you don't have the latest version, follow the instructions at https://docs.docker.com/get-docker/.
  • Access to a bash terminal (Linux or Windows).
  • Access to a browser.
  • Python 3.5+ installed.
  • pip installed.

What is MLflow?

Implementing a product based on ML can be a laborious task. There is a general need to reduce the friction between different steps of the ML development life cycle, and between teams of data scientists and engineers that are involved in the process.

ML practitioners, such as data scientists and ML engineers, operate with different systems, standards, and tools. Data scientists spend most of their time developing models in tools such as Jupyter notebooks, whereas in production the model is deployed as part of a software application, in an environment that is more demanding in terms of scale and reliability.

A common occurrence in ML projects is to have the models reimplemented by an engineering team, creating a custom-made system to serve the specific model. A set of challenges is common among teams that follow bespoke approaches to model development:

  • ML projects that run over budget due to the need to create bespoke software infrastructure to develop and serve models
  • Translation errors when reimplementing the models produced by data scientists
  • Scalability issues when serving predictions
  • Friction in terms of reproducing training processes between data scientists due to a lack of standard environments

Companies leveraging ML tend to create their own (often extremely laborious) internal systems in order to ensure a smooth and structured process of ML development. Widely documented ML platforms include systems such as Michelangelo and FBLearner, from Uber and Facebook, respectively.

It is in the context of the increasing adoption of ML that MLflow was initially created at Databricks and open sourced as a platform, to aid in the implementation of ML systems.

MLflow enables everyday practitioners to manage the ML life cycle in a single platform, from iterating on model development up to deploying in a reliable and scalable environment that is compatible with modern software system requirements.

Getting started with MLflow

Next, we will install MLflow on your machine and prepare it for use in this chapter. You will have two options when it comes to installing MLflow. The first option is through a Docker container-based recipe provided in the repository of the book: https://github.com/PacktPublishing/Machine-Learning-Engineering-with-Mlflow.git.

To install it, follow these instructions:

  1. Clone the book's repository and change into the Chapter01 folder:
    $ git clone https://github.com/PacktPublishing/Machine-Learning-Engineering-with-Mlflow.git
    $ cd Machine-Learning-Engineering-with-Mlflow
    $ cd Chapter01
  2. The Docker image is very simple at this stage: it contains only MLflow and scikit-learn, the main tools used in this chapter. For illustrative purposes, you can look at the content of the Dockerfile:
    FROM jupyter/scipy-notebook
    RUN pip install mlflow
    RUN pip install scikit-learn
  3. To build the image, run the following command from the Chapter01 folder:
    docker build -t chapter_1_homlflow .
  4. Right after building the image, you can run the ./run.sh command:
    ./run.sh

    Important note

    It is important to ensure that you have the latest version of Docker installed on your machine.

  5. Open your browser at http://localhost:8888, and you should be able to navigate to the Chapter01 folder.

In the following section, we will be developing our first model with MLflow in the Jupyter environment created in the previous set of steps.

Developing your first model with MLflow

For simplicity, in this section we will use the built-in sample datasets in scikit-learn, the ML library that we will initially use to explore MLflow features. We will use the famous Iris dataset to train a multi-class classifier with MLflow.

The Iris dataset (one of sklearn's built-in datasets, available from https://scikit-learn.org/stable/datasets/toy_dataset.html) has the following features: sepal length, sepal width, petal length, and petal width. The target variable is the class of the iris: Iris Setosa, Iris Versicolor, or Iris Virginica:

  1. Load the sample dataset:
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    dataset = datasets.load_iris()
    X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.4)
  2. Next, let's train your model.

    Training a simple ML model with a framework such as scikit-learn involves instantiating an estimator such as LogisticRegression and calling its fit method to train it on the Iris dataset built into scikit-learn:

    from sklearn.linear_model import LogisticRegression
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    The preceding lines of code are just a small portion of the ML Engineering process. As will be demonstrated, a non-trivial amount of code needs to be created in order to productionize and make sure that the preceding training code is usable and reliable. One of the main objectives of MLflow is to aid in the process of setting up ML systems and projects. In the following sections, we will demonstrate how MLflow can be used to make your solutions robust and reliable.

  3. Then, we will add MLflow.

    With a few more lines of code, you should be able to start your first MLflow interaction. In the following code listing, we start by importing the mlflow module, followed by the LogisticRegression class in scikit-learn. You can use the accompanying Jupyter notebook to run the next section:

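    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LogisticRegression
    # Automatically log parameters, metrics, and the fitted model of sklearn estimators
    mlflow.sklearn.autolog()
    with mlflow.start_run():
        clf = LogisticRegression()
        clf.fit(X_train, y_train)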

    The mlflow.sklearn.autolog() instruction enables you to automatically log the experiment in the local directory. It captures the metrics produced by the underlying ML library in use. MLflow Tracking is the module responsible for handling metrics and logs. By default, the metadata of an MLflow run is stored in the local filesystem.

  4. If you run the preceding excerpt in the accompanying notebook, you should now see the following files in the notebook's root directory when you run the following command:
    $ ls -l 
    total 24
    -rw-r--r-- 1 jovyan users 12970 Oct 14 16:30 chapther_01_introducing_ml_flow.ipynb
    -rw-r--r-- 1 jovyan users    53 Sep 30 20:41 Dockerfile
    drwxr-xr-x 4 jovyan users   128 Oct 14 16:32 mlruns
    -rwxr-xr-x 1 jovyan users    97 Oct 14 13:20 run.sh

    The mlruns folder is generated alongside your notebook folder and contains all the experiments executed by your code in the current context.

    The mlruns folder contains a folder named with a sequential number that identifies your experiment (0 for the default experiment), and inside it, one folder per run. The outline of the folder will appear as follows:

    ├── 46dc6db17fb5471a9a23d45407da680f
    │   ├── artifacts
    │   │   └── model
    │   │       ├── MLmodel
    │   │       ├── conda.yaml
    │   │       ├── input_example.json
    │   │       └── model.pkl
    │   ├── meta.yaml
    │   ├── metrics
    │   │   └── training_score
    │   ├── params
    │   │   ├── C
    │   │   …..
    │   └── tags
    │       ├── mlflow.source.type
    │       └── mlflow.user
    └── meta.yaml

    So, with very little effort, we have a lot of traceability available to us, and a good foundation to improve upon.
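Because the local store is just a directory tree, you can also inspect it with ordinary file tools. The following is a minimal sketch, assuming the file-store layout shown above (mlruns/<experiment_id>/<run_id>/artifacts/...); list_runs is a hypothetical helper, not part of MLflow's API.

```python
from pathlib import Path


def list_runs(experiment_dir):
    """Map each run ID in a local experiment folder to its artifact files.

    Hypothetical helper: assumes the layout mlruns/<experiment_id>/<run_id>/artifacts/...
    """
    runs = {}
    for run_dir in sorted(Path(experiment_dir).iterdir()):
        if not run_dir.is_dir():
            continue  # skip files such as the experiment-level meta.yaml
        artifacts_dir = run_dir / "artifacts"
        files = artifacts_dir.rglob("*") if artifacts_dir.is_dir() else []
        # Store artifact paths relative to the run's artifacts folder, e.g. "model/model.pkl"
        runs[run_dir.name] = sorted(
            path.relative_to(artifacts_dir).as_posix() for path in files if path.is_file()
        )
    return runs
```

For example, calling list_runs("mlruns/0") from the notebook's root directory would map the run folder shown above to its model artifacts (MLmodel, conda.yaml, input_example.json, and model.pkl).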

In the preceding sample, the run is identified by the UUID 46dc6db17fb5471a9a23d45407da680f. At the root of each run directory, there is a YAML file named meta.yaml. The following example, taken from a different run (518d3162be7347298abe4c88567ca3e7), shows its content:

artifact_uri: file:///home/jovyan/mlruns/0/518d3162be7347298abe4c88567ca3e7/artifacts
end_time: 1602693152677
entry_point_name: ''
experiment_id: '0'
lifecycle_stage: active
name: ''
run_id: 518d3162be7347298abe4c88567ca3e7
run_uuid: 518d3162be7347298abe4c88567ca3e7
source_name: ''
source_type: 4
source_version: ''
start_time: 1602693152313
status: 3
tags: []
user_id: jovyan

This is the basic metadata of your run: the start and end times, the identification of the run (run_id and run_uuid), the life cycle stage, and the user who executed the experiment. The values correspond to a default run, but they provide valuable and readable information about your experiment:

├── 46dc6db17fb5471a9a23d45407da680f
│   ├── artifacts
│   │   └── model
│   │       ├── MLmodel
│   │       ├── conda.yaml
│   │       ├── input_example.json
│   │       └── model.pkl

The model.pkl file contains a serialized version of the model; for a scikit-learn model, this is a pickled version of the trained Python model object. Upon autologging, the metrics are taken from the underlying ML library in use. The default packaging strategy is based on a conda.yaml file that lists the dependencies needed to load the serialized model.

The MLmodel file is the main definition of the model from the MLflow Models perspective, with information on how to run inference on the current model.

The metrics folder contains the training score of this particular run of the training process, which can be used to benchmark the model against further improvements down the line.

The params folder in the first listing contains the default parameters of the logistic regression model, with the different default values listed transparently and stored automatically.
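The params, metrics, and meta.yaml files can likewise be read directly. The following sketch assumes the plain-text formats of MLflow's local file store: each params/<name> file holds the raw value, and each metrics/<name> file holds one "timestamp value step" line per logged value. It also assumes the status codes of MLflow's RunStatus enum, in which 3 means FINISHED. The helper names are hypothetical; for real projects, MLflow's own client APIs are the safer way to read runs.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumed mapping of MLflow's RunStatus codes, as stored in meta.yaml's "status" field
RUN_STATUS = {1: "RUNNING", 2: "SCHEDULED", 3: "FINISHED", 4: "FAILED", 5: "KILLED"}


def read_params(run_dir):
    """Return {name: value}; each params/<name> file holds the raw value as text."""
    params_dir = Path(run_dir) / "params"
    return {p.name: p.read_text().strip() for p in sorted(params_dir.iterdir()) if p.is_file()}


def read_metrics(run_dir):
    """Return {name: [(timestamp_ms, value, step), ...]} from metrics/<name> files."""
    history = {}
    for metric_file in sorted((Path(run_dir) / "metrics").iterdir()):
        if not metric_file.is_file():
            continue
        points = []
        for line in metric_file.read_text().splitlines():
            parts = line.split()
            if parts:
                step = int(parts[2]) if len(parts) > 2 else 0  # tolerate lines without a step
                points.append((int(parts[0]), float(parts[1]), step))
        history[metric_file.name] = points
    return history


def summarize_meta(start_time, end_time, status):
    """Convert meta.yaml's epoch-millisecond timestamps and status code into readable values."""
    started = datetime.fromtimestamp(start_time / 1000, tz=timezone.utc)
    return started.isoformat(), end_time - start_time, RUN_STATUS.get(status, "UNKNOWN")
```

With the values from the meta.yaml above, summarize_meta(1602693152313, 1602693152677, 3) reports a start at 16:32:32 UTC on 2020-10-14, a duration of 364 milliseconds, and the FINISHED status.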

Exploring MLflow modules

MLflow modules are software components that deliver the core features supporting the different phases of the ML life cycle. Each module is an extensible component that organizes related features of the platform.

The following are the built-in modules in MLflow:

  • MLflow Tracking: Provides a mechanism and UI to handle the metrics and artifacts generated by ML executions (training and inference)
  • MLflow Projects: A package format to standardize ML projects
  • MLflow Models: A mechanism to deploy models to different types of environments, both on-premises and in the cloud
  • MLflow Model Registry: A module that handles the management of models in MLflow and their life cycle, including state

In order to explore the different modules, we will install MLflow in your local environment using the following command:

pip install mlflow

Important note

It is crucial that the technical requirements are correctly installed on your local machine to allow you to follow along. If necessary, run the pip command with the required permissions.

Exploring MLflow projects

An MLflow project represents the basic unit of organization of ML projects. There are three different environments supported by MLflow projects: the Conda environment, Docker, and the local system.

Important note

Details of the different parameters available in an MLproject file can be found in the official documentation at https://www.mlflow.org/docs/latest/projects.html#running-projects.

The following is an example of an MLproject file of a conda environment:

name: condapred
conda_env: conda.yaml
entry_points:
  main:
    command: "python mljob.py"

In the conda option, the assumption is that there is a conda.yaml file with the required dependencies. MLflow, when asked to run the project, will start the environment with the specified dependencies.

The system-based environment will look like the following; it's actually quite simple:

name: syspred
entry_points:
  main:
    command: "python mljob.py"

The preceding system variant relies on the local environment, assuming that the underlying operating system already contains all the dependencies. This approach is particularly prone to library conflicts with the underlying operating system; however, it can be valuable in contexts where an existing operating system environment already fits the project.

The following is a Docker environment-based MLproject file:

name: syspred
docker_env:
  image: stockpred-docker
entry_points:
  main:
    command: "python mljob.py"

Once you have chosen your environment, the main file that defines your project is the MLproject file. MLflow uses this file to understand how it should run your project.
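To make the three variants concrete, the following sketch classifies an MLproject specification, represented here as a Python dictionary that mirrors the YAML files above, by the environment it declares, and returns the command of its main entry point. It is a simplified, hypothetical helper that only considers the conda_env and docker_env keys discussed in this section and treats declaring both as an error; MLflow's own parser handles more options, and the environment actually used also depends on the options passed to mlflow run.

```python
def describe_project(spec):
    """Return (environment, main command) for an MLproject spec given as a dict."""
    declared = [key for key in ("conda_env", "docker_env") if key in spec]
    if len(declared) > 1:
        raise ValueError("declare either conda_env or docker_env, not both")
    # Following this chapter's terminology: no environment key means the system variant
    environment = declared[0] if declared else "system"
    command = spec["entry_points"]["main"]["command"]
    return environment, command


# Dictionaries mirroring the three MLproject files shown above
condapred = {"name": "condapred", "conda_env": "conda.yaml",
             "entry_points": {"main": {"command": "python mljob.py"}}}
syspred = {"name": "syspred", "entry_points": {"main": {"command": "python mljob.py"}}}
dockerpred = {"name": "syspred", "docker_env": {"image": "stockpred-docker"},
              "entry_points": {"main": {"command": "python mljob.py"}}}

for spec in (condapred, syspred, dockerpred):
    print(describe_project(spec))
```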

Developing your first end-to-end pipeline in MLflow

In this section, we will prototype a simple stock prediction project with MLflow and document the different files and phases of the solution. You will develop it on your local system, using locally installed MLflow and Docker.

Important note

In this section, we are assuming that MLflow and Docker are installed locally, as the steps in this section will be executed in your local environment.

The task in this illustrative project is to create a basic MLflow project and produce a working baseline ML model to predict, based on market signals over a certain number of days, whether the stock market will go up or down.

In this section, we will use a Yahoo Finance dataset available for quoting the BTC-USD pair in https://finance.yahoo.com/quote/BTC-USD/ over a period of 3 months. We will train a model to predict whether the quote will be going up or not on a given day. A REST API will be made available for predictions through MLflow.

We will illustrate, step by step, the creation of an MLflow project that trains a classifier on stock data retrieved from the Yahoo Finance API with the pandas_datareader package:

  1. Add your MLProject file:
    name: stockpred
    docker_env:
      image: stockpred-docker
    entry_points:
      main:
        command: "python train.py"

    The preceding MLproject file specifies that dependencies will be managed in Docker with a specific image name. MLflow will look for the image in your local Docker installation; if it doesn't find it there, it will try to retrieve it from Docker Hub. For the goals of this chapter, it is completely fine to run MLflow on your local machine.

    The second setting in our project is the main entry point command: in the Docker environment, it invokes the train.py Python file, which contains the code of our project.

  2. Add a Dockerfile to the project.

    Additionally, you can specify the Docker registry URL of your image. The advantage of running Docker is that your project is not bound to the Python language, as we will see in the advanced section of this book; the MLflow API is available through a REST interface alongside the official Python, Java, and R clients. The Dockerfile for this project is as follows:

    FROM continuumio/miniconda:4.5.4
    # Quote version ranges such as ">=" so the shell does not treat ">" as an output redirect
    RUN pip install mlflow==1.11.0 \
        && pip install numpy==1.14.3 \
        && pip install scipy \
        && pip install pandas==0.22.0 \
        && pip install scikit-learn==0.20.4 \
        && pip install cloudpickle \
        && pip install "pandas_datareader>=0.8.0"

    The preceding Dockerfile is based on Miniconda, a free, minimal installer for conda with a small set of packages, which lets us control exactly which packages go into our environment.

    We will specify the version of MLflow (our ML platform), numpy, and scipy for numerical calculations. Cloudpickle allows us to easily serialize objects. We will use pandas to manage data frames, and pandas_datareader to allow us to easily retrieve the data from public sources.

  3. Import the packages required for the project.

    In the following listing, we explicitly import all the libraries used during the execution of the training script: the library to read the data, and the sklearn modules related to the chosen initial ML model:

    import numpy as np
    import datetime
    import pandas_datareader.data as web
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.metrics import precision_score
    from sklearn.metrics import recall_score
    from sklearn.metrics import f1_score
    import mlflow.sklearn

    We chose a RandomForestClassifier for the stock market movement detection problem because it is an extremely versatile and widely accepted baseline model for classification problems.

  4. Acquire your training data.

    The code that acquires the Yahoo Finance stock dataset is intentionally small: we choose a specific 3-month interval to train our classifier.

    The acquire_training_data method returns a pandas data frame with the relevant dataset:

    def acquire_training_data():
        start = datetime.datetime(2019, 7, 1)
        end = datetime.datetime(2019, 9, 30)
        df = web.DataReader("BTC-USD", 'yahoo', start, end)
        return df

    The data acquired is in the classic format for financial securities in exchange APIs. For every day of the period, we retrieve the following data: the highest, lowest, opening, and closing values of the stock, as well as the volume. The final column is the adjusted close value, that is, the close value adjusted for dividends and splits:

    Figure 1.1 – Excerpt from the acquired data

    Figure 1.2 illustrates the target variable that the current data preparation process should produce:

    Figure 1.2 – Excerpt from the acquired data with the prediction column

  5. Make the data usable by scikit-learn.

    The data acquired in the preceding step is not directly usable by RandomForestClassifier, which needs one feature vector per sample rather than raw daily prices. To make the data usable, we will transform it into feature vectors using the rolling window technique.

    Basically, the feature vector for each day is made of the binarized deltas (close minus open) of the previous window days: 1 for a day on which the stock went up, 0 otherwise:

    def digitize(n):
        if n > 0:
            return 1
        return 0
    def rolling_window(a, window):
        """
            Takes np.array 'a' and size 'window' as parameters
            Outputs an np.array with all the ordered sequences of values of 'a' of size 'window'
            e.g. Input: ( np.array([1, 2, 3, 4, 5, 6]), 4 )
                 Output:
                         array([[1, 2, 3, 4],
                               [2, 3, 4, 5],
                               [3, 4, 5, 6]])
        """
        shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
        strides = a.strides + (a.strides[-1],)
        return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    def prepare_training_data(data):
        data['Delta'] = data['Close'] - data['Open']
        data['to_predict'] = data['Delta'].apply(lambda d: digitize(d))
        return data

    The following example illustrates the data frame output, with the binarized ups and downs of the previous days:

    Figure 1.3 – Feature vector with binarized market ups and downs

  6. Train and store your model in MLflow.

    The following code listing calls the data preparation methods declared previously, trains the classifier, and evaluates its predictions on the test set.

    The main execution also explicitly logs the ML model trained in the current execution in the MLflow environment.

    if __name__ == "__main__":
        with mlflow.start_run():
            training_data = acquire_training_data()
            prepared_training_data_df = prepare_training_data(training_data)
            # .values replaces as_matrix(), which was removed in pandas 1.0
            btc_mat = prepared_training_data_df.values
            WINDOW_SIZE = 14
            # Column 7 is 'to_predict' (after High, Low, Open, Close, Volume, Adj Close, Delta)
            X = rolling_window(btc_mat[:, 7], WINDOW_SIZE)[:-1, :]
            Y = prepared_training_data_df['to_predict'].values[WINDOW_SIZE:]
            X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=4284, stratify=Y)
            clf = RandomForestClassifier(bootstrap=True, criterion='gini', min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=50, random_state=4284, verbose=0)
            clf.fit(X_train, y_train)
            predicted = clf.predict(X_test)
            # Print the per-label classification report shown in the run output
            print(classification_report(y_test, predicted))
            mlflow.sklearn.log_model(clf, "model_random_forest")
            mlflow.log_metric("precision_label_0", precision_score(y_test, predicted, pos_label=0))
            mlflow.log_metric("recall_label_0", recall_score(y_test, predicted, pos_label=0))
            mlflow.log_metric("f1score_label_0", f1_score(y_test, predicted, pos_label=0))
            mlflow.log_metric("precision_label_1", precision_score(y_test, predicted, pos_label=1))
            mlflow.log_metric("recall_label_1", recall_score(y_test, predicted, pos_label=1))
            mlflow.log_metric("f1score_label_1", f1_score(y_test, predicted, pos_label=1))

    The mlflow.sklearn.log_model(clf, "model_random_forest") method takes care of persisting the model upon training. In contrast to the previous example, we are explicitly asking MLflow to log the model and the metrics that we find relevant. This flexibility in the items to log allows one program to log multiple models into MLflow.

    In the end, your project layout should look like the following, based on the files created previously:

    ├── Dockerfile
    ├── MLproject
    ├── README.md
    └── train.py
  7. Build your project's Docker image.

    In order to build your Docker image, you should run the following command:

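    docker build -t stockpred -f Dockerfile .

    This builds the image defined by the Dockerfile in the current directory and tags it stockpred. The image is then available in your local Docker registry, so MLflow can use it in the subsequent steps. If the image name under docker_env in your MLproject file is different (the earlier listing uses stockpred-docker), use that name as the tag instead; otherwise, MLflow will try to retrieve that image from Docker Hub.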

    Following execution of this command, you should expect a successful Docker build:

    ---> 268cb080fed2
    Successfully built 268cb080fed2
    Successfully tagged stockpred:latest
  8. Run your project.

    In order to run your project, you can now run the MLflow project:

    mlflow run .

    Your output should look similar to the excerpt presented here:

    MLFLOW_EXPERIMENT_ID=0 stockpred:3451a1f python train.py' in run with ID '442275f18d354564b6259a0188a12575' ===
                  precision    recall  f1-score   support
               0       0.61      1.00      0.76        11
               1       1.00      0.22      0.36         9
        accuracy                           0.65        20
       macro avg       0.81      0.61      0.56        20
    weighted avg       0.79      0.65      0.58        20
    2020/10/15 19:19:39 INFO mlflow.projects: === Run (ID '442275f18d354564b6259a0188a12575') succeeded ===

    This output contains the ID of your run and a classification report of the model on the test set, with the precision, recall, and F1 score for each label.
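The alignment between features and labels in train.py is easy to get wrong, so it is worth checking on a tiny input. The following sketch reproduces the logic of digitize, rolling_window, and the X/Y slicing on synthetic prices (made-up numbers, not Yahoo data). It swaps in NumPy's sliding_window_view (available in NumPy 1.20 and later) for the as_strided-based helper, since both produce the same windows; make_features is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_features(open_prices, close_prices, window):
    """Return (X, y): X[i] holds the moves of days i..i+window-1, y[i] the move of day i+window."""
    # Same rule as digitize(Close - Open): 1 if the price went up that day, else 0
    moves = (np.asarray(close_prices, dtype=float) - np.asarray(open_prices, dtype=float) > 0).astype(int)
    # Drop the last window: no following day exists to serve as its label
    X = sliding_window_view(moves, window)[:-1]
    y = moves[window:]
    return X, y


# Synthetic example: six days and a window of three
X, y = make_features(open_prices=[1, 1, 1, 1, 1, 1], close_prices=[2, 0, 2, 2, 0, 2], window=3)
print(X.tolist())  # [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
print(y.tolist())  # [1, 0, 1]
```

Each row of X ends the day before its label, which is exactly what the [:-1, :] and [WINDOW_SIZE:] slices in train.py achieve.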

At this stage, you have a simple, reproducible baseline of a stock predictor pipeline using MLflow that you can improve on and easily share with others.
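The metrics logged by train.py are per-label scores, which is why each metric name ends with the label. The following pure-Python sketch computes precision, recall, and F1 for a single label, mirroring what the pos_label argument selects in the sklearn metric functions. The label vectors are not the real test split: they are constructed only to reproduce the counts implied by the classification report above (11 true 0s and 9 true 1s, of which only 2 were predicted as 1).

```python
def per_label_scores(y_true, y_pred, label):
    """Precision, recall, and F1, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Constructed to match the report's counts, not the actual test data
y_true = [0] * 11 + [1] * 9
y_pred = [0] * 11 + [1] * 2 + [0] * 7
for label in (0, 1):
    precision, recall, f1 = per_label_scores(y_true, y_pred, label)
    print(f"label {label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# label 0: precision=0.61 recall=1.00 f1=0.76
# label 1: precision=1.00 recall=0.22 f1=0.36
```

Seen this way, the low recall for label 1 shows that the baseline rarely predicts an up day, which is a natural first target for improvement.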

Re-running experiments

Another extremely useful feature of MLflow is the ability to re-run a specific experiment with the same parameters as it was run with originally.

For instance, you should be able to run your previous project by specifying the GitHub URL of the repository, followed by a # sign and the project's subdirectory:

mlflow run https://github.com/PacktPublishing/Machine-Learning-Engineering-with-MLflow#Chapter01/stockpred

With the previous command, MLflow clones the repository to a temporary directory and executes the project according to its MLproject file.

The ID of the experiment (or the name) allows you to run the project with the original parameters, thereby enabling complete reproducibility of the project.
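For strict reproducibility, it also helps to pin the exact code version: for Git-based projects, mlflow run accepts a --version option with a Git commit reference. The following sketch only builds the argument list you could pass to subprocess.run; build_run_command is a hypothetical helper, and the commit hash is a placeholder to replace with the one recorded for the original run.

```python
import shlex


def build_run_command(uri, version=None):
    """Build an `mlflow run` argument list, optionally pinned to a Git commit."""
    command = ["mlflow", "run", uri]
    if version:
        command += ["--version", version]  # Git commit, branch, or tag to check out
    return command


uri = "https://github.com/PacktPublishing/Machine-Learning-Engineering-with-MLflow#Chapter01/stockpred"
command = build_run_command(uri, version="<commit-hash>")  # placeholder, not a real hash
print(shlex.join(command))  # shell-quoted form, handy for logs
# To execute it (requires MLflow, Git, and Docker): subprocess.run(command, check=True)
```

Passing a list rather than a single string to subprocess avoids shell quoting problems with the # character in the URI.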

The MLflow projects feature allows your project to run in advanced cloud environments such as Kubernetes and Databricks. Scaling your ML job seamlessly is one of the main selling points of a platform such as MLflow.

As you have seen from the current section, the MLflow project module allows the execution of a reproducible ML job that is treated as a self-contained project.

Exploring MLflow tracking

The MLflow Tracking component is responsible for observability. Its main features are the logging of the metrics, artifacts, and parameters of an MLflow execution, and it provides visualization and artifact management features.

In a production setting, it is used as a centralized tracking server implemented in Python that can be shared by a group of ML practitioners in an organization. This enables improvements in ML models to be shared within the organization.

In Figure 1.4, you can see an interface that lists all the runs of your model and the observables logged for each experiment (metrics, files, models, and artifacts). For each run, you can inspect and compare the different metrics and parameters of your model.

It addresses common pain points when model developers are comparing different iterations of their models on different parameters and settings.

The following screenshot presents the different metrics for our last run of the previous model:

Figure 1.4 – Sample of the MLFlow interface/UI

MLflow allows the inspection of arbitrary artifacts associated with each model and its associated metadata, allowing metrics of different runs to be compared. You can see the RUN IDs and the Git hash of the code that generated the specific run of your experiment:

Figure 1.5 – Inspecting logged model artifacts

From the stockpred directory, run the following command to access the results of your runs:

mlflow ui

Running the MLflow UI locally will make it available at the following URL: http://127.0.0.1:5000/.

In the particular case of the runs shown in the following screenshot, we have a named experiment in which the window size parameter from the previous example was tweaked. Clear differences can be seen between the runs in terms of F1 score:

Figure 1.6 – Listing of MLflow runs

Another very useful feature of MLflow Tracking is the ability to compare different runs of a job:

Figure 1.7 – Comparison of F1 metrics of job runs

This preceding visualization allows a practitioner to make a decision as to which model to use in production or whether to iterate further.
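The same comparison can be made programmatically against the tracking server's REST API. The following sketch only prepares a request to the runs/search endpoint (POST /api/2.0/mlflow/runs/search), ordering the runs of experiment 0, the default shown in the earlier run output, by the f1score_label_1 metric logged by train.py. It assumes a server at http://127.0.0.1:5000, as started by mlflow ui; build_search_request is a hypothetical helper.

```python
import json
import urllib.request


def build_search_request(base_url, experiment_id, metric, max_results=5):
    """Prepare a runs/search call that returns the best runs by `metric` first."""
    body = {
        "experiment_ids": [experiment_id],
        "order_by": [f"metrics.{metric} DESC"],  # highest metric value first
        "max_results": max_results,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/2.0/mlflow/runs/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("http://127.0.0.1:5000", "0", "f1score_label_1")
# With the tracking server running, send the request and read the ordered runs:
# with urllib.request.urlopen(request) as response:
#     runs = json.load(response).get("runs", [])
```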

Exploring MLflow Models

MLflow Models is the core component that handles the different model flavors supported in MLflow and mediates deployment into different execution environments.

We will now delve into the different model flavors supported in the latest version of MLflow.

As shown in the Getting started with MLflow section, MLflow models have a specific serialization approach for when the model is persisted in its internal format. For example, the serialized folder of the model implemented on the stockpred project would look like the following:

├── MLmodel
├── conda.yaml
└── model.pkl

Internally, MLflow persists sklearn models with a conda.yaml file listing the dependencies captured at run time and the pickled model as logged by the source code. The MLmodel file describes them:

artifact_path: model_random_forest
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.7.6
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.23.2
run_id: 22c91480dc2641b88131c50209073113
utc_time_created: '2020-10-15 20:16:26.619071'

As the MLmodel file shows, this model is available in two flavors: python_function and sklearn. Flavors are conventions that tools and environments use to load and serve models.

A good example of this is being able to serve your model without any extra code by executing the following command:

mlflow models serve -m ./mlruns/0/b9ee36e80a934cef9cac3a0513db515c/artifacts/model_random_forest/

This starts a simple web server that serves your model. You can call its prediction interface by running the following command:

curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"data":[[1,1,1,1,0,1,1,1,0,1,1,1,0,0]]}'
[1]

The response to the API call was 1; as defined by our target variable, this means that the stock is predicted to move up in the next reading.
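The same call can also be made from Python using only the standard library. This sketch builds the request from the curl example: one row of 14 binarized daily moves (the WINDOW_SIZE used in train.py) in the {"data": [...]} JSON layout used above. That layout matches the MLflow version used in this chapter; newer MLflow releases expect a different JSON structure, so check the serving documentation for your version. build_invocation_request is a hypothetical helper.

```python
import json
import urllib.request

WINDOW_SIZE = 14  # must match the window used to train the model


def build_invocation_request(moves, url="http://127.0.0.1:5000/invocations"):
    """Prepare the same POST request as the curl example for one feature vector."""
    if len(moves) != WINDOW_SIZE:
        raise ValueError(f"expected {WINDOW_SIZE} values, got {len(moves)}")
    payload = {"data": [list(moves)]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request([1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0])
# With `mlflow models serve` running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Checking the vector length on the client side catches a window-size mismatch before the request ever reaches the model.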

The final few steps outline how powerful MLflow is as an end-to-end tool for model development, including for the prototyping of REST-based APIs for ML services.

The MLflow Models component also allows you to create custom Python models that have the same benefits as the built-in ones, as long as they follow the prediction interface.

Some of the notable model types supported will be explored in upcoming chapters, including the following:

  • XGBoost model format
  • R functions
  • H2O model
  • Keras
  • PyTorch
  • Sklearn
  • Spark MLlib
  • TensorFlow
  • Fastai

Support for the most prevalent types of ML models, combined with built-in capabilities for on-premises and cloud deployment, is one of the strongest features of MLflow Models. We will explore this in more detail in the deployment-related chapters.

Exploring MLflow Model Registry

The model registry component in MLflow gives the ML developer an abstraction for model life cycle management. It is a centralized store for an organization or function that allows models in the organization to be shared, created, and archived collaboratively.

Models can be managed through MLflow's APIs as well as through the UI. Figure 1.8 shows the artifacts view of the tracking server UI, which can be used to register a model:

Figure 1.8 – Registering a model as an artifact

Once a model is registered, you can annotate it with relevant metadata and manage its life cycle. For example, a model version can be kept in a Staging stage for pre-production testing and then promoted to Production:

Figure 1.9 – Managing different model versions and stages

The model registry module will be explored further in the book, with details on how to set up a centralized server and manage ML model life cycles, from conception through to phasing out a model.
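The registration and promotion steps shown in the UI can also be scripted. The following is a minimal sketch, assuming MLflow's `mlflow.register_model` and `MlflowClient.transition_model_version_stage` APIs (stage transitions are deprecated in recent MLflow releases in favor of model version aliases) and a tracking server whose backend supports the Model Registry (older MLflow releases require a database-backed store for this). The helper names are hypothetical. `runs_uri_from_local_path` converts the local `mlruns` path used earlier with `mlflow models serve` into the `runs:/<run_id>/<artifact_path>` URI that registration expects; MLflow is imported lazily so the helper works without it.

```python
import re

# Local file-store layout: mlruns/<experiment_id>/<run_id>/artifacts/<artifact_path>
_LOCAL_ARTIFACT = re.compile(
    r"mlruns/(?P<exp>[^/]+)/(?P<run>[0-9a-f]{32})/artifacts/(?P<path>.+?)/?$"
)

# Stage names used by the (stage-based) Model Registry workflow.
VALID_STAGES = {"None", "Staging", "Production", "Archived"}


def runs_uri_from_local_path(local_path):
    """Turn ./mlruns/<exp>/<run_id>/artifacts/<path>/ into runs:/<run_id>/<path>."""
    match = _LOCAL_ARTIFACT.search(local_path)
    if match is None:
        raise ValueError(f"not an mlruns artifact path: {local_path}")
    return f"runs:/{match['run']}/{match['path']}"


def register_and_promote(model_uri, name, stage="Staging"):
    """Register a logged model under `name` and move the new version to `stage`."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage {stage!r}")
    # Imported here so the helper above can be used without MLflow installed.
    import mlflow
    from mlflow.tracking import MlflowClient

    version = mlflow.register_model(model_uri, name)
    MlflowClient().transition_model_version_stage(
        name=name, version=version.version, stage=stage
    )
    return version
```

For example, `register_and_promote(runs_uri_from_local_path(path), "stock-predictor")` would register the random forest under the hypothetical name `stock-predictor` and move the new version to Staging.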

Summary

In this chapter, we introduced MLflow and explored some of the motivation behind adopting an ML platform to reduce the time it takes to move a model from development to production. With the knowledge and experience acquired in this chapter, you can start making your ML development workflow more reproducible and trackable.

We delved into each of the platform's main modules: Projects, Models, Tracking, and Model Registry. Particular emphasis was placed on practical examples illustrating each core capability, giving you a hands-on introduction to the platform. MLflow offers multiple out-of-the-box features, such as metrics management, model management, and reproducibility, that reduce friction in the ML development life cycle with minimal code and configuration.

In the rest of the book, we will build on this introductory knowledge and expand our skills in building practical ML platforms.

This chapter also briefly introduced the stock market prediction use case that will be used throughout the book. In the next chapter, we will rigorously define the ML problem of stock market prediction.

Further reading

In order to enhance your knowledge, you can consult the documentation available at the following links:

Key benefits

  • Explore machine learning workflows for stating ML problems in a concise and clear manner using MLflow
  • Use MLflow to iteratively develop an ML model and manage it
  • Discover and work with the features available in MLflow to seamlessly take a model from the development phase to a production environment

Description

MLflow is a platform for the machine learning life cycle that enables structured development and iteration of machine learning models and a seamless transition into scalable production environments. This book will take you through the different features of MLflow and how you can implement them in your ML project. You will begin by framing an ML problem and then transform your solution with MLflow, adding a workbench environment, training infrastructure, data management, model management, experimentation, and state-of-the-art ML deployment techniques in the cloud and on premises. The book also explores techniques to scale up your workflow, as well as performance monitoring techniques. As you progress, you’ll discover how to create an operational dashboard to manage machine learning systems. Later, you will learn how to use MLflow in AutoML, anomaly detection, and deep learning contexts with the help of use cases. In addition, you will understand how to use machine learning platforms for local development as well as for cloud and managed environments. This book will also show you how to use MLflow in non-Python languages such as R and Java, and cover approaches to extending MLflow with plugins. By the end of this machine learning book, you will be able to produce and deploy reliable machine learning algorithms using MLflow in multiple environments.

Who is this book for?

This book is for data scientists, machine learning engineers, and data engineers who want to gain hands-on machine learning engineering experience and learn how they can manage an end-to-end machine learning life cycle with the help of MLflow. Intermediate-level knowledge of the Python programming language is expected.

What you will learn

  • Develop your machine learning project locally with MLflow's different features
  • Set up a centralized MLflow tracking server to manage multiple MLflow experiments
  • Create a model life cycle with MLflow by creating custom models
  • Use feature streams to log model results with MLflow
  • Develop the complete training pipeline infrastructure using MLflow features
  • Set up an inference-based API pipeline and batch pipeline in MLflow
  • Scale large volumes of data by integrating MLflow with high-performance big data libraries

Product Details

Publication date: Aug 27, 2021
Length: 248 pages
Edition: 1st
Language: English
ISBN-13: 9781800560796

Table of Contents

17 Chapters
Section 1: Problem Framing and Introductions
Chapter 1: Introducing MLflow
Chapter 2: Your Machine Learning Project
Section 2: Model Development and Experimentation
Chapter 3: Your Data Science Workbench
Chapter 4: Experiment Management in MLflow
Chapter 5: Managing Models with MLflow
Section 3: Machine Learning in Production
Chapter 6: Introducing ML Systems Architecture
Chapter 7: Data and Feature Management
Chapter 8: Training Models with MLflow
Chapter 9: Deployment and Inference with MLflow
Section 4: Advanced Topics
Chapter 10: Scaling Up Your Machine Learning Workflow
Chapter 11: Performance Monitoring
Chapter 12: Advanced Topics with MLflow
Other Books You May Enjoy

Customer reviews

Average rating: 4.1 out of 5 (17 ratings)

Rating distribution
5 star: 47.1%
4 star: 35.3%
3 star: 0%
2 star: 11.8%
1 star: 5.9%

Top Reviews

Rahul Z Sep 30, 2021
Rating: 5/5
As an AI/ML practitioner, I use MLFlow as an integral part of my day-to-day work coupled with Azure Databricks. This book helped me level up my understanding and comfort with the overall platform.
- I can use MLFlow far more efficiently, do more with it since reading this book as I got introduced to the additional functionalities
- Well documented steps, with screenshots and diagrams helped me keep track and follow along
- Conventions being used throughout are documented at the beginning: which is highly appreciated and sets the tone nicely for professionals like me
- Book ends with a nice chapter about advance use cases, which I definitely want to explore going forward
All in all, great bang for buck here in the US: you can't go wrong with this purchase. However, be sure to review the description and key features being covered in the listing to ensure this copy matches your expectations going in.
Amazon Verified review
Luis Felipe Yepez Barrios Aug 27, 2021
Rating: 5/5
This book introduce you to MLflow open source framework to manage all the machine learning models/projects life cycle in your organization, from tracking experiments, projects, models up to serving locally and in AWS sagemaker. It is organize in a way that you can start to understand Mlflow in detail, through several walkthrough hands-on exercises highlighting in every step all the aspects from Problem framing, Model development, experimentation, Productionalize, Monitor and also include topics developed in details like scaling machine learning workflow with Mlflow with Apache Spark, Nvidia RAPIDS, Ray platform, how to integrate Mlflow with java and R with clear and thoughtfully examples. This book provide a proper MLOps implementation streamlines the process of developing and deploying ML models, and outline several considerations that will help ensure ML applications make it to production and run smoothly. At the end of the day, that’s what it takes for a model to provide business value. As a summary is not just a great book of Machine Learning explaining how to manage the life cycle of the model/project also provide a lot of good practices not related exclusively to Mlflow, those cover all the different stages of the project development.
Amazon Verified review
TukReader Sep 23, 2021
Rating: 5/5
This book is easy to understand as it provides many examples of the use of MLflow and a clear chapter breakdown. This book is also well suited for beginners as most of the examples are well packaged in docker images that make it easy to install and demo locally. If your company is struggling to produce and monitor your ML process then this is the right book for you.
Amazon Verified review
IntegralBill Sep 11, 2021
Rating: 5/5
This book gets straight to the point, being setting up your end-to-end ML pipeline. He doesn't spend much time going over machine learning in detail; so if you don't have a background in ML prior to approaching this book I would recommend getting familiar with that first. If you're looking to learn about MLOps, creating ML-pipelines, this is an excellent book!! This book is not platform dependent (can use for any major cloud platform). I'm currently using it for my projects and this book helped me get up and running quickly!! Please see my video review for more.
Amazon Verified review
Jagane D. Sundar Nov 29, 2021
Rating: 5/5
MLflow is the most popular open source Machine Learning Ops platform. Nate has done a thorough job of covering all the features of MLflow in detail. In addition, this book turns out to be comprehensive MLops guide as well. It is an invaluable introduction to the full lifecycle of Machine Learning projects. Nate starts with an introduction to the Projects, Experiments, Models and Model Registry features of MLflow. He builds on that knowledge by creating a 'Data Science Workbench' using MLflow and other popular open source projects such as Jupyterlab, Kubeflow, feast, etc. He finishes with chapters on MLflow integration with Amazon Sagemaker, Databricks, etc.
Two Criticisms:
First, the book deals exclusively with structured data and no mention of unstructured use cases such as images, video, audio and NLP - Machine Learning is rapidly moving to more unstructured use cases and this is an important oversight.
Second, while the book mentions Jupyterlab in passing, most of the code, samples, work etc. are based on using the command line with Docker - this is not an environment that Data Scientists are comfortable with.
Amazon Verified review