You're reading from The Machine Learning Solutions Architect Handbook Practical strategies and best practices on the ML lifecycle, system design, MLOps, and generative AI

Product type Paperback

Published in Apr 2024

Publisher Packt

ISBN-13 9781805122500

Length 602 pages

Edition 2nd Edition

Author (1):

David Ping

ML solutions architecture

When I initially worked with companies as an ML solutions architect, the landscape was quite different from what it is now. The focus was mainly on data science and modeling, and the problems at hand were small in scope. Back then, most of the problems could be solved using simple ML techniques. The datasets were small, and the infrastructure required was not too demanding. The scope of the ML initiative at these companies was limited to a few data scientists or teams. As an ML architect at that time, I primarily needed to have solid data science skills and general cloud architecture knowledge to get the job done.

In more recent years, the landscape of ML initiatives has become more intricate and multifaceted, necessitating involvement from a broader range of functions and personas at companies. My engagement has expanded to include discussions with business executives about ML strategies and organizational design to facilitate the broad adoption of AI/ML throughout their enterprises. I have been tasked with designing more complex ML platforms, utilizing a diverse range of technologies for large enterprises to meet stringent security and compliance requirements. ML workflow orchestration and operations have become increasingly crucial topics of discussion, and more and more companies are looking to train large ML models with enormous amounts of training data. The number of ML models trained and deployed by some companies has skyrocketed to tens of thousands from a few dozen models in just a few years. Furthermore, sophisticated and security-sensitive customers have sought guidance on topics such as ML privacy, model explainability, and data and model bias. As an ML solutions architect, I’ve noticed that the skills and knowledge required to be successful in this role have evolved significantly.

Trying to navigate the complexities of a business, data, science, and technology landscape can be a daunting task. As an ML solutions architect, I have seen firsthand the challenges that companies face in bringing all these pieces together. In my view, ML solutions architecture is an essential discipline that serves as a bridge connecting the different components of an ML initiative. Drawing on my years of experience working with companies of all sizes and across diverse industries, I believe that an ML solutions architect plays a pivotal role in identifying business needs, developing ML solutions to address these needs, and designing the technology platforms necessary to run these solutions. By collaborating with various business and technology partners, an ML solutions architect can help companies unlock the full potential of their data and realize tangible benefits from their ML initiatives.

The following figure illustrates the core functional areas covered by the ML solutions architecture:

Figure 1.3: ML solutions architecture coverage

In the following sections, we will explore each of these areas in greater detail:

Business understanding: Business problem understanding and transformation using AI and ML.
Identification and verification of ML techniques: Identification and verification of ML techniques for solving specific ML problems.
System architecture of the ML technology platform: System architecture design and implementation of the ML technology platforms.
MLOps: ML platform automation technical design.
Security and compliance: Security, compliance, and audit considerations for the ML platform and ML models.

So, let’s dive in!

Business understanding and ML transformation

The goal of the business workflow analysis is to identify inefficiencies in the workflows and determine if ML can be applied to help eliminate pain points, improve efficiency, or even create new revenue opportunities.

Picture this: you are tasked with improving a call center’s operations. You know there are inefficiencies that need to be addressed, but you’re not sure where to start. That’s where business workflow analysis comes in. By analyzing the call center’s workflows, you can identify pain points such as long customer wait times, knowledge gaps among agents, and the inability to extract customer insights from call recordings. Once you have identified these issues, you can determine what data is available and which business metrics need to be improved. This is where ML comes in. You can use ML to create virtual assistants for common customer inquiries, transcribe audio recordings to allow for text analysis, and detect customer intent for product cross-sell and up-sell. But sometimes, you need to modify the business process to incorporate ML solutions. For example, if you want to use call recording analytics to generate insights for cross-selling or up-selling products, but there’s no established process to act on those insights, you may need to introduce an automated target marketing process or a proactive outreach process by the sales team.

Identification and verification of ML techniques

Once you have come up with a list of ML options, the next step is to determine whether the assumptions behind each ML approach are valid. This could involve a simple proof-of-concept (POC) modeling exercise to validate the available dataset and modeling approach, a technology POC using pre-built AI services, or a test of ML frameworks. For example, you might want to test the feasibility of transcribing audio files with an existing transcription service, or build a customer propensity model that predicts conversion to a new product from a marketing campaign.
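As a minimal sketch of how such a transcription POC could be scored, the snippet below computes word error rate (WER), a common transcription metric, between a human reference transcript and the output of a transcription service. WER is not prescribed by this chapter; it is one reasonable choice. The function name `word_error_rate` and the sample transcripts are hypothetical and stand in for real call-center data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical POC data: (human transcript, service output) pairs.
samples = [
    ("i would like to upgrade my plan", "i would like to upgrade my plan"),
    ("please cancel my order today", "please cancel my older today"),
]
scores = [word_error_rate(ref, hyp) for ref, hyp in samples]
average_wer = sum(scores) / len(scores)
```

An average WER that is low enough for the downstream text analysis would support moving forward; what counts as low enough is a business decision rather than something the code can settle.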

It is worth noting that ML solutions architecture does not focus on developing new machine learning algorithms, a job best suited for applied data scientists or research data scientists. Instead, it focuses on identifying and applying existing ML algorithms to address a range of ML problems such as predictive analytics, computer vision, or natural language processing. Also, the goal of any modeling task here is not to build production-quality models but rather to validate the approach for further experimentation by full-time applied data scientists.

System architecture design and implementation

The most important aspect of the ML solutions architect’s role is the technical architecture design of the ML platform. The platform will need to provide the technical capability to support the different phases of the ML cycle and personas, such as data scientists and operations engineers. Specifically, an ML platform needs to have the following core functions:

Data exploration and experimentation: Data scientists use ML platforms for data exploration, experimentation, model building, and model evaluation. ML platforms need to provide capabilities such as data science development tools for model authoring and experimentation, data wrangling tools for data exploration, source code control for code management, and a package repository for library package management.
Data management and large-scale data processing: Data scientists or data engineers will need the technical capability to ingest, store, access, and process large amounts of data for cleansing, transformation, and feature engineering.
Model training infrastructure management: ML platforms will need to provide model training infrastructure for different types of model training, using different types of computing resources, storage, and networking configurations. They also need to support different types of ML libraries or frameworks, such as scikit-learn, TensorFlow, and PyTorch.
Model hosting/serving: ML platforms will need to provide the technical capability to host and serve models that generate predictions in real time, in batch, or both.
Model management: Trained ML models will need to be managed and tracked for easy access and lookup, with relevant metadata.
Feature management: Common and reusable features will need to be managed and served for model training and model serving purposes.
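To make the model management function more concrete, here is a minimal in-memory sketch of a model registry that tracks versions, artifact locations, and metrics for lookup. The class names `ModelVersion` and `ModelRegistry`, the `s3://example-bucket/...` URIs, and the metric values are all hypothetical; a real platform would back this with a persistent store or a managed registry service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModelVersion:
    """Metadata recorded for one trained model artifact."""
    name: str
    version: int
    artifact_uri: str
    metrics: dict
    registered_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ModelRegistry:
    """Toy registry: maps a model name to its ordered list of versions."""

    def __init__(self):
        self._versions = {}

    def register(self, name, artifact_uri, metrics):
        versions = self._versions.setdefault(name, [])
        # Version numbers start at 1 and increase with each registration.
        entry = ModelVersion(name, len(versions) + 1, artifact_uri, dict(metrics))
        versions.append(entry)
        return entry

    def latest(self, name):
        return self._versions[name][-1]

    def best(self, name, metric):
        # Assumes a higher metric value is better (true for AUC, not for loss).
        return max(self._versions[name], key=lambda v: v.metrics[metric])


registry = ModelRegistry()
registry.register("churn", "s3://example-bucket/churn/v1", {"auc": 0.81})
registry.register("churn", "s3://example-bucket/churn/v2", {"auc": 0.78})

latest_version = registry.latest("churn").version
best_version = registry.best("churn", "auc").version
```

Keeping the latest version and the best-performing version as separate lookups reflects a common situation in which the most recently trained model is not the one that should be serving traffic.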

ML platform workflow automation

A key aspect of ML platform design is workflow automation and continuous integration/continuous deployment (CI/CD), also known as MLOps. ML is a multi-step workflow, and its steps, including data processing, model training, model validation, and model hosting, need to be automated. Automated, self-service infrastructure provisioning is another aspect of automation design. Key components of workflow automation include the following:

Pipeline design and management: The ability to create different automation pipelines for various tasks, such as model training and model hosting.
Pipeline execution and monitoring: The ability to run different pipelines and monitor the execution status of the entire pipeline and of each step in the ML cycle, such as data processing and model training.
Model monitoring configuration: The ability to monitor the model in production for various metrics, such as data drift (where the distribution of data used in production deviates from the distribution of data used for model training), model drift (where the performance of the model degrades in production compared with training results), and bias detection (the ML model replicating or amplifying bias towards certain individuals).
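As an illustration of the data drift check described above, the following sketch compares a production feature distribution against the training distribution using the population stability index (PSI). PSI is one of several possible drift measures and is an assumption here, not a method named in this chapter. The function name, the bin count, and the 0.25 alert threshold (a commonly quoted rule of thumb) are likewise illustrative choices.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training sample (expected) and a production sample (actual)."""
    lo, hi = min(expected), max(expected)
    if lo == hi:
        raise ValueError("training sample must contain more than one distinct value")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Production values outside the training range fall into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: production either matches training or is shifted.
training = [i / 100 for i in range(1000)]
production_same = list(training)
production_shifted = [v + 3 for v in training]

psi_same = population_stability_index(training, production_same)
psi_shifted = population_stability_index(training, production_shifted)

DRIFT_THRESHOLD = 0.25  # assumed alerting threshold, tune per feature
drift_detected = psi_shifted > DRIFT_THRESHOLD
```

In a monitoring configuration, a check like this would typically run on a schedule for each monitored feature, with the threshold and bin count treated as settings rather than constants.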

Security and compliance

Another important aspect of ML solutions architecture is the security and compliance consideration in a sensitive or enterprise setting:

Authentication and authorization: The ML platform needs to provide authentication and authorization mechanisms to manage access to the platform and different resources and services.
Network security: The ML platform needs to be configured for different network security controls such as a firewall and an IP address access allowlist to prevent unauthorized access.
Data encryption: For security-sensitive organizations, data encryption is another important aspect of the design consideration for the ML platform.
Audit and compliance: Audit and compliance staff need information that helps them understand how the predictive models make decisions (where required), the lineage of a model from data to model artifacts, and any bias exhibited in the data and model. The ML platform will need to provide model explainability, bias detection, and model traceability across the various datastore and service components, among other capabilities.
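The network security and authorization controls above can be sketched in a few lines. The example below combines an IP address allowlist check with a simple role-based permission lookup. The network ranges, role names, and action strings are hypothetical, and authentication (verifying who the caller is) is assumed to have happened before these checks run.

```python
import ipaddress

# Hypothetical allowlist of corporate network ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.10.0/24"),
]

# Hypothetical mapping from platform roles to permitted actions.
ROLE_PERMISSIONS = {
    "data_scientist": {"notebook:use", "training:submit"},
    "ml_engineer": {"training:submit", "endpoint:deploy"},
    "auditor": {"lineage:read"},
}


def is_ip_allowed(client_ip: str) -> bool:
    """Return True if the client address falls inside any allowlisted network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def is_authorized(role: str, action: str) -> bool:
    """Return True if the role is granted the requested action."""
    return action in ROLE_PERMISSIONS.get(role, set())


def can_access(client_ip: str, role: str, action: str) -> bool:
    # Both the network control and the authorization control must pass.
    return is_ip_allowed(client_ip) and is_authorized(role, action)


internal_ok = can_access("10.1.2.3", "data_scientist", "training:submit")
external_blocked = can_access("8.8.8.8", "data_scientist", "training:submit")
role_blocked = can_access("10.1.2.3", "auditor", "endpoint:deploy")
```

In practice, these controls are usually enforced by the cloud provider's identity and networking services rather than in application code; the sketch only shows the logic they implement.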

Various industry technology providers have published best practices to guide the design and implementation of ML infrastructure, and applying them is part of the ML solutions architect’s practice. Amazon Web Services, for example, created the Machine Learning Lens to provide architectural best practices across crucial domains like operational excellence, security, reliability, performance, cost optimization, and sustainability. Following these published guidelines can help practitioners implement robust and effective ML solutions.