TensorFlow Developer Certificate Guide

You're reading from TensorFlow Developer Certificate Guide: Efficiently tackle deep learning and ML problems to ace the Developer Certificate exam

Product type: Paperback
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781803240138
Length: 344 pages
Edition: 1st Edition
Author: Oluwole Fagbohun

ML life cycle

Before embarking on any ML project, we must take into account several key components that can determine whether the project will succeed. As data professionals who want to build and implement successful ML projects, we therefore need to understand how the ML life cycle works. The ML life cycle provides a sensible framework for implementing an ML project, as shown in Figure 1.7:

Figure 1.7 – The ML life cycle

Let’s look at each of these in detail.

The business case

Before unleashing state-of-the-art models on any problem, it is imperative that you take the time to sit with stakeholders and clearly understand the business objectives or the pain points to be resolved; without that clarity, the entire process will almost certainly fail. Always keep in mind that the goal of the entire process is not to test a new breakthrough model you have been itching to try out, but to solve a pain point or create value for your company.

Once we understand the problem, we can categorize it as either a supervised or an unsupervised learning task. This phase of the ML life cycle is all about asking the right questions. We need to sit with the relevant team to determine which key metrics will define the project as a success. What resources are required in terms of budget, manpower, compute, and time? Do we have sufficient domain understanding, or do we need an expert's input to define and understand the underlying factors and goals that will determine the project's success? These are some of the questions we should ask as data professionals before we embark on a project.

For the exam, we will need to understand the requirements of each question before we tackle it. We will discuss the exam in more detail before we conclude this chapter.

Data gathering and understanding

When all the requirements are detailed, the next step is to collect the data required for the project. In this phase, we first determine what type of data we will collect and where we will collect it from. Before we embark on anything, we need to ask ourselves whether the data is relevant – for example, if we collect historical car data from 1980, would we be able to predict the price of a car in 2022? Will the data be made available by stakeholders, or will we collect it from a database, Internet of Things (IoT) devices, or via web scraping? Will we need to collect secondary data for the task at hand? We also need to establish whether the data will be collected all at once or whether data collection will be a continuous process. Once we have collected the data needed for the project, we can begin examining it.

Next, we check whether the data collected is in the right format. For example, if you collect car sales data from multiple sources, one source may record a car's mileage in kilometers while another records it in miles. There could also be missing values in some of the features, and we might encounter duplicates, outliers, and irrelevant features in the data we collected. During this phase, we carry out data exploration to gain insights into the data, and data preprocessing to address issues such as inconsistent formats, missing values, duplicates, irrelevant features, outliers, imbalanced data, and categorical features.
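To make the mileage example concrete, here is a minimal sketch in plain Python of three of these preprocessing steps: converting miles to kilometers, dropping duplicates, and imputing missing values with the median. The records, the rule that rows sharing an `id` are duplicates, and the assumption that source "B" reports miles are all hypothetical choices for illustration.

```python
import statistics

KM_PER_MILE = 1.609344  # exact definition of the international mile

# Hypothetical car sales records merged from two sources; source "B" reports miles
records = [
    {"id": 1, "source": "A", "mileage": 50000.0, "price": 9000.0},
    {"id": 2, "source": "B", "mileage": 10000.0, "price": 14000.0},
    {"id": 2, "source": "B", "mileage": 10000.0, "price": 14000.0},  # duplicate row
    {"id": 3, "source": "A", "mileage": None, "price": 12000.0},     # missing mileage
]

def preprocess(rows, miles_sources=frozenset({"B"})):
    """Deduplicate by id, convert mileage to km, and impute missing mileage."""
    cleaned, seen_ids = [], set()
    for row in rows:
        if row["id"] in seen_ids:  # assumption: the same id means a duplicate record
            continue
        seen_ids.add(row["id"])
        row = dict(row)  # copy so the input records are not mutated
        if row["mileage"] is not None and row["source"] in miles_sources:
            row["mileage"] *= KM_PER_MILE  # miles -> kilometers
        cleaned.append(row)
    # Fill missing mileage with the median of the observed (now consistent) values
    median_km = statistics.median(r["mileage"] for r in cleaned if r["mileage"] is not None)
    for row in cleaned:
        if row["mileage"] is None:
            row["mileage"] = median_km
    return cleaned

for row in preprocess(records):
    print(row)
```

The order matters: imputing before the unit conversion would compute a median over a mix of miles and kilometers, which would itself be inconsistent.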

Modeling

Now that we understand the business needs, have decided on the type of ML problem to address, and have good-quality data after our preprocessing step, we can start modeling. We split our data into a training set and hold out a smaller test set to evaluate the model's performance. We train our model on the training set so that it learns the relationship between the features and the target variable. For example, we could train a fraud detection model on historical data provided by a bank and evaluate it on our holdout (test) set before deploying it for use. We then go through an iterative process of fine-tuning the model's hyperparameters until we arrive at the best-performing model; ideally, candidate settings are compared on a separate validation set so that the test set remains an unbiased final check.
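The split-then-tune workflow can be sketched in plain Python. This is a minimal illustration, not the book's TensorFlow code: it uses a toy k-nearest-neighbors classifier on a synthetic fraud dataset (the data, the 800 cut-off that generates the labels, and the candidate `k` values are all made up). The hyperparameter `k` is chosen on a validation split, and the held-out test split is used only once at the end.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a copy of rows, then carve off test and validation subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def knn_predict(train, x, k):
    """Predict 1 (fraud) if most of the k nearest training amounts are fraud."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    fraud_votes = sum(label for _, label in nearest)
    return 1 if 2 * fraud_votes > k else 0

def accuracy(train, data, k):
    """Fraction of examples in data that k-NN (fit on train) classifies correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# Hypothetical toy data: (transaction_amount, is_fraud); fraud whenever amount > 800
rng = random.Random(0)
rows = [(amount, int(amount > 800)) for amount in (rng.uniform(0, 1000) for _ in range(200))]

train, val, test = train_val_test_split(rows)

# Hyperparameter tuning: pick k using the validation split only
best_k = max([1, 3, 5, 7], key=lambda k: accuracy(train, val, k))
# Final, one-time check on the untouched test split
test_acc = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_acc:.2f}")
```

In Keras, the same pattern typically maps to passing a validation set to `model.fit(..., validation_data=...)` while tuning, and calling `model.evaluate` on the test set once at the end.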

Whether the modeling process counts as a success is tied to the business objective. Even a high accuracy of 90 percent still leaves a 10 percent error rate, which could be decisive in high-stakes domains such as healthcare. Imagine you deploy a model for early-stage cancer detection with an accuracy of 90 percent. The model would be wrong for roughly 1 in every 10 people; across 100 patients, that is about 10 errors, and some of them could be people with cancer misclassified as healthy. Such a person might not seek medical advice, which could lead to an untimely death. Your company could get sued, and the blame would fall on you. To avoid situations like this, we need to understand which metrics matter most for our project and where we can afford to be less strict. It is also important to address factors such as class imbalance, model interpretability, and ethical implications.
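A small worked example shows why accuracy alone can mislead here. In this minimal sketch, the confusion-matrix counts for 100 patients, 10 of whom have cancer, are invented for illustration. A model that labels everyone as healthy and a hypothetical screening model both reach 90 percent accuracy, but recall (the share of actual cancer cases the model catches) tells them apart.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def recall(tp, fn):
    """Fraction of actual positives (patients with cancer) that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical screening of 100 patients, 10 of whom have cancer
always_healthy = dict(tp=0, fp=0, fn=10, tn=90)   # predicts "healthy" for everyone
screening_model = dict(tp=8, fp=8, fn=2, tn=82)   # catches 8 of the 10 cancer cases

for name, m in [("always_healthy", always_healthy), ("screening_model", screening_model)]:
    print(name, accuracy(**m), recall(m["tp"], m["fn"]))
# always_healthy 0.9 0.0
# screening_model 0.9 0.8
```

Which metric to prioritize is a business decision: in cancer screening, a missed case (false negative) is usually far costlier than a false alarm.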

The appropriate evaluation metrics depend on the type of problem we are solving. We will discuss regression metrics in Chapter 3, Linear Regression with TensorFlow, and classification metrics in Chapter 4, Classification with TensorFlow.

Error analysis

We are not ready for deployment yet. Remember the 10 percent of errors that could tank our project? We address them here. We perform error analysis to find the misclassified examples and understand why the model missed them. Do we have enough representative samples of these cases in our training data? We have to determine whether we need to collect more data to capture the cases where the model failed. Can we generate synthetic data to cover them? Or were the misclassifications caused by incorrect labels?
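One common way to start error analysis is to list the misses for manual review and compare error rates across slices of the data. The sketch below assumes each example is a dictionary carrying a binary `label` and a hypothetical `scanner` attribute; the examples and predictions are invented. A slice with a much higher error rate hints that it may be under-represented in the training data.

```python
from collections import Counter

def misclassified(examples, predictions):
    """Return the examples whose predicted label differs from the recorded label."""
    return [ex for ex, pred in zip(examples, predictions) if pred != ex["label"]]

def error_rate_by_slice(examples, predictions, key):
    """Error rate for each value of `key` (for example, each data source)."""
    totals = Counter(ex[key] for ex in examples)
    errors = Counter(ex[key] for ex in misclassified(examples, predictions))
    return {value: errors[value] / totals[value] for value in totals}

# Hypothetical test-set examples (1 = cancer, 0 = healthy) tagged with the scanner used
examples = [
    {"id": 1, "label": 1, "scanner": "A"},
    {"id": 2, "label": 0, "scanner": "A"},
    {"id": 3, "label": 1, "scanner": "B"},
    {"id": 4, "label": 1, "scanner": "B"},
    {"id": 5, "label": 0, "scanner": "B"},
    {"id": 6, "label": 0, "scanner": "A"},
]
predictions = [1, 0, 0, 0, 0, 0]

# List each miss so a person can check whether the label itself is wrong
for ex in misclassified(examples, predictions):
    print("review:", ex)
# Compare error rates per slice to spot under-represented groups
print(error_rate_by_slice(examples, predictions, "scanner"))
```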

Wrongly labeled data can hamper a model's performance: the model learns incorrect relationships between the features and the target, performs poorly on unseen data, and becomes unreliable, turning the entire process into a waste of time and resources. Once we resolve these questions and ensure accurate labels, we retrain and reevaluate the model. We repeat these steps until the business objective is achieved, and only then proceed to deploy our model.
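Suspect labels can be surfaced for human review with a simple heuristic: flag the examples where the model confidently predicts the opposite class. This is a sketch under stated assumptions (binary labels, `probs` holding the model's predicted probability of class 1, and a hypothetical 0.9 confidence threshold), not a method from the source. Flagged items should be re-checked by a person rather than relabeled automatically.

```python
def flag_suspect_labels(labels, probs, threshold=0.9):
    """Return indices where the model confidently disagrees with the recorded label.

    labels[i] is 0 or 1; probs[i] is the model's predicted probability of class 1.
    """
    suspects = []
    for i, (label, p) in enumerate(zip(labels, probs)):
        # Confidence the model assigns to the class that is NOT the recorded label
        confidence_against_label = p if label == 0 else 1.0 - p
        if confidence_against_label >= threshold:
            suspects.append(i)
    return suspects

# Hypothetical labels and predicted probabilities
labels = [1, 0, 1, 0, 1]
probs = [0.95, 0.97, 0.05, 0.10, 0.60]
print(flag_suspect_labels(labels, probs))  # [1, 2]
```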

Model deployment and monitoring

After resolving the issues identified in the error analysis step, we can deploy our model to production. Various deployment methods are available: we could deploy our model as a web service, on the cloud, or on edge devices. Model deployment can be challenging as well as exciting, because the entire point of building and training a model is to let end users apply it to solve a pain point. Once the model is deployed, we monitor it to ensure that the business objectives continue to be met, because even the best-performing models can begin to underperform over time due to concept drift and data drift. Hence, after deploying our model, we cannot retire to some island. We need to monitor the model continuously and retrain it when needed so that it keeps performing well.
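Data drift can be monitored by comparing the distribution of a feature in production against its distribution at training time. The sketch below computes the Population Stability Index (PSI), one widely used drift score, with equal-width bins over the training range. The bin count, the small `eps` floor for empty bins, the 0.2 alert threshold, and the sample data are assumptions for illustration. PSI only detects shifts in the inputs (data drift); concept drift, where the relationship between the features and the target changes, also calls for tracking the model's live performance against fresh labels.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so live values outside the training range land in the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays finite
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values seen in training vs. in production
train_amounts = [float(i) for i in range(100)]
live_same = list(train_amounts)
live_shifted = [v + 50 for v in train_amounts]

DRIFT_THRESHOLD = 0.2  # assumed alert level; tune it for your own project
for name, live in [("same", live_same), ("shifted", live_shifted)]:
    score = psi(train_amounts, live)
    print(name, round(score, 3), "drift" if score > DRIFT_THRESHOLD else "ok")
```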

We have now gone through the full ML life cycle at a high level. There is much more that could be discussed in greater depth, but that is beyond the scope of the exam. We will now switch our focus to some exciting use cases where ML can be applied.
