Learn Amazon SageMaker: A guide to building, training, and deploying machine learning models for developers and data scientists, Second Edition

Julien Simon
Paperback Nov 2021 554 pages 2nd Edition

Chapter 1: Introducing Amazon SageMaker

Machine learning (ML) practitioners use a large collection of tools in the course of their projects: open source libraries, deep learning frameworks, and more. In addition, they often have to write their own tools for automation and orchestration. Managing these tools and their underlying infrastructure is time-consuming and error-prone.

This is the very problem that Amazon SageMaker was designed to address (https://aws.amazon.com/sagemaker/). Amazon SageMaker is a fully managed service that helps you quickly build and deploy machine learning models. Whether you're just beginning with machine learning or you're an experienced practitioner, you'll find SageMaker features to improve the agility of your workflows, as well as the performance of your models. You'll be able to focus 100% on the machine learning problem at hand, without spending any time installing, managing, and scaling machine learning tools and infrastructure.

In this first chapter, we're going to learn what the main capabilities of SageMaker are, how they help solve pain points faced by machine learning practitioners, and how to set up SageMaker. This chapter will comprise the following topics:

  • Exploring the capabilities of Amazon SageMaker
  • Setting up Amazon SageMaker on your local machine
  • Setting up Amazon SageMaker Studio
  • Deploying one-click solutions and models with Amazon SageMaker JumpStart

Technical requirements

You will need an AWS account to run the examples included in this chapter. If you haven't got one already, please point your browser to https://aws.amazon.com/getting-started/ to learn about AWS and its core concepts, and to create an AWS account. You should also familiarize yourself with the AWS Free Tier (https://aws.amazon.com/free/), which lets you use many AWS services for free within certain usage limits.

You will need to install and configure the AWS CLI for your account (https://aws.amazon.com/cli/).

You will need a working Python 3.x environment. Installing the Anaconda distribution (https://www.anaconda.com/) is not mandatory but is strongly encouraged as it includes many projects that we will need (Jupyter, pandas, numpy, and more).

Code examples included in the book are available on GitHub at https://github.com/PacktPublishing/Learn-Amazon-SageMaker-second-edition. You will need to install a Git client to access them (https://git-scm.com/).

Exploring the capabilities of Amazon SageMaker

Amazon SageMaker was launched at AWS re:Invent 2017. Since then, a lot of new features have been added: you can see the full (and ever-growing) list at https://aws.amazon.com/about-aws/whats-new/machine-learning.

In this section, you'll learn about the main capabilities of Amazon SageMaker and its purpose. Don't worry, we'll dive deep into each of them in later chapters. We will also talk about the SageMaker Application Programming Interfaces (APIs), and the Software Development Kits (SDKs) that implement them.

The main capabilities of Amazon SageMaker

At the core of Amazon SageMaker is the ability to prepare, build, train, optimize, and deploy models on fully managed infrastructure at any scale. This lets you focus on studying and solving the machine learning problem at hand, instead of spending time and resources on building and managing infrastructure. Simply put, you can go from building to training to deploying more quickly. Let's zoom in on each step and highlight relevant SageMaker capabilities.

Preparing

Amazon SageMaker includes powerful tools to label and prepare datasets:

  • Amazon SageMaker Ground Truth: Annotate datasets at any scale. Workflows for popular use cases are built in (image detection, entity extraction, and more), and you can implement your own. Annotation jobs can be distributed to workers that belong to private, third-party, or public workforces.
  • Amazon SageMaker Processing: Run batch jobs for data processing (and other tasks such as model evaluation) using your own code written with scikit-learn or Spark.
  • Amazon SageMaker Data Wrangler: Using a graphical interface, apply hundreds of built-in transforms (or your own) to tabular datasets, and export them in one click to a Jupyter notebook.
  • Amazon SageMaker Feature Store: Store your engineered features offline in Amazon S3 to build datasets, or online to use them at prediction time.
  • Amazon SageMaker Clarify: Using a variety of statistical metrics, analyze potential bias present in your datasets and models, and explain how your models predict.
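To make the idea of a bias metric more concrete, here is a minimal, hand-computed sketch of class imbalance (CI), one of the pre-training bias metrics described in the SageMaker Clarify documentation: CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the number of samples in two facets (groups) of the dataset. The counts below are made up for illustration. In practice, Clarify computes this metric, and many others, for you.

```python
def class_imbalance(n_a: int, n_d: int) -> float:
    """Class imbalance (CI) between facet a and facet d.

    CI = (n_a - n_d) / (n_a + n_d). It ranges from -1 to 1, and 0 means
    that both facets are equally represented in the dataset.
    """
    total = n_a + n_d
    if total == 0:
        raise ValueError("at least one sample is required")
    return (n_a - n_d) / total


# Hypothetical dataset: 700 rows for one group, 300 rows for the other
print(class_imbalance(700, 300))  # 0.4
```

A value this far from 0 suggests that one group is heavily over-represented, which is worth investigating before training.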

Building

Amazon SageMaker provides you with two development environments:

  • Notebook instances: Fully managed Amazon EC2 instances that come preinstalled with the most popular tools and libraries: Jupyter, Anaconda, and so on.
  • Amazon SageMaker Studio: An end-to-end integrated development environment for machine learning projects, providing an intuitive graphical interface for many SageMaker capabilities. Studio is now the preferred way to run notebooks, and we recommend that you use it instead of notebook instances.

When it comes to experimenting with algorithms, you can choose from the following:

  • A collection of 17 built-in algorithms for machine learning and deep learning, already implemented and optimized to run efficiently on AWS. No machine learning code to write!
  • A collection of built-in, open source frameworks (TensorFlow, PyTorch, Apache MXNet, scikit-learn, and more), where you simply bring your own code.
  • Your own code running in your own container: custom Python, R, C++, Java, and so on.
  • Algorithms and pre-trained models from AWS Marketplace for machine learning (https://aws.amazon.com/marketplace/solutions/machine-learning).
  • Machine learning solutions and state-of-the-art models available in one click in Amazon SageMaker JumpStart.

In addition, Amazon SageMaker Autopilot uses AutoML to automatically build, train, and optimize models without the need to write a single line of machine learning code.

Training

As mentioned earlier, Amazon SageMaker takes care of provisioning and managing your training infrastructure. You'll never spend any time managing servers, and you'll be able to focus on machine learning instead. On top of this, SageMaker brings advanced capabilities such as the following:

  • Managed storage using either Amazon S3, Amazon EFS, or Amazon FSx for Lustre depending on your performance requirements.
  • Managed spot training, using Amazon EC2 Spot instances for training in order to reduce costs by up to 80%.
  • Distributed training automatically distributes large-scale training jobs on a cluster of managed instances, using advanced techniques such as data parallelism and model parallelism.
  • Pipe mode streams arbitrarily large datasets from Amazon S3 to the training instances, removing the need to copy data around.
  • Automatic model tuning runs hyperparameter optimization to deliver high-accuracy models more quickly.
  • Amazon SageMaker Experiments easily tracks, organizes, and compares all your SageMaker jobs.
  • Amazon SageMaker Debugger captures the internal model state during training, inspects it to observe how the model learns, detects unwanted conditions that hurt accuracy, and profiles the performance of your training job.
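The savings from managed spot training can be checked once a job is complete. Here's a minimal sketch, assuming savings are measured by comparing the TrainingTimeInSeconds and BillableTimeInSeconds fields of a DescribeTrainingJob response, as the formula (1 - BillableTimeInSeconds / TrainingTimeInSeconds) * 100 does. The job values are hypothetical, so the code runs without an AWS account.

```python
def spot_savings_percent(job: dict) -> float:
    """Percentage of training time that wasn't billed thanks to spot instances."""
    training_seconds = job["TrainingTimeInSeconds"]
    billable_seconds = job["BillableTimeInSeconds"]
    if training_seconds <= 0:
        raise ValueError("TrainingTimeInSeconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Hypothetical values, shaped like a describe_training_job() response.
# With credentials, you would fetch the real thing with:
# import boto3
# job = boto3.client("sagemaker").describe_training_job(TrainingJobName="my-job")
job = {"TrainingTimeInSeconds": 1000, "BillableTimeInSeconds": 300}
print(f"Managed spot training savings: {spot_savings_percent(job):.1f}%")
```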

Deploying

Just as with training, Amazon SageMaker takes care of all your deployment infrastructure, and brings a slew of additional features:

  • Real-time endpoints create an HTTPS API that serves predictions from your model. As you would expect, autoscaling is available.
  • Batch transform uses a model to predict data in batch mode.
  • Amazon Elastic Inference adds fractional GPU acceleration to CPU-based endpoints to find the best cost/performance ratio for your prediction infrastructure.
  • Amazon SageMaker Model Monitor captures data sent to an endpoint and compares it with a baseline to identify and alert on data quality issues (missing features, data drift, and more).
  • Amazon SageMaker Neo compiles models for a specific hardware architecture, including embedded platforms, and deploys an optimized version using a lightweight runtime.
  • Amazon SageMaker Edge Manager helps you deploy and manage your models on edge devices.
  • Last but not least, Amazon SageMaker Pipelines lets you build end-to-end automated pipelines to run and manage your data preparation, training, and deployment workloads.
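Calling a real-time endpoint boils down to an HTTPS request carrying a serialized payload. As a minimal sketch, the function below prepares the keyword arguments for invoke_endpoint() in boto3's sagemaker-runtime client, assuming a hypothetical endpoint named my-endpoint whose model accepts CSV input (as many built-in algorithms do). It only builds the request, so it runs offline; the actual call is shown as a comment.

```python
from typing import Sequence


def build_invoke_request(endpoint_name: str, rows: Sequence[Sequence[float]]) -> dict:
    """Build keyword arguments for invoke_endpoint(), serializing rows as CSV."""
    # One sample per line, features separated by commas
    body = "\n".join(",".join(str(value) for value in row) for row in rows)
    return {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        "Body": body.encode("utf-8"),
    }


request = build_invoke_request("my-endpoint", [[1.0, 2.5, 3.0], [4.0, 5.0, 6.5]])
print(request["Body"].decode("utf-8"))

# With credentials and a deployed endpoint, you would then send it:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**request)
# print(response["Body"].read())
```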

The Amazon SageMaker API

Just like all other AWS services, Amazon SageMaker is driven by APIs that are implemented in the language SDKs supported by AWS (https://aws.amazon.com/tools/). In addition, a dedicated Python SDK, the SageMaker SDK, is also available. Let's look at both, and discuss their respective benefits.

The AWS language SDKs

Language SDKs implement service-specific APIs for all AWS services: S3, EC2, and so on. Of course, they also include SageMaker APIs, which are documented here: https://docs.aws.amazon.com/sagemaker/latest/dg/api-and-sdk-reference.html.

When it comes to data science and machine learning, Python is the most popular language, so let's take a look at the SageMaker APIs available in boto3, the AWS SDK for the Python language (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html). These APIs are quite low-level and verbose: for example, create_training_job() has a lot of JSON parameters that don't look very obvious. You can see some of them in the next screenshot. You may think that this doesn't look very appealing for everyday machine learning experimentation… and I would totally agree!

Figure 1.1 – A (partial) view of the create_training_job() API in boto3

Indeed, these service-level APIs are not meant to be used for experimentation in notebooks. Their purpose is automation, through either bespoke scripts or Infrastructure as Code tools such as AWS CloudFormation (https://aws.amazon.com/cloudformation) and Terraform (https://terraform.io). Your DevOps team will use them to manage production, where they do need full control over each possible parameter.
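To get a feel for this verbosity, here's a sketch of a minimal create_training_job() request with a single input channel. The job name, training image URI, role ARN, and bucket are hypothetical placeholders. The script prints the request instead of sending it, so it runs without AWS credentials.

```python
import json

# All names below are hypothetical placeholders: replace the job name,
# image URI, role ARN, and bucket with your own values.
request = {
    "TrainingJobName": "my-training-job",
    "AlgorithmSpecification": {
        "TrainingImage": "<training-image-uri>",
        "TrainingInputMode": "File",  # "Pipe" streams data from S3 instead
    },
    "RoleArn": "arn:aws:iam::123456789012:role/MySageMakerRole",
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://my-bucket/my-training-data/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

print(json.dumps(request, indent=2))
# With credentials, the actual call would be:
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```

Every infrastructure detail is spelled out explicitly, which is exactly what automation tools need, and exactly what gets in the way during experimentation.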

So, what should you use for experimentation? You should use the Amazon SageMaker SDK.

The Amazon SageMaker SDK

The Amazon SageMaker SDK (https://github.com/aws/sagemaker-python-sdk) is a Python SDK specific to Amazon SageMaker. You can find its documentation at https://sagemaker.readthedocs.io/en/stable/.

Note

Every effort has been made to check the code examples in this book with the latest SageMaker SDK (v2.58.0 at the time of writing).

Here, the abstraction level is much higher: the SDK contains objects for estimators, models, predictors, and so on. We're definitely back in machine learning territory.

For instance, this SDK makes it extremely easy and comfortable to fire up a training job (one line of code) and to deploy a model (one line of code). Infrastructure concerns are abstracted away, and we can focus on machine learning instead. Here's an example. Don't worry about the details for now:

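import sagemaker
from sagemaker.tensorflow import TensorFlow

# In SageMaker Studio or on a notebook instance, this returns the IAM role
# attached to your environment; elsewhere, use the ARN of a SageMaker role
my_sagemaker_role = sagemaker.get_execution_role()
# Configure the training job (my_script.py holds your TensorFlow code)
my_estimator = TensorFlow(
    entry_point='my_script.py',
    role=my_sagemaker_role,
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    framework_version='2.1.0',
    py_version='py3')
# Train the model
my_estimator.fit('s3://my-bucket/my_training_data/')
# Deploy the model to an HTTPS endpoint
my_predictor = my_estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.2xlarge')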

Now that we know a little more about Amazon SageMaker, let's see how we can set it up.

Setting up Amazon SageMaker on your local machine

A common misconception is that you can't use SageMaker outside of the AWS cloud. Obviously, it is a cloud-based service, and its most appealing capabilities require cloud infrastructure to run. However, many developers like to set up their development environment their own way, and SageMaker lets them do that: in this section, you will learn how to install the SageMaker SDK on your local machine or on a local server. In later chapters, you'll learn how to train and deploy models locally.

It's good practice to isolate Python environments in order to avoid dependency hell. Let's see how we can achieve this using two popular projects: virtualenv (https://virtualenv.pypa.io) and Anaconda (https://www.anaconda.com/).

Installing the SageMaker SDK with virtualenv

If you've never worked with virtualenv before, please read this tutorial before proceeding: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/.

  1. First, let's create a new environment named sagemaker and activate it:
    $ mkdir workdir
    $ cd workdir
    $ python3 -m venv sagemaker
    $ source sagemaker/bin/activate
  2. Now, let's install boto3, the SageMaker SDK, and the pandas library (https://pandas.pydata.org/), which is also required:
    $ pip3 install boto3 sagemaker pandas
  3. Now, let's quickly check that we can import these SDKs into Python:
    $ python3
    Python 3.9.5 (default, May  4 2021, 03:29:30)
    >>> import boto3
    >>> import sagemaker
    >>> print(boto3.__version__)
    1.17.70
    >>> print(sagemaker.__version__)
    2.39.1
    >>> exit()

The installation looks fine. Your own versions will certainly be newer and that's fine. Now, let's run a quick test with a local Jupyter server (https://jupyter.org/). If Jupyter isn't installed on your machine, you can find instructions at https://jupyter.org/install:

  1. First, let's create a Jupyter kernel based on our virtual environment:
    $ pip3 install jupyter ipykernel
    $ python3 -m ipykernel install --user --name=sagemaker
  2. Then, we can launch Jupyter:
    $ jupyter notebook
  3. When we create a new notebook, we can see that the sagemaker kernel is available in the New menu, so let's select it, as seen in the following screenshot:
    Figure 1.2 – Creating a new notebook

  4. Finally, we can check that the SDKs are available by importing them and printing their version, as shown in the following screenshot:
Figure 1.3 – Checking the SDK version

This completes the installation with virtualenv. Don't forget to terminate Jupyter, and to deactivate your virtualenv:

$ deactivate

You can also install the SDK using Anaconda.

Installing the SageMaker SDK with Anaconda

Anaconda includes a package manager named conda that lets you create and manage isolated environments. If you've never worked with conda before, you should read its documentation before proceeding.

We will get started using the following steps:

  1. Let's create and activate a new conda environment named conda-sagemaker:
    $ conda create -y -n conda-sagemaker
    $ conda activate conda-sagemaker
  2. Then, we install pandas, boto3, and the SageMaker SDK. The latter has to be installed with pip as it's not available as a conda package:
    $ conda install -y boto3 pandas
    $ pip3 install sagemaker
  3. Now, let's add Jupyter and its dependencies to the environment, and create a new kernel:
    $ conda install -y jupyter ipykernel
    $ python3 -m ipykernel install --user --name conda-sagemaker
  4. Then, we can launch Jupyter:
    $ jupyter notebook

    Check that the conda-sagemaker kernel is present in the New menu, as is visible in the following screenshot:

    Figure 1.4 – Creating a new conda environment

  5. Just like in the previous section, we can create a notebook using this kernel and check that the SDKs are imported correctly.

This completes the installation with conda. Whether you'd rather use it instead of virtualenv is largely a matter of personal preference. You can definitely run all notebooks in this book and build your own projects with one or the other.
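Whichever tool you choose, the version check we ran interactively can also be scripted, for example at the top of a notebook. Here's a minimal sketch using importlib.metadata from the Python standard library (Python 3.8 or later). It only looks at major versions, since the code in this book targets version 2 of the SageMaker SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(package: str) -> Optional[int]:
    """Return the major version of an installed package, or None if it's missing."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


for name in ("boto3", "sagemaker", "pandas"):
    major = major_version(name)
    if major is None:
        print(f"{name}: not installed")
    else:
        print(f"{name}: major version {major}")

sdk_major = major_version("sagemaker")
if sdk_major is not None and sdk_major < 2:
    print("This book targets SageMaker SDK v2: run pip3 install --upgrade sagemaker")
```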

A word about AWS permissions

AWS Identity and Access Management (IAM) enables you to manage access to AWS services and resources securely (https://aws.amazon.com/iam). Of course, this applies to Amazon SageMaker as well, and you need to make sure that your AWS user has sufficient permissions to invoke the SageMaker API.

IAM permissions

If you're not familiar with IAM at all, please read the following documentation:

https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html

You can run a quick test by using the AWS CLI on one of the SageMaker APIs, for example, list-endpoints. I'm using the eu-west-1 region here, but feel free to use the region that is nearest to you:

$ aws sagemaker list-endpoints --region eu-west-1
{
    "Endpoints": []
}
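This output is plain JSON, so it's easy to reuse in a script. The minimal sketch below extracts endpoint names and statuses from a parsed response; it accepts both the CLI output (after json.loads()) and the dictionary returned by boto3's list_endpoints(). The non-empty sample is hypothetical and only shows the two fields we use.

```python
import json


def endpoint_summary(response: dict) -> list:
    """Return (name, status) pairs from a list-endpoints response."""
    return [
        (endpoint["EndpointName"], endpoint["EndpointStatus"])
        for endpoint in response.get("Endpoints", [])
    ]


# Output of the CLI command above: no endpoints yet
print(endpoint_summary(json.loads('{"Endpoints": []}')))

# Hypothetical output once an endpoint exists (other fields omitted)
sample = '{"Endpoints": [{"EndpointName": "my-endpoint", "EndpointStatus": "InService"}]}'
print(endpoint_summary(json.loads(sample)))

# With credentials, the boto3 equivalent would be:
# import boto3
# client = boto3.client("sagemaker", region_name="eu-west-1")
# print(endpoint_summary(client.list_endpoints()))
```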

If you get an error message complaining about insufficient permissions, you need to update the IAM permissions of your AWS user.

If you own the AWS account in question, you can easily do this yourself in the IAM console by attaching the AmazonSageMakerFullAccess managed policy to your user (or to the role you're using). Note that this policy is extremely permissive: this is fine for a development account, but certainly not for a production account.
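For comparison, here's a sketch of a much narrower inline policy, in the least-privilege spirit you'd want in production. It only allows the two read-only actions needed to list and describe endpoints; the user name and policy name are hypothetical, and a real policy would have to cover whatever your workload actually needs. The script only builds and prints the policy document; attaching it is shown as a comment.

```python
import json

# Hypothetical least-privilege policy: read-only access to endpoints only
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sagemaker:ListEndpoints", "sagemaker:DescribeEndpoint"],
            "Resource": "*",
        }
    ],
}
policy_document = json.dumps(policy, indent=2)
print(policy_document)

# With IAM permissions, you could attach it as an inline user policy:
# import boto3
# boto3.client("iam").put_user_policy(
#     UserName="my-user",
#     PolicyName="SageMakerEndpointsReadOnly",
#     PolicyDocument=policy_document,
# )
```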

If you work with an account where you don't have administrative rights (such as a company-provided account), please contact your IT administrator to add SageMaker permissions to your AWS user.

For more information on SageMaker permissions, please refer to the documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam.html.

Setting up Amazon SageMaker Studio

Experimentation is a key part of the machine learning process. Developers and data scientists use a collection of open source tools and libraries for data exploration, data processing, and, of course, to evaluate candidate algorithms. Installing and maintaining these tools takes a fair amount of time, which would probably be better spent on studying the machine learning problem itself!

Amazon SageMaker Studio brings you the machine learning tools you need from experimentation to production. At its core is an integrated development environment based on Jupyter that makes it instantly familiar.

In addition, SageMaker Studio is integrated with other SageMaker capabilities, such as SageMaker Experiments to track and compare all jobs, SageMaker Autopilot to automatically create machine learning models, and more. A lot of operations can be achieved in just a few clicks, without having to write any code.

SageMaker Studio also further simplifies infrastructure management. You won't have to create notebook instances: SageMaker Studio provides you with compute environments that are readily available to run your notebooks.

Note

This section requires basic knowledge of Amazon S3, Amazon VPC, and AWS IAM. If you're not familiar with them at all, please read the following documentation:

https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html

https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html

https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html

Now would also probably be a good time to take a look at (and bookmark) the SageMaker pricing page: https://aws.amazon.com/sagemaker/pricing/.

Onboarding to Amazon SageMaker Studio

You can access SageMaker Studio using any of these three options:

  • Use the quick start procedure: This is the easiest option for individual accounts, and we'll walk through it in the following paragraphs.
  • Use AWS Single Sign-On (SSO): If your company has an SSO application set up, this is probably the best option. You can learn more about SSO onboarding at https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-sso-users.html. Please contact your IT administrator for details.
  • Use AWS IAM: If your company doesn't use SSO, this is probably the best option. You can learn more about IAM onboarding at https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-iam.html. Again, please contact your IT administrator for details.

Onboarding with the quick start procedure

There are several steps to the quick start procedure:

  1. First, open the AWS Console in one of the regions where Amazon SageMaker Studio is available, for example, https://us-east-2.console.aws.amazon.com/sagemaker/.
  2. As shown in the following screenshot, the left-hand vertical panel has a link to SageMaker Studio:
    Figure 1.5 – Opening SageMaker Studio

  3. Clicking on this link opens the onboarding screen, and you can see its first section in the next screenshot:
    Figure 1.6 – Running Quick start

  4. Let's select Quick start. Then, we enter the username we'd like to use to log in to SageMaker Studio, and we create a new IAM role as shown in the preceding screenshot. This opens the following screen:
    Figure 1.7 – Creating an IAM role

    The only decision we have to make here is whether this role may access specific Amazon S3 buckets. Let's select Any S3 bucket and click on Create role. This is the most flexible setting for development and testing, but we'd want to apply much stricter settings for production. Of course, we can edit this role later on in the IAM console, or create a new one.

  5. Once we've clicked on Create role, we're back to the previous screen. Please make sure that project templates and JumpStart are enabled for this account (this should be the default setting).
  6. We just have to click on Submit to launch the onboarding procedure. Depending on your account setup, you may get an extra screen asking you to select a VPC and a subnet. I'd recommend selecting any subnet in your default VPC.
  7. A few minutes later, SageMaker Studio is in service, as shown in the following screenshot. We could add extra users if we needed to, but for now, let's just click on Open Studio:
    Figure 1.8 – Launching SageMaker Studio

    Don't worry if this takes a few more minutes, as SageMaker Studio needs to complete the first-run setup of your environment. As shown in the following screenshot, once we open SageMaker Studio, we see the familiar JupyterLab layout:

    Note

    SageMaker Studio is a living thing. By the time you're reading this, some screens may have been updated. Also, you may notice small differences from one region to the next, as some features or instance types may not be available in every region.

    Figure 1.9 – SageMaker Studio welcome screen

  8. We can immediately create our first notebook. In the Launcher tab, in the Notebooks and compute resources section, let's select Data Science, and click on Notebook (Python 3).
  9. This opens a notebook, as is visible in the following screenshot. We first check that SDKs are readily available. As this is the first time we are launching the Data Science kernel, we need to wait for a couple of minutes.

    Figure 1.10 – Checking the SDK version

  10. As is visible in the following screenshot, we can easily list resources that are currently running in our Studio instance: an ml.t3.medium instance, the data science image supporting the kernel used in our notebook, and the notebook itself:
    Figure 1.11 – Viewing Studio resources

  11. To avoid unnecessary costs, we should shut these resources down when we're done working with them. For example, we can shut down the instance and all resources running on it, as you can see in the following screenshot. Don't do it now, though: we'll need the instance to run the next examples!
    Figure 1.12 – Shutting down an instance

  12. The default instance type that Studio uses is ml.t3.medium. You can switch to other instance types by clicking on 2 vCPU + 4 GiB at the top of your notebook. This lets you select a new instance size and launch it in Studio. After a few minutes, the instance is up and your notebook code has been migrated automatically. Don't forget to shut down the previous instance, as explained earlier.
  13. When we're done working with SageMaker Studio, all we have to do is close the browser tab. If we want to resume working, we just have to go back to the SageMaker console and click on Open Studio.
  14. If we wanted to shut down the Studio instance itself, we'd simply select Shut Down in the File menu. All files would still be preserved until we deleted Studio completely in the SageMaker console.
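Shutting resources down can also be scripted. Studio kernels run as apps in your Studio domain, which boto3 can list with list_apps() and delete with delete_app(). The sketch below works on a hypothetical list_apps() response: it selects running KernelGateway apps (kernels and the images supporting them) and leaves the JupyterServer app, which serves the Studio interface, untouched. It only prints what it would delete; treat it as an assumption-laden starting point rather than a replacement for the Shut down buttons.

```python
def running_kernel_apps(list_apps_response: dict) -> list:
    """Return delete_app() arguments for every running KernelGateway app."""
    return [
        {
            "DomainId": app["DomainId"],
            "UserProfileName": app["UserProfileName"],
            "AppType": app["AppType"],
            "AppName": app["AppName"],
        }
        for app in list_apps_response.get("Apps", [])
        if app["AppType"] == "KernelGateway" and app["Status"] == "InService"
    ]


# Hypothetical response, shaped like boto3's list_apps() output
response = {
    "Apps": [
        {"DomainId": "d-example", "UserProfileName": "my-user",
         "AppType": "JupyterServer", "AppName": "default", "Status": "InService"},
        {"DomainId": "d-example", "UserProfileName": "my-user",
         "AppType": "KernelGateway", "AppName": "my-kernel-app", "Status": "InService"},
    ]
}

for args in running_kernel_apps(response):
    print(f"Would delete {args['AppType']} app {args['AppName']}")
    # With credentials: boto3.client("sagemaker").delete_app(**args)
```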

Now that we've completed the setup, I'm sure you're impatient to get started with machine learning. Let's start deploying some models!

Deploying one-click solutions and models with Amazon SageMaker JumpStart

If you're new to machine learning, you may find it difficult to get started with real-life projects. You've run all the toy examples, and you've read several blog posts on state-of-the-art models for computer vision or natural language processing. Now what? How can you start using these complex models on your own data to solve your own business problems?

Even if you're an experienced practitioner, building end-to-end machine learning solutions is not an easy task. Training and deploying models is just part of the equation: what about data preparation, automation, and so on?

Amazon SageMaker JumpStart was specifically built to help everyone get started more quickly with their machine learning projects. In literally one click, you can deploy the following:

  • 16 end-to-end solutions for real-life business problems such as fraud detection in financial transactions, explaining credit decisions, predictive maintenance, and more
  • Over 180 TensorFlow and PyTorch models pre-trained on a variety of computer vision and natural language processing tasks
  • Additional learning resources, such as sample notebooks, blog posts, and video tutorials

Time to deploy a solution.

Deploying a solution

Let's begin:

  1. Starting from the icon bar on the left, we open JumpStart. The following screenshot shows the opening screen:
    Figure 1.13 – Viewing solutions in JumpStart

  2. Select Fraud Detection in Financial Transactions. As can be seen in the following screenshot, this is a fascinating example that uses graph data and graph neural networks to predict fraudulent activities based on interactions:
    Figure 1.14 – Viewing solution details

  3. Once we've read the solution details, all we have to do is click on the Launch button. This will run an AWS CloudFormation template in charge of building all the AWS resources required by the solution.

    CloudFormation

    If you're curious about CloudFormation, you may find this introduction useful: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html.

  4. A few minutes later, the solution is ready, as can be seen in the following screenshot. We click on Open Notebook to open the first notebook.
    Figure 1.15 – Opening a solution

  5. As you can see in the following screenshot, we can browse solution files in the left-hand pane: notebooks, training code, and so on:
    Figure 1.16 – Viewing solution files

  6. From then on, you can start running and tweaking the notebook. If you're not familiar with the SageMaker SDK yet, don't worry about the details.
  7. Once you're done, please go back to the solution page and click on Delete all resources to clean up and avoid unnecessary costs, as shown in the following screenshot:
Figure 1.17 – Deleting a solution

As you can see, JumpStart solutions are a great way to explore how to solve business problems with machine learning and to start thinking about how you could do the same in your own business environment.

Now, let's see how we can deploy pre-trained models.

Deploying a model

JumpStart includes over 180 TensorFlow and PyTorch models pre-trained on a variety of computer vision and natural language processing tasks. Let's take a look at computer vision models:

  1. Starting from the JumpStart main screen, we open Vision models, as can be seen in the following screenshot:
    Figure 1.18 – Viewing computer vision models

  2. Let's say that we're interested in trying out object detection models based on the Single Shot Detector (SSD) architecture. We click on the SSD model from the PyTorch Hub (the fourth one from the left).
  3. This opens the model details page, telling us where the model comes from, what dataset it has been trained on, and which labels it can predict. We can also select the instance type on which to deploy the model. Sticking to the default, we click on Deploy to deploy the model on a real-time endpoint, as shown in the following screenshot:
    Figure 1.19 – Deploying a JumpStart model

  4. A few minutes later, the model has been deployed. As shown in the following screenshot, the endpoint status is visible in the left-hand panel, and we simply click on Open Notebook to test it.
    Figure 1.20 – Opening a JumpStart notebook

  5. Clicking through the notebook cells, we download a test image and we predict which objects it contains. Bounding boxes, classes, and probabilities are visible in the following screenshot:
    Figure 1.21 – Detecting objects in a picture

  6. When you're done, please make sure to delete the endpoint to avoid unnecessary charges: simply click on Delete in the endpoint details screen visible in Figure 1.20.
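A common next step after plotting predictions is to keep only the confident ones. Here's a minimal sketch, assuming the endpoint's response has already been parsed into three aligned lists (bounding boxes, class labels, and probabilities). The actual response format of the JumpStart model is shown in its notebook and may differ, and the sample values below are made up.

```python
def filter_detections(boxes, classes, scores, threshold=0.5):
    """Keep detections scoring at least `threshold`, highest score first.

    boxes, classes, and scores are parallel lists: boxes[i] is the bounding
    box of the object labeled classes[i] with probability scores[i].
    """
    kept = [
        (label, score, box)
        for box, label, score in zip(boxes, classes, scores)
        if score >= threshold
    ]
    return sorted(kept, key=lambda detection: detection[1], reverse=True)


# Hypothetical predictions for a test image (coordinates normalized to [0, 1])
boxes = [[0.10, 0.20, 0.45, 0.80], [0.50, 0.30, 0.90, 0.70], [0.05, 0.05, 0.15, 0.12]]
classes = ["dog", "bicycle", "car"]
scores = [0.91, 0.76, 0.12]

for label, score, box in filter_detections(boxes, classes, scores):
    print(f"{label}: {score:.2f} at {box}")
```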

Not only does JumpStart make it extremely easy to experiment with state-of-the-art models, but it also provides you with code that you can readily use in your own projects: loading an image for prediction, predicting with an endpoint, plotting results, and so on.

As useful as pre-trained models are, we often need to fine-tune them on our own datasets. Let's see how we can do that with JumpStart.

Fine-tuning a model

Let's use an image classification model this time:

Note

A word of warning about fine-tuning text models: complex models such as BERT can take a very long time to fine-tune, sometimes several hours per epoch on a single GPU. In addition to the long waiting time, the cost won't be negligible, so I'd recommend avoiding these examples unless you have a real-life business project to work on.

  1. We select the ResNet 18 model (the second from the left in Figure 1.18).
  2. On the model details page, we see that this model can be fine-tuned either on a default dataset available for testing (a TensorFlow dataset with five flower classes) or on our own dataset stored in S3. Scrolling down, we learn about the format that our dataset should have.
  3. As shown in the following figure, we stick to the default dataset. We also leave the deployment configuration and training parameters unchanged. Then, we click on Train to launch the fine-tuning job.
    Figure 1.22 – Fine-tuning a model

  4. After just a few minutes, fine-tuning is complete (which is why I picked this example!). We can see the output path in S3 where the fine-tuned model has been stored. Let's write down that path; we're going to need it in a minute.
    Figure 1.23 – Viewing fine-tuning results

  5. Then, we click on Deploy just like in the previous example. Once the model has been deployed, we open the sample notebook showing us how to predict with the initial pre-trained model.
  6. This notebook uses images from the original dataset that the model was pre-trained on. No problem, let's adapt it! Even if we're not yet familiar with the SageMaker SDK, the notebook is simple enough that we can understand what's going on, and add a few cells to predict a flower image with our fine-tuned model.
  7. First, we add a cell to copy the fine-tuned model artifact from S3, extract it, and display the mapping between class labels and prediction indexes that JumpStart added:
    %%sh
    # Copy the fine-tuned model artifact from the S3 output path noted earlier
    aws s3 cp s3://sagemaker-REGION_NAME-123456789012/smjs-d-pt-ic-resnet18-20210511-142657/output/model.tar.gz .
    tar xfz model.tar.gz
    cat class_label_to_prediction_index.json
    # {"daisy": 0, "dandelion": 1, "roses": 2, "sunflowers": 3, "tulips": 4}
  8. As expected, the fine-tuned model can predict five classes. Let's add a cell to download a sunflower image from Wikipedia:
    %%sh
    wget https://upload.wikimedia.org/wikipedia/commons/a/a9/A_sunflower.jpg
  9. Now, we load the image and invoke the endpoint:
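    import boto3
    # Endpoint hosting the fine-tuned model; replace with your own endpoint name if it differs
    endpoint_name = 'jumpstart-ftd-pt-ic-resnet18'
    # The SageMaker runtime client sends prediction requests to endpoints
    client = boto3.client('sagemaker-runtime')
    # Read the raw image bytes; they are sent as-is with the application/x-image content type
    with open('A_sunflower.jpg', 'rb') as file:
        image = file.read()
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/x-image',
        Body=image)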
  10. Finally, we print out the predictions. The highest probability, about 60.68%, is at index 3, which maps to sunflowers in the class list from step 7, confirming that our image contains a sunflower!
    import json
    # The response body is a stream holding a JSON list of class probabilities, in index order
    model_predictions = json.loads(response['Body'].read())
    print(model_predictions)
    # [0.30362239480018616, 0.06462913751602173, 0.007234351709485054, 0.6067869663238525, 0.017727158963680267]
  11. When you're done testing, please make sure to delete the endpoint to avoid unnecessary charges.
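
Steps 7 and 10 fit together: class_label_to_prediction_index.json gives the position of each label in the probability list returned by the endpoint. The following sketch inverts that mapping and ranks classes by probability, using the values printed above. The `top_predictions` helper is hypothetical, not part of the JumpStart notebook.

```python
import json
from typing import Dict, List, Tuple


def top_predictions(
    label_to_index: Dict[str, int], probabilities: List[float], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most likely (label, probability) pairs."""
    # Invert {"daisy": 0, ...} into {0: "daisy", ...} to look labels up by index.
    index_to_label = {index: label for label, index in label_to_index.items()}
    ranked = sorted(enumerate(probabilities), key=lambda p: p[1], reverse=True)
    return [(index_to_label[i], prob) for i, prob in ranked[:k]]


if __name__ == "__main__":
    # Values copied from the outputs of steps 7 and 10.
    mapping = json.loads(
        '{"daisy": 0, "dandelion": 1, "roses": 2, "sunflowers": 3, "tulips": 4}'
    )
    probs = [0.30362239480018616, 0.06462913751602173, 0.007234351709485054,
             0.6067869663238525, 0.017727158963680267]
    for label, prob in top_predictions(mapping, probs):
        print(f"{label}: {prob:.2%}")  # sunflowers: 60.68%, then daisy, dandelion
```

In the notebook itself, you would load the mapping from the extracted file with `json.load` and pass `model_predictions` from step 10 instead of the hardcoded list.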

This example illustrates how easy it is to fine-tune pre-trained models on your own datasets with SageMaker JumpStart and to use them to predict your own data. This is a great way to experiment with different models and find out which one could work best on the particular problem you're trying to solve.
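
The shell commands in step 7 can also be written in Python, which helps when you want to reuse a fine-tuned artifact outside the sample notebook. This is a sketch that assumes boto3 credentials are configured for the download; `split_s3_uri`, `download_artifact`, and `read_class_mapping` are hypothetical helper names, and the member name is the class_label_to_prediction_index.json file shown in step 7. Reading that member with `tarfile` avoids unpacking the whole archive.

```python
import json
import os
import tarfile
from typing import Dict, Tuple

MAPPING_FILE = "class_label_to_prediction_index.json"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key/path' into ('bucket', 'key/path')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


def download_artifact(s3_client, uri: str, dest: str = "model.tar.gz") -> str:
    """Download the model artifact, like `aws s3 cp <uri> .` in step 7."""
    bucket, key = split_s3_uri(uri)
    s3_client.download_file(bucket, key, dest)
    return dest


def read_class_mapping(tar_path: str) -> Dict[str, int]:
    """Read the label-to-index mapping without extracting the whole archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Match on the base name, so a "./" prefix in the archive also works.
            if member.isfile() and os.path.basename(member.name) == MAPPING_FILE:
                return json.load(tar.extractfile(member))
    raise FileNotFoundError(f"{MAPPING_FILE} not found in {tar_path}")
```

Calling `download_artifact(boto3.client("s3"), uri)` with the output path noted after fine-tuning, then `read_class_mapping` on the result, should give the same mapping that step 7 prints.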

This is the end of the first chapter, and it was already quite action-packed, wasn't it? It's now time to review what we've learned.

Summary

In this chapter, you discovered the main capabilities of Amazon SageMaker, and how they can help solve your machine learning pain points. By providing you with managed infrastructure and pre-installed tools, SageMaker lets you focus on the machine learning problem itself. Thus, you can go more quickly from experimenting with models to deploying them in production.

Then, you learned how to set up Amazon SageMaker on your local machine and in Amazon SageMaker Studio. The latter is a managed machine learning IDE where many other SageMaker capabilities are just a few clicks away.

Finally, you learned about Amazon SageMaker JumpStart, a collection of machine learning solutions and state-of-the-art models that you can deploy in one click, and start testing in minutes.

In the next chapter, we'll see how you can use Amazon SageMaker and other AWS services to prepare your datasets for training.

Key benefits

  • Build, train, and deploy machine learning models quickly using Amazon SageMaker
  • Optimize the accuracy, cost, and fairness of your models
  • Create and automate end-to-end machine learning workflows on Amazon Web Services (AWS)

Description

Amazon SageMaker enables you to quickly build, train, and deploy machine learning models at scale without managing any infrastructure. It helps you focus on the machine learning problem at hand and deploy high-quality models by eliminating the heavy lifting typically involved in each step of the ML process. This second edition will help data scientists and ML developers explore new features such as SageMaker Data Wrangler, Pipelines, Clarify, Feature Store, and much more. You'll start by learning how to use various capabilities of SageMaker as a single toolset to solve ML challenges, and then progress to features such as AutoML, built-in algorithms and frameworks, and writing your own code and algorithms to build ML models. The book will then show you how to integrate Amazon SageMaker with popular deep learning libraries, such as TensorFlow and PyTorch, to extend the capabilities of existing models. You'll also see how automating your workflows can help you get to production faster, with minimum effort and at a lower cost. Finally, you'll explore SageMaker Debugger and SageMaker Model Monitor to detect quality issues in training and production. By the end of this book, you'll be able to use Amazon SageMaker across the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.

Who is this book for?

This book is for software engineers, machine learning developers, data scientists, and AWS users who are new to using Amazon SageMaker and want to build high-quality machine learning models without worrying about infrastructure. Knowledge of AWS basics is required to grasp the concepts covered in this book more effectively. A solid understanding of machine learning concepts and the Python programming language will also be beneficial.

What you will learn

  • Become well-versed with data annotation and preparation techniques
  • Use AutoML features to build and train machine learning models with Autopilot
  • Create models using built-in algorithms and frameworks and your own code
  • Train computer vision and natural language processing (NLP) models using real-world examples
  • Cover training techniques for scaling, model optimization, model debugging, and cost optimization
  • Automate deployment tasks in a variety of configurations using the SageMaker SDK and several automation tools

Product Details

Publication date : Nov 26, 2021
Length: 554 pages
Edition : 2nd
Language : English
ISBN-13 : 9781801817950

Table of Contents

18 Chapters
Section 1: Introduction to Amazon SageMaker
Chapter 1: Introducing Amazon SageMaker
Chapter 2: Handling Data Preparation Techniques
Section 2: Building and Training Models
Chapter 3: AutoML with Amazon SageMaker Autopilot
Chapter 4: Training Machine Learning Models
Chapter 5: Training CV Models
Chapter 6: Training Natural Language Processing Models
Chapter 7: Extending Machine Learning Services Using Built-In Frameworks
Chapter 8: Using Your Algorithms and Code
Section 3: Diving Deeper into Training
Chapter 9: Scaling Your Training Jobs
Chapter 10: Advanced Training Techniques
Section 4: Managing Models in Production
Chapter 11: Deploying Machine Learning Models
Chapter 12: Automating Machine Learning Workflows
Chapter 13: Optimizing Prediction Cost and Performance
Other Books You May Enjoy
