What this book covers
Chapter 1, Deconstructing the Training Process, provides an overview of how the training process works under the hood, describing the training algorithm and covering the phases executed by this process. This chapter also explains how factors such as hyperparameters, operations, and neural network parameters impact the training process’s computational burden.
Chapter 2, Training Models Faster, provides an overview of the possible approaches to accelerate the training process. This chapter discusses how to modify the application and environment layers of the software stack to reduce the training time. Moreover, it explains vertical and horizontal scalability as another option to improve performance by increasing the number of resources.
Chapter 3, Compiling the Model, provides an overview of the novel Compile API introduced in PyTorch 2.0. This chapter covers the differences between eager and graph modes and describes how to use the Compile API to accelerate the model-building process. This chapter also explains the compiling workflow and the components involved in the compiling process.
Chapter 4, Using Specialized Libraries, provides an overview of the libraries used by PyTorch to execute specialized tasks. This chapter describes how to install and configure OpenMP to deal with multithreading and IPEX to optimize the training process on an Intel CPU.
Chapter 5, Building an Efficient Data Pipeline, provides an overview of how to build an efficient data pipeline to keep the GPU working as much as possible. Besides explaining the steps executed on the data pipeline, this chapter describes how to accelerate the data-loading process by optimizing GPU data transfer and increasing the number of workers on the data pipeline.
Chapter 6, Simplifying the Model, provides an overview of how to simplify a model by reducing the number of parameters of the neural network without sacrificing the model’s quality. This chapter describes techniques used to reduce the model complexity, such as model pruning and compression, and explains how to use the Microsoft NNI toolkit to simplify a model easily.
Chapter 7, Adopting Mixed Precision, provides an overview of how to adopt a mixed precision strategy to speed up the model training process without penalizing the model’s accuracy. This chapter briefly explains numeric representation in computer systems and describes how to employ PyTorch’s automatic mixed precision approach.
Chapter 8, Distributed Training at a Glance, provides an overview of the basic concepts of distributed training. This chapter presents the most widely adopted parallelism strategies and describes the basic workflow to implement distributed training in PyTorch.
Chapter 9, Training with Multiple CPUs, provides an overview of how to code and execute distributed training on multiple CPUs in a single machine, using both a general approach and Intel oneCCL, which optimizes the execution on Intel platforms.
Chapter 10, Training with Multiple GPUs, provides an overview of how to code and execute distributed training in a multi-GPU environment on a single machine. This chapter presents the main characteristics of a multi-GPU environment and explains how to code and launch distributed training on multiple GPUs using NCCL, the default communication backend for NVIDIA GPUs.
Chapter 11, Training with Multiple Machines, provides an overview of how to code and execute distributed training on multiple GPUs across multiple machines. Besides an introductory explanation of computing clusters, this chapter shows how to code and launch distributed training across multiple machines using Open MPI as the launcher and NCCL as the communication backend.