Building an Efficient Data Pipeline
Machine learning is grounded in data. Simply put, the training process feeds the neural network large amounts of data, such as images, videos, audio, and text. Thus, apart from the training algorithm itself, data loading is an essential part of the entire model-building process.
Deep learning models deal with huge amounts of data, such as thousands of images or terabytes of text sequences. As a consequence, tasks related to data loading, preparation, and augmentation can severely delay the training process as a whole. To overcome this potential bottleneck in the model-building process, we must guarantee an uninterrupted flow of dataset samples to the training process.
In this chapter, we’ll explain how to build an efficient data pipeline to keep the training process running smoothly. The main idea is to prevent the training process from being stalled by data-related tasks.
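To make the idea concrete, here is a minimal sketch of the pattern an efficient data pipeline relies on: a background worker loads and prepares batches into a bounded buffer while the training loop consumes them, so the two activities overlap instead of alternating. The generator, its simulated loading cost, and the `prefetch` helper are all hypothetical stand-ins for illustration, not the implementation used later in the chapter.

```python
import queue
import threading
import time

def batch_generator(num_batches):
    """Simulate loading and preprocessing one batch at a time."""
    for i in range(num_batches):
        time.sleep(0.01)   # stand-in for disk I/O and augmentation cost
        yield [i] * 4      # a dummy batch of four samples

def prefetch(generator, buffer_size=2):
    """Run the generator in a background thread, buffering batches so
    the consumer (the training loop) rarely waits on data loading."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for batch in generator:
            q.put(batch)
        q.put(sentinel)    # signal that no more batches are coming

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch

if __name__ == "__main__":
    # The "training loop" iterates over batches while the next ones
    # are being prepared in the background.
    for batch in prefetch(batch_generator(5)):
        pass  # a real loop would run a training step here
```

Frameworks provide this machinery out of the box (for example, PyTorch's `DataLoader` with `num_workers > 0` or TensorFlow's `tf.data` with `prefetch`); the sketch only shows the producer-consumer principle behind them.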
Here is what you will learn as part of...