Working with data functions in H2O Flow
An ML pipeline always starts with data. The amount of data you collect and its quality play a crucial role in training high-quality models. If parts of the data have no relationship to one another, or if there is a lot of noise that does not contribute to the relationship you are modeling, the quality of the model will degrade accordingly. Therefore, before training any models, we often perform several preprocessing steps on the data. H2O Flow provides interfaces for all of these steps in its Data operation drop-down list.
We will explore the various data operations and examine what their output looks like, step by step, as we build our ML pipeline using H2O Flow.
So, let’s begin creating our ML pipeline by first importing a dataset.
Importing the dataset
The dataset we will be working with in this chapter will be the Heart Failure Prediction
dataset...