Reframing your dataframe
Data collected from various sources is often termed raw data. It is raw in the sense that it may contain a lot of unnecessary or stale data that does not benefit model training. The structure of the collected data may also be inconsistent across sources. Hence, it is important to first reframe the data from the various sources into a consistent format.
You may have noticed that once we import a dataset into H2O, H2O converts it into a .hex file, also called a dataframe. You can import multiple datasets as well. If you are importing multiple datasets from various sources, each with its own format and structure, you will need functionality that helps you reframe the contents of the datasets and merge them into a single dataframe that you can feed to your ML pipeline.
H2O provides several functionalities that you can use to perform the required manipulations.
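To make the idea concrete, the following is a minimal sketch in Python of one way to bring two differently structured sources into a single dataframe. The helper names (align_columns, combine_sources, demo), the column names, and the sample values are all hypothetical and chosen only for illustration. The sketch assumes the h2o Python package is installed and that a local H2O cluster can be started.

```python
def align_columns(columns, aliases):
    """Map source-specific column names onto one shared naming scheme.

    Names that have no entry in `aliases` are kept unchanged.
    """
    return [aliases.get(name, name) for name in columns]


def combine_sources(frame_a, frame_b, aliases):
    """Rename frame_b's columns to match frame_a, then stack the rows.

    Both arguments are expected to be H2OFrame objects. Note that
    frame_b is renamed in place.
    """
    frame_b.set_names(align_columns(frame_b.names, aliases))
    # rbind expects the same columns in the same order, so reorder frame_b first.
    return frame_a.rbind(frame_b[frame_a.names])


def demo():
    """Build two tiny frames with inconsistent column names and combine them."""
    import h2o  # imported here so the helpers above can be read without a cluster

    h2o.init()  # starts or connects to a local H2O cluster

    # Hypothetical data: two sources describing the same kind of records.
    source_a = h2o.H2OFrame({"id": [1, 2], "age": [34, 51]})
    source_b = h2o.H2OFrame({"customer_id": [3, 4], "age_years": [29, 45]})

    aliases = {"customer_id": "id", "age_years": "age"}
    return combine_sources(source_a, source_b, aliases)
```

Calling demo() should return a single dataframe that holds the rows of both sources under the shared column names. Stacking rows is only one way to combine frames; when the sources describe the same records with different columns, joining them on a shared key with the merge method of H2OFrame may be the better fit.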
Here...