Representation learning – what is it?
Modern ML projects have settled into a fairly standard workflow. Here’s a quick, simplified overview of the steps:
- Convert the business/domain-specific problem into an ML problem (supervised or unsupervised, what metric is being optimized, baseline levels of metrics, and so on).
- Get the data.
- Prepare the data (for example, by deriving new columns from existing ones and imputing missing values).
- Train an ML model on the data and evaluate its performance on the test set. Iterate on this step with new models until a satisfactory performance is achieved.
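As a minimal sketch of the data preparation step in the list above, the following Python snippet imputes a missing value and derives a new column from two existing ones. The toy records, the column names (`height_m`, `weight_kg`, `bmi`), and the helper functions `impute_mean` and `add_bmi` are all hypothetical, chosen only for illustration. Mean imputation is just one of several possible strategies.

```python
from statistics import mean

# Hypothetical toy records: each dict is one row of the dataset.
rows = [
    {"height_m": 1.80, "weight_kg": 81.0},
    {"height_m": 1.60, "weight_kg": None},  # missing value
    {"height_m": 1.70, "weight_kg": 57.8},
]


def impute_mean(rows, column):
    """Return new rows where missing values in `column` are replaced
    by the mean of the observed values in that column."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [
        {**r, column: fill if r[column] is None else r[column]}
        for r in rows
    ]


def add_bmi(rows):
    """Return new rows with a derived column: BMI = weight / height^2."""
    return [
        {**r, "bmi": r["weight_kg"] / r["height_m"] ** 2}
        for r in rows
    ]


# Impute first, then derive, so the new column has no missing inputs.
prepared = add_bmi(impute_mean(rows, "weight_kg"))
```

Both helpers return new lists rather than modifying `rows` in place, which keeps the raw data available for trying alternative preparation choices.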
One of the most important and time-consuming steps in this list is deciding how to create new columns from the existing ones so that the data captures more of the relevant knowledge.
To understand this, let’s consider what a dataset actually is. A row in a dataset is effectively just a record of an event. The different...