Labeling Data for Regression
In this chapter, we will explore the process of labeling data for regression-based machine learning tasks, such as predicting housing prices, in situations where there is insufficient labeled data available for training. Regression tasks involve predicting numerical values from a labeled training dataset, which makes them integral to fields such as finance and economics. In real-world scenarios, however, labeled data is a precious commodity that is often in short supply.
If labeled data is too scarce to train a machine learning model, you can still use summary statistics, semi-supervised learning, and clustering to predict the target labels for your unlabeled data. We will demonstrate this using house price data as an example, generating the predicted labels for house prices programmatically in Python. We will look at different approaches to labeling data for regression using the Snorkel library, semi-supervised...
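As a first taste of the summary-statistics idea, here is a minimal sketch that uses only the Python standard library. It is not the chapter's own implementation: the toy data, the choice of bedroom count as the grouping feature, and the function name label_with_group_means are all assumptions made for illustration. Each unlabeled house receives the mean price of the labeled houses in the same group, with the overall mean as a fallback.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (bedrooms, price). None marks a missing label.
houses = [
    (2, 200_000), (2, 220_000), (3, 300_000),
    (3, 320_000), (2, None), (3, None), (4, None),
]

def label_with_group_means(rows):
    """Fill missing prices with the mean price of labeled rows in the same group."""
    by_group = defaultdict(list)
    for bedrooms, price in rows:
        if price is not None:
            by_group[bedrooms].append(price)
    if not by_group:
        raise ValueError("need at least one labeled row")

    # Overall mean of all labeled prices, used when a group has no labels.
    overall = mean(p for prices in by_group.values() for p in prices)

    labeled = []
    for bedrooms, price in rows:
        if price is None:
            price = mean(by_group[bedrooms]) if bedrooms in by_group else overall
        labeled.append((bedrooms, price))
    return labeled

predicted = label_with_group_means(houses)
for bedrooms, price in predicted:
    print(bedrooms, price)
```

A group mean is a crude label, but it is cheap to compute and gives a baseline against which more elaborate labeling approaches can be compared.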