Using Boruta for feature selection
The Boruta package takes a distinctive approach to feature selection, though it shares some similarities with wrapper methods. For each feature, Boruta creates a shadow feature: a copy of the original feature with its values randomly shuffled, so it has the same values as the original but no relationship with the target. It then fits a tree-based model, typically a random forest, on the original and shadow features together, and checks whether each original feature's importance exceeds that of the best-performing shadow feature. Over repeated iterations, Boruta gradually rejects features that do no better than the shadows. After each iteration, it reports which features are confirmed, tentative, or rejected.
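To make the shadow-feature idea concrete, here is a minimal, simplified sketch of Boruta's core loop written with scikit-learn rather than the Boruta package itself. It assumes synthetic data from make_classification (with shuffle=False, so the first three columns are the informative ones), runs a fixed number of rounds on all features, and applies a plain binomial test with no multiple-testing correction. The helper names shadow_round and simple_boruta are hypothetical. The real Boruta implementation also drops rejected features between iterations and corrects for multiple comparisons.

```python
from math import comb

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def shadow_round(X, y, rng, n_estimators=100):
    """Hypothetical helper: one Boruta-style round. Returns a boolean array
    marking which original features beat the strongest shadow feature."""
    n = X.shape[1]
    # Shadow features: each column shuffled independently, which keeps its
    # values (and distribution) but breaks any link with the target.
    X_shadow = np.column_stack([rng.permutation(X[:, j]) for j in range(n)])
    X_all = np.hstack([X, X_shadow])
    forest = RandomForestClassifier(
        n_estimators=n_estimators, random_state=int(rng.integers(1_000_000))
    )
    forest.fit(X_all, y)
    importances = forest.feature_importances_
    # A "hit" means the real feature outranks the best shadow feature.
    return importances[:n] > importances[n:].max()


def binom_tail_ge(k, n):
    # P(X >= k) for X ~ Binomial(n, 0.5)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


def binom_tail_le(k, n):
    # P(X <= k) for X ~ Binomial(n, 0.5)
    return sum(comb(n, i) for i in range(0, k + 1)) / 2**n


def simple_boruta(X, y, n_rounds=20, alpha=0.05, seed=0):
    """Hypothetical simplified Boruta: count hits over n_rounds, then label
    each feature with a one-sided binomial test against p = 0.5."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        hits += shadow_round(X, y, rng)
    labels = []
    for h in hits:
        if binom_tail_ge(int(h), n_rounds) < alpha:
            labels.append("confirmed")  # beats the shadows too often to be chance
        elif binom_tail_le(int(h), n_rounds) < alpha:
            labels.append("rejected")  # loses to the shadows too often
        else:
            labels.append("tentative")  # not enough evidence either way
    return hits, labels


# Synthetic data: 3 informative features followed by 5 pure-noise features.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0,
    n_repeated=0, shuffle=False, random_state=42,
)
hits, labels = simple_boruta(X, y)
for j, (h, label) in enumerate(zip(hits, labels)):
    print(f"feature {j}: {h}/20 hits -> {label}")
```

With 20 rounds, a feature needs at least 15 hits to be confirmed at alpha = 0.05. The informative features should beat the strongest shadow in essentially every round, while a noise feature only occasionally does, so it usually ends up rejected, though by chance one may land in the tentative band.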
Let's use the Boruta package to select features for a classification model of bachelor's degree completion (you can install the Boruta package with pip if you have not yet installed it):
- We start by loading the necessary libraries:
import pandas as pd
from feature_engine.encoding import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
...