In this section, we will implement the k-nearest neighbors (KNN) algorithm to build a model on our IBM attrition dataset. We already know from the EDA that the dataset has a class imbalance problem. However, we will not treat the class imbalance for now: it is an entire area of its own, with several techniques available, and is therefore out of scope for the ML ensembling topic covered in this chapter. For now, we will take the dataset as is and build ML models. For imbalanced datasets, Kappa, precision and recall, or the area under the receiver operating characteristic curve (AUROC) are the appropriate metrics to use. However, for simplicity, we will use accuracy as the performance metric. We will adopt 10-fold cross validation repeated...
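
To make the point about metrics concrete, here is a minimal Python sketch (not the chapter's own code) that compares accuracy with Cohen's Kappa for a classifier that always predicts the majority class. The labels are hypothetical (84 "No" and 16 "Yes", chosen only for illustration and not taken from the attrition dataset), and the helper names `accuracy` and `cohen_kappa` are our own.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected if the predictions were independent of the truth.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Kappa is undefined here; returning 0.0 is a convention we assume.
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical imbalanced labels, for illustration only.
y_true = ["No"] * 84 + ["Yes"] * 16
# A "model" that ignores its input and always predicts the majority class.
y_majority = ["No"] * len(y_true)

acc = accuracy(y_true, y_majority)
kappa = cohen_kappa(y_true, y_majority)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Under these assumed labels, the majority-class predictor reaches high accuracy while its Kappa is zero, because its agreement with the truth is no better than chance. This is why accuracy should be read with caution on an imbalanced dataset, even when it is used for simplicity.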