Predicting heart disease
We'll put logistic regression to the test on a binary classification task using a real-world data set from the UCI Machine Learning Repository. This time, we will work with the Statlog (Heart) data set, which we will refer to as the heart data set for brevity. The data set can be downloaded from the UCI Machine Learning Repository's website at http://archive.ics.uci.edu/ml/datasets/Statlog+%28Heart%29. The data contain 270 observations of patients with potential heart problems. Of these, 120 patients were shown to have heart problems, so the split between the two classes is fairly even. The task is to predict whether a patient has heart disease based on their profile and a series of medical tests. First, we'll load the data into a data frame and rename the columns according to the website:
> heart <- read.table("heart.dat", quote = "\"")
> names(heart) <- c("AGE", "SEX", "CHESTPAIN", "RESTBP", "CHOL", "SUGAR", "ECG", "MAXHR", "ANGINA", "DEP...
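As a quick sanity check on the class balance quoted above, the following minimal sketch works only from the two counts given in the text (270 patients, 120 with heart problems) rather than from the file itself. The variable names are illustrative assumptions and are not part of the original code.

```r
# Counts taken from the text: 270 patients in total, 120 with heart problems.
n_total   <- 270
n_disease <- 120
n_healthy <- n_total - n_disease   # 150 patients without heart problems

# Share of each class (illustrative names, not from the original code).
class_share <- c(disease = n_disease, healthy = n_healthy) / n_total
print(round(class_share, 3))

# Accuracy of a trivial model that always predicts the larger class;
# a useful floor for judging any fitted classifier.
baseline_accuracy <- max(class_share)
print(round(baseline_accuracy, 3))
```

Roughly 44% of the patients fall in the disease class, so a model that always predicts the absence of heart disease would already be right about 56% of the time. Any logistic regression model we fit should be judged against that floor.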