The Cleveland dataset
The Cleveland Heart Disease dataset (http://archive.ics.uci.edu/dataset/45/heart+disease) is widely used as an example in data analysis and machine learning for predicting the presence or absence of heart disease in patients. In this chapter, we will use it for biostatistical examples.
To proceed with this chapter, please download the dataset using the link provided (the downloaded file should be in .data format). Here is the name of the dataset file found in the .zip file you downloaded: processed.cleveland.data
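As a quick check that the download worked, the file can be read into a data frame. The following is a minimal sketch, assuming Python with pandas is available and that the file sits in the working directory. The function name load_cleveland is hypothetical, and the column names are an assumption taken from the UCI attribute documentation (the file itself has no header row). Missing values in this file are marked with a question mark, so they are converted to NaN on load.

```python
import pandas as pd

# Assumed column names, following the UCI documentation for the
# 14 processed attributes; the file has no header row of its own.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_cleveland(source):
    """Read processed.cleveland.data (a path or file-like object).

    Hypothetical helper: returns a DataFrame with named columns and
    "?" entries converted to NaN.
    """
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
```

Calling load_cleveland("processed.cleveland.data") and then inspecting df.shape and df.isna().sum() should show whether all 14 columns were read and which of them contain missing values.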
Here is the citation for the dataset: Janosi, Andras, Steinbrunn, William, Pfisterer, Matthias, and Detrano, Robert. (1988). Heart Disease. UCI Machine Learning Repository. https://doi.org/10.24432/C52P4X.
Before proceeding with analyzing the variables, let’s first explore the main topic in this project, which is coronary artery disease (CAD). This specific form of heart disease is characterized by 50% or more congestion of the heart’...