Gender classification – clustering to classify
The following data is taken from the gender classification example, Problem 6, Chapter 2, Naive Bayes:
Height in cm | Weight in kg | Hair length | Gender |
180 | 75 | Short | Male |
174 | 71 | Short | Male |
184 | 83 | Short | Male |
168 | 63 | Short | Male |
178 | 70 | Long | Male |
170 | 59 | Long | Female |
164 | 53 | Short | Female |
155 | 46 | Long | Female |
162 | 52 | Long | Female |
166 | 55 | Long | Female |
172 | 60 | Long | ? |
To keep things simple, we will remove the Hair length column. We will also remove the Gender column, since we want to cluster the people in the table based only on their height and weight. Using clustering, we would then like to establish whether the eleventh person in the table is more likely to be a man or a woman:
Height in cm | Weight in kg |
180 | 75 |
174 | 71 |
184 | 83 |
168 | 63 |
178 | 70 |
170 | 59 |
164 | 53 |
155 | 46 |
162 | 52 |
166 | 55 |
172 | 60 |
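The column removal described above is simple enough to sketch directly. This minimal Python illustration assumes the original table is held as a list of tuples. The names TABLE, features and genders are hypothetical and do not come from the original text. The genders are set aside rather than discarded, because the known labels can help interpret the clusters afterwards.

```python
# Hypothetical representation of the original table:
# (height in cm, weight in kg, hair length, gender); None marks the unknown gender.
TABLE = [
    (180, 75, "Short", "Male"),
    (174, 71, "Short", "Male"),
    (184, 83, "Short", "Male"),
    (168, 63, "Short", "Male"),
    (178, 70, "Long", "Male"),
    (170, 59, "Long", "Female"),
    (164, 53, "Short", "Female"),
    (155, 46, "Long", "Female"),
    (162, 52, "Long", "Female"),
    (166, 55, "Long", "Female"),
    (172, 60, "Long", None),
]

# Drop the Hair length and Gender columns, keeping only height and weight.
features = [(height, weight) for height, weight, _hair, _gender in TABLE]

# Keep the genders separately: they are not used for clustering itself.
genders = [gender for *_, gender in TABLE]

print(features[-1])  # the person whose gender we want to infer
```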
Analysis
We could scale the initial data, but to keep things simple, we will use the unscaled data in the algorithm. We will group the data into two clusters, since there are two possible genders: male or female. Then, we will...
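The two-cluster step can be sketched roughly as follows. This is a minimal illustration, not necessarily the procedure the text goes on to describe. It assumes plain k-means (Lloyd's algorithm) with Euclidean distance on the unscaled data. The starting centroids (180, 75) and (155, 46) are a hypothetical choice, taken from the first person and the shortest person. To name the clusters, the sketch assumes a majority vote over the members whose gender is known, so the eleventh person inherits the label of the cluster it joins.

```python
import math

# Height (cm) and weight (kg) of the eleven people; the last one is unknown.
PEOPLE = [
    (180, 75), (174, 71), (184, 83), (168, 63), (178, 70),
    (170, 59), (164, 53), (155, 46), (162, 52), (166, 55),
    (172, 60),
]
# Known genders of the first ten people, used only to name the clusters.
KNOWN_GENDERS = ["Male"] * 5 + ["Female"] * 5


def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, centroids, max_iterations=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [closest(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # no point changed cluster, so the clustering is stable
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


# Hypothetical starting centroids: the first person and the shortest person.
assignment, centroids = k_means(PEOPLE, [(180, 75), (155, 46)])

# Name each cluster after the majority gender among its labelled members
# (zip stops after the ten known genders, so the eleventh person is excluded).
cluster_labels = {}
for k in set(assignment):
    members = [g for g, a in zip(KNOWN_GENDERS, assignment) if a == k]
    cluster_labels[k] = max(set(members), key=members.count) if members else "Unknown"

print(assignment)                      # → [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
print(cluster_labels[assignment[-1]])  # → Male
```

Under these assumptions, the eleventh person falls into the cluster that also contains the five men and the 170 cm, 59 kg woman, so the sketch labels that person Male. Because the woman ends up grouped with the men, height and weight alone do not separate the two groups perfectly here. Different starting centroids or scaled data could produce a different grouping.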