Determining the number of factors
Factor analysis can produce as many factors as there are variables in the dataset. However, most of the valuable information is usually contained in only a few of those factors. This means that by keeping only a few factors, we can still get a good representation of our data.
A scree plot can be used to determine the number of factors. It plots each factor against its eigenvalue, which gives us a sense of how much variance (information) in our dataset that factor explains. The rule of thumb, known as the Kaiser criterion, is to keep the factors whose eigenvalues are greater than 1. Because our variables are typically standardized, each variable has a variance of 1. A useful factor should therefore explain more variance than a single variable does, since a factor is meant to be a combination of variables.
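To make the eigenvalue rule concrete, here is a minimal sketch that uses only NumPy. The dataset is synthetic and purely an assumption for illustration: six variables driven by two hidden factors plus noise. The names `latent`, `loadings`, and `X` are hypothetical and are not part of any library. The sketch takes the eigenvalues of the correlation matrix (equivalent to working with standardized variables) and counts how many exceed 1.

```python
import numpy as np

# Synthetic data (an assumption for illustration only):
# 6 observed variables driven by 2 hidden factors plus random noise.
rng = np.random.default_rng(0)
n_samples = 500
latent = rng.normal(size=(n_samples, 2))
loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.9],
    [0.0, 0.8],
    [0.0, 0.7],
])
X = latent @ loadings.T + 0.5 * rng.normal(size=(n_samples, 6))

# The correlation matrix is the covariance matrix of the standardized
# variables, so each variable contributes a variance of exactly 1.
corr = np.corrcoef(X, rowvar=False)

# eigvalsh handles symmetric matrices and returns ascending values,
# so reverse them to get the largest eigenvalue first.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Rule of thumb: keep the factors whose eigenvalue is greater than 1.
n_factors = int(np.sum(eigenvalues > 1))

# Share of the total variance carried by each factor, and the running total.
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)

print("Eigenvalues:", np.round(eigenvalues, 2))
print("Factors with eigenvalue > 1:", n_factors)
print("Cumulative share of variance:", np.round(cumulative, 2))
```

The eigenvalues of a correlation matrix always add up to the number of variables, which is why 1 is the natural cut-off: it is the share a single standardized variable would contribute on its own. With this synthetic setup, the two planted factors should be the only ones whose eigenvalues exceed 1.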
We will explore how to check the optimal number of factors using the factor_analyzer and matplotlib libraries.