Principal component analysis and total variance
Simply put, PCA is a tool for reducing the dimensionality of data. That sounds simple enough, but it is worth discussing in a bit more detail. Suppose we start with a dataset that has a large number of variables, say 100. The question naturally arises whether we really need all 100 variables, or whether the data contain enough redundancy to be summarized with fewer. Here, redundancy does not mean that one measurement completely duplicates another, but rather that the variables overlap to a significant degree.
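To make the idea of redundancy concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the data are synthetic, the choice of 2 hidden factors behind 100 observed variables is arbitrary, and names such as latent and loadings are hypothetical. It computes the variance carried by each principal component from a singular value decomposition of the centered data and compares the first two components against the total variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 500 observations, 100 variables, 2 hidden factors
n_obs, n_vars, n_latent = 500, 100, 2

# Synthetic data: every observed variable is a mix of the same 2 factors plus a
# little noise, so the 100 variables overlap heavily without being duplicates
latent = rng.normal(size=(n_obs, n_latent))
loadings = rng.normal(size=(n_latent, n_vars))
X = latent @ loadings + 0.1 * rng.normal(size=(n_obs, n_vars))

# Center each variable, then take the singular values of the centered data
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)

# Variance along each principal component; their sum is the total variance,
# which equals the sum of the individual variable variances
component_variance = singular_values**2 / (n_obs - 1)
total_variance = component_variance.sum()
explained_ratio = component_variance / total_variance

print(f"Total variance: {total_variance:.2f}")
print(f"Share held by the first 2 components: {explained_ratio[:2].sum():.3f}")
```

With data built this way, nearly all of the total variance should fall on the first two components, which is the sense in which 100 overlapping variables can be summarized with far fewer. Real data are rarely this clean, so the share captured by the leading components will usually be lower.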
We will get to some real data in a moment, but for now, let's consider the voting results on Senate bills in the United States. Each bill is voted on by up to 100 senators (some may abstain), and we want to get an idea of how each bill was received by the voting members of the Senate. There are two major political parties in the United States, Democratic and Republican, so there may be a lot of redundancy in the votes. In fact, ample research shows that...