Recall from previous chapters that problems involving high-dimensional data can be affected by the curse of dimensionality. As the number of dimensions of a dataset increases, the number of samples an estimator needs in order to generalize increases exponentially. Acquiring that much data may be infeasible in some applications, and learning from large datasets requires more memory and processing power. Furthermore, data tend to become sparser as their dimensionality grows. Detecting similar instances in high-dimensional space can become more difficult, because all of the instances are similarly sparse and the distances between them become less informative.
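The claim that similar instances become harder to detect can be checked numerically. The following is a minimal sketch, assuming points drawn uniformly from the unit hypercube and Euclidean distance; the function `nearest_to_farthest_ratio` is a hypothetical helper written for illustration. It measures how the distance from a query point to its nearest neighbor compares with the distance to its farthest neighbor as the number of dimensions grows.

```python
import numpy as np

def nearest_to_farthest_ratio(n_dims, n_points=500, seed=0):
    """Hypothetical helper: ratio of the nearest to the farthest Euclidean
    distance from one random query point to n_points random samples,
    all drawn uniformly from the unit hypercube [0, 1]^n_dims."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return distances.min() / distances.max()

# A ratio close to 1 means the nearest and farthest points are almost
# equally far away, so "nearest" carries little information.
for d in (2, 10, 100, 1000):
    print(d, round(nearest_to_farthest_ratio(d), 3))
```

In two dimensions the nearest point is far closer than the farthest one, while in a thousand dimensions the ratio approaches 1, which is one concrete way of seeing why neighbor-based reasoning degrades in sparse, high-dimensional spaces.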
Principal component analysis (PCA), also known as the Karhunen-Loève Transform (KLT), is a technique for finding patterns in high-dimensional data. PCA is commonly used to explore and visualize high-dimensional datasets. It can also be used to compress data and to process data before it...
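To make the idea concrete, here is a minimal from-scratch sketch of PCA using NumPy, assuming the common formulation that centers the data, computes the feature covariance matrix, and projects onto the eigenvectors with the largest eigenvalues. The function name `pca` and the synthetic dataset are hypothetical and chosen only for illustration: 200 samples in five dimensions that vary mostly along two hidden directions, so two principal components should capture nearly all of the variance.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.
    Returns the projected data, the component vectors (as columns), and the
    fraction of total variance explained by each kept component."""
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (n_features x n_features)
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # sort descending by variance
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order[:n_components]]
    explained_variance_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return X_centered @ components, components, explained_variance_ratio

# Hypothetical data: 5-D points generated from 2 latent variables plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

X_reduced, components, ratio = pca(X, n_components=2)
print(X_reduced.shape)  # two columns per sample, suitable for a 2-D scatter plot
print(ratio.sum())      # share of the total variance retained by 2 components
```

The reduced data can be plotted for exploration, or stored instead of the original five columns as a compressed representation. In practice, scikit-learn's `sklearn.decomposition.PCA` provides the same projection (computed via a singular value decomposition rather than an explicit covariance matrix), so a hand-written version like this is mainly useful for understanding the mechanics.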