Part 4 Exercise – Performing Statistics for Biology Studies in Python
In this chapter, you will learn how to perform biostatistical analysis using advanced methods such as Principal Component Analysis (PCA), random forests, latent variable modeling, and others. High dimensionality (having a large number of biological variables) is a common feature of real-world biological datasets. This is often an advantage, because more variables can carry more information and lead to more insights. Sometimes, however, we want to summarize the data in fewer dimensions than the original so that it is easier to understand. The methods used for this are called dimensionality reduction, and they are especially important in studies involving genetics and protein analysis. In this chapter, you will learn how to reduce dimensionality in practice and perform PCA in Python using a real-world mice protein dataset from a Down syndrome study.
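To make the idea of dimensionality reduction concrete before working with the real data, here is a minimal sketch of PCA written with NumPy only. It is an illustration under stated assumptions, not the chapter's own implementation: the `pca` helper is a hypothetical name, and the data are synthetic stand-ins (10 correlated variables generated from 2 hidden factors), not the mice protein dataset.

```python
import numpy as np


def pca(X, n_components):
    """Project X (samples x features) onto its top principal components.

    Hypothetical helper for illustration; returns the projected scores,
    the principal directions, and the explained-variance ratios.
    """
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered from largest to smallest singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of every sample in the reduced space.
    scores = X_centered @ components.T
    # Share of the total variance captured by each component.
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return scores, components, explained_ratio[:n_components]


# Synthetic stand-in data (NOT the mice protein dataset):
# 100 samples, 10 correlated variables driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.1 * rng.normal(size=(100, 10))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)  # (100, 2): ten variables summarized by two components
print(ratio)         # fraction of variance explained by each component
```

Because the ten synthetic variables are built from only two underlying factors, two components should capture nearly all of the variance. With real measurements on different scales, variables are commonly standardized before PCA, and a library implementation such as scikit-learn's `PCA` class is a typical choice in practice.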
Further, you will learn how to identify the unknown...