Chapter 5. Handling Missing Values and Correlation Analysis
Note
Learning Objectives
By the end of this chapter, you will be able to:
Detect and handle missing values in data using PySpark
Describe correlations between variables
Compute correlations between two or more variables in PySpark
Create a correlation matrix using PySpark
Note
In this chapter, we will use the Iris dataset to practice handling missing data and to find correlations between its variables.