Analyzing associations among multiple variables – correlations in Python
We performed the chi-square test, which is well suited to testing categorical variables for independence. But what if we want to work with the raw continuous variables and test several associations between them at once, using a simple parametric approach that assumes continuous, approximately normally distributed data? A straightforward option here is correlation analysis with the Pearson correlation coefficient. This coefficient measures the strength and direction of the linear relationship between two variables on a scale from -1 to +1.
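To make the coefficient less of a black box, here is a minimal sketch that computes Pearson's r directly from its definition: the covariance of the two variables divided by the product of their standard deviations. It then compares the result with pandas' built-in `Series.corr`. The helper name `pearson_r` and the toy numbers are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson correlation from its definition (hypothetical helper):
    sum of co-deviations divided by the square root of the product
    of the sums of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Toy measurements, made up purely to illustrate the arithmetic
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_manual = pearson_r(x, y)
r_pandas = pd.Series(x).corr(pd.Series(y), method="pearson")
print(round(r_manual, 4), round(r_pandas, 4))  # prints: 0.7746 0.7746
```

Here the co-deviations sum to 6 and the squared deviations sum to 10 and 6, so r = 6 / sqrt(60), or about 0.7746. This is a fairly strong positive linear association.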
Let’s perform the correlation analysis:
import pandas as pd

# Load the data
data = pd.read_csv(r'C:\Users\KORISNIK\Downloads\Dataset of Diabetes .csv')

# Filter data for rows with 'Y' or 'N' values in 'CLASS' column
filtered_data = data[data['CLASS'].isin(['Y', 'N'])]

# Drop the first two columns
filtered_data = filtered_data.iloc[:, 2:]

# Compute the correlation matrix
corr_matrix = filtered_data.corr( ...
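The listing above is cut off at the `corr(` call, so its exact arguments are not shown. The sketch below runs the same steps end to end on synthetic data, so it needs no CSV file. Every column name and value in it is an assumption for illustration, not the real dataset. The sketch also shows one detail worth watching: since pandas 2.0, `DataFrame.corr` no longer silently skips text columns. Any string columns that survive the slice (for example `CLASS`, if it is not among the first two columns) have to be excluded, here with `select_dtypes`. Finally, the variable pairs are listed from strongest to weakest correlation.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Synthetic stand-in for the diabetes table; column names are hypothetical
rng = np.random.default_rng(0)
n = 200
age = rng.normal(50, 10, n)
bmi = 20 + 0.15 * age + rng.normal(0, 2, n)   # built to correlate with age
hba1c = 4 + 0.2 * bmi + rng.normal(0, 1, n)   # built to correlate with bmi
data = pd.DataFrame({
    "ID": np.arange(n),
    "Gender": rng.choice(["M", "F"], n),
    "AGE": age,
    "BMI": bmi,
    "HbA1c": hba1c,
    # extra label 'P' so the Y/N filter has something to remove
    "CLASS": rng.choice(["Y", "N", "P"], n),
})

# Keep only the 'Y' and 'N' rows, as in the listing above
filtered = data[data["CLASS"].isin(["Y", "N"])]

# Drop the identifier and keep numeric columns only, because recent
# pandas raises an error when corr() meets text columns
numeric = filtered.drop(columns=["ID"]).select_dtypes(include="number")
corr_matrix = numeric.corr(method="pearson")
print(corr_matrix.round(2))

# Each unordered variable pair once, sorted by absolute correlation
pairs = pd.Series({
    (a, b): corr_matrix.loc[a, b]
    for a, b in combinations(corr_matrix.columns, 2)
}).sort_values(key=abs, ascending=False)
print(pairs.round(2))
```

The pair listing uses `itertools.combinations` instead of reshaping the matrix. This choice reports each pair exactly once and leaves out the trivial diagonal of 1s.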