Performing and visualizing linear regression in Python
Using the diabetes dataset, we will now address the main research question: is there an association between urea and HbA1C in subjects with diabetes (CLASS: Y)? We will then present the results visually using a regression plot and check whether gender, age, or high-density lipoprotein (HDL) influences the results. Before coding the linear regression, let’s define the model using a simple diagram:
Figure 8.7 – Predictive analysis, linear regression of HbA1C as dependent variable
You can see that gender and age are added to the model scheme. This adjusts the results for age and gender, which is routinely done in biostatistical analyses when such data are available and these factors could influence the results.
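To make the idea of adjustment concrete, here is a minimal sketch of a linear model with HbA1C as the dependent variable and urea, HDL, age, and gender as predictors. It is an illustration only: the data are synthetic stand-ins generated with NumPy (not the diabetes dataset), the variable names and "true" coefficients are hypothetical, and the fit uses plain least squares via `np.linalg.lstsq` rather than a dedicated statistics library.

```python
import numpy as np

# Synthetic stand-in data (hypothetical values, NOT the diabetes dataset).
rng = np.random.default_rng(0)
n = 500
urea = rng.normal(5.0, 2.0, n)
hdl = rng.normal(1.2, 0.4, n)
age = rng.normal(55.0, 8.0, n)
male = rng.integers(0, 2, n).astype(float)  # gender coded as 0/1

# Design matrix: intercept, the two predictors of interest (urea, HDL),
# and the two adjustment covariates (age, gender).
X = np.column_stack([np.ones(n), urea, hdl, age, male])

# Simulate HbA1C from assumed coefficients plus a little noise,
# so we can check that the fit recovers them.
true_beta = np.array([4.0, 0.30, -0.50, 0.02, 0.10])
hba1c = X @ true_beta + rng.normal(0.0, 0.1, n)

# Ordinary least squares: find beta minimizing ||X @ beta - hba1c||^2.
beta_hat, *_ = np.linalg.lstsq(X, hba1c, rcond=None)

names = ["intercept", "urea", "hdl", "age", "male"]
for name, b in zip(names, beta_hat):
    print(f"{name:>9}: {b: .3f}")
```

Because age and gender sit in the same design matrix as urea and HDL, the urea coefficient is the expected change in HbA1C per unit of urea with the other variables held fixed. This is what "adjusting for age and gender" means in this kind of model.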
The variables that are interesting from the perspective of predictor potential are HDL and urea in this example. They...