Logistic regression in Python
In linear regression, we predicted a continuous dependent variable such as HbA1c. But what if the variable is categorical and records a diagnosis, such as the presence or absence of diabetes in subjects? For this model, we can use the CLASS variable, which contains the diagnosed diabetes and control categories. However, we also need a statistical method that is better suited to this analysis. A linear model does not work well here because a categorical outcome takes only a fixed set of values (here, two), so it does not follow a straight line the way a continuous variable can, and a fitted line can produce predictions below 0 or above 1. For this reason, it's better to use a statistical method called logistic regression. The logistic regression model fits a sigmoid (S-shaped) probability function, which can be used to predict whether a subject has diabetes based on other parameters.
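As a rough illustration of the idea, the sketch below computes the sigmoid by hand and turns it into a class prediction. The intercept and slope are hypothetical values chosen only so that the curve crosses 0.5 at an HbA1c of 6.5; they are not fitted to any dataset, and using HbA1c as the single predictor is an assumption made for this example.

```python
import math

def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    # Two algebraically equivalent forms, chosen to avoid overflow in exp()
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical coefficients for illustration only (not estimated from data)
intercept = -13.0
slope = 2.0  # change in log-odds per one-unit increase in HbA1c

def predict_probability(hba1c):
    """Probability of the diabetes class for a given HbA1c value."""
    return sigmoid(intercept + slope * hba1c)

for value in (4.5, 6.5, 8.5):
    p = predict_probability(value)
    # Classify as diabetes when the probability reaches the 0.5 threshold
    label = "diabetes" if p >= 0.5 else "control"
    print(f"HbA1c={value}: p={p:.3f} -> {label}")
```

The linear part, intercept + slope * hba1c, can take any value, while the sigmoid squeezes it into the range from 0 to 1, which is why the output can be read as a probability. In a real analysis, the coefficients would be estimated from the data rather than set by hand.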
To see how the sigmoid function works, let's look at a practical example:
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
...