Problems with linear regression
In this chapter, we've already seen some examples where trying to build a linear regression model might run into problems. One large class of problems that we've discussed relates to our model assumptions: linearity, feature independence, and the homoscedasticity and normality of errors. In particular, we saw ways of diagnosing these problems, either via plots, such as the residual plot, or by using functions that identify dependent components. In this section, we'll investigate a few more issues that can arise with linear regression.
Multicollinearity
As part of our preprocessing steps, we were careful to remove features that were linearly related to each other. In doing so, we were looking for an exact linear relationship, which is an example of perfect collinearity. More generally, collinearity describes the situation in which two features are approximately in a linear relationship. This creates a problem for linear regression as we are trying...
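To make the distinction between perfect and approximate collinearity concrete, here is a minimal sketch in Python using only the standard library. The data is synthetic and the names (pearson, x1, x2_exact, x2_near, x3) are hypothetical, chosen purely for illustration; they do not come from the data sets used in this chapter. The sketch measures the Pearson correlation between pairs of features: a value of exactly 1 or -1 signals perfect collinearity, while a value very close to 1 or -1 suggests an approximate linear relationship.

```python
import math
import random


def pearson(x, y):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the synthetic data is reproducible

# A synthetic feature (assumption: standard normal values).
x1 = [random.gauss(0, 1) for _ in range(200)]

# Perfect collinearity: x2_exact is an exact linear function of x1.
x2_exact = [2.0 * a + 1.0 for a in x1]

# Approximate collinearity: the same linear function plus a little noise.
x2_near = [2.0 * a + 1.0 + random.gauss(0, 0.1) for a in x1]

# An unrelated feature, drawn independently of x1.
x3 = [random.gauss(0, 1) for _ in range(200)]

r_exact = pearson(x1, x2_exact)
r_near = pearson(x1, x2_near)
r_unrelated = pearson(x1, x3)

print("exact linear relationship:      ", round(r_exact, 4))
print("approximate linear relationship:", round(r_near, 4))
print("independent features:           ", round(r_unrelated, 4))
```

A pairwise correlation check like this only detects relationships between two features at a time. A feature that is an approximate linear combination of several others can go unnoticed, so treat this as a first diagnostic rather than a complete test.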