Variable transformation
Some machine learning models, such as linear and logistic regression, tend to perform better when the variables are approximately normally distributed. In real datasets, however, variables are more likely to follow a skewed distribution.
By applying transformations that map a skewed distribution closer to a normal one, we can often improve the performance of these models.
Plotting a histogram or a Q-Q plot can show whether a variable is approximately normally distributed or skewed.
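As a numeric complement to eyeballing the plots, here is a minimal sketch using only the Python standard library. It assumes synthetic data (a normal sample and a log-normal, right-skewed sample), and the helpers `skewness`, `qq_points`, and `qq_correlation` are hypothetical names for illustration. `qq_points` produces the (theoretical quantile, observed value) pairs you would draw in a Q-Q plot, and `qq_correlation` summarizes how close those points lie to a straight line.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def skewness(values):
    """Population (Fisher-Pearson) skewness: the mean of cubed z-scores."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def qq_points(values):
    """Pair each sorted value with the matching standard-normal quantile.

    Plotting these pairs gives a Q-Q plot; points close to a straight
    line suggest the data is approximately normal.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # (i - 0.5) / n plotting positions avoid the infinite quantiles at 0 and 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; values near 1 indicate near-normality."""
    pts = qq_points(values)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pts)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic example data (assumption): one symmetric sample, one right-skewed sample
random.seed(0)
normal_data = [random.gauss(50, 10) for _ in range(1000)]
skewed_data = [random.lognormvariate(0, 1) for _ in range(1000)]

for name, data in [("normal", normal_data), ("skewed", skewed_data)]:
    print(f"{name}: skew={skewness(data):.2f}, qq_r={qq_correlation(data):.3f}")
```

A skewness near 0 and a Q-Q correlation near 1 point toward a roughly normal variable; a large positive skewness indicates a right-skewed one. In practice you would still plot the histogram and the Q-Q points, since a single number can hide features such as outliers or multiple peaks.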
Next, we will look at four methods you can use to adjust your data distribution.
Logarithmic transformation
The logarithmic transformation is one of the simplest and most popular transformations. It is a strong transformation that substantially changes the shape of a distribution.
We can apply it with the natural logarithm (ln) or the base-10 logarithm to make highly skewed distributions less skewed, particularly right-skewed (positively skewed) ones. Because the logarithm is defined only for positive numbers, it can be applied directly only to variables whose values are all strictly positive.
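The sketch below illustrates the effect on a right-skewed variable. It is an assumption-laden illustration, not a reference implementation: the data is synthetic log-normal "income" values, it uses only the standard library `math.log(x, base)` rather than NumPy or pandas, and `log_transform` and `skewness` are hypothetical helper names.

```python
import math
import random
from statistics import mean, pstdev


def skewness(values):
    """Population (Fisher-Pearson) skewness: the mean of cubed z-scores."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def log_transform(values, base=math.e):
    """Return log_base(x) for each value; all values must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v, base) for v in values]


# Synthetic, hypothetical right-skewed variable (log-normal "income")
random.seed(42)
income = [random.lognormvariate(10, 0.8) for _ in range(1000)]

ln_income = log_transform(income)               # natural logarithm
log10_income = log_transform(income, base=10)   # base-10 logarithm

print(f"raw skew:   {skewness(income):.2f}")
print(f"ln skew:    {skewness(ln_income):.2f}")
print(f"log10 skew: {skewness(log10_income):.2f}")
```

The raw variable has a large positive skewness, while both transformed versions are close to symmetric. The choice of base does not change the shape: since $\log_{10} x = \ln x / \ln 10$, the two results differ only by a constant factor, and skewness is unaffected by rescaling. With NumPy arrays, `numpy.log` and `numpy.log10` apply the same transformation element-wise.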
...