The learning rate is the mother of all hyperparameters and quantifies the model's learning progress in a way that can be used to optimize its capacity.
A too-low learning rate would increase the training time of the model as it would take longer to incrementally change the weights of the network to reach an optimal state. On the other hand, although a large learning rate helps the model adjust to the data quickly, it causes the model to overshoot the minima. A good starting value for the learning rate for most models would be 0.001; in the following diagram, you can see that a low learning rate requires many updates before reaching the minimum point:
However, an optimal learning rate swiftly reaches the minimum point. It requires less of an update before reaching near minima. Here, we can see a diagram with a decent learning rate:
A high learning rate causes drastic updates that lead...