The optimizer algorithm
In the Linear regression section in Chapter 4, we discussed the GD algorithm, which optimizes the linear regression cost function. In neural networks, the optimizer is the algorithm used to minimize the cost function during model training. The commonly used optimizers are Stochastic Gradient Descent (SGD), RMSprop, and Adam, described as follows:
- SGD is useful for very large datasets. Unlike GD, which runs through all of the samples in the training dataset before updating the parameters, SGD updates the parameters using a single training sample or a small subset of samples at a time (see the first sketch after this list).
- RMSprop improves on SGD by introducing an adaptive learning rate. The learning rate, as we discussed in Chapter 4, affects model performance—larger learning rates can reduce training time but may cause the model to oscillate and miss the optimal parameter values, while smaller learning rates make the training process longer. In SGD, the learning rate is fixed. RMSprop adapts the learning rate as training progresses (see the second sketch after this list), and thus it allows...
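For illustration, here is a minimal NumPy sketch of a single SGD step for linear regression, assuming a squared-error cost; the names (X, y, w, b, lr) are hypothetical and not taken from the book's code. The key difference from GD is that the gradient is computed from one randomly chosen sample rather than from the whole training set.

```python
import numpy as np

def sgd_step(X, y, w, b, lr=0.01):
    # Pick a single training sample instead of using the full dataset (as GD would).
    i = np.random.randint(len(X))
    x_i, y_i = X[i], y[i]
    y_hat = x_i @ w + b            # prediction for that one sample
    error = y_hat - y_i
    grad_w = error * x_i           # gradient of 0.5 * error**2 with respect to w
    grad_b = error                 # gradient with respect to b
    w = w - lr * grad_w            # update the parameters using only this sample
    b = b - lr * grad_b
    return w, b
```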
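Similarly, the following is a minimal sketch of the RMSprop update rule, again with hypothetical names (grad, cache, rho, lr, eps). A running average of the squared gradients rescales each parameter's step, which is how the effective learning rate adapts as training progresses.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    # Maintain a moving average of squared gradients for each parameter.
    cache = rho * cache + (1.0 - rho) * grad ** 2
    # Scale the step by the root of that average: parameters with large
    # recent gradients take smaller steps, and vice versa.
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache
```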