- In a neuron, we introduce non-linearity by applying a function f(), called the activation or transfer function, to the weighted sum z. Refer to the section Artificial neurons.
- Activation functions such as sigmoid, tanh, and ReLU are used to introduce non-linearity; a minimal neuron sketch follows this list.
- To minimize the error, we calculate the gradient of the cost function with respect to the weights and update the weights in the direction opposite to the gradient (see the gradient descent sketch below).
- An RNN predicts the output based not only on the current input but also on the previous hidden state (see the forward-step sketch below).
- While backpropagating through the network, if the gradient value becomes progressively smaller, it is called the vanishing gradient problem; if the gradient value becomes progressively larger, it is called the exploding gradient problem (a numeric illustration appears below).
- Gates are special structures in an LSTM used to decide what information to keep, discard, and update (the gate equations are sketched below).
- The pooling layer reduces the dimensions of the feature maps and keeps only the necessary details so that the amount of computation is reduced (a max-pooling sketch closes this section).
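To make the first two points concrete, here is a minimal sketch of a single artificial neuron, assuming a sigmoid activation; the input, weight, and bias values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # Squashes z into (0, 1), introducing non-linearity
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical input, weights, and bias for one neuron
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.7, -0.2])
b = 0.1

z = np.dot(w, x) + b   # weighted sum z = w . x + b
y = sigmoid(z)         # applying f(z) is what makes the neuron non-linear
print(z, y)
```

Without the activation, stacked neurons would collapse into a single linear transformation, which is why the non-linearity matters.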

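A sketch of the gradient descent idea on a toy cost J(w) = (w - 3)**2, whose gradient 2(w - 3) is known in closed form; the starting point and learning rate are arbitrary choices:

```python
# Gradient descent on J(w) = (w - 3)**2; the minimum is at w = 3
w = 0.0    # arbitrary initial weight
lr = 0.1   # learning rate (step size)

for _ in range(50):
    grad = 2 * (w - 3)   # dJ/dw: gradient of the cost w.r.t. the weight
    w -= lr * grad       # step against the gradient to reduce the cost

print(w)  # approaches 3, the minimizer of the cost
```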

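A sketch of one RNN forward pass, assuming the common formulation h_t = tanh(U x_t + W h_{t-1}), y_t = V h_t; the weight matrices here are random placeholders, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 3, 2

# Hypothetical weights: U (input-to-hidden), W (hidden-to-hidden), V (hidden-to-output)
U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
V = rng.normal(size=(output_dim, hidden_dim))

h = np.zeros(hidden_dim)                     # previous hidden state h_{t-1}
for x_t in rng.normal(size=(5, input_dim)):  # a toy sequence of 5 time steps
    h = np.tanh(U @ x_t + W @ h)             # depends on current input AND previous state
    y_t = V @ h                              # output at time t
print(y_t)
```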

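A toy numeric illustration of the two gradient problems: backpropagating through T time steps multiplies roughly T per-step factors, so factors below 1 drive the gradient toward zero while factors above 1 blow it up (the factors 0.9 and 1.1 are made up):

```python
# Backprop through T steps multiplies ~T per-step gradient factors
T = 50
small, large = 0.9, 1.1   # hypothetical per-step gradient magnitudes

print(small ** T)  # ~0.005: vanishing gradient, early layers barely learn
print(large ** T)  # ~117:   exploding gradient, updates become unstable
```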

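A sketch of the standard LSTM gate equations, with randomly initialized placeholder weights and biases omitted for brevity; each gate is a sigmoid layer over the concatenated previous hidden state and current input:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hid = 4, 3
# One placeholder weight matrix per gate, acting on [h_{t-1}, x_t]
Wf, Wi, Wo, Wc = (rng.normal(size=(n_hid, n_hid + n_in)) for _ in range(4))

h, c = np.zeros(n_hid), np.zeros(n_hid)  # previous hidden and cell states
x = rng.normal(size=n_in)                # current input
z = np.concatenate([h, x])

f = sigmoid(Wf @ z)              # forget gate: what to discard from the cell state
i = sigmoid(Wi @ z)              # input gate: what new information to write
o = sigmoid(Wo @ z)              # output gate: what part of the cell to expose
c = f * c + i * np.tanh(Wc @ z)  # updated cell state
h = o * np.tanh(c)               # new hidden state
print(h)
```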

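A sketch of 2x2 max pooling with stride 2 on a hypothetical 4x4 feature map: each non-overlapping window keeps only its strongest response, halving both spatial dimensions and hence the downstream computation:

```python
import numpy as np

def max_pool_2x2(fmap):
    # Max over non-overlapping 2x2 windows (stride 2); assumes even dimensions
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.arange(16).reshape(4, 4)  # toy 4x4 feature map
print(max_pool_2x2(feature_map))           # 2x2 result: [[5, 7], [13, 15]]
```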