Vanishing Gradient Problem
• The vanishing gradient problem is a difficulty that arises when training neural networks with gradient-based methods such as backpropagation. It makes it hard to learn and tune the parameters of the earlier layers in the network.
• The vanishing gradient problem is essentially a situation in which a deep multilayer feed-forward network or a recurrent neural network (RNN) is unable to propagate useful gradient information from the output end of the model back to the layers near the input end of the model.
• As a result, models with many layers may be unable to learn on a given dataset, or may prematurely converge to a poor solution.
• As the backpropagation algorithm moves backward from the output layer toward the input layer, the gradients tend to shrink, becoming smaller and smaller until they approach zero. This leaves the weights of the initial (lower) layers practically unchanged, so gradient descent never converges to the optimum.
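This shrinking can be demonstrated numerically. Below is a minimal NumPy sketch (the depth, width, sigmoid activation, and weight scale are illustrative assumptions, not from the text) that accumulates the product of per-layer Jacobians, just as backpropagation does, and prints how quickly its norm decays:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_layers, width = 20, 64
x = rng.standard_normal(width)

# Jacobian of the current layer's output with respect to the network
# input; backpropagation multiplies one such local Jacobian per layer.
jac = np.eye(width)
for layer in range(n_layers):
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    x = sigmoid(W @ x)
    local = (x * (1.0 - x))[:, None] * W   # diag(sigmoid'(z)) @ W
    jac = local @ jac
    print(f"layer {layer + 1:2d}: gradient norm = {np.linalg.norm(jac):.3e}")
```

Because the sigmoid derivative never exceeds 0.25, each factor contracts the product, and the printed norms fall off geometrically with depth.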
• A vanishing gradient does not necessarily mean that the gradient vector is all zeros. It means that the gradients are minuscule, which makes learning very slow.
• For recurrent networks, one of the most important solutions to the vanishing gradient problem is a specific type of neural network called the Long Short-Term Memory network (LSTM); a sketch of the intuition behind it follows.
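A rough intuition for why LSTMs help: the cell state is updated additively, c_t = f_t * c_{t-1} + i_t * g_t (element-wise), so the gradient flowing along the cell state is scaled by the forget gate f_t at each step rather than by a full weight matrix. The sketch below (the gate values and step count are illustrative assumptions) contrasts this with a vanilla RNN's repeated multiplication by an activation derivative:

```python
import numpy as np

# Along the LSTM cell-state path, the gradient of c_T with respect to
# c_0 is the element-wise product of the forget gates f_1 ... f_T.
# Gates near 1 keep that product from vanishing.
T = 100
forget_gates = np.full(T, 0.99)    # illustrative gate values near 1
print("LSTM cell-state path:", np.prod(forget_gates))  # ~0.37

# A tanh RNN instead multiplies by an activation derivative (and a
# weight factor) at every step; with a typical factor of 0.5 the
# gradient collapses over the same horizon.
print("vanilla RNN path:    ", 0.5 ** T)               # ~7.9e-31
```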
• Indications of the vanishing gradient problem (a practical per-layer check is sketched after this list):
a) The parameters of the higher (later) layers change substantially, while the parameters of the lower (earlier) layers barely change.
b) The model weights may become zero during training.
c) The model learns very slowly, and training may stagnate at a very early phase after only a few iterations.
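One practical way to check for symptom (a) is to print per-layer gradient norms after a backward pass. Below is a minimal PyTorch sketch; the toy model, random data, and layer sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A deliberately deep sigmoid stack, which is prone to vanishing gradients.
model = nn.Sequential(*[
    layer for _ in range(10)
    for layer in (nn.Linear(32, 32), nn.Sigmoid())
])

x = torch.randn(8, 32)   # toy input batch
y = torch.randn(8, 32)   # toy targets
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Layers near the input (low indices) should show much smaller
# gradient norms than layers near the output.
for name, p in model.named_parameters():
    print(f"{name:12s} grad norm = {p.grad.norm().item():.3e}")
```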
• Some methods that are proposed to overcome the vanishing gradient problem (a comparison of sigmoid and ReLU derivative chains follows the list):
a) Residual neural networks (ResNets)
b) Multi-level hierarchy
c) Long short-term memory (LSTM)
d) Faster hardware
e) ReLU
f) Batch normalization
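To illustrate why ReLU (item e) helps: the sigmoid derivative is at most 0.25, so chaining it across layers shrinks gradients geometrically, whereas the ReLU derivative is exactly 1 for positive inputs. A small sketch (the layer counts and the representative pre-activation value are illustrative assumptions):

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # at most 0.25 (when z == 0)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # exactly 1 for positive inputs

z = 0.5  # a representative pre-activation value
for n_layers in (5, 20, 50):
    print(f"{n_layers:2d} layers: "
          f"sigmoid chain = {sigmoid_grad(z) ** n_layers:.2e}, "
          f"ReLU chain = {relu_grad(z) ** n_layers:.0f}")
```

The sigmoid chain collapses toward zero as depth grows, while the ReLU chain stays at 1 on the active path, which is one reason ReLU-based networks train reliably at greater depths.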