Artificial Intelligence and Machine Learning: Unit V: Neural Networks

Vanishing Gradient Problem


The vanishing gradient problem is a difficulty encountered when training neural networks with gradient-based methods such as backpropagation. It makes it hard to learn and tune the parameters of the earlier layers in the network.

The vanishing gradient problem is essentially a situation in which a deep multilayer feed-forward network or a recurrent neural network (RNN) is unable to propagate useful gradient information from the output end of the model back to the layers near the input end of the model.

As a result, models with many layers may be unable to learn on a given dataset, or may prematurely converge to a poor solution.

As the backpropagation algorithm proceeds backward from the output layer toward the input layer, the gradients tend to shrink, becoming smaller and smaller until they approach zero. This leaves the weights of the initial or lower layers practically unchanged, and gradient descent never converges to the optimum.
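To make this concrete, here is a minimal NumPy sketch of a toy one-unit-per-layer network with sigmoid activations (the depth and the random weights are illustrative assumptions). Because the derivative of the sigmoid is at most 0.25, each layer multiplies the gradient by a factor that is usually well below 1, so the magnitudes printed for layers near the input are far smaller than for layers near the output.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    depth = 30
    weights = rng.normal(0.0, 1.0, size=depth)

    # Forward pass: activations[i + 1] = sigmoid(weights[i] * activations[i])
    activations = [0.5]
    for w in weights:
        activations.append(sigmoid(w * activations[-1]))

    # Backward pass: repeatedly apply the chain rule. The local derivative of
    # layer i is weights[i] * a * (1 - a) with a = activations[i + 1], and
    # a * (1 - a) <= 0.25, so the running product shrinks toward zero.
    grad = 1.0
    for i in reversed(range(depth)):
        a = activations[i + 1]
        grad *= weights[i] * a * (1.0 - a)
        if i % 5 == 0:
            print(f"layer {i:2d}: |gradient| = {abs(grad):.3e}")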

A vanishing gradient does not necessarily mean that the gradient vector is all zeros; it means that the gradients are minuscule, which makes learning very slow.

The most widely used solution to the vanishing gradient problem in recurrent networks is a specific type of neural network called the Long Short-Term Memory (LSTM) network.
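As a quick illustration, here is a minimal PyTorch usage sketch of an LSTM (the input size, hidden size, and sequence shape are illustrative assumptions, not values from any particular model):

    import torch
    import torch.nn as nn

    # Two stacked LSTM layers; the additive update of the internal cell state
    # gives gradients a direct path across time steps, which is what lets the
    # LSTM avoid vanishing gradients over long sequences.
    lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
    x = torch.randn(4, 15, 10)        # 4 sequences, 15 time steps, 10 features
    output, (h_n, c_n) = lstm(x)      # output shape: (4, 15, 20)
    print(output.shape, h_n.shape, c_n.shape)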

Indications of the vanishing gradient problem (a monitoring sketch follows this list):

a) The parameters of the higher layers change considerably, while the parameters of the lower layers barely change.

b) The model weights could become 0 during training.

c) The model learns very slowly, and training may stagnate at a very early stage after only a few iterations.
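A simple way to check for symptom (a) is to compare per-layer gradient norms after a backward pass. The sketch below assumes PyTorch and a deliberately deep sigmoid MLP; the layer sizes and random data are illustrative.

    import torch
    import torch.nn as nn

    # A deliberately deep stack of sigmoid layers to provoke the problem.
    layers = []
    for _ in range(10):
        layers += [nn.Linear(32, 32), nn.Sigmoid()]
    model = nn.Sequential(*layers)

    x, y = torch.randn(64, 32), torch.randn(64, 32)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()

    # If the first layers show norms orders of magnitude below the last ones,
    # the gradient is vanishing on its way back through the network.
    for name, param in model.named_parameters():
        if "weight" in name:
            print(f"{name}: grad norm = {param.grad.norm().item():.3e}")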

Some methods proposed to overcome the vanishing gradient problem (a sketch combining several of them follows the list):

a) Residual neural networks (ResNets)

b) Multi-level hierarchy

c) Long short-term memory (LSTM)

d) Faster hardware

e) ReLU

f) Batch normalization 
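The PyTorch block below is a minimal sketch of how several of these remedies fit together, combining a residual connection (a), ReLU activations (e), and batch normalization (f); the layer width is an illustrative assumption.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.fc1 = nn.Linear(dim, dim)
            self.bn1 = nn.BatchNorm1d(dim)
            self.fc2 = nn.Linear(dim, dim)
            self.bn2 = nn.BatchNorm1d(dim)
            self.relu = nn.ReLU()

        def forward(self, x):
            out = self.relu(self.bn1(self.fc1(x)))
            out = self.bn2(self.fc2(out))
            # The skip connection gives gradients a direct additive path
            # around the block, so they do not have to shrink through it.
            return self.relu(out + x)

    block = ResidualBlock()
    x = torch.randn(16, 64)
    print(block(x).shape)  # torch.Size([16, 64])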
