Hyperparameter Tuning
• Hyperparameters are parameters whose values control the learning process and determine the values of the model parameters that a learning algorithm ends up learning.
• While designing a machine learning model, one always has multiple choices for its architecture. This creates confusion about which design is optimal for the model, so arriving at a good model usually takes many trials.
• The parameters used to configure these design choices are known as hyperparameters, and the systematic search for the hyperparameter values that yield an optimized model is known as hyperparameter tuning (see the sketch below).
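For instance, a minimal tuning sketch in Python using scikit-learn's GridSearchCV; the grid values and the small MLP are illustrative assumptions, not recommendations:

```python
# A minimal hyperparameter-tuning sketch with scikit-learn's GridSearchCV.
# The grid values below are illustrative choices, not recommendations.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: fixed before training, searched over a grid.
param_grid = {
    "hidden_layer_sizes": [(8,), (16,), (16, 8)],  # layer size / depth
    "learning_rate_init": [0.001, 0.01, 0.1],      # learning rate
}

search = GridSearchCV(MLPClassifier(max_iter=2000), param_grid, cv=3)
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```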
• Hyperparameters are not model parameters, which are learned directly from the data. Model parameters specify the way the input is transformed into the required output, whereas hyperparameters define the structure of the model that performs this transformation (illustrated below).
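A small NumPy sketch of this distinction, with purely illustrative sizes: hidden_size is a hyperparameter fixed before training, while the weight matrices whose shapes it determines are the parameters that training adjusts.

```python
import numpy as np

# Hyperparameters: chosen before training; they define the model's structure.
n_features, hidden_size, n_classes = 4, 16, 3  # hidden_size is a hyperparameter

rng = np.random.default_rng(0)

# Model parameters: weights and biases, learned from data during training.
W1 = rng.normal(size=(n_features, hidden_size))  # parameter (trainable)
b1 = np.zeros(hidden_size)                       # parameter (trainable)
W2 = rng.normal(size=(hidden_size, n_classes))   # parameter (trainable)
b2 = np.zeros(n_classes)                         # parameter (trainable)

def forward(x):
    # The parameters transform the input into the required output;
    # the hyperparameter hidden_size fixed the shapes they take.
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                # class scores (logits)

print(forward(np.ones(n_features)))
```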
• Layer size is defined by the number of neurons in a given layer. The input and output layers are relatively easy to size because they correspond directly to how our modeling problem handles input and output.
• For the input layer, the size matches the number of features in the input vector. For the output layer, it is either a single output neuron or a number of neurons matching the number of classes we are trying to predict, as sketched below.
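As an illustration, a Keras sketch for a hypothetical problem with 20 input features and 5 classes (both values assumed for the example; the hidden-layer size of 32 is an arbitrary choice):

```python
# Sizing the input and output layers for a hypothetical problem
# with 20 input features and 5 target classes (assumed values).
import tensorflow as tf

n_features, n_classes = 20, 5

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),       # input: one unit per feature
    tf.keras.layers.Dense(32, activation="relu"),     # hidden: size is a free hyperparameter
    tf.keras.layers.Dense(n_classes, activation="softmax"),  # output: one neuron per class
])
model.summary()
```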
• A neural network with three layers will generally perform better than one with two, but increasing the depth beyond three does not help much in ordinary feed-forward networks. In the case of CNNs, an increasing number of layers does tend to make the model better.
• The amount by which the weights are updated during training is referred to as the step size or the learning rate. Specifically, the learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between 0.0 and 1.0.
• For example, if the learning rate is 0.1, then the weights in the network are updated by 0.1 × (estimated weight error), i.e. by 10% of the estimated weight error, each time the weights are updated (see the worked step below). The learning rate hyperparameter thus controls the rate or speed at which the model learns.
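A worked version of this update step in plain Python; the weight and error values are made up purely for illustration:

```python
# The update rule the example describes:
#   new_weight = weight - learning_rate * estimated_weight_error
learning_rate = 0.1
weight = 0.8                    # illustrative current weight
estimated_weight_error = 0.5    # illustrative gradient of the loss w.r.t. this weight

update = learning_rate * estimated_weight_error  # 10% of the estimated error
weight = weight - update
print(weight)  # 0.8 - 0.1 * 0.5 = 0.75
```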
• Learning rates are tricky because the right value ends up being specific to the dataset and even to the other hyperparameters. This creates a lot of overhead in finding the right settings.
• A large learning rate makes the model learn faster, but at the same time it may cause us to miss the minimum of the loss function and only reach its surroundings. When the learning rate is too large, the optimizer overshoots the minimum and the loss updates lead to divergent behaviour.
• On the other hand, choosing a lower learning rate gives a better chance of finding the local minimum, with the trade-off of needing a larger number of epochs and more time. Both regimes are illustrated in the sketch below.
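A tiny sketch of both regimes, using gradient descent on the toy loss L(w) = w² (minimum at w = 0) rather than a real network:

```python
# Gradient descent on the toy loss L(w) = w**2, whose minimum is at w = 0,
# showing the effect of the learning rate; the values are illustrative.
def descend(lr, w=1.0, steps=20):
    for _ in range(steps):
        grad = 2.0 * w   # dL/dw for L = w**2
        w = w - lr * grad
    return w

print("too large (lr=1.1):  ", descend(1.1))    # overshoots and diverges
print("moderate  (lr=0.1):  ", descend(0.1))    # converges quickly toward 0
print("too small (lr=0.001):", descend(0.001))  # still far from 0 after 20 steps
```

With lr = 1.1 each step overshoots the minimum and the iterate grows without bound; with lr = 0.001 it barely moves in 20 steps; lr = 0.1 converges quickly toward w = 0.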