Artificial Intelligence and Machine Learning: Unit V: Neural Networks

Normalization

Normalization is a data preparation technique that is frequently used in machine learning. The process of transforming the columns in a dataset to the same scale is referred to as normalization. Not every dataset needs to be normalized for machine learning.

Normalization makes the features more consistent with each other, which allows the model to predict outputs more accurately. The main goal of normalization is to make the data homogeneous across all records and fields.

Normalization refers to rescaling real-valued numeric attributes into a 0 to 1 range. Data normalization is used in machine learning to make model training less sensitive to the scale of features.

Normalization is important in algorithms such as k-NN, support vector machines, neural networks, and principal component analysis (PCA). The type of feature preprocessing and normalization that is needed can depend on the data.
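As a minimal illustration, the NumPy sketch below applies min-max normalization, X' = (X − min) / (max − min), to each column of a small feature matrix so that every column lands in the 0-to-1 range; the data values are made up for the example.

```python
import numpy as np

# Toy feature matrix: two features on very different scales
# (illustrative values, e.g., age in years and income in dollars).
X = np.array([[25.0,  40000.0],
              [32.0,  85000.0],
              [47.0, 120000.0],
              [51.0,  60000.0]])

# Min-max normalization: rescale each column to the [0, 1] range.
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_norm = (X - X_min) / (X_max - X_min)

print(X_norm)  # every column now lies between 0 and 1
```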

Batch Normalization

Batch normalization is a method of adaptive reparametrization, motivated by the difficulty of training very deep models. In deep networks, the weights of every layer are updated at each training step, so a layer's output will no longer be on the same scale as its input.

When we feed data to a machine learning or deep learning algorithm, we tend to rescale the values to a balanced scale, because this helps ensure that the model generalizes appropriately.

Batch normalization is a technique for standardizing the inputs to layers in a neural network. Batch normalization was designed to address the problem of internal covariate shift, which arises as a consequence of updating multiple-layer inputs simultaneously in deep neural networks.

Batch normalization is applied to individual layers, or optionally, to all of them: In each training iteration, we first normalize the inputs by subtracting their mean and dividing by their standard deviation, where both are estimated based on the statistics of the current mini-batch.

Next, we apply a scale coefficient and an offset to recover the lost degrees of freedom. It is precisely due to this normalization based on batch statistics that batch normalization derives its name.
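In a framework such as PyTorch, this standardize-then-rescale step is available as a built-in layer that is typically placed between a linear layer and its activation. The sketch below is illustrative only; the layer sizes and batch size are arbitrary choices.

```python
import torch
import torch.nn as nn

# A small fully connected network with batch normalization
# inserted between the linear layer and its activation.
model = nn.Sequential(
    nn.Linear(20, 64),    # z = W a + b
    nn.BatchNorm1d(64),   # normalize over the mini-batch, then scale and shift
    nn.ReLU(),            # activation applied after batch norm
    nn.Linear(64, 10),
)

x = torch.randn(32, 20)   # a mini-batch of 32 examples with 20 features
y = model(x)              # in training mode, BatchNorm1d uses batch statistics
```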

We take the output a[i-1] of the preceding layer, multiply it by the weight matrix W[i], and add the bias b[i] of the current layer. The index i denotes the current layer.

Z[i] = W[i] a[i-1] + b[i]

Next, we usually apply the non-linear activation function that results in the output a[i] of the current layer. When applying batch norm, we correct our data before feeding it to the activation function.

To apply batch norm, we first calculate the mean and the variance of the current mini-batch of Z values, where m is the mini-batch size.

μ = (1/m) Σ_{j=1}^{m} Z_j

When calculating the variance, we add a small constant ε to prevent a potential division by zero later on.

σ² = (1/m) Σ_{j=1}^{m} (Z_j − μ)² + ε

To normalize the data, we subtract the mean and divide by the standard deviation.

Z_norm[i] = (Z[i] − μ) / √σ²

This operation scales the inputs to have a mean of 0 and a standard deviation of 1.
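Finally, the scale coefficient γ and the offset β mentioned earlier are applied to produce the layer's batch-normalized output: Z̃[i] = γ · Z_norm[i] + β. Putting the steps together, here is a minimal NumPy sketch of the batch-norm forward pass for one layer's pre-activations; the function name and the initial values γ = 1, β = 0 are illustrative choices, since in practice γ and β are learned during training.

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-5):
    """Batch-norm forward pass for a mini-batch of pre-activations Z
    of shape (m, features). gamma and beta are per-feature parameters
    (the scale and offset that recover the lost degrees of freedom)."""
    mu = Z.mean(axis=0)                     # mean over the mini-batch
    var = Z.var(axis=0)                     # variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)  # standardize: mean 0, std 1
    return gamma * Z_norm + beta            # scale and shift

# Example: a mini-batch of 4 examples with 3 features each.
Z = np.random.randn(4, 3) * 10 + 5          # deliberately off-scale inputs
out = batch_norm_forward(Z, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0), out.std(axis=0))    # approximately 0 and 1 per feature
```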

Advantages of Batch Normalization:

a) The model is less sensitive to hyperparameter tuning.

b) Reduces internal covariate shift.

c) Reduces the dependence of gradients on the scale of the parameters or their initial values.

d) Dropout can often be removed, since batch normalization has a mild regularizing effect.
