Introduction

In machine learning — especially in deep learning — Weight Update refers to the process by which a model’s internal parameters (weights) are adjusted to reduce the error between its predictions and the actual target values. It is the most fundamental operation in the learning process and the very mechanism by which a model learns from data.

At every iteration or batch during training, the model calculates gradients based on a loss function and then uses those gradients to update its weights. This iterative refinement of weights is what transforms a randomly initialized model into an accurate one.

What Are Weights?

In neural networks and other parametric models, weights are the values that modulate the strength of the connection between nodes. For example:

  • In linear regression: y_hat = w * x + b
  • In a neuron: z = sum(w_i * x_i) + b

Weights determine how input data flows through the network to produce predictions. When you update weights, you’re tuning the model’s internal logic.
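
As a minimal illustration, here is the single-neuron formula above written in NumPy; the input, weight, and bias values are arbitrary placeholders:

import numpy as np

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.8, 0.1, -0.4])   # weights, one per input
b = 0.2                          # bias

z = np.dot(w, x) + b             # z = sum(w_i * x_i) + b
print(z)                         # the neuron's pre-activation output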

The Core Formula

The general weight update rule using gradient descent is:

w_new = w_old - learning_rate * gradient

Or, symbolically:

w_(t+1) = w_t - α * ∇L(w_t)

Where:

  • w_t: weight at iteration t
  • α: learning rate
  • ∇L(w_t): gradient of the loss function with respect to w_t

This rule moves the weights in the direction that reduces the loss.
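
As a quick sketch of this rule in Python, consider minimizing the toy loss L(w) = (w - 3)^2; the starting weight and learning rate below are arbitrary:

# One gradient descent step on the toy loss L(w) = (w - 3)**2
w = 0.0
learning_rate = 0.1

grad = 2 * (w - 3)            # dL/dw = 2(w - 3), which is -6 at w = 0
w = w - learning_rate * grad  # w moves toward the minimum at w = 3
print(w)                      # 0.6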

Weight Update in Neural Networks

In multi-layer neural networks, the gradients are computed layer by layer through backpropagation, which applies the chain rule to obtain the gradient of the loss function with respect to every weight; the optimizer then uses these gradients to update the weights.

For each weight:

w = w - α * (dL/dw)

In code (simplified):

for param in model.parameters():
    param.data -= learning_rate * param.grad
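
A slightly fuller, self-contained sketch of the same loop, assuming PyTorch and a toy linear model (the model, data, and learning rate here are placeholders for illustration):

import torch
import torch.nn as nn

model = nn.Linear(3, 1)                          # a tiny model with weights and a bias
inputs = torch.randn(16, 3)                      # toy batch of inputs
targets = torch.randn(16, 1)                     # toy targets
learning_rate = 0.01

loss = ((model(inputs) - targets) ** 2).mean()   # forward pass + MSE loss
loss.backward()                                  # backpropagation fills param.grad

with torch.no_grad():                            # do not track the update itself
    for param in model.parameters():
        param -= learning_rate * param.grad      # gradient descent step
        param.grad.zero_()                       # reset gradients for the next iteration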

Mini Batch Weight Updates

Instead of updating weights after every single sample (stochastic/online updates) or only after a pass over the entire dataset (batch gradient descent), modern training typically uses mini-batches. For a mini-batch of size m, the per-sample gradients are averaged and the weights are then updated:

w = w - α * (1/m) * sum(dL_i/dw for i in mini_batch)
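
A minimal NumPy sketch of one such mini-batch update for the linear model y_hat = w * x + b; the toy data and the batch size of 32 are arbitrary choices:

import numpy as np

X = np.random.rand(1000)
y = 3 * X + 2 + 0.05 * np.random.randn(1000)            # toy data: y roughly 3x + 2

w, b, learning_rate, m = 0.0, 0.0, 0.1, 32

idx = np.random.choice(len(X), size=m, replace=False)   # draw a mini-batch
X_batch, y_batch = X[idx], y[idx]

y_pred = w * X_batch + b
dw = (2 / m) * ((y_pred - y_batch) * X_batch).sum()     # gradient averaged over the batch
db = (2 / m) * (y_pred - y_batch).sum()

w -= learning_rate * dw                                 # one mini-batch weight update
b -= learning_rate * db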

Optimization Algorithms for Weight Update

Many algorithms improve on basic gradient descent by modifying how weights are updated:

Stochastic Gradient Descent (SGD)

w = w - α * gradient

Momentum

v = β * v - α * gradient
w = w + v

RMSProp

s = ρ * s + (1 - ρ) * gradient^2
w = w - α * gradient / (sqrt(s) + ε)

Adam

m = β1 * m + (1 - β1) * gradient
v = β2 * v + (1 - β2) * gradient^2

m_hat = m / (1 - β1^t)
v_hat = v / (1 - β2^t)

w = w - α * m_hat / (sqrt(v_hat) + ε)
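
Put together, the Adam rule above can be written as a small NumPy routine; the hyperparameter defaults below follow common practice but are otherwise illustrative:

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update for weights w, given gradient grad at step t (t starts at 1)
    m = beta1 * m + (1 - beta1) * grad           # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # the weight update itself
    return w, m, v

# Usage: m and v start as zeros with the same shape as w
w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([0.1, -0.2, 0.3])
w, m, v = adam_step(w, grad, m, v, t=1)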

Weight Update Scheduling

Weight updates can be fine-tuned by reducing the learning rate over time:

Step Decay

α = α_0 * (drop)^floor(epoch / steps)

Exponential Decay

α = α_0 * exp(-k * epoch)

Cosine Annealing

α = α_min + 0.5 * (α_max - α_min) * (1 + cos(pi * t / T))
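
Each of the three schedules can be expressed as a small Python function; the parameter names follow the formulas above and the values in the usage line are arbitrary:

import math

def step_decay(alpha_0, drop, steps, epoch):
    return alpha_0 * drop ** math.floor(epoch / steps)

def exponential_decay(alpha_0, k, epoch):
    return alpha_0 * math.exp(-k * epoch)

def cosine_annealing(alpha_min, alpha_max, t, T):
    return alpha_min + 0.5 * (alpha_max - alpha_min) * (1 + math.cos(math.pi * t / T))

# e.g. halve an initial learning rate of 0.1 every 10 epochs
print(step_decay(alpha_0=0.1, drop=0.5, steps=10, epoch=25))   # 0.025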

Example: Manual Weight Update in Python

# Simple linear regression weight update
import numpy as np

# Toy data: y is roughly 3x + 2 plus a little noise
X = np.random.rand(100)
y = 3 * X + 2 + 0.1 * np.random.randn(100)

w = np.random.randn()    # randomly initialized weight
b = 0.0                  # bias starts at zero
learning_rate = 0.01
epochs = 1000

for epoch in range(epochs):
    y_pred = X * w + b                     # forward pass
    loss = ((y_pred - y) ** 2).mean()      # mean squared error

    # Gradients of the loss with respect to w and b
    dw = (2 / len(X)) * ((y_pred - y) * X).sum()
    db = (2 / len(X)) * (y_pred - y).sum()

    # Gradient descent update
    w -= learning_rate * dw
    b -= learning_rate * db

Weight Update Challenges

Vanishing Gradient

  • Gradients become too small
  • Causes weights in early layers to barely update

Exploding Gradient

  • Gradients become very large
  • Causes unstable training and weight divergence

Overshooting

  • Learning rate too high
  • Model jumps over minima

Weight Update vs Weight Initialization

Feature         | Weight Initialization       | Weight Update
Timing          | Before training begins      | Every training iteration
Purpose         | Sets starting point         | Optimizes to minimize loss
Tools Involved  | Random/Xavier/He methods    | Optimizers like Adam, SGD

Weight Update in CNNs

In convolutional networks:

  • Weights are kernel matrices
  • Updates adjust filters to detect specific visual features
  • Gradient updates follow the same rule: W_new = W_old - α * dL/dW (see the sketch below)
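
A minimal PyTorch sketch of that rule applied to a convolutional layer; the layer sizes, data, and learning rate are arbitrary placeholders:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)   # weights are 3x3 kernels
x = torch.randn(8, 1, 28, 28)            # a toy batch of single-channel images
target = torch.randn(8, 4, 26, 26)       # placeholder target matching the output shape

loss = ((conv(x) - target) ** 2).mean()
loss.backward()                          # gradients land in conv.weight.grad

with torch.no_grad():
    conv.weight -= 0.01 * conv.weight.grad   # W_new = W_old - α * dL/dW, applied to the kernels
    conv.bias -= 0.01 * conv.bias.grad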

Weight Update in RNNs

For recurrent networks:

  • Weights are shared across time steps
  • Gradients are backpropagated through time
  • Susceptible to exploding/vanishing gradients

Solutions:

  • Gradient clipping (see the sketch after this list)
  • Use of gated units like LSTM and GRU
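
A minimal sketch of gradient clipping for a recurrent layer in PyTorch; the RNN sizes, placeholder loss, and max_norm value of 1.0 are arbitrary choices:

import torch
import torch.nn as nn

model = nn.RNN(input_size=10, hidden_size=32, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 20, 10)               # batch of 4 sequences, 20 time steps each
output, _ = model(x)
loss = output.pow(2).mean()              # placeholder loss for illustration

loss.backward()                          # backpropagation through time
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # rescale gradients if their norm exceeds 1.0
optimizer.step()                         # weight update with clipped gradients
optimizer.zero_grad()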

Techniques to Improve Weight Updates

  • Batch Normalization
  • Gradient Clipping
  • Dropout
  • Weight Regularization (L1/L2), illustrated after this list
  • Learning Rate Warmup
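
As one example from this list, L2 weight regularization (weight decay) adds a term to the gradient, so the update becomes w = w - α * (dL/dw + λ * w). A minimal NumPy sketch, with a decay strength of 0.01 as an arbitrary choice:

import numpy as np

w = np.array([0.5, -1.0, 2.0])           # current weights
grad = np.array([0.1, 0.2, -0.3])        # placeholder gradient dL/dw
learning_rate, weight_decay = 0.1, 0.01  # weight_decay plays the role of λ

# L2-regularized update: the decay term pulls the weights toward zero
w -= learning_rate * (grad + weight_decay * w)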

Summary

Weight Update is the core learning step in every neural network training process. Each epoch, mini-batch, or gradient pass produces updated weights that incrementally improve the model's performance. How these updates are made, from the choice of optimizer and learning-rate schedule to the batch size and gradient conditioning, determines how effective and stable the training process will be.

A deep understanding of the weight update mechanics empowers you to debug, optimize, and scale your models with confidence.

Related Keywords

  • Adam Optimizer
  • Backpropagation
  • Gradient Descent
  • Learning Rate
  • Loss Function
  • Mini Batch Descent
  • Momentum Optimization
  • Neural Network Training
  • Optimization Algorithm
  • Stochastic Gradient Descent
  • Vanishing Gradient
  • Weight Initialization