Learn Before
Mini-Batch Gradient Descent
Epoch in Gradient Descent
Mini-Batch Gradient Descent Algorithm
for t = 1, 2, ..., N:  (N is the number of mini-batches)
- Forward propagate on X^{t}
- Compute the cost function J^{t}
- Backpropagate to compute gradients wrt J^{t} (using X^{t}, Y^{t})
- Update the parameters: W^[l] := W^[l] - α dW^[l],  b^[l] := b^[l] - α db^[l]
This is one pass through your training set using mini-batch gradient descent. It is also called doing one epoch of training.
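Below is a minimal NumPy sketch of one such epoch, following the X^{t}, Y^{t}, J^{t} notation above. The linear-regression model, the generated data, and the batch size and learning rate values are illustrative assumptions, not part of this card; the point is only the loop over mini-batches with a forward pass, cost, gradients, and parameter update per batch.

import numpy as np

np.random.seed(0)
m, n_features = 1024, 3                  # m training examples (assumed values)
X = np.random.randn(m, n_features)       # inputs
Y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * np.random.randn(m)  # targets

W = np.zeros(n_features)                 # parameters to learn
b = 0.0
alpha = 0.1                              # learning rate (assumed)
batch_size = 64                          # mini-batch size (assumed)
N = m // batch_size                      # N = number of mini-batches

for t in range(N):                       # for t = 1, 2, ..., N
    # t-th mini-batch (X^{t}, Y^{t})
    X_t = X[t * batch_size:(t + 1) * batch_size]
    Y_t = Y[t * batch_size:(t + 1) * batch_size]

    # Forward propagate on X^{t}
    Y_hat = X_t @ W + b

    # Compute the cost J^{t} on this mini-batch (mean squared error)
    J_t = np.mean((Y_hat - Y_t) ** 2) / 2

    # Backpropagate: gradients of J^{t} wrt W and b (using X^{t}, Y^{t})
    error = Y_hat - Y_t
    dW = X_t.T @ error / batch_size
    db = np.mean(error)

    # Gradient descent update on this mini-batch
    W -= alpha * dW
    b -= alpha * db

# Completing all N mini-batches is one pass through the training set: one epoch.
print("cost on last mini-batch:", J_t)
print("learned W:", W, "b:", b)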
Tags
Data Science
Related
An Example of Mini-Batches
Mini-Batch Gradient Descent Algorithm
Batch vs Stochastic vs Mini-Batch Gradient Descent
Example Using Mini-Batch Gradient Descent (Learning Rate Decay)
Mini-Batches Size
Which of these statements about mini-batch gradient descent do you agree with?
Why is the best mini-batch size usually not 1 and not m, but instead something in-between?
Suppose your learning algorithm’s cost J, plotted as a function of the number of iterations, looks like the image below:
Stochastic Gradient Descent Algorithm
Loss Gradient over a Mini-batch
Common Learning Rate Decay Implementation