Learn Before
  • Mini-Batch Gradient Descent

  • Epoch in Gradient Descent

Mini-Batch Gradient Descent Algorithm

for $t = 1, 2, \ldots, N$ ($N$ is the number of mini-batches):

  • Forward propagate on $X^{\{t\}}$
  • Compute the cost function $J^{\{t\}}$
  • Backpropagate to compute the gradients with respect to $J^{\{t\}}$ (using $X^{\{t\}}, Y^{\{t\}}$)
  • $W^{[l]} = W^{[l]} - \alpha\, dW^{[l]}$, $\; b^{[l]} = b^{[l]} - \alpha\, db^{[l]}$

This is one pass through your training set using mini-batch gradient descent. It is also called doing one epoch of training.

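As a concrete illustration, below is a minimal sketch of one such epoch for a single-layer logistic-regression model written with NumPy. The data, layer size, learning rate, and mini-batch size are hypothetical choices made for this example, not values taken from the section above.

```python
# Minimal sketch: one epoch of mini-batch gradient descent for a
# single-layer logistic-regression model (hypothetical data and sizes).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: m examples, each with n_x features.
n_x, m = 4, 1000
X = rng.normal(size=(n_x, m))
Y = (rng.random(size=(1, m)) > 0.5).astype(float)

# Parameters of the single layer.
W = rng.normal(size=(1, n_x)) * 0.01
b = np.zeros((1, 1))

alpha = 0.1              # learning rate (hypothetical)
mini_batch_size = 64     # hypothetical mini-batch size


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# One epoch: one pass over all N mini-batches of the training set.
for t in range(0, m, mini_batch_size):
    X_t = X[:, t:t + mini_batch_size]   # X^{t}
    Y_t = Y[:, t:t + mini_batch_size]   # Y^{t}
    m_t = X_t.shape[1]

    # Forward propagate on X^{t}.
    A = sigmoid(W @ X_t + b)

    # Compute the cost J^{t} (cross-entropy averaged over the mini-batch).
    J_t = -np.mean(Y_t * np.log(A) + (1 - Y_t) * np.log(1 - A))

    # Backpropagate to compute the gradients of J^{t}.
    dZ = A - Y_t
    dW = (dZ @ X_t.T) / m_t
    db = np.sum(dZ, axis=1, keepdims=True) / m_t

    # Gradient-descent update: W := W - alpha*dW, b := b - alpha*db.
    W = W - alpha * dW
    b = b - alpha * db

print("cost on last mini-batch:", J_t)
```
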

Tags

Data Science

Related
  • An Example of Mini-Batches

  • Mini-Batch Gradient Descent Algorithm

  • Batch vs Stochastic vs Mini-Batch Gradient Descent

  • Example Using Mini-Batch Gradient Descent (Learning Rate Decay)

  • Mini-Batches Size

  • Which of these statements about mini-batch gradient descent do you agree with?

  • Why is the best mini-batch size usually not 1 and not m, but instead something in-between?

  • Suppose your learning algorithm’s cost J, plotted as a function of the number of iterations, looks like the image below:

  • Stochastic Gradient Descent Algorithm

  • Loss Gradient over a Mini-batch

  • Common Learning Rate Decay Implementation