Let’s say we only feed in one data point.
- `out = model:forward(xi)` computes $f_w(x_i)$, where $f_w$ is our model with its current parameters $w$, and stores the result in `out`.
- `loss = criterion:forward(out, yi)` computes the loss $\ell(f_w(x_i), y_i)$ with respect to the true value $y_i$.
- `dl_dout = criterion:backward(out, yi)` computes $\partial \ell(\ldots) / \partial f_w(x_i)$.
- `model:backward(xi, dl_dout)` computes $\partial \ell(\ldots) / \partial w$ and stores this gradient in a place we have a reference to, usually called `gradParameters` in our code.
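The four calls above can be put together in a minimal sketch. The model (`nn.Linear(2, 1)`), the criterion (`nn.MSECriterion`), and the values of the data point are assumptions chosen only for illustration; the tutorial's own model and data may differ. The reference to `gradParameters` is obtained here with `model:getParameters()`.

```lua
require 'torch'
require 'nn'

-- Assumed setup for illustration: a linear model with 2 inputs and
-- 1 output, trained with a squared-error loss.
local model = nn.Linear(2, 1)
local criterion = nn.MSECriterion()

-- Flattened views of the parameters w and of their gradient storage.
local parameters, gradParameters = model:getParameters()

-- One made-up data point (xi, yi).
local xi = torch.Tensor({0.5, -1.0})
local yi = torch.Tensor({2.0})

-- Gradients accumulate across backward calls, so reset them first.
gradParameters:zero()

local out = model:forward(xi)                -- f_w(xi)
local loss = criterion:forward(out, yi)      -- l(f_w(xi), yi)
local dl_dout = criterion:backward(out, yi)  -- dl / d f_w(xi)
model:backward(xi, dl_dout)                  -- accumulates dl / dw into gradParameters

print(loss)
print(gradParameters)
```

Because `model:backward` adds to the stored gradient rather than overwriting it, the sketch zeroes `gradParameters` before the forward pass; without that reset, gradients from earlier data points would be summed in.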