1. Learning with large datasets
- It's not who has the best algorithm that wins. It's who has the most data.
2. Stochastic gradient descent
- batch gradient descent
  - repeat { θ_j := θ_j - α (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) x_j^(i) }, i.e. every update sums over all m examples
- stochastic gradient descent (see the sketch below)
  - randomly shuffle the dataset
  - repeat { for i = 1..m: θ_j := θ_j - α (h_θ(x^(i)) - y^(i)) x_j^(i) }, i.e. each update uses a single example
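A minimal NumPy sketch of the two update loops (not from the lecture; the linear-regression hypothesis h_θ(x) = θᵀx and the parameter values are assumptions):

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, iters=100):
    """Batch GD: every parameter update sums the gradient over all m examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m          # sum over all m examples
        theta -= alpha * grad
    return theta

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=10):
    """SGD: shuffle the data, then update theta one example at a time."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in np.random.permutation(m):        # randomly shuffle the dataset
            grad = (X[i] @ theta - y[i]) * X[i]   # gradient from a single example
            theta -= alpha * grad
    return theta
```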
3. Mini-batch gradient descent
- batch gradient descent: use all m examples in each iteration
- stochastic gradient descent: use 1 example in each iteration
- mini-batch gradient descent: use b examples in each iteration (mini-batch size b, typically 2-100; see the sketch below)
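A sketch of this in-between case, again assuming a linear-regression hypothesis; alpha, b, and epochs are illustrative values:

```python
import numpy as np

def mini_batch_gradient_descent(X, y, alpha=0.01, b=10, epochs=10):
    """Mini-batch GD: each parameter update uses b examples (the mini-batch size)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        idx = np.random.permutation(m)                 # shuffle once per pass
        for start in range(0, m, b):
            batch = idx[start:start + b]               # next b examples
            grad = X[batch].T @ (X[batch] @ theta - y[batch]) / len(batch)
            theta -= alpha * grad
    return theta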
4. Stochastic gradient descent convergence
- Learning rate α is typically held constant. It can slowly be decreased over time (e.g. α = const1 / (iterationNumber + const2)) if we want θ to converge
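One way to check convergence is to average the per-example cost over the last ~1000 examples and plot it; a sketch combining that with a decreasing learning rate (const1, const2, and window are made-up values):

```python
import numpy as np

def sgd_with_decay(X, y, const1=1.0, const2=50.0, epochs=10, window=1000):
    """SGD with alpha = const1 / (iterationNumber + const2); also records the
    cost averaged over the last `window` examples to monitor convergence."""
    m, n = X.shape
    theta = np.zeros(n)
    recent, averages = [], []
    it = 0
    for _ in range(epochs):
        for i in np.random.permutation(m):
            it += 1
            alpha = const1 / (it + const2)        # slowly decreasing learning rate
            err = X[i] @ theta - y[i]
            recent.append(0.5 * err ** 2)         # cost on this example, before the update
            theta -= alpha * err * X[i]
            if it % window == 0:                  # plot `averages` to see if cost is falling
                averages.append(np.mean(recent))
                recent = []
    return theta, averages
```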
5. Online learning
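Online learning applies one SGD-style update per example as it streams in and then discards the example; a minimal sketch, assuming a logistic-regression hypothesis and an illustrative alpha:

```python
import numpy as np

def online_update(theta, x, y, alpha=0.1):
    """Update theta from a single streamed example (x, y), which is then discarded."""
    h = 1.0 / (1.0 + np.exp(-(x @ theta)))   # predicted probability under logistic regression
    return theta - alpha * (h - y) * x

# usage: for each (x, y) arriving from the stream
#     theta = online_update(theta, x, y)
```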
6. Map-reduce and data parallelism
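Map-reduce splits the gradient summation across machines (or cores) and adds the partial sums back together; a single-machine simulation sketch, with num_machines as an assumed parameter:

```python
import numpy as np

def partial_gradient(X_part, y_part, theta):
    """Each machine computes the gradient sum over its own slice of the data ("map")."""
    return X_part.T @ (X_part @ theta - y_part)

def map_reduce_gradient_step(X, y, theta, alpha=0.01, num_machines=4):
    """Combine the partial sums ("reduce") and apply one batch gradient descent update."""
    m = X.shape[0]
    slices = np.array_split(np.arange(m), num_machines)
    total = sum(partial_gradient(X[s], y[s], theta) for s in slices)
    return theta - alpha * total / m
```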