Model Bias
- add more features
- make the model more complex, e.g. increase the depth of the network (see the sketch below)
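A minimal PyTorch sketch of both fixes; the layer sizes and the number of extra input features are illustrative assumptions, not values from the notes:

```python
import torch.nn as nn

shallow = nn.Sequential(nn.Linear(16, 1))  # likely too simple: high model bias

deeper = nn.Sequential(                    # more capacity:
    nn.Linear(24, 64), nn.ReLU(),          # 24 inputs = 16 old + 8 added features
    nn.Linear(64, 64), nn.ReLU(),          # extra hidden layers (deeper net)
    nn.Linear(64, 1),
)
```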
Optimization Issue
- if a deeper network does not obtain smaller loss on the training data, the problem is optimization, not model bias (a deeper network can always mimic a shallower one, so it should not do worse)
- problem: the gradient is close to zero (a critical point)
  local minima: in the high-dimensional spaces of real networks, the critical points you meet are rarely true local minima
  saddle points
  how to distinguish the two: check the eigenvalues of the Hessian at the critical point; all positive means a local minimum, mixed signs mean a saddle point (see the sketch below)
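A small numerical sketch of this test. The toy loss f(w1, w2) = w1^2 - w2^2 and the finite-difference Hessian helper are illustrative assumptions; at the origin the gradient is zero but the eigenvalues have mixed signs, so it is a saddle point:

```python
import numpy as np

def hessian(f, w, eps=1e-4):
    """Approximate the Hessian of f at w with central differences."""
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            wpp = w.copy(); wpp[i] += eps; wpp[j] += eps
            wpm = w.copy(); wpm[i] += eps; wpm[j] -= eps
            wmp = w.copy(); wmp[i] -= eps; wmp[j] += eps
            wmm = w.copy(); wmm[i] -= eps; wmm[j] -= eps
            H[i, j] = (f(wpp) - f(wpm) - f(wmp) + f(wmm)) / (4 * eps ** 2)
    return H

f = lambda w: w[0] ** 2 - w[1] ** 2              # toy loss, saddle at (0, 0)
eig = np.linalg.eigvalsh(hessian(f, np.zeros(2)))
if np.all(eig > 0):
    print("local minimum")
elif np.all(eig < 0):
    print("local maximum")
else:
    print("saddle point, eigenvalues:", eig)     # mixed signs
```

In the saddle-point case, the eigenvectors with negative eigenvalues point in directions along which the loss can still decrease, so a saddle point is escapable.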
Overfitting
- solutions (a code sketch of several of these follows the list)
  1 more training data
  2 data augmentation
  3 a more constrained model
    fewer parameters; sharing parameters (e.g. CNN)
    early stopping
    fewer features
    regularization
    dropout
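A sketch combining three of the remedies above (dropout, regularization via weight decay, early stopping) in PyTorch; the dummy data, model size, and hyperparameters are assumptions for illustration:

```python
import torch
import torch.nn as nn

# dummy data just to make the sketch runnable
x_tr, y_tr = torch.randn(256, 32), torch.randn(256, 1)
x_va, y_va = torch.randn(64, 32), torch.randn(64, 1)

model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Dropout(p=0.5),                      # dropout: randomly zero units in training
    nn.Linear(64, 1),
)
loss_fn = nn.MSELoss()
# weight_decay adds an L2 penalty on the weights (regularization)
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

best, bad, patience = float("inf"), 0, 5
for epoch in range(100):
    model.train()
    opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    opt.step()
    model.eval()
    with torch.no_grad():
        val = loss_fn(model(x_va), y_va).item()
    if val < best:
        best, bad = val, 0
    else:
        bad += 1
    if bad >= patience:                     # early stopping: quit once
        break                               # validation loss stops improving
```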
Mismatch
- the training and testing data have different distributions; unlike overfitting, this cannot be fixed by collecting more training data
Batch
- shuffle: reshuffle the data every epoch, so each epoch sees different batches (see the sketch after this list)
- small batch
  noisier gradients, which (perhaps surprisingly) give better performance on both training and testing data
  takes more time to finish one epoch
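A minimal sketch of both points in PyTorch; the tensor shapes and batch size are illustrative. With shuffle=True the DataLoader re-draws the order at the start of every epoch, so batch contents differ across epochs:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(1000, 16), torch.randn(1000, 1))
loader = DataLoader(data, batch_size=32, shuffle=True)  # small batches

for epoch in range(2):     # shuffle=True re-draws the order each epoch,
    for x, y in loader:    # so the batches differ between the two epochs
        pass               # the training step would go here
```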
Momentum
- Gradient Descent + Momentum: each update moves by a weighted sum of the previous movement and the current negative gradient, which can carry the parameters past small barriers and critical points (see the sketch below)
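A sketch of the update rule, assuming the usual formulation m_t = λ·m_{t-1} − η·g_t and θ_t = θ_{t-1} + m_t; the step sizes and the toy objective are illustrative:

```python
import numpy as np

def gd_momentum(grad, theta, eta=0.01, lam=0.9, steps=200):
    m = np.zeros_like(theta)             # movement starts at zero
    for _ in range(steps):
        m = lam * m - eta * grad(theta)  # blend previous movement with gradient
        theta = theta + m                # move by the blended direction
    return theta

# example: minimize f(x) = x^2, whose gradient is 2x; converges to ~0
print(gd_momentum(lambda t: 2.0 * t, np.array([5.0])))
```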
Error Surface
- the norm of the gradient is not small, yet the loss cannot decrease: with a fixed learning rate the update oscillates back and forth across the steep directions
- but if we simply reduce the learning rate, convergence along the flat directions becomes far too slow
- solution: adaptive learning rates
  a different learning rate for each parameter (as in Adagrad)
  a learning rate that also changes over time for the same parameter (as in RMSProp; see the sketch below)
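A sketch of both adaptations in the RMSProp style (my reading of these two bullets, not spelled out in the notes): each parameter keeps its own running scale sigma2, and the decay factor alpha makes recent gradients count more than old ones:

```python
import numpy as np

def adaptive_step(theta, g, sigma2, eta=0.1, alpha=0.9, eps=1e-8):
    # per parameter: each coordinate keeps its own running scale sigma2[i]
    # per time step: alpha < 1 weights recent gradients more than old ones
    sigma2 = alpha * sigma2 + (1 - alpha) * g ** 2
    theta = theta - eta * g / (np.sqrt(sigma2) + eps)
    return theta, sigma2

# example: f(x, y) = 100 x^2 + y^2, a steep/flat error surface
theta = np.array([1.0, 1.0])
sigma2 = np.zeros_like(theta)
for _ in range(200):
    g = np.array([200.0 * theta[0], 2.0 * theta[1]])
    theta, sigma2 = adaptive_step(theta, g, sigma2)
print(theta)  # both coordinates approach 0 despite very different gradient scales
```

Note that without decay the effective step stays near eta even at the end of training, which is one motivation for the scheduling below.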
Learning rate scheduling
- learning rate decay: as training goes on we get closer to the minimum, so the learning rate is gradually reduced
- warm up: increase the learning rate first, then decrease it (see the sketch below)
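A sketch of both schedules; the base rate, the 1/sqrt decay shape, and the warm-up length are illustrative assumptions:

```python
import math

def decay(step, base_lr=1e-3):
    return base_lr / math.sqrt(step + 1)          # shrink as training goes on

def warmup_then_decay(step, base_lr=1e-3, warmup=1000):
    if step < warmup:
        return base_lr * (step + 1) / warmup      # ramp up from near zero
    return base_lr * math.sqrt(warmup) / math.sqrt(step + 1)  # then decay

# rises during the first 1000 steps, then slowly falls
print(warmup_then_decay(0), warmup_then_decay(999), warmup_then_decay(4000))
```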
Summary
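Putting the section together, one compact way to write the combined update (the notation here is my own summary, assuming momentum, an RMS-based per-parameter scale, and a scheduled rate):

```latex
\theta_{t+1} = \theta_t - \frac{\eta_t}{\sigma_t}\, m_t
% m_t:      momentum, a weighted sum of past gradients (direction)
% \sigma_t: root mean square of past gradient magnitudes (per parameter)
% \eta_t:   scheduled learning rate (decay or warm up)
```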