Notes on Neural Network Hyperparameters.
What are Hyperparameters?
Most machine learning algorithms involve “hyperparameters”, which are variables set before actually optimizing the model’s parameters. Neural networks can have many hyperparameters, including those which specify the structure of the network itself and those which determine how the network is trained.
Setting the values of hyperparameters can be seen as model selection, i.e. choosing which model to use from the hypothesized set of possible models.
How to set Hyperparameters?
Hyperparameters are often:
- set by hand, based on experience;
- selected by a search algorithm, such as grid search or random search (see the sketch after this list);
- optimized by some “hyper-learner” (a hot research topic).
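As an illustration of the second option, here is a minimal random-search sketch. The `train_and_evaluate` function is a hypothetical stand-in for an actual training run, and the sampling ranges are illustrative assumptions, not recommendations.

```python
import random

def train_and_evaluate(lr, momentum):
    # Placeholder for a real training run that returns validation error.
    # A smooth fake response surface is used here so the sketch runs as-is.
    return (lr - 0.01) ** 2 + (momentum - 0.9) ** 2

best_error, best_config = float("inf"), None
for trial in range(20):
    # Sample the learning rate on a log scale, the momentum uniformly.
    lr = 10 ** random.uniform(-5, -1)
    momentum = random.uniform(0.5, 0.99)
    error = train_and_evaluate(lr, momentum)
    if error < best_error:
        best_error, best_config = error, (lr, momentum)

print("best config:", best_config, "error:", best_error)
```

Random search samples each trial independently, so it covers a continuous range of values rather than a fixed grid, which is often more sample-efficient when only a few hyperparameters really matter.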
Typical Hyperparameters
In particular, we will focus on feed-forward neural nets trained with mini-batch gradient descent.
Training Hyperparameters
- Learning rate: determines how quickly the parameter updates follow the gradient direction. If the learning rate is too small, the model converges too slowly; if it is too large, the model diverges.
- Momentum: a very common technique is to “smooth” the gradient updates using a leaky integrator filter with parameter $\beta$ (see the update sketch after this list).
- Loss function: compares the network’s output for a training example against the intended ground-truth output. A common general-purpose loss function is the squared Euclidean distance, given by $L = \frac{1}{2} \sum_i (y_i - z_i)^2$, where $y_i$ is the network’s output and $z_i$ is the target.
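To make these training hyperparameters concrete, here is a minimal numpy sketch of mini-batch gradient descent on a toy linear model, combining the squared Euclidean loss with a momentum (leaky integrator) buffer. The model, batch size, and hyperparameter values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mini-batch and a linear "network" y = x @ w (illustrative only).
x = rng.normal(size=(32, 10))          # mini-batch of 32 inputs
z = rng.normal(size=(32, 1))           # ground-truth targets
w = np.zeros((10, 1))                  # model parameters

lr = 0.1         # learning rate: step size along the (smoothed) gradient
beta = 0.9       # momentum: leaky integrator coefficient
g_bar = np.zeros_like(w)               # smoothed gradient buffer

for step in range(100):
    y = x @ w                          # network output
    loss = 0.5 * np.sum((y - z) ** 2)  # L = 1/2 * sum_i (y_i - z_i)^2
    grad = x.T @ (y - z)               # dL/dw for the squared loss
    # Leaky integrator: blend the new gradient into the running average.
    g_bar = beta * g_bar + (1 - beta) * grad
    w -= lr * g_bar                    # follow the smoothed gradient

print("final loss:", loss)
```

Note that the gradient here is summed over the mini-batch; many implementations average it instead, which simply rescales the effective learning rate by the batch size.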