LSTM overview
- This is an excellent explanation of the idea behind the LSTM and its variants.
Experiment
- Vanilla LSTM
- Tested modifications (a forward-step sketch of the vanilla cell with the CIFG and NP toggles follows this list):
- No Input Gate (NIG)
- No Forget Gate (NFG)
- No Output Gate (NOG)
- No Input Activation Function (NIAF)
- No Output Activation Function (NOAF)
- No Peepholes (NP)
- Coupled Input and Forget Gate (CIFG)
- Full Gate Recurrence (FGR)
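The sketch below shows one forward step of a vanilla (peephole) LSTM block in plain NumPy, with two toggles corresponding to variants from the list above: CIFG (forget gate tied to 1 − input gate) and NP (peepholes removed). The function and parameter names (`lstm_step`, `init_params`, etc.) are illustrative choices, not code from the paper.

```python
# Minimal sketch of one forward step of a vanilla (peephole) LSTM block,
# with toggles for the CIFG and NP variants. Names are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_in, n_hidden, rng):
    """Randomly initialise one LSTM block's weights (illustrative values)."""
    p = {}
    for gate in ("z", "i", "f", "o"):            # block input, input/forget/output gates
        p["W_" + gate] = rng.normal(0.0, 0.1, size=(n_hidden, n_in))      # input weights
        p["R_" + gate] = rng.normal(0.0, 0.1, size=(n_hidden, n_hidden))  # recurrent weights
        p["b_" + gate] = np.zeros(n_hidden)                               # biases
    for gate in ("i", "f", "o"):                 # peephole weights (cell state -> gates)
        p["p_" + gate] = np.zeros(n_hidden)
    return p

def lstm_step(p, x, y_prev, c_prev, coupled_gates=False, use_peepholes=True):
    """One time step: returns the new block output y and cell state c."""
    peep = (lambda name, c: p["p_" + name] * c) if use_peepholes else (lambda name, c: 0.0)
    z = np.tanh(p["W_z"] @ x + p["R_z"] @ y_prev + p["b_z"])                      # block input
    i = sigmoid(p["W_i"] @ x + p["R_i"] @ y_prev + peep("i", c_prev) + p["b_i"])  # input gate
    if coupled_gates:                                                             # CIFG variant
        f = 1.0 - i
    else:
        f = sigmoid(p["W_f"] @ x + p["R_f"] @ y_prev + peep("f", c_prev) + p["b_f"])
    c = z * i + c_prev * f                                                        # cell state
    o = sigmoid(p["W_o"] @ x + p["R_o"] @ y_prev + peep("o", c) + p["b_o"])       # output gate
    y = np.tanh(c) * o                                                            # block output
    return y, c

rng = np.random.default_rng(0)
params = init_params(n_in=4, n_hidden=8, rng=rng)
y, c = np.zeros(8), np.zeros(8)
for x in rng.normal(size=(5, 4)):                # run 5 steps on random input
    y, c = lstm_step(params, x, y, c, coupled_gates=True, use_peepholes=False)
print(y.shape, c.shape)
```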
- Hyperparameter Search (random search over the ranges below; a sampling sketch follows this list)
- number of LSTM blocks per hidden layer: log-uniform samples from [20, 200]
- learning rate: log-uniform samples from [10^-6, 10^-2]
- momentum: 1 − log-uniform samples from [0.01, 1.0]
- standard deviation of Gaussian input noise: uniform samples from [0, 1]
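One random-search trial drawn from these distributions could look like the following; the helper names (`log_uniform`, `sample_trial`) and the use of Python's standard library are my own choices, not the paper's setup.

```python
# Sketch of drawing one random-search trial from the distributions above.
import math
import random

def log_uniform(low, high, rng):
    """Sample so that the logarithm of the value is uniform on [log low, log high]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_trial(rng):
    return {
        "n_blocks": round(log_uniform(20, 200, rng)),    # LSTM blocks per hidden layer
        "learning_rate": log_uniform(1e-6, 1e-2, rng),   # log-uniform on [1e-6, 1e-2]
        "momentum": 1.0 - log_uniform(0.01, 1.0, rng),   # 1 - log-uniform on [0.01, 1.0]
        "input_noise_std": rng.uniform(0.0, 1.0),        # std of Gaussian input noise
    }

rng = random.Random(0)
for _ in range(3):
    print(sample_trial(rng))
```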
- Tested datasets
- TIMIT Speech corpus (speech recognition)
- IAM Online Handwriting Database (handwriting recognition)
- JSB Chorales (music modeling)
- Conclusions
- Vanilla LSTM performs well. Coupling the input and forget gates (CIFG) and removing peephole connections (NP) are worth trying.
- Do not remove the forget gate or the output activation function.
- Learning rate is the most important parameter. Momentum is unimportant for LSTM. Gaussian noise on input may hurt.
- Hyperparameters can be tuned largely independently, so adjusting them one at a time works (see the sketch after this list).
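Since the hyperparameters are reported to be largely independent, a simple practical recipe is to tune them one at a time, starting with the learning rate on a small network. The sketch below assumes a hypothetical `train_and_evaluate` function standing in for a full training run; it returns a dummy score here so the example runs end to end.

```python
# Rough sketch of tuning hyperparameters one at a time, learning rate first,
# exploiting their approximate independence. train_and_evaluate is a
# hypothetical stand-in for a real training run returning a validation score.
def train_and_evaluate(config):
    # Replace with an actual training run; a dummy score keeps the sketch runnable.
    return 0.0

def tune_one(base, key, candidates):
    """Try each candidate value for a single hyperparameter, keep the best."""
    best_value, best_score = candidates[0], float("-inf")
    for value in candidates:
        score = train_and_evaluate({**base, key: value})
        if score > best_score:
            best_value, best_score = value, score
    return {**base, key: best_value}

config = {"n_blocks": 20, "learning_rate": 1e-3, "momentum": 0.9, "input_noise_std": 0.0}
# Tune the learning rate first (the most important parameter) with a small
# network, then move on to the network size.
config = tune_one(config, "learning_rate", [1e-5, 1e-4, 1e-3, 1e-2])
config = tune_one(config, "n_blocks", [50, 100, 200])
print(config)
```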
This article examines how the LSTM and several of its variants perform during training, and shows experimentally how removing the input and forget gates affects model quality. It also studies how different hyperparameter settings influence performance, finding the learning rate to be the most critical parameter.