LSTM overview
- This is an excellent explanation of the idea behind the LSTM and its variants.
Experiment
- Vanilla LSTM
- Tested modifications (a forward-step sketch of the vanilla cell with the CIFG and NP toggles follows this list):
- No Input Gate (NIG)
- No Forget Gate (NFG)
- No Output Gate (NOG)
- No Input Activation Function (NIAF)
- No Output Activation Function (NOAF)
- No Peepholes (NP)
- Coupled Input and Forget Gate (CIFG)
- Full Gate Recurrence (FGR)
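The sketch below shows one forward step of a vanilla (peephole) LSTM block in plain NumPy, with two toggles corresponding to variants from the list above: CIFG (forget gate tied to 1 − input gate) and NP (peepholes removed). The function and parameter names (`lstm_step`, `init_params`, etc.) are illustrative choices, not code from the paper.

```python
# Minimal sketch of one forward step of a vanilla (peephole) LSTM block,
# with toggles for the CIFG and NP variants. Names are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_in, n_hidden, rng):
    """Randomly initialise one LSTM block's weights (illustrative values)."""
    p = {}
    for gate in ("z", "i", "f", "o"):            # block input, input/forget/output gates
        p["W_" + gate] = rng.normal(0.0, 0.1, size=(n_hidden, n_in))      # input weights
        p["R_" + gate] = rng.normal(0.0, 0.1, size=(n_hidden, n_hidden))  # recurrent weights
        p["b_" + gate] = np.zeros(n_hidden)                               # biases
    for gate in ("i", "f", "o"):                 # peephole weights (cell state -> gates)
        p["p_" + gate] = np.zeros(n_hidden)
    return p

def lstm_step(p, x, y_prev, c_prev, coupled_gates=False, use_peepholes=True):
    """One time step: returns the new block output y and cell state c."""
    peep = (lambda name, c: p["p_" + name] * c) if use_peepholes else (lambda name, c: 0.0)
    z = np.tanh(p["W_z"] @ x + p["R_z"] @ y_prev + p["b_z"])                      # block input
    i = sigmoid(p["W_i"] @ x + p["R_i"] @ y_prev + peep("i", c_prev) + p["b_i"])  # input gate
    if coupled_gates:                                                             # CIFG variant
        f = 1.0 - i
    else:
        f = sigmoid(p["W_f"] @ x + p["R_f"] @ y_prev + peep("f", c_prev) + p["b_f"])
    c = z * i + c_prev * f                                                        # cell state
    o = sigmoid(p["W_o"] @ x + p["R_o"] @ y_prev + peep("o", c) + p["b_o"])       # output gate
    y = np.tanh(c) * o                                                            # block output
    return y, c

rng = np.random.default_rng(0)
params = init_params(n_in=4, n_hidden=8, rng=rng)
y, c = np.zeros(8), np.zeros(8)
for x in rng.normal(size=(5, 4)):                # run 5 steps on random input
    y, c = lstm_step(params, x, y, c, coupled_gates=True, use_peepholes=False)
print(y.shape, c.shape)
```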
- Hyperparameter Search (random search over the ranges below; a sampling sketch follows this list)
- number of LSTM blocks per hidden layer: log-uniform samples from [20, 200]
- learning rate: log-uniform samples from [10^-6, 10^-2]
- momentum: 1 − log-uniform samples from [0.01, 1.0]
- standard deviation of Gaussian input noise: uniform samples from [0, 1]
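One random-search trial drawn from these distributions could look like the following; the helper names (`log_uniform`, `sample_trial`) and the use of Python's standard library are my own choices, not the paper's setup.

```python
# Sketch of drawing one random-search trial from the distributions above.
import math
import random

def log_uniform(low, high, rng):
    """Sample so that the logarithm of the value is uniform on [log low, log high]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_trial(rng):
    return {
        "n_blocks": round(log_uniform(20, 200, rng)),    # LSTM blocks per hidden layer
        "learning_rate": log_uniform(1e-6, 1e-2, rng),   # log-uniform on [1e-6, 1e-2]
        "momentum": 1.0 - log_uniform(0.01, 1.0, rng),   # 1 - log-uniform on [0.01, 1.0]
        "input_noise_std": rng.uniform(0.0, 1.0),        # std of Gaussian input noise
    }

rng = random.Random(0)
for _ in range(3):
    print(sample_trial(rng))
```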
- Tested datasets
- TIMIT Speech corpus (speech recognition)
- IAM Online Handwriting Database (handwriting recognition)
- JSB Chorales (music modeling)
- Conclusions
- Vanilla LSTM performs well. Coupling the input and forget gates (CIFG) and removing peephole connections (NP) are worth trying.
- Do not remove the forget gate or the output activation function.
- Learning rate is the most important parameter. Momentum is unimportant for LSTM. Gaussian noise on input may hurt.
- Hyperparameters can be tuned largely independently, so adjusting them one at a time works (see the sketch after this list).
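Since the hyperparameters are reported to be largely independent, a simple practical recipe is to tune them one at a time, starting with the learning rate on a small network. The sketch below assumes a hypothetical `train_and_evaluate` function standing in for a full training run; it returns a dummy score here so the example runs end to end.

```python
# Rough sketch of tuning hyperparameters one at a time, learning rate first,
# exploiting their approximate independence. train_and_evaluate is a
# hypothetical stand-in for a real training run returning a validation score.
def train_and_evaluate(config):
    # Replace with an actual training run; a dummy score keeps the sketch runnable.
    return 0.0

def tune_one(base, key, candidates):
    """Try each candidate value for a single hyperparameter, keep the best."""
    best_value, best_score = candidates[0], float("-inf")
    for value in candidates:
        score = train_and_evaluate({**base, key: value})
        if score > best_score:
            best_value, best_score = value, score
    return {**base, key: best_value}

config = {"n_blocks": 20, "learning_rate": 1e-3, "momentum": 0.9, "input_noise_std": 0.0}
# Tune the learning rate first (the most important parameter) with a small
# network, then move on to the network size.
config = tune_one(config, "learning_rate", [1e-5, 1e-4, 1e-3, 1e-2])
config = tune_one(config, "n_blocks", [50, 100, 200])
print(config)
```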
This article examines how the LSTM and several of its variants perform during training, and shows experimentally how removing the input and forget gates affects model quality. It also studies how different hyperparameter settings influence performance, finding the learning rate to be the most critical parameter.