1. Deep Learning Reading List (http://deeplearning.net/reading-list/)
Review Papers
- Representation Learning: A Review and New Perspectives, Yoshua Bengio, Aaron Courville, Pascal Vincent, arXiv, 2012.
- The monograph or review paper Learning Deep Architectures for AI (Foundations & Trends in Machine Learning, 2009).
- Deep Machine Learning – A New Frontier in Artificial Intelligence Research – a survey paper by Itamar Arel, Derek C. Rose, and Thomas P. Karnowski.
- Graves, A. (2012). Supervised sequence labelling with recurrent neural networks (Vol. 385). Springer.
Computer Vision
- ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton, NIPS 2012.
- Learning Hierarchical Features for Scene Labeling, Clement Farabet, Camille Couprie, Laurent Najman and Yann LeCun, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
- Learning Convolutional Feature Hierarchies for Visual Recognition, Koray Kavukcuoglu, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Michaël Mathieu and Yann LeCun, Advances in Neural Information Processing Systems (NIPS 2010), 23, 2010.
- Graves, Alex, et al. “A novel connectionist system for unconstrained handwriting recognition.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 31.5 (2009): 855-868.
- Cireşan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J. (2010). Deep, big, simple neural nets for handwritten digit recognition. Neural Computation, 22(12), 3207-3220.
- Ciresan, Dan, Ueli Meier, and Jürgen Schmidhuber. “Multi-column deep neural networks for image classification.” Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
- Ciresan, D., Meier, U., Masci, J., & Schmidhuber, J. (2011, July). A committee of neural networks for traffic sign classification. In Neural Networks (IJCNN), The 2011 International Joint Conference on (pp. 1918-1921). IEEE.
NLP and Speech
- Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing, Antoine Bordes, Xavier Glorot, Jason Weston and Yoshua Bengio (2012), in: Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS)
- Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. Socher, R., Huang, E. H., Pennington, J., Ng, A. Y., and Manning, C. D. (2011a). In NIPS’2011.
- Semi-supervised recursive autoencoders for predicting sentiment distributions. Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., and Manning, C. D. (2011b). In EMNLP’2011.
- Mikolov Tomáš: Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012.
- Graves, Alex, and Jürgen Schmidhuber. “Framewise phoneme classification with bidirectional LSTM and other neural network architectures.” Neural Networks 18.5 (2005): 602-610.
Disentangling Factors and Variations with Depth
- Goodfellow, Ian, et al. “Measuring invariances in deep networks.” Advances in Neural Information Processing Systems 22 (2009): 646-654.
- Bengio, Yoshua, et al. “Better Mixing via Deep Representations.” arXiv preprint arXiv:1207.4404 (2012).
- Xavier Glorot, Antoine Bordes and Yoshua Bengio, Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach, in: Proceedings of the Twenty-eighth International Conference on Machine Learning (ICML’11), pages 97-110, 2011.
Transfer Learning and Domain Adaptation
- Raina, Rajat, et al. “Self-taught learning: transfer learning from unlabeled data.” Proceedings of the 24th International Conference on Machine Learning. ACM, 2007.
- Xavier Glorot, Antoine Bordes and Yoshua Bengio, Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach, in: Proceedings of the Twenty-eighth International Conference on Machine Learning (ICML’11), pages 97-110, 2011.
- R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu and P. Kuksa. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12:2493-2537, 2011.
- Mesnil, Grégoire, et al. “Unsupervised and transfer learning challenge: a deep learning approach.” Unsupervised and Transfer Learning Workshop, in conjunction with ICML, 2011.
- Ciresan, D. C., Meier, U., & Schmidhuber, J. (2012, June). Transfer learning for Latin and Chinese characters with deep neural networks. In Neural Networks (IJCNN), The 2012 International Joint Conference on (pp. 1-6). IEEE.
Practical Tricks and Guides
- Hinton, Geoffrey E., et al. “Improving neural networks by preventing co-adaptation of feature detectors.” arXiv preprint arXiv:1207.0580 (2012).
- Practical recommendations for gradient-based training of deep architectures, Yoshua Bengio, U. Montreal, arXiv report 1206.5533, Lecture Notes in Computer Science Volume 7700, Neural Networks: Tricks of the Trade Second Edition, Editors: Grégoire Montavon, Geneviève B. Orr, Klaus-Robert Müller, 2012.
- A practical guide to training Restricted Boltzmann Machines, by Geoffrey Hinton.
Sparse Coding
- Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Bruno Olshausen and David Field, Nature, 1996.
- Kavukcuoglu, Koray, Marc’Aurelio Ranzato, and Yann LeCun. “Fast inference in sparse coding algorithms with applications to object recognition.” arXiv preprint arXiv:1010.3467 (2010).
- Goodfellow, Ian, Aaron Courville, and Yoshua Bengio. “Large-Scale Feature Learning With Spike-and-Slab Sparse Coding.” ICML 2012.
- Efficient sparse coding algorithms. Honglak Lee, Alexis Battle, Rajat Raina and Andrew Y. Ng. In NIPS 19, 2007.
- Olshausen, Bruno A., and David J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?” Vision Research 37.23 (1997): 3311-3326.
Foundation Theory and Motivation
- Hinton, Geoffrey E. “Deterministic Boltzmann learning performs steepest descent in weight-space.” Neural Computation 1.1 (1989): 143-150.
- Bengio, Yoshua, and Samy Bengio. “Modeling high-dimensional discrete data with multi-layer neural networks.” Advances in Neural Information Processing Systems 12 (2000): 400-406.
- Bengio, Yoshua, et al. “Greedy layer-wise training of deep networks.” Advances in Neural Information Processing Systems 19 (2007): 153.
- Bengio, Yoshua, Martin Monperrus, and Hugo Larochelle. “Nonlocal estimation of manifold structure.” Neural Computation 18.10 (2006): 2509-2528.
- Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. “Reducing the dimensionality of data with neural networks.” Science 313.5786 (2006): 504-507.
- Ranzato, Marc’Aurelio, Y-Lan Boureau, and Yann LeCun. “Sparse feature learning for deep belief networks.” Advances in Neural Information Processing Systems 20 (2007): 1185-1192.
- Bengio, Yoshua, and Yann LeCun. “Scaling learning algorithms towards AI.” Large-Scale Kernel Machines 34 (2007).
- Le Roux, Nicolas, and Yoshua Bengio. “Representational power of restricted Boltzmann machines and deep belief networks.” Neural Computation 20.6 (2008): 1631-1649.
- Sutskever, Ilya, and Geoffrey Hinton. “Temporal-Kernel Recurrent Neural Networks.” Neural Networks 23.2 (2010): 239-243.
- Le Roux, Nicolas, and Yoshua Bengio. “Deep belief networks are compact universal approximators.” Neural Computation 22.8 (2010): 2192-2207.
- Bengio, Yoshua, and Olivier Delalleau. “On the expressive power of deep architectures.” Algorithmic Learning Theory. Springer Berlin/Heidelberg, 2011.
- Montufar, Guido F., and Jason Morton. “When Does a Mixture of Products Contain a Product of Mixtures?” arXiv preprint arXiv:1206.0387 (2012).
Classification
- The Manifold Tangent Classifier, Salah Rifai, Yann Dauphin, Pascal Vincent, Yoshua Bengio and Xavier Muller, in: NIPS’2011.
- Gens, Robert, and Pedro Domingos. “Discriminative Learning of Sum-Product Networks.” NIPS 2012 (Best Student Paper).
Large Scale Deep Learning
- Building High-level Features Using Large Scale Unsupervised Learning, Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng, ICML 2012.
- Bengio, Yoshua, et al. “Neural probabilistic language models.” Innovations in Machine Learning (2006): 137-186. Section 3 of this paper specifically discusses asynchronous SGD.
- Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013). Maxout networks. Technical report, Université de Montréal.
Recurrent Networks
- Training Recurrent Neural Networks, Ilya Sutskever, PhD Thesis, 2012.
- Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. “Learning long-term dependencies with gradient descent is difficult.” Neural Networks, IEEE Transactions on 5.2 (1994): 157-166.
- Mikolov Tomáš: Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012.
- Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9.8 (1997): 1735-1780.
- Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies.
- Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2), 234-242.
- Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006, June). Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd international conference on Machine learning (pp. 369-376). ACM.
Hyper Parameters
- “Practical Bayesian Optimization of Machine Learning Algorithms”, Jasper Snoek, Hugo Larochelle, Ryan Adams, NIPS 2012.
- Random Search for Hyper-Parameter Optimization, James Bergstra and Yoshua Bengio (2012), in: Journal of Machine Learning Research, 13:281-305.
- Algorithms for Hyper-Parameter Optimization, James Bergstra, Rémy Bardenet, Yoshua Bengio and Balázs Kégl, in: NIPS’2011, 2011.
Optimization
- Training Deep and Recurrent Neural Networks with Hessian-Free Optimization, James Martens and Ilya Sutskever, Neural Networks: Tricks of the Trade, 2012.
- Schaul, Tom, Sixin Zhang, and Yann LeCun. “No More Pesky Learning Rates.” arXiv preprint arXiv:1206.1106 (2012).
- Le Roux, Nicolas, Pierre-Antoine Manzagol, and Yoshua Bengio. “Topmoumoute online natural gradient algorithm.” Neural Information Processing Systems (NIPS). 2007.
- Bordes, Antoine, Léon Bottou, and Patrick Gallinari. “SGD-QN: Careful quasi-Newton stochastic gradient descent.” The Journal of Machine Learning Research 10 (2009): 1737-1754.
- Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’10). Society for Artificial Intelligence and Statistics, 2010.
- Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. “Deep Sparse Rectifier Neural Networks.” Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP Volume 15, 2011.
- Martens, James. “Deep learning via Hessian-free optimization.” Proceedings of the 27th International Conference on Machine Learning (ICML). Vol. 951, 2010.
- Hochreiter, Sepp, and Jürgen Schmidhuber. “Flat minima.” Neural Computation, 9.1 (1997): 1-42.
Unsupervised Feature Learning
- Salakhutdinov, Ruslan, and Geoffrey E. Hinton. “Deep Boltzmann machines.” Proceedings of the International Conference on Artificial Intelligence and Statistics. Vol. 5, No. 2. Cambridge, MA: MIT Press, 2009.
- Scholarpedia page on Deep Belief Networks.
Deep Boltzmann Machines
- An Efficient Learning Procedure for Deep Boltzmann Machines, Ruslan Salakhutdinov and Geoffrey Hinton, Neural Computation, August 2012, Vol. 24, No. 8: 1967-2006.
- Montavon, Grégoire, and Klaus-Robert Müller. “Deep Boltzmann Machines and the Centering Trick.” Neural Networks: Tricks of the Trade (2012): 621-637.
- Salakhutdinov, Ruslan, and Hugo Larochelle. “Efficient learning of deep Boltzmann machines.” International Conference on Artificial Intelligence and Statistics, 2010.
- Salakhutdinov, Ruslan. Learning deep generative models. Diss. University of Toronto, 2009.
RBMs
- Large-Scale Feature Learning With Spike-and-Slab Sparse Coding, Ian Goodfellow, Aaron Courville and Yoshua Bengio, in: ICML’2012
- Unsupervised Models of Images by Spike-and-Slab RBMs, Aaron Courville, James Bergstra and Yoshua Bengio, in: ICML’2011
Autoencoders
- Regularized Auto-Encoders Estimate Local Statistics, Guillaume Alain, Yoshua Bengio and Salah Rifai, Université de Montréal, arXiv report 1211.4246, 2012
- A Generative Process for Sampling Contractive Auto-Encoders, Salah Rifai, Yoshua Bengio, Yann Dauphin and Pascal Vincent, in: ICML’2012, Edinburgh, Scotland, U.K., 2012
- Contractive Auto-Encoders: Explicit invariance during feature extraction, Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot and Yoshua Bengio, in: ICML’2011
- Disentangling factors of variation for facial expression recognition, Salah Rifai, Yoshua Bengio, Aaron Courville, Pascal Vincent and Mehdi Mirza, in: ECCV’2012.
- Vincent, Pascal, et al. “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.” The Journal of Machine Learning Research 11 (2010): 3371-3408.
- Vincent, Pascal. “A connection between score matching and denoising autoencoders.” Neural computation 23.7 (2011): 1661-1674.
- Chen, Minmin, et al. “Marginalized denoising autoencoders for domain adaptation.” arXiv preprint arXiv:1206.4683 (2012).
Miscellaneous
- The ICML 2009 Workshop on Learning Feature Hierarchies webpage has a reading list.
- Stanford’s UFLDL Recommended Readings.
- The LISA public wiki has a reading list and a bibliography.
- Geoff Hinton has readings from his NIPS 2007 tutorial.
- The LISA publications database contains a deep architectures category.
- A very brief introduction to AI, Machine Learning, and Deep Learning in Yoshua Bengio’s IFT6266 graduate class.
2. Nine Landmark Deep Learning Papers (reposted from http://download.youkuaiyun.com/detail/luoyun614/6718113)
A Fast Learning Algorithm for Deep Belief Nets (2006)
- First proposed the layerwise greedy pretraining method, opening up the deep learning field. Layerwise-pretrained Restricted Boltzmann Machines (RBMs) are stacked to form a Deep Belief Network (DBN); labels are added when training the topmost RBM. The whole DBN is then fine-tuned. Tested on MNIST, it showed no severe overfitting and achieved a lower test error than a plain Neural Network (NN). The layerwise procedure is sketched below.
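To make the layerwise idea concrete, here is a minimal numpy sketch of greedy pretraining with CD-1, assuming Bernoulli units throughout; the layer sizes, learning rate, and the `cd1_step` helper are illustrative choices, not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.05):
    """One CD-1 update for a Bernoulli RBM (modifies W, b, c in place)."""
    h0_prob = sigmoid(v0 @ W + c)                     # positive phase
    h0 = (rng.random(h0_prob.shape) < h0_prob) * 1.0  # sample hidden units
    v1_prob = sigmoid(h0 @ W.T + b)                   # one Gibbs step back
    h1_prob = sigmoid(v1_prob @ W + c)
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
    b += lr * (v0 - v1_prob).mean(axis=0)
    c += lr * (h0_prob - h1_prob).mean(axis=0)

# Greedy layerwise pretraining: each RBM models the layer below it.
data = (rng.random((100, 784)) < 0.5) * 1.0   # stand-in for binarized MNIST
sizes, inp, stack = [784, 500, 250], data, []
for n_vis, n_hid in zip(sizes[:-1], sizes[1:]):
    W = rng.normal(0.0, 0.01, (n_vis, n_hid))
    b, c = np.zeros(n_vis), np.zeros(n_hid)
    for _ in range(10):                       # a few CD-1 sweeps per layer
        cd1_step(inp, W, b, c)
    stack.append((W, b, c))
    inp = sigmoid(inp @ W + c)                # activations feed the next RBM
```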
Reducing the Dimensionality of Data with Neural Networks (2006)
- Proposed the deep autoencoder, published in Science as a dimensionality-reduction method. An autoencoder is a class of algorithms that learn to encode and decode training data adaptively by minimizing reconstruction error on the training set. The deep autoencoder trains a stack of RBMs layer by layer with the Contrastive Divergence (CD) algorithm to reconstruct the input, then stacks them and fine-tunes the whole network to minimize reconstruction error (see the sketch below). As a nonlinear dimensionality-reduction method it clearly outperforms traditional methods in image and text experiments.
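A sketch of the "unrolling" step this entry describes, reusing a `stack` of pretrained `(W, b, c)` triples like the one built in the previous snippet. This is an illustrative layout assuming tied encoder/decoder weights; real fine-tuning would backpropagate the reconstruction error through the whole unrolled network:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def unroll(stack):
    """Turn a stack of pretrained RBMs into an encoder/decoder pair."""
    def encode(x):
        for W, _, c in stack:            # bottom-up pass
            x = sigmoid(x @ W + c)
        return x

    def decode(h):
        for W, b, _ in reversed(stack):  # top-down pass with W transposed
            h = sigmoid(h @ W.T + b)
        return h

    return encode, decode

def recon_error(x, encode, decode):
    """Reconstruction error that fine-tuning would minimize end to end."""
    return np.mean((x - decode(encode(x))) ** 2)
```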
Learning Deep Architectures for AI (2009)
- Bengio's tutorial on deep learning: it gives a detailed introduction to everything from the research background to RBMs and CD to several deep learning algorithms, with an extensive reference list. Its corresponding drawback is that it is very long.
A Practical Guide to Training Restricted Boltzmann Machines (2010)
- If you want to implement deep learning algorithms yourself, this one is a must-read. I once tried writing my own implementation and got poor results; only after reading this did I realize how many important details the implementation involves. Reading it side by side with code found online also makes that code easier to understand.
Greedy Layer-Wise Training of Deep Networks (2007)
- Some extensions of the DBN, such as applying it to real-valued inputs. Based on experiments, proposes an explanation for the performance of deep learning.
Why Does Unsupervised Pre-training Help Deep Learning? (2010)
- Summarizes two explanations of what pretraining contributes to deep learning: regularization and helping optimization. Designs experiments to verify the role of each factor.
Autoencoders, Unsupervised Learning, and Deep Architectures (2011)
- An attempt at a unified theoretical analysis of different kinds of autoencoders.
On the Quantitative Analysis of Deep Belief Networks (2008)
- Uses annealed importance sampling (AIS) to give a method for estimating the partition function of an RBM, which makes it possible to estimate p(x) and to compare different DBNs. A rough sketch follows below.
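A rough sketch of the AIS estimator described here, assuming a Bernoulli RBM and annealing from the zero-parameter base model whose partition function is known exactly; the function names and the linear beta schedule are illustrative simplifications, not the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def log_f(v, W, b, c, beta):
    """Unnormalized log-marginal of an RBM with parameters scaled by beta."""
    return beta * (v @ b) + np.logaddexp(0.0, beta * (v @ W + c)).sum(axis=1)

def ais_log_z(W, b, c, n_chains=100, n_betas=1000):
    """AIS estimate of log Z, annealing from the zero-parameter base RBM
    (whose partition function is exactly 2 ** (n_vis + n_hid))."""
    n_vis, n_hid = W.shape
    betas = np.linspace(0.0, 1.0, n_betas)
    v = (rng.random((n_chains, n_vis)) < 0.5) * 1.0  # exact base samples
    log_w = np.zeros(n_chains)
    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        log_w += log_f(v, W, b, c, beta) - log_f(v, W, b, c, beta_prev)
        # One Gibbs step under the intermediate model at temperature beta.
        h = (rng.random((n_chains, n_hid)) < sigmoid(beta * (v @ W + c))) * 1.0
        v = (rng.random((n_chains, n_vis)) < sigmoid(beta * (h @ W.T + b))) * 1.0
    log_z_base = (n_vis + n_hid) * np.log(2.0)
    shift = log_w.max()                              # stable log-mean-exp
    return log_z_base + shift + np.log(np.mean(np.exp(log_w - shift)))
```

With log Z in hand, log p(x) of a test point is just its unnormalized log-probability minus log Z, which is what enables the model comparisons the paper reports.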
Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient (2008)
- Proposes the persistent contrastive divergence (PCD) algorithm to approximate the maximum likelihood estimation objective, which yields a better generative model. The traditional CD algorithm does not aim to maximize p(x); another paper proves that CD does not correspond to optimizing any objective function. The PCD update is sketched below.
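A sketch of the PCD update in the same numpy style as the earlier snippets, assuming a Bernoulli RBM; the key difference from CD is that the fantasy chain `chain` persists across parameter updates instead of being re-initialized at the data:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def pcd_step(v0, chain, W, b, c, lr=0.05, k=1):
    """One PCD update for a Bernoulli RBM; returns the updated chain."""
    h_data = sigmoid(v0 @ W + c)           # positive phase from the data
    v = chain                              # negative phase: NOT reset to v0
    for _ in range(k):                     # k Gibbs steps on the chain
        h = (rng.random((len(v), W.shape[1])) < sigmoid(v @ W + c)) * 1.0
        v = (rng.random((len(v), W.shape[0])) < sigmoid(h @ W.T + b)) * 1.0
    h_model = sigmoid(v @ W + c)
    W += lr * (v0.T @ h_data / len(v0) - v.T @ h_model / len(v))
    b += lr * (v0.mean(axis=0) - v.mean(axis=0))
    c += lr * (h_data.mean(axis=0) - h_model.mean(axis=0))
    return v
```

Because the chain keeps mixing while the parameters move slowly, its samples stay close to the current model distribution, giving a better approximation to the true likelihood gradient than restarting at the data each step.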