From Quora.
1. RNNs do not make the Markov assumption, so in theory they can take long-term dependencies into account when modeling natural language.
But training an RNN also runs into the vanishing gradient problem. How is that addressed? By using an LSTM?
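Yes: gated architectures such as the LSTM (and GRU) are the standard remedy, since the additive cell state gives gradients a path that is not repeatedly squashed by the recurrence; gradient clipping is the usual companion fix for the exploding case. Below is a minimal probe of the effect, assuming PyTorch is available. The hidden size and sequence length here are arbitrary illustrative choices, and the exact magnitudes depend on initialization, but the LSTM typically lets noticeably more gradient reach the early time steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, dim = 100, 1, 32  # illustrative sizes, not tuned

for name, cell in [("vanilla RNN", nn.RNN(dim, dim)),
                   ("LSTM", nn.LSTM(dim, dim))]:
    x = torch.randn(seq_len, batch, dim, requires_grad=True)
    out, _ = cell(x)          # out: (seq_len, batch, dim)
    out[-1].sum().backward()  # loss depends only on the last time step
    # Gradient norm at t=0: a rough proxy for how far credit flows back.
    print(f"{name:11s} grad norm at t=0: {x.grad[0].norm().item():.3e}")
```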
2. The main advantages of using a recurrent neural network over Markov chains and hidden Markov models are the greater representational power of neural networks and their ability to perform intelligent smoothing by taking syntactic and semantic features into account (see for example Turian et al.). By comparison, n-gram models have a number of parameters that explodes with the vocabulary size and with n, and they rely on simple smoothing techniques like Kneser–Ney or Good–Turing. I would add (for what it's worth) that I think of the hidden Markov model vs. recurrent neural network "battle" as being similar to the mixture model vs. feed-forward neural network "battle".
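To make the parameter-explosion point concrete, here is a back-of-the-envelope count, assuming a 10,000-word vocabulary and a 256-unit recurrent language model (both numbers are illustrative, not from the answer above):

```python
V, H = 10_000, 256  # assumed vocabulary size and hidden size

# An n-gram table has up to V**n entries; in practice it is extremely
# sparse, which is exactly why smoothing (Kneser-Ney etc.) is needed.
for n in (2, 3, 5):
    print(f"{n}-gram table upper bound: {V**n:.1e} entries")

# A simple RNN LM: input embedding, recurrent weights, output softmax.
rnn_params = V*H + (H*H + H*H + 2*H) + (H*V + V)
print(f"RNN LM parameters:        {rnn_params:.1e}")
```

The RNN's parameter count grows only linearly in V and does not grow with the context length at all, whereas the n-gram table grows exponentially in n.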