oxford-deepNLP
- L2a Word Level Semantics
- L3 Language Modeling and RNNs I
- L4 Language Modeling and RNNs II
- L5 Text Classification
- L6 RNNs and GPUs
- L7 Conditional Language Modeling
- L8 Conditional Language Modeling with Attention
- L9 Speech Recognition
- L10 Text to Speech
- L11 Question Answering
- L12 Memory Lecture
- L13 Linguistics
L2a Word Level Semantics
(Word2Vec is implicitly factorising a (shifted) PMI matrix built from co-occurrence counts, which links it to the count-based models)
Count-based methods
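As a rough illustration of the count-based view (and of the PMI connection noted above), here is a minimal sketch that builds a PPMI matrix from co-occurrence counts and factorises it with a truncated SVD to get dense word vectors; the toy corpus, window size, and dimensionality are all assumptions for illustration.

```python
import numpy as np
from collections import Counter

# Toy corpus and window size are illustrative assumptions.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
window = 2

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within the window.
counts = Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[(idx[w], idx[sent[j]])] += 1

C = np.zeros((len(vocab), len(vocab)))
for (i, j), c in counts.items():
    C[i, j] = c

# Positive PMI: log of joint over product of marginals, clipped at zero.
total = C.sum()
p_w = C.sum(axis=1, keepdims=True) / total
p_c = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Truncated SVD turns the count matrix into low-dimensional embeddings.
U, S, _ = np.linalg.svd(ppmi)
dim = 2
embeddings = U[:, :dim] * S[:dim]
print({w: embeddings[idx[w]].round(3) for w in vocab})
```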
Neural Embedding Models: C&W
Embed all words in a sentence with E, apply a shallow convolution over the embeddings, and minimise a hinge loss (real windows should outscore corrupted ones)
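A minimal PyTorch-style sketch of the C&W idea, assuming a margin of 1 and a corruption that replaces the centre word with a random one; the layer sizes and vocabulary are made up for illustration.

```python
import torch
import torch.nn as nn

class CWScorer(nn.Module):
    """Scores a window of words; trained so real windows outscore corrupted ones."""
    def __init__(self, vocab_size, emb_dim=50, hidden=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)          # the matrix E
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
        self.out = nn.Linear(hidden, 1)

    def forward(self, words):                  # words: (batch, window)
        e = self.emb(words).transpose(1, 2)    # (batch, emb_dim, window)
        h = torch.tanh(self.conv(e)).max(dim=2).values  # shallow conv + pooling
        return self.out(h).squeeze(1)          # scalar score per window

model = CWScorer(vocab_size=10000)
real = torch.randint(0, 10000, (32, 5))
corrupt = real.clone()
corrupt[:, 2] = torch.randint(0, 10000, (32,))   # corrupt the centre word

# Hinge loss: real window should score at least 1 higher than the corrupted one.
loss = torch.clamp(1 - model(real) + model(corrupt), min=0).mean()
loss.backward()
```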
Neural Embedding Models: CBoW
Embed the context words, add them, and minimise the negative log-likelihood of the target word
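A minimal sketch of CBoW, assuming a single linear output layer over the full vocabulary; the batch of context/target indices is made up for illustration.

```python
import torch
import torch.nn as nn

class CBoW(nn.Module):
    def __init__(self, vocab_size, emb_dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(emb_dim, vocab_size)

    def forward(self, context):              # context: (batch, n_context_words)
        h = self.emb(context).sum(dim=1)     # embed the context words and add them
        return self.out(h)                   # logits over the vocabulary

model = CBoW(vocab_size=10000)
context = torch.randint(0, 10000, (32, 4))  # 4 context words per example
target = torch.randint(0, 10000, (32,))

# Minimise the negative log-likelihood of the target word.
loss = nn.functional.cross_entropy(model(context), target)
loss.backward()
```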
Neural Embedding Models: Skip-gram
Embed the target word and use it to predict each context word
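And the mirror-image sketch for skip-gram: the embedded target word predicts a context word. A full softmax is used here for simplicity; the negative-sampling variant is what gives the PMI connection noted above.

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    def __init__(self, vocab_size, emb_dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # target-word embeddings
        self.out = nn.Linear(emb_dim, vocab_size)      # scores over context words

    def forward(self, target):                # target: (batch,)
        return self.out(self.emb(target))     # logits over the context vocabulary

model = SkipGram(vocab_size=10000)
target = torch.randint(0, 10000, (32,))
context = torch.randint(0, 10000, (32,))      # one observed context word each

loss = nn.functional.cross_entropy(model(target), context)
loss.backward()
```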
Task-based Embedding Learning
Directly train the embeddings jointly with the parameters of the network that uses them
The embedding matrix can be learned from scratch, or initialised with pre-learned embeddings and fine-tuned
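A sketch of the two initialisation options in PyTorch terms; the random pre-trained matrix below is a stand-in for e.g. loaded Word2Vec vectors.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10000, 100

# Option 1: learn the embedding matrix from scratch with the rest of the network.
emb_scratch = nn.Embedding(vocab_size, emb_dim)

# Option 2: initialise from pre-learned vectors and fine-tune them
# (freeze=False keeps them trainable, so task gradients also update the embeddings).
pretrained = torch.randn(vocab_size, emb_dim)   # stand-in for loaded Word2Vec weights
emb_finetune = nn.Embedding.from_pretrained(pretrained, freeze=False)
```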
Applications
- Text categorisation
- Natural language generation (language modeling / conditional language modeling)
- Natural language understanding
  - Translation
  - Summarisation
  - Conversational agents
  - Question answering
  - Structured knowledge-base population
  - Dialogue
L3 Language Modeling and RNNs I
Count-based N-Gram Language Models
Approximate the full history with just the previous n − 1 words (Markov assumption)
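A toy sketch of a count-based n-gram model: maximum-likelihood estimates from bigram counts, with smoothing omitted and a made-up two-sentence corpus.

```python
from collections import Counter, defaultdict

corpus = ["the cat sat on the mat".split(),
          "the dog sat on the rug".split()]

# Bigram counts: P(w | prev) ≈ count(prev, w) / count(prev).
bigrams = defaultdict(Counter)
for sent in corpus:
    for prev, w in zip(["<s>"] + sent, sent + ["</s>"]):
        bigrams[prev][w] += 1

def prob(word, prev):
    total = sum(bigrams[prev].values())
    return bigrams[prev][word] / total if total else 0.0

print(prob("cat", "the"))   # 0.25: "the" is followed by cat/dog/mat/rug
```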
Neural N-Gram Language Models
Embed the same fixed n-gram history in a continuous space (feed-forward network: hidden layers at different positions are unconnected, so back-propagation runs independently per position; the gradients for each time step are independent of all other time steps and can be computed in parallel and summed)
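A sketch of the feed-forward neural n-gram model (trigram history here): each position only sees its fixed window, so all positions in a batch are processed independently and their losses summed. Sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NeuralNGram(nn.Module):
    def __init__(self, vocab_size, n=3, emb_dim=50, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.ff = nn.Sequential(
            nn.Linear((n - 1) * emb_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, vocab_size),
        )

    def forward(self, history):                      # history: (batch, n-1)
        e = self.emb(history).flatten(start_dim=1)   # concatenate the embedded history
        return self.ff(e)                            # logits for the next word

model = NeuralNGram(vocab_size=10000)
history = torch.randint(0, 10000, (64, 2))   # many positions, processed in parallel
target = torch.randint(0, 10000, (64,))
loss = nn.functional.cross_entropy(model(history), target)
loss.backward()
```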
Recurrent Neural Network Language Models
Compress the entire history into a fixed-length vector, enabling long-range correlations to be captured (recurrent network: hidden states are connected through time, so training uses Back-Propagation Through Time; Truncated BPTT breaks the dependencies after a fixed number of time steps)
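A sketch of an RNN language model trained with truncated BPTT, assuming a plain nn.RNN and a fixed truncation length of 35 steps; detaching the hidden state at the end of each chunk is what breaks the dependencies.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, h0=None):
        out, h = self.rnn(self.emb(tokens), h0)
        return self.out(out), h

model = RNNLM(vocab_size=10000)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randint(0, 10000, (8, 176))      # (batch, long token sequence)

h = None
for start in range(0, data.size(1) - 1, 35):  # truncation length of 35 steps
    x = data[:, start:start + 35]
    y = data[:, start + 1:start + 36]
    logits, h = model(x, h)
    loss = nn.functional.cross_entropy(logits.reshape(-1, 10000), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    h = h.detach()   # carry the state forward but stop gradients flowing further back
```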
Bias vs Variance in LM Approximations
- N-gram models are biased but have low variance
- RNNs decrease the bias considerably, hopefully at a small cost in variance.
L4 Language Modeling and RNNs II
LSTM
GRU
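Both gated units are available off the shelf; a minimal sketch of how they slot into the language model above (standard PyTorch interfaces; note the LSTM additionally carries a cell state).

```python
import torch
import torch.nn as nn

emb_dim, hidden = 64, 128
x = torch.randn(8, 35, emb_dim)        # (batch, time, features), e.g. embedded tokens

gru = nn.GRU(emb_dim, hidden, batch_first=True)
out_g, h_g = gru(x)                    # GRU keeps a single hidden state

lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
out_l, (h_l, c_l) = lstm(x)            # LSTM keeps a hidden state and a cell state
```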
L5 Text Classification
Binary classification
Multi-class classification
Multi-label classification
Clustering
Naive Bayes classifier (generative model)
Logistic Regression
RNN Classifier
- Dual-objective RNN (combine an LM objective with classifier training and optimise the two losses jointly)
- Bi-Directional RNNs (see the sketch after this list)
- An RNN classifier can be either a generative or a discriminative model (the joint model is generative: it learns P(c, d), i.e. both the class and the document)
- Recursive Neural Networks
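A sketch of the bi-directional RNN classifier referenced above, assuming padded token-id batches and a single linear layer over the concatenated final forward/backward states; vocabulary and class counts are illustrative.

```python
import torch
import torch.nn as nn

class BiRNNClassifier(nn.Module):
    def __init__(self, vocab_size, n_classes, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)   # forward + backward states

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        _, (h, _) = self.rnn(self.emb(tokens)) # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)     # concatenate the two directions
        return self.out(h)                     # class logits

model = BiRNNClassifier(vocab_size=10000, n_classes=3)
tokens = torch.randint(0, 10000, (16, 40))
labels = torch.randint(0, 3, (16,))
loss = nn.functional.cross_entropy(model(tokens), labels)
loss.backward()
```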