oxford-deepNLP
- L2a Word Level Semantics
- L3 Language Modeling and RNNs I
- L4 Language Modeling and RNNs II
- L5 Text Classification
- L6 RNNs and GPUs
- L7 Conditional Language Modeling
- L8 Conditional Language Modeling with Attention
- L9 Speech Recognition
- L10 Text to Speech
- L11 Question Answering
- L12 Memory Lecture
- L13 Linguistics
L2a Word Level Semantics
(Word2Vec is implicitly factorising a (shifted) PMI matrix built from co-occurrence counts, which links it to the count-based models)
Count-based methods
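As a rough illustration of the count-based view (and of the PMI connection noted above), here is a minimal sketch that builds a PPMI matrix from co-occurrence counts and factorises it with a truncated SVD to get dense word vectors; the toy corpus, window size, and dimensionality are all assumptions for illustration.

```python
import numpy as np
from collections import Counter

# Toy corpus and window size are illustrative assumptions.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
window = 2

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within the window.
counts = Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[(idx[w], idx[sent[j]])] += 1

C = np.zeros((len(vocab), len(vocab)))
for (i, j), c in counts.items():
    C[i, j] = c

# Positive PMI: log of joint over product of marginals, clipped at zero.
total = C.sum()
p_w = C.sum(axis=1, keepdims=True) / total
p_c = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Truncated SVD turns the count matrix into low-dimensional embeddings.
U, S, _ = np.linalg.svd(ppmi)
dim = 2
embeddings = U[:, :dim] * S[:dim]
print({w: embeddings[idx[w]].round(3) for w in vocab})
```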
Neural Embedding Models: C&W
Embed all words in a sentence with E, apply a shallow convolution over the embeddings, and minimise a hinge loss (real windows should outscore corrupted ones)
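A minimal PyTorch-style sketch of the C&W idea, assuming a margin of 1 and a corruption that replaces the centre word with a random one; the layer sizes and vocabulary are made up for illustration.

```python
import torch
import torch.nn as nn

class CWScorer(nn.Module):
    """Scores a window of words; trained so real windows outscore corrupted ones."""
    def __init__(self, vocab_size, emb_dim=50, hidden=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)          # the matrix E
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
        self.out = nn.Linear(hidden, 1)

    def forward(self, words):                  # words: (batch, window)
        e = self.emb(words).transpose(1, 2)    # (batch, emb_dim, window)
        h = torch.tanh(self.conv(e)).max(dim=2).values  # shallow conv + pooling
        return self.out(h).squeeze(1)          # scalar score per window

model = CWScorer(vocab_size=10000)
real = torch.randint(0, 10000, (32, 5))
corrupt = real.clone()
corrupt[:, 2] = torch.randint(0, 10000, (32,))   # corrupt the centre word

# Hinge loss: real window should score at least 1 higher than the corrupted one.
loss = torch.clamp(1 - model(real) + model(corrupt), min=0).mean()
loss.backward()
```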
Neural Embedding Models: CBoW
Embed the context words, add them, and minimise the negative log-likelihood of the target word
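A minimal sketch of CBoW, assuming a single linear output layer over the full vocabulary; the batch of context/target indices is made up for illustration.

```python
import torch
import torch.nn as nn

class CBoW(nn.Module):
    def __init__(self, vocab_size, emb_dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(emb_dim, vocab_size)

    def forward(self, context):              # context: (batch, n_context_words)
        h = self.emb(context).sum(dim=1)     # embed the context words and add them
        return self.out(h)                   # logits over the vocabulary

model = CBoW(vocab_size=10000)
context = torch.randint(0, 10000, (32, 4))  # 4 context words per example
target = torch.randint(0, 10000, (32,))

# Minimise the negative log-likelihood of the target word.
loss = nn.functional.cross_entropy(model(context), target)
loss.backward()
```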
Neural Embedding Models: Skip-gram
Embed the target word and use it to predict each context word
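And the mirror-image sketch for skip-gram: the embedded target word predicts a context word. A full softmax is used here for simplicity; the negative-sampling variant is what gives the PMI connection noted above.

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    def __init__(self, vocab_size, emb_dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # target-word embeddings
        self.out = nn.Linear(emb_dim, vocab_size)      # scores over context words

    def forward(self, target):                # target: (batch,)
        return self.out(self.emb(target))     # logits over the context vocabulary

model = SkipGram(vocab_size=10000)
target = torch.randint(0, 10000, (32,))
context = torch.randint(0, 10000, (32,))      # one observed context word each

loss = nn.functional.cross_entropy(model(target), context)
loss.backward()
```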
Task-based Embedding Learning
Directly train the embeddings jointly with the parameters of the network that uses them
The embedding matrix can be learned from scratch, or initialised with pre-learned embeddings and fine-tuned
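A sketch of the two initialisation options in PyTorch terms; the random pre-trained matrix below is a stand-in for e.g. loaded Word2Vec vectors.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10000, 100

# Option 1: learn the embedding matrix from scratch with the rest of the network.
emb_scratch = nn.Embedding(vocab_size, emb_dim)

# Option 2: initialise from pre-learned vectors and fine-tune them
# (freeze=False keeps them trainable, so task gradients also update the embeddings).
pretrained = torch.randn(vocab_size, emb_dim)   # stand-in for loaded Word2Vec weights
emb_finetune = nn.Embedding.from_pretrained(pretrained, freeze=False)
```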
Applications
- Text categorisation
- Natural language generation (language modeling / conditional language modeling)
- Natural language understanding
  - Translation
  - Summarisation
  - Conversational agents
  - Question answering
  - Structured knowledge-base population
  - Dialogue
L3 Language Modeling and RNNs I
Count-based N-Gram Language Models
Approximate the full history with just the previous n − 1 words (Markov assumption)
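A toy sketch of a count-based n-gram model: maximum-likelihood estimates from bigram counts, with smoothing omitted and a made-up two-sentence corpus.

```python
from collections import Counter, defaultdict

corpus = ["the cat sat on the mat".split(),
          "the dog sat on the rug".split()]

# Bigram counts: P(w | prev) ≈ count(prev, w) / count(prev).
bigrams = defaultdict(Counter)
for sent in corpus:
    for prev, w in zip(["<s>"] + sent, sent + ["</s>"]):
        bigrams[prev][w] += 1

def prob(word, prev):
    total = sum(bigrams[prev].values())
    return bigrams[prev][word] / total if total else 0.0

print(prob("cat", "the"))   # 0.25: "the" is followed by cat/dog/mat/rug
```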
Neural N-Gram Language Models
Embed the same fixed n-gram history in a continuous space (feed-forward network: hidden layers at different positions are unconnected, so back-propagation runs independently per position; the gradients for each time step are independent of all other time steps and can be computed in parallel and summed)
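A sketch of the feed-forward neural n-gram model (trigram history here): each position only sees its fixed window, so all positions in a batch are processed independently and their losses summed. Sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NeuralNGram(nn.Module):
    def __init__(self, vocab_size, n=3, emb_dim=50, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.ff = nn.Sequential(
            nn.Linear((n - 1) * emb_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, vocab_size),
        )

    def forward(self, history):                      # history: (batch, n-1)
        e = self.emb(history).flatten(start_dim=1)   # concatenate the embedded history
        return self.ff(e)                            # logits for the next word

model = NeuralNGram(vocab_size=10000)
history = torch.randint(0, 10000, (64, 2))   # many positions, processed in parallel
target = torch.randint(0, 10000, (64,))
loss = nn.functional.cross_entropy(model(history), target)
loss.backward()
```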
Recurrent Neural Network Language Models
Compress the entire history into a fixed-length vector, enabling long-range correlations to be captured (recurrent network: hidden states are connected through time, so training uses Back-Propagation Through Time; Truncated BPTT breaks the dependencies after a fixed number of time steps)
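A sketch of an RNN language model trained with truncated BPTT, assuming a plain nn.RNN and a fixed truncation length of 35 steps; detaching the hidden state at the end of each chunk is what breaks the dependencies.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, h0=None):
        out, h = self.rnn(self.emb(tokens), h0)
        return self.out(out), h

model = RNNLM(vocab_size=10000)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randint(0, 10000, (8, 176))      # (batch, long token sequence)

h = None
for start in range(0, data.size(1) - 1, 35):  # truncation length of 35 steps
    x = data[:, start:start + 35]
    y = data[:, start + 1:start + 36]
    logits, h = model(x, h)
    loss = nn.functional.cross_entropy(logits.reshape(-1, 10000), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    h = h.detach()   # carry the state forward but stop gradients flowing further back
```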
Bias vs Variance in LM Approximations
- N-gram models are biased but have low variance
- RNNs decrease the bias considerably, hopefully at a small cost in variance.
L4 Language Modeling and RNNs II
LSTM
GRU
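Both gated units are available off the shelf; a minimal sketch of how they slot into the language model above (standard PyTorch interfaces; note the LSTM additionally carries a cell state).

```python
import torch
import torch.nn as nn

emb_dim, hidden = 64, 128
x = torch.randn(8, 35, emb_dim)        # (batch, time, features), e.g. embedded tokens

gru = nn.GRU(emb_dim, hidden, batch_first=True)
out_g, h_g = gru(x)                    # GRU keeps a single hidden state

lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
out_l, (h_l, c_l) = lstm(x)            # LSTM keeps a hidden state and a cell state
```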
L5 Text Classification
Binary classification
Multi-class classification
Multi-label classification
Clustering
Naive Bayes classifier (generative model)
Logistic Regression
RNN Classifier
- Dual-objective RNN (combine an LM objective with classifier training and optimise the two losses jointly)
- Bi-Directional RNNs (see the sketch after this list)
- An RNN classifier can be either a generative or a discriminative model (the joint model is generative: it learns P(c, d), i.e. both the class and the document)
- Recursive Neural Networks
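A sketch of the bi-directional RNN classifier referenced above, assuming padded token-id batches and a single linear layer over the concatenated final forward/backward states; vocabulary and class counts are illustrative.

```python
import torch
import torch.nn as nn

class BiRNNClassifier(nn.Module):
    def __init__(self, vocab_size, n_classes, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)   # forward + backward states

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        _, (h, _) = self.rnn(self.emb(tokens)) # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)     # concatenate the two directions
        return self.out(h)                     # class logits

model = BiRNNClassifier(vocab_size=10000, n_classes=3)
tokens = torch.randint(0, 10000, (16, 40))
labels = torch.randint(0, 3, (16,))
loss = nn.functional.cross_entropy(model(tokens), labels)
loss.backward()
```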