【NLP】Recurrent Neural Network and Language Models

This post covers three approaches to building language models: count-based n-gram models, neural n-gram models, and recurrent neural network language models. It explains how each model works, discusses their strengths and weaknesses, and compares how they behave in different settings.

0. Overview

What is a language model?

A time series prediction problem.

It assigns a probability to a sequence of words, and the probabilities of all possible sequences sum to one.

Many Natural Language Processing tasks can be structured as (conditional) language modelling.

Such as Translation:

P(certain Chinese text | given English text)

Note that this conditional probability can be rewritten with Bayes' rule (the noisy-channel view): $P(\text{zh} \mid \text{en}) \propto P(\text{en} \mid \text{zh})\,P(\text{zh})$.

How to evaluate a Language Model?

Measured with cross entropy.

$$H(w_1^N) = -\frac{1}{N}\sum_{i=1}^{N}\log_2 p(w_i \mid w_1, \dots, w_{i-1})$$

(bits per word; perplexity is $2^{H}$)
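As a minimal sketch of this evaluation, assuming a hypothetical `model.prob(word, history)` interface that returns $p(w_i \mid w_1, \dots, w_{i-1})$ for whatever LM we have trained:

```python
import math

def evaluate(model, test_words):
    """Per-word cross entropy (bits) and perplexity of a language model.

    `model.prob(word, history)` is an assumed interface returning
    p(word | history); any LM exposing such probabilities would work.
    """
    total_log2_prob = 0.0
    for i, word in enumerate(test_words):
        p = model.prob(word, test_words[:i])   # p(w_i | w_1 ... w_{i-1})
        total_log2_prob += math.log2(p)
    cross_entropy = -total_log2_prob / len(test_words)   # bits per word
    perplexity = 2.0 ** cross_entropy
    return cross_entropy, perplexity
```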

Three data sets:

1 Penn Treebank: www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz

2 Billion Word Corpus: code.google.com/p/1-billion-word-language-modeling-benchmark/

3 WikiText datasets: Pointer Sentinel Mixture Models. Merity et al., arXiv 2016

Overview: Three approaches to build language models:

Count based n-gram models: approximate the history of observed words with just the previous n words.

Neural n-gram models: embed the same fixed n-gram history in a continuous space and thus better capture correlations between histories.

Recurrent Neural Networks: we drop the fixed n-gram history and compress the entire history in a fixed length vector, enabling long range correlations to be captured.

 

1. N-Gram models:

Assumption:

Only previous history matters.

Only k-1 words are included in history

Kth order Markov model

2-gram language model:

$$p(w_1, w_2, \dots, w_N) \approx \prod_{i=1}^{N} p(w_i \mid w_{i-1})$$

The conditioning context, $w_{i-1}$, is called the history.

Estimate Probabilities:

(For example: 3-gram)

$$p(w_3 \mid w_1, w_2) = \frac{\text{count}(w_1, w_2, w_3)}{\text{count}(w_1, w_2)}$$

(count how often $w_1, w_2, w_3$ appear together in the corpus)
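As a sketch, maximum-likelihood trigram estimation is just two count tables (the function and variable names here are illustrative, not from the original):

```python
from collections import Counter

def train_trigram(corpus):
    """Estimate p(w3 | w1, w2) by maximum likelihood from raw counts."""
    tri_counts = Counter()   # count(w1, w2, w3)
    bi_counts = Counter()    # count(w1, w2)
    for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
        tri_counts[(w1, w2, w3)] += 1
        bi_counts[(w1, w2)] += 1

    def prob(w3, w1, w2):
        if bi_counts[(w1, w2)] == 0:
            return 0.0   # unseen history -> needs smoothing / back-off
        return tri_counts[(w1, w2, w3)] / bi_counts[(w1, w2)]

    return prob

# usage: p = train_trigram("the cat sat on the mat".split()); p("sat", "the", "cat")
```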

Interpolated Back-Off:

That is, some phrases never appear in the corpus, so their estimated probability is zero. To avoid this, we use interpolated back-off: interpolate the lower-order k-gram models (k = n−1, n−2, …, 1) into the n-gram model.

A simple approach is linear interpolation:

$$p_{\text{interp}}(w_n \mid w_{n-2}, w_{n-1}) = \lambda_3\, \hat{p}(w_n \mid w_{n-2}, w_{n-1}) + \lambda_2\, \hat{p}(w_n \mid w_{n-1}) + \lambda_1\, \hat{p}(w_n), \qquad \lambda_1 + \lambda_2 + \lambda_3 = 1$$

where the $\hat{p}$ terms are the maximum-likelihood count estimates.
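A sketch of this interpolation, assuming unigram, bigram, and trigram maximum-likelihood estimates are already available (the three `p_*` callables and the fixed λ values are illustrative):

```python
def interpolated_prob(w3, w1, w2, p_uni, p_bi, p_tri,
                      lambdas=(0.1, 0.3, 0.6)):
    """Linear interpolation of unigram, bigram, and trigram estimates.

    The lambdas must sum to 1; in practice they are tuned on held-out
    data, the fixed values here are only for illustration.
    """
    l1, l2, l3 = lambdas
    return (l3 * p_tri(w3, w1, w2)
            + l2 * p_bi(w3, w2)
            + l1 * p_uni(w3))
```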

Summary for n-gram:

Good: easy to train. Fast.

Bad: Large n-grams are sparse. Hard to capture long-range dependencies. Cannot capture correlations between similar word distributions. Cannot exploit morphological regularities (running – jumping).

2. Neural N-Gram Language Models

Use a feed-forward network, like:

[Figure: a feed-forward network mapping inputs through a hidden layer to an output layer]

Take a trigram (3-gram) neural network language model as an example:

[Figures: the trigram neural network LM architecture, unrolled over the input words]


The inputs $w_i$ are one-hot vectors and the outputs $p_i$ are probability distributions; both have dimension $|V|$ (the number of words in the vocabulary).

[Figures: a sample detailed computation graph of the trigram NN LM]

Define the loss as the cross entropy between the predicted distribution $p_n$ and the true next word:

$$F = -\frac{1}{N}\sum_{n} \log p_n[w_n]$$

i.e. the average negative log-probability the model assigns to the correct next word.
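A minimal PyTorch sketch of this trigram architecture and its cross-entropy loss (the class name, layer sizes, and variable names are illustrative, not the course's exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrigramNNLM(nn.Module):
    """p(w_n | w_{n-2}, w_{n-1}) from a feed-forward network."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # one-hot -> dense vector
        self.hidden = nn.Linear(2 * embed_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, vocab_size)    # scores over |V|

    def forward(self, w_prev2, w_prev1):
        # Concatenate the embeddings of the two history words.
        x = torch.cat([self.embed(w_prev2), self.embed(w_prev1)], dim=-1)
        h = torch.tanh(self.hidden(x))
        return self.output(h)            # logits; softmax gives p_n

# Cross-entropy loss against the true next word (log-softmax + NLL in one call):
# loss = F.cross_entropy(model(w_prev2, w_prev1), w_target)
```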

Training: use Gradient Descent

[Figure: back-propagating the loss gradient through the trigram network's parameters]

And a sample of training:

[Figure: an example training step for the trigram NN LM]
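Training with gradient descent is then the usual loop; a sketch using the hypothetical `TrigramNNLM` module above:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, w_prev2, w_prev1, w_target):
    """One SGD step: forward pass, cross-entropy loss, back-propagation, update."""
    optimizer.zero_grad()
    logits = model(w_prev2, w_prev1)
    loss = F.cross_entropy(logits, w_target)   # -log p(w_target | history)
    loss.backward()                            # gradients for all parameters
    optimizer.step()                           # theta <- theta - lr * grad
    return loss.item()

# optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```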

Comparison with count-based n-gram LMs:

Good: Better performance on unseen n-grams, but poorer on seen n-grams (solution: add direct (linear) n-gram features). Uses less memory than count-based n-grams.

Bad: The number of parameters in the model scales with the n-gram size. There is still a limit on the longest dependencies that can be captured.

3. Recurrent Neural Network LM

That is, we use a recurrent neural network to build the LM, compressing the entire history into a fixed-size hidden state.

[Figures: the RNN LM, unrolled over time]

A standard formulation of the recurrence is

$$h_n = g(V [x_n ; h_{n-1}] + c), \qquad p_n = \mathrm{softmax}(W h_n + b)$$

where $x_n$ is the embedding of word $w_n$, $h_n$ is the hidden state, and $g$ is a nonlinearity such as $\tanh$.

Model and Train:

[Figure: the RNN LM unrolled for training over a sequence]
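A sketch of the recurrence above as a PyTorch module (a single-layer vanilla RNN; the class name and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """h_n = g(V [x_n ; h_{n-1}] + c),  p_n = softmax(W h_n + b)."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)  # tanh recurrence
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, h0=None):
        # words: (batch, seq_len) word ids; returns logits at every position.
        x = self.embed(words)
        h_all, h_last = self.rnn(x, h0)      # h_all: (batch, seq_len, hidden_dim)
        return self.output(h_all), h_last    # logits over |V| at each step
```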

Algorithm: Back-Propagation Through Time (BPTT)

Note:

[Figure: BPTT gradients flowing back through every time step of the unrolled network]

Note that with full BPTT, the gradient at each step depends on the entire preceding history, so the cost of a single update grows with the sequence length. The practical improvement is:

Algorithm: Truncated Back-Propagation Through Time (TBPTT)

The computation graph then looks like this:

[Figure: the truncated BPTT computation graph, with gradients cut at window boundaries]

The training process and gradient computation:

[Figure: TBPTT training and gradient flow]
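A sketch of truncated BPTT: the long sequence is cut into windows of `bptt_len` steps, and the hidden state is carried across windows but detached so gradients stop at the window boundary (names assume the hypothetical `RNNLM` sketch above; batch size 1 for simplicity):

```python
import torch
import torch.nn.functional as F

def train_tbptt(model, optimizer, word_ids, bptt_len=35):
    """Truncated BPTT over one long sequence of word ids."""
    hidden = None
    for start in range(0, len(word_ids) - 1 - bptt_len, bptt_len):
        inputs = torch.tensor([word_ids[start:start + bptt_len]])
        targets = torch.tensor([word_ids[start + 1:start + 1 + bptt_len]])

        if hidden is not None:
            hidden = hidden.detach()         # keep the state, cut the gradient

        optimizer.zero_grad()
        logits, hidden = model(inputs, hidden)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()                      # gradients only within this window
        optimizer.step()
```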

Summary of the Recurrent NN LMs:

Good:

RNNs can represent unbounded dependencies, unlike models with a fixed n-gram order.

RNNs compress histories of words into a fixed size hidden vector.

The number of parameters does not grow with the length of the dependencies captured, but it does grow with the amount of information stored in the hidden layer.

Bad:

RNNs are hard to train and often fail to discover the long-range dependencies present in the data (this is one motivation for LSTM units).

Increasing the size of the hidden layer, and thus memory, increases the computation and memory quadratically.

Mostly trained with maximum-likelihood objectives, which do not encode the expected frequencies of words a priori.

Some blogs recommended:

Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks karpathy.github.io/2015/05/21/rnn-effectiveness/

Yoav Goldberg: The unreasonable effectiveness of Character-level Language Models nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139

Stephen Merity: Explaining and illustrating orthogonal initialization for recurrent neural networks. smerity.com/articles/2016/orthogonal_init.html

 

Reposted from: https://www.cnblogs.com/duye/p/9372627.html
