CS224n Lecture 8: Recurrent Neural Networks and Language Models

These notes cover traditional language models and recurrent neural networks (RNNs) for sequence tasks such as language modeling and machine translation, discuss training issues such as vanishing gradients and the tricks used to address them, and introduce the more advanced RNN variants LSTM and GRU.


  • Traditional language models
  • RNNs
  • RNN language models
  • Training problems and tricks
  • RNNs for other sequence tasks
  • Bidirectional and deep RNNs
Language Models
  • computes a probability for a sequence of words $P(w_1, \ldots, w_T)$

  • Useful for machine translation:

    • word ordering (ab vs. ba)
    • word choice (home vs. house)
  • Traditional

    • condition on a window of the n previous words
    • Markov assumption
    • use counts to estimate probabilities
    • RAM requirement scales with the number of n-grams that must be stored
  • Recurrent Neural Networks

    • RAM requirement scales only with the number of words (the vocabulary), not with context length
      (figure: rnn)
    • use the same set of weights W at every time step; see the sketch after this list
      (figure: loss)
  • Vanishing or exploding gradients

    • long-distance dependencies: in practice the model can only remember about 5-6 words back
    • solution 1: initialize W to the identity matrix and use the ReLU nonlinearity $f(z) = \mathrm{rect}(z) = \max(z, 0)$
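
For concreteness, here is a minimal numpy sketch of one forward pass of the RNN language model just described. The dimensions, variable names and toy token sequence (V, d, Dh, L, W_hh, W_hx, W_s, words) are all invented for illustration; it shows the weights being reused at every time step, the softmax over the vocabulary, the cross-entropy loss against the next word, and the identity-initialization plus ReLU trick mentioned above.

```python
import numpy as np

# Toy sizes, all assumptions: V = vocab size, d = word vector size,
# Dh = hidden size, T = sequence length.
V, d, Dh, T = 10000, 50, 100, 20
rng = np.random.default_rng(0)

L = rng.normal(0, 0.1, (d, V))      # word embedding matrix (columns = words)
W_hh = np.eye(Dh)                   # identity initialization (trick from the notes)
W_hx = rng.normal(0, 0.1, (Dh, d))
W_s = rng.normal(0, 0.1, (V, Dh))   # output projection to the vocabulary

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

words = rng.integers(0, V, size=T)  # fake token id sequence
h = np.zeros(Dh)
loss = 0.0
for t in range(T - 1):
    x_t = L[:, words[t]]                      # current word vector
    h = relu(W_hh @ h + W_hx @ x_t)           # same W's reused at every step
    y_hat = softmax(W_s @ h)                  # predicted distribution over vocab
    loss += -np.log(y_hat[words[t + 1]])      # cross-entropy against the next word
print("mean cross-entropy:", loss / (T - 1))
```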
Bidirectional RNNs
  • just need to run a second RNN over the sequence in reverse order and combine both hidden states; see the sketch below
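
A minimal sketch of the idea, assuming numpy and invented names: one RNN reads the input left-to-right, a second RNN reads it in reverse order, and the two hidden states are concatenated at every position.

```python
import numpy as np

# Toy sizes and weights, all assumptions.
d, Dh, T = 50, 100, 8
rng = np.random.default_rng(1)
xs = rng.normal(size=(T, d))        # fake input word vectors

def rnn(xs, W_hh, W_hx):
    # Plain tanh RNN returning the hidden state at every time step.
    h = np.zeros(Dh)
    states = []
    for x in xs:
        h = np.tanh(W_hh @ h + W_hx @ x)
        states.append(h)
    return np.stack(states)

W_f_hh, W_f_hx = rng.normal(0, 0.1, (Dh, Dh)), rng.normal(0, 0.1, (Dh, d))
W_b_hh, W_b_hx = rng.normal(0, 0.1, (Dh, Dh)), rng.normal(0, 0.1, (Dh, d))

h_fwd = rnn(xs, W_f_hh, W_f_hx)                    # left-to-right pass
h_bwd = rnn(xs[::-1], W_b_hh, W_b_hx)[::-1]        # right-to-left pass, re-aligned
h_bi = np.concatenate([h_fwd, h_bwd], axis=1)      # (T, 2*Dh) combined states
print(h_bi.shape)
```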
SMT

f: French, the source language
e: English, the target language

$\hat{e} = \arg\max_e p(e \mid f) = \arg\max_e p(f \mid e)\, p(e)$

$p(e)$: the language model; acts as a weighting term that controls the fluency of the output
$p(f \mid e)$: the translation model

The same decomposition applies to speech recognition: $p(\text{text} \mid \text{voice}) \propto p(\text{voice} \mid \text{text})\, p(\text{text})$
Translation model
  • alignment is hard:
    • zero fertility (source words that are dropped)
    • one-to-many
    • many-to-one
    • many-to-many
    • reordering
  • many options to search over: use beam search (see the sketch below)
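
Exhaustively scoring every possible translation is intractable, so decoders use beam search: keep only the k best partial hypotheses at every step. Below is a minimal generic sketch; `next_log_probs` is an invented stand-in for the model, and a real SMT/NMT decoder would return the model's actual next-word log-probabilities there.

```python
import numpy as np

# Toy settings, all assumptions: V = vocab size, BEAM = beam width,
# EOS = end-of-sentence token id.
V, BEAM, MAX_LEN, EOS = 20, 3, 5, 0
rng = np.random.default_rng(2)

def next_log_probs(prefix):
    # Stand-in for the translation model: random log-probabilities.
    logits = rng.normal(size=V)
    return logits - np.log(np.exp(logits).sum())

beams = [([], 0.0)]                          # (token sequence, cumulative log-prob)
for _ in range(MAX_LEN):
    candidates = []
    for seq, score in beams:
        if seq and seq[-1] == EOS:           # finished hypotheses are kept as-is
            candidates.append((seq, score))
            continue
        lp = next_log_probs(seq)
        for w in range(V):
            candidates.append((seq + [w], score + lp[w]))
    # keep only the BEAM highest-scoring hypotheses
    beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:BEAM]

best_seq, best_score = beams[0]
print(best_seq, best_score)
```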
NMT

Advantage: an end-to-end trainable model; you only specify a single final objective function, and everything in between is learned by the model

RNN Translation model extensions
  1. Train different RNN weights for encoding and decoding
  2. compute every hidden state in the decoder from (see the equation after this list):
    • the previous hidden state
    • the last hidden vector of the encoder
    • the previously predicted output word
  3. train stacked (multi-layer) RNNs
  4. train a bidirectional encoder (occasionally)
  5. train on the input sequence in reverse order for simpler optimization (helps with vanishing gradients): A B C -> X Y ==> C B A -> X Y
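
Extension 2 can be written compactly. The notation below is assumed (in the spirit of the lecture's decoder): $c$ is the encoder's last hidden vector, $y_{t-1}$ the previously predicted output word, and $\phi$ the decoder's recurrence function.

```latex
% Decoder hidden state computed from three inputs:
% previous decoder state, last encoder hidden vector, previous prediction.
h_t^{(dec)} = \phi\left( h_{t-1}^{(dec)},\; c,\; y_{t-1} \right)
```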
Advanced RNN
  • LSTM
  • GRU
GRU
  • update gate: computed from the current input word vector and the previous hidden state
    $z_t = \sigma(W^{(z)} x_t + U^{(z)} h_{t-1})$
  • reset gate:
    $r_t = \sigma(W^{(r)} x_t + U^{(r)} h_{t-1})$
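
The notes list only the two gates; for completeness, the remaining equations of the standard GRU (with $\circ$ denoting element-wise multiplication) are:

```latex
% New memory content uses the reset gate to decide how much past state to read:
\tilde{h}_t = \tanh\left( W x_t + r_t \circ U h_{t-1} \right)
% Final hidden state interpolates between the past state and the new memory
% via the update gate:
h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t
```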
LSTM
  1. Input gate
  2. Forget gate
  3. Output gate
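
Sketch of the standard LSTM step in the same notation as the GRU above ($\sigma$ is the logistic sigmoid, $\circ$ element-wise multiplication):

```latex
i_t = \sigma\left( W^{(i)} x_t + U^{(i)} h_{t-1} \right)         % input gate
f_t = \sigma\left( W^{(f)} x_t + U^{(f)} h_{t-1} \right)         % forget gate
o_t = \sigma\left( W^{(o)} x_t + U^{(o)} h_{t-1} \right)         % output gate
\tilde{c}_t = \tanh\left( W^{(c)} x_t + U^{(c)} h_{t-1} \right)  % new memory cell
c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t                  % final memory cell
h_t = o_t \circ \tanh(c_t)                                       % hidden state
```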
Recent Improvements
  1. problems with the softmax output layer
    • no zero-shot word predictions (words never seen in training cannot be predicted)
    • fix: combine a pointer mechanism with the softmax
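
The pointer-plus-softmax combination is typically a mixture of two distributions. The form below is only a sketch of the general idea (not any specific paper's exact parameterization); $g \in [0, 1]$ is a learned gate:

```latex
% Mix a pointer distribution over words in the recent context with the
% usual softmax over the vocabulary.
p(w) = g \, p_{\mathrm{ptr}}(w) + (1 - g) \, p_{\mathrm{softmax}}(w)
```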
Tricks
  • problem: the softmax over the full vocabulary is huge and slow
    • use class-based word prediction instead of the full softmax (see the factorization after this list)
  • only need to run backpropagation once
  • initialize W to the identity matrix and use ReLU
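
Class-based word prediction factors the output distribution through a small set of word classes, so each step only needs a softmax over the classes plus one over the words inside the chosen class; here $c_t$ denotes the class assigned to word $w_t$:

```latex
p(w_t \mid \mathrm{history}) = p(c_t \mid \mathrm{history}) \; p(w_t \mid c_t)
```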
How to improve word embeddings
  1. Input: word -> subword

    • morphemes / subword units, e.g. byte pair encoding (BPE); see the sketch after this list
    • character embeddings
  2. regularization

    • preprocessing: replace some words, e.g. drop frequent words and add infrequent ones
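
A minimal sketch of the BPE idea on a toy, invented word-frequency vocabulary: repeatedly find the most frequent adjacent symbol pair and fuse it into a new subword symbol.

```python
from collections import Counter

# Toy vocabulary of words split into characters, with frequencies
# (all invented data). "</w>" marks the end of a word.
vocab = {("l", "o", "w", "</w>"): 5,
         ("l", "o", "w", "e", "r", "</w>"): 2,
         ("n", "e", "w", "e", "s", "t", "</w>"): 6}

def most_frequent_pair(vocab):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(vocab, pair):
    # Fuse every occurrence of the chosen pair into a single symbol.
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(3):                  # apply a few merge steps
    pair = most_frequent_pair(vocab)
    vocab = merge_pair(vocab, pair)
    print("merged", pair)
print(vocab)
```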

Task List

  1. NER todo: see lecture 8
  2. Machine Translation:
todos:
  • Recap the word vector equations shown at the beginning of lecture 9: Machine Translation and Advanced Recurrent LSTMs and GRUs
  • replicate the NER paper