Lecture 6: Sequence Tagging: Hidden Markov Models

Problems with POS Tagging

  • Exponentially many combinations: $|\text{Tags}|^M$ possible tag sequences for a sentence of length $M$. For example, with the 45 tags of the Penn Treebank tagset and $M = 10$, that is already $45^{10} \approx 3.4 \times 10^{16}$ candidates (see the brute-force sketch after this list)

  • Tag sequences come in different lengths

  • Tagging is a sentence-level task, but as humans we decompose it into small word-level tasks

  • Solution:

    • Define a model that decomposes the process into individual word-level steps, but takes the whole sequence into account when learning and predicting
    • This is called sequence labelling, or structured prediction
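
To make the blow-up concrete, here is a minimal brute-force sketch, assuming a hypothetical scoring function `score(words, seq)`: it must enumerate every one of the $|\text{Tags}|^M$ candidate sequences, which is exactly what sequence labelling avoids.

```python
from itertools import product

def brute_force_tag(words, tags, score):
    """Exhaustively search all |tags| ** len(words) tag sequences and
    return the highest-scoring one. Tractable only for toy inputs."""
    best_seq, best_score = None, float("-inf")
    for seq in product(tags, repeat=len(words)):  # |tags| ** M candidates
        s = score(words, seq)
        if s > best_score:
            best_seq, best_score = seq, s
    return best_seq
```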

Probabilistic Model of the HMM

  • Goal: obtain the best tag sequence $\hat{t}$ for a sentence $w$

    The formulation:

    $$\hat{t} = \operatorname*{argmax}_{t} P(t \mid w)$$

    Applying Bayes' rule (the denominator $P(w)$ is constant with respect to $t$, so it can be dropped from the argmax):

    $$\hat{t} = \operatorname*{argmax}_{t} \frac{P(w \mid t)\,P(t)}{P(w)} = \operatorname*{argmax}_{t} P(w \mid t)\,P(t)$$

    Decomposing the elements:

    The probability of a word depends only on its tag:

    $$P(w \mid t) = \prod_{i=1}^{M} P(w_i \mid t_i)$$

    The probability of a tag depends only on the previous tag:

    $$P(t) = \prod_{i=1}^{M} P(t_i \mid t_{i-1})$$
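
A minimal sketch of this decomposition, assuming hypothetical dictionaries `emission[(tag, word)]` and `transition[(prev_tag, tag)]` holding the probabilities above. Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow; unseen pairs would raise a `KeyError` here, and a real implementation would smooth.

```python
from math import log

def hmm_score(words, tags, emission, transition):
    """Log-probability of a (words, tags) pair under the HMM
    decomposition: sum of log-transition plus log-emission terms."""
    score = 0.0
    prev = "<s>"  # start-of-sentence pseudo-tag standing in for t_0
    for w, t in zip(words, tags):
        score += log(transition[(prev, t)])  # P(t_i | t_{i-1})
        score += log(emission[(t, w)])       # P(w_i | t_i)
        prev = t
    return score
```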

Two Assumptions of the HMM

  • Output independence: an observed event (word) depends only on the hidden state (tag): $P(w_i \mid t_i)$

  • Markov assumption: the current state (tag) depends only on the previous state: $P(t_i \mid t_{i-1})$

Training the HMM

  • Parameters are the individual probabilities:

    • Emission probabilities ($O$): $P(w_i \mid t_i)$, the probability of a word given its tag
    • Transition probabilities ($A$): $P(t_i \mid t_{i-1})$, the probability of a tag given the previous tag
  • Training uses Maximum Likelihood Estimation: done by simply counting word–tag and tag–tag frequencies in a tagged corpus and normalising:

    $$\hat{P}(w_i \mid t_i) = \frac{\text{count}(t_i, w_i)}{\text{count}(t_i)} \qquad \hat{P}(t_i \mid t_{i-1}) = \frac{\text{count}(t_{i-1}, t_i)}{\text{count}(t_{i-1})}$$
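
A minimal training sketch under these count definitions, assuming a hypothetical `corpus` represented as a list of sentences, each a list of `(word, tag)` pairs; `<s>` and `</s>` are assumed sentence-boundary pseudo-tags.

```python
from collections import defaultdict

def train_hmm(corpus):
    """MLE by counting: corpus is a list of sentences, each a list of
    (word, tag) pairs. Returns emission and transition probability dicts."""
    emit_counts = defaultdict(int)   # count(tag, word)
    trans_counts = defaultdict(int)  # count(prev_tag, tag)
    tag_counts = defaultdict(int)    # count(tag) as a conditioning state
    for sentence in corpus:
        prev = "<s>"                 # start-of-sentence pseudo-tag
        for word, tag in sentence:
            emit_counts[(tag, word)] += 1
            trans_counts[(prev, tag)] += 1
            tag_counts[prev] += 1
            prev = tag
        trans_counts[(prev, "</s>")] += 1  # end-of-sentence transition
        tag_counts[prev] += 1              # final tag also conditions a transition
    # Normalise counts into conditional probabilities
    emission = {(t, w): c / tag_counts[t] for (t, w), c in emit_counts.items()}
    transition = {(p, t): c / tag_counts[p] for (p, t), c in trans_counts.items()}
    return emission, transition
```

The returned dictionaries use the same `(tag, word)` and `(prev_tag, tag)` keys as the scoring sketch above, so the two plug together directly.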
