Notes for the MVA M2 course, updated from time to time.
Lecture 1 - part 1
linguistic data
phonological level: sound-level analysis (phonemes)
graphemic level: writing units, e.g. Chinese characters
morphological level (how words are built): compound words, roots, ...
syntactic level: phrase structure, SVO / SOV order, free word order, relative order
semantic level: the meaning of words (multiple meanings...), e.g. the colour spectrum, sense inclusion
linguistic context: the surrounding text
extra-linguistic context: images, the situation, etc.
Diversity
variation
ambiguity
sparsity
phonological diversity : arm
sociolinguistic variation
lexical ambiguity: homonymy
pronunciation ambiguity
segmentation ambiguity
tokens and form
amalgams: des = de les
à l'instar du
lemma: equivalence class of forms belonging to the same morphological paradigm
lemma + morphological form
syntactic ambiguity :
fruit flies like a banana
to get the dependency structure (POS tags)
automatic syntactic analysis --- parsing
e.g. pizza with anchovies
garden-path sentences
semantic ambiguity polysemy
hyponymy
corpora
corpus = body of text stored in a machine readable form
annotated, serving as training, development or test data
treebanks
...
Zipf's law (distribution): order words by decreasing frequency and plot frequency against rank (see the sketch below)
long tail (1% - 20% - 80%)
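A minimal sketch of the rank-frequency plot (assumptions: a plain-text file `corpus.txt` and naive whitespace tokenization, both placeholders):

```python
# Rank-frequency plot to eyeball Zipf's law on a raw text file.
from collections import Counter
import matplotlib.pyplot as plt

with open("corpus.txt", encoding="utf-8") as f:
    tokens = f.read().lower().split()        # naive whitespace tokenization

freqs = sorted(Counter(tokens).values(), reverse=True)   # frequency by decreasing rank

plt.loglog(range(1, len(freqs) + 1), freqs)  # Zipf's law ~ a straight line in log-log scale
plt.xlabel("rank")
plt.ylabel("frequency")
plt.show()
```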
Lecture 1 - part 2
Deconstructing Siri
- identify the talker
- recognize the words
- understand the query
- respond orally
Lecture 2 - part 1
Chapter 9.3
1. filters and spectrum
2. modeling
classical way: the Bayesian (noisy-channel) model, combining a language model and an acoustic model (glottal source, etc.): choose the word sequence W that maximizes P(W | O) ∝ P(O | W) · P(W)
we can also condition on sequences of phonemes instead of words
use HMMs
one HMM per phone
yet a per-phone HMM is not enough - a phone's realization also depends on the next one
one HMM per triphone (3-phone context)
model the feature space: the MFCC dimensions are (roughly) uncorrelated
use a Gaussian mixture model (GMM) to model any kind of distribution (see the sketch below)
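A toy sketch of this GMM-over-MFCC idea (assumptions: scikit-learn's GaussianMixture and random stand-in features instead of real MFCC frames):

```python
# Fit a diagonal-covariance GMM to MFCC-like frames (classically, one GMM per HMM state).
import numpy as np
from sklearn.mixture import GaussianMixture

frames = np.random.randn(1000, 13)               # 1000 frames x 13 coefficients (placeholder for MFCCs)
gmm = GaussianMixture(n_components=8, covariance_type="diag")  # "diag" assumes uncorrelated dimensions
gmm.fit(frames)
log_lik = gmm.score_samples(frames)              # per-frame log-likelihood, what the HMM consumes
print(log_lik[:5])
```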
3. identification
female voices are very different from male voices
use different filters from the beginning
the test data should include new (unseen) speakers
nowadays systems should be speaker-independent
multitask training
4. denoising: speakers in the wild
data augmentation: add white noise or other real-world noise to the input speech, speed perturbation (see the sketch below)
clean speech vs. noisy speech
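A small sketch of the white-noise augmentation (assumptions: numpy, a synthetic sine wave standing in for real clean speech, and a target SNR in dB):

```python
# Mix white noise into a clean waveform at a chosen signal-to-noise ratio.
import numpy as np

def add_white_noise(clean, snr_db):
    noise = np.random.randn(len(clean))
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # scale the noise so that 10*log10(clean_power / scaled_noise_power) == snr_db
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)   # 1 s of a 440 Hz tone as a stand-in
noisy = add_white_noise(clean, snr_db=10)
```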
5. next class: language models, end-to-end systems, the frontier, different kinds of languages
Lecture 3 - part 1
1. transform the dictionary into a graph of phonemes: use a seq2seq grapheme-to-phoneme model
2. language model: n-gram models: approximate P(wi | w1, ..., wi-1) by P(wi | wi-N+1, ..., wi-1)
the 4-gram seems to work best (for words)
- estimate by counting words or letters: ci / N
- unknown words: no matter how large the training vocabulary, there are unseen words
in the test set: the lexicon is effectively infinite
- in the n-gram model, a simple and dirty solution for unknown words: as the corpus
gets large, the set of unknown words becomes small
- n-gram counts are generally sparse: many combinations will not appear,
and the problem grows exponentially with n
use smoothing: (ci + 1) / (N + V), which avoids zero probabilities (see the sketch after this list)
backoff: approximate an n-gram by some combination of (n-1)-grams, (n-2)-grams, ...
and interpolation of the estimates
clustering: if the word is not found, move up the hierarchy and use the probability of its class
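A minimal sketch of add-one smoothing on a bigram model (the toy corpus is made up; `<unk>` is just the "simple and dirty" unknown-word token mentioned above):

```python
# Add-one (Laplace) smoothed bigram probabilities: never zero, even for unseen pairs.
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()
vocab = set(corpus) | {"<unk>"}
V = len(vocab)

unigram = Counter(corpus)
bigram = Counter(zip(corpus, corpus[1:]))

def p_bigram(w_prev, w):
    # (c(w_prev, w) + 1) / (c(w_prev) + V)
    return (bigram[(w_prev, w)] + 1) / (unigram[w_prev] + V)

print(p_bigram("the", "cat"))   # seen pair
print(p_bigram("cat", "mat"))   # unseen pair, still > 0
```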
- compare different models:
plug them into word recognizers
perplexity (intrinsic; see the sketch below):
extrinsic: improved WER
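A sketch of the perplexity computation, reusing the `p_bigram` function from the smoothing sketch above (any conditional probability function would do):

```python
# Perplexity = exp(-1/N * sum of log p(w_i | w_{i-1})) over a held-out sequence; lower is better.
import math

def perplexity(test_tokens, prob_fn):
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_sum = sum(math.log(prob_fn(w_prev, w)) for w_prev, w in pairs)
    return math.exp(-log_sum / len(pairs))

test = "the cat sat on the mat".split()
print(perplexity(test, p_bigram))
```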
3. acoustic model + speech input
decoding lattices
...
an RNN language model --> can be character-based, yet it grows very quickly for an RNN
it will only succeed given certain context
4. the decoding problem
Dynamic Programming :
- define the subproblems
given n, find the num:
Viterbi: the best path: with O and θ known, find the state sequence Q that maximizes P(O, Q | θ) (see the sketch below)
n-best: instead of keeping only the best path, keep the n best paths
- no dynamic programming with an RNN: no simple recurrence relation between p_i and p_{i-1}
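A sketch of Viterbi decoding for a small discrete-observation HMM (numpy, log domain; the toy parameters below are made up, not from the course):

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """obs: observation indices; log_pi: (S,); log_A: (S, S) transitions; log_B: (S, O) emissions."""
    S, T = log_pi.shape[0], len(obs)
    delta = np.empty((T, S))               # best log-score of a path ending in each state at time t
    back = np.zeros((T, S), dtype=int)     # back-pointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A          # scores[i, j] = best path into i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]                    # follow back-pointers from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], np.log(pi), np.log(A), np.log(B)))
```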
5. parameter estimation: maximize P(O | θ) over θ using EM
6. end-to-end problem
we can add a right-to-left layer to get a bidirectional RNN
7. finite state machines
acceptors and transducers; the language model is a (weighted) acceptor over word sequences
the pronunciation model is a transducer that converts a sequence of phones into a sequence of words
8. basic operation on WFST
hard to build everything from scratch
combination is possible
OpenFST and Kaldi libraries
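The real work is done by OpenFST / Kaldi; just to illustrate the core composition operation, here is a toy, unweighted, epsilon-free version in plain Python (the two example transducers are made up):

```python
# Naive composition of two epsilon-free transducers given as (arcs, start, finals),
# where each arc is (src_state, input_symbol, output_symbol, dst_state).
def compose(t1, t2):
    arcs1, start1, finals1 = t1
    arcs2, start2, finals2 = t2
    arcs = [((p, r), a, c, (q, s))
            for (p, a, b, q) in arcs1
            for (r, b2, c, s) in arcs2
            if b == b2]                      # t1's output must match t2's input
    finals = {(f1, f2) for f1 in finals1 for f2 in finals2}
    return arcs, (start1, start2), finals

t1 = ([(0, "a", "A", 1), (1, "b", "B", 2)], 0, {2})   # maps "ab" -> "AB"
t2 = ([(0, "A", "x", 1), (1, "B", "y", 2)], 0, {2})   # maps "AB" -> "xy"
print(compose(t1, t2))                                 # composed machine maps "ab" -> "xy"
```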
3 exercises
misspelling detector
Lecture 4 -- language processing in the wild
6. Language III: Language Processing in the wild (2h+1h)
Algorithms: text normalization, coreference, distributional semantics, word embeddings
Human processing: conversational & casual language
Assignment: Evaluating Topic Models
Given a dataset of documents and human topic annotation, correlate different topic models with human judgements.
Readings:
J. Chang, J. Boyd-Graber, C. Wang, S. Gerrish, and D. Blei (2009).
Reading Tea Leaves: How Humans Interpret Topic Models. Neural Information Processing Systems.
Lecture 4
the shorter the window (1-3 words), the more syntactic the representation
the longer the window (4-10 words), the more semantic the representation
first-order co-occurrence: syntagmatic association:
wrote + book or poem
second-order co-occurrence: paradigmatic association
similar neighbours: wrote / remarked / said + sentences ...
positive pointwise mutual information (PPMI)
- 'the' and 'of': very frequent yet not discriminative
- choose the words that are more informative
- this measure is often used to find collocations / combined phrases
- very rare words cause a problem: their rareness is amplified (see the sketch below)
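A small sketch putting the window and PPMI together (toy corpus, numpy; window = 2, i.e. the "syntactic" end of the range above):

```python
# Window-based co-occurrence counts, then PPMI weighting.
import numpy as np

corpus = "he wrote a book she wrote a poem he said a sentence".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

window = 2
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            C[idx[w], idx[corpus[j]]] += 1

total = C.sum()
p_wc = C / total
p_w = C.sum(axis=1, keepdims=True) / total
p_c = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)   # positive PMI: clip negatives / zeros
print(vocab)
print(ppmi.round(2))
```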
sentence : self-contained syntactic structure
word
token
wordform: the syntactic unit (POS)
named entity, utterance
semantically non-compositional words: meaning cannot be inferred from composition
word correction
Damerau-Levenshtein distance (see the sketch below)
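A sketch of the (restricted) Damerau-Levenshtein distance, i.e. edit distance with insertion, deletion, substitution and transposition of adjacent characters:

```python
def damerau_levenshtein(a, b):
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution (or match)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)   # adjacent transposition
    return d[len(a)][len(b)]

print(damerau_levenshtein("mispell", "misspell"))  # 1 (one insertion)
print(damerau_levenshtein("hte", "the"))           # 1 (one transposition)
```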
02/19/2018
Lecture 5 Parsers
Besides the usual lecture notes, other useful links are listed below. [2] is the advanced NLP course for the paper-reading seminar. [4] has an explanation of how to deal with unknown words. [5] describes in detail how to use NLTK-PCFG with a CYK parser. [6] is an introduction to how to evaluate and improve the parser. [9] offers a list of POS abbreviations for French words. [10] offers a full-version language tagger.
========= intro ===============
Syntax and formal grammars
What structures do we need?
- introspection
- corpus study (interesting patterns)
- psycholinguistics or neurolinguistics
formal grammars : what formal devices do we need to represent such structures
treebanks: collection of sentences annotated with syntactic structures (trees). (heavily used)
syntactic lexicons : wordforms or lexemes. (important symbolically)
parsing: produces a parse tree. parser: a machine that does parsing
structures as trees.
- grammatical sentences
- ungrammatical sentences (spoken language, typos, etc.)
Find a way to distinguish the two.
how to label the tree's nodes ?
- a non-bracketed word : head
- labeled by POS tags (word-form clusters) and phrase categories (groups of two or three words' POS)
two kinds of trees:
- dependencies: dependency tree (no information about the constituents of the sentence)
- constituency: constituency tree with POS labels (dependencies between words at the same level are lost)
the two can sometimes be converted into each other using a head percolation table: rules for finding the head of each constituent.
non-projective case : crossing edges (difficult to deal with)
Constituency-based and dependency-based structures are two sides of the syntactic structure of a sentence; in the following we limit ourselves to projective, tree-like structures.
=========== mathematical representation =============
Formal grammars
Language
language = a set of words over an alphabet T, also called the vocabulary
In other words, a language is a subset of T* (the set of all strings over T), from which we have:
- a language can be finite or infinite
- T* is infinite yet countable
- the number of languages definable over T is non-enumerable
Grammar
A grammar G = (V, T, S, P)
where V is a finite set of objects called variables or non-terminal symbols. eg: {V,NP, DET...}
T is a finite set of objects called terminal symbols eg: {abcd...}
S is a special symbol called the start variable
P is a finite set of productions, the rewriting rules
e.g. define G = ({S}, {a, b}, S, P), with P as follows.
The non-terminal symbol set contains only the start variable. Then:
NB: you can find a detailed description in the book Algebra by Michael Artin.
Context-free grammars (CFGs)
In French: grammaires hors-contexte, grammaires non-contextuelles, grammaires algébriques, etc.
Bottom-up method : First Parse
1 construct lexicon
2 first parse
--> construct a derived tree
The nodes with 0 or 2+ anchors
Top-down method: Substitution
Start from the start symbol, replace the non-terminal symbols, and go top-down
Chomsky hierarchy
Watch this.
1 - regular languages
2 - context-free languages
3 - context-sensitive languages
Tree adjunction grammars (TAGs)
1 arguments and modifiers
arguments: Pierre, pomme de terre
modifiers : at , in ...
Tree Insertion Grammars
each string in a derivation is called a sentential form; the rules are called productions or rewriting rules
Chomsky hierarchy:
NLP: mildly context-sensitive languages
substitution rule: equivalent elementary trees - combine the elementary trees .
pomme de terre problem :
lexicalisation : non-anchored nodes. (one terminal symbol)
substitution operation.
adjunction operation (附加操作)
initial tree
auxiliary tree (foot node has the same label as the root node)
the spine of the auxiliary tree
adjunction of the auxiliary tree: split the node, then merge the auxiliary tree into
the gap
full line for a substitution operation
dashed line for an adjunction operation
wrapping tree: the middle operation (add bracket !)
for more, look at the pumping lemma for CFGs and TAGs.
Complexity:
forbid wrapping auxiliary trees,
TIG : tree insertion grammar
major ways out: metagrammars; extract the grammar from a treebank.
PCFGs: probabilistic CFGs (PCFGs) are a direct extension of CFGs,
the probabilities of all possible rewrites of a symbol must sum to 1
multiple analyses: redundancies
use a representation that captures the redundancies
- instantiation :
the probability computation can be done along the parse forest
PCFGs (independence assumptions between different parts of the forest)
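A toy sketch with NLTK's PCFG and Viterbi parser (the grammar below is made up for the "fruit flies like a banana" example; rule probabilities are arbitrary but sum to 1 per left-hand side):

```python
from nltk import PCFG
from nltk.parse import ViterbiParser

grammar = PCFG.fromstring("""
    S   -> NP VP    [1.0]
    NP  -> N N      [0.3] | Det N [0.3] | 'fruit' [0.4]
    VP  -> V NP     [0.5] | V PP  [0.5]
    PP  -> P NP     [1.0]
    Det -> 'a'      [1.0]
    N   -> 'fruit'  [0.4] | 'flies' [0.3] | 'banana' [0.3]
    V   -> 'flies'  [0.5] | 'like'  [0.5]
    P   -> 'like'   [1.0]
""")

parser = ViterbiParser(grammar)
for tree in parser.parse("fruit flies like a banana".split()):
    print(tree)   # most probable analysis, with its probability
```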
preliminary introduction for assignment:
parsing - CYK algorithm for CFG
deterministic : one tree for one sentence
NLP: any kind of CFG, including ambiguous ones; 3 main kinds of algorithms:
- Earley algorithm (1970)
- generalized LR algorithm
- CYK algorithm
CYK
a recognizer rather than a parser
is the input string in the language defined by the grammar?
normal form (Chomsky normal form): all rewriting rules have one of three shapes:
A -> B C, A -> a, S -> epsilon
it is a bottom-up algorithm based on dynamic programming
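A minimal CYK recognizer sketch for a CNF grammar (the grammar is hand-written in two dicts and mirrors the toy PCFG above, without probabilities):

```python
def cyk_recognize(tokens, lexical, binary, start="S"):
    """lexical: terminal -> set of non-terminals A with A -> terminal;
    binary: (B, C) -> set of non-terminals A with A -> B C."""
    n = len(tokens)
    table = [[set() for _ in range(n)] for _ in range(n)]   # table[i][j]: symbols deriving tokens[i..j]
    for i, tok in enumerate(tokens):
        table[i][i] = set(lexical.get(tok, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                           # split point
                for B in table[i][k]:
                    for C in table[k + 1][j]:
                        table[i][j] |= binary.get((B, C), set())
    return start in table[0][n - 1]

lexical = {"fruit": {"N", "NP"}, "flies": {"N", "V"}, "like": {"V", "P"},
           "a": {"Det"}, "banana": {"N"}}
binary = {("N", "N"): {"NP"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"},
          ("P", "NP"): {"PP"}, ("V", "PP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cyk_recognize("fruit flies like a banana".split(), lexical, binary))   # True
```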
Lecture 6
MST parser
- create all candidate dependencies
- weight them
- optimise and extract the best tree
Arc standard
- at each step, construct the possible transitions and use a score to select the best one (see the sketch below)
- linear complexity !
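A sketch of the arc-standard transition system on a toy sentence; here the transition sequence is given by hand, whereas a real parser would predict it with a classifier / oracle:

```python
# SHIFT / LEFT-ARC / RIGHT-ARC over a stack and a buffer; index 0 is the artificial ROOT.
def arc_standard(n_words, transitions):
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), []
    for t in transitions:
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":               # head = top, dependent = second-from-top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif t == "RIGHT-ARC":              # head = second-from-top, dependent = top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs                             # list of (head, dependent) index pairs

words = ["she", "ate", "fish"]
trans = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
labels = ["ROOT"] + words
for head, dep in arc_standard(len(words), trans):
    print(labels[head], "->", labels[dep])   # ate -> she, ate -> fish, ROOT -> ate
```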
maltparser
oracle that correctly predicts the next transition .
approximate it with a linear classifier: w · f(c, t)
- sensitive to error propagation
local learning (using the history)
mstparser
for short dependencies Malt is a little better; for longer dependencies MST is better
Beam search O(n)
increasing the beam size usually helps, yet it should not be too large
92% Yue Zhang
- online reordering and projectivity
non-projective: crossing arcs
yet a non-projective tree can be made projective by reordering the words
Given a dependency tree, add swap operation.
LAS: labeled attachment score
UAS: unlabeled attachment score (see the sketch below)
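A quick sketch of how UAS / LAS are computed from gold and predicted (head, label) pairs (toy values):

```python
def attachment_scores(gold, pred):
    """gold, pred: one (head_index, label) pair per token."""
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)   # heads only
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)         # heads and labels
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f}  LAS={las:.2f}")   # UAS=1.00  LAS=0.67
```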
tagger errors: alleviated by doing tagging and parsing at the same time
a combined shifting-and-tagging operation
Arc eager:
- cannot guarantee the number of operations (additional ones may be needed)
- formally the output is not guaranteed to be a single tree (it may be several trees)
extending the arc eager algorithm
Disfluencies
about 10% of the words in speech are meaningless (disfluencies)
reparandum (edited phrase)
filled pauses, uh, um
eg: I want a flight to Boston, uh, I mean to Denver.
the best option is a joint architecture that does disfluency detection and parsing together
Rasooli and Tetreault 2013 - 2014
add three new actions to remove the disfluent words
============= Neural transition based ============
Chen and Manning 2014
replace the feature-based classifier with a neural network
1-2 page report, no coding: investigate machine translation
next class: a Facebook researcher