BIOE labels:
B: start of an entity; I: the other parts of an entity; O: background (not part of any entity)
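As a small illustration of this labeling scheme, the sketch below tags a made-up sentence and recovers the entities from the tags. The sentence, the tags, and the helper name `extract_entities` are assumptions for illustration, not part of the original code.

```python
def extract_entities(tokens, tags):
    """Group tokens into entities: B opens an entity, I extends it, O closes it."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new entity starts; store the one that was open, if any
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # O (or a stray I with no open entity) ends whatever was open
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# Hypothetical example sentence and tags
tokens = ["Tsinghua", "University", "is", "located", "in", "Beijing"]
tags = ["B", "I", "O", "O", "O", "B"]
print(extract_entities(tokens, tags))
```

Here the first two tokens form one entity (B followed by I), and the last token is an entity on its own.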
We first compute the score of a given tag sequence for a batch of inputs.
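import mindspore.numpy as mnp

def compute_score(emissions, tags, seq_ends, mask, trans, start_trans, end_trans):
    # emissions: (seq_length, batch_size, num_tags); tags, mask: (seq_length, batch_size)
    seq_length, batch_size = tags.shape
    mask = mask.astype(emissions.dtype)
    # Start transition plus the first emission; shape (batch_size,)
    score = start_trans[tags[0]]
    score += emissions[0, mnp.arange(batch_size), tags[0]]
    for i in range(1, seq_length):
        # Transition tags[i-1] -> tags[i] and emission of tags[i], counted only where mask[i] is 1
        score += trans[tags[i - 1], tags[i]] * mask[i]
        score += emissions[i, mnp.arange(batch_size), tags[i]] * mask[i]
    # Add the end transition from the last valid tag of each sequence
    last_tags = tags[seq_ends, mnp.arange(batch_size)]
    score += end_trans[last_tags]
    return score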
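To check the shape conventions, here is a hedged sketch that ports the same computation to NumPy (used only as a stand-in for `mindspore.numpy`, so that it runs without MindSpore) and evaluates it on a tiny made-up batch. The name `compute_score_np` and all the numbers are assumptions for illustration.

```python
import numpy as np


def compute_score_np(emissions, tags, seq_ends, mask, trans, start_trans, end_trans):
    # emissions: (seq_length, batch_size, num_tags); tags, mask: (seq_length, batch_size)
    seq_length, batch_size = tags.shape
    mask = mask.astype(emissions.dtype)
    batch = np.arange(batch_size)
    # Start transition plus first emission, one value per sequence in the batch
    score = start_trans[tags[0]] + emissions[0, batch, tags[0]]
    for i in range(1, seq_length):
        # Padded positions have mask 0 and therefore contribute nothing
        score = score + trans[tags[i - 1], tags[i]] * mask[i]
        score = score + emissions[i, batch, tags[i]] * mask[i]
    # End transition from the last valid tag of each sequence
    last_tags = tags[seq_ends, batch]
    return score + end_trans[last_tags]


# Toy batch: 3 positions, 2 sequences, 2 tags. The second sequence has length 2,
# so its third position is padding (mask 0).
emissions = np.array([
    [[1.0, 2.0], [0.5, 0.1]],
    [[0.3, 0.7], [0.2, 0.9]],
    [[1.5, 0.4], [9.0, 9.0]],
])
tags = np.array([[1, 0], [0, 1], [0, 0]])
mask = np.array([[1, 1], [1, 1], [1, 0]])
seq_ends = mask.sum(axis=0) - 1          # index of the last valid position: [2, 1]
trans = np.array([[0.1, 0.2], [0.3, 0.4]])   # trans[from_tag, to_tag]
start_trans = np.array([0.5, 0.6])
end_trans = np.array([0.7, 0.8])

score = compute_score_np(emissions, tags, seq_ends, mask, trans, start_trans, end_trans)
print(score)
```

Adding the terms by hand gives 0.6 + 2.0 + 0.3 + 0.3 + 0.1 + 1.5 + 0.7 = 5.5 for the first sequence and 0.5 + 0.5 + 0.2 + 0.9 + 0.8 = 2.9 for the second. The large padded emissions (9.0) do not appear in the second sum because of the mask.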
How to understand the score?
Just two things:
1. Given an input sequence x = {x1, x2, x3, ...} and a corresponding label sequence y = {y1, y2, y3, ...}, score(x, y) measures how likely x is to generate y.
2. So we need to accumulate, for every position i, the probability of x_i generating y_i and the probability of y_(i-1) being followed by y_i, since it is important to generate a reasonable next label after one has been generated.
The former is called the emission probability and the latter the transition probability. In the code above both are added up as unnormalized scores, so the score of a label sequence is the sum of its emission and transition terms over all positions, plus the start and end transition terms.
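Written out, the sum that compute_score accumulates can be sketched as follows. The symbol names are my own shorthand and an assumption, not notation from the original: s is start_trans, E_i(y_i) is the emission score of tag y_i at position i, T is trans, e is end_trans, and n is the number of unmasked positions.

```latex
\mathrm{score}(x, y) = s_{y_1} + \sum_{i=1}^{n} E_i(y_i) + \sum_{i=2}^{n} T_{y_{i-1},\, y_i} + e_{y_n}
```

The first sum collects the emission terms and the second the transition terms; the code uses 0-based indices for the same positions.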
Next we define the normalizer, which is the denominator of the formula below: