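A short aside on convergent evolution:
infinite sites assumption => each site mutates at most once => all identical mutations share a common origin (otherwise some site would carry more than one mutation) => convergent evolution is effectively ruled out under this assumption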
Reconstructing mutation histories from real tumor data
The method is applied to three real single-cell tumor data sets of differing data quality.
JAK2-negative myeloproliferative neoplasm
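712 SNVs, 58 tumor cells, 18 cancer-related mutation sites; error rates are known; the mutation matrix distinguishes three observed states: normal, heterozygous mutation, homozygous mutation.
The paper seems to say here that a homozygous mutation contradicts the infinite sites model? (Possibly because the second allele would need a further event at the same site.)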
Next, the Methods section.
First, define an augmented ancestor matrix A(T):
row: mutation index
column: mutation index + the empty root node index
A_{i,k} = 1 if i=k, or i is an ancestor of k; A_{i,k}=0 otherwise.
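To make the definition concrete, here is a minimal sketch that builds A(T) from a tree. The parent-vector encoding (parent[i] is the parent of mutation i, and the root gets index n) and the function name are assumptions of this sketch, not notation from the paper.

```python
def ancestor_matrix(parent):
    """Build the augmented ancestor matrix A(T) of a mutation tree.

    parent[i] is the parent of mutation i; the empty root node has
    index n = len(parent) (hypothetical encoding chosen for this sketch).
    Returns n rows (mutations) by n + 1 columns (mutations plus root).
    """
    n = len(parent)
    root = n
    A = [[0] * (n + 1) for _ in range(n)]
    for k in range(n):
        node = k
        # Walk from k up to the root: every mutation visited is either
        # k itself or an ancestor of k, so A[node][k] = 1.
        while node != root:
            A[node][k] = 1
            node = parent[node]
    return A


# Toy tree: root -> mutation 0, and mutation 0 -> mutations 1 and 2.
A = ancestor_matrix([3, 0, 0])
for row in A:
    print(row)
```

The root column stays all zero, because no mutation is an ancestor of the root. A cell attached to the root is therefore expected to carry no mutations.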
attachment vector \sigma: \sigma_{j} is the index of the node (a mutation or the root) to which the j-th cell is attached;
Based on this, the likelihood of the data P( D|T, \sigma, \theta ) can be calculated.
As Eq. (12) shows,
the likelihood of the data depends mainly on the sequencing error rates.
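A minimal sketch of how such a likelihood can be evaluated. It assumes a binary mutation matrix D with no missing entries and a two-parameter error model \theta = (alpha, beta), where alpha is the false positive rate and beta the false negative rate. These notes do not reproduce Eq. (12), so the exact form and all names below are assumptions. The expected state of mutation i in cell j is read off as A[i][sigma[j]].

```python
import math


def log_likelihood(D, A, sigma, alpha, beta):
    """log P(D | T, sigma, theta) under an assumed error model.

    D[i][j]  : observed state of mutation i in cell j (0 or 1)
    A[i][k]  : augmented ancestor matrix of the tree T
    sigma[j] : node to which cell j is attached
    alpha    : assumed false positive rate, P(observe 1 | true 0)
    beta     : assumed false negative rate, P(observe 0 | true 1)
    """
    # Probability of each (observed, expected) pair.
    prob = {
        (0, 0): 1.0 - alpha,
        (1, 0): alpha,
        (0, 1): beta,
        (1, 1): 1.0 - beta,
    }
    total = 0.0
    for i, row in enumerate(D):
        for j, observed in enumerate(row):
            expected = A[i][sigma[j]]
            total += math.log(prob[(observed, expected)])
    return total


# Toy tree root -> 0 -> {1, 2}; cells attached to nodes 1, 2 and the root (3).
A = [[1, 1, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
D = [[1, 1, 0], [1, 0, 0], [0, 1, 0]]  # error-free data for this attachment
print(log_likelihood(D, A, [1, 2, 3], alpha=0.01, beta=0.2))
```

Given T and \sigma, the expected matrix is fully determined, so every factor in the product is one of the four error probabilities. This is why the likelihood is governed by the sequencing error rates.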
MCMC sampling
Three components make up the state explored by the chain, and together they determine the final output tree:
the mutation tree T, the attachment vector \sigma, and the sequencing error rates \theta.
Two approaches: either marginalize out the \sigma component, or sample it explicitly along with the rest.
1: marginalize out the sample attachment
Move on \sigma: pick a sample and choose a new attachment point for it uniformly at random. To check: how does this move satisfy the necessary properties for the MCMC chain on \sigma to converge?
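A minimal sketch of the marginalization over sample attachments, assuming cells attach independently with a uniform prior over the n + 1 nodes and the same (alpha, beta) error model on binary data without missing entries. The function name and these modelling details are assumptions of the sketch. The sum over attachment points is done in log space to avoid underflow when there are many mutations.

```python
import math


def log_marginal_likelihood(D, A, alpha, beta):
    """log P(D | T, theta) with the attachment vector summed out.

    Assumes a uniform prior over the n + 1 attachment points per cell
    and independent cells (assumptions of this sketch).
    """
    n = len(A)            # number of mutations
    n_nodes = len(A[0])   # mutations plus the root
    n_cells = len(D[0])
    log_p = {
        (0, 0): math.log(1.0 - alpha),
        (1, 0): math.log(alpha),
        (0, 1): math.log(beta),
        (1, 1): math.log(1.0 - beta),
    }  # keyed by (observed, expected)
    total = 0.0
    for j in range(n_cells):
        # Log-likelihood of cell j for every possible attachment point k.
        per_node = [
            sum(log_p[(D[i][j], A[i][k])] for i in range(n))
            for k in range(n_nodes)
        ]
        # Log-sum-exp over attachment points, then apply the uniform prior.
        top = max(per_node)
        log_sum = top + math.log(sum(math.exp(v - top) for v in per_node))
        total += log_sum - math.log(n_nodes)
    return total


# One mutation, one cell observed as mutated:
# attached to the mutation -> 1 - beta = 0.8; attached to the root -> alpha = 0.1.
print(math.exp(log_marginal_likelihood([[1]], [[1, 0]], alpha=0.1, beta=0.2)))
```

With \sigma summed out, the chain only needs to move over T and \theta, at the cost of an extra factor of n + 1 in each likelihood evaluation.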
The rest will be written up later.