Abstract:
In terms of information gain, the abstract mainly covers: (1) maximum-likelihood mutation history / mutation tree estimation; (2) experimental sequencing error estimation; (3) reconstruction accuracy improvements.
Background:
Paragraph 1: how tumors evolve at the level of individual cells;
Paragraph 2: tumor evolution, remission, drugs / personalized cancer therapies;
Paragraph 3: the lineage tree of the cells in a tumor;
to reconstruct the evolutionary history of SNVs: infinite sites assumption =>
each site mutates at most once, i.e., from 0 to 1, and never reverts =>
perfect phylogeny, which introduces the mutation tree, as in Fig. 1d.
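To make the "infinite sites => perfect phylogeny" step concrete, here is a minimal sketch (not SCITE's code). It assumes the mutation matrix is a list of rows, one row per mutation and one column per cell, and that the root carries no mutations; all function names are hypothetical. Under these assumptions a matrix fits a perfect phylogeny exactly when, for every pair of mutations, the sets of cells carrying them are nested or disjoint, and the mutation tree can then be read off from set containment.

```python
from itertools import combinations


def cell_sets(E):
    """Set of cells carrying each mutation (row i of E is mutation i)."""
    return [frozenset(j for j, v in enumerate(row) if v == 1) for row in E]


def is_perfect_phylogeny(E):
    """Infinite sites + unmutated root: E fits a perfect phylogeny iff every
    pair of mutations is carried by nested or disjoint sets of cells."""
    sets = cell_sets(E)
    for a, b in combinations(sets, 2):
        if (a & b) and not (a <= b or b <= a):
            return False  # gametes (1,1), (1,0) and (0,1) all occur
    return True


def mutation_tree(E):
    """Parent vector of the mutation tree, with -1 standing for the root.
    Assumes is_perfect_phylogeny(E). Mutations with identical cell sets are
    chained by index, because their relative order is not identifiable."""
    sets = cell_sets(E)
    # Larger cell sets first: a mutation's ancestors come before it.
    order = sorted(range(len(sets)), key=lambda i: (-len(sets[i]), i))
    parent = [-1] * len(sets)
    for pos, i in enumerate(order):
        # The closest preceding superset is the immediate parent.
        for k in reversed(order[:pos]):
            if sets[i] <= sets[k]:
                parent[i] = k
                break
    return parent
```

For example, with `E = [[1,1,1,0],[1,1,0,0],[0,0,1,0]]` (3 mutations, 4 cells), `mutation_tree(E)` returns `[-1, 0, 0]`: mutation 0 hangs from the root, and mutations 1 and 2 are both children of mutation 0.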
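Paragraph 4: common high-throughput bulk sequencing admixes the DNA of millions of cells; “the typical output of these tools would be one or several trees as in Fig.1d, augmented with the estimated prevalence of the different subclones in the tumor.”
This part is a bit odd: is the prevalence estimated as well? (Presumably yes: bulk methods infer subclone prevalences from the variant allele frequencies.)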
Paragraph 5: mixed samples => bulk methods do not work well for complex subclonal structures and low-frequency subclones => single-nucleus sequencing techniques => the data are error-prone (both the false negative rate and the false positive rate are high) => this rules out the perfect phylogeny reconstruction approach => a generalization of the perfect phylogeny model is needed to deal with imperfect data
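Why noisy single-cell data break the perfect phylogeny approach can be seen with a small simulation. This is a sketch under assumed conventions: rows are mutations, columns are cells, `alpha` is the false positive rate and `beta` the false negative rate, and the function names are hypothetical. A single false positive is already enough to make the matrix from the previous example violate the nested-or-disjoint condition.

```python
import random
from itertools import combinations


def add_noise(E, alpha, beta, rng):
    """Observed matrix D from the true matrix E: each true 0 is read as 1
    with probability alpha (false positive), each true 1 is read as 0 with
    probability beta (false negative)."""
    D = []
    for row in E:
        noisy = []
        for e in row:
            if e == 1:
                noisy.append(0 if rng.random() < beta else 1)
            else:
                noisy.append(1 if rng.random() < alpha else 0)
        D.append(noisy)
    return D


def is_perfect_phylogeny(E):
    """Nested-or-disjoint test on the cell sets of every pair of mutations."""
    sets = [frozenset(j for j, v in enumerate(r) if v == 1) for r in E]
    return all(not (a & b) or a <= b or b <= a
               for a, b in combinations(sets, 2))


E = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
D = [row[:] for row in E]
D[2][1] = 1  # one false positive: mutation 2 wrongly called in cell 1
print(is_perfect_phylogeny(E), is_perfect_phylogeny(D))
```

Here the flipped entry makes mutations 1 and 2 share cell 1 without either cell set containing the other, so the exact test rejects the data even though the true history is a perfect phylogeny. `add_noise(E, 0.01, 0.2, random.Random(0))` generates such corrupted matrices at random.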
Paragraph 6: Bayesian inference, 1: the whole posterior tree distribution;
2: error rate estimation
Paragraph 7: BitPhylogeny
Paragraph 8: Kim and Simon’s pairwise ordering test, and mutation trees
polynomial-time pairwise ordering test algorithm (this pairwise ordering test also relies on Bayesian inference)
Paragraph 9: SCITE aims at 1: maximum-likelihood tree estimation; 2: sequencing error estimation; 3: estimating the number of cells needed
Results and Discussion
Paragraph 10: tree inference from single-cell mutation profiles
1: represent single-cell mutation histories
2: likelihood-based approach to deal with sequencing errors
Paragraph 11: tumor evolution model and tree representation
Assumptions: point mutations and the infinite sites assumption: each site mutates at most once in the evolutionary history of a tumor. Introduce the data matrix E and the mutation tree. The order of some mutations cannot be identified from the data.
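The link between a mutation tree and the matrix E can be sketched as follows. The assumptions are that each cell is attached to one node of the tree (or to the root), that it carries exactly the mutations on the path from the root to that node, and that the tree is stored as a parent vector with -1 for the root. The names are hypothetical. The second example shows why some mutation orders are not identified: two trees that order mutations 0 and 1 differently produce the same E when no cell sits between them.

```python
def ancestors(parent, node):
    """Mutations on the path from the root (-1) down to node, inclusive."""
    path = set()
    while node != -1:
        path.add(node)
        node = parent[node]
    return path


def expected_matrix(parent, attachment):
    """n x m matrix E: E[i][j] = 1 iff mutation i lies on the root-to-node
    path of the node cell j is attached to (-1 means attached to the root)."""
    E = [[0] * len(attachment) for _ in range(len(parent))]
    for j, node in enumerate(attachment):
        for i in ancestors(parent, node):
            E[i][j] = 1
    return E


# Mutation 0 below the root, mutations 1 and 2 below mutation 0.
print(expected_matrix([-1, 0, 0], [1, 1, 2, -1]))

# Chain 0 -> 1 versus chain 1 -> 0, with all mutated cells at the bottom:
# both trees give the same E, so the order of 0 and 1 is not identified.
print(expected_matrix([-1, 0], [1, 1, -1]) == expected_matrix([1, -1], [0, 0, -1]))
```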
Paragraph 12: observational errors
Are these errors simply the sequencing errors?
I think they are the sequencing errors.
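If the observational errors are read as per-entry sequencing errors, the likelihood of a tree can be sketched as below. This is an illustration under stated assumptions, not the paper's implementation. Observed entries are 0, 1 or `None` (missing), `alpha` is the false positive rate and `beta` the false negative rate, entries are independent given the tree, and each cell's unknown attachment point is summed out with a uniform prior over the n + 1 nodes. All names are hypothetical.

```python
import math

MISSING = None  # hypothetical encoding of a missing observation


def entry_prob(d, e, alpha, beta):
    """P(observed d | true e) with false positive rate alpha and false
    negative rate beta; missing entries carry no information."""
    if d is MISSING:
        return 1.0
    if e == 0:
        return alpha if d == 1 else 1 - alpha
    return 1 - beta if d == 1 else beta


def ancestors(parent, node):
    """Mutations on the path from the root (-1) down to node, inclusive."""
    path = set()
    while node != -1:
        path.add(node)
        node = parent[node]
    return path


def tree_log_likelihood(D, parent, alpha, beta):
    """log P(D | tree, alpha, beta), summing each cell over its n + 1
    possible attachment nodes (root = -1) with a uniform prior."""
    n, m = len(D), len(D[0])
    nodes = [-1] + list(range(n))
    paths = {v: ancestors(parent, v) for v in nodes}
    total = 0.0
    for j in range(m):
        p_cell = 0.0
        for v in nodes:
            p = 1.0
            for i in range(n):
                e = 1 if i in paths[v] else 0
                p *= entry_prob(D[i][j], e, alpha, beta)
            p_cell += p / len(nodes)
        total += math.log(p_cell)
    return total
```

With `D = [[1,1,1,0],[1,1,0,0],[0,0,1,0]]`, `alpha=0.01` and `beta=0.2`, the tree `[-1, 0, 0]` scores higher than the star tree `[-1, -1, -1]`. The star tree can only explain cells carrying both mutations 0 and 1 by assuming a sequencing error.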
SCITE, an MCMC sampling scheme:
Two key points: 1: state changes; are the moves ergodic?
2: transition probabilities and the acceptance ratio?
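Both questions can be illustrated with one type of tree move, prune-and-reattach, plus a Metropolis-Hastings acceptance step. This is a hedged sketch: SCITE's other moves and its exact proposal probabilities are not reproduced, the tree prior is assumed uniform, and the names are hypothetical. The move picks a node, cuts out its subtree and hangs it under a random node outside that subtree. The subtree and the set of allowed targets do not change, so the reverse move has the same probability and the Hastings correction cancels. Because the root is always an allowed target, every tree can be collapsed to the star tree and rebuilt top-down, so this move alone already reaches every tree.

```python
import math
import random


def descendants(parent, node):
    """node together with every mutation below it in the tree."""
    out = {node}
    changed = True
    while changed:
        changed = False
        for child, p in enumerate(parent):
            if p in out and child not in out:
                out.add(child)
                changed = True
    return out


def prune_and_reattach(parent, rng):
    """Propose a new parent vector: detach a random node (with its subtree)
    and attach it to a random node outside that subtree (-1 is the root)."""
    new = list(parent)
    node = rng.randrange(len(parent))
    sub = descendants(parent, node)
    targets = [-1] + [v for v in range(len(parent)) if v not in sub]
    new[node] = rng.choice(targets)
    return new


def mh_accept(log_new, log_old, rng):
    """Metropolis-Hastings acceptance for a symmetric proposal and a uniform
    tree prior: accept with probability min(1, L_new / L_old)."""
    return rng.random() < math.exp(min(0.0, log_new - log_old))
```

In a full sampler, `log_new` and `log_old` would be the log-likelihoods of the proposed and current trees, for example from a tree scoring function like the one sketched for the error model.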
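SCITE: two modes
Maximum A Posteriori (MAP): given the data, choose the parameter with the highest posterior probability;
Maximum Likelihood (ML): choose the parameter under which the observed data are most probable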
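A toy example makes the MAP/ML difference concrete. It is not SCITE's code. Suppose (hypothetically) the true genotypes were known, and `k` of `n` truly mutated entries were read as 0. The ML estimate of the false negative rate is then k / n. With an assumed Beta(a, b) prior, the MAP estimate is the mode of the Beta(k + a, n - k + b) posterior, which is pulled toward the prior.

```python
def ml_estimate(k, n):
    """ML estimate of a false negative rate: k dropouts among n true 1s."""
    return k / n


def map_estimate(k, n, a, b):
    """MAP estimate under a Beta(a, b) prior: the mode of the posterior
    Beta(k + a, n - k + b). The closed form needs a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("this closed form assumes a > 1 and b > 1")
    return (k + a - 1) / (n + a + b - 2)


print(ml_estimate(0, 5), map_estimate(0, 5, 2, 2))
```

With no observed dropouts in 5 entries, ML gives a false negative rate of exactly 0. MAP with a Beta(2, 2) prior gives 1/7, so the prior keeps the estimate away from a degenerate value when data are scarce. With many observations the two estimates converge.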
SCITE also has a parameter that amplifies the likelihood and speeds up tree discovery
SCITE can skip learning the error rates if they are provided
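One plausible reading of "amplifies the likelihood" is that the likelihood is raised to a power, called `gamma` here. Both that reading and the name are assumptions. In a Metropolis-Hastings step this multiplies the log-likelihood difference by `gamma`, as in the sketch below.

```python
import math


def tempered_accept_prob(log_new, log_old, gamma=1.0):
    """Acceptance probability when the likelihood is raised to the power
    gamma: min(1, (L_new / L_old) ** gamma), computed in log space."""
    return math.exp(min(0.0, gamma * (log_new - log_old)))


# A proposal that is worse by one log-likelihood unit:
print(tempered_accept_prob(-11.0, -10.0))             # exp(-1)
print(tempered_accept_prob(-11.0, -10.0, gamma=4.0))  # exp(-4)
```

With `gamma > 1`, downhill moves are accepted less often and the chain concentrates on high-likelihood trees. That helps find the best tree quickly, but the samples then follow a sharpened distribution rather than the plain posterior.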
reconstruct mutation history from real tumor data
JAK2-negative myeloproliferative neoplasm
58 tumor cells, 712 SNVs, 18 cancer-related mutation sites; the sequencing error rates are already known. This is Part 1.