Reading notes: Tree inference for single-cell data -- Part 1: error rate estimation - 优快云 Blog

Article link: https://blog.youkuaiyun.com/bsn2020/article/details/109068889

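This post looks at the problem of reconstructing the cell lineage tree of an evolving tumor and at how the SCITE algorithm is applied to single-cell mutation data. SCITE uses Bayesian inference and maximum-likelihood estimation to deal with sequencing errors, and it also estimates the mutation tree and the number of cells needed. Mixed (bulk) samples cannot resolve complex and low-frequency subclones, so SCITE generalizes the perfect phylogeny model to imperfect single-cell data and explores it with an MCMC sampling scheme. SCITE also performs well on real tumor data: in a study of a JAK2-negative myeloproliferative neoplasm, it reconstructed the mutation history of 58 tumor cells.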


Abstract:
In terms of information gain, the abstract's main points are: (1) maximum-likelihood mutation history / mutation tree estimation; (2) estimation of experimental sequencing error rates; (3) improved reconstruction accuracy.

Background:
Paragraph 1: how tumors evolve at the level of individual cells;
Paragraph 2: tumor evolution, remission, drugs / personalized cancer therapies
Paragraph 3: the lineage tree of the cells in a tumor;
To reconstruct the evolutionary history of SNVs, use the infinite sites assumption =>
each site mutates at most once, i.e., from 0 to 1, and never reverts =>
perfect phylogeny, which leads to the mutation tree, as in Fig. 1d.
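To make the infinite sites / perfect phylogeny condition concrete, here is a minimal Python sketch (the function name and the cells-by-mutations list-of-lists layout are my own choices, not from the paper). With an unmutated root, an error-free binary matrix fits a rooted perfect phylogeny exactly when, for every pair of mutations, the sets of cells carrying them are either nested or disjoint:

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Return True if the binary cells x mutations matrix fits a rooted
    perfect phylogeny (unmutated root, infinite sites assumption).

    Condition: for every pair of mutations, the sets of cells carrying
    them are either nested or disjoint.
    """
    n_mut = len(matrix[0]) if matrix else 0
    # carriers[j] = set of cells (row indices) that carry mutation j
    carriers = [
        {cell for cell, row in enumerate(matrix) if row[j] == 1}
        for j in range(n_mut)
    ]
    for a, b in combinations(carriers, 2):
        nested = a <= b or b <= a
        disjoint = not (a & b)
        if not (nested or disjoint):
            return False
    return True


# rows = cells, columns = mutations
good = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
bad = [  # mutations 0 and 1 overlap without being nested
    [1, 0],
    [1, 1],
    [0, 1],
]
print(admits_perfect_phylogeny(good))  # True
print(admits_perfect_phylogeny(bad))   # False
```

A single false negative or false positive can already break this condition, which is why noisy single-cell data needs a generalized model rather than an exact perfect phylogeny test.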
Paragraph 4: common bulk high-throughput sequencing admixes the DNA of millions of cells; "the typical output of these tools would be one or several trees as in Fig. 1d, augmented with the estimated prevalence of the different subclones in the tumor."
This part seems a bit odd to me: is the prevalence estimated as well?
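One plausible answer, under simplifying assumptions that are mine rather than the paper's (diploid genome, heterozygous SNV, no copy-number change, a pure tumor sample): bulk tools can read the prevalence φ of a subclone, i.e. the fraction of cells carrying a mutation, off the variant allele frequency, because each carrier cell contributes the variant on only one of its two alleles:

```latex
\mathrm{VAF} \approx \frac{\phi}{2}
\quad\Longrightarrow\quad
\phi \approx 2\,\mathrm{VAF},
\qquad \text{e.g. } \mathrm{VAF} = 0.2 \;\Rightarrow\; \phi \approx 0.4 .
```

Real bulk methods use richer models (tumor purity, copy number, clustering of mutations), so this is only the intuition behind "estimated prevalence".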
Paragraph 5: mixed samples => methods fail on complex subclonal structures and low-frequency subclones => single-nucleus sequencing techniques => the data are error-prone (substantial false negative and false positive rates) => the perfect phylogeny reconstruction approach cannot be applied directly => generalize the perfect phylogeny model to handle imperfect data
Paragraph 6: Bayesian inference, giving 1: the full posterior distribution over trees;
2: error rate estimation
Paragraph 7: BitPhylogeny
Paragraph 8: Kim and Simon's pairwise ordering test, and mutation trees
a polynomial-time pairwise ordering test algorithm (this pairwise ordering test also relies on Bayesian inference)
Paragraph 9: SCITE aims at 1: maximum-likelihood tree estimation; 2: sequencing error rate estimation; 3: estimating the number of cells needed

Results and Discussion
Paragraph 10: tree inference from single-cell mutation profiles
1: how to represent single-cell mutation histories
2: a likelihood-based approach to handle sequencing errors

Paragraph 11: tumor evolution model and tree representation
Assumptions: only point mutations, plus the infinite sites assumption: each site mutates at most once in the evolutionary history of a tumor. The paper introduces the binary mutation matrix E and the mutation tree. The order of some mutations cannot be identified from the data.
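A minimal sketch of how a mutation tree plus a cell-to-node attachment determines the mutation matrix E. The parent-vector encoding, the hypothetical `ROOT = -1` label and the mutations-by-cells orientation are illustrative assumptions; the rule itself is that a cell carries exactly the mutations on the path from the root down to the node it is attached to:

```python
ROOT = -1  # hypothetical label for the unmutated root node


def ancestors(node, parent):
    """Mutations on the path from the root down to `node` (inclusive)."""
    path = set()
    while node != ROOT:
        path.add(node)
        node = parent[node]
    return path


def mutation_matrix(parent, attachment):
    """Build the n x m binary mutation matrix E from a mutation tree.

    parent[i]:     parent of mutation i (ROOT for children of the root)
    attachment[j]: node that cell j is attached to (a mutation or ROOT)
    E[i][j] = 1 iff mutation i lies on the root-to-attachment path of cell j.
    """
    n, m = len(parent), len(attachment)
    E = [[0] * m for _ in range(n)]
    for j, node in enumerate(attachment):
        for i in ancestors(node, parent):
            E[i][j] = 1
    return E


# Tree: root -> 0 -> 1, and root -> 2
parent = [ROOT, 0, ROOT]
# Cell 0 sits below mutation 1, cell 1 below mutation 2, cell 2 at the root
attachment = [1, 2, ROOT]
for row in mutation_matrix(parent, attachment):
    print(row)
# [1, 0, 0]
# [1, 0, 0]
# [0, 1, 0]
```

Mutations 0 and 1 end up with identical rows, so the tree with their order swapped (root -> 1 -> 0) produces the same E; this is one way the order of some mutations becomes unidentifiable.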

Paragraph 12: observational errors
Are these errors simply sequencing errors?
I think they are.
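If the observational errors are modeled as independent per-entry flips, the likelihood of the observed matrix D given the true matrix E is a product over entries. The sketch below assumes my own labels α = false positive rate P(D=1 | E=0) and β = false negative rate P(D=0 | E=1), skips missing entries (`None`), and requires 0 < α, β < 1:

```python
import math


def log_likelihood(D, E, alpha, beta):
    """log P(D | E) assuming independent per-entry observation errors.

    D, E:  equally shaped lists of lists (mutations x cells);
           D may contain None for missing observations
    alpha: false positive rate, P(D = 1 | E = 0)
    beta:  false negative rate, P(D = 0 | E = 1)
    Both rates must lie strictly between 0 and 1 (log of 0 is undefined).
    """
    prob = {
        (0, 0): 1 - alpha,  # correctly observed as unmutated
        (1, 0): alpha,      # false positive
        (0, 1): beta,       # false negative (e.g. allelic dropout)
        (1, 1): 1 - beta,   # correctly observed as mutated
    }
    total = 0.0
    for d_row, e_row in zip(D, E):
        for d, e in zip(d_row, e_row):
            if d is None:   # missing entries do not contribute
                continue
            total += math.log(prob[(d, e)])
    return total


E = [[1, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
D = [[1, 0, 0],
     [0, 0, None],  # one false negative and one missing entry
     [0, 1, 0]]
print(log_likelihood(D, E, alpha=0.01, beta=0.2))
```

Working in log space keeps the product over many entries from underflowing.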
SCITE is an MCMC sampling scheme.
Two key questions: 1: how the state changes, i.e. are the tree moves ergodic?
2: what are the transition (proposal) probabilities and the acceptance ratio?
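Both questions can be illustrated with a toy Metropolis-Hastings sampler over mutation trees. It uses only a prune-and-reattach move (as far as I know SCITE also swaps node labels and subtrees; those moves are not sketched here). The pruned subtree, and therefore the set of allowed reattachment points, is the same before and after the move, so the proposal is symmetric and the acceptance ratio reduces to min(1, exp(Δ log score)). Reattaching every node to the root and then rebuilding any target tree top-down shows the move set is ergodic. The `ROOT = -1` label and `toy_log_score` are hypothetical; in SCITE the score would be based on the likelihood of the data:

```python
import math
import random

ROOT = -1  # hypothetical label for the unmutated root node


def subtree(node, parent):
    """Set of mutations in the subtree rooted at `node` (inclusive)."""
    children = {}
    for child, p in enumerate(parent):
        children.setdefault(p, []).append(child)
    stack, seen = [node], set()
    while stack:
        v = stack.pop()
        seen.add(v)
        stack.extend(children.get(v, []))
    return seen


def prune_and_reattach(parent, rng):
    """Cut the subtree below a random mutation and hang it under a random
    node outside that subtree (the root included). Targets never lie inside
    the subtree, so no cycle can be created."""
    node = rng.randrange(len(parent))
    inside = subtree(node, parent)
    targets = [ROOT] + [v for v in range(len(parent)) if v not in inside]
    proposal = list(parent)
    proposal[node] = rng.choice(targets)
    return proposal


def metropolis_hastings(parent, log_score, n_iter, rng):
    """Symmetric-proposal MH: accept with probability
    min(1, exp(new_score - old_score)). Returns the best tree seen."""
    current, current_score = list(parent), log_score(parent)
    best, best_score = list(current), current_score
    for _ in range(n_iter):
        proposal = prune_and_reattach(current, rng)
        proposal_score = log_score(proposal)
        delta = proposal_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


rng = random.Random(0)
target = [ROOT, 0, 1]  # toy "true" tree: root -> 0 -> 1 -> 2


def toy_log_score(tree):
    # hypothetical score: penalise every parent that differs from the target
    return -5.0 * sum(p != q for p, q in zip(tree, target))


best, best_score = metropolis_hastings([ROOT, ROOT, ROOT], toy_log_score, 2000, rng)
print(best, best_score)
```

Keeping track of the best tree seen turns the sampler into a search for the highest-scoring tree, while the chain itself still visits trees in proportion to exp(score).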

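SCITE offers two modes:
Maximum A Posteriori (MAP): given the data, choose the parameters with the highest posterior probability;
Maximum Likelihood (ML): choose the parameters under which the observed data are most probable.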
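A small worked example of the MAP/ML difference for a single error rate. This is a simplification: here the true genotypes are assumed known, whereas SCITE learns the error rates jointly with the tree. Suppose k of n truly mutated entries were observed as 0. The ML estimate of the false negative rate β is k/n; with a hypothetical Beta(a, b) prior, the MAP estimate is the posterior mode (k + a - 1)/(n + a + b - 2):

```python
import math


def ml_estimate(k, n):
    """Maximiser of the binomial likelihood beta**k * (1 - beta)**(n - k)."""
    return k / n


def map_estimate(k, n, a, b):
    """Mode of the Beta(k + a, n - k + b) posterior under a Beta(a, b) prior.
    Valid when k + a > 1 and n - k + b > 1 (interior mode)."""
    return (k + a - 1) / (n + a + b - 2)


def log_posterior(beta, k, n, a, b):
    """Unnormalised log posterior: log likelihood + log prior."""
    return (k + a - 1) * math.log(beta) + (n - k + b - 1) * math.log(1 - beta)


k, n = 3, 10  # toy numbers: 3 of 10 truly mutated entries observed as 0
a, b = 2, 8   # hypothetical prior with mean a / (a + b) = 0.2
print(ml_estimate(k, n))         # 0.3
print(map_estimate(k, n, a, b))  # 0.2222222222222222

# Numerical check: the MAP estimate is where the log posterior peaks
grid = [i / 1000 for i in range(1, 1000)]
print(max(grid, key=lambda x: log_posterior(x, k, n, a, b)))  # 0.222
```

With the flat prior Beta(1, 1) the two estimates coincide, which is the usual sense in which ML is MAP under a uniform prior.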

SCITE also has a parameter that amplifies the likelihood and speeds up tree discovery.
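My reading of this parameter (an assumption worth checking against the paper) is that the likelihood is raised to a power γ, so the Metropolis-Hastings acceptance for a symmetric proposal becomes min(1, exp(γ·Δℓ)), where Δℓ is the change in log-likelihood. A minimal sketch with a hypothetical function name:

```python
import math


def acceptance_probability(old_log_lik, new_log_lik, gamma=1.0):
    """Metropolis-Hastings acceptance for a symmetric proposal when the
    target is proportional to P(D | T) ** gamma.
    gamma > 1 sharpens the landscape (greedier search);
    gamma < 1 flattens it (more exploration)."""
    delta = gamma * (new_log_lik - old_log_lik)
    # Decide in log space first so large improvements cannot overflow exp()
    return 1.0 if delta >= 0 else math.exp(delta)


# A proposal that is worse by 2 log-likelihood units:
print(acceptance_probability(-10.0, -12.0, gamma=1.0))  # exp(-2), about 0.135
print(acceptance_probability(-10.0, -12.0, gamma=2.0))  # exp(-4), about 0.018
```

With γ ≠ 1 the chain samples a tempered distribution rather than the posterior itself, so the trick suits searching for the best tree more than quantifying posterior uncertainty.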
SCITE can skip learning the error rates if they are provided.

Reconstructing the mutation history from real tumor data:
JAK2-negative myeloproliferative neoplasm
58 tumor cells, 712 SNVs, 18 cancer-related mutation sites; the sequencing error rates are already known. This is the end of Part 1.