tree inference for single-cell data ---- part 2

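This post discusses why convergent evolution is considered unlikely in single-cell tumor data and, using an example with 712 SNVs and 18 cancer-related mutation sites, examines how homozygous mutations conflict with the infinite sites model. The Methods part covers the augmented ancestor matrix and MCMC sampling, along with how the approach is applied to real tumor samples of differing data quality.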

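A short aside on convergent evolution:
infinite sites assumption => each site mutates at most once => all identical mutations share a common origin (otherwise a single site could carry more than one mutation) => convergent evolution is unlikely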
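
On error-free binary data, the chain of implications above can be checked directly: if every site mutates at most once, the sets of cells carrying any two mutations must be either disjoint or nested. The sketch below illustrates that pairwise check. The function name and the toy matrices are made up for this example, and real single-cell data would need its sequencing errors handled first.

```python
from itertools import combinations

def find_conflicts(matrix):
    """Return pairs of sites whose mutated-cell sets overlap without nesting.

    matrix[i][j] is 1 if site i is mutated in cell j, else 0.
    An empty result means the matrix is compatible with each site
    mutating at most once (assuming no sequencing errors).
    """
    cell_sets = [{j for j, v in enumerate(row) if v} for row in matrix]
    conflicts = []
    for a, b in combinations(range(len(cell_sets)), 2):
        sa, sb = cell_sets[a], cell_sets[b]
        # Overlapping but neither contains the other: no single tree explains both.
        if sa & sb and not (sa <= sb or sb <= sa):
            conflicts.append((a, b))
    return conflicts

# Toy data (hypothetical): rows are sites, columns are cells.
conflicting = [[1, 1, 0],
               [0, 1, 1]]
compatible = [[1, 1, 1],
              [1, 1, 0],
              [0, 0, 1]]
print(find_conflicts(conflicting))
print(find_conflicts(compatible))
```

A conflict found this way means either the infinite sites assumption is violated or the data contain errors, which is why the likelihood model later in these notes accounts for sequencing error.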

reconstructing mutation histories from real tumor data

The method is applied to three real single-cell tumor data sets of different data quality.

JAK2-negative myeloproliferative neoplasm

712 SNVs, 58 tumor cells, and 18 cancer-related mutation sites. The error rates are known, and the mutation matrix distinguishes three observed states: normal, heterozygous mutation, and homozygous mutation.

Does the paper say here that a homozygous mutation, if one appears, contradicts the infinite sites model?

Next, the Methods section.
First, define an augmented ancestor matrix A(T):
row: mutation index
column: mutation index + the empty root node index
A_{i,k} = 1 if i=k, or i is an ancestor of k; A_{i,k}=0 otherwise.
vector \sigma: \sigma_{i} is the index of the mutation (or the root) to which the i-th cell is attached.
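
The definition of A(T) above can be sketched in a few lines. This is a minimal illustration, assuming the tree is stored as a parent vector in which mutation indices run from 0 to n-1 and the empty root has index n; that representation and the function name are choices made for this example, not something taken from the paper.

```python
def ancestor_matrix(parent):
    """Build the augmented ancestor matrix from a parent vector.

    parent[i] is the parent of mutation i; the empty root has index n.
    Rows are mutations; columns are mutations plus the root.
    A[i][k] = 1 if i == k or i is an ancestor of k, else 0.
    Assumes `parent` describes a valid tree (no cycles).
    """
    n = len(parent)
    root = n
    A = [[0] * (n + 1) for _ in range(n)]
    for k in range(n):
        node = k
        # Walk from k up to the root, marking k itself and every ancestor.
        while node != root:
            A[node][k] = 1
            node = parent[node]
    return A

# Toy tree (hypothetical): root -> 0, and 0 -> 1, 0 -> 2.
A = ancestor_matrix([3, 0, 0])
for row in A:
    print(row)
```

The root column stays all zero, so a cell attached to the root is expected to carry no mutations.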
Based on this, the likelihood of the data P( D|T, \sigma, \theta ) can be calculated.
As equation (12) shows, the likelihood of the data depends mainly on the sequencing errors.
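
To make the role of the sequencing errors concrete, here is a rough sketch of such a likelihood for the simplified binary case. It assumes that \theta consists of a false positive rate alpha and a false negative rate beta, that entries are independent, and that the expected state of mutation i in cell j is A[i][\sigma_j]. The names `alpha`, `beta`, and `log_likelihood` are assumptions for this example, and the three-state data described earlier would need an extended error table.

```python
import math

def log_likelihood(D, A, sigma, alpha, beta):
    """Log P(D | T, sigma, theta) for binary data with independent errors.

    D[i][j]  : observed state of mutation i in cell j (0 or 1)
    A        : augmented ancestor matrix of the tree T
    sigma[j] : attachment point (column of A) of cell j
    alpha    : assumed false positive rate, P(observe 1 | true 0)
    beta     : assumed false negative rate, P(observe 0 | true 1)
    """
    # Keys are (observed, expected) pairs.
    prob = {(0, 0): 1 - alpha, (1, 0): alpha,
            (0, 1): beta,      (1, 1): 1 - beta}
    total = 0.0
    for i, row in enumerate(D):
        for j, observed in enumerate(row):
            expected = A[i][sigma[j]]
            total += math.log(prob[(observed, expected)])
    return total

# Toy example (hypothetical): tree root -> 0, 0 -> 1, 0 -> 2; root index is 3.
A = [[1, 1, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]
D = [[1, 0],
     [1, 0],
     [0, 0]]
sigma = [1, 3]  # cell 0 sits below mutation 1, cell 1 at the root
print(log_likelihood(D, A, sigma, alpha=0.1, beta=0.2))
```

Once the tree and the attachment fix the expected matrix, every factor is one of four error probabilities, which is why the likelihood is governed by the error rates.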

MCMC sampling
Three components determine the final output:
the mutation tree T, the attachment vector \sigma, and the sequencing error rates \theta.
There are two ways to handle \sigma: marginalize it out, or sample it explicitly along with the tree.
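
For the first option, marginalizing means summing each cell's likelihood over every possible attachment point instead of fixing one. The sketch below assumes binary data, independent errors with rates `alpha` and `beta`, and a uniform prior over the n+1 attachment points; all names are illustrative and not taken from the paper.

```python
import math

def marginal_log_likelihood(D, A, alpha, beta):
    """Log P(D | T, theta) with the attachment of each cell summed out.

    Assumes a uniform prior over the n + 1 attachment points
    (n mutations plus the root) and 0 < alpha, beta < 1.
    """
    n = len(A)
    n_points = n + 1
    log_p = {(0, 0): math.log(1 - alpha), (1, 0): math.log(alpha),
             (0, 1): math.log(beta),      (1, 1): math.log(1 - beta)}
    total = 0.0
    for j in range(len(D[0])):
        # Log-likelihood of cell j for each candidate attachment point.
        per_point = [sum(log_p[(D[i][j], A[i][s])] for i in range(n))
                     for s in range(n_points)]
        # Log-sum-exp keeps the sum stable when there are many mutations.
        top = max(per_point)
        total += top + math.log(sum(math.exp(v - top) for v in per_point))
        total -= math.log(n_points)  # uniform prior over attachment points
    return total

# Smallest possible case (hypothetical): one mutation, one cell observed as mutated.
print(marginal_log_likelihood([[1]], [[1, 0]], alpha=0.1, beta=0.2))
```

Because the cells are independent given the tree, the sum over all attachment vectors factorizes into one sum per cell, so the cost grows linearly with the number of cells.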

1: marginalize out the sample attachment
Pick a sample and choose an attachment point for it uniformly at random; the question is how this satisfies the necessary properties for the MCMC chain on \sigma to converge.
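
A move of this kind can be sketched as a plain Metropolis step. The sketch assumes binary data, error rates `alpha` and `beta`, and a uniform prior over attachment points; because the proposal is uniform and therefore symmetric, the acceptance ratio reduces to a likelihood ratio. The function names and the toy data are hypothetical, and this is not claimed to be the exact move used in the paper.

```python
import math
import random

def cell_log_likelihood(D, A, j, point, alpha, beta):
    """Log-likelihood of cell j's observations when attached at `point`."""
    log_p = {(0, 0): math.log(1 - alpha), (1, 0): math.log(alpha),
             (0, 1): math.log(beta),      (1, 1): math.log(1 - beta)}
    return sum(log_p[(D[i][j], A[i][point])] for i in range(len(A)))

def attachment_move(D, A, sigma, alpha, beta, rng):
    """One Metropolis step on sigma: re-attach one uniformly chosen cell."""
    j = rng.randrange(len(sigma))         # pick a sample uniformly
    proposal = rng.randrange(len(A) + 1)  # uniform over mutations plus root
    delta = (cell_log_likelihood(D, A, j, proposal, alpha, beta)
             - cell_log_likelihood(D, A, j, sigma[j], alpha, beta))
    # Symmetric proposal: accept with probability min(1, likelihood ratio).
    if delta >= 0 or rng.random() < math.exp(delta):
        updated = list(sigma)
        updated[j] = proposal
        return updated
    return sigma

# Toy run (hypothetical): tree root -> 0, 0 -> 1, 0 -> 2; root index is 3.
A = [[1, 1, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
D = [[1], [1], [0]]  # one cell carrying mutations 0 and 1
rng = random.Random(0)
sigma = [3]          # start with the cell attached to the root
visits = [0, 0, 0, 0]
for _ in range(2000):
    sigma = attachment_move(D, A, sigma, 0.01, 0.01, rng)
    visits[sigma[0]] += 1
print(visits)
```

With low error rates the chain should spend most of its time at attachment point 1, the only placement that explains both observed mutations without invoking an error.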
The rest will be written up later.
