Cross-Validation with PhyloBayes

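This article describes how to run cross-validation with PhyloBayes, covering the underlying principle, the workflow, and suggested parameter settings. The dataset is split into a training set and a test set to evaluate each model's predictive ability, and the procedure is repeated several times to make the result more reliable.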

Principle
Cross-validation (CV) is a general method for evaluating the fit of alternative models. The rationale is as follows: the dataset is randomly split into two (possibly unequal) parts, the training (or learning) set and the test set. The parameters of the model are estimated on the learning set (i.e. the model is 'trained' on this subset of empirical observations), and these parameter values are then used to compute the likelihood of the test set (which measures how well the test set is 'predicted' by the model). The overall procedure has to be repeated (and the resulting log likelihood scores averaged) over several random splits.
In short, CV is used to choose the best-fitting substitution model: the dataset is split into a training set and a test set, the model parameters are estimated on the training set, and those parameters are then used to compute the likelihood of the test set. The procedure is repeated several times and the resulting log-likelihoods are averaged.

Typically, 10-fold cross-validation (such that D2 represents 10% and D1 90% of the original dataset) has been used (e.g. Philippe et al., 2011), and ten replicates have been run (although ideally, 100 replicates would certainly be more adequate). However, alternative schemes are possible.
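The user manual's typical scheme is therefore 10-fold: 90% of the data as the learning set (D1) and 10% as the test set (D2), repeated over 10 replicates.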
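
As a compact restatement of the procedure (the notation below is introduced here for illustration and is not taken from the manual): let R be the number of replicates, and let D1^(r) and D2^(r) be the learning and test sets of replicate r. Assuming the test-set likelihood is scored on the log scale and averaged over replicates, as the text describes, the score of a model M and its difference from a reference model M0 can be written as:

```latex
\ell_r(M) = \ln p\!\left(D_2^{(r)} \mid D_1^{(r)}, M\right), \qquad
\bar{\ell}(M) = \frac{1}{R} \sum_{r=1}^{R} \ell_r(M), \qquad
\Delta(M) = \bar{\ell}(M) - \bar{\ell}(M_0)
```

With the 10-fold, 10-replicate scheme, R = 10 and each D2 holds about 10% of the data. Under this reading, a positive \Delta(M) means that M predicts the held-out data better than the reference M0.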

Workflow

  1. cvrep: prepare the replicates
  2. pb: run each model under each replicated learning set
  3. readcv: compute the cross-validation scores on each replicate
  4. sumcv: pool the cv-scores and combine them into a global scoring of the models

Step I: Prepare the replicates

cvrep -nrep 10 -nfold 10 -d 13PCG123.phy pcg

This generates 10 pairs of learn and test files.
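
Before launching long runs, it can be worth confirming that every replicate really has both files. The sketch below is a hypothetical helper (`check_cv_files` is not part of PhyloBayes), and it assumes the files are named `<prefix><rep>_learn.ali` and `<prefix><rep>_test.ali` with replicates numbered from 0; adjust the pattern if your cvrep output differs.

```shell
# Check that each replicate has a non-empty learning and test alignment.
# Assumption: cvrep names its output <prefix><rep>_learn.ali / <prefix><rep>_test.ali,
# with <rep> running from 0 to nrep-1.
check_cv_files() {
  nrep=$1        # number of replicates, e.g. 10
  prefix=$2      # name given to cvrep, e.g. pcg
  dir=${3:-.}    # directory holding the files (default: current directory)
  missing=0
  rep=0
  while [ "$rep" -lt "$nrep" ]; do
    for part in learn test; do
      f="$dir/${prefix}${rep}_${part}.ali"
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        missing=$((missing + 1))
      fi
    done
    rep=$((rep + 1))
  done
  # Succeed only if nothing was missing.
  [ "$missing" -eq 0 ]
}
```

For the example above, `check_cv_files 10 pcg && echo "all replicate files present"` would report any gaps on standard error.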

Step II: Run each model on each learning set

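pb -d pcg0_learn.ali -T tree.nwk -x 10 11000 CATpcg0_learn.ali
pb -d pcg0_learn.ali -T tree.nwk -x 1 1100 -wag WAGpcg0_learn.ali

Repeat both commands for all 10 pcg*_learn.ali files. File names are case-sensitive, so the data file must match the name given to cvrep (here pcg), and each chain name must be the model label followed by the learning-set file name.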
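
Typing 20 commands by hand is error-prone, so the runs can be scripted. The following is a minimal sketch that only prints the commands (a dry run) rather than executing them; `gen_pb_cmds` is a hypothetical helper name, and the pb options are copied unchanged from the two commands shown for replicate 0.

```shell
# Dry run: print the pb commands for every replicate instead of running them.
# gen_pb_cmds is a hypothetical helper; options mirror the replicate-0 commands.
gen_pb_cmds() {
  nrep=$1    # number of replicates, e.g. 10
  prefix=$2  # name given to cvrep, e.g. pcg
  tree=$3    # fixed topology passed to pb -T
  rep=0
  while [ "$rep" -lt "$nrep" ]; do
    learn="${prefix}${rep}_learn.ali"
    # Chain names follow the <MODEL><learning-set file> pattern used above.
    echo "pb -d $learn -T $tree -x 10 11000 CAT$learn"
    echo "pb -d $learn -T $tree -x 1 1100 -wag WAG$learn"
    rep=$((rep + 1))
  done
}

gen_pb_cmds 10 pcg tree.nwk
```

After reviewing the printed list, each line can be run directly or submitted as a separate job on a cluster, since the chains are independent of one another.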

Step III: Calculate cross-validated likelihoods

readcv -nrep 10 -x 100 1 CAT pcg
readcv -nrep 10 -x 100 1 WAG pcg

Note that, when used with the -nrep option such as above, readcv will process each replicate successively, which may take a very long time. Alternatively readcv can be called on individual replicates. For instance:
readcv -rep 2 -x 100 10 CAT pcg
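
Because individual replicates can be processed separately, the readcv calls can be generated per replicate and then spread over several cores. This is a sketch under assumptions: `gen_readcv_cmds` is a hypothetical helper, and replicate indices are assumed to run from 0 to nrep-1, matching the numbering of the learning-set files.

```shell
# Dry run: print one readcv command per model and replicate.
# Assumption: -rep takes the same 0-based index used in the replicate file names.
gen_readcv_cmds() {
  nrep=$1     # number of replicates, e.g. 10
  burnin=$2   # first value passed to -x, e.g. 100
  every=$3    # second value passed to -x, e.g. 1
  prefix=$4   # name given to cvrep, e.g. pcg
  shift 4     # remaining arguments are model labels (CAT, WAG, ...)
  for model in "$@"; do
    rep=0
    while [ "$rep" -lt "$nrep" ]; do
      echo "readcv -rep $rep -x $burnin $every $model $prefix"
      rep=$((rep + 1))
    done
  done
}

gen_readcv_cmds 10 100 1 pcg CAT WAG
```

On systems whose xargs supports the -P option, the printed lines could be executed four at a time with `gen_readcv_cmds 10 100 1 pcg CAT WAG | xargs -P 4 -I{} sh -c '{}'`; otherwise run them one by one.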

Step IV: Average the cv-log-likelihood scores over replicates

sumcv -nrep 10 WAG CAT pcg
sumcv -nrep 10 WAG CAT GTR pcg

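sumcv uses the first model of the list (here WAG) as the reference. The second command assumes that a GTR chain was also run in Step II and scored in Step III for every replicate.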
