TOGA Project User Guide

荣钧群

Published on 2024-10-10 08:47:30

License: CC BY-SA 4.0

Original article: https://blog.youkuaiyun.com/gitblog_00905/article/details/142811015

TOGA (Tool to infer Orthologs from Genome Alignments) implements a novel paradigm to infer orthologous genes. It integrates gene annotation, ortholog inference, and the classification of genes as intact or lost. Project address: https://gitcode.com/gh_mirrors/toga/TOGA

1. Project Introduction

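TOGA (Tool to infer Orthologs from Genome Alignments) is a tool for inferring orthologous genes from genome alignments. It integrates gene annotation, ortholog inference, and the classification of genes as intact or lost. TOGA implements a new machine-learning-based paradigm for inferring orthologs between related species, and it accurately distinguishes orthologs from paralogs and processed pseudogenes.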

2. Quick Start

Installation and Configuration

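TOGA runs on Linux and macOS, including systems with the Apple M1 architecture. Python 3.11 is recommended. A compute cluster is strongly recommended, but a desktop PC is sufficient for small or partial genomes and short genes.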
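Before installing anything, it can help to confirm the platform and Python version match the recommendations in the paragraph above. The following is a minimal preflight sketch. The function name `toga_preflight` is a hypothetical helper introduced here and is not part of TOGA. A Python version below 3.11 only produces a warning, because 3.11 is stated as a recommendation rather than a hard requirement.

```shell
#!/bin/sh
# Preflight sketch (hypothetical helper, not part of TOGA): report OS, CPU
# architecture and Python version. Python < 3.11 only triggers a warning.
toga_preflight() {
    os=$(uname -s)     # "Linux", or "Darwin" on macOS
    arch=$(uname -m)   # e.g. "x86_64"; Apple M1 Macs report "arm64"
    echo "OS: $os, architecture: $arch"
    case "$os" in
        Linux|Darwin) ;;
        *) echo "WARNING: TOGA targets Linux and macOS; $os is not covered" ;;
    esac
    if ! command -v python3 >/dev/null 2>&1; then
        echo "WARNING: python3 not found on PATH"
        return 0
    fi
    # Let Python do the version comparison to avoid string-comparison pitfalls.
    pyver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if python3 -c 'import sys; sys.exit(0 if sys.version_info[:2] >= (3, 11) else 1)'; then
        echo "Python $pyver: OK"
    else
        echo "WARNING: found Python $pyver; Python 3.11 is recommended"
    fi
}

toga_preflight
```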

Install Nextflow

First, check your Java version (Nextflow runs on Java), then install Nextflow:

java -version
curl -fsSL https://get.nextflow.io | bash
# or install with conda
conda install -c bioconda nextflow

If you downloaded Nextflow with curl, move the nextflow executable into a directory listed in your $PATH.
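The step above can be sketched as follows. It assumes the curl installer left a `nextflow` launcher in the current directory and uses `~/bin` as the destination. `~/bin` is an assumption; any directory already on `$PATH` works just as well. The PATH change only lasts for the current shell session.

```shell
#!/bin/sh
# Sketch: move the curl-downloaded ./nextflow launcher into ~/bin (an assumed
# destination) and make sure ~/bin is on PATH for this session.
dest="$HOME/bin"
mkdir -p "$dest"
if [ -f ./nextflow ]; then
    chmod +x ./nextflow
    mv ./nextflow "$dest/"
fi
# Prepend ~/bin to PATH only if it is not already listed.
case ":$PATH:" in
    *":$dest:"*) ;;
    *) PATH="$dest:$PATH"; export PATH ;;
esac
# To make this permanent, add 'export PATH="$HOME/bin:$PATH"' to ~/.bashrc or ~/.zshrc.
if command -v nextflow >/dev/null 2>&1; then
    nextflow -version
fi
```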

Install TOGA

Clone the TOGA repository and install the required Python packages:

git clone https://github.com/hillerlab/TOGA.git
cd TOGA
python3 -m pip install -r requirements.txt --user

Alternatively, if you use Poetry, just run:

poetry install

Configure TOGA

Run the configuration script, which trains the xgboost models, downloads CESAR2.0, and compiles the C code:

./configure.sh

Run the tests

Run the micro test to make sure TOGA is installed correctly:

./run_test.sh micro

If you see output similar to the following, TOGA is ready to use:

Orthology class sizes: one2one: 3
Done, Estimated time: 0:01:02.800084
Program finished with exit code 0
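To check the test result without reading the log by eye, save the output and search it for the success lines shown above. This is a minimal sketch. `check_toga_test_log` is a hypothetical helper, not part of TOGA. It assumes the markers "Orthology class sizes" and "Program finished with exit code 0" appear verbatim, and the exact wording may differ between TOGA versions.

```shell
#!/bin/sh
# Sketch (hypothetical helper): decide from a saved log whether the micro test passed,
# based on the success lines printed by ./run_test.sh micro.
check_toga_test_log() {
    log="$1"
    if grep -q "Orthology class sizes" "$log" &&
       grep -q "Program finished with exit code 0" "$log"; then
        echo "TOGA micro test: PASSED"
        return 0
    fi
    echo "TOGA micro test: FAILED (see $log)"
    return 1
}

# Usage, from the TOGA repository root:
#   ./run_test.sh micro 2>&1 | tee toga_micro_test.log
#   check_toga_test_log toga_micro_test.log
```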

3. Use Cases and Best Practices

Case 1: Human and Mouse Genome Alignment

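The following example runs TOGA on a human–mouse genome alignment, with human (hg38) as the reference and mouse (mm10) as the query. The test chain and annotation files cover human chromosome 11 only.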

Download the human and mouse genomes in 2bit format:

wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit
wget https://hgdownload.cse.ucsc.edu/goldenpath/mm10/bigZips/mm10.2bit
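A failed download can leave an HTML error page under a `.2bit` name. A quick sanity check is to look at the file header. The UCSC 2bit format starts with the 4-byte signature 0x1A412743, stored in the byte order of the machine that wrote the file. The sketch below relies only on that header. `is_2bit` is a hypothetical helper, and the file names assume the `wget` commands above were run in the current directory.

```shell
#!/bin/sh
# Sketch (hypothetical helper): check the 4-byte 2bit signature 0x1A412743,
# accepting either byte order as allowed by the UCSC 2bit format.
is_2bit() {
    [ -s "$1" ] || return 1
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4327411a" ] || [ "$magic" = "1a412743" ]
}

for f in hg38.2bit mm10.2bit; do
    if is_2bit "$f"; then
        echo "$f: looks like a valid 2bit file"
    else
        echo "$f: missing or not a 2bit file"
    fi
done
```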

Run TOGA, replacing the path placeholders with your own paths:

./toga.py test_input/hg38.mm10.chr11.chain test_input/hg38.genCode27.chr11.bed ${path_to_human_2bit} ${path_to_mouse_2bit} --kt --pn test -i supply/hg38.wgEncodeGencodeCompV34.isoforms.txt --nc ${path_to_nextflow_config_dir} --cb 3,5 --cjn 500 --u12 supply/hg38.U12sites.tsv --ms
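A long command with several placeholders is easy to get wrong. One option is to turn the placeholders into variables and check that every input file exists before TOGA starts. The sketch below takes that approach. The names `HUMAN_2BIT`, `MOUSE_2BIT`, `NF_CONFIG_DIR` and `run_toga_example` are hypothetical, and so is the default directory `nextflow_config_files`. The flags are copied from the example command, with `--cb` given as a comma-separated list.

```shell
#!/bin/sh
# Sketch: the example TOGA command with placeholders as (hypothetical) variables,
# plus a check that every input file exists. Run from the TOGA repository root.
HUMAN_2BIT="${HUMAN_2BIT:-$PWD/hg38.2bit}"
MOUSE_2BIT="${MOUSE_2BIT:-$PWD/mm10.2bit}"
NF_CONFIG_DIR="${NF_CONFIG_DIR:-$PWD/nextflow_config_files}"  # hypothetical default

run_toga_example() {
    missing=0
    for f in test_input/hg38.mm10.chr11.chain test_input/hg38.genCode27.chr11.bed \
             "$HUMAN_2BIT" "$MOUSE_2BIT" \
             supply/hg38.wgEncodeGencodeCompV34.isoforms.txt supply/hg38.U12sites.tsv; do
        [ -e "$f" ] || { echo "missing input: $f" >&2; missing=1; }
    done
    [ "$missing" -eq 0 ] || return 1
    ./toga.py test_input/hg38.mm10.chr11.chain test_input/hg38.genCode27.chr11.bed \
        "$HUMAN_2BIT" "$MOUSE_2BIT" --kt --pn test \
        -i supply/hg38.wgEncodeGencodeCompV34.isoforms.txt \
        --nc "$NF_CONFIG_DIR" --cb 3,5 --cjn 500 \
        --u12 supply/hg38.U12sites.tsv --ms
}

# Usage:
#   HUMAN_2BIT=/data/hg38.2bit MOUSE_2BIT=/data/mm10.2bit run_toga_example
```

Reporting every missing file before exiting saves a round trip compared with failing on the first one.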