HERRO Tutorial


HERRO is a highly accurate, haplotype-aware deep-learning tool for error correction of Nanopore R10.4.1, Kit 14, ultra-long (UL) reads. Project repository: https://gitcode.com/gh_mirrors/he/herro

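1. Project Overview

HERRO (Haplotype-aware ERRor cOrrection) is a highly accurate, haplotype-aware deep-learning tool for correcting errors in Nanopore R10.4.1 or R9.4.1 reads (a read length of at least 10 kbp is recommended). Correcting the reads before assembly improves the quality of the resulting assembly.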

2. Quick Start

Requirements

Operating system: Linux (tested on RHEL 8.6 and Ubuntu 22.04)
Zstandard
Python (with conda), used for data preprocessing
For compiling from source: libtorch 2.0.* and rustup

Clone the repository

git clone https://github.com/dominikstanojevic/herro.git
cd herro

Create the conda environment

conda env create --file scripts/herro-env.yml

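Build the HERRO binary

The binary can be obtained either as a singularity image or by compiling from source. To download the prebuilt singularity image: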

wget -O herro.sif https://zenodo.org/records/13802680/files/herro.sif

Alternatively, build the singularity image from the definition file (requires sudo):

sudo singularity build herro.sif herro-singularity.def

Run the tool through the image with the following command:

singularity run --nv --bind <host_path>:<dest_path> herro.sif inference <args>
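
With --bind, the files under <host_path> appear inside the container under <dest_path>, so any path passed to inference has to be the container-side path. The sketch below illustrates that mapping and only prints a command. The directory names /data/herro_run and /work and the helper to_container_path are hypothetical, not part of HERRO or singularity.

```shell
# Hypothetical paths: adjust to your system.
HOST_DIR=/data/herro_run     # directory on the host holding reads, model, batches
DEST_DIR=/work               # where that directory appears inside the container

# Translate a host path under HOST_DIR into its container-side equivalent.
to_container_path() {
    case "$1" in
        "$HOST_DIR"/*)
            rel=${1#"$HOST_DIR"/}      # strip the host prefix
            echo "$DEST_DIR/$rel"
            ;;
        *)
            echo "path is outside $HOST_DIR: $1" >&2
            return 1
            ;;
    esac
}

# Print (do not run) a command that uses the container-side model path.
model=$(to_container_path "$HOST_DIR/model_R10_v0.1.pt")
echo "singularity run --nv --bind $HOST_DIR:$DEST_DIR herro.sif inference -m $model <other args>"
```

Printing the command first makes it easy to see whether every argument points somewhere the container can actually reach.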

Compile from source:

Make sure libtorch and rustup are downloaded and installed first.

export LIBTORCH=<libtorch_path>
export LD_LIBRARY_PATH=$LIBTORCH/lib:$LD_LIBRARY_PATH
RUSTFLAGS="-Ctarget-cpu=native" cargo build -q --release

After the build finishes, the binary is located at target/release/herro.
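
Before running the build, it can help to confirm that LIBTORCH points at a directory that actually contains lib/, since LD_LIBRARY_PATH above is derived from it. The following is a minimal sketch; check_libtorch is a hypothetical helper and not part of HERRO.

```shell
# Hypothetical pre-build check (not part of HERRO).
check_libtorch() {
    if [ -z "${LIBTORCH:-}" ]; then
        echo "LIBTORCH is not set" >&2
        return 1
    fi
    if [ ! -d "$LIBTORCH/lib" ]; then
        echo "no lib/ directory under $LIBTORCH" >&2
        return 1
    fi
    echo "LIBTORCH looks usable: $LIBTORCH"
}

# Report the result without aborting the shell if the check fails.
check_libtorch || echo "fix LIBTORCH before running cargo build" >&2
```

The check only verifies that the directory exists; it does not verify that the libtorch version matches 2.0.*.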

Download the model

For R10.4.1 data:

wget -O model_R10_v0.1.pt https://zenodo.org/records/12683277/files/model_v0.1.pt

For R9.4.1 data (experimental):

wget -O model_R9_v0.1.pt https://zenodo.org/records/12683277/files/model_R9_v0.1.pt
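
The two downloads differ only in the file name, so a script that handles both chemistries can choose the URL from a single switch. A small sketch follows; model_url is a hypothetical helper, and the URLs are the ones from the wget commands above.

```shell
# Hypothetical helper: print the model URL for a given pore chemistry.
# The URLs are copied from the wget commands above.
model_url() {
    base="https://zenodo.org/records/12683277/files"
    case "$1" in
        R10.4.1) echo "$base/model_v0.1.pt" ;;
        R9.4.1)  echo "$base/model_R9_v0.1.pt" ;;   # experimental model
        *)       echo "unsupported chemistry: $1" >&2; return 1 ;;
    esac
}

# Show (without downloading) what would be fetched for R10.4.1 data.
echo "wget -O model_R10_v0.1.pt $(model_url R10.4.1)"
```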

3. Usage and Best Practices

Preprocess the reads

scripts/preprocess.sh <input_fastq> <output_prefix> <number_of_threads> <parts_to_split_job_into>

Note: Porechop loads all reads into memory, so the input may need to be split into several parts. If no splitting is needed, set <parts_to_split_job_into> to 1.

minimap2 alignment and batching

scripts/create_batched_alignments.sh <output_from_reads_preprocessing> <read_ids> <num_of_threads> <directory_for_batches_of_alignments>

Note: the read IDs can be obtained with seqkit:

seqkit seq -ni <reads> > <read_ids>
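
To see how the outputs of these steps feed into one another, the sketch below prints the three commands in order as a dry run. Every file name is a hypothetical placeholder. In particular, the name of the file that preprocess.sh writes (here <prefix>.fastq.gz) and the choice to extract read IDs from the preprocessed reads are assumptions to verify against the scripts themselves.

```shell
# Dry run: print the commands in order instead of executing them.
# All file names below are hypothetical placeholders.
INPUT_FASTQ=reads.fastq.gz
PREFIX=preprocessed
THREADS=16
PARTS=1                            # 1 = do not split the Porechop job
PREPROCESSED="$PREFIX.fastq.gz"    # ASSUMPTION: check what preprocess.sh actually writes
READ_IDS=read_ids.txt
BATCH_DIR=alignment_batches

plan_preprocessing() {
    echo "scripts/preprocess.sh $INPUT_FASTQ $PREFIX $THREADS $PARTS"
    echo "seqkit seq -ni $PREPROCESSED > $READ_IDS"
    echo "scripts/create_batched_alignments.sh $PREPROCESSED $READ_IDS $THREADS $BATCH_DIR"
}

plan_preprocessing
```

The directory passed as the last argument of the alignment step is the one later given to --read-alns.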

Error correction

herro inference --read-alns <directory_alignment_batches> -t <feat_gen_threads_per_device> -d <gpus> -m <model_path> -b <batch_size> <preprocessed_reads> <fasta_output>

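Note: GPU IDs must be specified explicitly. For example, with -d 0,1,3 HERRO uses the first, second, and fourth GPU cards. The -t parameter is the number of feature generation threads per device: with -t 8 and 3 GPUs, HERRO creates 24 feature generation threads in total.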
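
As a quick check of the arithmetic in the note above, the following sketch counts the device IDs in a -d value and multiplies the count by -t. The function total_feature_threads is hypothetical and not part of HERRO; it only mirrors the rule stated in the note.

```shell
# Hypothetical helper (not part of HERRO): number of feature generation
# threads created for a given -t / -d combination.
total_feature_threads() {
    threads_per_device=$1    # value passed to -t
    device_ids=$2            # value passed to -d, e.g. "0,1,3"
    count=0
    old_ifs=$IFS
    IFS=','                  # split the GPU ID list on commas
    for id in $device_ids; do
        count=$((count + 1))
    done
    IFS=$old_ifs
    echo $((threads_per_device * count))
}

total_feature_threads 8 "0,1,3"   # prints 24
```

This is useful when sizing -t against the CPU cores available on the machine, since the total grows with every GPU added to -d.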

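4. Ecosystem Projects

HERRO can be combined with assemblers such as hifiasm to improve assembly quality. For a concrete case, see the assembly results and comparisons on HG002 data in the project's official documentation.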

That concludes this HERRO tutorial; hopefully it helps you get the tool running smoothly.


Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.