【NextPolish】【1】【2】

NextPolish fixes base errors (SNVs/indels) in genomes assembled from noisy long reads. It can be used with short-read data only, long-read data only, or a combination of both. It contains two core modules and uses a stepwise approach to correct erroneous bases in the reference genome.
To correct/assemble the raw third-generation sequencing (TGS) long reads with approximately 10-15% sequencing errors, please use NextDenovo.
To further improve the consensus accuracy of genomes assembled using HiFi long-reads, please use NextPolish2.

NextPolish1

Paper: https://academic.oup.com/bioinformatics/article/36/7/2253/5645175

Motivation

Although long-read sequencing technologies can produce genomes with long contiguity, they suffer from high error rates. Thus, we developed NextPolish, a tool that efficiently corrects sequence errors in genomes assembled with long reads. This new tool consists of two interlinked modules that are designed to score and count K-mers from high quality short reads, and to polish genome assemblies containing large numbers of base errors.

Results

When evaluated for speed and efficiency on human and plant (Arabidopsis thaliana) genomes, NextPolish outperformed Pilon, correcting sequence errors faster and with higher accuracy.

Availability and implementation

NextPolish is implemented in C and Python. The source code is available from https://github.com/Nextomics/NextPolish.

https://www.jianshu.com/p/8d040fda7261

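NextPolish can use second-generation short reads, third-generation long reads, or a combination of both to correct the base errors (SNV/Indel) introduced when assembling third-generation long reads.

Step 1: Create a file that records the locations of the short-read (second-generation) sequence files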

realpath ERR2173372_1.fastq ERR2173372_2.fastq  > sgs.fofn
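
The command above assumes both FASTQ files exist in the current directory. As a minimal sketch, the same step can be wrapped in a small function that refuses to write the list if any read file is missing. The name make_fofn is a hypothetical helper for illustration and is not part of NextPolish.

```shell
# Hypothetical helper (not part of NextPolish): write the absolute paths of
# the given read files to a file-of-file-names, one path per line.
# Usage: make_fofn OUTPUT.fofn READS_1.fastq READS_2.fastq ...
make_fofn() {
    out="$1"
    shift
    # First pass: refuse to write anything if a read file is missing
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing read file: $f" >&2
            return 1
        fi
    done
    # Second pass: realpath prints one absolute path per argument
    realpath "$@" > "$out"
}

# Example call, equivalent to the command above:
# make_fofn sgs.fofn ERR2173372_1.fastq ERR2173372_2.fastq
```

Checking the inputs before writing means a typo in a file name surfaces here rather than partway through a polishing run.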

Step 2: Configure the run.cfg file

# Copy the configuration file from the NextPolish directory
cp ~/opt/biosoft/NextPolish/doc/run.cfg run2.cfg

Edit the configuration file

[General]
job_type = local
job_prefix = nextPolish
task = default
rewrite = 1212
rerun = 3
parallel_jobs = 2
multithread_jobs = 10
genome = ./nextgraph.assembly.contig.fasta
genome_size = auto
workdir = ./01_rundir
polish_options = -p {multithread_jobs}

[sgs_option]
sgs_fofn = ./sgs.fofn
sgs_options = -max_depth 100

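The parameters that need to be modified are listed below; for the remaining parameters, see the official parameter documentation: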

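  • job_type: the job type; local means running on a single node. Because NextPolish submits jobs through DRMAA, it also supports SGE, PBS, and SLURM.
  • task: the polishing task; use 12, 1212, 121212, or 12121212 to set the number of polishing rounds. Two iterations are generally enough.
  • parallel_jobs, multithread_jobs: the number of jobs submitted at the same time and the number of threads per job; here 2 × 10 = 20 threads in total.
  • genome: the location of the assembled genome.
  • workdir: the directory where output files are written.
  • sgs_options: sets the parameters for polishing with short reads, including -use_duplicate_reads, -unpaired, -max_depth, -bwa, and -minimap2.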
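
Since parallel_jobs multiplied by multithread_jobs determines how many threads the run may use at once, it can be worth computing that product from the config before launching. The following is a minimal sketch, assuming simple "key = value" lines as in the config above. The names cfg_get and total_threads are hypothetical helpers for illustration and are not part of NextPolish.

```shell
# Hypothetical helper: print the value of KEY from a "key = value" config file.
# Usage: cfg_get FILE KEY
cfg_get() {
    awk -F'=' -v key="$2" '
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim whitespace around the key
        }
        k == key {
            v = $2
            sub(/#.*/, "", v)                   # drop a trailing "# comment", if any
            gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim whitespace around the value
            print v
            exit
        }
    ' "$1"
}

# Hypothetical helper: parallel_jobs x multithread_jobs for a given config file.
# Usage: total_threads FILE
total_threads() {
    pj=$(cfg_get "$1" parallel_jobs)
    mt=$(cfg_get "$1" multithread_jobs)
    echo $((pj * mt))
}

# Example call against the config edited above:
# total_threads run2.cfg
```

Comparing the result with the number of cores on the machine is a quick way to avoid oversubscribing a local node.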
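
conda is an open-source package and environment manager that can install, upgrade, and manage software packages on different operating systems. To install NextPolish with conda, first install and configure conda itself: download the installer for your operating system from the Anaconda website (https://www.anaconda.com/) and follow the official documentation.

Once conda is installed, open a terminal or command prompt and create a new conda environment:

```
conda create -n nextpolish
```

Here `nextpolish` is the environment name; any name will do. Next, activate the new environment:

```
conda activate nextpolish
```

With the environment active, install NextPolish:

```
conda install -c bioconda nextpolish
```

The `-c bioconda` option tells conda to install from the bioconda channel. conda resolves the dependencies automatically and installs NextPolish together with the packages it requires.

After installation, NextPolish is ready to use. Run the polishing by passing the configuration file prepared earlier to `nextPolish`:

```
nextPolish run2.cfg
```

The genome, thread counts, short-read file list, and output directory are all read from that file (`genome`, `multithread_jobs`, `sgs_fofn`, and `workdir`).

These are the brief steps for installing NextPolish with conda. Adjust them to your own environment and requirements.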
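
As a quick sanity check after the conda steps above, it can help to confirm that the executable is actually visible in the active environment before starting a long run. The sketch below assumes the command is installed under the name nextPolish, as used in the text. The name check_tool is a hypothetical helper for illustration.

```shell
# Hypothetical helper: report whether a command is available on PATH.
# Usage: check_tool COMMAND
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found; is the conda environment activated?" >&2
        return 1
    fi
}

# Example call (assumes the executable name used in the text):
# check_tool nextPolish
```

If the check fails, running `conda activate nextpolish` in the current shell is the first thing to try.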