Variation calling and annotation

License: CC 4.0 BY-SA


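By resequencing 302 wild and cultivated soybean accessions, this study identified genes related to soybean domestication and improvement. The study used several bioinformatics tools, including SAMtools, Picard, and GATK, for variant detection, annotation, and related analyses.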

Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean

This post is excerpted from the paper "Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean".

Variation calling and annotation.

Mapping.

SAMtools (version 0.1.18) was used to convert the mapping results to BAM format and to filter out unmapped and non-unique reads.
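The filtering rule can be illustrated with a minimal Python sketch that works on SAM text records. The excerpt does not say how "non-unique" was decided; the sketch assumes, as an illustration only, that a mapping quality of 0 marks a non-unique read. The names `keep_alignment`, `filter_sam`, and the `min_mapq` default are hypothetical and do not come from the paper.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def keep_alignment(sam_line, min_mapq=1):
    """Return True if a SAM record is mapped and reaches the MAPQ cutoff.

    Assumption: reads with MAPQ below min_mapq are treated as non-unique.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality
    if flag & UNMAPPED_FLAG:
        return False
    return mapq >= min_mapq


def filter_sam(lines, min_mapq=1):
    """Yield header lines unchanged and only the alignment lines that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, min_mapq):
            yield line
```

In practice this kind of filter is usually applied with `samtools view` options such as `-F 4` and `-q`, rather than in Python; the sketch only makes the logic explicit.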

Duplicate reads were removed with the Picard package (picard.sourceforge.net, version 1.87).
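To show the idea behind duplicate removal, here is a deliberately simplified Python sketch. It treats reads that share a chromosome, start position, and strand as duplicates and keeps the one with the highest summed base quality. Picard's actual logic also handles read pairs, clipping, and library information, so this is an illustration of the concept and not a reimplementation. The `Read` type and `remove_duplicates` function are hypothetical names.

```python
from collections import namedtuple

# Hypothetical minimal read record used only for this illustration.
Read = namedtuple("Read", "name chrom pos strand base_quals")


def remove_duplicates(reads):
    """Keep one read per (chrom, pos, strand): the one with the highest
    summed base quality. Output preserves the input order of kept reads."""
    best = {}  # key -> (input index, read)
    for index, read in enumerate(reads):
        key = (read.chrom, read.pos, read.strand)
        if key not in best or sum(read.base_quals) > sum(best[key][1].base_quals):
            best[key] = (index, read)
    return [read for _, read in sorted(best.values(), key=lambda item: item[0])]
```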

The coverageBed program from BEDtools (version 2.17.0) was used to compute the coverage of the sequence alignments. (A sequence was defined as absent if its coverage was lower than 90% and present if its coverage was greater than 90%.)
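The presence/absence rule can be sketched in Python as follows. The sketch computes the fraction of a sequence covered by at least one alignment interval and then applies the 90% cutoff. Coordinates are assumed to be half-open (BED style). The excerpt does not say how a coverage of exactly 90% is treated; the sketch arbitrarily calls it absent. The function names are hypothetical.

```python
def covered_fraction(seq_start, seq_end, intervals):
    """Fraction of [seq_start, seq_end) overlapped by at least one interval.

    intervals is an iterable of half-open (start, end) pairs; overlapping
    intervals are counted only once.
    """
    length = seq_end - seq_start
    if length <= 0:
        raise ValueError("sequence must have positive length")
    covered = 0
    reach = seq_start  # rightmost position already counted
    for start, end in sorted(intervals):
        start = max(start, reach)
        end = min(end, seq_end)
        if end > start:
            covered += end - start
            reach = end
    return covered / length


def classify_presence(fraction, threshold=0.90):
    """Assumption: a fraction exactly at the threshold counts as absent."""
    return "present" if fraction > threshold else "absent"
```

For example, a 100 bp sequence covered by the intervals (0, 50) and (40, 95) has 95 covered bases and would be called present under this rule.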

SNP calling.

SNP detection was performed using the Genome Analysis Toolkit (GATK, version 2.4-7-g5e89f01) and SAMtools. Only the SNPs detected by both methods were analyzed further.
The detailed processes were as follows:
(1) After BWA alignment, reads around indels were realigned with GATK in two steps. First, the RealignerTargetCreator tool identified the regions where realignment was needed. Second, IndelRealigner realigned those regions, producing a realigned BAM file for each accession.
(2) SNPs were called at the population level with both GATK and SAMtools. For GATK, the SNP confidence score was required to be greater than 30, with the parameter -stand_call_conf set to 30. The same realigned BAM files were used for SNP calling with the SAMtools mpileup command.
(3) In the filtering step, the sites identified by both GATK and SAMtools were selected with the GATK SelectVariants tool; SNPs with allele frequencies lower than 1% in the population were discarded.
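The final filtering step (keep sites found by both callers, then drop rare alleles) can be sketched in Python. This is an illustration under stated assumptions, not the authors' procedure: sites are keyed by (chromosome, position, REF, ALT), and "allele frequency" is interpreted here as minor allele frequency, which the excerpt does not state explicitly. All function and parameter names are hypothetical.

```python
def common_sites(gatk_sites, samtools_sites):
    """Sites reported by both callers, keyed as (chrom, pos, ref, alt)."""
    return set(gatk_sites) & set(samtools_sites)


def minor_allele_frequency(alt_count, total_alleles):
    """Frequency of the rarer of the two alleles at a biallelic site."""
    freq = alt_count / total_alleles
    return min(freq, 1 - freq)


def filter_snps(gatk_sites, samtools_sites, allele_counts, min_maf=0.01):
    """Return the sorted concordant sites whose MAF is at least min_maf.

    allele_counts maps a site key to (alt_allele_count, total_called_alleles).
    Assumption: the 1% cutoff applies to the minor allele frequency.
    """
    kept = []
    for site in sorted(common_sites(gatk_sites, samtools_sites)):
        alt_count, total = allele_counts[site]
        if total > 0 and minor_allele_frequency(alt_count, total) >= min_maf:
            kept.append(site)
    return kept
```

As a rough guide, if all 302 accessions are diploid and fully genotyped at a site, there are 604 alleles, so an allele seen 6 times (about 0.99%) would be discarded and one seen 7 times (about 1.16%) would be kept under this interpretation.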

Indel calling.

Indel calling followed the same procedure as SNP calling, except that the UnifiedGenotyper parameter -glm INDEL was used so that only indels were reported. Only insertions and deletions of 6 bp or shorter were considered.
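The length cutoff is simple to express in code. The Python sketch below assumes VCF-style REF and ALT strings for a single alternate allele and measures indel length as the difference between their lengths; the function names are hypothetical.

```python
def indel_length(ref, alt):
    """Length in bp of a simple insertion or deletion given VCF-style alleles."""
    return abs(len(ref) - len(alt))


def keep_indel(ref, alt, max_length=6):
    """True for insertions or deletions of 1 to max_length bp."""
    return 0 < indel_length(ref, alt) <= max_length
```

With these definitions, `keep_indel("A", "ATTT")` passes (a 3 bp insertion), while `keep_indel("ACGTACGT", "A")` fails (a 7 bp deletion).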

Annotation.

SNP annotation was performed against the reference genome annotation using ANNOVAR (version 2013-08-23).
Based on the genome annotation, SNPs were categorized into exonic regions (overlapping a coding exon), splicing sites (within 2 bp of a splicing junction), 5′UTRs and 3′UTRs, intronic regions (overlapping an intron), upstream and downstream regions (within 1 kb upstream of the transcription start site or within 1 kb downstream of the transcription end site), and intergenic regions.

SNPs in coding exons were further grouped into synonymous SNPs (did not cause amino acid changes) or nonsynonymous SNPs (caused amino acid changes; mutations causing stop gain and stop loss were also classified into this group).
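The synonymous/nonsynonymous grouping can be illustrated with a short Python sketch that translates the reference and mutated codons using the standard genetic code. It assumes the codon is given on the coding strand and that `offset` is the 0-based position of the SNP within the codon. Following the grouping described above, stop-gain and stop-loss changes are returned as nonsynonymous. The function name is hypothetical, and ANNOVAR's own implementation is not shown here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(ref_codon, offset, alt_base):
    """Classify a SNP inside a codon as synonymous or nonsynonymous.

    ref_codon: three-letter reference codon on the coding strand.
    offset:    0-based position of the SNP within the codon (0, 1, or 2).
    alt_base:  the alternate base at that position.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    # Amino acid changes, stop gains, and stop losses all land here.
    return "nonsynonymous"
```

For instance, GAA to GAG leaves glutamate unchanged and is synonymous, whereas TAC to TAA turns tyrosine into a stop codon and is grouped as nonsynonymous.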

Indels in the exonic regions were classified by whether they caused frame-shift mutations (insertions or deletions whose length is not a multiple of 3 bp).
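By the usual convention, an exonic indel shifts the reading frame when its length is not a multiple of 3 bp, and leaves the frame intact when it is. A minimal Python sketch of that check, assuming VCF-style REF and ALT strings and a hypothetical function name:

```python
def classify_exonic_indel(ref, alt):
    """Return 'frameshift' or 'non-frameshift' for an exonic indel."""
    length = abs(len(ref) - len(alt))
    if length == 0:
        raise ValueError("REF and ALT have equal length: not an indel")
    return "frameshift" if length % 3 else "non-frameshift"
```

With indels limited to 6 bp or shorter, only 3 bp and 6 bp events are non-frameshift under this rule.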