- Differences at the gene or genome level
- Gene transcription and translation
- Variant types
- Calling SNPs(Index)
- Overall procedure
- fig
- fig
- step.1 mapping
- Build the index
- bwa index hg19.fa
- Alignment
- SAM format (https://samtools.github.io/hts-specs/)
- fig
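The FLAG column (column 2) of a SAM record packs several yes/no properties into one integer. As a minimal sketch, the shell function below decodes a FLAG value using the bit meanings from the SAM specification. The function name `decode_sam_flag` and the short property labels are chosen here for illustration and are not part of samtools.

```shell
# Decode a SAM FLAG value into the properties it encodes.
# Bit values follow the SAM specification; the labels are illustrative.
decode_sam_flag() {
    flag=$1
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:read1 128:read2 \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}     # numeric bit value
        name=${pair#*:}     # label for that bit
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$out"
}

decode_sam_flag 99      # paired,proper_pair,mate_reverse,read1
decode_sam_flag 1040    # reverse,duplicate
```

Bit 1024 is the one that the duplicate-marking step sets. samtools ships a `samtools flags` subcommand for the same purpose.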
- Convert to BAM format
- samtools view -b A.sam > A.bam
- Sort
- samtools sort -O BAM A.bam > A.sorted.bam
- Build the index (.bam.bai)
- samtools index A.sorted.bam
- fig
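The outline lists the commands for format conversion, sorting, and indexing, but not the alignment call itself. The sketch below prints the full step 1 command sequence for one sample, including a `bwa mem` call. The FASTQ file names (`<sample>_1.fq.gz`, `<sample>_2.fq.gz`), the read-group string, and the thread count are assumptions that do not come from the notes. The function only prints the commands; pipe its output to `sh` to run them.

```shell
# Print the step-1 commands for one sample (dry run).
# Assumptions: paired FASTQ files named <sample>_1.fq.gz and <sample>_2.fq.gz,
# 4 threads, and a simple read group (GATK needs one in the BAM).
mapping_commands() {
    sample=$1
    ref=${2:-hg19.fa}
    cat <<EOF
bwa mem -t 4 -R '@RG\\tID:${sample}\\tSM:${sample}\\tPL:ILLUMINA' ${ref} ${sample}_1.fq.gz ${sample}_2.fq.gz > ${sample}.sam
samtools view -b ${sample}.sam > ${sample}.bam
samtools sort -O BAM ${sample}.bam > ${sample}.sorted.bam
samtools index ${sample}.sorted.bam
EOF
}

mapping_commands A
```

To process several samples, loop over their names: `for s in A B; do mapping_commands "$s"; done | sh`.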
- step.2 Remove duplicates
- fig
- code
- gatk MarkDuplicates -I A.sorted.bam -O A.dedup.bam -M A.marked_dup_metrics.txt
- -I: input BAM from alignment; -O: output BAM; -M: output metrics file
- fig
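MarkDuplicates flags duplicate reads by setting bit 1024 in the SAM FLAG column. It does not delete them unless asked to. As a rough check, the sketch below counts how many records in a SAM text stream carry that bit. The function name `count_duplicates` and the three sample records are made up for illustration.

```shell
# Count reads whose FLAG has the duplicate bit (1024) set.
# Reads SAM text on stdin, skips header lines, prints "duplicates/total".
count_duplicates() {
    awk -F'\t' '!/^@/ { total++; if (int($2 / 1024) % 2 == 1) dup++ }
                END   { printf "%d/%d\n", dup, total }'
}

# Three made-up records; only the FLAG column matters here.
printf 'r1\t99\tchr1\t100\nr2\t1123\tchr1\t100\nr3\t147\tchr1\t250\n' | count_duplicates
```

On real data the input would come from `samtools view A.dedup.bam | count_duplicates`. `samtools view -c -f 1024 A.dedup.bam` gives the duplicate count directly.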
- step.3 Call SNP for each sample
- fig3
- 3.1 Build index
- Two reference indices are needed: .fai (from samtools faidx) and .dict (from gatk CreateSequenceDictionary)
- samtools faidx hg19.fa # creates hg19.fa.fai
- gatk CreateSequenceDictionary -R hg19.fa # creates hg19.dict
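GATK looks for the FASTA index and the sequence dictionary next to the reference file. The `.fai` file is written by `samtools faidx`, and the dictionary is named after the reference with `.dict` replacing the FASTA extension. The sketch below checks that all three files are present before variant calling starts. The function name `check_reference_indexes` is hypothetical.

```shell
# Verify that the reference, its .fai index and its .dict dictionary exist
# and are non-empty. Returns non-zero if anything is missing.
check_reference_indexes() {
    ref=$1
    dict="${ref%.*}.dict"      # hg19.fa -> hg19.dict
    status=0
    for f in "$ref" "$ref.fai" "$dict"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return $status
}

check_reference_indexes hg19.fa || echo "build the missing files before running HaplotypeCaller"
```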
- 3.2 Call SNPs for each sample using HaplotypeCaller
- code
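- gatk HaplotypeCaller -R hg19.fa -I A.dedup.bam -O A.raw.gvcf -ERC GVCF -ploidy 2
- Input is the duplicate-marked BAM from step 2 (index it first with samtools index A.dedup.bam); set -ploidy according to the species or sample pool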
- code
- fig3
- step.4 Combine GVCF from all the samples and genotype
- gatk CombineGVCFs -R hg19.fa -O combine_variants.raw.gvcf --variant A.raw.gvcf --variant B.raw.gvcf
- gatk GenotypeGVCFs -R hg19.fa -O combine_variants.raw.vcf --variant combine_variants.raw.gvcf
- GVCF vs VCF
- GVCF: a record for every site (including non-variant sites)
- VCF: variant sites only
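In a GVCF written with `-ERC GVCF`, stretches of non-variant sites are stored as reference blocks whose ALT column is just `<NON_REF>`, with an `END=` tag in INFO. Variant sites list a real alternate allele as well. The sketch below counts both kinds of record to make the difference concrete. The function name `gvcf_summary` and the three simplified records are made up for illustration.

```shell
# Count reference-block records versus variant records in GVCF text on stdin.
gvcf_summary() {
    awk -F'\t' '!/^#/ { if ($5 == "<NON_REF>") ref_blocks++; else variants++ }
                END   { printf "reference_blocks=%d variant_sites=%d\n", ref_blocks, variants }'
}

# Made-up, simplified GVCF records (spaces are converted to tabs).
tr ' ' '\t' <<'EOF' | gvcf_summary
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A
chr1 100 . A <NON_REF> . . END=199 GT:DP 0/0:30
chr1 200 . C T,<NON_REF> 500 . . GT:DP 0/1:28
chr1 201 . G <NON_REF> . . END=350 GT:DP 0/0:31
EOF
```

After GenotypeGVCFs, only records like the one at position 200 would remain in the VCF.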
- 4.1 Select and filter SNPs
- code
- gatk SelectVariants -R hg19.fa -O combine_SNP.raw.vcf --variant combine_variants.raw.vcf --select-type-to-include SNP
- gatk VariantFiltration -R hg19.fa -O combine_SNP.filtered.vcf --variant combine_SNP.raw.vcf --filter-name "snp_filter" --filter-expression "QD < 2.0 || FS > 60.0 || SOR > 3.0 || MQ < 40.0 || MQRankSum < -12.5"
- code
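VariantFiltration does not remove records. Sites that match the expression get the filter name (here `snp_filter`) in the FILTER column, and the rest get `PASS`. A minimal sketch for keeping only passing records follows. The function name `keep_pass` and the sample records are made up.

```shell
# Keep header lines and records whose FILTER column (7) is PASS.
keep_pass() {
    awk -F'\t' '/^#/ || $7 == "PASS"'
}

# Made-up records: two pass, one fails the hard filter.
tr ' ' '\t' <<'EOF' | keep_pass | grep -vc '^#'
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 100 . A G 900 PASS QD=25.1
chr1 250 . C T 35 snp_filter QD=1.2
chr1 400 . G A 700 PASS QD=18.4
EOF
```

Within GATK, `gatk SelectVariants --exclude-filtered` serves the same purpose.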
- 4.2 Select and filter indels
- code
- gatk SelectVariants -R hg19.fa -O combine_INDEL.raw.vcf --variant combine_variants.raw.vcf --select-type-to-include INDEL
- gatk VariantFiltration -R hg19.fa -O combine_INDEL.filtered.vcf --variant combine_INDEL.raw.vcf --filter-name "indel_filter" --filter-expression "QD < 2.0 || FS > 200.0 || SOR > 10.0 || MQ < 40.0 || MQRankSum < -12.5"
- code
- 4.3 (Rare SNP and call rate filter)
- VCFtools
- code
- vcftools --vcf combine_SNP.filtered.vcf --max-missing 0.8 --maf 0.05 --minDP 4 --recode --recode-INFO-all --out final.snp # writes final.snp.recode.vcf
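To show what the two site-level thresholds measure, the sketch below computes the call rate (fraction of samples with a genotype) and the minor allele frequency for each biallelic site, then applies the same cut-offs (call rate at least 0.8, MAF at least 0.05). It is a simplified illustration, not a reimplementation of VCFtools: it ignores `--minDP`, which in VCFtools sets low-depth genotypes to missing first. The function name `site_stats` and the records are made up.

```shell
# Per-site call rate and minor allele frequency from the GT field.
# Assumes diploid, biallelic records with GT as the first FORMAT key.
site_stats() {
    awk -F'\t' '!/^#/ {
        n = NF - 9; called = 0; alt = 0
        for (i = 10; i <= NF; i++) {
            split($i, f, ":"); gt = f[1]
            if (gt ~ /\./) continue              # missing genotype
            called++
            split(gt, a, "[/|]")
            alt += (a[1] != "0") + (a[2] != "0") # count non-reference alleles
        }
        rate = called / n
        af   = called ? alt / (2 * called) : 0
        maf  = af < 0.5 ? af : 1 - af
        verdict = (rate >= 0.8 && maf >= 0.05) ? "keep" : "drop"
        printf "%s:%s call_rate=%.2f maf=%.2f %s\n", $1, $2, rate, maf, verdict
    }'
}

# Made-up records for five samples (spaces are converted to tabs).
tr ' ' '\t' <<'EOF' | site_stats
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2 S3 S4 S5
chr1 100 . A G 50 PASS . GT 0/0 0/1 0/0 1/1 0/0
chr1 200 . C T 50 PASS . GT 0/0 ./. ./. 0/1 0/0
chr1 300 . G A 50 PASS . GT 0/0 0/0 0/0 0/0 0/0
EOF
```

The second site is dropped for its low call rate (3 of 5 samples genotyped), and the third because no sample carries the alternate allele.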
- Visualize SNP and Indel on IGV
- Integrative Genomics Viewer (IGV): http://software.broadinstitute.org/software/igv/
- Calling SV
- SV types
- fig
- fig
- code
- /path/to/configManta.py --bam ../02.dedup/A.dedup.bam --referenceFasta ../ref/hg19.fa --runDir A_manta
- A_manta/runWorkflow.py
- fig
- Manta VCF
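Each record in the Manta VCF carries an `SVTYPE=` tag in the INFO column (for example DEL, DUP, INS, or BND for breakends). As a minimal sketch, the function below tallies records per SV type. The function name `count_sv_types` and the simplified records are made up for illustration.

```shell
# Tally SV records by the SVTYPE tag in the INFO column (8).
count_sv_types() {
    awk -F'\t' '!/^#/ {
        n = split($8, info, ";")
        for (i = 1; i <= n; i++)
            if (info[i] ~ /^SVTYPE=/) {
                sub(/^SVTYPE=/, "", info[i])
                count[info[i]]++
            }
    }
    END { for (t in count) print t, count[t] }' | sort
}

# Made-up, simplified records (spaces are converted to tabs).
tr ' ' '\t' <<'EOF' | count_sv_types
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 1000 MantaDEL:1 N <DEL> . PASS END=5000;SVTYPE=DEL;SVLEN=-4000
chr1 9000 MantaDUP:1 N <DUP:TANDEM> . PASS END=12000;SVTYPE=DUP;SVLEN=3000
chr2 500 MantaDEL:2 N <DEL> . PASS END=900;SVTYPE=DEL;SVLEN=-400
chr3 700 MantaBND:1 N N[chr5:800[ . PASS SVTYPE=BND;MATEID=MantaBND:2
EOF
```

For a single-sample germline run, Manta writes its calls to `A_manta/results/variants/diploidSV.vcf.gz`, so the real input would be `gzip -dc A_manta/results/variants/diploidSV.vcf.gz | count_sv_types`.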
2020.10.12 | Next-generation sequencing variant detection