2019/2/20_ Meaning of the FLAG field in *.bam and *.sam files, and how it is summarized

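This post walks through the FLAG information in SAM files, including the fields of an alignment record and what they mean (matches, deletions, clipping and so on). It also covers how to produce summary statistics with samtools flagstat, and explains the binary representation of each flag bit and its meaning, such as forward strand, reverse strand, and paired-read status. Finally, it mentions samtools sort, which sorts a SAM file and writes the result as a BAM file.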


Meaning and statistics of *.sam files

FLAG information in *.sam files

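A SAM file has two parts: the header section, which holds annotation information, and the alignment section, which holds the alignment results. Every line in the header section begins with @.
[Figure: header section of a SAM file]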
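
As a minimal sketch of the header/alignment split described above, the following Python separates lines that begin with @ from the rest. The function name `split_sam_sections` and the three example lines are illustrative assumptions, not taken from a real file.

```python
def split_sam_sections(lines):
    """Separate SAM header lines (starting with '@') from alignment lines."""
    header, alignments = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            header.append(line)
        else:
            alignments.append(line)
    return header, alignments


# Made-up example content: two header lines and one shortened alignment line.
example_lines = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr11\tLN:1000",
    "read1\t147\tchr11\t2151866\t60\t100M\t=\t2151723\t-244\tACGT\tFFFF",
]

header, alignments = split_sam_sections(example_lines)
print(len(header), "header lines,", len(alignments), "alignment lines")
```

For a real file, the same function could be fed an open text file handle, since iterating over a file yields its lines.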

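Alignment records:

Each line of the alignment section corresponds to one read (fragment), and each column is a field. For example, a record looks like this: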

D80KHJN1:237:C5HMKACXX:5:1101:1634:1982 147     chr11   2151866 60      100M    =       2151723 -244    ATAGATCAATTGACATGAAATTTGGGGGTTCCTAATTTCTCTATGTAATTCTGCAAGTCTGCTGTCAATCCTCCTGACTTTTCCATCCAAAATCTCCCGG    C@AAA:CD@>ACDDC<DDDDBBDDDFFDFHHGHGHGHGIGIG@88=AIHEGIFCIIHCEF<IGACGGHDGHFIGIGHIJHGFJJHFJHHHFF@DDFFCCC    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100        YS:i:-1 YT:Z:CP NH:i:1
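  1. Read (query) name
  2. FLAG (each number has a different meaning)
  3. Name of the reference chromosome
  4. Position on the chromosome where the read aligns (1-based, leftmost base)
  5. MAPQ: mapping quality. Higher is better; in this output, 60 indicates a uniquely mapped read (the exact scale depends on the aligner)
  6. Alignment information (the CIGAR string; 100M in this record)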
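
To make the first six columns and the FLAG value concrete, here is a small Python sketch that parses the leading fields of the example record and decodes its FLAG (147) into individual bits. The bit meanings follow the SAM specification; the helper names `parse_first_fields` and `decode_flag` are hypothetical, and the record is shortened to its first six columns for readability.

```python
# Bit values and meanings of the SAM FLAG field (per the SAM specification).
FLAG_BITS = [
    (0x1, "read is paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the descriptions of all bits that are set in a FLAG value."""
    return [desc for bit, desc in FLAG_BITS if flag & bit]


def parse_first_fields(line):
    """Parse the first six tab-separated columns of a SAM alignment line."""
    cols = line.rstrip("\n").split("\t")
    return {
        "qname": cols[0],
        "flag": int(cols[1]),
        "rname": cols[2],
        "pos": int(cols[3]),
        "mapq": int(cols[4]),
        "cigar": cols[5],
    }


# First six columns of the example record shown above.
record = "D80KHJN1:237:C5HMKACXX:5:1101:1634:1982\t147\tchr11\t2151866\t60\t100M"

fields = parse_first_fields(record)
print(fields["rname"], fields["pos"], fields["mapq"], fields["cigar"])
print(bin(fields["flag"]))  # 0b10010011
for description in decode_flag(fields["flag"]):
    print("-", description)
```

Since 147 = 128 + 16 + 2 + 1, the record describes a paired read, mapped in a proper pair, on the reverse strand, that is the second read of its pair.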