MAPQ should be 0 for unmapped read.

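This post looks at a specific SAM file error and why the record conflicts with Picard's validation, covering how BWA flags unmapped reads and how to deal with the problem.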


Ignoring SAM validation error due to lenient parsing:
Error parsing text SAM file. MAPQ should be 0 for unmapped read.; File chrm.sam; Line 1477
Line: HWI-ST499:5:23:21302:167400#GCCAAT    87    chrM    16518    60    55M    =    16319    -254    GGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGG    bSbcWbccc_^b\_]da_ddcbcQcUfefcffffccdddde_eebffdddfaefd    XT:A:U    NM:i:2    XN:i:1    SM:i:37    AM:i:25    X0:i:1    X1:i:0    XM:i:2    XO:i:0    XG:i:0    MD:Z:3C50C0
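The rule behind this message can be reproduced outside Picard. The sketch below is a minimal illustration, not Picard's actual implementation: the function name `unmapped_read_problems` is hypothetical, and it assumes a tab-delimited SAM record with the eleven mandatory fields in their standard order.

```python
UNMAPPED = 0x4  # SAM FLAG bit: "read unmapped"


def unmapped_read_problems(sam_line):
    """Return the complaints a strict validator might raise for one record.

    Hypothetical helper: it only mirrors the two checks named in the
    error messages (MAPQ and CIGAR on unmapped reads).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
    problems = []
    if flag & UNMAPPED:
        if mapq != 0:
            problems.append("MAPQ should be 0 for unmapped read.")
        if cigar != "*":
            problems.append("CIGAR should have zero elements for unmapped read.")
    return problems


# Mandatory fields of the record reported above (optional tags omitted).
record = "\t".join([
    "HWI-ST499:5:23:21302:167400#GCCAAT", "87", "chrM", "16518", "60", "55M",
    "=", "16319", "-254",
    "GGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGG",
    r"bSbcWbccc_^b\_]da_ddcbcQcUfefcffffccdddde_eebffdddfaefd",
])

for problem in unmapped_read_problems(record):
    print(problem)
```

FLAG 87 has the 0x4 bit set while MAPQ is 60 and CIGAR is 55M, so both complaints are returned for this record.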

 

sourceforge.net/mailarchive/forum.php?thread_name=4B957356.2000202@broadinstitute.org&forum_name=samtools-devel

 

> >>>>>>>> Exception in thread "main"
> >>>>>>>> net.sf.samtools.SAMFormatException: Error
> >>>>>>>> parsing text SAM file. MAPQ must should be 0 for unmapped  
> >>>>>>>> read.;
> >>>>>>>> File sorted.sam; Line 8910023
> >>>>>>>> Line: ./S2:747_1696_219 4 chr18 90771994 25 48M * 0 0
> >>>>>>>> AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGTTTGGGGCTT
> >>>>>>>> ]]]]]]PLQ]]XW[]QA6H]]VI+3]M9FSIFG@... XT:A:U
> >>>>>>>> CM:i:1 XN:i:
> >>>>>>>> 10 X0:i:1 X1:i:0 XM:i:4 XO:i:0 XG:i:0 MD:Z:40C7
> >>>>>>>>
> >>>>>> You don't say what creature this is or how long its chr18
> >>>>>> sequence is, but
> >>>>>> this looks a lot like our favourite bwa "feature" in which
> >>>>>> spurious
> >>>>>> mappings bridging adjacent reference sequences are marked as
> >>>>>> unmapped but
> >>>>>> otherwise left intact (see the FAQ at
> >>>>>> http://bio-bwa.sourceforge.net/ ).
> >>>>>>
> >>>>>> It would appear that bwa's leaving such non-mappings with a non-
> >>>>>> zero MAPQ
> >>>>>> and non-empty CIGAR conflicts with Picard's validity checking.
> >>>>>> (And other
> >>>>>> non-zero/empty fields are not great either, but those are the
> >>>>>> main ones
> >>>>>> that Picard currently checks.)
> >>>>>>
> >>>>>> Depending on your reference, this may or may not explain your
> >>>>>> particular
> >>>>>> problem.  Is your chr18 reference about 90772000 bases?
> >>>>>>
> >>>>>> (Also: picard typo :-))
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>>  John
> >>>>>>
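John's question about the reference length suggests a quick check: does the alignment, as written, run past the end of the reference sequence? The following is a minimal sketch, assuming a 1-based POS and a well-formed CIGAR. The helper names are hypothetical, and the length 90772000 is only John's rough guess from the thread, not a confirmed value.

```python
import re


def reference_span(cigar):
    """Count the reference bases a CIGAR string consumes (ops M, D, N, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "MDN=X")


def overhangs_reference(pos, cigar, ref_length):
    """True if the alignment's last base lies beyond the reference end."""
    end = pos + reference_span(cigar) - 1  # 1-based, inclusive
    return end > ref_length


# POS and CIGAR from the quoted record; the chr18 length is an assumption.
assumed_chr18_length = 90772000
print(overhangs_reference(90771994, "48M", assumed_chr18_length))
```

If the real chr18 is indeed about that long, the 48M alignment starting at 90771994 extends past its end, which matches the bwa behaviour John describes.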


	
	
	
	
	
	
	

HWI-ST499:5:23:21302:167400#GCCAAT    87    chrM    16518    60    55M    =    16319    -254    GGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGG    bSbcWbccc_^b\_]da_ddcbcQcUfefcffffccdddde_eebffdddfaefd    XT:A:U    NM:i:2    XN:i:1    SM:i:37    AM:i:25    X0:i:1    X1:i:0    XM:i:2    XO:i:0    XG:i:0    MD:Z:3C50C0
HWI-ST499:5:44:16607:70588#GCCAAT    141    chrM    16476    25    100M    =    16476    0    GTAGCTAAAGTGAACTGTATCCGACATCTGGTTCCTACTTCAGGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGGATC    ggggggggggbffffffefbfffefgdggegbggfeefchff^ffWdcddffefegggffgfgggfgggddffefggggggggfggggegdfgadgegad    XT:A:U    NM:i:5    XN:i:4    SM:i:25    AM:i:0    X0:i:1    X1:i:0    XM:i:5    XO:i:0    XG:i:0    MD:Z:45C50C0G0G0G0
HWI-ST499:5:64:19795:166573#GCCAAT    151    chrM    16474    60    100M    =    16387    -187    GGGTAGCTAAAGTGAACTGTATCCGACATCTGGTTCCTACTTCAGGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGGA    ddda\f_aegUdfdd]eN`eggedgggg_gageffgggdggggggggggdggggggeeeggggdgffdhedffcecggegfggegfggggggggggdggg    XT:A:U    NM:i:3    XN:i:2    SM:i:37    AM:i:37    X0:i:1    X1:i:0    XM:i:3    XO:i:0    XG:i:0    MD:Z:47C50C0G0
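Picard (http://picard.sourceforge.net/explain-flags.html) checks what the FLAG field means. If the "read unmapped" bit is set but the record still carries alignment values, it reports "CIGAR should have zero elements for unmapped read" and "MAPQ should be 0 for unmapped read".

This mainly happens at the very end of a chromosome. BWA considers one read of a pair to be aligned; the other read also lands at this position but does not meet the alignment criteria and is not counted as mapped (for example, it sits right at the end and the reference is not long enough to cover it). BWA then writes a "read unmapped" FLAG but still gives the record a MAPQ value, which produces this error.

The FLAG bits are:

lstFlags = [["read paired", 0x1],
            ["read mapped in proper pair", 0x2],
            ["read unmapped", 0x4],
            ["mate unmapped", 0x8],
            ["read reverse strand", 0x10],
            ["mate reverse strand", 0x20],
            ["first in pair", 0x40],
            ["second in pair", 0x80],
            ["not primary alignment", 0x100],
            ["read fails platform/vendor quality checks", 0x200],
            ["read is PCR or optical duplicate", 0x400]];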
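To see why the three records above count as unmapped, the flag table can be applied to their FLAG values. This is a small sketch that reuses the lstFlags entries as a Python list; the name `explain_flag` is hypothetical and is not part of Picard.

```python
# Same name/bit pairs as the lstFlags table, written as Python tuples.
LST_FLAGS = [
    ("read paired", 0x1),
    ("read mapped in proper pair", 0x2),
    ("read unmapped", 0x4),
    ("mate unmapped", 0x8),
    ("read reverse strand", 0x10),
    ("mate reverse strand", 0x20),
    ("first in pair", 0x40),
    ("second in pair", 0x80),
    ("not primary alignment", 0x100),
    ("read fails platform/vendor quality checks", 0x200),
    ("read is PCR or optical duplicate", 0x400),
]


def explain_flag(flag):
    """List the names of all bits that are set in a SAM FLAG value."""
    return [name for name, bit in LST_FLAGS if flag & bit]


# FLAG values of the three example records.
for flag in (87, 141, 151):
    print(flag, explain_flag(flag))
```

All three values include "read unmapped" (0x4), even though each record has a non-zero MAPQ and a non-empty CIGAR.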