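CRAM files downloaded from NCBI cannot be used directly and first have to be converted to BAM/SAM. Following the official instructions I downloaded cramtools, only to find that it has long been unmaintained. The jar would not even start: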
$ java -jar cramtools-3.0.jar
Error: Invalid or corrupt jarfile cramtools-3.0.jar
So I used samtools for the conversion instead, but converting directly, without supplying a reference, also fails:
$ samtools view -b NA12878.final.cram > NA12878.bam &
Failed to populate reference for id 0
Slice ends beyond reference end.
Unable to fetch reference #0 9996..35879
Failure to decode slice
[E::hts_close] Failed to decode sequence.
samtools: error closing "NA12878.final.cram": -1
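Decoding a CRAM requires the reference genome, and it has to be the same reference the CRAM was encoded against. If it is a different one, the conversion runs until it reaches a region where the two disagree and then fails: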
$ samtools view -T /database/human/hg38/hg38.fa -b NA12878.final.cram > NA12878.bam &
ERROR: md5sum reference mismatch for ref 0 pos 248747869..248786716
CRAM: 720250455f7998c0d906314e9aae3434
Ref : 5e868f1c3be1506207b2097a2371c4c5
Failure to decode slice
[E::hts_close] Failed to decode sequence.
samtools: error
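Since a mismatch like the one above only shows up once the conversion reaches the differing region, it can help to compare checksums before starting. The following is a minimal sketch, not part of samtools: it assumes the CRAM header carries M5 (MD5) tags on its @SQ lines, and the function name compare_m5 and the file names cram.header.txt and ref.header.txt are made up for illustration. The two header files would be produced with `samtools view -H` and `samtools dict`, as noted in the comments.

```shell
# Hypothetical helper: list sequences whose M5 tag differs between two
# SAM-style header files.
#   $1  header dumped from the CRAM, e.g.
#       samtools view -H NA12878.final.cram > cram.header.txt
#   $2  header generated from the candidate FASTA, e.g.
#       samtools dict /database/human/hg38/hg38.fa > ref.header.txt
compare_m5() {
    awk -F'\t' '
        /^@SQ/ {
            name = ""; m5 = ""
            # Pick the SN (name) and M5 (checksum) tags out of the @SQ line.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^M5:/) m5 = substr($i, 4)
            }
            # First file: remember the checksum recorded in the CRAM.
            if (NR == FNR) { cram[name] = m5; next }
            # Second file: report names present in both with different checksums.
            if (name in cram && cram[name] != m5)
                printf "%s\tCRAM:%s\tRef:%s\n", name, cram[name], m5
        }
    ' "$1" "$2"
}
```

Running `compare_m5 cram.header.txt ref.header.txt` prints one line per mismatching sequence and nothing when every shared sequence name has the same checksum. The UR tag on the CRAM's @SQ lines, when present, may also indicate which reference file was used to create it.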