Nr, GenBank, RefSeq, UniProt: similarities and differences between the databases



Some papers align reads to the RefSeq transcriptome for DEG analysis. I wasn't sure how this differs from aligning reads directly to the usual transcriptome.

Paper: Single-Cell Transcriptome Analysis Reveals Dynamic Changes in lncRNA Expression during Reprogramming

Methods: For differential expression analysis, we aligned reads against the refSeq mouse transcriptome using Bowtie version 0.12.7 (Langmead et al., 2009). Expression levels were then estimated using eXpress (Roberts and Pachter, 2013) (version 1.3.0), with gene-level effective counts and RPKM values derived from the sum of the corresponding values for all isoforms of a gene.
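The last step of that method, collapsing isoform-level estimates into gene-level values by summing over all isoforms of a gene, can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function name `aggregate_to_gene`, the transcript-to-gene mapping `tx2gene` and the transcript IDs are hypothetical; in practice the per-transcript values would come from the eXpress output and the mapping from the annotation.

```python
from collections import defaultdict


def aggregate_to_gene(transcript_values, tx2gene):
    """Sum per-transcript (effective count, RPKM) pairs into per-gene totals.

    transcript_values: dict of transcript ID -> (eff_count, rpkm)
    tx2gene: dict of transcript ID -> gene name (hypothetical input)
    Transcripts without a gene assignment are skipped.
    """
    gene_counts = defaultdict(float)
    gene_rpkm = defaultdict(float)
    for tx_id, (eff_count, rpkm) in transcript_values.items():
        gene = tx2gene.get(tx_id)
        if gene is None:
            continue  # transcript not mapped to any gene
        gene_counts[gene] += eff_count
        gene_rpkm[gene] += rpkm
    return dict(gene_counts), dict(gene_rpkm)


if __name__ == "__main__":
    # Made-up values: two isoforms of GeneA, one isoform of GeneB.
    values = {"tx_A1": (10.0, 2.5), "tx_A2": (5.0, 1.0), "tx_B1": (3.0, 0.5)}
    mapping = {"tx_A1": "GeneA", "tx_A2": "GeneA", "tx_B1": "GeneB"}
    counts, rpkm = aggregate_to_gene(values, mapping)
    print(counts)  # {'GeneA': 15.0, 'GeneB': 3.0}
    print(rpkm)    # {'GeneA': 3.5, 'GeneB': 0.5}
```

Summing RPKM across isoforms follows the quoted method literally; other pipelines may aggregate differently.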

 

What does the RefSeq database look like?

ftp://ftp.ncbi.nlm.nih.gov/refseq/

Inside the mouse directory, go into mRNA_Prot:

mRNA_Prot directory
   Contents: organism-specific RefSeq transcript and protein data

     {org-name}.files.installed: 
         reports the md5checksum and files included in the directory
         For example: /refseq/H_sapiens/mRNA_Prot/human.files.installed

   File Name Conventions:
     File name formats are as follows:
         common_name.#.molecule_type.format_type
     Multiple files may be provided for any given molecule and format type and file
     names include a numerical increment. Files with the same numerical increment
     are related by content.

     For example, the files provided for human are named as:
         human.#.rna.fna.gz      --fasta report for transcript records
         human.#.protein.faa.gz  --fasta report for protein records
         human.#.rna.gbff.gz     --flatfile report for transcript records
         human.#.protein.gpff.gz --flatfile report for protein records
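The naming convention above is regular enough to parse mechanically, for example to find which downloaded files belong together. A minimal sketch, assuming names follow common_name.#.molecule_type.format_type exactly with an optional .gz suffix, as in the human examples; the helper names are hypothetical:

```python
import re
from collections import defaultdict

# Matches names such as "human.1.rna.fna.gz":
# common_name.#.molecule_type.format_type, optionally gzipped.
REFSEQ_NAME = re.compile(
    r"^(?P<common_name>[^.]+)\.(?P<increment>\d+)\."
    r"(?P<molecule_type>[^.]+)\.(?P<format_type>[^.]+)(?:\.gz)?$"
)


def parse_refseq_filename(name):
    """Return (common_name, increment, molecule_type, format_type), or None."""
    m = REFSEQ_NAME.match(name)
    if m is None:
        return None  # e.g. "human.files.installed" has no numeric increment
    return (m["common_name"], int(m["increment"]), m["molecule_type"], m["format_type"])


def group_by_increment(names):
    """Group file names that share a numerical increment (related by content)."""
    groups = defaultdict(list)
    for name in names:
        parsed = parse_refseq_filename(name)
        if parsed is not None:
            groups[parsed[1]].append(name)
    return dict(groups)


if __name__ == "__main__":
    files = ["human.1.rna.fna.gz", "human.1.protein.faa.gz",
             "human.2.rna.fna.gz", "human.files.installed"]
    print(group_by_increment(files))
    # {1: ['human.1.rna.fna.gz', 'human.1.protein.faa.gz'], 2: ['human.2.rna.fna.gz']}
```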

  

I downloaded one of the rna.fna files; its contents look like this:

>NM_001013372.2 Mus musculus neural regeneration protein (Nrp), mRNA
CGGTCCAAGGAATTTTTCTGACAAACGCAATAGGCCGACCAGTACTGGAACGCAGTGCGCTTAGCCCCTTTATGGCGGAG
GCTGCCATGTTAAAACGGAATGAATCGAAACCCTGGAGTCGTGACCCCGGAAGAACCTGCCAGAGCCGGAATTTCGAGTT
CTGCTTCCGGGCCAAACTGTTGGCAGCCTCGAGATGGGGAAGATGGCGGCTGCTGTGGCTTCATTAGCCACGCTGGCTGC
AGAGCCCAGAGAGGATGCTTTCCGGAAGCTTTTCCGCTTCTACCGGCAGAGCCGGCCGGGGACAGCGGACCTGGGAGCCG
TCATCGACTTCTCAGAGGCGCACTTGGCTCGGAGCCCGAAGCCCGGCGTGCCCCAGGTAGGAAAGGAGGAGTAGTGTGTG
CCAGCCTAGCGGCCGACTGGGCCACCCGAGACTGGGCCGCCTCCGGGCCGGCTTTGGAGGGAAGCCCCTGCTGGGCCTGT
CCAGTGAGCTGTAATGTCGAGCGATGAGCGACCAGCTGCCTCGCTGTCCCAACGCTCTGGCCACGGCTTGTGCCTTGCCG
CCATTTCCCCCAACCCACGCGGGCCACGGCTTGTGCCCTGCCGCCATTTCCCCCAACCCACGCGACCTTGCTAAAAAAAA
AAAAAGAAAGAAAAGAAAAGAAAGAAAGAAAGAAAAAAATCTGGAAATTGCTTGTACCTCCTTAACTATCTGTTTAATAC
TAATACGATATTTTGTGTAAAGCTCAGAAGAACATCTTCGTGGACGTTAGGGTGGCCTCATAACTTCAGATAAAAGCAGC
CATTTAATAAGTCTCAAACCGTTAATCCGTTGGGCCTGAGACTCGATCGACCCTGTCTTCTCTGAGGCTTTGAAAGTAAA
GGTAAAATTAGCAGGTTTTTTTCCTGAGAATCTAGGAGCCTGGAGAGATAGCTCAGTAATTAAGAGCATTTACCTACTGG
TGTTCCCAAGAACACCAAGTAGATTTGGTTCCTTGCAGCCACGTGGCAGCTCACAGCCTTCTTGTAACTCTTCCGGAGGA
TCAGACACCCTCTCTTGAGCTCCACAGGAGAGCACTCGTAGACATGTAAATAAACTTCTAAGCTAAATCTAAACAATTTA
TGTACCCTCCCTATTTCTTCGTGATGAGAAGAAAGGGGCCAGAGGGTATG
>NR_046233.2 Mus musculus 45S pre-ribosomal RNA (Rn45s), ribosomal RNA
ACTGACACGCTGTCCTTTCCCTATTAACACTAAAGGACACTATAAAGAGACCCTTTCGATTTAAGGCTGTTTTGCTTGTC
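Each header starts with an accession.version (NM_001013372.2, NR_046233.2) followed by a description. The accession prefix tells you what kind of record it is: NM_ and NR_ are curated mRNA and non-coding RNA records, while XM_ and XR_ are computationally predicted ones. A minimal sketch for tallying record kinds in a downloaded file; the helper names are hypothetical and it assumes a gzipped rna.fna.gz as listed in the README:

```python
import gzip
from collections import Counter

# RefSeq transcript accession prefixes: NM_/NR_ are curated records,
# XM_/XR_ are computationally predicted (model) records.
PREFIX_KIND = {
    "NM": "curated mRNA",
    "NR": "curated non-coding RNA",
    "XM": "predicted mRNA",
    "XR": "predicted non-coding RNA",
}


def read_fasta_headers(lines):
    """Yield (accession, version, description, sequence_length) per record."""
    header, length = None, 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield (*header, length)
            acc_ver, _, description = line[1:].partition(" ")
            accession, _, version = acc_ver.partition(".")
            header, length = (accession, version, description), 0
        elif header is not None:
            length += len(line.strip())  # count sequence letters only
    if header is not None:
        yield (*header, length)


def count_record_kinds(path):
    """Tally record kinds by accession prefix in a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        return Counter(
            PREFIX_KIND.get(acc.split("_")[0], "other")
            for acc, _, _, _ in read_fasta_headers(handle)
        )
```

For example, feeding read_fasta_headers the two headers above with truncated sequences yields ('NM_001013372', '2', 'Mus musculus neural regeneration protein (Nrp), mRNA', n) and so on, where n is the number of sequence letters supplied.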

  

Still no obvious difference!

The RefSeq transcripts are a subset of the transcripts obtained from the GTF annotation.
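One way to check this subset relationship on your own files is to compare the transcript IDs in the RefSeq FASTA headers with the transcript_id attributes in the GTF. A minimal sketch with hypothetical helper names; it assumes the GTF uses RefSeq accessions (an Ensembl GTF uses ENSMUST identifiers, which would need an ID mapping first) and by default ignores the .version suffix:

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(lines):
    """Collect transcript_id values from the attribute column of GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a valid 9-column GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def fasta_transcript_ids(lines):
    """Collect the first word (accession.version) of each FASTA header."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].split()}


def strip_version(ids):
    """Drop a trailing .version, e.g. NM_001013372.2 -> NM_001013372."""
    return {i.split(".")[0] for i in ids}


def subset_report(fasta_ids, gtf_ids, ignore_version=True):
    """Return (is_subset, IDs present in the FASTA but absent from the GTF)."""
    if ignore_version:
        fasta_ids, gtf_ids = strip_version(fasta_ids), strip_version(gtf_ids)
    missing = fasta_ids - gtf_ids
    return not missing, missing
```

Ignoring versions by default is a judgment call: the same transcript may carry different version numbers in files from different releases.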

  

 

I will go into more detail later.

 

Reposted from: https://www.cnblogs.com/leezx/p/8654421.html
