EFetch for Sequence and other Molecular Biology Databases (please see http://eutils.ncbi.nlm.nih.gov

Last updated: $Date: 2009-03-23 18:22:05 -0400 (Mon, 23 Mar 2009) $

EFetch documentation is also available for the Literature and Taxonomy databases.

EFetch: Retrieves records in the requested format from a list of one or more unique identifiers.


Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi


URL parameters:

NOTE: Utility parameters may be case sensitive. Use lowercase characters for all parameters except WebEnv.


Database

Current database values:
  • gene
  • genome
  • nucleotide:
    • nuccore
    • nucest
    • nucgss
  • protein
  • popset
  • snp
  • sequences - Composite name including nucleotide, protein, popset and genome
Example:
db=protein

Web Environment (WebEnv)

History link value previously returned in XML results from ESearch and used with EFetch in place of primary ID result list.

Example:
WebEnv=WgHmIcDG]

Query Key

The history search number, as previously returned in XML results from ESearch or EPost.

Example:
query_key=6

Note: WebEnv is similar to the cookie that is set on a user's computer when accessing PubMed on the web. If the parameter usehistory=y is included in an ESearch URL, both a WebEnv (cookie string) and a query_key (history number) value will be returned in the results. Rather than using the retrieved IDs in an EFetch URL, you may simply use the WebEnv and query_key values to retrieve the records. WebEnv will change for each ESearch query.
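The history mechanism described in the note can be sketched in Python as follows. This is a minimal sketch, not a definitive client: the helper names (esearch_url, history_from_esearch, efetch_history_url) and the sample XML string are made up for illustration, and no network request is sent. It assumes the ESearch result carries the values in WebEnv and QueryKey elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(db, term):
    """Build an ESearch URL that asks the server to keep the result in history."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "usehistory": "y"})

def history_from_esearch(xml_text):
    """Extract (WebEnv, query_key) from an ESearch XML result."""
    root = ET.fromstring(xml_text)
    return root.findtext("WebEnv"), root.findtext("QueryKey")

def efetch_history_url(db, webenv, query_key, **extra):
    """Build an EFetch URL that uses the history values instead of an id list."""
    params = {"db": db, "WebEnv": webenv, "query_key": query_key}
    params.update(extra)
    return EUTILS + "efetch.fcgi?" + urlencode(params)

# Made-up ESearch result, used only to exercise the parser.
SAMPLE = (
    "<eSearchResult><Count>2</Count>"
    "<QueryKey>6</QueryKey><WebEnv>EXAMPLE_WEBENV</WebEnv>"
    "</eSearchResult>"
)

webenv, query_key = history_from_esearch(SAMPLE)
history_url = efetch_history_url("protein", webenv, query_key,
                                 rettype="fasta", retmode="text")
print(history_url)
```

Because WebEnv changes with each ESearch query, the values should be read from the result of the search whose records are being fetched rather than cached across searches.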


Tool

A string with no internal spaces that identifies the resource which is using Entrez links (e.g., tool=flybase). This argument is used to help NCBI provide better service to third parties generating Entrez queries from programs. As with any query system, it is sometimes possible to ask the same question different ways, with different effects on performance. NCBI requests that developers sending batch requests include a constant 'tool' argument for all requests using the utilities.

Example:
tool=flybase

E-mail Address

If you choose to provide an email address, we will use it to contact you if there are problems with your queries or if we are changing software interfaces that might specifically affect your requests. If you choose not to include an email address we cannot provide specific help to you, but you can still sign up for utilities-announce to receive general announcements.

Example:
email=john@doe.org

Record Identifier

IDs required if WebEnv is not used.

Current values:
  • NCBI sequence number (GI)
  • accession
  • accession.version
  • fasta
  • GeneID
  • genome ID
  • seqid
Example:
id=123,U12345,U12345.1,gb|U12345|
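As an illustration of how the db, id, tool and email parameters combine into one request, here is a minimal Python sketch. The function name build_efetch_url is hypothetical, and the code only builds the URL string without sending a request.

```python
from urllib.parse import urlencode

EFETCH_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def build_efetch_url(db, ids, tool=None, email=None, **extra):
    """Return an EFetch URL for the given database and identifiers."""
    params = {"db": db, "id": ",".join(str(i) for i in ids)}
    if tool:
        params["tool"] = tool
    if email:
        params["email"] = email
    params.update(extra)
    # safe="," keeps the comma-separated id list readable;
    # other reserved characters (such as "|" in a seqid) are percent-encoded.
    return EFETCH_BASE + "?" + urlencode(params, safe=",")

url = build_efetch_url("protein", [123, "U12345", "U12345.1"],
                       tool="flybase", email="john@doe.org")
print(url)
```

Further parameters such as rettype or retmode can be passed as keyword arguments, for example build_efetch_url("nucleotide", [5], rettype="fasta").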

Display Numbers

  • retstart - sequential number of the first id retrieved - default=0 which will retrieve the first record
  • retmax - number of items retrieved
Example:
retstart=100&retmax=50
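A large result set can be walked in fixed-size windows by advancing retstart in steps of retmax. The following Python sketch only computes the windows. The name batch_windows is hypothetical, and the total of 120 records is a hard-coded assumption (in practice it might come from the count reported by a prior search).

```python
def batch_windows(total, batch_size):
    """Yield (retstart, retmax) pairs that cover `total` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for retstart in range(0, total, batch_size):
        # The last window is shortened so it does not run past the end.
        yield retstart, min(batch_size, total - retstart)

windows = list(batch_windows(120, 50))
queries = [f"retstart={start}&retmax={size}" for start, size in windows]
for q in queries:
    print(q)
```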

Sequence Strand, Start, Stop and Complexity Parameters

  • strand - what strand of DNA to show (1 = plus or 2 = minus)
  • seq_start - show sequence starting from this base number
  • seq_stop - show sequence ending on this base number
  • complexity - gi is often a part of a biological blob, containing other gis

    Complexity regulates the display:

    • 0 - get the whole blob
    • 1 - get the bioseq for gi of interest (default in Entrez)
    • 2 - get the minimal bioseq-set containing the gi of interest
    • 3 - get the minimal nuc-prot containing the gi of interest
    • 4 - get the minimal pub-set containing the gi of interest
Example:
strand=2&seq_start=50&seq_stop=2000&complexity=2
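The value ranges listed above can be checked before a request is sent. This is a minimal Python sketch with a hypothetical helper name (sequence_range_params). The rule that seq_start and seq_stop must be positive integers is an assumption of the sketch, not something the documentation states.

```python
from urllib.parse import urlencode

def sequence_range_params(strand=None, seq_start=None, seq_stop=None, complexity=None):
    """Validate the sequence display parameters and return them as a dict."""
    params = {}
    if strand is not None:
        if strand not in (1, 2):
            raise ValueError("strand must be 1 (plus) or 2 (minus)")
        params["strand"] = strand
    for name, value in (("seq_start", seq_start), ("seq_stop", seq_stop)):
        if value is not None:
            # Assumption: base positions are positive integers.
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer")
            params[name] = value
    if complexity is not None:
        if complexity not in (0, 1, 2, 3, 4):
            raise ValueError("complexity must be between 0 and 4")
        params["complexity"] = complexity
    return params

query = urlencode(sequence_range_params(strand=2, seq_start=50, seq_stop=2000, complexity=2))
print(query)
```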

Retrieval Mode

Output format

Current values:
  • xml
  • html
  • text
  • asn.1
Example:
retmode=text

Retrieval Type:

Output types depend on the database.

Note: Not all Retrieval Modes are possible with all Retrieval Types.

 

Sequence Options:
                                                retmode
rettype               scope                     xml  text  html*  asn1  Comment
native (full record)  all but gene              x    x     x      x     Default report for viewing sequences
fasta                 sequence only             x    x     x      n/a   FASTA view of a sequence. Existence of the mode depends on gi type
gb                    nucleotide sequence only  n/a  x     x      n/a   GenBank report for sequences, constructed sequences will be shown as contigs (by pointing to its parts)
gp                    protein sequence only     n/a  x     x      n/a   GenPept report
gbwithparts           nucleotide sequence only  n/a  x     x      n/a   GenBank report for sequences, the sequence will always be shown
gbc                   nucleotide sequence only  n/a  x     x      n/a   INSDSeq structured flat file
gpc                   protein sequence only     n/a  x     x      n/a   INSDSeq structured flat file
est                   dbEST sequence only       n/a  x     x      n/a   EST Report
gss                   dbGSS sequence only       n/a  x     x      n/a   GSS Report
seqid                 sequence only             n/a  x     x      n/a   To convert list of gis into list of seqids
acc                   sequence only             n/a  x     x      x     To convert list of gis into list of accessions
ft                    sequence only             n/a  x     x      n/a   Feature Table report

x – retrieval mode available
*  – the same content as text report but with some HTML links
n/a – not available
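The sequence options table can be carried into a program as a lookup, so that a rettype and retmode pair is checked before a URL is built. The Python sketch below transcribes the table as printed. The names SEQUENCE_MODES and is_listed are hypothetical, and the use of "asn.1" as the retmode spelling follows the Retrieval Mode list rather than the table header. Treat the result as advisory, since the examples at the end of this page combine rettype=gb with retmode=xml, which the table marks as n/a.

```python
# retmode values marked "x" in the sequence options table, keyed by rettype.
TEXT_HTML = frozenset({"text", "html"})
SEQUENCE_MODES = {
    "native": frozenset({"xml", "text", "html", "asn.1"}),
    "fasta": frozenset({"xml", "text", "html"}),
    "gb": TEXT_HTML,
    "gp": TEXT_HTML,
    "gbwithparts": TEXT_HTML,
    "gbc": TEXT_HTML,
    "gpc": TEXT_HTML,
    "est": TEXT_HTML,
    "gss": TEXT_HTML,
    "seqid": TEXT_HTML,
    "acc": frozenset({"text", "html", "asn.1"}),
    "ft": TEXT_HTML,
}

def is_listed(rettype, retmode):
    """Return True if the table marks this rettype/retmode pair as available."""
    return retmode in SEQUENCE_MODES.get(rettype, frozenset())

print(is_listed("fasta", "xml"))   # listed as available
print(is_listed("gb", "asn.1"))    # listed as n/a
```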

SNP options:
rettype  Description
chr      SNP Chromosome report
flt      SNP Flat File report
rsr      SNP RS Cluster report
brief    SNP ID list
docset   SNP RS summary
Example:
rettype=fasta

Examples

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&complexity=0&rettype=fasta

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb&seq_start=1&seq_stop=9

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=fasta&seq_start=1&seq_stop=9&strand=2

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=popset&id=12829836&rettype=gp

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=8&rettype=gp

Entrez display format GBSeqXML:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb&retmode=xml

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=8&rettype=gp&retmode=xml

Entrez display format TinySeqXML:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=fasta&retmode=xml

Entrez Gene, full display as xml:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=2&retmode=xml

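SRA Toolkit is a software package provided by NCBI for managing and analyzing SRA (Sequence Read Archive) data. To download the files for GEO project GSE153270 and convert them to FastQ format (`.fq`), you can follow these steps:

1. **Install SRA Toolkit**: Download and install SRA Toolkit from the NCBI website. NCBI provides it for Linux, macOS and Windows.

2. **Identify the runs**: `fastq-dump` works on SRA run accessions, not on GEO series accessions such as GSE153270, so first look up the run accessions linked to the series. Then, in a terminal or command prompt, pass one run accession at a time (`run_accession` is a placeholder):
```
fastq-dump run_accession
```

3. **Specify the output**: Use `-O` to choose the output directory and `--split-3` to write paired reads to separate files, because by default `fastq-dump` writes all reads of a run to one file:
```
fastq-dump -O output_directory --split-3 run_accession
```
`output_directory` is the directory where you want the downloaded files saved.

4. **Wait for the download to finish**: This may take some time, depending on the number and size of the sequences.

5. **Check the FastQ output**: `fastq-dump` writes FastQ directly, so no separate conversion step is needed. If you downloaded `.sra` files first, run `fastq-dump` with the options above on them. The output files normally end in `.fastq`; rename them if you need the `.fq` extension.

Note: This process requires a network connection and may need considerable computing resources. If you run into permission problems or other issues, contact GEO support or consult the SRA Toolkit documentation.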