GeneNomenclatureUtils_download_urls.txt (download links for bioinformatics database data)

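This post is a guide to downloading gene and protein nomenclature resources from several authoritative databases. It lists download links and notes for Mouse Genome Informatics (MGI), HGNC, Entrez Gene, OMIM, the Rat Genome Database (RGD), and UniProt.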


### Community-provided gene and protein nomenclature resources utilised by the GeneNomenclatureUtils packages

   
  ### Mouse Genome Informatics (MGI) ###
   
  wget ftp://ftp.informatics.jax.org/pub/reports/HMD_HumanPhenotype.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MGI_PhenotypicAllele.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/HMD_HGNC_Accession.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MGI_MouseHumanSequence.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MRK_List2.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MRK_Synonym.sql.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MRK_InterPro.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MGI_InterProDomains.sql.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/gene_association.mgi
  wget ftp://ftp.informatics.jax.org/pub/reports/VOC_MammalianPhenotype.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MGI_PhenoGenoMP.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MRK_Sequence.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MRK_SwissProt.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MRK_SwissProt_TrEMBL.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MRK_Reference.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MRK_VEGA.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/MRK_ENSEMBL.rpt
  wget ftp://ftp.informatics.jax.org/pub/reports/HMD_HumanSequence.rpt
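
With this many reports in one list, a quick check that every file actually arrived can be useful. The following is a minimal sketch, not part of the original file: `missing_downloads` is a hypothetical helper name, and it assumes `wget` saved each file under the last path component of its URL (its default behaviour for plain URLs like these).

```shell
# Hypothetical helper: read URLs from stdin and print each one whose
# downloaded file is absent or empty in the directory given as $1.
missing_downloads() {
    dir=${1:-.}
    while IFS= read -r url; do
        [ -n "$url" ] || continue      # skip blank lines
        file=${url##*/}                # last path component of the URL
        [ -s "$dir/$file" ] || printf '%s\n' "$url"
    done
}

# Example: check two of the MGI reports in the current directory.
printf '%s\n' \
    ftp://ftp.informatics.jax.org/pub/reports/MRK_List2.rpt \
    ftp://ftp.informatics.jax.org/pub/reports/MRK_Sequence.rpt |
    missing_downloads .
```

Any URL that is printed can be passed back to `wget` to retry that download.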
   
   
  ### HGNC ###
   
  wget -O hgnc_core_data.txt 'http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=Core+Data;col=gd_hgnc_id;col=gd_app_sym;col=gd_app_name;col=gd_status;col=gd_prev_sym;col=gd_aliases;col=gd_pub_chrom_map;col=gd_pub_acc_ids;col=gd_pub_refseq_ids;status=Approved;status=Approved+Non-Human;status=Entry+Withdrawn;status_opt=3;=on;where=;order_by=gd_app_sym_sort;limit=;format=text;submit=submit;.cgifields=;.cgifields=status;.cgifields=chr'
  wget -O hgnc_all_data.txt 'http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=All+Data;col=gd_hgnc_id;col=gd_app_sym;col=gd_app_name;col=gd_status;col=gd_locus_type;col=gd_prev_sym;col=gd_prev_name;col=gd_aliases;col=gd_pub_chrom_map;col=gd_date2app_or_res;col=gd_date_mod;col=gd_date_name_change;col=gd_pub_acc_ids;col=gd_enz_ids;col=gd_pub_eg_id;col=gd_mgd_id;col=gd_other_ids;col=gd_pubmed_ids;col=gd_pub_refseq_ids;col=gd_gene_fam_name;col=md_gdb_id;col=md_eg_id;col=md_mim_id;col=md_refseq_id;col=md_prot_id;status=Approved;status=Approved+Non-Human;status=Entry+Withdrawn;status_opt=3;=on;where=;order_by=gd_app_sym_sort;limit=;format=text;submit=submit;.cgifields=;.cgifields=status;.cgifields=chr'
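
The two HGNC URLs differ only in their title and in the list of `col=` fields, so the query string can be assembled from a column list instead of being edited by hand. This is a sketch under assumptions: `hgnc_url` is a hypothetical helper name, the fixed status and format parameters are copied from the commands above, and whether the endpoint is still served is not checked here.

```shell
# Hypothetical helper: build an HGNC download URL from a title and a
# list of column names, using the same fixed parameters as the list above.
hgnc_url() {
    title=$1
    shift
    url="http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=${title}"
    for col in "$@"; do
        url="${url};col=${col}"
    done
    printf '%s\n' "${url};status=Approved;status=Approved+Non-Human;status=Entry+Withdrawn;status_opt=3;=on;where=;order_by=gd_app_sym_sort;limit=;format=text;submit=submit;.cgifields=;.cgifields=status;.cgifields=chr"
}

# Usage (reproduces the core-data download):
#   wget -O hgnc_core_data.txt "$(hgnc_url 'Core+Data' gd_hgnc_id gd_app_sym \
#       gd_app_name gd_status gd_prev_sym gd_aliases gd_pub_chrom_map \
#       gd_pub_acc_ids gd_pub_refseq_ids)"
```

Adding or dropping a column then means changing one argument; the column order in the output follows the argument order.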
   
   
  ### Entrez Gene ###
   
  wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2accession.gz
  wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2ensembl.gz
  wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz
  wget ftp://ftp.ncbi.nih.gov/gene/DATA/mim2gene
  wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2pubmed.gz
  wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz
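
Several of these files are gzip-compressed and large, so an interrupted transfer can leave a truncated archive behind. As a rough sketch (the function name `check_gz` is made up for illustration), `gzip -t` can test each archive without unpacking it:

```shell
# Hypothetical helper: test every .gz file in a directory (default: the
# current one) and return non-zero if any archive fails gzip's integrity test.
check_gz() {
    dir=${1:-.}
    status=0
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue        # glob matched nothing
        if gzip -t "$f" 2>/dev/null; then
            echo "ok      $f"
        else
            echo "CORRUPT $f"
            status=1
        fi
    done
    return $status
}

# Usage: check_gz .     (or pass the download directory as the argument)
```

A file reported as `CORRUPT` should be deleted and fetched again before it is used.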
   
   
  ### OMIM files not available from NCBI 2/11 ###
   
  # wget ftp://ftp.ncbi.nih.gov/repository/OMIM/omim.txt.Z
  # wget ftp://ftp.ncbi.nih.gov/repository/OMIM/genemap
  # wget ftp://ftp.ncbi.nih.gov/repository/OMIM/genemap.key
  # wget ftp://ftp.ncbi.nih.gov/repository/OMIM/morbidmap
   
   
  ### Using archived OMIM NCBI files ###
   
  wget ftp://ftp.ncbi.nih.gov/repository/OMIM/ARCHIVE/omim.txt.Z
  wget ftp://ftp.ncbi.nih.gov/repository/OMIM/ARCHIVE/genemap
  wget ftp://ftp.ncbi.nih.gov/repository/OMIM/ARCHIVE/genemap.key
  wget ftp://ftp.ncbi.nih.gov/repository/OMIM/ARCHIVE/morbidmap
   
   
  ### Rat Genome Database (RGD) ###
   
  wget ftp://rgd.mcw.edu/pub/data_release/GENES_HUMAN.txt
  wget ftp://rgd.mcw.edu/pub/data_release/GENES_MOUSE.txt
  wget ftp://rgd.mcw.edu/pub/data_release/GENES_RAT.txt
  wget ftp://rgd.mcw.edu/pub/data_release/RGD_ORTHOLOGS
   
   
  ### UniProt ###
   
  wget ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz
  wget ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_trembl.dat.gz
   
   
  # setenv RGD_dir ${cwd}
  # setenv MGI_dir ${cwd}
  # setenv ENTREZ_dir ${cwd}
  # setenv HGNC_dir ${cwd}
  # setenv OMIM_dir ${cwd}
  # setenv NCBI_dir ${cwd}
  # setenv SWISS_dir ${cwd}
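
The commented `setenv` lines above use csh syntax. For Bourne-style shells (sh, bash), an equivalent might look like the sketch below; it assumes, as the `${cwd}` values do, that every file was downloaded into the current directory.

```shell
# sh/bash equivalent of the csh setenv lines: point every *_dir variable
# at the current directory (assumed to hold all downloaded files).
data_dir=$(pwd)
for name in RGD MGI ENTREZ HGNC OMIM NCBI SWISS; do
    export "${name}_dir=${data_dir}"
done
```

These lines need to be sourced into the current shell (for example from a startup file) rather than run as a separate script, otherwise the exported variables disappear when the script exits.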
   
 
