Downloading PubMed Data

weixin_30587025

License: CC 4.0 BY-SA

Tags: json, python

Original article: http://www.cnblogs.com/wlc297984368/p/7928447.html

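This article describes a way to bulk-download literature abstracts from the PubMed database with Python. The script builds a search request for a specific date range, uses the requests library to read the JSON search response, and then fetches the matching records in batches, saving each batch to a local file. In total, about 2.64 million records were downloaded, amounting to roughly 134 GB of data.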

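import requests

# Step 1: search PubMed by date range and keep the result set on the NCBI history server.
search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=1800/01/01&maxdate=2016/12/31&usehistory=y&retmode=json"
search_r = requests.post(search_url)
search_r.raise_for_status()
search_data = search_r.json()
webenv = search_data["esearchresult"]["webenv"]
total_records = int(search_data["esearchresult"]["count"])

# Step 2: fetch the stored result set in batches.
# retmax matches the loop step so that no record is skipped between batches.
batch_size = 10000
fetch_url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed"
             "&retmax=" + str(batch_size) + "&query_key=1&webenv=" + webenv)

for i in range(0, total_records, batch_size):
    this_fetch = fetch_url + "&retstart=" + str(i)
    print("Getting this URL: " + this_fetch)
    fetch_r = requests.post(this_fetch)
    fetch_r.raise_for_status()
    # No retmode is set on efetch, so the body arrives in the server's default
    # format; the .json file name is kept as-is and may not hold actual JSON.
    file_name = "pubmed_batch_" + str(i) + "_to_" + str(i + batch_size - 1) + ".json"
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(fetch_r.text)

print("Number of records found: " + str(total_records))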
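
The batching arithmetic in the loop is easy to get wrong by one, so it can help to isolate it in a small function that can be checked without touching the network. The following is only a sketch: `batch_ranges` is a hypothetical helper name that does not appear in the original script, and it assumes record offsets are zero-based, as the `retstart` values in the loop are.

```python
from typing import Iterator, Tuple


def batch_ranges(total_records: int, batch_size: int = 10000) -> Iterator[Tuple[int, int]]:
    """Yield (first, last) zero-based record offsets for each batch.

    `first` is the value to send as retstart; `last - first + 1` is the
    number of records expected in that batch (hypothetical helper).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for first in range(0, total_records, batch_size):
        # Clamp the final batch so the range never runs past the last record.
        last = min(first + batch_size, total_records) - 1
        yield first, last


if __name__ == "__main__":
    # Example with a made-up total of 25000 records.
    for first, last in batch_ranges(25000):
        print("retstart=" + str(first) + " covers records " + str(first) + " to " + str(last))
```

Because consecutive ranges share no offset and leave no gap, the step and the batch size cannot drift apart, and the last file name can reflect the real final offset instead of a rounded-up one.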