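Scraping HMDB with R to get the metabolic pathways of key metabolites
HMDB is a metabolite database commonly used for lookups in metabolomics.
KEGG is more widely used, of course, but this post covers HMDB first.
Once data analysis has identified the key metabolites, their metabolic pathways need to go through enrichment analysis.
Here the R packages RCurl and XML are used to fetch HMDB pathway data automatically.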
Code
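library(XML); library(RCurl)  # load the packages; install them first if needed
# pathways() takes an HMDB metabolite ID; for example, the ID of lactate is [HMDB0000190](https://hmdb.ca/metabolites/HMDB0000190)
pathways <- function(id) {
  url <- paste0('https://hmdb.ca/metabolites/', id, '.xml')  # build the XML URL for this ID
  wp <- getURL(url)  # download the content; this can be slow, depending on network speed
  root <- xmlRoot(xmlParse(wp, asText = TRUE))  # parse the XML text and get the root node
  paths <- xmlChildren(root[[25]][[4]])  # pathway entries sit in child 4 of the root's child 25; these positions depend on the HMDB XML layout
  pathway_names <- lapply(paths, function(x) xmlValue(x[[1]][[1]]))  # name of every related pathway, returned as a list
  return(pathway_names)
}
pathways("HMDB0000190")  # query with the HMDB ID of lactate; the quotes are required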
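The positional lookup `root[[25]][[4]]` stops working as soon as HMDB adds or reorders a field, so looking the nodes up by name may be more robust. Below is a minimal sketch of that idea, not the method used in this post. It assumes the pathway names live under `pathways/pathway/name`, which matches the indexing above but should be checked against a real record. `sample_xml` is a hand-written stand-in rather than real HMDB output, and `extract_pathway_names` is a hypothetical helper name. The XPath uses `local-name()` so that it matches whether or not the document declares a default namespace.

```r
library(XML)

# Hand-written stand-in for an HMDB record (NOT real HMDB output).
# The element names are an assumption about the HMDB XML layout.
sample_xml <- '<metabolite>
  <accession>HMDB0000190</accession>
  <biological_properties>
    <pathways>
      <pathway><name>Gluconeogenesis</name></pathway>
      <pathway><name>Pyruvate Metabolism</name></pathway>
    </pathways>
  </biological_properties>
</metabolite>'

# Hypothetical helper: return pathway names as a character vector,
# selecting nodes by element name instead of by position.
extract_pathway_names <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)
  # local-name() ignores namespaces, so this also matches when the
  # root element declares a default xmlns.
  xpath <- "//*[local-name()='pathways']/*[local-name()='pathway']/*[local-name()='name']"
  nodes <- getNodeSet(doc, xpath)
  vapply(nodes, xmlValue, character(1))
}

extract_pathway_names(sample_xml)
```

To use it on live data, pass the text returned by `getURL(url)` instead of `sample_xml`. A record without pathways gives an empty character vector rather than an error.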
Result
> pathways("HMDB0000190")
$pathway
[1] "Fructose-1,6-diphosphatase deficiency"
$pathway
[1] "Gluconeogenesis"
$pathway
[1] "Glutaminolysis and Cancer"
$pathway
[1] "Glycogen Storage Disease Type 1A (GSD1A) or Von Gierke Disease"
$pathway
[1] "Glycogenosis, Type IA. Von gierke disease"
$pathway
[1] "Glycogenosis, Type IB"
$pathway
[1] "Glycogenosis, Type IC"
$pathway
[1] "Leigh Syndrome"
$pathway
[1] "Phosphoenolpyruvate carboxykinase deficiency 1 (PEPCK1)"
$pathway
[1] "Primary hyperoxaluria II, PH2"
$pathway
[1] "Pyruvate Decarboxylase E1 Component Deficiency (PDHE1 Deficiency)"
$pathway
[1] "Pyruvate Dehydrogenase Complex Deficiency"
$pathway
[1] "Pyruvate kinase deficiency"
$pathway
[1] "Pyruvate Metabolism"
$pathway
[1] "Triosephosphate isomerase"
$pathway
[1] "Warburg Effect"
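Every element of the returned list has the same name, `pathway`, so it cannot be indexed by name. For downstream steps such as enrichment analysis, a flat character vector or a small table may be easier to handle. Here is a minimal sketch, assuming `res` holds the value returned by `pathways()`. The list is a shortened stand-in built from three of the names printed above, and the column names `hmdb_id` and `pathway` are illustrative choices.

```r
# Shortened stand-in for the list returned by pathways("HMDB0000190");
# the three values are copied from the output above.
res <- list(
  pathway = "Gluconeogenesis",
  pathway = "Pyruvate Metabolism",
  pathway = "Warburg Effect"
)

# Drop the repeated "pathway" names and keep a plain character vector.
pathway_names <- unlist(res, use.names = FALSE)

# One row per (metabolite, pathway) pair; rows for several
# metabolites can later be combined with rbind().
pathway_df <- data.frame(
  hmdb_id = "HMDB0000190",
  pathway = pathway_names,
  stringsAsFactors = FALSE
)

print(pathway_df)
```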
Lactate is mainly associated with anaerobic glycolysis, gluconeogenesis, pyruvate formation, and the Warburg effect in tumors.