Scraping listing data and processing it
The website scraped this time is Lianjia (链家). The full code is as follows:
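if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")  # install pacman only if it is missing
pacman::p_load(XML,rvest,jiebaR,dplyr,stringr)  # install (if needed) and load every package used below
house_inf=data.frame()  # empty data frame that accumulates the listings from all pages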
for (i in 1:500){
# Lianjia paginates the second-hand listing pages as .../ershoufang/pg<N>/
web=read_html(str_c("https://xa.lianjia.com/ershoufang/pg",i,"/"),encoding = "UTF-8")
house_name=web%>%html_nodes(".item a"