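Scrape Douban's list of movies now showing, including the title (title), release time (time), country (country), director (director), actors (actors), and score (score). The scraped content looks like this: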
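import re
import urllib.request

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://movie.douban.com/"  # source URL
# Assumption: a browser-like User-Agent is sent because Douban may reject urllib's default one
r = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
response = urllib.request.urlopen(r)
data = response.read()  # raw page content (bytes)
data = data.decode("utf-8")
soup = BeautifulSoup(data, "html.parser")
# Select the <div class="screening-bd"> block that holds the now-showing list
data1 = soup.find_all(name="div", attrs={"class": "screening-bd"})
data1 = str(data1)  # convert to text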
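Once data1 is a plain string, the six fields can be pulled out of it with re. The following is only a minimal sketch, not necessarily the approach used in the rest of this post. It assumes each movie is an <li> tag carrying attributes such as data-title, data-release, data-region, data-director, data-actors and data-rate; those attribute names are an assumption about Douban's markup and may have changed. The names sample, FIELDS and extract_fields are hypothetical, and the sample text stands in for the real data1.

```python
import re

# Hypothetical stand-in for the text held in data1. The data-* attribute
# names are an assumption about Douban's markup, not a guaranteed format.
sample = '''
<li class="ui-slide-item" data-title="Movie A" data-release="2019" data-rate="7.1" data-region="China" data-director="Director A" data-actors="Actor 1 / Actor 2">
<li class="ui-slide-item" data-title="Movie B" data-release="2018" data-rate="6.5" data-region="USA" data-director="Director B" data-actors="Actor 3">
'''

# Output column name -> assumed HTML attribute name
FIELDS = {
    "title": "data-title",
    "time": "data-release",
    "country": "data-region",
    "director": "data-director",
    "actors": "data-actors",
    "score": "data-rate",
}


def extract_fields(text):
    """Return one dict per <li ...> tag that carries a data-title attribute."""
    movies = []
    for tag in re.findall(r"<li\b[^>]*>", text):
        if "data-title=" not in tag:
            continue  # skip list items that are not movie entries
        row = {}
        for name, attr in FIELDS.items():
            match = re.search(attr + r'="([^"]*)"', tag)
            row[name] = match.group(1) if match else None
        movies.append(row)
    return movies


movies = extract_fields(sample)
for movie in movies:
    print(movie)
```

Each dict has the same keys, so the resulting list can be passed directly to pd.DataFrame(movies) when the data needs to be saved.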
Create a new file to make the data easier to save.