The code is as follows:
from bs4 import BeautifulSoup

html_path = "/Users/reed/Documents/dev/Plan-for-combating/week1/1_2/1_2answer_of_homework/index.html"

# Parse the local HTML file with the lxml parser
with open(html_path, 'r') as wb_data:
    soup = BeautifulSoup(wb_data, "lxml")

# One list per field; every list has one entry per product
images = soup.select("body > div > div > div.col-md-9 > div > div > div > img")
names = soup.select("body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a")
prices = soup.select("body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right")
reviews = soup.select("body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right")
# Each div.ratings block contains the star icons for one product
stars_block = soup.find_all("div", class_="ratings")

for name, image, price, review, star_block in zip(names, images, prices, reviews, stars_block):
    # Count only the filled-star spans inside this product's ratings block
    stars_num = len(star_block.find_all("span", class_="glyphicon glyphicon-star"))
    print(name.get_text(), "\n ", image.get('src'), "\n ", price.get_text(), "\n ", review.get_text(),
          "\n ", stars_num, " stars\n")
Output:
/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5 /Users/reed/PycharmProjects/web01/web_parse2.py
EarPod
img/pic_0000_073a9256d9624c92a05dc680fc28865f.jpg
$24.99
65 reviews
5 stars
New Pocket
img/pic_0005_828148335519990171_c234285520ff.jpg
$64.99
12 reviews
4 stars
New sunglasses
img/pic_0006_949802399717918904_339a16e02268.jpg
$74.99
31 reviews
4 stars
Art Cup
img/pic_0008_975641865984412951_ade7a767cfc8.jpg
$84.99
6 reviews
3 stars
iphone gamepad
img/pic_0001_160243060888837960_1c3bcd26f5fe.jpg
$94.99
18 reviews
4 stars
Best Bed
img/pic_0002_556261037783915561_bf22b24b9e4e.jpg
$214.5
18 reviews
4 stars
iWatch
img/pic_0011_1032030741401174813_4e43d182fce7.jpg
$500
35 reviews
4 stars
Park tickets
img/pic_0010_1027323963916688311_09cc2d7648d9.jpg
$15.5
8 reviews
4 stars
Process finished with exit code 0
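Summary: I learned the find and find_all functions in BeautifulSoup, and they are very handy. After using find_all to get the HTML of specific blocks, you can loop over the results with for ... in and call find_all again on each block to search inside its child elements.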
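The nested find_all pattern from the summary can be sketched on a small inline sample. This is only an illustration under some assumptions: the SAMPLE_HTML markup, the "thumbnail" class on each product block, and the parse_products helper are all hypothetical and not taken from the homework page, and the built-in "html.parser" is used here instead of lxml so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modeled on the product page above.
SAMPLE_HTML = """
<div class="thumbnail">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">EarPod</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">65 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
    </p>
  </div>
</div>
<div class="thumbnail">
  <div class="caption">
    <h4 class="pull-right">$64.99</h4>
    <h4><a href="#">New Pocket</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">12 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
    </p>
  </div>
</div>
"""


def parse_products(html):
    """Hypothetical helper: find each product block, then search inside it."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # Outer find_all: one Tag per product block
    for block in soup.find_all("div", class_="thumbnail"):
        name = block.find("div", class_="caption").find("a").get_text(strip=True)
        price = block.find("h4", class_="pull-right").get_text(strip=True)
        # Inner find_all: searches only this block's descendants.
        # class_="glyphicon-star" matches that exact class token, so
        # "glyphicon-star-empty" spans are not counted.
        stars = len(block.find_all("span", class_="glyphicon-star"))
        products.append({"name": name, "price": price, "stars": stars})
    return products


if __name__ == "__main__":
    for product in parse_products(SAMPLE_HTML):
        print(product["name"], product["price"], product["stars"], "stars")
```

Because every field is looked up inside its own block, a product with a missing element cannot shift the other fields out of alignment, which can happen when separate lists are combined with zip.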