Using threads in Python and storing crawl results in MongoDB

fenjincheng

Last modified: 2022-05-11 14:44:32

License: CC 4.0 BY-SA

Categories: Basics, Crawlers. Tags: mongodb, database, python

First published: 2022-05-11 14:41:49

Article link: https://blog.youkuaiyun.com/yunerchen/article/details/124709622

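This article shows how to use threads in Python and store 300 crawled records in MongoDB. With the MongoDB database configured correctly, the program runs normally, and the results can be verified in the database. Reader feedback and suggestions are welcome and will help improve later articles.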

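Note: before running this program, make sure the MongoDB database is configured correctly. The code connects to a server on localhost and writes to the result collection of the price database.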
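The note does not say what a correct configuration looks like. As a minimal sketch, assuming a MongoDB server on localhost at the default port 27017, a quick reachability check before starting the crawl might look like the following. The helper names build_mongo_uri and check_connection are hypothetical and are not part of the original program.

```python
def build_mongo_uri(host="localhost", port=27017):
    """Build a plain MongoDB connection URI (no authentication assumed)."""
    return f"mongodb://{host}:{port}/"


def check_connection(uri, timeout_ms=2000):
    """Return True if a MongoDB server answers a ping at the given URI."""
    # Imported lazily so the URI helper works even without pymongo installed.
    import pymongo
    from pymongo.errors import PyMongoError

    client = pymongo.MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        # "ping" is a cheap command that forces a round trip to the server.
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()


if __name__ == "__main__":
    uri = build_mongo_uri()
    print(uri, "reachable" if check_connection(uri) else "not reachable")
```

If the check reports that the server is not reachable, the crawler would most likely fail on its first insert, so it is worth running this first.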

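# coding: utf-8
import logging
import threading
import time

import pymongo
import requests
from bs4 import BeautifulSoup


class Spider:
    def __init__(self):
        # Connect to a local MongoDB server; results go to price.result
        client = pymongo.MongoClient(host='localhost')
        db = client.price
        self.collection = db.result

    def get_all_pages(self):
        # Visit list pages 1 through 10
        base_url = "http://www.100ppi.com/mprice/plist-1-505-"
        for i in range(1, 11):
            # Fixed: the original '.\html' put a stray backslash in the URL
            new_url = base_url + str(i) + '.html'
            self.getPage(new_url)

    def getPage(self, url):
        r = requests.get(url)
        r.encoding = "utf-8"
        soup = BeautifulSoup(r.text, 'lxml')
        # Drop the first row (presumably the table header)
        s1 = soup.select('.lp-table tr')
        s2 = s1[1:]
        for tr in s2:
            result = {}
            # '商品名称' means "product name"
            result['商品名称'] = tr.select('.p-name a')[0].text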
            result['规格']
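The listing breaks off here, before any thread is started or any record is written to MongoDB. As a rough sketch of how the threaded part could be organized (not the author's actual code), the function below fetches each page in its own thread and hands the parsed rows to a storage callback. The names crawl_pages, fetch_rows and store_rows are hypothetical. In a real run, store_rows could be the collection's insert_many method, since pymongo's MongoClient is safe to share between threads.

```python
import threading


def crawl_pages(urls, fetch_rows, store_rows, max_threads=5):
    """Fetch every URL in its own thread, at most max_threads at a time.

    fetch_rows(url) must return a list of dicts for that page.
    store_rows(rows) must persist them, e.g. collection.insert_many.
    Returns a list of (url, exception) pairs for pages that failed.
    """
    limiter = threading.Semaphore(max_threads)
    errors = []
    errors_lock = threading.Lock()

    def worker(url):
        with limiter:
            try:
                rows = fetch_rows(url)
                if rows:  # insert_many rejects an empty list
                    store_rows(rows)
            except Exception as exc:
                with errors_lock:
                    errors.append((url, exc))

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every page has been handled
    return errors


if __name__ == "__main__":
    # Stand-in callbacks so the sketch runs without network or database.
    stored = []
    stored_lock = threading.Lock()

    def fake_fetch(url):
        return [{"page": url, "row": n} for n in range(3)]

    def fake_store(rows):
        with stored_lock:
            stored.extend(rows)

    failed = crawl_pages(["page-1", "page-2"], fake_fetch, fake_store)
    print(len(stored), "rows stored,", len(failed), "pages failed")
```

The semaphore caps how many requests are in flight at once, which keeps the load on the target site modest. Collecting errors instead of raising lets the other pages finish even if one request fails.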