Scraping People's Daily Online (人民网) News with a Python Crawler

水0

Last modified 2022-01-25 00:43:56

Article tags: web crawler, Python

First published 2022-01-25 00:42:55

Article link: https://blog.youkuaiyun.com/m0_62609328/article/details/122677868

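This article presents a simple Python crawler that scrapes news from People's Daily Online (人民网) and lets the user browse articles and save the ones they want to keep. Pages are parsed with BeautifulSoup; the article text is extracted from the parsed page and written to a local file.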
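The extraction step can be sketched roughly as follows. This is a minimal illustration, not the author's code: `extract_paragraphs` and `SAMPLE_HTML` are hypothetical names, the sample markup is invented, and the only detail taken from the article is the container selector `div` with class `rm_txt_con cf`. It runs on an inline HTML string, so no network access is needed.

```python
from bs4 import BeautifulSoup

# Invented sample markup; only the container class comes from the article.
SAMPLE_HTML = """
<html><body>
<div class="rm_txt_con cf">
<p>First paragraph.</p>
<p>Second <b>bold</b> paragraph.</p>
<p>   </p>
</div>
</body></html>
"""

def extract_paragraphs(html):
    """Return the non-empty paragraph texts inside the article container."""
    soup = BeautifulSoup(html, "html.parser")
    # A multi-class string matches the exact value of the class attribute.
    container = soup.find("div", class_="rm_txt_con cf")
    if container is None:
        return []
    paragraphs = []
    for p in container.find_all("p"):
        # get_text() joins nested text, whereas .string is None
        # as soon as a tag has more than one child.
        text = p.get_text().strip()
        if text:
            paragraphs.append(text)
    return paragraphs

if __name__ == "__main__":
    for line in extract_paragraphs(SAMPLE_HTML):
        print(line)
```

Using `get_text()` instead of `.string` is a design choice of this sketch: paragraphs that contain inline tags such as `<b>` or `<span>` would otherwise be skipped.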

import requests  # HTTP requests
import bs4  # HTML parsing
import re  # regular expressions
import os  # running cmd (shell) commands
import time

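def strcmp(str1,str2):
    # Return 1 if either string matches the start of the other
    # (spaces are stripped before comparing), otherwise 0.
    if  str2:  # skip the comparison when the scraped data is empty
        if str1[:len(str2)].replace(' ','') == str2.replace(' ','') :
            return 1
        elif str2[:len(str1)].replace(' ','') == str1.replace(' ',''):
            return 1
    return 0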

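def print_article(soup):
    # Print the article body; return 1 if a known layout was found, else 0.
    # Layout 1: the text sits inside <div class="rm_txt_con cf">.
    content = soup.find('div',class_='rm_txt_con cf')
    if content:
        for each in content:
            if each and (each.string is not None):
                print(each.string)
        return 1
    # Layout 2: the text sits in indented <p> tags.
    content = soup.find_all('p', style='text-indent: 2em;')
    if content:
        for each in content:
            if each and (each.string is not None):
                print(each.string)
            # Fall back to a nested <span> when the <p> itself has no single string.
            elif each.span and (each.span.string is not None):
                print(each.span.string)
        return 1
    return 0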

def save_news(soup,newsname):  # save the article to a local file
    save = input("Save this article?\nEnter anything other than 0 to save it\nEnter 0 to skip: ")
    if save == '0':
        return 0
    # File name: current date plus the title, with 《, 》 and " removed.
    path = 'E:/py爬虫/news/' + time.strftime("%Y%m%d") + newsname.replace('《','').replace('》','').replace('"','') + '.txt'
    file = open(path,'w',encoding='utf-8')
    content = soup.find('div', class_='rm_txt_con cf')
    if content:
        print('Article saved')
        for each in content:
            if each and (each.string is not None):
                file.write(each.string)
        file.close()
        return 1
    content = soup.find_all('p', style='text-indent: 2em;')
    if content:
        prin
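The listing breaks off inside `save_news`. As an illustration of the saving step only, here is a minimal sketch under stated assumptions: `build_news_path` and `save_paragraphs` are hypothetical helpers that do not appear in the original, the target directory is passed in rather than hard-coded, and the file is written inside a `with` block so it is closed even when no article text is found.

```python
import os
import re
import time

def build_news_path(base_dir, newsname):
    """Hypothetical helper: date prefix plus the title, minus unsafe characters."""
    # Drop characters Windows forbids in file names, plus the 《 》 title marks.
    safe_name = re.sub(r'[\\/:*?"<>|《》]', '', newsname).strip()
    return os.path.join(base_dir, time.strftime("%Y%m%d") + safe_name + ".txt")

def save_paragraphs(paragraphs, base_dir, newsname):
    """Write one paragraph per line and return the path of the saved file."""
    os.makedirs(base_dir, exist_ok=True)  # create the folder if it is missing
    path = build_news_path(base_dir, newsname)
    # The context manager closes the file on every exit path.
    with open(path, "w", encoding="utf-8") as file:
        for text in paragraphs:
            file.write(text + "\n")
    return path
```

A call such as `save_paragraphs(["line one", "line two"], "news", "《Some title》")` would write the two lines to a dated `.txt` file under a `news` folder and return its path.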
