How to clear the marks left by Find's Mark All

This article describes two ways to clear the marks left by Find's Mark All: press the shortcut Ctrl+Shift+F2, or close the file and reopen it.

Method 1:
       Use the keyboard shortcut
       Ctrl+Shift+F2

Method 2:
     Close the file, then reopen it.
