OBJECTID Handling in 优快云

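This post presents a JavaScript snippet used on 优快云 blog pages to handle OBJECTIDs. Given an element's ID, it retrieves the DOM element and can show or hide it. It is useful wherever specific page elements need to be manipulated, for example hiding the comment form.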
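Showing and hiding by ID comes down to writing the element's inline `style.display`. The sketch below illustrates that technique in modern JavaScript. `setVisible` is a hypothetical helper, not part of the original page script, and it takes the document as a parameter. That is an assumption made so it can also run outside a browser against a stand-in object.

```javascript
// Hypothetical helper (not from the original page script): show or hide the
// element with the given ID. `doc` is any object with getElementById,
// normally the browser's `document`.
function setVisible(doc, id, visible) {
  const el = doc.getElementById(id);
  if (!el || !el.style) {
    return false; // element not found: nothing to change
  }
  // "none" removes the element from layout; "" clears the inline value so
  // the stylesheet's (or the browser's default) display value applies again.
  el.style.display = visible ? "" : "none";
  return true;
}

// Browser usage:
// setVisible(document, "commentForm", false);
```

Clearing the value with `""` rather than forcing `"block"` keeps whatever display type (block, inline, flex) the stylesheet assigns to the element.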


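    <script type="text/javascript" language="javascript">
    <!--
        // Look up a DOM element by its ID. If the standard
        // document.getElementById exists, its result (possibly null) is
        // returned; otherwise the legacy IE-only document.all collection
        // is tried, and false is returned as a last resort.
        function getObject(objectId) {
            if (document.getElementById) {
                return document.getElementById(objectId);
            } else if (document.all && document.all(objectId)) {
                return document.all(objectId);
            } else {
                return false;
            }
        }

        // Hide an element by setting its inline display to "none".
        // Returns true on success, false if no styled element was found.
        function hideObject(objectId) {
            var obj = getObject(objectId);
            if (obj && obj.style) {
                obj.style.display = "none";
                return true;
            }
            return false;
        }

        // Show an element by clearing its inline display value, so the
        // stylesheet's (or the browser's default) display applies again.
        // Returns true on success, false if no styled element was found.
        function showObject(objectId) {
            var obj = getObject(objectId);
            if (obj && obj.style) {
                obj.style.display = "";
                return true;
            }
            return false;
        }

        // Hide the comment form. Both ID spellings are tried, presumably
        // because the page template may use either; ID lookup is case-sensitive.
        function hideComment() {
            hideObject("commentForm");
            hideObject("commentform");
        }

        // Empty stub: currently does nothing.
        function showComment() {
            return;
        }
    //-->
    </script>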
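`hideComment` hides the comment form under two ID spellings, while `showComment` is an empty stub, so the form can be hidden but not brought back. A minimal sketch of a symmetric pair, assuming a hypothetical `setCommentFormVisible(doc, visible)` helper and a `COMMENT_FORM_IDS` list (neither exists in the original). The document is passed in so the function can be tested with a stand-in object.

```javascript
// Hypothetical helpers (not from the original page): hide or show the comment
// form under both ID spellings used by hideComment. getElementById matches IDs
// case-sensitively, so "commentForm" and "commentform" are separate lookups.
const COMMENT_FORM_IDS = ["commentForm", "commentform"];

// Returns how many of the IDs matched a styled element whose display changed.
function setCommentFormVisible(doc, visible) {
  let changed = 0;
  for (const id of COMMENT_FORM_IDS) {
    const el = doc.getElementById(id);
    if (el && el.style) {
      el.style.display = visible ? "" : "none";
      changed += 1;
    }
  }
  return changed;
}

// Browser usage:
// setCommentFormVisible(document, false); // same effect as hideComment()
// setCommentFormVisible(document, true);  // a working counterpart to showComment()
```

Returning a count rather than a boolean lets the caller tell whether the page contained either form at all; a result of 0 means neither ID matched.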
