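If you want to produce AI-generated novel promotion videos, this program takes care of the tedious step of turning novel content into copy: it batch-generates promotional copy from the official site's ranking list.
For anyone learning Python, a web crawler makes a good hobby project or first practical program, because it is easy to get started with and gives quick feedback. Today we will learn how to scrape the content of the books on a novel site's popular ranking list and save it.
1. Scraping the popular novels on the ranking list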
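First, install the following libraries. The os module is also used, but it ships with Python's standard library and needs no installation.
pip install requests
pip install parsel
pip install openai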
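Next, find the ranking list you want to scrape.
Copy its URL: 小说排行榜_番茄小说官网 (Novel Rankings, Fanqie Novel official site)
Then analyze the page source step by step and extract the key information.
Here we use the CSS selectors provided by the parsel library to locate the information we want. Anyone who has learned CSS will know that an element can be identified by its id, class name, and so on.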
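url = 'https://fanqienovel.com/rank/1_2_261'  # the ranking page to scrape
hotlist = requests.get(url=url, headers=headers).text  # fetch the page content (headers is defined in the complete listing)
selector = parsel.Selector(hotlist)  # parse the content with parsel
link = selector.css('.book-item-text .title a::attr(href)').getall()  # locate the book links with a CSS selector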
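Because of Fanqie's anti-crawling mechanism, only the top 10 books can be retrieved this way (the list has 30 in total). As a next step, you can inspect the list request the page makes to find where the links for the remaining books are located; this will probably require parsing the JSON response.
Next, we loop over each book's link and save the content chapter by chapter.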
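As a rough illustration of that JSON step, the sketch below pulls book links out of a JSON body. The payload shape (`data.book_list` entries with a `bookId` field), the `/page/<id>` link pattern, and the helper name `extract_book_links` are all assumptions made for this example; check the real response in your browser's developer tools and adjust the keys to match.

```python
import json

# Hypothetical response body. The real key names must be read from the
# actual request shown in the browser's developer tools.
sample_body = '''
{"data": {"book_list": [
    {"bookId": "1001", "bookName": "Book A"},
    {"bookId": "1002", "bookName": "Book B"}
]}}
'''

def extract_book_links(body, base="https://fanqienovel.com/page/"):
    """Return one link per book found in the (assumed) JSON structure."""
    payload = json.loads(body)
    books = payload.get("data", {}).get("book_list", [])
    # Skip entries without an id instead of raising KeyError.
    return [base + str(b["bookId"]) for b in books if "bookId" in b]

links = extract_book_links(sample_body)
print(links)
```

Using `.get` with defaults means a response with an unexpected shape yields an empty list rather than an exception, which is easier to handle inside a crawl loop.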
# Join the link and fetch the book's chapter index page
for i in link:
    url_content = 'https://fanqienovel.com' + i
    content = requests.get(url=url_content, headers=headers).text
    con_selector = parsel.Selector(content)
    book_title = con_selector.css('.info-name h1::text').get()  # book title
    content_list = con_selector.css('.chapter-item-title::attr(href)').getall()  # chapter links
    # Create a directory named after the book
    path = r'F:\recommend_novel' + '\\' + book_title
    os.makedirs(path, exist_ok=True)
    # Join the link and fetch the chapter content
    content_list = content_list[1:]  # skip the first entry (like pop(0), but safe on an empty list)
    for j in range(min(10, len(content_list))):  # first 10 chapters, or fewer if the book is shorter
        url_book = 'https://fanqienovel.com' + content_list[j]
        context = requests.get(url=url_book, headers=headers).text
        text_selector = parsel.Selector(context)
        page_title = text_selector.css('.muye-reader-title::text').get()
        novel_texe = text_selector.css('.muye-reader-content p::text').getall()
        file_name = path + '\\' + page_title + '.txt'
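Book and chapter titles can contain characters that Windows does not allow in file names (such as `?` or `:`), and a selector that matches nothing returns `None`; either case would break the path built above. Here is a minimal sketch of one way to guard against both. `safe_name` and `chapter_path` are hypothetical helpers, not part of the original script.

```python
import re
from pathlib import Path

# Characters Windows rejects in file names.
_ILLEGAL = re.compile(r'[\\/:*?"<>|]')

def safe_name(title, fallback="untitled"):
    """Replace illegal characters and fall back when the title is missing."""
    if not title:
        return fallback
    cleaned = _ILLEGAL.sub("_", title).strip()
    return cleaned or fallback

def chapter_path(root, book_title, page_title):
    """Build <root>/<book>/<chapter>.txt from sanitized parts."""
    return Path(root) / safe_name(book_title) / (safe_name(page_title) + ".txt")

print(chapter_path("recommend_novel", "My Book", "Chapter 1: Why?"))
```

`pathlib.Path` also picks the right separator for the current operating system, so the same code would work outside Windows.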
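Again because of Fanqie's anti-crawling mechanism, the text is rendered with an obfuscated font, so some characters in the scraped chapters come out garbled. The fix used here follows the blog post "Python爬虫--爬取文字加密的番茄小说" by RChow on 博客园 (cnblogs).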
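The idea behind the fix: the page substitutes common characters with code points that only the site's custom font can draw, so mapping each code point back to the real character restores the text. Here is a small sketch of that idea using `str.translate`, with four entries copied from the decoding dictionary in the complete listing. `sample_map` and `decode_text` are names chosen for this example only.

```python
# A few entries copied from the full mapping: code point -> real character.
sample_map = {
    '58657': '我',
    '58398': '是',
    '58611': '的',
    '58590': '一',
}

# str.translate expects integer code points as keys.
translate_table = {int(code): char for code, char in sample_map.items()}

def decode_text(text):
    """Swap obfuscated code points for real characters; leave the rest as-is."""
    return text.translate(translate_table)

obfuscated = chr(58657) + chr(58398) + 'X'
print(decode_text(obfuscated))
```

This does the same job as the character-by-character `try`/`except KeyError` loop in the full listing; characters without an entry in the table pass through unchanged.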
Complete crawler code
import requests
import parsel
import os
# Decoding dictionary: obfuscated code point -> real character
dict_data = {
'58670': '0',
'58413': '1',
'58678': '2',
'58371': '3',
'58353': '4',
'58480': '5',
'58359': '6',
'58449': '7',
'58540': '8',
'58692': '9',
'58712': 'a',
'58542': 'b',
'58575': 'c',
'58626': 'd',
'58691': 'e',
'58561': 'f',
'58362': 'g',
'58619': 'h',
'58430': 'i',
'58531': 'j',
'58588': 'k',
'58440': 'l',
'58681': 'm',
'58631': 'n',
'58376': 'o',
'58429': 'p',
'58555': 'q',
'58498': 'r',
'58518': 's',
'58453': 't',
'58397': 'u',
'58356': 'v',
'58435': 'w',
'58514': 'x',
'58482': 'y',
'58529': 'z',
'58515': 'A',
'58688': 'B',
'58709': 'C',
'58344': 'D',
'58656': 'E',
'58381': 'F',
'58576': 'G',
'58516': 'H',
'58463': 'I',
'58649': 'J',
'58571': 'K',
'58558': 'L',
'58433': 'M',
'58517': 'N',
'58387': 'O',
'58687': 'P',
'58537': 'Q',
'58541': 'R',
'58458': 'S',
'58390': 'T',
'58466': 'U',
'58386': 'V',
'58697': 'W',
'58519': 'X',
'58511': 'Y',
'58634': 'Z',
'58611': '的',
'58590': '一',
'58398': '是',
'58422': '了',
'58657': '我',
'58666': '不',
'58562': '人',
'58345': '在',
'58510': '他',
'58496': '有',
'58654': '这',
'58441': '个',
'58493': '上',
'58714': '们',
'58618': '来',
'58528': '到',
'58620': '时',
'58403': '大',
'58461': '地',
'58481': '为',
'58700': '子',
'58708': '中',
'58503': '你',
'58442': '说',
'58639': '生',
'58506': '国',
'58663': '年',
'58436': '着',
'58563': '就',
'58391': '那',
'58357': '和',
'58354': '要',
'58695': '她',
'58372': '出',
'58696': '也',
'58551': '得',
'58445': '里',
'58408': '后',
'58599': '自',
'58424': '以',
'58394': '会',
'58348': '家',
'58426': '可',
'58673': '下',
'58417': '而',
'58556': '过',
'58603': '天',
'58565': '去',
'58604': '能',
'58522': '对',
'58632': '小',
'58622': '多',
'58350': '然',
'58605': '于',
'58617': '心',
'58401': '学',
'58637': '么',
'58684': '之',
'58382': '都',
'58464': '好',
'58487': '看',
'58693': '起',
'58608': '发',
'58392': '当',
'58474': '没',
'58601': '成',
'58355': '只',
'58573': '如',
'58499': '事',
'58469': '把',
'58361': '还',
'58698': '用',
'58489': '第',
'58711': '样',
'58457': '道',
'58635': '想',
'58492': '作',
'58647': '种',
'58623': '开',
'58521': '美',
'58609': '总',
'58530': '从',
'58665': '无',
'58652': '情',
'58676': '己',
'58456': '面',
'58581': '最',
'58509': '女',
'58488': '但',
'58363': '现',
'58685': '前',
'58396': '些',
'58523': '所',
'58471': '同',
'58485': '日',
'58613': '手',
'58533': '又',
'58589': '行',
'58527': '意',
'58593': '动',
'58699': '方',
'58707': '期',
'58414': '它',
'58596': '头',
'58570': '经',
'58660': '长',
'58364': '儿',
'58526': '回',
'58501': '位',
'58638': '分',
'58404': '爱',
'58677': '老',
'58535': '因',
'58629': '很',
'58577': '给',
'58606': '名',
'58497': '法',
'58662': '间',
'58479': '斯',
'58532': '知',
'58380': '世',
'58385': '什',
'58405': '两',
'58644': '次',
'58578': '使',
'58505': '身',
'58564': '者',
'58412': '被',
'58686': '高',
'58624': '已',
'58667': '亲',
'58607': '其',
'58616': '进',
'58368': '此',
'58427': '话',
'58423': '常',
'58633': '与',
'58525': '活',
'58543': '正',
'58418': '感',
'58597': '见',
'58683': '明',
'58507': '问',
'58621': '力',
'58703': '理',
'58438': '尔',
'58536': '点',
'58384': '文',
'58484': '几',
'58539': '定',
'58554': '本',
'58421': '公',
'58347': '特',
'58569': '做',
'58710': '外',
'58574': '孩',
'58375': '相',
'58645': '西',
'58592': '果',
'58572': '走',
'58388': '将',
'58370': '月',
'58399': '十',
'58651': '实',
'58546': '向',
'58504': '声',
'58419': '车',
'58407': '全',
'58672': '信',
'58675': '重',
'58538': '三',
'58465': '机',
'58374': '工',
'58579': '物',
'58402': '气',
'58702': '每',
'58553': '并',
'58360': '别',
'58389': '真',
'58560': '打',
'58690': '太',
'58473': '新',
'58512': '比',
'58653': '才',
'58704': '便',
'58545': '夫',
'58641': '再',
'58475': '书',
'58583': '部',
'58472': '水',
'58478': '像',
'58664': '眼',
'58586': '等',
'58568': '体',
'58674': '却',
'58490': '加',
'58476': '电',
'58346': '主',
'58630': '界',
'58595': '门',
'58502': '利',
'58713': '海',
'58587': '受',
'58548': '听',
'58351': '表',
'58547': '德',
'58443': '少',
'58460': '克',
'58636': '代',
'58585': '员',
'58625': '许',
'58694': '稜',
'58428': '先',
'58640': '口',
'58628': '由',
'58612': '死',
'58446': '安',
'58468': '写',
'58410': '性',
'58508': '马',
'58594': '光',
'58483': '白',
'58544': '或',
'58495': '住',
'58450': '难',
'58643': '望',
'58486': '教',
'58406': '命',
'58447': '花',
'58669': '结',
'58415': '乐',
'58444': '色',
'58549': '更',
'58494': '拉',
'58409': '东',
'58658': '神',
'58557': '记',
'58602': '处',
'58559': '让',
'58610': '母',
'58513': '父',
'58500': '应',
'58378': '直',
'58680': '字',
'58352': '场',
'58383': '平',
'58454': '报',
'58671': '友',
'58668': '关',
'58452': '放',
'58627': '至',
'58400': '张',
'58455': '认',
'58416': '接',
'58552': '告',
'58614': '入',
'58582': '笑',
'58534': '内',
'58701': '英',
'58349': '军',
'58491': '候',
'58467': '民',
'58365': '岁',
'58598': '往',
'58425': '何',
'58462': '度',
'58420': '山',
'58661': '觉',
'58615': '路',
'58648': '带',
'58470': '万',
'58377': '男',
'58520': '边',
'58646': '风',
'58600': '解',
'58431': '叫',
'58715': '任',
'58524': '金',
'58439': '快',
'58566': '原',
'58477': '吃',
'58642': '妈',
'58437': '变',
'58411': '通',
'58451': '师',
'58395': '立',
'58369': '象',
'58706': '数',
'58705': '四',
'58379': '失',
'58567': '满',
'58373': '战',
'58448': '远',
'58659': '格',
'58434': '士',
'58679': '音',
'58432': '轻',
'58689': '目',
'58591': '条',
'58682': '呢'
}
# Ranking page: get the links of the books on the list
headers = {
'Cookie':"x-web-secsdk-uid=b1bc4892-fdbb-4add-9155-bce7d899dbd6; Hm_lvt_2667d29c8e792e6fa9182c20a3013175=1756685113; HMACCOUNT=35DE050AC5E72E8F; s_v_web_id=verify_mf0cylb2_zVItNMNx_7oOY_4WE4_8cKF_kH7yhEzyE43B; csrf_session_id=1551ec55d182bf1036912fdf3500bef6; novel_web_id=7544905032774665778; Hm_lpvt_2667d29c8e792e6fa9182c20a3013175=1756686278; ttwid=1%7CJ7VBrD5LHXa6Ui1qXBIHi4fCf-CeGe-TQ5YIT-v8MfQ%7C1756686274%7C0bc91977c931352ecc751657e7a4ea5eb2d1936701435f13c9f2f8d29d803379",
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}
url = 'https://fanqienovel.com/rank/1_2_261'
hotlist = requests.get(url=url,headers=headers).text
selector = parsel.Selector(hotlist)
link = selector.css('.book-item-text .title a::attr(href)').getall()
# Join the link and fetch the book's chapter index page
for i in link:
    url_content = 'https://fanqienovel.com' + i
    content = requests.get(url=url_content, headers=headers).text
    con_selector = parsel.Selector(content)
    book_title = con_selector.css('.info-name h1::text').get()  # book title
    content_list = con_selector.css('.chapter-item-title::attr(href)').getall()  # chapter links
    if not book_title:
        continue  # the selector matched nothing, so skip this book
    # Create a directory named after the book
    path = r'F:\recommend_novel' + '\\' + book_title
    os.makedirs(path, exist_ok=True)
    # Join the link and fetch the chapter content
    content_list = content_list[1:]  # skip the first entry (like pop(0), but safe on an empty list)
    for j in range(min(10, len(content_list))):  # first 10 chapters, or fewer if the book is shorter
        url_book = 'https://fanqienovel.com' + content_list[j]
        context = requests.get(url=url_book, headers=headers).text
        text_selector = parsel.Selector(context)
        page_title = text_selector.css('.muye-reader-title::text').get()
        novel_texe = text_selector.css('.muye-reader-content p::text').getall()
        file_name = path + '\\' + page_title + '.txt'
        # Decode the obfuscated font and restore the text
        novel_content = ''
        for paragraph in novel_texe:
            for index in paragraph:
                try:
                    word = dict_data[str(ord(index))]
                except KeyError:
                    word = index  # not in the table, so keep the character as-is
                novel_content += word
            novel_content += '\n'
        with open(file_name, 'a', encoding='utf-8') as f:
            f.write(novel_content)
        print(file_name + " finished")
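2. Calling a large-model API
The site recommended here for model access is OpenRouter.
It offers a wide selection of models, including many free ones and many from outside China, all callable through the same interface. It also provides API call examples, which makes it easy to work with.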
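One practical note on the API key: hard-coding it in a script makes it easy to leak when the code is shared. Here is a minimal sketch that reads it from an environment variable instead. The variable name `OPENROUTER_API_KEY` and the helper `load_api_key` are assumptions chosen for this example.

```python
import os

def load_api_key(var_name="OPENROUTER_API_KEY"):
    """Return the key from the environment, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key

# Demo only: a placeholder value so the example runs on its own.
os.environ.setdefault("OPENROUTER_API_KEY", "sk-placeholder")
print(len(load_api_key()) > 0)
```

The returned value would then be passed as `api_key=` when constructing the `OpenAI` client.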
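from openai import OpenAI
import os

# System prompt, kept in Chinese because the source novels are Chinese. In English:
# "You are a content creator who makes novel promotion videos. Rewrite this content into
# copy better suited to a promo video: open with a hook strong enough to grab viewers,
# keep the whole piece fluent, retell the story in the protagonist's first person, make
# the plot twists and conflicts more pronounced, fit the length of a 3-5 minute video,
# and return a single block of copy only."
tip = '你是一名作小说推文的自媒体作者,我需要你把这些内容改成更适合推文视频的文案,开头要有爆点,足够吸引观众,整篇文案通顺,换成主角第一人称叙述故事,情节跌宕冲突明显一些,文案要求符合时长达到3-5分钟视频的字数,只需要一整段文案'
path = r"F:\recommend_novel"
output_name = "推文文案.txt"  # output file name ("promo copy")

def get_model_response(tip, prompt):
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="****************",  # replace with your own OpenRouter API key
    )
    completion = client.chat.completions.create(
        extra_body={},
        model="google/gemini-2.5-flash-image-preview:free",
        messages=[
            {
                "role": "system",
                "content": tip
            },
            {
                "role": "user",
                "content": prompt
            }
        ]
    )
    return completion.choices[0].message.content

# Walk every book directory, merge its chapter files into one prompt, and save the generated copy
for dirpath, dirname, filename in os.walk(path):
    if len(filename) > 0:
        prompt = ''
        for i in filename:
            if i == output_name:
                continue  # do not feed previously generated copy back into the prompt
            filepath = dirpath + '\\' + i
            with open(filepath, 'r', encoding='utf-8') as file:
                prompt += file.read()
        if not prompt:
            continue  # nothing to send for this directory
        essay = get_model_response(tip, prompt)
        essay_path = dirpath + '\\' + output_name
        with open(essay_path, 'a', encoding='utf-8') as essay_file:
            essay_file.write(essay)
        print("Finished " + essay_path)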
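Ten chapters joined together can exceed what a model accepts in a single request. Here is a rough sketch of capping the prompt by character count before sending it. The 20,000-character default is an arbitrary assumption (real limits are counted in tokens and vary by model), and `build_prompt` is a hypothetical helper.

```python
def build_prompt(chapters, max_chars=20000):
    """Join chapter texts in order until the character budget is used up.

    The budget counts chapter text only; the newline separators between
    chapters are not included in it.
    """
    parts = []
    used = 0
    for text in chapters:
        remaining = max_chars - used
        if remaining <= 0:
            break
        piece = text[:remaining]  # cut the last chapter if it does not fit
        parts.append(piece)
        used += len(piece)
    return "\n".join(parts)

demo = build_prompt(["a" * 10, "b" * 10, "c" * 10], max_chars=25)
print(len(demo))
```

Cutting at a fixed length can split a sentence in the last chapter; for promo copy based on the opening chapters, that trade-off may be acceptable.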




