Detailed Guide to Using the Tencent Video Comment Scraping Script

=========================================================================

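Important: this scraping tool may be used only for lawful, compliant data collection.

Legal compliance

Before using this tool, you must make sure that your data collection complies with applicable laws and regulations, including the Cybersecurity Law of the People's Republic of China, the Data Security Law, and the Personal Information Protection Law.
You are responsible for understanding and following the target site's terms of service, its robots.txt, and any access restrictions.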

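Prohibited uses

Do not use this tool to:

Scrape personal or private information without authorization
Infringe others' intellectual property or trade secrets
Disrupt a website's normal operation or overload its servers
Circumvent a website's technical protection measures
Serve any illegal purpose or commercial-competition activity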

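User responsibility

You bear full legal responsibility for everything you do with this tool.
Assess the sensitivity of the data before scraping and, where necessary, obtain explicit consent from the data subjects.
Set a reasonable request rate so that you do not burden the target server.

Disclaimer

The developer provides this tool solely for technical research and learning and accepts no responsibility for how users use it or for the consequences. Any legal dispute arising from use of this tool is the user's own responsibility.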

=========================================================================

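I. Install the Tampermonkey Browser Extension

1.1 What is Tampermonkey

Tampermonkey is a popular browser extension that lets you install and run custom JavaScript ("userscripts") that change how web pages behave or add features to them. This guide uses it to run the comment scraping script.

1.2 Installation steps (using Chrome as the example)

Open Chrome
Go to the Chrome Web Store: https://chrome.google.com/webstore/category/extensions
Type "Tampermonkey" into the search box
Find the official Tampermonkey extension (developer: Jan Biniok) and click "Add to Chrome"
Click "Add extension" in the confirmation dialog
When installation finishes, the Tampermonkey icon (a monkey) appears in the top-right corner of the browser

Other browsers:

Microsoft Edge: open the Microsoft Edge Add-ons store, search for "Tampermonkey", and install it
Firefox: open the Firefox Add-ons page, search for "Tampermonkey", and install it
Safari: download Tampermonkey from the App Store

1.3 Confirm that Tampermonkey is installed correctly

Click the Tampermonkey icon (the monkey) in the top-right corner of the browser
If you see menu entries such as "Create a new script" (创建新脚本) and "Installed userscripts" (已安装的用户脚本), the installation succeeded
On first use you may need to click the icon and choose "Allow this extension to run on all sites" (允许此扩展程序在所有网站上运行)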

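II. Install the Comment Scraping Script

2.1 Create a new script

Click the Tampermonkey icon in the top-right corner of the browser
Choose "Create a new script" (创建新脚本) from the drop-down menu
This opens the Tampermonkey editor, which contains a default script template

2.2 Replace the script contents

Select all of the default code in the editor (or press Ctrl+A)
Press Delete to remove it
Copy the complete script below (from "// ==UserScript==" through "})();")
Paste the copied code into the editor (Ctrl+V)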

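// ==UserScript==
// @name         Tencent Video 评论长期抓取（自动滚动、展开跟评、每500条分卷保存）
// @namespace    http://tampermonkey.net/
// @version      1.0
// @description  Scrolls the Tencent Video comment section like a human reader to load more comments, expands replies, captures every comment (replies included), and writes one txt file per 500 comments. Supports resuming via localStorage. Control it from the console with window.runTencentScraper("start" | "stop" | "status" | "reset" | "export").
// @match        *://v.qq.com/*
// @grant        none
// @run-at       document-idle
// ==/UserScript==
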
(function () {
  'use strict';
  if (window.__tencent_scraper_installed_v1) return;
  window.__tencent_scraper_installed_v1 = true;

  /**
   * CONFIG (edit these values at the top of the script)
   */
  const CONFIG = {
    SAVE_BATCH: 500,                // comments per output file (top-level comments and replies each count as one)
    SMALL_DELAY_MS: [800, 2200],    // range of the short random delay between actions (ms)
    LARGE_REST_MS: 15000,           // long rest, about 15 s
    LARGE_REST_EVERY_ACTIONS: [5, 12], // take a long rest after a random number of actions in this range (min, max)
    MAX_SCROLL_STEPS: 40,           // number of steps one "scroll to bottom" is split into
    EXTRACT_INTERVAL_MS: 1200,      // short wait before each DOM extraction (ms)
    KEY_PREFIX: 'tencent_scraper_v1_', // localStorage key prefix
    AUTO_START: false,              // start automatically after injection (default false, so you start it manually)
  };

  // Helpers: sleep + random numbers
  function sleep(ms) { return new Promise(res => setTimeout(res, ms)); }
  function randInt(min, max) { return Math.floor(Math.random() * (max - min + 1)) + min; }
  function humanDelay() { return randInt(CONFIG.SMALL_DELAY_MS[0], CONFIG.SMALL_DELAY_MS[1]); }
  function randBetween(pair) { return randInt(pair[0], pair[1]); }

  /**
   * Persistent storage helpers (localStorage)
   * Layout:
   *   - KEY_PREFIX + 'all' : JSON array of comment objects (everything captured so far)
   *   - KEY_PREFIX + 'seen' : object { key: true } used for fast de-duplication
   *   - KEY_PREFIX + 'partIndex' : current part (volume) number (int)
   */
  const LS = {
    keyAll: CONFIG.KEY_PREFIX + 'all',
    keySeen: CONFIG.KEY_PREFIX + 'seen',
    keyPartIndex: CONFIG.KEY_PREFIX + 'partIndex',
    loadAll() {
      try {
        const s = localStorage.getItem(this.keyAll);
        return s ? JSON.parse(s) : [];
      } catch (e) { console.warn('loadAll error', e); return []; }
    },
    saveAll(arr) {
      try { localStorage.setItem(this.keyAll, JSON.stringify(arr)); } catch (e) { console.warn('saveAll error', e); }
    },
    loadSeen() {
      try {
        const s = localStorage.getItem(this.keySeen);
        return s ? JSON.parse(s) : {};
      } catch (e) { console.warn('loadSeen error', e); return {}; }
    },
    saveSeen(obj) {
      try { localStorage.setItem(this.keySeen, JSON.stringify(obj)); } catch (e) { console.warn('saveSeen error', e); }
    },
    loadPartIndex() {
      // Fall back to part 1 if the stored value is missing or not a positive integer
      const n = parseInt(localStorage.getItem(this.keyPartIndex), 10);
      return Number.isInteger(n) && n > 0 ? n : 1;
    },
    savePartIndex(n) {
      try { localStorage.setItem(this.keyPartIndex, String(n)); } catch (e) { console.warn('savePartIndex error', e); }
    },
    resetAll() {
      localStorage.removeItem(this.keyAll);
      localStorage.removeItem(this.keySeen);
      localStorage.removeItem(this.keyPartIndex);
    }
  };

  // Initialize storage if missing
  if (!localStorage.getItem(LS.keyAll)) { LS.saveAll([]); }
  if (!localStorage.getItem(LS.keySeen)) { LS.saveSeen({}); }
  if (!localStorage.getItem(LS.keyPartIndex)) { LS.savePartIndex(1); }

  /**
   * Core heuristics for detecting and extracting comment nodes in Tencent Video pages.
   * Note: the site's DOM may change often, so the rules below are deliberately generic:
   *  - First try a list of common comment-container selectors; if none match, fall back to scanning the page for elements that look like comments (author, time, content).
   *  - For each candidate element, extract as much as possible of: author, time, content, likes, comment_id (if present)
   */
  function findCommentNodes() {
    // Container selectors that commonly appear (collected from experience)
    const containerSelectors = [
      '.comment_item', '.comment_item__wrap', '.comment_list li', '.comment-list li',
      '.mod_comment .comment-item', '.comm-list li', '.cmt-list li', '.reply-list li',
      '[data-commentid]', '.open-comment-list li'
    ];
    let nodes = [];
    for (const sel of containerSelectors) {
      try {
        const found = Array.from(document.querySelectorAll(sel));
        if (found.length > 0) { nodes = nodes.concat(found); }
      } catch (e) { /* ignore */ }
    }
    // Remove duplicates
    nodes = Array.from(new Set(nodes));

    // If no obvious container matched, use a broader heuristic (look for elements that clearly contain comment text)
    if (nodes.length === 0) {
      const candidates = Array.from(document.querySelectorAll('li, div, article'));
      for (const el of candidates) {
        try {
          if (!el.isConnected) continue;
          // Text length must be plausible for a comment
          const txt = (el.innerText || '').trim();
          if (txt.length < 10 || txt.length > 2000) continue;
          // Elements containing a date or relative time, a reply keyword, or a link/<time> child are more likely to be comments
          if (/\d{4}[-/年]\d{1,2}[-/月]\d{1,2}|小时前|分钟前|今天/.test(txt) ||
              /回复|跟评|楼主|楼/.test(txt) ||
              (el.querySelector && (el.querySelector('a') || el.querySelector('time')))) {
            nodes.push(el);
          }
        } catch (e) { continue; }
      }
      // Remove duplicates
      nodes = Array.from(new Set(nodes));
    }

    // Keep only visible nodes. Nested matches (a matched node inside another matched node)
    // are not filtered out here; entries that extract to the same data are dropped later
    // by the de-duplication key in mainLoop.
    nodes = nodes.filter(el => {
      try {
        if (el.offsetParent === null && getComputedStyle(el).position !== 'fixed') return false;
        return true;
      } catch (e) { return false; }
    });

    return nodes;
  }

  // Extract data from a comment-like element using multiple fallbacks
  function extractFromNode(node) {
    try {
      // Comment id (if present)
      let commentId = node.getAttribute('data-commentid') || node.getAttribute('data-id') || node.id || null;

      // Author: try common selectors, then fall back to the first short link/strong/span text
      let author = null;
      const authorSelectors = ['.name', '.user', '.username', '.nick', '.nick-name', '.author', '.user-name', 'a'];
      for (const s of authorSelectors) {
        const el = node.querySelector(s);
        if (el && (el.textContent || '').trim().length > 0) { author = el.textContent.trim(); break; }
      }
      if (!author) {
        // Take the first short text in an <a>, <span>, <strong> or <b> as the author candidate (to avoid picking up the comment body)
        const links = Array.from(node.querySelectorAll('a, span, strong, b'));
        for (const l of links) {
          const t = (l.textContent || '').trim();
          if (t.length > 1 && t.length <= 30 && !/\d{4}[-/年]/.test(t)) { author = t; break; }
        }
      }
      if (!author) author = '未知作者'; // "unknown author"

      // Time: try a <time> tag or an element whose class suggests a timestamp
      let time = null;
      const timeSelectors = ['time', '.time', '.date', '.reply-time', '.ts'];
      for (const s of timeSelectors) {
        const el = node.querySelector(s);
        if (el && (el.getAttribute('datetime') || el.textContent)) {
          time = (el.getAttribute('datetime') || el.textContent || '').trim();
          break;
        }
      }
      if (!time) {
        // Scan the node text for something that looks like a date or relative time
        const txt = node.innerText || '';
        const m = txt.match(/\d{4}[-/年]\d{1,2}[-/月]\d{1,2}|\d{1,2}小时前|\d{1,2}分钟前|今天|刚刚/);
        if (m) time = m[0];
      }
      if (!time) time = '未知时间'; // "unknown time"

      // Content: prefer comment-body selectors, otherwise fall back to the node's own text
      let content = '';
      const contentSelectors = ['.text', '.content', '.comment-text', '.reply-content', '.cmt-txt', '.cmt-content', '.comment-body', 'p'];
      for (const s of contentSelectors) {
        const el = node.querySelector(s);
        if (el && (el.textContent || '').trim().length > 0) { content = el.textContent.trim(); break; }
      }
      if (!content) {
        // Use the node's full text with the author and time strings removed
        content = (node.innerText || '').trim();
        // Strip the author/time substrings if they can be found
        try {
          if (author) content = content.replace(author, '');
          if (time) content = content.replace(time, '');
        } catch (e) {}
        content = content.trim();
      }
      if (!content) content = '无内容'; // "no content"

      // Likes: look for an element that carries a like count
      let likes = null;
      const likeSelectors = ['.like', '.likes', '.fav', '.vote', '.praise', '.zan', '.up'];
      for (const s of likeSelectors) {
        const el = node.querySelector(s);
        if (el) {
          const num = (el.textContent || '').trim().match(/\d+/);
          if (num) { likes = parseInt(num[0], 10); break; }
        }
      }
      // Sometimes the like count is a button or span next to an SVG icon
      if (likes === null) {
        const poss = Array.from(node.querySelectorAll('button, span, i'));
        for (const p of poss) {
          const t = (p.textContent || '').trim();
          if (/^\d+$/.test(t) && t.length <= 6) { likes = parseInt(t, 10); break; }
        }
      }
      if (likes === null) likes = 0;

      return { commentId, author, time, content, likes };
    } catch (e) {
      return null;
    }
  }

  // Click the "view all / expand replies / more replies / show full text" style buttons on the page (several attempts)
  async function expandAllOnPage(maxAttempts = 5) {
    // Button labels to look for; these must stay in Chinese because they are matched against the page text
    const keywords = ['展开', '显示全部', '查看全部', '更多回复', '展开回复', '查看更多回复', '展开全文', '回复全部', '加载更多回复', '全部回复'];
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      let clicked = 0;
      // Scan elements that are likely to act as buttons
      const nodes = Array.from(document.querySelectorAll('button, a, span, div'));
      for (const node of nodes) {
        try {
          if (node.offsetParent === null) continue; // not visible
          const txt = (node.innerText || '').trim();
          if (!txt) continue;
          for (const kw of keywords) {
            if (txt.includes(kw)) {
              try { node.click(); clicked++; await sleep(randInt(80, 160)); } catch (e) { /* ignore */ }
              break;
            }
          }
        } catch (e) { continue; }
      }
      // Stop if nothing clickable was found in this pass
      if (clicked === 0) break;
      await sleep(300 + randInt(0, 600));
    }
  }

  // Scroll down smoothly to trigger lazy loading (one "scroll to bottom" is split into several steps)
  async function humanScrollToBottom() {
    try {
      const totalSteps = CONFIG.MAX_SCROLL_STEPS;
      for (let i = 0; i < totalSteps; i++) {
        // Approach the bottom of the document step by step
        const target = document.body.scrollHeight - window.innerHeight;
        const cur = window.scrollY;
        const remaining = target - cur;
        if (remaining <= 50) {
          window.scrollTo({ top: document.body.scrollHeight, behavior: 'smooth' });
          await sleep(randInt(200, 500));
          break;
        }
        // Move part of the remaining distance on each step
        const step = Math.max(200, Math.floor(remaining / (totalSteps - i)));
        window.scrollBy({ top: step, left: 0, behavior: 'smooth' });
        await sleep(randInt(120, 300));
      }
      // Wait for dynamically loaded content
      await sleep(600 + randInt(0, 800));
      // Finally make sure we are at the bottom
      window.scrollTo({ top: document.body.scrollHeight, behavior: 'instant' });
      await sleep(300 + randInt(0, 400));
    } catch (e) { console.warn('scroll error', e); }
  }

  // Save the given comment array as a txt file and trigger a browser download
  function saveCommentsToFile(comments, partIndex) {
    const ts = new Date().toISOString().replace(/[:.]/g, '-');
    const filename = `tencent_comments_part${String(partIndex).padStart(3, '0')}_${ts}.txt`;
    let text = '';
    comments.forEach((c, idx) => {
      text += `评论 ${idx + 1}:\n`;
      if (c.commentId) text += `id: ${c.commentId}\n`;
      text += `作者: ${c.author}\n`;
      text += `时间: ${c.time}\n`;
      text += `点赞: ${c.likes}\n`;
      text += `内容:\n${c.content}\n\n---\n\n`;
    });
    const blob = new Blob([text], { type: 'text/plain;charset=utf-8' });
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = filename;
    document.body.appendChild(a);
    a.click();
    a.remove();
    // Revoke the object URL after a short delay so the download has time to start
    setTimeout(() => URL.revokeObjectURL(url), 1000);
    console.log(`[TencentScraper] 已下载文件 ${filename}（条数：${comments.length}）`);
  }

  /**
   * Main scraping loop (can be started and stopped)
   */
  let controller = {
    running: false,
    actionCounter: 0,
    stopRequested: false
  };

  async function mainLoop(options = {}) {
    if (controller.running) {
      console.log('[TencentScraper] 已在运行中。');
      return;
    }
    controller.running = true;
    controller.stopRequested = false;
    controller.actionCounter = 0;

    // Load persistent state
    let all = LS.loadAll();         // flat array of comment objects
    let seen = LS.loadSeen();       // de-duplication map: key => true
    let partIndex = LS.loadPartIndex();

    // Action count at which the next long rest is taken; re-drawn after every rest
    let nextRestAt = randBetween(CONFIG.LARGE_REST_EVERY_ACTIONS);

    // Helper: build the de-duplication key for a comment
    function makeKey(obj) {
      // Combines commentId (empty if absent), author, time, and the first 100 characters of the content
      const cid = obj.commentId || '';
      const contentStart = (obj.content || '').slice(0, 100).replace(/\s+/g, ' ');
      return `${cid}|||${obj.author}|||${obj.time}|||${contentStart}`;
    }

    console.log('[TencentScraper] 抓取开始。页面 URL:', location.href);

    while (!controller.stopRequested) {
      try {
        // 1) Expand every "view more / expand replies" control on the page
        await expandAllOnPage();

        // 2) Scroll smoothly to the bottom to trigger more loading
        await humanScrollToBottom();

        // 3) Wait a moment before extracting
        await sleep(CONFIG.EXTRACT_INTERVAL_MS + randInt(0, 800));

        // 4) Find candidate comment nodes and extract them
        const nodes = findCommentNodes();
        let newAdded = 0;
        for (const node of nodes) {
          const data = extractFromNode(node);
          if (!data) continue;
          // Expanded replies become separate nodes, so each reply is captured as its own entry
          // (top-level comments and replies each count as one comment)
          const key = makeKey(data);
          if (!seen[key]) {
            seen[key] = true;
            all.push(data);
            newAdded++;
          }
        }

        if (newAdded > 0) {
          LS.saveAll(all);
          LS.saveSeen(seen);
          console.log(`[TencentScraper] 本轮新增 ${newAdded} 条；累计 ${all.length} 条。`);
        } else {
          console.log('[TencentScraper] 本轮未发现新增评论（或选择器暂未匹配到新节点）。');
        }

        // 5) When the total reaches the next batch threshold, export that batch and increment partIndex.
        //    Once all.length >= partIndex * SAVE_BATCH, only the slice for this part is exported (no repeated exports).
        while (all.length >= partIndex * CONFIG.SAVE_BATCH) {
          const start = (partIndex - 1) * CONFIG.SAVE_BATCH;
          const partItems = all.slice(start, start + CONFIG.SAVE_BATCH);
          saveCommentsToFile(partItems, partIndex);
          partIndex++;
          LS.savePartIndex(partIndex);
          // Brief pause between downloads
          await sleep(500 + randInt(0, 600));
        }

        // 6) Random short pause plus the long-rest policy
        controller.actionCounter++;
        await sleep(humanDelay());

        // Take a long rest once the randomly drawn action count is reached
        if (controller.actionCounter >= nextRestAt) {
          console.log(`[TencentScraper] 执行大休息 ${CONFIG.LARGE_REST_MS}ms`);
          await sleep(CONFIG.LARGE_REST_MS + randInt(0, 3000));
          nextRestAt = controller.actionCounter + randBetween(CONFIG.LARGE_REST_EVERY_ACTIONS);
        }

        // Optional termination condition: the loop could stop after n rounds without new comments.
        // It is not enforced here (the loop runs until the user calls stop); a maxComments or
        // maxRuns limit could also be added at this point.
      } catch (e) {
        console.error('[TencentScraper] 循环出错：', e);
        // After an error, wait a while before continuing
        await sleep(2000 + randInt(0, 3000));
      }
    } // end while

    // After stopping, persist everything and generate one final cumulative file (optional)
    LS.saveAll(all);
    LS.saveSeen(seen);
    console.log('[TencentScraper] 已停止。累计条数：', all.length);
    // Only generate the summary when there is at least one comment.
    // Comment out the call below if you do not want the summary downloaded automatically;
    // saveCommentsToFile logs the actual file name it uses.
    if (all.length > 0) {
      saveCommentsToFile(all, 'ALL');
    }

    controller.running = false;
  } // end mainLoop

  // Exposed controls for console
  window.runTencentScraper = function (cmd) {
    cmd = (cmd || '').toString().trim().toLowerCase();
    if (!cmd || cmd === 'start') {
      if (controller.running) { console.log('[TencentScraper] 已在运行中。'); return; }
      mainLoop();
    } else if (cmd === 'stop') {
      if (!controller.running) { console.log('[TencentScraper] 当前未运行。'); return; }
      controller.stopRequested = true;
      console.log('[TencentScraper] 已请求停止，脚本会在当前循环结束后停止。');
    } else if (cmd === 'status') {
      console.log('[TencentScraper] status:', { running: controller.running, actionCounter: controller.actionCounter });
      console.log('localStorage counts: all=', (LS.loadAll() || []).length, 'partIndex=', LS.loadPartIndex());
    } else if (cmd === 'reset') {
      if (confirm('确认要清空脚本在 localStorage 的存储（已抓取的数据/去重信息/分卷索引）吗？')) {
        LS.resetAll();
        console.log('[TencentScraper] 本地状态已重置。下一次运行会从头开始抓取。');
      } else {
        console.log('[TencentScraper] 已取消重置。');
      }
    } else if (cmd === 'export') {
      // Manually export everything captured so far as a single file
      const all = LS.loadAll();
      if (!all || all.length === 0) { console.log('[TencentScraper] 没有可导出的条目。'); return; }
      saveCommentsToFile(all, `MANUAL_EXPORT_${new Date().toISOString().replace(/[:.]/g, '-')}`);
      console.log('[TencentScraper] 手动导出完成。');
    } else {
      console.log(`runTencentScraper: 未知命令 "${cmd}". 支持命令：start | stop | status | reset | export`);
    }
  };

  // Injection notice (plus a short usage hint)
  console.log('%c[TencentScraper] 已注入。', 'color: green; font-weight: bold');
  console.log('[TencentScraper] 使用方法：在控制台调用 window.runTencentScraper("start") 开始，"stop" 停止，"status" 查看状态，"reset" 清空本地存储，"export" 手动导出累计文件。');

  // Auto-start based on config
  if (CONFIG.AUTO_START) {
    console.log('[TencentScraper] CONFIG.AUTO_START = true，自动开始抓取。');
    mainLoop();
  }
})();

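2.3 Save the script

Open the "File" menu in the top-left corner of the editor
Choose "Save" (or press Ctrl+S)
Close the editor tab
Click the Tampermonkey icon in the top-right corner of the browser again
The menu should now list the installed script "Tencent Video 评论长期抓取（自动滚动、展开跟评、每500条分卷保存）"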

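III. Prepare to Scrape Tencent Video Comments

3.1 Find the target video

Open the browser and go to Tencent Video: https://v.qq.com
Search or browse to the video whose comments you want to scrape
Start playing the video

3.2 Go to the comment section

Once the player page has loaded, scroll down
Find the "评论" (Comments) section (usually below the video; you may need to click the "评论" tab)
Important: make sure the comment section has loaded and is visible, with at least a few comments displayed

3.3 Verify that the script was injected

Press F12 to open the developer tools
Switch to the Console tab
Look for the green message "[TencentScraper] 已注入。" (it means "injected")
If you see this message, the script has been injected into the current page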

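IV. Start Scraping

4.1 Start the script

Type the following command into the developer tools Console and press Enter:

window.runTencentScraper('start')

4.2 Watch the scraping run

After it starts, the script begins scrolling the page automatically, imitating a human reader
You will see the page keep scrolling down and loading more comments

The console reports progress, for example ("12 new this round; 42 in total"):

[TencentScraper] 本轮新增 12 条；累计 42 条。

Each time the total reaches another 500 comments, the script generates a txt file and downloads it to your computer
File name format: tencent_comments_part001_2023-11-15T14-30-22-123Z.txt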
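
The file name pattern can be reproduced with a small helper, which is handy when you want to predict or match file names in your own tooling. This is a sketch that mirrors the naming expression in the script; the function name `partFileName` is introduced here for illustration and does not exist in the script.

```javascript
// Hypothetical helper (not part of the userscript): builds a part file name
// from a part number and a Date, using the same expression as the script.
function partFileName(partIndex, date) {
  // ISO timestamp with ':' and '.' replaced so it is safe in file names
  const ts = date.toISOString().replace(/[:.]/g, '-');
  // The part number is left-padded with zeros to three digits
  return `tencent_comments_part${String(partIndex).padStart(3, '0')}_${ts}.txt`;
}

console.log(partFileName(1, new Date('2023-11-15T14:30:22.123Z')));
```

The timestamp is always UTC (it comes from `toISOString`), so it may differ from your local clock time.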

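4.3 Stop scraping

To stop, type in the console:

window.runTencentScraper('stop')

The script stops after finishing the current loop iteration and then automatically generates a summary file containing every comment captured so far.

4.4 Check status

You can check the script's state at any time by typing in the console:

window.runTencentScraper('status')

This shows whether the script is running, how many actions it has performed, the number of comments captured so far, and the current part index.

4.5 Reset

To start over (discarding previously captured data), type in the console:

window.runTencentScraper('reset')

A confirmation dialog appears; once you confirm, all scraping data in local storage is cleared. Stop the script before resetting: a running loop keeps its data in memory and writes it back to local storage on the next round.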

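V. Manage the Scraped Data

5.1 Find the downloaded comment files

By default the txt files are saved to the browser's default download folder
This is usually the "Downloads" folder or whatever download location you have configured
File naming format: tencent_comments_partNNN_timestamp.txt, where NNN is the part number

5.2 File format

Each comment is laid out in the file as shown below. The labels are 评论 (comment), 作者 (author), 时间 (time), 点赞 (likes), and 内容 (content); the id line is written only when the script found a comment id on the page.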

评论 1:
id: comment_id_here
作者: username
时间: 2023-11-15 14:30
点赞: 5
内容:
The comment text goes here

---

评论 2:
...
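
If you want to process the files programmatically, the layout above can be read back into objects. The following is a minimal sketch rather than part of the original script: `parseCommentFile` is a hypothetical name, and it assumes that author and time each fit on one line and that no comment body contains the exact separator (a blank line, `---`, and another blank line).

```javascript
// Hypothetical parser (not part of the userscript) for the txt layout shown above.
function parseCommentFile(text) {
  const records = [];
  // Records are separated by a blank line, '---', and another blank line
  for (const block of text.split('\n\n---\n\n')) {
    const m = block.match(
      /^评论 \d+:\n(?:id: (.*)\n)?作者: (.*)\n时间: (.*)\n点赞: (\d+)\n内容:\n([\s\S]*)$/
    );
    if (!m) continue; // skips the empty tail after the last separator
    records.push({
      commentId: m[1] ?? null, // the id line is optional
      author: m[2],
      time: m[3],
      likes: Number(m[4]),
      content: m[5],           // may span several lines
    });
  }
  return records;
}

// Usage example with two records, the second one without an id line
const sampleText =
  '评论 1:\nid: abc\n作者: Alice\n时间: 2023-11-15 14:30\n点赞: 5\n内容:\nhello\nworld\n\n---\n\n' +
  '评论 2:\n作者: Bob\n时间: 刚刚\n点赞: 0\n内容:\nhi\n\n---\n\n';
console.log(parseCommentFile(sampleText));
```

Blocks that do not match the expected layout are skipped silently, so compare the number of parsed records with the count reported in the console if completeness matters.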

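5.3 Merge multiple part files

If you scrape a large number of comments, you may end up with several part files (one per 500 comments). You can merge them with a text editor or a short script.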
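
One way to write such a merge script is sketched below for Node.js. It is an illustration under assumptions, not something shipped with the userscript: the names `mergeParts` and `mergePartFiles` are made up here, the part files are assumed to sit together in one folder, and comment bodies are assumed not to contain the record separator. Each file numbers its comments from 1, so the sketch renumbers the `评论 N:` headings continuously.

```javascript
const fs = require('fs');
const path = require('path');

// Separator the userscript writes after every comment
const SEPARATOR = '\n\n---\n\n';

// Hypothetical helper: joins the contents of several part files and renumbers
// the "评论 N:" heading of every record so numbering runs across all parts.
function mergeParts(texts) {
  let counter = 0;
  const records = [];
  for (const text of texts) {
    for (const record of text.split(SEPARATOR)) {
      if (!record.trim()) continue; // empty tail after the last separator
      counter++;
      records.push(record.replace(/^评论 \d+:/, `评论 ${counter}:`));
    }
  }
  return records.map(r => r + SEPARATOR).join('');
}

// Hypothetical helper: merges all numbered part files found in `dir` into `outFile`.
// Summary and manual-export files are ignored because their names have no numeric part index.
function mergePartFiles(dir, outFile) {
  const names = fs.readdirSync(dir)
    .filter(n => /^tencent_comments_part\d+_.*\.txt$/.test(n))
    .sort(); // zero-padded part numbers sort correctly up to part 999
  const texts = names.map(n => fs.readFileSync(path.join(dir, n), 'utf8'));
  fs.writeFileSync(outFile, mergeParts(texts), 'utf8');
  return names.length;
}

// Example call (adjust the paths to your download folder):
// mergePartFiles('./Downloads', './tencent_comments_merged.txt');
```

Note that stopping the script (or running the export command) already produces one file with every captured comment, so merging is mainly useful when you only kept the part files.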

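VI. Troubleshooting

6.1 The script does not respond

Problem: nothing happens after you enter the start command
Fix:
1. Confirm that you are on a video page with comments (the URL contains v.qq.com)
2. Reload the page, wait until it has finished loading, and try again
3. Check the console for error messages
4. Confirm that the Tampermonkey extension is enabled (its icon should not be grey)

6.2 Comment elements cannot be found

Problem: the console shows "本轮未发现新增评论" ("no new comments found this round")
Fix:
1. Make sure the comment section is fully expanded and visible
2. Tencent Video may have changed its page structure, in which case the selectors in the script need updating
3. Scroll to the comment section manually, then start the script again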
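
To check whether the selectors still match anything, you can count matches per selector. The helper below is a diagnostic sketch, not part of the script; `countSelectorMatches` is a hypothetical name, and it accepts any object with a `querySelectorAll` method so that it can be pasted into the browser console and called with `document`.

```javascript
// Hypothetical diagnostic helper: reports how many elements each selector matches.
// `root` is anything with querySelectorAll, for example `document` in the browser console.
function countSelectorMatches(root, selectors) {
  const counts = {};
  for (const sel of selectors) {
    try {
      counts[sel] = root.querySelectorAll(sel).length;
    } catch (e) {
      counts[sel] = -1; // the selector could not be evaluated (e.g. invalid syntax)
    }
  }
  return counts;
}

// In the browser console on a v.qq.com page (skipped when no DOM is available):
if (typeof document !== 'undefined') {
  console.table(countSelectorMatches(document, [
    '.comment_item', '.comment_list li', '[data-commentid]',
  ]));
}
```

If every selector from `containerSelectors` reports 0 while comments are visible on screen, inspect a comment in the Elements panel and add a selector that matches its container to the list in `findCommentNodes`.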

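6.3 The script runs slowly

Explanation: this is by design. The script imitates human behaviour to avoid being blocked by anti-scraping mechanisms

Adjustment: if you understand the risks, you can change the CONFIG values at the top of the script:

const CONFIG = {
  SMALL_DELAY_MS: [400, 1000],    // shorter small delays
  LARGE_REST_MS: 5000,            // shorter long rest
  MAX_SCROLL_STEPS: 20,           // fewer scroll steps
  // other parameters...
};

6.4 Too many downloaded files

Fix: raise CONFIG.SAVE_BATCH, for example to 1000 or 2000, so that each file holds more comments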
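
The script writes a part file only when a full batch has accumulated (its export loop runs while `all.length >= partIndex * CONFIG.SAVE_BATCH`); the remaining comments appear only in the summary file produced on stop or by the export command. The sketch below shows that arithmetic. `planParts` is a hypothetical helper written for this explanation.

```javascript
// Hypothetical helper: how many full part files a run produces for a given
// batch size, and how many comments are left for the summary file only.
function planParts(totalComments, saveBatch) {
  const fullParts = Math.floor(totalComments / saveBatch);
  const leftover = totalComments - fullParts * saveBatch;
  return { fullParts, leftover };
}

console.log(planParts(1234, 500));  // { fullParts: 2, leftover: 234 }
console.log(planParts(1234, 1000)); // { fullParts: 1, leftover: 234 }
console.log(planParts(1234, 2000)); // { fullParts: 0, leftover: 1234 }
```

With a batch size larger than the total number of comments, no part file is written at all, so remember to stop the script or run the export command to get your data.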

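VII. Usage Notes

Follow site rules: scraping comments may violate the site's terms of use; make sure you have the right to collect and use this data
Use in moderation: do not run the script too often, so that you do not put excessive load on the server
Privacy: comments may contain personal information; protect privacy when you handle the data
Legal risk: in some jurisdictions, scraping web data without permission may be illegal; make sure your use complies with local law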

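VIII. Advanced Tips

8.1 Auto-start

If you want the script to start working as soon as the page loads (without typing a command), change the configuration in the script:

const CONFIG = {
  // other settings...
  AUTO_START: true,  // set this to true
  // other settings...
};

8.2 Tune the scraping parameters

Depending on your network speed and the number of comments on the video, you may need to adjust the following parameters:

SAVE_BATCH: number of comments per saved file
SMALL_DELAY_MS: range of the short delay (imitates human pacing)
LARGE_REST_MS: length of the long rest (to avoid being identified as a bot)
MAX_SCROLL_STEPS: number of steps each scroll to the bottom is divided into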
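
Before pasting edited values back into the script, a quick sanity check can catch mistakes. The validator below is a sketch with rules chosen for this guide, not checks the script performs itself; `validateConfig` is a hypothetical name. The SAVE_BATCH rule matters most: with a value of 0 the script's export condition `all.length >= partIndex * SAVE_BATCH` would always be true, so it would keep downloading files without end.

```javascript
// Hypothetical validator for the four parameters listed above.
// Returns a list of problems; an empty list means the values look sane.
function validateConfig(cfg) {
  const problems = [];
  const isPositiveInt = n => Number.isInteger(n) && n > 0;
  const isRange = r =>
    Array.isArray(r) && r.length === 2 &&
    r.every(n => Number.isFinite(n) && n >= 0) && r[0] <= r[1];

  if (!isPositiveInt(cfg.SAVE_BATCH)) {
    problems.push('SAVE_BATCH must be a positive integer');
  }
  if (!isRange(cfg.SMALL_DELAY_MS)) {
    problems.push('SMALL_DELAY_MS must be [min, max] with 0 <= min <= max');
  }
  if (!(Number.isFinite(cfg.LARGE_REST_MS) && cfg.LARGE_REST_MS >= 0)) {
    problems.push('LARGE_REST_MS must be a non-negative number');
  }
  if (!isPositiveInt(cfg.MAX_SCROLL_STEPS)) {
    problems.push('MAX_SCROLL_STEPS must be a positive integer');
  }
  return problems;
}

// Usage: the script's default values pass, a zero batch size does not
console.log(validateConfig({
  SAVE_BATCH: 500, SMALL_DELAY_MS: [800, 2200], LARGE_REST_MS: 15000, MAX_SCROLL_STEPS: 40,
}));
console.log(validateConfig({
  SAVE_BATCH: 0, SMALL_DELAY_MS: [800, 2200], LARGE_REST_MS: 15000, MAX_SCROLL_STEPS: 40,
}));
```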

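8.3 Manual export

At any time you can export the data captured so far without stopping the script:

window.runTencentScraper('export')

This generates a txt file containing every comment captured so far.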