1. build: validateConfiguration()
2. launch: launch()
   Starts a new thread, which calls CrawlController.requestCrawlStart()
   getFrontier().run();
3. pause: getCrawlController().requestCrawlPause()
4. unpause: getCrawlController().requestCrawlResume()
   BdbFrontier.unpause()
   BdbFrontier: a Frontier that uses several BerkeleyDB JE databases to hold its record of known hosts (queues) and pending URIs.
   sendCrawlStateChangeEvent(State.RUNNING, CrawlStatus.RUNNING);
   CrawlController noteFrontierState INFO: Crawl running.
   CrawlJob onApplicationEvent INFO: RUNNING 20121211155156
5. checkpoint: getCheckpointService().requestCrawlCheckpoint()
6. terminate: terminate()
7. teardown: teardown()
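
The order of the operations above can be sketched as a small state machine. This is only an illustration under stated assumptions: CrawlLifecycleSketch, its State enum, and its event list are hypothetical names invented for this example and are not Heritrix classes. Only the operation names (launch, pause, unpause, checkpoint, terminate, teardown) and the idea of starting the crawl on a new thread come from the notes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the job lifecycle; not Heritrix code. */
public class CrawlLifecycleSketch {

    /** Assumed states for this sketch only. */
    public enum State { NASCENT, RUNNING, PAUSED, FINISHED }

    private State state = State.NASCENT;
    private final List<String> events = new ArrayList<>();

    public synchronized State getState() { return state; }

    /** Returns a copy of the recorded state changes and operations. */
    public synchronized List<String> getEvents() { return new ArrayList<>(events); }

    private synchronized void changeState(State next) {
        state = next;
        events.add(next.name());
    }

    /** Step 2: the state change happens on a new thread, as in the notes. */
    public void launch() {
        if (getState() != State.NASCENT) {
            throw new IllegalStateException("launch() needs a NASCENT job");
        }
        Thread starter = new Thread(() -> changeState(State.RUNNING), "launch-sketch");
        starter.start();
        try {
            starter.join(); // the sketch waits so callers see RUNNING deterministically
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Step 3: only a running crawl can be paused. */
    public synchronized void pause() {
        require(State.RUNNING, "pause()");
        changeState(State.PAUSED);
    }

    /** Step 4: resuming puts the crawl back into RUNNING. */
    public synchronized void unpause() {
        require(State.PAUSED, "unpause()");
        changeState(State.RUNNING);
    }

    /** Step 5: assumed here to be allowed while running or paused. */
    public synchronized void checkpoint() {
        if (state != State.RUNNING && state != State.PAUSED) {
            throw new IllegalStateException("checkpoint() needs a launched, unfinished job");
        }
        events.add("CHECKPOINT");
    }

    /** Step 6: ends a launched crawl. */
    public synchronized void terminate() {
        if (state == State.NASCENT || state == State.FINISHED) {
            throw new IllegalStateException("terminate() needs a launched, unfinished job");
        }
        changeState(State.FINISHED);
    }

    /** Step 7: in this sketch, teardown is refused while the crawl is still active. */
    public synchronized void teardown() {
        if (state == State.RUNNING || state == State.PAUSED) {
            throw new IllegalStateException("call terminate() before teardown()");
        }
        events.add("TEARDOWN");
    }

    private void require(State expected, String op) {
        if (state != expected) {
            throw new IllegalStateException(op + " needs state " + expected + " but was " + state);
        }
    }

    public static void main(String[] args) {
        CrawlLifecycleSketch job = new CrawlLifecycleSketch();
        job.launch();
        job.pause();
        job.unpause();
        job.checkpoint();
        job.terminate();
        job.teardown();
        System.out.println(job.getEvents());
    }
}
```

Running main walks through steps 2 to 7 in order and prints the recorded events. Calling an operation out of order, for example pause() before launch(), throws an IllegalStateException in this sketch; the real framework may handle such cases differently.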
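This article walks through the key operations in the crawler framework's job lifecycle: build (configuration validation), launch, pause, resume, checkpoint, and terminate. It looks at how the crawl controller requests the start of a crawl, how the frontier runs, and how the related services are called to switch and persist the crawl state. Understanding this flow helps developers manage and control crawl jobs more effectively.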
