ToeThread.run()
ProcessorChain.process(CrawlURI curi, ChainStatusReceiver thread)
Processor.process(CrawlURI curi)
Scoper.isInScope(CrawlURI caUri)
// for each rule in getRules():
DecideResult r = rule.decisionFor(uri);
// inside decisionFor():
DecideResult result = innerDecide(uri);
// the last rule whose result is not NONE becomes the effective (decisive) rule:
result = r;
decisiveRule = rule;
decisiveRuleNumber = i;
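This article traces how the crawler decides whether a page falls within the crawl scope. ToeThread.run() starts processing, and Scoper.isInScope() checks whether a CrawlURI meets the crawl criteria. Each rule is consulted in turn through rule.decisionFor(uri), and the last rule that returns a result other than NONE becomes the effective rule.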
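
The "last non-NONE rule wins" loop from the trace above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: `ScopeSketch`, `Rule`, `Decision`, `decide` and the three sample rules are hypothetical stand-ins written for this post, URIs are plain strings instead of `CrawlURI`, and none of it is the crawler's actual source.

```java
import java.util.List;

public class ScopeSketch {
    // Simplified stand-in for the three possible outcomes of a rule.
    enum DecideResult { ACCEPT, REJECT, NONE }

    // Hypothetical rule type: maps a URI string to a decision.
    interface Rule {
        DecideResult decisionFor(String uri);
    }

    // Final result plus the index of the rule that produced it (-1 if none did).
    static final class Decision {
        final DecideResult result;
        final int decisiveRuleNumber;

        Decision(DecideResult result, int decisiveRuleNumber) {
            this.result = result;
            this.decisiveRuleNumber = decisiveRuleNumber;
        }
    }

    // Walk the rules in order; every non-NONE answer overwrites the previous one,
    // so the last rule that has an opinion is the decisive rule.
    static Decision decide(List<Rule> rules, String uri) {
        DecideResult result = DecideResult.NONE;
        int decisiveRuleNumber = -1;
        for (int i = 0; i < rules.size(); i++) {
            DecideResult r = rules.get(i).decisionFor(uri);
            if (r != DecideResult.NONE) {
                result = r;
                decisiveRuleNumber = i;
            }
        }
        return new Decision(result, decisiveRuleNumber);
    }

    // A URI is in scope only if the final decision is ACCEPT.
    static boolean isInScope(List<Rule> rules, String uri) {
        return decide(rules, uri).result == DecideResult.ACCEPT;
    }

    // Sample rule set (assumed for illustration): reject everything,
    // accept one host, then reject .zip files again.
    static List<Rule> sampleRules() {
        Rule rejectAll = uri -> DecideResult.REJECT;
        Rule acceptExample = uri -> uri.startsWith("http://example.com/")
                ? DecideResult.ACCEPT : DecideResult.NONE;
        Rule rejectZip = uri -> uri.endsWith(".zip")
                ? DecideResult.REJECT : DecideResult.NONE;
        return List.of(rejectAll, acceptExample, rejectZip);
    }

    public static void main(String[] args) {
        List<Rule> rules = sampleRules();
        for (String uri : List.of("http://example.com/page.html",
                                  "http://example.com/file.zip",
                                  "http://other.org/")) {
            Decision d = decide(rules, uri);
            System.out.println(uri + " -> " + d.result
                    + " (decisive rule #" + d.decisiveRuleNumber + ")");
        }
    }
}
```

Because later rules override earlier ones, the order of the rule list matters: in this sketch the broad "reject everything" rule comes first, and narrower accept or reject rules follow it.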