[b]1. CrawlMetadata[/b]: including identification of the crawler/operator
[b]org.archive.modules.CrawlMetadata[/b]: Basic crawl metadata, as consulted by functional modules and recorded in ARCs/WARCs.
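For orientation, this is roughly how the CrawlMetadata bean is declared in a Heritrix 3 crawler-beans.cxml. It is a sketch modeled on the default job profile shipped with Heritrix 3, not a verbatim copy. The operatorContactUrl, jobName and description values are placeholders, and property names should be checked against the profile of the installed release.

```xml
<!-- CRAWL METADATA: identification of the crawler/operator.
     All values below are placeholders; replace them before crawling. -->
<bean id="metadata" class="org.archive.modules.CrawlMetadata" autowire="byName">
  <!-- URL where webmasters can identify and contact the operator;
       it is substituted into the User-Agent template below -->
  <property name="operatorContactUrl" value="http://example.org/crawler-contact"/>
  <property name="jobName" value="basic"/>
  <property name="description" value="Basic crawl starting with useful defaults"/>
  <!-- <property name="robotsPolicyName" value="obey"/> -->
  <!-- <property name="userAgentTemplate"
         value="Mozilla/5.0 (compatible; heritrix/@VERSION@ +@OPERATOR_CONTACT_URL@)"/> -->
</bean>
```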
org.archive.modules.seeds.TextSeedModule
org.archive.modules.deciderules.DecideRuleSequence
org.archive.modules.CandidateChain
org.archive.modules.FetchChain
org.archive.modules.DispositionChain
org.archive.crawler.framework.CrawlController
org.archive.crawler.frontier.BdbFrontier
org.archive.crawler.util.BdbUriUniqFilter
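The classes above are declared as top-level Spring beans and wired together by bean id. The condensed sketch below follows the layout of the default Heritrix 3 profile. The ids (seeds, scope, candidateScoper, preparer, candidateProcessors, crawlController, frontier, uriUniqFilter) are the conventional ones from that profile, and the scope is cut down to three rules for illustration. A DecideRuleSequence evaluates its rules in order and the last rule returning a decision other than NONE wins, so rule order matters. FetchChain and DispositionChain are assembled the same way, each with its own processors list of bean references; they are left out here because their processors are declared elsewhere in the profile.

```xml
<!-- SEEDS: read crawl starting points from seeds.txt in the job directory -->
<bean id="seeds" class="org.archive.modules.seeds.TextSeedModule">
  <property name="textSource">
    <bean class="org.archive.spring.ConfigFile">
      <property name="path" value="seeds.txt"/>
    </bean>
  </property>
</bean>

<!-- SCOPE: rules run top to bottom; the last decision other than NONE wins -->
<bean id="scope" class="org.archive.modules.deciderules.DecideRuleSequence">
  <property name="rules">
    <list>
      <!-- start by rejecting everything... -->
      <bean class="org.archive.modules.deciderules.RejectDecideRule"/>
      <!-- ...then accept URIs under the seed-implied SURT prefixes... -->
      <bean class="org.archive.modules.deciderules.surt.SurtPrefixedDecideRule"/>
      <!-- ...but reject URIs more than maxHops links away from a seed -->
      <bean class="org.archive.modules.deciderules.TooManyHopsDecideRule">
        <property name="maxHops" value="20"/>
      </bean>
    </list>
  </property>
</bean>

<!-- CANDIDATE CHAIN: scope each discovered URI, then prepare the
     accepted ones for enqueueing in the frontier -->
<bean id="candidateScoper" class="org.archive.crawler.prefetch.CandidateScoper"/>
<bean id="preparer" class="org.archive.crawler.prefetch.FrontierPreparer"/>
<bean id="candidateProcessors" class="org.archive.modules.CandidateChain">
  <property name="processors">
    <list>
      <ref bean="candidateScoper"/>
      <ref bean="preparer"/>
    </list>
  </property>
</bean>

<!-- CONTROLLER, FRONTIER and URI-UNIQ FILTER -->
<bean id="crawlController" class="org.archive.crawler.framework.CrawlController">
  <!-- number of worker (ToeThread) threads -->
  <property name="maxToeThreads" value="25"/>
</bean>
<bean id="frontier" class="org.archive.crawler.frontier.BdbFrontier"/>
<!-- remembers already-included URIs so each one is queued only once -->
<bean id="uriUniqFilter" class="org.archive.crawler.util.BdbUriUniqFilter"/>
```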
forceRetire
smallBudget
veryPolite
highPrecedence
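forceRetire, smallBudget, veryPolite and highPrecedence are example settings-overlay sheets (org.archive.spring.Sheet beans). A sheet maps beanId.propertyName keys to override values and only takes effect once it is associated with some URIs, for example by SURT prefix. The sketch below is modeled on the commented-out examples in the default Heritrix 3 profile; the SURT prefix http://(org,example, is a placeholder and the numeric values are illustrative.

```xml
<!-- smallBudget: queues of matching URIs get short active periods and
     retire once their total budget is spent -->
<bean id="smallBudget" class="org.archive.spring.Sheet">
  <property name="map">
    <map>
      <entry key="frontier.balanceReplenishAmount" value="20"/>
      <entry key="frontier.queueTotalBudget" value="100"/>
    </map>
  </property>
</bean>

<!-- veryPolite: longer politeness delays between fetches from a queue -->
<bean id="veryPolite" class="org.archive.spring.Sheet">
  <property name="map">
    <map>
      <entry key="disposition.delayFactor" value="10"/>
      <entry key="disposition.minDelayMs" value="10000"/>
    </map>
  </property>
</bean>

<!-- Apply both sheets to every URI under a SURT prefix (placeholder prefix) -->
<bean class="org.archive.crawler.spring.SurtPrefixesSheetAssociation">
  <property name="surtPrefixes">
    <list>
      <value>http://(org,example,</value>
    </list>
  </property>
  <property name="targetSheetNames">
    <list>
      <value>smallBudget</value>
      <value>veryPolite</value>
    </list>
  </property>
</bean>
```

The keys assume that beans with the ids frontier and disposition exist in the same crawler-beans.cxml, as they do in the default profile.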
<!-- OPTIONAL BUT RECOMMENDED BEANS -->
actionDirectory
crawlLimiter
checkpointService
statisticsTracker
loggerModule
sheetOverlaysManager
cookieStorage
serverCache
configPathConfigurer
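The beans after the OPTIONAL BUT RECOMMENDED BEANS marker provide operational services rather than crawl logic. Below is a minimal sketch of several of them, modeled on the default Heritrix 3 profile. The limit and interval values are arbitrary examples, and loggerModule, cookieStorage and serverCache are omitted for brevity.

```xml
<!-- CRAWL LIMITS: stop the crawl when any configured limit is reached;
     0 means no limit. Values here are illustrative. -->
<bean id="crawlLimiter" class="org.archive.crawler.framework.CrawlLimitEnforcer">
  <property name="maxBytesDownload" value="0"/>
  <property name="maxDocumentsDownload" value="100000"/>
  <property name="maxTimeSeconds" value="86400"/>
</bean>

<!-- CHECKPOINTS: checkpoint every 60 minutes (-1 disables automatic checkpoints) -->
<bean id="checkpointService" class="org.archive.crawler.framework.CheckpointService">
  <property name="checkpointIntervalMinutes" value="60"/>
</bean>

<!-- ACTION DIRECTORY: watched during the crawl for files of URIs or
     scripts to process -->
<bean id="actionDirectory" class="org.archive.crawler.framework.ActionDirectory"/>

<!-- STATISTICS and SHEET OVERLAYS -->
<bean id="statisticsTracker" class="org.archive.crawler.reporting.StatisticsTracker"
      autowire="byName"/>
<bean id="sheetOverlaysManager" autowire="byType"
      class="org.archive.crawler.spring.SheetOverlaysManager"/>

<!-- resolves relative crawl paths against the directory of crawler-beans.cxml -->
<bean id="configPathConfigurer" class="org.archive.spring.ConfigPathConfigurer"/>
```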
This article examines CrawlMetadata and its use within the org.archive.modules framework, covering the role and configuration of key components such as TextSeedModule, DecideRuleSequence, CandidateChain, FetchChain and DispositionChain.
