Yan Guo, Yu Wang, Guodong Ding, Donglin Cao, Gang Zhang and Yi Lv. Juicer: Scalable Extraction for Thread Meta-information of Web Forum. Proceedings of Pacific Asia Workshop on Intelligence and Security Informatics (PAISI) 2009. Intelligence and Security Informatics, Lecture Notes in Computer Science (LNCS 5477): 143-148.
Abstract: In Web forums, the thread meta-information contained in the list of threads of a board page provides fundamental data for further forum mining. This paper describes a complete system named Juicer, which was developed as a subsystem of an industrial application that involves forum mining. The task of Juicer is to extract thread meta-information from the board pages of a great many large-scale online Web forums, which requires scalable extraction with high accuracy and speed and minimal user effort for maintenance. Among the many existing approaches to information extraction, we could not find one that fully satisfies these requirements, so we present the simple, scalable extraction approach behind Juicer. Juicer consists of four modules: template generation, specifying label settings, automatic extraction, and label assignment. Both experiments and practical use show that Juicer satisfies the requirements.
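Juicer is a meta-information extraction system designed for large-scale online forums; it extracts thread meta-information from web pages efficiently and accurately. The system consists of four modules (template generation, label setting, automatic extraction, and label assignment) and aims to reduce manual maintenance cost while improving extraction speed and accuracy.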