一篇经典综述,scholar.google.cn上显示该文被引用超过300次
Laender, A. H. F.; Ribeiro-Neto, B. A.; da Silva, A. S. & Teixeira, J. S. A brief survey of web data extraction tools. SIGMOD Rec., ACM, 2002, 31, 84-93
Abstract:In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities which make a direct comparison difficult to be done. In this paper, we propose a taxonomy for characterizing Web data extraction fools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.
Laender, A. H. F.; Ribeiro-Neto, B. A.; da Silva, A. S. & Teixeira, J. S. A brief survey of web data extraction tools. SIGMOD Rec., ACM, 2002, 31, 84-93
Abstract:In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities which make a direct comparison difficult to be done. In this paper, we propose a taxonomy for characterizing Web data extraction fools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.
本文综述了近年来多种网页数据抓取工具的特点及应用,涵盖了自然语言处理、机器学习等多个信息技术领域的技术手段,并对这些工具进行了分类比较。
3223

被折叠的 条评论
为什么被折叠?



