Paper notes: LSHTC: A Benchmark for Large-Scale Text Classification (2015)

For more information about LSHTC, see the official website.

title

LSHTC: A Benchmark for Large-Scale Text
Classification

abstract

LSHTC is a series of challenges which aims to assess the performance
of classification systems in large-scale classification in a large number of
classes (up to hundreds of thousands). This paper describes the datasets
that have been released along the LSHTC series. The paper details the
construction of the datasets and the design of the tracks as well as the
evaluation measures that we implemented and a quick overview of the
results. All of these datasets are available online and runs may still be
submitted on the online server of the challenges.

dataset

1 http://www.bioasq.org
2 http://www.image-net.org/challenges/LSVRC/2014/
3 http://research.microsoft.com/en-us/um/people/manik/events/xc13/
4 http://lshtc.iit.demokritos.gr/WSDM_WS
5 http://lshtc.iit.demokritos.gr/
6 http://dbpedia.org/About
7 http://www.dmoz.org/

Introduction to the LSHTC datasets

LSHTC1


The tracks of the first year of the challenge were based on the DMOZ dataset
(tree hierarchy) and used only single-label instances. The challenge was split
into 4 tracks, which were composed of different combinations of Content and
Description vectors. Since both types of vectors were used in this challenge,
only the intersection of the two sets of instances was used (we used only
instances which had both a Content and a Description vector).
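The intersection step described above can be sketched in a few lines of Python. This is only an illustration: the instance IDs, the dictionary layout (instance ID mapped to a sparse feature-to-count vector) and the helper name `instances_with_both` are assumptions made for the example, not the organizers' actual data format or tooling.

```python
def instances_with_both(content_vectors, description_vectors):
    """Return the sorted IDs of instances that have BOTH a Content
    and a Description vector (the intersection of the two ID sets)."""
    return sorted(content_vectors.keys() & description_vectors.keys())


# Hypothetical toy data: instance ID -> sparse {feature_id: count} vector.
content_vectors = {"doc1": {3: 2, 17: 1}, "doc2": {5: 1}, "doc4": {8: 3}}
description_vectors = {"doc1": {3: 1}, "doc3": {9: 1}, "doc4": {2: 1}}

shared_ids = instances_with_both(content_vectors, description_vectors)
print(shared_ids)  # ['doc1', 'doc4']
```

Instances such as `doc2` and `doc3`, which have only one of the two vector types, are dropped.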

LSHTC2


During LSHTC2, we used multi-label instances and added non-tree hierarchies.
For DMOZ, instead of using the intersection between the instances of the
Content and Description vectors, we decided to keep one of them. We kept the
Content vectors, since they did not require a human annotator in order to be
created. Since we decided to move to multi-label classification, we used all
the Content vectors that we had.
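To make the two changes in LSHTC2 concrete, here is a small sketch of what "multi-label" and "non-tree hierarchy" mean in terms of data. The category names, the `parents` mapping and both helper functions are hypothetical and only serve the illustration; they are not taken from the LSHTC data.

```python
def is_tree_shaped(parents):
    """A hierarchy is tree-shaped if no node has more than one parent."""
    return all(len(ps) <= 1 for ps in parents.values())


def is_single_label(instance_labels):
    """True if every instance carries exactly one label."""
    return all(len(labels) == 1 for labels in instance_labels.values())


# Hypothetical hierarchies: node -> list of parent nodes.
tree_hierarchy = {"sports": [], "football": ["sports"], "tennis": ["sports"]}
dag_hierarchy = {"sports": [], "history": [], "olympics": ["sports", "history"]}

# Hypothetical instances: instance ID -> list of labels.
single_label_data = {"doc1": ["football"], "doc2": ["tennis"]}
multi_label_data = {"doc1": ["olympics", "sports"], "doc2": ["history"]}

print(is_tree_shaped(tree_hierarchy), is_tree_shaped(dag_hierarchy))
print(is_single_label(single_label_data), is_single_label(multi_label_data))
```

In the second hierarchy, `olympics` has two parents, so the structure is a graph rather than a tree, and `doc1` in the second data set carries two labels at once.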

LSHTC3 & LSHTC4

The two DBpedia datasets were also used, as Track 1, during the third iteration of the LSHTC challenges (LSHTC3). The only addition concerned the
Medium DBpedia dataset, where we also provided the original text of the instances, without any pre-processing. During LSHTC4, only the Large DBpedia dataset was used, for the first track, called "Very Large Supervised Learning",
which was evaluated at Kaggle (http://www.kaggle.com/).

Evaluation measures

During the classification tracks of all LSHTC challenges, we used two types of
measures in order to evaluate the participating systems, flat and hierarchical.
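The difference between the two families of measures can be sketched as follows. The note does not preserve which exact measures the challenge used, so the sketch picks two common representatives as an assumption: micro-averaged F1 as the flat measure, and the same F1 computed after extending each label set with its ancestors as the hierarchical measure. The toy hierarchy and all function names are hypothetical.

```python
def micro_f1(true_sets, pred_sets):
    """Flat measure: micro-averaged F1 over per-instance label sets."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in pred_sets)
    recall = tp / sum(len(t) for t in true_sets)
    return 2 * precision * recall / (precision + recall)


def with_ancestors(labels, parent):
    """Extend a label set with all ancestors (parent maps node -> parent or None)."""
    closed = set()
    for label in labels:
        node = label
        while node is not None and node not in closed:
            closed.add(node)
            node = parent.get(node)
    return closed


def hierarchical_f1(true_sets, pred_sets, parent):
    """Hierarchical measure: micro F1 on ancestor-augmented label sets."""
    true_aug = [with_ancestors(t, parent) for t in true_sets]
    pred_aug = [with_ancestors(p, parent) for p in pred_sets]
    return micro_f1(true_aug, pred_aug)


# Hypothetical tree hierarchy and predictions.
parent = {"sports": None, "science": None,
          "football": "sports", "tennis": "sports", "physics": "science"}
true_sets = [{"football"}, {"physics"}]
pred_sets = [{"tennis"}, {"physics"}]

flat_score = micro_f1(true_sets, pred_sets)
hier_score = hierarchical_f1(true_sets, pred_sets, parent)
print(flat_score, hier_score)
```

The flat score treats predicting `tennis` for a `football` document as a complete miss, while the hierarchical score gives partial credit because both categories share the ancestor `sports`.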

Best results:
[image unavailable: external image transfer failed]

References (papers on the algorithms mentioned in the paper)

[1] Christophe Brouard. Echo at the LSHTC Pascal Challenge 2. PASCAL Workshop on Large-Scale Hierarchical Classification, ECML/PKDD 2011, pages 49-57, 2011.
[2] Xiaogang Han, Shaohua Li, and Zhiqi Shen. A k-NN method for large scale hierarchical text classification at LSHTC3. Discovery Challenge Workshop on Large Scale Hierarchical Classification, ECML/PKDD 2012, 2012.
[3] Aris Kosmopoulos, Ioannis Partalas, Eric Gaussier, Georgios Paliouras, and Ion Androutsopoulos. Evaluation measures for hierarchical classification: a unified view and novel approaches. Data Mining and Knowledge Discovery, pages 1-46, 2014.
[4] Dong-Hyun Lee. Multi-stage Rocchio classification for large-scale multilabeled text data. Discovery Challenge Workshop on Large Scale Hierarchical Classification, ECML/PKDD 2012, 2012.
[5] Xiao-Lin Wang, Hai Zhao, and Bao-Liang Lu. A meta-top-down method for large-scale hierarchical classification. Knowledge and Data Engineering, IEEE Transactions on, 26(3):500-513, March 2014.
[6] Omid Madani and Jian Huang. Large-scale many-class prediction via flat techniques. In Large-Scale Hierarchical Classification Workshop of ECIR, 2010.
[7] Youdong Miao and Xipeng Qiu. Hierarchical centroid-based classifier for large scale text classification. Large Scale Hierarchical Text Classification (LSHTC) Pascal Challenge, 18, 2009.
[8] Antti Puurula and Albert Bifet. Ensembles of sparse multinomial classifiers for scalable text classification. Discovery Challenge Workshop on Large Scale Hierarchical Classification, ECML/PKDD 2012, 2012.
[9] Yutaka Sasaki and Davy Weissenbacher. TTI's system for the LSHTC3 challenge. Discovery Challenge Workshop on Large Scale Hierarchical Classification, ECML/PKDD 2012, 2012.
[10] Grigorios Tsoumakas and Ioannis Vlahavas. Random k-labelsets: An ensemble method for multilabel classification. In Machine Learning: ECML 2007, volume 4701 of Lecture Notes in Computer Science, pages 406-417. 2007.
[11] Xiao-Lin Wang, Hai Zhao, and Bao-Liang Lu. Enhance k-nearest neighbour algorithm for large-scale multi-labeled hierarchical classification. PASCAL Workshop on Large-Scale Hierarchical Classification, ECML/PKDD 2011, pages 58-67, 2011.
[12] Yiming Yang and Xin Liu. A re-examination of text categorization methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '99, pages 42-49. ACM Press, 1999.