RankLib parameters -metric2t and -metric2T cannot take effect at the same time

Discovering the problem

The official sample

I recently started using RankLib and ran the sample from the official tutorial, but the result didn't look right. The command was:

java -jar bin/RankLib.jar -train MQ2008/Fold1/train.txt -test MQ2008/Fold1/test.txt -validate MQ2008/Fold1/vali.txt -ranker 6 -metric2t NDCG@10 -metric2T ERR@10 -save mymodel.txt

Output

Here is the output:

gwlin@bjs1:~/extend/dev/ranklib-proj/trunk$ java -jar bin/RankLib.jar -train MQ2008/Fold1/train.txt -test MQ2008/Fold1/test.txt -validate MQ2008/Fold1/vali.txt -ranker 6 -metric2t NDCG@10 -metric2T ERR@10 -save model/ar_mymodel.txt

Discard orig. features
Training data:  MQ2008/Fold1/train.txt
Test data:      MQ2008/Fold1/test.txt
Validation data:        MQ2008/Fold1/vali.txt
Feature vector representation: Dense.
Ranking method: LambdaMART
Feature description file:       Unspecified. All features will be used.
Train metric:   ERR@10   ### Note this line: according to the command line it should be NDCG@10
Test metric:    ERR@10
Highest relevance label (to compute ERR): 4
Feature normalization: No
Model file: model/ar_mymodel.txt

[+] LambdaMART's Parameters:
No. of trees: 1000
No. of leaves: 10
No. of threshold candidates: 256
Min leaf support: 1
Learning rate: 0.1
Stop early: 100 rounds without performance gain on validation data

Reading feature file [MQ2008/Fold1/train.txt]... [Done.]
(471 ranked lists, 9630 entries read)
Reading feature file [MQ2008/Fold1/vali.txt]... [Done.]
(157 ranked lists, 2707 entries read)
Reading feature file [MQ2008/Fold1/test.txt]... [Done.]
(156 ranked lists, 2874 entries read)
Initializing... [Done]
---------------------------------
Training starts...
---------------------------------
#iter   | ERR@10-T  | ERR@10-V  |
---------------------------------
1       | 0.0921    | 0.0897    |
2       | 0.0949    | 0.0944    |
3       | 0.0947    | 0.0956    |
4       | 0.0949    | 0.0953    |
5       | 0.0947    | 0.0971    |
6       | 0.0951    | 0.0964    |
......
138     | 0.1222    | 0.0941    | 
---------------------------------
Finished sucessfully.
ERR@10 on training data: 0.1062
ERR@10 on validation data: 0.0999
---------------------------------
ERR@10 on test data: 0.0979

Model saved to: model/ar_mymodel.txt

Analyzing the problem

Official parameter documentation

That shouldn't happen. Here is what the parameter documentation says:

[ -metric2t <metric> ]  Metric to optimize on the training data. Supported: MAP, NDCG@k, DCG@k, P@k, RR@k, ERR@k (default=ERR@10)
[ -metric2T <metric> ]  Metric to evaluate on the test data (default to the same as specified for -metric2t)

Into the source code

I was still convinced my settings were correct, so I had no choice but to read the source!
The source comes from:

svn checkout svn://svn.code.sf.net/p/lemur/code/RankLib/trunk

The main file is trunk/src/ciir/umass/edu/eval/Evaluator.java. The code related to -metric2t and -metric2T is as follows:

for(int i=0;i<args.length;i++)  // L222
{
    if (args[i].equalsIgnoreCase ("-train"))
        trainFile = args[++i];
    else if (args[i].equalsIgnoreCase ("-ranker"))
        rankerType = Integer.parseInt(args[++i]);
    else if (args[i].equalsIgnoreCase ("-feature"))
        featureDescriptionFile = args[++i];
    else if (args[i].equalsIgnoreCase ("-metric2t"))
        trainMetric = args[++i];
    else if (args[i].equalsIgnoreCase ("-metric2T"))
        testMetric = args[++i];
    // ...
}
// ...
if(testMetric.compareTo("")==0) // L414
    testMetric = trainMetric;

System.out.println("");
System.out.println((keepOrigFeatures)?"Keep orig. features":"Discard orig. features");
Evaluator e = new Evaluator(rType2[rankerType], trainMetric, testMetric);
// ...
if(trainFile.compareTo("")!=0)  // L421
{
    // ...
    System.out.println("Train metric:\t" + trainMetric); // L450
    System.out.println("Test metric:\t" + testMetric);
    // ...
}
// ...

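Sharp-eyed readers have probably spotted it already. Yes, the culprit is args[i].equalsIgnoreCase("-metric2t"). equalsIgnoreCase, argh! Because the comparison ignores case, the argument -metric2T also matches the -metric2t branch, which comes first in the else-if chain. So ERR@10 overwrites trainMetric, the -metric2T branch is never reached, testMetric stays empty, and the empty-string check afterwards copies trainMetric into it. That is why both metrics end up as ERR@10. Before spotting this I had even checked whether some function that received the variable was changing its value, and spent ages debugging. I come from C++ and didn't know how to debug Java, so I had to learn jdb on the spot...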
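To see the effect in isolation, here is a minimal, self-contained reproduction of the else-if chain above. The class MetricArgsBug and its parse helper are hypothetical, not RankLib code; only the two comparisons and the empty-string fallback mirror the excerpt, and the initial trainMetric value of ERR@10 is assumed from the documented default.

```java
import java.util.Arrays;

// Hypothetical reproduction of the -metric2t / -metric2T parsing in Evaluator.java.
// Only the comparison logic mirrors the excerpt; names are illustrative.
class MetricArgsBug {
    // Returns {trainMetric, testMetric} after parsing, using equalsIgnoreCase like the original.
    static String[] parse(String[] args) {
        String trainMetric = "ERR@10"; // assumed: documented default for -metric2t
        String testMetric = "";        // empty means "same as -metric2t"
        for (int i = 0; i < args.length; i++) {
            if (args[i].equalsIgnoreCase("-metric2t"))
                trainMetric = args[++i];   // also matches "-metric2T"
            else if (args[i].equalsIgnoreCase("-metric2T"))
                testMetric = args[++i];    // never reached: the branch above always wins
        }
        if (testMetric.compareTo("") == 0)
            testMetric = trainMetric;      // fallback copies the (overwritten) train metric
        return new String[] { trainMetric, testMetric };
    }

    public static void main(String[] args) {
        String[] cli = { "-metric2t", "NDCG@10", "-metric2T", "ERR@10" };
        // Prints [ERR@10, ERR@10]: the second option overwrote trainMetric
        System.out.println(Arrays.toString(parse(cli)));
    }
}
```

Running main prints [ERR@10, ERR@10], which matches the "Train metric: ERR@10" and "Test metric: ERR@10" lines in the first run.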

Fixing the problem

Replacing equalsIgnoreCase with equals in these comparisons fixes it. I tested it myself, hooray! Here is the result:

gwlin@bjs1:~/extend/dev/ranklib-proj/trunk$ java -jar bin/RankLib.jar -train MQ2008/Fold1/train.txt -test MQ2008/Fold1/test.txt -validate MQ2008/Fold1/vali.txt -ranker 6 -metric2t NDCG@10 -metric2T ERR@10 -save model/ar_mymodel.txt                                                                       

Discard orig. features
Training data:  MQ2008/Fold1/train.txt
Test data:      MQ2008/Fold1/test.txt
Validation data:        MQ2008/Fold1/vali.txt
Feature vector representation: Dense.
Ranking method: LambdaMART
Feature description file:       Unspecified. All features will be used.
Train metric:   NDCG@10
Test metric:    ERR@10
Highest relevance label (to compute ERR): 4
Feature normalization: No
Model file: model/ar_mymodel.txt

[+] LambdaMART's Parameters:
No. of trees: 1000
No. of leaves: 10
No. of threshold candidates: 256
Min leaf support: 1
Learning rate: 0.1
Stop early: 100 rounds without performance gain on validation data

Reading feature file [MQ2008/Fold1/train.txt]... [Done.]            
(471 ranked lists, 9630 entries read)
Reading feature file [MQ2008/Fold1/vali.txt]... [Done.]            
(157 ranked lists, 2707 entries read)
Reading feature file [MQ2008/Fold1/test.txt]... [Done.]            
(156 ranked lists, 2874 entries read)
Initializing... [Done]
---------------------------------
Training starts...
---------------------------------
#iter   | NDCG@10-T | NDCG@10-V | 
---------------------------------
1       | 0.4908    | 0.5351    | 
2       | 0.4949    | 0.5431    | 
3       | 0.4923    | 0.5476    | 
4       | 0.4927    | 0.5462    | 
5       | 0.4948    | 0.5455    | 
6       | 0.4927    | 0.5435    | 
......
135     | 0.6008    | 0.5435    | 
---------------------------------
Finished sucessfully.
NDCG@10 on training data: 0.5322
NDCG@10 on validation data: 0.5477
---------------------------------
ERR@10 on test data: 0.0983

Done!
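For reference, the fix (equals instead of equalsIgnoreCase) can be sketched as follows, assuming the change is limited to the two metric comparisons. MetricArgsFixed is a hypothetical class for illustration, not the patched RankLib source; the ERR@10 starting value is assumed from the documented default.

```java
import java.util.Arrays;

// Hypothetical sketch of the fixed chain: case-sensitive equals() for the two metric flags.
class MetricArgsFixed {
    // Returns {trainMetric, testMetric}.
    static String[] parse(String[] args) {
        String trainMetric = "ERR@10"; // assumed: documented default for -metric2t
        String testMetric = "";        // empty means "same as -metric2t"
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-metric2t"))
                trainMetric = args[++i];
            else if (args[i].equals("-metric2T"))
                testMetric = args[++i];    // now reachable
        }
        if (testMetric.isEmpty())
            testMetric = trainMetric;      // -metric2T still defaults to -metric2t
        return new String[] { trainMetric, testMetric };
    }

    public static void main(String[] args) {
        // Prints [NDCG@10, ERR@10]
        System.out.println(Arrays.toString(parse(new String[] { "-metric2t", "NDCG@10", "-metric2T", "ERR@10" })));
        // Prints [NDCG@10, NDCG@10]
        System.out.println(Arrays.toString(parse(new String[] { "-metric2t", "NDCG@10" })));
    }
}
```

The second call shows that the documented fallback still works. Leaving the other options on equalsIgnoreCase keeps them case-insensitive; only the two flags that differ solely by case need exact matching.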

Takeaways

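  • Learned how to debug Java on Linux (I'll write a proper blog post about it someday when I have time; I took notes, but who knows when "someday" will be, so here is a quick summary)
    • For ant builds, add debug="true" to the javac tag in build.xml
    • Java debug options: java -Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=y -jar bin/RankLib.jar -train MQ2008/Fold1/train.txt -test MQ2008/Fold1/test.txt -validate MQ2008/Fold1/vali.txt -ranker 6 -metric2t NDCG@10 -metric2T ERR@10 -save model/ar_mymodel.txt
    • Attach the client: jdb -attach 127.0.0.1:8787
    • Debugging commands
  • Be careful: my eyeball debugging needs work. I saw "equal" and immediately assumed it was a plain equality check, completely missing the "IgnoreCase". Good grief, I must be blind!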