Learning to rank (learning2rank, L2R), the machine-learning approach to ranking, comes in three main types: pointwise, pairwise, and listwise.
[Ref-1] gives the following:
<Pointwise ranking is analogous to regression>
Pointwise ranking is analogous to regression. Each point has an associated rank score, and you want to predict that rank score. So your labeled data set will have a feature vector and an associated rank score for a given query.
E.g.: {d1, r1} {d2, r2} {d3, r3} {d4, r4}
where r1 > r2 > r3 > r4
<Pairwise ranking is analogous to classification>
Pairwise ranking is analogous to classification. Each data point is associated with another data point, and the goal is to learn a classifier which will predict which of the two is "more" relevant to a given query.
E.g.: {d1 > d2} {d2 > d3} {d3 > d4}
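To make the two data layouts concrete, here is a minimal sketch that builds pointwise and pairwise training examples from the same graded documents. The function names `pointwise_examples` and `pairwise_examples` and the numeric scores are illustrative assumptions, not something taken from [Ref-1]. Note that this sketch enumerates every pair with differing scores, so the adjacent pairs shown in the quote are a subset of its output.

```python
from itertools import combinations


def pointwise_examples(docs, scores):
    """One (doc, score) example per document, as in regression."""
    return list(zip(docs, scores))


def pairwise_examples(docs, scores):
    """One (preferred, other) example per pair of docs with different scores."""
    pairs = []
    for (d_a, s_a), (d_b, s_b) in combinations(zip(docs, scores), 2):
        if s_a > s_b:
            pairs.append((d_a, d_b))
        elif s_b > s_a:
            pairs.append((d_b, d_a))
        # Ties carry no preference, so they are skipped.
    return pairs


# Hypothetical scores satisfying r1 > r2 > r3 > r4 for a single query.
docs = ["d1", "d2", "d3", "d4"]
scores = [4, 3, 2, 1]

point_data = pointwise_examples(docs, scores)
pair_data = pairwise_examples(docs, scores)
```

A regressor would be trained on `point_data`, while a binary classifier would be trained on `pair_data`, where each tuple means "the first doc is more relevant than the second".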
1. Pointwise Approach
1.1 Characteristics
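For pointwise methods, the L2R framework has the following characteristics:
- Input space: each sample is a feature vector built from a single doc (and its corresponding query);
- Output space: each sample is the relevance of a single doc (to its corresponding query);
- Hypothesis space: each element is a scoring function;
- Loss function: measures the difference between the predicted score and the true score of a single doc.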
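The four components above can be sketched in a few lines. This is only an illustration under stated assumptions: the scoring function is taken to be linear, the loss is taken to be squared error, and the toy feature vectors, labels, and the names `score`, `pointwise_loss`, and `fit` are all hypothetical.

```python
def score(w, x):
    """Hypothesis space: a linear scoring function f(x) = w . x (assumed)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pointwise_loss(w, X, y):
    """Mean squared gap between predicted and true score, one term per doc."""
    return sum((score(w, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def fit(X, y, lr=0.1, steps=500):
    """Minimize the pointwise loss with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = score(w, x) - t
            for k, xk in enumerate(x):
                grad[k] += 2.0 * err * xk / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


# Input space: one feature vector per (query, doc) pair (toy values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Output space: one relevance score per doc (toy values).
y = [2.0, 1.0, 3.0, 0.0]

w = fit(X, y)
# At prediction time, docs are ranked by sorting on their predicted scores.
ranking = sorted(range(len(X)), key=lambda i: score(w, X[i]), reverse=True)
```

Each doc contributes its own independent term to the loss. The relative order of docs only enters at the final sorting step, which is what distinguishes pointwise methods from pairwise and listwise ones.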
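A note on how human-annotated labels are converted into the output space of pointwise methods:
- If the annotation is directly a relevance degree s_j, the true label of doc x_j is defined as y_j = s_j
- If the annotation is a pairwise preference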