PU learning
Positive-unlabeled (PU) learning: learning when only positive examples (plus unlabeled data) are available. Typical applications:
- Retrieval
- Inlier-based outlier detection
- One-vs-rest classification, where the negative class is too diverse to be labeled
There are currently two common workarounds:
- Heuristically pick reliable negative samples from the unlabeled data and train a binary classifier on them; the drawback is that performance depends heavily on the prior knowledge used.
- Treat all unlabeled samples as negatives and train a classifier; since these "negatives" still contain positives, the wrong label assignment causes classification errors.
Theoretical analysis
If the class prior of the unlabeled data is known, the PU learning problem can be handled by cost-sensitive learning; in principle, PU classification can be carried out by methods such as a weighted SVM.
Concretely, the unlabeled data are drawn from the mixture
$$P_X = \pi P_1 + (1-\pi) P_{-1}$$
where $\pi$ is the unknown class prior.
The expected weighted misclassification rate in cost-sensitive classification is
$$R(f) := \pi R_1(f) + (1-\pi) R_{-1}(f) = \pi P_1(f(X) \ne 1) + (1-\pi) P_{-1}(f(X) \ne -1) \qquad (1)$$
where $R$ denotes the misclassification rate. The same form holds for PU learning, but $R_{-1}(f)$ has to be recovered from $R_X(f)$:
$$R_X(f) = \pi P_1(f(X)=1) + (1-\pi) P_{-1}(f(X)=1), \quad \text{i.e.,} \quad (1-\pi) R_{-1}(f) = R_X(f) - \pi P_1(f(X)=1) \qquad (2)$$
Here all of $X$ is treated as class $-1$, so predicting $1$ counts as a misclassification. Substituting (2) into (1) gives
$$\begin{aligned}
R(f) &= \pi R_1(f) + (1-\pi) R_{-1}(f) \\
&= \pi R_1(f) + R_X(f) - \pi P_1(f(X)=1) \\
&= \pi R_1(f) + R_X(f) - \pi (1 - R_1(f)) \\
&= 2\pi R_1(f) + R_X(f) - \pi \qquad (3)
\end{aligned}$$
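To make the weighted-SVM route concrete, here is a minimal sketch (an addition to the original note: `X_pos`, `X_unl`, and the prior `pi_p` are assumed inputs, and scikit-learn's `SVC` stands in for the weighted SVM). Equation (3) says that, treating the unlabeled data as negatives, each positive should carry cost proportional to $2\pi / n_p$ and each unlabeled point cost proportional to $1 / n_u$:

```python
import numpy as np
from sklearn.svm import SVC

def weighted_svm_pu(X_pos, X_unl, pi_p, C=1.0):
    """Cost-sensitive SVM for PU data, following eq. (3):
    R(f) = 2*pi*R_1(f) + R_X(f) - pi, i.e. each positive gets weight
    2*pi/n_p and each unlabeled-as-negative point gets weight 1/n_u."""
    n_p, n_u = len(X_pos), len(X_unl)
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(n_p, dtype=int), -np.ones(n_u, dtype=int)])
    clf = SVC(kernel="linear", C=C,
              class_weight={1: 2.0 * pi_p / n_p, -1: 1.0 / n_u})
    return clf.fit(X, y)
```

As discussed below, with a plain hinge loss this weighting alone is not exactly unbiased, because the superfluous penalty term does not vanish.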
Since, with $g$ the decision function ($f(x) = \mathrm{sign}(g(x))$) and $\ell_H$ a surrogate loss,
$$R_1(f) = \mathbb{E}_1[\ell_H(g(X))], \qquad R_X(f) = \pi \mathbb{E}_1[\ell_H(-g(X))] + (1-\pi)\mathbb{E}_{-1}[\ell_H(-g(X))],$$
substituting these into (3) gives
$$J_{PU-H}(g) = \pi \mathbb{E}_1[\ell_H(g(X))] + (1-\pi)\mathbb{E}_{-1}[\ell_H(-g(X))] + \pi \mathbb{E}_1[\ell_H(g(X)) + \ell_H(-g(X))] - \pi$$
An equivalent empirical form is
$$\hat{R}_{pu}(g) = 2\pi_p \hat{R}_p^+(g) + \hat{R}_u^-(g) - \pi_p$$
where $\hat{R}_p^+(g) = \frac{1}{n_p}\sum_{i=1}^{n_p} \ell(g(x_i^p), +1)$ and $\hat{R}_u^-(g) = \frac{1}{n_u}\sum_{i=1}^{n_u} \ell(g(x_i^u), -1)$.
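A rough numpy sketch of this estimator (an addition; `scores_p` / `scores_u` are assumed to hold $g(x)$ evaluated on the positive and unlabeled samples, and `loss` is a margin loss $\ell(z)$ so that $\ell(g(x), +1) = \ell(g(x))$ and $\ell(g(x), -1) = \ell(-g(x))$):

```python
import numpy as np

# Sigmoid loss: satisfies loss(z) + loss(-z) = 1, the constant-sum condition
# that the 2*pi*R_p^+ + R_u^- - pi form (and the discussion below) relies on.
def sigmoid_loss(z):
    return 1.0 / (1.0 + np.exp(z))

def pu_risk(scores_p, scores_u, pi_p, loss=sigmoid_loss):
    """Unbiased PU risk  R_pu(g) = 2*pi_p*R_p^+(g) + R_u^-(g) - pi_p."""
    r_p_plus = np.mean(loss(scores_p))    # \hat R_p^+(g): positives vs. label +1
    r_u_minus = np.mean(loss(-scores_u))  # \hat R_u^-(g): unlabeled vs. label -1
    return 2.0 * pi_p * r_p_plus + r_u_minus - pi_p
```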
The first two terms of $J_{PU-H}(g)$ are the ordinary risk terms; the third is a superfluous penalty. Because of this extra term, directly minimizing $J_{PU-H}(g)$ may fail to recover the optimal classifier; the optimal solution is obtained if and only if $\ell_H(g(X)) + \ell_H(-g(X))$ is a constant.
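One loss that satisfies this condition (an illustrative addition here, following the analysis in the cited du Plessis et al. (2014) paper rather than anything stated in this note) is the ramp loss:

$$\ell_R(z) = \max\!\left\{0, \min\!\left\{1, \tfrac{1-z}{2}\right\}\right\}, \qquad \ell_R(z) + \ell_R(-z) = \tfrac{1-z}{2} + \tfrac{1+z}{2} = 1 \quad (|z| \le 1),$$

and for $|z| > 1$ one of the two terms is $0$ and the other is $1$, so the sum is identically $1$; the last two terms of $J_{PU-H}(g)$ then cancel ($\pi \cdot 1 - \pi = 0$).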
The key, therefore, is choosing an appropriate loss function. In addition, when recovering the $R_{-1}$ term via (2), the quantity $R_X(f) - \pi P_1(f(X)=1)$ can become negative on finite samples, so it is replaced by $\max\{0,\, R_X(f) - \pi P_1(f(X)=1)\}$ to prevent overfitting.
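A small numpy sketch of that correction (names and arguments are mine; this mirrors the non-negative risk estimator of the cited Kiryo & Niu (2017) paper, with the unlabeled-as-negative part clipped at zero):

```python
import numpy as np

def nn_pu_risk(scores_p, scores_u, pi_p, loss=lambda z: 1.0 / (1.0 + np.exp(z))):
    """Non-negative PU risk: the empirical counterpart of
    R_X(f) - pi * P_1(f(X) = 1), namely R_u^-(g) - pi_p * R_p^-(g),
    is clipped at 0 before being added back."""
    r_p_plus = np.mean(loss(scores_p))    # positives scored against +1
    r_p_minus = np.mean(loss(-scores_p))  # positives scored against -1
    r_u_minus = np.mean(loss(-scores_u))  # unlabeled scored against -1
    return pi_p * r_p_plus + max(0.0, r_u_minus - pi_p * r_p_minus)
```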
Experiments
- Treat all unlabeled data as negative samples and train a random forest.
- Randomly draw as many samples from the unlabeled data (treated as negatives) as there are positives, train a classifier, and predict on the remaining samples; repeat several times and average the probabilities (see the sketch after this list).
- PU learning.
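A sketch of the second recipe (an illustration only; `X_pos` and `X_unl` are assumed to be numpy arrays of positive and unlabeled feature vectors, and the random forest and round count are arbitrary choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def average_heldout_scores(X_pos, X_unl, n_rounds=50, seed=0):
    """Repeatedly treat a random, positive-sized subset of the unlabeled data
    as negatives, train a classifier, and average the predicted positive
    probability of the unlabeled points held out in each round."""
    rng = np.random.default_rng(seed)
    n_p, n_u = len(X_pos), len(X_unl)
    prob_sum = np.zeros(n_u)
    counts = np.zeros(n_u)
    for _ in range(n_rounds):
        neg_idx = rng.choice(n_u, size=min(n_p, n_u), replace=False)
        X = np.vstack([X_pos, X_unl[neg_idx]])
        y = np.concatenate([np.ones(n_p), np.zeros(len(neg_idx))])
        clf = RandomForestClassifier(n_estimators=100).fit(X, y)
        held_out = np.setdiff1d(np.arange(n_u), neg_idx)
        if len(held_out):
            prob_sum[held_out] += clf.predict_proba(X_unl[held_out])[:, 1]
            counts[held_out] += 1
    return prob_sum / np.maximum(counts, 1)  # averaged P(y = +1) per unlabeled point
```

Averaging only over the rounds in which a point was held out keeps each score out-of-sample.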
Others
- When using a non-linear model, the unlabeled data can be split into several subsets, each roughly the size of the positive set; train one model per subset and average the resulting probabilities. At the moment this works best for classification without prior knowledge.
These are notes summarizing some recent reading on PU learning; corrections are welcome!
References
- du Plessis, M. C., Niu, G. & Sugiyama, M. Analysis of Learning from Positive and Unlabeled Data. Advances in Neural Information Processing Systems 27, 703–711 (2014).
- du Plessis, M. C., Niu, G. & Sugiyama, M. Convex Formulation for Learning from Positive and Unlabeled Data. ICML (2015).
- Kiryo, R. & Niu, G. Positive-Unlabeled Learning with Non-Negative Risk Estimator. NIPS (2017).