Paper reading summary: Distance Metric Learning for Large Margin Nearest Neighbor Classification
Link: LMNN
This paper presents a method for learning a Mahalanobis distance metric for k-nearest neighbor (kNN) classification. The Mahalanobis distance is classically defined as the distance between a point P and a distribution; here it denotes the family of metrics induced by a positive semidefinite matrix. Rather than assuming the dataset already has a large margin between classes, the method learns the metric so that the k nearest neighbors of each sample share its label while samples from different classes are separated by a large margin.
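In symbols (a standard way of writing the metric the paper learns, with M a positive semidefinite matrix that factors as M = LᵀL for some linear map L):

```latex
% Mahalanobis distance induced by M = L^T L (M positive semidefinite)
d_M(x_i, x_j) = \sqrt{(x_i - x_j)^\top M \,(x_i - x_j)} = \lVert L (x_i - x_j) \rVert_2
```

Taking M as the identity recovers plain Euclidean distance, so the learned metric strictly generalizes the Euclidean baseline discussed next.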
kNN is a well-known classification method built on a distance metric: the distance between samples stands in for how likely they are to share a class. The Euclidean metric is a popular default for kNN, but it does not adapt to the problem being solved; see the baseline sketch below.
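For reference, here is a minimal Euclidean-kNN baseline in scikit-learn (a sketch only; the iris dataset and n_neighbors=3 are placeholder choices, not from the paper):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Plain kNN with the default Euclidean metric -- the baseline LMNN improves on.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)  # metric='minkowski', p=2 => Euclidean
knn.fit(X_tr, y_tr)
print("Euclidean kNN accuracy:", knn.score(X_te, y_te))
```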
So this paper shows how to learn a Mahalanobis distance metric. The method requires no modification for multiway classification and has no explicit dependence on the number of classes. It introduces the concept of target neighbors: for each input x, the k other inputs with the same label that we wish to have minimal distance to x. Without any prior knowledge, the target neighbors can simply be taken as the k nearest neighbors under Euclidean distance. The cost function consists of two competing terms, written out below: the first penalizes large distances between each input and its target neighbors, while the second penalizes small distances between each input and all other inputs with different labels.
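In the paper's notation, with η_ij = 1 iff x_j is a target neighbor of x_i, y_il = 1 iff x_i and x_l share a label, [z]_+ = max(z, 0) the hinge, and c > 0 a trade-off constant, the cost over the linear map L reads:

```latex
\varepsilon(L) = \sum_{ij} \eta_{ij} \,\lVert L(x_i - x_j) \rVert^2
  + c \sum_{ijl} \eta_{ij} (1 - y_{il})
    \big[\, 1 + \lVert L(x_i - x_j) \rVert^2 - \lVert L(x_i - x_l) \rVert^2 \,\big]_+
```

The hinge term is where the large margin comes from: a differently labeled input x_l is only penalized when it comes within one unit of the distance from x_i to one of its target neighbors.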
The model can be reformulated as a semidefinite program (SDP) over M = LᵀL, a convex optimization problem that can be solved efficiently and, being convex, has no spurious local minima.
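To make the two terms concrete, here is a minimal NumPy sketch that evaluates the cost above for a fixed map L (illustration only; the function name lmnn_cost, k=3, and c=0.5 are placeholder choices, and an actual solver would minimize this over L or solve the SDP over M):

```python
import numpy as np

def lmnn_cost(L, X, y, k=3, c=0.5):
    """Evaluate the LMNN cost for a fixed linear map L (hypothetical helper
    for illustration; the paper minimizes over L, or over M = L^T L as an SDP)."""
    Z = X @ L.T                                           # project all inputs
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    n = len(X)
    pull, push = 0.0, 0.0
    for i in range(n):
        same = np.where((y == y[i]) & (np.arange(n) != i))[0]
        diff = np.where(y != y[i])[0]
        # Target neighbors: k nearest same-label points, chosen by plain
        # Euclidean distance in the input space (the paper's default when
        # no prior knowledge is available).
        d0 = ((X[same] - X[i]) ** 2).sum(-1)
        for j in same[np.argsort(d0)[:k]]:
            pull += d2[i, j]                              # pull term
            # Push term: hinge on differently labeled points that come
            # within one unit of the target-neighbor distance.
            push += np.maximum(0.0, 1.0 + d2[i, j] - d2[i, diff]).sum()
    return pull + c * push
```

With L = np.eye(X.shape[1]) this evaluates the cost of the plain Euclidean metric, giving a concrete starting point for gradient-based minimization.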
LMNN classification is especially attractive for problems with hundreds or thousands of classes, since the framework has no explicit dependence on the number of classes and needs no per-class decomposition.