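k-NN handles classification by majority vote: a test text gets the label that occurs most often among its k nearest training texts. First the dataset is converted into a one-hot matrix, then the distance between the test text and every training text is computed. With k = 1, only the smallest distance matters: the test text is assigned the label of that nearest training text. Euclidean distance is used here. The C++ implementation took many rounds of debugging. To inspect the result, every row of the final one-hot matrix was written to a text file, and the comparison result, the distance and the predicted label were appended to the end of each test-text row.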
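The training and test texts look like this. In each row the emotion field is a numeric label followed by the emotion's name, and the remaining fields are the words: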
documentId emotion words
1 5 sad mortar assault leav at least dead
2 4 joy goal delight for sheva
3 4 joy nigeria hostag fear dead is freed
4 3 fear bomber kill shopper
5 6 surprise veget not fruit slow brain declin
6 4 joy pm havana deal a good experi
7 4 joy kate is marri doherti
8 6 surprise nasa revisit life on mar question
9 4 joy happi birthdai ipod
10 4 joy alonso would be happi to retir with three titl
11 4 joy madonna s new tot happi at home in london
12 5 sad nicol kidman ask dad to help stop husband s drink