A Novel Multi-label Classification Based on PCA and ML-KNN

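A new multi-label classification algorithm combining PCA and ML-KNN (PCA-ML-KNN) is proposed: PCA reduces the dimensionality of the dataset and removes redundancy, and ML-KNN then performs the classification. Experiments show that the algorithm outperforms the traditional ML-KNN on the Scene and Enron datasets.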




ICIC Express Letters, ICIC International 2010, ISSN 1881-803X

Volume 4, Number 5, October 2010, pp. 1–6

 

 

A Novel Multi-label Classification Based on PCA and ML-KNN

 

Di Wu, Dapeng Zhang, Fengqin Yang, Xu Zhou and Tieli Sun*

School of Computer Science and Information Technology

Northeast Normal University

Changchun, 130117, P. R. China

suntl@nenu.edu.cn

 

Received December 2010; accepted February 2011

 

Abstract. Multi-label classification problems are omnipresent. ML-KNN is a multi-label lazy learning approach. ML-KNN does not take into account the high dimensionality and redundancy of the dataset, so its classification results are hard to improve further. Principal Component Analysis (PCA) is a popular and powerful technique for feature extraction and dimensionality reduction. In this paper, a novel multi-label classification algorithm based on PCA and ML-KNN (named PCA-ML-KNN) is proposed. Experiments on two benchmark datasets for multi-label learning show that PCA processes the dataset in an optimized manner, eliminating the need for a huge dataset for ML-KNN, and that PCA-ML-KNN achieves better performance than ML-KNN.

Keywords: Multi-label classification, ML-KNN, Dimension reduction, Feature extraction, Principal Component Analysis (PCA)

 

1. Introduction. Multi-label classification is attracting increasing attention and is required by applications in a wide range of fields, such as protein function classification, music categorization and semantic scene classification. During the past decade, several multi-label learning algorithms have been proposed, such as the multi-label decision tree based learning algorithm [1,2], the support vector machine based multi-label learning algorithm [3] and the ML-KNN algorithm [4,5]. ML-KNN, presented by Zhang et al., is derived from the traditional K-nearest neighbor (KNN) algorithm. Several empirical studies have demonstrated that datasets for multi-label classification are bulky, high-dimensional and redundant. These characteristics pose a serious obstacle to any attempt to extract pertinent information, and thus make it difficult to improve multi-label classification algorithms.

PCA is a data analysis technique [6]. It is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The most important application of PCA is to simplify the original data. PCA can effectively identify the most important elements in the dataset and eliminate noise and redundancy. Another advantage of PCA is that it has no parameter restrictions and can be applied in various fields.
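The orthogonal transformation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `pca_fit` and `pca_transform`, the synthetic data and the choice of two retained components are all assumptions made for illustration.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return column means, top principal directions and all singular values."""
    mean = X.mean(axis=0)
    Xc = X - mean  # centre each variable
    # Rows of Vt are orthonormal directions sorted by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:n_components], S

def pca_transform(X, mean, components):
    """Project observations onto the retained principal directions."""
    return (X - mean) @ components.T

# Synthetic data (assumption): the third variable is a noisy copy of the
# first, so the dataset is redundant by construction.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0],
                     base[:, 1],
                     base[:, 0] + 0.01 * rng.normal(size=200)])

mean, components, S = pca_fit(X, n_components=2)
Z = pca_transform(X, mean, components)

# Covariance of the principal components: off-diagonal entries vanish
# up to floating-point error, i.e. the new variables are uncorrelated.
cov = np.cov(Z, rowvar=False)

# Fraction of the total variance kept by the two retained components.
retained = (S[:2] ** 2).sum() / (S ** 2).sum()
```

In this toy case two components keep almost all of the variance, because the third variable carries nearly no information of its own.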

In this paper, a novel multi-label classification algorithm based on PCA and ML-KNN is proposed to improve classification performance. PCA is first adopted to reduce the dimensionality and noise of the dataset. The ML-KNN method is then used for the remaining processing. To verify the effectiveness of PCA-ML-KNN, two datasets, namely Scene and Enron, are used, and the experiments report excellent performance.
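The two-stage procedure (PCA first, then ML-KNN) might look roughly like the following NumPy sketch. It is an illustration under stated assumptions, not the paper's code: the `MLKNN` class follows the commonly described ML-KNN rule (smoothed label priors plus neighbour-count likelihoods, combined by a maximum a posteriori decision), and the synthetic data, `k = 10`, the smoothing value `s = 1` and the number of retained components are all hypothetical choices.

```python
import numpy as np

def knn_indices(A, B, k, exclude_self=False):
    """Indices of the k nearest rows of B for every row of A (Euclidean)."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:  # only valid when A and B are the same set
        np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

class MLKNN:
    """Sketch of ML-KNN; k and the smoothing term s are assumed values."""
    def __init__(self, k=10, s=1.0):
        self.k, self.s = k, s

    def fit(self, X, Y):
        k, s = self.k, self.s
        m, q = Y.shape
        self.X, self.Y = X, Y
        # Smoothed prior probability that an instance carries each label.
        self.prior1 = (s + Y.sum(axis=0)) / (2 * s + m)
        # For each training instance: how many of its k neighbours have each label.
        counts = Y[knn_indices(X, X, k, exclude_self=True)].sum(axis=1)
        c1 = np.zeros((k + 1, q))
        c0 = np.zeros((k + 1, q))
        for j in range(k + 1):
            hit = counts == j
            c1[j] = (hit & (Y == 1)).sum(axis=0)  # has label, j neighbours agree
            c0[j] = (hit & (Y == 0)).sum(axis=0)  # lacks label, j neighbours have it
        self.like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=0))
        self.like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=0))
        return self

    def predict(self, X):
        counts = self.Y[knn_indices(X, self.X, self.k)].sum(axis=1)
        cols = np.arange(self.Y.shape[1])
        p1 = self.prior1 * self.like1[counts, cols]
        p0 = (1 - self.prior1) * self.like0[counts, cols]
        return (p1 > p0).astype(int)  # MAP decision per label

# Synthetic data (assumption): 2 latent factors spread over 10 redundant features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(400, 10))
Y = (latent > 0).astype(int)  # two labels, one per latent factor
X_train, X_test, Y_train, Y_test = X[:300], X[300:], Y[:300], Y[300:]

# Step 1: PCA fitted on the training set only, then applied to both sets.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:2]
Z_train = (X_train - mean) @ components.T
Z_test = (X_test - mean) @ components.T

# Step 2: ML-KNN on the reduced representation.
pred = MLKNN(k=10).fit(Z_train, Y_train).predict(Z_test)
hamming = np.mean(pred != Y_test)  # fraction of wrongly predicted labels
```

Fitting the projection on the training split alone keeps test information out of the feature extraction step; whether the paper proceeds this way is not stated in the excerpt.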

......

*Corresponding author
