Weka中的Correlation based Feature Selection(特征选择)方法简介

本文介绍了Correlation-based Feature Selection (CFS)方法的基本原理及其在机器学习中的应用。CFS通过评估每个特征的预测能力和特征间的冗余度来选择最佳特征子集,偏好与类别高度相关且内部相关性低的特征组合。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

最近看了几篇文章,在机器学习过程中,特征选择方法都用的是Correlation based Feature Selection (CFS),我之前对这个Feature Selection的方法实在不了解,今天简单看了一下。具体而言,实际上就是Explorer界面中“Select Attributes”中的第一个方法“CfsSubsetEval”,具体介绍如下:

========================================

根据属性子集中每一个特征的预测能力以及它们之间的关联性进行评估,单个特征预测能力强且特征子集内的相关性低的子集表现好。

Evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them.Subsets of features that are highly correlated with the class while having low intercorrelation are preferred.

For more information see:

M. A. Hall (1998). Correlation-based Feature Subset Selection for Machine Learning. Hamilton, New Zealand.

这篇文章竟然还是一篇博士论文,链接:

http://www.cs.waikato.ac.nz/~mhall/thesis.pdf

=======================================

这个网页中,有对所有特征选择方法比较详细的介绍:http://www.cnblogs.com/lutaitou/p/5818027.html 就简单总结介绍这么多。

n many data analysis tasks, one is often confronted with very high dimensional data. Feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering, classification and retrieval. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional feature selection methods address this issue by selecting the top ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus can not produce an optimal feature subset. Inspired from the recent developments on manifold learning and L1-regularized models for subset selection, we propose here a new approach, called {\em Multi-Cluster/Class Feature Selection} (MCFS), for feature selection. Specifically, we select those features such that the multi-cluster/class structure of the data can be best preserved. The corresponding optimization problem can be efficiently solved since it only involves a sparse eigen-problem and a L1-regularized least squares problem. It is important to note that MCFS can be applied in superised, unsupervised and semi-supervised cases. If you find these algoirthms useful, we appreciate it very much if you can cite our following works: Papers Deng Cai, Chiyuan Zhang, Xiaofei He, "Unsupervised Feature Selection for Multi-cluster Data", 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'10), July 2010. Bibtex source Xiaofei He, Deng Cai, and Partha Niyogi, "Laplacian Score for Feature Selection", Advances in Neural Information Processing Systems 18 (NIPS'05), Vancouver, Canada, 2005 Bibtex source
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值