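What is feature selection?
Definition
An example illustrating the definition of feature selection
Feature selection methods
Correlation coefficient: the degree of correlation between features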
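As a rough illustration of what a correlation coefficient between two features looks like in practice, the sketch below computes Pearson's r with pandas. The toy DataFrame and its column names (`a`, `b`, `c`) are invented for this example and are not taken from the dataset used later in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "b" is an exact linear function of "a",
# while "c" has no simple linear relationship with "a".
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [3.0, -1.0, 4.0, 1.0, -5.0],
})

# Pairwise Pearson correlation between all columns
corr_matrix = toy.corr(method="pearson")
r_ab = corr_matrix.loc["a", "b"]
r_ac = corr_matrix.loc["a", "c"]

# The same coefficient computed by hand: covariance divided by the
# product of the standard deviations
a = toy["a"].to_numpy()
c = toy["c"].to_numpy()
r_ac_manual = ((a - a.mean()) * (c - c.mean())).sum() / np.sqrt(
    ((a - a.mean()) ** 2).sum() * ((c - c.mean()) ** 2).sum()
)

print(corr_matrix.round(3))
```

A coefficient close to +1 or -1 suggests that two features carry largely the same information, so one of them may be a candidate for removal; a value near 0 indicates little linear relationship.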
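How to perform feature selection in sklearn
Filter methods
Low-variance feature filtering
What is low-variance feature filtering
How to implement low-variance feature filtering in sklearn
Code demonstration
Original data:
Code: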
# -*- coding: utf-8 -*-
"""
@Time : 2021/3/8 19:48
@Author : yuhui
@Email : 3476237164@qq.com
@FileName: 15_删除低方差特征与相关系数.py
@Software: PyCharm
"""
import pandas as pd
from sklearn.feature_selection import VarianceThreshold


def low_variance_feature_filtering():
    """Low-variance feature filtering."""
    # Load the data
    data = pd.read_csv("../data/factor_returns.csv")
    # Drop the first column and the last two columns
    data = data.iloc[:, 1:-2]
    # print(data)
    # Instantiate the transformer
    transfer = VarianceThreshold(threshold=10)
    # Fit and transform: features whose variance does not exceed the threshold are removed
    data_new = transfer.fit_transform(data)
    # Inspect the result
    print(data_new.shape)


if __name__ == '__main__':
    low_variance_feature_filtering()
Output:
D:\Anaconda3\Installation\envs\math\python.exe D:/Machine_Learning/Machine_Learning_1/code/15_删除低方差特征与相关系数.py
(2318, 7)
Process finished with exit code 0
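The printed shape only shows how many columns survived, not which ones. The following minimal sketch, run on a small made-up array rather than on factor_returns.csv, shows how `variances_` and `get_support()` can be used to see which features `VarianceThreshold` keeps. The array values are assumptions chosen so that the variances are easy to check by hand.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy matrix with three features:
#   column 0 is constant            -> variance 0
#   column 1 varies widely          -> variance 125
#   column 2 barely varies          -> variance 0.25
X = np.array([
    [1.0, 10.0, 100.0],
    [1.0, 20.0, 101.0],
    [1.0, 30.0, 100.0],
    [1.0, 40.0, 101.0],
])

selector = VarianceThreshold(threshold=10)
X_new = selector.fit_transform(X)

# Per-feature variances computed during fit
print(selector.variances_)
# Boolean mask of the features that were kept
print(selector.get_support())
print(X_new.shape)
```

With a threshold of 10, only the second column is kept here, since it is the only feature whose variance exceeds the threshold. When the input is a DataFrame, the same mask can be applied to `data.columns` to recover the names of the retained features.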
Additional notes:
- My file layout
- Contents of the original file
index,pe_ratio,pb_ratio,market_cap,return_on_asset_net_profit,du_return_on_equity,ev,earnings_per_share,revenue,total_expense,date,return
0,000001.XSHE,5.9572,1.1818,85252550922.0,0.8008,14.9403,1211444855670.0,2.01,20701401000.0,10882540000.0,2012-01-31,0.027657228229937388
1,000002.XSHE,7.0289,1.588,84113358168.0,1.6463,7.8656,300252061695.0,0.326,29308369223.2,23783476901.2,2012-01-31,0.08235182370820669
2,000008.XSHE,-262.7461,7.0003,517045520.0,-0.5678,-0.5943,770517752.56,-0.006,11679829.03,12030080.04,2012-01-31,0.09978900335112327
3,000060.XSHE,16.476,3.7146,19680455995.0,5.6036,14.617,28009159184.6,0.35,9189386877.65,7935542726.05,2012-01-31,0.12159482758620697
4,000069.XSHE,12.5878,2.5616,41727214853.0,2.8729,10.9097,81247380359.0,0.271,8951453490.28,7091397989.13,2012-01-31,-0.0026808154146886697
5,000100.XSHE,10.796,1.5219999999999998,17206724233.0,2.245,7.7394,66034033386.1,0.0974,43883757748.0,43092263405.0,2012-01-31,0.13795588072275808
6,000402.XSHE,8.1032,1.0078,