- Read the CSV file and get the feature names
import pandas as pd
data = pd.read_csv('train.csv')
feature_names = data.columns[:-1]  # feature names; the last column is the label
- Define a decision tree, or load an already trained one (referred to as clf below)
- Print the feature importances
print('Features sorted by their score:')
# print(clf.feature_importances_)  # this alone prints the importances, but only as bare numbers, which is hard to read
print(sorted(zip(feature_names, map(lambda x: round(x, 4), clf.feature_importances_)), key=lambda x: x[1], reverse=True))
Example output:
Features sorted by their score:
[('duration', 0.4400), ('bytes_out', 0.2105), ('issuer_fields', 0.1228), ('subject_fields', 0.1200), ('num_pkts_out', 0.1067)]
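
The ranking step above can also be wrapped in a small helper. This is only a minimal sketch, not part of the original snippet: `rank_features` and its `ndigits` parameter are hypothetical names, and the sample values are the ones from the example output. It also checks that the two inputs have the same length, because `zip` would otherwise drop the extra entries without any warning.

```python
def rank_features(feature_names, importances, ndigits=4):
    """Pair each feature name with its importance and sort from high to low.

    Hypothetical helper for illustration; not a library function.
    """
    names = list(feature_names)
    # float() turns NumPy scalars into plain Python floats before rounding.
    scores = [round(float(score), ndigits) for score in importances]
    # zip() silently truncates to the shorter input, so compare lengths first.
    if len(names) != len(scores):
        raise ValueError(
            f"{len(names)} feature names but {len(scores)} importances"
        )
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Values taken from the example output, deliberately given out of order.
names = ['bytes_out', 'duration', 'num_pkts_out', 'issuer_fields', 'subject_fields']
scores = [0.2105, 0.4400, 0.1067, 0.1228, 0.1200]

# One feature per line is easier to scan than a single long list.
for name, score in rank_features(names, scores):
    print(f"{name:<16}{score:.4f}")
```

With a fitted tree, the call would presumably be `rank_features(feature_names, clf.feature_importances_)`. A length mismatch then usually means the columns read from the CSV are not the ones the model was trained on.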