How do you judge feature importance?
1. Using a decision tree to judge feature importance
import joblib
_, data = joblib.load(filename="all_data.csblog")
X_train,y_train, _, _ = data
X_train.shape
y_train.shape
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
dtc.fit(X=X_train,y=y_train)
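# Inspect feature importances
import numpy as np
feature_importances = dtc.feature_importances_
# argsort keeps the feature index attached to each score; an in-place sort() would lose it
order = np.argsort(feature_importances)[::-1]
order, feature_importances[order]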
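The ranking step can be tried end to end on a public dataset. The sketch below is only an illustration: it swaps in scikit-learn's built-in iris data for the article's all_data file (an assumption, since that file is not available here) and fixes random_state so the result is repeatable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption): iris replaces the article's own data file.
iris = load_iris()
X, y = iris.data, iris.target

dtc = DecisionTreeClassifier(random_state=0)
dtc.fit(X=X, y=y)

importances = dtc.feature_importances_
# argsort returns indices in ascending order; reverse for most important first.
order = np.argsort(importances)[::-1]
ranked = [(iris.feature_names[i], float(importances[i])) for i in order]

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Pairing each score with its feature name is what makes the ranking usable: the importances sum to 1, so each value reads as that feature's share of the tree's total impurity reduction.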
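2. Evaluating classification models
- Accuracy: number of correct predictions / total number of test samples
Reliable when the classes are balanced, that is, when the sample counts of the different classes do not differ much.
Misleading when the classes are imbalanced, especially for deep learning models (trained with gradient descent).
- Precision: of the samples predicted as a given class, the fraction that truly belong to it
- Recall: of the samples that truly belong to a given class, the fraction the model finds
- F1-score: the harmonic mean of precision and recall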
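To see why accuracy can mislead on imbalanced data, here is a small sketch with made-up labels (hypothetical, not from the article's dataset): 95 negatives, 5 positives, and a degenerate "model" that always predicts the majority class.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A degenerate predictor that always outputs the majority class.
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
# zero_division=0 avoids a warning when no sample is predicted positive.
prec = precision_score(y_true, y_pred, zero_division=0)
rec = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(acc, prec, rec, f1)  # 0.95 0.0 0.0 0.0
```

Accuracy looks excellent at 0.95, yet the predictor never finds a single positive sample, which is exactly what recall and F1 expose.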
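from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score
# Assumption: the last two items of data are the test split (X_test, y_test)
_, _, X_test, y_test = data
y_pred = dtc.predict(X_test)
accuracy_score(y_true=y_test, y_pred=y_pred)
# average=None returns one score per class instead of a single number
recall_score(y_true=y_test, y_pred=y_pred, average=None)
precision_score(y_true=y_test, y_pred=y_pred, average=None)
f1_score(y_true=y_test, y_pred=y_pred, average=None)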
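The average argument decides how per-class scores are combined. The following sketch uses small hand-made labels (hypothetical, chosen so the numbers are easy to check by hand) to compare average=None with "macro" and "weighted", and prints scikit-learn's classification_report as a one-call summary.

```python
import numpy as np
from sklearn.metrics import recall_score, classification_report

# Hypothetical 3-class labels: 4 samples of class 0, 2 of class 1, 4 of class 2.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 0]

# One recall value per class: 3/4, 2/2, 2/4.
per_class = recall_score(y_true, y_pred, average=None)
# Unweighted mean of the per-class values.
macro = recall_score(y_true, y_pred, average="macro")
# Mean weighted by the number of true samples in each class.
weighted = recall_score(y_true, y_pred, average="weighted")

print(per_class, macro, weighted)
# Precision, recall, F1 and support for every class in one table.
print(classification_report(y_true, y_pred))
```

With imbalanced classes, "macro" treats every class equally while "weighted" lets the large classes dominate, so the two can tell quite different stories about the same predictions.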
3. Feature selection
