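Task: perform feature selection in two ways, once with IV (Information Value) and once with a random forest. Then, for each feature set, build models (logistic regression, SVM, decision tree, random forest, GBDT, XGBoost, and LightGBM) and evaluate them.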
Feature selection with IV values:
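For reference, the IV of a discrete feature is usually computed by summing, over its distinct values, (good_rate - bad_rate) * ln(good_rate / bad_rate), where each rate is that value's share of all good or all bad samples. The following is a minimal standalone sketch of that definition, not the code used in this post. The function name `information_value`, the `eps` floor, and the convention that label 1 means "bad" are assumptions made for illustration, and it assumes both classes are present in the data.

```python
from collections import defaultdict
from math import log


def information_value(values, labels, eps=1e-6):
    """IV of one discrete feature (hypothetical helper).

    Assumes label 1 marks a "bad" sample and any other label a "good" one.
    """
    bad = defaultdict(int)
    good = defaultdict(int)
    for v, y in zip(values, labels):
        if y == 1:
            bad[v] += 1
        else:
            good[v] += 1

    bad_total = sum(bad.values())
    good_total = sum(good.values())

    iv = 0.0
    for v in set(bad) | set(good):
        # eps keeps the logarithm finite when a group has no good or no bad samples
        bad_rate = max(bad[v] / bad_total, eps)
        good_rate = max(good[v] / good_total, eps)
        iv += (good_rate - bad_rate) * log(good_rate / bad_rate)
    return iv


# Toy data: value "a" is mostly good, value "b" is mostly bad
toy_values = ["a", "a", "a", "b", "b", "b"]
toy_labels = [0, 0, 1, 1, 1, 0]
print(information_value(toy_values, toy_labels))
```

A feature whose values are spread identically across good and bad samples gets an IV of 0, and larger values indicate stronger separation. The pandas code below applies the same idea to each column of the dataset.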
import pandas as pd
from pandas import DataFrame as df
from numpy import log
import numpy as np
financial = pd.read_csv("data.csv", encoding='gbk')
X = financial.drop(labels="status", axis=1)
y = financial["status"]
# Candidate feature columns: everything except the index, the label, and identifier-like columns
col_list = [col for col in financial.drop(labels=['Unnamed: 0', 'status',
            "source", "bank_card_no", "trade_no", "id_name"], axis=1)]
dataIV = df()
feaIV = []
for col in col_list:
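    # Number of samples for each distinct value of the feature
    subFina = df(financial.groupby(col)[col].count())
    # Number of samples with status == 1 for each distinct value
    subTag = df(financial.groupby(col)["status"].sum())
    data = df(pd.merge(subFina, subTag, how='left', left_index=True, right_index=True))
    # Total number of samples with a non-missing value for this feature
    total = financial[col].count()
    b_total = financial["status"].sum(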