Task4: Modeling and Hyperparameter Tuning


谢xie111

Category: Study Notes. Tags: data mining

Article link: https://blog.youkuaiyun.com/weixin_40299430/article/details/105253525


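1. Linear regression model:
What linear regression requires of the features;
Handling long-tailed distributions;
Understanding the linear regression model;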

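Plotting the data shows that the label (price) follows a long-tailed distribution, which makes it harder to model. Many models assume that the error term is normally distributed, and a long-tailed target violates that assumption.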
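A common remedy is to model the logarithm of the price rather than the price itself, which is presumably what the `train_y_ln` variable used later in these notes holds. The sketch below illustrates the effect on made-up prices (not the competition data); the `skewness` helper is written here for illustration only.

```python
import math

def skewness(values):
    """Population skewness: the third standardized moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5

# Made-up, long-tailed prices for illustration only (not the competition data).
prices = [500, 800, 1200, 1500, 2000, 2600, 3500, 5000, 9000, 40000]

# log1p(x) = log(1 + x) compresses the long right tail.
prices_ln = [math.log1p(p) for p in prices]

# expm1 inverts log1p, so predictions made on the log scale can be mapped back.
restored = [math.expm1(v) for v in prices_ln]

print(f"skewness before: {skewness(prices):.2f}")
print(f"skewness after:  {skewness(prices_ln):.2f}")
```

Predictions from a model trained on the transformed target have to be passed through `expm1` before they are compared with real prices.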
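2. Model performance validation:
Evaluation function vs. objective function;
Cross-validation;
Leave-one-out validation;
Validation for time-series problems;
Plotting the learning curve;
Plotting the validation curve;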
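The validation schemes listed above differ mainly in how they split sample indices into training and validation sets. The following is a minimal pure-Python sketch of that idea, not the code used in these notes; the helper names `k_fold_splits` and `forward_chaining_splits` are made up for illustration. In practice scikit-learn's `KFold`, `LeaveOneOut` and `TimeSeriesSplit` cover the same ground, usually with shuffling options that are omitted here.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size

def forward_chaining_splits(n_samples, n_folds):
    """Yield time-ordered splits: train on the past, validate on the next chunk."""
    chunk = n_samples // (n_folds + 1)
    for i in range(1, n_folds + 1):
        train = list(range(0, i * chunk))
        valid = list(range(i * chunk, (i + 1) * chunk))
        yield train, valid

for train, valid in k_fold_splits(10, 5):
    print("k-fold  train:", train, "valid:", valid)

# Leave-one-out is k-fold with as many folds as there are samples.
loo = list(k_fold_splits(4, 4))

for train, valid in forward_chaining_splits(10, 4):
    print("forward train:", train, "valid:", valid)
```

For time-series data the forward-chaining variant matters because an ordinary k-fold split would let the model train on observations that come after the ones it is validated on.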
3. Embedded feature selection:
Lasso regression;


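from sklearn.linear_model import Lasso
import seaborn as sns
model = Lasso().fit(train_X, train_y_ln)
print('intercept:' + str(model.intercept_))
# Absolute coefficient per feature (keyword arguments work across seaborn versions)
sns.barplot(x=abs(model.coef_), y=continuous_feature_names)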
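To see why Lasso can act as an embedded feature selector, it helps to look at the soft-thresholding step that sets small coefficients to exactly zero. The sketch below is a bare-bones coordinate descent on toy data, written for illustration under the assumption of centered data and no intercept; it is not scikit-learn's implementation, and the function names are made up.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Toy, already-centered data: y depends on the first feature only.
x1 = [1, -1, 1, -1, 2, -2, 2, -2]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]   # unrelated to y
X = [[a, b] for a, b in zip(x1, x2)]
y = [3 * a for a in x1]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
print(weights)
```

On this toy data the weight of the unrelated feature ends up exactly zero, while the weight of the useful feature is shrunk below its true value of 3. That zeroing behavior is what the coefficient bar plot is meant to reveal.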

Ridge regression;

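from sklearn.linear_model import Ridge
model = Ridge().fit(train_X, train_y_ln)
print('intercept:' + str(model.intercept_))
# Ridge shrinks coefficients but, unlike Lasso, rarely sets them exactly to zero
sns.barplot(x=abs(model.coef_), y=continuous_feature_names)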

Decision tree;
4. Model comparison:
Common linear models;

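# Train a linear regression model
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(train_X, train_y_ln)
print('intercept:' + str(model.intercept_))
# Features sorted by coefficient, largest first
sorted(dict(zip(continuous_feature_names, model.coef_)).items(), key=lambda x: x[1], reverse=True)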

Common nonlinear models;
5. Model tuning:
Greedy tuning;

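# Hyperparameter tuning
import numpy as np
import seaborn as sns
from lightgbm import LGBMRegressor
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import cross_val_score
objective = ['regression', 'regression_l1', 'mape', 'huber', 'fair']
num_leaves = [3, 5, 10, 15, 20, 40, 55]
max_depth = [3, 5, 10, 15, 20, 40, 55]
# Left empty in these notes and not used below
bagging_fraction = []
feature_fraction = []
drop_rate = []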

# 1. Greedy search: tune one hyperparameter at a time, keeping the best value found so far
best_obj = dict()
for obj in objective:
    model = LGBMRegressor(objective=obj)
    score = np.mean(cross_val_score(model, X=train_X, y=train_y_ln, verbose=0, cv=5, scoring=make_scorer(mean_absolute_error)))
    best_obj[obj] = score

# Tune num_leaves with the best objective (lowest MAE) held fixed
best_leaves = dict()
for leaves in num_leaves:
    model = LGBMRegressor(objective=min(best_obj.items(), key=lambda x: x[1])[0], num_leaves=leaves)
    score = np.mean(cross_val_score(model, X=train_X, y=train_y_ln, verbose=0, cv=5, scoring=make_scorer(mean_absolute_error)))
    best_leaves[leaves] = score

# Tune max_depth with the best objective and num_leaves held fixed
best_depth = dict()
for depth in max_depth:
    model = LGBMRegressor(objective=min(best_obj.items(), key=lambda x: x[1])[0],
                          num_leaves=min(best_leaves.items(), key=lambda x: x[1])[0],
                          max_depth=depth)
    score = np.mean(cross_val_score(model, X=train_X, y=train_y_ln, verbose=0, cv=5, scoring=make_scorer(mean_absolute_error)))
    best_depth[depth] = score
# Best MAE after each stage; 0.143 is the initial (pre-tuning) score
sns.barplot(x=['0_initial', '1_tuning_obj', '2_tuning_leaves', '3_tuning_depth'], y=[0.143, min(best_obj.values()), min(best_leaves.values()), min(best_depth.values())])

Grid search tuning;
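Grid search differs from the greedy approach in that it scores every combination of candidate values instead of fixing one hyperparameter at a time. The sketch below shows the mechanics with `itertools.product`; `toy_cv_error` is a hypothetical stand-in for a cross-validated MAE, so the numbers it produces mean nothing for the real data. scikit-learn's `GridSearchCV` packages the same loop together with cross-validation.

```python
from itertools import product

def toy_cv_error(num_leaves, max_depth):
    """Hypothetical stand-in for a cross-validated MAE; lower is better."""
    return abs(num_leaves - 20) / 100 + abs(max_depth - 10) / 50 + 0.1

# Candidate values taken from the lists used in the greedy search.
param_grid = {
    "num_leaves": [3, 5, 10, 15, 20, 40, 55],
    "max_depth": [3, 5, 10, 15, 20, 40, 55],
}

# Grid search: score every combination and keep the best one.
names = list(param_grid)
results = []
for values in product(*param_grid.values()):
    params = dict(zip(names, values))
    results.append((toy_cv_error(**params), params))

best_score, best_params = min(results, key=lambda item: item[0])
print(len(results), "combinations tried")
print("best:", best_params, "score:", round(best_score, 3))
```

The cost grows with the product of the list lengths (7 x 7 = 49 fits here, each of which would be a full cross-validation run), whereas the greedy search only needs their sum. The trade-off is that greedy search can miss combinations that only work well together.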
Bayesian tuning;

def rf_cv(num_leaves, max_depth, subsample, min_child_samples):
    # Cast to int in case the optimizer proposes float values
    val = cross_val_score(
        LGBMRegressor(objective='regression_l1',
            num_leaves=int(num_leaves),
            max_depth=int(max_depth),
            subsample=subsample,
            min_child_samples=int(min_child_samples)
        ),
        X=train_X, y=train_y_ln, verbose=0, cv=5, scoring=make_scorer(mean_absolute_error)
    ).mean()
    # Return 1 - MAE so that an optimizer that maximizes its objective minimizes the MAE
    return 1 - val

Plotting the learning curve and validation curve
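A learning curve plots training and validation error against the number of training samples, while a validation curve plots them against the value of one hyperparameter. The sketch below computes the points of a learning curve on synthetic data with a hand-written least-squares line. It is only meant to show what is being measured; the data, the helpers `fit_line` and `mae`, and the sizes are all made up. scikit-learn offers `learning_curve` and `validation_curve` in `sklearn.model_selection` for real models.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mae(model, xs, ys):
    """Mean absolute error of a fitted (slope, intercept) pair."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic data: a noisy straight line, split into train and validation parts.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(120)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
train_x, train_y = xs[:80], ys[:80]
valid_x, valid_y = xs[80:], ys[80:]

# Refit on growing subsets of the training data and record both errors.
curve = []
for size in [5, 10, 20, 40, 80]:
    model = fit_line(train_x[:size], train_y[:size])
    curve.append((size,
                  mae(model, train_x[:size], train_y[:size]),
                  mae(model, valid_x, valid_y)))

for size, train_err, valid_err in curve:
    print(f"n={size:3d}  train MAE={train_err:.3f}  valid MAE={valid_err:.3f}")
```

A large, persistent gap between the two errors suggests overfitting, while two high errors that have already converged suggest the model is too simple for the data.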