XGBoost in Practice 1: Boston House Price Prediction

bb8886

Category: Machine Learning Algorithms Explained + Hands-On. Tags: python, machine learning algorithms

Article link: https://blog.youkuaiyun.com/bb8886/article/details/130281549


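The Boston housing dataset contains 506 samples. Each sample has 13 feature variables plus the house price for that area (the median value of owner-occupied homes, in thousands of dollars). The price clearly depends on several of these features. We build the XGBoost model in two different ways.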

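This chapter covers:

1. The two ways of building an XGBoost model and the parameters each requires
2. GridSearchCV parameters in detail
3. How to run cross-validation (CV) with xgboost
4. How to tune xgboost parameters
5. Visualizing xgboost models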

I. Loading the Dataset

1. Imports

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error

2. Read in and display the data

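# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older version.
boston = datasets.load_boston()
# Build a DataFrame with named feature columns and append the target as 'price'.
data = pd.DataFrame(boston.data)
data.columns = boston.feature_names
data['price'] = boston.target
print(data.columns)
print(data.head())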

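Features (13): CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT

First few rows: [output figure not preserved]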

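3. Separate the features and the label

# pop removes the 'price' column from data and returns it as the label.
y = data.pop('price')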

II. Dataset Processing

1. Check for missing values

print(data.isnull().sum())

2. Check the data size

print(data.shape)

3. View descriptive statistics

print(data.describe())

4. Split the dataset

x_train, x_test, y_train, y_test = train_test_split(data, y, test_size=0.2, random_state=14)
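As a quick check of what this split produces, the sketch below applies the same test_size=0.2 and random_state=14 to a synthetic array with the dataset's shape (506 samples, 13 features). The random data is only a stand-in, an assumption for illustration, so that the example runs without the Boston dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): same shape as the Boston features and target.
rng = np.random.default_rng(14)
X = rng.normal(size=(506, 13))
y = rng.normal(size=506)

# Same split settings as in the article: 20% of the samples held out for testing.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=14
)

print(x_train.shape, x_test.shape)  # (404, 13) (102, 13)
```

With 506 samples, scikit-learn rounds the test share up to 102 rows and leaves the remaining 404 for training.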

III. Two Ways of Building an XGBoost Model, plus CV Modeling

1. Using the sklearn API in the XGBoost library (fit and predict)

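Explanation of the xgboost parameters:

Parameters that are not tuned:

'booster': 'gbtree' for a tree model, 'gblinear' for a linear model.

'objective': 'multi:softmax' for multi-class classification, 'binary:logistic' for binary classification, 'reg:squarederror' for regression.

'nthread': controls the number of threads (the sklearn API calls this n_jobs).

'silent': when set to 1, no runtime messages are printed; 0 is usually the better choice. Recent XGBoost versions replace this parameter with 'verbosity'.

Tunable parameters:

'max_depth': the maximum depth of a tree. Increasing it makes the model more complex and more prone to overfitting; depths of 3 to 10 are reasonable.

n_estimators: the number of trees to build. The more trees, the more likely the model is to overfit.

'subsample': the fraction of the training samples used in each iteration, between 0 and 1.

'colsample_bytree': the fraction of features used for each tree; it helps control overfitting.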

'min_child_w
