Predicting House Prices with Decision Trees and Random Forests

import matplotlib.pyplot as plt

import pandas as pd

from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
print(housing.DESCR)

housing.data.shape

housing.data[:5]

from sklearn import tree
dtr = tree.DecisionTreeRegressor(max_depth = 2)
dtr.fit(housing.data[:, [6, 7]], housing.target) 
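
# A plain-text view of the fitted tree (no Graphviz needed); a minimal sketch,
# assuming dtr has just been fit on columns 6 and 7 (Latitude, Longitude) as above.
from sklearn.tree import export_text
print(export_text(dtr, feature_names=list(housing.feature_names[6:8])))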

# To render the tree graphically, Graphviz must be installed first: http://www.graphviz.org/Download..php
import os
os.environ["PATH"] += os.pathsep + 'C:/soft/graphviz/graphviz-2.38/release/bin'  # adjust to your own Graphviz install path

dot_data = \
    tree.export_graphviz(
        dtr,
        out_file = None,
        feature_names = housing.feature_names[6:8],
        filled = True,
        impurity = False,
        rounded = True
    )

#pip install pydotplus
import pydotplus
graph = pydotplus.graph_from_dot_data(dot_data)
graph.get_nodes()[7].set_fillcolor("#FFF2DD")
from IPython.display import Image
Image(graph.create_png())

graph.write_png("dtr_white_background.png")

from sklearn.model_selection import train_test_split
data_train, data_test, target_train, target_test = \
    train_test_split(housing.data, housing.target, test_size = 0.1, random_state = 42)
dtr = tree.DecisionTreeRegressor(random_state = 42)
dtr.fit(data_train, target_train)

dtr.score(data_test, target_test)
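
# Beyond the R^2 returned by .score(), other regression metrics can be checked;
# a minimal sketch, assuming dtr has been fit on data_train as above.
from sklearn.metrics import mean_absolute_error, mean_squared_error
pred = dtr.predict(data_test)
print("MAE:", mean_absolute_error(target_test, pred))
print("MSE:", mean_squared_error(target_test, pred))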

from sklearn.ensemble import RandomForestRegressor
rfr = RandomForestRegressor( random_state = 42)
rfr.fit(data_train, target_train)
rfr.score(data_test, target_test)

# GridSearchCV used to live in sklearn.grid_search; in current scikit-learn it is in sklearn.model_selection
from sklearn.model_selection import GridSearchCV
tree_param_grid = {'min_samples_split': [3, 6, 9], 'n_estimators': [10, 50, 100]}
grid = GridSearchCV(RandomForestRegressor(),param_grid=tree_param_grid, cv=5)
grid.fit(data_train, target_train)
grid.cv_results_, grid.best_params_, grid.best_score_
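
# The per-candidate results are easier to read as a DataFrame; a minimal sketch,
# assuming grid has been fit as above (pd was imported at the top).
cv_results = pd.DataFrame(grid.cv_results_)
print(cv_results[['params', 'mean_test_score', 'std_test_score']]
      .sort_values('mean_test_score', ascending=False))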

rfr = RandomForestRegressor( min_samples_split=3,n_estimators = 100,random_state = 42)
rfr.fit(data_train, target_train)
rfr.score(data_test, target_test)
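
# Which features drive the tuned forest's predictions; a minimal sketch,
# assuming rfr has been fit on all eight features as above.
importances = pd.Series(rfr.feature_importances_, index=housing.feature_names)
print(importances.sort_values(ascending=False))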

### Implementing and Applying Decision Trees and Random Forests for Regression

#### An Introduction to Decision Tree Regression

A decision tree is a supervised learning method suited to both classification and regression tasks. For regression, the tree recursively partitions the input space into a set of rectangular regions and assigns each region an output value, usually the mean of the target values of the training samples that fall into it. This simple mechanism makes decision trees easy to understand and interpret.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
import numpy as np

# Load the California housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train a decision tree regressor
tree_regressor = DecisionTreeRegressor(random_state=42)
tree_regressor.fit(X_train, y_train)

# Predict on the test set
y_pred_tree = tree_regressor.predict(X_test)
print(f"Decision Tree R^2 Score: {np.round(tree_regressor.score(X_test, y_test), 3)}")
```

#### An Overview of Random Forest Regression

A random forest is an ensemble of many decision trees, each grown independently on a different subsample of the data. Given a new observation, every tree produces its own prediction; for regression, the forest's final prediction is the average of those individual predictions. This not only improves predictive performance but also reduces the risk of overfitting[^2].

```python
from sklearn.ensemble import RandomForestRegressor

# Build and train a random forest regressor
forest_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
forest_regressor.fit(X_train, y_train)

# Predict with the random forest
y_pred_forest = forest_regressor.predict(X_test)
print(f"Random Forest R^2 Score: {np.round(forest_regressor.score(X_test, y_test), 3)}")
```

#### Tuning Parameters to Optimize Performance

To get more out of a random forest, hyperparameters such as `n_estimators` (the number of trees in the forest) and the maximum depth of each tree (`max_depth`) can be tuned. Cross-validation is also useful for comparing how different configurations perform, so that the settings best suited to the task at hand can be found[^3], as sketched below.
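As one concrete way to do this, the following minimal sketch uses `GridSearchCV` to compare a small grid of `n_estimators` and `max_depth` values with 5-fold cross-validation, assuming the `X_train`/`X_test` split from the snippets above; the specific grid values are illustrative assumptions, not recommendations.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Illustrative parameter grid; the specific values are assumptions, not prescriptions
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    cv=5,        # 5-fold cross-validation
    n_jobs=-1,   # use all available CPU cores
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("Best CV R^2:", search.best_score_)
print("Test R^2:", search.score(X_test, y_test))
```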