Constructing a Decision Tree and Its Parameters

Tree model parameters:
1. criterion: "gini" or "entropy"
2. splitter: "best" (default) or "random"
3. max_features: None by default (consider all features)
4. max_depth
5. min_samples_split (a node is not split if it contains fewer samples than this)
6. min_samples_leaf (minimum number of samples required at a leaf node; smaller branches are pruned)
7. min_weight_fraction_leaf (minimum weighted fraction of samples required at a leaf)
8. max_leaf_nodes
9. class_weight
10. min_impurity_split (deprecated in recent scikit-learn in favor of min_impurity_decrease)
n_estimators (number of trees, for ensembles such as random forests)
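The parameters above map directly onto scikit-learn's constructor arguments. A minimal sketch of setting them on a classifier (the specific values here are illustrative assumptions, not tuned recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Illustrative settings for the parameters listed above;
# the chosen numbers are assumptions, not tuned values.
clf = DecisionTreeClassifier(
    criterion="gini",        # or "entropy"
    splitter="best",         # or "random"
    max_features=None,       # consider all features at each split
    max_depth=4,             # cap tree depth to limit overfitting
    min_samples_split=10,    # do not split nodes with fewer samples
    min_samples_leaf=5,      # every leaf must keep at least 5 samples
    class_weight=None,
)
X, y = load_iris(return_X_y=True)
clf.fit(X, y)
print(clf.get_depth())  # never exceeds max_depth
```

Each constraint (max_depth, min_samples_split, min_samples_leaf) stops the tree from growing in a different way, which is why they are the usual levers against overfitting.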
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
print(housing.DESCR)
from sklearn import tree
dtr=tree.DecisionTreeRegressor(max_depth=2)
# fit(X, y): X is the data, y is the label
print(dtr.fit(housing.data[:, [6, 7]], housing.target))
# Visualization requires installing graphviz: http://www.graphviz.org/Download..php
dot_data = \
    tree.export_graphviz(
        dtr,
        out_file=None,
        feature_names=housing.feature_names[6:8],  # feature names
        filled=True,
        rounded=True
    )  # returns the .dot source as a string
#pip install pydotplus
import pydotplus
graph=pydotplus.graph_from_dot_data(dot_data)
graph.get_nodes()[7].set_fillcolor("#FFF2DD")
from IPython.display import Image
Image(graph.create_png())
from sklearn.model_selection import train_test_split
data_train,data_test,target_train,target_test=\
train_test_split(housing.data,housing.target,test_size=0.1,random_state=42)
dtr=tree.DecisionTreeRegressor(random_state=42)
dtr.fit(data_train,target_train)
print(dtr.score(data_test,target_test))
from sklearn.ensemble import RandomForestRegressor
rfr=RandomForestRegressor(random_state=42)
rfr.fit(data_train,target_train)
print(rfr.score(data_test,target_test))
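Since the forest's test score depends on n_estimators and the split parameters, a small grid search is the usual next step. A sketch using GridSearchCV with 3-fold cross-validation; it runs on a synthetic regression set so it works standalone (swap in housing.data / housing.target in practice), and the candidate grids are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Small synthetic regression set standing in for the housing data.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0,
                       random_state=42)
data_train, data_test, target_train, target_test = train_test_split(
    X, y, test_size=0.1, random_state=42)

# Illustrative grid; the candidate values are assumptions, not tuned picks.
param_grid = {"min_samples_split": [3, 6, 9],
              "n_estimators": [10, 50]}
grid = GridSearchCV(RandomForestRegressor(random_state=42),
                    param_grid=param_grid, cv=3)
grid.fit(data_train, target_train)
print(grid.best_params_)
print(grid.best_estimator_.score(data_test, target_test))
```

grid.best_estimator_ is already refit on the full training set with the best parameters, so it can be scored on the held-out test split directly.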
This post walked through how a decision tree is constructed, focusing on the key parameters and their effects: the split criterion (Gini index or entropy), the split strategy, the maximum number of features, maximum depth, minimum samples to split a node, minimum samples per leaf, minimum weighted fraction at a leaf, maximum number of leaf nodes, class weights, and the impurity threshold. Understanding and tuning these parameters is essential for building an effective decision tree model.