Decision Trees: Visualization with Chinese Labels

ning_ww

Article link: https://blog.youkuaiyun.com/bb_sy_w/article/details/107520928

Decision Trees

Regression

The data comes from the Kaggle House Prices competition.

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

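# Load the dataset
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
# The only feature used here is the row index, so each tree fits SalePrice against row position
train_x = np.arange(len(train)).reshape(-1, 1)
train_y = train.iloc[:, -1]  # the last column is SalePrice
test_x = np.arange(len(test)).reshape(-1, 1)
# reshape matters: scikit-learn expects X as a 2-D array of shape (n_samples, n_features)

# Fit three trees of increasing depth
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_3 = DecisionTreeRegressor(max_depth=10)

regr_1.fit(train_x, train_y)
regr_2.fit(train_x, train_y)
regr_3.fit(train_x, train_y)

# Predict
y1 = regr_1.predict(test_x)
y2 = regr_2.predict(test_x)
y3 = regr_3.predict(test_x)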

# Plot the results
plt.figure()
plt.scatter(np.arange(100), train['SalePrice'][:100], s=20, edgecolor="black",
            c="darkorange", label="data")
plt.plot(np.arange(100), y1[:100], color="cornflowerblue", label="max_depth=2", linewidth=2)
plt.plot(np.arange(100), y2[:100], color="yellowgreen", label="max_depth=5", linewidth=2)
plt.plot(np.arange(100), y3[:100], color="red", label="max_depth=10", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()
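
The reshape step in the loading code is easy to overlook, so here is a minimal sketch of what goes wrong without it. The toy arrays are made up for illustration and are not the Kaggle data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (an assumption for illustration only)
x_1d = np.arange(10)             # shape (10,): a flat vector
y = np.arange(10, dtype=float)

# Fitting on a 1-D X is rejected by scikit-learn's input validation
try:
    DecisionTreeRegressor().fit(x_1d, y)
    fit_1d_failed = False
except ValueError as err:
    fit_1d_failed = True
    print("1-D input rejected:", type(err).__name__)

# reshape(-1, 1) turns the vector into one column: 10 samples, 1 feature
x_2d = x_1d.reshape(-1, 1)
model = DecisionTreeRegressor().fit(x_2d, y)
print(x_2d.shape)
```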

As max_depth increases, the tree fits the training data more and more closely, but it also becomes more prone to overfitting.
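
One way to check the overfitting claim is to hold out part of the data and compare training error with validation error at each depth. The sketch below uses synthetic data (a noisy sine curve, which is an assumption for illustration and not the Kaggle file) and the same three depths as above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (assumed for illustration)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# Hold out 30% of the samples to measure generalization
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

train_mse, val_mse = {}, {}
for depth in (2, 5, 10):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_mse[depth] = mean_squared_error(y_train, model.predict(X_train))
    val_mse[depth] = mean_squared_error(y_val, model.predict(X_val))
    print(f"max_depth={depth}: train MSE={train_mse[depth]:.3f}, "
          f"validation MSE={val_mse[depth]:.3f}")
```

If the training error keeps falling while the validation error stops improving or rises, the deeper tree is likely memorizing noise rather than learning the trend.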

Classification

Here we experiment with one of scikit-learn's built-in datasets.

from sklearn import tree
import graphviz
from sklearn import datasets

# Load the built-in dataset
iris = datasets.load_iris()

# Instantiate the classifier
clf = tree.DecisionTreeClassifier()

# Train
clf.fit(iris.data, iris.target)

# Draw the decision tree
dot_data = tree.export_graphviz(clf, out_file=None)
grap = graphviz.Source(dot_data)
grap.render('iris')

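An error you may run into:
graphviz.backend.ExecutableNotFound: failed to execute ['dot', '-Tpdf', '-O', 'iris'], make sure the Graphviz executables are on your systems' PATH
Solution: install Graphviz
The exe installer downloaded from the official site did not seem to work for me, so here is a version that does: graphviz-2.38.msi
Link: netdisk link
Extraction code: ei92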
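
Before rendering, it can help to confirm that the dot executable is actually reachable. The following is a minimal sketch using only the standard library; the helper name ensure_dot_on_path and its extra_dir argument are hypothetical, and any directory you pass in should be your own Graphviz bin folder.

```python
import os
import shutil


def ensure_dot_on_path(extra_dir=None):
    """Return True if the Graphviz `dot` executable can be found.

    extra_dir is a hypothetical argument: a Graphviz bin directory to
    append to PATH when `dot` is not found on the first lookup.
    """
    if shutil.which("dot") is None and extra_dir:
        os.environ["PATH"] += os.pathsep + extra_dir
    return shutil.which("dot") is not None


dot_available = ensure_dot_on_path()
print("dot found on PATH:", dot_available)
```

If this prints False after installation, the install directory is probably missing from PATH, which is the situation the error message describes.
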
You can also pass extra arguments to export_graphviz when building dot_data to make the tree easier to read.

# Label the tree nodes with the dataset's built-in feature_names
dot_data = tree.export_graphviz(clf,out_file = None,feature_names = iris.feature_names)
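
If Graphviz is not available yet, a plain-text dump is a quick way to confirm that the feature names are picked up. This sketch uses tree.export_text, which is a different function from the export_graphviz call above; random_state=0 is added here only to make the output repeatable.

```python
from sklearn import datasets, tree

iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# export_text returns the fitted tree as an indented text report
text_tree = tree.export_text(clf, feature_names=list(iris.feature_names))
print(text_tree)
```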

How to display custom Chinese feature names in the decision tree

feature_names = ['花萼的长度 (cm)',
  '花萼的宽度 (cm)',
  '花瓣的长度 (cm)',
  '花瓣的宽度 (cm)']
  
# Draw the decision tree
dot_data = tree.export_graphviz(clf,out_file = None,feature_names = feature_names)
# grap = graphviz.Source(dot_data)
# grap.render('iris2')

The Chinese text comes out garbled. Here is a fix:
In the original .py file, write the DOT source to a text file:

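# Raw string so the backslash is kept literally instead of starting an escape sequence
file_path = r"\决策树iris.text"
with open(file_path, 'w', encoding='utf-8') as f:
  f.write(dot_data)  # dot_data is a single string, so write() is enough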

Open the generated text file and expand [shape=box] to [shape=box fontname="Microsoft YaHei"].
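
The manual edit can also be done in code. The sketch below is one possible approach and not part of the original workflow: the helper name set_node_font is hypothetical, and it assumes the DOT source contains a `node [shape=box` line, which may or may not already carry a fontname attribute depending on the scikit-learn version.

```python
import re


def set_node_font(dot_source, font="Microsoft YaHei"):
    """Give the default node style a font that contains Chinese glyphs."""
    # Drop any existing fontname attribute (quoted or unquoted)
    cleaned = re.sub(r',?\s*fontname=("[^"]*"|\w+)', "", dot_source)
    # Add the chosen font to the first node default declaration
    return cleaned.replace(
        "node [shape=box", f'node [shape=box, fontname="{font}"', 1
    )


# Two sample node lines, with and without a preset font
plain = set_node_font("node [shape=box] ;")
preset = set_node_font('node [shape=box, fontname="helvetica"] ;')
print(plain)
print(preset)
```

Newer scikit-learn releases also document a fontname argument on export_graphviz itself; if your installed version has it, passing the font there may avoid the text-file round trip altogether.
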
Create a new .py file that reads the text file and renders a decision tree image with the Chinese characters intact.

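import os
# Raw string: in a normal string, "\b" in "\bin" would be read as a backspace character.
# Adjust the directory to wherever Graphviz is installed on your machine.
os.environ['PATH'] += os.pathsep + r"D:\graphviz\bin"
import pydotplus

file_path = r"\决策树iris.text"
with open(file_path, 'r', encoding='utf-8') as f:
  dot_data = f.read()

grap = pydotplus.graph_from_dot_data(dot_data)
grap.write_png("中文决策树iris.png")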

In the same way, you can supply your own target_names:

target_names = ['山鸢尾', '杂色鸢尾', '韦尔吉鸢尾']

dot_data = tree.export_graphviz(clf, out_file=None, feature_names=feature_names, class_names=target_names)
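
Custom class names are matched to classes by position, in ascending order of the class labels, so it is worth checking the pairing before rendering. This is a minimal sketch; label_map is a hypothetical helper variable, and random_state=0 is added only for repeatability.

```python
from sklearn import datasets, tree

iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

target_names = ['山鸢尾', '杂色鸢尾', '韦尔吉鸢尾']

# Pair each fitted class label with its built-in name and the custom name
label_map = {
    int(label): (str(iris.target_names[label]), target_names[position])
    for position, label in enumerate(clf.classes_)
}
for label, (english, chinese) in label_map.items():
    print(label, english, chinese)
```

If the list were given in a different order, the tree would still render without complaint but the leaves would carry the wrong species names.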

Other options: color fill, rounded corners, and special characters

dot_data = tree.export_graphviz(clf,out_file = None,feature_names = feature_names,
                                class_names=target_names,filled = True,rounded=True,special_characters=True)