Classification Algorithms (Decision Tree, SVM, Random Forest, Logistic Regression)

License: CC 4.0 BY-SA

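This article implements decision tree, SVM, random forest, and logistic regression classifiers in Python and visualizes them on the iris dataset. It shows entropy as the decision tree's splitting criterion, linear and nonlinear SVM decision boundaries, the classification results of a random forest, and the class probabilities produced by logistic regression. It also discusses how each algorithm deals with overfitting and underfitting.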
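To make the entropy splitting criterion concrete, here is a minimal sketch of how entropy and information gain could be computed for one candidate split on petal length. The helper names `entropy` and `information_gain` and the threshold 2.45 are illustrative assumptions, not part of scikit-learn or of the original post.

```python
import numpy as np
from sklearn.datasets import load_iris


def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(feature, labels, threshold):
    """Entropy reduction obtained by splitting on feature <= threshold."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    # Weighted average entropy of the two child nodes
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - child


iris = load_iris()
petal_length = iris.data[:, 2]
y = iris.target

parent_entropy = entropy(y)
# 2.45 is an example threshold chosen for illustration only
gain = information_gain(petal_length, y, threshold=2.45)
print(parent_entropy, gain)
```

A tree built with `criterion='entropy'` prefers the split with the largest information gain at each node. This sketch evaluates a single threshold rather than searching over all of them.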

The code is posted directly below, with the comments included inline.

Reference: Python Machine Learning

######### Decision tree output

# -*- coding: utf-8 -*-
"""
Created on Sat May 26 15:07:33 2018

@author: hu
"""
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
import numpy as np
from sklearn import datasets

# Load the data
iris = datasets.load_iris()

# Use two features: petal length and petal width (columns 2 and 3)
X = iris.data[:, [2, 3]]

# Class labels
y = iris.target

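# Import the train/test splitting helper
# (sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection)
from sklearn.model_selection import train_test_split

# Split into training and test sets; the test set is 30% of the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)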

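# Feature scaling (standardization)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)  # estimate each feature's mean and standard deviation from the training set
X_train_std = sc.transform(X_train)  # standardize the training set with those statistics
X_test_std = sc.transform(X_test)  # standardize the test set with the same training statistics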

from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt

# Visualization helper: plot the decision regions of a fitted classifier
def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.02):

    # set up marker generator and color map (one color per class)
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # plot all samples, one marker and color per class
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, color=cmap(idx),
                    marker=markers[idx], label=cl)

    # highlight test samples with hollow circles
    if test_idx is not None:
        X_test = X[test_idx, :]
        plt.scatter(X_test[:, 0], X_test[:, 1],
                    facecolors='none', edgecolors='black',
                    alpha=1.0, linewidths=1, marker='o',
                    s=55, label='test set')

############ An alternative suited to large datasets
#from sklearn.linear_model import SGDClassifier
#svm=SGDClassifier(loss='hinge')

###########
from sklearn.tree import DecisionTreeClassifier
tree= DecisionTreeClassifier(criterion='entropy',max_depth=3,random_state=0)
tree.fit(X_train_std,y_train)

# SVM model
#svm = SVC(kernel='linear',C=1.0,random_state= 0 )
#svm.fit(X_train_std,y_train)
#plot_decision_regions(X_combined_std,y_combined,classifier=svm, test_idx=range(105,150))
#plt.xlabel('petal length [standardized]')
#plt.ylabel
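
For a quick side-by-side check of the four classifiers named in the title, the following self-contained sketch fits each one on the standardized petal features and prints its test accuracy, then shows the class probabilities that logistic regression assigns to one test sample. The decision tree and linear SVM settings are taken from the listing above. The random forest and logistic regression hyperparameters (`n_estimators=10`, `C=1000.0`, `max_iter=1000`) are assumptions made for illustration, not values confirmed by this excerpt.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Same data preparation as in the listing: petal length and width, 30% test split
iris = load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

models = {
    'decision tree': DecisionTreeClassifier(
        criterion='entropy', max_depth=3, random_state=0),
    'linear SVM': SVC(kernel='linear', C=1.0, random_state=0),
    # Assumed hyperparameters for the two models below
    'random forest': RandomForestClassifier(
        n_estimators=10, criterion='entropy', random_state=1),
    'logistic regression': LogisticRegression(
        C=1000.0, max_iter=1000, random_state=0),
}

# Fit every model and record its accuracy on the held-out test set
scores = {}
for name, model in models.items():
    model.fit(X_train_std, y_train)
    scores[name] = model.score(X_test_std, y_test)
    print(name, scores[name])

# Class membership probabilities for the first test sample
proba = models['logistic regression'].predict_proba(X_test_std[:1])
print(proba)
```

The knobs that control model complexity here are `max_depth` for the tree and `C` for the SVM and logistic regression. A smaller `C` means stronger regularization, which can reduce overfitting at the risk of underfitting.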