Southwest Jiaotong University [Machine Learning Experiment 9]

License: CC 4.0 BY-SA

Tags: #machine-learning #artificial-intelligence

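Objective

Following the random forest approach, build a bagging ensemble with decision trees as base learners for a multi-class classification task.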
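The core of bagging is the bootstrap replicate: each base learner is trained on rows drawn with replacement from the training set. The following is a minimal sketch of that sampling step only, assuming an illustrative sample count (`n_samples`) and a local generator (`rng`) that are not part of the lab code. It shows that a replicate of size n contains roughly 63% of the distinct original rows, with the rest left out-of-bag.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable (assumption)
n_samples = 10_000               # illustrative size, not the lab dataset

# Draw n_samples row indices with replacement: one bootstrap replicate.
index = rng.integers(0, n_samples, size=n_samples)

# Fraction of distinct original rows that made it into the replicate.
unique_fraction = np.unique(index).size / n_samples
# Rows never drawn are "out-of-bag" for the tree trained on this replicate.
oob_fraction = 1.0 - unique_fraction

print(f"distinct rows in replicate: {unique_fraction:.3f}")
print(f"out-of-bag rows:            {oob_fraction:.3f}")
print(f"theoretical limit 1 - 1/e:  {1 - np.exp(-1):.3f}")
```

Because every tree sees a different replicate, the trees disagree on some inputs, and it is this diversity that the majority vote exploits.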

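Requirements

Implement a random forest model to classify the handwritten digit recognition dataset. Use decision trees as the base models, information entropy as the splitting criterion, and a random feature subset of size 50. Set the number of trees T to 1, 2, …, 20 in turn, compute the random forest's accuracy on the test set, and plot how the accuracy changes as the number of base models grows.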
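To make the splitting criterion concrete, here is a small sketch of Shannon entropy and the information gain of a binary split. The helper names `entropy` and `information_gain` are hypothetical and written for illustration; the lab code itself delegates this computation to sklearn through `criterion='entropy'`.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, mask):
    """Entropy reduction from splitting labels into labels[mask] and labels[~mask]."""
    labels = np.asarray(labels)
    mask = np.asarray(mask, dtype=bool)
    left, right = labels[mask], labels[~mask]
    if left.size == 0 or right.size == 0:
        return 0.0  # a split that sends everything to one side gains nothing
    n = labels.size
    weighted = (left.size / n) * entropy(left) + (right.size / n) * entropy(right)
    return entropy(labels) - weighted

y = np.array([0, 0, 1, 1])
print(entropy(y))                                          # two balanced classes
print(information_gain(y, [True, True, False, False]))     # split separates the classes
print(information_gain(y, [True, False, True, False]))     # split leaves both sides mixed
```

A tree grown with this criterion picks, at each node, the candidate split with the largest gain; restricting the candidates to a random subset of features is what distinguishes a random forest tree from a plain bagged tree.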

Environment

Python, numpy, matplotlib, sklearn

Code

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
from sklearn.tree import DecisionTreeClassifier

# Use the Tk backend so that plt.show() opens an interactive window
matplotlib.use("TkAgg")

# Majority vote over the base models' predictions
def vote(predictions_matrix):
    n_samples = predictions_matrix.shape[0]
    final_predictions = np.zeros(n_samples, dtype=int)

    for i in range(n_samples):
        row = predictions_matrix[i]
        # Find every predicted class in this row and how often it occurs
        unique, counts = np.unique(row, return_counts=True)
        # Pick the most frequent class (on a tie, argmax returns the smallest label)
        final_predictions[i] = unique[np.argmax(counts)]

    return final_predictions

# Load the training data (column 0 is the label, the remaining columns are features)
train_data = np.genfromtxt("experiment_09_training_set.csv", delimiter=",", skip_header=1)
columnOfTrainDataset = train_data.shape[1]

# Load the test data
test_data = np.genfromtxt("experiment_09_testing_set.csv", delimiter=",", skip_header=1)
test_x = test_data[:, 1: columnOfTrainDataset]
test_y = test_data[:, 0]

# Set the maximum number of trees T and the random seed
T = 20
np.random.seed(42)

# Arrays for the plot
x_line = np.arange(1, T+1)
y_line = np.zeros(T)

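# For each ensemble size from 1 to T, train a fresh ensemble and evaluate it
for number in range(1, T+1):
    # Holds the trained decision trees
    models = []
    # Train the base models
    for i in range(number):
        # Bootstrap sample: draw row indices with replacement
        index = np.random.choice(train_data.shape[0], size=train_data.shape[0], replace=True)
        # Select the features and labels with those indices
        train_x = train_data[index, 1:columnOfTrainDataset]
        train_y = train_data[index, 0]
        # Base model: decision tree split on information entropy, 50 random features per split
        tree = DecisionTreeClassifier(criterion='entropy', max_features=50)
        tree.fit(train_x, train_y)
        # Store the model
        models.append(tree)

    # Prediction matrix: one row per test sample, one column per tree
    pred = np.zeros((test_data.shape[0], number))

    # Iterate over the decision trees
    for i, tree in enumerate(models):
        # Predict on the test set
        pred[:, i] = tree.predict(test_x)
    # Combine the predictions by majority vote
    pred_y = vote(pred)
    # Compute the accuracy
    accuracy = np.sum(pred_y == test_y) / test_y.shape[0]
    # Print the result
    print(f"T: {number} -> accuracy: {accuracy:.4f}")
    y_line[number-1] = accuracy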

# Plot accuracy against the number of trees
plt.figure(1)
plt.plot(x_line, y_line, color='b', linewidth=1.5, label='Accuracy line')
plt.xlabel("T", fontsize=12)
plt.ylabel("Accuracy", fontsize=12)
plt.title("Accuracy line", fontsize=14)
plt.legend(loc='upper right', frameon=True)
plt.grid(alpha=0.3, linestyle=':')
plt.ylim(0, 1.1)
ax = plt.gca()
ax.xaxis.set_major_locator(MultipleLocator(2))
plt.tight_layout()
plt.show()
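The script above retrains every tree from scratch for each ensemble size, so it fits 1 + 2 + … + 20 = 210 trees in total. A possible alternative, sketched here under stated assumptions, trains T trees once and keeps running vote counts, so that the accuracy of the first t trees is available after each fit. This changes the experiment slightly: the ensembles are nested rather than independent. Since the lab's CSV files are not available here, the sketch uses sklearn's built-in `load_digits` data (64 features) as a stand-in, and the names `votes`, `rows`, and `accuracies` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): sklearn's 8x8 digits, 64 features, 10 classes.
X, y = load_digits(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

T = 20
rng = np.random.default_rng(42)
classes = np.unique(train_y)          # sorted class labels
n_train = train_x.shape[0]

# votes[i, k] counts how many trees so far predicted classes[k] for test row i.
votes = np.zeros((test_x.shape[0], classes.size), dtype=int)
rows = np.arange(test_x.shape[0])
accuracies = np.zeros(T)

for t in range(T):
    # Bootstrap replicate of the training rows.
    index = rng.integers(0, n_train, size=n_train)
    tree = DecisionTreeClassifier(criterion="entropy", max_features=50, random_state=t)
    tree.fit(train_x[index], train_y[index])

    # Add this tree's vote for every test row.
    pred = tree.predict(test_x)
    votes[rows, np.searchsorted(classes, pred)] += 1

    # Majority vote of the first t + 1 trees; ties go to the smallest label.
    ensemble_pred = classes[np.argmax(votes, axis=1)]
    accuracies[t] = np.mean(ensemble_pred == test_y)
    print(f"T: {t + 1} -> accuracy: {accuracies[t]:.4f}")
```

The `accuracies` array can be passed to the same plotting code in place of `y_line`. The numbers will differ from the article's results because the data and the random draws are different.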

Results and analysis

Accuracy on the test set

T	1	2	3	4	5	6	7	8	9	10
Accuracy	0.7980	0.8022	0.8730	0.9036	0.9163	0.9237	0.9303	0.9373	0.9386	0.9421
T	11	12	13	14	15	16	17	18	19	20
Accuracy	0.9447	0.9457	0.9489	0.9513	0.9500	0.9505	0.9547	0.9553	0.9548	0.9561
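The trend in the table can be summarized with a few lines of arithmetic. The sketch below only reuses the accuracies reported above; the variable names (`error`, `gain`, `best_step`) are illustrative. It computes the test error at both ends of the range, the ensemble size with the largest single-step improvement, and the sizes at which accuracy dipped.

```python
import numpy as np

# Test accuracies for T = 1..20, copied from the table above.
accuracy = np.array([
    0.7980, 0.8022, 0.8730, 0.9036, 0.9163, 0.9237, 0.9303, 0.9373, 0.9386, 0.9421,
    0.9447, 0.9457, 0.9489, 0.9513, 0.9500, 0.9505, 0.9547, 0.9553, 0.9548, 0.9561,
])
T = np.arange(1, accuracy.size + 1)

error = 1.0 - accuracy        # test error rate for each ensemble size
gain = np.diff(accuracy)      # gain[i] is the change when going from T[i] to T[i + 1]

best_step = T[1:][np.argmax(gain)]
print(f"error rate at T=1:  {error[0]:.4f}")
print(f"error rate at T=20: {error[-1]:.4f}")
print(f"largest single-step gain at T={best_step}: {gain.max():.4f}")
print(f"ensemble sizes where accuracy dropped: {T[1:][gain < 0].tolist()}")
```

Most of the improvement arrives within the first ten trees, after which the curve flattens and fluctuates by well under one percentage point. The small gain from T=1 to T=2 has a plausible explanation (an interpretation, not something the article states): with two voters every disagreement is a tie, and `vote` resolves ties in favor of the smaller label, so the second tree adds little.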