Random forest implementation in Python

小贺顶詹姆斯

Article link: https://blog.youkuaiyun.com/hexingwei/article/details/50740404

I. Experimental data

The experimental data is the mushroom dataset from http://sci2s.ugr.es/keel/category.php?cat=clas
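
For illustration, a minimal sketch of reading such a file into a NumPy array. It assumes the KEEL `.dat` layout, in which header lines start with `@` and each record is a comma-separated line with the class as the last field. The function name `load_keel` and the toy lines below are hypothetical and are not part of the original post.

```python
import numpy as np

def load_keel(lines):
    """Parse an iterable of KEEL-style text lines into a 2-D array of strings.

    Assumption: header lines begin with '@' and every other non-empty
    line is one comma-separated record.
    """
    rows = []
    for line in lines:
        line = line.strip()
        # skip blank lines and '@relation' / '@attribute' / '@data' headers
        if not line or line.startswith("@"):
            continue
        rows.append([field.strip() for field in line.split(",")])
    return np.array(rows)

# Toy input (made up for this sketch, not real mushroom records)
toy_lines = [
    "@relation toy",
    "@attribute a {x, b}",
    "@data",
    "x, s, n, e",
    "b, y, w, p",
]
toy_data = load_keel(toy_lines)
```

With a real file, the same function can be fed an open file handle, for example `load_keel(open(path))`, where `path` points to the downloaded `.dat` file.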

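II. Design approach:

1. First, implement a single decision tree algorithm;
2. Set the number of trees to train; the default value is 10;
3. Sample from the training dataset. Here the sample size is half of the original data, and identical records are merged. Then draw half of the features from the sampled data and train the first decision tree;
4. Repeat step 3 to generate the remaining decision trees;
5. Use the generated decision trees to take a majority vote on the test data and decide its class.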
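
Steps 3 and 5 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's code: the helper names `sample_rows_and_features` and `majority_vote` are hypothetical, the class label is assumed to be the last column, and rows are assumed to be drawn with replacement (the post does not say which).

```python
import numpy as np
from collections import Counter

def sample_rows_and_features(data, rng):
    """Draw half the rows, merge duplicates, then keep half the features.

    Assumes the class label is the last column of `data`.
    Returns the reduced array and the indices of the kept feature columns.
    """
    n_rows, n_cols = data.shape
    n_features = n_cols - 1
    # Step 3a: sample half the rows (with replacement: an assumption)
    row_idx = rng.integers(0, n_rows, size=max(1, n_rows // 2))
    # merge identical records (works for numeric or fixed-width string arrays)
    sample = np.unique(data[row_idx], axis=0)
    # Step 3b: choose half of the feature columns without replacement
    feat_idx = np.sort(
        rng.choice(n_features, size=max(1, n_features // 2), replace=False)
    )
    # keep the chosen features plus the label column
    cols = np.append(feat_idx, n_cols - 1)
    return sample[:, cols], feat_idx

def majority_vote(predictions):
    """Step 5: return the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

# Toy data: 8 rows, 4 integer-coded features, label in the last column
toy = np.array([
    [0, 1, 0, 1, 0], [1, 0, 1, 0, 1], [0, 0, 1, 1, 0], [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0], [1, 0, 0, 1, 1], [0, 0, 0, 0, 0], [1, 1, 1, 1, 1],
])
rng = np.random.default_rng(0)
subset, kept = sample_rows_and_features(toy, rng)
vote = majority_vote(["e", "p", "e"])
```

Each tree would be trained on its own `subset` and must remember `kept`, so that at prediction time it looks only at the feature columns it was trained on.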

__author__ = 'hxw'
#-*- coding=utf-8 -*-
import numpy as np
"""
Note: this random forest can only process discrete data.
If you want to process continuous data, you should discretize it before use.
"""
class randomforest():
    def __init__(self,train_data,n_estimators=10):
        self.data=train_data
        self.n_estimators=n_estimators
        self.decision_trees=[]
        # distinct class labels, assuming the label is the last column
        self.labels=np.unique(self.data[:,-1])
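    def cal_entropy(self,y):
        """Return the Shannon entropy (in bits) of the label sequence y."""
        # count how many times each label occurs
        elements={}
        total=len(y)
        for ele in y:
            elements[ele]=elements.get(ele,0)+1
        # entropy = -sum(p * log2(p)) over the observed labels
        entropy=0
        for ele in elements:
            p=elements.get(ele)*1.0/total
            entropy-=p*np.log2(p)
        return entropy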
    def split_data(self,data,i,value):