Decision Trees: The Final Part

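I've finally worked through the code for building and testing decision trees, and I learned a lot from it, so here is a summary. Put simply, a decision tree uses the known attributes of an item to reach a decision about it. How to split the data was covered in the earlier posts, so I won't repeat it here. Those posts never showed how to use the tree we built to classify new data, so here is the classification code: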

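def classify(inputTree, featLabels, testVec):
    # The single root key is the feature this node splits on
    firstStr = list(inputTree.keys())[0]
    secondDict = inputTree[firstStr]
    # Position of that feature in the test vector
    featIndex = featLabels.index(firstStr)
    classLabel = None  # stays None if testVec has a value never seen in training
    for key in secondDict.keys():
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                # Internal node: keep descending
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:
                # Leaf node: this is the class label
                classLabel = secondDict[key]
    return classLabel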

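Hoho, this simple piece of code classifies a given data vector. All it really does is walk down the tree from the root, following the branch that matches each feature value, until it reaches a leaf node.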
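The same traversal can also be written as a plain loop instead of recursion. The sketch below is an illustrative alternative, not the code from this series: the function name `classify_iterative` and its `default` parameter are hypothetical, and the tree is written by hand in the nested-dict format `{feature_label: {feature_value: subtree_or_leaf}}` that `createTree` produces for the sample data.

```python
def classify_iterative(tree, feat_labels, test_vec, default=None):
    """Walk a nested-dict decision tree until a leaf (non-dict) is reached.

    Hypothetical helper: returns `default` when the test vector contains a
    feature value that has no branch in the tree.
    """
    node = tree
    while isinstance(node, dict):
        # Each internal node has exactly one key: the feature it splits on
        feat_label = next(iter(node))
        branches = node[feat_label]
        value = test_vec[feat_labels.index(feat_label)]
        if value not in branches:
            return default  # unseen value: no branch to follow
        node = branches[value]
    return node  # a leaf holds the class label


# Hand-written tree in the same format createTree builds for the sample data
tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
labels = ['no surfacing', 'flippers']
print(classify_iterative(tree, labels, [1, 0]))  # no
print(classify_iterative(tree, labels, [1, 1]))  # yes
print(classify_iterative(tree, labels, [2, 1]))  # None
```

The loop makes the "walk until you hit a leaf" idea explicit and also shows one way to handle a feature value the tree has never seen, which the recursive version simply leaves unmatched.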

As before, here is a screenshot of the program's output:


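To be safe, I'm also including the complete source code, so readers who haven't seen the earlier posts can run it directly and adapt it into their own code.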

import math
import operator

def calcShannonEnt(dataset):
    # Shannon entropy (base 2) of the class labels in the last column
    numEntries = len(dataset)
    labelCounts = {}
    for featVec in dataset:
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts:
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1

    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key])/numEntries
        shannonEnt -= prob*math.log(prob, 2)
    return shannonEnt

def CreateDataSet():
    dataset = [[1, 1, 'yes' ], 
               [1, 1, 'yes' ], 
               [1, 0, 'no'], 
               [0, 1, 'no'], 
               [0, 1, 'no']]
    labels = ['no surfacing', 'flippers']
    return dataset, labels

def splitDataSet(dataSet, axis, value):
    # Rows whose feature `axis` equals `value`, with that column removed
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet

def chooseBestFeatureToSplit(dataSet):
    numberFeatures = len(dataSet[0])-1  # last column is the class label
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numberFeatures):
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)
        newEntropy = 0.0
        # Size-weighted entropy of the subsets produced by splitting on feature i
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet)/float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)
        infoGain = baseEntropy - newEntropy
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature

def majorityCnt(classList):
    # Majority vote: return the most frequent class label
    classCount = {}
    for vote in classList:
        if vote not in classCount:
            classCount[vote] = 0
        classCount[vote] += 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def createTree(dataSet, inputlabels):
    labels = inputlabels[:]  # copy so the caller's label list is not modified
    classList = [example[-1] for example in dataSet]
    # Stop if every example has the same class
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # Stop if no features are left: fall back to a majority vote
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)
    if bestFeat == -1:  # no split reduces entropy: fall back to a majority vote
        return majorityCnt(classList)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel: {}}
    del labels[bestFeat]
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)
    for value in uniqueVals:
        subLabels = labels[:]
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree


def classify(inputTree, featLabels, testVec):
    # The single root key is the feature this node splits on
    firstStr = list(inputTree.keys())[0]
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)
    classLabel = None  # stays None if testVec has a value never seen in training
    for key in secondDict.keys():
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:
                classLabel = secondDict[key]
    return classLabel

myDat, labels = CreateDataSet()
print(calcShannonEnt(myDat))

print(splitDataSet(myDat, 1, 1))         # [[1, 'yes'], [1, 'yes'], [0, 'no'], [0, 'no']]

print(chooseBestFeatureToSplit(myDat))   # 0, i.e. 'no surfacing'

myTree = createTree(myDat, labels)

print(classify(myTree, labels, [1, 0]))  # no
print(classify(myTree, labels, [1, 1]))  # yes
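To see why the root of the tree ends up being `no surfacing`, the sketch below recomputes the information gain of each feature on the five-row sample using only the standard library. It mirrors what `calcShannonEnt` and `chooseBestFeatureToSplit` do, but the helper names `entropy` and `info_gain` are illustrative and not part of the original code. Information gain here is the entropy of the whole set minus the size-weighted entropy of the subsets produced by splitting on one feature.

```python
import math
from collections import Counter

def entropy(rows):
    """Shannon entropy (base 2) of the class labels stored in the last column."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def info_gain(rows, axis):
    """Parent entropy minus the size-weighted entropy of the subsets split on `axis`."""
    n = len(rows)
    weighted = 0.0
    for value in {row[axis] for row in rows}:
        subset = [row for row in rows if row[axis] == value]
        weighted += len(subset) / n * entropy(subset)
    return entropy(rows) - weighted

# Same five rows as CreateDataSet(); features are 'no surfacing', 'flippers'
data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
print(round(entropy(data), 3))       # 0.971
print(round(info_gain(data, 0), 3))  # 0.42  ('no surfacing')
print(round(info_gain(data, 1), 3))  # 0.171 ('flippers')
```

Splitting on `no surfacing` removes more uncertainty (about 0.42 bits versus 0.171), which is why `chooseBestFeatureToSplit(myDat)` returns 0.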

Hoho, and with that we've put everything about decision trees into practice. Happy studying and working, everyone!
