Computing Information Gain

# -*- coding: UTF-8 -*-
from math import log
from collections import Counter
import numpy as np
 

def createDataSet():
    # Loan-approval toy dataset: each row is
    # [age, has job, has own house, credit rating, label].
    dataSet = np.array([['age', 'has job', 'has own house', 'credit rating', 'label'],
               [0, 0, 0, 0, 'no'], 
               [0, 0, 0, 1, 'no'],
               [0, 1, 0, 1, 'yes'],
               [0, 1, 1, 0, 'yes'],
               [0, 0, 0, 0, 'no'],
               [1, 0, 0, 0, 'no'],
               [1, 0, 0, 1, 'no'],
               [1, 1, 1, 1, 'yes'],
               [1, 0, 1, 2, 'yes'],
               [1, 0, 1, 2, 'yes'],
               [2, 0, 1, 2, 'yes'],
               [2, 0, 1, 1, 'yes'],
               [2, 1, 0, 1, 'yes'],
               [2, 1, 0, 2, 'yes'],
               [2, 0, 0, 0, 'no']])
    return dataSet
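# Note: because np.array mixes strings and integers, every entry is coerced
# to a string, so feature values are compared as '0'/'1'/'2' throughout.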
 
def calcShannonEnt(dataSet, axis=-1):
    # Shannon entropy of column `axis` (by default the label column):
    # H = -sum_k p_k * log2(p_k), where p_k is the frequency of value k.
    numEntries = len(dataSet)
    columnCounter = Counter(dataSet[:, axis])
    shannonEnt = 0.0
    for key in columnCounter:
        prob = float(columnCounter[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt
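# For the label column of the 15-row dataset above (9 'yes', 6 'no'):
#   H(D) = -(9/15)*log2(9/15) - (6/15)*log2(6/15) ≈ 0.970951
# which matches the first line of the output below.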

def subDataSet(dataSet, axis, value):
    # Select the rows whose value in column `axis` equals `value`.
    return dataSet[dataSet[:, axis] == value, :]
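# Example: subDataSet(dataset, axis=0, value='0') keeps the five rows
# whose age value is 0 (note the string '0' after the np.array coercion).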
 
def entropyGain(dataSet, axis=1, baseAxis=-1):
    # Information gain of splitting on feature column `axis` with respect to
    # the label column `baseAxis`: g(D, A) = H(D) - H(D|A).
    numEntries = len(dataSet)
    baseEntropy = calcShannonEnt(dataSet, baseAxis)
    columnCounter = Counter(dataSet[:, axis])
    newEntropy = 0.0  # conditional entropy H(D|A)
    for key in columnCounter:
        prob = float(columnCounter[key]) / numEntries
        subSet = subDataSet(dataSet=dataSet, axis=axis, value=key)
        newEntropy += prob * calcShannonEnt(dataSet=subSet, axis=baseAxis)
    return baseEntropy - newEntropy
 
 
dataset = createDataSet()
dataset = dataset[1:, :]  # drop the header row, keep the 15 samples
Entropy = calcShannonEnt(dataset)  # entropy of the label column
print('Entropy is {Entropy:0.6f}'.format(Entropy=Entropy))


for i in range(4):  # information gain of each of the four features
    EntropyGain = entropyGain(dataset, axis=i, baseAxis=4)
    print('EntropyGain is {EntropyGain}'.format(EntropyGain=EntropyGain))

Output:

Entropy is 0.970951
EntropyGain is 0.0830074998558
EntropyGain is 0.323650198152
EntropyGain is 0.419973094022
EntropyGain is 0.362989562537
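The third feature (index 2, has own house) yields the largest gain, about 0.42, so an ID3-style tree would split on it first. A quick check using the functions above (the variable names here are illustrative):

gains = [entropyGain(dataset, axis=i, baseAxis=4) for i in range(4)]
bestFeature = int(np.argmax(gains))  # index of the feature with the largest gain
print('Best feature to split on: {0}'.format(bestFeature))  # -> 2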

In sklearn, the way to gauge feature importance is through a fitted decision tree's feature_importances_ attribute; in code, clf.feature_importances_ shows how important each feature is. [1] However, sklearn does not provide a function that computes the information gain ratio directly. To compute it, one option is to compute the information gain first and then divide it by the split information, a penalty term that accounts for the number of child nodes the parent splits into. [2] For continuous variables, binary splitting can be used: evaluate candidate thresholds, compute the weighted entropy and information gain for each, and from those obtain the gain ratio. [3]

References:
[1] 机器学习--决策树(sklearn), https://blog.youkuaiyun.com/weixin_50918736/article/details/125616968
[2][3] 机器学习——有监督——决策树(分类树)相关原理及sklearn实现(信息熵、基尼系数、信息增益、特征重要..., https://blog.youkuaiyun.com/huangguohui_123/article/details/105522595
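Since the gain ratio is just the information gain normalized by the entropy of the feature column itself (the split information), it can be computed with the functions defined above. A minimal sketch; entropyGainRatio is a name introduced here, not an sklearn API:

def entropyGainRatio(dataSet, axis=1, baseAxis=-1):
    # Gain ratio = information gain / split information, where the split
    # information is the entropy of the feature column itself (C4.5's
    # penalty against features with many distinct values).
    splitInfo = calcShannonEnt(dataSet, axis=axis)
    if splitInfo == 0.0:  # feature is constant: splitting on it is meaningless
        return 0.0
    return entropyGain(dataSet, axis=axis, baseAxis=baseAxis) / splitInfo

for i in range(4):
    print('GainRatio is {0}'.format(entropyGainRatio(dataset, axis=i, baseAxis=4)))

For the sklearn route mentioned above, a minimal sketch on the same data (feature_importances_ reports impurity-based importances, not gain ratios):

from sklearn.tree import DecisionTreeClassifier

X = dataset[:, :4].astype(int)  # the four feature columns as integers
y = dataset[:, 4]               # the 'yes'/'no' labels
clf = DecisionTreeClassifier(criterion='entropy').fit(X, y)
print(clf.feature_importances_)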