Chapter 5: Decision Trees
ID3 algorithm (based on information gain)
entropy: $H(X) = -\sum_{i=1}^{n} p_i \log p_i$
from math import log

# empirical entropy of a dataset; the class label is the last column of each row
def calc_ent(datasets):
    data_length = len(datasets)
    # count occurrences of each class label (e.g. yes/no)
    label_count = {}
    for i in range(data_length):
        label = datasets[i][-1]
        if label not in label_count:
            label_count[label] = 0
        label_count[label] += 1
    # H = -sum of p_i * log2(p_i) over the class proportions p_i
    ent = -sum([(p / data_length) * log(p / data_length, 2)
                for p in label_count.values()])
    return ent
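As a quick sanity check on the entropy function, a dataset split evenly between two classes should have entropy of exactly 1 bit (the toy dataset below is made up for illustration; the label is assumed to sit in the last column of each row):

```python
from math import log

def calc_ent(datasets):
    # empirical entropy of the class label (last column), in bits
    counts = {}
    for row in datasets:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    n = len(datasets)
    return -sum((c / n) * log(c / n, 2) for c in counts.values())

# toy dataset: two 'no' rows and two 'yes' rows, so each class has p = 0.5
toy = [['sunny', 'no'], ['sunny', 'no'], ['rain', 'yes'], ['rain', 'yes']]
print(calc_ent(toy))  # 1.0
```

A single-class dataset gives entropy 0, the other extreme: the label carries no uncertainty at all.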
conditional entropy: $H(X|Y) = \sum_{j} P(Y=y_j)\, H(X|Y=y_j) = -\sum_{x,y} P(x,y) \log P(x|y)$
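The conditional entropy $H(D|A)$ is the quantity ID3 subtracts from $H(D)$ to get the information gain $g(D,A) = H(D) - H(D|A)$. A minimal sketch in the same style as `calc_ent` above (the `axis` parameter naming and the toy data are assumptions for illustration):

```python
from math import log

def calc_ent(datasets):
    # empirical entropy of the class label (last column), in bits
    counts = {}
    for row in datasets:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    n = len(datasets)
    return -sum((c / n) * log(c / n, 2) for c in counts.values())

def cond_ent(datasets, axis=0):
    # H(D|A): partition the rows by the value of feature `axis`,
    # then average the subsets' entropies, weighted by subset size
    parts = {}
    for row in datasets:
        parts.setdefault(row[axis], []).append(row)
    n = len(datasets)
    return sum((len(sub) / n) * calc_ent(sub) for sub in parts.values())

# toy example: feature 0 perfectly predicts the label,
# so H(D|A) = 0 and the information gain equals H(D) = 1 bit
toy = [['sunny', 'no'], ['sunny', 'no'], ['rain', 'yes'], ['rain', 'yes']]
print(calc_ent(toy) - cond_ent(toy, axis=0))  # 1.0
```

ID3 evaluates this gain for every candidate feature and splits on the one with the largest value.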