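How C4.5 handles continuous values
1. Bisection (binary split):
Put simply: sort the continuous values, then average each pair of adjacent values, which gives n-1 candidate split points (per the formula above);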
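The sort-and-average step can be sketched in a few lines of Python. This is a minimal illustration only: the function name candidate_thresholds and the toy input are assumptions, not from the notes, and the values are assumed to be distinct.

```python
def candidate_thresholds(values):
    """Return the n-1 midpoints between consecutive sorted values."""
    ordered = sorted(values)
    # Pair each value with its right-hand neighbour and average the two.
    return [(lo + hi) / 2 for lo, hi in zip(ordered, ordered[1:])]


# Hypothetical toy input, not taken from the notes.
print(candidate_thresholds([4.0, 1.0, 2.0]))  # [1.5, 3.0]
```

Three input values give two candidate split points, matching the n-1 count described above.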
Example:
Computing the information gain for the density attribute:
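(1) Original score set (n = 17): {a1, a2, ..., an} = {0.697, 0.774, 0.634, 0.608, 0.556, 0.403, 0.481, 0.437, 0.666, 0.243, 0.245, 0.343, 0.639, 0.657, 0.360, 0.593, 0.719}
(2) Score set after sorting: {a1, a2, ..., an} = {0.243, 0.245, 0.343, 0.360, 0.403, 0.437, 0.481, 0.556, 0.593, 0.608, 0.634, 0.639, 0.657, 0.666, 0.697, 0.719, 0.774}
(3) Midpoints of adjacent sorted values (n-1 = 16 candidate split points): {t1, t2, ..., tn-1} = {0.244, 0.294, 0.351, 0.381, 0.420, 0.459, 0.518, 0.574, 0.600, 0.621, 0.636, 0.648, 0.661, 0.681, 0.708, 0.746}
(4) Compute the information gain for each candidate split point t1, t2, ..., tn-1 in turn;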
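Step (4) can be sketched as follows. The notes do not list the class labels for the density values, so this sketch takes the labels as a parameter and is shown on a tiny hypothetical data set. The function names entropy and gain_per_threshold are assumptions made for illustration, and the score used is plain information gain, as in the steps above.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_per_threshold(values, labels):
    """Map each midpoint threshold to the information gain of splitting on it."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    ordered = [v for v, _ in pairs]
    base = entropy(labels)
    gains = {}
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in pairs if v <= t]   # samples with value <= t
        right = [y for v, y in pairs if v > t]   # samples with value > t
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gains[t] = base - weighted
    return gains


# Hypothetical toy data (values and labels are NOT from the notes).
gains = gain_per_threshold([1.0, 2.0, 3.0, 4.0], ["no", "no", "yes", "yes"])
best = max(gains, key=gains.get)
print(best, gains[best])  # 2.5 1.0
```

In the toy data the threshold 2.5 separates the two classes perfectly, so its gain equals the full entropy of the label set (1 bit). The same function could be applied to the 17 density values once their class labels are supplied.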