使用ID3算法根据信息增益构建决策树

题目1

在这里插入图片描述
在这里插入图片描述
在这里插入图片描述

题目2

根据ID3(信息增益)方法利用以下数据集构建决策树。

electionseasonoil pricestock price
nowinterriserise
nosummerfallfall
nosummerriserise
yeswinterrisefall
nowinterfallfall
yessummerriserise
yessummerfallfall
yeswinterfallfall

设数据集为S,

step1,计算熵
H ( S ) = − ( 3 8 log ⁡ 2 3 8 + 5 8 log ⁡ 2 5 8 ) = 0.954 H(S) = - \left( \frac{3}{8} \log_2 \frac{3}{8} + \frac{5}{8} \log_2 \frac{5}{8} \right)= 0.954 H(S)=(83log283+85log285)=0.954
step2:计算条件熵

  • step2.1 计算特征election的条件熵
    • g ( S , e l e c t i o n ) = H ( S ) − H ( S ∣ e l e c t i o n ) g(S, election)=H(S)-H(S|election) g(S,election)=H(S)H(Selection)
    • H ( S ∣ e l e c t i o n ) = 4 8 H ( S ∣ e l e c t i o n = y e s ) + 4 8 H ( S ∣ e l e c t i o n = n o ) H(S|election) = \frac{4}{8} H(S|election=yes) + \frac{4}{8} H(S|election=no) H(Selection)=84H(Selection=yes)+84H(Selection=no)
    • H ( S ∣ e l e c t i o n = y e s ) = − ( 1 4 log ⁡ 2 1 4 + 3 4 log ⁡ 2 3 4 ) H(S|election=yes) = - \left( \frac{1}{4} \log_2 \frac{1}{4} + \frac{3}{4} \log_2 \frac{3}{4} \right) H(Selection=yes)=(41log241+43log243)
    • H ( S ∣ e l e c t i o n = n o ) = − ( 2 4 log ⁡ 2 2 4 + 2 4 log ⁡ 2 2 4 ) H(S|election=no) = - \left( \frac{2}{4} \log_2 \frac{2}{4} + \frac{2}{4} \log_2 \frac{2}{4} \right) H(Selection=no)=(42log242+42log242)
    • H ( S ∣ e l e c t i o n ) = 0.9055 H(S|election) = 0.9055 H(Selection)=0.9055
    • g ( S , e l e c t i o n ) = 0.954 − 0.9055 = 0.0485 g(S, election)=0.954 - 0.9055 = 0.0485 g(S,election)=0.9540.9055=0.0485
  • step2.2 计算特征season的条件熵
    • g ( S , s e a s o n ) = H ( S ) − H ( S ∣ s e a s o n ) g(S, season)=H(S)-H(S|season) g(S,season)=H(S)H(Sseason)
    • H ( S ∣ s e a s o n ) = 4 8 H ( S ∣ s e a s o n = w i n t e r ) + 4 8 H ( S ∣ s e a s o n = s u m m e r ) H(S|season) = \frac{4}{8} H(S|season=winter) + \frac{4}{8} H(S|season=summer) H(Sseason)=84H(Sseason=winter)+84H(Sseason=summer)
    • H ( S ∣ s e a s o n = w i n t e r ) = − ( 1 4 log ⁡ 2 1 4 + 3 4 log ⁡ 2 3 4 ) = 0.811 H(S|season=winter) = - \left( \frac{1}{4} \log_2 \frac{1}{4} + \frac{3}{4} \log_2 \frac{3}{4} \right) = 0.811 H(Sseason=winter)=(41log241+43log243)=0.811
    • H ( S ∣ s e a s o n = s u m m e r ) = − ( 2 4 log ⁡ 2 2 4 + 2 4 log ⁡ 2 2 4 ) = 1 H(S|season=summer) = - \left( \frac{2}{4} \log_2 \frac{2}{4} + \frac{2}{4} \log_2 \frac{2}{4} \right) = 1 H(Sseason=summer)=(42log242+42log242)=1
    • H ( S ∣ s e a s o n ) = 0.9055 H(S|season) = 0.9055 H(Sseason)=0.9055
    • g ( S , s e a s o n ) = 0.954 − 0.9055 = 0.0485 g(S, season)=0.954 - 0.9055 = 0.0485 g(S,season)=0.9540.9055=0.0485
  • step2.3 计算特征oil price 的条件熵
    • g ( S , o i l p r i c e ) = H ( S ) − H ( S ∣ o i l p r i c e ) g(S, oil price)=H(S)-H(S|oil price) g(S,oilprice)=H(S)H(Soilprice)
    • H ( S ∣ o i l p r i c e ) = 4 8 H ( S ∣ o i l p r i c e = r i s e ) + 4 8 H ( S ∣ o i l p r i c e = f a l l ) H(S|oil price) = \frac{4}{8} H(S|oil price=rise) + \frac{4}{8} H(S|oil price=fall) H(Soilprice)=84H(Soilprice=rise)+84H(Soilprice=fall)
    • H ( S ∣ o i l p r i c e = r i s e ) = − ( 3 4 log ⁡ 2 3 4 + 1 4 log ⁡ 2 1 4 ) H(S|oil price=rise) = - \left( \frac{3}{4} \log_2 \frac{3}{4} + \frac{1}{4} \log_2 \frac{1}{4} \right) H(Soilprice=rise)=(43log243+41log241)
    • H ( S ∣ o i l p r i c e = f a l l ) = − ( 0 4 log ⁡ 2 0 4 + 4 4 log ⁡ 2 4 4 ) = 0 H(S|oil price=fall) = - \left( \frac{0}{4} \log_2 \frac{0}{4} + \frac{4}{4} \log_2 \frac{4}{4} \right) = 0 H(Soilprice=fall)=(40log240+44log244)=0
    • H ( S ∣ o i l p r i c e ) = 0.4055 H(S|oil price)=0.4055 H(Soilprice)=0.4055
    • g ( S , o i l p r i c e ) = 0.954 − 0.4055 = 0.5485 g(S, oil price)= 0.954 - 0.4055 = 0.5485 g(S,oilprice)=0.9540.4055=0.5485

step3:选择信息增益最大的特征为根节点,即oil price为根节点。
step4:将 S S S根据特征oil price=rise和fall,划分为 S 1 S_1 S1 S 2 S_2 S2,其中oil price=fall,stock price全部为fall,则 S 2 S_2 S2为叶节点。对 S 1 S_1 S1从特征election 和season中选择新的特征。

  • step4.1 计算 g ( S 1 , e l e c t i o n ) = H ( S 1 ) − H ( S 1 ∣ e l e c t i o n ) g(S_1, election)=H(S_1)-H(S_1|election) g(S1,election)=H(S1)H(S1election)

    • H ( S 1 ∣ e l e c t i o n ) = 2 4 H ( S 1 ∣ e l e c t i o n = n o ) + 2 4 H ( S 1 ∣ e l e c t i o n = y e s ) H(S_1|election)=\frac{2}{4}H(S_1|election=no)+\frac{2}{4}H(S_1|election=yes) H(S1election)=42H(S1election=no)+42H(S1election=yes)
    • H ( S 1 ∣ e l e c t i o n = n o ) = − 2 2 log ⁡ 2 2 2 − 0 2 log ⁡ 2 0 2 = 0 H(S_1|election=no)=-\frac{2}{2}\log_2 \frac{2}{2}-\frac{0}{2}\log_2 \frac{0}{2} = 0 H(S1election=no)=22log22220log220=0
    • H ( S 1 ∣ e l e c t i o n = y e s ) = − 1 2 log ⁡ 2 1 2 − 1 2 log ⁡ 2 1 2 = 1 H(S_1|election=yes)=-\frac{1}{2}\log_2 \frac{1}{2}-\frac{1}{2}\log_2 \frac{1}{2} = 1 H(S1election=yes)=21log22121log221=1
    • H ( S 1 ∣ e l e c t i o n ) = 0.5 H(S_1|election) = 0.5 H(S1election)=0.5
    • g ( S 1 , e l e c t i o n ) = H ( S 1 ) − H ( S 1 ∣ e l e c t i o n ) = 0.811 − 0.5 = 0.311 g(S_1, election)=H(S_1)-H(S_1|election)=0.811-0.5=0.311 g(S1,election)=H(S1)H(S1election)=0.8110.5=0.311
  • step4.2 计算 g ( S 1 , s e a s o n ) = H ( S 1 ) − H ( S 1 ∣ s e a s o n ) g(S_1, season)=H(S_1)-H(S_1|season) g(S1,season)=H(S1)H(S1season)

    • H(S_1)=H(S|oil price=rise)=
    • H ( S 1 ∣ s e a s o n ) = 2 4 H ( S 1 ∣ s e a s o n = w i n t e r ) + 2 4 H ( S 1 ∣ s e a s o n = s u m m e r ) H(S_1|season) = \frac{2}{4}H(S_1|season=winter)+ \frac{2}{4} H(S_1|season=summer) H(S1season)=42H(S1season=winter)+42H(S1season=summer)
    • H ( S 1 ∣ s e a s o n = w i n t e r ) = − 1 2 log ⁡ 2 1 2 − 1 2 log ⁡ 2 1 2 = 1 H(S_1|season=winter)=-\frac{1}{2}\log_2 \frac{1}{2}-\frac{1}{2}\log_2 \frac{1}{2} = 1 H(S1season=winter)=21log22121log221=1
    • H ( S 1 ∣ s e a s o n = s u m m e r ) = − 2 2 log ⁡ 2 2 2 − 0 2 log ⁡ 2 0 2 = 0 H(S_1|season=summer)=-\frac{2}{2}\log_2 \frac{2}{2}-\frac{0}{2}\log_2 \frac{0}{2} = 0 H(S1season=summer)=22log22220log220=0
    • H ( S 1 ∣ s e a s o n ) = 0.5 H(S_1|season) = 0.5 H(S1season)=0.5
    • g ( S 1 , s e a s o n ) = H ( S 1 ) − H ( S 1 ∣ s e a s o n ) = 0.811 − 0.5 = 0.311 g(S_1, season)=H(S_1)-H(S_1|season)=0.811-0.5=0.311 g(S1,season)=H(S1)H(S1season)=0.8110.5=0.311

step5: g ( S 1 , e l e c t i o n ) g(S_1, election) g(S1,election) g ( S 1 , s e a s o n ) g(S_1, season) g(S1,season)一样,这里选择election作为节点划分。election=no,stock price全为rise,为叶子节点。对数据集election=yes,设为 H ( S 11 ) H(S_{11}) H(S11),只有season特征,选择season作为划分特征,两子集都为纯。当 season = winter 时,stock price = fall。当 season = summer 时,stock price = rise

oil price?
├── fall: stock price = fall
└── rise:
    ├── election?
    │   ├── no: stock price = rise
    │   └── yes:
    │       ├── season?
    │       │   ├── winter: stock price = fall
    │       │   └── summer: stock price = rise

log计算:

  • − 1 2 log ⁡ 2 1 2 − 1 2 log ⁡ 2 1 2 = 1 \displaystyle -\frac{1}{2}\log_2 \frac{1}{2}-\frac{1}{2}\log_2 \frac{1}{2} = 1 21log22121log221=1
  • − 2 2 log ⁡ 2 2 2 − 0 2 log ⁡ 2 0 2 = 0 \displaystyle -\frac{2}{2}\log_2 \frac{2}{2}-\frac{0}{2}\log_2 \frac{0}{2} = 0 22log22220log220=0
  • − 2 3 log ⁡ 2 2 3 − 1 3 log ⁡ 2 1 3 = 0.918 \displaystyle -\frac{2}{3}\log_2 \frac{2}{3}-\frac{1}{3}\log_2 \frac{1}{3} = 0.918 32log23231log231=0.918
  • − 3 4 log ⁡ 2 3 4 − 1 4 log ⁡ 2 1 4 = 0.811 \displaystyle - \frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4}=0.811 43log24341log241=0.811
  • − 3 5 log ⁡ 2 3 5 − 2 5 log ⁡ 2 2 5 = 0.971 \displaystyle - \frac{3}{5} \log_2 \frac{3}{5} - \frac{2}{5} \log_2 \frac{2}{5}=0.971 53log25352log252=0.971
  • − 1 5 log ⁡ 2 1 5 − 4 5 log ⁡ 2 4 5 = 0.722 \displaystyle - \frac{1}{5} \log_2 \frac{1}{5} - \frac{4}{5} \log_2 \frac{4}{5}=0.722 51log25154log254=0.722
  • − 1 6 log ⁡ 2 1 6 − 5 6 log ⁡ 2 5 6 = 0.650 \displaystyle - \frac{1}{6} \log_2 \frac{1}{6} - \frac{5}{6} \log_2 \frac{5}{6}=0.650 61log26165log265=0.650
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值