Optimal binning algorithm using dynamic programming

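This article discusses an optimal binning algorithm based on dynamic programming, which optimizes the grouping by minimizing the error sum of squares (SSE). SSE measures the variation within a cluster; when all cases in a cluster are identical, the SSE is 0. A plain recursive method recomputes the same subproblems many times, whereas dynamic programming avoids the repeated work and is more efficient. The article explains how to break the problem into subproblems and how to use the best solutions from the previous stage to compute the next stage.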

COMP9318-19T1

LAB2

  • Optimal binning algorithm using dynamic programming

Error Sum of Squares (SSE)

SSE is the sum of the squared differences between each observation and its group’s mean. It can be used as a measure of variation within a cluster. If all cases within a cluster are identical the SSE would then be equal to 0.

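The formula for SSE is:

$$SSE = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $n$ is the number of observations, $x_i$ is the value of the $i$th observation, and $\bar{x}$ is the mean of all the observations. This can also be rearranged into the form seen in J.H. Ward’s paper:

$$SSE = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2$$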
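As a quick check of the definition, here is a minimal Python sketch of SSE. The function names `sse` and `sse_rearranged` are illustrative only and are not taken from the lab code; the second function uses the algebraically equivalent form "sum of squares minus the squared sum divided by n".

```python
def sse(values):
    """Sum of squared differences between each value and the group mean."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values)


def sse_rearranged(values):
    """Same quantity computed as sum(x^2) - (sum(x))^2 / n."""
    n = len(values)
    if n == 0:
        return 0.0
    total = sum(values)
    return sum(x * x for x in values) - total * total / n


print(sse([13, 17]))    # 8.0
print(sse([5, 5, 5]))   # 0.0, all cases identical
```

The two forms agree up to floating-point rounding; the direct form is usually the safer choice numerically when values are large and close together.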

A plain recursive method recomputes the same subproblems many times, although its space complexity is lower than that of dynamic programming.

In this case we apply dynamic programming. As with the Fibonacci sequence, we use previous results to compute the next one by storing them in memory.

This problem uses the same strategy.

The first row represents dividing each block (a suffix of the data) into one bin. An entry of -1 appears to mark a cell that was not computed:
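The Fibonacci comparison can be sketched as follows. This is only an illustration of the trade-off described above: the `calls` counter and the function names are hypothetical helpers added to make the repeated work visible, and `functools.lru_cache` stands in for "storing results in memory".

```python
from functools import lru_cache

# Counts how many times each version is invoked (illustrative only).
calls = {"naive": 0, "memo": 0}


def fib_naive(n):
    """Plain recursion: the same subproblems are solved again and again."""
    calls["naive"] += 1
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized recursion: each subproblem is solved once and then cached."""
    calls["memo"] += 1
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


print(fib_naive(20), calls["naive"])  # 6765 21891
print(fib_memo(20), calls["memo"])    # 6765 21
```

The memoized version pays for its speed with extra memory: it keeps one cached value per distinct argument.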

[ ] [ ] [ ] [3, 1, 18, 11, 13, 17] -1

[ ] [ ] [ ] [1, 18, 11, 13, 17] -1

[ ] [ ] [ ] [18, 11, 13, 17] -1

[ ] [ ] [ ] [11, 13, 17] 18.666

[ ] [ ] [ ] [13, 17] 8.0

[ ] [ ] [ ] [17] 0

The second row represents dividing each block into two bins:

[ ] [ ] [[3, 1, 18, 11, 13, 17]] -1

[ ] [ ] [[1, 18, 11, 13, 17]] -1

[ ] [ ] [[18, 11, 13, 17]] 18.666

[ ] [ ] [[11, 13, 17]] 2

[ ] [ ] [[13, 17]] 0

[ ] [ ] [[17]] -1

etc…

We divide the problem into sub-problems and store the first row, because each entry is already the optimal one-bin cost: [11, 13, 17]: 18.666, [13, 17]: 8.0, [17]: 0. In the second row, each list is divided into two bins, so [18, 11, 13, 17] can be split into a prefix and a suffix, for example [18] and [11, 13, 17]. This is only the first possibility; we can also split it into [[18, 11], [13, 17]] or [[18, 11, 13], [17]], and the same reasoning applies to each. We already know from the first row that the cost of [11, 13, 17] is 18.666 (which is optimal), and the cost of [18] is 0, so the cost of this split is 0 + 18.666 = 18.666. The other two splits cost 24.5 + 8.0 = 32.5 and 26.0 + 0 = 26.0, so the optimal cost of dividing [18, 11, 13, 17] into two bins is 18.666. This is the core idea of dynamic programming.
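The prefix/suffix idea can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the lab's reference solution: the names `optimal_binning`, `cost`, and `cut` are hypothetical, bins are assumed to be contiguous and non-empty, and the table is filled for every reachable cell instead of leaving unneeded cells at -1 as in the rows shown earlier.

```python
def sse(values):
    """Sum of squared differences from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def optimal_binning(x, num_bins):
    """Split x into num_bins contiguous, non-empty bins with minimal total SSE."""
    n = len(x)
    if not 1 <= num_bins <= n:
        raise ValueError("num_bins must be between 1 and len(x)")
    inf = float("inf")
    # cost[b][i]: best cost of splitting the suffix x[i:] into b + 1 bins.
    cost = [[inf] * n for _ in range(num_bins)]
    # cut[b][i]: end index (exclusive) of the first bin in that best split.
    cut = [[n] * n for _ in range(num_bins)]

    # First row: every suffix placed in a single bin.
    for i in range(n):
        cost[0][i] = sse(x[i:])

    # Later rows reuse the previous row instead of recomputing it.
    for b in range(1, num_bins):
        for i in range(n - b):                  # suffix needs at least b + 1 items
            for j in range(i + 1, n - b + 1):   # prefix x[i:j], rest x[j:] in b bins
                candidate = sse(x[i:j]) + cost[b - 1][j]
                if candidate < cost[b][i]:
                    cost[b][i] = candidate
                    cut[b][i] = j

    # Walk the stored cut points to recover the bins themselves.
    bins, i = [], 0
    for b in range(num_bins - 1, -1, -1):
        j = cut[b][i]
        bins.append(x[i:j])
        i = j
    return bins, cost[num_bins - 1][0]


print(optimal_binning([18, 11, 13, 17], 2)[0])    # [[18], [11, 13, 17]]
print(optimal_binning([3, 1, 18, 11, 13, 17], 4))  # ([[3, 1], [18], [11, 13], [17]], 4.0)
```

The first call reproduces the two-bin split worked through above; the second applies the same table to the full list with four bins.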
