Keep track of the median?

本文介绍了如何使用最大堆和最小堆数据结构实时计算不断变化数字流的中位数。详细解释了算法原理,包括初始化堆、处理新数字和堆平衡操作,并提供了实例说明。

From: Career up 150.

1. Question: Numbers are randomly generated and stored into an (expanding) array. How would you keep track of the median?

Answer: 

Heap? A heap is really good at basic ordering and keeping track of max and mins. This is actually interesting – if you had two heaps, you could keep track of the biggest half and the smallest half of the elements. The biggest half is kept in a min heap, such that the smallest element in the biggest half is at the root. The smallest half is kept in a max heap, such that the biggest element of the smallest half is at the root. Now, with these data structures, you have the potential median elements at the roots. If the heaps are no longer the same size, you can quickly “rebalance” the heaps by popping an element off the one heap and pushing it onto the other.

2. 可参考 http://yaronspace.cn/blog/archives/1306 http://www.cppblog.com/820986942/archive/2011/05/23/146991.html

题目介绍:

输入为不断地数字流,实时显示出当前已经输入的数字序列的中位数

解答:

求中位数的方法很多,对于大数据量最经典是桶的计数方法,但是对于这个问题不适用,因为数据是不断变化的

可以用最大堆和最小堆来解答这个问题:

1.假设当前的中位数为m,其中最大堆维护的是<=m的数字序列,最小堆维护的是>=m的数字序列,但是两个堆都不包含m

2.当新的数字到达时,比如为a,将a与m进行比较,若a<=m 则将其加入到最大堆中,否则将其加入到最小堆中

3.如果此时最小堆和最大堆的元素个数的差值>=2 ,则将m加入到元素个数少的堆中,然后从元素个数多的堆将根节点赋值到m,最后重建两个最大堆和最小堆,返回到2


进一步,如果数组数据不仅仅是增加,而是可以删除数据。这时应该使用什么数据结构呢?上面的堆就不合适了(查找需要O(n))。例如下面的问题,

https://www.interviewstreet.com/challenges/dashboard/#problem/4fcf919f11817

The median of M numbers is defined as the middle number after sorting them in order, if M is odd or the average number of the middle 2 numbers (again after sorting) if M is even. You have an empty number list at first. Then you can add or remove some number from the list. For each add or remove operation, output the median of numbers in the list.
 
Example : For a set of m = 5 numbers, { 9, 2, 8, 4, 1 } the median is the third number in sorted set { 1, 2, 4, 8, 9 } which is 4. Similarly for set of m = 4, { 5, 2, 10, 4 }, the median is the average of second and the third element in the sorted set { 2, 4, 5, 10 } which is (4+5)/2 = 4.5  
 
Input:
 
The first line is an integer n indicates the number of operations. Each of the next n lines is either "a x" or "r x" which indicates the operation is add or remove.
 
Output:
 
For each operation: If the operation is add output the median after adding x in a single line. If the operation is remove and the number x is not in the list, output "Wrong!" in a single line. If the operation is remove and the number x is in the list, output the median after deleting x in a single line. (if the result is an integer DO NOT output decimal point. And if the result is a double number , DO NOT output trailing 0s.)
 
Constraints:
 
0 < n <= 100,000
 
for each "a x" or "r x" , x will fit in 32-bit integer.
 
Sample Input:
 
7
r 1
a 1
a 2
a 1
r 1
r 2
r 1
 
Sample Output:
Wrong!
1
1.5
1
1.5
1
Wrong!
 
Note: As evident from the last line of the input, if after remove operation the list becomes empty you have to print "Wrong!" ( quotes are for clarity ).

内容概要:本文系统介绍了算术优化算法(AOA)的基本原理、核心思想及Python实现方法,并通过图像分割的实际案例展示了其应用价值。AOA是一种基于种群的元启发式算法,其核心思想来源于四则运算,利用乘除运算进行全局勘探,加减运算进行局部开发,通过数学优化器加速函数(MOA)和数学优化概率(MOP)动态控制搜索过程,在全局探索与局部开发之间实现平衡。文章详细解析了算法的初始化、勘探与开发阶段的更新策略,并提供了完整的Python代码实现,结合Rastrigin函数进行测试验证。进一步地,以Flask框架搭建前后端分离系统,将AOA应用于图像分割任务,展示了其在实际工程中的可行性与高效性。最后,通过收敛速度、寻优精度等指标评估算法性能,并提出自适应参数调整、模型优化和并行计算等改进策略。; 适合人群:具备一定Python编程基础和优化算法基础知识的高校学生、科研人员及工程技术人员,尤其适合从事人工智能、图像处理、智能优化等领域的从业者;; 使用场景及目标:①理解元启发式算法的设计思想与实现机制;②掌握AOA在函数优化、图像分割等实际问题中的建模与求解方法;③学习如何将优化算法集成到Web系统中实现工程化应用;④为算法性能评估与改进提供实践参考; 阅读建议:建议读者结合代码逐行调试,深入理解算法流程中MOA与MOP的作用机制,尝试在不同测试函数上运行算法以观察性能差异,并可进一步扩展图像分割模块,引入更复杂的预处理或后处理技术以提升分割效果。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值