剑指 Offer (Coding Interviews): Median of a Data Stream in Python

weixin_39592789

License: CC 4.0 BY-SA

Tags: median of a data stream, Python

Article link: https://blog.youkuaiyun.com/weixin_39592789/article/details/113719312

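This article shows how to compute the median of a data stream efficiently in real time by combining a max-heap and a min-heap, handling odd and even element counts separately. Python and Java sample code shows how to maintain the two heaps so that the median stays up to date. It is aimed at readers interested in online algorithms and data structures.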

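Problem description:

How do you find the median of a data stream? If an odd number of values has been read from the stream, the median is the middle value after sorting all of them. If an even number of values has been read, the median is the average of the two middle values after sorting. We use the Insert() method to read from the data stream and the GetMedian() method to get the median of the data read so far.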

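Problem analysis:

A data stream calls for an online algorithm. A classic example is finding the 100 largest numbers among 100 million. This is a heap problem: build a min-heap of size 100 and compare each incoming element with the top of the heap. The top is the smallest of the current 100 largest numbers, so if the incoming element is larger, it replaces the top and the heap is re-adjusted; otherwise it is discarded.

What about this problem? If we simply put all elements into an array, finding the median takes at least O(n) each time, which adds up to O(n^2) overall. We can borrow the idea from the example above and use a max-heap to hold the smaller half of the elements seen so far (the n/2 smallest), so that finding the median only requires looking at the top of the heap. There is a catch, though: the data keeps growing, and an element that was pushed out earlier may become relevant again as more elements arrive, so we cannot simply discard elements as in the previous problem. Instead, we use a second heap, a min-heap, to hold the larger half (the n/2 largest). With an odd count the max-heap holds one extra element and its top is the median; with an even count the median is the average of the two tops.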
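The top-100 idea above can be sketched with Python's standard heapq module. This is a minimal illustration, not code from the original article: the function name top_k_largest and the parameters stream and k are assumptions chosen for the example.

```python
import heapq


def top_k_largest(stream, k):
    """Return the k largest values seen in `stream`, in descending order."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest values seen so far
    for value in stream:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # heap[0] is the smallest of the current top k, so replace it
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)


print(top_k_largest([5, 1, 9, 3, 7, 8, 2], 3))  # [9, 8, 7]
```

Each element costs at most O(log k), and only k values are kept in memory, which is what makes the approach suitable for a stream.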

Python version sample code:

class Solution:
    def __init__(self):
        # Max-heap holding the smaller half of the values
        self.littleValueMaxHeap = []
        # Min-heap holding the larger half of the values
        self.bigValueMinHeap = []
        self.maxHeapCount = 0
        self.minHeapCount = 0

    def Insert(self, num):
        # Comparators: return True when a should sit above b in the heap
        def cmpMaxHeap(a, b):
            return a > b

        def cmpMinHeap(a, b):
            return a < b

        if self.maxHeapCount > self.minHeapCount:
            # The max-heap has one extra element, so the min-heap grows
            self.minHeapCount += 1
            if num < self.littleValueMaxHeap[0]:
                # num belongs to the smaller half: it replaces the max-heap
                # top, and the old top moves over to the min-heap
                tmpNum = self.littleValueMaxHeap[0]
                self.adjustHeap(num, self.littleValueMaxHeap, cmpMaxHeap)
                self.createHeap(tmpNum, self.bigValueMinHeap, cmpMinHeap)
            else:
                self.createHeap(num, self.bigValueMinHeap, cmpMinHeap)
        else:
            # Both heaps have the same size, so the max-heap grows
            self.maxHeapCount += 1
            if len(self.littleValueMaxHeap) == 0:
                self.createHeap(num, self.littleValueMaxHeap, cmpMaxHeap)
            elif self.bigValueMinHeap[0] < num:
                # num belongs to the larger half: it replaces the min-heap
                # top, and the old top moves over to the max-heap
                tmpNum = self.bigValueMinHeap[0]
                self.adjustHeap(num, self.bigValueMinHeap, cmpMinHeap)
                self.createHeap(tmpNum, self.littleValueMaxHeap, cmpMaxHeap)
            else:
                self.createHeap(num, self.littleValueMaxHeap, cmpMaxHeap)

    def GetMedian(self):
        # Assumes at least one value has been inserted
        if self.minHeapCount < self.maxHeapCount:
            return self.littleValueMaxHeap[0]
        else:
            return (self.littleValueMaxHeap[0] + self.bigValueMinHeap[0]) / 2.0

    def createHeap(self, num, heap, cmpfun):
        # Append num and sift it up to its position
        heap.append(num)
        tmpIndex = len(heap) - 1
        while tmpIndex:
            parentIndex = (tmpIndex - 1) // 2
            if cmpfun(heap[tmpIndex], heap[parentIndex]):
                heap[parentIndex], heap[tmpIndex] = heap[tmpIndex], heap[parentIndex]
                tmpIndex = parentIndex
            else:
                break

    def adjustHeap(self, num, heap, cmpfun):
        # Replace the heap top with num, then sift it down
        heapLen = len(heap)
        heap[0] = num
        tmpIndex = 0
        while tmpIndex < heapLen:
            leftIndex = tmpIndex * 2 + 1
            rightIndex = tmpIndex * 2 + 2
            # Pick the child that should sit higher according to cmpfun
            if rightIndex < heapLen:
                largerIndex = rightIndex if cmpfun(heap[rightIndex], heap[leftIndex]) else leftIndex
            elif leftIndex < heapLen:
                largerIndex = leftIndex
            else:
                break
            if cmpfun(heap[largerIndex], heap[tmpIndex]):
                heap[largerIndex], heap[tmpIndex] = heap[tmpIndex], heap[largerIndex]
                tmpIndex = largerIndex
            else:
                break
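For comparison, the same two-heap scheme can be sketched with Python's standard heapq module instead of hand-written heap routines. This is only an illustrative variant under stated assumptions, not the article's own solution: the class name MedianFinder and the method names insert and get_median are hypothetical. Since heapq only provides a min-heap, the smaller half is stored as negated values to simulate a max-heap.

```python
import heapq


class MedianFinder:
    """Running median with two heaps (hypothetical name, for illustration)."""

    def __init__(self):
        self.small = []  # max-heap of the smaller half, stored as negated values
        self.large = []  # min-heap of the larger half

    def insert(self, num):
        # Push onto the smaller half, then move its largest value across,
        # so every value in `small` stays <= every value in `large`.
        heapq.heappush(self.small, -num)
        heapq.heappush(self.large, -heapq.heappop(self.small))
        # Rebalance so that `small` holds the extra element on odd counts.
        if len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def get_median(self):
        if not self.small:
            raise ValueError("no values inserted yet")
        if len(self.small) > len(self.large):
            return float(-self.small[0])
        return (-self.small[0] + self.large[0]) / 2.0


finder = MedianFinder()
for value in [5, 2, 3, 4]:
    finder.insert(value)
print(finder.get_median())  # 3.5
```

Each insert costs O(log n) and reading the median is O(1), matching the behaviour of the hand-written heaps.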

Java version sample code:

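import java.util.PriorityQueue;
import java.util.Comparator;

public class Solution {
    // Max-heap holding the smaller half of the values
    PriorityQueue<Integer> pqMax = new PriorityQueue<>(new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            // Reversed order; Integer.compare avoids the overflow risk of o2 - o1
            return Integer.compare(o2, o1);
        }
    });

    // Min-heap holding the larger half of the values
    PriorityQueue<Integer> pqMin = new PriorityQueue<>();

    public void Insert(Integer num) {
        if (pqMax.size() == 0 || num < pqMax.peek()) {
            pqMax.offer(num);
        } else {
            pqMin.offer(num);
        }
        // Rebalance when the sizes differ by more than one
        if (Math.abs(pqMax.size() - pqMin.size()) > 1) {
            if (pqMax.size() > pqMin.size()) {
                pqMin.offer(pqMax.poll());
            } else {
                pqMax.offer(pqMin.poll());
            }
        }
    }

    public Double GetMedian() {
        // Assumes at least one value has been inserted
        int count = pqMax.size() + pqMin.size();
        if (count % 2 == 1) {
            // Odd count: the median is the top of the larger heap
            if (pqMax.size() > pqMin.size()) {
                return 1.0 * pqMax.peek();
            } else {
                return 1.0 * pqMin.peek();
            }
        } else {
            return (pqMax.peek() + pqMin.peek()) / 2.0;
        }
    }
}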

Tips

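Insert an element: add(), offer()

Remove and return the head element: poll()

Get the head element: peek()

Get the queue size: size()

Clear the queue: clear()

Check whether the queue contains an element: contains()

Remove a specified element: remove()

By default, PriorityQueue orders elements from smallest to largest (a min-heap). To get a max-heap, pass a Comparator whose compare method reverses the order.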

Note

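This problem can also be solved by sorting first and then taking the median, but this approach times out when the data set is large. In Python you can call .sort() directly to sort.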

class Solution:
    def __init__(self):
        self.data = []

    def Insert(self, num):
        # Keep the list sorted by re-sorting after every insertion
        self.data.append(num)
        self.data.sort()

    def GetMedian(self):
        length = len(self.data)
        if length % 2 == 0:
            # Even count: average of the two middle values
            return (self.data[length // 2] + self.data[length // 2 - 1]) / 2.0
        else:
            return self.data[length // 2]
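As a possible refinement of the sorted-list approach, the list can be kept in order with the standard bisect module rather than re-sorted on every insertion. This is a sketch under assumptions, not part of the original solution: the class name SortedListMedian and its method names are hypothetical.

```python
import bisect


class SortedListMedian:
    """Running median over a list kept sorted with bisect (illustrative only)."""

    def __init__(self):
        self.data = []

    def insert(self, num):
        # Binary search for the position, then insert; the list stays sorted
        bisect.insort(self.data, num)

    def get_median(self):
        n = len(self.data)
        if n == 0:
            raise ValueError("no values inserted yet")
        mid = n // 2
        if n % 2 == 1:
            return float(self.data[mid])
        return (self.data[mid - 1] + self.data[mid]) / 2.0


s = SortedListMedian()
for value in [5, 2, 3, 4]:
    s.insert(value)
print(s.get_median())  # 3.5
```

The position is found in O(log n), but inserting into a Python list still shifts elements, so each insertion remains O(n) in the worst case. The two-heap approach is still the better fit for large streams.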