Straightforward solution
The straightforward solution uses O(n) space. Adding a number takes O(1), but each median query takes O(n log n) because the array is sorted before the median is read.
It simply stores all the values in an array and sorts the whole array before finding the median.
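As a minimal sketch of this sort-on-query idea (the class name SortOnQueryMedian and the empty-input check are assumptions for illustration, not part of the original solution):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class name; illustrates the "store everything, sort on query" approach.
class SortOnQueryMedian {
    private final List<Integer> values = new ArrayList<>();

    // Amortized O(1): append without maintaining any order.
    void addNum(int num) {
        values.add(num);
    }

    // O(n log n): sort all stored values, then read the middle element(s).
    double findMedian() {
        if (values.isEmpty()) {
            throw new IllegalStateException("no values added yet");
        }
        Collections.sort(values);
        int n = values.size();
        if (n % 2 == 1) {
            return values.get(n / 2);
        }
        // Convert to double before adding so the sum cannot overflow int.
        return ((double) values.get(n / 2 - 1) + values.get(n / 2)) / 2;
    }
}
```

Every query pays for a full sort here, which is the cost the rest of the post tries to avoid.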
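Alternatively, we can keep the array sorted at all times by inserting each number at its correct position. Insertion then costs O(n), because later elements have to be shifted, while reading the median is O(1). This method is a good fit when median queries outnumber insertions.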
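The sorted-insertion variant might look roughly like the following sketch. The class name SortedInsertMedian is an assumption, and the use of Collections.binarySearch to locate the slot is one possible choice rather than something the original prescribes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class name; keeps the list sorted on every insertion.
class SortedInsertMedian {
    private final List<Integer> sorted = new ArrayList<>();

    // O(log n) to find the slot, but O(n) overall because ArrayList shifts later elements.
    void addNum(int num) {
        int pos = Collections.binarySearch(sorted, Integer.valueOf(num));
        if (pos < 0) {
            // When the key is absent, binarySearch returns -(insertionPoint) - 1.
            pos = -(pos + 1);
        }
        sorted.add(pos, num);
    }

    // O(1): the list is already sorted, so just read the middle element(s).
    double findMedian() {
        if (sorted.isEmpty()) {
            throw new IllegalStateException("no values added yet");
        }
        int n = sorted.size();
        if (n % 2 == 1) {
            return sorted.get(n / 2);
        }
        return ((double) sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2;
    }
}
```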
But both approaches are inefficient.
Two-heap solution
Clearly we are doing redundant work here: we only want the median, yet we sort the whole array.
To avoid that wasted work, we need a data structure that gives us the median without sorting everything. The median is determined by the middle one or two numbers of the input stream, so all we need to know is the maximum of the smaller half and the minimum of the larger half. We can therefore split the input stream across two heaps, each holding half of the values. The left (smaller) half goes into a max-heap, which gives us the maximum of the smaller half; the right (larger) half goes into a min-heap, which gives us the minimum of the larger half. Combining the two tops gives the median, depending on whether the number of elements seen so far is odd or even.
With this approach, getting the median takes O(1) and inserting takes O(log n). Space is still O(n). More precisely, an insertion costs about five heap operations in the worst case, which happens when the two heaps have the same size before the insertion: two operations to insert the number into the left heap and remove the left heap's maximum, then three more to insert that maximum into the right heap, remove the right heap's minimum, and insert that minimum back into the left heap. Each operation is O(log n), so insertion is O(log n) overall.
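To make the operation count concrete, here is a small sketch that counts heap operations per insertion. HeapOpCounter and insertAndCount are hypothetical names, and the sketch always follows the general insertion path (insert left, move the maximum right, rebalance if needed), so it is an illustration of the counting argument rather than the exact code of the solution:

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical helper: counts the O(log n) heap operations done by one insertion.
class HeapOpCounter {
    final PriorityQueue<Integer> left = new PriorityQueue<>(Collections.reverseOrder()); // max-heap, smaller half
    final PriorityQueue<Integer> right = new PriorityQueue<>();                          // min-heap, larger half

    // Inserts num and returns how many offer/poll calls were needed.
    int insertAndCount(int num) {
        int ops = 0;
        left.offer(num);
        ops += 1; // insert into the smaller half
        right.offer(left.poll());
        ops += 2; // remove the left maximum and insert it into the larger half
        if (right.size() > left.size()) {
            left.offer(right.poll());
            ops += 2; // remove the right minimum and insert it back into the left heap
        }
        return ops;
    }
}
```

Under these assumptions, an insertion that starts with equal heap sizes performs five operations, and one that starts with the left heap one element ahead performs three.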
Also keep in mind that when the total number of elements is odd, the heap for the smaller half holds one more element than the heap for the larger half.
import java.util.Comparator;
import java.util.PriorityQueue;

public class MedianFinder {
    int count;
    // Min-heap holding the larger half of the stream.
    PriorityQueue<Integer> rightHalf;
    // Max-heap holding the smaller half of the stream.
    PriorityQueue<Integer> leftHalf;

    public MedianFinder() {
        this.rightHalf = new PriorityQueue<Integer>();
        // Use a custom comparator to construct a max-heap.
        this.leftHalf = new PriorityQueue<Integer>(new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return b.compareTo(a);
            }
        });
        this.count = 0;
    }

    public void addNum(int num) {
        if (this.count == 0) { // Both heaps are empty, so add the number to leftHalf.
            this.leftHalf.offer(num);
        } else {
            // Always add to leftHalf, then move the maximum of leftHalf to rightHalf.
            this.leftHalf.offer(num);
            this.rightHalf.offer(this.leftHalf.poll());
            // This can leave rightHalf larger than leftHalf.
            // If rightHalf holds more elements than leftHalf, rebalance the heaps.
            if (this.rightHalf.size() > this.leftHalf.size()) {
                this.leftHalf.offer(this.rightHalf.poll());
            }
        }
        this.count++;
    }

    public double findMedian() {
        if ((this.count & 1) != 0) { // Odd number of elements: leftHalf holds the middle one.
            return this.leftHalf.peek();
        } else {
            // Even number of elements: average the two middle values as doubles.
            return ((double) this.leftHalf.peek() + this.rightHalf.peek()) / 2;
        }
    }
}
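One way to gain confidence in the two-heap approach is to compare it against the sort-based median on random input. The sketch below is self-contained, so it restates the two-heap logic in a compact form; the names TwoHeapCheck, Heaps, sortedMedian and agrees are all hypothetical, and the compact version skips the special case for the first element because the general path handles it as well:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical cross-check: two-heap median versus sort-based median on random streams.
class TwoHeapCheck {
    // Compact restatement of the two-heap idea so this sketch compiles on its own.
    static final class Heaps {
        final PriorityQueue<Integer> left = new PriorityQueue<>(Collections.reverseOrder()); // max-heap
        final PriorityQueue<Integer> right = new PriorityQueue<>();                          // min-heap

        void add(int num) {
            left.offer(num);
            right.offer(left.poll());
            if (right.size() > left.size()) {
                left.offer(right.poll());
            }
        }

        double median() {
            if (left.size() > right.size()) {
                return left.peek();
            }
            return ((double) left.peek() + right.peek()) / 2;
        }
    }

    // Reference answer: sort the first n values and read the middle element(s).
    static double sortedMedian(int[] values, int n) {
        int[] copy = Arrays.copyOf(values, n);
        Arrays.sort(copy);
        return n % 2 == 1 ? copy[n / 2] : ((double) copy[n / 2 - 1] + copy[n / 2]) / 2;
    }

    // Returns true if both methods agree after every insertion and the size invariant holds.
    static boolean agrees(long seed, int n) {
        Random rnd = new Random(seed);
        int[] values = new int[n];
        Heaps heaps = new Heaps();
        for (int i = 0; i < n; i++) {
            values[i] = rnd.nextInt(1000) - 500;
            heaps.add(values[i]);
            int diff = heaps.left.size() - heaps.right.size();
            if (diff != 0 && diff != 1) {
                return false; // the left heap may lead by at most one element
            }
            if (heaps.median() != sortedMedian(values, i + 1)) {
                return false;
            }
        }
        return true;
    }
}
```

Calling TwoHeapCheck.agrees with a few different seeds and lengths exercises both the odd and even cases as well as duplicates and negative values.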

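This article presents an efficient way to compute the median of a data stream: two heaps (a max-heap and a min-heap) maintain the median as values arrive, giving O(log n) insertion and O(1) retrieval of the current median.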



