295. Find Median from Data Stream

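This article presents a way to solve the data-stream median problem with two priority queues (heaps). By maintaining a max heap and a min heap, a new element can be added in O(log n) time and the median can be read in O(1) time.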


Description

The median is the middle value in an ordered integer list. If the size of the list is even, there is no single middle value, so the median is the mean of the two middle values.

For example,
[2,3,4], the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

void addNum(int num) - Add an integer number from the data stream to the data structure.
double findMedian() - Return the median of all elements so far.

Example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3)
findMedian() -> 2

Follow up:

If all integer numbers from the stream are between 0 and 100, how would you optimize it?
If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?
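
One possible answer to the first follow-up is to replace the heaps with a fixed array of counters, one per value. The sketch below is only an illustration under the assumption that every value lies in [0, 100]; the class name BucketMedianFinder and its fields are hypothetical and not part of the original solution.

```java
/** Hypothetical sketch: assumes every added value lies in [0, 100]. */
class BucketMedianFinder {
    private final int[] counts = new int[101]; // counts[v] = how many times v was added
    private int size = 0;

    public void addNum(int num) {
        if (num < 0 || num > 100) {
            throw new IllegalArgumentException("value outside assumed range: " + num);
        }
        counts[num]++;
        size++;
    }

    public double findMedian() {
        if (size == 0) {
            throw new IllegalStateException("no elements added yet");
        }
        int lowerRank = (size - 1) / 2; // 0-based rank of the lower middle element
        int upperRank = size / 2;       // 0-based rank of the upper middle element
        int lower = -1;
        int seen = 0;
        // Walk the buckets in increasing order until both middle ranks are covered.
        for (int v = 0; v <= 100; v++) {
            seen += counts[v];
            if (lower < 0 && seen > lowerRank) {
                lower = v;
            }
            if (seen > upperRank) {
                return (lower + v) / 2.0;
            }
        }
        throw new AssertionError("unreachable when size > 0");
    }

    public static void main(String[] args) {
        BucketMedianFinder finder = new BucketMedianFinder();
        finder.addNum(1);
        finder.addNum(2);
        System.out.println(finder.findMedian()); // 1.5
        finder.addNum(3);
        System.out.println(finder.findMedian()); // 2.0
    }
}
```

With this layout addNum is O(1) and findMedian scans at most 101 buckets. For the second follow-up, one option (again an assumption, not something the original states) is to keep two extra counters for values below 0 and above 100, since the median still falls inside the buckets as long as most values are in range.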

Problem URL


Solution

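Design a data structure that can store a series of numbers and quickly find their median.

This solution uses two priority queues (heaps). left is a max heap that stores the smaller half of the numbers, and right is a min heap that stores the larger half. When a new number arrives, offer it to right first, then move right.poll() into left. If right.size() < left.size() afterwards, move left.poll() back into right, so that right is never smaller than left and the median sits at the top of right when the count of numbers is odd. When the count is even, the median is the mean of the two heap tops.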

Code
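import java.util.Collections;
import java.util.PriorityQueue;
import java.util.Queue;

class MedianFinder {
    Queue<Integer> left;   // max heap holding the smaller half
    Queue<Integer> right;  // min heap holding the larger half

    /** initialize your data structure here. */
    public MedianFinder() {
        left = new PriorityQueue<>(Collections.reverseOrder());
        right = new PriorityQueue<>();
    }

    public void addNum(int num) {
        // Pass the new number through right so left receives the smallest of the larger half.
        right.offer(num);
        left.offer(right.poll());
        // Rebalance so right is never smaller than left.
        if (right.size() < left.size()) {
            right.offer(left.poll());
        }
    }

    public double findMedian() {
        if (right.size() == left.size()) {
            // Even count: average the two middle values.
            return (right.peek() + left.peek()) / 2.0;
        } else {
            // Odd count: right holds the extra element, which is the median.
            return right.peek();
        }
    }
}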

/**
 * Your MedianFinder object will be instantiated and called as such:
 * MedianFinder obj = new MedianFinder();
 * obj.addNum(num);
 * double param_2 = obj.findMedian();
 */

Time Complexity: O(log n) for addNum, O(1) for findMedian
Space Complexity: O(n)
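
As a sanity check on the two-heap idea, the sketch below feeds random numbers into a two-heap structure and compares each median against the value obtained by sorting everything seen so far. The class name TwoHeapMedianCheck, the helper medianBySorting, the seed, and the test range are all hypothetical choices made for this illustration; the structure assumes findMedian is only called after at least one addNum.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

class TwoHeapMedianCheck {
    // Max heap for the smaller half, min heap for the larger half.
    private final PriorityQueue<Integer> left = new PriorityQueue<>(Collections.reverseOrder());
    private final PriorityQueue<Integer> right = new PriorityQueue<>();

    public void addNum(int num) {
        right.offer(num);
        left.offer(right.poll());
        if (right.size() < left.size()) {
            right.offer(left.poll());
        }
    }

    public double findMedian() {
        if (right.size() == left.size()) {
            return (right.peek() + left.peek()) / 2.0;
        }
        return right.peek();
    }

    /** Reference answer: sort a copy of the values and read the middle. */
    public static double medianBySorting(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        if (n % 2 == 1) {
            return sorted.get(n / 2);
        }
        return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        TwoHeapMedianCheck finder = new TwoHeapMedianCheck();
        List<Integer> seen = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            int num = random.nextInt(201) - 100; // arbitrary test range [-100, 100]
            finder.addNum(num);
            seen.add(num);
            if (finder.findMedian() != medianBySorting(seen)) {
                throw new AssertionError("mismatch after " + seen.size() + " insertions");
            }
        }
        System.out.println("all checks passed");
    }
}
```

The sorting reference costs O(n log n) per query, which is exactly the work the two heaps avoid, so it is useful only as a test oracle.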


Review