4. Median of Two Sorted Arrays

本文介绍了一种通过划分两个数组来找到合并数组中位数的方法。关键在于如何将两个数组分割成相同长度的部分,并确保左侧的最大值不大于右侧的最小值。

To solve this problem, we need to understand "What is the use of median". In statistics, the median is used fordividing a set into two equal length subsets, that one subset is always greater than the other. If we understand the use of median for dividing, we are very close to the answer.

First let's cut A into two parts at a random position i:

      left_A             |        right_A
A[0], A[1], ..., A[i-1]  |  A[i], A[i+1], ..., A[m-1]

Since A has m elements, so there are m+1 kinds of cutting( i = 0 ~ m ). And we know: len(left_A) = i, len(right_A) = m - i . Note: when i = 0 ,left_A is empty, and when i = m , right_A is empty.

With the same way, cut B into two parts at a random position j:

      left_B             |        right_B
B[0], B[1], ..., B[j-1]  |  B[j], B[j+1], ..., B[n-1]

Put left_A and left_B into one set, and put right_A and right_B into another set. Let's name them left_part and right_part :

      left_part          |        right_part
A[0], A[1], ..., A[i-1]  |  A[i], A[i+1], ..., A[m-1]
B[0], B[1], ..., B[j-1]  |  B[j], B[j+1], ..., B[n-1]

If we can ensure:

1) len(left_part) == len(right_part)
2) max(left_part) <= min(right_part)

then we divide all elements in {A, B} into two parts with equal length, and one part is always greater than the other. Thenmedian = (max(left_part) + min(right_part))/2.

To ensure these two conditions, we just need to ensure:

(1) i + j == m - i + n - j (or: m - i + n - j + 1)
    if n >= m, we just need to set: i = 0 ~ m, j = (m + n + 1)/2 - i
(2) B[j-1] <= A[i] and A[i-1] <= B[j]

ps.1 For simplicity, I presume A[i-1],B[j-1],A[i],B[j] are always valid even ifi=0/i=m/j=0/j=n . I will talk about how to deal with these edge values at last.

ps.2 Why n >= m? Because I have to make sure j is non-nagative since 0 <= i <= m and j = (m + n + 1)/2 - i. If n < m , then j may be nagative, that will lead to wrong result.

So, all we need to do is:

Searching i in [0, m], to find an object `i` that:
    B[j-1] <= A[i] and A[i-1] <= B[j], ( where j = (m + n + 1)/2 - i )

And we can do a binary search following steps described below:

<1> Set imin = 0, imax = m, then start searching in [imin, imax]

<2> Set i = (imin + imax)/2, j = (m + n + 1)/2 - i

<3> Now we have len(left_part)==len(right_part). And there are only 3 situations
     that we may encounter:
    <a> B[j-1] <= A[i] and A[i-1] <= B[j]
        Means we have found the object `i`, so stop searching.
    <b> B[j-1] > A[i]
        Means A[i] is too small. We must `ajust` i to get `B[j-1] <= A[i]`.
        Can we `increase` i?
            Yes. Because when i is increased, j will be decreased.
            So B[j-1] is decreased and A[i] is increased, and `B[j-1] <= A[i]` may
            be satisfied.
        Can we `decrease` i?
            `No!` Because when i is decreased, j will be increased.
            So B[j-1] is increased and A[i] is decreased, and B[j-1] <= A[i] will
            be never satisfied.
        So we must `increase` i. That is, we must ajust the searching range to
        [i+1, imax]. So, set imin = i+1, and goto <2>.
    <c> A[i-1] > B[j]
        Means A[i-1] is too big. And we must `decrease` i to get `A[i-1]<=B[j]`.
        That is, we must ajust the searching range to [imin, i-1].
        So, set imax = i-1, and goto <2>.

When the object i is found, the median is:

max(A[i-1], B[j-1]) (when m + n is odd)
or (max(A[i-1], B[j-1]) + min(A[i], B[j]))/2 (when m + n is even)

Now let's consider the edges values i=0,i=m,j=0,j=n where A[i-1],B[j-1],A[i],B[j] may not exist. Actually this situation is easier than you think.

What we need to do is ensuring that max(left_part) <= min(right_part). So, ifi and j are not edges values(means A[i-1],B[j-1],A[i],B[j] all exist), then we must check bothB[j-1] <= A[i] and A[i-1] <= B[j]. But if some ofA[i-1],B[j-1],A[i],B[j] don't exist, then we don't need to check one(or both) of these two conditions. For example, ifi=0, then A[i-1] doesn't exist, then we don't need to checkA[i-1] <= B[j]. So, what we need to do is:

Searching i in [0, m], to find an object `i` that:
    (j == 0 or i == m or B[j-1] <= A[i]) and
    (i == 0 or j == n or A[i-1] <= B[j])
    where j = (m + n + 1)/2 - i

And in a searching loop, we will encounter only three situations:

<a> (j == 0 or i == m or B[j-1] <= A[i]) and
    (i == 0 or j = n or A[i-1] <= B[j])
    Means i is perfect, we can stop searching.

<b> j > 0 and i < m and B[j - 1] > A[i]
    Means i is too small, we must increase it.

<c> i > 0 and j < n and A[i - 1] > B[j]
    Means i is too big, we must decrease it.

 pointed out that: i < m ==> j > 0 and i > 0 ==> j < n . Because:

m <= n, i < m ==> j = (m+n+1)/2 - i > (m+n+1)/2 - m >= (2*m+1)/2 - m >= 0    
m <= n, i > 0 ==> j = (m+n+1)/2 - i < (m+n+1)/2 <= (2*n+1)/2 <= n

So in situation <b> and <c>, we don't need to check whether j > 0 and whetherj < n

该数据集通过合成方式模拟了多种发动机在运行过程中的传感器监测数据,旨在构建一个用于机械系统故障检测的基准资源,特别适用于汽车领域的诊断分析。数据按固定时间间隔采集,涵盖了发动机性能指标、异常状态以及工作模式等多维度信息。 时间戳:数据类型为日期时间,记录了每个数据点的采集时刻。序列起始于2024年12月24日10:00,并以5分钟为间隔持续生成,体现了对发动机运行状态的连续监测。 温度(摄氏度):以浮点数形式记录发动机的温度读数。其数值范围通常处于60至120摄氏度之间,反映了发动机在常规工况下的典型温度区间。 转速(转/分钟):以浮点数表示发动机曲轴的旋转速度。该参数在1000至4000转/分钟的范围内随机生成,符合多数发动机在正常运转时的转速特征。 燃油效率(公里/升):浮点型变量,用于衡量发动机的燃料利用效能,即每升燃料所能支持的行驶里程。其取值范围设定在15至30公里/升之间。 振动_X、振动_Y、振动_Z:这三个浮点数列分别记录了发动机在三维空间坐标系中各轴向的振动强度。测量值标准化至0到1的标度,较高的数值通常暗示存在异常振动,可能与潜在的机械故障相关。 扭矩(牛·米):以浮点数表征发动机输出的旋转力矩,数值区间为50至200牛·米,体现了发动机的负载能力。 功率输出(千瓦):浮点型变量,描述发动机单位时间内做功的速率,取值范围为20至100千瓦。 故障状态:整型分类变量,用于标识发动机的异常程度,共分为四个等级:0代表正常状态,1表示轻微故障,2对应中等故障,3指示严重故障。该列作为分类任务的目标变量,支持基于传感器数据预测故障等级。 运行模式:字符串类型变量,描述发动机当前的工作状态,主要包括:怠速(发动机运转但无负载)、巡航(发动机在常规负载下平稳运行)、重载(发动机承受高负荷或高压工况)。 数据集整体包含1000条记录,每条记录对应特定时刻的发动机性能快照。其中故障状态涵盖从正常到严重故障的四级分类,有助于训练模型实现故障预测与诊断。所有数据均为合成生成,旨在模拟真实的发动机性能变化与典型故障场景,所包含的温度、转速、燃油效率、振动、扭矩及功率输出等关键传感指标,均为影响发动机故障判定的重要因素。 资源来源于网络分享,仅用于学习交流使用,请勿用于商业,如有侵权请联系我删除!
题目描述是关于寻找两个已排序数组 `nums1` 和 `nums2` 的合并后的中位数。这两个数组分别包含 `m` 和 `n` 个元素。要解决这个问题,首先我们需要合并这两个数组并保持有序,然后根据数组的总大小决定取中间值的方式。 1. 合并两个数组:由于数组是有序的,我们可以使用双指针法,一个指向 `nums1` 的起始位置,另一个指向 `nums2` 的起始位置。比较两个指针所指元素的大小,将较小的那个放入一个新的合并数组中,同时移动对应指针。直到其中一个数组遍历完毕,再将另一个数组剩余的部分直接复制到合并数组中。 2. 计算中位数:如果合并数组的长度为奇数,则中位数就是最中间的那个元素;如果长度为偶数,则中位数是中间两个元素的平均值。我们可以通过检查数组长度的奇偶性来确定这一点。 下面是Python的一个基本解决方案: ```python def findMedianSortedArrays(nums1, nums2): merged = [] i, j = 0, 0 # Merge both arrays while i < len(nums1) and j < len(nums2): if nums1[i] < nums2[j]: merged.append(nums1[i]) i += 1 else: merged.append(nums2[j]) j += 1 # Append remaining elements from longer array while i < len(nums1): merged.append(nums1[i]) i += 1 while j < len(nums2): merged.append(nums2[j]) j += 1 # Calculate median length = len(merged) mid = length // 2 if length % 2 == 0: # If even, return average of middle two elements return (merged[mid - 1] + merged[mid]) / 2.0 else: # If odd, return middle element return merged[mid] ``` 这个函数返回的是两个数组合并后的中位数。注意,这里假设数组 `nums1` 和 `nums2` 都是非空的,并且已经按照升序排列。
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值