Peaks Complexity

A Q&A on dividing an array into equal-size blocks that each contain at least one peak, and on analyzing the time complexity of the solution.


I've just done the following Codility Peaks problem. The problem is as follows:


A non-empty zero-indexed array A consisting of N integers is given. A peak is an array element which is larger than its neighbors. More precisely, it is an index P such that 0 < P < N − 1, A[P − 1] < A[P] and A[P] > A[P + 1]. For example, the following array A:

A[0] = 1
A[1] = 2
A[2] = 3
A[3] = 4
A[4] = 3
A[5] = 4
A[6] = 1
A[7] = 2
A[8] = 3
A[9] = 4
A[10] = 6
A[11] = 2

has exactly three peaks: 3, 5, 10. We want to divide this array into blocks containing the same number of elements. More precisely, we want to choose a number K that will yield the following blocks:

A[0], A[1], ..., A[K − 1],
A[K], A[K + 1], ..., A[2K − 1],
...
A[N − K], A[N − K + 1], ..., A[N − 1].

What's more, every block should contain at least one peak. Notice that extreme elements of the blocks (for example A[K − 1] or A[K]) can also be peaks, but only if they have both neighbors (including one in an adjacent block). The goal is to find the maximum number of blocks into which the array A can be divided. Array A can be divided into blocks as follows:

one block (1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 6, 2). This block contains three peaks.

two blocks (1, 2, 3, 4, 3, 4) and (1, 2, 3, 4, 6, 2). Every block has a peak.

three blocks (1, 2, 3, 4), (3, 4, 1, 2), (3, 4, 6, 2). Every block has a peak. 

Notice in particular that the first block (1, 2, 3, 4) has a peak at A[3], because A[2] < A[3] > A[4], even though A[4] is in the adjacent block. However, array A cannot be divided into four blocks, (1, 2, 3), (4, 3, 4), (1, 2, 3) and (4, 6, 2), because the (1, 2, 3) blocks do not contain a peak. Notice in particular that the (4, 3, 4) block contains two peaks: A[3] and A[5]. The maximum number of blocks that array A can be divided into is three.

Write a function: class Solution { public int solution(int[] A); } that, given a non-empty zero-indexed array A consisting of N integers, returns the maximum number of blocks into which A can be divided. If A cannot be divided into some number of blocks, the function should return 0. For example, given:

A[0] = 1
A[1] = 2 
A[2] = 3 
A[3] = 4 
A[4] = 3 
A[5] = 4 
A[6] = 1 
A[7] = 2 
A[8] = 3 
A[9] = 4 
A[10] = 6 
A[11] = 2

the function should return 3, as explained above. Assume that:

N is an integer within the range [1..100,000]; each element of array A is an integer within the range [0..1,000,000,000].

Complexity:

expected worst-case time complexity is O(N*log(log(N)))

expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).

Elements of input arrays can be modified.


My Question

So I solved this with what appears to me to be the brute-force solution: go through every group size from 1..N, and check whether every group has at least one peak. For the first 15 minutes I was trying to figure out some more optimal approach, since the required complexity is O(N*log(log(N))).

This is my "brute-force" code that passes all the tests, including the large ones, for a score of 100/100:

public int solution(int[] A) {
    int N = A.length;

    ArrayList<Integer> peaks = new ArrayList<Integer>();
    for(int i = 1; i < N-1; i++){
        if(A[i] > A[i-1] && A[i] > A[i+1]) peaks.add(i);
    }

    for(int size = 1; size <= N; size++){
        if(N % size != 0) continue;
        int find = 0;
        int groups = N/size;
        boolean ok = true;
        for(int peakIdx : peaks){
            if(peakIdx/size > find){
                ok = false;
                break;
            }
            if(peakIdx/size == find) find++;
        }
        if(find != groups) ok = false;
        if(ok) return groups;
    }

    return 0;
}

My question is how to deduce that this is in fact O(N*log(log(N))), as it's not at all obvious to me, and I was surprised to pass the test cases. I'm looking for even the simplest complexity proof sketch that would convince me of this runtime. I would assume that a log(log(N)) factor means some kind of reduction of the problem by a square root on each iteration, but I have no idea how this applies here. Thanks a lot for any help.


4 Answers

You're completely right: to get the log log performance the problem needs to be reduced. 

An n.log(log(n)) solution in python is below. Codility no longer tests 'performance' on this problem (!), but the python solution scores 100% for accuracy.

As you've already surmised: the outer loop is O(n), since it tests whether each block size is a clean divisor. The inner loop must therefore cost O(log(log(n))) on average to give O(n*log(log(n))) overall.

We can get good inner-loop performance because we only do real work for d(n) block sizes, where d(n) is the number of divisors of n. We can store a prefix sum of peaks-so-far, which uses the O(n) space allowed by the problem specification. Checking whether a peak has occurred in each 'group' is then an O(1) lookup using the group's start and end indices.

Following this logic, when the candidate block size is 3 the loop needs to perform n / 3 peak checks. The complexity becomes a sum: n/a + n/b + ... + n/n where the denominators (a, b, ...) are the factors of n. 

Short story: the divisor sum n/a + n/b + ... + n/n is sigma(n), the sum of divisors of n, and sigma(n) is O(n*log(log(n))).

Longer version: if you've been doing the Codility Lessons, you'll remember from Lesson 8 (Prime and composite numbers) that the harmonic sum 1 + 1/2 + ... + 1/n gives an O(log(n)) factor. We have a reduced set here, because only factor denominators appear. Lesson 9 (Sieve of Eratosthenes) shows that the sum of the reciprocals of the primes is O(log(log(n))) and notes that 'the proof is non-trivial'. For our case, Wikipedia tells us that the sum of divisors sigma(n) has an upper bound of exactly this order (see Robin's inequality, about halfway down the page).
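To see that bound concretely, here is a quick empirical check (my own sketch, not part of the original answer): sieve sigma(n) for every n up to 10^5 and confirm Robin's inequality, sigma(n) < e^gamma * n * ln(ln(n)) for n > 5040, which is what makes the divisor sum O(n*log(log(n))):

```python
import math

LIMIT = 100_000
E_GAMMA = math.exp(0.5772156649015329)  # e^gamma, approximately 1.78107

# Divisor-sum sieve: add each d to all of its multiples, O(n log n) total.
sigma = [0] * (LIMIT + 1)
for d in range(1, LIMIT + 1):
    for multiple in range(d, LIMIT + 1, d):
        sigma[multiple] += d

# Robin's inequality: sigma(n) < e^gamma * n * ln(ln(n)) for all n > 5040
# (verified numerically far beyond this range), hence sigma(n) = O(n log log n).
worst = max(sigma[n] / (n * math.log(math.log(n)))
            for n in range(5041, LIMIT + 1))
assert worst < E_GAMMA
```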

Does that completely answer your question? Suggestions on how to improve my python code are also very welcome!

def solution(data):

    length = len(data)

    # array ends can't be peaks, so any array shorter than 3 has none
    if length < 3:
        return 0

    peaks = [0] * length

    # compute a list of 'peaks to the left' in O(n) time
    for index in range(2, length):
        peaks[index] = peaks[index - 1]

        # check if there was a peak to the left, add it to the count
        if data[index - 1] > data[index - 2] and data[index - 1] > data[index]:
            peaks[index] += 1

    # candidate is the block size we're going to test; start at 3, since
    # blocks of size 1 or 2 can never all contain a peak (index 0 is never
    # a peak, and two peaks are never adjacent)
    for candidate in range(3, length + 1):

        # skip if not a factor
        if length % candidate != 0:
            continue

        # test at each point n / block
        valid = True
        index = candidate
        while index != length:

            # if no peak in this block, break
            if peaks[index] == peaks[index - candidate]:
                valid = False
                break

            index += candidate

        # one additional check since peaks[length] is outside of array    
        if index == length and peaks[index - 1] == peaks[index - candidate]:
            valid = False

        if valid:
            return length // candidate

    return 0
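As a standalone sanity check (my own addition, independent of the prefix-sum solution above), a brute-force validator reproduces the expected result for the example array:

```python
def max_blocks_naive(A):
    """Try every divisor block size, smallest first; first valid size wins."""
    n = len(A)
    peaks = {i for i in range(1, n - 1) if A[i - 1] < A[i] > A[i + 1]}
    for size in range(1, n + 1):
        if n % size:
            continue
        # every aligned block [b, b + size) must contain at least one peak
        if all(any(p in peaks for p in range(b, b + size))
               for b in range(0, n, size)):
            return n // size
    return 0

example = [1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 6, 2]
assert max_blocks_naive(example) == 3
```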

Credits: Major kudos to @tmyklebu for his SO answer which helped me a lot.

The gist of the problem: split the sequence into c equal slices, each containing at least one peak, where a peak is an element larger than both of its neighbors (the first and last elements of the sequence don't qualify); maximize c. // Codility problem statements really are verbose.

  1. In O(n) space, record a prefix count of peaks seen so far, sum[], and the largest gap D between the positions of consecutive peaks.
  2. Maximizing the slice count c means minimizing the slice length k. A feasible k can only lie in (D/2, min(D, n/2)] (or k = n). An equal split requires n % k == 0, and then sum[k - 1], sum[2k - 1], sum[3k - 1], ..., sum[n - 1] is a sequence of n/k terms to check. The outer loop over k therefore runs at most the number of candidates in (D/2, D] (fewer than D/2 iterations), and each feasibility check costs n/k operations (fewer than 2n/D). Step 2 is thus O(n) overall.
  3. Implementation pitfalls: when computing D, include the distance from the start of the array to the first peak and from the last peak to the end; and don't confuse K with c (single uppercase letters as variable names invite mistakes).
Code:
int solution(vector<int> &A) {
    int N = A.size();
    vector<int> npeaks(N+1, 0); // npeaks[i] = number of peaks before element i (exclusive)
    int maxD = 0;               // largest gap D between consecutive peak positions
    int last_peak = -1;         // sentinel: handles the distance from the start to the first peak
    for(int i = 1; i < N-1; i++){
        if(A[i]>A[i+1] && A[i]>A[i-1]){
            npeaks[i+1] = npeaks[i] + 1;
            maxD = max(maxD, i - last_peak);
            last_peak = i;
        }
        else{
            npeaks[i+1] = npeaks[i];
        }
    }
    maxD = max(maxD, N - last_peak); // distance from the last peak to the end
    npeaks[N] = npeaks[N-1];
    if(npeaks[N] < 1) return 0;
    for(int K = maxD/2; K <= maxD; K++){ // K = slice length
        if(N%K == 0){
            bool isvalid = true;
            int c = N/K; // c = slice count
            for(int i = 1; i <= c; i++){
                if(npeaks[i*K] - npeaks[(i-1)*K] < 1){
                    isvalid = false;
                    break;
                }
            }
            if(isvalid) return c;
        }
    }
    // no feasible slice length up to maxD: every divisor K > maxD is
    // automatically feasible, so return the slice count for the smallest one
    for(int K = maxD+1;;K++) {
        if(N%K == 0) return N/K; // return N/K, not K!
    }
}
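The gap bound in step 2 can be sanity-checked in Python (the helper names below are mine, not from the answer): compute the largest peak-to-peak gap D, with sentinels at -1 and N so the edges count, and confirm that for the example array every divisor K >= D yields a valid split, while the optimal K = 4 lies in (D/2, D]:

```python
def peaks_and_max_gap(A):
    """Peak indices plus the largest gap D between consecutive peaks,
    counting sentinel positions -1 and len(A) so edge blocks are covered."""
    n = len(A)
    peaks = [i for i in range(1, n - 1) if A[i - 1] < A[i] > A[i + 1]]
    pts = [-1] + peaks + [n]
    return peaks, max(b - a for a, b in zip(pts, pts[1:]))

def valid_split(A, K, peaks):
    """True if every aligned block of length K contains a peak."""
    ps = set(peaks)
    return all(any(i in ps for i in range(b, b + K))
               for b in range(0, len(A), K))

A = [1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 6, 2]
peaks, D = peaks_and_max_gap(A)
assert peaks == [3, 5, 10] and D == 5
assert valid_split(A, 4, peaks)      # the optimal size, 4, lies in (D/2, D]
assert not valid_split(A, 3, peaks)  # size 3 fails: block [0, 2] has no peak
assert all(valid_split(A, K, peaks) for K in (6, 12))  # any divisor >= D works
```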
