(White Book Training Plan) UVa 12627 Erratic Expansion (Recursion + Pattern Finding)

This post describes the solution strategy for UVa 12627 in detail: find the pattern first, then solve it with recursion. The implementation is short and clear, making it suitable for algorithm beginners to study and practice.

Problem link: UVa 12627

The first step is to find the pattern: at hour k, the grid can always be split into four quadrants, where the bottom-right quadrant is entirely blue balloons, and the top-right, bottom-left, and top-left quadrants are three identical copies of the hour k-1 configuration. Once this self-similar structure is clear, recursion is the natural way to solve it.
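
Stated as a recurrence (this is exactly what the cal() function below computes, with row indices flipped so that rows are counted from the bottom): let f(x) be the number of red balloons in the bottom x rows. For 2^(m-1) <= x < 2^m, the bottom 2^(m-1) rows contain one complete hour-(m-1) block (3^(m-1) red balloons) next to an all-blue block, and the remaining x - 2^(m-1) rows cut across the two identical hour-(m-1) blocks in the top half, so

f(0) = 0,    f(x) = 3^(m-1) + 2 * f(x - 2^(m-1))    for 2^(m-1) <= x < 2^m.

For hour K and rows A to B (counted from the top), the answer is f(2^K - A + 1) - f(2^K - B).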

The code is as follows:

#include <iostream>
#include <cstdio>
#include <string>
#include <cstring>
#include <stdlib.h>
#include <math.h>
#include <ctype.h>
#include <queue>
#include <map>
#include <set>
#include <algorithm>

using namespace std;
#define LL long long
LL fei[32];                       // fei[i] = 2^i: number of rows (and columns) at hour i

// Number of red balloons in the bottom x rows of the grid.
// If 2^(m-1) <= x < 2^m, the bottom 2^(m-1) rows hold one complete
// hour-(m-1) block (3^(m-1) red balloons) next to an all-blue block,
// and the remaining x - 2^(m-1) rows cut across the two identical
// hour-(m-1) blocks in the top half.
LL cal(LL x)
{
    if(x==0) return 0;
    LL z=1, sum=1;
    while(x>=z)                   // afterwards z = 2^m and sum = 3^m, with 2^(m-1) <= x < 2^m
    {
        sum*=3;
        z*=2;
    }
    return sum/3+2*cal(x-z/2);    // 3^(m-1) + 2 * cal(x - 2^(m-1))
}
void init()
{
    int i;
    fei[0]=1;
    for(i=1; i<=30; i++)          // K <= 30, so at most 2^30 rows
        fei[i]=fei[i-1]*2;
}
int main()
{
    LL n, i, t, ans, k, a, b, num=0;
    scanf("%lld",&t);
    init();
    while(t--)
    {
        scanf("%lld%lld%lld",&k,&a,&b);
        num++;
        // Rows are numbered from the top in the input; flip the indices
        // so that cal() counts rows from the bottom of the grid.
        a=fei[k]-a+1;
        b=fei[k]-b+1;
        ans=cal(a)-cal(b-1);      // red balloons in rows b..a counted from the bottom
        printf("Case %lld: %lld\n",num,ans);
    }
    return 0;
}
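
For small k, the formula can be cross-checked against a direct simulation. The sketch below is a hypothetical helper (not part of the submitted solution) that expands the grid explicitly, assuming the orientation described above: a red balloon becomes three red balloons (top-left, top-right, bottom-left) and one blue balloon (bottom-right), while a blue balloon becomes four blue balloons. It is only practical for small k, since the grid has 2^k x 2^k cells.

#include <cstdio>
#include <vector>
using namespace std;

// Brute-force check for small k (assumed helper, not in the original post):
// count red balloons in rows a..b (1-based, counted from the top) at hour k
// by building the 2^k x 2^k grid explicitly.
long long bruteForce(int k, int a, int b)
{
    vector<vector<int> > grid(1, vector<int>(1, 1));   // hour 0: a single red balloon
    for(int h = 0; h < k; h++)
    {
        int m = grid.size();
        vector<vector<int> > nxt(2 * m, vector<int>(2 * m, 0));
        for(int i = 0; i < m; i++)
            for(int j = 0; j < m; j++)
                if(grid[i][j])
                {
                    // red -> 3 red (top-left, top-right, bottom-left); the bottom-right turns blue
                    nxt[2 * i][2 * j] = 1;
                    nxt[2 * i][2 * j + 1] = 1;
                    nxt[2 * i + 1][2 * j] = 1;
                }
                // blue -> 4 blue balloons (cells are already 0)
        grid.swap(nxt);
    }
    long long cnt = 0;
    int n = 1 << k;
    for(int i = a - 1; i < b; i++)
        for(int j = 0; j < n; j++)
            cnt += grid[i][j];
    return cnt;
}

If the pattern above is right, bruteForce(k, a, b) should agree with cal(fei[k] - a + 1) - cal(fei[k] - b) for every 1 <= a <= b <= 2^k, which is easy to check for, say, k <= 6.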

