Codeforces 24D: Expected-Value DP Solution Report

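This article solves a problem about a randomly walking robot: compute the expected number of steps the robot needs to reach the bottom row of the board from a given starting cell. It gives the mathematical derivation and a code implementation.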

D. Broken robot

You received as a gift a very clever robot walking on a rectangular board. Unfortunately, you understood that it is broken and behaves rather strangely (randomly). The board consists of N rows and M columns of cells. The robot is initially at some cell on the i-th row and the j-th column. Then at every step the robot could go to some other cell. The aim is to go to the bottommost (N-th) row. The robot can stay at its current cell, move to the left, move to the right, or move to the cell below the current. If the robot is in the leftmost column it cannot move to the left, and if it is in the rightmost column it cannot move to the right. At every step all possible moves are equally probable. Return the expected number of steps to reach the bottommost row.

Input

On the first line you will be given two space separated integers N and M (1 ≤ N, M ≤ 1000). On the second line you will be given another two space separated integers i and j (1 ≤ i ≤ N, 1 ≤ j ≤ M) — the number of the initial row and the number of the initial column. Note that, (1, 1) is the upper left corner of the board and (N, M) is the bottom right corner.

Output

Output the expected number of steps on a line by itself with at least 4 digits after the decimal point.
Examples
input
10 10
10 4
output
0.0000000000
input
10 14
5 14
output
18.0038068653

【Solution Report】
http://www.cnblogs.com/liu-runda/p/6251416.html
http://www.cnblogs.com/xiefengze1/p/7728018.html
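Let f[i][j] be the expected number of steps needed to reach row n starting from cell (i,j). Row n itself costs nothing, so f[n][j]=0, and for i<n conditioning on the first move gives: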

f[i][1]=(f[i][1]+f[i][2]+f[i+1][1])/3+1;

f[i][j]=(f[i][j-1]+f[i][j]+f[i][j+1]+f[i+1][j])/4+1; j∈[2,m-1]

f[i][m]=(f[i][m]+f[i][m-1]+f[i+1][m])/3+1;

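Since f[i][j] appears on both sides and depends on its neighbours in the same row, every row is a system of m linear equations. Solving it with plain Gaussian elimination, at O(m^3) per row, is clearly too slow for m up to 1000.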

Still, we first eliminate the f[i][j] term from the right-hand side of each equation, which gives:

f[i][1]=(3+f[i][2]+f[i+1][1])/2;

f[i][j]=(4+f[i][j+1]+f[i][j-1]+f[i+1][j])/3;

f[i][m]=(3+f[i][m-1]+f[i+1][m])/2;
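As a sanity check on the three rearranged equations, they can simply be iterated until the row stops changing. The sketch below is not part of the original solution: `solve_row_iteratively` is a hypothetical helper name, `down` is assumed to hold the already-known row f[i+1] with 0-based columns, and the iteration cap and tolerance are arbitrary choices. It is far slower than the elimination derived in this report and is only meant for cross-checking small cases.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical cross-check (not the report's method): repeatedly apply the
// rearranged equations to one row until the largest update is negligible.
// down[j] is f[i+1][j+1] in the report's 1-based notation; needs m >= 2.
std::vector<double> solve_row_iteratively(const std::vector<double>& down) {
    const int m = static_cast<int>(down.size());
    assert(m >= 2);
    std::vector<double> f(m, 0.0);
    for (int iter = 0; iter < 100000; ++iter) {   // arbitrary safety cap
        double change = 0.0;
        for (int j = 0; j < m; ++j) {
            double next;
            if (j == 0) {
                next = (3.0 + f[1] + down[0]) / 2.0;                 // leftmost column
            } else if (j == m - 1) {
                next = (3.0 + f[m - 2] + down[m - 1]) / 2.0;         // rightmost column
            } else {
                next = (4.0 + f[j - 1] + f[j + 1] + down[j]) / 3.0;  // interior column
            }
            change = std::max(change, std::fabs(next - f[j]));
            f[j] = next;   // use the new value immediately (Gauss-Seidel style)
        }
        if (change < 1e-12) break;   // arbitrary tolerance
    }
    return f;
}
```

For a board with two columns directly above the bottom row, every cell has three equally likely moves, one of which goes down, so both entries should come out as 3.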

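Assume row f[i+1] is already known (rows are processed from the bottom up), and rearrange these equations slightly.

Take f[i][1] as an example: f[i][1] = (3+f[i][2]+f[i+1][1])/2 = (3+f[i+1][1])/2 + (1/2)*f[i][2].

Let A = (3+f[i+1][1])/2 and B = 1/2, so that f[i][1] = A+B*f[i][2].

Then f[i][2] = (4+f[i][3]+f[i][1]+f[i+1][2])/3 = (4+f[i][3]+A+B*f[i][2]+f[i+1][2])/3.

Collecting the f[i][2] terms gives f[i][2] = (4+f[i][3]+A+f[i+1][2])/(3-B).

As with f[i][1], write f[i][2] in the form A'+B'*f[i][3]: f[i][2] = (4+A+f[i+1][2])/(3-B) + 1/(3-B)*f[i][3], i.e. A' = (4+A+C)/(3-B) and B' = 1/(3-B), where C is f[i+1][2]. Columns f[i][3..m-1] are handled exactly like f[i][2], each using the A and B of the column to its left and C = f[i+1][j].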
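The left-to-right pass can be sketched as a small function. This is an illustration under assumptions, not the author's code: `RowCoefficients` and `forward_sweep` are hypothetical names, columns are 0-based, and `down` is assumed to hold the known row f[i+1]. The first column starts from A = (3+f[i+1][1])/2 and B = 1/2, which is what f[i][1] = (3+f[i][2]+f[i+1][1])/2 gives. The pass is the forward-elimination half of the standard tridiagonal (Thomas) algorithm.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container: f[i][j] = A[j] + B[j] * f[i][j+1] for j = 0..m-2.
struct RowCoefficients {
    std::vector<double> A, B;
};

// down[j] is f[i+1][j+1] in the report's 1-based notation; needs m >= 2.
RowCoefficients forward_sweep(const std::vector<double>& down) {
    const int m = static_cast<int>(down.size());
    assert(m >= 2);
    RowCoefficients c;
    c.A.assign(m - 1, 0.0);
    c.B.assign(m - 1, 0.0);
    // Leftmost column: f = (3 + down)/2 + (1/2) * f_right.
    c.A[0] = (3.0 + down[0]) / 2.0;
    c.B[0] = 0.5;
    // Interior columns: A' = (4 + A + C) / (3 - B), B' = 1 / (3 - B).
    for (int j = 1; j + 1 < m; ++j) {
        const double denom = 3.0 - c.B[j - 1];
        c.A[j] = (4.0 + c.A[j - 1] + down[j]) / denom;
        c.B[j] = 1.0 / denom;
    }
    return c;
}
```

For three columns with a zero row below, the first column gives A = 1.5, B = 0.5, and the second gives A' = 5.5/2.5 = 2.2, B' = 1/2.5 = 0.4.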

Now consider f[i][m]. By this point we already have f[i][m-1] = A+B*f[i][m]. From the equation derived earlier, f[i][m] = (3+f[i][m-1]+f[i+1][m])/2; substituting gives f[i][m] = (3+A+B*f[i][m]+f[i+1][m])/2.

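Solving for f[i][m] gives f[i][m] = (3+A+f[i+1][m])/(2-B), so the exact value of f[i][m] is finally known. Because every earlier column satisfies f[i][j] = A+B*f[i][j+1] (with that column's own A and B), substituting back from right to left yields every f value in the row.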
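The closing step and the right-to-left substitution might look like the following. It is a minimal sketch with assumed names: `back_substitute` is hypothetical, `A` and `B` are assumed to hold the per-column coefficients for columns 1..m-1 (0-based indices 0..m-2), and `down_last` stands for f[i+1][m].

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: given f[j] = A[j] + B[j] * f[j+1] for j = 0..m-2,
// close the system at the rightmost column and substitute backwards.
std::vector<double> back_substitute(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    double down_last) {
    assert(!A.empty() && A.size() == B.size());
    const int m = static_cast<int>(A.size()) + 1;
    std::vector<double> f(m);
    // Rightmost column: f[m] = (3 + A + f[i+1][m]) / (2 - B).
    f[m - 1] = (3.0 + A[m - 2] + down_last) / (2.0 - B[m - 2]);
    // Every other column follows from its right neighbour.
    for (int j = m - 2; j >= 0; --j) {
        f[j] = A[j] + B[j] * f[j + 1];
    }
    return f;
}
```

With A = {1.5, 2.2}, B = {0.5, 0.4} and a zero row below, this gives 3.25, 3.5, 3.25, which satisfies all three rearranged equations for m = 3.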

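Repeat this process for rows n-1 down to x, where x is the starting row, i.e. n-x times (row n is all zeros). The time complexity is O((n-x)*m).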
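Since row i depends only on row i+1, one possible variant keeps a single row in memory and overwrites it in place. The function below is a sketch of that idea, not the author's listing: `expected_steps` is a hypothetical name, it takes the four input values directly instead of reading them, and it assumes m >= 2 (the single-column board needs separate handling).

```cpp
#include <cassert>
#include <vector>

// Hypothetical rolling-row variant. (x, y) is the 1-based starting cell.
double expected_steps(int n, int m, int x, int y) {
    assert(m >= 2 && 1 <= x && x <= n && 1 <= y && y <= m);
    std::vector<double> f(m, 0.0);            // starts as row n: all zeros
    std::vector<double> A(m - 1), B(m - 1);   // f[j] = A[j] + B[j] * f[j+1]
    for (int i = n - 1; i >= x; --i) {
        // At this point f still holds row i+1.
        A[0] = (3.0 + f[0]) / 2.0;
        B[0] = 0.5;
        for (int j = 1; j + 1 < m; ++j) {
            const double denom = 3.0 - B[j - 1];
            A[j] = (4.0 + A[j - 1] + f[j]) / denom;
            B[j] = 1.0 / denom;
        }
        // The right-hand side reads the old f[m-1] (row i+1) before it is overwritten.
        f[m - 1] = (3.0 + A[m - 2] + f[m - 1]) / (2.0 - B[m - 2]);
        for (int j = m - 2; j >= 0; --j) {
            f[j] = A[j] + B[j] * f[j + 1];    // f now holds row i
        }
    }
    return f[y - 1];
}
```

Called with the two sample inputs, `expected_steps(10, 10, 10, 4)` performs no row updates and returns 0, and `expected_steps(10, 14, 5, 14)` should agree with the sample answer 18.0038068653 up to floating-point error.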

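This problem has one small pitfall: when m=1 the transitions described above do not apply. The robot can only stay or move down, so f[i][1] = (f[i][1]+f[i+1][1])/2+1, which simplifies to f[i][1] = f[i+1][1]+2.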
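For the single-column board the recurrence can be unrolled directly. The sketch below uses a hypothetical function name, `expected_steps_single_column`, and simply applies f[i][1] = f[i+1][1]+2 from row n-1 up to the starting row x.

```cpp
#include <cassert>

// Hypothetical helper for m == 1: each row adds exactly 2 expected steps,
// because "stay" and "down" are the only moves and are equally likely.
double expected_steps_single_column(int n, int x) {
    assert(1 <= x && x <= n);
    double f = 0.0;                 // f[n][1] = 0
    for (int i = n - 1; i >= x; --i) {
        f += 2.0;                   // f[i][1] = f[i+1][1] + 2
    }
    return f;
}
```

Unrolling the loop shows that the result equals 2*(n-x), so a board with n = 5 and a start in row 2 gives 6.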

The code is as follows:

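#include<cstdio>
#include<cstring>
#include<algorithm>
using namespace std;
#define N 1010

int n,m,x,y;
// dp[i][j]: expected steps from cell (i,j) to row n.
// Within a row, dp[i][j] = b[j] + a[j]*dp[i][j+1]; note that a[] holds the
// multiplier (B in the derivation) and b[] the constant term (A in the derivation).
double dp[N][N],a[N],b[N];

int main()
{
    scanf("%d%d%d%d",&n,&m,&x,&y);
    if(m==1)
    {
        // Single column: only "stay" and "down" are possible, so every row
        // costs 2 expected steps. Here a[] is reused as the per-row answer.
        a[n]=0;
        for(int i=n-1;i>=x;--i) a[i]=a[i+1]+2;
        printf("%.4f\n",a[x]);
    }
    else
    {
        // The bottom row is the target.
        for(int i=1;i<=m;++i) dp[n][i]=0;
        // Solve rows from the bottom up; rows above x are never needed.
        for(int i=n-1;i>=x;--i)
        {
            // Leftmost column: dp[i][1] = (3+dp[i+1][1])/2 + dp[i][2]/2.
            a[1]=0.5;b[1]=dp[i+1][1]/2+1.5;
            // Forward pass over the interior columns. Dividing by
            // (0.75-a[j-1]/4) is the same as dividing 4x the numerator by (3-a[j-1]).
            for(int j=2;j<m;++j)
            {
                b[j]=b[j-1]/4.0+dp[i+1][j]/4.0+1.0;
                a[j]=0.25;
                a[j]/=(0.75-a[j-1]/4.0);
                b[j]/=(0.75-a[j-1]/4.0);
            }
            // Rightmost column is now fully determined.
            dp[i][m]=(b[m-1]+dp[i+1][m]+3.0)/(2-a[m-1]);
            // Substitute back from right to left.
            for(int j=m-1;j>=1;--j) dp[i][j]=b[j]+a[j]*dp[i][j+1];
        }
        printf("%.10f\n",dp[x][y]);
    }
    return 0;
}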