UVA - 11427 Expect the Expected (probability DP)

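This post analyzes a solitaire strategy built around mathematical expectation: the player tries to keep each day's win rate above the per-game win probability. A dynamic programming solution then computes the expected number of days the strategy can be sustained.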

Some mathematical background. This problem asks you to compute the expected value of a random
variable. If you haven't seen those before, the simple definitions are as follows. A random variable is a
variable that can have one of several values, each with a certain probability. The probabilities of each
possible value are positive and add up to one. The expected value of a random variable is simply the
sum of all its possible values, each multiplied by the corresponding probability. (There are some more
complicated, more general definitions, but you won't need them now.) For example, the value of a fair,
6-sided die is a random variable that has 6 possible values (from 1 to 6), each with a probability of 1/6.
Its expected value is 1/6 + 2/6 + ... + 6/6 = 3.5. Now the problem.
I like to play solitaire. Each time I play a game, I have probability p of solving it and probability
(1 - p) of failing. The game keeps statistics of all my games: what percentage of games I have won.
If I simply keep playing for a long time, this percentage will always hover somewhere around p · 100%.
But I want more.
Here is my plan. Every day, I will play a game of solitaire. If I win, I'll go to sleep happy until
the next day. If I lose, I'll keep playing until the fraction of games I have won today becomes larger
than p. At this point, I'll declare victory and go to sleep. As you can see, at the end of each day, I'm
guaranteed to always keep my statistics above the expected p · 100%. I will have beaten mathematics!
If your intuition is telling you that something here must break, then you are right. I can't keep
doing this forever because there is a limit on the number of games I can play in one day. Let's say that
I can play at most n games in one day. How many days can I expect to be able to continue with my
clever plan before it fails? Note that the answer is always at least 1 because it takes me a whole day
of playing to reach a failure.
Input
The first line of input gives the number of cases, N. N test cases follow. Each one is a line containing
p (as a fraction) and n.
1 ≤ N ≤ 3000, 0 ≤ p < 1,
The denominator of p will be at most 1000,
1 ≤ n ≤ 100.
Output
For each test case, print a line of the form "Case #x: y", where y is the expected number of days,
rounded down to the nearest integer. The answer will always be at most 1000 and will never be within
0.001 of a round-off error case.
Sample Input
4
1/2 1
1/2 2
0/1 10
1/2 3
Sample Output
Case #1: 2
Case #2: 2
Case #3: 1
Case #4: 2
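
As a sanity check, Case #4 (p = 1/2, n = 3) can be worked out by hand. A day fails only if the win fraction never exceeds 1/2 after any of its 3 games. The first game must be a loss (a win gives 1/1 > 1/2), and of the four possible continuations only LWW ends above 1/2 (at 2/3). Here q is a symbol introduced for illustration, meaning the probability that a single day fails. The calculation assumes the number of days up to and including the first failing day is geometric with mean 1/q:

```latex
q = P(\text{LWL}) + P(\text{LLW}) + P(\text{LLL})
  = 3 \cdot \left(\tfrac{1}{2}\right)^{3} = \tfrac{3}{8},
\qquad
E[\text{days}] = \frac{1}{q} = \frac{8}{3} \approx 2.67,
\qquad
\left\lfloor \tfrac{8}{3} \right\rfloor = 2
```

This agrees with the expected output for Case #4.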

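Problem summary:

A player plays at most n games per day and wins each game with probability p. As soon as the fraction of games won today exceeds p, he stops and goes to sleep. If the fraction still does not exceed p after all n games of a day, he gives up the game for good. Find the expected number of days he plays, rounded down.

Approach:

dp[i][j] is the probability that, after i games in one day, exactly j have been won and the win fraction has never exceeded p. Summing dp[n][j] over all j gives the probability q that a single day ends in failure. Every day fails independently with the same probability q, so the number of days played follows a geometric distribution and its expectation is 1/q.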
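
The recurrence behind this table can be written out explicitly. In the sketch below, d(i, j) stands for dp[i][j] and q for the probability that one day fails; both are notation introduced here for illustration, with the convention that d(i, j) = 0 whenever j < 0 or j > i.

```latex
\begin{aligned}
d(0,0) &= 1, \\
d(i,j) &=
  \begin{cases}
    p\,d(i-1,j-1) + (1-p)\,d(i-1,j), & j/i \le p, \\
    0, & j/i > p,
  \end{cases}
  \qquad 1 \le i \le n,\ 0 \le j \le i, \\
q &= \sum_{j=0}^{n} d(n,j), \\
E[\text{days}] &= \sum_{k=1}^{\infty} k\,(1-q)^{k-1}\,q = \frac{1}{q}.
\end{aligned}
```

States with j/i > p are set to 0 because the day has already ended in success there, so no further games are played from them. The last line is the mean of a geometric distribution: the plan lasts exactly k days when the first k - 1 days succeed and day k fails.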

#include<cstdio>
#include<cstring>
#include<cmath>
using namespace std;
// dp[i][j]: probability that after i games in one day exactly j are won
// and the win fraction has never exceeded p. n <= 100, so 108 is enough.
double dp[108][108];
int main()
{
    // Read test cases from standard input.

    int cases=0;
    int T;
    scanf("%d",&T);
    while (T--){
        int p1,p2;
        scanf("%d/%d",&p1,&p2);
        double p=1.0*p1/p2;
        int n;
        scanf("%d",&n);

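        // Clear the table so that unreachable states are 0 and no values
        // carry over from the previous test case.
        memset(dp,0,sizeof(dp));
        dp[0][0]=1;
        for(int i=1;i<=n;i++){
            // No wins in i games: every game was lost.
            dp[i][0]=pow(1-p,i);
            for(int j=1;j<=n;j++){
                // j/i > p means the day has already ended in success.
                // Compare as integers to avoid floating-point rounding.
                if(p1*i<p2*j){ break;}
                dp[i][j]=dp[i-1][j-1]*p+dp[i-1][j]*(1-p);
            }
        }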
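        // q: probability that a whole day ends with the win fraction still not above p.
        double q=0;
        for(int i=0;i<=n;i++){
            // i/n <= p, compared as integers.
            if(p2*i<=p1*n){
                q+=dp[n][i];
            }
        }
        // The number of days is geometric with failure probability q, so its mean is 1/q.
        // The small offset guards against 1/q landing just below an exact integer.
        int ans=(int)floor(1.0/q+1e-9);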
        printf("Case #%d: %d\n",++cases,ans);

    }
    return 0;
}
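
For testing the same idea outside the judge's input loop, the computation can be wrapped in a function. The following is a minimal sketch, not the original author's code: the name `expectedDays` and its signature are hypothetical, and it swaps the 2D table for two rolling rows, which is enough because row i depends only on row i - 1.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (name and signature are not from the original post):
// expected number of days, rounded down, for win probability p1/p2 and
// at most n games per day.
int expectedDays(int p1, int p2, int n) {
    assert(p2 > 0 && 0 <= p1 && p1 < p2 && n >= 1);
    const double p = static_cast<double>(p1) / p2;

    // prev[j]: probability of exactly j wins after i-1 games,
    // with the win fraction never above p so far.
    std::vector<double> prev(n + 1, 0.0), cur(n + 1, 0.0);
    prev[0] = 1.0;
    for (int i = 1; i <= n; ++i) {
        for (int j = 0; j <= n; ++j) {
            cur[j] = 0.0;
            if (p2 * j > p1 * i) continue;         // j/i > p: day already ended in success
            cur[j] = prev[j] * (1 - p);            // game i was lost
            if (j > 0) cur[j] += prev[j - 1] * p;  // game i was won
        }
        prev.swap(cur);
    }

    // q: probability that a whole day fails (all n games played, never above p).
    double q = 0.0;
    for (double x : prev) q += x;

    // Mean of a geometric distribution; the small offset is an assumption
    // that guards against 1/q landing just below an exact integer.
    return static_cast<int>(std::floor(1.0 / q + 1e-9));
}
```

Calling `expectedDays(1, 2, 3)` corresponds to the sample line `1/2 3`; the four sample cases give 2, 2, 1 and 2.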