Distinct Subsequences

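This article walks through two ways of solving the LeetCode problem of counting how many times one string occurs as a subsequence of another: recursion and dynamic programming. With step-by-step analysis and code, it shows how recursion reduces the problem to smaller subproblems, and how filling a dynamic programming table turns the recursive solution into an efficient algorithm.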


A subsequence of a given sequence is just the given sequence with some elements (possibly none) left out. Formally, given a sequence X = x_1 x_2 … x_m, another sequence Z = z_1 z_2 … z_k is a subsequence of X if there exists a strictly increasing sequence <i_1, i_2, …, i_k> of indices of X such that for all j = 1, 2, …, k, we have x_{i_j} = z_j. For example, Z = bcdb is a subsequence of X = abcbdab with corresponding index sequence <2, 3, 5, 7>.
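The definition above can be checked mechanically. The following is a minimal sketch, assuming a greedy left-to-right scan; the name isSubsequence is a hypothetical helper and not part of the original problem.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if z can be obtained from x by deleting
// zero or more characters without reordering the rest.
bool isSubsequence(const std::string& x, const std::string& z) {
    std::size_t k = 0;  // number of characters of z matched so far
    for (std::size_t i = 0; i < x.size() && k < z.size(); ++i) {
        if (x[i] == z[k]) {
            ++k;  // match z[k] at the earliest possible position in x
        }
    }
    return k == z.size();
}
```

This only answers whether at least one index sequence exists; counting all of them is the harder problem discussed in the rest of the article.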

In this problem your job is to write a program that counts the number of occurrences of Z in X as a subsequence such that each has a distinct index sequence.

The LeetCode problem is as follows:

Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example:
S = "rabbbit", T = "rabbit"

Return 3.

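Approach 1: Recursion (TLE)

If the current characters match, add the number of ways to match the parts of S and T that follow the current indices.

If the current characters differ, advance the pointer into S and recurse.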


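#include <string>
using namespace std;

class Solution {
private:
    int cnt;    // number of complete matches of T found so far
    int len_s;
    int len_t;
public:
    Solution() : cnt(0), len_s(0), len_t(0) {}

    // Try to match T[idx_ts..] inside S[idx_ss..]; each complete match adds 1 to cnt.
    // The strings are passed by const reference to avoid copying them on every call.
    void Count(const string& S, const string& T, int idx_ss, int idx_ts) {
        if (idx_ts == len_t) {  // all of T has been matched
            cnt++;
            return;
        }
        for (int i = idx_ss; i < len_s; i++) {
            if (S[i] == T[idx_ts]) {
                Count(S, T, i + 1, idx_ts + 1);
            }
        }
    }

    int numDistinct(string S, string T) {
        len_s = S.length();
        len_t = T.length();
        cnt = 0;  // reset so that repeated calls do not accumulate
        Count(S, T, 0, 0);
        return cnt;
    }
};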
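The recursion above times out because it reaches the same pair of positions in S and T again and again. One common remedy is memoization, sketched below under the assumption that the answer is defined per (position in S, position in T) pair. The names countFrom and numDistinctMemo are hypothetical, and long long is used here only to reduce the risk of overflow.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of ways to match t[j..] inside s[i..].
// memo[i][j] == -1 means "not computed yet".
long long countFrom(const std::string& s, const std::string& t,
                    std::size_t i, std::size_t j,
                    std::vector<std::vector<long long>>& memo) {
    if (j == t.size()) return 1;  // all of t matched: exactly one way
    if (i == s.size()) return 0;  // s exhausted while t is not
    long long& cached = memo[i][j];
    if (cached != -1) return cached;
    long long ways = countFrom(s, t, i + 1, j, memo);  // skip s[i]
    if (s[i] == t[j]) {
        ways += countFrom(s, t, i + 1, j + 1, memo);   // use s[i] for t[j]
    }
    cached = ways;
    return ways;
}

long long numDistinctMemo(const std::string& s, const std::string& t) {
    std::vector<std::vector<long long>> memo(
        s.size(), std::vector<long long>(t.size(), -1));
    return countFrom(s, t, 0, 0, memo);
}
```

Each (i, j) pair is computed at most once, so the running time drops to O(|S| * |T|). For example, numDistinctMemo("rabbbit", "rabbit") returns 3.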

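Approach 2: DP

Let dp[i][j] be the number of ways T[0..j] occurs as a subsequence of S[0..i].

If the current characters match, dp[i][j] is the sum of the number of ways that use S[i] (dp[i-1][j-1]) and the number of ways that do not use S[i] (dp[i-1][j]).

If the current characters differ, dp[i][j] = dp[i-1][j].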


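#include <string>
#include <vector>
using namespace std;

class Solution {
private:
    int len_s;
    int len_t;
public:
    int Count(const string& S, const string& T) {
        // Guard the empty cases, which the table below cannot index.
        if (len_t == 0) return 1;  // the empty string occurs exactly once
        if (len_s == 0) return 0;

        // dp[i][j]: number of ways T[0..j] occurs as a subsequence of S[0..i].
        // A vector replaces the variable-length array, which is not standard C++.
        vector<vector<int> > dp(len_s, vector<int>(len_t, 0));

        if (S[0] == T[0]) {
            dp[0][0] = 1;
        }

        // First column: count the occurrences of T[0] in S[0..i].
        for (int i = 1; i < len_s; i++) {
            dp[i][0] = dp[i-1][0];
            if (T[0] == S[i]) {
                dp[i][0]++;
            }
        }

        for (int i = 1; i < len_s; i++) {
            for (int j = 1; j < len_t && j <= i; j++) {
                if (S[i] != T[j]) {
                    dp[i][j] = dp[i-1][j];
                } else {
                    // dp[i-1][j-1]: use S[i], as S[i] == T[j]
                    // dp[i-1][j]  : don't use S[i]
                    dp[i][j] = dp[i-1][j-1] + dp[i-1][j];
                }
            }
        }
        return dp[len_s-1][len_t-1];
    }

    int numDistinct(string S, string T) {
        len_s = S.length();
        len_t = T.length();
        return Count(S, T);
    }
};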
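Since each row of the table in Approach 2 only reads the previous row, the table can be compressed to a single array. The sketch below is one possible variant, not the author's code: it assumes a shifted index where dp[j] covers the first j characters of T (so dp[0] stands for the empty prefix), and the name numDistinctRolling is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant: dp[j] holds the number of ways the first j characters
// of t occur as a subsequence of the prefix of s processed so far.
long long numDistinctRolling(const std::string& s, const std::string& t) {
    std::vector<long long> dp(t.size() + 1, 0);
    dp[0] = 1;  // the empty prefix of t matches in exactly one way
    for (char c : s) {
        // Go right to left so that dp[j - 1] is still the previous row's value.
        for (std::size_t j = t.size(); j >= 1; --j) {
            if (c == t[j - 1]) {
                dp[j] += dp[j - 1];
            }
        }
    }
    return dp[t.size()];
}
```

The inner loop direction matters: iterating j upward would let a single character of S be used for several characters of T in the same pass.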


I found the following solution (a DP approach) on Stack Overflow:

I came across a very good DP solution, but I have a hard time understanding it. Can anybody explain how this DP works?

int numDistinct(string S, string T) {

        vector<int> f(T.size()+1);

        //set the last size to 1.
        f[T.size()]=1;

        for(int i=S.size()-1; i>=0; --i){
            for(int j=0; j<T.size(); ++j){
                f[j]+=(S[i]==T[j])*f[j+1];
                printf("%d\t", f[j] );
            }
            cout<<"\n";
        }
        return f[0];
    }
Accepted answer:

First, try to solve the problem yourself to come up with a naive implementation:

Let's say that S.length = m and T.length = n. Let's write S{i} for the suffix of S starting at index i. For example, if S = "abcde", then S{0} = "abcde", S{4} = "e", and S{5} = "". We use a similar definition for T.

Let N[i][j] be the number of distinct subsequences of S{i} that equal T{j}. We are interested in N[0][0] (because those are both full strings).

There are two easy cases: N[i][n] for any i and N[m][j] for j<n. How many subsequences are there for "" in some string S? Exactly 1. How many for some T in ""? Only 0.

Now, given some arbitrary i and j, we need to find a recursive formula. There are two cases.

If S[i] != T[j], we know that N[i][j] = N[i+1][j] (I hope you can verify this for yourself, I aim to explain the cryptic algorithm above in detail, not this naive version).

If S[i] = T[j], we have a choice. We can either 'match' these characters and go on with the next characters of both S and T, or we can ignore the match (as in the case that S[i] != T[j]). Since we have both choices, we need to add the counts there: N[i][j] = N[i+1][j] + N[i+1][j+1].
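Putting the two easy cases and the two recursive cases together, the recurrence can be summarized as follows (same symbols as above, with m = S.length and n = T.length):

```latex
N[i][j] =
\begin{cases}
1 & \text{if } j = n, \\
0 & \text{if } i = m \text{ and } j < n, \\
N[i+1][j] + N[i+1][j+1] & \text{if } i < m,\ j < n,\ S[i] = T[j], \\
N[i+1][j] & \text{if } i < m,\ j < n,\ S[i] \neq T[j].
\end{cases}
```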


In order to find N[0][0] using dynamic programming, we need to fill the N table. We first need to set the boundary of the table:

N[m][j] = 0, for 0 <= j < n   // row m
N[i][n] = 1, for 0 <= i <= m  // column n

Because of the dependencies in the recursive relation, we can fill the rest of the table looping i backwards and j forwards:

for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        if (S[i] == T[j]) {
            N[i][j] = N[i+1][j] + N[i+1][j+1];
        } else {
            N[i][j] = N[i+1][j];
        }
    }
}
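The loop above leaves out the declaration of N and its boundary values. A complete version might look like the sketch below; the function name numDistinctTable and the use of long long are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper around the table-filling loop.
// N[i][j] = number of ways T{j} occurs as a subsequence of S{i}.
long long numDistinctTable(const std::string& S, const std::string& T) {
    const std::size_t m = S.size();
    const std::size_t n = T.size();
    // Zero fill covers the boundary N[m][j] = 0 for j < n.
    std::vector<std::vector<long long>> N(m + 1, std::vector<long long>(n + 1, 0));
    for (std::size_t i = 0; i <= m; ++i) {
        N[i][n] = 1;  // boundary: the empty suffix of T matches exactly once
    }
    for (std::size_t i = m; i-- > 0;) {  // i runs from m-1 down to 0
        for (std::size_t j = 0; j < n; ++j) {
            N[i][j] = N[i + 1][j];           // ignore S[i]
            if (S[i] == T[j]) {
                N[i][j] += N[i + 1][j + 1];  // match S[i] with T[j]
            }
        }
    }
    return N[0][0];
}
```

The table needs (m + 1) * (n + 1) entries, which is the cost that the one-dimensional trick avoids.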

We can now use the most important trick of the algorithm: we can use a 1-dimensional array f, with the invariant in the outer loop: f = N[i+1]; This is possible because of the way the table is filled. If we apply this to my algorithm, this gives:

f[j] = 0, for 0 <= j < n
f[n] = 1

for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        if (S[i] == T[j]) {
            f[j] = f[j] + f[j+1];
        } else {
            f[j] = f[j];
        }
    }
}
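To see why updating f in place is safe, it may help to fill both structures side by side and compare them. The sketch below does that; rowsAgree is a hypothetical helper written only for this check.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical check: fills the 2-D table N and the 1-D array f together and
// reports whether f equals row i of N after every outer iteration.
bool rowsAgree(const std::string& S, const std::string& T) {
    const std::size_t m = S.size();
    const std::size_t n = T.size();
    std::vector<std::vector<long long>> N(m + 1, std::vector<long long>(n + 1, 0));
    for (std::size_t i = 0; i <= m; ++i) N[i][n] = 1;
    std::vector<long long> f(n + 1, 0);
    f[n] = 1;  // f starts out equal to row m of N
    for (std::size_t i = m; i-- > 0;) {
        for (std::size_t j = 0; j < n; ++j) {
            N[i][j] = N[i + 1][j] + (S[i] == T[j] ? N[i + 1][j + 1] : 0);
            // f[j + 1] has not been touched yet in this pass, so it still
            // holds N[i+1][j+1], and f[j] still holds N[i+1][j].
            if (S[i] == T[j]) f[j] += f[j + 1];
        }
        if (f != N[i]) return false;
    }
    return true;
}
```

Because j runs forwards, f[j] is overwritten before f[j+1], so every read of f[j+1] still sees the value from the previous row.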

We're almost at the algorithm you gave. First of all, we don't need to initialize f[j] = 0 explicitly, because vector<int> f(n+1) already sets every element to 0. Second, we don't need assignments of the type f[j] = f[j].

Since this is C++ code, we can rewrite the snippet

if (S[i] == T[j]) {
    f[j] += f[j+1];
}

to

f[j] += (S[i] == T[j]) * f[j+1];

and that's all. This yields the algorithm:

f[n] = 1

for (int i = m-1; i >= 0; i--) {
    for (int j = 0; j < n; j++) {
        f[j] += (S[i] == T[j]) * f[j+1];
    }
}
thanks for explanation, hope I can vote more times. –   J.W.  Dec 11 '13 at 4:06
 
can you explain this "N[i][n] = 1, for 0 <= i <= m"??? –   nyus2006  Apr 28 at 0:17
 
@S.H. you can think of it as for(int i = 0; i <= m; i++) { N[i][n] = 1; }. The big difference is that that way is operational: I provide an 'algorithm' for how to set the values, whereas the way in the post is declarative: I only care about the values, not about how to achieve them. That's a more mathematical way of writing it. –   Vincent van der Weele  Apr 28 at 6:19


#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

// vector<int> f(n) value-initializes its elements, so every entry starts at 0.
// This differs from a local variable such as `int x;`, which is left
// uninitialized (behavior inherited from C).

int numDistinct(string S, string T) {
    vector<int> f(T.size() + 1);  // every element is initialized to 0

    // Boundary: the empty suffix of T matches exactly once.
    f[T.size()] = 1;

    for (int i = (int)S.size() - 1; i >= 0; --i) {
        cout << "i = " << i << "\t";
        // Traverse T and compare each character with S[i].
        for (size_t j = 0; j < T.size(); ++j) {
            f[j] += (S[i] == T[j]) * f[j+1];
            printf("%d\t", f[j]);
        }
        cout << "\n";
    }
    return f[0];
}

int main() {
    string S = "rabbbitr";
    string T = "rabit";

    cout << numDistinct(S, T) << endl;
}

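Running the program prints one row of f[0..n-1] for each i (from the last character of S down to the first), followed by the final count, which is 3 for these inputs.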





