2014 Microsoft "Programming in One Hour" Contest, Problem 2: Longest Repeated Sequence

 

Problem 2: Longest Repeated Sequence

Time limit: 10000ms
Per-test time limit: 1000ms
Memory limit: 256MB
 Description

You are given a sequence of integers, A = a1, a2, ... an. A consecutive subsequence of A (say ai, ai+1 ... aj) is called a "repeated sequence" if it appears more than once in A (there exists some positive k that ai+k = ai, ai+k+1 = ai+1, ... aj+k = aj) and its appearances are not intersected (i + k > j).

Can you find the longest repeated sequence in A?
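
To make the definition concrete, here is a minimal sketch of a predicate that checks it directly for one choice of i, j and k. The name `isRepeatedAt` and the switch to 0-based indices are assumptions made for illustration; they are not part of the problem statement.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the problem statement): returns true when the
// 0-based block a[i..j] appears again k positions later without intersecting
// the first appearance.
bool isRepeatedAt(const std::vector<int>& a, std::size_t i, std::size_t j, std::size_t k)
{
    if (k == 0 || i > j || j + k >= a.size()) return false; // second copy must fit inside a
    if (i + k <= j) return false;                            // appearances must not intersect (i + k > j)
    for (std::size_t p = i; p <= j; ++p)
        if (a[p] != a[p + k]) return false;                  // every a[p + k] must equal a[p]
    return true;
}
```

For the sample sequence 2 3 2 3 2, the block at positions 0..1 with k = 2 qualifies, while positions 0..2 with k = 2 does not, because the two copies would share position 2.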

 Input

Line 1: n (1 <= n <= 300), the length of A.
Line 2: the sequence, a1 a2 ... an (0 <= ai <= 100).

 Output

The length of the longest repeated sequence.

Sample Input
5
2 3 2 3 2
Sample Output
2
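Approach: find the length of the longest repeated sequence. Because i + k > j, the sequence is laid out as a1, a2, ..., ai, ai+1, ai+2, ..., aj, ..., ai+k, ai+1+k, ai+2+k, ..., aj+k, ..., an, where ai+k = ai, ai+k+1 = ai+1, ... aj+k = aj. The repeated sequence is ai, ai+1, ai+2, ..., aj, so its length is j - i + 1. The code tries every start position i and every offset k, and extends the match for as long as the two copies agree and do not overlap.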
#include <iostream>
#include <vector>
using namespace std;

int main()
{
	int n;
	vector<int> v;
	int ival;
	int longestlength = 0;// length of the longest repeated sequence found so far

	//input n
	cin >> n;

	//input the n elements of the sequence
	while(n-- > 0 && cin >> ival)
	{
		v.push_back(ival);
	}


	//The length of the longest repeated sequence.
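	for(vector<int>::size_type i = 0 ; i + 1 < v.size() ; ++i )// i: start of the first occurrence (i + 1 < size also handles an empty vector)
	{
		for(vector<int>::size_type k = 1 ; i + k < v.size() ; ++k )// k: offset of the second occurrence
		{
			int j = i;// index of the last element of the candidate sequence
			int ii = i;// ii = i, i+1, i+2 ... j

			while(i + k > j && ii + k < v.size() && v[ii] == v[ii + k])// extend while the copies match and do not overlap
			{
				ii++;
				j++;
			}

			j--;// the last position failed the test, so step back

			if((j  - i + 1) > longestlength) // length of this repeated sequence = j - i + 1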
			{
				longestlength = j - i + 1;
			}
		}
	}

	cout << longestlength << endl;

	return 0;
}
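
The triple loop above is cubic in the worst case, which is comfortable for n <= 300. As a hedged alternative, and not the method used in this post, the same answer can be computed in O(n^2) with a longest-common-suffix table: for each pair of end positions i < j, take the length of the matching run ending at both and cap it at j - i so the two copies cannot intersect. The function name `longestRepeatedDp` is made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical name; an alternative O(n^2) sketch, not the post's own method.
int longestRepeatedDp(const std::vector<int>& a)
{
    const std::size_t n = a.size();
    // prev[j] and cur[j] hold, for 1-based end positions i < j, the length of
    // the longest common suffix of a[1..i] and a[1..j] (rows i-1 and i).
    std::vector<int> prev(n + 1, 0), cur(n + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= n; ++i) {
        std::fill(cur.begin(), cur.end(), 0);
        for (std::size_t j = i + 1; j <= n; ++j) {
            if (a[i - 1] == a[j - 1]) {
                cur[j] = prev[j - 1] + 1;
                // Cap at j - i so the two appearances do not intersect.
                best = std::max(best, std::min(cur[j], static_cast<int>(j - i)));
            }
        }
        std::swap(prev, cur);
    }
    return best;
}
```

On the sample sequence 2 3 2 3 2 this returns 2, matching the expected output. Only two rows of the table are kept, so memory stays linear in n.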

