A DNA sequence consists of four letters, A, C, G, and T. The GC-ratio of a DNA sequence is the number of Cs and Gs in the sequence divided by the length of the sequence. GC-ratio is important in gene finding because DNA sequences with relatively high GC-ratios might be good candidates for the starting parts of genes. Given a very long DNA sequence, researchers are usually interested in locating a subsequence (here, a contiguous run of letters) whose GC-ratio is maximum over all subsequences of the sequence. Since short subsequences with high GC-ratios are sometimes meaningless in gene finding, a lower bound on the length is imposed so that a sufficiently long subsequence with a high GC-ratio is found. If, in a DNA sequence, a 0 is assigned to every A and T and a 1 to every C and G, the DNA sequence is transformed into a binary sequence of the same length. The GC-ratio of any subsequence of the DNA sequence then equals the average of the corresponding subsequence of the binary sequence.
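A minimal Python sketch of this transformation, for illustration only (the helper names `to_binary`, `prefix_sums`, and `gc_ratio` are hypothetical). With prefix sums P[k] = number of 1s among the first k symbols, the GC-ratio of the subsequence covering positions i+1 through j is (P[j] - P[i]) / (j - i), which is also the slope of the line through the points (i, P[i]) and (j, P[j]).

```python
from fractions import Fraction


def to_binary(dna: str) -> list[int]:
    """Map C and G to 1, A and T to 0."""
    return [1 if base in "CG" else 0 for base in dna]


def prefix_sums(bits: list[int]) -> list[int]:
    """P[k] = number of 1s among the first k symbols (P[0] = 0)."""
    p = [0]
    for b in bits:
        p.append(p[-1] + b)
    return p


def gc_ratio(p: list[int], i: int, j: int) -> Fraction:
    """GC-ratio of the subsequence covering positions i+1..j (1-based), i.e. P-index range [i, j)."""
    return Fraction(p[j] - p[i], j - i)


dna = "ACGTGC"
bits = to_binary(dna)     # [0, 1, 1, 0, 1, 1]
p = prefix_sums(bits)     # [0, 0, 1, 2, 2, 3, 4]
print(gc_ratio(p, 1, 3))  # "CG" -> 1
print(gc_ratio(p, 0, 6))  # whole sequence -> 2/3
```

Using `Fraction` keeps the ratios exact, so two subsequences with equal GC-ratios compare as equal instead of differing by floating-point rounding.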

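Given a DNA sequence, convert it into a binary sequence as described above and find a subsequence of length at least L whose GC-ratio (that is, the average of the binary subsequence, or the fraction of 1s in it) is maximum. If several subsequences attain the same maximum average, choose the shortest; if several shortest ones remain, choose the one with the smallest starting position. The search is optimized with a monotonic queue that maintains a lower convex curve, which locates the required subsequence efficiently.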
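The approach summarized above can be sketched in Python as follows. This is an illustrative reconstruction, not the original post's code; the function name `best_gc_segment`, the 1-based inclusive `(start, end)` return convention, and the integer cross-multiplication comparison are assumptions. Each prefix index k is treated as a point (k, P[k]); the average of a subsequence is the slope between its two endpoint points, so for each right end t the best start is found on the lower convex hull of the admissible start points, kept in a deque whose front only ever moves forward.

```python
def best_gc_segment(dna: str, min_len: int) -> tuple[int, int]:
    """Hypothetical helper: 1-based inclusive (start, end) of the subsequence with
    length >= min_len and maximum GC-ratio; ties go to the shortest, then the smallest start."""
    n = len(dna)
    if not 1 <= min_len <= n:
        raise ValueError("need 1 <= min_len <= len(dna)")

    # Prefix sums over the binary sequence (C/G -> 1, A/T -> 0).
    p = [0] * (n + 1)
    for k, base in enumerate(dna):
        p[k + 1] = p[k] + (1 if base in "CG" else 0)

    def cmp(a: int, b: int, c: int, d: int) -> int:
        """Sign of avg[a, b) - avg[c, d), with avg[x, y) = (p[y] - p[x]) / (y - x).
        Cross-multiplied so only integers are used (both lengths are positive)."""
        return (p[b] - p[a]) * (d - c) - (p[d] - p[c]) * (b - a)

    hull = [0] * (n + 1)  # candidate start points (prefix indices), used as a deque
    head = tail = 0       # hull[head:tail] is the live part
    best_a, best_b = 0, min_len  # best answer so far, as prefix-index range [best_a, best_b)

    for t in range(min_len, n + 1):
        x = t - min_len  # newest admissible start point for right end t
        # Keep the hull lower-convex: drop the back point while it lies on or
        # above the line from the point before it to x.
        while tail - head > 1 and cmp(hull[tail - 2], x, hull[tail - 1], x) >= 0:
            tail -= 1
        hull[tail] = x
        tail += 1
        # Advance the front while the next point gives an average at least as high
        # (on ties the later start wins, which means a shorter subsequence).
        while tail - head > 1 and cmp(hull[head], t, hull[head + 1], t) <= 0:
            head += 1
        a = hull[head]
        c = cmp(a, t, best_a, best_b)
        # Replace only on a strictly higher average, or an equal average with a
        # shorter length; t only grows, so equal-length ties keep the earlier start.
        if c > 0 or (c == 0 and t - a < best_b - best_a):
            best_a, best_b = a, t

    return best_a + 1, best_b  # [best_a, best_b) -> 1-based inclusive


print(best_gc_segment("ACGTGC", 2))  # (2, 3): "CG" and "GC" tie, the earlier one wins
print(best_gc_segment("ACGTGC", 3))  # (2, 6): "CGTGC" has ratio 4/5
```

Every start index is pushed onto and popped from the deque at most once, so the scan runs in O(n) time after the O(n) prefix-sum pass. Comparing averages by cross-multiplication avoids the floating-point ties that would otherwise make the "shortest, then leftmost" rule unreliable.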