【Rosalind】Enumerating k-mers Lexicographically - Python自带迭代器itertools模块-优快云博客

本文链接：https://blog.youkuaiyun.com/Emmett_Bioinfo/article/details/109274839

Rosalind习题：Enumerating k-mers Lexicographically（题目ID：LEXF）

最近在处理数据，练代码和写博客的时间稍微少点，最近做了一个关于今天这道习题的整理，写在这篇博客里。这篇博客主要是对Python自带模块itertools的一个介绍。还是先放今天的题目吧：

Problems

Assume that an alphabet AA has a predetermined order; that is, we write the alphabet as a permutation A=(a1,a2,…,ak), where a1<a2<⋯<ak. For instance, the English alphabet is organized as (A,B,…,Z).

Given two strings ss and tt having the same length nn, we say that ss precedes tt in the lexicographic order (and write s<Lexts<Lext) if the first symbol s[j] that doesn’t match t[j] satisfies sj<tj in A.

Given: A collection of at most 10 symbols defining an ordered alphabet, and a positive integer n (n≤10).

Return: All strings of length nn that can be formed from the alphabet, ordered lexicographically (use the standard order of symbols in the English alphabet).

Sample Dataset

A C G T

2

Sample Output

AA
AC
AG
AT
CA
CC
CG
CT
GA
GC
GG
GT
TA
TC
TG
TT

我的思路

之前其实有些过类似的代码，想到了使用迭代器来进行组合字符串。这道题只需要将给定的字符（已经按照字母表顺序进行排列了）进行迭代的组合输出就可以。那么这个迭代器的模块我之前也用过，不过既然是认真学习，我就去认真看了itertools的说明文档，给大家做一个总结。

首先放下大神的代码，因为我虽然用了itertools还是显得有点啰嗦。看下别人的解法

# ^_^ coding:utf-8 ^_^

"""
Enumerating k-mers Lexicographically
url: http://rosalind.info/problems/lexf/
Given: A collection of at most 10 symbols defining an ordered alphabet, and a positive integer n (n <= 10).
Return: All strings of length that can be formed from the alphabet, ordered lexicographically.
"""

import itertools

with open("../data/rosalind_lexf.txt", 'r') as f:
    string = f.readline().split()
    n = int(f.readline().strip())
    result = list(itertools.product(string, repeat = n))
    print("\n".join(["".join(x) for x in result]))

(上述代码来源：https://github.com/zonghui0228/Rosalind-Solutions/blob/master/code/rosalind_lexf.py）

`itertools`模块

itertools模块标准化了一套运算速度快、节省内存的迭代工具，这些迭代工具可以单独或者组合使用。利用这些工具函数可以很容易在Python中高效构造其他工具。

itertools中的迭代工具主要分为三类：无穷迭代器 (Infinite iterators)、根据最短输入序列长度停止的迭代器 (Iterators terminating on the shortest input sequence)、排列组合迭代器 (Combinatoric iterators)，以下分别介绍这几类函数。

无穷迭代器（Infinite iterators）

无穷迭代器主要包含了三个函数：count(start, step)、cycle(string)、repeat(elem, n)。count()函数从参数start开始按照步长step计数，其中步长不设置的话就是默认按1处理。cycle()函数是将传入的字符串的每个字符循环输出。repeat()函数将输入的元素elem重复输出指定次数（n），不填n默认无限次。

根据最短输入序列长度停止的迭代器（Iterators terminating on the shortest input sequence）

这一类迭代器常用的有：

accumulate()：创建迭代器返回累计汇总值或其他双目运算的累计结果值。

例：

>>> from itertools import accumulate
>>> data = [1,2,3,4,5]
>>> list(accumulate(data)) #默认进行的运算是加法运算
[1, 3, 6, 10, 15]
>>> import operator
>>> list(accumulate(data, operator.mul)) #进行累乘
[1, 2, 6, 24, 120]
>>> list(accumulate(data, max)) #计算每一步的最大值
[1, 2, 3, 4, 5]

chain()：创建迭代器依次返回输入的所有可迭代对象中的所有元素。这个函数可以进行类似于将多个字符串合并到一起的任务。

例：

>>> from itertools import chain
>>> for i in chain('Emmett','is','happy'):
...     print(i)
...
E
m
m
e
t
t
i
s
h
a
p
p
y

compress(data, selectors)：创建迭代器返回data中经selectors测试为True的元素，迭代数由data和selectors中较短的长度决定。

例：

>>> from itertools import compress
>>> data = 'AaTtCcGg'
>>> sel = [1,0,1,0,1,0,1,0] #设定只输出索引为0,2,4,6的元素
>>> list(compress(data, sel)) 
['A', 'T', 'C', 'G']

dropwhile(function, iterable)：提供一个函数和一个可迭代对象即可（该函数起到过滤作用，满足条件的值都会丢弃直到有元素不满足为止）。

例（来源https://blog.youkuaiyun.com/windy135/article/details/84576646）：

with open('/etc/passwd') as f:
    for line in f:
        print(line, end='')   # 返回所有行 
 
import itertools
with open('/etc/passwd') as f:
    # 丢弃f中以#开头的行，直到找到一个非#开头的行为止，然后全部输出
    for line in itertools.dropwhile(lambda line: line.startswith('#'), f):
        print(line, end='')

filterfalse(function, iterable)：提供一个函数和一个可迭代对象，返回函数值为False的元素。

例（来源https://vimsky.com/examples/usage/python-itertools-filterfalse.html）：

import itertools 
from itertools import filterfalse  
        
li = [2, 4, 5, 7, 8, 10, 20]   
    
# Slicing the list  
print (list(itertools.filterfalse(lambda x:x % 2 == 0, li)))

groupby(iterable, key=None)：创建迭代器返回iterable中连续的键和组。key是一个计算元素键值的函数。

例：

见https://blog.youkuaiyun.com/qq_33688922/article/details/92003430。

islice(iterable, start, stop[, step])：产生的结果是一个迭代器，它可以产生出所需要的切片元素，但这是通过访问并丢弃所有起始索引之前的元素来实现的；之后的元素会由islice对象产生出来，直到到达结束索引为止；islice()会消耗掉所提供的迭代器中的数据，由于迭代器中的元素只能访问一次，所有如果之后还需要去访问，那就应该先将数据转到列表中去。

例（来源：https://blog.youkuaiyun.com/windy135/article/details/84576640）：

def count(n):
    while True:
        yield n
        n += 1
 
c = count(0)  # 生成器的绝妙之处：它只会在迭代时才会运行，所以死循环也没有问题，返回一个generator
# print(c[:]) # TypeError: 'generator' object is not subscriptable
 
import itertools
for x in itertools.islice(c, 10, 13):
    print(x)
    
'''
Outs:
10
11
12
'''

takewhile(function, iterable)：创建迭代器只要function为真就从可迭代对象中返回元素。

例：

from itertools import takewhile 
  
# function to test the elements 
def test_func(x):
      
    print("Testing:", x) 
    return(x.isdigit()) 
  
# using takewhile with for-loop 
for i in takewhile(test_func, "11234erdg456"):
      
    print("Output:", i) 
    print()
'''
Output:
Testing:1
Output:1

Testing:1
Output:1

Testing:2
Output:2

Testing:3
Output:3

Testing:4
Output:4

Testing:e
'''

zip_longest(iterables, fillvalue=None)：创建迭代器从多个可迭代对象中收集元素，若可迭代对象未对其就用fillvalue补齐。

例（来源：https://www.jb51.net/article/141848.htm）：

res = zip_longest('abc', '12')
for x in res:
  print(x)

'''
('a', '1')
('b', '2')
('c', None)
'''

排列组合迭代器（Combinatoric iterators）

迭代器	实参	结果
`product()`	p, q, … [repeat=1]	笛卡尔积，相当于嵌套的for循环
`permutations()`	p[, r]	长度r元组，所有可能的排列，无重复元素
`combinations()`	p, r	长度r元组，有序，无重复元素
`combinations_with_replacement()`	p, r	长度r元组，有序，元素可重复

例子	结果
`product('ABCD', repeat=2)`	`AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD`
`permutations('ABCD', 2)`	`AB AC AD BA BC BD CA CB CD DA DB DC`
`combinations('ABCD', 2)`	`AB AC AD BC BD CD`
`combinations_with_replacement('ABCD', 2)`	`AA AB AC AD BB BC BD CC CD DD`

总结

以上是根据Python的itertools模块的官方文档中整理的一些迭代器的用法。在网上看了一些相关的博客，发现每个函数都能够单独的讲解一下用法，在这里只是简单地总结了一下。之后有这种需要快速完成迭代的任务时可以回来查阅。

参考

https://docs.python.org/3/library/itertools.html#module-itertools (itertools说明文档)
http://rosalind.info/problems/lexf/
https://github.com/zonghui0228/Rosalind-Solutions/blob/master/code/rosalind_lexf.py
https://blog.youkuaiyun.com/windy135/article/details/84576646
https://vimsky.com/examples/usage/python-itertools-filterfalse.html
https://blog.youkuaiyun.com/qq_33688922/article/details/92003430
https://blog.youkuaiyun.com/windy135/article/details/84576640
https://www.jb51.net/article/141848.htm