Python Cookbook Reading Notes

String Manipulation Techniques

Latest recommended article published 2023-12-16 14:30:07

License: CC 4.0 BY-SA

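These notes cover a set of string-handling techniques, with solutions to common tasks such as processing characters, aligning strings, trimming whitespace, and combining strings.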

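1.1 Processing a string one character at a time
Solution: convert the string to a list with list(), iterate with a for loop, use a list comprehension, or use the map function.

thelist = list(thestring)
for c in thestring:
    do_something_with(c)
results = [do_something_with(c) for c in thestring]
results = map(do_something_with, thestring)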

To get all the distinct characters in a string, use sets.Set (from Python 2.4 on, the built-in set does the same job).

import sets
magic_chars = sets.Set('abracadabra')
print magic_chars

# Output: Set(['a', 'r', 'b', 'c', 'd'])

''.join(magic_chars)

# Output: 'arbcd'
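
The snippets above target Python 2. As a rough sketch of the same ideas in Python 3 (an assumption on my part; the notes themselves only cover Python 2), the built-in set replaces sets.Set and map returns a lazy iterator. The sample string and the body of do_something_with are placeholders chosen for illustration.

```python
thestring = "abracadabra"

def do_something_with(c):
    # Placeholder operation for illustration: uppercase the character.
    return c.upper()

thelist = list(thestring)                             # list of single characters
results = [do_something_with(c) for c in thestring]   # list comprehension
mapped = list(map(do_something_with, thestring))      # map is lazy in Python 3

# The built-in set yields the distinct characters. Its iteration order is
# arbitrary, so sort before joining to get a stable result.
magic_chars = set(thestring)
distinct = "".join(sorted(magic_chars))
print(distinct)
```

Sorting the set is a design choice made here only so the joined result is reproducible from run to run.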

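1.2 Converting between characters and character codes
Task: convert a character to its ASCII or Unicode code, or the other way around.
Solution: use ord and chr (note: ord only accepts a string of length 1).

>>> print ord('a')
97
>>> print chr(97)
a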

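ord also works on a Unicode string of length 1; the value it returns can be as large as 65535 on a narrow Unicode build (the upper limit is sys.maxunicode). Likewise, to turn a numeric Unicode code into a Unicode string of length 1, use the built-in unichr:

>>> print ord(u'\u2020')
8224
>>> print repr(unichr(8224))
u'\u2020'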

For batch processing, combine them with map.

>>> print map(ord, 'ciao')
[99, 105, 97, 111]
>>> print ''.join(map(chr, range(97, 100)))
abc
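
For comparison, here is a small sketch of the same conversions in Python 3, which is an assumption beyond the Python 2 notes above: there is no unichr, and chr accepts any Unicode code point. The variable names are illustrative only.

```python
# ord and chr are inverses for a single character.
code = ord("a")
char = chr(97)

# chr covers the full Unicode range in Python 3, so it replaces unichr.
dagger = chr(8224)
dagger_code = ord("\u2020")

# Batch conversion; map is lazy, so wrap it in list() or join it.
codes = list(map(ord, "ciao"))
letters = "".join(map(chr, range(97, 100)))

print(code, char, dagger_code, codes, letters)
```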

1.3 Testing whether an object is string-like
Task: test whether the argument passed in is a string.
Solution: use isinstance rather than type.

def isAString(anobj):
	return isinstance(anobj,basestring)

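This detects every string-like object that inherits from basestring. It cannot detect string-like objects that do not inherit from basestring, however; for those, check whether the object actually behaves like a string: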

def isStringLike(anobj):
    try: anobj + ''
    except: return False
    else: return True
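
To make the difference between the two checks concrete, the sketch below uses Python 3 (an assumption; Python 3 has no basestring, so str stands in for it) together with collections.UserString, a standard-library wrapper that behaves like a string without inheriting from str. The function names are hypothetical snake_case variants of the ones above.

```python
from collections import UserString

def is_a_string(anobj):
    # Type-based check: true only for str and its subclasses.
    return isinstance(anobj, str)

def is_string_like(anobj):
    # Behavior-based check: can the object be concatenated with a string?
    try:
        anobj + ""
    except Exception:
        return False
    return True

wrapped = UserString("hej")
print(is_a_string(wrapped), is_string_like(wrapped))
```

The type check rejects the wrapper, while the behavior check accepts it; an integer fails both.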

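1.4 Aligning strings
Task: left-align, center, or right-align a string.
Solution: ljust, rjust, and center. Each method takes an argument giving the width of the resulting string.

>>> print '|', 'hej'.ljust(20), '|', 'hej'.rjust(20), '|', 'hej'.center(20), '|'
| hej                  |                  hej |         hej          |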

The default fill character is a space, but a second argument can specify a different fill character, for example 'hej'.center(20, '+').
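
A short sketch of the fill-character argument follows. It is written for Python 3 as an assumption, and the widths and fill characters are arbitrary choices for illustration.

```python
word = "hej"

left = word.ljust(10, ".")       # pad on the right
right = word.rjust(10, ".")      # pad on the left
centered = word.center(11, "+")  # pad evenly on both sides

print("|" + left + "|")
print("|" + right + "|")
print("|" + centered + "|")
```

A width of 11 is used for center so that the padding splits evenly, which avoids depending on how an odd leftover character is placed.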

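1.5 Trimming whitespace from the ends of a string
Task: get a string with no extra whitespace at the beginning or end.
Solution: use lstrip, rstrip, and strip. They are not limited to whitespace; they can remove other characters too, for example:

>>> x = 'xyxxyy hejyx yyx'
>>> print '|'+x.strip('xy')+'|'
| hejyx |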

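This strips only the specified x and y characters from both ends and leaves the spaces alone, so the result is a space, then hejyx, then a space.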
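
The sketch below, written for Python 3 as an assumption, compares the three methods on the same input. It also shows that the argument is treated as a set of characters rather than a substring: adding a space to it strips much further.

```python
x = "xyxxyy hejyx yyx"

both = x.strip("xy")            # strip x and y from both ends
left_only = x.lstrip("xy")      # strip from the left end only
right_only = x.rstrip("xy")     # strip from the right end only
with_space = x.strip("xy ")     # x, y, and space are all strippable
default = "  hej \n".strip()    # no argument: strip whitespace

print(repr(both), repr(left_only), repr(right_only), repr(with_space), repr(default))
```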

1.6 Combining strings
Task: combine small pieces into one large string.
Solution: use the string method join. For example, if pieces is a list of strings, you can concatenate them all with:

largeString = ''.join(pieces)

To assemble a string from fragments held in a few variables, consider the string formatting operator %.

largeString = '%s%s something %s yet more' % (small1, small2, small3)

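The advantage of join over '+' is efficiency. Concatenating many strings with '+' creates many intermediate results, and the time required is proportional to the square of the number of characters accumulated. join takes all the pieces at once, creates no intermediate results, and makes only a single copy of the data to build the final string.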
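
As a minimal illustration of the advice above, the sketch below (Python 3, with made-up sample data) collects pieces in a list and joins them once instead of growing a string with '+' inside a loop.

```python
pieces = ["alpha", "beta", "gamma"]

large_string = "".join(pieces)   # no separator
listing = ", ".join(pieces)      # the separator goes between the pieces

small1, small2, small3 = "a", "b", "c"
formatted = "%s%s something %s yet more" % (small1, small2, small3)

# Accumulate pieces in a list, then join once at the end.
parts = []
for n in range(5):
    parts.append(str(n))
digits = "".join(parts)

print(large_string, listing, formatted, digits)
```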

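1.7 Reversing a string by characters or by words
Task: reverse a string.
Solution: remember that strings are immutable, so reversing one requires a copy. The simplest way is a slice with a step of -1: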

revchars = astring[::-1]

To reverse by words instead, call split to create a list of words, reverse the list, and call join with a space as the separator to rebuild one string:

revwords = astring.split()
revwords.reverse()
revwords = ' '.join(revwords)
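
The word-by-word approach collapses runs of whitespace into single spaces. The sketch below (Python 3, sample text invented for illustration) shows both reversals and adds a variant that keeps the original whitespace by splitting with a capturing regular expression; that variant is an addition beyond what the notes above describe.

```python
import re

astring = "hello  big world"   # note the two spaces after "hello"

revchars = astring[::-1]
revwords = " ".join(reversed(astring.split()))

# re.split with a capturing group keeps the whitespace runs as list items,
# so reversing the list preserves the original spacing.
pieces = re.split(r"(\s+)", astring)
pieces.reverse()
rev_keep_spaces = "".join(pieces)

print(repr(revchars), repr(revwords), repr(rev_keep_spaces))
```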

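1.8 Checking whether a string contains any character from a set
Task: check whether a string contains any character from a given set.
Solution: iterate with a for loop.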

def containsAny(seq, aset):
    """Check whether the sequence seq contains any item of aset."""
    for c in seq:
        if c in aset: return True
    return False

A somewhat more efficient approach:

import itertools
def containsAny(seq, aset):
    for item in itertools.ifilter(aset.__contains__, seq):
        return True
    return False

ifilter(fun, iterable): returns an iterator over the items of iterable for which fun returns a true value.

a.difference(b): returns all elements of a that are not in b.
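
The two notes above can be sketched in Python 3, where itertools.ifilter no longer exists and any() with a generator expression plays the same short-circuiting role. This is an assumption beyond the Python 2 code, and contains_all is a hypothetical companion function added only to show where difference is useful.

```python
def contains_any(seq, aset):
    # Stops at the first item of seq that is found in aset.
    return any(item in aset for item in seq)

def contains_any_set(seq, aset):
    # Set-based variant: a non-empty intersection means a shared item.
    return bool(set(aset).intersection(seq))

def contains_all(seq, aset):
    # Every item of aset is in seq when nothing is left after the difference.
    return not set(aset).difference(seq)

print(contains_any("hello", "xyzo"), contains_all("hello world", "low"))
```

The set-based variants always scan the whole input, so the generator version is likely the better fit when an early match is common.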

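The next version relies on string.maketrans and the translate method, so it helps to get familiar with them before looking at it. Here is what the official documentation says: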
string.maketrans(from, to)
Return a translation table suitable for passing to translate(), that will map each character in from into the character at the same position in to; from and to must have the same length.
string.translate(s, table[, deletechars])
Delete all characters from s that are in deletechars (if present), and then translate the characters using table, which must be a 256-character string giving the translation for each character value, indexed by its ordinal. If table is None, then only the character deletion step is performed.
Example 1:

from string import maketrans   # Required to call maketrans function.
intab = "aeiou"
outtab = "12345"
trantab = maketrans(intab, outtab)
str = "this is string example....wow!!!"
print str.translate(trantab)

Output:
th3s 3s str3ng 2x1mpl2....w4w!!!
Example 2:

from string import maketrans   # Required to call maketrans function.
intab = "aeiou"
outtab = "12345"
trantab = maketrans(intab, outtab)
str = "this is string example....wow!!!"
print str.translate(trantab, 'xm')

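Output:
th3s 3s str3ng 21pl2....w4w!!!
Example 2 shows the general usage more concretely. maketrans maps each of aeiou to the corresponding character of 12345. str.translate(trantab, 'xm') first deletes every x and m from str, then replaces each character found in intab with the character at the same position in outtab: a becomes 1, e becomes 2, i becomes 3, o becomes 4, and u becomes 5.
With the general usage understood, the following alternative version makes sense: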

import string
notrans = string.maketrans('', '')   # identity table: no translation
def containsAny(astr, strset):
    return len(strset) != len(strset.translate(notrans, astr))

strset.translate(notrans, astr) returns a copy of strset from which every character that also appears in astr has been deleted.

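So if the two lengths in the expression above are equal, strset contains none of the characters of astr; if they differ, strset contains at least one character of astr.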
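
In Python 3 the same technique looks a little different, so the sketch below is offered under that assumption: string.maketrans is gone, str.maketrans builds the table, and the characters to delete are passed as a third argument to str.maketrans instead of to translate. The variable names are illustrative.

```python
text = "this is string example....wow!!!"

# Two-argument form: map each vowel to a digit.
vowel_table = str.maketrans("aeiou", "12345")
replaced = text.translate(vowel_table)

# Three-argument form: the third string lists characters to delete.
delete_table = str.maketrans("aeiou", "12345", "xm")
replaced_and_trimmed = text.translate(delete_table)

def contains_any(astr, strset):
    # Delete from strset every character found in astr; a shorter result
    # means at least one character was shared.
    return len(strset) != len(strset.translate(str.maketrans("", "", astr)))

print(replaced)
print(replaced_and_trimmed)
```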

1.9 Simplifying the use of the string translate method
Task: write a simple wrapper.
Solution: a factory function that returns a closure.

import string
def translator(frm='', to='', delete='', keep=None):
    if len(to) == 1:
        to = to * len(frm)
    trans = string.maketrans(frm, to)
    if keep is not None:
        allchars = string.maketrans('', '')
        # delete everything except the characters of keep that are not in delete
        delete = allchars.translate(allchars, keep.translate(allchars, delete))
    def translate(s):
        return s.translate(trans, delete)
    # a function returned from inside another function is called a closure
    return translate
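
Here is a hedged sketch of how the same factory might look in Python 3. It is not the book's code: because a Python 3 string can hold any Unicode character, there is no 256-character allchars table to subtract from, so this version filters against a set when keep is given. The usage names (digits_only and so on) and the sample text are illustrative.

```python
import string

def translator(frm="", to="", delete="", keep=None):
    if len(to) == 1:
        to = to * len(frm)
    table = str.maketrans(frm, to, delete)
    if keep is None:
        def translate(s):
            return s.translate(table)
    else:
        # Assumption: keep only characters in keep that are not in delete,
        # then apply the translation table to what remains.
        keep_set = set(keep) - set(delete)
        def translate(s):
            return "".join(c for c in s if c in keep_set).translate(table)
    return translate

digits_only = translator(keep=string.digits)
no_digits = translator(delete=string.digits)
digits_to_hash = translator(frm=string.digits, to="#")

sample = "Chris Perkins : 224-7992"
print(digits_only(sample), no_digits(sample), digits_to_hash(sample))
```

Each returned function remembers its own table (and keep_set, when present), which is the closure behavior the section describes.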

1.10 Filtering out characters that are not in a given set
Task: remove from a string every character that does not belong to a given set.
Solution: use translate.

import string
# A reusable string of all characters; it also serves as a translation
# table that means "no translation"
allchars = string.maketrans('', '')
def makefilter(keep):
    # delete the characters in keep, leaving the complement of keep
    delchars = allchars.translate(allchars, keep)
    def thefilter(s):
        # delete delchars, leaving only the characters in keep
        return s.translate(allchars, delchars)
    return thefilter
if __name__ == '__main__':
    just_vowels = makefilter('aeiouy')
    print just_vowels('four score and seven years ago')

# Output: ouoeaeeyeaao

def canonicform(s):
	return makefilter(s)(allchars)

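canonicform returns the characters of s sorted by character code (alphabetical order for letters of the same case), with no duplicates.
The version below works with Unicode strings; I find it a little hard to follow.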

import sets
class Keeper(object):
    def __init__(self, keep):
        self.keep = sets.Set(map(ord, keep))
    def __getitem__(self, n):
        if n not in self.keep:
            return None
        return unichr(n)
    def __call__(self, s):
        # unicode(s).translate(self) looks up each character's code through
        # __getitem__: a None result deletes the character, any other
        # result is kept
        return unicode(s).translate(self)
makefilter = Keeper
if __name__ == '__main__':
    just_vowels = makefilter('aeiouy')
    print just_vowels(u'four score and seven years ago')

# Output: ouoeaeeyeaao
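
The mechanism becomes easier to see in Python 3, where every str is Unicode and str.translate accepts any object that supports indexing by character code. The sketch below is an adaptation under that assumption, not the book's code; canonic_form is a hypothetical set-based stand-in for canonicform, since Python 3 has no 256-character allchars table.

```python
class Keeper:
    def __init__(self, keep):
        # Store the codes of the characters to keep.
        self.keep = set(map(ord, keep))

    def __getitem__(self, n):
        # str.translate calls this with each character's code.
        # Returning None tells translate to delete that character.
        if n not in self.keep:
            return None
        return chr(n)

    def __call__(self, s):
        return s.translate(self)

def canonic_form(s):
    # Distinct characters of s, sorted by character code.
    return "".join(sorted(set(s)))

just_vowels = Keeper("aeiouy")
print(just_vowels("four score and seven years ago"))
print(canonic_form("abracadabra"))
```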