Python Standard Library: Text

Wu Zhengwei's blog

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/yapian8/article/details/38401759

sample_text = '''

    The textwrap module can beused to format text for output in

    situations wherepretty-printing is desired.  It offers

    programmatic functionalitysimilar to the paragraph wrapping

    or filling features found inmany text editors.

'''

import textwrap

print(textwrap.fill(sample_text, width=50))

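fill() takes text as input and produces formatted text as output: a single string wrapped so that no line is longer than the given width.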
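As a minimal sketch of how fill() relates to its sibling wrap(), the snippet below wraps a short sentence both ways. The sentence and the variable names are made up for illustration, and print is called as a function so the snippet also runs on Python 3.

```python
import textwrap

# Hypothetical sample sentence, chosen only for illustration.
text = ("The quick brown fox jumps over the lazy dog "
        "and keeps running through the field.")

# wrap() returns a list of lines, none longer than `width` characters.
lines = textwrap.wrap(text, width=30)

# fill() returns the same lines joined with newlines into one string.
filled = textwrap.fill(text, width=30)

print(filled)
```

Use wrap() when the individual lines are needed for further processing, and fill() when the result is printed directly.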

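Removing existing indentation: dedent() makes the result cleaner by stripping the leading whitespace that is common to every line. If one line is indented more than the others, the extra whitespace on that line is not removed.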

>>> dedented_text = textwrap.dedent(sample_text)
>>> print(dedented_text)


The textwrap module can beused to format text for output in

situations wherepretty-printing is desired.  It offers

programmatic functionalitysimilar to the paragraph wrapping

or filling features found inmany text editors.
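The case of a line that is indented more than the others can be sketched as follows. The text in `uneven_text` is a hypothetical example, not taken from the article: only the four spaces shared by all lines are removed, so the second line keeps its extra indentation.

```python
import textwrap

# Hypothetical text: the second line is indented four spaces
# more than the other two.
uneven_text = (
    "    first line\n"
    "        second line, indented further\n"
    "    third line\n"
)

# dedent() strips only the whitespace common to every line.
dedented = textwrap.dedent(uneven_text)
print(dedented)
```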

Combining dedent and fill

>>> dedented_text = textwrap.dedent(sample_text).strip()
>>> for width in [45, 70]:
	print("%d Columns: \n" % width)
	print(textwrap.fill(dedented_text, width=width))

	
45 Columns: 

The textwrap module can beused to format text
for output in  situations wherepretty-
printing is desired.  It offers  programmatic
functionalitysimilar to the paragraph
wrapping  or filling features found inmany
text editors.
70 Columns: 

The textwrap module can beused to format text for output in
situations wherepretty-printing is desired.  It offers  programmatic
functionalitysimilar to the paragraph wrapping  or filling features
found inmany text editors.

Hanging indent: besides the output width, the indentation of the first line can be controlled separately from that of the following lines.

>>> print(textwrap.fill(dedented_text, initial_indent=' ', subsequent_indent=' ' * 4, width=50))
 The textwrap module can beused to format text for
    output in  situations wherepretty-printing is
    desired.  It offers  programmatic
    functionalitysimilar to the paragraph wrapping
    or filling features found inmany text editors.
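One common use of the two indent parameters is formatting a bulleted list item so that continuation lines line up under the text rather than under the bullet. The following is a rough sketch under that assumption; the item text and the chosen width are hypothetical.

```python
import textwrap

# Hypothetical list item text, for illustration only.
item = ("Wrap long list items so that continuation lines "
        "line up with the text after the bullet.")

wrapped_item = textwrap.fill(
    item,
    width=40,
    initial_indent="* ",     # prefix for the first line only
    subsequent_indent="  ",  # prefix for every following line
)
print(wrapped_item)
```

Both prefixes count toward the width, so no output line exceeds 40 characters.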

re -- Regular Expressions

Finding patterns in text

>>> import re
>>> pattern = 'this'
>>> text = "Does this text match the pattern?"
>>> match = re.search(pattern,text)
>>> match
<_sre.SRE_Match object at 0x021488A8>
>>> s = match.start()
>>> e = match.end()

>>> print('Found "%s" \n in "%s" \n from %d to %d ("%s")' %
      (match.re.pattern, match.string, s, e, text[s:e]))
Found "this" 
 in "Does this text match the pattern?" 
 from 5 to 9 ("this")
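The session above assumes that the pattern is found. When it is not, search() returns None, and calling start() or end() on the result would raise an AttributeError. A small sketch of guarding against that case follows; `describe_match` is a hypothetical helper name introduced only for this example.

```python
import re


def describe_match(pattern, text):
    """Describe the first match of pattern in text, or report that there is none."""
    match = re.search(pattern, text)
    if match is None:
        # search() returns None when the pattern does not occur in the text.
        return 'No match for "%s"' % pattern
    return 'Found "%s" from %d to %d' % (
        match.group(0), match.start(), match.end())


text = "Does this text match the pattern?"
print(describe_match("this", text))
print(describe_match("that", text))
```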

Compiling expressions: the compile() function converts an expression string into a regular expression object.

>>> regexes = [re.compile(p) for p in ['this','that']]
>>> text = "Does this text match the pattern?"

>>> print("Text: %r\n" % text)
Text: 'Does this text match the pattern?'

>>> for regex in regexes:
	print('Seeking "%s" -> ' % regex.pattern)
	if regex.search(text):
		print('Match!')
	else:
		print('No match!')

		
Seeking "this" -> 
Match!
Seeking "that" -> 
No match!

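The module-level functions maintain a cache of compiled expressions, but that cache is limited in size, and using compiled expressions directly avoids the cache lookup overhead. Another advantage of compiled expressions is that, by precompiling all expressions when the module is loaded, the compilation work is moved to application start-up instead of happening while the program is responding to a user action.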
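A minimal sketch of this precompilation pattern is shown below. The pattern names `WORD_RE` and `NUMBER_RE` and the function `count_tokens` are hypothetical, chosen only to illustrate the idea: the compile() calls run once at import time, and later calls reuse the compiled objects.

```python
import re

# Compiled once, when the module is imported (hypothetical names).
WORD_RE = re.compile(r"\w+")
NUMBER_RE = re.compile(r"\d+")


def count_tokens(text):
    """Count words and numbers using the precompiled patterns."""
    return {
        "words": len(WORD_RE.findall(text)),
        "numbers": len(NUMBER_RE.findall(text)),
    }


print(count_tokens("Order 66 shipped 3 items"))
```

A side effect of this layout is that an invalid pattern raises re.error when the module is loaded rather than at the first user action that happens to use it.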

Multiple matches: findall() returns all non-overlapping substrings of the input that match the pattern.

>>> text = 'abbaaabbbbaaaaa'
>>> pattern = 'ab'
>>> for match in re.findall(pattern, text):
	print('Found "%s"' % match)

	
Found "ab"
Found "ab"
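findall() returns only the matched strings. When the positions are needed as well, finditer() yields match objects instead. The sketch below reuses the text from the session above and adds a second, made-up input to show that matches do not overlap.

```python
import re

text = 'abbaaabbbbaaaaa'

# finditer() yields match objects, so the position of each match is available.
positions = [(m.start(), m.end()) for m in re.finditer('ab', text)]
print(positions)

# Matches never overlap: 'aaaa' contains 'aa' at three offsets,
# but only two non-overlapping matches are reported.
pairs = re.findall('aa', 'aaaa')
print(pairs)
```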

Reference: String Services, Python standard library documentation