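Python practice: yield and standard input/output

The goal is to convert the contents of a text file, test.txt, into an HTML file.

It uses sys.argv, yield, re.sub, and standard input/output redirection.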

  1. Contents of test.txt
    Welcome to World Wide Spam, Inc.
    
    These are the corporate web pages of *World Wide Spam*, Inc. We hope
    you find your stay enjoyable, and that you will sample many of our
    products.
    
    A short history of the company
    World Wide Spam was started in the summer of 2000. The business
    concept was to ride the dot-com wave and to make money both through
    bulk email and by selling canned meat online.
    After receiving several complaints from customers who weren't
    satisfied by their bulk email, World Wide Spam altered their profile,
    and focused 100% on canned goods. Today, they rank as the world's
    13,892nd online supplier of SPAM.
    Destinations
    From this page you may visit several of our interesting web pages:
    - What is SPAM? (http://wwspam.fu/whatisspam)
    - How do they make it? (http://wwspam.fu/howtomakeit)
    - Why should I eat it? (http://wwspam.fu/whyeatit)
    How to get in touch with us
    You can get in touch with us in *many* ways: By phone (555-1234), by
    email (wwspam@wwspam.fu) or by visiting our customer feedback page
    (http://wwspam.fu/feedback).

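  2. block.py, the generators that read the file. A function that contains yield is a generator function; the generator it returns can be iterated with next(f) or with for x in f.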
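    #block.py
    def line(file):
        with open(file) as f:
            for text in f:
                yield text  # yield each line as it is read
            yield '\n'      # finish with a blank line so the last paragraph is flushed
    
    def block(file):
        lines=[]                             # lines of the paragraph being collected
        for l in line(file):
            if l.strip():
                lines.append(l)              # non-blank line: add it to the current paragraph
            elif lines:
                yield ''.join(lines).strip() # blank line: join the collected lines into one paragraph and yield it
                lines=[]                     # start collecting the next paragraph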
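
To make the two ways of iterating concrete, here is a minimal sketch. It assumes a hypothetical helper, paragraphs(), that applies the same grouping idea as block() but takes a list of lines instead of a file name, so it runs without test.txt. Instead of relying on a trailing blank line, it flushes the last paragraph explicitly.

```python
def paragraphs(lines):
    """Group consecutive non-blank lines into paragraphs (hypothetical helper)."""
    buf = []
    for text in lines:
        if text.strip():
            buf.append(text)            # non-blank: keep collecting
        elif buf:
            yield ''.join(buf).strip()  # blank: emit the finished paragraph
            buf = []
    if buf:                             # input did not end with a blank line
        yield ''.join(buf).strip()

sample = ["Title\n", "\n", "first line\n", "second line\n", "\n", "last\n"]

gen = paragraphs(sample)
first = next(gen)   # runs the function body up to the first yield, then pauses
rest = list(gen)    # list() (or a for loop) consumes whatever is left
print(first)
print(rest)
```

Calling next() again on the exhausted generator raises StopIteration, which is the signal a for loop uses to stop.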

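  3. txt2html.py, the script that turns the paragraphs into HTML. sys.argv[1] is the first command-line argument; re.sub performs a regular-expression substitution, and (.+?) is a non-greedy group.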
    #txt2html.py
    import re,sys
    from block import block   # module name matches the file block.py
    
    print('<html><head><title>...</title></head><body>')
    
    title=True
    for b in block(sys.argv[1]):
        b=re.sub(r'\*(.+?)\*',r'<em>\1</em>',b)   # *text* becomes <em>text</em>
        if title:
            # the first paragraph is the page heading
            print('<h1>')
            print(b)
            print('</h1>')
            title=False
        else:
            print('<p>')
            print(b)
            print('</p>')
    print('</body></html>')
    
    # Separate demo: non-greedy (.+?) versus greedy (.+)
    import re
    a=r"'test01','test02','test03'"
    b=re.sub(r"'(.+?)'",r"+\1+",a)   # non-greedy
    c=re.sub(r"'(.+)'",r"+\1+",a)    # greedy
    print(b)
    print(c)

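    b: +test01+,+test02+,+test03+
    c: +test01','test02','test03+

    With the non-greedy pattern (b), every single quote is replaced by +. With the greedy pattern (c), only the first and last single quotes are replaced. Greedy matching runs from the first opening quote to the last possible closing quote and swallows any shorter matches in between, whereas non-greedy matching stops at the shortest possible match.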

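  4. Running the script. The shell redirects standard input with < and standard output with >; here only > is used, to write the printed HTML to a file.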
    python txt2html.py e:\test.txt > e:\out.html
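
The script above takes its input file from sys.argv, so only output redirection is exercised. As a rough sketch of the standard-input side, the variant below reads lines from a stream instead of opening a file. The names blocks() and convert() are assumptions made for illustration, and io.StringIO objects stand in for the redirected streams so the example runs on its own.

```python
import io
import re

def blocks(lines):
    """Yield paragraphs: runs of non-blank lines separated by blank lines."""
    buf = []
    for text in lines:
        if text.strip():
            buf.append(text)
        elif buf:
            yield ''.join(buf).strip()
            buf = []
    if buf:                                # flush the final paragraph
        yield ''.join(buf).strip()

def convert(stdin, stdout):
    """Read plain text from stdin and write HTML to stdout."""
    print('<html><head><title>...</title></head><body>', file=stdout)
    title = True
    for b in blocks(stdin):
        b = re.sub(r'\*(.+?)\*', r'<em>\1</em>', b)
        tag = 'h1' if title else 'p'       # first paragraph becomes the heading
        print(f'<{tag}>{b}</{tag}>', file=stdout)
        title = False
    print('</body></html>', file=stdout)

# In a real script this would be: import sys; convert(sys.stdin, sys.stdout)
fake_in = io.StringIO("Hello\n\nThis is *important* text.\n")
fake_out = io.StringIO()
convert(fake_in, fake_out)
html = fake_out.getvalue()
print(html)
```

Saved as a script (say, a hypothetical txt2html_stdin.py) with the last lines replaced by convert(sys.stdin, sys.stdout), it could be run with both redirections: python txt2html_stdin.py < e:\test.txt > e:\out.html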
