The Road to Pythonic (Part 2)

License: CC 4.0 BY-SA


The Road to Pythonic: Writing Clean, Clear Python Code (Part 2)

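10. Prefer generators and generator expressions

For an explanation of what a generator is, see my other article, "A Brief Look at Iterables, Iterators, and Generators".

If the sequence you need to iterate over is unbounded, such as a data stream read from a serial port or the stream of posts on a website, a generator is the best choice. A list-like container would eventually exhaust memory unless you keep popping old values, and that is clearly less clean than a generator.

If you need to take one value at a time from a very large sequence and run an expensive computation on it, loading all the data into a list up front wastes memory and delays the first result. A generator is the better tool here as well.

So generators are a close friend on your Python journey.

An example: fetching the data stream for a keyword on Twitter.

List-based implementation: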

# ImaginaryTwitterAPI and process_tweet are placeholders, not a real library.

def get_twitter_stream_for_keyword(keyword):
    """Gets the 'live stream', but only at the moment
    the function is initially called. To get more entries,
    the client code needs to keep calling
    'get_twitter_stream_for_keyword'. Not ideal.
    """
    imaginary_twitter_api = ImaginaryTwitterAPI()
    if imaginary_twitter_api.can_get_stream_data(keyword):
        return imaginary_twitter_api.get_stream(keyword)

# Client code (module level, not part of the function above)
current_stream = get_twitter_stream_for_keyword('#jeffknupp')
for tweet in current_stream:
    process_tweet(tweet)

# Uh, I want to keep showing tweets until the program is quit.
# What do I do now? Just keep calling
# get_twitter_stream_for_keyword? That seems stupid.

def get_list_of_incredibly_complex_calculation_results(data):
    return [first_incredibly_long_calculation(data),
            second_incredibly_long_calculation(data),
            third_incredibly_long_calculation(data),
            ]

Generator-based implementation:

def get_twitter_stream_for_keyword(keyword):
    """Now, 'get_twitter_stream_for_keyword' is a generator
    and will continue to generate iterable pieces of data
    one at a time until 'can_get_stream_data(keyword)' is
    False (which may be never).
    """
    imaginary_twitter_api = ImaginaryTwitterAPI()
    while imaginary_twitter_api.can_get_stream_data(keyword):
        yield imaginary_twitter_api.get_stream(keyword)

# Client code (module level, not part of the function above).
# Because it's a generator, I can sit in this loop until
# the client wants to break out
for tweet in get_twitter_stream_for_keyword('#jeffknupp'):
    if got_stop_signal:
        break
    process_tweet(tweet)

def get_list_of_incredibly_complex_calculation_results(data):
    """A simple example to be sure, but now when the client
    code iterates over the call to
    'get_list_of_incredibly_complex_calculation_results',
    we only do as much work as necessary to generate the
    current item.
    """
    yield first_incredibly_long_calculation(data)
    yield second_incredibly_long_calculation(data)
    yield third_incredibly_long_calculation(data)
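
The section title also mentions generator expressions, which the Twitter example does not show. The following is a minimal sketch of the idea rather than anything from the original article; the function names `squares_list` and `squares_lazy` are made up for illustration.

```python
from itertools import count, islice


def squares_list(n):
    # List comprehension: builds all n values in memory before summing.
    return sum([i * i for i in range(n)])


def squares_lazy(n):
    # Generator expression: hands one value at a time to sum().
    return sum(i * i for i in range(n))


# A generator expression can also wrap an unbounded source;
# islice takes only the first five results and then stops.
even_squares = (i * i for i in count() if i % 2 == 0)
first_five = list(islice(even_squares, 5))  # [0, 4, 16, 36, 64]
```

Both functions return the same total; the difference is that the lazy version never holds the whole sequence in memory at once.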

11. Prefer context managers (the with statement)

Context managers are an excellent fit for file operations.
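
As a small sketch of the file case (the file name `notes.txt` and the temporary directory are assumptions made only so the example cleans up after itself):

```python
import os
import tempfile

# Work in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")

    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")
    # Leaving the with block closed the file, even without f.close().
    closed_after_write = f.closed

    with open(path, encoding="utf-8") as f:
        content = f.read()
```

The file is closed when the block exits, including when the block raises an exception, which is the main reason to prefer this form over a manual open/close pair.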

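A context manager is essentially an object whose class implements the __enter__() and __exit__() magic methods. Besides the context managers built into Python, you can write your own simply by adding these two methods. You can also build one with the @contextmanager decorator.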
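
A hand-written context manager might look like the sketch below. The `Timer` class is hypothetical and exists only to show where `__enter__()` and `__exit__()` fit.

```python
import time


class Timer:
    """Hypothetical context manager that times the body of a with block."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self  # this object is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # do not suppress exceptions raised in the block


with Timer() as timer:
    sum(range(100_000))

# timer.elapsed now holds the duration of the block in seconds.
```

Returning False from `__exit__()` lets any exception from the block propagate; returning a true value would swallow it, which is rarely what you want.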

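An example from the official documentation. When execution reaches the yield statement, the body of the with block runs, much like f.read() inside a with open(...) as f block. Note that open() itself does not return a generator: it returns a file object that implements __enter__() and __exit__() directly, and its __exit__() closes the file.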

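from contextlib import contextmanager

@contextmanager
def tag(name):
    print("<%s>" % name)   # runs on entering the with block
    yield
    print("</%s>" % name)  # runs on leaving the with block

with tag("h1"):
    print("foo")  # prints <h1>, foo, </h1> on three separate lines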

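12. Use a single underscore _ as a placeholder for data you don't need

Sometimes a value serves no purpose at all. Rather than binding it to a descriptive variable name, assign it to _ to signal that it is deliberately ignored. Note that _ is still an ordinary name, so this communicates intent to the reader; it does not save memory.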

s = "afmldskmfl"
occurrences = {}
for _, e in enumerate(s):  # the index is not needed, so it goes to _
    occurrences[e] = occurrences.get(e, 0) + 1
# occurrences == {'a': 1, 'f': 2, 'm': 2, 'l': 2, 'd': 1, 's': 1, 'k': 1}

(name, age, _, _) = get_user_info(user)
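
Since `get_user_info` is not defined in the snippet, here is a runnable sketch of the same pattern. The stand-in function and its return values are invented for illustration.

```python
def get_user_info(user_id):
    # Hypothetical stand-in; returns (name, age, email, phone).
    return ("Alice", 30, "alice@example.com", "555-0100")


# Keep the first two fields, ignore the last two.
name, age, _, _ = get_user_info(42)

# Star-unpacking discards "everything else" in one go.
first, *_ = [10, 20, 30, 40]

# _ as a loop variable when only the repetition count matters.
greetings = ["hi" for _ in range(3)]
```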

13. Swap two variables directly; don't introduce a temporary variable

a, b = 1, 2
a, b = b, a + b  # both names are rebound at once: a == 2, b == 3
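
The snippet above rebinds both names in one step (a Fibonacci-style update). A plain swap uses the same mechanism; this is a minimal sketch of it.

```python
a, b = 1, 2
a, b = b, a  # the right-hand tuple is built first, then unpacked
# a == 2, b == 1

# The same idea rotates three values at once.
x, y, z = 1, 2, 3
x, y, z = y, z, x
# x == 2, y == 3, z == 1
```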

14. Use all-uppercase names for global constants

LIGHT_SPEED = 299792458  # [m/s]

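15. Add an if __name__ == '__main__' guard to every program

It is a good idea to end every .py file with if __name__ == '__main__': do_something. This is especially convenient when the file is also imported as a module.

Put debugging statements and quick, temporary checks of the program's correctness under this if statement; it makes simple debugging and displaying results easy. When the file is imported as a module, the code under the if statement is skipped, because __name__ is then the module's name rather than '__main__'.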
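
A minimal sketch of such a file, assuming it is saved as a module with a hypothetical name like `stats_utils.py`:

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Quick self-check: runs when the file is executed directly,
    # skipped when the file is imported as a module.
    assert mean([1, 2, 3]) == 2
    print("mean([1, 2, 3]) =", mean([1, 2, 3]))
```

Importing the module gives access to `mean` without triggering the check or the print.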

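16. Use sys.exit to return an error code

For any program, it is best to define a main() function that returns 0 on success. Inside main(), call sys.exit to report errors; under if __name__ == '__main__', invoke it as sys.exit(main()), so that a normal run returns 0 and exits cleanly. Passing a string to sys.exit prints it to stderr and exits with status 1.

Scripts written this way work directly in Unix pipelines and shell scripts.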

import sys

def do_stuff(argument):
    pass

def do_stuff_with_result(result):
    pass

def main():
    if len(sys.argv) < 2:
        sys.exit('You forgot to pass an argument')
    argument = sys.argv[1]
    result = do_stuff(argument)
    if not result:
        sys.exit(1)
    do_stuff_with_result(result)
    return 0

if __name__ == '__main__':
    sys.exit(main())
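
To see what a caller observes, the sketch below launches two tiny child interpreters and inspects their exit status. The one-line child programs are stand-ins for a real script, not part of the original example.

```python
import subprocess
import sys

# Child exits via sys.exit(3): the integer becomes the process exit status.
numeric = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

# Child exits via sys.exit("message"): the text goes to stderr, status is 1.
textual = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit('You forgot to pass an argument')"],
    capture_output=True,
    text=True,
)

# numeric.returncode == 3
# textual.returncode == 1, and textual.stderr contains the message
```

A shell sees the same status in `$?`, which is what makes `&&`, `||`, and `set -e` behave correctly with such scripts.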

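17. Never use from package import * to import a package or module

* is a wildcard that imports every public name from package. If package contains a module named foo and your own program also defines a function called foo, the names collide, and whichever is bound later silently replaces the other, because from package import * pollutes the current namespace.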
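
The explicit alternatives keep names separate. This is a small sketch; the local `sqrt` helper is hypothetical and chosen only because it would clash with `math.sqrt` under a wildcard import.

```python
import math             # everything stays behind the math. prefix
from math import floor  # or import only the names you need


def sqrt(x):
    """Hypothetical local helper: integer square root."""
    return math.isqrt(x)


own = sqrt(17)           # 4, our helper
library = math.sqrt(16)  # 4.0, the library function is still reachable
rounded = floor(2.7)     # 2
```

With `from math import *` placed after the definition, the name `sqrt` would instead refer to `math.sqrt`, and nothing would warn you about it.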

18. Import order

Modules can be imported in any order, but it is best to follow the conventional one.

standard library modules
third-party library modules installed in site-packages
modules local to the current project

If a group contains several modules, sort them alphabetically.

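19. Use the batteries included; don't reinvent the wheel

Python is a batteries-included language with a rich set of libraries and modules. When you need some functionality, first check whether the standard library already provides it, and only then consider building your own.

Leaning on the standard library has many benefits, two above all. First, it saves development time, and standard-library modules are well tested and often faster than a hand-written equivalent. Second, the code is cleaner and clearer, which makes life easier for maintainers.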
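
As one illustration, counting characters by hand with a dict can be replaced by `collections.Counter` from the standard library. The comparison below is a sketch, not code from the original article.

```python
from collections import Counter

s = "afmldskmfl"

# Hand-rolled counting with a plain dict.
occurrences = {}
for ch in s:
    occurrences[ch] = occurrences.get(ch, 0) + 1

# The standard library already provides this.
counted = Counter(s)
most_frequent = counted.most_common(1)  # the single most frequent character
```

`Counter` is a dict subclass, so `counted == occurrences` holds, and it adds helpers such as `most_common()` on top.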

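If the standard library does not have what you need, look on PyPI (the Python Package Index), which hosted more than 27,000 packages when this figure was recorded and has grown far larger since. If PyPI has nothing either, try GitHub; failing that, build it yourself, and if the result is a good wheel, consider publishing it to PyPI so the whole Python community benefits.

If you find a package, it is best to install it with pip, Python's package manager: pip install <package name>. You can also download the source from GitHub and install it manually from a command prompt with python setup.py install, but this is less convenient: some packages depend on other packages, so you would have to install those dependencies by hand as well (if they are not already installed). pip detects and installs dependencies automatically.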

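20. Use the os.path module to manipulate file paths

os.path provides the functions needed for working with file paths; don't build paths by concatenating strings with +.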
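
A short sketch of the most common helpers; the directory and file names here are made up.

```python
import os

# join inserts the right separator for the current platform.
path = os.path.join("archives", "2024-01-01", "test.txt")

directory, filename = os.path.split(path)     # parent part and final component
stem, extension = os.path.splitext(filename)  # ("test", ".txt")

# Swapping an extension is plain string work on the stem;
# the directory parts are still combined with join, not +.
backup_path = os.path.join(directory, stem + ".bak")
```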

An example that renames a file into a dated archive directory.

from datetime import date
import os

current_directory = os.getcwd()
filename_to_archive = 'test.txt'
new_filename = os.path.splitext(filename_to_archive)[0] + '.bak'
target_directory = os.path.join(current_directory, 'archives')
today = date.today()
new_path = os.path.join(target_directory, str(today))

if os.path.isdir(target_directory):
    if not os.path.exists(new_path):
        os.mkdir(new_path)
    os.rename(
        os.path.join(current_directory, filename_to_archive),
        os.path.join(new_path, new_filename))
