Why is truth() faster than bool()? Part II

This post shares five techniques for making Python code run faster: creating collections with literal syntax is faster than calling constructors; avoid dict.get() for conditional dictionary lookups; avoid the copy module for copying when you can; reduce attribute-resolution overhead; and use operator.truth() instead of bool() for truth testing.


In this post we’ll continue our quest for line-by-line Python optimizations.

We already covered several tips in part I.

Today we’ll learn:

  1. Why using syntax beats calling methods
  2. How dict.get() is disappointing
  3. Why you should avoid everything in the copy module if you can
  4. How object attribute resolution is not free
  5. And finally, why operator.truth() is faster than bool()

1) Syntax is faster than calling methods

Use the {} and [] syntax to create new dicts and lists rather than the dict() or list() methods.

Using the dis module to examine dict construction, we see the two processes are totally different.

>>> from dis import dis

>>> dis(lambda: dict(a=1))

  1           0 LOAD_GLOBAL              0 (dict)

              3 LOAD_CONST               0 ('a')

              6 LOAD_CONST               1 (1)

              9 CALL_FUNCTION          256

             12 RETURN_VALUE        

>>> dis(lambda: {'a': 1})

  1           0 BUILD_MAP                1

              3 LOAD_CONST               0 (1)

              6 LOAD_CONST               1 ('a')

              9 STORE_MAP           

             10 RETURN_VALUE     

The {} syntax compiles directly to the BUILD_MAP and STORE_MAP bytecodes, whereas dict() performs a subroutine call (CALL_FUNCTION) and must first look up the name 'dict' (LOAD_GLOBAL).

The difference is large:

>>> timeit.Timer("{'a': 1}").timeit(1000000)

0.10097813606262207

>>> timeit.Timer("dict(a=1)").timeit(1000000)

0.43113112449645996

The result is similar for making an empty list:

>>> timeit.Timer("[]").timeit(1000000)

0.039584875106811523

>>> timeit.Timer("list()").timeit(1000000)

0.1627810001373291
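If you want to reproduce these comparisons on your own interpreter, a small harness like this works (`compare` is a hypothetical helper, not from the original post; absolute numbers vary by machine and Python version):

```python
import timeit

def compare(stmt_a, stmt_b, setup="pass", number=200000):
    """Return best-of-3 timings (in seconds) for two statements."""
    a = min(timeit.repeat(stmt_a, setup, number=number, repeat=3))
    b = min(timeit.repeat(stmt_b, setup, number=number, repeat=3))
    return a, b

# Literal syntax vs. constructor call:
literal, constructor = compare("{'a': 1}", "dict(a=1)")
```

Taking the minimum of several repeats filters out timing noise from other processes, which is the usual practice with timeit.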

2) Conditional dict lookup

It’s a heartbreaker, but dict.get() is slower than its more verbose alternatives.

k in adict and adict[k]

is faster than

adict.get(k)

According to timeit:

>>> timeit.Timer("adict.get(0)", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.13234400749206543

>>> timeit.Timer("0 in adict and adict[0]", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.052942037582397461

It’s still slower even if the key exists:

>>> timeit.Timer("adict.get(1)", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.13730907440185547

>>> timeit.Timer("1 in adict and adict[1]", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.099563121795654297

Or if you specify an alternative:

>>> timeit.Timer("adict.get(0, None)", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.15692591667175293

>>> timeit.Timer("adict[0] if 0 in adict else None", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.077980995178222656

Even so, it's kind of ugly and long-winded. Isn't it better to keep things simple and readable, even at the cost of a little efficiency?

Let’s consider a slightly more complex case:

adict.get(x, {})

In this case, the {} dict object will be constructed every single time and passed to the get method, and then thrown away if the key is found (imagine even using a more complex object than a dict). Consider instead an alternative such as this:

adict[x] if x in adict else {}

In this way the {} will never be constructed unless it will actually be used.
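To see the eager-evaluation problem concretely, here is a sketch that counts how often the default gets built (`make_default` is a hypothetical stand-in for an expensive constructor):

```python
calls = {"count": 0}

def make_default():
    """Stand-in for an expensive default constructor."""
    calls["count"] += 1
    return {}

adict = {1: "one"}

# get() builds the default every time, even when the key exists:
adict.get(1, make_default())
adict.get(1, make_default())
eager = calls["count"]  # built twice

calls["count"] = 0
# The conditional expression only builds it when actually needed:
_ = adict[1] if 1 in adict else make_default()  # key exists: no call
_ = adict[2] if 2 in adict else make_default()  # key missing: one call
lazy = calls["count"]
```

Arguments in Python are evaluated before the call, so `adict.get(k, default)` always pays for constructing the default.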

3) Avoid the copy module

copy.copy and copy.deepcopy are performance hogs. Avoid them if you can!

Look at how copy.deepcopy performs when copying nested dicts just two levels deep, compared to just iterating through them and making copies ourselves:

>>> adict = {1: {2: 3, 4: 5}, 4: {5: 8}, 8: {3: 'a', 'b': 9}}

>>> timeit.Timer("copy.deepcopy(adict)", "import copy; from __main__ import adict").timeit(100000)

2.78672194480896

>>> timeit.Timer("dict((k, v.copy()) for k, v in adict.iteritems())", "from __main__ import adict").timeit(100000)

0.25296592712402344

Yikes.

copy.deepcopy does a big, scary recursive copy, so no wonder it’s so slow. But, you might ask, I just want a shallow copy of a list or dict! Can’t I use copy.copy?

copy.copy does all sorts of type checking to find the right way to copy different types of objects. It's also implemented in Python, not C, so there's no advantage there. If you're certain of the type of object you're going to copy, there is always a better alternative than copy.copy.

You might think, if you're copying a dict, you can just pass it into dict() to make a copy, such as dict(adict). The same goes for a list: list(alist); and sets: set(aset). But these are not actually the fastest ways to make copies of these types of objects.

The dict and set types have copy() methods that are the fastest way to make a new copy of the same object:

adict.copy()

and

aset.copy()

For a dict, the timeit output is:

>>> timeit.Timer("copy.copy(adict)", "import copy; adict={1: 2, 3: 4, 5: 6}").timeit(100000)

0.12309503555297852

>>> timeit.Timer("dict(adict)", "import copy; adict={1: 2, 3: 4, 5: 6}").timeit(100000)

0.052445888519287109

>>> timeit.Timer("adict.copy()", "import copy; adict={1: 2, 3: 4, 5: 6}").timeit(100000)

0.017553091049194336

Interestingly for lists, the fastest way to make a copy is actually an unbounded slice operation:

newlist = alist[:]

timeit says it all:

>>> timeit.Timer("copy.copy(a)", "import copy; a = [1,2,3,4,5]").timeit(100000)

0.092635869979858398

>>> timeit.Timer("list(a)", "import copy; a = [1,2,3,4,5]").timeit(100000)

0.028503894805908203

>>> timeit.Timer("a[:]", "import copy; a = [1,2,3,4,5]").timeit(100000)

0.013506889343261719

The same syntax works for tuples (or for that matter, strings) but you should never need to copy a tuple or string since they are immutable.
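Note that all of these fast techniques produce shallow copies. A quick sanity check (a sketch, not from the original post):

```python
alist = [[1, 2], [3, 4]]
copy1 = alist[:]           # fast shallow copy via slicing

copy1.append([5, 6])       # the top level is independent...
assert len(alist) == 2

copy1[0].append(99)        # ...but nested objects are shared
assert alist[0] == [1, 2, 99]
```

If you need the nested objects duplicated too, you are back to an explicit per-level copy (or, as a last resort, copy.deepcopy).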

If you're copying an object of your own design, the best method is to write your own copy constructor and use that.
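As a sketch of that idea, here is a toy class (hypothetical, not from the post) with a hand-written copy() method that duplicates exactly the state it owns:

```python
class Point:
    """Toy class with a hand-written copy constructor."""
    def __init__(self, x, y, tags=None):
        self.x = x
        self.y = y
        self.tags = list(tags) if tags else []

    def copy(self):
        # We know exactly what needs copying, so no type
        # introspection is necessary, unlike copy.copy().
        return Point(self.x, self.y, self.tags)

p = Point(1, 2, tags=["origin"])
q = p.copy()
q.tags.append("moved")
# p.tags is unaffected because copy() duplicated the list.
```

Because the class knows its own fields, the copy is both faster than copy.copy and exactly as deep as it needs to be.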

4) Attribute resolution is not free

If you want to use the groupby function from the itertools module, should you import groupby from itertools, or import itertools and then call itertools.groupby?

Opinions differ on the relative stylistic merits, but each of those '.' operators incurs a small cost, so "from itertools import groupby" is more efficient if you call groupby a lot.

You can see the minor difference using the dis module again:

>>> import itertools

>>> dis.dis(lambda: itertools.groupby(()))

  1           0 LOAD_GLOBAL              0 (itertools)

              3 LOAD_ATTR                1 (groupby)

              6 LOAD_CONST               0 (())

              9 CALL_FUNCTION            1

             12 RETURN_VALUE        

>>> from itertools import groupby

>>> dis.dis(lambda: groupby(()))

  1           0 LOAD_GLOBAL              0 (groupby)

              3 LOAD_CONST               0 (())

              6 CALL_FUNCTION            1

              9 RETURN_VALUE 

Note the additional LOAD_ATTR instruction when calling itertools.groupby. This cost is incurred on every call; for all the Python interpreter knows, the attribute could have been rebound between subsequent calls.

The actual performance difference for a trivial case is real but small:

>>> timeit.Timer("groupby(())", "from itertools import groupby").timeit(1000000) 

0.20799016952514648

>>> timeit.Timer("itertools.groupby(())", "import itertools").timeit(1000000)

0.24183106422424316
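The same LOAD_ATTR cost applies to any repeated attribute access, which is why a common trick is to bind a bound method to a local name before a hot loop. A sketch (the function names here are illustrative, not from the post):

```python
def collect_slow(n):
    out = []
    for i in range(n):
        out.append(i * i)    # attribute lookup on every iteration
    return out

def collect_fast(n):
    out = []
    append = out.append      # resolve the attribute once, up front
    for i in range(n):
        append(i * i)        # plain local-name call in the loop
    return out
```

Both produce identical results; the second simply trades n LOAD_ATTR instructions for a single one before the loop.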

5) Truth is faster than bool

Suppose you want to get the count of the number of objects in some collection that have some attribute set to True. You could follow the advice from the previous post and write something like:

sum(some_object.some_attribute for some_object in object_list)

(Remember, bool is a subclass of int in Python: True equals 1 and False equals 0.)

But if some_attribute in the snippet above can be None, you risk a TypeError: unlike False, None is not equal to zero and does not support arithmetic operations.

So here's an alternative: pass the attribute to bool() to coerce None to False.

sum(bool(some_object.some_attribute) for some_object in object_list)

But, is this the fastest way? No, as it turns out.

It turns out that operator.truth is faster than bool.

from operator import truth

sum(truth(some_object.some_attribute) for some_object in object_list)

If object_list is extremely large, this can make a significant difference.
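A minimal sketch of the pattern, using a small literal list in place of object attributes:

```python
from operator import truth

attrs = [True, None, False, "yes", 0, 1]

# truth() agrees with bool() on every value...
assert [truth(v) for v in attrs] == [bool(v) for v in attrs]

# ...so the None-safe count works the same either way:
count = sum(truth(v) for v in attrs)  # counts the truthy values
```

truth() returns the same True or False that bool() would, so swapping it in changes nothing but speed.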

Why is this? Examining the source code for bool, we find:

bool_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"x", 0};
    PyObject *x = Py_False;
    long ok;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O:bool", kwlist, &x))
        return NULL;
    ok = PyObject_IsTrue(x);
    if (ok < 0)
        return NULL;
    return PyBool_FromLong(ok);
}

Whereas in the operator module source code:

spami(truth, PyObject_IsTrue)

operator.truth is just an alias for the Python C function PyObject_IsTrue, but bool() does some extra stuff that adds some overhead before ultimately calling PyObject_IsTrue itself.

By the way, remember:

from operator import truth

truth(x)

is a tiny bit faster than

import operator

operator.truth(x)

As we saw above, that '.' incurs an attribute lookup that adds a tiny bit of execution time on every call.

Coda

If you enjoyed these posts and like writing nimble Python, we should have a conversation about you working here. Contact our man on the inside, John Delaney, at johnd@tellapart.com.
