yield in Python

This article takes a close look at generators in Python and the role of the yield keyword. It explains the difference between iterators and iterable objects, demonstrates with examples how to create a generator with yield, and discusses the advantages of generators when processing large amounts of data. It finishes by showing how to control generator exhaustion and some advanced usage, such as using a generator to control access to a resource, and introduces Python's built-in itertools module, showing how it can manipulate generators and other iterables to build complex iteration logic.

Reposted from e-satis's answer on Stack Overflow.

Original post: http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do-in-python/231855#231855



To understand what yield does, you must understand what generators are. And before generators come iterables.


Iterables


When you create a list, you can read its items one by one. Reading its items one by one is called iteration:


>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:


>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

Everything you can use "for... in..." on is an iterable: lists, strings, files...

These iterables are handy because you can read them as much as you wish, but you store all the values in memory and this is not always what you want when you have a lot of values.
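As a rough illustration of that memory cost (this example is mine, not part of the original answer), you can compare the size of a short list with that of a long one; a list stores a reference to every value up front:

```python
import sys

# A list materializes all of its values, so its size grows with its length.
small = [x * x for x in range(10)]
big = [x * x for x in range(10_000)]

print(sys.getsizeof(small))  # size in bytes of the small list object
print(sys.getsizeof(big))    # far larger: 10,000 references are stored at once
```

A generator over the same range would cost a small, constant amount of memory regardless of how many values it eventually produces.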

Generators

Generators are iterators, but you can only iterate over them once. It's because they do not store all the values in memory, they generate the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

It is just the same except you used () instead of []. BUT, you cannot perform for i in mygenerator a second time since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end calculating 4, one by one.
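You can check the one-shot behavior directly (a small experiment of my own): consuming the generator a second time simply produces nothing.

```python
# A generator can be consumed only once; a second pass yields nothing.
mygenerator = (x * x for x in range(3))

first_pass = list(mygenerator)   # consumes all the values
second_pass = list(mygenerator)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4]
print(second_pass)  # []
```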

Yield

Yield is a keyword that is used like return, except the function will return a generator.

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

Here it's a useless example, but it's handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, this is a bit tricky :-)

Then, your code will be run each time the for uses the generator.

Now the hard part:

The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it'll return the first value of the loop. Then, each other call will run the loop you have written in the function one more time, and return the next value, until there is no value to return.

The generator is considered empty once the function runs but does not hit yield anymore. It can be because the loop has come to an end, or because you no longer satisfy an "if/else".
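A small experiment (the `print` marker is my addition) makes all of this visible: nothing runs at call time, each `next()` advances the body to the following yield, and the exhausted generator signals the end with `StopIteration`:

```python
def create_generator():
    print("body starts running")
    for i in range(3):
        yield i * i

gen = create_generator()   # nothing is printed yet: the body has not run
value = next(gen)          # now "body starts running" is printed, then 0 is returned
print(value)               # 0
print(next(gen))           # 1
print(next(gen))           # 4
try:
    next(gen)              # the loop is over: the generator raises StopIteration
except StopIteration:
    print("exhausted")
```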

Your code explained

Generator:

# Here you create the method of the node object that will return the generator
# (in the original code this is defined inside the node class)
def _get_child_candidates(self, distance, min_dist, max_dist):

  # Here is the code that will be called each time you use the generator object:

  # If there is still a child of the node object on its left
  # AND if distance is ok, return the next child
  if self._leftchild and distance - max_dist < self._median:
      yield self._leftchild

  # If there is still a child of the node object on its right
  # AND if distance is ok, return the next child
  if self._rightchild and distance + max_dist >= self._median:
      yield self._rightchild

  # If the function arrives here, the generator will be considered empty:
  # there are no more than two values, the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidates list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

This code contains several smart parts:

  • The loop iterates on a list but the list expands while the loop is being iterated :-) It's a concise way to go through all these nested data even if it's a bit dangerous since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects which will produce different values from the previous ones since it's not applied on the same node.

  • The extend() method is a list object method that expects an iterable and adds its values to the list.

Usually we pass a list to it:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

But in your code it gets a generator, which is good because:

  1. You don't need to read the values twice.
  2. You can have a lot of children and you don't want them all stored in memory.

And it works because Python does not care if the argument of a method is a list or not. Python expects iterables so it will work with strings, lists, tuples and generators! This is called duck typing and is one of the reason why Python is so cool. But this is another story, for another question...
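The snippets above come from the asker's codebase and are not runnable on their own; as a rough illustration, here is a minimal self-contained sketch that wires the generator and the caller together (the `Node` class, the distance function, and the sample tree are stand-ins of my own, not the original code):

```python
class Node:
    def __init__(self, median, values, left=None, right=None):
        self._median = median
        self._values = values
        self._leftchild = left
        self._rightchild = right

    def _get_dist(self, obj):
        # Stand-in distance: absolute difference to the node's median.
        return abs(obj - self._median)

    def _get_child_candidates(self, distance, min_dist, max_dist):
        # Same shape as the original generator: at most two yields.
        if self._leftchild and distance - max_dist < self._median:
            yield self._leftchild
        if self._rightchild and distance + max_dist >= self._median:
            yield self._rightchild


def find_values(root, obj, min_dist, max_dist):
    # The caller from the answer, wrapped in a function.
    result, candidates = list(), [root]
    while candidates:
        node = candidates.pop()
        distance = node._get_dist(obj)
        if min_dist <= distance <= max_dist:
            result.extend(node._values)
        candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
    return result


tree = Node(10, ["root"],
            left=Node(5, ["left"]),
            right=Node(15, ["right"]))
print(find_values(tree, 9, 0, 100))  # ['root', 'right', 'left']
```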

You can stop here, or read a little bit more to see an advanced use of generators:

Controlling a generator exhaustion

>>> class Bank(): # let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # when everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(next(corner_street_atm))
$100
>>> print(next(corner_street_atm))
$100
>>> print([next(corner_street_atm) for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # crisis is coming, no more money!
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> wall_street_atm = hsbc.create_atm() # it's even true for new ATMs
>>> print(next(wall_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> hsbc.crisis = False # trouble is, even post-crisis the ATM remains empty
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> brand_new_atm = hsbc.create_atm() # build a new one to get back in business
>>> for cash in brand_new_atm:
...    print(cash)
$100
$100
$100
$100
$100
...

It can be useful for various things like controlling access to a resource.
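A closely related pattern in modern Python (my addition, not part of the original answer) is `contextlib.contextmanager`, where a single yield splits a function into acquire and release halves around a `with` block:

```python
from contextlib import contextmanager

@contextmanager
def managed_resource(name):
    print(f"acquiring {name}")      # runs when the with-block is entered
    try:
        yield name                  # the value bound by "as"
    finally:
        print(f"releasing {name}")  # runs when the with-block exits, even on error

with managed_resource("database") as res:
    print(f"using {res}")
```

Running this prints `acquiring database`, `using database`, then `releasing database`: the generator is resumed exactly once to perform the cleanup.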

Itertools, your best friend

The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one liner? Map / Zip without creating another list?

Then just import itertools.

An example? Let's see the possible orders of arrival for a 4 horse race:

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]
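The features teased above map to concrete itertools functions; a quick tour with arbitrary values of my own:

```python
import itertools

# Duplicate a generator: tee() gives independent copies of one iterator.
gen = (x * x for x in range(3))
copy1, copy2 = itertools.tee(gen)
print(list(copy1))  # [0, 1, 4]
print(list(copy2))  # [0, 1, 4]

# Chain two generators into a single stream of values.
chained = itertools.chain((x for x in range(2)), (x for x in range(2)))
print(list(chained))  # [0, 1, 0, 1]

# Group consecutive equal values with a one-liner.
grouped = [(key, list(group)) for key, group in itertools.groupby("aabbb")]
print(grouped)  # [('a', ['a', 'a']), ('b', ['b', 'b', 'b'])]
```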

Understanding the inner mechanisms of iteration

Iteration is a process implying iterables (implementing the __iter__() method) and iterators (implementing the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.
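You can see both halves of the protocol by driving a for loop by hand (my example): `iter()` calls the iterable's `__iter__()` to get an iterator, and `next()` calls that iterator's `__next__()`:

```python
mylist = [1, 2, 3]

iterator = iter(mylist)  # asks mylist.__iter__() for an iterator
print(next(iterator))    # 1: calls iterator.__next__()
print(next(iterator))    # 2
print(next(iterator))    # 3
try:
    next(iterator)       # the iterator is exhausted
except StopIteration:
    print("done")
```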

More about it in this article about how the for loop works.
