python - function list generator

michael_wq

License: CC 4.0 BY-SA

Categories: python, data analysis. Tags: python

Article link: https://blog.youkuaiyun.com/michael_wq/article/details/109303111

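This article covers several practical Python techniques: *args and **kwargs parameters, list comprehensions, iterators and generators, and how to process large amounts of data efficiently.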

*args

Lets a function accept any number of positional arguments; inside the function they arrive as a tuple.
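A minimal sketch of *args in use. The function name total and the sample numbers are made up for illustration and are not from the original post.

```python
def total(*args):
    """Sum any number of positional arguments (hypothetical helper)."""
    # Inside the function, args is a plain tuple
    return sum(args)

# Pass the values one by one
result_three = total(1, 2, 3)

# A list can be spread into positional arguments with *
numbers = [10, 20, 30, 40]
result_list = total(*numbers)

print(result_three)  # 6
print(result_list)   # 100
```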

**kwargs

Lets a function accept any number of keyword arguments; inside the function they arrive as a dict.

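def func(**kwargs):
    # kwargs is a dict of the keyword arguments passed in
    for key, value in kwargs.items():
        # An f-string works even when value is not a string
        print(f'{key}:{value}')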
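A sketch of how a function that takes **kwargs can be called, either with keyword arguments or by unpacking an existing dict. The helper describe and the sample values are hypothetical; it returns the lines instead of printing them so the result is easy to inspect.

```python
def describe(**kwargs):
    """Return 'key:value' lines for every keyword argument (hypothetical helper)."""
    # str(value) lets non-string values such as integers through
    return [key + ':' + str(value) for key, value in kwargs.items()]

# Pass keyword arguments directly
lines_direct = describe(name='michael', age=30)

# Or unpack an existing dict with **
profile = {'lang': 'python', 'level': 'beginner'}
lines_unpacked = describe(**profile)

print(lines_direct)    # ['name:michael', 'age:30']
print(lines_unpacked)  # ['lang:python', 'level:beginner']
```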

lambda

map(func, seq) applies func to every item in seq.

⚠️: In Python 3, map returns a lazy iterator, so wrap it in list(...) to read the values.

filter(func, seq)

func acts as a test: filter keeps the original items for which func returns a truthy value.

list(map(lambda x : x % 2, range(10)))
# => [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
list(filter(lambda x : x % 2, range(10)))
# => [1, 3, 5, 7, 9]
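A short sketch of two related points: a map object can only be read once, and the same results can be written as list comprehensions. The variable names are illustrative.

```python
squares = map(lambda x: x * x, range(5))

first_pass = list(squares)   # consumes the iterator
second_pass = list(squares)  # nothing is left to read

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []

# List comprehensions that give the same results as the map/filter calls
remainders = [x % 2 for x in range(10)]
odds = [x for x in range(10) if x % 2]

print(remainders)  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(odds)        # [1, 3, 5, 7, 9]
```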

iter() + next()

e.g. each call to next() reads one item from the iterator

# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print(next(superhero)) # jay garrick
print(next(superhero)) # barry allen
print(next(superhero)) # wally west
print(next(superhero)) # bart allen
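Once every item has been read, a further next() call raises StopIteration. The sketch below uses a shortened list and shows both the exception and the optional default argument of next(); the fallback string is made up.

```python
flash = ['jay garrick', 'barry allen']
superhero = iter(flash)

first = next(superhero)
second = next(superhero)

# The iterator is now exhausted, so next() raises StopIteration
try:
    next(superhero)
    exhausted = False
except StopIteration:
    exhausted = True

# A second argument to next() is returned instead of raising
fallback = next(superhero, 'no more heroes')

print(first, second)  # jay garrick barry allen
print(exhausted)      # True
print(fallback)       # no more heroes
```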

Q: How do you build a list of tuples like these?
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'dd')]

[(100, 'a'), (101, 'b'), (102, 'c'), (103, 'dd')]

a = ['a', 'b', 'c', 'dd']
enu_a = enumerate(a)
print(type(enu_a))
#<class 'enumerate'>

list_enu_a = list(enu_a)
print(list_enu_a)
#[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'dd')]

list(enumerate(a, start=100))
# [(100, 'a'), (101, 'b'), (102, 'c'), (103, 'dd')]

Q: How do you loop over the values that enumerate produces?

# Unpack and print the tuple pairs
# (enumerate the list a itself: enu_a was already consumed by list(enu_a))
for index1, value1 in enumerate(a):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(a, start=1):
    print(index2, value2)
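One detail worth knowing here: an enumerate object is a single-pass iterator. A small sketch, reusing the list a from above, of what happens when it is read twice:

```python
a = ['a', 'b', 'c', 'dd']
enu_a = enumerate(a)

first_read = list(enu_a)   # consumes the iterator
second_read = list(enu_a)  # already exhausted

print(first_read)   # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'dd')]
print(second_read)  # []

# Calling enumerate on the list again gives a fresh iterator
pairs_from_one = [(index, value) for index, value in enumerate(a, start=1)]
print(pairs_from_one)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'dd')]
```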

Q: What is zip() for?

a = ['a', 'b', 'c', 'dd']
b = ['q', 'w', 'e', 'rr']
c = ['a', 's', 'd', 'ff']
list(zip(a, b, c))
# [('a', 'q', 'a'), ('b', 'w', 's'), ('c', 'e', 'd'), ('dd', 'rr', 'ff')]
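A few more things zip() can do, sketched with the same lists: it stops at the shortest input, zip(*pairs) reverses the pairing, and dict(zip(...)) builds a lookup table. The variable names are illustrative.

```python
a = ['a', 'b', 'c', 'dd']
b = ['q', 'w', 'e', 'rr']

pairs = list(zip(a, b))

# zip(*pairs) splits the pairs back into two tuples
letters, keys = zip(*pairs)

# zip stops as soon as the shortest input runs out
short = list(zip(a, [1, 2]))

# Pairing keys with values is a quick way to build a dict
lookup = dict(zip(a, b))

print(letters)       # ('a', 'b', 'c', 'dd')
print(short)         # [('a', 1), ('b', 2)]
print(lookup['dd'])  # rr
```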

Q: Scenario: Processing large amounts of Twitter data
Sometimes the data we have to process is too large for a computer's memory to hold. This is a common problem for data scientists. One solution is to process the data source chunk by chunk instead of all at once.

Process the file in batches with chunksize

import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)
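The same chunk-by-chunk idea can be sketched with only the standard library, which makes it easy to try without pandas or a real tweets.csv. This is a swapped-in technique, not the pandas code above: the in-memory sample rows are made up, read_in_chunks is a hypothetical helper, and collections.Counter replaces the manual if/else counting.

```python
import csv
import io
from collections import Counter
from itertools import islice

# Stand-in for tweets.csv: made-up rows, for illustration only
sample_csv = io.StringIO(
    "id,lang\n"
    "1,en\n"
    "2,en\n"
    "3,es\n"
    "4,fr\n"
    "5,en\n"
)

def read_in_chunks(rows, chunksize):
    """Yield lists of at most chunksize rows (hypothetical helper)."""
    while True:
        chunk = list(islice(rows, chunksize))
        if not chunk:
            return
        yield chunk

counts = Counter()
reader = csv.DictReader(sample_csv)

# Only one chunk of rows is held in memory at a time
for chunk in read_in_chunks(reader, chunksize=2):
    counts.update(row['lang'] for row in chunk)

print(dict(counts))  # {'en': 3, 'es': 1, 'fr': 1}
```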

Q: How do you write this matrix (five rows, each [0, 1, 2, 3, 4]) in one line with […]?

matrix = [[col for col in range(5)] for row in range(5)]

# Shorter, but all five rows are the same list object
matrix = [[col for col in range(5)]] * 5
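The two forms print the same matrix but behave differently once a cell is changed. A small check, with illustrative variable names:

```python
independent = [[col for col in range(5)] for row in range(5)]
shared = [[col for col in range(5)]] * 5

# Change the top-left cell in both matrices
independent[0][0] = 99
shared[0][0] = 99

print(independent[1][0])       # 0: the other rows are untouched
print(shared[1][0])            # 99: every row changed
print(shared[0] is shared[1])  # True: the rows are one and the same list
```

The nested comprehension is the safer choice whenever the matrix will be modified.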

generator function

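Use case: a very large list takes up a lot of memory. A generator is another way to handle the data: it computes values one at a time, on demand, instead of all at once (which may not fit in memory). yield is similar to return, except that the function pauses at that point and resumes from it on the next request; see the linked article for details. Use next() with a generator to read its values.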
e.g.

# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

This is equivalent to:

# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']
# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""
    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)
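A sketch of the two points made in this section: next() steps through a generator one value at a time, and a generator object stays small no matter how many values it will produce. The helper count_up_to is hypothetical, and the exact byte counts from sys.getsizeof depend on the Python build, so only the comparison is shown.

```python
import sys

def count_up_to(limit):
    """Yield 0, 1, ..., limit - 1 one value at a time (hypothetical helper)."""
    number = 0
    while number < limit:
        yield number
        number += 1

numbers = count_up_to(3)
first = next(numbers)   # runs the function body up to the first yield
second = next(numbers)  # resumes right after the yield
rest = list(numbers)    # reads whatever is left

print(first, second, rest)  # 0 1 [2]

# The list stores all 100,000 references; the generator stores only its state
big_list = [n for n in range(100_000)]
big_gen = (n for n in range(100_000))
list_bytes = sys.getsizeof(big_list)
gen_bytes = sys.getsizeof(big_gen)

print(list_bytes > gen_bytes)  # True
```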