Pythonic Idioms from the Book "Python for Data Analysis"

Latest recommended article published on 2025-10-14 09:19:06

Licensed under CC 4.0 BY-SA

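I recently reread "Python for Data Analysis" and found many Pythonic idioms in it. I had seen them before, and every time I thought "that is really well written", only to forget them a few days later. So on this reading I decided to write them down.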

zip

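zip pairs up the elements of several lists, tuples, or other sequences into tuples. In Python 3 it returns an iterator, so wrap it in list() to see the result. The most basic usage: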

Python

seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)
print(list(zipped))

Result: [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
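
Two further details of zip are worth a quick sketch: it stops at the shortest input, and it combines naturally with enumerate. The `seq3` list below is an illustrative addition and is not part of the example above.

```python
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
seq3 = [False, True]  # hypothetical shorter sequence, added for illustration

# zip stops as soon as the shortest input is exhausted
truncated = list(zip(seq1, seq2, seq3))
print(truncated)  # [('foo', 'one', False), ('bar', 'two', True)]

# zip combined with enumerate: iterate over pairs together with an index
lines = []
for i, (a, b) in enumerate(zip(seq1, seq2)):
    lines.append(f'{i}: {a}, {b}')
print(lines)  # ['0: foo, one', '1: bar, two', '2: baz, three']
```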

With the basics covered, let's look at a more advanced use.

Python

pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Curt', 'Schilling')]
print(*pitchers)
# Output: ('Nolan', 'Ryan') ('Roger', 'Clemens') ('Curt', 'Schilling')
first_names, last_names = zip(*pitchers)
print(first_names)  # ('Nolan', 'Roger', 'Curt')
print(last_names)   # ('Ryan', 'Clemens', 'Schilling')

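I did not know how to use * before. It unpacks the elements of a sequence into separate arguments, so zip(*pitchers) regroups the pairs and splits them into first_names and last_names.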
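
To make the role of * concrete, here is a small sketch showing that `zip(*pairs)` is the same call as passing each pair as its own argument. The `pairs` data is made up for illustration.

```python
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]  # illustrative data

# The * operator unpacks the list into separate positional arguments,
# so these two calls are equivalent.
unpacked = list(zip(*pairs))
explicit = list(zip(pairs[0], pairs[1], pairs[2]))

numbers, letters = unpacked
print(numbers)  # (1, 2, 3)
print(letters)  # ('a', 'b', 'c')
```

In this sense zip(*...) acts as the inverse of zip: it turns a list of rows back into a list of columns.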

dict

Scenario 1: we often need to read a value from a dict without knowing whether the key exists, so we check for the key first:

Python

if key in my_dict:
    value = my_dict[key]
else:
    value = default_value

Improved code: dict already provides a get method

Python

value = my_dict.get(key, default_value)
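
A minimal sketch of how get behaves for present and missing keys. The dictionary contents and the default of 0 are assumptions chosen only for illustration.

```python
my_dict = {'a': 1, 'b': 2}  # illustrative data
default_value = 0           # illustrative default

present = my_dict.get('a', default_value)  # key exists, its value is returned
missing = my_dict.get('z', default_value)  # key absent, the default is returned
no_default = my_dict.get('z')              # without a default, get returns None

print(present, missing, no_default)  # 1 0 None
```

Note that get never inserts the key into the dict; it only reads.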


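Scenario 2: group the words of a document by their first letter. The result is stored in a dict whose keys are first letters and whose values are lists of words: result = {"a":[],"b":[],...}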

Python

words = ['apple', 'bat', 'bar', 'atom', 'book']
result = {}
for word in words:
    letter = word[0]
    # Membership is tested on the dict itself; no need to call keys()
    if letter in result:
        result[letter].append(word)
    else:
        # Use the variable letter as the key, not the string 'letter'
        result[letter] = [word]

Improved code: dict has a setdefault method

Python

words = ['apple', 'bat', 'bar', 'atom', 'book']
result = {}
for word in words:
    letter = word[0]
    result.setdefault(letter, []).append(word)
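
As an alternative to setdefault, the standard library's `collections.defaultdict` can do the same grouping. This is a different technique from the one shown above, sketched here only for comparison; the name `by_letter` is an assumption.

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# defaultdict(list) creates an empty list the first time a missing key is accessed
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))  # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
```

setdefault keeps the result a plain dict, while defaultdict moves the "create if missing" logic into the container itself.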

sort and lambda

We need to sort some strings.

Sort by string length:
words = ['foo', 'card', 'bar', 'aaaa', 'abab']
words.sort(key=lambda x: len(x))

Sort by the number of distinct letters:
words = ['foo', 'card', 'bar', 'aaaa', 'abab']
words.sort(key=lambda x: len(set(x)))
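
The two sorts can be checked with a short sketch. It uses `sorted` instead of `list.sort` so the original list stays untouched; the variable names `by_length` and `by_distinct` are assumptions.

```python
words = ['foo', 'card', 'bar', 'aaaa', 'abab']

# Sort by length; Python's sort is stable, so ties keep their original order
by_length = sorted(words, key=lambda x: len(x))
print(by_length)  # ['foo', 'bar', 'card', 'aaaa', 'abab']

# Sort by the number of distinct letters in each word
by_distinct = sorted(words, key=lambda x: len(set(x)))
print(by_distinct)  # ['aaaa', 'foo', 'abab', 'bar', 'card']
```

For the first case, `key=len` is equivalent to the lambda and slightly shorter.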

np.where()

Replace the values greater than 0 with 5 and the values less than 0 with -4.

Python

import numpy as np

arr = np.random.randn(4, 4)
np.where(arr > 0, 5, -4)

Or replace the values greater than 0 with 5 and leave the other values unchanged:

Python

import numpy as np

arr = np.random.randn(4, 4)
np.where(arr > 0, 5, arr)
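
Because random input makes the result hard to verify by eye, here is a sketch of both calls on a small fixed array. The array values are assumptions chosen for illustration.

```python
import numpy as np

arr = np.array([[1.5, -2.0], [-0.5, 3.0]])  # fixed illustrative values

# Where the condition holds take 5, otherwise take -4
both = np.where(arr > 0, 5, -4)
print(both.tolist())  # [[5, -4], [-4, 5]]

# Where the condition holds take 5, otherwise keep the original element
keep_negative = np.where(arr > 0, 5, arr)
print(keep_negative.tolist())  # [[5.0, -2.0], [-0.5, 5.0]]
```

np.where returns a new array in both cases; `arr` itself is not modified.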

np.cumsum()

I use sum() and mean() all the time but rarely use cumulative sums. I came across them on this reading, so I am noting them here as well.

Python

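import numpy as np

arr = np.random.randn(4, 4)
# Cumulative sum over all elements (the array is flattened first)
arr.cumsum()
np.cumsum(arr)
# Cumulative sum down each column (along axis 0)
arr.cumsum(axis=0)
np.cumsum(arr, axis=0)
# Cumulative sum across each row (along axis 1)
arr.cumsum(axis=1)
np.cumsum(arr, axis=1)
# Cumulative product works the same way
arr.cumprod()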
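
The effect of the axis argument is easier to see on a small integer array. The following sketch uses made-up values; the variable names are assumptions.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # fixed illustrative values

flat = arr.cumsum()          # flattened: running total over all elements
down = arr.cumsum(axis=0)    # running total down each column
across = arr.cumsum(axis=1)  # running total across each row
prod = arr.cumprod(axis=1)   # running product across each row

print(flat.tolist())    # [1, 3, 6, 10, 15, 21]
print(down.tolist())    # [[1, 2, 3], [5, 7, 9]]
print(across.tolist())  # [[1, 3, 6], [4, 9, 15]]
print(prod.tolist())    # [[1, 2, 6], [4, 20, 120]]
```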