Pythonic Idioms from the Book "Python for Data Analysis"

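I recently reread "Python for Data Analysis" and came across a lot of Pythonic idioms. I had seen them before, and every time I did I would think "this is really well written", only to forget them a few days later. So this time I decided to write them down.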

zip

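zip pairs up the elements of several lists, tuples, or other sequences into tuples. In Python 3 it returns an iterator, so wrap it in list() to see the result. The most basic usage: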

Python
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)
print(list(zipped))

Result: [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
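zip has two behaviours that are easy to trip over. The short sketch below reuses the same seq1/seq2 data and shows both. First, the iterator zip returns in Python 3 can only be consumed once. Second, zip stops at the shortest input. The list [True, False] is just illustrative data.

```python
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']

# zip returns a lazy iterator in Python 3; it can be consumed only once
zipped = zip(seq1, seq2)
first_pass = list(zipped)
second_pass = list(zipped)  # already exhausted, so this is an empty list

# with inputs of unequal length, zip stops at the shortest one
short = list(zip(seq1, [True, False]))

print(first_pass)   # [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
print(second_pass)  # []
print(short)        # [('foo', True), ('bar', False)]
```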

Now that the basic usage is clear, let's look at a more advanced one: "unzipping" a list of pairs.

Python
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Schilling', 'Curt')]
print(*pitchers)
# Output: ('Nolan', 'Ryan') ('Roger', 'Clemens') ('Schilling', 'Curt')

first_names, last_names = zip(*pitchers)
# first_names: ('Nolan', 'Roger', 'Schilling')
# last_names:  ('Ryan', 'Clemens', 'Curt')

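I didn't know how to use * here before. In a function call, * unpacks a sequence into separate positional arguments, so zip(*pitchers) is the same as zip(('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Schilling', 'Curt')). zip then groups all the first elements together and all the second elements together, which splits the pairs into first_names and last_names.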
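Here is a small check of that equivalence using the same pitchers list. It spells out the unpacked call explicitly. It also shows that zipping the two resulting tuples together again restores the original pairs, so zip(*...) works as the inverse of zip.

```python
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Schilling', 'Curt')]

# zip(*pitchers) expands to zip(pitchers[0], pitchers[1], pitchers[2])
unzipped = list(zip(*pitchers))
explicit = list(zip(pitchers[0], pitchers[1], pitchers[2]))
assert unzipped == explicit

first_names, last_names = zip(*pitchers)
print(first_names)  # ('Nolan', 'Roger', 'Schilling')
print(last_names)   # ('Ryan', 'Clemens', 'Curt')

# zipping the two tuples back together restores the original pairs
restored = list(zip(first_names, last_names))
```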

dict

Scenario 1: We often need to read a value from a dict without knowing in advance whether the key exists, so we check for the key first:

Python
if key in my_dict:
    value = my_dict[key]
else:
    value = default_value

Optimized: dict already has a get method that does exactly this:

Python
value = my_dict.get(key, default_value)
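Two details of get are worth seeing concretely. Without a default it returns None rather than raising KeyError. It also only reads, so it never inserts the missing key. The my_dict contents below are hypothetical example data.

```python
my_dict = {'a': 1}  # hypothetical example data

# without a default, get returns None for a missing key instead of raising KeyError
missing = my_dict.get('z')

# with a default, that value is returned instead
with_default = my_dict.get('z', 0)

print(missing, with_default)  # None 0
print(my_dict)                # {'a': 1}: get did not add 'z'
```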
 

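Scenario 2: Group the words in a document by their first letter. The result is stored in a dict whose keys are first letters and whose values are lists of words: result = {"a": [...], "b": [...], ...}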

Python
words = ['apple', 'bat', 'bar', 'atom', 'book']
result = {}
for word in words:
    letter = word[0]
    if letter in result:
        result[letter].append(word)
    else:
        result[letter] = [word]
 

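Optimized: dict has a setdefault method. It returns the value for a key, and if the key is missing it first inserts the given default: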

Python
words = ['apple', 'bat', 'bar', 'atom', 'book']
result = {}
for word in words:
    letter = word[0]
    result.setdefault(letter, []).append(word)
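The standard library offers another way to express the same grouping: collections.defaultdict. It is a swapped-in alternative, not the setdefault approach itself. It creates an empty list automatically the first time a missing key is accessed. The sketch below builds the dict both ways and checks that they agree.

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# setdefault version: insert [] for a new key, then append
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

# defaultdict version: a missing key gets a fresh list automatically
grouped = defaultdict(list)
for word in words:
    grouped[word[0]].append(word)

print(by_letter)  # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
```

One design difference: a defaultdict keeps creating entries whenever you read a missing key. If you later only want to look keys up, convert it with dict(grouped).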
 

sort and lambda

We need to sort some strings:

  • By string length:
    words = ['foo', 'card', 'bar', 'aaaa', 'abab']
    words.sort(key=lambda x: len(x))
  • By the number of distinct letters:
    words = ['foo', 'card', 'bar', 'aaaa', 'abab']
    words.sort(key=lambda x: len(set(x)))
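To see what the two sorts actually produce, the sketch below uses sorted(), which returns a new list, so both orderings come from the same input. list.sort() sorts in place and returns None. Python's sort is stable, so words with equal keys keep their original relative order. key=len is a shorter spelling of lambda x: len(x).

```python
words = ['foo', 'card', 'bar', 'aaaa', 'abab']

# sorted() returns a new list and leaves words unchanged
by_length = sorted(words, key=len)  # same as key=lambda x: len(x)
by_distinct = sorted(words, key=lambda x: len(set(x)))

print(by_length)    # ['foo', 'bar', 'card', 'aaaa', 'abab']
print(by_distinct)  # ['aaaa', 'foo', 'abab', 'bar', 'card']
print(words)        # ['foo', 'card', 'bar', 'aaaa', 'abab']
```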

np.where()

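Replace every value greater than 0 with 5 and every other value with -4. The condition is arr > 0, so an exact 0 also becomes -4: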

Python
import numpy as np

arr = np.random.randn(4, 4)
np.where(arr > 0, 5, -4)
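Random data makes the result hard to check, so this sketch uses a small fixed array (made-up values) that includes an exact 0.0. It shows np.where(condition, x, y) choosing element by element between the two scalars.

```python
import numpy as np

# fixed data instead of random numbers, so the result is reproducible
arr = np.array([[1.5, -0.3],
                [0.0, -2.0]])

result = np.where(arr > 0, 5, -4)
print(result)
# 1.5 becomes 5; -0.3, 0.0 and -2.0 all become -4 (0.0 is not > 0)
```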

Or replace values greater than 0 with 5 and leave the rest unchanged:

Python
import numpy as np

arr = np.random.randn(4, 4)
np.where(arr > 0, 5, arr)
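np.where returns a new array and leaves arr untouched. The sketch below uses made-up fixed values to show this. It also compares np.where with boolean-mask assignment, a common alternative that modifies the array it is applied to, which is why it is applied to a copy here.

```python
import numpy as np

arr = np.array([[1.5, -0.3],
                [0.0, -2.0]])

# np.where builds a new array; arr itself is not modified
replaced = np.where(arr > 0, 5, arr)

# alternative: boolean-mask assignment, done on a copy to keep arr intact
masked = arr.copy()
masked[masked > 0] = 5

print(replaced)  # 1.5 becomes 5.0; -0.3, 0.0 and -2.0 are unchanged
print(np.array_equal(replaced, masked))  # True
```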

np.cumsum()

I use sum() and mean() all the time but rarely use cumulative sums, so I'm noting them down too.

Python
import numpy as np

arr = np.random.randn(4, 4)
# cumulative sum over all elements (the array is flattened first)
arr.cumsum()
np.cumsum(arr)
# cumulative sum down each column
arr.cumsum(axis=0)
np.cumsum(arr, axis=0)
# cumulative sum along each row
arr.cumsum(axis=1)
np.cumsum(arr, axis=1)
# cumulative product works the same way
arr.cumprod()
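A small integer array, chosen purely for illustration, makes the three cases easy to verify by hand. With no axis, the array is flattened in row order before accumulating. axis=0 runs down the columns and axis=1 runs along the rows.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

flat = arr.cumsum()              # [1, 3, 6, 10, 15, 21]
down_cols = arr.cumsum(axis=0)   # [[1, 2, 3], [5, 7, 9]]
along_rows = arr.cumsum(axis=1)  # [[1, 3, 6], [4, 9, 15]]
prod = arr.cumprod()             # [1, 2, 6, 24, 120, 720]

print(flat, down_cols, along_rows, prod, sep="\n")
```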
 

np.any() and np.all()

We need to check whether an array contains any positive numbers, or only positive numbers:

Python
import numpy as np

arr = np.random.randn(4, 4)
(arr > 0).any()  # True if at least one element is positive
(arr > 0).all()  # True only if every element is positive
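With fixed, made-up values the difference between any and all is easy to see. The function forms np.any and np.all also accept an axis argument, which reduces row by row (axis=1) or column by column (axis=0) instead of over the whole array.

```python
import numpy as np

arr = np.array([[0.5, -1.0],
                [2.0, 3.0]])

positive = arr > 0                    # boolean array with the same shape
print(positive.any())                 # True: at least one element is positive
print(positive.all())                 # False: -1.0 is not positive

# per-row check: is every element in each row positive?
row_all = np.all(positive, axis=1)
print(row_all)                        # [False  True]
```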
 

A quick way to get the lower quartile (the 25% quantile)

Python
import numpy as np

large_arr = np.random.randn(1000)
large_arr.sort()  # sorts in place
large_arr[int(0.25 * len(large_arr))]  # element at the 25% position
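The sort-and-index trick returns an actual element of the data, so it is an approximation of the quartile. np.quantile (and np.percentile) interpolate linearly between neighbouring elements by default, so the two can differ slightly. The tiny array below is made-up data, chosen so both values can be checked by hand. The sketch also uses np.sort, which returns a sorted copy, instead of the in-place arr.sort().

```python
import numpy as np

arr = np.arange(1, 9)  # [1 2 3 4 5 6 7 8], small data that is easy to check

# the trick: sort, then index at 25% of the length (index 2 here)
sorted_arr = np.sort(arr)  # returns a sorted copy; arr is unchanged
trick = sorted_arr[int(0.25 * len(sorted_arr))]

# np.quantile interpolates: position (8 - 1) * 0.25 = 1.75 lies between 2 and 3
q = np.quantile(arr, 0.25)

print(trick, q)  # 3 2.75
```

For 1000 values the gap between the two is usually small, but np.quantile avoids both the manual sort and the off-by-one question.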
 

I think these idioms can make my code much more concise; being Pythonic is the end goal.

If any of these were new to you, give this post a like: you're one step closer to writing Pythonic code.

Thanks for your support! If this post gets 200 likes, I'll share a link to the book so we can all discuss it together.



