Series.rank()
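import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
print(type(obj.rank()))
# Default: ascending order, ties receive the average of their ranks
print(obj.rank())
# Descending order; ties are broken by order of appearance, by highest rank, by lowest rank
print(obj.rank(method='first', ascending=False))
print(obj.rank(method='max', ascending=False))
print(obj.rank(method='min', ascending=False))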
result:
<class 'pandas.core.series.Series'>
0 6.5
1 1.0
2 6.5
3 4.5
4 3.0
5 2.0
6 4.5
dtype: float64
0 1.0
1 7.0
2 2.0
3 3.0
4 5.0
5 6.0
6 4.0
dtype: float64
0 2.0
1 7.0
2 2.0
3 4.0
4 5.0
5 6.0
6 4.0
dtype: float64
0 1.0
1 7.0
2 1.0
3 3.0
4 5.0
5 6.0
6 3.0
dtype: float64
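As the results show, rank() here ranks the Series as a whole, and the method parameter controls how tied (equal) values are handled. See the official documentation for the full list of parameters.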
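To make the tie-handling options easier to compare, here is a minimal sketch that ranks the same Series with each method side by side, all in ascending order. The `dense` method is not used in the example above; it is included here as an extra option that `Series.rank` accepts. The names `methods` and `ranks` are only for illustration.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# One column per tie-handling method, all ranked in ascending order.
methods = ["average", "min", "max", "first", "dense"]
ranks = pd.DataFrame({m: obj.rank(method=m) for m in methods})

# Put the original values in front so each row can be read on its own.
ranks.insert(0, "value", obj)
print(ranks)
```

Reading across a row for one of the tied values (for example the two 7s) shows how each method resolves the tie: `average` splits the ranks evenly, `min` and `max` take the lowest or highest rank in the tied group, `first` follows order of appearance, and `dense` leaves no gaps between consecutive ranks.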
Ranking within groups defined by a column
- Example: I want to rank Bid_Price within each Auction_ID
Auction_ID Bid_Price
123 9
123 7
123 6
123 2
124 3
124 2
124 1
125 1
- Desired result
Auction_ID Bid_Price Auction_Rank
123 9 1
123 7 2
123 6 3
123 2 4
124 3 1
124 2 2
124 1 3
125 1 1
- Implementation
In [68]: df['Auction_Rank'] = df.groupby('Auction_ID')['Bid_Price'].rank(ascending=False)
In [69]: df
Out[69]:
Auction_ID Bid_Price Auction_Rank
0 123 9 1
1 123 7 2
2 123 6 3
3 123 2 4
4 124 3 1
5 124 2 2
6 124 1 3
7 125 1 1
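The session above assumes that `df` already exists. As a self-contained sketch, the same table can be rebuilt and ranked as follows. Note that `rank()` normally returns floating-point values, so the cast to `int` and the choice of `method="first"` (which guarantees whole-number ranks even when bids tie) are additions made here to reproduce the integer display, not part of the original snippet.

```python
import pandas as pd

# Rebuild the example table from the text.
df = pd.DataFrame({
    "Auction_ID": [123, 123, 123, 123, 124, 124, 124, 125],
    "Bid_Price": [9, 7, 6, 2, 3, 2, 1, 1],
})

# Rank bids within each auction, highest bid first.
# method="first" breaks ties by row order, so every rank is a whole number
# and the cast to int loses no information.
df["Auction_Rank"] = (
    df.groupby("Auction_ID")["Bid_Price"]
      .rank(method="first", ascending=False)
      .astype(int)
)
print(df)
```

If tied bids should share a rank instead, drop `method="first"` and the `astype(int)` cast and keep the float result.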
Supplement
import pandas as pd

s = pd.DataFrame([['2012', 'A', 4], ['2012', 'B', 8], ['2011', 'A', 21], ['2011', 'B', 31]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 7], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
s = pd.concat([s, b])
print(s)
# Replace the duplicated 0..3 index with a fresh 0..7 index
s = s.reset_index(drop=True)
print(s)
# Rank Return within each Manager, smallest return first
s['Rank'] = s.groupby('Manager')['Return'].rank(ascending=True)
print(s.sort_values(by='Manager'))
result:
Year Manager Return
0 2012 A 4
1 2012 B 8
2 2011 A 21
3 2011 B 31
0 2012 A 3
1 2012 B 7
2 2011 A 20
3 2011 B 30
Year Manager Return
0 2012 A 4
1 2012 B 8
2 2011 A 21
3 2011 B 31
4 2012 A 3
5 2012 B 7
6 2011 A 20
7 2011 B 30
Year Manager Return Rank
0 2012 A 4 2.0
2 2011 A 21 4.0
4 2012 A 3 1.0
6 2011 A 20 3.0
1 2012 B 8 2.0
3 2011 B 31 4.0
5 2012 B 7 1.0
7 2011 B 30 3.0
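Summary
Rank features can be very effective in some application scenarios, because they capture the relative order of records, such as the order in which events occurred.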
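As an illustration of a rank feature that encodes event order, the sketch below numbers each user's events from earliest to latest. The event log, the column names `user_id` and `ts`, and the feature name `event_order` are all hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical event log: ts is an integer timestamp (for example, seconds).
events = pd.DataFrame({
    "user_id": ["u1", "u2", "u1", "u2", "u1"],
    "ts": [30, 10, 10, 20, 20],
})

# Position of each event in its user's timeline (1 = earliest).
# method="first" keeps the ranks unique even if two events share a timestamp.
events["event_order"] = (
    events.groupby("user_id")["ts"]
          .rank(method="first")
          .astype(int)
)
print(events.sort_values(["user_id", "ts"]))
```

The resulting column can be fed to a model as is, or combined with the group size to express a relative position within the group.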