idxmax in Python: Pandas groupby, for loop & idxmax

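This post shows how to find the highest ROI (return on investment) in a pandas DataFrame that is grouped on multiple levels. Combining groupby with idxmax gives the highest ROI, and the date it occurred, for each combination of company, product, and industry. It includes sample data and two ways to select the matching rows, indexing with loc and reindexing with reindex, along with a timing comparison.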


I have a DataFrame that must be grouped on three levels, and I would like the highest value in each group returned. Each day there is a return (ROI) for each unique combination, and I would like to find the highest return along with its details, such as the date.

data.groupby(['Company','Product','Industry'])['ROI'].idxmax()

The result should show that these were the highest:

Target - Dish Soap - House had a 5% ROI on 9/17

BestBuy - CDs - Electronics had a 3% ROI on 9/3

Here's some example data:

+---------+-----------+-------------+---------+-----+
| Company | Product   | Industry    | Date    | ROI |
+---------+-----------+-------------+---------+-----+
| Target  | Dish Soap | House       | 9/17/13 | 5%  |
| Target  | Dish Soap | House       | 9/16/13 | 2%  |
| BestBuy | CDs       | Electronics | 9/1/13  | 1%  |
| BestBuy | CDs       | Electronics | 9/3/13  | 3%  |
| ...

I'm not sure whether this calls for a for loop or for .ix.

Solution

If I understand you correctly, you could collect the index labels of each group's maximum in a Series using groupby and idxmax(), and then select those rows from data using loc:

idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()

data.loc[idx]

Another option is to use reindex:

data.reindex(idx)
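
To try both selections on the sample rows from the question, here is a minimal sketch. It assumes the first column is named Company (as the groupby call implies) and that ROI is stored as text such as "5%", which has to be converted to a number before idxmax can compare values.

```python
import pandas as pd

# Sample rows from the question; column names are assumed from the groupby call.
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})

# Assumption: ROI arrives as text, so strip the percent sign and convert to float.
data["ROI"] = data["ROI"].str.rstrip("%").astype(float)

# Index label of the highest-ROI row within each (Company, Product, Industry) group.
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

# Either call selects the full rows for those labels.
best = data.loc[idx]
best_via_reindex = data.reindex(idx)
print(best)
```

With these four rows, the selection keeps the 9/3/13 row for BestBuy and the 9/17/13 row for Target. Note that idxmax returns index labels, so this relies on data having a unique index.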

On a (different) dataframe I happened to have handy, it appears reindex might be the faster option:

In [39]: %timeit df.reindex(idx)

10000 loops, best of 3: 121 us per loop

In [40]: %timeit df.loc[idx]

10000 loops, best of 3: 147 us per loop
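
The difference above is small and was measured on an unrelated frame, so it may be worth timing both calls on your own data. The sketch below uses the standard timeit module on a synthetic frame; the column values, frame size, and repeat count are all hypothetical, and the absolute numbers will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame (hypothetical data, not the frame timed in the answer).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Company": rng.choice(["Target", "BestBuy", "Walmart"], size=1000),
    "Product": rng.choice(["Dish Soap", "CDs", "TVs"], size=1000),
    "ROI": rng.random(1000),
})
idx = df.groupby(["Company", "Product"])["ROI"].idxmax()

# Sanity check: both selections should pick the same row labels.
same_rows = df.loc[idx].index.equals(df.reindex(idx).index)

# Time each selection over the same number of repetitions.
t_loc = timeit.timeit(lambda: df.loc[idx], number=200)
t_reindex = timeit.timeit(lambda: df.reindex(idx), number=200)
print(f"loc: {t_loc:.4f}s  reindex: {t_reindex:.4f}s  same rows: {same_rows}")
```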
