Building a pivot table with pandas.pivot raises ValueError: Index contains duplicate entries, cannot reshape

This article explains how to handle the `ValueError: Index contains duplicate entries, cannot reshape` error raised when building a pivot table with pandas.pivot. It covers the cause of the error and how to resolve it, illustrated with a concrete example.


1. pivot syntax
DataFrame.pivot(index=None, columns=None, values=None)

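When the (index, columns) combinations contain duplicate entries, pivot raises `ValueError: Index contains duplicate entries, cannot reshape`, because each combination must identify exactly one cell of the reshaped table.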
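Before calling pivot, it can help to check whether the intended index/columns pairs are actually unique. The following is a minimal sketch using `DataFrame.duplicated`; the `sales` table and its column names are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical data: the pair ("2024-01-01", "apple") appears twice.
sales = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "product": ["apple", "apple", "pear"],
    "qty": [5, 7, 3],
})

# keep=False flags every row whose (date, product) pair occurs more than once.
dup_mask = sales.duplicated(subset=["date", "product"], keep=False)
duplicate_rows = sales[dup_mask]
has_duplicates = bool(dup_mask.any())

# If has_duplicates is True, pivot(index="date", columns="product") would fail.
print(duplicate_rows)
```

Inspecting `duplicate_rows` shows which records collide, which usually indicates whether they should be aggregated or whether some of them are redundant.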

2. Example:
import pandas as pd

df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
                   "bar": ['A', 'A', 'B', 'C'],
                   "baz": [1, 2, 3, 4]})
df
#    foo bar  baz
# 0  one   A    1
# 1  one   A    2
# 2  two   B    3
# 3  two   C    4

df.pivot(index='foo', columns='bar', values='baz')
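# Cause: the combination of the "foo" and "bar" columns contains a duplicate
# (the pair ('one', 'A') appears in rows 0 and 1), so it cannot act as a unique key.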
# Traceback (most recent call last):
#    ...
# ValueError: Index contains duplicate entries, cannot reshape                     
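Two common ways to get past this error are sketched below, reusing the example DataFrame. Which one fits depends on what the duplicate rows mean in your data; the choice of `sum` as the aggregation and `keep="last"` for deduplication are assumptions made for illustration, not recommendations from the original example.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["one", "one", "two", "two"],
                   "bar": ["A", "A", "B", "C"],
                   "baz": [1, 2, 3, 4]})

# Option 1: aggregate the duplicates. Unlike pivot, pivot_table accepts
# repeated (index, columns) pairs and combines their values with aggfunc.
summed = df.pivot_table(index="foo", columns="bar", values="baz", aggfunc="sum")

# Option 2: keep a single row per (foo, bar) pair, then pivot as usual.
deduped = df.drop_duplicates(subset=["foo", "bar"], keep="last")
last_kept = deduped.pivot(index="foo", columns="bar", values="baz")

print(summed)
print(last_kept)
```

With Option 1 the cell for ('one', 'A') holds the sum of both rows; with Option 2 it holds only the value from the last duplicate row. Combinations that never occur in the data, such as ('one', 'B'), are filled with NaN in both results.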