DataFrame取值df['col_name']与df[['col_name']]区别

被这个问题困扰了好久,直到一次报错才真正注意到两者的区别。

df['col_name']
# 得到的是这一列的数据,不带col_name,是series结构
type(df['col_name'])
>> <class 'pandas.core.series.Series'>

双中括号 df[['col_name']] 返回的是 DataFrame 结构,这种取法是 DataFrame 专有

df[['col_name']]
# 得到的是这一列的数据,带col_name,是DataFrame结构
type(df[['col_name']])
>> <class 'pandas.core.frame.DataFrame'>
import pandas as pd
from datetime import datetime
import re


def parse_dynamic_column(col_name):
    """Split an output column name of the form '<dimension>-<metric>-<airline>'.

    The dimension itself may contain '-', so the split is anchored on the
    LAST two separators.  Returns (dimension, metric, airline), or None when
    the name has fewer than three '-'-separated parts.
    """
    parts = col_name.split('-')
    if len(parts) < 3:
        return None
    return '-'.join(parts[:-2]), parts[-2], parts[-1]


def format_cell_value(value):
    """Render a looked-up cell for CSV output.

    NaN becomes '', integral floats (e.g. 130.0) become '130', everything
    else is stringified as-is.
    """
    if pd.isna(value):
        return ''
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def main():
    # 1. Load input and output tables.  engine='python' is required for
    # sep=None (automatic delimiter sniffing) — the C engine cannot do it
    # and would fall back with a ParserWarning.
    data_df = pd.read_csv('data.csv', encoding='utf_16_le', sep=None, engine='python')
    print(data_df.head())
    out_df = pd.read_csv('outdata.csv', encoding='utf_16_le', sep=None, engine='python')

    # 2. Today's date keys the appended row.
    today = datetime.now().strftime('%Y-%m-%d')

    # 3. Every column after the first ('日期') encodes a lookup request.
    dynamic_cols = out_df.columns.tolist()[1:]
    new_row = {'日期': today}

    # 4. Coerce all metric columns to numeric (unparsable values -> NaN);
    # the two dimension columns are left untouched.
    non_metric_cols = {'维度选择', '航司'}
    for col in data_df.columns:
        if col not in non_metric_cols:
            data_df[col] = pd.to_numeric(data_df[col], errors='coerce')

    # 5. Resolve each dynamic column against data_df; first matching row wins.
    for col_name in dynamic_cols:
        parsed = parse_dynamic_column(col_name)
        if parsed is None:
            new_row[col_name] = ''
            continue
        dimension, metric, airline = parsed
        mask = (data_df['维度选择'] == dimension) & (data_df['航司'] == airline)
        if mask.any():
            new_row[col_name] = format_cell_value(data_df.loc[mask, metric].iloc[0])
        else:
            new_row[col_name] = ''

    # 6. Append today's row and 7. write back (NaN -> empty string).
    out_df = pd.concat([out_df, pd.DataFrame([new_row])], ignore_index=True)
    out_df.to_csv('outdata.csv', index=False, na_rep='')
    print("✅ 处理完成!已成功追加今日数据行。")


if __name__ == '__main__':
    main()
最新发布
11-17
import numpy as np
import pandas as pd


def extract_report_metrics(sheets):
    """Pull the fixed set of KPI/risk figures out of the four report sheets.

    Parameters
    ----------
    sheets : mapping of sheet name -> pandas.DataFrame; must contain the
        keys "KPI", "Results", "ShortTerm" and "HighRisk".

    Returns a dict of extracted values.  Every rule degrades gracefully
    (None / [] / {}) instead of raising.  The original cell used bare
    ``except:`` clauses, which also swallow SystemExit/KeyboardInterrupt;
    they are narrowed to ``except Exception`` here.
    """
    kpi_df = sheets["KPI"]
    results_df = sheets["Results"]
    short_term_df = sheets["ShortTerm"]
    high_risk_df = sheets["HighRisk"]

    extracted = {}

    # Rule 1: KPI cell G10 (row 10, column 7 -> 0-based [9, 6]).
    try:
        extracted['kpi_g10'] = kpi_df.iloc[9, 6]
    except Exception:
        extracted['kpi_g10'] = None

    # Rules 2 & 3: in Results, the 5 lowest / highest percentage rows.
    # The exact AS column name is unknown, so scan for the first column
    # whose name contains '%'.
    as_col_candidates = [col for col in results_df.columns
                         if isinstance(col, str) and '%' in col]
    as_col_name = as_col_candidates[0] if as_col_candidates else None
    extracted['lowest_5_as'] = []
    extracted['highest_5_as'] = []
    if as_col_name:
        try:
            as_series = results_df[as_col_name].dropna()
            as_numeric = pd.to_numeric(
                as_series.astype(str).str.replace('%', '').str.strip(),
                errors='coerce').dropna()
            extracted['lowest_5_as'] = results_df.loc[
                as_numeric.nsmallest(5).index].to_dict(orient='records')
            extracted['highest_5_as'] = results_df.loc[
                as_numeric.nlargest(5).index].to_dict(orient='records')
        except Exception:
            extracted['lowest_5_as'] = []
            extracted['highest_5_as'] = []

    # Rule 4: ShortTerm column G (index 6) -> top 3 dead-stock amounts.
    try:
        g_series = pd.to_numeric(short_term_df[short_term_df.columns[6]],
                                 errors='coerce')
        extracted['top3_dead_stock'] = short_term_df.loc[
            g_series.nlargest(3).index].to_dict(orient='records')
    except Exception:
        extracted['top3_dead_stock'] = []

    # Rule 5: HighRisk column D holds day counts written as '7/9/8';
    # sum the parts and keep the 3 rows with the largest totals.
    def parse_days(val):
        try:
            return sum(int(x.strip()) for x in str(val).split('/')
                       if x.strip().isdigit())
        except Exception:
            return -1  # unparsable rows sort last

    try:
        d_series = high_risk_df[high_risk_df.columns[3]].dropna()
        d_days = d_series.apply(parse_days)
        extracted['top3_days'] = high_risk_df.loc[
            d_days.nlargest(3).index].to_dict(orient='records')
    except Exception:
        extracted['top3_days'] = []

    # Rule 6: HighRisk column N (index 13) -> row with the maximum value.
    try:
        n_series = pd.to_numeric(high_risk_df[high_risk_df.columns[13]],
                                 errors='coerce')
        extracted['max_n'] = high_risk_df.loc[n_series.idxmax()].to_dict()
    except Exception:
        extracted['max_n'] = {}

    # Rule 7: HighRisk cell Q2 (0-based [1, 16]).
    try:
        extracted['q2_value'] = high_risk_df.iloc[1, 16]
    except Exception:
        extracted['q2_value'] = None

    # Rule 8: HighRisk column AO (0-based index 40) -> top 3 values.
    try:
        ao_series = pd.to_numeric(high_risk_df[high_risk_df.columns[40]],
                                  errors='coerce')
        extracted['top3_ao'] = high_risk_df.loc[
            ao_series.nlargest(3).index].to_dict(orient='records')
    except Exception:
        extracted['top3_ao'] = []

    return extracted


# The original notebook cell ran against a global `google_sheets_new` dict;
# keep that behaviour when the name exists.
if 'google_sheets_new' in globals():
    extracted = extract_report_metrics(google_sheets_new)
07-22
def compare_and_write(self):
    """Collect rows from the user-selected sheets and write the matching
    data columns into the 'check' worksheet of the same workbook.

    Performance fixes over the original (which froze the UI on large files):
      * per-sheet frames are concatenated ONCE instead of `pd.concat`
        inside the loop (repeated concat is quadratic);
      * row matching uses a prebuilt ID -> row dict (first occurrence wins,
        same as the original `.iloc[0]`) instead of a full DataFrame scan
        for every check-sheet row — O(rows) instead of O(rows * n).
    """
    selected_sheets = [self.tree.item(item, "values")[1]
                       for item, state in self.check_states.items() if state]
    if not selected_sheets:
        messagebox.showwarning("提示", "请至少选择一个工作表作为对比表")
        return
    file_path = self.file_path.get()
    if not file_path:
        messagebox.showwarning("提示", "请先选择Excel文件")
        return
    try:
        wb = load_workbook(file_path)
        if 'check' not in wb.sheetnames:
            messagebox.showerror("错误", "找不到check表,请先创建")
            return
        ws_check = wb['check']

        # Union of data column names (everything after the first/ID column)
        # across all selected sheets.
        all_headers = set()
        sheet_data = {}
        for sheet in selected_sheets:
            if sheet in wb.sheetnames:
                temp_df = pd.read_excel(file_path, sheet_name=sheet)
                sheet_data[sheet] = temp_df.copy()
                all_headers.update(temp_df.columns[1:])

        # Rebuild each sheet with a common 'ID_COL' plus the aligned data
        # columns, then concatenate once at the end.
        header_list = list(all_headers)
        frames = []
        for sheet, df in sheet_data.items():
            temp_df = df.copy()
            temp_df.rename(columns={df.columns[0]: 'ID_COL'}, inplace=True)
            for col in header_list:
                if col not in temp_df:
                    temp_df[col] = None
            frames.append(temp_df[['ID_COL'] + header_list])
        if frames:
            df_contrast = pd.concat(frames, ignore_index=True)
        else:
            df_contrast = pd.DataFrame(columns=['ID_COL'] + header_list)
        if df_contrast.empty:
            messagebox.showwarning("警告", "对比表数据为空")
            return

        # Append any new header names after the existing ones (col 5 on).
        existing_headers = {ws_check.cell(1, col).value
                            for col in range(5, ws_check.max_column + 1)
                            if ws_check.cell(1, col).value}
        start_col = 5
        while (start_col <= ws_check.max_column
               and ws_check.cell(1, start_col).value is not None):
            start_col += 1
        for header in header_list:
            if header not in existing_headers:
                ws_check.cell(row=1, column=start_col, value=header)
                start_col += 1

        # One-shot lookup table: ID (as str) -> first matching row.
        df_contrast['ID_COL'] = df_contrast['ID_COL'].astype(str)
        data_cols = [col for col in df_contrast.columns if col != 'ID_COL']
        row_by_id = {}
        for _, contrast_row in df_contrast.iterrows():
            row_by_id.setdefault(contrast_row['ID_COL'], contrast_row)

        added_count = 0
        for row_idx in range(2, ws_check.max_row + 1):
            # Each check-sheet row offers up to three candidate IDs
            # (cells B, C, D); the first non-empty one that matches wins.
            matched_row = None
            for candidate_col in (2, 3, 4):
                value = ws_check.cell(row_idx, candidate_col).value
                key = str(value) if value is not None else ""
                if key and key in row_by_id:
                    matched_row = row_by_id[key]
                    break
            if matched_row is not None:
                # Write after the last already-filled cell of this row.
                write_col = 5
                while (write_col <= ws_check.max_column
                       and ws_check.cell(row_idx, write_col).value is not None):
                    write_col += 1
                for col_name in data_cols:
                    if write_col > ws_check.max_column:
                        break
                    ws_check.cell(row_idx, write_col, value=matched_row[col_name])
                    write_col += 1
                added_count += 1

        wb.save(file_path)
        messagebox.showinfo("完成", f"成功写入 {added_count} 行对比数据到check表")
    except Exception as e:
        messagebox.showerror("错误", f"对比写入失败: {str(e)}")
09-12
# -*- coding: utf-8 -*-
"""Restaurant-data preprocessing: load, clean, visualize, standardize, save.

Pipeline
--------
1.  Read ``sourceSet.csv`` from the current directory.
2.  Locate the five mandatory key features (review count, taste /
    environment / service rating, average price) under either their
    Chinese or English column names.
3.  Drop known-useless columns (ids, coordinates, city, ...).
4.  Missing values: drop non-key columns with >30% missing, median-fill
    key features, mean-fill remaining numeric columns.
5.  Save top-20 horizontal-bar ranking charts for the three rating
    features to the Desktop.
6-8. Keep only the key features, z-score standardize them, and save the
    processed CSV plus a summary-statistics CSV to the Desktop.
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
import os
import matplotlib as mpl
import sys

# Chinese font support for matplotlib titles/labels.
plt.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei', 'WenQuanYi Micro Hei']
plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
mpl.rcParams['font.size'] = 12

# All output files go to the user's Desktop.
desktop = os.path.join(os.path.expanduser('~'), 'Desktop')
print(f"文件将保存到桌面: {desktop}")

# 1. Load the raw data.
print("读取数据...")
try:
    df = pd.read_csv('sourceSet.csv')
    print(f"成功读取数据,共{len(df)}行,{len(df.columns)}列")
except FileNotFoundError:
    print("错误:找不到 sourceSet.csv 文件,请确保文件在当前目录")
    sys.exit(1)

# 2. Identify the five key features; each may appear under a Chinese or
# an English column name (first match wins).
print("\n识别关键特征...")
key_features = {
    '点评数': ['点评数', 'review_count'],
    '口味评分': ['口味', 'taste_rating'],
    '环境评分': ['环境', 'environment_rating'],
    '服务评分': ['服务', 'service_rating'],
    '人均消费': ['人均消费', 'avg_price'],
}
existing_key_features = {}
for feature_name, possible_names in key_features.items():
    for name in possible_names:
        if name in df.columns:
            existing_key_features[feature_name] = name
            print(f"已识别关键特征 '{feature_name}': 列名 '{name}'")
            break
    else:
        print(f"警告:未找到关键特征 '{feature_name}' 的对应列")

if len(existing_key_features) < 5:
    print(f"错误:只找到 {len(existing_key_features)} 个关键特征,需要5个")
    sys.exit(1)

# 3. Drop known-useless columns — only ones that exist and are not
# themselves key features.
print("\n去除无效特征值...")
columns_to_drop = ['lng', 'lat', '数据ID', '城市', 'ID', 'city', 'id', 'Unnamed: 0']
invalid_cols = [col for col in columns_to_drop
                if col in df.columns and col not in existing_key_features.values()]
df.drop(columns=invalid_cols, inplace=True)
print(f"已删除无效列: {invalid_cols}")
print(f"剩余列: {list(df.columns)}")

# 4. Missing-value handling.
print("\n处理缺失值...")
key_feature_cols = list(existing_key_features.values())
print(f"关键特征列: {key_feature_cols}")

# 4a. Non-key columns with >30% missing values are dropped outright.
non_key_cols = [col for col in df.columns if col not in key_feature_cols]
missing_percent = df[non_key_cols].isnull().mean()
cols_to_drop = missing_percent[missing_percent > 0.3].index.tolist()
df.drop(columns=cols_to_drop, inplace=True)
if cols_to_drop:
    print(f"删除缺失值>30%的非关键特征: {cols_to_drop}")

# 4b. Key features: fill with the column median.  Plain assignment replaces
# Series.fillna(inplace=True), which is chained assignment and deprecated
# (ineffective under copy-on-write) in pandas 2.x.
for feature_name, col_name in existing_key_features.items():
    if df[col_name].isnull().sum() > 0:
        median_val = df[col_name].median()
        df[col_name] = df[col_name].fillna(median_val)
        print(f"填充关键特征 '{feature_name}' ({col_name}) 的缺失值 (中位数: {median_val:.2f})")

# 4c. Remaining numeric columns: fill with the column mean.
other_cols = [col for col in df.columns
              if col not in key_feature_cols and df[col].dtype in ['int64', 'float64']]
for col in other_cols:
    if df[col].isnull().sum() > 0:
        mean_val = df[col].mean()
        df[col] = df[col].fillna(mean_val)
        print(f"填充非关键特征 '{col}' 的缺失值 (均值: {mean_val:.2f})")

# 5. Visualization: top-20 ranking charts for the rating features.
print("\n生成可视化图表...")


def plot_ranking(feature_name, col_name, title, color):
    """Save a horizontal top-20 bar chart of ``col_name`` to the Desktop.

    Returns the saved file path, or None when the column is missing.
    """
    if col_name not in df.columns:
        print(f"错误:特征 '{col_name}' 不存在,无法生成 {title} 排名")
        return None
    top20 = df.sort_values(col_name, ascending=False).head(20).reset_index(drop=True)
    plt.figure(figsize=(14, 10))
    # Prefer a shop-name column for the y labels; fall back to row numbers.
    if '商家名称' in df.columns:
        labels = top20['商家名称'].values
    elif 'shop_name' in df.columns:
        labels = top20['shop_name'].values
    else:
        labels = [f"商家 {i+1}" for i in top20.index]
    bars = plt.barh(labels, top20[col_name], color=color)
    plt.title(f'{title}排名 - 前20名', fontsize=16)
    plt.xlabel(title, fontsize=14)
    plt.ylabel('商家名称', fontsize=14)
    plt.gca().invert_yaxis()  # highest score on top
    # Annotate every bar with its numeric value.
    for bar in bars:
        width = bar.get_width()
        plt.text(width + 0.01, bar.get_y() + bar.get_height() / 2,
                 f'{width:.2f}', va='center', ha='left', fontsize=10)
    plt.tight_layout()
    save_path = os.path.join(desktop, f'{title}排名.png')
    plt.savefig(save_path, dpi=100, bbox_inches='tight')
    plt.close()
    print(f"已保存: {save_path}")
    return save_path


saved_files = []
ranking_colors = {
    '口味评分': '#55A868',  # green
    '环境评分': '#C44E52',  # red
    '服务评分': '#4C72B0',  # blue
}
for feature_name in ['服务评分', '口味评分', '环境评分']:
    if feature_name in existing_key_features:
        col_name = existing_key_features[feature_name]
        color = ranking_colors.get(feature_name, '#888888')
        saved_path = plot_ranking(feature_name, col_name, feature_name, color)
        if saved_path:
            saved_files.append(saved_path)
    else:
        print(f"警告:未找到 {feature_name} 列,跳过可视化")

# 6. Feature selection: keep exactly the five key features.
print("\n特征变量选择...")
selected_features = list(existing_key_features.values())
print(f"保留的关键特征: {selected_features}")
df_selected = df[selected_features].copy()

# 7. Z-score standardization.  Rebuilding the frame avoids assigning float
# values into possibly-integer columns via .loc (pandas dtype warning).
print("\n数据标准化处理...")
scaler = StandardScaler()
df_selected = pd.DataFrame(scaler.fit_transform(df_selected),
                           columns=selected_features, index=df_selected.index)

# 8. Save the preprocessed data.
output_path = os.path.join(desktop, 'processed_sourceSet.csv')
try:
    df_selected.to_csv(output_path, index=False)
    print(f"预处理后的数据已保存到: {output_path}")
    if os.path.exists(output_path):
        print(f"文件验证成功,大小: {os.path.getsize(output_path) / 1024:.2f} KB")
    else:
        print("警告:文件保存失败,请检查路径权限")
except Exception as e:
    print(f"文件保存错误: {str(e)}")

# 9. Summary statistics of the standardized features.
print("\n预处理结果统计:")
stats = pd.DataFrame({
    '特征': [existing_key_features[fn] for fn in existing_key_features],
    '原始名称': list(existing_key_features.keys()),
    '类型': df_selected.dtypes.values,
    '最小值': df_selected.min().values,
    '最大值': df_selected.max().values,
    '取值范围': df_selected.max().values - df_selected.min().values,
})
print(stats)
stats_path = os.path.join(desktop, '预处理统计.csv')
try:
    stats.to_csv(stats_path, index=False)
    print(f"预处理统计已保存到: {stats_path}")
except Exception as e:
    print(f"统计信息保存错误: {str(e)}")

print("\n所有处理完成!")
if saved_files:
    print(f"生成的可视化文件: {saved_files}")
else:
    print("警告:未生成任何可视化图表")
06-16
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值