关于使用merge合并两个data_frame出现大量的空值的情况

本文针对DataFrame在合并过程中遇到的关键字类型不一致的问题提供了解决方案。通过将关键字统一转化为字符串类型,确保了不同数据集之间的顺利连接。
              cid  total_sales   c_name   c1    n1
0           1101.0   7520675.03      NaN  NaN   NaN
1         110201.0   5920098.53      NaN  NaN   NaN
2         110202.0  11080813.26      NaN  NaN   NaN
3         110203.0   1781374.74      NaN  NaN   NaN
4         110204.0      4926.56      NaN  NaN   NaN
5         110205.0   5829167.01      NaN  NaN   NaN

6         110206.0   9002016.18      NaN  NaN   NaN


解决方法因此,在使用merge进行连接前,必须对连接的关键字进行字符化或者整型化的调整,使连接的关键字值和类型都相同。即:data_frame_1=pd.read_csv(文件名,dtype={关键词:str})

             data_frame_2=pd.read_csv(文件名,dtype={关键词:str})

             df=pd.merge(data_frame_1,data_frame_2,on='关键词')

from neo4j import GraphDatabase import json # 创建节点和关系的Cypher模板 CREATE_N_PDU = """ UNWIND $n_pdu AS pdu MERGE (n:N_PDU {short_name: pdu.short_name, length: pdu.length}) """ CREATE_I_SIGNAL_I_PDU = """ UNWIND $i_signal_i_pdu AS pdu MERGE (i:I_SIGNAL_I_PDU {short_name: pdu.short_name, length: pdu.length}) """ CREATE_CAN_FRAME = """ UNWIND $can_frame AS frame MERGE (c:CAN_FRAME { short_name: frame.short_name, length: frame.frame_length, pdu_to_frame_mapping_short_name: frame.pdu_to_frame_mapping_short_name, pdu_ref: frame.pdu_ref }) """ CREATE_CAN_PHYSICAL_CHANNEL = """ UNWIND $can_physical_channel AS can MERGE (cp:CAN_PHYSICAL_CHANNEL {short_name: can.short_name}) WITH can, cp UNWIND can.can_frame_triggering AS cft MERGE (cft_node:CAN_FRAME_TRIGGERING {short_name: cft.short_name, pdu_triggering_ref: cft.pdu_triggering_ref}) MERGE (cp)-[:can_frame_triggering]->(cft_node) WITH can, cp UNWIND can.pdu_triggering AS pt MERGE (pt_node:PDU_TRIGGERING {short_name: pt.short_name, i_pdu_ref: pt.i_pdu_ref, i_pdu_port_ref: pt.i_pdu_port_ref}) MERGE (cp)-[:pdu_triggering]->(pt_node) """ CREATE_LIN_PHYSICAL_CHANNEL = """ UNWIND $lin_physical_channel AS lin MERGE (lp:LIN_PHYSICAL_CHANNEL {short_name: lin.short_name}) WITH lin, lp UNWIND lin.lin_frame_triggering AS lft MERGE (lft_node:LIN_FRAME_TRIGGERING {short_name: lft.short_name, pdu_triggering_ref: lft.pdu_triggering_ref}) MERGE (lp)-[:lin_frame_triggering]->(lft_node) WITH lin, lp UNWIND lin.pdu_triggering AS pt MERGE (pt_node:PDU_TRIGGERING {short_name: pt.short_name, i_pdu_ref: pt.i_pdu_ref, i_pdu_port_ref: pt.i_pdu_port_ref}) MERGE (lp)-[:pdu_triggering]->(pt_node) """ CREATE_CAN_TP_CONNECTION = """ UNWIND $can_tp_connection AS can_tp MERGE (n:CAN_TP_CONNECT { short_name: can_tp.short_name, data_pdu_ref: can_tp.data_pdu_ref, flow_control_pdu_ref: can_tp.flow_control_pdu_ref, multicast_ref: can_tp.multicast_ref }) """ CREATE_LIN_TP_CONNECTION = """ UNWIND $lin_tp_connection AS lin_tp MERGE (n:LIN_TP_CONNECT { short_name: lin_tp.short_name, data_pdu_ref: lin_tp.data_pdu_ref, flow_control_pdu_ref: lin_tp.flow_control_pdu_ref, multicast_ref: lin_tp.multicast_ref }) """ class Neo4jImporter: def __init__(self, uri, user, password): self.driver = GraphDatabase.driver(uri, auth=(user, password)) def close(self): self.driver.close() def import_data(self, data): with self.driver.session() as session: # 批量创建节点 session.execute_write(self._create_nodes, data) @staticmethod def _create_nodes(tx, data): tx.run(CREATE_N_PDU, n_pdu=data["n_pdu"]) tx.run(CREATE_I_SIGNAL_I_PDU, i_signal_i_pdu=data["i_signal_i_pdu"]) tx.run(CREATE_CAN_FRAME, can_frame=data["can_frame"]) tx.run(CREATE_CAN_PHYSICAL_CHANNEL, can_physical_channel=data["can_physical_channel"]) tx.run(CREATE_LIN_PHYSICAL_CHANNEL, lin_physical_channel=data["lin_physical_channel"]) tx.run(CREATE_CAN_TP_CONNECTION, can_tp_connection=data["can_tp_connection"]) tx.run(CREATE_LIN_TP_CONNECTION, lin_tp_connection=data["lin_tp_connection"]) # 4. 使用示例 if __name__ == "__main__": # 加载原始数据 with open('./ecuc_data.json', 'r') as f: raw_data = json.load(f) # 连接配置 importer = Neo4jImporter("neo4j://192.168.11.111:7687", "neo4j", "111111") try: importer.import_data(raw_data) print("数据导入成功!") except Exception as e: print(f"导入出错: {e}") finally: importer.close() ================================== 通过以上代码创建的知识图谱,想实现以下的查询逻辑,请问怎么写cypher语句: 1.获取 N_PDU 列表中每个 N_PDU 对应的 CAN_FRAME 和 PDU_TRIGGERING: N_PDU 与 CAN_FRAME 的对应关系为:MATCH (n:N_PDU) OPTIONAL MATCH (c:CAN_FRAME) WHERE split(c.pdu_ref, '/')[-1] = n.short_name RETURN n, c N_PDU 与 PDU_TRIGGERING 的对应关系为:MATCH (n:N_PDU) OPTIONAL MATCH (c:CAN_FRAME) WHERE split(c.i_pdu_ref, '/')[-1] = n.short_name RETURN n, c 2.当 PDU_TRIGGERING 的 i_pdu_port_ref 以 In 结尾时,取值为: { ShortName: N_PDU.short_name PduLength: N_PDU.length SysTPduToFrameMappingRef: CAN_FRAME.short_name/CAN_FRAME.pdu_to_frame_mapping_short_name } 3.当 PDU_TRIGGERING 的 i_pdu_port_ref 不以 In 结尾时,需要做进一步判断: 3.1.当 N_PDU 匹配到 LIN_TP_CONNECTION 时不取值 N_PDU 与 LIN_TP_CONNECTION 的对应关系为:MATCH (n:N_PDU) OPTIONAL MATCH (c:LIN_TP_CONNECTION) WHERE split(c.data_pdu_ref, '/')[-1] = n.short_name RETURN n, c 3.2.当 N_PDU 匹配到 CAN_TP_CONNECTION 时,如果 CAN_TP_CONNECTION 里存在 multicast_ref、则取一个值,如果 CAN_TP_CONNECTION 里存在flow_control_pdu_ref、则取两个值: N_PDU 与 CAN_TP_CONNECTION 的对应关系为:MATCH (n:N_PDU) OPTIONAL MATCH (c:CAN_TP_CONNECTION) WHERE split(c.data_pdu_ref, '/')[-1] = n.short_name RETURN n, c 取一个值时,取值为: { ShortName: N_PDU.short_name PduLength: N_PDU.length SysTPduToFrameMappingRef: CAN_FRAME.short_name/CAN_FRAME.pdu_to_frame_mapping_short_name } 取两个值时,取值为: [ { ShortName: N_PDU.short_name PduLength: N_PDU.length SysTPduToFrameMappingRef: CAN_FRAME.short_name/CAN_FRAME.pdu_to_frame_mapping_short_name }, { ShortName: N_PDU.short_name 拼接上 'D' 字符串 PduLength: N_PDU.length SysTPduToFrameMappingRef: CAN_FRAME.short_name/CAN_FRAME.pdu_to_frame_mapping_short_name } ] 4.将以上取到的所有值放到一个列表里,作为最终的返回
10-17
import tkinter as tk from tkinter import filedialog, messagebox, ttk, scrolledtext import csv from datetime import datetime import logging import os import re from collections import defaultdict class CSVProcessorApp: def __init__(self, root): self.root = root self.root.title("CSV_ProcessPro") self.root.geometry("900x650") self.root.resizable(True, True) # 初始化变量 self.file_path = tk.StringVar() self.csv_data = [] self.headers = [] self.raw_data = [] # 存储原始数据 self.header_row_index = tk.IntVar(value=0) # 表头行索引 self.setup_variables() self.setup_logging() self.create_widgets() self.setup_styles() def setup_styles(self): """设置全局样式""" self.style = ttk.Style() self.style.configure("TFrame", background="#f0f0f0") self.style.configure("TLabel", background="#f0f0f0", font=('Arial', 9)) self.style.configure("TButton", font=('Arial', 9, 'bold')) self.style.configure("Accent.TButton", foreground="black", font=('Arial', 9, 'bold'), borderwidth=2, relief="raised") self.style.map("Accent.TButton", background=[("active", "#4a90e2"), ("!active", "#d4e6ff")], bordercolor=[("active", "#4a90e2"), ("!active", "#ffcc00")]) self.style.configure("Remove.TButton", foreground="black", font=('Arial', 8), background="#ffcccc", borderwidth=1, relief="solid") self.style.map("Remove.TButton", background=[("active", "#ff9999"), ("!active", "#ffcccc")]) self.style.configure("Header.TCombobox", font=('Arial', 9)) self.style.configure("TCheckbutton", background="#f0f0f0") def setup_variables(self): """初始化所有动态变量""" # 排序相关 self.sort_header = tk.StringVar() self.sort_order = tk.StringVar(value="升序") # 去重相关 self.dedupe_header = tk.StringVar() # 删除行相关 self.delete_keyword = tk.StringVar() self.delete_column = tk.StringVar() self.delete_case_sensitive = tk.BooleanVar() self.delete_match_type = tk.StringVar(value="包含") # 新增匹配类型 # 合并文件相关 self.merge_file_paths = [] self.merge_column = tk.StringVar() # 状态变量 self.enable_sort = tk.BooleanVar() self.enable_dedupe = tk.BooleanVar() self.enable_custom_letter_sort = tk.BooleanVar() self.letter_range_start = tk.StringVar(value="A") self.letter_range_end = tk.StringVar(value="Z") # 组合处理相关 self.enable_delete = tk.BooleanVar(value=True) self.enable_combined_sort = tk.BooleanVar(value=True) self.enable_combined_dedupe = tk.BooleanVar(value=True) # 表头保留选项 (新增) self.retain_header_sort = tk.BooleanVar(value=True) # 排序/去重页 self.retain_header_delete = tk.BooleanVar(value=True) # 删除行页 self.retain_header_merge = tk.BooleanVar(value=True) # 合并页 self.retain_header_combined = tk.BooleanVar(value=True) # 组合处理页 def setup_logging(self): """配置日志记录""" logging.basicConfig( level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', handlers=[ logging.FileHandler('csv_processor.log', encoding='utf-8'), logging.StreamHandler() ] ) self.logger = logging.getLogger(__name__) self.logger.info("===== 程序启动 =====") def create_widgets(self): """创建所有界面组件""" # 主容器 main_container = ttk.Frame(self.root, padding=5) main_container.pack(fill=tk.BOTH, expand=True) # 使用notebook分页组织功能 self.notebook = ttk.Notebook(main_container) self.notebook.pack(fill=tk.BOTH, expand=True) # 创建各个标签页 self.create_file_tab() self.create_process_tab() self.create_delete_tab() self.create_merge_tab() self.create_combined_tab() # 新增组合处理标签页 self.create_log_tab() def create_file_tab(self): """创建文件操作标签页""" tab = ttk.Frame(self.notebook) self.notebook.add(tab, text="文件操作") # 文件选择部分 frame = ttk.LabelFrame(tab, text="CSV文件选择", padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Label(frame, text="文件路径:").grid(row=0, column=0, sticky=tk.W) ttk.Entry(frame, textvariable=self.file_path, width=40).grid(row=0, column=1, sticky=tk.EW) ttk.Button(frame, text="浏览", command=self.select_file).grid(row=0, column=2, padx=5) # 表头行选择 ttk.Label(frame, text="表头选择:").grid(row=1, column=0, sticky=tk.W) self.header_row_combobox = ttk.Combobox( frame, textvariable=self.header_row_index, state="readonly", width=5, style="Header.TCombobox" ) self.header_row_combobox.grid(row=1, column=1, sticky=tk.W) ttk.Label(frame, text="(0表示第一行)").grid(row=1, column=2, sticky=tk.W) # 重新解析按钮 ttk.Button(frame, text="重新解析", command=self.reparse_data, style="Accent.TButton").grid(row=1, column=3, padx=5) # 文件信息显示 self.file_info = scrolledtext.ScrolledText(tab, height=8, width=80) self.file_info.pack(fill=tk.X, padx=5, pady=5) def create_process_tab(self): """创建数据处理标签页""" tab = ttk.Frame(self.notebook) self.notebook.add(tab, text="排序/去重") # 排序选项部分 frame = ttk.LabelFrame(tab, text="排序选项", padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Checkbutton(frame, text="启用排序", variable=self.enable_sort, command=self.toggle_sort).grid(row=0, column=0, sticky=tk.W) ttk.Label(frame, text="排序表头:").grid(row=1, column=0, sticky=tk.W) self.sort_header_combobox = ttk.Combobox(frame, textvariable=self.sort_header, state="readonly") self.sort_header_combobox.grid(row=1, column=1, sticky=tk.EW) ttk.Label(frame, text="排序方式:").grid(row=2, column=0, sticky=tk.W) self.sort_order_combobox = ttk.Combobox( frame, textvariable=self.sort_order, values=["升序", "降序", "自定义字母排序"] ) self.sort_order_combobox.grid(row=2, column=1, sticky=tk.W) # 自定义字母排序范围 ttk.Checkbutton(frame, text="启用字母范围过滤", variable=self.enable_custom_letter_sort, command=self.toggle_letter_sort).grid(row=3, column=0, sticky=tk.W) ttk.Label(frame, text="字母范围:").grid(row=4, column=0, sticky=tk.W) self.letter_range_start_entry = ttk.Entry(frame, textvariable=self.letter_range_start, width=5) self.letter_range_start_entry.grid(row=4, column=1, sticky=tk.W) ttk.Label(frame, text="到").grid(row=4, column=2) self.letter_range_end_entry = ttk.Entry(frame, textvariable=self.letter_range_end, width=5) self.letter_range_end_entry.grid(row=4, column=3, sticky=tk.W) # 去重选项部分 frame = ttk.LabelFrame(tab, text="去重选项", padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Checkbutton(frame, text="启用去重", variable=self.enable_dedupe, command=self.toggle_dedupe).grid(row=0, column=0, sticky=tk.W) ttk.Label(frame, text="去重表头:").grid(row=1, column=0, sticky=tk.W) self.dedupe_header_combobox = ttk.Combobox(frame, textvariable=self.dedupe_header, state="readonly") self.dedupe_header_combobox.grid(row=1, column=1, sticky=tk.EW) # 表头保留选项 (新增) frame = ttk.LabelFrame(tab, text="输出选项", padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Checkbutton(frame, text="保留表头", variable=self.retain_header_sort).grid(row=0, column=0, sticky=tk.W) # 处理按钮 btn_frame = ttk.Frame(tab) btn_frame.pack(pady=10) ttk.Button(btn_frame, text="处理并保存到桌面", command=self.process_csv, style="Accent.TButton").pack() def create_delete_tab(self): """创建删除行标签页""" tab = ttk.Frame(self.notebook) self.notebook.add(tab, text="删除行") frame = ttk.LabelFrame(tab, text="删除包含指定字符的行", padding=10) frame.pack(fill=tk.X, padx=5, pady=5) # 删除条件设置 ttk.Label(frame, text="搜索列:").grid(row=0, column=0, sticky=tk.W) self.delete_column_combobox = ttk.Combobox(frame, textvariable=self.delete_column, state="readonly") self.delete_column_combobox.grid(row=0, column=1, sticky=tk.EW) ttk.Label(frame, text="关键字:").grid(row=1, column=0, sticky=tk.W) ttk.Entry(frame, textvariable=self.delete_keyword).grid(row=1, column=1, sticky=tk.EW) # 关键字匹配类型 (新增) ttk.Label(frame, text="匹配方式:").grid(row=2, column=0, sticky=tk.W) match_type_combo = ttk.Combobox( frame, textvariable=self.delete_match_type, values=["包含", "精确匹配", "开头是", "结尾是", "正则匹配"], state="readonly", width=10 ) match_type_combo.set("包含") match_type_combo.grid(row=2, column=1, sticky=tk.W) ttk.Checkbutton(frame, text="区分大小写", variable=self.delete_case_sensitive).grid(row=3, column=0, sticky=tk.W) # 表头保留选项 (新增) frame = ttk.LabelFrame(tab, text="输出选项", padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Checkbutton(frame, text="保留表头", variable=self.retain_header_delete).grid(row=0, column=0, sticky=tk.W) # 执行按钮 btn_frame = ttk.Frame(tab) btn_frame.pack(pady=10) ttk.Button(btn_frame, text="执行删除并保存到桌面", command=self.delete_rows_with_keyword, style="Accent.TButton").pack() def create_merge_tab(self): """创建文件合并标签页""" tab = ttk.Frame(self.notebook) self.notebook.add(tab, text="文件合并") # 合并文件部分 frame = ttk.LabelFrame(tab, text="合并CSV文件", padding=10) frame.pack(fill=tk.BOTH, expand=True, padx=5, pady=5) # 文件列表容器 list_frame = ttk.Frame(frame) list_frame.pack(fill=tk.BOTH, expand=True) ttk.Label(list_frame, text="已选择文件:").grid(row=0, column=0, sticky=tk.W) # 文件列表和滚动条 self.merge_file_canvas = tk.Canvas(list_frame, height=150) self.merge_file_canvas.grid(row=1, column=0, sticky=tk.EW) scrollbar = ttk.Scrollbar(list_frame, orient="vertical", command=self.merge_file_canvas.yview) scrollbar.grid(row=1, column=1, sticky=tk.NS) self.merge_file_canvas.configure(yscrollcommand=scrollbar.set) self.merge_file_frame = ttk.Frame(self.merge_file_canvas) self.merge_file_canvas.create_window((0, 0), window=self.merge_file_frame, anchor="nw") # 按钮区域 btn_frame = ttk.Frame(frame) btn_frame.pack(fill=tk.X, pady=5) ttk.Button(btn_frame, text="添加文件", command=self.add_merge_file).pack(side=tk.LEFT, padx=5) ttk.Button(btn_frame, text="清空列表", command=self.clear_merge_list).pack(side=tk.LEFT, padx=5) # 合并选项 opt_frame = ttk.Frame(frame) opt_frame.pack(fill=tk.X, pady=5) ttk.Label(opt_frame, text="合并依据列(可选):").grid(row=0, column=0, sticky=tk.W) self.merge_column_combo = ttk.Combobox(opt_frame, textvariable=self.merge_column, state="readonly") self.merge_column_combo.grid(row=0, column=1, sticky=tk.EW) # 表头保留选项 (新增) frame = ttk.LabelFrame(tab, text="输出选项", padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Checkbutton(frame, text="保留表头", variable=self.retain_header_merge).grid(row=0, column=0, sticky=tk.W) # 合并按钮 btn_frame = ttk.Frame(tab) btn_frame.pack(pady=10) ttk.Button(btn_frame, text="执行合并并保存到桌面", command=self.merge_csv_files, style="Accent.TButton").pack() def create_combined_tab(self): """创建组合处理标签页""" tab = ttk.Frame(self.notebook) self.notebook.add(tab, text="组合处理") # 组合处理选项 frame = ttk.LabelFrame(tab, text="组合处理选项", padding=10) frame.pack(fill=tk.X, padx=5, pady=5) # 删除行选项 delete_frame = ttk.Frame(frame) delete_frame.pack(fill=tk.X, pady=5) ttk.Checkbutton(delete_frame, text="启用删除行", variable=self.enable_delete).pack(side=tk.LEFT, padx=5) ttk.Label(delete_frame, text="搜索列:").pack(side=tk.LEFT, padx=5) self.combined_delete_column_combobox = ttk.Combobox( delete_frame, textvariable=self.delete_column, state="readonly", width=15 ) self.combined_delete_column_combobox.pack(side=tk.LEFT, padx=5) ttk.Label(delete_frame, text="关键字:").pack(side=tk.LEFT, padx=5) ttk.Entry(delete_frame, textvariable=self.delete_keyword, width=15).pack(side=tk.LEFT, padx=5) # 关键字匹配类型 (新增) ttk.Label(delete_frame, text="匹配方式:").pack(side=tk.LEFT, padx=5) match_type_combo = ttk.Combobox( delete_frame, textvariable=self.delete_match_type, values=["包含", "精确匹配", "开头是", "结尾是", "正则匹配"], state="readonly", width=10 ) match_type_combo.set("包含") match_type_combo.pack(side=tk.LEFT, padx=5) ttk.Checkbutton(delete_frame, text="区分大小写", variable=self.delete_case_sensitive).pack(side=tk.LEFT, padx=5) # 排序选项 sort_frame = ttk.Frame(frame) sort_frame.pack(fill=tk.X, pady=5) ttk.Checkbutton(sort_frame, text="启用排序", variable=self.enable_combined_sort).pack(side=tk.LEFT, padx=5) ttk.Label(sort_frame, text="排序表头:").pack(side=tk.LEFT, padx=5) self.combined_sort_header_combobox = ttk.Combobox( sort_frame, textvariable=self.sort_header, state="readonly", width=15 ) self.combined_sort_header_combobox.pack(side=tk.LEFT, padx=5) ttk.Label(sort_frame, text="排序方式:").pack(side=tk.LEFT, padx=5) self.combined_sort_order_combobox = ttk.Combobox( sort_frame, textvariable=self.sort_order, values=["升序", "降序", "自定义字母排序"], width=15 ) self.combined_sort_order_combobox.pack(side=tk.LEFT, padx=5) # 自定义字母排序范围 ttk.Checkbutton( sort_frame, text="启用字母范围", variable=self.enable_custom_letter_sort ).pack(side=tk.LEFT, padx=5) ttk.Label(sort_frame, text="从").pack(side=tk.LEFT, padx=5) ttk.Entry(sort_frame, textvariable=self.letter_range_start, width=3).pack(side=tk.LEFT) ttk.Label(sort_frame, text="到").pack(side=tk.LEFT, padx=5) ttk.Entry(sort_frame, textvariable=self.letter_range_end, width=3).pack(side=tk.LEFT) # 去重选项 dedupe_frame = ttk.Frame(frame) dedupe_frame.pack(fill=tk.X, pady=5) ttk.Checkbutton(dedupe_frame, text="启用去重", variable=self.enable_combined_dedupe).pack(side=tk.LEFT, padx=5) ttk.Label(dedupe_frame, text="去重表头:").pack(side=tk.LEFT, padx=5) self.combined_dedupe_header_combobox = ttk.Combobox( dedupe_frame, textvariable=self.dedupe_header, state="readonly", width=15 ) self.combined_dedupe_header_combobox.pack(side=tk.LEFT, padx=5) # 表头保留选项 (新增) frame = ttk.LabelFrame(tab, text="输出选项", padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Checkbutton(frame, text="保留表头", variable=self.retain_header_combined).grid(row=0, column=0, sticky=tk.W) # 处理按钮 btn_frame = ttk.Frame(tab) btn_frame.pack(pady=10) ttk.Button(btn_frame, text="执行组合处理并保存到桌面", command=self.combined_process, style="Accent.TButton").pack() def create_log_tab(self): """创建日志标签页""" tab = ttk.Frame(self.notebook) self.notebook.add(tab, text="运行日志") self.log_text = scrolledtext.ScrolledText(tab, height=15, width=80) self.log_text.pack(fill=tk.BOTH, expand=True, padx=5, pady=5) def log_message(self, message, level="info"): """记录日志并显示在GUI中""" log_methods = { "info": self.logger.info, "error": self.logger.error, "warning": self.logger.warning } # 记录到日志文件 log_methods.get(level, self.logger.info)(message) # 显示在GUI日志标签页 timestamp = datetime.now().strftime("%H:%M:%S") tagged_msg = f"[{timestamp}] {message}" self.log_text.insert(tk.END, tagged_msg + "\n") self.log_text.see(tk.END) # 同时在文件信息标签页显示重要信息 if level in ["error", "warning"]: self.file_info.config(state=tk.NORMAL) self.file_info.insert(tk.END, tagged_msg + "\n") self.file_info.config(state=tk.DISABLED) self.file_info.see(tk.END) def select_file(self): """选择CSV文件""" file_path = filedialog.askopenfilename( title="选择CSV文件", filetypes=[("CSV文件", "*.csv"), ("文本文件", "*.txt"), ("所有文件", "*.*")] ) if file_path: self.file_path.set(file_path) self.log_message(f"已选择文件: {file_path}") self.load_csv(file_path) def reparse_data(self): """重新解析数据(使用新的表头行)""" if not self.file_path.get(): messagebox.showwarning("警告", "请先选择CSV文件") return self.log_message(f"重新解析数据,使用表头行: {self.header_row_index.get()}") self.parse_data(self.raw_data) def load_csv(self, file_path): """加载CSV文件内容""" try: with open(file_path, 'r', encoding='utf-8-sig') as file: reader = csv.reader(file) self.raw_data = list(reader) # 保存原始数据 self.parse_data(self.raw_data) self.log_message(f"文件加载成功,共 {len(self.csv_data)} 行") except Exception as e: error_msg = f"读取CSV文件失败: {str(e)}" self.log_message(error_msg, "error") messagebox.showerror("错误", error_msg) def parse_data(self, raw_data): """解析原始数据,根据选择的表头行""" if not raw_data: return # 更新表头行选择框 row_options = list(range(len(raw_data))) self.header_row_combobox['values'] = row_options # 使用用户选择的表头行 header_index = self.header_row_index.get() if header_index < 0 or header_index >= len(raw_data): header_index = 0 self.header_row_index.set(0) # 设置表头和数据 # 表头行之前的数据保留为数据行 self.headers = raw_data[header_index] self.csv_data = raw_data[:header_index] + raw_data[header_index+1:] # 更新UI self.update_ui_with_headers() self.show_file_info(self.file_path.get()) def show_file_info(self, file_path): """显示文件信息""" self.file_info.config(state=tk.NORMAL) self.file_info.delete(1.0, tk.END) info = [ f"文件路径: {file_path}", f"总行数: {len(self.csv_data)}", f"列数: {len(self.headers)}", f"表头: {', '.join(self.headers)}", f"表头行: {self.header_row_index.get()}", "="*40, "前5行数据预览:" ] self.file_info.insert(tk.END, "\n".join(info) + "\n") # 显示前5行数据 for i, row in enumerate(self.csv_data[:5], 1): self.file_info.insert(tk.END, f"{i}. {', '.join(row)}\n") self.file_info.config(state=tk.DISABLED) def update_ui_with_headers(self): """根据加载的CSV更新UI元素""" # 更新所有下拉框 for combo in [ self.sort_header_combobox, self.dedupe_header_combobox, self.delete_column_combobox, self.merge_column_combo, self.combined_delete_column_combobox, self.combined_sort_header_combobox, self.combined_dedupe_header_combobox ]: combo['values'] = self.headers # 设置默认值 if self.headers: self.sort_header.set(self.headers[0]) self.dedupe_header.set(self.headers[0]) self.delete_column.set(self.headers[0]) self.merge_column.set("") def toggle_sort(self): """切换排序功能的启用状态""" state = "normal" if self.enable_sort.get() else "disabled" self.sort_header_combobox['state'] = state self.sort_order_combobox['state'] = state self.toggle_letter_sort() self.log_message(f"排序功能 {'启用' if self.enable_sort.get() else '禁用'}") def toggle_dedupe(self): """切换去重功能的启用状态""" state = "normal" if self.enable_dedupe.get() else "disabled" self.dedupe_header_combobox['state'] = state self.log_message(f"去重功能 {'启用' if self.enable_dedupe.get() else '禁用'}") def toggle_letter_sort(self): """控制字母范围输入框的启用状态""" if not self.enable_sort.get(): return state = "normal" if self.enable_custom_letter_sort.get() else "disabled" self.letter_range_start_entry['state'] = state self.letter_range_end_entry['state'] = state self.log_message(f"字母范围过滤 {'启用' if self.enable_custom_letter_sort.get() else '禁用'}") def add_merge_file(self): """添加要合并的文件""" file_paths = filedialog.askopenfilenames( title="选择要合并的CSV文件", filetypes=[("CSV文件", "*.csv"), ("文本文件", "*.txt"), ("所有文件", "*.*")] ) if file_paths: for path in file_paths: if path not in self.merge_file_paths: self.merge_file_paths.append(path) self.update_merge_file_list() def clear_merge_list(self): """清空合并文件列表""" if self.merge_file_paths: self.merge_file_paths = [] self.update_merge_file_list() self.log_message("已清空合并文件列表") def update_merge_file_list(self): """更新合并文件列表显示""" # 清除现有内容 for widget in self.merge_file_frame.winfo_children(): widget.destroy() if not self.merge_file_paths: ttk.Label(self.merge_file_frame, text="尚未选择任何文件").pack() self.merge_file_canvas.configure(scrollregion=self.merge_file_canvas.bbox("all")) return # 添加文件列表 for i, path in enumerate(self.merge_file_paths): row_frame = ttk.Frame(self.merge_file_frame) row_frame.pack(fill=tk.X, pady=2) ttk.Label(row_frame, text=f"{i+1}. {os.path.basename(path)}", width=40, anchor="w").pack(side=tk.LEFT) ttk.Button(row_frame, text="移除", command=lambda p=path: self.remove_merge_file(p), style="Remove.TButton").pack(side=tk.LEFT, padx=2) # 更新滚动区域 self.merge_file_frame.update_idletasks() self.merge_file_canvas.configure(scrollregion=self.merge_file_canvas.bbox("all")) def remove_merge_file(self, file_path): """移除指定的合并文件""" if file_path in self.merge_file_paths: self.merge_file_paths.remove(file_path) self.update_merge_file_list() self.log_message(f"已移除文件: {file_path}") def delete_rows(self, data, column, keyword, match_type, case_sensitive): """删除包含关键字的行(通用方法)""" if not column or not keyword or not data: return data try: col_index = self.headers.index(column) if not case_sensitive: keyword = keyword.lower() new_data = [data[0]] # 保留表头 deleted_count = 0 for row in data[1:]: if len(row) > col_index: value = row[col_index] # 根据匹配类型处理 if not case_sensitive: value = value.lower() match_found = False if match_type == "包含": match_found = keyword in value elif match_type == "精确匹配": match_found = keyword == value elif match_type == "开头是": match_found = value.startswith(keyword) elif match_type == "结尾是": match_found = value.endswith(keyword) elif match_type == "正则匹配": try: match_found = bool(re.search(keyword, value)) except re.error: self.log_message(f"正则表达式错误: {keyword}", "warning") match_found = False if not match_found: new_data.append(row) else: deleted_count += 1 self.log_message(f"删除行: 移除了 {deleted_count} 行匹配 '{keyword}' 的数据") return new_data except Exception as e: error_msg = f"删除行时出错: {str(e)}" self.log_message(error_msg, "error") messagebox.showerror("错误", error_msg) return data def sort_data(self, data, header, order, enable_letter_sort, letter_start, letter_end): """对数据进行排序(通用方法)""" if not header or not data: return data try: sort_index = self.headers.index(header) reverse = (order == "降序") # 字母范围过滤 if enable_letter_sort: try: letter_start = letter_start.upper() letter_end = letter_end.upper() if not (len(letter_start) == 1 and len(letter_end) == 1 and letter_start.isalpha() and letter_end.isalpha()): raise ValueError("字母范围必须是单个字母(如A-Z)") filtered_rows = [] for row in data[1:]: # 跳过表头 if len(row) > sort_index: value = str(row[sort_index]).strip().upper() if value and letter_start <= value[0] <= letter_end: filtered_rows.append(row) data = [data[0]] + filtered_rows self.log_message(f"字母范围过滤完成:{letter_start} 到 {letter_end}") except Exception as e: self.log_message(f"字母范围过滤失败: {str(e)}", "error") messagebox.showerror("错误", f"字母范围过滤失败: {str(e)}") return data # 排序逻辑 def sort_key(row): if len(row) > sort_index: value = row[sort_index] # 尝试解析为日期 for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%m/%d/%Y", "%Y.%m.%d"): try: return datetime.strptime(value, fmt) except ValueError: continue # 尝试解析为数字 try: return float(value) except ValueError: pass return value.lower() # 默认按字符串排序 return "" # 空值处理 # 执行排序 if order == "自定义字母排序": data[1:] = sorted( data[1:], key=lambda x: str(sort_key(x)).lower() if len(x) > sort_index else "", reverse=False ) else: data[1:] = sorted(data[1:], key=sort_key, reverse=reverse) self.log_message(f"排序完成,表头 '{header}',顺序: {order}") return data except Exception as e: self.log_message(f"排序时出错: {str(e)}", "error") messagebox.showerror("错误", f"排序时出错: {str(e)}") return data def dedupe_data(self, data, header): """对数据进行去重(通用方法)""" if not header or not data: return data try: dedupe_index = self.headers.index(header) seen = set() unique_rows = [data[0]] # 保留表头 for row in data[1:]: if len(row) > dedupe_index: key = row[dedupe_index] if key not in seen: seen.add(key) unique_rows.append(row) self.log_message( f"去重完成,根据表头 '{header}' 删除重复项," f"原始行数: {len(data)},去重后行数: {len(unique_rows)}" ) return unique_rows except Exception as e: self.log_message(f"去重时出错: {str(e)}", "error") messagebox.showerror("错误", f"去重时出错: {str(e)}") return data def delete_rows_with_keyword(self): """删除包含关键字的行并保存到桌面""" if not self.file_path.get(): messagebox.showwarning("警告", "请先选择CSV文件") return column = self.delete_column.get() keyword = self.delete_keyword.get() if not column: messagebox.showwarning("警告", "请选择要搜索的列") return if not keyword: messagebox.showwarning("警告", "请输入要搜索的关键字") return try: # 执行删除 processed_data = self.delete_rows( self.csv_data, column, keyword, self.delete_match_type.get(), # 使用匹配类型 self.delete_case_sensitive.get() ) # 生成保存路径 operation = f"deleted_{keyword}" save_path = self.generate_filename(self.file_path.get(), operation) # 保存文件 if self.save_csv_file( processed_data, save_path, retain_header=self.retain_header_delete.get() # 使用保留表头选项 ): # 更新当前数据 self.csv_data = processed_data messagebox.showinfo("成功", f"结果已保存到桌面:\n{os.path.basename(save_path)}") except Exception as e: error_msg = f"删除行时出错: {str(e)}" self.log_message(error_msg, "error") messagebox.showerror("错误", error_msg) def get_desktop_path(self): """获取桌面路径""" try: desktop = os.path.join(os.path.join(os.environ['USERPROFILE']), 'Desktop') if os.path.exists(desktop): return desktop except KeyError: pass # 如果上面的方法失败,尝试其他方法 desktop = os.path.join(os.path.expanduser('~'), 'Desktop') if os.path.exists(desktop): return desktop # 如果还是失败,返回当前目录 return os.getcwd() def generate_filename(self, original_name, operation): """生成新的文件名""" if not original_name: original_name = "processed" base = os.path.basename(original_name) name, ext = os.path.splitext(base) # 清理操作名称中的特殊字符 clean_op = "".join(c if c.isalnum() else "_" for c in operation) timestamp = datetime.now().strftime("%Y%m%d%H%M%S") new_name = f"{name}_{clean_op}_{timestamp}.csv" return os.path.join(self.get_desktop_path(), new_name) def save_csv_file(self, data, save_path, retain_header=True): """保存CSV文件到指定路径""" try: with open(save_path, 'w', encoding='utf-8-sig', newline='') as file: writer = csv.writer(file) # 根据设置决定是否写入表头 if retain_header and self.headers: writer.writerow(self.headers) # 写入数据行 for row in data[1:]: writer.writerow(row) # 更新文件信息显示 self.show_file_info(save_path) self.log_message(f"文件已保存到: {save_path}") return True except Exception as e: error_msg = f"保存文件时出错: {str(e)}" self.log_message(error_msg, "error") messagebox.showerror("错误", error_msg) return False def process_csv(self): """处理CSV文件(排序、去重等)并保存到桌面""" if not self.file_path.get(): messagebox.showwarning("警告", "请先选择CSV文件") return if not self.csv_data: messagebox.showwarning("警告", "CSV文件没有数据") return self.log_message("开始处理CSV文件...") processed_data = self.csv_data.copy() # 去重处理 if self.enable_dedupe.get(): processed_data = self.dedupe_data( processed_data, self.dedupe_header.get() ) # 排序处理 if self.enable_sort.get(): processed_data = self.sort_data( processed_data, self.sort_header.get(), self.sort_order.get(), self.enable_custom_letter_sort.get(), self.letter_range_start.get(), self.letter_range_end.get() ) # 生成操作描述 operations = [] if self.enable_sort.get(): operations.append(f"sorted_{self.sort_header.get()}{self.sort_order.get()}") if self.enable_dedupe.get(): operations.append(f"deduped") operation = "_".join(operations) if operations else "processed" # 生成保存路径 save_path = self.generate_filename(self.file_path.get(), operation) # 保存文件 if self.save_csv_file( processed_data, save_path, retain_header=self.retain_header_sort.get() # 使用保留表头选项 ): # 更新当前数据 self.csv_data = processed_data messagebox.showinfo("成功", f"文件处理完成,已保存到桌面:\n{os.path.basename(save_path)}") def combined_process(self): """组合处理:删除行 -> 排序 -> 去重""" if not self.file_path.get(): messagebox.showwarning("警告", "请先选择CSV文件") return if not self.csv_data: messagebox.showwarning("警告", "CSV文件没有数据") return self.log_message("开始组合处理CSV文件...") processed_data = self.csv_data.copy() operations = [] # 1. 删除行 if self.enable_delete.get(): column = self.delete_column.get() keyword = self.delete_keyword.get() if column and keyword: processed_data = self.delete_rows( processed_data, column, keyword, self.delete_match_type.get(), # 使用匹配类型 self.delete_case_sensitive.get() ) operations.append(f"deleted") # 2. 排序 if self.enable_combined_sort.get(): header = self.sort_header.get() order = self.sort_order.get() if header: processed_data = self.sort_data( processed_data, header, order, self.enable_custom_letter_sort.get(), self.letter_range_start.get(), self.letter_range_end.get() ) operations.append(f"sorted_{header}{order}") # 3. 去重 if self.enable_combined_dedupe.get(): header = self.dedupe_header.get() if header: processed_data = self.dedupe_data( processed_data, header ) operations.append(f"deduped") # 生成操作描述 operation = "combined_" + "_".join(operations) if operations else "combined_processed" # 生成保存路径 save_path = self.generate_filename(self.file_path.get(), operation) # 保存文件 if self.save_csv_file( processed_data, save_path, retain_header=self.retain_header_combined.get() # 使用保留表头选项 ): # 更新当前数据 self.csv_data = processed_data messagebox.showinfo("成功", f"组合处理完成,已保存到桌面:\n{os.path.basename(save_path)}") def merge_csv_files(self): """合并多个CSV文件并保存到桌面""" if not self.merge_file_paths: messagebox.showwarning("警告", "请先添加要合并的文件") return try: # 检查所有文件是否存在 missing_files = [f for f in self.merge_file_paths if not os.path.exists(f)] if missing_files: raise FileNotFoundError(f"以下文件不存在: {', '.join(missing_files)}") merge_column = self.merge_column.get() common_headers = None all_data = [] # 收集所有文件的表头和数据 header_sets = [] for file_path in self.merge_file_paths: with open(file_path, 'r', encoding='utf-8-sig') as file: reader = csv.reader(file) data = list(reader) if data: header_sets.append(set(data[0])) all_data.append(data) # 找出共同表头 if header_sets: common_headers = set(header_sets[0]) for headers in header_sets[1:]: common_headers.intersection_update(headers) common_headers = sorted(common_headers) if not common_headers: raise ValueError("选中的文件没有共同的列,无法合并") # 如果没有指定合并依据列,使用所有共同列 merge_indices = None if merge_column: if merge_column not in common_headers: raise ValueError(f"合并依据列 '{merge_column}' 不在共同列中") merge_indices = [i for i, h in enumerate(common_headers) if h == merge_column] # 合并数据 merged_data = [common_headers.copy()] # 表头行 key_counter = defaultdict(int) for data in all_data: if not data: continue headers = data[0] header_map = {h: i for i, h in enumerate(headers)} for row in data[1:]: # 如果指定了合并列,检查是否已存在相同键 if merge_indices: merge_values = [row[header_map[h]] for h in common_headers if h == merge_column] if merge_values: key = tuple(merge_values) key_counter[key] += 1 if key_counter[key] > 1: continue # 跳过重复键的行 # 构建新行,只保留共同列 new_row = [] for col in common_headers: if col in header_map and len(row) > header_map[col]: new_row.append(row[header_map[col]]) else: new_row.append("") merged_data.append(new_row) # 生成操作描述 operation = "merged" if merge_column: operation += f"by_{merge_column}" # 生成保存路径 first_file = os.path.basename(self.merge_file_paths[0]) save_path = self.generate_filename(first_file, operation) # 保存文件 if self.save_csv_file( merged_data[1:], # 跳过表头行 save_path, retain_header=self.retain_header_merge.get() # 使用保留表头选项 ): messagebox.showinfo("成功", f"文件合并完成,已保存到桌面:\n{os.path.basename(save_path)}") except Exception as e: error_msg = f"合并文件时出错: {str(e)}" self.log_message(error_msg, "error") messagebox.showerror("错误", error_msg) if __name__ == "__main__": root = tk.Tk() app = CSVProcessorApp(root) root.mainloop() 优化检查这个代码, 1.保留表头指的是 保留一开始所选择的初始csv原数据的默认表头即第一行数据,和用户自定义设定的表头数据,还有这两者之间的数据,一共3种 2.删除功能重新梳理,只保留关键字精准搜索功能,目前代码执行完删除功能后的数据执行删除功能不成功,无法删除关键字所在行数据 按照以上要求输出一份修改后的完整代码,不要省略
07-11
import tkinter as tk from tkinter import filedialog, messagebox, ttk, scrolledtext import csv from datetime import datetime import logging import os from collections import defaultdict class CSVProcessorApp: def init(self, root): self.root = root self.root.title(“CSV_ProcessPro”) self.root.geometry(“800x600”) self.root.resizable(False, False) 初始化变量 self.file_path = tk.StringVar() self.csv_data = [] self.headers = [] self.raw_data = [] # 存储原始数据 self.header_row_index = tk.IntVar(value=0) # 表头行索引 self.setup_variables() self.setup_logging() self.create_widgets() self.setup_styles() def setup_styles(self): “”“设置全局样式”“” self.style = ttk.Style() self.style.configure(“TFrame”, background=“#f0f0f0”) self.style.configure(“TLabel”, background=“#f0f0f0”, font=(‘Arial’, 9)) self.style.configure(“TButton”, font=(‘Arial’, 9, ‘bold’)) self.style.configure(“Accent.TButton”, foreground=“black”, font=(‘Arial’, 9, ‘bold’), borderwidth=2, relief=“raised”) self.style.map(“Accent.TButton”, background=[(“active”, “#4a90e2”), (“!active”, “#d4e6ff”)], bordercolor=[(“active”, “#4a90e2”), (“!active”, “#ffcc00”)]) self.style.configure(“Remove.TButton”, foreground=“black”, font=(‘Arial’, 8), background=“#ffcccc”, borderwidth=1, relief=“solid”) self.style.map(“Remove.TButton”, background=[(“active”, “#ff9999”), (“!active”, “#ffcccc”)]) self.style.configure(“Header.TCombobox”, font=(‘Arial’, 9)) def setup_variables(self): “”“初始化所有动态变量”“” # 排序相关 self.sort_header = tk.StringVar() self.sort_order = tk.StringVar(value=“升序”) # 去重相关 self.dedupe_header = tk.StringVar() # 删除行相关 self.delete_keyword = tk.StringVar() self.delete_column = tk.StringVar() self.delete_case_sensitive = tk.BooleanVar() # 合并文件相关 self.merge_file_paths = [] self.merge_column = tk.StringVar() # 状态变量 self.enable_sort = tk.BooleanVar() self.enable_dedupe = tk.BooleanVar() self.enable_custom_letter_sort = tk.BooleanVar() self.letter_range_start = tk.StringVar(value=“A”) self.letter_range_end = tk.StringVar(value=“Z”) # 组合处理相关 self.enable_delete = tk.BooleanVar(value=True) self.enable_combined_sort = tk.BooleanVar(value=True) self.enable_combined_dedupe = tk.BooleanVar(value=True) def setup_logging(self): “”“配置日志记录”“” logging.basicConfig( level=logging.INFO, format=‘%(asctime)s - %(levelname)s - %(message)s’, handlers=[ logging.FileHandler(‘csv_processor.log’, encoding=‘utf-8’), logging.StreamHandler() ] ) self.logger = logging.getLogger(name) self.logger.info(“===== 程序启动 =====”) def create_widgets(self): “”“创建所有界面组件”“” # 主容器 main_container = ttk.Frame(self.root, padding=5) main_container.pack(fill=tk.BOTH, expand=True) # 使用notebook分页组织功能 self.notebook = ttk.Notebook(main_container) self.notebook.pack(fill=tk.BOTH, expand=True) # 创建各个标签页 self.create_file_tab() self.create_process_tab() self.create_delete_tab() self.create_merge_tab() self.create_combined_tab() # 新增组合处理标签页 self.create_log_tab() def create_file_tab(self): “”“创建文件操作标签页”“” tab = ttk.Frame(self.notebook) self.notebook.add(tab, text=“文件操作”) # 文件选择部分 frame = ttk.LabelFrame(tab, text=“CSV文件选择”, padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Label(frame, text=“文件路径:”).grid(row=0, column=0, sticky=tk.W) ttk.Entry(frame, textvariable=self.file_path, width=40).grid(row=0, column=1, sticky=tk.EW) ttk.Button(frame, text=“浏览”, command=self.select_file).grid(row=0, column=2, padx=5) # 表头行选择 ttk.Label(frame, text=“表头选择:”).grid(row=1, column=0, sticky=tk.W) self.header_row_combobox = ttk.Combobox( frame, textvariable=self.header_row_index, state=“readonly”, width=5, style=“Header.TCombobox” ) self.header_row_combobox.grid(row=1, column=1, sticky=tk.W) ttk.Label(frame, text=“(0表示第一行)”).grid(row=1, column=2, sticky=tk.W) # 重新解析按钮 ttk.Button(frame, text=“重新解析”, command=self.reparse_data, style=“Accent.TButton”).grid(row=1, column=3, padx=5) # 文件信息显示 self.file_info = scrolledtext.ScrolledText(tab, height=8, width=80) self.file_info.pack(fill=tk.X, padx=5, pady=5) def create_process_tab(self): “”“创建数据处理标签页”“” tab = ttk.Frame(self.notebook) self.notebook.add(tab, text=“排序/去重”) # 排序选项部分 frame = ttk.LabelFrame(tab, text=“排序选项”, padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Checkbutton(frame, text=“启用排序”, variable=self.enable_sort, command=self.toggle_sort).grid(row=0, column=0, sticky=tk.W) ttk.Label(frame, text=“排序表头:”).grid(row=1, column=0, sticky=tk.W) self.sort_header_combobox = ttk.Combobox(frame, textvariable=self.sort_header, state=“readonly”) self.sort_header_combobox.grid(row=1, column=1, sticky=tk.EW) ttk.Label(frame, text=“排序方式:”).grid(row=2, column=0, sticky=tk.W) self.sort_order_combobox = ttk.Combobox( frame, textvariable=self.sort_order, values=[“升序”, “降序”, “自定义字母排序”] ) self.sort_order_combobox.grid(row=2, column=1, sticky=tk.W) # 自定义字母排序范围 ttk.Checkbutton(frame, text=“启用字母范围过滤”, variable=self.enable_custom_letter_sort, command=self.toggle_letter_sort).grid(row=3, column=0, sticky=tk.W) ttk.Label(frame, text=“字母范围:”).grid(row=4, column=0, sticky=tk.W) self.letter_range_start_entry = ttk.Entry(frame, textvariable=self.letter_range_start, width=5) self.letter_range_start_entry.grid(row=4, column=1, sticky=tk.W) ttk.Label(frame, text=“到”).grid(row=4, column=2) self.letter_range_end_entry = ttk.Entry(frame, textvariable=self.letter_range_end, width=5) self.letter_range_end_entry.grid(row=4, column=3, sticky=tk.W) # 去重选项部分 frame = ttk.LabelFrame(tab, text=“去重选项”, padding=10) frame.pack(fill=tk.X, padx=5, pady=5) ttk.Checkbutton(frame, text=“启用去重”, variable=self.enable_dedupe, command=self.toggle_dedupe).grid(row=0, column=0, sticky=tk.W) ttk.Label(frame, text=“去重表头:”).grid(row=1, column=0, sticky=tk.W) self.dedupe_header_combobox = ttk.Combobox(frame, textvariable=self.dedupe_header, state=“readonly”) self.dedupe_header_combobox.grid(row=1, column=1, sticky=tk.EW) # 处理按钮 btn_frame = ttk.Frame(tab) btn_frame.pack(pady=10) ttk.Button(btn_frame, text=“处理并保存到桌面”, command=self.process_csv, style=“Accent.TButton”).pack() def create_delete_tab(self): “”“创建删除行标签页”“” tab = ttk.Frame(self.notebook) self.notebook.add(tab, text=“删除行”) frame = ttk.LabelFrame(tab, text=“删除包含指定字符的行”, padding=10) frame.pack(fill=tk.X, padx=5, pady=5) # 删除条件设置 ttk.Label(frame, text=“搜索列:”).grid(row=0, column=0, sticky=tk.W) self.delete_column_combobox = ttk.Combobox(frame, textvariable=self.delete_column, state=“readonly”) self.delete_column_combobox.grid(row=0, column=1, sticky=tk.EW) ttk.Label(frame, text=“关键字:”).grid(row=1, column=0, sticky=tk.W) ttk.Entry(frame, textvariable=self.delete_keyword).grid(row=1, column=1, sticky=tk.EW) ttk.Checkbutton(frame, text=“区分大小写”, variable=self.delete_case_sensitive).grid(row=2, column=0, sticky=tk.W) # 执行按钮 btn_frame = ttk.Frame(tab) btn_frame.pack(pady=10) ttk.Button(btn_frame, text=“执行删除并保存到桌面”, command=self.delete_rows_with_keyword, style=“Accent.TButton”).pack() def create_merge_tab(self): “”“创建文件合并标签页”“” tab = ttk.Frame(self.notebook) self.notebook.add(tab, text=“文件合并”) # 合并文件部分 frame = ttk.LabelFrame(tab, text=“合并CSV文件”, padding=10) frame.pack(fill=tk.BOTH, expand=True, padx=5, pady=5) # 文件列表容器 list_frame = ttk.Frame(frame) list_frame.pack(fill=tk.BOTH, expand=True) ttk.Label(list_frame, text=“已选择文件:”).grid(row=0, column=0, sticky=tk.W) # 文件列表和滚动条 self.merge_file_canvas = tk.Canvas(list_frame, height=150) self.merge_file_canvas.grid(row=1, column=0, sticky=tk.EW) scrollbar = ttk.Scrollbar(list_frame, orient=“vertical”, command=self.merge_file_canvas.yview) scrollbar.grid(row=1, column=1, sticky=tk.NS) self.merge_file_canvas.configure(yscrollcommand=scrollbar.set) self.merge_file_frame = ttk.Frame(self.merge_file_canvas) self.merge_file_canvas.create_window((0, 0), window=self.merge_file_frame, anchor=“nw”) # 按钮区域 btn_frame = ttk.Frame(frame) btn_frame.pack(fill=tk.X, pady=5) ttk.Button(btn_frame, text=“添加文件”, command=self.add_merge_file).pack(side=tk.LEFT, padx=5) ttk.Button(btn_frame, text=“清空列表”, command=self.clear_merge_list).pack(side=tk.LEFT, padx=5) # 合并选项 opt_frame = ttk.Frame(frame) opt_frame.pack(fill=tk.X, pady=5) ttk.Label(opt_frame, text=“合并依据列(可选)😊.grid(row=0, column=0, sticky=tk.W) self.merge_column_combo = ttk.Combobox(opt_frame, textvariable=self.merge_column, state=“readonly”) self.merge_column_combo.grid(row=0, column=1, sticky=tk.EW) # 合并按钮 btn_frame = ttk.Frame(tab) btn_frame.pack(pady=10) ttk.Button(btn_frame, text=“执行合并并保存到桌面”, command=self.merge_csv_files, style=“Accent.TButton”).pack() def create_combined_tab(self): “”“创建组合处理标签页””" tab = ttk.Frame(self.notebook) self.notebook.add(tab, text=“组合处理”) # 组合处理选项 frame = ttk.LabelFrame(tab, text=“组合处理选项”, padding=10) frame.pack(fill=tk.X, padx=5, pady=5) # 删除行选项 delete_frame = ttk.Frame(frame) delete_frame.pack(fill=tk.X, pady=5) ttk.Checkbutton(delete_frame, text=“启用删除行”, variable=self.enable_delete).pack(side=tk.LEFT, padx=5) ttk.Label(delete_frame, text=“搜索列:”).pack(side=tk.LEFT, padx=5) self.combined_delete_column_combobox = ttk.Combobox( delete_frame, textvariable=self.delete_column, state=“readonly”, width=15 ) self.combined_delete_column_combobox.pack(side=tk.LEFT, padx=5) ttk.Label(delete_frame, text=“关键字:”).pack(side=tk.LEFT, padx=5) ttk.Entry(delete_frame, textvariable=self.delete_keyword, width=15).pack(side=tk.LEFT, padx=5) ttk.Checkbutton(delete_frame, text=“区分大小写”, variable=self.delete_case_sensitive).pack(side=tk.LEFT, padx=5) # 排序选项 sort_frame = ttk.Frame(frame) sort_frame.pack(fill=tk.X, pady=5) ttk.Checkbutton(sort_frame, text=“启用排序”, variable=self.enable_combined_sort).pack(side=tk.LEFT, padx=5) ttk.Label(sort_frame, text=“排序表头:”).pack(side=tk.LEFT, padx=5) self.combined_sort_header_combobox = ttk.Combobox( sort_frame, textvariable=self.sort_header, state=“readonly”, width=15 ) self.combined_sort_header_combobox.pack(side=tk.LEFT, padx=5) ttk.Label(sort_frame, text=“排序方式:”).pack(side=tk.LEFT, padx=5) self.combined_sort_order_combobox = ttk.Combobox( sort_frame, textvariable=self.sort_order, values=[“升序”, “降序”, “自定义字母排序”], width=15 ) self.combined_sort_order_combobox.pack(side=tk.LEFT, padx=5) # 自定义字母排序范围(新增) ttk.Checkbutton( sort_frame, text=“启用字母范围”, variable=self.enable_custom_letter_sort ).pack(side=tk.LEFT, padx=5) ttk.Label(sort_frame, text=“从”).pack(side=tk.LEFT, padx=5) ttk.Entry(sort_frame, textvariable=self.letter_range_start, width=3).pack(side=tk.LEFT) ttk.Label(sort_frame, text=“到”).pack(side=tk.LEFT, padx=5) ttk.Entry(sort_frame, textvariable=self.letter_range_end, width=3).pack(side=tk.LEFT) # 去重选项 dedupe_frame = ttk.Frame(frame) dedupe_frame.pack(fill=tk.X, pady=5) ttk.Checkbutton(dedupe_frame, text=“启用去重”, variable=self.enable_combined_dedupe).pack(side=tk.LEFT, padx=5) ttk.Label(dedupe_frame, text=“去重表头:”).pack(side=tk.LEFT, padx=5) self.combined_dedupe_header_combobox = ttk.Combobox( dedupe_frame, textvariable=self.dedupe_header, state=“readonly”, width=15 ) self.combined_dedupe_header_combobox.pack(side=tk.LEFT, padx=5) # 处理按钮 btn_frame = ttk.Frame(tab) btn_frame.pack(pady=10) ttk.Button(btn_frame, text=“执行组合处理并保存到桌面”, command=self.combined_process, style=“Accent.TButton”).pack() def create_log_tab(self): “”“创建日志标签页”“” tab = ttk.Frame(self.notebook) self.notebook.add(tab, text=“运行日志”) self.log_text = scrolledtext.ScrolledText(tab, height=15, width=80) self.log_text.pack(fill=tk.BOTH, expand=True, padx=5, pady=5) def log_message(self, message, level=“info”): “”“记录日志并显示在GUI中”“” log_methods = { “info”: self.logger.info, “error”: self.logger.error, “warning”: self.logger.warning } # 记录到日志文件 log_methods.get(level, self.logger.info)(message) # 显示在GUI日志标签页 timestamp = datetime.now().strftime(“%H:%M:%S”) tagged_msg = f"[{timestamp}] {message}" self.log_text.insert(tk.END, tagged_msg + “\n”) self.log_text.see(tk.END) # 同时在文件信息标签页显示重要信息 if level in [“error”, “warning”]: self.file_info.config(state=tk.NORMAL) self.file_info.insert(tk.END, tagged_msg + “\n”) self.file_info.config(state=tk.DISABLED) self.file_info.see(tk.END) def select_file(self): “”“选择CSV文件”“” file_path = filedialog.askopenfilename( title=“选择CSV文件”, filetypes=[(“CSV文件”, “.csv"), (“文本文件”, ".txt”), (“所有文件”, “.”)] ) if file_path: self.file_path.set(file_path) self.log_message(f"已选择文件: {file_path}“) self.load_csv(file_path) def reparse_data(self): “”“重新解析数据(使用新的表头行)””" if not self.file_path.get(): messagebox.showwarning(“警告”, “请先选择CSV文件”) return self.log_message(f"重新解析数据,使用表头行: {self.header_row_index.get()}“) self.parse_data(self.raw_data) def load_csv(self, file_path): “”“加载CSV文件内容””" try: with open(file_path, ‘r’, encoding=‘utf-8-sig’) as file: reader = csv.reader(file) self.raw_data = list(reader) # 保存原始数据 self.parse_data(self.raw_data) self.log_message(f"文件加载成功,共 {len(self.csv_data)} 行") except Exception as e: error_msg = f"读取CSV文件失败: {str(e)}" self.log_message(error_msg, “error”) messagebox.showerror(“错误”, error_msg) def parse_data(self, raw_data): “”“解析原始数据,根据选择的表头行”“” if not raw_data: return # 更新表头行选择框 row_options = list(range(len(raw_data))) self.header_row_combobox[‘values’] = row_options # 使用用户选择的表头行 header_index = self.header_row_index.get() if header_index < 0 or header_index >= len(raw_data): header_index = 0 self.header_row_index.set(0) # 设置表头和数据 # 表头行之前的数据保留为数据行 self.headers = raw_data[header_index] self.csv_data = raw_data[:header_index] + raw_data[header_index+1:] # 更新UI self.update_ui_with_headers() self.show_file_info(self.file_path.get()) def show_file_info(self, file_path): “”“显示文件信息”“” self.file_info.config(state=tk.NORMAL) self.file_info.delete(1.0, tk.END) info = [ f"文件路径: {file_path}“, f"总行数: {len(self.csv_data)}”, f"列数: {len(self.headers)}“, f"表头: {', '.join(self.headers)}”, f"表头行: {self.header_row_index.get()}“, “=“40, “前5行数据预览:” ] self.file_info.insert(tk.END, “\n”.join(info) + “\n”) # 显示前5行数据 for i, row in enumerate(self.csv_data[:5], 1): self.file_info.insert(tk.END, f"{i}. {', '.join(row)}\n") self.file_info.config(state=tk.DISABLED) def update_ui_with_headers(self): “”“根据加载的CSV更新UI元素”“” # 更新所有下拉框 for combo in [ self.sort_header_combobox, self.dedupe_header_combobox, self.delete_column_combobox, self.merge_column_combo, self.combined_delete_column_combobox, self.combined_sort_header_combobox, self.combined_dedupe_header_combobox ]: combo[‘values’] = self.headers # 设置默认值 if self.headers: self.sort_header.set(self.headers[0]) self.dedupe_header.set(self.headers[0]) self.delete_column.set(self.headers[0]) self.merge_column.set(“”) def toggle_sort(self): “”“切换排序功能的启用状态”“” state = “normal” if self.enable_sort.get() else “disabled” self.sort_header_combobox[‘state’] = state self.sort_order_combobox[‘state’] = state self.toggle_letter_sort() self.log_message(f"排序功能 {‘启用’ if self.enable_sort.get() else ‘禁用’}“) def toggle_dedupe(self): “”“切换去重功能的启用状态””" state = “normal” if self.enable_dedupe.get() else “disabled” self.dedupe_header_combobox[‘state’] = state self.log_message(f"去重功能 {‘启用’ if self.enable_dedupe.get() else ‘禁用’}“) def toggle_letter_sort(self): “”“控制字母范围输入框的启用状态””" if not self.enable_sort.get(): return state = “normal” if self.enable_custom_letter_sort.get() else “disabled” self.letter_range_start_entry[‘state’] = state self.letter_range_end_entry[‘state’] = state self.log_message(f"字母范围过滤 {‘启用’ if self.enable_custom_letter_sort.get() else ‘禁用’}“) def add_merge_file(self): “”“添加要合并的文件””" file_paths = filedialog.askopenfilenames( title=“选择要合并的CSV文件”, filetypes=[(“CSV文件”, ".csv”), (“文本文件”, “.txt"), (“所有文件”, ".*”)] ) if file_paths: for path in file_paths: if path not in self.merge_file_paths: self.merge_file_paths.append(path) self.update_merge_file_list() def clear_merge_list(self): “”“清空合并文件列表””” if self.merge_file_paths: self.merge_file_paths = [] self.update_merge_file_list() self.log_message(“已清空合并文件列表”) def update_merge_file_list(self): “”“更新合并文件列表显示”“” # 清除现有内容 for widget in self.merge_file_frame.winfo_children(): widget.destroy() if not self.merge_file_paths: ttk.Label(self.merge_file_frame, text=“尚未选择任何文件”).pack() self.merge_file_canvas.configure(scrollregion=self.merge_file_canvas.bbox(“all”)) return # 添加文件列表 for i, path in enumerate(self.merge_file_paths): row_frame = ttk.Frame(self.merge_file_frame) row_frame.pack(fill=tk.X, pady=2) ttk.Label(row_frame, text=f"{i+1}. {os.path.basename(path)}“, width=40, anchor=“w”).pack(side=tk.LEFT) ttk.Button(row_frame, text=“移除”, command=lambda p=path: self.remove_merge_file(p), style=“Remove.TButton”).pack(side=tk.LEFT, padx=2) # 更新滚动区域 self.merge_file_frame.update_idletasks() self.merge_file_canvas.configure(scrollregion=self.merge_file_canvas.bbox(“all”)) def remove_merge_file(self, file_path): “”“移除指定的合并文件””" if file_path in self.merge_file_paths: self.merge_file_paths.remove(file_path) self.update_merge_file_list() self.log_message(f"已移除文件: {file_path}“) def delete_rows(self, data, column, keyword, case_sensitive): “”“删除包含关键字的行(通用方法)””" if not column or not keyword or not data: return data try: col_index = self.headers.index(column) if not case_sensitive: keyword = keyword.lower() new_data = [data[0]] # 保留表头 deleted_count = 0 for row in data[1:]: if len(row) > col_index: value = row[col_index] compare_value = value if case_sensitive else value.lower() if keyword not in compare_value: new_data.append(row) else: deleted_count += 1 self.log_message(f"删除行: 移除了 {deleted_count} 行包含 ‘{keyword}’ 的数据") return new_data except Exception as e: error_msg = f"删除行时出错: {str(e)}" self.log_message(error_msg, “error”) messagebox.showerror(“错误”, error_msg) return data def sort_data(self, data, header, order, enable_letter_sort, letter_start, letter_end): “”“对数据进行排序(通用方法)”“” if not header or not data: return data try: sort_index = self.headers.index(header) reverse = (order == “降序”) # 字母范围过滤 if enable_letter_sort: try: letter_start = letter_start.upper() letter_end = letter_end.upper() if not (len(letter_start) == 1 and len(letter_end) == 1 and letter_start.isalpha() and letter_end.isalpha()): raise ValueError(“字母范围必须是单个字母(如A-Z)”) filtered_rows = [] for row in data[1:]: # 跳过表头 if len(row) > sort_index: value = str(row[sort_index]).strip().upper() if value and letter_start <= value[0] <= letter_end: filtered_rows.append(row) data = [data[0]] + filtered_rows self.log_message(f"字母范围过滤完成:{letter_start} 到 {letter_end}“) except Exception as e: self.log_message(f"字母范围过滤失败: {str(e)}”, “error”) messagebox.showerror(“错误”, f"字母范围过滤失败: {str(e)}“) return data # 排序逻辑 def sort_key(row): if len(row) > sort_index: value = row[sort_index] # 尝试解析为日期 for fmt in (”%Y-%m-%d %H:%M:%S", “%Y-%m-%d”, “%m/%d/%Y”, “%Y.%m.%d”): try: return datetime.strptime(value, fmt) except ValueError: continue # 尝试解析为数字 try: return float(value) except ValueError: pass return value.lower() # 默认按字符串排序 return “” # 执行排序 if order == “自定义字母排序”: data[1:] = sorted( data[1:], key=lambda x: str(sort_key(x)).lower() if len(x) > sort_index else “”, reverse=False ) else: data[1:] = sorted(data[1:], key=sort_key, reverse=reverse) self.log_message(f"排序完成,表头 ‘{header}’,顺序: {order}“) return data except Exception as e: self.log_message(f"排序时出错: {str(e)}”, “error”) messagebox.showerror(“错误”, f"排序时出错: {str(e)}“) return data def dedupe_data(self, data, header): “”“对数据进行去重(通用方法)””" if not header or not data: return data try: dedupe_index = self.headers.index(header) seen = set() unique_rows = [data[0]] # 保留表头 for row in data[1:]: if len(row) > dedupe_index: key = row[dedupe_index] if key not in seen: seen.add(key) unique_rows.append(row) self.log_message( f"去重完成,根据表头 ‘{header}’ 删除重复项," f"原始行数: {len(data)},去重后行数: {len(unique_rows)}" ) return unique_rows except Exception as e: self.log_message(f"去重时出错: {str(e)}“, “error”) messagebox.showerror(“错误”, f"去重时出错: {str(e)}”) return data def delete_rows_with_keyword(self): “”“删除包含关键字的行并保存到桌面”“” if not self.file_path.get(): messagebox.showwarning(“警告”, “请先选择CSV文件”) return column = self.delete_column.get() keyword = self.delete_keyword.get() if not column: messagebox.showwarning(“警告”, “请选择要搜索的列”) return if not keyword: messagebox.showwarning(“警告”, “请输入要搜索的关键字”) return try: # 执行删除 processed_data = self.delete_rows( self.csv_data, column, keyword, self.delete_case_sensitive.get() ) # 生成保存路径 operation = f"deleted_{keyword}" save_path = self.generate_filename(self.file_path.get(), operation) # 保存文件 if self.save_csv_file(processed_data, save_path): # 更新当前数据 self.csv_data = processed_data messagebox.showinfo(“成功”, f"结果已保存到桌面:\n{os.path.basename(save_path)}“) except Exception as e: error_msg = f"删除行时出错: {str(e)}” self.log_message(error_msg, “error”) messagebox.showerror(“错误”, error_msg) def get_desktop_path(self): “”“获取桌面路径”“” try: desktop = os.path.join(os.path.join(os.environ[‘USERPROFILE’]), ‘Desktop’) if os.path.exists(desktop): return desktop except KeyError: pass # 如果上面的方法失败,尝试其他方法 desktop = os.path.join(os.path.expanduser(‘~’), ‘Desktop’) if os.path.exists(desktop): return desktop # 如果还是失败,返回当前目录 return os.getcwd() def generate_filename(self, original_name, operation): “”“生成新的文件名”“” if not original_name: original_name = “processed” base = os.path.basename(original_name) name, ext = os.path.splitext(base) # 清理操作名称中的特殊字符 clean_op = “”.join(c if c.isalnum() else “" for c in operation) timestamp = datetime.now().strftime("%Y%m%d%H%M%S”) new_name = f"{name}{clean_op}" return os.path.join(self.get_desktop_path(), new_name) def save_csv_file(self, data, save_path): “”“保存CSV文件到指定路径”“” try: with open(save_path, ‘w’, encoding=‘utf-8-sig’, newline=‘’) as file: writer = csv.writer(file) writer.writerows(data) # 更新文件信息显示 self.show_file_info(save_path) self.log_message(f"文件已保存到: {save_path}“) return True except Exception as e: error_msg = f"保存文件时出错: {str(e)}” self.log_message(error_msg, “error”) messagebox.showerror(“错误”, error_msg) return False def process_csv(self): “”“处理CSV文件(排序、去重等)并保存到桌面”“” if not self.file_path.get(): messagebox.showwarning(“警告”, “请先选择CSV文件”) return if not self.csv_data: messagebox.showwarning(“警告”, “CSV文件没有数据”) return self.log_message(“开始处理CSV文件…”) processed_data = self.csv_data.copy() # 去重处理 if self.enable_dedupe.get(): processed_data = self.dedupe_data( processed_data, self.dedupe_header.get() ) # 排序处理 if self.enable_sort.get(): processed_data = self.sort_data( processed_data, self.sort_header.get(), self.sort_order.get(), self.enable_custom_letter_sort.get(), self.letter_range_start.get(), self.letter_range_end.get() ) # 生成操作描述 operations = [] if self.enable_sort.get(): operations.append(f"sorted_{self.sort_header.get()}{self.sort_order.get()}") if self.enable_dedupe.get(): operations.append(f"deduped") operation = ““.join(operations) if operations else “processed” # 生成保存路径 save_path = self.generate_filename(self.file_path.get(), operation) # 保存文件 if self.save_csv_file(processed_data, save_path): # 更新当前数据 self.csv_data = processed_data messagebox.showinfo(“成功”, f"文件处理完成,已保存到桌面:\n{os.path.basename(save_path)}”) def combined_process(self): “”“组合处理:删除行 -> 排序 -> 去重”“” if not self.file_path.get(): messagebox.showwarning(“警告”, “请先选择CSV文件”) return if not self.csv_data: messagebox.showwarning(“警告”, “CSV文件没有数据”) return self.log_message(“开始组合处理CSV文件…”) processed_data = self.csv_data.copy() operations = [] # 1. 删除行 if self.enable_delete.get(): column = self.delete_column.get() keyword = self.delete_keyword.get() if column and keyword: processed_data = self.delete_rows( processed_data, column, keyword, self.delete_case_sensitive.get() ) operations.append(f"deleted”) # 2. 排序 if self.enable_combined_sort.get(): header = self.sort_header.get() order = self.sort_order.get() if header: processed_data = self.sort_data( processed_data, header, order, self.enable_custom_letter_sort.get(), self.letter_range_start.get(), self.letter_range_end.get() ) operations.append(f"sorted_{header}{order}") # 3. 去重 if self.enable_combined_dedupe.get(): header = self.dedupe_header.get() if header: processed_data = self.dedupe_data( processed_data, header ) operations.append(f"deduped") # 生成操作描述 operation = “combined_” + “_”.join(operations) if operations else “combined_processed” # 生成保存路径 save_path = self.generate_filename(self.file_path.get(), operation) # 保存文件 if self.save_csv_file(processed_data, save_path): # 更新当前数据 self.csv_data = processed_data messagebox.showinfo(“成功”, f"组合处理完成,已保存到桌面:\n{os.path.basename(save_path)}“) def merge_csv_files(self): “”“合并多个CSV文件并保存到桌面””" if not self.merge_file_paths: messagebox.showwarning(“警告”, “请先添加要合并的文件”) return try: # 检查所有文件是否存在 missing_files = [f for f in self.merge_file_paths if not os.path.exists(f)] if missing_files: raise FileNotFoundError(f"以下文件不存在: {', '.join(missing_files)}“) merge_column = self.merge_column.get() common_headers = None all_data = [] # 收集所有文件的表头和数据 header_sets = [] for file_path in self.merge_file_paths: with open(file_path, ‘r’, encoding=‘utf-8-sig’) as file: reader = csv.reader(file) data = list(reader) if data: header_sets.append(set(data[0])) all_data.append(data) # 找出共同表头 if header_sets: common_headers = set(header_sets[0]) for headers in header_sets[1:]: common_headers.intersection_update(headers) common_headers = sorted(common_headers) if not common_headers: raise ValueError(“选中的文件没有共同的列,无法合并”) # 如果没有指定合并依据列,使用所有共同列 merge_indices = None if merge_column: if merge_column not in common_headers: raise ValueError(f"合并依据列 ‘{merge_column}’ 不在共同列中”) merge_indices = [i for i, h in enumerate(common_headers) if h == merge_column] # 合并数据 merged_data = [common_headers.copy()] key_counter = defaultdict(int) for data in all_data: if not data: continue headers = data[0] header_map = {h: i for i, h in enumerate(headers)} for row in data[1:]: # 如果指定了合并列,检查是否已存在相同键 if merge_indices: merge_values = [row[header_map[h]] for h in common_headers if h == merge_column] if merge_values: key = tuple(merge_values) key_counter[key] += 1 if key_counter[key] > 1: continue # 跳过重复键的行 # 构建新行,只保留共同列 new_row = [] for col in common_headers: if col in header_map and len(row) > header_map[col]: new_row.append(row[header_map[col]]) else: new_row.append(“”) merged_data.append(new_row) # 生成操作描述 operation = “merged” if merge_column: operation += f"by" # 生成保存路径 first_file = os.path.basename(self.merge_file_paths[0]) save_path = self.generate_filename(first_file, operation) # 保存文件 if self.save_csv_file(merged_data, save_path): messagebox.showinfo(“成功”, f"文件合并完成,已保存到桌面:\n{os.path.basename(save_path)}“) except Exception as e: error_msg = f"合并文件时出错: {str(e)}” self.log_message(error_msg, “error”) messagebox.showerror(“错误”, error_msg) if name == “main”: root = tk.Tk() app = CSVProcessorApp(root) root.mainloop() 对这个代码的所有功能优化,要求 1.执行完功能操作输出的新的csv文件仍保留原数据默认表头 2.执行完功能操作输出的新的csv文件所自定义设定的表头 3.执行完功能操作输出的新的csv文件保留原数据默认表头和自定义设定的表头中间的数据 4.每个功能页增加勾选按钮,用户自主选择是否需要保留原数据表头 2.对于关键字要求缩小范围,不是只要有数据包含这个关键字就可以 并输出一份完整代码,不要省略代码
07-10
import pandas as pd """ # 读取数据,假设列名为'date'和'temp' df_data = pd.read_excel('最高温度.xlsx') df_data['日期'] = pd.to_datetime(df_data['日期']) data_to_save = [] # 提取月份和日 for zhan_name in df_data.columns[1:]: # 计算每日的历史均值和标准差(1980-2022年) df=df_data[['日期',zhan_name]] df['month'] = df['日期'].dt.month df['day'] = df['日期'].dt.day daily_stats = df.groupby(['month', 'day'])[zhan_name].agg(['mean', 'std']).reset_index() # 合并数据以计算STI merged = df.merge(daily_stats, on=['month', 'day']) # 处理标准差为零的情况(替换为极小值避免除零错误) merged['std'] = merged['std'].replace(0, 1e-10) # 计算STI merged[zhan_name] = (merged[zhan_name] - merged['mean']) / merged['std'] data_to_save.append(merged[zhan_name] ) df_flattened_data = pd.DataFrame(data_to_save) df_flattened_data=df_flattened_data.T # 提取结果并保存 sti_result = merged[['date', 'STI']] sti_result.to_csv('daily_sti_results.csv', index=False) import pandas as pd import numpy as np # 读取数据 df = pd.read_excel('最高温度.xlsx') df['日期'] = pd.to_datetime(df['日期']) df['year'] = df['日期'].dt.year df['month'] = df['日期'].dt.month df['day'] = df['日期'].dt.day # 1. 处理闰日:合并到2月28日 df.loc[(df['month'] == 2) & (df['day'] == 29), 'day'] = 28 # 2. 设置基准期(推荐1981-2010) base_start = 1981 base_end = 2010 base_mask = (df['year'] >= base_start) & (df['year'] <= base_end) # 3. 计算基准期气候态 climate = df[base_mask].groupby(['month', 'day']).agg(['mean', 'std']) # 4. 处理标准差为0的情况 for col in df.columns[1:-3]: # 站点列 climate[(col, 'std')] = climate[(col, 'std')].replace(0, 1e-10) # 5. 合并气候数据 df = df.join(climate, on=['month', 'day'], rsuffix='_climate') # 6. 计算各站点STI for station in df.columns[1:-3]: mean_col = (station, 'mean') std_col = (station, 'std') df[f'{station}_STI'] = (df[station] - df[mean_col]) / df[std_col] # 7. 保存结果 df.to_excel('STI_计算结果.xlsx', index=False) """ # -*- coding: utf-8 -*- """ 1. 程序目的 (1) 实现单站点逐日的不同时间尺度的STI计算 2. 版本 2.1 天津,2025年3月17日 Version 1 3. 数据 3.1 输入数据 '01_Data'文件夹下的数据 3.2 输出数据 '03_Result'文件夹下的数据 """ # %% 包的导入 import numpy as np import pandas as pd # %% 变量预定义 month_day = {'Jan': (1, 31), 'Feb': (2, 29), 'Mar': (3, 31), 'Apr': (4, 30), 'May': (5, 31), 'Jun': (6, 30), 'Jul': (7, 31), 'Aug': (8, 31), 'Sep': (9, 30), 'Oct': (10, 31), 'Nov': (11, 30), 'Dec': (12, 31)} # %% 函数预定义 # %% 数据读取函数定义 def read_oridata( filepath: str ): """ (1) 功能: 读取用于计算sti的气象数据 注:此函数读取的是测试数据,也可以读取按照相同方式存储的气象数据 --- (2) 输入参数: 1) filepath: str,输入数存储路径 --- (3) 输出参数 1) sta_id: int,站点号 2) styr: int, 开始年 3) stmt: int, 开始月 4) stday: int, 开始日 5) edyr: int, 结束年 6) edmt: int, 结束月 7) edday: int, 结束日 8) tas: np.array, 温度 """ # 数据读取 clim_data = pd.read_csv(filepath) return clim_data # %% def tas_data_convolve(tas_data: np.array, scale: tuple) -> pd.DataFrame: ''' (1) 函数功能 ------------ 依据时间尺度,通过滑动平均方法整理tas数据,生成不同时间尺度的tas滑动均值 序列 (2) 参数 ------------ tas_data: np.array 温度数据(一般是平均温度) scale: tuple 时间尺度 (3) 返回值 ------- tas_data_scale: pd.DataFrame 特定时间尺度下的tas整理结果 ''' # 创建存储tas滑动均值的结果的DataFrame(tas,不同时间尺度的tas滑动均值结果) scale = sorted(scale) scale_list = [] for scale_temp in scale: if scale_temp < 10: scale_list.append('scale_0' + str(scale_temp)) else: scale_list.append('scale_' + str(scale_temp)) col_names = ['tas_original'] + scale_list # 空的数据框的创建 tas_roll = pd.DataFrame(data=[], columns=col_names) # 不同时间尺度的tas滑动均值的计算 for scale_temp in scale: if scale_temp == 1: tas_data_scale = tas_data tas_roll['scale_0' + str(scale_temp)] = tas_data_scale else: tas_data_scale = np.convolve(tas_data, np.ones(scale_temp), mode='valid') # 对于sti,求取的是卷积窗口内的均值,若是SPI则不需要求取均值 tas_data_scale = tas_data_scale / scale_temp # 空值区域补充为np.NaN tas_data_scale = np.hstack(([np.NaN] * (scale_temp - 1), tas_data_scale)) # if scale_temp >= 10: tas_roll['scale_' + str(scale_temp)] = tas_data_scale else: tas_roll['scale_0' + str(scale_temp)] = tas_data_scale tas_roll['tas_original'] = tas_data return tas_roll # %% def time_frame_get( start_date: str, end_date: str, ) -> pd.DataFrame: """ (1) 功能 -------- 由开始日期和结束日期获取年月日数据框 (2) 参数 ------- (1) start_date: 开始日期, str, year-month-day 或 year-month (2) end_date: 结束日期, str, year-month-day 或 year-month (3) 返回值 ------- time_frame: pd.DataFrame, year, month, day或year, month """ # 创建时间索引 # len_format = len(start_date.split('-')) if len_format == 3: date_index = pd.date_range(start_date, end_date) # 创建存储年月日的数据框 time_frame = pd.DataFrame(data=np.zeros((len(date_index), 3)), columns=['year', 'month', 'day']) # year_month_day_frame填充 for ii in range(len(date_index)): time_frame.year[ii] = date_index[ii].year time_frame.month[ii] = date_index[ii].month time_frame.day[ii] = date_index[ii].day else: date_index = pd.date_range(start_date, end_date, freq='M') time_frame = pd.DataFrame(data=np.zeros((len(date_index), 2)), columns=['year', 'month']) # year_month_frame填充 for ii in range(len(date_index)): time_frame.year[ii] = date_index[ii].year time_frame.month[ii] = date_index[ii].month return time_frame # %% 时间序列和气象要素合并函数 def tas_scale_frame( roll_tas: pd.DataFrame, frame_time: pd.DataFrame ) -> pd.DataFrame: ''' (1) 功能 -------- 将tas_data_convolve()的输出和time_frame_get()的输出结果合并 (2) 参数 ------- (1) roll_tas: pd.DataFrame,依据时间尺度滑动平均处理后的温度数据 (2) frame_time: pd.DataFrame,时间序列 (3) 返回值 ------- tas_sacle_frame: pd.DataFrame, 具有时间列的滑动平均处理后的温度数据 ''' tassacle_frame = frame_time.join(roll_tas) return tassacle_frame # %% def sti_diffscale_cal( frame_tassacle: pd.DataFrame, scale: tuple ) -> pd.DataFrame: ''' (1) 函数功能 ------------ 计算逐日的不同时间尺度的sti (2) 参数 ------------ frame_tassacle: pd.DataFrame tas_scale_frame()函数的输出结果 scale: tuple 时间尺度 (3) 返回值 ------------ sti_frame: pd.DataFrame 不同时间尺度下的sti计算结果 ''' # 获取列名 coll_names = frame_tassacle.columns.tolist() sti_frame = pd.DataFrame(data=[], columns=coll_names) # 年、月、日和原始温度数据 sti_frame[coll_names[0:4]] = frame_tassacle[coll_names[0:4]] # 提取不同时间尺度的tas的滑动均值 scale = sorted(scale) for scale_temp in scale: if scale_temp < 10: coll_name_temp = 'scale_0' + str(scale_temp) else: coll_name_temp = 'scale_' + str(scale_temp) # sti_series_frame = tas_series_frame # 特定时间尺度的tas数据的计算 for key, value in month_day.items(): month_temp = value[0] day_temp = value[1] for day_temp in range(1, day_temp + 1): data_extract = frame_tassacle.loc[(frame_tassacle['month'] == month_temp) & (frame_tassacle['day'] == day_temp) & (~pd.isna(frame_tassacle[coll_name_temp])), coll_name_temp] # STI计算方法 sti_temp = (data_extract - data_extract.mean()) / np.std(data_extract) sti_frame.loc[(frame_tassacle['month'] == month_temp) & (frame_tassacle['day'] == day_temp) & (~pd.isna(frame_tassacle[coll_name_temp])), coll_name_temp] = sti_temp return sti_frame # %% 测试程序 if __name__ == '__main__': all_sti_results = pd.DataFrame() date_column_added = False # %% 路径处理和基本变量定义 scale_aim = (5, 7) # 五天尺度和周尺度 # %% 获取用于计算STI的数据 climdata = pd.read_excel('最高温度.xlsx') climdata['日期'] = pd.to_datetime(climdata['日期']) # 开始年和结束年 styr = climdata['日期'].dt.year[0] stmt = climdata['日期'].dt.month[0] stday = climdata['日期'].dt.day[0] edyr = max(climdata['日期'].dt.year) roww, coll = climdata.shape # print(roww) edmt = climdata['日期'].dt.month[roww - 1] edday = climdata['日期'].dt.day[roww - 1] for name in climdata.columns[1:]: # 温度 tas = np.asarray(climdata[name]) # %% 生成时间数据框 stdate = str(styr) + '-' + str(stmt) + '-' + str(stday) eddate = str(edyr) + '-' + str(edmt) + '-' + str(edday) time_series = time_frame_get(stdate, eddate) # %% 不同时间尺度下卷积处理后的数据 tas_data_roll = tas_data_convolve(tas, scale_aim) # 合并时间标识和tas_data_roll tasscaleframe = tas_scale_frame(tas_data_roll, time_series) # 计算不同时间尺度的sti sti_result = sti_diffscale_cal(tasscaleframe, scale_aim) sti = pd.DataFrame() sti[name] = sti_result["scale_05"] # 创建日期列 if not date_column_added: all_sti_results['日期'] = pd.to_datetime(sti_result[["year", "month", "day"]]) date_column_added = True # 添加当前站点的STI结果 (使用5天尺度) all_sti_results[name] = sti[name] all_sti_results.to_excel('STI_计算结果.xlsx', index=False) 分析这个代码
最新发布
10-18
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值