Excel Data Consolidation Python Program: How to Consolidate Data from Different Spreadsheets by Date in Python

Article link: https://blog.youkuaiyun.com/chen_w_t/article/details/147718407

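AI is already very capable. It can produce Excel-processing code quickly: sometimes the code works on the first try, and sometimes it takes repeated trial and error to get it right.
Today I'm sharing, from my own experience, a Python program that consolidates Excel data.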
This program does the following:
1. Consolidates the data quickly
2. Normalizes the date format on output, e.g. 2023/3/04
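
To make the date handling concrete, here is a minimal sketch using only the standard library. It assumes the goal is a text date in the 2023/3/04 pattern (month unpadded, day zero-padded); the helper names `to_date` and `format_date` are hypothetical and are not part of the program below, which stores date objects and leaves the display to Excel.

```python
from datetime import date, datetime


def to_date(value):
    """Return a date with the time part dropped, or None if value cannot be parsed."""
    # datetime is a subclass of date, so it must be checked first
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    try:
        # Accepts ISO-style text such as "2023-03-04" or "2023-03-04 08:30:00"
        return datetime.fromisoformat(str(value)).date()
    except ValueError:
        # Mirrors errors='coerce': unparseable input becomes a missing value
        return None


def format_date(d):
    """Format a date as year/month/day with only the day zero-padded."""
    return f"{d.year}/{d.month}/{d.day:02d}"


print(format_date(to_date(datetime(2023, 3, 4, 8, 30))))  # 2023/3/04
```

Returning None for unparseable values keeps bad cells visible as blanks instead of stopping the whole run.
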
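Limitations:
1. The consolidated data loses the formatting of the source workbooks; if you have no formatting requirements, you can ignore this
2. The program does not yet add files automatically across folders; it can only consolidate files within one specified folder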
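
One possible way around the second limitation is to collect matching files from subfolders as well. The sketch below uses `os.walk` from the standard library; the function name `find_source_files` and its parameters are hypothetical, and it has not been wired into the program that follows.

```python
import os


def find_source_files(root_dir, keyword, suffix=".xlsx"):
    """Return sorted paths of files under root_dir (subfolders included)
    whose names contain keyword and end with suffix."""
    matches = []
    for folder, _subdirs, file_names in os.walk(root_dir):
        for file_name in file_names:
            # Skip the temporary lock files Excel creates, e.g. "~$report.xlsx"
            if file_name.startswith("~$"):
                continue
            if keyword in file_name and file_name.endswith(suffix):
                matches.append(os.path.join(folder, file_name))
    return sorted(matches)
```

If this replaced the single-folder listing, files with the same name in different subfolders would collide in the source-file-name column, so a path relative to the root folder would probably be a safer key than the bare file name.
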

import hashlib
import os

import pandas as pd

# Source folder path and output (summary) file path
source_dir = r'\\****\**_\****\14、****\1、***\******\********'
output_file = r'\\******\******\*******\14、******\1、****\*******\************.xlsx'
sheet_name_to_extract = "*****"  # name of the worksheet to extract

# Dictionary of file hashes (kept in memory only, so it starts empty on every run)
hash_store = {}

# Compute the MD5 hash of a file's contents
def compute_file_hash(file_path):
    hasher = hashlib.md5()
    with open(file_path, 'rb') as f:
        buf = f.read()
        hasher.update(buf)
    return hasher.hexdigest()

# Load the existing summary workbook into a DataFrame
if os.path.exists(output_file):
    summary_df = pd.read_excel(output_file)
else:
    summary_df = pd.DataFrame()  # start from an empty DataFrame if the file does not exist

# Make sure the '来源文件名' (source file name) column exists
if '来源文件名' not in summary_df.columns:
    summary_df['来源文件名'] = None

# Go through the files in the source folder
for file_name in os.listdir(source_dir):
    if '全数问题点' in file_name and file_name.endswith('.xlsx'):
        file_path = os.path.join(source_dir, file_name)

        try:
            # Compute the file's current hash
            current_hash = compute_file_hash(file_path)

            # Skip the file if it was already processed and its contents have not changed
            if file_name in hash_store and hash_store[file_name] == current_hash:
                print(f"Skipping unchanged file: {file_name}")
                continue

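            # Open the workbook once; use it for both the sheet list and the read
            with pd.ExcelFile(file_path) as xls:
                has_sheet = sheet_name_to_extract in xls.sheet_names
                # Read only the target sheet, and only if it exists
                data_df = xls.parse(sheet_name_to_extract) if has_sheet else None
            if has_sheet: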

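                # Tag each row with its source file name ('来源文件名' column)
                data_df['来源文件名'] = file_name

                # Drop rows previously imported from this file, then append the fresh data
                summary_df = summary_df[summary_df['来源文件名'] != file_name]
                summary_df = pd.concat([summary_df, data_df], ignore_index=True)

                # Record the hash of the version just processed
                hash_store[file_name] = current_hash
                print(f"Processed and merged file: {file_name}")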

            else:
                print(f"File {file_name} does not contain the target sheet: {sheet_name_to_extract}")

        except Exception as e:
            print(f"Problem while processing file {file_name}: {e}")

# If the '日期' (date) and '序号' (serial number) columns exist, convert them to proper types and sort
if '日期' in summary_df.columns and '序号' in summary_df.columns:
    summary_df['日期'] = pd.to_datetime(summary_df['日期'], errors='coerce').dt.date  # keep the date only, drop the time
    summary_df['序号'] = pd.to_numeric(summary_df['序号'], errors='coerce')  # convert to numeric
    summary_df.sort_values(by=['日期', '序号'], inplace=True)

# Save the consolidated DataFrame to the Excel file
summary_df.to_excel(output_file, index=False)
print("Data copied into the summary workbook and sorted.")
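
One thing to note about the script above: `hash_store` starts out empty on every run, so the "skip unchanged file" branch never fires as written. A possible fix is to keep the hashes in a small JSON file between runs. The sketch below is one way to do that with the standard library only; the helper names `load_hash_store`, `save_hash_store`, and `is_unchanged`, and the idea of a JSON side file, are assumptions and not part of the original program.

```python
import hashlib
import json
import os


def compute_file_hash(file_path, chunk_size=1 << 20):
    """MD5 of the file contents, read in chunks so large files do not fill memory."""
    hasher = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def load_hash_store(store_path):
    """Load {file_name: hash} from a JSON file; return {} if it does not exist yet."""
    if not os.path.exists(store_path):
        return {}
    with open(store_path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_hash_store(store_path, hash_store):
    """Write the {file_name: hash} mapping back to disk."""
    with open(store_path, "w", encoding="utf-8") as f:
        json.dump(hash_store, f, ensure_ascii=False, indent=2)


def is_unchanged(file_path, hash_store):
    """True if the file's current hash matches the one recorded under its name."""
    name = os.path.basename(file_path)
    return hash_store.get(name) == compute_file_hash(file_path)
```

In the script, this would mean calling `load_hash_store` once before the loop instead of `hash_store = {}`, and `save_hash_store` once after the summary workbook has been written, so a failed run does not mark files as processed.
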

The program was generated with Kimi. If having an AI produce one yourself feels like too much trouble, try this code; it works!