Working with CSV files in Bash

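This article presents a set of methods for processing CSV files with Bash scripts, including counting elements and lines, reading the header, reading the content line by line, and splitting fields.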

CSV files are very common, and using them with Bash to aid in scripting can be very useful. Here are some methods of using Bash to work with CSV files. These are what I sometimes use when dealing with CSV files in Bash; they may work for your CSVs, or they may not. I accept no responsibility for the accuracy of these, and I'm sure there are better ways of accomplishing the same tasks. If anybody finds other ways or knows better, please comment and let me know. Don't run these on production data: it's always a good idea to make backups and test them before using them in a production environment. Who knows how your precious data could become mangled due to a wrong number of quotes, terminations or a bad regex.

How to count the number of elements within a CSV file.

[owen@TheLinuxBlog.Com bin]$ echo $(cat test.csv | sed 's/, /_ /g' | tr ',' '\n' | wc -l)
9
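As an alternative sketch, awk can add up the number of fields on each line directly. This assumes the same simple format as the sample file (no quoted fields containing commas). Unlike the sed version above, it also counts a comma followed by a space as a separator. The temporary file and the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Recreate the sample file from this post in a temporary location (illustrative only).
csv="$(mktemp)"
printf '%s\n' 'ID,Question,Answer' \
              '1,Who Am I?,I am Owen' \
              '2,What blog is this from?,TheLinuxBlog' > "$csv"

# NF is the number of comma-separated fields on the current line; sum it over all lines.
elements=$(awk -F',' '{ total += NF } END { print total + 0 }' "$csv")
echo "$elements"

rm -f "$csv"
```

For the sample file this prints 9, matching the pipeline above.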

Count the number of lines in a CSV file terminated with \n

[owen@TheLinuxBlog.Com bin]$ wc -l test.csv | cut --delimiter=" " -f 1
3

And another way:

[owen@TheLinuxBlog.Com bin]$ echo $(expr `wc -l "test.csv" | awk '{print $1}'`)
3
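A shorter variant, sketched here under the same assumptions, is to feed the file to wc on standard input. wc then has no filename to print, so there is nothing to cut away. The temporary file is only there to make the example self-contained.

```shell
#!/usr/bin/env bash
# Recreate the sample file from this post in a temporary location (illustrative only).
csv="$(mktemp)"
printf '%s\n' 'ID,Question,Answer' \
              '1,Who Am I?,I am Owen' \
              '2,What blog is this from?,TheLinuxBlog' > "$csv"

# Reading from stdin makes wc print just the count, without the filename.
lines=$(wc -l < "$csv")

# Some wc implementations pad the number with spaces; arithmetic expansion normalises it.
lines=$((lines))
echo "$lines"

rm -f "$csv"
```

Keep in mind that wc -l counts newline characters, so a last line without a trailing newline is not counted.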

Print the headers of a CSV file (useful if the headers contain the field names)

[owen@TheLinuxBlog.Com bin]$ head -1 test.csv
ID,Question,Answer
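If the field positions matter later on, it can help to list the header names together with their column numbers. The following is a small sketch of that idea, assuming the header contains no quoted commas; `listing` is just an illustrative variable name.

```shell
#!/usr/bin/env bash
# Recreate the sample file from this post in a temporary location (illustrative only).
csv="$(mktemp)"
printf '%s\n' 'ID,Question,Answer' \
              '1,Who Am I?,I am Owen' \
              '2,What blog is this from?,TheLinuxBlog' > "$csv"

# Take the first line only, then print each field prefixed with its position.
listing=$(head -n 1 "$csv" | awk -F',' '{ for (i = 1; i <= NF; i++) print i ": " $i }')
echo "$listing"

rm -f "$csv"
```

The numbers printed here are the same ones you would pass to cut -f or use as $1, $2, $3 in awk.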

Read a line of the CSV (where the number next to the p is the line number). This is useful for looping over a file and getting the line.

[owen@TheLinuxBlog.com bin]$ sed -n 2'p' "test.csv"
1,Who Am I?,I am Owen
[owen@TheLinuxBlog.com bin]$ sed -n 1'p' "test.csv"
ID,Question,Answer
[owen@TheLinuxBlog.com bin]$ sed -n 3'p' "test.csv"
2,What blog is this from?,TheLinuxBlog

So, to loop over the CSV and read a line you can do something like the following:

[owen@TheLinuxBlog.Com bin]$ for i in $(seq 1 `wc -l "test.csv" | awk '{print $1}'`); do sed -n $i'p' "test.csv"; done;
ID,Question,Answer
1,Who Am I?,I am Owen
2,What blog is this from?,TheLinuxBlog

Of course with the above you can offset the first line by changing seq 1 to seq 2 to start from line 2.
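The sed loop rereads the file once for every line. A common alternative, sketched below, is a while read loop that passes over the file a single time and skips the header with a counter. The variable names (`row`, `printed`, `last`) are made up for this example.

```shell
#!/usr/bin/env bash
# Recreate the sample file from this post in a temporary location (illustrative only).
csv="$(mktemp)"
printf '%s\n' 'ID,Question,Answer' \
              '1,Who Am I?,I am Owen' \
              '2,What blog is this from?,TheLinuxBlog' > "$csv"

row=0
printed=0
# IFS= and -r keep leading spaces and backslashes intact.
# The "|| [ -n ... ]" part still processes a last line that lacks a trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
    row=$((row + 1))
    # Skip the header, the equivalent of starting seq at 2.
    if [ "$row" -eq 1 ]; then
        continue
    fi
    echo "$line"
    printed=$((printed + 1))
    last=$line
done < "$csv"

rm -f "$csv"
```

On the sample file this prints the two data rows and leaves the header out.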

To split up each row (line) in the CSV at the comma, I use the following:

[owen@TheLinuxBlog.Com bin]$ for i in $(seq 1 `wc -l "test.csv" | awk '{print $1}'`); do echo $(sed -n $i'p' "test.csv") | tr ',' '\n'; done;

This puts a newline in place of each comma, so every field is printed on its own line. The above is known not to work for fields with commas in them. If the data relies on commas in the fields, you have to use some more advanced sed expressions.
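Another way to split a row, sketched here, is to let read do the splitting by setting IFS to a comma, so each field lands in its own variable. It has the same limitation with quoted commas, and it assumes three columns as in the sample file. The variable names are chosen to match the sample header and are otherwise arbitrary.

```shell
#!/usr/bin/env bash
# Recreate the sample file from this post in a temporary location (illustrative only).
csv="$(mktemp)"
printf '%s\n' 'ID,Question,Answer' \
              '1,Who Am I?,I am Owen' \
              '2,What blog is this from?,TheLinuxBlog' > "$csv"

rows_seen=0
# IFS=, applies only to this read, so the rest of the script keeps the default IFS.
while IFS=, read -r id question answer; do
    echo "id=$id question=$question answer=$answer"
    rows_seen=$((rows_seen + 1))
    last_question=$question
done < "$csv"

rm -f "$csv"
```

If a row has more fields than there are variables, the last variable (here `answer`) receives the rest of the line, commas included. The header row is processed like any other row in this sketch.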

Here is the CSV file I was using for this post:

[owen@norpmes bin]$ cat test.csv
ID,Question,Answer
1,Who Am I?,I am Owen
2,What blog is this from?,TheLinuxBlog

Let me know if this was helpful or not. As a whole, this is not supposed to be a tutorial to end all CSV tutorials, but rather a collection of techniques that can be used to aid in processing CSV files. If anyone has any thoughts or suggestions, needs any help, or finds other methods of completing the same or other tasks, then please comment!
