CSV Format and Caveats

CSV is a delimited data format in which fields (columns) are separated by the comma character and records (rows) are separated by newlines. Fields that contain a special character (comma, newline, or double quote) must be enclosed in double quotes. However, if a line contains a single entry that is the empty string, it may be enclosed in double quotes. If a field's value contains a double quote character, it is escaped by placing another double quote character next to it. The CSV file format does not require a specific character encoding, byte order, or line terminator format.
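To make the quoting and escaping rules in this paragraph concrete, here is a minimal Python sketch of how a writer might format one record.

The function names `quote_field` and `format_record` are hypothetical. The sketch applies only the rules stated above:
- quote a field that contains a comma, CR/LF, or a double quote;
- double any embedded double quotes;
- quote a record whose only entry is the empty string.

It is an illustration, not a complete CSV writer.

```python
def quote_field(value: str) -> str:
    """Quote one field if it contains a comma, a double quote, CR or LF.

    Embedded double quotes are escaped by doubling them.
    """
    needs_quotes = any(ch in value for ch in ',"\r\n')
    if not needs_quotes:
        return value
    escaped = value.replace('"', '""')
    return f'"{escaped}"'


def format_record(fields: list[str]) -> str:
    """Join fields with commas (no line terminator is appended)."""
    # A record whose only entry is the empty string would otherwise be a
    # blank line, so write it as "" instead.
    if fields == [""]:
        return '""'
    return ",".join(quote_field(f) for f in fields)


if __name__ == "__main__":
    print(format_record(["1997", "Ford", "E350", 'Super "luxurious" truck']))
    # 1997,Ford,E350,"Super ""luxurious"" truck"
```

Because quoting is applied per field, unquoted fields pass through unchanged. This matches the "quote only when necessary" style used in most of the examples below.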

  • Each record is one line terminated by a line feed (ASCII/LF=0x0A) or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A); however, line breaks can also be embedded inside quoted fields.
  • Fields are separated by commas.
1997,Ford,E350
  • In some CSV implementations, leading and trailing spaces or tabs adjacent to commas are trimmed. This practice is contentious and is in fact specifically prohibited by RFC 4180, which states, "Spaces are considered part of a field and should not be ignored."
1997,   Ford   , E350
is, in such implementations, read the same as
1997,Ford,E350
  • Fields with embedded commas must be delimited with double-quote characters.
1997,Ford,E350,"Super, luxurious truck"
  • Fields with embedded double-quote characters must be delimited with double-quote characters, and the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super ""luxurious"" truck"
  • Fields with embedded line breaks must be delimited by double-quote characters.
1997,Ford,E350,"Go get one now
they are going fast"
  • Fields with leading or trailing spaces must be delimited by double-quote characters. (See comment about leading and trailing spaces above.)
1997,Ford,E350,"  Super luxurious truck    "
  • Fields may always be delimited by double-quote characters, whether necessary or not.
"1997","Ford","E350"
  • The first record in a CSV file may contain column names in each of the fields.
Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar
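Python's standard `csv` module implements most of the rules listed above. The sketch below, assuming Python 3, does two things:
- It writes several of the example records with minimal quoting and reads them back. The header name `Description` is an assumption added for illustration.
- It parses the space-padded line from the trimming example with and without `skipinitialspace`. That option is only a partial form of trimming: it drops spaces after a delimiter but keeps trailing ones.

Note that minimal quoting does not quote a field merely because it has leading or trailing spaces. Passing `quoting=csv.QUOTE_ALL` quotes every field, as in the "always quoted" example.

```python
import csv
import io

# Records taken from the examples above (the header name "Description" is assumed)
rows = [
    ["Year", "Make", "Model", "Description"],
    ["1997", "Ford", "E350", "Super, luxurious truck"],               # embedded comma
    ["1997", "Ford", "E350", 'Super "luxurious" truck'],              # embedded double quotes
    ["1997", "Ford", "E350", "Go get one now\nthey are going fast"],  # embedded line break
]

buf = io.StringIO()
# QUOTE_MINIMAL (the default) quotes a field only when it contains the delimiter,
# the quote character, or a line-terminator character; "\r\n" is the default terminator.
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
writer.writerows(rows)
text = buf.getvalue()
print(text)

# Reading the text back restores the original values, including the line break.
parsed = list(csv.reader(io.StringIO(text, newline="")))
assert parsed == rows

# Spaces are kept by default (as RFC 4180 requires);
# skipinitialspace=True drops only the spaces that follow a delimiter.
line = "1997,   Ford   , E350"
default_fields = next(csv.reader([line]))
trimmed_fields = next(csv.reader([line], skipinitialspace=True))
print(default_fields)   # ['1997', '   Ford   ', ' E350']
print(trimmed_fields)   # ['1997', 'Ford   ', 'E350']
```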
   

Example

1997  Ford   E350                        ac, abs, moon            3000.00
1999  Chevy  Venture "Extended Edition"                           4900.00
1996  Jeep   Grand Cherokee              MUST SELL!               4799.00
                                         air, moon roof, loaded

The above table of data may be represented in CSV format as follows:

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

This CSV example illustrates that:

  • fields that contain commas, double-quotes, or line-breaks must be quoted,
  • a quote within a field must be escaped with an additional quote immediately preceding the literal quote,
  • space before and after delimiter commas may be trimmed by some implementations (although RFC 4180 prohibits this), and
  • a line break within an element must be preserved.
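As a quick check of these points, the sketch below parses the example with Python's `csv` module. It assumes the Jeep record contains a single embedded line feed, as in the table. The quoted empty field `""` comes back as an empty string, the doubled quotes collapse to one, and the line break is preserved inside the description.

```python
import csv
import io

# The example CSV above as a string; the Jeep description spans two lines.
EXAMPLE = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\n'
    'air, moon roof, loaded",4799.00\n'
)

# newline="" leaves the embedded line break for the csv parser to handle
records = list(csv.reader(io.StringIO(EXAMPLE, newline="")))

for year, make, model, description, price in records:
    # repr() makes the empty string and the embedded \n visible
    print(year, make, model, repr(description), float(price), sep=" | ")
```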
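### CSV File Format Overview

CSV (Comma-Separated Values) is a plain-text file format commonly used for spreadsheet and database exports. It stores tabular data with a simple structure: each line represents one record, and the fields within a record are separated by commas or another delimiter.

#### Basic characteristics

- **Simple and readable**: the contents of a CSV file can be read directly by people and imported easily into a wide range of tools.
- **Widely supported**: almost all spreadsheet programs (such as Microsoft Excel and Google Sheets), programming-language libraries, and data-analysis frameworks can read and write CSV.
- **Lightweight**: CSV carries almost no structural overhead (no styling, formulas, or embedded metadata), so files are simple to generate and transfer.

---

### Usage and Caveats

#### Creating a CSV file

CSV files can be created in several ways, for example by editing them by hand or by generating them with a script. The following Python snippet creates a CSV file:

```python
import csv

# Header row followed by two data rows
data = [['Name', 'Age'], ['Alice', 30], ['Bob', 24]]
file_path = './example.csv'

# newline='' lets the csv module control line endings itself;
# encoding='utf-8' makes the character encoding explicit
with open(file_path, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(data)
```

The code above saves a two-dimensional list as a standard UTF-8-encoded CSV file[^1].

#### Converting between CSV and other formats

When CSV data has to interoperate with other document types, dedicated functions can convert between formats. For example, the following code merges several separate CSV tables into a single Excel workbook, one worksheet per CSV file:

```python
import pandas as pd


def csv_to_excel_separate_sheets(csv_files_list, output_file):
    """Write each CSV file in csv_files_list to its own sheet of output_file."""
    # Writing .xlsx requires an Excel engine such as openpyxl to be installed
    with pd.ExcelWriter(output_file) as writer:
        for idx, csv_f in enumerate(csv_files_list):
            df = pd.read_csv(csv_f)
            sheet_name = f'Sheet{idx + 1}'  # Sheet1, Sheet2, ...
            df.to_excel(writer, index=False, sheet_name=sheet_name)


# Example usage
csv_files = ['my_custom_name_Sheet1.csv']
output_xlsx = 'combined_output.xlsx'
csv_to_excel_separate_sheets(csv_files, output_xlsx)
```

This combines a series of standalone CSV files into separate sheets of a single XLSX file[^2].

#### Handling common issues

The following points often come up when working with CSV, along with how to handle them:

- **Data formatting**: convert complex types such as dates to strings before writing them, using one consistent format throughout[^3].
- **Encoding compatibility**: when the data includes text in multiple languages, always specify the character encoding explicitly (a Unicode encoding such as UTF-8 is recommended) to avoid garbled text caused by encoding mismatches.
- **Missing-value markers**: if some cells are empty, consider writing a fixed placeholder instead of leaving them blank, so that later analysis can recognize those positions as missing data.
- **Escaping special characters**: when a value contains double quotes, commas, or line breaks, enclose the field in double quotes and double any embedded double quotes to avoid parsing errors.

---

### Conclusion

In short, combining a solid understanding of the CSV format with these practical techniques makes the many batch data-management tasks of everyday work faster and more reliable.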
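The four points under handling common issues can be combined in one small writer. The sketch below uses only the Python standard library and makes the following choices:
- The file name `issues_demo.csv`, the field names, and the `NA` placeholder are hypothetical.
- Dates are written in ISO 8601 form via `isoformat()`.
- UTF-8 is used for both writing and reading. Some workflows use `encoding="utf-8-sig"` instead so that Excel detects the encoding; that is an option, not a requirement.
- Quoting and escaping are left to `csv.DictWriter`.

```python
import csv
import datetime as dt

# Hypothetical records: a date, text with non-ASCII characters and quotes, a missing score
records = [
    {"date": dt.date(2024, 1, 5), "name": 'Zoë "Z" Müller', "score": 91.5},
    {"date": dt.date(2024, 2, 17), "name": "李雷", "score": None},
]

MISSING = "NA"  # hypothetical placeholder for empty cells
FIELDS = ["date", "name", "score"]


def to_cell(value):
    """Convert a Python value to the text stored in the CSV cell."""
    if value is None:
        return MISSING                # explicit marker instead of a blank cell
    if isinstance(value, dt.date):    # also matches datetime (a date subclass)
        return value.isoformat()      # one consistent date format, e.g. 2024-01-05
    return value


with open("issues_demo.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        # DictWriter quotes the name field and doubles its embedded quotes
        writer.writerow({key: to_cell(val) for key, val in rec.items()})

# Read back with the same encoding to confirm the round trip
with open("issues_demo.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(rows)
```

Reading with the same explicit encoding used for writing is what keeps the non-ASCII names intact. Every value comes back as a string, so numbers and the `NA` placeholder still need converting at analysis time.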