Handling Microsoft Excel file format

本文讨论了在Qt应用程序中处理Microsoft Excel文档的各种选项,包括使用Excel自身、ODBC、独立解析库等方法进行读写操作。

Handling Microsoft Excel file format

TODO: If you know more about the container format, and whether it really needs a specialized library for processing, please expand this section.
TODO: Tips for displaying Excel documents which were manually parsed using one of the methods described.
TODO: If you know whether Excel provides a "viewer" ActiveX control that can be embedded in a Qt application through ActiveQt, please fill out this section (including links to relevant resources).

En Ar Bg De El Es Fa Fi Fr Hi Hu It Ja Kn Ko Ms Nl Pl Pt Ru Sq Th Tr Uk Zh

This page discusses various available options for working with Microsoft Excel documents in your Qt application. Please also read the general considerations outlined on the Handling Document Formats page.

One needs to distinguish between two different formats (this page deals with both of them):

 Legacy "Excel Spreadsheet" format"Office Open XML Workbook" format
classification:binary BIFF-basedXML-based
main filename extension:.xls.xlsx
main internet media type:application/vnd.ms-excelapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet
default format of Excel:until Excel 2003since Excel 2007

Reading / Writing

Using Excel itself

If you are exclusively targeting the Windows platform and Microsoft Excel will be installed on all target machines, then you can use Qt's ActiveX framework to access Excel's spreadsheet processing functionality through OLE automation. For an introductory code example (and a way to list the API provided by the Excel COM object), consult this how-to.

 DLL file nameCOM object nameplatformslicense
Microsoft Excel ?Excel.ApplicationWindowscommercial

Using ODBC

To read an Excel file with ODBC (tested on Windows 7 with QT 4.7.1 and Windows 10 with QT 5.7) :

QSqlDatabase db = QSqlDatabase::addDatabase("QODBC", "xlsx_connection");
db.setDatabaseName("DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=" + QString("c:\\path\\to\\your\\file\\file.xlsx"));
if(db.open())
{
 QSqlQuery query("select * from [" + QString("Sheet1") + "$]"); // Select range, place A1:B5 after $
 while (query.next())
 {
 QString column1= query.value(0).toString();
 qDebug() << column1;
 }
db.close();
QSqlDatabase::removeDatabase("xlsx_connection");
}

The above code print all of column1's values tp the debug output. It works for *.xls and *.xlsx and the other excel file formats.

By default OBDC uses the first row as names for the columns, you are supposed to be able to change this with the 'FirstRowHasNames' option in the connection settings, however there is a bug (see KB288343). Keep in mind that you are using a database and that each column has his own datatype, so if your second row contains text and your third row contains numbers, sql will pick one of these datatypes. If a few rows contain text and the rest of them contain floating numbers, sql will make the text appear and will make the numbers disappear.

NOTE: To use ODBC on Windows, the MS Access Database Engine has to be installed. You can find it here: Microsoft Access Database Engine 2010. The Engine is maybe distributed with a MS Office Access installation, but on this should not be relied on. In Addition, you should regard that a 64 bit application can only use the 64 bit Engine and so for 32 bit accordingly. That’s why you maybe install both versions to avoid problems. Furthermore, the Engine should not be confused with the MS Access Runtime which contains the Engine.

Using independent parser/writer libraries

For a more portable solution, you could take a look at some of the available third-party C/C++ libraries for parsing/writing Excel files:

 API.xls.xlsxreadingwritingplatformslicense
Qt XlsxC++

Qt

noyesyesyesWin, Mac, Linux, …MIT [weak copyleft]
xlsLibC++yesnonoyesWin, Mac, Linux, …LGPL v3 [weak copyleft]
libxlsCyesnoyesnoWin, Mac, Linux, …LGPL [weak copyleft]
LibXLC++yesyesyesyesWin, Mac, Linux, …commercial
qtXLSCyesnoyesyesWin, ?commercial
FreeXLCyesnoyesnoLinux, ?LGPL / MPL [weak copyleft]
BasicExcelC++yesnoyesyes ? ?
Number DuckC++yesnoyesyesWin, Linuxcommercial

Note that these libraries differ in their scope and general approach to the problem.

Using manual XML processing

Files using the XML-based (.xlsx) format could be processed using Qt's XML handling classes (see Handling Document Formats). Third-party libraries can help you in dealing with the container format that wraps the actual XML files:

 APIsupported platformslicense
libopcCWin, Mac, Linux, …permissive

Using batch conversion tools

If all else fails, there is always the option of using an existing tool to automatically convert between Excel files and a more manageable format, and let your Qt application deal with that format instead. The conversion tool could be bundled with your application or specified as a prerequisite, and controlled via Doc:QProcess. Some possibilities are:

 .xls to *.xlsx to **to .xls*to .xlsxplatformslicense
LibreOffice.ods .csv ….ods .csv ….ods .csv ….ods .csv …Win, Mac, Linux, …GPL v3 [strong copyleft]

Notes:

LibreOffice can be used like this for batch conversion (it's slow, though):
soffice invisible -convert-to xls test.ods

See Also

import logging import os import tempfile import time import re import shutil import traceback from pathlib import Path import openpyxl import psutil from typing import List import pythoncom import win32com from kotei_omp.driver.excel_standard import ExcelStandardParser from kotei_omp.driver.excel_timing import ExcelTimingParser 执行patch脚本, 修复三方库openpyxl的问题 from kotei_omp.utils.python_patch import fix_openpyxl_descriptors_base_bug from win32com.client.gencache import GetGeneratePath fix_openpyxl_descriptors_base_bug() from kotei_omp.data.block import DocumentBlockObject class Parser: def __init__(self): self._openpyxl_wb = None self._win32com_wb = None self._excel_app = None """excel文档解析器""" def _get_bus_type(self, file_path): """ 根据文件名获取excel的解析器对象 """ file_name = os.path.basename(file_path) # 文件名中是否有TIMING if "TIMING" in file_name.upper(): return 13, ExcelTimingParser() return 0, ExcelStandardParser() def parse_document(self, file_path) -> List[DocumentBlockObject]: """ 解析excel文档入口 :param file_path: excel文件路径 :return: 返回解析后的文档数据。文档数据包含了文档信息、布局信息、样式信息和内容信息 """ self._init_excel_app() # Convert .xls to .xlsx if needed if file_path.lower().endswith('.xls'): file_path = self._convert_xls_to_xlsx(file_path) self._init_workbook(file_path) try: # 加载excel解析器对象 bus_type, parser = self._get_bus_type(file_path) parser._openpyxl_wb = self._openpyxl_wb parser._win32com_wb = self._win32com_wb parser._excel_app = self._excel_app parser._read_document(file_path) # 解析文档 blocks = parser.parse_document(file_path) # print(blocks) logging.debug('parse document end') return blocks except Exception as e: logging.error(traceback.format_exc()) logging.error(f"parse error {e}") finally: self._release_resources() if file_path.startswith(str(tempfile.tempdir)): os.remove(file_path) def _init_excel_app(self): if self._excel_app is not None: return try: # 初始化 openpyxl 的 Workbook # 初始化 COM 环境和 Workbook pythoncom.CoInitialize() # 初始化 COM 库(用于在多线程环境中正确初始化和释放资源) self._excel_app = win32com.client.DispatchEx("Excel.Application") self._excel_app.Visible = False # 隐藏 Excel GUI self._excel_app.DisplayAlerts = False self._excel_app.DisplayAlerts = False # 禁用提示框 except pythoncom.com_error as com_error: logging.error(f"错误来源: {com_error.source}") # 错误来源(通常是 COM 组件) logging.error(f"错误描述: {com_error.excepinfo[2]}") # COM 返回的错误描述 raise except Exception as e: print(f"发生未知错误: {e}") # 如果任何异常发生,释放资源 self._release_resources() raise def _init_workbook(self, file_path): if self._excel_app is None: self._init_excel_app() try: # 初始化 openpyxl 的 Workbook self._openpyxl_wb = openpyxl.load_workbook(file_path, rich_text=True, data_only=True) # 打开 Excel 文件作为 COM 对象 self._win32com_wb = self._excel_app.Workbooks.Open(file_path) except pythoncom.com_error as com_error: logging.error(f"错误来源: {com_error}") # 错误来源(通常是 COM 组件) logging.error(f"错误描述: {com_error.excepinfo[2]}") # COM 返回的错误描述 # 如果 COM 初始化失败,释放资源 self._release_resources() raise except Exception as e: print(f"发生未知错误: {e}") # 如果任何异常发生,释放资源 self._release_resources() raise def _convert_xls_to_xlsx(self, file_path): path = Path(file_path) temp_dir = Path(tempfile.gettempdir()) # 获取临时目录路径 target_path = os.path.join(temp_dir, f"{path.stem}.xlsx") # 如果目标文件存在,先删除它 if Path(target_path).exists(): os.remove(target_path) wb = self._excel_app.Workbooks.Open(path.absolute()) # FileFormat=51 is for .xlsx extension target_path = str(target_path) wb.SaveAs(target_path, FileFormat=51) # 关闭工作簿,不保存更改 wb.Close(False) return target_path def _release_resources(self): """释放资源,确保 openpyxl 和 COM 对象正确关闭.""" try: # 关闭 openpyxl 的 Workbook 对象 if self._openpyxl_wb is not None: self._openpyxl_wb.close() # 关闭 win32com 的 Workbook 和 Excel.Application if self._win32com_wb is not None: self._win32com_wb.Close(SaveChanges=False) if self._excel_app is not None: self._excel_app.Quit() finally: # 解除对 COM 对象的引用 self._openpyxl_wb = None self._win32com_wb = None self._excel_app = None # 解除 COM 库的初始化 pythoncom.CoUninitialize() if name == ‘main’: # r"D:\桌面文件\fengtian\line13\before\MET-G_TIMING-CSTD-A0-00-B-C0变更前.xls", # r"D:\桌面文件\fengtian\line13\after\MET-G_TIMING-CSTD-A0-00-B-C0变更后.xlsx" paths = [ #r"D:\Users\kotei\Desktop\testcase\4_dafa_12\before\90X_变更前脱敏.xlsx" # r"D:\33本\29_MIN\before\MET-M_STSW-CSTD-0-24-A-C5-2_添付資料_GUI表示制御仕様.xlsx" # r"D:\33本\line29\before\MET-M_STSW-CSTD-0-24-A-C5-2_添付資料_GUI表示制御仕様.xlsx" r"D:\GitProjects\test\data_zh\excel\before\Excel_b.xlsx" ] for file_path in paths: file_name, ext = os.path.splitext(file_path) # 每个worksheet对应一个block对象, 返回block对象列表 data = Parser().parse_document(file_path) # 保存数据 # for block in data: # with open(file_name + " " + block.name + “.json”, “w”, encoding=“utf-8”) as f: # f.write(block.to_json()) for sheet in data: print(f"sheet_name {sheet._name} texts {len(sheet.texts)} tables {len(sheet.tables)} " f"timing_texts {len(sheet.timing_texts)} timing_waves {len(sheet.timing_waves)} " f"pictures {len(sheet.pictures)} graphics {len(sheet.graphics)}") # 保证合并单元格的内容一致 # content_dict = {} # text_dict = {} # for (index, table) in enumerate(sheet.tables): # for row in table.rows: # for cell in row.cells: # if len(cell.merged_ranges) > 0: # a, b, c, d = cell.merged_ranges # key = (index, a, b, c, d) # if key not in content_dict: # content_dict[key] = len(cell.content) # text_dict[key] = cell.text # else: # if len(cell.content) != content_dict[key]: # print(f"desc: {cell.coordinate.desc}, merged: {cell.merged_ranges}, content length: {len(cell.content)}, expect length: {content_dict[key]}") # if cell.text != text_dict[key]: # print( # f"desc: {cell.coordinate.desc}, merged: {cell.merged_ranges}, text: {cell.text}, expect_text: {text_dict[key]}") # 已连接到 pydev 调试器(内部版本号 243.25659.43)Traceback (most recent call last): File “D:\GitProjects\REQManagement\Venv\Venv\RRM_env\Lib\site-packages\kotei_omp\driver\excel.py”, line 88, in _init_excel_app self.excel_app = win32com.client.DispatchEx(“Excel.Application”) File "D:\GitProjects\REQManagement\Venv\Venv\RRM_env\lib\site-packages\win32com\client_init.py", line 142, in DispatchEx dispatch = pythoncom.CoCreateInstanceEx( pywintypes.com_error: (-2147221005, ‘无效的类字符串’, None, None) During handling of the above exception, another exception occurred: Traceback (most recent call last): File “D:\GitProjects\REQManagement\Venv\Venv\RRM_env\Lib\site-packages\kotei_omp\driver\excel.py”, line 93, in _init_excel_app logging.error(f"错误来源: {com_error.source}") # 错误来源(通常是 COM 组件) AttributeError: ‘com_error’ object has no attribute ‘source’ python-BaseException
最新发布
10-24
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值