python docx tables_python-docx: extracting tables from a Word .docx file

weixin_39994461

Published 2020-12-15 14:20:23


Article tags: python docx tables

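Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.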

Article link: https://blog.youkuaiyun.com/weixin_39994461/article/details/111430266


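This post shows how to extract table data from a .docx file with Python and convert it into a pandas DataFrame. It iterates over the table's rows and cells, stores each row as a dictionary in a list, and then builds the DataFrame from that list. It also shows how to select specific rows and columns with iloc. For example, `df.iloc[0, :].tolist()` returns every column of the first row, and `df.iloc[:, 0].tolist()` returns every row of the first column.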


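Your code was very useful to me. How about putting the data into a DataFrame?

import pandas as pd
from docx.api import Document

document = Document('test_word.docx')
table = document.tables[0]

data = []
keys = None
for i, row in enumerate(table.rows):
    # Generator over the text of every cell in this row
    text = (cell.text for cell in row.cells)
    if i == 0:
        # Use the first row as the column headers
        keys = tuple(text)
        continue
    # Map each header to the corresponding cell text
    row_data = dict(zip(keys, text))
    data.append(row_data)

print(data)
df = pd.DataFrame(data)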
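A more compact variant of the same idea, sketched under the assumption that the table has already been read into a list of rows of cell text: pass the header row as `columns` and the remaining rows as the data. The helper name `rows_to_dataframe` and the sample `rows` are hypothetical; the values simply mirror the sample table shown in this post.

```python
import pandas as pd


def rows_to_dataframe(rows):
    """Hypothetical helper: build a DataFrame from rows of cell text.

    The first row is treated as the header, just like the loop above
    that uses row 0 as the dictionary keys.
    """
    header, *body = rows
    return pd.DataFrame(body, columns=header)


# Hypothetical sample data standing in for the Word table. With python-docx,
# rows of this shape could be collected with:
#   rows = [[cell.text for cell in row.cells] for row in document.tables[0].rows]
rows = [
    ["1", "2", "3", "4"],
    ["5", "6", "7", "8"],
    ["9", "10", "11", "12"],
    ["13", "14", "15", "16"],
    ["17", "18", "19", "20"],
]

df = rows_to_dataframe(rows)
print(df)
```

One side effect of this design: the dict-based loop silently lets a later cell overwrite an earlier one when two header cells have the same text, whereas passing `columns=header` keeps both columns.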

How can I display specific rows and columns of this table?

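We can use iloc to select rows and columns by integer position. Because cell.text always returns a string, the values in this DataFrame are strings:

# iloc[row, column]
df.iloc[0, :].tolist()   # ['5', '6', '7', '8'] - row at position 0
df.iloc[:, 0].tolist()   # ['5', '9', '13', '17'] - column at position 0
df.iloc[0, 0]            # '5' - cell (0, 0)
df.iloc[1:, 2].tolist()  # ['11', '15', '19'] - column at position 2, skipping the first row

and so on...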
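Since python-docx's `cell.text` always returns text, positional selections on the DataFrame return strings. A minimal sketch, assuming every cell holds a whole number, that converts each column with `pd.to_numeric` first so the same `iloc` calls return integers. The sample DataFrame is hypothetical and mirrors the table shown in this post.

```python
import pandas as pd

# Hypothetical DataFrame shaped like the one built from the Word table;
# the values are strings, as cell.text would produce.
df = pd.DataFrame(
    [["5", "6", "7", "8"], ["9", "10", "11", "12"],
     ["13", "14", "15", "16"], ["17", "18", "19", "20"]],
    columns=["1", "2", "3", "4"],
)

# Convert every column to a numeric dtype (assumes all cells are numbers;
# pd.to_numeric raises ValueError on non-numeric text by default).
df = df.apply(pd.to_numeric)

first_row = df.iloc[0, :].tolist()         # row at position 0
first_col = df.iloc[:, 0].tolist()         # column at position 0
top_left = df.iloc[0, 0]                   # single cell (0, 0)
third_col_tail = df.iloc[1:, 2].tolist()   # column at position 2, skipping row 0

print(first_row, first_col, top_left, third_col_tail)
```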

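But if the columns have names (in this case the header row holds numbers, which are read as the strings "1" to "4"), you can select a column by name instead:

# df["name"].tolist()
df["1"].tolist()  # ['5', '9', '13', '17'] - column named "1"

print(df)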

Printed, this is what the table in my sample document looks like:

    1   2   3   4
0   5   6   7   8
1   9  10  11  12
2  13  14  15  16
3  17  18  19  20
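The header labels come from `cell.text`, so they are strings rather than integers, and label-based selection has to use the string form. A sketch on a hypothetical DataFrame mirroring the table above, showing `df["1"]`, `df.loc` with a row label and a column label, and an optional `rename(columns=int)` so that integer labels such as `df[1]` also work:

```python
import pandas as pd

# Hypothetical DataFrame mirroring the printed table; the column labels are
# strings because they were read from the table's header row.
df = pd.DataFrame(
    [[5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20]],
    columns=["1", "2", "3", "4"],
)

col_by_label = df["1"].tolist()     # select a column by its string label
cell_by_label = df.loc[2, "3"]      # row label 2, column label "3"

# Optionally convert every column label with int() so df[1] works as well.
df_int_cols = df.rename(columns=int)
col_by_int = df_int_cols[1].tolist()

print(col_by_label, cell_by_label, col_by_int)
```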
