python-docx fails to parse a document with "There is no item named 'word/NULL' in the archive": how to fix it

Latest recommended article published on 2025-03-21 20:15:08

License: CC 4.0 BY-SA

Article tags:

#word #python

When parsing a Word document, opening the docx file with python-docx fails with:

KeyError: "There is no item named 'word/NULL' in the archive"

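The same file opens normally in Office and unzips without errors. Asking GPT did not produce a reasonable answer, but an online search turned up the cause: the docx file references an image or object whose underlying source file cannot be found.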

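Locating the error: unzip the docx file and go to word/_rels. Open document.xml.rels and search for NULL; one of the lines will contain a Relationship with Target="../NULL". Opening the document in Word and going to the corresponding position also shows a notice that the referenced part cannot be displayed.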
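
The manual check above can also be scripted. The following is a minimal sketch using only the standard library; the helper names `null_relationship_ids` and `scan_docx` are made up for illustration and are not part of python-docx. It assumes the broken targets are exactly "NULL" or "../NULL", as in the case described here.

```python
import zipfile
import xml.etree.ElementTree as ET

# Qualified tag name of <Relationship> elements inside .rels parts.
REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def null_relationship_ids(rels_xml):
    """Return the Id of every Relationship whose Target is NULL or ../NULL."""
    root = ET.fromstring(rels_xml)
    return [
        rel.get("Id")
        for rel in root.iter(REL_TAG)
        if rel.get("Target") in ("NULL", "../NULL")
    ]


def scan_docx(docx_path):
    """Map each .rels part in the archive to the broken relationship ids it holds."""
    found = {}
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            if name.endswith(".rels"):
                ids = null_relationship_ids(archive.read(name))
                if ids:
                    found[name] = ids
    return found
```

Calling `scan_docx('xx.docx')` on an affected file should list word/_rels/document.xml.rels together with the ids of the dangling relationships; an empty result suggests the error has a different cause.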

Solution:

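Override the load_from_xml function: copy the code below so that it runs before the line "doc = Document('xx.docx')". Note that code which later extracts images or objects will still raise KeyError for the skipped relationships; catch the exception with try and skip that element.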

from docx.opc.pkgreader import _SerializedRelationships, _SerializedRelationship
from docx.opc.oxml import parse_xml


def load_from_xml_v2(baseURI, rels_item_xml):
    """
    Return |_SerializedRelationships| instance loaded with the
    relationships contained in *rels_item_xml*. Returns an empty
    collection if *rels_item_xml* is |None|.
    """
    srels = _SerializedRelationships()
    if rels_item_xml is not None:
        rels_elm = parse_xml(rels_item_xml)
        for rel_elm in rels_elm.Relationship_lst:
            # Skip dangling relationships whose target part is missing.
            if rel_elm.target_ref in ('../NULL', 'NULL'):
                continue
            srels._srels.append(_SerializedRelationship(baseURI, rel_elm))
    return srels


# Monkeypatch python-docx; staticmethod matches how the original is declared.
_SerializedRelationships.load_from_xml = staticmethod(load_from_xml_v2)
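
For the follow-up KeyError during image extraction, one possible shape of the try/except is sketched below. The helper names `image_rids` and `extract_image_blobs` are hypothetical, and the sketch assumes pictures are referenced through `a:blip` elements with an `r:embed` attribute and that the loaded parts are reachable through `doc.part.related_parts`.

```python
# Clark-notation names for <a:blip r:embed="rId..."> in the document body.
BLIP_TAG = "{http://schemas.openxmlformats.org/drawingml/2006/main}blip"
EMBED_ATTR = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}embed"


def image_rids(doc):
    """Collect the relationship ids referenced by pictures in a python-docx Document."""
    return [blip.get(EMBED_ATTR) for blip in doc.element.body.iter(BLIP_TAG)]


def extract_image_blobs(related_parts, rids):
    """Return ({rId: bytes}, [skipped rIds]); ids with no loaded part are skipped."""
    blobs, skipped = {}, []
    for rid in rids:
        try:
            part = related_parts[rid]
        except KeyError:
            # The relationship was dropped by the patch, so there is no part to read.
            skipped.append(rid)
            continue
        blobs[rid] = part.blob
    return blobs, skipped


# Usage with python-docx, after the patch has been applied:
# doc = Document('xx.docx')
# blobs, skipped = extract_image_blobs(doc.part.related_parts, image_rids(doc))
```

Returning the skipped ids alongside the image bytes makes it easy to log which pictures were missing instead of dropping them silently.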

References:

Python error when reading a Word document: KeyError: "There is no item named 'word/NULL' in the archive" (优快云 blog)

Open Word docx file with "The image part with relationship rID8 was not found" error, it always fails · Issue #1105 · python-openxml/python-docx · GitHub

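A closing note: I have been working on general-purpose document parsing lately and found that much of the material online is scattered, so the work mostly comes down to searching here and there, asking GPT, and putting things together step by step. Anyone facing the same frustration is welcome to get in touch and compare notes.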
