Reading PPT content with Python

License: CC 4.0 BY-SA

Tags: #python

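This article shows one way to convert a PowerPoint (PPTX) file into a Word (DOCX) document with Python. The script walks through every slide and every text box in the PPTX file, copies the content paragraph by paragraph, and saves the result as DOCX. The converted document comes out poorly formatted, so the approach still needs work.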

##Assignment
##Open 哔哩哔哩财报.pptx (the Bilibili financial report)
##Split the content by paragraph and convert it to a Word document
##Save it as 哔哩哔哩财报.docx

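Problem: the saved document has no formatting to speak of. Every paragraph is written out as plain body text in the order it is encountered, so the result is disorganized. This is the part that still needs to be improved.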

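from pptx import Presentation
from docx import Document

# New, empty Word document that will receive the text.
doc = Document()

# Load the source presentation.
prs = Presentation('Bilibili 2Q19 Investor Presentation-Final.pptx')
# print(type(prs))  # <class 'pptx.presentation.Presentation'>

# Visit every shape on every slide; only shapes with a text frame hold paragraphs.
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            for paragraph in text_frame.paragraphs:
                # Only the plain text is copied; fonts, sizes and layout are not.
                doc.add_paragraph(paragraph.text)

doc.save('Bilibili 2Q19 Investor Presentation-Final.docx')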
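
One possible way to reduce the clutter described above is to give the Word document some structure: a heading per slide, no empty paragraphs, and indentation that follows each paragraph's outline level. The sketch below is only an illustration under those assumptions, not a finished converter. The names `slide_outline` and `pptx_to_docx` are hypothetical helpers introduced here, and the 0.25 inch indent per level is an arbitrary choice.

```python
def slide_outline(prs):
    """Yield ("heading", 0, title) once per slide, then ("body", level, text)
    for each non-empty paragraph outside the title placeholder."""
    for number, slide in enumerate(prs.slides, start=1):
        title_shape = slide.shapes.title  # None when the slide has no title placeholder
        title = ""
        if title_shape is not None and title_shape.has_text_frame:
            # Collapse line breaks and repeated spaces inside the title.
            title = " ".join(title_shape.text_frame.text.split())
        # Fall back to a generic label when there is no usable title.
        yield ("heading", 0, title or f"Slide {number}")

        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # pictures, tables, groups, etc. are skipped here
            if title_shape is not None and shape.shape_id == title_shape.shape_id:
                continue  # already emitted as the heading
            for paragraph in shape.text_frame.paragraphs:
                # python-pptx reports a soft line break as "\v"; python-docx
                # turns "\n" into a line break, so translate between the two.
                text = paragraph.text.replace("\v", "\n").strip()
                if text:
                    yield ("body", paragraph.level, text)


def pptx_to_docx(pptx_path, docx_path):
    """Write the outline of pptx_path into a new Word document at docx_path."""
    # Imported here so slide_outline() can be tried without the libraries installed.
    from pptx import Presentation
    from docx import Document
    from docx.shared import Inches

    doc = Document()
    for kind, level, text in slide_outline(Presentation(pptx_path)):
        if kind == "heading":
            doc.add_heading(text, level=1)
        else:
            body = doc.add_paragraph(text)
            # Assumed layout: indent 0.25 inch per outline level.
            body.paragraph_format.left_indent = Inches(0.25 * level)
    doc.save(docx_path)
```

With the file names from the script above, this could be called as `pptx_to_docx('Bilibili 2Q19 Investor Presentation-Final.pptx', 'Bilibili 2Q19 Investor Presentation-Final.docx')`. Fonts, colors, tables, and images are still not carried over, so this only addresses the ordering and grouping of the text.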