Converting web pages to PDF with Python

Latest recommended article published 2025-09-13 11:18:57

CGGAO


License: CC 4.0 BY-SA

Tags: operating systems and operations, python

Original link: http://www.cnblogs.com/yc-c/p/10415058.html

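This article explains how to use wkhtmltopdf and its Python wrapper pdfkit on CentOS to convert HTML files, URLs, and text into PDF files. It covers installation, basic usage, advanced option configuration, and handling multiple input sources.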

The main tool used here is pdfkit, a Python wrapper around wkhtmltopdf.

CentOS environment

Install python-pdfkit:

pip install pdfkit

Install wkhtmltopdf:

yum install wkhtmltopdf

To install wkhtmltopdf on Windows, see this article:

http://blog.youkuaiyun.com/qq_14873105/article/details/51394026


To install wkhtmltopdf on Linux, see this article:

http://blog.youkuaiyun.com/mr_zing/article/details/52833461
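After installation, it can be useful to confirm that the wkhtmltopdf binary is reachable before calling pdfkit. The following is a minimal sketch using only the Python standard library; the helper names `find_wkhtmltopdf` and `wkhtmltopdf_version` are made up for this illustration and are not part of pdfkit.

```python
import shutil
import subprocess


def find_wkhtmltopdf(binary_name="wkhtmltopdf"):
    """Return the absolute path of the binary if it is on PATH, else None."""
    return shutil.which(binary_name)


def wkhtmltopdf_version(path):
    """Run `<path> --version` and return the first line it prints (or '')."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""
```

If `find_wkhtmltopdf()` returns None, the binary is either not installed or not on PATH. In the second case, pdfkit can be pointed at an explicit location with `pdfkit.configuration(wkhtmltopdf='/path/to/wkhtmltopdf')`, and the resulting object passed to the conversion functions through their `configuration` argument.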


Usage:

import pdfkit
pdfkit.from_string('hello,python', 'out.pdf')  # convert directly from a text string
pdfkit.from_url('http://baidu.com', 'out.pdf')  # convert from a URL
pdfkit.from_file('test.html', 'out.pdf')  # convert from an HTML file

A list of URLs or file names can also be passed:

pdfkit.from_url(['google.com', 'yandex.ru', 'engadget.com'], 'out.pdf')
pdfkit.from_file(['file1.html', 'file2.html'], 'out.pdf')

An already opened file can be passed as well:

with open('file.html') as f:
    pdfkit.from_file(f,'out.pdf')

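To process the generated PDF further, read it into a variable instead of writing it to disk:

# Pass False as the output path so the result is returned and can be assigned to a variable
pdf = pdfkit.from_url('http://google.com', False)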

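Any wkhtmltopdf option can be specified; see http://wkhtmltopdf.org/usage/wkhtmltopdf.txt for the full list. The leading '--' can be dropped from option names. If an option takes no value, use None, False, or '' as the dictionary value: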

options = {
    'page-size': 'Letter',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'no-outline': None
}
pdfkit.from_url('http://google.com', 'out.pdf', options=options)
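To make the mapping between dictionary keys and command-line flags concrete, here is an illustrative sketch of how such a dictionary could be turned into wkhtmltopdf-style arguments. `options_to_args` is a hypothetical function written for this explanation; it is not pdfkit's internal code and only mirrors the rules described above.

```python
def options_to_args(options):
    """Turn an options dict into a flat list of wkhtmltopdf-style arguments.

    Illustrative only: this is not how pdfkit is implemented internally.
    """
    args = []
    for name, value in options.items():
        # Option names may be written with or without the leading "--".
        flag = name if name.startswith("--") else "--" + name
        args.append(flag)
        # None, False and '' all mean "a flag that takes no value".
        if value is None or value is False or value == "":
            continue
        args.append(str(value))
    return args


print(options_to_args({"page-size": "Letter", "no-outline": None}))
# prints ['--page-size', 'Letter', '--no-outline']
```

Seen this way, `'no-outline': None` corresponds to a bare `--no-outline` switch, while `'page-size': 'Letter'` corresponds to `--page-size Letter`.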

By default, PDFKit shows all wkhtmltopdf output. To suppress it, pass the quiet option:

options = {
    'quiet': ''
}
pdfkit.from_url('google.com', 'out.pdf', options=options)

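Because of wkhtmltopdf's command-line syntax, the TOC and cover options must be specified separately:

toc = {
    'xsl-style-sheet': 'toc.xsl'
}
cover = 'cover.html'
# the output path is passed explicitly as the second argument
pdfkit.from_file('file.html', 'out.pdf', options=options, toc=toc, cover=cover)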

When converting a file or a string, an external CSS file can be specified through the css option.

# Single CSS file
css = 'example.css'
pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)
# Multiple CSS files
css = ['example.css', 'example2.css']
pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)

Arbitrary options can also be passed through meta tags in the HTML:

body = """
        <html>
          <head>
            <meta name="pdfkit-page-size" content="Legal"/>
            <meta name="pdfkit-orientation" content="Landscape"/>
          </head>
          Hello World!
        </html>
        """
pdfkit.from_string(body, 'out.pdf')  # rendered with --page-size=Legal and --orientation=Landscape
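When the HTML is generated by a program, the meta tags can be built from a dictionary instead of being written by hand. The following is a minimal sketch under the assumption that the `pdfkit-` prefix shown above is in effect; `html_with_pdfkit_meta` is a hypothetical helper, and it inserts `content` into the body unescaped, so it should only be given trusted HTML.

```python
from html import escape


def html_with_pdfkit_meta(content, options):
    """Wrap `content` in an HTML page whose head carries pdfkit-* meta tags."""
    metas = "\n".join(
        '    <meta name="pdfkit-{}" content="{}"/>'.format(
            escape(name), escape(str(value))
        )
        for name, value in options.items()
    )
    return (
        "<html>\n"
        "  <head>\n"
        "{}\n"
        "  </head>\n"
        "  <body>{}</body>\n"
        "</html>"
    ).format(metas, content)
```

The resulting string could then be passed to `pdfkit.from_string(...)` in the same way as the hand-written `body` above.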


Reposted from: https://www.jianshu.com/p/44ec7a83adcb

Republished from: https://www.cnblogs.com/yc-c/p/10415058.html