Generate PDF files from HTML in Python

WeasyPrint to the rescue

Mar 10, 2024

Photo by Ant Rozetsky on Unsplash

We sometimes need to generate PDFs of the invoices, reports, or tickets we send to clients. If your clients can already view these documents in a web application, you probably don't want to recreate them from scratch with a tool designed specifically for PDF generation, like ReportLab. In this article, I will show you a library that turns HTML pages into beautiful PDFs: WeasyPrint.

Installation

At the time of writing, you need Python 3.8 or higher to use this library. Before installing it with your favorite package manager, you must install an external dependency called Pango, which handles text rendering. How to install it varies between platforms. The official documentation has a dedicated section that I encourage you to read, but here are the instructions for common platforms.

# ubuntu
$ sudo apt install pango1.0-tools

# macOS
$ brew install pango

On Windows, you will need to run the GTK3 installer, which installs Pango for you.

Once that is done, you can install WeasyPrint with your favorite package manager.

# with pip
$ pip install weasyprint

# with uv
$ uv pip install weasyprint

# with poetry
$ poetry add weasyprint

If you don’t know uv or poetry, I have introduction articles on both of them.

Let's talk about uv, a possible future replacement of pip (Kevin Tewouda, Feb 21)

What is new in poetry 1.2 (Kevin Tewouda, January 5, 2023)

Usage

The usage is pretty straightforward. Let’s see how to create a PDF from the WeasyPrint landing page.

from weasyprint import HTML

HTML(url='https://weasyprint.org/').write_pdf('weasyprint-website.pdf')

Easy Peasy!

Note that you can customize the HTML rendered with your own CSS.

from weasyprint import HTML, CSS

HTML(url='https://weasyprint.org/').write_pdf(
    'weasyprint-website.pdf',
    stylesheets=[CSS(string='body { font-family: serif !important }')]
)
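
Custom CSS is also where print-specific rules go. As a small sketch, the helper below builds an @page rule that sets the page size and margins. Note that page_css and url_to_pdf are hypothetical names of mine, and the A4 size and 2cm margin are only example values.

```python
def page_css(size: str = "A4", margin: str = "2cm") -> str:
    """Build a print stylesheet that sets the page size and margins."""
    return f"@page {{ size: {size}; margin: {margin} }}"


def url_to_pdf(url: str, target: str) -> None:
    """Render `url` to the file `target` using the page rule above."""
    # Imported lazily so page_css can be tried without WeasyPrint installed
    from weasyprint import CSS, HTML

    HTML(url=url).write_pdf(target, stylesheets=[CSS(string=page_css())])
```

Calling url_to_pdf('https://weasyprint.org/', 'weasyprint-a4.pdf') should then give you the same content laid out on A4 pages.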

Instantiation methods

For HTML and CSS instantiation, we can use an absolute URL, a path to a local file, a readable file object, or a string.

import sys
from weasyprint import HTML

# with filename
HTML(filename='../foo.html')

# with URL
HTML(url='https://weasyprint.org')

# with file objects
with open('foo.html', 'r') as f:
    HTML(file_obj=f)

HTML(file_obj=sys.stdin)
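
The snippet above does not show the string option, which is handy for documents built on the fly, like invoices. Here is a minimal sketch, assuming a hypothetical invoice made of a customer name and a total; build_invoice_html and render_invoice_pdf are names I made up for illustration. When write_pdf is called without a target, it returns the PDF as bytes instead of writing a file.

```python
from html import escape


def build_invoice_html(customer: str, total: float) -> str:
    """Return a tiny HTML invoice; user-provided text is escaped."""
    return (
        "<h1>Invoice</h1>"
        f"<p>Customer: {escape(customer)}</p>"
        f"<p>Total: {total:.2f} EUR</p>"
    )


def render_invoice_pdf(customer: str, total: float) -> bytes:
    """Render the invoice and return the PDF content as bytes."""
    # Imported here so that build_invoice_html stays usable
    # on machines where Pango is not installed.
    from weasyprint import HTML

    # No target given: write_pdf returns the PDF instead of writing a file
    return HTML(string=build_invoice_html(customer, total)).write_pdf()
```

Returning bytes is convenient in a web application, where the PDF usually goes straight into an HTTP response.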

If your custom CSS involves @font-face rules, you must create a FontConfiguration object.

from weasyprint import HTML, CSS
from weasyprint.text.fonts import FontConfiguration

font_config = FontConfiguration()
html = HTML(string='<h1>The title</h1>')
css = CSS(string='''
    @font-face {
        font-family: Gentium;
        src: url(https://example.com/fonts/Gentium.otf);
    }
    h1 { font-family: Gentium }''', font_config=font_config)
html.write_pdf(
    '/tmp/example.pdf', stylesheets=[css],
    font_config=font_config
)

Individual pages

The previously generated PDF has 6 pages. Sometimes we only want some of the pages, or we want to split the document into several files. Here is how to select specific pages.

from weasyprint import HTML

document = HTML(url='https://weasyprint.org/').render()

# render a specific page, the first one in this case
# note that it is recommended to create a copy of the document
# before processing individual pages
document.copy(document.pages[:1]).write_pdf('first_page.pdf')

# write odd and even pages separately:
# Lists count from 0 but page numbers usually from 1
# [::2] is a slice of even list indexes but odd-numbered pages.
document.copy(document.pages[::2]).write_pdf('odd_pages.pdf')
document.copy(document.pages[1::2]).write_pdf('even_pages.pdf')

Notes:

For the first page, you may wonder why the copy method takes document.pages[:1] as its argument. This is because copy always works with an iterable of pages, so we pass a one-item slice rather than a single page.
As said in the comments, always create a copy of the document with the pages you want to work with.
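
The same slicing idea extends to splitting a document into files of a fixed number of pages. This is only a sketch: chunk and write_in_chunks are hypothetical helpers of mine, and document is assumed to be the object returned by HTML(...).render().

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_in_chunks(document, size: int, prefix: str) -> List[str]:
    """Write one PDF per group of `size` pages and return the file names."""
    filenames = []
    for number, pages in enumerate(chunk(document.pages, size), start=1):
        filename = f"{prefix}-{number}.pdf"
        # Each slice is a list, so it is a valid iterable for copy()
        document.copy(pages).write_pdf(filename)
        filenames.append(filename)
    return filenames
```

With a 6-page document and size=4, write_in_chunks(document, 4, 'part') writes part-1.pdf with four pages and part-2.pdf with the remaining two.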

Image optimization

By default, WeasyPrint does not optimize images, but we can ask it to in order to reduce the size of the generated PDF, at the cost of a slightly longer rendering time. Here is an example.

from weasyprint import HTML

# Optimized lower-quality images, a bit slower, but generated PDF is smaller
HTML('https://weasyprint.org/').write_pdf(
    'weasyprint.pdf', optimize_images=True, jpeg_quality=60, dpi=150
)

Notes:

The optimize_images argument reduces the size of images with no quality penalty, but the rendering time may increase slightly.
The jpeg_quality argument lowers the quality of JPEG images included in the PDF. You can set a value between 95 (best quality) and 0 (smallest image size), depending on your needs.
The dpi argument reduces the size (in pixels, and thus in bytes) of all included raster images. The resolution, set in dots per inch, is the maximum number of pixels included in one inch of the generated PDF.
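
To get a feel for the dpi argument, here is a back-of-the-envelope calculation. It is only my reading of the rule above, not WeasyPrint's actual code, and the function names are made up for the example.

```python
def max_pixels(displayed_inches: float, dpi: int) -> int:
    """Upper bound on the pixels kept along one side of an image."""
    return round(displayed_inches * dpi)


def worth_downscaling(source_pixels: int, displayed_inches: float, dpi: int) -> bool:
    """True when the source image has more pixels than the dpi limit allows."""
    return source_pixels > max_pixels(displayed_inches, dpi)
```

For an image displayed 4 inches wide with dpi=150, the limit is 4 x 150 = 600 pixels, so a 2400-pixel-wide photo would be reduced while a 500-pixel-wide icon would not.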

We can also cache images that can be reused between different renderings. It will save us some network bandwidth and CPU. 😉

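from weasyprint import HTML

# share one cache between all the renderings
# recent WeasyPrint releases name this option cache; older ones used image_cache
cache = {}
for i in range(10):
    HTML(f'https://www.goodreads.com/quotes?page={i}').write_pdf(
        f'quotes-{i}.pdf', cache=cache
    )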

Web framework integration

When integrating WeasyPrint into a web framework, there may be some challenges with asset retrieval or rights management. Fortunately, there are integrations for some popular web frameworks like Flask or Django. They often come with a custom URL fetcher function to use with WeasyPrint.
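
To give an idea of what such a URL fetcher looks like, here is a minimal sketch, not the code of any of these integrations. A fetcher is a callable that takes a URL and returns a dict containing the content (string or file_obj) and optionally its mime_type. The assets:// scheme, the static folder, and the make_assets_fetcher name are all assumptions of mine for the example.

```python
import mimetypes
from pathlib import Path
from urllib.parse import urlparse


def make_assets_fetcher(assets_dir):
    """Return a fetcher serving assets:// URLs from `assets_dir` only."""
    root = Path(assets_dir).resolve()

    def fetch(url: str) -> dict:
        parsed = urlparse(url)
        # Anything that is not our hypothetical assets:// scheme is refused
        if parsed.scheme != "assets":
            raise ValueError(f"Blocked URL: {url}")

        # assets://img/logo.png -> netloc "img", path "/logo.png"
        relative = (parsed.netloc + parsed.path).lstrip("/")
        path = (root / relative).resolve()
        # Refuse paths escaping the folder, e.g. assets://../secret.txt
        if root not in path.parents:
            raise ValueError(f"Blocked path: {url}")

        mime_type, _ = mimetypes.guess_type(path.name)
        return {
            "string": path.read_bytes(),
            "mime_type": mime_type or "application/octet-stream",
        }

    return fetch
```

You would then pass it when building the document, for example HTML(string=html, url_fetcher=make_assets_fetcher('static')), so that an img tag pointing to assets://img/logo.png is read from the local folder and nothing is fetched from the network.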

Command line interface

There's also a small command-line interface that we can use to create PDFs quickly if we don't have any complicated configuration.

$ weasyprint https://weasyprint.org/ weasyprint.pdf

Before ending this article, I strongly recommend that you read the security section of the official documentation, where you will learn how to deal with untrusted source files, memory usage, long render times, etc.

This is the end of this article. I hope you enjoyed reading it. Take care and see you soon! 🙃

Woudar's Blog