We sometimes need to generate PDFs from invoices, reports, or tickets we send to clients. If your clients can already view these documents in a web application, you may not want to recreate them from scratch with a dedicated tool like ReportLab. In this article, I will show you a library capable of transforming HTML pages into beautiful PDFs. This library is called WeasyPrint.
Installation
You will need Python 3.8 or higher to use this library. Before installing it with your favorite package manager, you need an external dependency called Pango, which handles text rendering. How you install it varies between platforms. The official documentation has a dedicated section for it that I encourage you to read, but here are the instructions for common platforms.
# ubuntu
$ sudo apt install pango1.0-tools
# macOS
$ brew install pango
On Windows, you will need to run the GTK3 installer, which installs Pango automatically.
Once that is done, you can install WeasyPrint with your favorite package manager.
# with pip
$ pip install weasyprint
# with uv
$ uv pip install weasyprint
# with poetry
$ poetry add weasyprint
If you don’t know uv or poetry, I have introduction articles on both of them.
Usage
The usage is pretty straightforward. Let’s see how to create a PDF from the WeasyPrint landing page.
from weasyprint import HTML
HTML(url='https://weasyprint.org/').write_pdf('weasyprint-website.pdf')
Easy Peasy!
Note that you can customize the HTML rendered with your own CSS.
from weasyprint import HTML, CSS
HTML(url='https://weasyprint.org/').write_pdf(
    'weasyprint-website.pdf',
    stylesheets=[CSS(string='body { font-family: serif !important }')]
)
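A common customization for printed documents is the page size and margins, which CSS expresses with an @page rule. Here is a minimal sketch of that idea. The helper names page_css and write_with_page_setup and the default values are mine, for illustration only, and weasyprint is imported inside the function so the helper can be defined even where Pango is not installed.

```python
def page_css(size="A4", margin="2cm"):
    """Build an @page rule setting paper size and margins (hypothetical helper)."""
    return f"@page {{ size: {size}; margin: {margin} }}"


def write_with_page_setup(url, target, size="A4", margin="2cm"):
    """Render `url` to `target` with the given page setup (illustrative only)."""
    # Imported lazily: WeasyPrint needs Pango installed on the system.
    from weasyprint import HTML, CSS

    stylesheet = CSS(string=page_css(size, margin))
    HTML(url=url).write_pdf(target, stylesheets=[stylesheet])
```

Calling write_with_page_setup('https://weasyprint.org/', 'letter.pdf', size='Letter', margin='1in') should produce the same document on US Letter paper with one-inch margins.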
Instantiation methods
For HTML and CSS instantiation, we can use an absolute URL, a path to a local file, a readable file object, or a string.
import sys
from weasyprint import HTML
# with filename
HTML(filename='../foo.html')
# with URL
HTML(url='https://weasyprint.org')
# with file objects
with open('foo.html', 'r') as f:
    HTML(file_obj=f)
HTML(file_obj=sys.stdin)
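The string form is handy when the HTML is built in memory, for instance for the invoices mentioned in the introduction. The sketch below is only an illustration: build_invoice_html and invoice_pdf_bytes are hypothetical helpers of my own, and the invoice layout is invented. It relies on the fact that write_pdf returns the PDF as bytes when no target is given.

```python
from html import escape


def build_invoice_html(client, items):
    """Return an HTML invoice as a string (hypothetical helper).

    `items` is a list of (label, amount) pairs.
    """
    rows = "".join(
        f"<tr><td>{escape(label)}</td><td>{amount:.2f}</td></tr>"
        for label, amount in items
    )
    total = sum(amount for _, amount in items)
    return (
        f"<h1>Invoice for {escape(client)}</h1>"
        f"<table>{rows}</table>"
        f"<p>Total: {total:.2f}</p>"
    )


def invoice_pdf_bytes(client, items):
    """Render the invoice to PDF and return the raw bytes."""
    # Imported lazily: WeasyPrint needs Pango installed on the system.
    from weasyprint import HTML

    # With no target, write_pdf() returns the PDF content as bytes.
    return HTML(string=build_invoice_html(client, items)).write_pdf()
```

Escaping the client name and labels keeps user-supplied text from being interpreted as markup in the generated page.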
If your custom CSS involves @font-face rules, you must create a FontConfiguration object.
from weasyprint import HTML, CSS
from weasyprint.text.fonts import FontConfiguration
font_config = FontConfiguration()
html = HTML(string='<h1>The title</h1>')
css = CSS(string='''
@font-face {
font-family: Gentium;
src: url(https://example.com/fonts/Gentium.otf);
}
h1 { font-family: Gentium }''', font_config=font_config)
html.write_pdf(
    '/tmp/example.pdf', stylesheets=[css],
    font_config=font_config
)
Individual pages
The previously generated PDF has 6 pages. Sometimes we don't want all the pages, or we want to split them across several files. Here is what we can do to select specific pages.
from weasyprint import HTML
document = HTML(url='https://weasyprint.org/').render()
# render a specific page, the first one in this case
# note that it is recommended to create a copy of the document
# before processing individual pages
document.copy(document.pages[:1]).write_pdf('first_page.pdf')
# write odd and even pages separately:
# Lists count from 0 but page numbers usually from 1
# [::2] is a slice of even list indexes but odd-numbered pages.
document.copy(document.pages[::2]).write_pdf('odd_pages.pdf')
document.copy(document.pages[1::2]).write_pdf('even_pages.pdf')
Notes:
For the first page, you may wonder why the copy method takes document.pages[:1] as an argument. This is because copy always works with an iterable.
As said in the comments, always create a copy of the pages you want to work with.
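The same slicing idea extends to cutting a document into several files of a fixed number of pages. This is a rough sketch under my own assumptions: chunk_slices and split_pdf are hypothetical names, and the output file naming is arbitrary. Only render, pages, copy and write_pdf come from WeasyPrint.

```python
def chunk_slices(page_count, size):
    """Return slices that cut `page_count` pages into groups of `size`."""
    return [slice(start, start + size) for start in range(0, page_count, size)]


def split_pdf(url, size, prefix="part"):
    """Write one PDF per group of `size` pages (illustrative only)."""
    # Imported lazily: WeasyPrint needs Pango installed on the system.
    from weasyprint import HTML

    document = HTML(url=url).render()
    slices = chunk_slices(len(document.pages), size)
    for number, pages in enumerate(slices, start=1):
        # A slice of the page list is an iterable, which is what copy() expects.
        document.copy(document.pages[pages]).write_pdf(f"{prefix}-{number}.pdf")
```

For a 6-page document and a size of 4, this would write part-1.pdf with pages 1 to 4 and part-2.pdf with pages 5 and 6.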
Image optimization
By default, WeasyPrint does not optimize images, but we can enable optimizations to reduce the size of the generated PDF, at the cost of a slightly slower rendering. Here is an example.
from weasyprint import HTML
# Optimized lower-quality images, a bit slower, but generated PDF is smaller
HTML('https://weasyprint.org/').write_pdf(
    'weasyprint.pdf', optimize_images=True, jpeg_quality=60, dpi=150
)
Notes:
The optimize_images argument reduces image size with no quality penalty, but the rendering time may increase slightly.
The jpeg_quality argument decreases the quality of JPEG images included in the PDF. You can set a value from 95 (best quality) down to 0 (smallest image size), depending on your needs.
The dpi argument reduces the size (in pixels, and thus in bytes) of all included raster images. The resolution, set in dots per inch, indicates the maximum number of pixels included in one inch of the generated PDF.
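To get a feel for what a given dpi value means, here is a little arithmetic on the definition above. It is not WeasyPrint code: max_pixels and needs_downscaling are hypothetical helpers, and the exact rounding and resampling the library performs is an assumption on my side.

```python
import math


def max_pixels(displayed_inches, dpi):
    """Largest pixel dimension kept for an image shown at `displayed_inches`."""
    return math.ceil(displayed_inches * dpi)


def needs_downscaling(image_pixels, displayed_inches, dpi):
    """True if the image holds more pixels than the dpi cap allows."""
    return image_pixels > max_pixels(displayed_inches, dpi)
```

With dpi=150, an image displayed 4 inches wide keeps at most 600 pixels in width, so a 2000-pixel-wide photo would be reduced while a 500-pixel-wide logo would be left alone.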
We can also cache images that can be reused between different renderings. It will save us some network bandwidth and CPU. 😉
cache = {}
for i in range(10):
    HTML(f'https://www.goodreads.com/quotes?page={i}').write_pdf(
        f'quotes-{i}.pdf', image_cache=cache
    )
Web framework integration
When integrating WeasyPrint into a web framework, there may be some challenges with asset retrieval or rights management. Fortunately, there are integrations for some popular web frameworks like Flask or Django. They often come with a custom URL fetcher function to use with WeasyPrint.
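To give an idea of what such a URL fetcher looks like, here is a minimal sketch that serves assets from a local folder and refuses anything outside of it. Everything specific here is an assumption: the assets: scheme and the make_asset_fetcher helper are invented for the example, and the function-returning-a-dict interface is the one I know from the WeasyPrint documentation, so check the documentation for the version you have installed.

```python
import mimetypes
from pathlib import Path

ASSET_SCHEME = "assets:"  # hypothetical scheme, not part of WeasyPrint


def make_asset_fetcher(asset_dir):
    """Return a fetcher that serves `assets:<name>` URLs from `asset_dir`."""
    root = Path(asset_dir).resolve()

    def fetcher(url):
        if url.startswith(ASSET_SCHEME):
            path = (root / url[len(ASSET_SCHEME):]).resolve()
            # Refuse paths that escape the asset directory.
            if root not in path.parents:
                raise ValueError(f"forbidden asset path: {url}")
            result = {"string": path.read_bytes()}
            mime_type, _ = mimetypes.guess_type(path.name)
            if mime_type:
                result["mime_type"] = mime_type
            return result
        # Anything else goes through WeasyPrint's default fetcher.
        # Imported lazily: WeasyPrint needs Pango installed on the system.
        from weasyprint import default_url_fetcher

        return default_url_fetcher(url)

    return fetcher
```

The fetcher would then be passed at instantiation, for example HTML(string=html, url_fetcher=make_asset_fetcher('static')), so that an image referenced as assets:logo.png is read from the static folder instead of the network.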
Command line interface
There's also a small command-line interface that we can use to create PDFs quickly if we don't have any complicated configuration.
$ weasyprint https://weasyprint.org/ weasyprint.pdf
Before ending this article, I strongly recommend that you read the security section, where you will learn how to deal with untrusted source files, memory, long render times, etc.
This is the end of this article. I hope you enjoyed reading it. Take care and see you soon! 🙃