My favorite way to configure a Python project
Pydantic-settings at our rescue
The first thing I want to note is that the following recommendations are valid for web applications or API services. For other types of applications, like command-line interfaces, they might not be the best fit, even though I think they are still relevant.
Also, what I mean by configuration are things that can be changed between deploys or environments like:
database server URLs
secret key used to sign cookies
credentials to connect to an external service
etc…
I don’t include internal application configurations like route URLs that should be handled directly in code.
There are various ways to configure your application. We will go over the most common ones, then the one I use and how I use it.
Configuration with files
A common approach I see in various projects is to use files to configure the application. In Python, many formats are supported directly via the standard library or through third-party packages. The common ones are:
INI: This format is well-known, in part because it is often used to configure Python packages.
TOML: This format is also used to configure Python packages and is therefore well-known by Pythonistas.
JSON: I don't think this choice needs much explanation. JSON is everywhere nowadays when APIs communicate with users or other APIs, so some developers find it appealing for configuration too.
YAML: Another popular format used by various tools like Ansible for IT automation, Kubernetes for deployment, GitHub Actions for CI/CD, etc.
While these file formats are familiar to Python developers, I don’t prefer this method for the following reasons:
They must not be checked into a version control system like Git, to prevent credential leakage, but it is easy to commit them by mistake. Fortunately, if you do code reviews or use tools that scan for leaked credentials in your CI pipeline, the risk is mitigated, but it is still there.
They tend to be scattered across different places, and we can end up with several file formats in the same project.
They are framework/language-related.
They are sometimes multiplied by the number of environments we have in a project, like test, pre-production, production, etc… This can quickly become tricky to follow.
Environment variables
This is my favorite way nowadays to configure an application. Environment variables are:
Language- and OS-agnostic: they are available in every programming language.
Simple to use.
Easy to change between deploys without touching code, which limits the risk of committing credentials to the repository.
There is also the alternative of using a .env file to populate the environment, but I don't like that solution either, as it leads back to the file-related problems listed above.
Also, using environment variables adheres to the twelve-factor app methodology, which I tend to follow since it is a proven method for deploying and scaling applications.
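To make the idea concrete, here is a sketch of deploy-time configuration from the shell. The variable names and the `myapp` module are illustrative, not from a real project:

```shell
# Hypothetical deploy-time configuration: the same code runs in every
# environment; only the variables change.
export MY_APP_DEBUG=0
export MY_APP_DATABASE_URL="postgresql://user:pass@db.example.com:5432/app"

# The application would then read them at startup, e.g.:
#   python -m myapp
echo "$MY_APP_DATABASE_URL"
```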
Use environment variables in Python
In Python, we can use the os module to get environment variables in our code.
import os
import json
DATABASE_URL = os.getenv('DATABASE_URL', 'sqlite:///db.sqlite3')
SECRET_KEY = os.getenv('SECRET_KEY', 'mysecretkey')
# bool() on a non-empty string is always True, so we compare the raw value instead
DEBUG = os.getenv('DEBUG', 'false').lower() in ('1', 'true', 'yes')
ALLOWED_HOSTS = json.loads(os.getenv('ALLOWED_HOSTS', '["foo.com", "bar.foo.com"]'))
Easy peasy! But… you may have noticed a few quirks when getting the DEBUG and ALLOWED_HOSTS environment variables. We need to parse them to the correct type, and this is often necessary because not all configuration values are strings! 😜
It can quickly become tricky to parse environment variables into the correct types, but fortunately the Python ecosystem has a killer open-source project to tackle this issue: pydantic-settings.
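To see why naive conversion is a trap, consider booleans: calling bool() on any non-empty string returns True, even for "false" or "0". A common workaround is a small helper like the one below (env_bool is a hypothetical name, not part of the standard library):

```python
import os

# bool() on a non-empty string is always True, even for "False" or "0"
os.environ['DEBUG'] = 'False'
print(bool(os.environ['DEBUG']))  # True -- not what we want!

# A hand-rolled workaround: compare the raw string against known truthy values
def env_bool(name: str, default: bool = False) -> bool:
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in ('1', 'true', 'yes', 'on')

print(env_bool('DEBUG'))  # False: "False" is now correctly read as falsy
```

This works, but writing such a helper for every type (booleans, lists, URLs…) gets old quickly, which is exactly the gap pydantic-settings fills.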
To install it, you will need Python 3.8 or higher.
$ pip install pydantic-settings
# or with poetry
$ poetry add pydantic-settings
If you don't know poetry, I have an introduction here.
To demonstrate the usage, let’s re-write the previous example with it.
from typing import Union, List
from pydantic import PostgresDsn, RedisDsn, Field, AliasChoices, StringConstraints, SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict
from typing_extensions import Annotated
class Settings(BaseSettings):
    # env_prefix tells Pydantic to check for environment
    # variables starting with "MY_APP_"
    model_config = SettingsConfigDict(env_prefix='my_app_')
    debug: bool = False
    database_url: Union[str, PostgresDsn] = 'sqlite:///db.sqlite3'
    # here we have three environment variable names where the Redis DSN could be found
    redis_dsn: RedisDsn = Field(
        'redis://user:pass@localhost:6379/1',
        validation_alias=AliasChoices('service_redis_dsn', 'redis_url', 'my_app_redis_dsn'),
    )
    allowed_hosts: List[Annotated[str, StringConstraints(max_length=255)]] = ['foo.com', 'bar.foo.com']
    secret_key: SecretStr = 'my super secret key'

if __name__ == '__main__':
    settings = Settings()
    print(settings.model_config)
    print(settings.debug)
    print(settings.database_url)
    print(settings.redis_dsn)
    print(settings.allowed_hosts)
    print(settings.secret_key)
The best way to leverage pydantic-settings is to already know Pydantic. It is a fantastic library for data validation using type annotations for that exact purpose.
Nevertheless, to summarize what is done in the previous code:
Our settings class needs to inherit from the BaseSettings class.
model_config is a dictionary where we specify some metadata. Here we tell Pydantic to search for environment variables starting with my_app_. Adding a prefix to your environment variables is always a good idea to avoid name clashes with variables already present on the target machine.
Environment variables can be lowercase or uppercase. If we don't want this behavior, we can set the case_sensitive property of the SettingsConfigDict object to True.
After the model_config declaration, we define the attributes we want to retrieve from the environment.
debug: bool = False: This means Pydantic must check an environment variable called MY_APP_DEBUG and parse its value to a boolean. To learn the parsing rules, I recommend you check this section of the documentation. If the environment variable is not present, False is set by default.
database_url: Union[str, PostgresDsn] = 'sqlite:///db.sqlite3': Again, the database URL must be retrieved from an environment variable called MY_APP_DATABASE_URL. It can be a simple string (for SQLite) or it must validate as a PostgreSQL URL.
redis_dsn: RedisDsn = Field('redis://user:pass@localhost:6379/1', validation_alias=AliasChoices('service_redis_dsn', 'redis_url', 'my_app_redis_dsn')): Here we define a Redis URL. What is particular in this case is the AliasChoices value. It tells Pydantic to search the listed environment variables in the defined order, starting with service_redis_dsn, for the Redis URL.
allowed_hosts: List[Annotated[str, StringConstraints(max_length=255)]] = ['foo.com', 'bar.foo.com']: This time we use an Annotated field to define a constraint on each domain name: its length must not exceed 255 characters.
secret_key: SecretStr = 'my super secret key': Here we use the SecretStr type. It is a subclass of str with the particularity that we can't accidentally expose its value using, for example, the built-in print function. This avoids leaking sensitive information in logs.
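To illustrate that last point, SecretStr masks its value when printed and only reveals it through an explicit get_secret_value() call:

```python
from pydantic import SecretStr

secret = SecretStr('my super secret key')
print(secret)                     # ********** -- safe to pass to a logger
print(secret.get_secret_value())  # my super secret key -- the real value
```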
I use this pattern in my Django projects. I create a config.py file near the settings.py file where I create my Pydantic settings class with all the configuration I want and I use an instance of it in settings.py.
For testing, there are two (or three) choices:
Use default values when creating the Pydantic settings class. These values will serve for testing.
Use something like pytest's monkeypatch fixture (by the way, you should probably use pytest for testing these days if you don't already) to set environment variables before instantiating the test application.
Mix the two previous approaches.
I highly recommend you check Pydantic if you don’t know it already. It will save you from some data validation headaches.
That is all for this article; I hope you enjoyed reading it. Take care of yourself and see you soon. 🙂
