mic · Jan 31, 2024
The best way to share your Python project and let others install it is by building and distributing a package.
For example, to share a library for other developers to use in their applications, or to distribute a development tool like pytest.
An advantage of this method of distribution is its well established ecosystem of tools such as PyPI and pip, which make it easy for other developers to download and install your package either for casual experiments, or as part of large, professional systems.
Packaging in this context means two things: building a package and distributing it.
In this article, I’ll talk about the best practices when building a Python package.
Disclaimer: What follows is valid if you are writing a simple package, such as a pure-Python package. If your package has more complicated needs, you should still use pyproject.toml-based builds and set up your project metadata as instructed below. However, you may have additional considerations when choosing a backend.
Let’s start from the beginning though ….
Python packages are basically bundles of Python code (usually as a compressed file archive) in a particular format that can be distributed to other people and installed by a tool like pip.
Nowadays there are basically two types of Python packages in use:

- Source distributions (.tar.gz): a snapshot of the source code with a manifest file that includes metadata like Name, Version, Summary, Author, etc.
- Wheel packages (.whl): an improvement on the Egg format and now the recommended format.
Now the question is, how do we bundle our source code into a distributable Python package? Keep reading to discover how (and the best way!).
When one thinks about building a Python package, the most frequent words that pop up are setup.py, setup.cfg, setuptools, … aren’t they? Well… these are the “old standards” (as you can see in the Python packaging documentation, packaging using these tools is considered outdated).
What are the “new standards” then? Well… give a warm welcome to pyproject.toml-based packaging!
Building with the “old standards” and why you should avoid that
In the “old” way, you use a third-party tool (usually setuptools) alongside the files setup.py and setup.cfg to build your package.
It is as easy as executing the following command:
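The classic invocation is run from the project root containing setup.py; this style is shown for illustration only, since it is now deprecated in favor of pyproject.toml-based builds:

```shell
# Build a source distribution and a wheel with the "old" setuptools style.
python setup.py sdist bdist_wheel
```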
Where sdist stands for source distribution and bdist_wheel, well … it stands for a built distribution using the wheel format.
So far, so easy right? But wait … what are the two main problems with the “old” building procedure?
Remember that the “old” way of packaging was done with a setup.py file. You’d write a call to the setup function imported from the setuptools package that defined all of your package metadata.
First, this was unstructured data! You had to run this setup.py Python script to even read that data, because it was specified in code. It was more complex than it needed to be, and there were not a lot of guardrails to prevent bad practices. Because it was a script, people could write arbitrary code to dynamically do arbitrary things.
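To make this concrete, here is a minimal sketch of an old-style setup.py; all names and versions are hypothetical, and the sys.argv line is only there so the snippet runs standalone:

```python
# Hypothetical old-style setup.py: package metadata specified as code,
# readable only by executing this script.
import sys
from setuptools import setup

# Simulate "python setup.py --name" so this snippet runs on its own;
# a real invocation would pass commands like sdist or bdist_wheel.
sys.argv = ["setup.py", "--name"]

dist = setup(
    name="example-pkg",                  # hypothetical package name
    version="0.1.0",
    description="An example package",
    packages=["example_pkg"],            # hypothetical package directory
    install_requires=["requests>=2.0"],  # runtime dependencies, declared in code
)
```

Note that a tool cannot learn the name or dependencies without running this script, which is exactly the problem described above.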
Another problem with setup.py files was dealing with build-time dependencies.
Consider the following situation: let’s say you have a package that requires PyTorch to build its C++ and CUDA extensions. You would import things from torch in your setup.py. That meant you had to install torch in your environment before you could even pip install the package from source. Even worse, simply listing torch as a required dependency wasn’t enough! Even listing torch together with your package in a requirements.txt file wasn’t enough! Pip could not even check that package’s dependencies: pip would need to run setup.py to get the dependencies, but setup.py couldn’t run because it needed to import torch. Why do you need to install dependencies and run code just to read a bunch of static metadata? Why are there so many steps to install a Python package? In addition, if you only needed certain dependencies for building and not at runtime, it would be annoying for your users to have to install and carry around those dependencies in their environments.
For this and other reasons, you should build your Python package using the new standards.
Building with the “new standards”
The “new standards” refer to a standardized way to specify package metadata (things like package name, author, dependencies) in a pyproject.toml file, and a standardized way to build packages from source code using that metadata (the job of the build back-end). You often see this referred to as “pyproject.toml-based builds.”
These two components are founded on two main PEPs (Python Enhancement Proposals), which are basically design documents about new standards or features in Python: PEP 517 and PEP 621.
Pyproject.toml and PEP 621
The pyproject.toml file acts as a configuration file for packaging-related tools (as well as other tools). It contains package metadata such as package name, author, dependencies, etc.

The pyproject.toml file is written in TOML. Three tables (the TOML term for a section) are currently specified, namely [build-system], [project] and [tool]. Other tables are reserved for future use (tool-specific configuration should go under the [tool] table).

The definition of what goes under [project] is what PEP 621 standardizes.
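As an illustration, a minimal PEP 621 [project] table might look like this; all names and values are hypothetical:

```toml
[project]
name = "example-pkg"
version = "0.1.0"
description = "A small example package"
readme = "README.md"
requires-python = ">=3.8"
authors = [{ name = "Jane Doe", email = "jane@example.com" }]
dependencies = ["requests>=2.28"]
```

Any PEP 621-compliant tool can read this table without executing any code.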
Back-end and PEP 517
You may wonder: What is a back-end exactly?
The backend is the program that reads your pyproject.toml and actually does the work of turning your source code into a package archive that can be installed or distributed. The frontend is just a user interface (usually a command-line program) that calls the backend.
- Examples of frontends include: pip, build, poetry, hatch, pdm, flit
- Examples of backends include: setuptools (>=61), poetry-core, hatchling, pdm-backend, flit-core
The design of separating these two pieces in the build workflow means that — in principle — you can mix and match frontends and backends. But that’s food for another post :-)
To use a specific backend, simply include a [build-system] section in your pyproject.toml that looks like this:
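For instance, declaring flit as the backend (the version constraint shown is typical but should be checked against flit’s docs):

```toml
[build-system]
requires = ["flit_core>=3.4,<4"]
build-backend = "flit_core.buildapi"
```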
In the example above flit is used as backend. Check the documentation of your chosen backend to see how to specify it in the pyproject.toml file. Below you can see some backend examples and the way they need to be specified in the pyproject.toml file.
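For reference, here is how a few popular backends are typically declared; a project uses exactly one [build-system] table, so the alternatives are shown as comments (check each backend’s documentation for current version constraints):

```toml
# hatchling:
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

# setuptools (>=61):
#   requires = ["setuptools>=61"]
#   build-backend = "setuptools.build_meta"

# pdm-backend:
#   requires = ["pdm-backend"]
#   build-backend = "pdm.backend"

# poetry-core:
#   requires = ["poetry-core"]
#   build-backend = "poetry.core.masonry.api"
```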
What PEP 517 specifies is, among other things, the set of mandatory hooks (such as build_wheel, or build_sdist). This is essentially the interface the backend should implement.
So, backends implementing the PEP 517 specification are usually referred to as “PEP 517 backends”.
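The hook interface can be sketched in a few lines; this is an illustrative stub, not a real backend (real backends like setuptools or hatchling do the actual archive building), and the filenames are placeholders:

```python
# Sketch of the mandatory PEP 517 hooks a build backend must implement.
# A frontend (pip, build, ...) imports the module named by "build-backend"
# in pyproject.toml and calls these functions.

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # A real backend writes a .whl file into wheel_directory
    # and returns its basename.
    return "example_pkg-0.1.0-py3-none-any.whl"

def build_sdist(sdist_directory, config_settings=None):
    # A real backend writes a .tar.gz sdist into sdist_directory
    # and returns its basename.
    return "example_pkg-0.1.0.tar.gz"
```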
Now the question is: How do I choose a build backend?
If you are writing a simple package in pure Python, you will more or less get the same outcome with any backend that is compliant with both PEP 517 ([build-system]) and PEP 621 ([project]).
However, in my opinion there are two main considerations that should drive which backend you choose:
- Extra features related to the build process, such as customizing which extra files get included in your built package, running code at build time, and handling dynamic fields. The most commonly supported dynamic field is the package version; many backends support reading it from the source code or from the repository’s version control system. If you want or need those kinds of features, you will need a build backend that supports them, and you will then need to set backend-specific configuration in your pyproject.toml. For example, you’d include this in your pyproject.toml to tell the hatchling backend how to include or exclude files in your build:
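A sketch, with hypothetical file paths, using hatchling’s [tool.hatch.build] table:

```toml
[tool.hatch.build]
include = ["src/example_pkg/**/*.py", "README.md"]
exclude = ["tests/**"]
```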
- Integration with frontend systems that provide additional project and workflow management functionality. Many of these backends are strongly coupled to powerful workflow management programs. Examples are poetry/poetry-core, hatch/hatchling, and pdm/pdm-backend. These frontends actually do significantly more than act as a frontend to the build process. Many of them are comprehensive workflow management tools for Python projects, with functionality such as:
- Creating and managing your virtual environments.
- Commands for managing and installing your dependencies.
- Dependency lockfiles.
- Commands for uploading your package to PyPI.
Below you can see an example of a complete pyproject.toml file using PDM as backend.
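A sketch of such a file, with hypothetical names and versions; the version is declared dynamic and read from version control by pdm-backend:

```toml
[project]
name = "example-pkg"
description = "A small example package"
authors = [{ name = "Jane Doe", email = "jane@example.com" }]
requires-python = ">=3.8"
dependencies = ["requests>=2.28"]
dynamic = ["version"]

[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

[tool.pdm.version]
source = "scm"

[tool.pdm.build]
includes = ["src/example_pkg"]
```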
As you can see:

- The project is compliant with PEP 621 (the [project] table).
- The backend is PDM, which is a PEP 517 backend (the [build-system] table).
- There are backend-specific build features (the [tool.pdm.*] tables).
Summary
If you want to create a Python package using the modern packaging standards:

- Declare a build backend in the [build-system] table of your pyproject.toml file.
- Declare your project metadata in the [project] table of your pyproject.toml file.
To choose a backend to use:
- If you have a simple project, it doesn’t really matter. Pick any backend that supports PEP 517 and PEP 621. Currently popular choices include flit-core, hatchling, pdm-backend, and setuptools (>=61). At the time of writing Poetry is PEP 517 compliant but not PEP 621 compliant.
- If you have more sophisticated build feature needs, compare the build-specific features that get configured in the [tool.*] tables of pyproject.toml.
- If you want a comprehensive workflow tool that manages other things like virtual environments and dependencies, compare the tools on those features. Poetry, Hatch, and PDM are each currently quite popular, though Poetry isn’t yet up-to-date on all standards.
I personally like PDM for several reasons: it is both PEP 517 and PEP 621 compliant, and it provides nice build-time features alongside a powerful front-end with several workflow management options.
I hope this helped to clarify the current status of packaging standards in Python and … happy packaging! :-)