There is an area of Python that many developers have problems with: packaging. It has seen many solutions pop up over the years, along with many opinions, wars, and attempts to fix it. Many developers have complained about the packaging ecosystem and tools making their lives harder. Many beginners are confused by virtual environments. But does it have to be this way? Are the current solutions to packaging problems any good? And is the organization behind most of the packaging tools and standards part of the problem itself?
Join me on a journey through packaging in Python and elsewhere. We’ll start by describing the classic packaging stack (involving setuptools and friends), the scientific stack (with conda), and some of the modern/alternate tools, such as Pipenv, Poetry, Hatch, and PDM. Then we’ll look at some examples of packaging and dependency-related workflows seen elsewhere (Node.js and .NET). Finally, we’ll take a glimpse at a possible future (a venv-less workflow with PDM), and see if the PyPA agrees with the vision and insights of eight thousand users.
The plethora of tools
There are many packaging-related tools in Python. All of them have different authors, lineages, and often different opinions, although most of them are now unified under the Python Packaging Authority (PyPA) umbrella. Let’s take a look at them.
The classic stack
The classic Python packaging stack consists of many semi-related tools. Setuptools, probably the oldest tool of the group, and itself based on distutils, which is part of the standard library (although it will be removed in Python 3.12), is responsible for installing a single package. It previously used setup.py files to do its job, which required arbitrary code execution. It then added support for non-executable metadata specification formats: setup.cfg, and pyproject.toml (the latter still partially in beta). However, you aren’t supposed to use setup.py files directly these days; instead, you’re supposed to use pip. Pip installs packages, usually from PyPI, but it can also use other sources (such as git repositories or the local filesystem). But where does pip install things? The default used to be a global, system-wide install, which meant you could introduce conflicts between packages installed by pip and by apt (or whatever the system package manager is). Even with a user-wide install (which pip is likely to attempt these days), you can still end up with conflicts, including cases in which package A requests X version 1.0.0 while package B expects X version 2.0.0, even though A and B are not at all related and could happily live separately with their preferred versions of X. Enter venv, a standard library descendant of virtualenv, which can create a lightweight virtual environment for packages to live in. This virtual environment gives you the separation from system packages and from different environments, but it is still tied to the system Python in some ways (and if the system Python disappears, the virtual environment stops working).
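To make this concrete, a minimal classic-stack workflow looks roughly like this (the package name is purely an example):

$ python3 -m venv .venv
$ source .venv/bin/activate
(.venv) $ pip install requests
(.venv) $ python -c "import requests; print(requests.__version__)"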
A few extra tools would be used in a typical packaging workflow. The wheel package enhances Setuptools with the ability to generate wheels, which are ready-to-install (without running setup.py). Wheels can either be pure-Python and be installed anywhere, or they can contain pre-compiled extension modules (things written in C) for a given OS and Python (and there’s even a standard that allows building and distributing one wheel for all typical Linux distros). The wheel package should be an implementation detail, something existing inside Setuptools and/or pip, but users need to be aware of it if they want to make wheels on their system, because virtual environments produced by venv do not have wheel installed. Regular users who do not maintain their own packages may sometimes be told that pip is using something legacy because wheel is not installed, which is not a good user experience. Package authors also need twine, whose sole task is uploading source distributions or wheels, created with other tools, to PyPI (and there’s not much more to say about that tool).
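As a rough sketch (using the separate build package, which isn’t covered above, as the build front end), producing and uploading a release looks something like this:

$ pip install build twine
$ python -m build          # creates dist/*.tar.gz and dist/*.whl
$ twine upload dist/*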
…and a few extensions
Over the years, there have been a few tools that build on the classic stack. For example, pip-tools can simplify dependency management. While pip freeze lets you produce a file with everything installed in your environment, there is no way to specify just the dependencies you need and get a lock file with specific versions and transitive dependencies (without installing and freezing everything), there is no easy way to skip development dependencies (e.g. IPython) when you pip freeze, and there is no workflow to update all your dependencies with just pip. pip-tools adds two tools: pip-compile, which takes in requirements.in files with the packages you care about and produces a requirements.txt with pinned versions of them and all transitive dependencies; and pip-sync, which installs what requirements.txt lists and removes things not listed in it.
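For example (the package name is arbitrary):

$ echo "django" > requirements.in
$ pip-compile requirements.in      # writes requirements.txt with pinned versions of Django and its dependencies
$ pip-sync requirements.txt        # makes the environment match requirements.txt exactly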
Another tool that might come in useful is virtualenvwrapper, which can help you manage (create and activate) virtual environments in a central location. It has a few bells and whistles (such as custom hooks to do actions on every virtualenv creation), although for basic usage, you could replace it with a single-line shell function.
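For instance, something like this hypothetical one-liner (mimicking virtualenvwrapper’s workon command, and assuming environments live in ~/virtualenvs) covers the activation part:

workon() { source ~/virtualenvs/"$1"/bin/activate; }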
Yet another tool that works alongside the classic toolset is pipx, which creates and manages virtual environments for apps written in Python. You tell it to pipx install Nikola, and it will create a virtual environment somewhere, install Nikola into it, and put a script for launching it in ~/.local/bin. While you could do it all yourself with venv and some symlinks, pipx can take care of this, and you don’t need to remember where the virtual environment is.
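In practice (output trimmed; the exact path depends on your setup):

$ pipx install Nikola
$ which nikola
/home/user/.local/bin/nikola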
The scientific stack and conda
The scientific Python community has had its own tools for many years. The conda tool can manage environments and packages. It doesn’t use PyPI and wheels, but rather packages from conda channels (which are prebuilt, and expect an Anaconda-distributed Python). Back in the day, when there were no wheels, this was the easiest way to get things installed on Windows; this is not as much of a problem now, with binary wheels available on PyPI, but the Anaconda stack is still popular in the scientific world. Conda packages can be built with conda-build, which is separate from, but closely related to, conda itself. Conda packages are not compatible with pip in any way; they do not follow the packaging standards used by the other tools. Is this good? No, because it makes integrating the two worlds harder, but also yes, because many problems that apply to scientific packages (and their C/C++ extension modules, their high-performance numeric libraries, and other things) do not apply to other uses of Python, so having a separate tool lets people focused on those other uses simplify their workflows.
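A typical conda session (the package choices are just examples):

$ conda create -n science python=3.11 numpy pandas
$ conda activate science
(science) $ python -c "import numpy; print(numpy.__version__)"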
The new tools
A few years ago, new packaging tools appeared. Now, there were lots of “new fancy tools” introduced in the past, with setuptools extending distutils, then distribute forking setuptools, then distribute being merged back…
The earliest “new tool” was Pipenv. Pipenv had really terrible and misleading marketing, and it merged pip and venv, in that Pipenv would create a venv and install packages into it (from Pipfile or Pipfile.lock). Pipenv can place the venv in the project folder, or hide it somewhere in a central location in your home directory (the latter is the default). However, Pipenv does not handle anything related to packaging your code, so it’s useful only for developing non-installable applications (Django sites, for example). If you’re a library developer, you need setuptools anyway.
The second new tool was Poetry. It manages environments and dependencies in a similar way to Pipenv, but it can also build .whl files with your code, and it can upload wheels and source distributions to PyPI. This means it has pretty much all the features the other tools have, except you need just one tool. However, Poetry is opinionated, and its opinions are sometimes incompatible with the rest of the packaging scene. Poetry uses the pyproject.toml standard, but it does not follow the standard specifying how metadata should be represented in a pyproject.toml file (PEP 621), instead using a custom [tool.poetry] table. This is partly because Poetry came out before PEP 621, but that PEP was accepted over two years ago; the biggest compatibility problem is Poetry’s Node-inspired ~ and ^ dependency version markers, which are not compatible with PEP 508 (the dependency specification standard). Poetry can package C extension modules, although it uses setuptools’ infrastructure for this (and requires a custom build.py script).
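The whole lifecycle with Poetry looks roughly like this:

$ poetry new mypackage
$ cd mypackage
$ poetry add requests       # records a ^-style constraint under [tool.poetry.dependencies]
$ poetry build              # produces dist/*.whl and dist/*.tar.gz
$ poetry publish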
Another similar tool is Hatch. This tool can also manage environments (it allows multiple environments per project, but it does not allow putting them in the project directory), and it can manage packages (but without lockfile support). Hatch can also be used to package a project (with PEP 621-compliant pyproject.toml files) and upload it to PyPI. It does not support C extension modules.
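Roughly:

$ hatch new myproject
$ cd myproject
$ hatch run python -c "print('hello')"   # runs inside a managed environment, creating it if needed
$ hatch build
$ hatch publish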
A tool that tries to be a simpler re-imagining of Setuptools is Flit. It can build and install a package using a pyproject.toml file. It also supports uploads to PyPI. It lacks support for C extension modules, and it expects you to manage environments on your own.
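For instance:

$ flit init        # asks a few questions and writes pyproject.toml
$ flit build       # produces a wheel and an sdist in dist/
$ flit publish     # uploads to PyPI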
There’s one more interesting (albeit not popular or well-known) tool. This tool is PDM. It can manage venvs (but it defaults to the saner .venv location), manage dependencies, and it uses a standards-compliant pyproject.toml. There’s also a curious little feature called PEP 582 support, which we’ll talk about later.
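A quick taste (app.py stands for whatever you’re working on):

$ pdm init             # asks a few questions and writes a PEP 621 pyproject.toml
$ pdm add requests     # updates pyproject.toml and pdm.lock, installs into .venv
$ pdm run python app.py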
Does Python really need virtual environments?
Python relies on virtual environments for separation between projects. Virtual environments (aka virtualenvs or venvs) are folders with symlinks to a system-installed Python, and their own set of site-packages. There are a few problems with them:
How to use Python from a virtual environment?
There are two ways to do this. The first one is to activate it, by running the activate shell script installed in the environment’s bin directory. Another is to run the python executable (or any other script in the bin directory) directly from the venv. [2]
Activating venvs is more convenient for developers, but it also has some problems. Sometimes, activation fails to work, due to the shell caching the locations of things in $PATH. Also, beginners are taught to activate and run python, which means they might be confused and try to use activate in scripts or cron jobs (but in those environments, you should not activate venvs, and instead use the Python executable directly). Virtual environment activation is more state you need to be aware of, and if you forget about it, or if it breaks, you might end up messing up your user-wide (or worse, system-wide) Python packages.
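The difference in practice (paths are illustrative):

$ source ~/virtualenvs/myproject/bin/activate
(myproject) $ python script.py                      # uses the venv's Python and packages
$ ~/virtualenvs/myproject/bin/python script.py      # same effect, no activation; better for cron jobs and services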
How to manage virtual environments?
The original virtualenv tool, and its simplified standard library rewrite venv, allow you to put a virtual environment anywhere in the file system, as long as you have write privileges there. This has led to people and tools inventing their own standards. Virtualenvwrapper stores environments in a central location, and does not care about their contents. Pipenv and Poetry allow you to choose (either a central location or the .venv directory in the project), and environments are tied to a project (they will use the project-specific environment if you’re in the project directory). Hatch stores environments in a central location, and it allows you to have multiple environments per project (but there is no option to share environments between projects).
Brett Cannon has recently done a survey, and it has shown the community is split on their workflows: some people use a central location, some put them in the project directory, some people have multiple environments with different Python versions, some people reuse virtualenvs between projects… Everyone has different needs, and different opinions. For example, I use a central directory (~/virtualenvs) and reuse environments when working on Nikola (sharing the same environment between development and 4 Nikola sites). But on the other hand, when deploying web apps, the venv lives in the project folder, because this venv needs to be used by processes running as different users (me, root, or the service account for the web server, which might have interactive login disabled, or whose home directory may be set to something ephemeral).
So: does Python need virtual environments? Perhaps looking how other languages handle this problem can help us figure this out for Python?
How everyone else is doing it
We’ll look at two ecosystems. We’ll start with JavaScript/Node.js (with npm), and then we’ll look at the C#/.NET (with dotnet CLI/MSBuild) ecosystem for comparison. We’ll demonstrate a sample flow of making a project, installing dependencies in it, and running things. If you’re familiar with those ecosystems and want to skip the examples, continue with How is Node better than Python? and Are those ecosystems’ tools perfect?. Otherwise, read on.
JavaScript/Node.js (with npm)
There are two tools for dealing with packages in the Node world, namely npm and Yarn. The npm CLI tool is shipped with Node, so we’ll focus on it.
Let’s create a project:
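Something along these lines (using npm init -y to accept all the defaults; the original session is reconstructed, not quoted):

$ mkdir mynpmproject
$ cd mynpmproject
$ npm init -y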
We’ve got a package.json file, which has some metadata about our project (name, version, description, license). Let’s install a dependency:
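Roughly (timing and exact wording vary between npm versions):

$ npm install is-even

added 5 packages, and audited 6 packages in 1s

found 0 vulnerabilities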
The mere existence of an is-even package is questionable; the fact that it pulls in four dependencies is yet another questionable thing; and the fact that it depends on is-odd is even worse. But this post isn’t about is-even or the Node ecosystem’s tendency to use tiny packages for everything (but I wrote one about this topic before). Let’s look at what we have in the filesystem:
$ ls
node_modules/  package.json  package-lock.json
$ ls node_modules
is-buffer/  is-even/  is-number/  is-odd/  kind-of/
Let’s also take a peek at the package.json file:

{
  "name": "mynpmproject",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "is-even": "^1.0.0"
  }
}
Our package.json file now lists the dependency, and we’ve also got a lock file (package-lock.json), which records all the dependency versions used for this install. If this file is kept in the repository, any future attempts to npm install will use the dependency versions listed in this file, ensuring everything will work the same as it did originally (unless one of those packages were to get removed from the registry).
Let’s try writing a trivial program using the module and try running it:
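A minimal index.js along these lines will do (reconstructed, not the exact original; is-even exports a single function):

$ cat index.js
const isEven = require('is-even');
console.log(isEven(42));
$ node index.js
true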
Let’s try removing is-odd to demonstrate how badly designed this package is:
$ rm -rf node_modules/is-odd
$ node index.js
node:internal/modules/cjs/loader:998
  throw err;
  ^

Error: Cannot find module 'is-odd'
Require stack:
- /tmp/mynpmproject/node_modules/is-even/index.js
- /tmp/mynpmproject/index.js
    at Module._resolveFilename (node:internal/modules/cjs/loader:995:15)
    at Module._load (node:internal/modules/cjs/loader:841:27)
    at Module.require (node:internal/modules/cjs/loader:1061:19)
    at require (node:internal/modules/cjs/helpers:103:18)
    at Object.<anonymous> (/tmp/mynpmproject/node_modules/is-even/index.js:10:13)
    at Module._compile (node:internal/modules/cjs/loader:1159:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12)
    at Module.require (node:internal/modules/cjs/loader:1061:19) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/tmp/mynpmproject/node_modules/is-even/index.js',
    '/tmp/mynpmproject/index.js'
  ]
}

Node.js v18.12.1
How is Node better than Python?
Badly designed packages aside, we can see an important difference from Python in that there is no virtual environment, and all the packages live in the project directory. If we fix the node_modules directory by running npm install, we can see that I can run the script from somewhere else on the file system:
$ pwd
/tmp/mynpmproject
$ npm install

added 1 package, and audited 6 packages in 436ms

found 0 vulnerabilities
$ node /tmp/mynpmproject/index.js
true
$ cd ~
$ node /tmp/mynpmproject/index.js
true
If you try to do that with a Python tool…
- If you’re using a manually managed venv, you need to remember to activate it, or to use the appropriate Python.
- If you’re using something fancier, it might be tied to the current working directory, and it may expect you to change into that directory, or to pass an argument pointing at that directory.
I can also run my code as root, and as an unprivileged nginx user, without any special preparation (like telling pipenv/poetry to put their venv in the project directory, or running them as the other users):
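Something like this (output trimmed; the nginx user is assumed to exist on the system):

$ sudo node /tmp/mynpmproject/index.js
true
$ sudo -u nginx node /tmp/mynpmproject/index.js
true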
If you try to do that with a Python tool…
- If you’re using a manually managed venv, you can use its Python as another user (assuming it has the right permissions).
- If your tool puts the venv in the project directory, this will work too.
- If your tool puts the venv in some weird place in your home folder, the other users will get their own venvs. The uwsgi user on Fedora uses /run/uwsgi as its home directory, and /run is ephemeral (tmpfs), so a reboot forces you to reinstall things.
We can even try to change the name of our project:
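For example (again, reconstructed):

$ cd /tmp
$ mv mynpmproject renamedproject
$ node /tmp/renamedproject/index.js
true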
If you try to do that with a Python tool…
- If you’re using a manually managed venv, and it lives in a central directory, all is well.
- If you or your tool places the venv in the project directory, the venv is now broken, and you need to recreate it (hope you have a recent requirements.txt!)
- If your tool puts the venv in some weird place in your home folder, it may decide that this is a different project, which means it will recreate it, and you’ll have an unused virtual environment somewhere on your filesystem.
Other packaging topics
Some packages may expose executable scripts (with the bin property). Those can be run in three ways:
- Installed globally using npm install -g, which would put the script in a global location that’s likely in $PATH (e.g. /usr/local/bin).
- Installed locally using npm install, and executed with the npx tool or manually by running the script in node_modules/.bin.
- Not installed at all, but executed using the npx tool, which will install it into a cache and run it (see the example below).
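For instance, with cowsay (an arbitrary package that ships a bin script; on the first run, npx asks to download it into its cache):

$ npx cowsay moo
 _____
< moo >
 -----
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||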
Also, if we want to publish our thing, we can just run npm publish (after logging in with npm login).
C#/.NET (with dotnet CLI/MSBuild)
In modern .NET, the One True Tool is the dotnet CLI, which uses MSBuild for most of the heavy lifting. (In the classic .NET Framework, the duties were split between MSBuild and NuGet.exe, but let’s focus on the modern workflow.)
Let’s create a project:
$ mkdir mydotnetproject
$ cd mydotnetproject
$ dotnet new console
The template "Console App" was created successfully.

Processing post-creation actions...
Running 'dotnet restore' on /tmp/mydotnetproject/mydotnetproject.csproj...
  Determining projects to restore...
  Restored /tmp/mydotnetproject/mydotnetproject.csproj (in 92 ms).
Restore succeeded.
$ ls
mydotnetproject.csproj  obj/  Program.cs
We get three things: a mydotnetproject.csproj file, which defines a few properties of our project; Program.cs, which is a hello world program; and obj/, which contains a few files you don’t need to care about.
Let’s try adding a dependency. For a pointless example, but slightly more reasonable than the JS one, we’ll use AutoFixture, which brings in a dependency on Fare. If we run dotnet add package AutoFixture, we get some console output, and our mydotnetproject.csproj now looks like this:
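(Reconstructed from the default .NET 6 console template; the exact AutoFixture version may differ.)

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="AutoFixture" Version="4.17.0" />
  </ItemGroup>

</Project>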
The first element, the <PropertyGroup>, specifies what our project is (Exe = something you can run), specifies the target framework (.NET 6.0 [3]), and enables a few opt-in features of C#. The second, the <ItemGroup> with a <PackageReference>, was inserted when we installed AutoFixture.
We can now write a pointless program in C#. Here’s our new Program.cs:
using AutoFixture;

var fixture = new Fixture();
var a = fixture.Create<int>();
var b = fixture.Create<int>();
var result = a + b == b + a;
Console.WriteLine(result ? "Math is working" : "Math is broken");
(We could just use C#’s/.NET’s built-in random number generator; AutoFixture is complete overkill here. It’s meant for auto-generating test data, with support for arbitrary classes and other data structures, and we’re just getting two random ints. I’m using AutoFixture for this example because it’s simple to use and demonstrate, and because it gets us a transitive dependency.)
And now, we can run it:
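(Build output, if any, omitted.)

$ dotnet run
Math is working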
If we want something that can be run outside of the project, and possibly without .NET installed on the system, we can use dotnet publish. The most basic scenario:
$ dotnet publish
$ ls bin/Debug/net6.0/publish
AutoFixture.dll*  Fare.dll*  mydotnetproject*  mydotnetproject.deps.json
mydotnetproject.dll  mydotnetproject.pdb  mydotnetproject.runtimeconfig.json
$ du -h bin/Debug/net6.0/publish
424K    bin/Debug/net6.0/publish
$ bin/Debug/net6.0/publish/mydotnetproject
Math is working
You can see that we’ve got a few files related to our project, as well as AutoFixture.dll and Fare.dll, which are our dependencies (Fare.dll is a dependency of AutoFixture.dll). Now, let’s try to remove AutoFixture.dll from the published distribution:
$ rm bin/Debug/net6.0/publish/AutoFixture.dll
$ bin/Debug/net6.0/publish/mydotnetproject
Unhandled exception. System.IO.FileNotFoundException: Could not load file or assembly 'AutoFixture, Version=4.17.0.0, Culture=neutral, PublicKeyToken=b24654c590009d4f'. The system cannot find the file specified.

File name: 'AutoFixture, Version=4.17.0.0, Culture=neutral, PublicKeyToken=b24654c590009d4f'

[1]    45060 IOT instruction (core dumped)  bin/Debug/net6.0/publish/mydotnetproject
We can also try a more advanced scenario:
$ rm -rf bin obj  # clean up, just in case
$ dotnet publish --sc -r linux-x64 -p:PublishSingleFile=true -o myoutput
Microsoft (R) Build Engine version 17.0.1+b177f8fa7 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  Restored /tmp/mydotnetproject/mydotnetproject.csproj (in 4.09 sec).
  mydotnetproject -> /tmp/mydotnetproject/bin/Debug/net6.0/linux-x64/mydotnetproject.dll
  mydotnetproject -> /tmp/mydotnetproject/myoutput/
$ ls myoutput
mydotnetproject*  mydotnetproject.pdb
$ myoutput/mydotnetproject
Math is working
$ du -h myoutput/*
62M     myoutput/mydotnetproject
12K     myoutput/mydotnetproject.pdb
$ file -k myoutput/mydotnetproject
myoutput/mydotnetproject: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=47637c667797007d777f4322729d89e7fa53a870, for GNU/Linux 2.6.32, stripped, too many notes (256)\012- data
$ file -k myoutput/mydotnetproject.pdb
myoutput/mydotnetproject.pdb: Microsoft Roslyn C# debugging symbols version 1.0\012- data
We have a single output file that contains our program, its dependencies, and parts of the .NET runtime. We also get debugging symbols if we want to run our binary with a .NET debugger and see the associated source code. (There are ways to make the binary file smaller, and we can move most arguments of dotnet publish to the .csproj file, but this post is about Python, not .NET, so I’m not going to focus on them too much.)
How is .NET better than Python?
I’m not going to bore you with the same demonstrations I’ve already shown when discussing How is Node better than Python?, but:
- You can run built .NET projects as any user, from anywhere in the filesystem.
- All you need to run your code is the output directory (publishing is optional, but useful to have a cleaner output, to simplify deployment, and to possibly enable compilation to native code).
- If you do publish in single-executable mode, you can just distribute the single executable, and your users don’t even need to have .NET installed.
- You do not need to manage environments, you do not need special tools to run your code, and you do not need to think about the current working directory when running code.
Other packaging topics
Locking dependencies is disabled by default, but if you add <RestorePackagesWithLockFile>true</RestorePackagesWithLockFile> to the <PropertyGroup> in your .csproj file, you can enable it (and get a packages.lock.json file).
Regarding command line tools, .NET has support for those as well. They can be installed globally or locally, and may be accessed via $PATH or via the dotnet command.
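For example (dotnet-ef is just one commonly used tool; any tool package works the same way):

$ dotnet tool install --global dotnet-ef     # global install, ends up on $PATH
$ dotnet new tool-manifest                   # or set up per-repository local tools...
$ dotnet tool install dotnet-ef              # ...and install into the manifest, run via "dotnet tool run"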
As for publishing your package to NuGet.org or to another repository, you might want to look at the full docs for more details, but the short version is:
- Add some metadata to the .csproj file (e.g. PackageId and Version)
- Run dotnet pack to get a .nupkg file
- Run dotnet nuget push to upload the .nupkg file (passing the file name and an API key)
Once again, everything is done with a single dotnet tool. The .NET IDEs (in particular, Visual Studio and Rider) do offer friendly GUI versions of many features. Some of those GUIs might be doing things slightly differently behind the scenes, but this is transparent to the user (and the backend is still MSBuild or a close derivative of it). I can take a CLI-created project, add a dependency from Rider, and publish an executable from VS, and everything will work the same. And perhaps XML files aren’t as cool as TOML, but they’re still easy to work with in this case.
Other languages and ecosystems
While we have explored two tools for two languages in depth, other languages deserve at least a mention. In the Java world, the two most commonly used tools are Maven and Gradle. Both can be used to manage dependencies and build artifacts that can be executed or distributed further (things like JAR files). Other tools with support for building Java projects exist, but most people just pick one of the two. The community of Scala, another JVM-based language, prefers sbt (which can be used for plain Java as well), but there are also Maven and Gradle users in that community. Finally, two new-ish languages that have become quite popular recently, Go and Rust, have first-party tooling integrated with the rest of the toolchain. The go
command-lin