Managing dependencies
The simplest way of managing dependencies is storing them in a requirements.txt file. In its most basic form, this is a list of package names and nothing else. This file can be extended with version requirements and can even support environment-specific installations.
A fancier method of installing and managing your dependencies is to use a tool such as poetry or pipenv. Internally, these use the regular pip installation method, but they build a full dependency graph of all the packages. This makes sure that all package versions are compatible with each other and allows the parallel installation of non-dependent packages.
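As a rough illustration of why a dependency graph helps, the following sketch uses the standard library's graphlib module to group packages into install batches. It is a toy model, not how poetry or pipenv are implemented, and the DEPENDENCIES data is simplified and hypothetical (the real requests package has more dependencies than shown). Packages within one batch do not depend on each other, so they could be installed in parallel.

```python
from graphlib import TopologicalSorter

# Simplified, hypothetical dependency data: each package maps to the
# packages it depends on.
DEPENDENCIES = {
    "progressbar2": {"python-utils"},
    "python-utils": {"six"},
    "requests": {"urllib3", "idna"},
}


def install_batches(graph):
    """Group packages into batches that can be installed in order.

    Every package in a batch only depends on packages from earlier
    batches, so the members of one batch could be installed in parallel.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(install_batches(DEPENDENCIES), start=1):
        print(number, batch)
```

With this data, the first batch contains idna, six, and urllib3, the second contains python-utils and requests, and progressbar2 comes last.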
Using pip and a requirements.txt file
The requirements.txt format allows you to list all of the dependencies of your project as broadly or as specifically as you feel is necessary. You can easily create this file yourself, but you can also tell pip to generate it for you, or even to generate a new file based on a previous requirements.txt file so you can view the changes. I recommend using pip freeze to generate an initial file and cherry-picking the dependencies (versions) you want.
For example, assuming that we run pip freeze in our virtual environment from before:
(your_env) $ pip3 freeze
pkg-resources==0.0.0
If we store that output in a requirements.txt file, install a package, and look at the difference, we get this result:
(your_env) $ pip3 freeze > requirements.txt
(your_env) $ pip3 install progressbar2
Collecting progressbar2
...
Installing collected packages: six, python-utils, progressbar2
Successfully installed progressbar2-3.47.0 python-utils-2.3.0 six-1.13.0
(your_env) $ pip3 freeze -r requirements.txt
pkg-resources==0.0.0
## The following requirements were added by pip freeze:
progressbar2==3.47.0
python-utils==2.3.0
six==1.13.0
As you can see, the pip freeze command automatically detected the addition of the six, progressbar2, and python-utils packages, and it immediately pinned those versions to the currently installed ones.
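To see roughly what pip freeze and pip freeze -r are doing, here is an approximation in plain Python. freeze_lines() lists installed distributions through importlib.metadata, and added_requirements() is a hypothetical helper that reports which pins are new compared to an older file. Unlike pip, this sketch only understands simple name==version lines and does not special-case editable installs or pip's own packages.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' lines for all installed distributions."""
    pins = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            pins[name.lower()] = f"{name}=={dist.version}"
    return [pins[key] for key in sorted(pins)]


def _is_pin(line):
    """Ignore blank lines and comments such as pip's '## The following ...'."""
    return bool(line.strip()) and not line.lstrip().startswith("#")


def _name(line):
    """Lowercased package name of a 'name==version' line."""
    return line.split("==")[0].strip().lower()


def added_requirements(old_lines, new_lines):
    """Pins from new_lines whose package does not appear in old_lines."""
    known = {_name(line) for line in old_lines if _is_pin(line)}
    return [line for line in new_lines if _is_pin(line) and _name(line) not in known]


if __name__ == "__main__":
    old = ["pkg-resources==0.0.0"]
    new = old + ["progressbar2==3.47.0", "python-utils==2.3.0", "six==1.13.0"]
    print(added_requirements(old, new))
    print("\n".join(freeze_lines()))
```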
The lines in the requirements.txt file are understood by pip on the command line as well, so to install a specific version, you can run:
$ pip3 install 'progressbar2==3.47.0'
Version specifiers
Often, pinning a version as strictly as that is not desirable, however, so let’s change the requirements file to only contain what we actually care about:
# We want a progressbar that is at least version 3.47.0 since we've tested that.
# But newer versions are ok as well.
progressbar2>=3.47.0
If someone else wants to install all of the requirements in this file, they can simply pass the file to pip using -r:
(your_env) $ pip3 install -r requirements.txt
Requirement already satisfied: progressbar2>=3.47.0 in your_env/lib/python3.9/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied: python-utils>=2.3.0 in your_env/lib/python3.9/site-packages (from progressbar2>=3.47.0->-r requirements.txt (line 1))
Requirement already satisfied: six in your_env/lib/python3.9/site-packages (from progressbar2>=3.47.0->-r requirements.txt (line 1))
In this case, pip checks whether all packages are installed and will install or update them if needed. The -r option also works recursively: a requirements file can itself contain a -r line pointing to another requirements file, allowing you to include multiple requirements files.
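The recursive behavior of -r can be sketched in a few lines. read_requirements() is a hypothetical helper, not pip's implementation: it resolves nested -r or --requirement paths relative to the file that references them, strips comments (a # at the start of a line or preceded by whitespace), and, as a choice made for this sketch, skips files it has already read so that circular includes cannot loop forever. It ignores every other pip option.

```python
import re
import tempfile
from pathlib import Path

# A '#' at the start of a line or after whitespace starts a comment;
# a '#' inside a URL fragment such as '#egg=name' does not.
_COMMENT = re.compile(r"(^|\s)#.*$")


def read_requirements(path, _seen=None):
    """Collect requirement lines from a file, following nested -r includes."""
    path = Path(path).resolve()
    seen = set() if _seen is None else _seen
    if path in seen:  # already read: avoid infinite recursion
        return []
    seen.add(path)

    requirements = []
    for raw_line in path.read_text().splitlines():
        line = _COMMENT.sub("", raw_line).strip()
        if not line:
            continue
        for option in ("-r ", "--requirement "):
            if line.startswith(option):
                nested = path.parent / line[len(option):].strip()
                requirements.extend(read_requirements(nested, seen))
                break
        else:
            requirements.append(line)
    return requirements


def example():
    """Build two small files in a temporary directory and read them."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "base.txt").write_text("six\n")
        (base / "requirements.txt").write_text(
            "# Our direct dependencies\n"
            "-r base.txt\n"
            "progressbar2>=3.47.0  # tested with 3.47.0\n"
        )
        return read_requirements(base / "requirements.txt")
```

Calling example() returns the included six requirement followed by progressbar2>=3.47.0, with the comments removed.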
Now let’s assume we’ve encountered a bug in the latest version and we wish to skip it. We can assume that only this specific version is affected, so we will only blacklist that version:
# Progressbar 2 version 3.47.0 has a silly bug but anything beyond 3.46.0 still works with our code
progressbar2>=3.46,!=3.47.0
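To make the semantics of a comma-separated specifier such as >=3.46,!=3.47.0 concrete, here is a small evaluator. Every clause must hold for a version to be accepted, and shorter versions are padded with zeros, so 3.46 compares equal to 3.46.0. This is a simplified sketch that only handles plain numeric versions (no pre-releases, wildcards, or ~=); real tools implement the full PEP 440 rules, for example in the packaging library's SpecifierSet.

```python
import operator

# Two-character operators come first so '>=' is not mistaken for '>'.
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _compare(op, left, right):
    """Compare two dotted versions numerically, padding with zeros."""
    a = [int(part) for part in left.split(".")]
    b = [int(part) for part in right.split(".")]
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return _OPERATORS[op](a, b)


def satisfies(version, specifiers):
    """True if version meets every clause in e.g. '>=3.46,!=3.47.0'."""
    for clause in specifiers.split(","):
        clause = clause.strip()
        for op in _OPERATORS:
            if clause.startswith(op):
                if not _compare(op, version, clause[len(op):].strip()):
                    return False
                break
        else:
            raise ValueError(f"unsupported specifier: {clause!r}")
    return True


if __name__ == "__main__":
    for candidate in ("3.45.0", "3.46", "3.47.0", "3.47.1"):
        print(candidate, satisfies(candidate, ">=3.46,!=3.47.0"))
```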
Lastly, we should talk about wildcards. One of the most common scenarios is needing a specific release series (such as 3.47) while still wanting the latest security updates and bug fixes. There are a few ways to specify these:
# Basic wildcard:
progressbar2 ==3.47.*
# Compatible release:
progressbar2 ~=3.47.1
# Compatible release above is identical to:
progressbar2 >=3.47.1, ==3.47.*
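With the compatible release pattern (~=), you can select the newest version that is at least the specified version but stays within the same release series, meaning that every component except the last one must match. So ~=3.47.1 accepts 3.47.2 but not 3.48.0, whereas ~=3.47 accepts any 3.x release from 3.47 onward.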
The version identification and dependency specification standard is described thoroughly in PEP 440.
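The wildcard and compatible release rules can also be expressed in code. This is a simplified sketch that only handles plain numeric versions; matches_wildcard() and matches_compatible() are hypothetical helpers, and the real PEP 440 rules also cover pre-releases, post-releases, and local versions.

```python
def _release(version):
    """'3.47.1' -> [3, 47, 1]"""
    return [int(part) for part in version.split(".")]


def _padded(parts, width):
    """Pad a release with zeros so that 3.47 compares like 3.47.0."""
    return parts + [0] * (width - len(parts))


def matches_wildcard(version, pattern):
    """'==3.47.*': the leading release components must match exactly."""
    prefix = _release(pattern.removesuffix(".*"))
    return _padded(_release(version), len(prefix))[: len(prefix)] == prefix


def matches_compatible(version, minimum):
    """'~=3.47.1' is equivalent to '>=3.47.1, ==3.47.*'."""
    parts = _release(minimum)
    if len(parts) < 2:
        raise ValueError("~= requires at least two release components")
    width = max(len(parts), len(_release(version)))
    at_least = _padded(_release(version), width) >= _padded(parts, width)
    series = ".".join(str(part) for part in parts[:-1]) + ".*"
    return at_least and matches_wildcard(version, series)


if __name__ == "__main__":
    for candidate in ("3.47.0", "3.47.5", "3.48.0"):
        print(candidate, matches_compatible(candidate, "3.47.1"))
```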
Installing through source control repositories
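Now let’s say that we’re really unlucky and there is no working release of the package yet, but it has been fixed in the develop branch of the Git repository. We can install that either through pip or through a requirements.txt file (as a line reading -e git+https://github.com/wolph/python-progressbar@develop#egg=progressbar2). Using pip directly, it looks like this: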
(your_env) $ pip3 install --editable 'git+https://github.com/wolph/python-progressbar@develop#egg=progressbar2'
Obtaining progressbar2 from git+https://github.com/wolph/python-progressbar@develop#egg=progressbar2
Updating your_env/src/progressbar2 clone (to develop)
Requirement already satisfied: python-utils>=2.3.0 in your_env/lib/python3.9/site-packages (from progressbar2)
Requirement already satisfied: six in your_env/lib/python3.9/site-packages (from progressbar2)
Installing collected packages: progressbar2
Found existing installation: progressbar2 3.47.0
Uninstalling progressbar2-3.47.0:
Successfully uninstalled progressbar2-3.47.0
Running setup.py develop for progressbar2
Successfully installed progressbar2
You may notice that pip not only installed the package but actually did a git clone to your_env/src/progressbar2. This is an optional step caused by the --editable (short option: -e) flag, which has the additional advantage that every time you re-run the command, the git clone will be updated. It also makes it rather easy to go to that directory, modify the code, and create a pull request with a fix.
In addition to Git, other source control systems such as Bazaar, Mercurial, and Subversion are also supported.
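The VCS URL passed to pip packs several pieces of information together. As a rough illustration (this is not pip's own parser, and parse_vcs_url() is a hypothetical helper), the standard library's urllib.parse can take the URL from the example above apart: the part before + names the version control system, the text after the last @ in the path is the branch, tag, or commit to check out, and the egg fragment names the project.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_vcs_url(url):
    """Split e.g. 'git+https://host/repo@ref#egg=name' into its parts."""
    parts = urlsplit(url)
    vcs, _, transport = parts.scheme.partition("+")
    if "@" in parts.path:
        path, ref = parts.path.rsplit("@", 1)  # the revision follows the last @
    else:
        path, ref = parts.path, None  # no revision: use the default branch
    fragment = dict(parse_qsl(parts.fragment))
    return {
        "vcs": vcs,
        "repository": f"{transport}://{parts.netloc}{path}",
        "ref": ref,
        "egg": fragment.get("egg"),
    }


if __name__ == "__main__":
    print(parse_vcs_url(
        "git+https://github.com/wolph/python-progressbar@develop#egg=progressbar2"
    ))
```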
Additional dependencies using extras
Many packages offer optional dependencies for specific use cases. In the case of the progressbar2 library, I have added tests and docs extras that install the dependencies needed to run the tests or build the documentation for the package. Extras can be specified using square brackets, separated by commas:
# Install the documentation and test extras in addition to the progressbar
progressbar2[docs,tests]
# A popular example is the installation of encryption libraries when using the requests library:
requests[security]
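The square-bracket syntax is easy to take apart. The sketch below uses hypothetical helpers: split_extras() separates the project name from the requested extras with a simplified regular expression, and available_extras() uses importlib.metadata to list the extras an installed package declares (its Provides-Extra metadata), returning an empty list when the package is not installed.

```python
import re
from importlib.metadata import PackageNotFoundError, metadata

# Simplified: a project name, optionally followed by [extra1,extra2,...].
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[([^\]]*)\])?")


def split_extras(requirement):
    """'progressbar2[docs,tests]' -> ('progressbar2', {'docs', 'tests'})"""
    match = _REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, extras = match.groups()
    wanted = (extras or "").split(",")
    return name, {extra.strip() for extra in wanted if extra.strip()}


def available_extras(distribution):
    """Extras declared by an installed distribution, or [] if not installed."""
    try:
        return sorted(metadata(distribution).get_all("Provides-Extra") or [])
    except PackageNotFoundError:
        return []


if __name__ == "__main__":
    print(split_extras("progressbar2[docs,tests]"))
    print(available_extras("progressbar2"))
```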
Conditional dependencies using environment markers
If your project needs to run on multiple systems, you will most likely encounter dependencies that are not required on all systems. One example of this is libraries that are required on some operating systems but not on others, as in the portalocker package I maintain: on Linux/Unix systems, the locking mechanisms needed are supported out of the box, but on Windows they require the pywin32 package to work. The install_requires part of the package (which uses the same syntax as requirements.txt) contains this line:
This specifies that on Windows, the pywin32 package is required, and version 226 was blacklisted due to a bug.
In addition to platform_system, there are several more markers, such as python_version and platform_machine (which contains the architecture, x86_64 for example).
The full list of markers can be found in PEP 508: https://peps.python.org/pep-0508/.
One other useful example of this is the dataclasses library. This library has been included with Python since version 3.7, so we only need to install the backport for older Python versions:
dataclasses; python_version < '3.7'
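To see which values those markers would see on your own machine, the following sketch collects a few PEP 508 marker variables with the standard library and then evaluates the two examples above by hand. marker_environment() and conditional_requirements() are hypothetical helpers; installers use a proper marker parser (such as the one in the packaging library) rather than hard-coded checks like these.

```python
import platform
import sys


def marker_environment():
    """Values for a few PEP 508 marker variables on this interpreter."""
    return {
        "platform_system": platform.system(),    # e.g. 'Linux', 'Windows'
        "platform_machine": platform.machine(),  # e.g. 'x86_64'
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        "sys_platform": sys.platform,
    }


def conditional_requirements(env=None):
    """Hand-coded evaluation of the pywin32 and dataclasses examples."""
    env = marker_environment() if env is None else env
    needed = []
    if env["platform_system"] == "Windows":
        needed.append("pywin32!=226")
    major, minor = (int(part) for part in env["python_version"].split("."))
    if (major, minor) < (3, 7):  # dataclasses is built in from 3.7 onward
        needed.append("dataclasses")
    return needed


if __name__ == "__main__":
    print(marker_environment())
    print(conditional_requirements())
```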
Automatic project management using poetry
The poetry tool provides a really easy-to-use solution for creating, updating, and sharing your Python projects. It’s also very fast, which makes it a fantastic starting point for a project.
Creating a new poetry project
Starting a new project is very easy; poetry automatically handles virtual environments, dependencies, and other project-related tasks for you. To start, we will use the poetry init wizard:
$ poetry init
This command will guide you through creating your pyproject.toml config.
Package name [t_00_poetry]:
Version [0.1.0]:
Description []:
Author [Rick van Hattem <[email protected]>, n to skip]:
License []:
Compatible Python versions [^3.10]:
Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interact...? (yes/no) [yes] no
...
Do you confirm generation? (yes/no) [yes]
After these few questions, poetry automatically creates a pyproject.toml file for us that contains all the data we entered and some automatically generated data. As you may have noticed, it prefilled several values for us:
- The project name. This is based on the current directory name.
- The version. This is fixed to 0.1.0.
- The author field. This looks at your git user information, which can be set using:
    $ git config --global user.name "Rick van Hattem"
    $ git config --global user.email "[email protected]"
- The Python version. This is based on the Python version you are running poetry with, but it can be customized using poetry init --python=...
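As an approximation of where these defaults come from (not poetry’s actual code; init_defaults() and git_config() are hypothetical helpers), the same values can be gathered with a few lines of Python. The author list stays empty when git is not installed or not configured.

```python
import subprocess
import sys
from pathlib import Path


def git_config(key):
    """Return a git configuration value, or None if git or the key is missing."""
    try:
        result = subprocess.run(
            ["git", "config", "--get", key],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:  # git is not installed
        return None
    return result.stdout.strip() or None


def init_defaults(directory="."):
    """Approximate the values poetry init suggests for a directory."""
    name = git_config("user.name")
    email = git_config("user.email")
    author = f"{name} <{email}>" if name and email else name
    return {
        "name": Path(directory).resolve().name,
        "version": "0.1.0",
        "authors": [author] if author else [],
        # Caret requirement on the running interpreter, e.g. '^3.10'.
        "python": f"^{sys.version_info.major}.{sys.version_info.minor}",
    }


if __name__ == "__main__":
    print(init_defaults())
```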
Looking at the generated pyproject.toml, we can see the following:
[tool.poetry]
name = "t_00_poetry"
version = "0.1.0"
description = ""
authors = ["Rick van Hattem <[email protected]>"]
[tool.poetry.dependencies]
python = "^3.10"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
Adding dependencies
Once we have the project set up, we can add dependencies:
$ poetry add progressbar2
Using version ^3.55.0 for progressbar2
...
Writing lock file
...
• Installing progressbar2 (3.55.0)
This automatically installs the package, adds it to the pyproject.toml file, and adds the specific version to the poetry.lock file. After this command, the pyproject.toml file has a new line added to the tool.poetry.dependencies section:
[tool.poetry.dependencies]
python = "^3.10"
progressbar2 = "^3.55.0"
The poetry.lock file is a bit more specific. Whereas the progressbar2 dependency in pyproject.toml can specify a version range, the poetry.lock file stores the exact version, the file hashes, and all the dependencies that were installed:
[[package]]
name = "progressbar2"
version = "3.55.0"
...
[package.dependencies]
python-utils = ">=2.3.0"
...
[package.extras]
docs = ["sphinx (>=1.7.4)"]
...
[metadata]
lock-version = "1.1"
python-versions = "^3.10"
content-hash = "c4235fba0428ce7877f5a94075e19731e5d45caa73ff2e0345e5dd269332bff0"
[metadata.files]
progressbar2 = [
{file = "progressbar2-3.55.0-py2.py3-none-any.whl", hash = "sha256:..."},
{file = "progressbar2-3.55.0.tar.gz", hash = "sha256:..."},
]
...
By having all this data, we can build or rebuild a virtual environment for a poetry-based project on another system exactly as it was created on the original system. To install, upgrade, and/or downgrade the packages exactly as specified in the poetry.lock file, we need a single command:
$ poetry install
Installing dependencies from lock file
...
This is very similar to how the npm and yarn commands work, if you are familiar with those.
Upgrading dependencies
In the previous examples, we simply added a dependency without specifying an explicit version. Often this is a safe approach, as the default version requirement will allow for any version within that major version.
If the project uses normal Python versioning or semantic versioning (more about that in Chapter 18, Packaging - Creating Your Own Libraries or Applications), that should be perfect. At the very least, all of my projects (such as progressbar2) are generally both backward and largely forward compatible, so simply fixing the major version is enough. In this case, poetry defaulted to version ^3.55.0, which means that any version newer than or equal to 3.55.0, up to (but not including) 4.0.0, is valid.
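The caret rule generalizes as follows: the leftmost non-zero component of the version may not change, which is why ^3.55.0 stops before 4.0.0 while a 0.x requirement is much stricter. The following sketch expresses that rule with hypothetical helpers caret_bounds() and in_caret_range(); it only handles plain numeric versions and is a simplified reading of poetry’s documented caret behavior, not its implementation.

```python
def _padded(parts):
    """Pad to at least three components so 3.55 compares like 3.55.0."""
    return tuple(parts + [0] * (3 - len(parts)))


def caret_bounds(requirement):
    """'^3.55.0' -> ((3, 55, 0), (4, 0, 0)), i.e. >=3.55.0,<4.0.0."""
    parts = [int(part) for part in requirement.lstrip("^").split(".")]
    # Bump the leftmost non-zero component; for all-zero versions such as
    # '^0.0', bump the last component that was given instead.
    index = next((i for i, part in enumerate(parts) if part), len(parts) - 1)
    upper = parts[:index] + [parts[index] + 1]
    return _padded(parts), _padded(upper)


def in_caret_range(version, requirement):
    """True if version satisfies the caret requirement."""
    lower, upper = caret_bounds(requirement)
    return lower <= _padded([int(part) for part in version.split(".")]) < upper


if __name__ == "__main__":
    for requirement in ("^3.55.0", "^0.2.3", "^0.0.3"):
        print(requirement, caret_bounds(requirement))
```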
Because of the poetry.lock file, however, a poetry install will install exactly those locked versions instead of any newer ones. So how can we upgrade the dependencies? To demonstrate, we will start by installing an older version of the progressbar2 library:
$ poetry add 'progressbar2=3.1.0'
Now we will relax the version in the pyproject.toml file to ^3.1.0:
[tool.poetry.dependencies]
progressbar2 = "^3.1.0"
Once we have done this, a poetry install will still keep the 3.1.0 version, but we can make poetry update the dependencies for us:
$ poetry update
...
• Updating progressbar2 (3.1.0 -> 3.55.0)
Now, poetry has nicely updated the dependencies in our project while still adhering to the requirements we set in the pyproject.toml file. If you set the version requirements of all packages to *, it will always update everything to the latest available versions that are compatible with each other.
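Conceptually, the difference between poetry install and poetry update for a single package can be modeled like this. pick_version() is a hypothetical toy: poetry’s real resolver works on the whole dependency graph at once, and here the allowed range is passed in directly as lower and upper bounds (for ^3.1.0, that is 3.1.0 up to but not including 4.0.0). The list of releases is also hypothetical.

```python
def _key(version):
    """'3.55.0' -> (3, 55, 0), padded so that '3.55' sorts like '3.55.0'."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def pick_version(available, lower, upper, locked=None, update=False):
    """Choose a version within lower <= version < upper.

    Like install: keep the locked version while it is still allowed.
    Like update (or without a lock): take the newest allowed version.
    """
    if locked is not None and not update and lower <= _key(locked) < upper:
        return locked
    allowed = [v for v in available if lower <= _key(v) < upper]
    if not allowed:
        raise LookupError("no available version satisfies the requirement")
    return max(allowed, key=_key)


if __name__ == "__main__":
    releases = ["3.1.0", "3.47.0", "3.55.0", "4.0.0"]  # hypothetical index
    bounds = ((3, 1, 0), (4, 0, 0))  # ^3.1.0
    print(pick_version(releases, *bounds, locked="3.1.0"))  # -> 3.1.0
    print(pick_version(releases, *bounds, locked="3.1.0", update=True))  # -> 3.55.0
```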
Running commands
To run a single command using the poetry environment, you can use poetry run:
$ poetry run pip
For an entire development session, however, I would suggest using the shell command:
$ poetry shell
After this, you can run all Python commands as normal, but these will now be running from the activated virtual environment.
For cron jobs this is similar, but you will need to make sure that you change directories first:
0 3 * * * cd /home/wolph/workspace/poetry_project/ && poetry run python script.py
This command runs every day at 03:00 (24-hour clock, so A.M.).
Note that cron might not be able to find the poetry command because cron jobs run with a different environment. In that case, I would recommend using the absolute path to the poetry command, which can be found using which:
$ which poetry
/usr/local/bin/poetry
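If a cron job misbehaves, it can help to have the script report its own environment. The following sketch uses hypothetical helper names: in_virtualenv() compares sys.prefix with sys.base_prefix, which differ inside an environment created by venv or modern virtualenv releases, and shutil.which() performs the same PATH lookup as the which command, so the script can log where poetry would be found on the PATH that cron provides.

```python
import shutil
import sys


def in_virtualenv():
    """True when running inside a venv or virtualenv-created environment."""
    return sys.prefix != sys.base_prefix


def command_path(name):
    """Like `which`: the path of an executable on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
    print("poetry found at:", command_path("poetry"))
```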
Automatic dependency tracking using pipenv
For large projects, your dependencies can change often, which makes the manual manipulation of the requirements.txt file rather tedious. Additionally, having to create a virtual environment before you can install your packages is also a pretty repetitive task if you work on many projects. The pipenv tool aims to transparently solve these issues for you, while also making sure that all of your dependencies are compatible and updated. And as a final bonus, it combines loose and strict dependency versions (in the Pipfile and the Pipfile.lock, respectively), so you can make sure your production environment uses the exact same versions you tested.
Initial usage is simple; go to your project directory and install a package. Let’s give it a try:
$ pipenv install progressbar2
Creating a virtualenv for this project...
...
Using /usr/local/bin/python3 (3.10.4) to create virtualenv...
...
Successfully created virtual environment!
...
Creating a Pipfile for this project...
Installing progressbar2...
Adding progressbar2 to Pipfile's [packages]...
Installation Succeeded
Pipfile.lock not found, creating...
...
Success!
Updated Pipfile.lock (996b11)!
Installing dependencies from Pipfile.lock (996b11)...
0/0 — 00:00:0
That’s quite a bit of output even when abbreviated. But let’s look at what happened:
- A virtual environment was created.
- A Pipfile was created, which contains the dependency as you specified it. If you specify a version, that will be added to the Pipfile; otherwise, it will be a wildcard requirement, meaning that any version will be accepted as long as there are no conflicts with other packages.
- A Pipfile.lock was created containing the exact list of packages and versions as installed. This allows an identical install on a different machine with the exact same versions.
The generated Pipfile contains the following:
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
[packages]
progressbar2 = "*"
[requires]
python_version = "3.10"
And the Pipfile.lock is a bit larger, but immediately shows another advantage of this method:
{
...
"default": {
"progressbar2": {
"hashes": [
"sha256:14d3165a1781d053...",
"sha256:2562ba3e554433f0..."
],
"index": "pypi",
"version": "==4.0.0"
},
"python-utils": {
"hashes": [
"sha256:4dace6420c5f50d6...",
"sha256:93d9cdc8b8580669..."
],
"markers": "python_version >= '3.7'",
"version": "==3.1.0"
},
...
},
"develop": {}
}
As you can see, in addition to the exact package versions, the Pipfile.lock contains the hashes of the packages as well. In this case, the package provides both a .tar.gz (source) and a .whl (wheel) file, which is why there are two hashes. Additionally, the Pipfile.lock contains all packages installed by pipenv, including all dependencies.
Using these hashes, you can be certain that during a deployment, you will receive the exact same file and not some corrupt or even malicious file.
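Verifying a download against these hashes boils down to computing the file’s SHA-256 digest and comparing it with the sha256:... entries from the lock file; pip offers the same kind of check for requirements files through its --require-hashes mode. The sketch below is a hypothetical illustration, not pipenv’s code, and uses an empty temporary file in place of a real download.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lock(path, allowed_hashes):
    """True if the file's digest is one of the 'sha256:<hex>' lock entries."""
    return f"sha256:{sha256_of(path)}" in allowed_hashes


def example():
    """Check an empty stand-in file against the digest of empty input."""
    expected = "sha256:" + hashlib.sha256(b"").hexdigest()
    with tempfile.TemporaryDirectory() as tmp:
        download = Path(tmp) / "download.whl"
        download.write_bytes(b"")
        return matches_lock(download, [expected])
```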
Because the versions are completely fixed, you can also be certain that anyone deploying your project using the Pipfile.lock will get the exact same package versions. This is very useful when working together with other developers.
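Because Pipfile.lock is plain JSON, its pins are easy to reuse elsewhere. As a hypothetical sketch (not pipenv’s own export code), lock_to_requirements() turns one section of the lock file into requirements.txt-style lines with pip’s --hash options, keeping any environment markers. The sample data in the usage block is made up for illustration.

```python
import json


def lock_to_requirements(lock_text, section="default"):
    """Convert a Pipfile.lock section into pinned requirements.txt lines."""
    lock = json.loads(lock_text)
    lines = []
    for name, info in sorted(lock.get(section, {}).items()):
        line = name + info.get("version", "")  # e.g. 'progressbar2' + '==4.0.0'
        if "markers" in info:
            line += f"; {info['markers']}"
        for digest in info.get("hashes", []):
            line += f" --hash={digest}"
        lines.append(line)
    return lines


if __name__ == "__main__":
    sample = json.dumps({
        "default": {
            "python-utils": {
                "markers": "python_version >= '3.7'",
                "version": "==3.1.0",
            },
        },
        "develop": {},
    })
    print(lock_to_requirements(sample))
```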
To install all the necessary packages as specified in the Pipfile and Pipfile.lock (even for the initial install), you can simply run:
$ pipenv install
Installing dependencies from Pipfile.lock (5c99e1)…
3/3 — 00:00:00
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Any time you run pipenv install package, the Pipfile will be automatically modified with your changes and checked for incompatible packages. The big downside is that pipenv can become terribly slow for large projects. I have encountered multiple projects where a no-op pipenv install would take several minutes due to the fetching and checking of the entire dependency graph. In most cases, however, it’s still worth it; the added functionality can save you a lot of headaches.
Don’t forget to run your regular Python commands with the pipenv run prefix or from pipenv shell.
Updating your packages
Because of the dependency graph, you can easily update your packages without having to worry about dependency conflicts. With one command, you’re done:
$ pipenv update
Should you still encounter issues with the versions because some packages haven’t been checked against each other, you can fix that by specifying the versions of the package you do or do not want:
$ pipenv install 'progressbar2!=3.47.0'
Installing progressbar2!=3.47.0…
Adding progressbar2 to Pipfile's [packages]…
Installation Succeeded
Pipfile.lock (c9327e) out of date, updating to (5c99e1)…
Success!
Updated Pipfile.lock (c9327e)!
Installing dependencies from Pipfile.lock (c9327e)…
3/3 — 00:00:00
By running that command, the packages section of the Pipfile changes to:
[packages]
progressbar2 = "!=3.47.0"
Deploying to production
Getting the exact same versions on all of your production servers is absolutely essential to prevent hard-to-trace bugs. For this very purpose, you can tell pipenv to install everything as specified in the Pipfile.lock file while still checking to see whether Pipfile.lock is out of date; with the --deploy flag, the installation aborts if it is. With one command, you have a fully functioning production virtual environment with all packages installed.
Let’s create a new directory and see if it all works out:
$ mkdir ../pipenv_production
$ cp Pipfile Pipfile.lock ../pipenv_production/
$ cd ../pipenv_production/
$ pipenv install --deploy
Creating a virtualenv for this project...
Pipfile: /home/wolph/workspace/pipenv_production/Pipfile
Using /usr/bin/python3 (3.10.4) to create virtualenv...
...
Successfully created virtual environment!
...
Installing dependencies from Pipfile.lock (996b11)...
2/2 — 00:00:01
$ pipenv shell
Launching subshell in virtual environment...
(pipenv_production) $ pip3 freeze
progressbar2==4.0.0
python-utils==3.1.0
All of the versions are exactly as expected and ready for use.
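The same check can be automated, for example in a deployment script. version_mismatches() is a hypothetical helper that compares pins like the ones in Pipfile.lock against what importlib.metadata reports as installed; it does a plain string comparison after removing the leading ==, so it assumes the version strings are written identically in both places.

```python
from importlib.metadata import PackageNotFoundError, version


def version_mismatches(pins):
    """Compare {'name': '==x.y.z'} pins with what is actually installed.

    Returns {name: (expected, installed)} for every package that is
    missing (installed is None) or installed at a different version.
    """
    problems = {}
    for name, pin in pins.items():
        expected = pin.removeprefix("==")
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    print(version_mismatches({"progressbar2": "==4.0.0", "python-utils": "==3.1.0"}))
```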
Running cron commands
To run your Python commands outside of the pipenv shell, you can use the pipenv run prefix. Instead of python, you would run pipenv run python. In normal usage, this is a lot less practical than activating the pipenv shell, but for non-interactive sessions, such as cron jobs, this is an essential feature. For example, a cron job that runs at 03:00 (24-hour clock, so A.M.) every day would look something like this:
0 3 * * * cd /home/wolph/workspace/pipenv_project/ && pipenv run python script.py