Think twice before installing Poetry into Docker

Poetry is another black hole that swallows disk space

Poetry is an excellent tool for managing packages and Python virtual environments. But think twice before installing it into a Docker image.

Why not

It's all about the size.

Poetry is quite easy to use, but that convenience is powered by tons of dependencies:

$ poetry show --only main
attrs                22.2.0      Classes Without Boilerplate
build                0.10.0      A simple, correct Python build frontend
cachecontrol         0.12.11     httplib2 caching for requests
certifi              2022.12.7   Python package for providing Mozilla's CA Bundle.
cffi                 1.15.1      Foreign Function Interface for Python calling C code.
charset-normalizer   3.0.1       The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
cleo                 2.0.1       Cleo allows you to create beautiful and testable command-line interfaces.
crashtest            0.4.1       Manage Python errors with ease
cryptography         39.0.0      cryptography is a package which provides cryptographic recipes and primitives to Python developers.
distlib              0.3.6       Distribution utilities
dulwich              0.21.2      Python Git Library
filelock             3.9.0       A platform independent file lock.
html5lib             1.1         HTML parser based on the WHATWG HTML specification
idna                 3.4         Internationalized Domain Names in Applications (IDNA)
importlib-metadata   6.0.0       Read metadata from Python packages
importlib-resources  5.10.2      Read resources from Python packages
installer            0.6.0       A library for installing Python wheels.
jaraco-classes       3.2.3       Utility functions for Python class constructs
jeepney              0.8.0       Low-level, pure Python DBus protocol wrapper.
jsonschema           4.17.3      An implementation of JSON Schema validation for Python
keyring              23.13.1     Store and access your passwords safely.
lockfile             0.12.2      Platform-independent file locking module
more-itertools       9.0.0       More routines for operating on iterables, beyond itertools
msgpack              1.0.4       MessagePack serializer
packaging            23.0        Core utilities for Python packages
pexpect              4.8.0       Pexpect allows easy control of interactive console applications.
pkginfo              1.9.6       Query metadata from sdists / bdists / installed packages.
pkgutil-resolve-name 1.3.10      Resolve a name to an object.
platformdirs         2.6.2       A small Python package for determining appropriate platform-specific dirs, e.g. a "user data dir".
poetry-core          1.5.0       Poetry PEP 517 Build Backend
poetry-plugin-export 1.3.0       Poetry plugin to export the dependencies to various formats
ptyprocess           0.7.0       Run a subprocess in a pseudo terminal
pycparser            2.21        C parser in Python
pyproject-hooks      1.0.0       Wrappers to call pyproject.toml-based build backend hooks.
pyrsistent           0.19.3      Persistent/Functional/Immutable data structures
rapidfuzz            2.13.7      rapid fuzzy string matching
requests             2.28.2      Python HTTP for Humans.
requests-toolbelt    0.10.1      A utility belt for advanced users of python-requests
secretstorage        3.3.3       Python bindings to FreeDesktop.org Secret Service API
shellingham          1.5.0.post1 Tool to Detect Surrounding Shell
six                  1.16.0      Python 2 and 3 compatibility utilities
tomli                2.0.1       A lil' TOML parser
tomlkit              0.11.6      Style preserving TOML library
trove-classifiers    2023.1.20   Canonical source for classifiers on PyPI (pypi.org).
urllib3              1.26.14     HTTP library with thread-safe connection pooling, file post, and more.
virtualenv           20.17.1     Virtual Python Environment builder
webencodings         0.5.1       Character encoding aliases for legacy web content
zipp                 3.12.0      Backport of pathlib-compatible object wrapper for zip files

A typical installation of Poetry takes more than 60 MB, which is already about the size of CPython itself.
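
To check that figure on your own setup, one option is to add up the file sizes under the environment where Poetry is installed. The following is only a minimal sketch: the helper name `tree_size_mb` is made up for illustration and is not part of Poetry or pip, and it reports apparent file sizes rather than exact on-disk usage.

```python
from pathlib import Path


def tree_size_mb(root: str) -> float:
    """Return the total size of regular files under `root`, in MiB."""
    total = 0
    for path in Path(root).rglob("*"):
        # Skip symlinks so that linked files are not counted twice.
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total / (1024 * 1024)
```

For example, create a fresh virtual environment, note the result of `tree_size_mb()` on its site-packages directory, run pip install poetry inside it, and measure again. The difference is roughly what Poetry and its dependencies add.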

People love to joke about node_modules, but that doesn't mean the same problem doesn't happen in Python.

source: reddit

Further, there is another pitfall: the cffi dependency [1]. This library requires an extra dynamically linked library, and that library is not available on some OS distributions, e.g. Alpine. This forces the user to either choose a fatter base image such as slim or install a compiler for it.

The fatter base image can cost an extra 70 MB. This might be a minor issue, though, since you might already be using the slim image for other reasons.

All this disk space, and the bandwidth consumed on every deployment, is wasted, especially for tiny apps: none of that data is used in production. Come on, let's save some energy for the polar bears.

source: WWF

So please think again: is Poetry really needed in this Docker image?

Solution

In some cases you really do need Poetry in the production container. Then go ahead. Here we only cover the cases where Poetry is used solely to lock dependency versions.

Case: Bring the wheel

Build the wheel and bring it into the production image. One extra step creates constraints.txt, which pins the versions of the nested (transitive) dependencies.

# builder image
# using the slim image here is the lazy solution
FROM docker-registry.netbase.com/python:3.11-slim AS builder
# Poetry 2.x no longer bundles the export plugin; there, install poetry-plugin-export as well
RUN pip install poetry
WORKDIR /build
COPY PKG/ PKG/
# with several sources, the destination should be a directory ending in /
COPY pyproject.toml poetry.lock ./
RUN set -eux; \
    poetry build; \
    poetry export \
        --without-hashes \
        --format constraints.txt \
        --output constraints.txt

# production image
# it's possible to use the alpine image now,
# but more complex apps might still need the slim image
FROM docker-registry.netbase.com/python:3.11-alpine
COPY --from=builder /build/dist/PKG-*.whl ./
COPY --from=builder /build/constraints.txt ./
RUN set -eux; \
    export PYTHONDONTWRITEBYTECODE=1; \
    pip install \
        --no-cache-dir \
        --constraint constraints.txt \
        PKG-*.whl; \
    rm PKG-*.whl constraints.txt

With this solution, your app is installed as a site-wide package in the image, and its scripts/entry points are available there as well.

Note the environment variable PYTHONDONTWRITEBYTECODE, which stops Python from writing bytecode (.pyc) files. This option can quietly save megabytes of storage in your image.
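
To see what the variable does, here is a small experiment, offered as a sketch rather than a benchmark. It imports a throwaway module in a child interpreter and checks whether a `__pycache__` directory appears next to it. The function name `import_writes_bytecode` and the module name `demo_mod` are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def import_writes_bytecode(dont_write: bool) -> bool:
    """Import a throwaway module in a child interpreter and report
    whether a __pycache__ directory was created next to it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_mod.py").write_text("VALUE = 1\n")
        # Start from a clean environment so settings inherited from the
        # parent process do not skew the result.
        env = {
            key: value
            for key, value in os.environ.items()
            if key not in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX")
        }
        env["PYTHONPATH"] = tmp  # make demo_mod importable in the child
        if dont_write:
            env["PYTHONDONTWRITEBYTECODE"] = "1"
        subprocess.run(
            [sys.executable, "-c", "import demo_mod"], env=env, check=True
        )
        return Path(tmp, "__pycache__").is_dir()
```

On a default CPython setup, the cache directory should appear only when the variable is unset. This sketch covers bytecode written at import time only; pip also has its own `--no-compile` flag for install-time compilation, which is not exercised here.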

Case: requirements.txt

Using the (legacy) requirements.txt can be a more lightweight solution.

# builder image
# note that only 2 files are needed to compile the version lock
FROM docker-registry.netbase.com/python:3.11-slim AS builder
# Poetry 2.x no longer bundles the export plugin; there, install poetry-plugin-export as well
RUN pip install poetry
WORKDIR /build
COPY pyproject.toml poetry.lock ./
RUN poetry export -f requirements.txt -o requirements.txt

# production image
FROM docker-registry.netbase.com/python:3.11-alpine
COPY --from=builder /build/requirements.txt ./
RUN set -eux; \
    export PYTHONDONTWRITEBYTECODE=1; \
    pip install \
        --no-cache-dir \
        --requirement requirements.txt; \
    rm requirements.txt
WORKDIR /app
COPY PKG/ PKG/

This approach installs only the dependencies, so we need to run the app the same way we do during development.

  1. It is a nested dependency that comes from keyring, and it is no longer needed for Python >= 3.10. But at the time of writing (February 2023), Python 3.7 is still maintained, and I feel Python 3.8 is still a popular choice among enterprises.