[Distutils] Why I like eggs (or similar mechanisms) and my thoughts on future of buildout

Nick Coghlan ncoghlan at gmail.com
Sat Aug 20 15:02:39 EDT 2016


On 21 August 2016 at 04:00, Jim Fulton <jim at jimfulton.info> wrote:
> These really weren't goals of setuptools, which tried to fit into
> site-package-based installs and ironically resorted to unsavory techniques
> to wedge eggs in using elaborate .pth files.

Right, and pip took that approach further by making the site-packages
friendly approach the *default* approach, rather than a selectable
option.

This has had the effect of making pip not only useful for component
management in its own right, but also viable as a tool for assembling
*downstream* packages from Python source projects.

> With buildout, I chose to use eggs differently.  I simply generate scripts
> with dependencies explicitly listed in the Python path.  This is very easy
> to reason about.  It makes it easy to see what's installed by looking at the
> generated path.  This is similar to the way Java uses class paths to
> assemble jar files, which is fitting given that the design of eggs took at
> least some inspiration from jar files.
>
> I'm not a fan of Java, but I think class paths + jar files was something it
> got right.  Each program has its own distinct environment.  If you need to
> add/remove/update a dependency, just change the path.  Want to share
> installed components between programs?  It's easy because only one installed
> component needs to be stored and different programs point to it via their
> paths.

Agreed, but similar to both conda and JARs themselves, this improved
isolation made buildout *less* useful to folks working on distro
packages who actually *wanted* to be installing their Python
components into the system Python installation.
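The generated-script approach Jim describes can be sketched in a few lines. The egg paths below are invented for illustration, but the mechanism - an explicit, per-application dependency list written straight into each console script's sys.path - is what buildout actually emits:

```python
import sys

# Hypothetical egg locations - the exact paths are made up, but buildout
# bakes exactly this kind of explicit dependency list into each generated
# script, so the script's effective environment is fully visible in it.
app_eggs = [
    "/opt/app/eggs/zope.interface-4.3.2-py3.5.egg",
    "/opt/app/eggs/requests-2.11.1-py3.5.egg",
]

# Prepend so the application's pinned versions shadow anything already
# importable from the interpreter's default path.
sys.path[0:0] = app_eggs

# The head of sys.path now spells out the install, path by path.
print(sys.path[:2])
```

Reading the generated script is all it takes to audit what the program can import - the "easy to reason about" property Jim mentions.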

> Pip's approach OTOH, leaves me skeptical.  When installing a package, it has
> to unpack it and install its files into a shared directory structure where
> conflicts are likely.  How does it handle these conflicts? IDK. I should,
> but I don't.

It doesn't, since we don't have full file manifests in our metadata -
if you inadvertently install both python-openid and python3-openid
into the same virtualenv, they'll trample over each other's files.

> I have the impression that uninstalling things can be
> problematic, but maybe that's been fixed.

Uninstallation is fine, as we *do* have a full file manifest after a
component has been installed.
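Both points hinge on pip's per-distribution RECORD manifests, which are written at install time (hence clean uninstalls) but aren't part of the pre-install metadata (hence undetected clashes). A sketch of the cross-manifest check that full up-front manifests would enable; the RECORD contents here are invented, modelled on the python-openid/python3-openid clash:

```python
import csv
import io
from collections import defaultdict

# Invented RECORD data (path,hash,size rows, as pip writes them) for two
# distributions that both ship the same module files.
records = {
    "python_openid": "openid/__init__.py,sha256=aa,10\nopenid/consumer.py,sha256=bb,20\n",
    "python3_openid": "openid/__init__.py,sha256=cc,11\nopenid/consumer.py,sha256=dd,21\n",
}

# Map each installed file path to the distributions claiming it.
owners = defaultdict(list)
for dist, record in records.items():
    for row in csv.reader(io.StringIO(record)):
        if row:
            owners[row[0]].append(dist)

# Any path with more than one owner is a file-level conflict.
conflicts = {path: dists for path, dists in owners.items() if len(dists) > 1}
print(sorted(conflicts))
```

With manifests only available after installation, the same data can drive an accurate uninstall, but the conflict check above comes too late to prevent the trampling.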

> At best, this is a lot of
> complexity to maintain, at worst, uninstalls/reinstalls leave things behind
> that make buildout's goal of repeatability impossible to achieve.
>
> For isolation, pip relies on virtualenv.  This has always struck me as an
> extremely heavy-handed approach. I'm quite surprised that the Python
> community settled on it.  But whatever.

The pay-off for the pip model comes in the fact that using venv is
*optional* in a way that isn't generally true for other, more
specifically app-focused systems like conda and buildout. That doesn't
make conda and buildout wrong - it means they cut out a particular
rare-in-number-but-large-in-influence use case (Linux based operating
system development) in order to better focus on a more targeted set of
use cases (data analysis for conda, network service development for
buildout).

> A few packages (e.g. NumPy) really
> depend on the shape of the environment they're installed into so can't be
> installed with buildout, but can be installed with pip+virtualenv.

The distinction also goes the other way - pip can be used to add
capabilities to an existing Python installation without fundamentally
changing the architecture of that installation, including how it
decides to handle (or not handle, as the case may be) separating
different applications from each other.

The key thing that pip's downstream platforms tend to bring to the
table is different answers to that application isolation problem,
while pip itself handles just dependency management and build system
invocation:

- virtualenv mainly just manipulates sys.path to amend where
site-packages is found
- Linux distros offer their system package management (including full
preinstall file manifests and associated conflict detection) as well
as chroots and Linux containers (most famously, Docker)
- *nix systems have also long offered the "modules" environment
management utility (especially popular in HPC)
- buildout has tailored per-application sys.path definitions
- conda has its own environment management tooling
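For the virtualenv entry in particular, the isolation really is that thin. A sketch of the standard prefix comparison (available on Python 3.3+ via sys.base_prefix) that shows how little state a venv actually carries - essentially just a redirected sys.prefix:

```python
import sys

def in_virtual_env():
    """Report whether this interpreter is running inside a venv.

    Inside a venv, sys.prefix is redirected to the environment
    directory while sys.base_prefix still names the underlying
    installation; that redirection (plus the site-packages lookup it
    changes) is essentially all the isolation venv adds.
    """
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base

print(in_virtual_env())
```

The other mechanisms on the list (containers, chroots, modules, buildout's path lists) each answer the same isolation question with progressively more machinery.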

> The buildout developers have discussed options for the future. We know
> there's a reckoning coming, but so far, thankfully, we've been able to put
> it off, but we don't want to be a burden on the rest of the Python
> community. (Seriously, thank you for not breaking us. :) )

I don't think buildout's a burden. While there's no logical reason for
environment isolation to be tightly coupled to dependency management
and build system invocation, when designing buildout you didn't have a
choice - setuptools was realistically the only game in town for those
two pieces, so it made sense to structure buildout around that, and
then diverge only on the isolation management side of things.

> We've debated just invoking virtualenv and pip to assemble Python
> applications.  A model we've been discussing is to let buildout recipes do
> this.  No change is needed to buildout.  There's at least one and probably
> multiple recipes that do this, although I haven't used them myself. In this
> model, a buildout could install different virtualenvs for different
> programs, allowing dependencies to be kept distinct.  I still worry about
> the integrity of these virtualenvs over time as dependencies are added,
> removed, and updated.

As a competing approach to isolation management, I doubt it would make
much sense for buildout to adopt virtualenv - it seems more logical to
me to keep your current isolation model (which has a lot to recommend
it), and instead look at just swapping in pip to replace setuptools
and easy_install for the dependency management and build system
invocation pieces.

> If I could have my way, the path of distinct package directories approach
> would still be an option for buildout, as I think it's superior.

I don't see any reason for buildout to drop the per-application
sys.path customisation approach in favour of venv's - there's nothing
wrong with it, and it avoids several of the problems that can arise
with a proliferation of venvs.

>  I'm
> hopeful that it will be possible to use wheels this way, assuming that eggs
> are retired.

Wheels can already be used this way - the "officially not supported"
aspect is using them as sys.path entries without unpacking them first.
However, eggs also won't be retired until there's a comparable
documented format that officially supports usage with zipimport (as
opposed to wheel's status, where it works if you know how to make it
work, but there aren't any formal guarantees that what works today in
that respect will continue working tomorrow).
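That unsupported-but-functional usage is easy to demonstrate. Everything below is a toy - the wheel is hand-built and the "demo_pkg" name is invented - but the zipimport mechanics are the real ones: a pure-Python wheel is a zip file, so placing it on sys.path makes its contents importable without unpacking:

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny stand-in for a pure-Python wheel - really just a zip
# file laid out the way a wheel lays out its packages.
tmpdir = tempfile.mkdtemp()
wheel_path = os.path.join(tmpdir, "demo_pkg-1.0-py3-none-any.whl")
with zipfile.ZipFile(wheel_path, "w") as whl:
    whl.writestr("demo_pkg/__init__.py", "VERSION = '1.0'\n")

# The "works but unsupported" trick: put the wheel itself on sys.path
# and let zipimport resolve the import - no unpacking, no installation.
sys.path.insert(0, wheel_path)
import demo_pkg

print(demo_pkg.VERSION)
```

This breaks down as soon as a wheel contains extension modules or expects its data files on the real filesystem, which is part of why the behaviour stays unspecified.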

> I would also prefer that there be one library, or set of complementary
> libraries, to find, download and install packages.  I normally like
> competition, but this is such a boring and difficult domain that I don't
> really see there being interest in developing multiple solutions. Honestly,
> I'd be nervous if, in the long run, buildout and pip installed things
> differently, especially given security considerations.

For several of the core pieces shared between pip and Warehouse,
Donald has already broken them out into https://packaging.pypa.io

> In the long run, I suspect it would be best for some buildout developers to
> offer pip PRs that abstract functionality into libraries that buildout could
> call (and that pip called internally), although it sounds like this may
> already be happening without our help.

Only for the pip/Warehouse common components, since Donald is driving
the extraction of common requirements for those two projects.

However, there's another important activity along similar lines that
doesn't, as far as I'm aware, have anyone actively pursuing it:
pulling more of the pieces of pip's PyPI client and local installation
management behaviour out into a more readily re-usable form.

While Vinay Sajip's distlib (https://pypi.python.org/pypi/distlib)
already covers a lot of that, what's currently missing is folks
looking at the common capabilities of pip and distlib to ensure that
they're actually behaving the same way, with robust test suites to
ensure they're also following the relevant standards (or else that we
update the relevant standards to match what people are actually
doing).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia