[Distutils] Library instability on PyPI and impact on OpenStack

Nick Coghlan ncoghlan at gmail.com
Tue Mar 5 08:56:53 CET 2013


On Tue, Mar 5, 2013 at 8:29 AM, Mark McLoughlin <markmc at redhat.com> wrote:
> On Mon, 2013-03-04 at 12:44 -0500, Donald Stufft wrote:
>> On Monday, March 4, 2013 at 12:36 PM, Mark McLoughlin wrote:
>
>> > If parallel incompatible installs is a hopeless problem in Python,
>> > why
>> > the push to semantic versioning then rather than saying that
>> > incompatible API changes should mean a name change?
>> Forcing a name change feels ugly as all hell. I don't really see what
>> parallel installs has much to do with anything. I don't bundle anything
>> and i'm ideologically opposed to it generally but I don't typically have
>> a need for parallel installs because I use virtual environments. Why
>> don't you utilize those? (Not being snarky, actually curious).
>
> It's a fair question.
>
> To answer it with a question, how do you imagine Linux distributions
> using virtual environments such that:
>
>   $> yum install -y openstack-nova
>
> uses a virtual environment? How does it differ from bundling? (Not being
> snarky, actually curious :)
>
> The approach that some Fedora folks are trying out is called "Software
> Collections". It's not Python specific, but it's basically the same as a
> virtual environment.
>
> For OpenStack, I think we'd probably have all the Python libraries we
> require installed under e.g. /opt/rh/openstack-$version so that you
> could have programs from two different releases of OpenStack installed
> on the same system.
>
> Long time packagers are usually horrified at this idea e.g.
>
>   http://lists.fedoraproject.org/pipermail/devel/2012-December/thread.html#174872

Yes, it's the eternal tension between the sysadmin's "I only care
about making a wide variety of applications as easy to maintain as
possible on platform X" view and the developer's "I only care about
making application Y as easy to maintain as possible on a wide variety
of platforms" view.

Windows, Android, Mac OS X, etc, pretty much dial their software
distribution model all the way towards the developer end of the
spectrum. Linux distro maintainers need to realise that the language
communities are almost entirely down the developer end of this
spectrum, where sustainable cross-platform support is much higher
priority than making life easier for administrators for any given
platform. We're willing to work with distros to make deployment of
security updates easier, but any proposals that involve people
voluntarily making cross-platform development harder simply aren't
going to be accepted.

>   - How many of these 200 new packages are essentially duplicates? Once
>     you go down the route of having applications bundle libraries like
>     this, there's going to basically be no sharing.

There's no sharing only if you *actually* bundle the dependencies into
each virtualenv. While full bundling is the only mode pip currently
implements, completely isolating each virtualenv, it doesn't *have* to
work that way. In particular, PEP 426 offers the opportunity to add a
"compatible release" mode to pip/virtualenv where the tool can
maintain a shared pool of installed libraries, and use *.pth files to
make an appropriate version available in each venv. Updating the
shared version to a more recent release would then automatically
update any venvs with a *.pth file that references that release.

For example, suppose an application requires "somedep (1.3)". Under
PEP 426's compatible release semantics, that means at least version
1.3, but not 2.0 or any later incompatible release. The latest
available qualifying version might be "1.5.3".
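The matching rule can be sketched in a few lines - this is a rough
illustration only, simplified to purely numeric dotted versions (real
PEP 426 version handling is considerably more involved):

```python
# Rough sketch of PEP 426 "compatible release" matching: "somedep (1.3)"
# accepts any 1.x release at or above 1.3, but rejects 2.0. Simplified to
# purely numeric dotted versions; real version handling is more involved.
def parse(version):
    return tuple(int(part) for part in version.split("."))

def compatible(required, candidate):
    req, cand = parse(required), parse(candidate)
    # Same major series, and no older than the required release.
    return cand[0] == req[0] and cand >= req

print(compatible("1.3", "1.5.3"))  # True  - qualifies
print(compatible("1.3", "2.0"))    # False - incompatible new series
print(compatible("1.3", "1.2"))    # False - too old
```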

At the moment, pip will install a *copy* of somedep 1.5.3 into the
application's virtualenv. However, it doesn't have to do that. It
could, instead, install somedep 1.5.3 into a location like
"/usr/lib/shared/pip-python/somedep1/<contents>", and then add a
"somedep1.pth" file to the virtualenv that references
"/usr/lib/shared/pip-python/somedep1/".
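To make the mechanism concrete, here's a minimal sketch of what
creating such a *.pth entry could look like. This is not an existing
pip feature; the paths are temporary stand-ins, and site.addsitedir()
is used to simulate what the site module does at interpreter startup:

```python
# Illustrative sketch only: a *.pth file in a venv's site-packages exposing
# a release installed in a shared pool. All paths are temporary stand-ins.
import os
import site
import sys
import tempfile

venv_site = tempfile.mkdtemp(prefix="venv-site-")      # the venv's site-packages
shared_pool = tempfile.mkdtemp(prefix="shared-pool-")  # shared install pool
somedep_dir = os.path.join(shared_pool, "somedep1")    # e.g. somedep 1.5.3
os.makedirs(somedep_dir)

# A *.pth file is plain text; each line naming a directory is appended to
# sys.path when the site module processes the site-packages directory.
with open(os.path.join(venv_site, "somedep1.pth"), "w") as f:
    f.write(somedep_dir + "\n")

# Simulate startup processing of *.pth files in that directory.
site.addsitedir(venv_site)
print(somedep_dir in sys.path)  # True - the shared release is now on the path
```

Upgrading the shared release then only means changing what lives at the
pooled path - the *.pth files in each venv don't need to be touched.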

Now, suppose we install another app, also using a virtualenv, that
requires "somedep (1.6)". The new version 1.6.0 is available now, so
we install it into the shared location and *both* applications will
end up using somedep 1.6.0.

A security update is released for "somedep" as 1.6.1 - we install it
into the shared location, and now both applications are using 1.6.1
instead of 1.6.0. Yay, that's what we wanted, just as if we had
runtime version selection, only the selection happens at install time
(when adding the *.pth file to the virtualenv) rather than at
application startup.

Finally, we install a third application that needs "somedep (2.1)". We
can't overwrite the shared version, because it isn't compatible.
Fortunately, what we can do instead is install it to
"/usr/lib/shared/pip-python/somedep2/<contents>" and create a
"somedep2.pth" file in that environment. The two virtualenvs relying
on "somedep1" are blissfully unaware anything has changed because that
version never appears anywhere on their sys.path.
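Pulling the three scenarios together, the install-time bookkeeping
could be sketched as follows. The pool mapping and its naming scheme
are hypothetical, keyed by major version so that incompatible series
coexist while updates within a series are shared:

```python
# Hypothetical sketch of the shared pool described above: releases are keyed
# by major version ("somedep1", "somedep2"), so installing a 2.x release
# never disturbs the 1.x tree, while updates within a series are shared.
pool = {}  # e.g. "somedep1" -> currently installed version in that series

def install_shared(name, version):
    series = name + version.split(".")[0]   # "somedep" + "2" -> "somedep2"
    current = pool.get(series)
    newer = current is None or \
        tuple(map(int, version.split("."))) > tuple(map(int, current.split(".")))
    if newer:  # within a series, keep only the newest release
        pool[series] = version
    return series

install_shared("somedep", "1.5.3")  # first app's requirement
install_shared("somedep", "1.6.0")  # second app pulls the series forward
install_shared("somedep", "1.6.1")  # security update, shared by both
install_shared("somedep", "2.1")    # third app gets a separate series
print(pool)  # {'somedep1': '1.6.1', 'somedep2': '2.1'}
```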

Could you use this approach for the actual system site-packages
directory? No, because sys.path would become insanely long with that
many *.pth files; the system install could instead symlink to releases
stored in the *.pth friendly distribution store. For application
specific virtual environments, though, it should be fine.

If any distros want that kind of thing to become a reality, though,
they're going to have to step up and contribute it. As noted above,
for the current tool development teams, the focus is on distributing,
maintaining and deploying cross-platform applications, not making it
easy to do security updates on a Linux distro. I believe it's possible
to satisfy both parties, but it's going to be up to the distros to
offer a viable plan for meeting their needs without disrupting
existing upstream practices.

I will note that making this kind of idea more feasible is one of the
reasons I am making "compatible release" the *default* in PEP 426
version specifiers, but it still needs people to actually figure out
the details and write the code.

I will also note that the filesystem layout described above is *far*
more amenable to safe runtime selection of packages than the current
pkg_resources method. The critical failure mode in pkg_resources, the
one that leads to a lot of weirdness, is that it can end up pushing
site-packages itself to the front of sys.path, which can shadow a
*lot* of modules (in particular, an installed copy of the software
you're currently working on may shadow the version in your source
checkout - this is the bug that the patch I linked earlier was needed
to resolve). Runtime selection would need more work than getting virtual
environments to work that way, but it's certainly feasible once the
installation layout is updated.
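The shadowing problem itself is easy to demonstrate: whichever
directory appears first on sys.path wins, so anything that pushes
site-packages to the front hides a source checkout further down the
path. The directories and the module name "mymod" below are invented
stand-ins:

```python
# Demonstration of sys.path shadowing; directories and the module name
# "mymod" are invented for illustration.
import os
import sys
import tempfile

checkout = tempfile.mkdtemp(prefix="checkout-")   # the developer's source tree
site_pkgs = tempfile.mkdtemp(prefix="site-")      # an installed copy
for d, tag in [(checkout, "checkout"), (site_pkgs, "installed")]:
    with open(os.path.join(d, "mymod.py"), "w") as f:
        f.write("SOURCE = %r\n" % tag)

sys.path.insert(0, checkout)   # developer expects the checkout to win...
sys.path.insert(0, site_pkgs)  # ...but site-packages gets pushed in front
import mymod
print(mymod.SOURCE)  # 'installed' - the source checkout is shadowed
```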

>   - What's the chance that that all of these 200 packages will be kept
>     up to date? If an application works with a given version of a
>     library and it can stick with that version, it will. As a Python
>     library maintainer, how do you like the idea of 10 different
>     versions of your library included in Fedora?

That's a problem the distros need to manage by offering patches to how
virtual environments and installation layouts work, rather than
lamenting the fact that cross-platform developers and distro
maintainers care about different things.

>   - The next time a security issue is found in a common Python library,
>     does Fedora now have to rush out 10 parallel fixes for it?

Not if Fedora contributes the changes needed to support parallel
installs without requiring changes to existing Python applications and
libraries.

> You can see that reaction in mails like this:
>
>   http://lists.fedoraproject.org/pipermail/devel/2012-December/174944.html
>
> and the "why can't these losers just maintain compatibility" view:
>
>   http://lists.fedoraproject.org/pipermail/devel/2012-December/175028.html
>   http://lists.fedoraproject.org/pipermail/devel/2012-December/174929.html
>
> Notice folks complaining about Ruby and Java here, not Python. I can see
> Python embracing semantic versioning and "just use venv" shortly leading
> to Python being included in the list of "heretics".

Unlike Java, the Python community generally sees *actual* bundling as
evil - expressing constraints relative to a published package index is
a different thing. Dependencies in Python are typically only brought
together into a cohesive, pinned set of versions by application
developers and system integrators - the frameworks and libraries often
express quite loose version requirements (and receive complaints if
they're overly restrictive).

The distros just have a harder problem than most because the set of
packages they're trying to bring together is so large, they're bound
to run into many cases of packages that have mutually incompatible
dependencies.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

