[Distutils] Library instability on PyPI and impact on OpenStack

Mark McLoughlin markmc at redhat.com
Tue Mar 5 13:27:47 CET 2013


Hi Nick,

On Tue, 2013-03-05 at 17:56 +1000, Nick Coghlan wrote:
> On Tue, Mar 5, 2013 at 8:29 AM, Mark McLoughlin <markmc at redhat.com> wrote:
> > On Mon, 2013-03-04 at 12:44 -0500, Donald Stufft wrote:
> >> On Monday, March 4, 2013 at 12:36 PM, Mark McLoughlin wrote:
> >
> >> > If parallel incompatible installs is a hopeless problem in Python,
> >> > why
> >> > the push to semantic versioning then rather than saying that
> >> > incompatible API changes should mean a name change?
> >> Forcing a name change feels ugly as all hell. I don't really see what
> >> parallel installs has much to do with anything. I don't bundle anything
> >> and i'm ideologically opposed to it generally but I don't typically have
> >> a need for parallel installs because I use virtual environments. Why
> >> don't you utilize those? (Not being snarky, actually curious).
> >
> > It's a fair question.
> >
> > To answer it with a question, how do you imagine Linux distributions
> > using virtual environments such that:
> >
> >   $> yum install -y openstack-nova
> >
> > uses a virtual environment? How does it differ from bundling? (Not being
> > snarky, actually curious :)
> >
> > The approach that some Fedora folks are trying out is called "Software
> > Collections". It's not Python specific, but it's basically the same as a
> > virtual environment.
> >
> > For OpenStack, I think we'd probably have all the Python libraries we
> > require installed under e.g. /opt/rh/openstack-$version so that you
> > could have programs from two different releases of OpenStack installed
> > on the same system.
> >
> > Long time packagers are usually horrified at this idea e.g.
> >
> >   http://lists.fedoraproject.org/pipermail/devel/2012-December/thread.html#174872
> 
> Yes, it's the eternal tension between the "I only care about making a
> wide variety of applications as easy to maintain on platform X as
> possible" view of the sysadmin and the "I only care about making
> application Y as easy to maintain on a wide variety of platforms as
> possible" view of the developer.
> 
> Windows, Android, Mac OS X, etc, pretty much dial their software
> distribution model all the way towards the developer end of the
> spectrum. Linux distro maintainers need to realise that the language
> communities are almost entirely down the developer end of this
> spectrum, where sustainable cross-platform support is much higher
> priority than making life easier for administrators for any given
> platform.

I'm with you to there, but it's a bit of a vicious circle - app
developers bundle to insulate themselves from platform instability and
then the platform maintainers no longer see a benefit to platform
stability.

> We're willing to work with distros to make deployment of
> security updates easier, but any proposals that involve people
> voluntarily making cross-platform development harder simply aren't
> going to be accepted.

Let's take it to an extreme and say there was some way to force library
maintainers to never make incompatible API changes without renaming the
project. Not what I'm suggesting, of course.

Doing that would not make app developers' lives much, if any, harder,
since they're bundling anyway.

It does, however, make life more difficult for the platform maintainers.

So, what I think you're really saying would be rejected is "any
proposals which make platform maintenance harder while not providing a
material benefit to app maintainers who bundle".

The concrete thing we're discussing here is that distros get screwed if
incompatible API changes are commonplace and there is no easy way for
the distro to ship/install multiple versions of the same API, short of
every app in the distro bundling its own version of the API.

I don't see why that's a problem that necessarily requires disrupting
cross-platform app authors in order to address.

> >   - How many of these 200 new packages are essentially duplicates? Once
> >     you go down the route of having applications bundle libraries like
> >     this, there's going to basically be no sharing.
> 
> There's no sharing only if you *actually* bundle the dependencies into
> each virtualenv. While full bundling is the only mode pip currently
> implements, completely isolating each virtualenv, it doesn't *have* to
> work that way. In particular, PEP 426 offers the opportunity to add a
> "compatible release" mode to pip/virtualenv where the tool can
> maintain a shared pool of installed libraries, and use *.pth files to
> make an appropriate version available in each venv. Updating the
> shared version to a more recent release would then automatically
> update any venvs with a *.pth file that reference that release.
> 
> For example, suppose an application requires "somedep (1.3)". This
> requires at least version 1.3, and won't accept 2.0. The latest
> available qualifying version might be "1.5.3".
> 
> At the moment, pip will install a *copy* of somedep 1.5.3 into the
> application's virtualenv. However, it doesn't have to do that.

Awesome! Now we're getting on to figuring out a solution to the
"parallel installs of multiple incompatible versions" issue :)

> It
> could, instead, install somedep 1.5.3 into a location like
> "/usr/lib/shared/pip-python/somedep1/<contents>", and then add a
> "somedep1.pth" file to the virtualenv that references
> "/usr/lib/shared/pip-python/somedep1/".
> 
> Now, suppose we install another app, also using a virtualenv, that
> requires "somedep (1.6)". The new version 1.6.0 is available now, so
> we install it into the shared location and *both* applications will
> end up using somedep 1.6.0.
> 
> A security update is released for "somedep" as 1.6.1 - we install it
> into the shared location, and now both applications are using 1.6.1
> instead of 1.6.0. Yay, that's what we wanted, just as if we had
> runtime version selection, only the selection happens at install time
> (when adding the *.pth file to the virtualenv) rather than at
> application startup.
> 
> Finally, we install a third application that needs "somedep (2.1)". We
> can't overwrite the shared version, because it isn't compatible.
> Fortunately, what we can do instead is install it to
> "/usr/lib/shared/pip-python/somedep2/<contents>" and create a
> "somedep2.pth" file in that environment. The two virtualenvs relying
> on "somedep1" are blissfully unaware anything has changed because that
> version never appears anywhere on their sys.path.
> 
> Could you use this approach for the actual system site-packages
> directory? No, because sys.path would become insanely long with that
> many *.pth files. However, you could likely symlink to releases stored
> in the *.pth friendly distribution store. But for application specific
> virtual environments, it should be fine.

Ok, there's a tonne of details there about pip, virtualenv and .pth
files that are going over my head right now, but the general idea I'm
taking away is:

  - the system has multiple versions of somedep installed under /usr 
    somewhere

  - the latest version (2.1) is what you get if you naively just do 
    'import somedep'

  - most applications should instead somehow explicitly say they need
    ~>1.3, ~>1.6 or ~>2.0 or whatever

  - distros would have multiple copies of the same library, but only 
    one copy for each incompatible stream rather than one copy for each 
    application

That's definitely workable.
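That scheme boils down to a small amount of install-time logic. The
sketch below assumes the "somedepN" one-directory-per-major-version
convention from Nick's example; the version numbers are the ones used
in the thread, not real packages:

```python
# Shared pool contents: one stream per major version, several releases.
installed = {
    "somedep1": ["1.5.3", "1.6.0", "1.6.1"],
    "somedep2": ["2.1.0"],
}

def select_stream(requirement):
    """Given a minimum version like '1.3', pick the matching stream
    (same major version) and the newest installed release in it."""
    req = tuple(int(p) for p in requirement.split("."))
    stream = "somedep%d" % req[0]
    # keep only releases satisfying the minimum, then take the newest
    ok = [v for v in installed[stream]
          if tuple(int(p) for p in v.split(".")) >= req]
    return stream, max(ok, key=lambda v: tuple(int(p) for p in v.split(".")))

print(select_stream("1.3"))  # ('somedep1', '1.6.1')
print(select_stream("2.1"))  # ('somedep2', '2.1.0')
```

Two apps asking for 1.3 and 1.6 both land on the single 1.6.1 copy,
while an app asking for 2.1 gets its own stream without disturbing them.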

> If any distros want that kind of thing to become a reality, though,
> they're going to have to step up and contribute it. As noted above,
> for the current tool development teams, the focus is on distributing,
> maintaining and deploying cross-platform applications, not making it
> easy to do security updates on a Linux distro. I believe it's possible
> to satisfy both parties, but it's going to be up to the distros to
> offer a viable plan for meeting their needs without disrupting
> existing upstream practices.

Point well taken.

However, I am surprised the pendulum has swung so far that Python
platform maintainers only worry about app maintainers and not distro
maintainers.

Also, Python embracing incompatible updates and bundling is either a
new upstream practice or just one that isn't well understood on the
distro side.

> I will note that making this kind of idea more feasible is one of the
> reasons I am making "compatible release" the *default* in PEP 426
> version specifiers, but it still needs people to actually figure out
> the details and write the code.
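To make the semantics concrete: a "compatible release" specifier like
"somedep (1.3)" in Nick's example means at least 1.3 but still within
the 1.x series, so 2.0 is excluded. The toy matcher below handles only
plain dotted-integer versions; the actual specifier syntax being
standardised is considerably richer:

```python
def compatible(required, candidate):
    """Sketch of 'compatible release' matching: candidate must be at
    least `required` and share its major version (so a new major
    version never satisfies the constraint)."""
    req = tuple(int(p) for p in required.split("."))
    cand = tuple(int(p) for p in candidate.split("."))
    return cand >= req and cand[0] == req[0]

assert compatible("1.3", "1.5.3")       # newer 1.x release: accepted
assert compatible("1.3", "1.6.1")       # security update: accepted
assert not compatible("1.3", "2.0")     # incompatible major: rejected
assert not compatible("1.3", "1.2.9")   # below the minimum: rejected
```

Making this the default reading of a bare version constraint is what
lets an installer substitute 1.6.1 for 1.5.3 without being asked.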

I still think that going down this road without the parallel installs
issue solved is a dangerous move for Python. Leaving aside pain for
distros for a moment, there was a perception (perhaps misguided) that
Python was a more stable platform than e.g. Ruby or Java.

If Python library maintainers see PEP 426 as a license to make
incompatible changes more often, so long as they bump their major
number, then that perception will change.

> I will also note that the filesystem layout described above is *far*
> more amenable to safe runtime selection of packages than the current
> pkg_resources method. The critical failure mode in pkg_resources that
> can lead to a lot of weirdness is that it can end up pushing
> site-packages itself on to the front of sys.path which can shadow a
> *lot* of modules (in particular, an installed copy of the software
> you're currently working on may shadow the version in your source
> checkout - this is the bug the patch I linked earlier was needed to
> resolve). Runtime selection would need more work than getting virtual
> environments to work that way, but it's certainly feasible once the
> installation layout is updated.
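The shadowing failure mode Nick describes is easy to reproduce: two
copies of a module exist, and whichever directory sits first on
sys.path wins. The module name and paths below are invented for
illustration:

```python
import os
import sys
import tempfile

checkout = tempfile.mkdtemp()   # stands in for your source checkout
site_pkgs = tempfile.mkdtemp()  # stands in for installed site-packages

# Write an "in-development" copy and an older "installed" copy.
for d, version in ((checkout, "dev"), (site_pkgs, "installed")):
    with open(os.path.join(d, "mymod.py"), "w") as f:
        f.write('VERSION = "%s"\n' % version)

# The pkg_resources-style failure: site-packages gets pushed to the
# front of sys.path, ahead of the source checkout.
sys.path.insert(0, site_pkgs)
sys.path.insert(1, checkout)

import mymod
print(mymod.VERSION)  # "installed" -- the checkout copy is shadowed
```

Per-stream pool directories avoid this because only the specific stream
directory is added to sys.path, never site-packages wholesale.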

Ok.

> >   - What's the chance that all of these 200 packages will be kept
> >     up to date? If an application works with a given version of a
> >     library and it can stick with that version, it will. As a Python
> >     library maintainer, how do you like the idea of 10 different
> >     versions of your library included in Fedora?
> 
> That's a problem the distros need to manage by offering patches to how
> virtual environments and installation layouts work, rather than
> lamenting the fact that cross-platform developers and distro
> maintainers care about different things.

I'm not lamenting what cross-platform developers care about. I'm
lamenting that the Python platform maintainers care more about the
cross-platform developers than distro maintainers :)

> >   - The next time a security issue is found in a common Python library,
> >     does Fedora now have to rush out 10 parallel fixes for it?
> 
> Not if Fedora contributes the changes needed to support parallel
> installs without requiring changes to existing Python applications and
> libraries.

"Patches welcome" - I get it.

> > You can see that reaction in mails like this:
> >
> >   http://lists.fedoraproject.org/pipermail/devel/2012-December/174944.html
> >
> > and the "why can't these losers just maintain compatibility" view:
> >
> >   http://lists.fedoraproject.org/pipermail/devel/2012-December/175028.html
> >   http://lists.fedoraproject.org/pipermail/devel/2012-December/174929.html
> >
> > Notice folks complaining about Ruby and Java here, not Python. I can see
> > Python embracing semantic versioning and "just use venv" shortly leading
> > to Python being included in the list of "heretics".
> 
> Unlike Java, the Python community generally sees *actual* bundling as
> evil

I think what you call "*actual* bundling" is what I think of as
"vendorisation" - i.e. where an app actually copies a library into its
source tree?

By bundling, I mean that an app sees itself as in control of the
versions of its dependencies. The app developer fundamentally thinks she
is delivering a specific stack of dependencies with her application code
on top, rather than delivering just the app and running it on a stable
platform.

>  - expressing constraints relative to a published package index is
> a different thing. Dependencies in Python are typically only brought
> together into a cohesive, pinned set of versions by application
> developers and system integrators - the frameworks and libraries often
> express quite loose version requirements (and receive complaints if
> they're overly restrictive).
> 
> The distros just have a harder problem than most because the set of
> packages they're trying to bring together is so large, they're bound
> to run into many cases of packages that have mutually incompatible
> dependencies.

It only takes two apps requiring two incompatible versions of the same
library for this to become an issue.

A specific example that concerns OpenStack is that you will often want
server A from version N installed alongside server B from version N+1.
This is especially true while you're migrating your deployment from
version N to N+1, since you probably want to upgrade one server at a
time.

Thus, in OpenStack's case, it only takes one of our dependencies to
release an incompatible version for this to become an issue.

Python can be a stable platform and OpenStack won't bundle; or it can be
an unstable platform without parallel installs and OpenStack will
bundle; or it can be an unstable platform with parallel installs and
OpenStack won't have to bundle.

Anyway, sounds like we have some ideas for parallel installs we can
investigate.

Thanks,
Mark.


