Library instability on PyPI and impact on OpenStack
Hey,

(I sent this to the wrong list and was directed here by Nick. I wasn't aware of the very promising sounding PEP426 and haven't read it yet, so apologies on that. Just wanted to resend my mail here ASAP to prevent the discussion happening on the wrong list. Thanks!)

Generally speaking, when a project has a large list of dependencies on libraries outside of its control, it can take one of two approaches to those dependencies:

1) specify the minimum required version of each library and assume new releases of all your dependencies will be backwards compatible with previous versions. You feel safe that if an incompatible version of the library is released, it will be a completely new stream and you can adopt the new stream at your leisure.

I'm much more familiar with C libraries than any other language. Somehow, C library maintainers seem to understand the need for this approach and so you've got mechanisms like libtool managed sonames and parallel installable libraries like gtk, gtk2, gtk3.

2) specify exactly what version of each library to use, because you assume all of your dependencies are constantly changing their APIs and breaking your app.

This is what you see in the Ruby (bundler) and Java (maven) worlds. For distribution packagers, it presents horrific problems: you accept that you're going to be packaging (potentially many) different versions of the same library, and any time a security issue comes up you need to patch each version.

Personally, I tend to pour scorn on Ruby and Java folks for the chaotic nature that pushes app developers down this "I need to control the whole stack of dependencies because they're so unstable" path.

OTOH, I hear the Perl community are pretty good about taking the first approach.

I always felt that the Python community tended more towards the former approach, but there are always exceptions to the rule - to unfairly pick on one project, sqlalchemy seems to have an API that often changes incompatibly.

However, OpenStack is starting to get burned more often and some are advocating taking the second approach to managing our dependencies:

http://lists.openstack.org/pipermail/openstack-dev/2013-February/thread.html...
http://lists.openstack.org/pipermail/openstack-dev/2013-February/thread.html...
http://lists.openstack.org/pipermail/openstack-dev/2012-November/002075.html

It's probably not worthwhile for everyone to try and read the nuances of those threads. The tl;dr is we're hurting and hurting bad. Is this a problem the OpenStack and Python communities want to solve together? Or does the Python community fundamentally see themselves as taking the same approach as the Ruby and Java communities?

Maybe it sounds like I'm trolling, but this is an honest question. What I'd really like to hear back is "please, please OpenStack keep using the first approach and we can work through the issues you're seeing and together make Python better". We can totally do that IMHO.

If people want an example of the kinds of things we need to work on:

http://lists.openstack.org/pipermail/openstack-dev/2013-February/006048.html

Thanks, Mark.
On Thu, Feb 28, 2013 at 10:39 AM, Mark McLoughlin <markmc@redhat.com> wrote:
Hey,
(I sent this to the wrong list and was directed here by Nick. I wasn't aware of the very promising sounding PEP426 and haven't read it yet, so apologies on that. Just wanted to resend my mail here ASAP to prevent the discussion happening on the wrong list. Thanks!)
Generally speaking, when a project has a large list of dependencies on libraries outside of its control, it can take one of two approaches to those dependencies:
1) specify the minimum required version of each library and assume new releases of all your dependencies will be backwards compatible with previous versions. You feel safe that if an incompatible version of the library is released, it will be a completely new stream and you can adopt the new stream at your leisure.
I'm much more familiar with C libraries than any other language. Somehow, C library maintainers seem to understand the need for this approach and so you've got mechanisms like libtool managed sonames and parallel installable libraries like gtk, gtk2, gtk3.
2) specify exactly what version of each library to use, because you assume all of your dependencies are constantly changing their APIs and breaking your app
This is what you see in the Ruby (bundler) and Java (maven) worlds. For distribution packagers, it presents horrific problems: you accept that you're going to be packaging (potentially many) different versions of the same library, and any time a security issue comes up you need to patch each version.
Personally, I tend to pour scorn on Ruby and Java folks for the chaotic nature that pushes app developers down this "I need to control the whole stack of dependencies because they're so unstable" path.
OTOH, I hear the Perl community are pretty good about taking the first approach.
I always felt that the Python community tended more towards the former approach, but there are always exceptions to the rule - to unfairly pick on one project, sqlalchemy seems to have an API that often changes incompatibly.
However, OpenStack is starting to get burned more often and some are advocating taking the second approach to managing our dependencies:
http://lists.openstack.org/pipermail/openstack-dev/2013-February/thread.html...
http://lists.openstack.org/pipermail/openstack-dev/2013-February/thread.html...
http://lists.openstack.org/pipermail/openstack-dev/2012-November/002075.html
It's probably not worthwhile for everyone to try and read the nuances of those threads. The tl;dr is we're hurting and hurting bad. Is this a problem the OpenStack and Python communities want to solve together? Or does the Python community fundamentally see themselves as taking the same approach as the Ruby and Java communities?
Maybe it sounds like I'm trolling, but this is an honest question. What I'd really like to hear back is "please, please OpenStack keep using the first approach and we can work through the issues you're seeing and together make Python better". We can totally do that IMHO.
Briefly in PEP 426 we are likely to copy the Ruby behavior as the default (without using the ~> operator itself) which is to depend on "the remainder of a particular release series". In Ruby gems ~> 4.2.3 means >= 4.2.3, < 4.3.0 and the version numbers are expected to say something about backwards compatibility. On PyPI the version numbers don't necessarily mean anything but I hope that will change. I consider it good form for a setup.py to declare as loose dependencies as possible (no version qualifier or a >= version qualifier) and for an application to provide a requires.txt or a buildout that has stricter requirements.
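(For illustration, the split Daniel describes might look like the following; the project and version names are made up, only the shape matters. The library keeps its setup.py loose:)

    # setup.py -- the library side: dependencies as loose as possible
    from setuptools import setup

    setup(
        name="examplelib",            # hypothetical project
        version="1.0",
        install_requires=[
            "PasteDeploy>=1.5.0",     # ">=" qualifier only
            "SQLAlchemy",             # no qualifier at all
        ],
    )

(while the application that deploys it pins exactly, e.g. in a requirements.txt consumed by 'pip install -r':)

    PasteDeploy==1.5.0
    SQLAlchemy==0.7.9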
On Thursday, February 28, 2013 at 10:39 AM, Mark McLoughlin wrote:
Hey,
(I sent this to the wrong list and was directed here by Nick. I wasn't aware of the very promising sounding PEP426 and haven't read it yet, so apologies on that. Just wanted to resend my mail here ASAP to prevent the discussion happening on the wrong list. Thanks!)
Generally speaking, when a project has a large list of dependencies on libraries outside of its control, it can take one of two approaches to those dependencies:
1) specify the minimum required version of each library and assume new releases of all your dependencies will be backwards compatible with previous versions. You feel safe that if an incompatible version of the library is released, it will be a completely new stream and you can adopt the new stream at your leisure.
I'm much more familiar with C libraries than any other language. Somehow, C library maintainers seem to understand the need for this approach and so you've got mechanisms like libtool managed sonames and parallel installable libraries like gtk, gtk2, gtk3.
2) specify exactly what version of each library to use, because you assume all of your dependencies are constantly changing their APIs and breaking your app
This is what you see in the Ruby (bundler) and Java (maven) worlds. For distribution packagers, it presents horrific problems: you accept that you're going to be packaging (potentially many) different versions of the same library, and any time a security issue comes up you need to patch each version.
Personally, I tend to pour scorn on Ruby and Java folks for the chaotic nature that pushes app developers down this "I need to control the whole stack of dependencies because they're so unstable" path.
OTOH, I hear the Perl community are pretty good about taking the first approach.
I always felt that the Python community tended more towards the former approach, but there are always exceptions to the rule - to unfairly pick on one project, sqlalchemy seems to have an API that often changes incompatibly.
SQLAlchemy is at 0.x, so its API is explicitly not stable yet.
However, OpenStack is starting to get burned more often and some are advocating taking the second approach to managing our dependencies:
http://lists.openstack.org/pipermail/openstack-dev/2013-February/thread.html...
http://lists.openstack.org/pipermail/openstack-dev/2013-February/thread.html...
http://lists.openstack.org/pipermail/openstack-dev/2012-November/002075.html
It's probably not worthwhile for everyone to try and read the nuances of those threads. The tl;dr is we're hurting and hurting bad. Is this a problem the OpenStack and Python communities want to solve together? Or does the Python community fundamentally see themselves as taking the same approach as the Ruby and Java communities?
Maybe it sounds like I'm trolling, but this is an honest question. What I'd really like to hear back is "please, please OpenStack keep using the first approach and we can work through the issues you're seeing and together make Python better". We can totally do that IMHO.
I don't think OpenStack should be pinning to exact versions in its releases. I do think that individual installs should be pinning to exact versions. I don't think that you're likely to ever get a complete consensus on foo2, foo3 vs foo 2.0, foo 3.0 (personally I prefer foo 2.0, foo 3.0). However my suggestion would be to express "expected" version ranges. For example SQLAlchemy (to continue on your example) tends to bump its second digit (probably while it's still in 0.x) anytime a backwards incompat change is made. So instead of depending on `SQLAlchemy`, or `SQLAlchemy==0.8.0`, you could do something like `SQLAlchemy>=0.8.0,<0.9`. This should function similarly to the "gtk, gtk2, gtk3" that you're used to. Different projects will use different places for their significant digit though. In the future there will likely be a shortcut to handle this in the ~> operator (~>0.8.1 would be equivalent to >=0.8.1,<0.9).
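(A concrete way to see such a range enforced, using pkg_resources from setuptools/distribute; the version numbers are only examples:)

    import pkg_resources

    # Succeeds if the installed SQLAlchemy falls inside the range;
    # raises VersionConflict (or DistributionNotFound) otherwise.
    pkg_resources.require("SQLAlchemy>=0.8.0,<0.9")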
If people want an example of the kinds of things we need to work on:
http://lists.openstack.org/pipermail/openstack-dev/2013-February/006048.html
Thanks, Mark.
On 28-02-13 16:39, Mark McLoughlin wrote:
Generally speaking, when a project has a large list of dependencies on libraries outside of its control, it can take one of two approaches to those dependencies:
1) specify the minimum required version of each library and assume new releases of all your dependencies will be backwards compatible with previous versions.
2) specify exactly what version of each library to use, because you assume all of your dependencies are constantly changing their APIs and breaking your app
I use versions in two places:

- In my setup.py I note minimum versions: "xyz >= 1.2". In a rare case, I can say "<2dev" or so.
- In my buildout configuration (or requirements.txt), I pin versions exactly.

A bit of extra explanation on the last part: I have one basic set of version pins (KGS, Known Good Set) that I use in most places, so not every project needs to keep track of all versions. Something like http://packages.lizardsystem.nl/kgs/1.30/versions.cfg

The minimum requirements keep your software working and keep yourself from picking known wrong versions. The version pinning keeps everything reliable. (Note that you can get buildout to always print any new versions that it picked, so it is easy to keep everything up-to-date regarding pins.)

Reinout

--
Reinout van Rees http://reinout.vanrees.org/
reinout@vanrees.org http://www.nelen-schuurmans.nl/
"If you're not sure what to do, make something. -- Paul Graham"
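(A sketch of what consuming such a KGS looks like in a buildout configuration, assuming the KGS file defines a [versions] section; the part name and local pin are illustrative:)

    [buildout]
    extends = http://packages.lizardsystem.nl/kgs/1.30/versions.cfg
    versions = versions
    parts = site

    [versions]
    # local pins layered on top of the KGS
    somelib = 1.4.2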
Yes, to what Reinout said. Also, I thought this post from one of your threads was good: http://lists.openstack.org/pipermail/openstack-dev/2013-February/006103.html

If you want repeatable installs and testing throughout your release cycle, you have to pin your whole dependency tree. But in order to not get "frozen" on old stuff, you'll want a process for testing out upgrades in a controlled way, rather than just letting them happen unpredictably.

Marcus
On Feb 28, 2013, at 7:39 AM, Mark McLoughlin <markmc@redhat.com> wrote:
I always felt that the Python community tended more towards the former approach, but there are always exceptions to the rule - to unfairly pick on one project, sqlalchemy seems to have an API that often changes incompatibly.
For what it's worth, Twisted takes backwards compatibility very seriously. I try to mention this frequently in the hopes that more Python projects will adopt a similar policy: <http://twistedmatrix.com/trac/wiki/CompatibilityPolicy>. I hear that openstack has chosen to avail themselves of some different networking libraries... but you have deprived me of the opportunity for snark by apparently having no difficulty with compatibility in that layer :).

Unfortunately, the tenor of the community has changed somewhat recently, with Python 3 being a clarion call for people who want to screw over their users with an incompatible release. There has been some cultural fallout from that, because it sets a bad example. Personally, I have found myself in a dozen serious conversations over the last two years or so with other package maintainers (names withheld to protect the guilty) contemplating some massive incompatible change with no warning, because "everything is going to break anyway, so why not take the opportunity"; it's important that cooler heads prevail here :). Luckily, in most cases I was able to convince those folks that multiple simultaneous levels of breakage are a bad idea... However, many, perhaps even most, projects have made the transition gracefully without any API breakage, as we are attempting to, and to their credit, the core developers *have* said repeatedly that breaking API while porting to Python 3 is a singularly terrible idea.

I see that most of your problems have been with PyParsing, though. PyParsing has also been infamously bad in terms of having no stable API, despite being well past 1.0. When I last used it, the breakage was so bad that we just started bundling a specific version in our code; even dot-releases were breaking things majorly. You can still see that here: <http://bazaar.launchpad.net/~divmod-dev/divmod.org/trunk/view/head:/Imaginary/imaginary/pyparsing.py>.

Overall though, your impression of the Python community at large *is* accurate; breakages come with deprecation warnings beforehand, and mature packages try hard to maintain some semblance of a compatible API, even if there is some disagreement over what "compatible" means.

-glyph
Hey, On Thu, 2013-02-28 at 15:39 +0000, Mark McLoughlin wrote: ...
However, OpenStack is starting to get burned more often and some are advocating taking the second approach to managing our dependencies:
http://lists.openstack.org/pipermail/openstack-dev/2013-February/thread.html...
http://lists.openstack.org/pipermail/openstack-dev/2013-February/thread.html...
http://lists.openstack.org/pipermail/openstack-dev/2012-November/002075.html
It's probably not worthwhile for everyone to try and read the nuances of those threads. The tl;dr is we're hurting and hurting bad. Is this a problem the OpenStack and Python communities want to solve together? Or does the Python community fundamentally see themselves as taking the same approach as the Ruby and Java communities? ...
I've tried to digest PEP426 and collected my thoughts below. Apologies if you read through them and they offer you nothing useful.

I'm trying to imagine the future we're driving towards here where OpenStack is no longer suffering and how PEP426 helped get us there, e.g. where:

- Many libraries use semantic versioning and OpenStack specifies x.y compatible versions in their dependency lists. Incompatible changes only get made in x+N.y and OpenStack continues using the x.y+N.z versions.

- All the libraries which don't use semantic versioning use another clear versioning scheme that allows OpenStack to avoid incompatible updates while still using compatible updates.

- Incompatible versions of the same library are routinely installed in parallel. Does PEP426 help here, or is all the work to be done in tools like PyPI, pip, setuptools, etc.? Apps somehow need to specify which version they want to use, since the incompatible versions of the library share the same namespace.

There are fairly large gaps in my understanding of the plan here. How quickly will we see libraries adopt PEP426? Will python 2.6/2.7 based projects be able to consume these? How do parallel installs work in practice? etc.

Thanks, Mark.

== Basic Principles ==

What we (OpenStack) need from you (python library maintainers):

1) API stability or, at least, predictable instability. We'd like to say "we require version N or any compatible later version" because just saying "we require version N" means our users can't use our application with later incompatible versions of your library.

2) Support for installing multiple incompatible versions at once[*]. If there are two applications in the same system/distribution that require incompatible versions of the same library, you need to support having both installed at once.

[*] - throws me back to Havoc Pennington's way-back-when essay on parallel installable incompatible versions of GNOME libraries: http://www106.pair.com/rhp/parallel.html

== Versioning ==

Semantic versioning is appealing here because, assuming all libraries adopt it, it becomes very easy for us to predict which versions will be incompatible.

For any API unstable library (0.x in semantic versioning), we need to pin to a very specific version and require distributions to package that exact version in order to run OpenStack. When moving to a newer version of the library, we need to move all OpenStack projects at once. Ideally, we'd just avoid such libraries.

Implied in semantic versioning, though, is that it's possible for distributions to include both version X.y.z and X+N.y.z

== PEP426 ==

=== Date Based Versioning ===

OpenStack uses date based versioning for its server components and we've begun using it for the Oslo libraries described above. PEP426 says:

    Date based release numbers are explicitly excluded from compatibility with this scheme, as they hinder automatic translation to other versioning schemes, as well as preventing the adoption of semantic versioning without changing the name of the project. Accordingly, a leading release component greater than or equal to 1980 is an error.

This seems odd and the rationale seems weak. It looks like an arbitrary half-way house between requiring semantic versioning and allowing any scheme of increasing versions.

I'm really not sure how we'd deal with this gracefully - assuming we have releases out there which require e.g. oslo-config>=2013.1 and we switch to:

    Version: 1.2
    Private-Version: 2013.2

then is the >=2013.1 requirement satisfied by the private version? If not, should we just go ahead and take the pain now of releasing 2013.1 as 1.1?

=== Compatible Release Version Specifier ===

I know this is copied from Ruby's spermy operator, but it seems to me to be fairly useless (overly pessimistic, at least) with semantic versioning.

i.e. assuming x.y.z releases, where an increase in X means that incompatible changes have been made, then:

    ~>1.2.3

unnecessarily limits you to the 1.2.y series of releases. Now, in an ideal world, you would just say:

    ~>1.2

and you're good for the entire 1.x.y series, but often you know that your app doesn't work with 1.2.2 and there's a fix in 1.2.3 that your app absolutely needs.

A random implementation detail I don't get here ... we use 'pip install -r' a lot in OpenStack and so have files with e.g.

    PasteDeploy>=1.5.0

how will you specify compatible releases in these files? What's the equivalent of

    PasteDeploy~>1.5.0

?

=== Multiple Release Streams ===

If version 1.1.0 and version 1.2.0 were already published to PyPI, we'd like to be able to release a 1.1.1 bugfix update. I saw some mention that PyPI would then treat 1.1.1 as the latest release, but I can't find where I read that now.

Could someone confirm this won't be an issue in future? I guess I'm unclear how the PEP relates to problems with PyPI specifically.

=== Python 2 vs 3 ===

Apparently, part of our woes are down to library maintainers releasing a version of their library that isn't supported on Python 2 (OpenStack currently supports running on 2.6 and 2.7).

This is obviously an incompatible change from our perspective. Going back to the basic principles above, we'd like to be able to:

1) specify that we "require version N or any later compatible version that supports python 2.6 and 2.7".

2) have both the python 2 and python 3 compatible versions installed on the same system or included in the same distribution.

Environment markers in PEP426 sound tantalizingly close to (1) but AFAICT the semantics of markers are "the set of things that must be true for this field to be considered" i.e.

    Requires-Dist: PasteDeploy (>=1.5.0); python_version == '2.6' or python_version == '2.7'

means PasteDeploy is required on python 2.6/2.7 installs, but not otherwise.

I really don't know what the story with (2) is.

== Oslo Libraries ==

The approach we had planned to take to versioning for the Oslo family of libraries in OpenStack is:

- APIs can be marked as deprecated in a release, kept around for another release and then removed. For example, if we deprecated an API in the H release it would not be removed until 1 year later in the J release.

- If we wanted to go down the route of a version with incompatible API changes with no deprecation period for the old APIs, we'd actually just introduce the new APIs under a new library name like oslo-config2. We could continue to support the old APIs as a separate project for a time.

This is an approach tailored to Oslo's use case as a library used by server projects on a 6 month coordinated release cycle. It allows servers from the H release to use Oslo libraries from the I release, further allowing servers from the H and I release to be in the same system/distribution.

To encode this in a requirements file, I guess we'd do:

    oslo-config>=2013.1,<2014.1

although that does mean predicting that 2014.1 will make changes which are incompatible with 2013.1, even though it may not actually do so.

This approach doesn't work so well for "normal" Python libraries because it's impossible to predict when it will be reasonable to remove a deprecated API. Who's to say that no-one will have a 3 year old application installed on their system that they expect to still run fine even if they update to a newer version of your library?
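(A toy illustration of the "easy to predict" claim in Mark's Versioning section above, assuming every library follows semantic versioning strictly; a sketch, not anyone's actual resolver:)

    def semver_compatible(installed, required):
        """Should 'installed' be API-compatible with 'required'?"""
        i = tuple(int(p) for p in installed.split("."))
        r = tuple(int(p) for p in required.split("."))
        if r[0] == 0:
            # 0.x releases are explicitly API-unstable: pin exactly
            return i == r
        # same major version, and no downgrade below the required release
        return i[0] == r[0] and i >= r

    semver_compatible("1.4.0", "1.2.3")   # True  - compatible update
    semver_compatible("2.0.0", "1.2.3")   # False - incompatible major bump
    semver_compatible("0.8.1", "0.8.0")   # False - unstable 0.x series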
Hey, On Thu, 2013-02-28 at 10:46 -0500, Daniel Holth wrote:
Briefly in PEP 426 we are likely to copy the Ruby behavior as the default (without using the ~> operator itself) which is to depend on "the remainder of a particular release series". In Ruby gems ~> 4.2.3 means >= 4.2.3, < 4.3.0 and the version numbers are expected to say something about backwards compatibility.
Thanks! A couple of questions on that in my other mail:

1) This doesn't seem to jibe with semantic versioning, where you want to say "1.2.3 or any later compatible version" rather than "1.2.3 or any later version in the 1.2 series"

2) How do you do this in requires.txt type files without the operator?
On PyPI the version numbers don't necessarily mean anything but I hope that will change.
Ok, and catalog-sig is the place to follow progress there?
I consider it good form for a setup.py to declare as loose dependencies as possible (no version qualifier or a >= version qualifier) and for an application to provide a requires.txt or a buildout that has stricter requirements.
Interesting! I feel like I'm missing some context on the latter part - mostly because I hadn't come across buildout, so more reading for me! - but if the idea is that a buildout/requires.txt specifies the versions that a developer should use when working on the project ... how do you avoid a situation where developers are happily working on one stack of libraries and the app either no longer works with the minimum versions specified in setup.py or the latest versions published upstream? Thanks, Mark.
On Fri, 2013-03-01 at 01:06 -0800, Glyph wrote:
On Feb 28, 2013, at 7:39 AM, Mark McLoughlin <markmc@redhat.com> wrote:
I always felt that the Python community tended more towards the former approach, but there are always exceptions to the rule - to unfairly pick on one project, sqlalchemy seems to have an API that often changes incompatibly.
For what it's worth, Twisted takes backwards compatibility very seriously. I try to mention this frequently in the hopes that more Python projects will adopt a similar policy: <http://twistedmatrix.com/trac/wiki/CompatibilityPolicy>.
Very nice!
I hear that openstack has chosen to avail themselves of some different networking libraries... but you have deprived me of the opportunity for snark by apparently having no difficulty with compatibility in that layer :).
I won't digress into discussing other difficulties in that layer :) [snip useful and interesting context]
Overall though, your impression of the Python community at large *is* accurate; breakages come with deprecation warnings beforehand, and mature packages try hard to maintain some semblance of a compatible API, even if there is some disagreement over what "compatible" means.
Thanks! That gives me hope that OpenStack should continue assuming by default that Python library updates will be compatible. The difficulty is thus identifying the individual versioning semantics for all the libraries we use and tailoring our requirement specifications to each library. Cheers, Mark.
On 03-03-13 17:07, Mark McLoughlin wrote:
I consider it good form for a setup.py to declare as loose dependencies as possible (no version qualifier or a >= version qualifier) and for an application to provide a requires.txt or a buildout that has stricter requirements.

Interesting!
I feel like I'm missing some context on the latter part - mostly because I hadn't come across buildout, so more reading for me! - but if the idea is that a buildout/requires.txt specifies the versions that a developer should use when working on the project ... how do you avoid a situation where developers are happily working on one stack of libraries and the app either no longer works with the minimum versions specified in setup.py or the latest versions published upstream?
Unless you explicitly test for this, you cannot really know.

- If you always unpin the versions and grab the latest one, you cannot be sure it still works with the minimum version.
- If you pin on the lowest version, you cannot know if it breaks with the latest.

You could test for this explicitly by having two different buildouts (or requirements.txt files) and test both.

Perhaps some input from my practice can help. In reality it isn't so much of a problem.

- If I encounter a problem ("hey, I now should use djangorestframework 2.x instead of the 0.3.x I was using because some package uses the new imports"), I fix up my setup.py. A colleague switched to the latest djangorestframework with an incompatible API, so my code broke. I added the ">= 2.0" requirement to my setup.py, ensuring I'm at least getting a good explicit error message instead of an unclear ImportError when running my site.
- Sites are conservative and are pinned down solid.
- Libraries are more loose. Often certain packages aren't pinned, which makes you note errors earlier.

The 2nd and 3rd points provide a bit of certainty at both ends of the spectrum. The 1st is a handy method to ensure nothing goes terribly wrong. Ok, it is reactive instead of proactive, but you should encounter errors early enough (when developing features) so you can update your min/max versions before you do a release.

Reinout

--
Reinout van Rees http://reinout.vanrees.org/
reinout@vanrees.org http://www.nelen-schuurmans.nl/
"If you're not sure what to do, make something. -- Paul Graham"
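(One way to run the "two different buildouts" test Reinout mentions, using pip instead: keep a second requirements file that mirrors the setup.py minimums - file names and versions here are made up - and run the test suite once against each in a fresh virtualenv, e.g. 'pip install -r requirements-min.txt' followed by the normal test command:)

    # requirements.txt      -- the pinned stack, used for releases
    Django==1.4.5
    djangorestframework==2.2.1

    # requirements-min.txt  -- the declared minimums, tested separately
    Django==1.4
    djangorestframework==2.0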
On Mon, Mar 4, 2013 at 1:54 AM, Mark McLoughlin <markmc@redhat.com> wrote:
I've tried to digest PEP426 and collected my thoughts below. Apologies if you read through them and they offer you nothing useful.
I'm trying to imagine the future we're driving towards here where OpenStack is no longer suffering and how PEP426 helped get us there
e.g. where
- Many libraries use semantic versioning and OpenStack specifies x.y compatible versions in their dependency lists. Incompatible changes only get made in x+N.y and OpenStack continues using the x.y+N.z versions.
Yep, PEP 426 pushes projects heavily in that direction by making it the path of least resistance. You *can* do something different if necessary, it's just less convenient than simply using the first 9 clauses of semantic versioning.
- All the libraries which don't use semantic versioning use another clear versioning scheme that allows OpenStack to avoid incompatible updates while still using compatible updates.
Yes, the PEP will explicitly encourage projects to document how to define dependencies if the default "compatible release" clause isn't appropriate.
- Incompatible versions of the same library are routinely installed in parallel. Does PEP426 help here, or is all the work to be done in tools like PyPI, pip, setuptools, etc.? Apps somehow need to specify which version they want to use, since the incompatible versions of the library share the same namespace.
PEP 426 doesn't help here, and isn't intended to. virtualenv (and the integrated venv in 3.3+) are the main solution being offered in this space (the primary difference with simple bundling is that you still keep track of your dependencies, so you have some hope of properly rolling out security fixes). Fedora (at least) is experimenting with similar capabilities through software collections. IMO, the success of iOS, Android (and Windows) as deployment targets means the ISVs have spoken: bundling is the preferred option once you get above the core OS level. That means it's on us to support bundling in a way that doesn't make rolling out security updates a nightmare for system administrators, rather than trying to tell ISVs that bundling isn't supported.
There are fairly large gaps in my understanding of the plan here. How quickly will we see libraries adopt PEP426? Will python 2.6/2.7 based projects be able to consume these?
PEP 426 is aimed primarily at the current ecosystem of tools (setuptools/distribute/pip/zc.buildout and others). There are limitations to the current PyPI API that prevent them from taking full advantage of the extended metadata, however, so full exploitation won't be possible until the server side gets sorted out (which is an ongoing discussion).
How do parallel installs work in practice? etc.
At the moment, they really don't. setuptools/distribute do allow it to some degree, but it can go wrong if you're not sufficiently careful with it (e.g. http://git.beaker-project.org/cgit/beaker/commit/?h=develop&id=d4077a118627b947a3c814cd3ff9280afeeecd73).
Thanks, Mark.
== Basic Principles ==
What we (OpenStack) need from you (python library maintainers):
1) API stability or, at least, predictable instability. We'd like to say "we require version N or any compatible later version" because just saying "we require version N" means our users can't use our application with later incompatible versions of your library.
I assume you mean s/incompatible/compatible/ in the last example. I agree, which is why PEP 426 is deliberately more opinionated than PEP 345 about what versions should mean (although I need to revise how that advocacy is currently handled).
2) Support for installing multiple incompatible versions at once[*]. If there are two applications in the same system/distribution that require incompatible versions of the same library, you need to support having both installed at once.
virtualenv/venv cover this (see http://www.virtualenv.org/en/1.9.X/ and http://www.python.org/dev/peps/pep-0405/).
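(A minimal sketch with the 3.3+ stdlib venv module; each application gets its own environment, so two environments can hold mutually incompatible versions of the same library:)

    import venv

    # one isolated site-packages per application
    venv.create("env-app1")
    venv.create("env-app2")

    # then install each app's pinned stack into its own environment,
    # e.g. by running "pip install -r requirements.txt" from each one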
== Versioning ==
Semantic versioning is appealing here because, assuming all libraries adopt it, it becomes very easy for us to predict which versions will be incompatible.
For any API unstable library (0.x in semantic versioning), we need to pin to a very specific version and require distributions to package that exact version in order to run OpenStack. When moving to a newer version of the library, we need to move all OpenStack projects at once. Ideally, we'd just avoid such libraries.
Implied in semantic versioning, though, is that it's possible for distributions to include both version X.y.z and X+N.y.z
My point of view is that the system Python is there primarily to run system utilities and user scripts, rather than arbitrary Python applications. Users can install alternate versions of software into their user site directories, or into virtual environments. Projects are, of course, also free to include part of their version number in the project name.

The challenge of dynamic linking different on-disk versions of a module into a process is that:

- the import system simply isn't set up to work that way (setuptools/distribute try to fake it by adjusting sys.path, but that can go quite wrong at times)
- it's confusing for users, since it isn't always clear which version they're going to see
- errors can appear arbitrarily late, since module loading is truly dynamic
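(The sys.path adjustment Nick refers to looks roughly like this, assuming both versions were installed as multi-version eggs with 'easy_install -m'; it also shows where the late errors come from:)

    import pkg_resources

    # Must run before the first "import sqlalchemy": pkg_resources puts
    # exactly one of the parallel versions onto sys.path.
    pkg_resources.require("SQLAlchemy==0.7.9")
    import sqlalchemy

    # If other code has already activated the 0.8 egg, the require()
    # call raises VersionConflict at runtime - not at install time.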
== PEP426 ==
=== Date Based Versioning ===
OpenStack uses date based versioning for its server components and we've begun using it for the Oslo libraries described above. PEP426 says:
Date based release numbers are explicitly excluded from compatibility with this scheme, as they hinder automatic translation to other versioning schemes, as well as preventing the adoption of semantic versioning without changing the name of the project. Accordingly, a leading release component greater than or equal to 1980 is an error.
This seems odd and the rationale seems weak. It looks like an arbitrary half-way house between requiring semantic versioning and allowing any scheme of increasing versions.
The assumption is that the version field will follow semantic versioning (that's not where the PEP started, it was originally non-committal on the matter like PEP 345 - however I've come to the conclusion that this assumption needs to be made explicit, but still need to update various parts of the PEP that are still wishy-washy about it). However, we can't actually enforce semantic versioning because we can't automatically detect if a project is adhering to those semantics. What we *can* enforce is the rejection of version numbers that obviously look like years rather than a semantic version.
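(The enforceable part is then just a surface check on the version string, something like the sketch below - an illustration of the rule, not the PEP's actual wording:)

    def looks_date_based(version):
        """Reject leading release components that look like years."""
        leading = version.split(".")[0]
        return leading.isdigit() and int(leading) >= 1980

    looks_date_based("2013.1")   # True  -> rejected
    looks_date_based("1.2.3")    # False -> accepted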
I'm really not sure how we'd deal with this gracefully - assuming we have releases out there which require e.g. oslo-config>=2013.1 and we switch to:
Version: 1.2
Private-Version: 2013.2
then is the >=2013.1 requirement satisfied by the private version? If not, should we just go ahead and take the pain now of releasing 2013.1 as 1.1?
No version of the metadata will ever support ordered comparisons of the "Private-Version" field, since that's an arbitrary tag with no implied ordering mechanism. I expect we'll add support for == and != comparisons against private tags somewhere along the line once PyPI is publishing the enhanced metadata properly.
=== Compatible Release Version Specifier ===
I know this is copied from Ruby's spermy operator, but it seems to me to be fairly useless (overly pessimistic, at least) with semantic versioning.
i.e. assuming x.y.z releases, where an increase in X means that incompatible changes have been made then:
~>1.2.3
unnecessarily limits you to the 1.2.y series of releases. Now, in an ideal world, you would just say:
~>1.2
and you're good for the entire 1.x.y series, but often you know that your app doesn't work with 1.2.2 and there's a fix in 1.2.3 that your app absolutely needs.
Handling that kind of situation is why version specifiers allow multiple clauses that are ANDed together: mydependency (1.2, >= 1.2.3)
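(In the pkg_resources syntax the same ANDed pair can be checked directly; the version numbers are illustrative:)

    import pkg_resources

    req = pkg_resources.Requirement.parse("mydependency>=1.2.3,<1.3")
    "1.2.5" in req   # True:  satisfies both clauses
    "1.2.2" in req   # False: below the needed bugfix release
    "1.3.0" in req   # False: outside the compatible series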
A random implementation detail I don't get here ... we use 'pip install -r' a lot in OpenStack and so have files with e.g.
PasteDeploy>=1.5.0
how will you specify compatible releases in these files? What's the equivalent of
PasteDeploy~>1.5.0
?
Note that the pkg_resources specifier format (no parens allowed) is NOT the same as the PEP 345/426 format (parens required). The former is designed to work with arbitrary version schemes, while the latter is explicitly constrained to version schemes that abide by the constraints described in PEP 386. I expect the setuptools/distribute developers will choose a tilde-based operator to represent compatible release clauses in the legacy formats (likely Ruby's ~>, although it could be something like ~= or ~~ instead).
=== Multiple Release Streams ===
If version 1.1.0 and version 1.2.0 were already published to PyPI, we'd like to be able to release a 1.1.1 bugfix update. I saw some mention that PyPI would then treat 1.1.1 as the latest release, but I can't find where I read that now.
Could someone confirm this won't be an issue in future? I guess I'm unclear how the PEP relates to problems with PyPI specifically.
There are major issues with the way PyPI publishes metadata - in particular, installers must currently download and *run* setup.py in order to find out the dependencies for most packages. Dev releases are also prone to being picked up as the latest release, because there's no explicit guideline for how installers should identify and exclude development releases when satisfying dependencies.
=== Python 2 vs 3 ===
Apparently, part of our woes are down to library maintainers releasing a version of their library that isn't supported on Python 2 (OpenStack currently supports running on 2.6 and 2.7).
This is obviously an incompatible change from our perspective. Going back to the basic principles above, we'd like to be able to:
1) specify that we "require version N or any later compatible version that supports python 2.6 and 2.7".
You'll need to figure out which version broke compatibility and add a "< first-broken-version" clause to the version specifier for that dependency (pointing out to the offending upstream that it isn't cool to break backwards compatibility like that in a nominally backwards compatible release may also be worthwhile). However, once Requires-Python metadata is more readily available, installers should also automatically exclude any candidate versions with an unsatisfied "Requires-Python" constraint.

Single source and 2to3 based Python 2.6+/3.2+ compatibility is unfortunately currently somewhat clumsy to declare, since version specifiers currently have no "or" notation:

    Requires-Python: >= 2.6, < 4.0, !=3.0.*, !=3.1.*

Probably something to look at for metadata 2.1 (I won't delay metadata 2.0 over it, since single source compatibility *can* be specified accurately once I update the PEP to include the ".*" notation, it's just ugly).
2) have both the python 2 and python 3 compatible versions installed on the same system or included in the same distribution.
Environment markers in PEP426 sound tantalizingly close to (1) but AFAICT the semantics of markers are "the set of things that must be true for this field to be considered" i.e.
Requires-Dist: PasteDeploy (>=1.5.0); python_version == '2.6' or python_version == '2.7'
means PasteDeploy is required on python 2.6/2.7 installs, but not otherwise.
Correct.
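(The semantics can be checked with the 'packaging' library, which postdates this thread; the marker string is the one from the example above:)

    from packaging.markers import Marker

    m = Marker("python_version == '2.6' or python_version == '2.7'")
    m.evaluate()   # True only when run on a 2.6 or 2.7 interpreter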
I really don't know what the story with (2) is.
virtualenv/venv/software collections.
== Oslo Libraries ==
The approach we had planned to take to versioning for the Oslo family of libraries in OpenStack is:
- APIs can be marked as deprecated in a release, kept around for another release and then removed. For example, if we deprecated an API in the H release it would not be removed until 1 year later in the J release.
That actually matches the traditional deprecation model for CPython. However, it's hard to handle cleanly with automated dependency analysis because it's hard to answer the "is this a backwards compatible release?" question quickly and automatically, since the answer is "it should be, but may not be if the application is currently triggering deprecation warnings". Depending on how conservative people are, they'll either choose the more restrictive option of disallowing forward compatibility, or the optimistic approach of declaring forward compatibility with the rest of the 3.x series.
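(The deprecation dance Mark describes is the usual warnings-module pattern; the API names here are hypothetical:)

    import warnings

    def new_api(option):
        return option   # the replacement API

    def old_api(option):
        # deprecated in the H release, removed a year later in J
        warnings.warn("old_api() is deprecated; use new_api() instead",
                      DeprecationWarning, stacklevel=2)
        return new_api(option)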
- If we wanted to go down the route of a version with incompatible API changes with no deprecation period for the old APIs, we'd actually just introduce the new APIs under a new library name like oslo-config2. We could continue to support the old APIs as a separate project for a time.
This is an approach tailored to Oslo's use case as a library used by server projects on a 6 month coordinated release cycle. It allows servers from the H release to use Oslo libraries from the I release, further allowing servers from the H and I release to be in the same system/distribution.
To encode this in a requirements file, I guess we'd do:
oslo-config>=2013.1,<2014.1
although that does mean predicting that 2014.1 will make changes which are incompatible with 2013.1, even though it may not actually do so.
PEP 426 explicitly forbids you from doing that, since it disallows version numbers that look like dates. But you can add a leading zero to make it legal (I'll be changing the advice in the PEP to suggest this solution for date-based version compatibility):

    oslo-config (>= 0.2013.1, < 0.2014.1)

I'll also be updating the "compatible release" clause specification to pay attention to leading zeroes so that "0.N" and "0.N+1" are considered incompatible by default (which again works well with this approach).
This approach doesn't work so well for "normal" Python libraries because it's impossible to predict when it will be reasonable to remove a deprecated API. Who's to say that no-one will have a 3 year old application installed on their system that they expect to still run fine even if they update to a newer version of your library?
Yes, we've become a lot more conservative with this even in the standard library. Semantic versioning is a better approach in general.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Just a short note to state the obvious - the specific pyparsing library from the example did exactly what it should have done and can do in the current system, by incrementing its major version number for a backwards-incompatible release. Dependency management is a job though.
On Mar 04, 2013, at 06:11 PM, Nick Coghlan wrote:
My point of view is that the system Python is there primarily to run system utilities and user scripts, rather than arbitrary Python applications. Users can install alternate versions of software into their user site directories, or into virtual environments. Projects are, of course, also free to include part of their version number in the project name.
I have the same opinion, but it doesn't make developers very happy at all. Cheers, -Barry
On Mon, Mar 4, 2013 at 10:59 AM, Barry Warsaw <barry@python.org> wrote:
On Mar 04, 2013, at 06:11 PM, Nick Coghlan wrote:
My point of view is that the system Python is there primarily to run system utilities and user scripts, rather than arbitrary Python applications. Users can install alternate versions of software into their user site directories, or into virtual environments. Projects are, of course, also free to include part of their version number in the project name.
I have the same opinion, but it doesn't make developers very happy at all.
Cheers, -Barry
Apple does (or did) something very useful by not including the stdlib source code in OS X's builtin Python, making the system Python so hilariously useless for development that no one would attempt it after the trick has been discovered. After a few frustrating hours figuring out what they've actually done it hits you "OH that Python is not for me", you download a working version from python.org, and you're on your way. :-P
In article <CAG8k2+74cUh6p_mmgvS2hLKnREim_iepBQ0u7vF9dTpxk4++zw@mail.gmail.com>, Daniel Holth <dholth@gmail.com> wrote:
Apple does (or did) something very useful by not including the stdlib source code in OS X's builtin Python, making the system Python so hilariously useless for development that no one would attempt it after the trick has been discovered. After a few frustrating hours figuring out what they've actually done it hits you "OH that Python is not for me", you download a working version from python.org, and you're on your way. :-P
FWIW, the stdlib py files are installed as part of the Xcode Command Line Tools component. -- Ned Deily, nad@acm.org
Hi Nick,

Thanks for the detailed reply. I'll stick to PEP426 related topics in this mail. Generally speaking, PEP426 looks like good progress, but the biggest problem I see now is the lack of parallel installs for incompatible versions.

On Mon, 2013-03-04 at 18:11 +1000, Nick Coghlan wrote:
On Mon, Mar 4, 2013 at 1:54 AM, Mark McLoughlin <markmc@redhat.com> wrote:
I've tried to digest PEP426 and collected my thoughts below. Apologies if you read through them and they offer you nothing useful.
I'm trying to imagine the future we're driving towards here where OpenStack is no longer suffering and how PEP426 helped get us there
e.g. where
- Many libraries use semantic versioning and OpenStack specifies x.y compatible versions in their dependency lists. Incompatible changes only get made in x+N.y and OpenStack continues using the x.y+N.z versions.
Yep, PEP 426 pushes projects heavily in that direction by making it the path of least resistance. You *can* do something different if necessary, it's just less convenient than simply using the first 9 clauses of semantic versioning.
- All the libraries which don't use semantic versioning use another clear versioning scheme that allows OpenStack to avoid incompatible updates while still using compatible updates.
Yes, the PEP will explicitly encourage projects to document how to define dependencies if the default "compatible release" clause isn't appropriate.
Ok, understood. Is there a case for putting this information into the metadata? e.g. if this was version 1.2.3 of a library and the project's policy is to bump the major version if an incompatible API change is made, then you can do:

    Requires-Dist: foo (1.2)

or:

    Requires-Dist: foo (1.2, >=1.2.3)

or:

    Requires-Dist: foo (>=1.2.3, <2)

but how about if foo-1.2.3 contained:

    Compatible-With: 1

and foo-2.0.0 contained:

    Compatible-With: 2

then this:

    Requires-Dist: foo (1.2)

could reject 2.0.0 as satisfying the compatible release specification.
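(The field is purely hypothetical, but an installer could apply it along these lines - a sketch of the idea, not a proposal for the PEP's actual algorithm:)

    def satisfies(requested, candidate, compatible_with):
        """requested:        compatible-release clause, e.g. "1.2"
        candidate:        the candidate release's version, e.g. "2.0.0"
        compatible_with:  the candidate's Compatible-With value, e.g. "2"
        """
        def key(v):
            return tuple(int(p) for p in v.split("."))
        new_enough = key(candidate)[:2] >= key(requested)[:2]
        same_stream = compatible_with == requested.split(".")[0]
        return new_enough and same_stream

    satisfies("1.2", "1.2.3", "1")   # True:  newer, same stream
    satisfies("1.2", "2.0.0", "2")   # False: different compatibility stream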
== PEP426 ==
=== Date Based Versioning ===
OpenStack uses date based versioning for its server components and we've begun using it for the Oslo libraries described above. PEP426 says:
Date based release numbers are explicitly excluded from compatibility with this scheme, as they hinder automatic translation to other versioning schemes, as well as preventing the adoption of semantic versioning without changing the name of the project. Accordingly, a leading release component greater than or equal to 1980 is an error.
This seems odd and the rationale seems weak. It looks like an arbitrary half-way house between requiring semantic versioning and allowing any scheme of increasing versions.
The assumption is that the version field will follow semantic versioning (that's not where the PEP started, it was originally non-committal on the matter like PEP 345 - however I've come to the conclusion that this assumption needs to be made explicit, but still need to update various parts of the PEP that are still wishy-washy about it). However, we can't actually enforce semantic versioning because we can't automatically detect if a project is adhering to those semantics. What we *can* enforce is the rejection of version numbers that obviously look like years rather than a semantic version.
I sympathise with wanting to do *something* here. Predictable instability was the first thing I listed above and semantic versioning would give that.

However, I also described the approach I'm taking to predictable instability with the Oslo libraries. Any APIs we want to remove will be marked as deprecated for a year, then removed. We can encode that as:

    oslo-config>=2013.1,<2014.1

if Compatible-With existed, we could do:

    Compatible-With: 2013.1,2013.2

If we used semantic versioning strictly, we'd bump the major release every 6 months (since every release may see APIs which were deprecated a year ago removed).

Basically, PEP426 desires to enforce some meaning around the major number and API instability, while not enforcing a meaning of API stability for the micro version. I'm not sure it achieves a whole lot except making my life harder without actually forcing me into adopting a more predictable versioning scheme.

And to be clear - this approach of removing APIs after a year is because we assume only other OpenStack projects are using Oslo libraries. If that changed, we'd probably keep deprecated APIs around for a lot longer.
=== Compatible Release Version Specifier ===
I know this is copied from Ruby's spermy operator, but it seems to me to be fairly useless (overly pessimistic, at least) with semantic versioning.
i.e. assuming x.y.z releases, where an increase in X means that incompatible changes have been made then:
~>1.2.3
unnecessarily limits you to the 1.2.y series of releases. Now, in an ideal world, you would just say:
~>1.2
and you're good for the entire 1.x.y series, but often you know that your app doesn't work with 1.2.2 and there's a fix in 1.2.3 that your app absolutely needs.
Handling that kind of situation is why version specifiers allow multiple clauses that are ANDed together:
mydependency (1.2, >= 1.2.3)
Thanks.
A random implementation detail I don't get here ... we use 'pip install -r' a lot in OpenStack and so have files with e.g.
PasteDeploy>=1.5.0
how will you specify compatible releases in these files? What's the equivalent of
PasteDeploy~>1.5.0
?
Note that the pkg_resources specifier format (no parens allowed) is NOT the same as the PEP 345/426 format (parens required). The former is designed to work with arbitrary version schemes, while the latter is explicitly constrained to version schemes that abide by the constraints described in PEP 386. I expect the setuptools/distribute developers will choose a tilde-based operator to represent compatible release clauses in the legacy formats (likely Ruby's ~>, although it could be something like ~= or ~~ instead).
Ok, figures. Cheers, Mark.
On Mon, Mar 4, 2013 at 12:12 PM, Ned Deily <nad@acm.org> wrote:
In article <CAG8k2+74cUh6p_mmgvS2hLKnREim_iepBQ0u7vF9dTpxk4++zw@mail.gmail.com>, Daniel Holth <dholth@gmail.com> wrote:
Apple does (or did) something very useful by not including the stdlib source code in OS X's builtin Python, making the system Python so hilariously useless for development that no one would attempt it after the trick has been discovered. After a few frustrating hours figuring out what they've actually done it hits you "OH that Python is not for me", you download a working version from python.org, and you're on your way. :-P
FWIW, the stdlib py files are installed as part of the Xcode Command Line Tools component.
Only half-serious; I come to the same conclusion for different reasons on RHEL or Ubuntu. Plone solves it by just bundling the Python source code to make things predictable.
Hey, On parallel installs ... On Mon, 2013-03-04 at 18:11 +1000, Nick Coghlan wrote:
On Mon, Mar 4, 2013 at 1:54 AM, Mark McLoughlin <markmc@redhat.com> wrote:
- Incompatible versions of the same library are routinely installed in parallel. Does PEP426 help here, or is all the work to be done in tools like PyPI, pip, setuptools, etc.? Apps somehow need to specify which version they want to use, since the incompatible versions of the library share the same namespace.
PEP 426 doesn't help here, and isn't intended to. virtualenv (and the integrated venv in 3.3+) are the main solution being offered in this space (the primary difference with simple bundling is that you still keep track of your dependencies, so you have some hope of properly rolling out security fixes). Fedora (at least) is experimenting with similar capabilities through software collections. IMO, the success of iOS, Android (and Windows) as deployment targets means the ISVs have spoken: bundling is the preferred option once you get above the core OS level. That means it's on us to support bundling in a way that doesn't make rolling out security updates a nightmare for system administrators, rather than trying to tell ISVs that bundling isn't supported.
The adoption of semantic versioning without parallel installs really worries me. It says that incompatible API changes are ok if you bump your major version, but the incompatible change is no less painful for users and distros.

Your "bundling is the preferred option once you get above core OS level" is exactly the kind of clarity I hoped for from this thread. However, it does mean that OpenStack and distros who include OpenStack need to figure out how to bundle OpenStack and its required libraries as a single stack.

You're right that Fedora has been experimenting with Software Collections, but that doesn't mean it's a solved problem:

http://lists.fedoraproject.org/pipermail/devel/2012-December/thread.html#174...
How do parallel installs work in practice? etc.
At the moment, they really don't. setuptools/distribute do allow it to some degree, but it can go wrong if you're not sufficiently careful with it (e.g. http://git.beaker-project.org/cgit/beaker/commit/?h=develop&id=d4077a118627b947a3c814cd3ff9280afeeecd73).
We do something similar in Fedora right now for sqlalchemy 0.7 and migrate 0.5. It's not pretty. Is there any work going on to make this more usable?
== Versioning ==
Semantic versioning is appealing here because, assuming all libraries adopt it, it becomes very easy for us to predict which versions will be incompatible.
For any API unstable library (0.x in semantic versioning), we need to pin to a very specific version and require distributions to package that exact version in order to run OpenStack. When moving to a newer version of the library, we need to move all OpenStack projects at once. Ideally, we'd just avoid such libraries.
Implied in semantic versioning, though, is that it's possible for distributions to include both version X.y.z and X+N.y.z
My point of view is that the system Python is there primarily to run system utilities and user scripts, rather than arbitrary Python applications. Users can install alternate versions of software into their user site directories, or into virtual environments. Projects are, of course, also free to include part of their version number in the project name.
You mentioned Software Collections - that means bundling all OpenStack's Python requirements in e.g. /opt/openstack-grizzly/
The challenge of dynamic linking different on-disk versions of a module into a process is that:

- the import system simply isn't set up to work that way (setuptools/distribute try to fake it by adjusting sys.path, but that can go quite wrong at times)
- it's confusing for users, since it isn't always clear which version they're going to see
- errors can appear arbitrarily late, since module loading is truly dynamic
If parallel incompatible installs is a hopeless problem in Python, why the push to semantic versioning then rather than saying that incompatible API changes should mean a name change? Thanks, Mark.
On Monday, March 4, 2013 at 12:36 PM, Mark McLoughlin wrote:
If parallel incompatible installs is a hopeless problem in Python, why the push to semantic versioning then rather than saying that incompatible API changes should mean a name change?
Forcing a name change feels ugly as all hell. I don't really see what parallel installs has much to do with anything. I don't bundle anything and I'm ideologically opposed to it generally, but I don't typically have a need for parallel installs because I use virtual environments. Why don't you utilize those? (Not being snarky, actually curious).
On Mon, Mar 4, 2013 at 12:23 PM, Mark McLoughlin <markmc@redhat.com> wrote:
Hi Nick,
Thanks for the detailed reply. I'll stick to PEP426 related topics in this mail.
Generally speaking, PEP426 looks like good progress, but the biggest problem I see now is the lack of parallel installs for incompatible versions.
I don't know what to do with the "runtime linker" feature provided by setuptools' pkg_resources or Ruby's Gem; the approach taken is to treat it as a separate problem from the "project name, version, and list of requirements" that 426 tries to specify. PEP 376 (database of installed Python distributions; defines the .dist-info directory) has more to do with whether parallel installs happen. Oh, do you mean two incompatible versions loaded during the same invocation? Due to the way JavaScript works (no global import namespace), npm seems to allow dependencies to have their own private instances of their sub-dependencies.
On Mon, 2013-03-04 at 18:11 +1000, Nick Coghlan wrote:
On Mon, Mar 4, 2013 at 1:54 AM, Mark McLoughlin <markmc@redhat.com> wrote:
I've tried to digest PEP426 and collected my thoughts below. Apologies if you read through them and they offer you nothing useful.
I'm trying to imagine the future we're driving towards here where OpenStack is no longer suffering and how PEP426 helped get us there
e.g. where
- Many libraries use semantic versioning and OpenStack specifies x.y compatible versions in their dependency lists. Incompatible changes only get made in x+N.y and OpenStack continues using the x.y+N.z versions.
Yep, PEP 426 pushes projects heavily in that direction by making it the path of least resistance. You *can* do something different if necessary, it's just less convenient than simply using the first 9 clauses of semantic versioning.
- All the libraries which don't use semantic versioning use another clear versioning scheme that allows OpenStack to avoid incompatible updates while still using compatible updates.
Yes, the PEP will explicitly encourage projects to document how to define dependencies if the default "compatible release" clause isn't appropriate.
Ok, understood.
Is there a case for putting this information into the metadata?
e.g. if this was version 1.2.3 of a library and the project's policy is to bump the major version if an incompatible API change is made, then you can do:
Requires-Dist: foo (1.2)
or:
Requires-Dist: foo (1.2,>=1.2.3)
or:
Requires-Dist: foo (>=1.2.3,<2)
but how about if foo-1.2.3 contained:
Compatible-With: 1
and foo-2.0.0 contained:
Compatible-With: 2
then this:
Requires-Dist: foo (1.2)
could reject 2.0.0 as satisfying the compatible release specification.
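An installer could implement the proposed field with logic along these lines - a sketch only, since Compatible-With is a hypothetical field rather than part of PEP 426:

def satisfies(required, compatible_with):
    # required: the stream from "Requires-Dist: foo (1.2)", e.g. "1.2".
    # compatible_with: the streams the candidate release declares.
    for stream in compatible_with:
        if required == stream or required.startswith(stream + "."):
            return True
    return False

satisfies("1.2", ["1"])    # True:  foo-1.5.3 (Compatible-With: 1) is accepted
satisfies("1.2", ["2"])    # False: foo-2.0.0 (Compatible-With: 2) is rejected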
== PEP426 ==
=== Date Based Versioning ===
OpenStack uses date based versioning for its server components and we've begun using it for the Oslo libraries described above. PEP426 says:
Date based release numbers are explicitly excluded from compatibility with this scheme, as they hinder automatic translation to other versioning schemes, as well as preventing the adoption of semantic versioning without changing the name of the project. Accordingly, a leading release component greater than or equal to 1980 is an error.
This seems odd and the rationale seems weak. It looks like an arbitrary half-way house between requiring semantic versioning and allowing any scheme of increasing versions.
The assumption is that the version field will follow semantic versioning (that's not where the PEP started, it was originally non-committal on the matter like PEP 345 - however I've come to the conclusion that this assumption needs to be made explicit, but still need to update various parts of the PEP that are still wishy-washy about it). However, we can't actually enforce semantic versioning because we can't automatically detect if a project is adhering to those semantics. What we *can* enforce is the rejection of version numbers that obviously look like years rather than a semantic version.
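The enforceable part is a purely syntactic check, something like this sketch of the rule quoted above:

def looks_date_based(version):
    # "a leading release component greater than or equal to 1980 is an error"
    leading = version.split(".")[0]
    return leading.isdigit() and int(leading) >= 1980

looks_date_based("2013.1")   # True  -> rejected under PEP 426
looks_date_based("1.7.2")    # False -> accepted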
I sympathise with wanting to do *something* here. Predictable instability was the first thing I listed above and semantic versioning would give that.
We should take it all back and derive the whole version number from a sha hash, e.g. the Mercurial revision number. A Previous-Version: tag (multiple-use of course) lets you assemble a DAG of the release history. Just kidding :-)
However, I also described the approach I'm taking to predictable instability with the Oslo libraries. Any APIs we want to remove will be marked as deprecated for a year, then removed.
We can encode that as:
oslo-config>=2013.1,<2014.1
if Compatible-With existed, we could do:
Compatible-With: 2013.1,2013.2
If we used semantic versioning strictly, we'd bump the major release every 6 months (since every release may see APIs which were deprecated a year ago removed).
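That range encoding is already checkable with today's tools, e.g. using the existing pkg_resources API (version numbers from the example above):

from pkg_resources import Requirement

req = Requirement.parse("oslo-config>=2013.1,<2014.1")
print("2013.2" in req)   # True:  a compatible update within the window
print("2014.1" in req)   # False: the release that may drop deprecated APIs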
Basically, PEP426 desires to enforce some meaning around the major number and API instability, while not enforcing a meaning of API stability for the micro version.
I'm not sure it achieves a whole lot except making my life harder without actually forcing me into adopting a more predictable versioning scheme.
We try :-)

I never dreamed it would be possible to force package producers to not break things with their new releases. I think you just have to know your publishers and do plenty of testing.

The goal of the version specifiers section of PEP 426 is only to define a sort order and comparison operators. We threw in semantic versioning to recommend something more sane than the "mostly compat with pkg_resources" behavior that we are stuck with until the indices get smarter. I never hoped to be able to force packagers to actually do a good job of maintaining backwards compatibility.

For example, zipimport._zip_directory_cache... it's there, and people started depending on it, and it's become part of the API even though it was never meant to be one. If I remove it, which version number or compatible-with tag do I have to specify?

Take a look at https://crate.io/packages/apipkg/ ; it is a neat way to make it clear which parts of your program are the API.
On Mon, 2013-03-04 at 12:44 -0500, Donald Stufft wrote:
On Monday, March 4, 2013 at 12:36 PM, Mark McLoughlin wrote:
If parallel incompatible installs is a hopeless problem in Python, why the push to semantic versioning then rather than saying that incompatible API changes should mean a name change?

Forcing a name change feels ugly as all hell. I don't really see what parallel installs has much to do with anything. I don't bundle anything and I'm ideologically opposed to it generally, but I don't typically have a need for parallel installs because I use virtual environments. Why don't you utilize those? (Not being snarky, actually curious).
It's a fair question.

To answer it with a question, how do you imagine Linux distributions using virtual environments such that:

$> yum install -y openstack-nova

uses a virtual environment? How does it differ from bundling? (Not being snarky, actually curious :)

The approach that some Fedora folks are trying out is called "Software Collections" [1]. It's not Python specific, but it's basically the same as a virtual environment. For OpenStack, I think we'd probably have all the Python libraries we require installed under e.g. /opt/rh/openstack-$version so that you could have programs from two different releases of OpenStack installed on the same system. Long time packagers are usually horrified at this idea e.g.

http://lists.fedoraproject.org/pipermail/devel/2012-December/thread.html#174...

Some of the things to think about:

- Each of the Python libraries under /opt/rh/openstack-$version would come from new packages like openstack-$version-python-eventlet.rpm - how many applications in Fedora would have a big stack of "bundled" python packages like OpenStack? 5, 10, 50, 100? Let's say it's 10 and each stack has 20 packages. That's 200 new packages which need to be maintained by Fedora versus the current situation where we (painfully) make a single stack of libraries work for all applications.

- How many of these 200 new packages are essentially duplicates? Once you go down the route of having applications bundle libraries like this, there's going to basically be no sharing.

- What's the chance that all of these 200 packages will be kept up to date? If an application works with a given version of a library and it can stick with that version, it will. As a Python library maintainer, how do you like the idea of 10 different versions of your library included in Fedora?

- The next time a security issue is found in a common Python library, does Fedora now have to rush out 10 parallel fixes for it?

You can see that reaction in mails like this:

http://lists.fedoraproject.org/pipermail/devel/2012-December/174944.html

and the "why can't these losers just maintain compatibility" view:

http://lists.fedoraproject.org/pipermail/devel/2012-December/175028.html http://lists.fedoraproject.org/pipermail/devel/2012-December/174929.html

Notice folks complaining about Ruby and Java here, not Python. I can see Python embracing semantic versioning and "just use venv" shortly leading to Python being included in the list of "heretics".

Thanks, Mark.

[1] - http://docs.fedoraproject.org/en-US/Fedora_Contributor_Documentation/1/html-...
On Mon, 2013-03-04 at 13:07 -0500, Daniel Holth wrote:
I don't know what to do with the "runtime linker" feature provided by setuptools' pkg_resources or Ruby's Gem; the approach taken is to treat it as a separate problem from the "project name, version, and list of requirements" that 426 tries to specify. PEP 376 (database of installed Python distributions; defines the .dist-info directory) has more to do with whether parallel installs happen. Oh, do you mean two incompatible versions loaded during the same invocation?
Not sure I'm following you, but what I mean is where you have foo-1.2.3 required by program A and foo-2.0 (which isn't backwards compatible with foo-1.2.3) required by program B. Both are installed in the standard python path and both programs explicitly say which versions they need. We seem to be pretty close to this with eggs and pkg_resources etc., but Nick seems to be concerned that it's very error prone. ...
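A sketch of what that could look like with today's setuptools, mirroring the __requires__ header it writes into its generated console scripts ("foo" and the version ranges are placeholders, and both versions are assumed to be installed as multi-version eggs):

#!/usr/bin/python
# Program A, pinned to the 1.x stream:
__requires__ = "foo>=1.2.3,<2.0"
import pkg_resources    # resolves __requires__ when building the working set
import foo

# Program B would instead declare __requires__ = "foo>=2.0,<3.0"; both
# streams coexist on disk and each program gets the one it asked for.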
I'm not sure it achieves a whole lot except making my life harder without actually forcing me into adopting a more predictable versioning scheme.
We try :-)
I never dreamed it would be possible to force package producers to not break things with their new releases. I think you just have to know your publishers and do plenty of testing.
We do a lot of testing:

http://status.openstack.org/zuul/

but we also depend on a fairly large bunch of libraries:

https://github.com/markmc/requirements/blob/652644c/tools/pip-requires https://github.com/markmc/requirements/blob/652644c/tools/test-requires

The conclusion here is that we need to go through each of those libraries, talk to upstream and figure out how to cap our version ranges to avoid unexpected incompatible updates. That's a lot of publishers to talk to!
The goal of the version specifiers section of PEP 426 is only to define a sort order and comparison operators. We threw in semantic versioning to recommend something more sane than the "mostly compat with pkg_resources" behavior that we are stuck with until the indices get smarter. I never hoped to be able to force packagers to actually do a good job of maintaining backwards compatibility.
Rejecting date based versions looks to be trying to force packagers into a scheme whereby the major version doesn't increment needlessly. If PEP426 isn't trying to force publishers to do better, then this looks like a pretty random potshot to take :)
For example, zipimport._zip_directory_cache... it's there, and people started depending on it, and it's become part of the API even though it was never meant to be one. If I remove it, which version number or compatible-with tag do I have to specify?
IMHO, it'd be fair game to just remove - it's obviously intended to be private.
Take a look at https://crate.io/packages/apipkg/ ; it is a neat way to make it clear which parts of your program are the API.
Interesting, indeed. Cheers, Mark.
On Mar 04, 2013, at 10:29 PM, Mark McLoughlin wrote:
The approach that some Fedora folks are trying out is called "Software Collections". It's not Python specific, but it's basically the same as a virtual environment.
It's a serious problem, and I think it will be made more so by the incursions into mobile platforms, where app isolation is actually an important feature. The security implications of duplication are bad enough, but in many cases, it's an issue of just plain functionality. On some platforms, often a PyPI version of a package will not work out of the box, and has to be patched to deal with various platform issues. The advantage of having packages come from the distro is because there's a higher likelihood that it will both continue to be secure *and* continue to work! As you point out, this isn't necessarily Python specific. We see much hand-wringing about Go packaging in Debian, and static linking for mobile apps. It would however be nice if Python itself had a better story for concurrent multiversion libraries, even better if it could be made compatible with schemes that the distros are coming up with to deal with this issue. -Barry
On Monday, March 4, 2013 at 5:29 PM, Mark McLoughlin wrote:
It's a fair question.
To answer it with a question, how do you imagine Linux distributions using virtual environments such that:
$> yum install -y openstack-nova
uses a virtual environment? How does it differ from bundling? (Not being snarky, actually curious :)
Ah, see, I don't install Python software via package managers, so for that use case, yes, it'd probably be a lot like bundling. So the biggest problem with the setuptools style multi versioning that I can remember is that it modified the global system state via pth files. I've got an idea for a system that would work for this use case, however it would result in app start up taking longer (meaningfully longer? I don't know for sure). Although it would require either support from the linux packagers or installation via a third party tool.
On Tue, Mar 5, 2013 at 8:29 AM, Mark McLoughlin <markmc@redhat.com> wrote:
The approach that some Fedora folks are trying out is called "Software Collections". It's not Python specific, but it's basically the same as a virtual environment.
For OpenStack, I think we'd probably have all the Python libraries we require installed under e.g. /opt/rh/openstack-$version so that you could have programs from two different releases of OpenStack installed on the same system.
Long time packagers are usually horrified at this idea e.g.
http://lists.fedoraproject.org/pipermail/devel/2012-December/thread.html#174...
Yes, it's the eternal tension between the "I only care about making a wide variety of applications as easy to maintain on platform X as possible" view of the sysadmin and the "I only care about making application Y as easy to maintain on a wide variety of platforms as possible" view of the developer.

Windows, Android, Mac OS X, etc, pretty much dial their software distribution model all the way towards the developer end of the spectrum. Linux distro maintainers need to realise that the language communities are almost entirely down the developer end of this spectrum, where sustainable cross-platform support is a much higher priority than making life easier for administrators on any given platform. We're willing to work with distros to make deployment of security updates easier, but any proposals that involve people voluntarily making cross-platform development harder simply aren't going to be accepted.
- How many of these 200 new packages are essentially duplicates? Once you go down the route of having applications bundle libraries like this, there's going to basically be no sharing.
There's no sharing only if you *actually* bundle the dependencies into each virtualenv. While full bundling is the only mode pip currently implements, completely isolating each virtualenv, it doesn't *have* to work that way. In particular, PEP 426 offers the opportunity to add a "compatible release" mode to pip/virtualenv where the tool can maintain a shared pool of installed libraries, and use *.pth files to make an appropriate version available in each venv. Updating the shared version to a more recent release would then automatically update any venvs with a *.pth file that reference that release.

For example, suppose an application requires "somedep (1.3)". This requires at least version 1.3, and won't accept 2.0. The latest available qualifying version might be "1.5.3".

At the moment, pip will install a *copy* of somedep 1.5.3 into the application's virtualenv. However, it doesn't have to do that.

It could, instead, install somedep 1.5.3 into a location like "/usr/lib/shared/pip-python/somedep1/<contents>", and then add a "somedep1.pth" file to the virtualenv that references "/usr/lib/shared/pip-python/somedep1/".

Now, suppose we install another app, also using a virtualenv, that requires "somedep (1.6)". The new version 1.6.0 is available now, so we install it into the shared location and *both* applications will end up using somedep 1.6.0.

A security update is released for "somedep" as 1.6.1 - we install it into the shared location, and now both applications are using 1.6.1 instead of 1.6.0. Yay, that's what we wanted, just as if we had runtime version selection, only the selection happens at install time (when adding the *.pth file to the virtualenv) rather than at application startup.

Finally, we install a third application that needs "somedep (2.1)". We can't overwrite the shared version, because it isn't compatible. Fortunately, what we can do instead is install it to "/usr/lib/shared/pip-python/somedep2/<contents>" and create a "somedep2.pth" file in that environment. The two virtualenvs relying on "somedep1" are blissfully unaware anything has changed because that version never appears anywhere on their sys.path.

Could you use this approach for the actual system site-packages directory? No, because sys.path would become insanely long with that many *.pth files. However, you could likely symlink to releases stored in the *.pth friendly distribution store. But for application specific virtual environments, it should be fine.

If any distros want that kind of thing to become a reality, though, they're going to have to step up and contribute it. As noted above, for the current tool development teams, the focus is on distributing, maintaining and deploying cross-platform applications, not making it easy to do security updates on a Linux distro. I believe it's possible to satisfy both parties, but it's going to be up to the distros to offer a viable plan for meeting their needs without disrupting existing upstream practices.

I will note that making this kind of idea more feasible is one of the reasons I am making "compatible release" the *default* in PEP 426 version specifiers, but it still needs people to actually figure out the details and write the code.

I will also note that the filesystem layout described above is *far* more amenable to safe runtime selection of packages than the current pkg_resources method.
The critical failure mode in pkg_resources that can lead to a lot of weirdness is that it can end up pushing site-packages itself on to the front of sys.path which can shadow a *lot* of modules (in particular, an installed copy of the software you're currently working on may shadow the version in your source checkout - this is the bug the patch I linked earlier was needed to resolve). Runtime selection would need more work than getting virtual environments to work that way, but it's certainly feasible once the installation layout is updated.
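To make the mechanics concrete, here is a sketch of that layout; the pool paths are the hypothetical ones from the example above, and the *.pth behaviour is the standard site module mechanism (each directory named in a .pth file under site-packages gets appended to sys.path at startup):

# Shared pool, one directory per incompatible stream:
#     /usr/lib/shared/pip-python/somedep1/somedep/    <- currently 1.6.1
#     /usr/lib/shared/pip-python/somedep2/somedep/    <- currently 2.1.0
#
# app-a's virtualenv contains site-packages/somedep1.pth with the line:
#     /usr/lib/shared/pip-python/somedep1
#
# so the site module puts the 1.x pool on sys.path at startup, and:
import somedep    # resolves to 1.6.1 inside app-a's environment
# Upgrading the somedep1 pool directory in place (e.g. 1.6.0 -> 1.6.1)
# updates every environment whose .pth file points at it, with no
# per-environment reinstall.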
- What's the chance that all of these 200 packages will be kept up to date? If an application works with a given version of a library and it can stick with that version, it will. As a Python library maintainer, how do you like the idea of 10 different versions of your library included in Fedora?
That's a problem the distros need to manage by offering patches to how virtual environments and installation layouts work, rather than lamenting the fact that cross-platform developers and distro maintainers care about different things.
- The next time a security issue is found in a common Python library, does Fedora now have to rush out 10 parallel fixes for it?
Not if Fedora contributes the changes needed to support parallel installs without requiring changes to existing Python applications and libraries.
Notice folks complaining about Ruby and Java here, not Python. I can see Python embracing semantic versioning and "just use venv" shortly leading to Python being included in the list of "heretics".
Unlike Java, the Python community generally sees *actual* bundling as evil - expressing constraints relative to a published package index is a different thing. Dependencies in Python are typically only brought together into a cohesive, pinned set of versions by application developers and system integrators - the frameworks and libraries often express quite loose version requirements (and receive complaints if they're overly restrictive).

The distros just have a harder problem than most because the set of packages they're trying to bring together is so large, they're bound to run into many cases of packages that have mutually incompatible dependencies.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi Nick, On Tue, 2013-03-05 at 17:56 +1000, Nick Coghlan wrote:
Yes, it's the eternal tension between the "I only care about making a wide variety of applications as easy to maintain on platform X as possible" view of the sysadmin and the "I only care about making application Y as easy to maintain on a wide variety of platforms as possible" view of the developer.
Windows, Android, Mac OS X, etc, pretty much dial their software distribution model all the way towards the developer end of the spectrum. Linux distro maintainers need to realise that the language communities are almost entirely down the developer end of this spectrum, where sustainable cross-platform support is a much higher priority than making life easier for administrators on any given platform.
I'm with you up to there, but it's a bit of a vicious circle - app developers bundle to insulate themselves from platform instability and then the platform maintainers no longer see a benefit to platform stability.
We're willing to work with distros to make deployment of security updates easier, but any proposals that involve people voluntarily making cross-platform development harder simply aren't going to be accepted.
Let's take it to an extreme and say there was some way to force library maintainers to never make incompatible API changes without renaming the project. Not what I'm suggesting, of course.

Doing that would not make app developers' lives any/much harder since they're bundling anyway.

It does, however, make life more difficult for the platform maintainers.

So, what I think you're really saying would be rejected is "any proposals which make platform maintenance harder while not providing a material benefit to app maintainers who bundle".

The concrete thing we're discussing here is that distros get screwed if incompatible API changes are commonplace and there is no easy way for the distro to ship/install multiple versions of the same API without going down the route of every app in the distro bundling their own version of the API.

I don't see why that's a problem that necessarily requires disrupting cross-platform app authors in order to address.
- How many of these 200 new packages are essentially duplicates? Once you go down the route of having applications bundle libraries like this, there's going to basically be no sharing.
There's no sharing only if you *actually* bundle the dependencies into each virtualenv. While full bundling is the only mode pip currently implements, completely isolating each virtualenv, it doesn't *have* to work that way. In particular, PEP 426 offers the opportunity to add a "compatible release" mode to pip/virtualenv where the tool can maintain a shared pool of installed libraries, and use *.pth files to make an appropriate version available in each venv. Updating the shared version to a more recent release would then automatically update any venvs with a *.pth file that reference that release.
Ok, there's a tonne of details there about pip, virtualenv and .pth files that are going over my head right now, but the general idea I'm taking away is:

- the system has multiple versions of somedep installed under /usr somewhere

- the latest version (2.1) is what you get if you naively just do 'import somedep'

- most applications should instead somehow explicitly say they need ~>1.3, ~>1.6 or ~>2.0 or whatever

- distros would have multiple copies of the same library, but only one copy for each incompatible stream rather than one copy for each application

That's definitely workable.
If any distros want that kind of thing to become a reality, though, they're going to have to step up and contribute it. As noted above, for the current tool development teams, the focus is on distributing, maintaining and deploying cross-platform applications, not making it easy to do security updates on a Linux distro. I believe it's possible to satisfy both parties, but it's going to be up to the distros to offer a viable plan for meeting their needs without disrupting existing upstream practices.
Point well taken.

However, I am surprised the pendulum has swung such that Python platform maintainers only worry about app maintainers and not distro maintainers.

And also, Python embracing incompatible updates and bundling is either a new upstream practice, or one that just isn't well understood on the distro side.
I will note that making this kind of idea more feasible is one of the reasons I am making "compatible release" the *default* in PEP 426 version specifiers, but it still needs people to actually figure out the details and write the code.
I still think that going down this road without the parallel installs issue solved is a dangerous move for Python. Leaving aside pain for distros for a moment, there was a perception (perhaps misguided) that Python was a more stable platform than e.g. Ruby or Java.
I will also note that the filesystem layout described above is *far* more amenable to safe runtime selection of packages than the current pkg_resources method. The critical failure mode in pkg_resources that can lead to a lot of weirdness is that it can end up pushing site-packages itself on to the front of sys.path which can shadow a *lot* of modules (in particular, an installed copy of the software you're currently working on may shadow the version in your source checkout - this is the bug the patch I linked earlier was needed to resolve). Runtime selection would need more work than getting virtual environments to work that way, but it's certainly feasible once the installation layout is updated.
Ok.
- What's the chance that all of these 200 packages will be kept up to date? If an application works with a given version of a library and it can stick with that version, it will. As a Python library maintainer, how do you like the idea of 10 different versions of your library included in Fedora?
That's a problem the distros need to manage by offering patches to how virtual environments and installation layouts work, rather than lamenting the fact that cross-platform developers and distro maintainers care about different things.
I'm not lamenting what cross-platform developers care about. I'm lamenting that the Python platform maintainers care more about the cross-platform developers than distro maintainers :)
- The next time a security issue is found in a common Python library, does Fedora now have to rush out 10 parallel fixes for it?
Not if Fedora contributes the changes needed to support parallel installs without requiring changes to existing Python applications and libraries.
"Patches welcome" - I get it.
Unlike Java, the Python community generally sees *actual* bundling as evil
I think what you call "*actual* bundling" is what I think of as "vendorisation" - i.e. where an app actually copies a library into its source tree? By bundling, I mean that an app sees itself as in control of the versions of its dependencies. The app developer fundamentally thinks they are delivering a specific stack of dependencies and their application code on top rather than installing just their app and running it on a stable platform.
- expressing constraints relative to a published package index is a different thing. Dependencies in Python are typically only brought together into a cohesive, pinned set of versions by application developers and system integrators - the frameworks and libraries often express quite loose version requirements (and receive complaints if they're overly restrictive).
The distros just have a harder problem than most because the set of packages they're trying to bring together is so large, they're bound to run into many cases of packages that have mutually incompatible dependencies.
It only takes two apps requiring two incompatible versions of the same library for this to become an issue.

A specific example that concerns OpenStack is you will often want server A from version N installed alongside server B from version N+1. This is especially true while you're migrating your deployment from version N to N+1 since you probably want to upgrade a server at a time.

Thus, in OpenStack's case, it only takes one of our dependencies to release an incompatible version for this to become an issue.

Python can be a stable platform and OpenStack wouldn't bundle, or it can be an unstable platform without parallel installs and OpenStack will bundle, or it can be an unstable platform with parallel installs and OpenStack won't have to bundle.

Anyway, sounds like we have some ideas for parallel installs we can investigate.

Thanks, Mark.
On Tuesday, March 5, 2013 at 7:27 AM, Mark McLoughlin wrote:
I still think that going down this road without the parallel installs issue solved is a dangerous move for Python. Leaving aside pain for distros for a moment, there was a perception (perhaps misguided) that Python was a more stable platform than e.g. Ruby or Java.
If Python library maintainers will see PEP426 as a license to make incompatible changes more often so long as they bump their major number, then that perception will change.
I still don't really see how this is related to PEP426 unless PEP426 has gotten a lot larger since I last looked at it. Where in particular a distribution gets installed is left up to the installers to sort out. And making sure that the installed versions exist in sys.path is similarly out of scope for PEP426.
I think what you call "*actual* bundling" is what I think of as "vendorisation" - i.e. where an app actually copies a library into its source tree?
By bundling, I mean that an app sees itself as in control of the versions of its dependencies. The app developer fundamentally thinks they are delivering a specific stack of dependencies and their application code on top rather than installing just their app and running it on a stable platform.
In this case the app would be one unit and an upgrade in one of the bundled dependencies would require a new version of the App. In my mind when an app either bundles or vendorizes they take responsibility for those dependencies and making sure they are up to date because they are now part of the logical unit that is the app.
On Tue, 2013-03-05 at 07:34 -0500, Donald Stufft wrote:
On Tuesday, March 5, 2013 at 7:27 AM, Mark McLoughlin wrote:
On Tue, 2013-03-05 at 17:56 +1000, Nick Coghlan wrote:
...
I will note that making this kind of idea more feasible is one of the reasons I am making "compatible release" the *default* in PEP 426 version specifiers, but it still needs people to actually figure out the details and write the code.
I still think that going down this road without the parallel installs issue solved is a dangerous move for Python. Leaving aside pain for distros for a moment, there was a perception (perhaps misguided) that Python was a more stable platform than e.g. Ruby or Java.
If Python library maintainers see PEP426 as a license to make incompatible changes more often so long as they bump their major number, then that perception will change.
I still don't really see how this is related to PEP426, unless PEP426 has gotten a lot larger since I last looked at it. Where in particular a distribution gets installed is left up to the installers to sort out. And making sure that the installed versions exist in sys.path is similarly out of scope for PEP426.
Sorry, maybe I'm being obtuse. I can see people reading PEP426 and thinking "oh, awesome! Python now has a mechanism for handling incompatible API changes! Now I can get rid of that crufty backwards compat code and bump my major number!". My point is that it's (potentially) damaging to send that message to library maintainers before Python has the infrastructure for sanely dealing with parallel installs. ...
I think what you call "*actual* bundling" is what I think of as "vendorisation" - i.e. where an app actually copies a library into its source tree?
By bundling, I mean that an app sees itself as in control of the versions of its dependencies. The app developer fundamentally thinks she is delivering a specific stack of dependencies with her application code on top, rather than shipping just the app and running it on a stable platform. In this case the app would be one unit, and an upgrade to one of the bundled dependencies would require a new version of the app. In my mind, when an app either bundles or vendorizes, it takes responsibility for those dependencies and for keeping them up to date, because they are now part of the logical unit that is the app.
Right, exactly - the idea Nick lays out allows the dependencies to be managed by the system platform maintainer. That would be a huge improvement over bundling. Cheers, Mark.
On Tuesday, March 5, 2013 at 7:50 AM, Mark McLoughlin wrote:
I still don't really see how this is related to PEP426 unless PEP426 has gotten a lot larger since I last looked at it. Where in particular a distribution gets installed is left up to the installers to sort out. And making sure that the installed versions exist in sys.path is similarly out of scope for PEP426.
Sorry, maybe I'm being obtuse.
I can see people reading PEP426 and thinking "oh, awesome! Python now has a mechanism for handling incompatible API changes! Now I can get rid of that crufty backwards compat code and bump my major number!".
My point is that it's (potentially) damaging to send that message to library maintainers before Python has the infrastructure for sanely dealing with parallel installs.
Gotcha, you think that codifying how to version with regard to breaking API compatibility will lead to more people breaking backwards compatibility. That's a fair concern, and there's not much that can be done inside of PEP426 to alleviate it. However, I will say that PEP426 doesn't really contain much in the way of new ideas, but rather codifies a lot of existing practices within the Python community so that tools can get simpler and more accurate without having to resort to heuristics and guessing. You could also argue that this would _help_ with backwards compatibility because there is now a suggested way of declaring when 2 releases are no longer compatible by incrementing the major version number.
On Tue, Mar 5, 2013 at 10:34 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
On Tuesday, March 5, 2013 at 7:27 AM, Mark McLoughlin wrote:
If Python library maintainers see PEP426 as a license to make incompatible changes more often so long as they bump their major number, then that perception will change.
I still don't really see how this is related to PEP426 unless PEP426 has gotten a lot larger since I last looked at it. Where in particular a distribution gets installed is left up to the installers to sort out. And making sure that the installed versions exist in sys.path is similarly out of scope for PEP426.
Mark is worried that explicitly endorsing semantic versioning in PEP 426 will encourage package developers to gleefully break backwards compatibility whenever they want to, so the situation quickly degenerates into a mess of incompatible version requirements as you move higher up the stack. If such a situation occurs, then it is a potential problem for large applications like OpenStack (with a deep-and-broad dependency stack) and especially for full Linux distributions that are trying to jam large fractions of PyPI (plus their own distro-specific code) into a single coherent dependency tree.

That's actually the opposite of the intent, though - as I see it, while most Python developers are doing the right thing and offering reasonable deprecation periods, there are also some cases where backwards compatibility is broken without suitable advance notice, or people aren't aware of the implications of depending on 0.x releases, or, as recently happened with the requests 0.9 -> 1.0 transition, developers bump their minimum required version of a dependency in an incompatible way. I don't expect a sudden stampede of "Hey, there's an official way to indicate I'm breaking backwards compatibility, I'm going to do it when I wouldn't have before!", no matter what any PEP says.

However, I also see dependency management as primarily an integrator's problem, and something to be solved primarily external to the libraries themselves, through virtual environments, *.pth files and appropriate installation layouts. The only responsibility I see as lying with the upstream library and application developers is to accurately declare their dependencies (and to make them as broad as is reasonable).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Mar 5, 2013 at 10:55 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
On Tuesday, March 5, 2013 at 7:50 AM, Mark McLoughlin wrote:
I still don't really see how this is related to PEP426 unless PEP426 has gotten a lot larger since I last looked at it. Where in particular a distribution gets installed is left up to the installers to sort out. And making sure that the installed versions exist in sys.path is similarly out of scope for PEP426.
Sorry, maybe I'm being obtuse.
I can see people reading PEP426 and thinking "oh, awesome! Python now has a mechanism for handling incompatible API changes! Now I can get rid of that crufty backwards compat code and bump my major number!".
My point is that it's (potentially) damaging to send that message to library maintainers before Python has the infrastructure for sanely dealing with parallel installs.
Gotcha, you think that codifying how to version with regard to breaking API compatibility will lead to more people breaking backwards compatibility.
That's a fair concern, and there's not much that can be done inside of PEP426 to alleviate it. However I will say that PEP426 doesn't really contain much in the way of new ideas, but rather codifies a lot of existing practices within the Python community so that tools can get simpler and more accurate without having to resort to heuristics and guessing.
You could also argue that this would _help_ with backwards compatibility because there is now a suggested way of declaring when 2 releases are no longer compatible by incrementing the major version number.
Yeah, I think if we can come up with a clear plan whereby distros can create a suitable installation layout such that parallel versions can be installed and imported, even for Python libraries that have no idea parallel installation is possible, it should alleviate a lot of the concerns.

My main point is that most Python software assumes there will only be one version of a given library on sys.path, so relying on the libraries themselves to specify *at runtime* which version they want isn't going to be workable. However, the install time metadata is a much better candidate for doing something useful (and is also a bit more amenable to being adjusted by the distro maintainers when it is overly restrictive).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
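A rough sketch of that install-time idea: the installed metadata (not the library's runtime code) declares which stream an app needs, and tooling picks the newest installed version that satisfies it. The installed-version table and paths below are hypothetical, and the 'packaging' library is again a modern stand-in:

    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    # Hypothetical record of which streams the distro has installed and where.
    installed = {
        Version("1.3"): "/usr/lib/python/somedep-1.3",
        Version("1.6"): "/usr/lib/python/somedep-1.6",
        Version("2.1"): "/usr/lib/python/somedep-2.1",
    }

    def resolve(requirement):
        """Pick the newest installed version satisfying a declared requirement."""
        spec = SpecifierSet(requirement)
        matches = sorted(v for v in installed if v in spec)
        if not matches:
            raise LookupError("no installed version satisfies %s" % requirement)
        return installed[matches[-1]]

    print(resolve(">=1.3,<2.0"))  # /usr/lib/python/somedep-1.6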
On Tue, Mar 5, 2013 at 7:50 AM, Mark McLoughlin <markmc@redhat.com> wrote:
On Tue, 2013-03-05 at 07:34 -0500, Donald Stufft wrote:
On Tuesday, March 5, 2013 at 7:27 AM, Mark McLoughlin wrote:
On Tue, 2013-03-05 at 17:56 +1000, Nick Coghlan wrote:
...
I will note that making this kind of idea more feasible is one of the reasons I am making "compatible release" the *default* in PEP 426 version specifiers, but it still needs people to actually figure out the details and write the code.
I still think that going down this road without the parallel installs issue solved is a dangerous move for Python. Leaving aside pain for distros for a moment, there was a perception (perhaps misguided) that Python was a more stable platform than e.g. Ruby or Java.
If Python library maintainers see PEP426 as a license to make incompatible changes more often so long as they bump their major number, then that perception will change.
I still don't really see how this is related to PEP426, unless PEP426 has gotten a lot larger since I last looked at it. Where in particular a distribution gets installed is left up to the installers to sort out. And making sure that the installed versions exist in sys.path is similarly out of scope for PEP426.
Sorry, maybe I'm being obtuse.
I can see people reading PEP426 and thinking "oh, awesome! Python now has a mechanism for handling incompatible API changes! Now I can get rid of that crufty backwards compat code and bump my major number!".
It will have exactly the opposite effect. Semver is awesome because it encourages the developer to first define an API at all, then think about whether each new release breaks the API, and last encode that information in the version number instead of incrementing each component by whatever random number feels right. All that extra thinking will lead to better APIs. You will be amazed at how much backwards compatible cruft some developers will keep around rather than endure the psychological anguish of bumping the major version number.
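For illustration, the contract Daniel describes reduces to a check like the following sketch (under the usual semver reading, where 0.x releases promise nothing; the function name is made up):

    def is_compatible(installed, candidate):
        """True if moving from `installed` to `candidate` should be API-safe."""
        inst_major = int(installed.split(".")[0])
        cand_major = int(candidate.split(".")[0])
        if inst_major == 0:
            return installed == candidate  # 0.x: any change may break the API
        return cand_major == inst_major    # same major => compatible by contract

    assert is_compatible("1.3.0", "1.9.2")       # minor/patch bumps are safe
    assert not is_compatible("1.9.2", "2.0.0")   # major bump signals a break
    assert not is_compatible("0.9.0", "0.10.0")  # 0.x gives no guarantees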
On Tue, Mar 5, 2013 at 10:27 PM, Mark McLoughlin <markmc@redhat.com> wrote:
Ok, there's a tonne of details there about pip, virtualenv and .pth files that are going over my head right now, but the general idea I'm taking away is:
- the system has multiple versions of somedep installed under /usr somewhere
- the latest version (2.1) is what you get if you naively just do 'import somedep'
- most applications should instead somehow explicitly say they need ~>1.3, ~>1.6 or ~>2.0 or whatever
- distros would have multiple copies of the same library, but only one copy for each incompatible stream rather than one copy for each application
That's definitely workable.
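As a sketch of the layout that list implies (the paths and the helper below are hypothetical; in practice a generated .pth file or launcher stub would do this rather than application code):

    #   /usr/lib/python/versioned-packages/somedep-1.x/somedep/...
    #   /usr/lib/python/versioned-packages/somedep-2.x/somedep/...
    #   /usr/lib/python/site-packages/somedep -> somedep-2.x (default stream,
    #       so a naive 'import somedep' still gets the latest)
    import sys

    def require_stream(name, stream):
        """Prepend the requested incompatible stream to sys.path (sketch only)."""
        sys.path.insert(0, "/usr/lib/python/versioned-packages/%s-%s" % (name, stream))

    require_stream("somedep", "1.x")
    # import somedep  # would now resolve to the 1.x stream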
That's the general idea. There's a fair bit of work needed to get there though, and the on-disk layouts for the distros would look fairly different from the current approach of just dumping everything in Python's site-packages directory. It also won't solve *all* compatibility problems, so the distros may still be on the hook to carry some compatibility patches in order to get a common dependency that works for everyone. (To be honest, I expect software collections will end up needing a similar capability, for all the reasons you gave earlier in response to the suggestion of just using virtualenv with the existing simple bundling approach)
If any distros want that kind of thing to become a reality, though, they're going to have to step up and contribute it. As noted above, for the current tool development teams, the focus is on distributing, maintaining and deploying cross-platform applications, not making it easy to do security updates on a Linux distro. I believe it's possible to satisfy both parties, but it's going to be up to the distros to offer a viable plan for meeting their needs without disrupting existing upstream practices.
Point well taken.
However, I am surprised the pendulum has swung such that Python platform maintainers only worry about app maintainers and not distro maintainers.
If you saw my rant on the Fedora python-devel list, you may have guessed that a lot of the expressed frustration with the distro approach and the lack of willingness to understand why bundling is such an attractive option for cross-platform development comes from me personally, rather than the Python community in general. However, there's also a general dearth of distro people on the upstream packaging lists to balance the web app developers - I'm one of the ones that is *most sympathetic* to the distro point of view, and that should worry you a bit.
And also, Python embracing incompatible updates and bundling is either a new upstream practice, or just one that isn't well understood on the distro side.
The popularity of bundling is primarily in devops and integrated application stacks. You can't deploy on Windows without bundling, and when you have to manage your own security updates for other platforms anyway, the distros refusing to accommodate that model is just frustrating.

The distro packagers say "we don't trust the developers of all of the apps we ship to provide timely security fixes" and the app developers say "we don't trust the packagers of all of the distros we support to provide timely security fixes" and everybody gets annoyed with each other because their concerns and priorities are so wildly divergent.

And we're definitely *not* embracing incompatible updates, we're acknowledging their inevitable existence in volunteer driven projects, and defining a way to at least communicate them clearly when they happen.
I will note that making this kind of idea more feasible is one of the reasons I am making "compatible release" the *default* in PEP 426 version specifiers, but it still needs people to actually figure out the details and write the code.
I still think that going down this road without the parallel installs issue solved is a dangerous move for Python. Leaving aside pain for distros for a moment, there was a perception (perhaps misguided) that Python was a more stable platform than e.g. Ruby or Java.
If Python library maintainers see PEP426 as a license to make incompatible changes more often so long as they bump their major number, then that perception will change.
This isn't a concern I had considered, but I'll review the wording in the PEP with that in mind (there's a bunch of other stuff I need to fix in the version scheme description anyway).
By bundling, I mean that an app sees itself as in control of the versions of its dependencies. The app developer fundamentally thinks she is delivering a specific stack of dependencies with her application code on top, rather than shipping just the app and running it on a stable platform.
I'm actually trying to push *against* that by making the "compatible release" specifier the default in PEP 426 rather than requiring a separate operator (if you look at PEP 345, the predecessor that defines metadata 1.2, it pinned dependencies by default). Actual pinning (with "==") should be limited to devops type situations, where an application really is explicitly controlling its entire stack, but that's not the kind of metadata anyone should be publishing on PyPI. I can see how the perception of the current PEP might be different without that background of knowing precisely what the previous version said, though.
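As a sketch of that difference, expressed with the operators that later landed in PEP 440 (the 'packaging' library stands in for what PEP 426-era tooling would do):

    from packaging.specifiers import SpecifierSet

    compatible = SpecifierSet("~=1.3")  # compatible release: >=1.3, ==1.*
    pinned = SpecifierSet("==1.3")      # PEP 345-style default: exactly 1.3

    for v in ("1.3", "1.9", "2.0"):
        print(v, v in compatible, v in pinned)
    # 1.3 True True
    # 1.9 True False
    # 2.0 False False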
Anyway, sounds like we have some ideas for parallel installs we can investigate.
Yes, I'm definitely not opposed to the idea of parallel installs - I'm just opposed to the idea of parallel install systems that rely on changes to PyPI packages in order for them to work properly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Mar 5, 2013 at 11:28 PM, Daniel Holth <dholth@gmail.com> wrote:
You will be amazed at how much backwards compatible cruft some developers will keep around rather than endure the psychological anguish of bumping the major version number.
We also have Python 3 to hit them over the head with. "Look how much pain the core devs put everyone through with the Python 3 transition! Do you want that for your users?! Do you?!" :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (9)
- Barry Warsaw
- Daniel Holth
- Donald Stufft
- Glyph
- Marcus Smith
- Mark McLoughlin
- Ned Deily
- Nick Coghlan
- Reinout van Rees