Maintaining a curated set of Python packages

Hi, I would like to propose that, as a community, we jointly maintain a curated set of Python packages that are known to work together. These packages would receive security updates for some time and every couple of months a new major release of the curated set comes available. The idea of this is inspired by Haskell LTS, so maybe we should call this PyPI LTS? So why a PyPI LTS? PyPI makes available all versions of packages that were uploaded, and by default installers like pip will try to use the latest available versions of packages, unless told otherwise. With a requirements.txt file (or a future pipfile.lock) and setup.py we can pin as much as we like our requirements of respectively the environment and package requirements, thereby making a more reproducible environment possible and also fixing the API for developers. Pinning requirements is often a manual job, although one could use pip freeze or other tools. A common problem is when two packages in a certain environment require different versions of a package. Having a curated set of packages, developers could be encouraged to test against the latest stable and nightly of the curated package set, thereby increasing compatibility between different packages, something I think we all want. Having a compatible set of packages is not only interesting for developers, but also for downstream distributions. All distributions try to find a set of packages that are working together and release them. This is a lot of work, and I think it would be in everyone's benefit if we try to solve this issue together. A possible solution Downstream, that is developers and distributions, will need a set of packages that are known to work together. At minimum this would consist of, per package, the name of the package and its version, but for reproducibility I would propose adding the filename and hash as well. Because there isn't any reliable method to extract the requirements of a package, I propose also including `setup_requires`, install_requires`, and `tests_require` explicitly. That way, distributions can automatically build recipes for the packages (although non-Python dependencies would still have to be resolved by the distribution). The package set would be released as lts-YYYY-MM-REVISION, and developers can choose to track a specific revision, but would typically be asked to track only lts-YYYY-MM which would resolve to the latest REVISION. Because dependencies vary per Python language version, interpreter, and operating system, we would have to have these sets for each combination and therefore I propose having a source which evaluates to say a TOML/JSON file per version/interpreter/OS. How this source file should be written I don't know; while I think the Nix expression language is an excellent choice for this, it is not possible for everyone to use and therefore likely not an option. Open questions There are still plenty of open questions. - Who decides when a package is updated that would break dependents? This is an issue all distributions face, so maybe we should involve them. - How would this be integrated with pip / virtualenv / pipfile.lock / requirements.txt / setup.py? See e.g. https://github.com/pypa/pipfile/issues/10#issuecomment-262229620 References to Haskell LTS Here are several links to some interesting documents on how Haskell LTS works. 
- A blog post describing what Haskell LTS is: https://www.fpcomplete.com/blog/2014/12/backporting-bug-fixes
- Rules regarding uploading and breaking packages: https://github.com/fpco/stackage/blob/master/MAINTAINERS.md#adding-a-package
- The actual LTS files: https://github.com/fpco/lts-haskell

What do you think of this proposal? Would you be interested in this as developer, or packager?

Freddy
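For concreteness, here is a minimal sketch of what one entry in such a per-platform curated-set file might contain, plus a toy resolver for the lts-YYYY-MM naming scheme. The field names, the example package, and the resolver are illustrative assumptions only, not a settled format:

    # One hypothetical entry of a curated set for a given
    # Python version / interpreter / operating system combination.
    ENTRY = {
        "name": "examplepkg",
        "version": "1.0.2",
        "filename": "examplepkg-1.0.2.tar.gz",
        "sha256": "<hash of the sdist goes here>",
        "setup_requires": [],
        "install_requires": ["six"],
        "tests_require": ["pytest"],
    }

    # Toy resolver: "lts-2016-12" resolves to the newest available revision,
    # while "lts-2016-12-3" pins an exact revision.
    def resolve(label, available):
        """available is a list of labels such as ["lts-2016-12-1", "lts-2016-12-2"]."""
        if label in available:
            return label
        revisions = [a for a in available if a.startswith(label + "-")]
        if not revisions:
            raise LookupError("no such package set: " + label)
        return max(revisions, key=lambda a: int(a.rsplit("-", 1)[1]))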

Isn't this issue already solved by (and the raison d'être of) the multiple third-party Python redistributors, like the various OS package maintainers, Continuum's Anaconda, Enthought Canopy, ActiveState Python, WinPython, etc.?

-Robert

On Thu, Dec 1, 2016 at 4:45 AM, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:
Hi,
I would like to propose that, as a community, we jointly maintain a curated set of Python packages that are known to work together. These packages would receive security updates for some time and every couple of months a new major release of the curated set comes available. The idea of this is inspired by Haskell LTS, so maybe we should call this PyPI LTS?
So why a PyPI LTS?
PyPI makes available all versions of packages that were uploaded, and by default installers like pip will try to use the latest available versions of packages, unless told otherwise. With a requirements.txt file (or a future pipfile.lock) and setup.py we can pin as much as we like our requirements of respectively the environment and package requirements, thereby making a more reproducible environment possible and also fixing the API for developers. Pinning requirements is often a manual job, although one could use pip freeze or other tools.
A common problem is when two packages in a certain environment require different versions of a package. Having a curated set of packages, developers could be encouraged to test against the latest stable and nightly of the curated package set, thereby increasing compatibility between different packages, something I think we all want.
Having a compatible set of packages is not only interesting for developers, but also for downstream distributions. All distributions try to find a set of packages that are working together and release them. This is a lot of work, and I think it would be in everyone's benefit if we try to solve this issue together.
A possible solution
Downstream, that is developers and distributions, will need a set of packages that are known to work together. At minimum this would consist of, per package, the name of the package and its version, but for reproducibility I would propose adding the filename and hash as well. Because there isn't any reliable method to extract the requirements of a package, I propose also including `setup_requires`, `install_requires`, and `tests_require` explicitly. That way, distributions can automatically build recipes for the packages (although non-Python dependencies would still have to be resolved by the distribution).
The package set would be released as lts-YYYY-MM-REVISION, and developers can choose to track a specific revision, but would typically be asked to track only lts-YYYY-MM which would resolve to the latest REVISION.
Because dependencies vary per Python language version, interpreter, and operating system, we would have to have these sets for each combination and therefore I propose having a source which evaluates to say a TOML/JSON file per version/interpreter/OS. How this source file should be written I don't know; while I think the Nix expression language is an excellent choice for this, it is not possible for everyone to use and therefore likely not an option.
Open questions
There are still plenty of open questions.
- Who decides when a package is updated that would break dependents? This is an issue all distributions face, so maybe we should involve them.
- How would this be integrated with pip / virtualenv / pipfile.lock / requirements.txt / setup.py? See e.g. https://github.com/pypa/pipfile/issues/10#issuecomment-262229620
References to Haskell LTS
Here are several links to some interesting documents on how Haskell LTS works.
- A blog post describing what Haskell LTS is: https://www.fpcomplete.com/blog/2014/12/backporting-bug-fixes
- Rules regarding uploading and breaking packages: https://github.com/fpco/stackage/blob/master/MAINTAINERS.md#adding-a-package
- The actual LTS files: https://github.com/fpco/lts-haskell
What do you think of this proposal? Would you be interested in this as developer, or packager?
Freddy
-- -Robert

On 3 December 2016 at 01:33, Robert T. McGibbon <rmcgibbo@gmail.com> wrote:
Isn't this issue already solved by (and the raison d'être of) the multiple third-party Python redistributors, like the various OS package maintainers, Continuum's Anaconda, Enthought Canopy, ActiveState Python, WinPython, etc?
Yep. Once you start talking content curation, you're in a situation where:

- you're providing an ongoing service that will always be needed (the specific packages will change, but the task won't)
- you need to commit to a certain level of responsiveness for security issues
- you'll generally need to span multiple language ecosystems, not just Python
- exactly which packages are interesting will depend on the user audience you're targeting
- the tolerance for API breakage will also vary based on the audience you're targeting
- you'll often want to be able to carry patches that aren't present in the upstream components
- you'll need to decide which target platforms you want to support

If a curation community *isn't* doing any of those things, then it isn't adding a lot of value beyond folks just doing DIY integration in their CI system by pinning their dependencies to particular versions.

As far as the comments about determining dependencies goes, the way pip does it generally works fine, you just need a sandboxed environment to do the execution, and both redistributors and open source information providers like libraries.io are actively working on automating that process (coping with packages as they exist on PyPI today, rather than relying on the upstream community to change anything about the way Python packaging works).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Dec 2, 2016 at 4:33 PM, Robert T. McGibbon <rmcgibbo@gmail.com> wrote:
Isn't this issue already solved by (and the raison d'être of) the multiple third-party Python redistributors, like the various OS package maintainers, Continuum's Anaconda, Enthought Canopy, ActiveState Python, WinPython, etc?
My intention is not to create yet another distribution. Instead, I want to see if there is interest among the different distributions in sharing some of the burden of curating, by bringing up this discussion and seeing what is needed. These distributions have their recipes that allow them to build their packages using their tooling. What I propose is having some of that data community-managed, so the distributions can use it along with their tooling to build the eventual packages.

On Fri, Dec 2, 2016 at 5:23 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 December 2016 at 01:33, Robert T. McGibbon <rmcgibbo@gmail.com> wrote:
Isn't this issue already solved by (and the raison d'être of) the multiple third-party Python redistributors, like the various OS package maintainers, Continuum's Anaconda, Enthought Canopy, ActiveState Python, WinPython, etc?
Yep. Once you start talking content curation, you're in a situation where:
- you're providing an ongoing service that will always be needed (the specific packages will change, but the task won't)
- you need to commit to a certain level of responsiveness for security issues
- you'll generally need to span multiple language ecosystems, not just Python
- exactly which packages are interesting will depend on the user audience you're targeting
- the tolerance for API breakage will also vary based on the audience you're targeting
- you'll often want to be able to carry patches that aren't present in the upstream components
- you'll need to decide which target platforms you want to support
These are interesting issues you bring up here. What I seek is a set that has, per package, a version, source, Python dependencies, and build system. Other dependencies would be left out for now, unless someone has a good idea how to include those. Distributions can take this curated set and extend the data with their distribution-specific things. For example, in Nix we could load such a set, map a function that builds the packages in the set, and override what is passed to the function when necessary (e.g. to add system dependencies, our patches, or how tests are invoked, and so on).

Responsiveness is indeed an interesting issue. If there's enough backing, then I imagine security issues will be resolved as fast as they are nowadays by the distributions backing the initiative.
If a curation community *isn't* doing any of those things, then it isn't adding a lot of value beyond folks just doing DIY integration in their CI system by pinning their dependencies to particular versions.
I would imagine that distributions supporting this idea would have a CI tracking packages built using the curated set plus the distribution-specific changes. When there's an issue, they could fix it on their side, or if it is something that might belong in the curated set, they would report the issue. At some point, when they freeze, they would pin to a certain YYYY.MM, and API breakage should not occur.
As far as the comments about determining dependencies goes, the way pip does it generally works fine, you just need a sandboxed environment to do the execution, and both redistributors and open source information providers like libraries.io are actively working on automating that process (coping with packages as they exist on PyPI today, rather than relying on the upstream community to change anything about the way Python packaging works).
libraries.io is a very interesting initiative. It seems they scan the contents of the archives and extract dependencies based on what is in the requirements files, which is often more than is actually needed for building and running the package. They would benefit from having a declarative style for the dependencies and build system, but that is a different issue (PEP 517, e.g.) from what I bring up here. We also have a tool that runs pip in a sandbox to determine the dependencies and then provides us with an expression. It works, but it shouldn't be necessary.

Freddy
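One hedged aside on this: for packages that publish wheels, the declared dependencies can be read straight from the wheel's METADATA without executing setup.py; it is only sdists that need the sandboxed execution described above. A rough sketch, assuming the wheel file is already on disk:

    import zipfile
    from email.parser import Parser

    def wheel_requires(wheel_path):
        """Return the Requires-Dist entries declared in a wheel's METADATA."""
        with zipfile.ZipFile(wheel_path) as wheel:
            # Every wheel carries a *.dist-info/METADATA file in email-header style.
            metadata_name = next(
                name for name in wheel.namelist()
                if name.endswith(".dist-info/METADATA")
            )
            metadata = Parser().parsestr(wheel.read(metadata_name).decode("utf-8"))
        return metadata.get_all("Requires-Dist") or []

    # e.g. wheel_requires("somepackage-1.0-py2.py3-none-any.whl")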

On 3 December 2016 at 03:34, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:
On Fri, Dec 2, 2016 at 4:33 PM, Robert T. McGibbon <rmcgibbo@gmail.com> wrote:
Isn't this issue already solved by (and the raison d'être of) the multiple third-party Python redistributors, like the various OS package maintainers, Continuum's Anaconda, Enthought Canopy, ActiveState Python, WinPython, etc?
My intention is not creating yet another distribution. Instead, I want to see if there is interest in the different distributions on sharing some of the burden of curating by bringing up this discussion and seeing what is needed. These distributions have their recipes that allow them to build their packages using their tooling. What I propose is having some of that data community managed so the distributions can use that along with their tooling to build the eventual packages.
There's definitely interest in more automated curation such that publishing through PyPI means you get pre-built binary artifacts and compatibility testing for popular platforms automatically, but the hard part of that isn't really the technical aspects, it's developing a robust funding and governance model for the related sustaining engineering activities.

That upstream component-level "yes it builds" and "yes it passes its self-tests" data is then useful to redistributors, since it would make it straightforward to filter out releases that don't even build or pass their own tests even before they make it into a downstream review pipeline.
These are interesting issues you bring up here. What I seek is having a set that has per package a version, source, Python dependencies and build system. Other dependencies would be for now left out, unless someone has a good idea how to include those. Distributions can take this curated set and extend the data with their distribution specific things. For example, in Nix we could load such a set, map a function that builds the packages in the set, and override what is passed to the function when necessary (e.g. to add system dependencies, our patches, or how tests are invoked, and so on).
Something that could be useful on that front is to mine the stdlib documentation for "seealso" references to third-party libraries and collect them into an automation-friendly reference API (a rough sketch of the scanning step follows below). The benefit of that approach is that it:

- would be immediately useful in its own right as a "stdlib++" definition
- solves the scope problem (the problem tackled has to be common enough to have a default solution in the standard library, but complex enough that there are recommended alternatives)
- solves the governance problem (the approval process for new entries is to get them referenced from the relevant stdlib module documentation)
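A very rough sketch of that scanning step, assuming a local CPython checkout (Doc/library/ is where the stdlib reStructuredText sources live; the output shape here is purely illustrative):

    import re
    from pathlib import Path

    # Matches a ".. seealso::" directive and its indented body.
    SEEALSO = re.compile(r"^\.\. seealso::\n((?:[ \t]+.*\n|\n)+)", re.MULTILINE)

    def collect_seealso(doc_library_dir):
        """Map doc file name (roughly the module name) -> its seealso block texts."""
        results = {}
        for rst in Path(doc_library_dir).glob("*.rst"):
            text = rst.read_text(encoding="utf-8")
            blocks = [match.group(1) for match in SEEALSO.finditer(text)]
            if blocks:
                results[rst.stem] = blocks
        return results

    # e.g. collect_seealso("cpython/Doc/library") -> {"urllib.request": [...], ...}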
Responsiveness is indeed an interesting issue. If there's enough backing, then I imagine security issues will be resolved as fast as they are nowadays by the distributions backing the initiative.
Not necessarily, as many of those responsiveness guarantees rely on the ability of redistributors to carry downstream patches, even before there's a corresponding upstream security release. This is especially so for upstream projects that follow an as-needed release model, without much (if any) automation of their publication process.
If a curation community *isn't* doing any of those things, then it isn't adding a lot of value beyond folks just doing DIY integration in their CI system by pinning their dependencies to particular versions.
I would imagine that distributions that would support this idea would have a CI tracking packages built using the curated set and the distribution-specific changes. When there's an issue they could fix it at their side, or if it is something that might belong in the curated set, they would report the issue. At some point, when they would freeze, they would pin to a certain YYYY.MM and API breakage should not occur.
Yeah, this is effectively what happens already, it's just not particularly visible outside the individual redistributor pipelines.
libraries.io is a very interesting initiative. It seems they scan the contents of the archives and extract dependencies based on what is in the requirements files, which is often more than is actually needed for building and running the package. They would benefit from having a declarative style for the dependencies and build system, but that is another issue (PEP 517 e.g.) than what I bring up here. We also have a tool that runs pip in a sandbox to determine the dependencies, and then provide us with an expression. It works, but it shouldn't be necessary.
Alas, with 94k+ setup.py based packages already in the wild, arbitrary code execution for dependency metadata generation is going to be with us for a while. That said, centralised services like libraries.io should lead to more folks being able to just use their already collected dependency data (even if it isn't the minimal dependency set) and avoid having to generate it themselves. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Dec 01, 2016, at 10:45 AM, Freddy Rietdijk wrote:
Having a compatible set of packages is not only interesting for developers, but also for downstream distributions. All distributions try to find a set of packages that are working together and release them. This is a lot of work, and I think it would be in everyone's benefit if we try to solve this issue together.
It's an interesting but difficult problem at the level of PyPI. It's difficult because the definition of compatibility is highly dependent on the consumer's environment. For example, C extension compatibility will depend on the version of libraries available on the platform versions you care about. There are also dependents on Python libraries that you can't capture only in PyPI, e.g. some distro-only package may depend on a particular Python package API. PyPI can't test any of this. And some distros (e.g. Ubuntu) may have multiple dimensions of consumability, e.g. the classic apt packages vs. snaps, primary devel archive vs. stable distro versions vs. backports, etc.

Ubuntu has an elaborate automated system for testing some dimension of compatibility issues between packages, not just Python packages. Debian has the same system but isn't gated on the results. Individual distro packages can include a set of tests that are run against the built version of the package (as opposed to tests run at package build time). When a new version of that package is uploaded it ends up in a "proposed" pocket that is generally not installed by users. The proposed new version has its tests run, and any package that depends on that package also has its tests run. Only when all those tests pass is the package automatically promoted to the release pocket, and thus installable by the general user population.

This does a great job within the context of the distro of raising the quality of the archive because obvious incompatibilities (i.e. those for which tests exist) block such promotion. (FWIW, the difference is that Debian doesn't block promotion of packages failing their automated tests, so 1) it provides less value to Debian; 2) we end up inheriting and seeing these problems in Ubuntu and so have to expend downstream effort to fix the failures.)

All of this happens automatically within the context of the distro, on multiple architectures, so it provides a lot of value, but I'm not sure how useful it would be higher up the food chain, since those contexts will be different enough to cause both false positives and false negatives. And it does often take quite a bit of focused engineering effort to monitor packages which don't promote (something we want to automate), to actually fix the problems wherever is most appropriate (and as far upstream as possible), and to create meaningful tests of compatibility in the first place (except for default tests such as installability).

Still, there may be value in inter-Python package compatibility tests, but it'll take serious engineering effort (i.e. $ and time), ongoing maintenance, ongoing effort to fix problems, and tooling to gate installability of failing packages (with overrides for downstreams which don't care or already expend such effort).

Cheers,
-Barry

Putting the conclusion first, I do see value in better publicising "Recommended libraries" based on some automated criteria like:

- recommended in the standard library documentation
- available via 1 or more cross-platform commercial Python redistributors
- available via 1 or more Linux distro vendors
- available via 1 or more web service development platforms

That would be a potentially valuable service for folks new to the world of open source that are feeling somewhat overwhelmed by the sheer number of alternatives now available to them. However, I also think that would better fit in with the aims of an open source component tracking community like libraries.io than it does a publisher-centric community like distutils-sig.

The further comments below are just a bit more background on why I feel the integration testing aspect of the suggestion isn't likely to be particularly beneficial :)

On 9 December 2016 at 01:10, Barry Warsaw <barry@python.org> wrote:
Still, there may be value in inter-Python package compatibility tests, but it'll take serious engineering effort (i.e. $ and time), ongoing maintenance, ongoing effort to fix problems, and tooling to gate installability of failing packages (with overrides for downstreams which don't care or already expend such effort).
I think this is really the main issue, as both desktop and server environments are moving towards the integrated platform + isolated applications approach popularised by mobile devices.

That means we end up with two very different variants of automated integration testing:

- the application focused kind offered by the likes of requires.io and pyup.io (i.e. monitor for dependency updates, submit PRs to trigger app level CI)
- the platform focused kind employed by distro vendors (testing all the platform components work together, including the app isolation features)

The first kind makes sense if you're building something that runs *on* platforms (Docker containers, Snappy or FlatPak apps, web services, mobile apps, etc).

The second kind inevitably ends up intertwined with the component review and release engineering systems of the particular platform, so it becomes really hard to collaborate cross-platform outside the context of specific projects like OpenStack that provide clear definitions for "What components do we collectively depend on that we need to test together?" and "What does 'working' mean in the context of this project?".

Accordingly, for an initiative like this to be successful, it would need to put some thought up front into the questions of:

1. Who are the intended beneficiaries of the proposal?
2. What problem does it address that will prompt them to contribute time and/or money to solving it?
3. What do we expect people to be able to *stop doing* if the project proves successful?

For platform providers, a generic "stdlib++" project wouldn't really reduce the amount of integration testing we'd need to do ourselves (we already don't test arbitrary combinations of dependencies, just the ones we provide at any given point in time).

For application and service developers, the approach of pinning dependencies to specific versions and treating updates like any other source code change already works well in most cases.

That leaves library and framework developers, who currently tend to adopt the policy of "for each version of Python that we support, we test against the latest versions of our dependencies that were available at the time we ran the test", leaving testing against older versions to platform providers. If there's a key related framework that also provides LTS versions (e.g. Django), then some folks may add that to their test matrix as well.

In that context, "Only breaks backwards compatibility for compelling reasons" becomes a useful long term survival trait for libraries and frameworks, as gratuitous breakages are likely to lead to people migrating away from particularly unreliable dependencies.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thursday, December 8, 2016, Nick Coghlan <ncoghlan@gmail.com> wrote:
Putting the conclusion first, I do see value in better publicising "Recommended libraries" based on some automated criteria like:
- recommended in the standard library documentation
- available via 1 or more cross-platform commercial Python redistributors
- available via 1 or more Linux distro vendors
- available via 1 or more web service development platforms
So these would be attributes tracked by a project maintainer and verified by the known-good-set maintainer? Or? (Again, here I reach for JSONLD. "count n" is only so useful; *which* {[re]distros, platforms, heartfelt testimonials from incredible experts} URLs )

- test coverage
- seclist contact info AND procedures
- more than one super admin maintainer
- what other criteria should/could/can we use to vet open source libraries?
That would be a potentially valuable service for folks new to the world of open source that are feeling somewhat overwhelmed by the sheer number of alternatives now available to them.
However, I also think that would better fit in with the aims of an open source component tracking community like libraries.io than it does a publisher-centric community like distutils-sig.
IDK if libraries are really in scope for stackshare. The feature upcoming/down voting is pretty cool. https://stackshare.io/python
The further comments below are just a bit more background on why I feel the integration testing aspect of the suggestion isn't likely to be particularly beneficial :)
A catch-all for testing bits from application-specific integration test suites could be useful (and would likely require at least docker-compose, dox, kompose for working with actual data stores)
On 9 December 2016 at 01:10, Barry Warsaw <barry@python.org> wrote:

Still, there may be value in inter-Python package compatibility tests, but it'll take serious engineering effort (i.e. $ and time), ongoing maintenance, ongoing effort to fix problems, and tooling to gate installability of failing packages (with overrides for downstreams which don't care or already expend such effort).
I think this is really the main issue, as both desktop and server environments are moving towards the integrated platform + isolated applications approach popularised by mobile devices.
That means we end up with two very different variants of automated integration testing:
- the application focused kind offered by the likes of requires.io and pyup.io (i.e. monitor for dependency updates, submit PRs to trigger app level CI) - the platform focused kind employed by distro vendors (testing all the platform components work together, including the app isolation features)
The first kind makes sense if you're building something that runs *on* platforms (Docker containers, Snappy or FlatPak apps, web services, mobile apps, etc).
The second kind inevitably ends up intertwined with the component review and release engineering systems of the particular platform, so it becomes really hard to collaborate cross-platform outside the context of specific projects like OpenStack that provide clear definitions for "What components do we collectively depend on that we need to test together?" and "What does 'working' mean in the context of this project?".
Accordingly, for an initiative like this to be successful, it would need to put some thought up front into the questions of:
1. Who are the intended beneficiaries of the proposal? 2. What problem does it address that will prompt them to contribute time and/or money to solving it? 3. What do we expect people to be able to *stop doing* if the project proves successful?
For platform providers, a generic "stdlib++" project wouldn't really reduce the amount of integration testing we'd need to do ourselves (we already don't test arbitrary combinations of dependencies, just the ones we provide at any given point in time).
For application and service developers, the approach of pinning dependencies to specific versions and treating updates like any other source code change already works well in most cases.
That leaves library and framework developers, who currently tend to adopt the policy of "for each version of Python that we support, we test against the latest versions of our dependencies that were available at the time we ran the test", leaving testing against older versions to platform providers. If there's a key related framework that also provides LTS versions (e.g. Django), then some folks may add that to their test matrix as well.
In that context, "Only breaks backwards compatibility for compelling reasons" becomes a useful long term survival trait for libraries and frameworks, as gratuitous breakages are likely to lead to people migrating away from particularly unreliable dependencies.
Sometimes, when there are no active maintainers for a feature, it makes sense to deprecate and/or split that functionality / untested cruft out to a different package.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Here are some standardized (conda) package versions: https://github.com/jupyter/docker-stacks/blob/master/scipy-notebook/Dockerfile

On Saturday, December 10, 2016, Wes Turner <wes.turner@gmail.com> wrote:
Here are some standardized (conda) package versions: https://github.com/jupyter/docker-stacks/blob/master/scipy-notebook/Dockerfile
IDK how they choose packages - what "criteria for inclusion" - for the kaggle/docker-python Dockerfile: https://github.com/Kaggle/docker-python/blob/master/Dockerfile (Ubuntu, Conda, *, TPOT)

As pointed out by others, there are external groups doing "curating". conda-forge is one such project, so I'll comment from that perspective:

It's difficult because the definition of compatibility is highly dependent on the consumer's environment. For example, C extension compatibility will depend on the version of libraries available on the platform versions you care about.
Indeed -- which is why Anaconda and conda-forge are built on conda rather than pip -- it is designed to handle these issues. However, with the manylinux effort, and some efforts to kludge C libs into binary wheels, PyPI may just be able to handle more of these issues -- so curating may have its advantages.

As it happens, I recently proposed a version system for conda-forge (much like Anaconda) where you would select a given conda-forge version, and ALL the packages in that version would be expected to work together. The idea is that you don't want one package dependent on libjpeg6 while another depends on libjpeg7, for instance. But the community has decided that rather than try to version the whole system, we instead rely on robust "pinning" -- i.e. version-specific dependencies -- package A depends on libjpeg7, not just "libjpeg". Then there are tools that go through all the packages and check for incompatible pinnings, and update and build new versions where required (or at least ping the maintainers of a package that an update is needed).
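Not conda-forge's actual tooling, but as a toy illustration of that pinning-consistency check (the data model here is invented for the example):

    def find_pin_conflicts(recipes):
        """recipes maps package name -> {dependency name: pinned version}.
        Returns the dependencies that are pinned to more than one version
        across the recipe set, and by whom."""
        pins = {}
        for package, deps in recipes.items():
            for dep, version in deps.items():
                pins.setdefault(dep, {}).setdefault(version, []).append(package)
        return {dep: versions for dep, versions in pins.items() if len(versions) > 1}

    # e.g. find_pin_conflicts({"pkg-a": {"libjpeg": "7"}, "pkg-b": {"libjpeg": "6"}})
    # -> {"libjpeg": {"7": ["pkg-a"], "6": ["pkg-b"]}}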
Ubuntu has an elaborate automated system for testing some dimension of compatibility issues between packages, not just Python packages. Debian has the same system but isn't gated on the results.
This brings up the larger issue -- PyPI is inherently different than these efforts -- PyPI has always been about each package author maintaining their own package -- Linux distros and conda-forge, and ??? all have a small set of core contributors that do the package maintenance. This is a large effort, and would be insanely hard with the massive amount of stuff on PyPI....

In fact, I think the kinda-sorta curation that comes from individual communities is working remarkably well:

- the scipy community
- the django community
- ...

-CHB

-- Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R
(206) 526-6959 voice
(206) 526-6329 fax
(206) 526-6317 main reception
7600 Sand Point Way NE
Seattle, WA 98115
Chris.Barker@noaa.gov

On Dec 14, 2016, at 9:41 AM, Chris Barker <chris.barker@noaa.gov> wrote:
As pointed out by others, there are external groups doing "curating". conda-forge is one such project, so I'll comment from that perspective:
It's difficult because the definition of compatibility is highly dependent on the consumer's environment. For example, C extension compatibility will depend on the version of libraries available on the platform versions you care about.
Indeed -- which is why Anaconda and conda-forge are built on conda rather than pip -- it is designed to handle these issues.
However with the many linux effort, and some efforts to kludge C libs into binary wheels, pypi may just be able to handle more of these issues -- so curating may have it's advantages.
I think it's unfair to describe these efforts as a "kludge"; many of the tools developed for manylinux1 et al. are actually pretty sophisticated tooling with a mature ecosystem approach to library bundling. Personally I have noticed a _massive_ reduction in the support overhead involved in getting new users spun up in the present Python packaging ecosystem. Due to the availability of cross-platform wheels, it's possible to do a LOT more Python development without a C compiler than it used to be. -glyph

I think it's unfair to describe these efforts as a "kludge";
I was specifically referring to using wheels to deliver C libs-- pip+wheel were not designed for that. But I don't mean to offend, there has been a lot of great work done, and yes, the situation has much improved as a result. There are times when a kludge is called for! -CHB

On 15 December 2016 at 03:41, Chris Barker <chris.barker@noaa.gov> wrote: [Barry wrote]
Ubuntu has an elaborate automated system for testing some dimension of compatibility issues between packages, not just Python packages. Debian has the same system but isn't gated on the results.
This brings up the larger issue -- PyPi is inherently different than these efforts -- PyPi has always been about each package author maintaining their own package -- Linux distros and conda-forge, and ??? all have a small set of core contributions that do the package maintenance.
Fedora at least has just shy of 1900 people in the "packager" group, so I don't know that "small" is the right word in absolute terms :) However, relatively speaking, even a packager group that size is still an order of magnitude smaller than the 30k+ publishers on PyPI (which is in turn an order of magnitude smaller than the 180k+ registered PyPI accounts)
This is a large effort, and wold be insanely hard with the massive amount of stuff on PyPi....
In fact, I think the kinda-sort curation that comes from individual communities is working remarkably well:
the scipy community the django community ...
Exactly. Armin Ronacher and a few others have also started a new umbrella group on GitHub, Pallets, collecting together some of the key infrastructure projects in the Flask ecosystem: https://www.palletsprojects.com/blog/hello/

Dell/EMC's John Mark Walker has a recent article about this "downstream distribution" formation process on opensource.com, where it's an emergent phenomenon arising from the needs of people that are consuming open source components to achieve some particular purpose rather than working on them for their own sake: https://opensource.com/article/16/12/open-source-software-supply-chain

It's a fairly different activity from pure upstream development - where upstream is a matter of "design new kinds and versions of Lego bricks" (e.g. the Linux kernel, gcc, CPython, PyPI projects), downstream integration is more "define new Lego kits using the already available bricks" (e.g. Debian, Fedora, conda-forge), while commercial product and service development is "We already put the Lego kit together for you, so you can just use it" (e.g. Ubuntu, RHEL, Amazon Linux, ActivePython, Enthought Canopy, Wakari.io).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I'm not sure how useful it would be higher up the food chain, since those contexts will be different enough to cause both false positives and false negatives. And it does often take quite a bit of focused engineering effort to monitor
It's interesting to read about how other distributions upgrade their package sets. In Nixpkgs most packages are updated manually. Some frameworks/languages provide their dependencies declaratively, in which case it becomes 'straightforward' to include whole package sets, like in the case of Haskell. Some expressions need to be overridden manually because e.g. they require certain system libraries. It's manual work, but not that much. This is what I would like to see for Python as well.

The Python packages we still update manually, although we have tools to automate most of it. The reason for not using those tools is that a) it means evaluating or building parts of the packages to get the dependencies, and b) too often upstream pins versions of dependencies, which turns out to be entirely unnecessary and would therefore prevent an upgrade. Even so, we have over 1500 Python packages per interpreter version that according to our CI seem to work together. We do only build on 3 architectures (i386, amd64 and darwin/osx). Compatibility with the latter is sometimes an issue because it involves guessing what Apple has changed when releasing a new version.
packages which don't promote (something we want to automate),
In my experience the manual work that typically needs to be done is a) making Python dependencies available, b) unpinning versions when unnecessary and reporting that upstream, and c) making sure the package finds the system libraries. Issues with packages that cannot be upgraded because of the version of a system dependency I haven't yet encountered. In my proposal a) and b) would be fixed by the curated package set.
https://github.com/nvie/pip-tools : - requirements.in -> pip-compile -> requirements.txt (~pipfile.lock)
Yep, there are tools, which get the job done when developing and using a set of packages. Now, you want to deploy your app. You can't use your requirements.txt on a Linux distribution because they have a curated set of packages which is typically different from your set (although maybe they do provide tools to package the versions you need). But you can choose to use a second package manager, pip or conda. System dependencies? That's something conda can somewhat take care of. Problem solved, you would say, except now you have multiple package managers that you need.
Practically, a developer would want a subset of the given known-good-set (and then additional packages), so:
- fork/copy requirements-YYYY-MM-REV--<OSNAMEVER>.txt
- #comment out unused deps
- add '-r addl-requirements.txt'
See the link I shared earlier on how this is already done with Haskell and stack.yaml and how it could be used with `pipfile` https://github.com/pypa/pipfile/issues/10#issuecomment-262229620
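To make the stack.yaml analogy a bit more concrete (everything below is hypothetical and only loosely mirrors stack.yaml's resolver/extra-deps split; it is not an existing pipfile feature):

    # Hypothetical app manifest: track a curated set, then add or override a few things.
    MANIFEST = {
        "lts": "lts-2016-12",            # like stack.yaml's "resolver"
        "extra": {"somepkg": "1.4.2"},   # like stack.yaml's "extra-deps"
    }

    def effective_pins(curated_set, manifest):
        """curated_set maps name -> version for the tracked LTS release;
        manifest["extra"] adds packages or overrides their versions."""
        pins = dict(curated_set)
        pins.update(manifest["extra"])
        return pins

    # e.g. effective_pins({"examplepkg": "1.0.2", "six": "1.10.0"}, MANIFEST)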
Putting the conclusion first, I do see value in better publicising "Recommended libraries" based on some automated criteria like:
Yes, we should recommend third-party libraries in a trusted place like the documentation of CPython. The amount of packages that are available can be overwhelming. Yet, defining a set of packages that are recommended, and perhaps working together, is still far from defining an exact set of packages that are known to work together, something which I proposed here.
As pointed out by others, there are external groups doing "curating". conda-forge is one such project, so I'll comment from that perspective
I haven't used conda in a long time, and conda-forge didn't exist back then. I see package versions are pinned, and sometimes versions of dependencies as well. If I choose to install *all* the packages available via conda-forge, will I get a fixed package set, or will the SAT solver try to find a working set (and possibly fail at it)? I hope it is the former, since if it is the latter then it is not curated in the sense I meant it.

On Dec 15, 2016, at 7:13 AM, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:
Putting the conclusion first, I do see value in better publicising "Recommended libraries" based on some automated criteria like:
Yes, we should recommend third-party libraries in a trusted place like the documentation of CPython. The amount of packages that are available can be overwhelming. Yet, defining a set of packages that are recommended, and perhaps working together, is still far from defining an exact set of packages that are known to work together, something which I proposed here.
We could theoretically bake this into PyPI itself, though I’m not sure if that makes sense. We could also probably bake something like “curated package sets” into PyPI where individual users (or groups of users) can create their own view of PyPI for people to use, while still relying on all of the infrastructure of PyPI. Although I’m not sure that makes any sense either. — Donald Stufft

The "curated package sets" on PyPI idea sounds a bit like Steam's curator lists, which I like to think of as Twitter for game reviews. You can follow a curator to see their comments on particular games, and the most popular curators have their comments appear on the actual listings too. Might be interesting to see how something like that worked for PyPI, though the initial investment is pretty high. (It doesn't solve the coherent bundle problem either, just the discovery of good libraries problem.) Top-posted from my Windows Phone -----Original Message----- From: "Donald Stufft" <donald@stufft.io> Sent: 12/15/2016 4:21 To: "Freddy Rietdijk" <freddyrietdijk@fridh.nl> Cc: "DistUtils mailing list" <Distutils-Sig@python.org>; "Barry Warsaw" <barry@python.org> Subject: Re: [Distutils] Maintaining a curated set of Python packages On Dec 15, 2016, at 7:13 AM, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:

On Thursday, December 8, 2016, Wes Turner <wes.turner@gmail.com> wrote:
On Thursday, December 1, 2016, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:
Hi,
I would like to propose that, as a community, we jointly maintain a curated set of Python packages that are known to work together. These packages would receive security updates for some time and every couple of months a new major release of the curated set comes available. The idea of this is inspired by Haskell LTS, so maybe we should call this PyPI LTS?
So why a PyPI LTS?
PyPI makes available all versions of packages that were uploaded, and by default installers like pip will try to use the latest available versions of packages, unless told otherwise. With a requirements.txt file (or a future pipfile.lock) and setup.py we can pin as much as we like our requirements of respectively the environment and package requirements, thereby making a more reproducible environment possible and also fixing the API for developers. Pinning requirements is often a manual job, although one could use pip freeze or other tools.
https://github.com/nvie/pip-tools :
- requirements.in -> pip-compile -> requirements.txt (~pipfile.lock)
- I can't remember whether pip-compile includes the checksum in the compiled requirements.txt
https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode
- Current recommendation: sha256
- (These are obviously different for platform-specific wheels and bdists)
https://github.com/rbanffy/pip-chill
- Unlike pip freeze, pip-chill includes only top-level deps
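For concreteness, a minimal sketch of that pip-tools flow with hash checking (package names and versions here are purely illustrative; if I recall correctly, recent pip-compile releases grew a --generate-hashes flag that writes the sha256 checksums into the compiled file):

    # requirements.in - top-level deps only
    requests
    flask>=0.11

    $ pip-compile --generate-hashes requirements.in
    # -> requirements.txt, with every transitive dependency pinned, e.g.
    #    requests==2.12.4 --hash=sha256:<checksum>

    $ pip install --require-hashes -r requirements.txt

In hash-checking mode pip refuses to install anything whose archive doesn't match the recorded hash, which is roughly the reproducibility property the proposal is after.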
A common problem is when two packages in a certain environment require different versions of a package. Having a curated set of packages, developers could be encouraged to test against the latest stable and nightly of the curated package set, thereby increasing compatibility between different packages, something I think we all want.
Having a compatible set of packages is not only interesting for developers, but also for downstream distributions. All distributions try to find a set of packages that are working together and release them. This is a lot of work, and I think it would be in everyone's benefit if we try to solve this issue together.
I think conda has already been mentioned.
- environment.yml : http://conda.pydata.org/docs/using/envs.html#use-environment-from-file
- "A community led collection of recipes, build infrastructure and distributions for the conda package manager." - "AppVeyor, CircleCI and TravisCI"
A possible solution
Downstream, that is developers and distributions, will need a set of packages that are known to work together. At minimum this would consist of, per package, the name of the package and its version, but for reproducibility I would propose adding the filename and hash as well. Because there isn't any reliable method to extract the requirements of a package, I propose also including `setup_requires`, install_requires`, and `tests_require` explicitly. That way, distributions can automatically build recipes for the packages (although non-Python dependencies would still have to be resolved by the distribution).
The package set would be released as lts-YYYY-MM-REVISION, and developers can choose to track a specific revision, but would typically be asked to track only lts-YYYY-MM which would resolve to the latest REVISION.
Because dependencies vary per Python language version, interpreter, and operating system, we would have to have these sets for each combination and therefore I propose having a source which evaluates to say a TOML/JSON file per version/interpreter/OS. How this source file should be written I don't know; while I think the Nix expression language is an excellent choice for this, it is not possible for everyone to use and therefore likely not an option.
YAML: environment.yml, meta.yaml
- http://conda.pydata.org/docs/building/meta-yaml.html#requirements-section
- http://conda.pydata.org/docs/building/meta-yaml.html#test-section
Could/would there be a package with an integration test suite in tests/?
Practically, a developer would want a subset of the given known-good-set (and then additional packages), so:
- fork/copy requirements-YYYY-MM-REV--<OSNAMEVER>.txt
- #comment out unused deps
- add '-r addl-requirements.txt'
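To make that fork/copy idea concrete, a sketch of what the trimmed copy might look like (the file name follows the naming scheme above and is hypothetical, as are the pins):

    # requirements-2016-12-1--linux-cp35.txt, copied from the curated set
    requests==2.12.4
    # numpy==1.11.2    <- commented out, not needed by this project
    -r addl-requirements.txt

    $ pip install -r requirements-2016-12-1--linux-cp35.txt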
Open questions
There are still plenty of open questions.
- Who decides when a package is updated that would break dependents? This is an issue all distributions face, so maybe we should involve them.
IDK if e.g. https://requires.io can post to a mailing list?
- "Stop wasting your time by manually keeping track of changelogs. Requires.io keeps your python projects secure by monitoring their dependencies." - Source: https://github.com/requires - https://wiki.jenkins-ci.org/display/JENKINS/ShiningPanda+Plugin
- How would this be integrated with pip / virtualenv / pipfile.lock / requirements.txt / setup.py? See e.g. https://github.com/pypa/pipfile/issues/10#issuecomment-262229620
- https://tox.readthedocs.io/en/latest/
- tox.ini (a minimal sketch follows below)
- http://doc.devpi.net/latest/
- http://doc.devpi.net/latest/quickstart-releaseprocess.html#devpi-test-testing-an-uploaded-package
- Such a known-good-set could be hosted as a package index w/ devpi
http://docs.openstack.org/developer/pbr/
- pbr does away with setup.py and install_requires in favor of just requirements.txt
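As a sketch of the tox angle mentioned above: a project could point tox's deps at a pinned requirements file exported from such a set (the requirements-lts-2016-12.txt name is hypothetical):

    # tox.ini
    [tox]
    envlist = py27, py35

    [testenv]
    deps =
        -rrequirements-lts-2016-12.txt
        pytest
    commands = pytest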
References to Haskell LTS
Here are several links to some interesting documents on how Haskell LTS works.
- A blog post describing what Haskell LTS is: https://www.fpcomplete.com/blog/2014/12/backporting-bug-fixes
- Rules regarding uploading and breaking packages: https://github.com/fpco/stackage/blob/master/MAINTAINERS.md#adding-a-package
- The actual LTS files: https://github.com/fpco/lts-haskell
Haskell ... pandoc ... https://sites.google.com/site/pydatalog/
What do you think of this proposal? Would you be interested in this as developer, or packager?
Given time to work on this, I'd probably spend it on developing a new or existing (?) comprehensive integration test suite for an already-maintained package set:
https://docs.continuum.io/anaconda/pkg-docs
https://www.enthought.com/products/canopy/package-index/
https://github.com/audreyr/cookiecutter-pypackage/blob/master/%7B%7Bcookiecutter.project_slug%7D%7D/tests/test_%7B%7Bcookiecutter.project_slug%7D%7D.py
Freddy

On 2016-12-08 10:05:47 -0600 (-0600), Wes Turner wrote: [...]
http://docs.openstack.org/developer/pbr/
- pbr does away with setup.py and install_requires in favor of just requirements.txt [...]
It doesn't entirely "do away with setup.py" (it still relies on a relatively minimal boilerplate setup.py which loads its setuptools entrypoint), but does allow you to basically abstract away most common configuration into declarative setup.cfg and requirements.txt files (similar to some of the pyproject.toml use cases Donald et al have for PEP 518). -- Jeremy Stanley
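For readers who haven't seen it, the boilerplate Jeremy describes is essentially this (the project metadata below is illustrative; see the pbr docs linked earlier for the full set of setup.cfg options):

    # setup.py
    import setuptools
    setuptools.setup(
        setup_requires=['pbr'],
        pbr=True,
    )

    # setup.cfg
    [metadata]
    name = mypackage
    summary = Example project packaged with pbr

    # requirements.txt (read by pbr into install_requires)
    requests>=2.10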

Jeremy Stanley <fungi@yuggoth.org> writes:
[the ‘pbr’ library] does allow you to basically abstract away most common configuration into declarative setup.cfg and requirements.txt files
Hmm. That description sounds like a mistaken conflation of two things that should be distinct:
* Declaration in Setuptools metadata of what versions of dependencies this distribution is *compatible with*. This purpose is served by Distutils ‘install_requires’ (and ‘tests_require’, etc.). It is characterised by specifying a range of versions for each dependency, allowing dependency resolvers options to choose from:
  foo >=1.2, <3.0
* Declaration in Pip metadata of what *exact version* of each dependency I want to deploy. This purpose is served by Pip’s ‘requirements.txt’ input. It is characterised by pinning a *single* version of each dependency, for a deterministic, repeatable deployment:
  foo == 1.4.7
If we're saying ‘pbr’ encourages the use of a single set of declarations for those quite different purposes, that sounds like an attractive nuisance. For those who haven't read it, see this post from Donald Stufft on why those purposes need to be kept distinct:
  “There’s a lot of misunderstanding between setup.py and requirements.txt and their roles. A lot of people have felt they are duplicated information and have even created tools to handle this ‘duplication’.”
  <URL:https://caremad.io/posts/2013/07/setup-vs-requirement/>
-- Ben Finney
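To put Ben's two declarations side by side (versions borrowed from his example; the project name is a placeholder, and the snippet is only meant to show where each declaration lives):

    # setup.py - setuptools metadata: a compatibility *range*
    from setuptools import setup
    setup(
        name='myproject',
        install_requires=['foo >=1.2, <3.0'],
    )

    # requirements.txt - pip input: an exact, repeatable *pin*
    foo==1.4.7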

On 9 Dec 2016 4:42 PM, "Ben Finney" <ben+python@benfinney.id.au> wrote: Jeremy Stanley <fungi@yuggoth.org> writes:
[the ‘pbr’ library] does allow you to basically abstract away most common configuration into declarative setup.cfg and requirements.txt files
Hmm. That description sounds like a mistaken conflation of two things that should be distinct It's not, though the name of the file it looks for is indeed evocative of concrete version lists. Requirements.txt was on the path to being deprecated in order to reduce confusion, but I'm no longer paid to work on that tool chain, so it's now in my lengthy best effort to-do list. Requirements.txt files are mapped directly into install-requires by pbr, and the replacement is to put the same dependencies into setup.cfg. * ... If we're saying ‘pbr’ encourages the use of a single set of declarations for those quite different purposes, that sounds like an attractive nuisance It doesn't, confusing names aside. We even wrote a tool - generate-constraints to calculate the transitive closure, python version specific as needed, for placing into a constraints / requires file for pip to consume. Rob

On Dec 8, 2016, at 10:41 PM, Ben Finney <ben+python@benfinney.id.au> wrote:
For those who haven't read it, see this post from Donald Stufft for why those purposes need to be kept distinct
Somewhat funnily, pbr is what triggered me to actually write that post :) If I recall, at the time it only supported requirements.txt but it’s since then gotten support for putting it in setup.cfg and I think that is preferred now? — Donald Stufft

On Thursday, December 8, 2016, Ben Finney <ben+python@benfinney.id.au> wrote:
[...]
https://github.com/nvie/pip-tools :
- requirements.in -> pip-compile -> requirements.txt (~pipfile.lock)
participants (13)
- Barry Warsaw
- Ben Finney
- Chris Barker
- Chris Barker - NOAA Federal
- Donald Stufft
- Freddy Rietdijk
- Glyph Lefkowitz
- Jeremy Stanley
- Nick Coghlan
- Robert Collins
- Robert T. McGibbon
- Steve Dower
- Wes Turner