Current Python packaging status (from my point of view)

Hi folks,

Prompted by a few posts I read recently about the current state of the Python packaging ecosystem, I figured it made sense to put together an article summarising my own perspective on the current state of things: http://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.ht...

It's pretty long, so the short version tailored specifically for the distutils-sig audience would be:

* restating the point that pip & conda solve different problems, so while there's some overlap in their core capabilities, neither is a substitute for the other
* we've actually managed to put some pretty hard problems behind us in the last few years, so my thanks to everyone that's played a part in that process
* some of the thorniest problems that still remain really do require proper funding of the core ecosystem infrastructure (most notably PyPI), which means that either commercial redistributors need to step up and handle the problem on behalf of their customers, or else the PSF needs to figure out alternative sources of funding (with "let's do both!" really being my preferred outcome on that front)

Cheers, Nick.

P.S. For the Twitter users amongst you, feel free to pass along my link to the article: https://twitter.com/ncoghlan_dev/status/777087670837088256

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Am 17.09.2016 um 12:29 schrieb Nick Coghlan:
Hi folks,
Prompted by a few posts I read recently about the current state of the Python packaging ecosystem, I figured it made sense to put together an article summarising my own perspective on the current state of things: http://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.ht...
Thank you for this summarizing article. Yes, a lot has been done during the last months.

I liked the part "My core software ecosystem design philosophy" a lot, since it explains that both parties (software consumer and software publisher) want it to be simple and easy.

About conda: if pip and conda overlap at some points, why not implement the shared parts in a reusable library which gets used by both conda and pip?

About funding: looking for more income is one way to solve this. Why not look in the other direction: how to reduce costs?

Heading "Making the presence of a compiler on end user systems optional": here I can just say thank you very much. I guess it was a lot of hard work to make this all simple and easy for the software consumers and publishers. Thank you.

I wrote some lines, but I deleted my thoughts about the topic "Automating wheel creation", since I am afraid it could stir up a bad mood on this list again. That's not my goal.

Regards, Thomas Güttler -- http://www.thomas-guettler.de/

On 1 November 2016 at 17:30, Thomas Güttler <guettliml@thomas-guettler.de> wrote:
Am 17.09.2016 um 12:29 schrieb Nick Coghlan:
Hi folks,
Prompted by a few posts I read recently about the current state of the Python packaging ecosystem, I figured it made sense to put together an article summarising my own perspective on the current state of things: http://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.ht...
Thank you for this summarizing article. Yes, a lot has been done during the last months.
I liked the part "My core software ecosystem design philosophy" a lot, since it explains that both parties (software consumer and software publisher) want it to be simple and easy.
About conda: if pip and conda overlap at some points, why not implement the shared parts in a reusable library which gets used by both conda and pip?
For the parts where they genuinely overlap, conda is already able to just use pip, or else the same libraries that pip uses. For the platform management pieces (SAT solving for conda repositories, converting PyPI packages to conda ones, language independent environment management), what conda does is outside the scope of what pip supports anyway.
About funding: looking for more income is one way to solve this. Why not look in the other direction: how to reduce costs?
Thanks to donated infrastructure, the direct costs to the PSF are incredibly low already. Donald went into some detail on that in https://caremad.io/posts/2016/05/powering-pypi/ and that's mostly still accurate (although his funded role with HPE ended recently)
Heading "Making the presence of a compiler on end user systems optional". Here I just can say: Thank you very much. I guess it was a lot of hard work to make this all simple and easy for the software consumers and publishers. Thank you.
I wrote some lines, but I deleted my thoughts about the topic "Automating wheel creation", since I am afraid it could stir up a bad mood on this list again. That's not my goal.
I currently see 3 main ways that could eventually happen:

- the PSF sorts out its general sustainability concerns to the point where it believes it can credibly maintain such a service on the community's behalf
- the conda-forge folks branch out into offering wheel building as well (so it becomes a matter of "publish your Python projects for the user level conda platform, get platform independent Python wheels as well")
- someone builds such a service independently of the current PyPI infrastructure team, and convinces package publishers to start using it

There's also a 4th variant, which is getting to a point where someone figures out a pushbutton solution for a build pipeline in a public PaaS that offers a decent free tier. This is potentially one of the more promising options, since it means the sustainability risks related to future growth in demand accrue to the PaaS providers, rather than to the PSF. However, it's somewhat gated on the Warehouse migration, since you really want API token support for that kind of automation, which is something the current PyPI code base doesn't have, and isn't going to gain.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 1 November 2016 at 09:50, Nick Coghlan <ncoghlan@gmail.com> wrote:
There's also a 4th variant, which is getting to a point where someone figures out a pushbutton solution for a build pipeline in a public PaaS that offers a decent free tier.
This, of course, is relatively easy for the simple cases (I could set up a project that builds Windows wheels on Appveyor relatively easily - I may well do so when I have some spare time). But the "simple cases" are precisely those where the only difficulty is the (relatively easy to overcome) need for a compiler. In other words, if "pip wheel foo" works when you have a compiler, it's simple to automate the building of wheels.

The significant problem here is the process of setting up build dependencies for more complex projects, like scipy, pywin32, or lxml, or even simpler cases such as pyyaml. Each such project would need custom setup steps, and there's no easy way for a generic wheel building service to infer those steps - the onus would necessarily be on the individual projects to provide a script that manages the build process (and such a script might as well be the setup.py, meaning that we are back to the "pip wheel foo just works" case).

So the issue remains one of *someone* doing project-by-project work to contribute scripts that cleanly handle setting up the build environment. Whether those scripts get contributed to the project or to a build farm project is not so much a technical matter as a matter of whether projects are willing to accept such pull requests.

Paul
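As an illustration of the "simple case" automation described above, here is a minimal sketch that just loops over project names and shells out to pip wheel. The project list and output directory are placeholders, and it assumes a compiler is already present; it is not part of any existing service.

```
# Minimal sketch: build wheels for "simple" projects, i.e. ones where
# "pip wheel <name>" already succeeds once a compiler is installed.
# The project names and output directory below are illustrative only.
import subprocess
import sys
from pathlib import Path

PROJECTS = ["pyyaml", "markupsafe"]  # assumed stand-ins for "simple" cases
WHEEL_DIR = Path("built-wheels")

def build_wheel(project):
    """Run 'pip wheel' for a single project, returning True on success."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "wheel",
         "--wheel-dir", str(WHEEL_DIR), project],
    )
    return result.returncode == 0

if __name__ == "__main__":
    WHEEL_DIR.mkdir(exist_ok=True)
    failed = [name for name in PROJECTS if not build_wheel(name)]
    if failed:
        sys.exit("wheel builds failed for: " + ", ".join(failed))
```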

On 1 November 2016 at 20:22, Paul Moore <p.f.moore@gmail.com> wrote:
On 1 November 2016 at 09:50, Nick Coghlan <ncoghlan@gmail.com> wrote:
There's also a 4th variant, which is getting to a point where someone figures out a pushbutton solution for a build pipeline in a public PaaS that offers a decent free tier.
This, of course, is relatively easy for the simple cases (I could set up a project that builds Windows wheels on Appveyor relatively easily - I may well do so when I have some spare time). But the "simple cases" are precisely those where the only difficulty is the (relatively easy to overcome) need for a compiler. In other words, if "pip wheel foo" works when you have a compiler, it's simple to automate the building of wheels.
It isn't that simple, as what you really want to automate is the *release process*, where you upload an sdist, and the wheels *just happen* for:

- the Python versions you want to support (e.g. 2.7, 3.4, 3.5)
- the platforms you want to support (typically x86_64 for Windows, Mac OS X, manylinux1, and maybe 32 bit Windows as well)

Adding a new Python release or a new platform to the build configuration is currently an activity that requires per-project work when in theory a build service could just add it automatically based on when new releases happen.
The significant problem here is the process of setting up build dependencies for more complex projects, like scipy, pywin32, or lxml, or even simpler cases such as pyyaml. Each such project would need custom setup steps, and there's no easy way for a generic wheel building service to infer those steps - the onus would necessarily be on the individual projects to provide a script that manages the build process (and such a script might as well be the setup,py, meaning that we are back to the "pip wheel foo just works" case).
Making "pip wheel foo" work cross platform and defining the steps needed to cut an sdist for each release are the only pieces that should genuinely require work on a per-project basis. However, at the moment, there are a lot of other steps related to the per-release build-and-publish matrix that are also defined on a per-project basis, and there's no fundamental reason that they need to be - it's purely that providing a public build service and keeping it running smoothly isn't something you want to be asking volunteers to be responsible for.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
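A small sketch of what that per-release matrix could look like as data owned by a hypothetical central build service (all the names, version tags, and platform tags below are illustrative): adding a new Python release would then be a one line change in one place, rather than an edit to every project's own CI configuration.

```
# Sketch only: the (Python version, platform) matrix described above,
# expressed as data a hypothetical central build service could extend
# automatically when new Python releases or platforms appear.
from itertools import product

PYTHON_VERSIONS = ["2.7", "3.4", "3.5"]  # a new release is one added entry here
PLATFORMS = ["win32", "win_amd64", "macosx_10_6_intel", "manylinux1_x86_64"]

def build_jobs(project, sdist_url):
    """Expand the support matrix into one build job per (version, platform) pair."""
    for version, platform in product(PYTHON_VERSIONS, PLATFORMS):
        yield {
            "project": project,
            "sdist": sdist_url,
            "python": version,
            "platform": platform,
        }

if __name__ == "__main__":
    # Placeholder project name and sdist URL, purely for demonstration.
    for job in build_jobs("example", "https://example.invalid/example-1.0.tar.gz"):
        print(job)
```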

On 1 November 2016 at 12:19, Nick Coghlan <ncoghlan@gmail.com> wrote:
It isn't that simple, as what you really want to automate is the *release process*, where you upload an sdist, and the wheels *just happen* for:
- the Python versions you want to support (e.g. 2.7, 3.4, 3.5)
- the platforms you want to support (typically x86_64 for Windows, Mac OS X, manylinux1, and maybe 32 bit Windows as well)
Adding a new Python release or a new platform to the build configuration is currently an activity that requires per-project work when in theory a build service could just add it automatically based on when new releases happen.
Ah. If you're looking for automated publication on PyPI (or alternatively, creation and maintenance of a secondary index for wheels) then yes, that's a much bigger piece of work. Paul

On Tue, Nov 1, 2016 at 5:19 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
It isn't that simple, as what you really want to automate is the *release process*, where you upload an sdist, and the wheels *just happen* for:
- the Python versions you want to support (e.g. 2.7, 3.4, 3.5)
- the platforms you want to support (typically x86_64 for Windows, Mac OS X, manylinux1, and maybe 32 bit Windows as well)
indeed -- do take a look at conda-forge, they've got all that working. and while it ultimately calls "conda build this_thing", it wouldn't be hard to make that a call to build a wheel instead.
Adding a new Python release or a new platform to the build configuration is currently an activity that requires per-project work when in theory a build service could just add it automatically based on when new releases happen.
hmm -- maybe we could leverage GitHub, like conda-forge does -- Warehouse would actually push to a repo on GitHub that would then trigger the CI builds -- though it sure seems cleaner for Warehouse to call the CIs directly.
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 2 November 2016 at 03:05, Chris Barker <chris.barker@noaa.gov> wrote:
Adding a new Python release or a new platform to the build configuration is currently an activity that requires per-project work when in theory a build service could just add it automatically based on when new releases happen.
hmm -- maybe we could leverage GitHub, like conda-forge does -- Warehouse would actually push to a repo on GitHub that would then trigger the CI builds -- though it sure seems cleaner for Warehouse to call the CIs directly.
GitHub's integration works the other way around - it emits notifications when events (like new commits) happen, and folks can register external services to receive those events and then authorize them to act on them (e.g. by publishing a release, or commenting on a pull request).

This is one of the real costs of the lack of funding for PyPI development - we simply don't have those event notification and service authorisation primitives built into the current platform, so the only current way to automate things is to trust services with full access to your PyPI account by providing your password to them (which is a fundamentally bad idea, which is why we don't recommend it).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
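For a sense of the webhook-driven flow described above, here is a minimal sketch of a receiver that reacts to GitHub release events by kicking off a placeholder build step. The port, the start_wheel_build hook, and the absence of webhook signature verification are all simplifications for illustration; nothing here is part of PyPI or Warehouse.

```
# Illustration only: GitHub POSTs an event payload to a registered URL;
# the receiving service decides whether to kick off a build.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def start_wheel_build(repo, tag):
    # Placeholder: a real service would enqueue a CI job here.
    print("would build wheels for {} at {}".format(repo, tag))

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        # Only act on release-style events; ignore everything else.
        if self.headers.get("X-GitHub-Event") == "release":
            start_wheel_build(event["repository"]["full_name"],
                              event["release"]["tag_name"])
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```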

On Tue, Nov 1, 2016 at 2:50 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I wrote some lines, but I deleted my thoughts about the topic "Automating wheel creation", since I am afraid it could stir up a bad mood on this list again. That's not my goal.
I currently see 3 main ways that could eventually happen:
- the PSF sorts out its general sustainability concerns to the point where it believes it can credibly maintain such a service on the community's behalf
That would be great.
- the conda-forge folks branch out into offering wheel building as well (so it becomes a matter of "publish your Python projects for the user level conda platform, get platform independent Python wheels as well")
I doubt that's going to happen, but conda-forge is an open source project -- someone else could certainly make a friendly fork and make a wheel-forge.

- someone builds such a service independently of the current PyPI infrastructure team, and convinces package publishers to start using it
and starting with the nice tools conda-forge has built would be a good way to get going.

There is an odd thing here, though -- conda-forge works because of two things:

1) The GitHub workflow is helpful for developing and vetting recipes
2) GitHub already has arrangements with CI services -- so it's easy to leverage them to do the building.

But in theory, the PSF could negotiate an arrangement with the CI services to make it easy to call them from PyPI. And (1) is really necessary because of two things -- the folks making the recipes generally are not the package maintainers, and, critical here -- there are a lot of issues with consistent dependency management -- which is what conda can help with, and wheels just don't.

Another key point about all of this: Nick missed the key point about conda. He mentioned that conda allows you to manage the python run-time itself, which is indeed a nice feature, but getting a python run-time has never been the hard part (maybe on Linux if you want a different one than your system supplies). And it's not about needing a compiler -- wheels solve that -- and getting the system compiler is not all that hard either -- on all three major OSs. It's about the non-python dependencies -- all the various C libs, etc. that your python packages might need. Or, for that matter, about non-python command line utilities, and who knows what else.

Sure, this is a bigger issue in scientific computing -- which is why conda was born there -- but it's an issue in many other realms as well. For many years, a web dev I worked with avoided using PIL because it was too much of a pain to install -- that's pretty easy these days (thanks Pillow!) but it's not long before a serious web service might need SOMETHING that's not pure python...

So -- an auto wheel building system would be nice, but until it can handle non-python dependencies, it's going to hit a wall....

-CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 2 November 2016 at 03:01, Chris Barker <chris.barker@noaa.gov> wrote:
Nick missed the key point about conda. He mentioned that conda allows you to manage the python run-time itself, which is indeed a nice feature, but getting a python run-time has never been the hard part (maybe on Linux if you want a different one than your system supplies).
I didn't miss it accidentally, I left it out because it wasn't relevant to the post (which was about the ecosystem design direction, not the end user features that make the desire to use pip incomprehensible to a lot of folks). Designing software assembly tools for interpreted software is a system integrator's game, and when people are writing Python code, there is one absolutely 100% unavoidable integration task: choosing which Python runtimes they're going to support. But as an ecosystem designer, even if you only consider that single core integration activity, you have already hit a fundamental difference between pip and conda: conda requires that you use a runtime that it provides (similar to the way Linux distro packaging works), while pip really doesn't care where the runtime came from, as long as it can run pip properly. Essentially all the other differences between the two systems stem from the different choice made in that core design decision, which is why the two aren't substitutes for each other and never will be - they solve different problems for different audiences. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Nov 2, 2016 at 7:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
He mentioned that conda allows you to manage the python run-time itself, which is indeed a nice feature, but getting a python run-time has never been the hard part (maybe on Linux if you want a different one than your system supplies).
I didn't miss it accidentally, I left it out because it wasn't relevant to the post (which was about the ecosystem design direction,
I would argue that it is quite relevant -- talking about design decisions without talking about the motivations and consequences of those decisions is missing much of the point.

The issue here is that folks might read that post and think: "do I want to manage my python install or not?" and think that's the only question they need to ask to determine if pip or conda is right for them. But it's not at all.

pip is about managing stuff WITHIN python -- that's why it can work with any conforming python install. So that's the advantage of this design decision. But it leads to a major limitation, EVEN if you only care about python, because it can't (properly) manage the stuff outside of python that python packages may need. I honestly have no idea if the original motivation was specifically to have a system that could work with any python install (maybe), but it certainly was designed specifically to handle python packages, period.

conda started with the motivation of managing complex non-python dependencies (initially, to support python) -- in order to do that effectively, it has to operate outside the python envelope, and that means that it really needs to manage python itself. I'm pretty sure that managing python itself was a consequence of the design goals, not a primary design goal.

not the end user features that make the desire to use pip incomprehensible to a lot of folks).
Designing software assembly tools for interpreted software is a system integrator's game, and when people are writing Python code, there is one absolutely 100% unavoidable integration task: choosing which Python runtimes they're going to support.
hmm -- I don't think that's the code-writer's job -- it's the deployer's job. Other than choosing which python *version* I want to use, I can happily develop with system python and pip, and then deploy with conda -- or vice versa. Indeed, I can develop on Windows and deploy on Linux, or....

though if you meant PyPy vs IronPython vs CPython when you meant "runtime" then yes, with the dependency issue, you really do need to make that choice upfront.
conda requires that you use a runtime that it provides (similar to the way Linux distro packaging works), while pip really doesn't care where the runtime came from, as long as it can run pip properly.
yes indeed -- and I'm fully aware of the consequences of that -- I work in a very locked down system -- our system security folks REALLY want us to use system-supplied packages (i.e. the python supplied by the OS distributor) where possible. But that doesn't mean that I should try to manage my applications with pip -- because while that makes the python part easier, it makes the dependencies a nightmare -- having to custom compile all sorts of stuff -- much more work, and not any more satisfying to the security folks. So you really need to look at your whole system to determine what will best work for your use case.
Essentially all the other differences between the two systems stem from the different choice made in that core design decision, which is why the two aren't substitutes for each other and never will be - they solve different problems for different audiences.
Indeed they do. But the different audiences aren't "data science folks" vs "web developers" -- the different audiences are determined by deployment needs, not domain.

conda environments are kind of like mini Docker containers -- they really can make life easier for lots of use cases, if you can get your IT folks to accept a "non standard" python.

-CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 3 November 2016 at 01:54, Chris Barker <chris.barker@noaa.gov> wrote:
On Wed, Nov 2, 2016 at 7:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
He mentioned that conda allows you to manage the python run-time itself, which is indeed a nice feature, but getting a python run-time has never been the hard part (maybe on Linux if you want a different one than your system supplies).
I didn't miss it accidentally, I left it out because it wasn't relevant to the post (which was about the ecosystem design direction,
I would argue that it is quite relevant -- talking about design decisions without talking about the motivations and consequences of those decisions is missing much of the point.
No, as the post was about the fundamental and irreconcilable differences in capabilities, not the incidental ones that can be solved if folks choose (or are paid) to put in the necessary design and development time.
The issue here is that folks might read that post and think: "do I want to manage my python install or not?" and think that's the only question they need to ask to determine if pip or conda is right for them. But it's not at all.
The post isn't written for beginners deciding which tool to use, it's written for intermediate folks that have already chosen one or the other for their own needs, and are wondering why the other still exists (or, worse, are telling people that have chosen the other tool for good reasons that they're wrong, and should switch).
pip is about managing stuff WITHIN python -- that's why it can work with any conforming python install. So that's the advantage of this design decision. But it leads to a major limitation, EVEN if you only care about python, because it can't (properly) manage the stuff outside of python that python packages may need. I honestly have no idea if the original motivation was specifically to have a system that could work with any python install (maybe), but it certainly was designed specifically to handle python packages, period.
Aside from already needing a Python runtime, the inability to fully specify the target environment isn't an inherent design limitation though, the solution just looks different at a pip level:

- you need a system for specifying environmental *constraints* (like dynamically linked C libraries and command line applications you invoke)
- you need a system for asking the host environment if it can satisfy those constraints (a rough sketch of that check follows below)

Tennessee Leeuwenberg started a draft PEP for that first part last year: https://github.com/pypa/interoperability-peps/pull/30/files

dnf/yum, apt, brew, conda, et al all *work around* the current lack of such a system by asking humans (aka "downstream package maintainers") to supply the additional information by hand in a platform specific format.
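As a rough sketch of that second piece (asking the host environment whether it can satisfy declared constraints), the check below uses only standard library lookups. The constraint lists and their format are invented for illustration and are not taken from the draft PEP.

```
# Sketch only: check whether the current host can satisfy a package's
# declared external constraints. The declarations below are hypothetical
# examples, not an agreed metadata format.
import ctypes.util
import shutil

REQUIRED_LIBRARIES = ["ssl", "ffi"]   # shared libs the package links against
REQUIRED_COMMANDS = ["git"]           # command line tools it invokes at runtime

def missing_constraints():
    """Return the declared constraints the current host cannot satisfy."""
    missing = []
    for lib in REQUIRED_LIBRARIES:
        if ctypes.util.find_library(lib) is None:
            missing.append("shared library: " + lib)
    for cmd in REQUIRED_COMMANDS:
        if shutil.which(cmd) is None:
            missing.append("command: " + cmd)
    return missing

if __name__ == "__main__":
    problems = missing_constraints()
    if problems:
        print("unsatisfied constraints:", ", ".join(problems))
    else:
        print("all declared constraints are satisfied")
```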
conda started with the motivation of managing complex non-python dependencies (initially, to support python) -- in order to do that effectively, it has to operate outside the python envelope, and that means that it really needs to manage python itself. I'm pretty sure that managing python itself was a consequence of the design goals, not a primary design goal.
Correct, just as managing Python runtimes specifically isn't a primary goal of Linux distro package managers, it's just an artifact of providing a language independent dependency declaration and management system.
not the end user features that make the desire to use pip incomprehensible to a lot of folks).
Designing software assembly tools for interpreted software is a system integrator's game, and when people are writing Python code, there is one absolutely 100% unavoidable integration task: choosing which Python runtimes they're going to support.
hmm -- I don't think that's the code-writer's job -- it's the deployer's job. Other than choosing which python *version* I want to use, I can happily develop with system python and pip, and then deploy with conda -- or vice versa. Indeed, I can develop on Windows and deploy on Linux, or....
You still need to decide which versions you're going to test against, and which bug reports you're going to accept as potentially valid feedback (e.g. very few people running upstream community projects will accept "doesn't run on Python 2.5" as a valid bug report any more, and RHEL/CentOS 7, Software Collections, and conda have been around for long enough now that most won't accept "doesn't run on 2.6" either)
though if you meant PyPy vs IronPython vs CPython when you meant "runtime" then yes, with the dependency issue, you really do need to make that choice upfront.
I also mean 2.6 vs 2.7 vs 3.4 vs 3.5 vs 3.6, etc
But the different audiences aren't "data science folks" vs "web developers" -- the different audiences are determined by deployment needs, not domain.
Deployment needs are strongly correlated with domain though, and there's a world of difference between the way folks do exploratory data analysis and the way production apps are managed in Heroku/OpenShift/Cloud Foundry/Lambda/etc. You can certainly use conda to do production web service deployments, and if you're still deploying to bare metal or full VMs, or building your own container images from scratch, it's a decent option to consider. However, if you're specifically interested in web service development, then swapping in your own Python runtime rather than just using a PaaS provided one is really much lower level than most beginners are going to want to be worrying about these days - getting opinionated about that kind of thing comes later (if it happens at all). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Nov 2, 2016, at 12:49 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
hmm -- I don't think that's the code-writer's job -- it's the deployer's job. Other than choosing which python *version* I want to use, I can happily develop with system python and pip, and then deploy with conda -- or vice versa. Indeed, I can develop on Windows and deploy on Linux, or....
You still need to decide which versions you're going to test against, and which bug reports you're going to accept as potentially valid feedback (e.g. very few people running upstream community projects will accept "doesn't run on Python 2.5" as a valid bug report any more, and RHEL/CentOS 7, Software Collections, and conda have been around for long enough now that most won't accept "doesn't run on 2.6" either)
though if you meant PyPy vs IronPython vs CPython when you meant "runtime" then yes, with the dependency issue, you really do need to make that choice upfront.
I also mean 2.6 vs 2.7 vs 3.4 vs 3.5 vs 3.6, etc
There are still platform differences too, we regularly get bugs that only are exposed on Anaconda or on Ubuntu’s Python or on RHEL's Python or on Python.org’s OS X installers etc etc. Basically every variation has a chance to introduce a bug of some kind, and if you’re around long enough and your code is used enough you’ll run into them on every system. As someone writing that code you have to decide where you draw the line for what you support or not (for instance, you may support Ubuntu/RHEL/Anaconda, but you may decide that any version of CPython running on HPUX is not supported). — Donald Stufft

On Wed, Nov 2, 2016 at 9:57 AM, Donald Stufft <donald@stufft.io> wrote:
On Nov 2, 2016, at 12:49 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I also mean 2.6 vs 2.7 vs 3.4 vs 3.5 vs 3.6, etc
Of course, but that has nothing to do with the package management system...
There are still platform differences too, we regularly get bugs that only are exposed on Anaconda or on Ubuntu’s Python or on RHEL's Python or on Python.org’s OS X installers etc etc.
Yes, that is the challenge -- and one reason folks may be tempted to say we should have ONE package manager -- to reduce the variation -- though I don't think any of us think that's possible (and probably not desirable).

Basically every variation has a chance to introduce a bug of some kind, and if you’re around long enough and your code is used enough you’ll run into them on every system. As someone writing that code you have to decide where you draw the line for what you support or not (for instance, you may support Ubuntu/RHEL/Anaconda, but you may decide that any version of CPython running on HPUX is not supported).
or you may decide to ONLY support conda -- my use case is a big pile of tangled dependencies (yes, lots o' scientific stuff) that is fairly easy to manage in conda and a freakin' nightmare without it. Oh, and I'm putting it behind a web service, so I need to deploy on locked-down servers.... You CAN get all those dependencies working without conda, but it's a serious pain. Almost impossible on Windows. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Nov 2, 2016, at 2:22 PM, Chris Barker <chris.barker@noaa.gov> wrote:
or you may decide to ONLY support conda -- my use case is a big pile of tangled dependencies (yes, lots o' scientific stuff) that is fairly easy to manage in conda and a freakin' nightmare without it.
Sure. Do whatever you want, I don’t think anyone here thinks you absolutely must use pip. :) [1]

My point is just that even if you narrow yourself down to CPython 3.6.0 there are still variations that can cause problems so each project individually ends up needing to decide what they support. Often times “unsupported” variations will still work unless they go out of their way to break it, but not always.

[1] There seems to be some animosity among pip supporters and conda supporters, or at least a perception that there is. I’d just like to say that this isn’t really shared (to my knowledge) by the development teams of either project. I think everyone involved thinks folks should use whatever solution best allows them to solve whatever problem they are having.

— Donald Stufft

On Wed, Nov 2, 2016 at 11:31 AM, Donald Stufft <donald@stufft.io> wrote:
Sure. Do whatever you want, I don’t think anyone here thinks you absolutely must use pip. :) [1]
indeed -- and IIUC, part of the thrust of Nick's post was that different package managers serve different use-cases -- so we do need more than one.
My point is just that even if you narrow yourself down to CPython 3.6.0 there are still variations that can cause problems so each project individually ends up needing to decide what they support. Often times “unsupported” variations will still work unless they go out of their way to break it, but not always.
yup -- when I say I don't support non-conda installs, it doesn't mean my software won't work without it -- it means you're going to need to figure out how to compile those ugly dependencies yourself :-)
[1] There seems to be some animosity among pip supporters and conda supporters, or at least a perception that there is.
for my part, I'd say "frustration" more than animosity. I see a lot of folks struggling with tools that don't serve their needs, without awareness that there are better options. And some of those folks want me to support them -- and I do want to support my users.

I’d just like to say that this isn’t really shared (to my knowledge) by the development teams of either project. I think everyone involved thinks folks should use whatever solution best allows them to solve whatever problem they are having.
I agree -- Python is a great community! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Hi, On Wed, Nov 2, 2016 at 11:31 AM, Donald Stufft <donald@stufft.io> wrote: [snip]
[1] There seems to be some animosity among pip supporters and conda supporters, or at least a perception that there is. I’d just like to say that this isn’t really shared (to my knowledge) by the development teams of either project. I think everyone involved thinks folks should use whatever solution best allows them to solve whatever problem they are having.
I don't know whether there is animosity, but there is certainly tension. Speaking personally, I care a lot about having the option to prefer pip. There are others in the scientific community who feel we should standardize on conda. I think this is the cause of Chris' frustration. If we could all use conda instead of pip, then this would make it easier for him in terms of packaging, because he would not have to support pip (Chris please correct me if I'm wrong).

Although there are clear differences in the audience for pip and conda, there is also a very large overlap. In practice the majority of users could reasonably choose one or the other as their starting point. Of course, one may come to dominate this choice over the other. At the point where enough users become frustrated with the lack of pip wheels, conda will become the default. If pip wheels are widely available, that makes the pressure to use conda less. If we reach this tipping point it will become wasteful of developer effort to make pip wheels / conda packages, the number and quality of binary packages will drop, and one of these package managers will go into decline.

Cheers, Matthew

Hey Matthew,
[1] There seems to be some animosity among pip supporters and conda supporters, or at least a perception that there is.
I don't know whether there is animosity, but there is certainly tension. Speaking personally, I care a lot about having the option to prefer pip.
indeed -- and you have made herculean efforts to make that possible :-)
There are others in the scientific community who feel we should standardize on conda. I think this is the cause of Chris' frustration. If we could all use conda instead of pip, then this would make it easier for him in terms of packaging, because he would not have to support pip (Chris please correct me if I'm wrong).
yup -- nor would you :-)

My personal frustration comes with my history -- I've been in this community for over 15 years -- and I spent a lot of effort back in the day to make packages available for the Mac (before pypi, pip, before wheels, ...). And I found I was constantly re-figuring out how to build the dependent libraries needed, and statically linking everything, etc... It sucked. A lot of work, and not at all fun (for me, anyway -- maybe some folks enjoy that sort of thing).

So I started a project to share that effort, and to build a bit of infrastructure. I also started looking into how to handle dependent libs with pip+wheel -- I got no support whatsoever for that. I got frustrated 'cause it was too hard, and I also felt like I was fighting the tools. I did not get far. Matthew ended up picking up that effort and really making it work, and had gotten all the core SciPy stuff out there as binary wheels -- really great stuff.

But then I discovered conda -- and while I was resistant at first, I found that it was a much nicer environment to do what I needed to do. It started not because of anything specific about conda, but because someone had already built a bunch of the stuff I needed -- nice!

But the breaking point was when I needed to build a package of my own: py_gd -- it required libgd, which required libpng, libjpeg, libtiff --- some of those come out of the box on a Mac, it's all available from homebrew, and all pretty common on Linux -- but I have users that are on Windows, or on a Mac and can't grok homebrew or macports. And, frankly, neither do all of my development team. But guess what? libgd wasn't in conda, but everything else I needed was -- this is all pretty common stuff -- other people have solved the problem and the system supports installing libs, so I can just use them. My work was SO MUCH easier. And especially my users have it so much easier, 'cause I can just give them a conda package. And while that particular example would have been solvable with binary wheels, as things get more complicated, it gets hard or impossible to do.

So if I'm happy with conda -- why the frustration? Some is the history, but also there are two things:

1) pip is considered "the way" to handle dependencies -- my users want to use it, and they ask me for help using it, and I don't want to abandon my users -- so more work for me.

2) I see people over and over again starting out with pip -- 'cause that's what you do. Then hitting a wall, then trying Enthought Canopy, then trying Anaconda, then ending up with a tangled mess of multiple systems where who knows what python "pip" is associated with. This is why "there is only one way to do it" would be nice.

And I'm pretty sure that "wall" will always be there -- things have gotten better with wheels -- between Matthew's efforts and manylinux, most everybody can get the core SciPy stack with pip -- very nice! But not pyHDF, netCDF5, gdal, shapely, ... (to name a few that I need to work with). And these are ugly: which means very hard for end-users to build, and very hard for people to package up into wheels (is it even possible?)

And of course, all is not rosy with conda either -- the conda-forge effort has made phenomenal progress, but it's really hard to manage that huge stack of stuff (I'm using the time spent writing this as a break from conda-forge dependency hell ...). But in a way, I think we'd be better off if there was more focus on conda-forge rather than the effort to shoehorn pip into solving more of the dependency problem.
And the final frustration -- I think conda is still pretty misunderstood and misrepresented as a solution only (or primarily) for "data scientists" or people doing interactive data exploration, or "scientific programmers", whereas it's actually a pretty good solution to a lot of people's problems.
Although there are clear differences in the audience for pip and conda, there is also a very large overlap. In practice the majority of users could reasonably choose one or the other as their starting point.
exactly.
Of course, one may come to dominate this choice over the other. At the point where enough users become frustrated with the lack of pip wheels, conda will become the default. If pip wheels are widely available, that makes the pressure to use conda less. If we reach this tipping point it will become wasteful of developer effort to make pip wheels / conda packages, the number and quality of binary packages will drop, and one of these package managers will go into decline.
perhaps so -- but it will be a good while! The endorsement of the "official" community really does keep pip going. And, of course, it works great for a lot of use-cases.

If it were all up to me (which of course it's not) -- I'd say that keeping pip / PyPI fully supported for all the stuff it's good at -- pure python and small/no dependency extension modules -- and folks can go to conda when/if they need more. After all, you can use pip from within a conda environment just fine :-)

-CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Hi, On Wed, Nov 2, 2016 at 4:08 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Hey Matthew,
[1] There seems to be some animosity among pip supporters and conda supports, or at least a perception that there is.
I don't know whether there is animosity, but there is certainly tension. Speaking personally, I care a lot about having the option to prefer pip.
indeed -- and you have made herculean efforts to make that possible :-)
There are others in the scientific community who feel we should standardize on conda. I think this is the cause of Chris' frustration. If we could all use conda instead of pip, then this would make it easier for him in terms of packaging, because he would not have to support pip (Chris please correct me if I'm wrong).
yup -- nor would you :-)
My personal frustration comes with my history -- I've been in this community for over 15 years -- and I spent a lot of effort back in the day to make packages available for the Mac (before pypi, pip, before wheels, ...). And I found I was constantly re-figuring out how to build the dependent libraries needed, and statically linking everything, etc... It sucked. A lot of work, and not at all fun (for me, anyway -- maybe some folks enjoy that sort of thing).
So I started a project to share that effort, and to build a bit of infrastructure. I also started looking into how to handle dependent libs with pip+wheel -- I got no support whatsoever for that. I got frustrated 'cause it was too hard, and I also felt like I was fighting the tools. I did not get far.
Matthew ended up picking up that effort and really making it work, and had gotten all the core SciPy stuff out there as binary wheels -- really great stuff.
But then I discovered conda -- and while I was resistant at first, I found that it was a much nicer environment to do what I needed to do. It started not because of anything specific about conda, but because someone had already built a bunch of the stuff I needed -- nice!
But the breaking point was when I needed to build a package of my own: py_gd -- it required libgd, which required libpng, libjpeg, libtiff --- some of those come out of the box on a Mac, it's all available from homebrew, and all pretty common on Linux -- but I have users that are on Windows, or on a Mac and can't grok homebrew or macports. And, frankly, neither do all of my development team.
Anaconda has an overwhelming advantage on Windows, in that Continuum can bear the licensing liabilities enforced by the Intel Fortran compiler, and we can not. We therefore have no license-compatible Fortran compiler for Python 3.5 - so we can't build scipy, and that's a full stop for the scipy stack. I'm sure you know, but the only practical open-source option is mingw-w64, that does not work with the Microsoft runtime used by Python 3.5 [1]. It appears the only person capable of making mingw-w64 compatible with this runtime is Ray Donnelly, who now works for Continuum, and we haven't yet succeeded in finding the 15K USD or so to pay Continuum for his time.
But not pyHDF, netCDF5, gdal, shapely, ... (to name a few that I need to work with). And these are ugly: which means very hard for end-users to build, and very hard for people to package up into wheels (is it even possible?)
I'd be surprised if these packages were very hard to build OSX and Linux wheels for. We're already building hard packages like Pillow and Matplotlib and h5py and pytables. If you mean netCDF4 - there are already OSX and Windows wheels for that [2]. Cheers, Matthew [1] https://mingwpy.github.io/issues.html#the-vs-14-2015-runtime [2] https://pypi.python.org/pypi/netCDF4

On 3 November 2016 at 00:02, Matthew Brett <matthew.brett@gmail.com> wrote:
Anaconda has an overwhelming advantage on Windows, in that Continuum can bear the licensing liabilities enforced by the Intel Fortran compiler, and we can not. We therefore have no license-compatible Fortran compiler for Python 3.5 - so we can't build scipy, and that's a full stop for the scipy stack. I'm sure you know, but the only practical open-source option is mingw-w64, that does not work with the Microsoft runtime used by Python 3.5 [1]. It appears the only person capable of making mingw-w64 compatible with this runtime is Ray Donnelly, who now works for Continuum, and we haven't yet succeeded in finding the 15K USD or so to pay Continuum for his time.
Is this something the PSF could assist with? Either in terms of funding work to get mingw-w64 working, or in terms of funding (or negotiating) Intel Fortran licenses for the community? Paul

Hi, On Thu, Nov 3, 2016 at 2:29 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 3 November 2016 at 00:02, Matthew Brett <matthew.brett@gmail.com> wrote:
Anaconda has an overwhelming advantage on Windows, in that Continuum can bear the licensing liabilities enforced by the Intel Fortran compiler, and we can not. We therefore have no license-compatible Fortran compiler for Python 3.5 - so we can't build scipy, and that's a full stop for the scipy stack. I'm sure you know, but the only practical open-source option is mingw-w64, that does not work with the Microsoft runtime used by Python 3.5 [1]. It appears the only person capable of making mingw-w64 compatible with this runtime is Ray Donnelly, who now works for Continuum, and we haven't yet succeeded in finding the 15K USD or so to pay Continuum for his time.
Is this something the PSF could assist with? Either in terms of funding work to get mingw-w64 working, or in terms of funding (or negotiating) Intel Fortran licenses for the community?
I'm afraid paying for the Intel Fortran licenses won't help because the problem is in the Intel Fortran runtime license [1]. But - it would be a huge help if the PSF could help with funding to get mingw-w64 working. This is the crucial blocker for progress on binary wheels on Windows. Nathaniel - do you have any recent news on progress? Cheers, Matthew [1] https://software.intel.com/en-us/comment/1881526

On Thu, Nov 3, 2016 at 10:56 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
But - it would be a huge help if the PSF could help with funding to get mingw-w64 working. This is the crucial blocker for progress on binary wheels on Windows.
for what it's worth, this is a blocker for many of us using Win64 at all! As far as I know, conda has not solved this one -- maybe for Continuum's paying customers? (or for binaries that Continuum builds, but not stuff I need to build myself) If it's just money we need, I'd hope we could scrape it together somehow! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 4 November 2016 at 03:56, Matthew Brett <matthew.brett@gmail.com> wrote:
But - it would be a huge help if the PSF could help with funding to get mingw-w64 working. This is the crucial blocker for progress on binary wheels on Windows.
Such a grant was already awarded earlier this year by way of the Scientific Python Working Group (which is a collaborative funding initiative between the PSF and NumFocus): https://mail.python.org/pipermail/scientific/2016-January/000271.html However, we hadn't received a status update by the time I stepped down from the Board, although it sounds like progress hasn't been good if folks aren't even aware that the grant was awarded in the first place. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi, On Fri, Nov 4, 2016 at 11:36 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 4 November 2016 at 03:56, Matthew Brett <matthew.brett@gmail.com> wrote:
But - it would be a huge help if the PSF could help with funding to get mingw-w64 working. This is the crucial blocker for progress on binary wheels on Windows.
Such a grant was already awarded earlier this year by way of the Scientific Python Working Group (which is a collaborative funding initiative between the PSF and NumFocus): https://mail.python.org/pipermail/scientific/2016-January/000271.html
However, we hadn't received a status update by the time I stepped down from the Board, although it sounds like progress hasn't been good if folks aren't even aware that the grant was awarded in the first place.
Ah no, that was a related project, but for a different set of work. Specifically that was a 5K grant to Carl Kleffner to configure and package up the mingw-w64 compilers to make it easier to build extensions for Pythons prior to 3.5 - see [1]. Cheers, Matthew [1] https://mingwpy.github.io/

On Fri, Nov 4, 2016 at 11:36 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 4 November 2016 at 03:56, Matthew Brett <matthew.brett@gmail.com> wrote:
But - it would be a huge help if the PSF could help with funding to get mingw-w64 working. This is the crucial blocker for progress on binary wheels on Windows.
Such a grant was already awarded earlier this year by way of the Scientific Python Working Group (which is a collaborative funding initiative between the PSF and NumFocus): https://mail.python.org/pipermail/scientific/2016-January/000271.html
However, we hadn't received a status update by the time I stepped down from the Board, although it sounds like progress hasn't been good if folks aren't even aware that the grant was awarded in the first place.
There's two separate projects here that turn out to be unrelated: one to get mingw-w64 support for CPython < 3.5, and one for CPython >= 3.5. (This has to do with the thing where MSVC totally redid how their C runtime works.) The grant you're thinking of is for the python < 3.5 part; what Matthew's talking about is for the python >= 3.5 part, which is a totally different plan and team. The first blocker on getting funding for the >= 3.5 project though is getting the team to write down an actual plan and cost estimate, which has not yet happened... -n -- Nathaniel J. Smith -- https://vorpus.org

On Sat, Nov 5, 2016 at 7:57 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Fri, Nov 4, 2016 at 11:36 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 4 November 2016 at 03:56, Matthew Brett <matthew.brett@gmail.com> wrote:
But - it would be a huge help if the PSF could help with funding to get mingw-w64 working. This is the crucial blocker for progress on binary wheels on Windows.
Such a grant was already awarded earlier this year by way of the Scientific Python Working Group (which is a collaborative funding initiative between the PSF and NumFocus): https://mail.python.org/pipermail/scientific/2016-January/000271.html
However, we hadn't received a status update by the time I stepped down from the Board,
This status update was sent to the PSF board in June: http://mingwpy.github.io/roadmap.html#status-update-june-16. Up until that report the progress was good, but after that progress stalled because Carl Kleffner (the main author of MingwPy) became unavailable for private reasons. Ralf
although it sounds like progress hasn't been good if folks aren't even aware that the grant was awarded in the first place.
There's two separate projects here that turn out to be unrelated: one to get mingw-w64 support for CPython < 3.5, and one for CPython >= 3.5. (This has to do with the thing where MSVC totally redid how their C runtime works.) The grant you're thinking of is for the python < 3.5 part; what Matthew's talking about is for the python >= 3.5 part, which is a totally different plan and team.
The first blocker on getting funding for the >= 3.5 project though is getting the team to write down an actual plan and cost estimate, which has not yet happened...
-n
-- Nathaniel J. Smith -- https://vorpus.org

On 6 November 2016 at 08:13, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Nov 4, 2016 at 11:36 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Such a grant was already awarded earlier this year by way of the Scientific Python Working Group (which is a collaborative funding initiative between the PSF and NumFocus): https://mail.python.org/pipermail/scientific/2016-January/000271.html
However, we hadn't received a status update by the time I stepped down from the Board,
This status update was sent to the PSF board in June: http://mingwpy.github.io/roadmap.html#status-update-june-16.
Very cool - thanks for the update!
Up until that report the progress was good, but after that progress stalled because Carl Kleffner (the main author of MingwPy) became unavailable for private reasons.
Ah, that's unfortunate. Hopefully nothing too serious, although I realise it wouldn't be appropriate to share details on a public list like this one. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Nov 2, 2016 at 5:02 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Anaconda has an overwhelming advantage on Windows, in that Continuum can bear the licensing liabilities enforced by the Intel Fortran compiler, and we can not.
Technically, that's an advantage that a commercial distribution has -- really nothing to do with the conda technology per se. But yes, from a practical perspective Continuum's support for conda and Anaconda is a major bonus in many ways. Note that Continuum is now (I think) defaulting to MKL also -- they seem to have solved the redistribution issues.

But the conda-forge project has its own builds of lots of stuff -- including the core scipy stack, based on OpenBLAS. There is a scipy build there: https://github.com/conda-forge/scipy-feedstock/tree/master/recipe but alas, still stuck:

```
build:
  # We lack openblas on Windows, and therefore can't build scipy there either currently.
  skip: true  # [win or np!=111]
```
I'm sure you know, but the only practical open-source option is mingw-w64, that does not work with the Microsoft runtime used by Python 3.5 [1].
OT -- but is it stable for Python 2.7 now?
But not pyHDF, netCDF5, gdal, shapely, ... (to name a few that I need to work with). And these are ugly: which means very hard for end-users to build, and very hard for people to package up into wheels (is it even possible?)
I'd be surprised if these packages were very hard to build OSX and Linux wheels for. We're already building hard packages like Pillow
Pillow is relatively easy :-) -- I was doing that years ago.
and Matplotlib and h5py and pytables.
Do h5py and pytables share libhdf5 somehow? Or is the whole mess statically linked into each?

If you mean netCDF4 - there are already OSX and Windows wheels for that [2].
God bless Christoph Gohlke --- I have no idea how he does all that!

So take all the hassle of those -- and multiply by about ten to get what the GIS stack needs: GDAL/OGR, Shapely, Fiona, etc.... Which doesn't mean it couldn't be done -- just that it would be a pain, and you'd end up with some pretty bloated wheels -- in the packages above, how many copies of libpng will there be? how many of hdf5? proj.4? geos? Maybe that simply doesn't matter -- computers have a lot of disk space and memory these days.

But there is one more use-case -- folks that need to build their own stuff against some of those libs (netcdf4 in my case). The nice thing about conda is that it can provide the libs for me to use in my own custom built projects, or for other folks to use to build conda packages with. Maybe pip could be used to distribute those libs, too -- I know folks are working on that.

Then there is non-python stuff that you need to work with that would be nice to manage in environments, too -- R, MySQL, who knows? As someone else on this thread noted -- it's jamming a square peg into a round hole -- and why do that when there is a square hole you can use instead? At least that's what I decided.

-CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Nov 2, 2016, at 7:08 PM, Chris Barker <chris.barker@noaa.gov> wrote:
perhaps so -- but it will be a good while! The endorsement of the "official" community really does keep pip going. And, of course, it works great for a lot of use-cases.
Right, there is some overlap in terms of "I am on Windows/macOS/most Linuxes and I either prefer Anaconda for my Python or I don't care where it comes from and I don't have any compelling reasons NOT to use something different". That can be a pretty big chunk of people, but then there is an area that doesn't overlap, where the differences in decisions aren't really going to be very easy to reconcile.

For instance, the obvious one that favors conda is anytime your existing platform isn't providing you something that conda is providing (e.g. a new enough NumPy on CentOS, or binary packages at all on Windows, or what have you). The flip side of that is that pip's ability to just-in-time build binary packages means that if you're deploying to, say, Alpine Linux like a lot of the folks using Docker are doing, or to FreeBSD or something like that, then, as long as your system is configured, `pip install` will just work without you having to go through the work to stand up an entire conda repository that contains all of the "bare minimum" compiled packages for an entire new architecture. Pip's ability to "slot" into an existing environment also means that it generally just works in any environment you give it; if you want something that will integrate with something installed by apt-get, well, you're going to have a hard time getting that with conda.

Beyond all of that though, there's more to the pip tooling than just what command line program some end user happens to be executing. Part of this is enabling a project author to release a package that can be consumed by Conda and Debian and RHEL and Alpine and FreeBSD ports and etc etc. I don't mean that in the sense of "I can run ``pip install`` on them", but in the sense that by the very nature of our JIT building from source we need a programmatic standard interface to the build system, so if pip can automatically consume your package then conda-build and such become much easier to write for any individual package.

At the end of the day, sure, it would make some things a bit easier if everyone would just standardize around one thing, but I doubt it will ever happen because these things really do serve different audiences once you leave the overlap. Much like it'd be great if everyone just standardized on Ubuntu! Or Debian! Or CentOS! Or macOS! Or Windows! Or it'd be nice if everyone picked a single Python version, or a single Python implementation, or a single language at all :) So yea, diversity makes things more difficult, but it also tends to lead to better software overall :D
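[Editorial aside, not part of the original thread: a purely illustrative sketch of the "programmatic standard interface to the build system" idea mentioned above -- a minimal build backend module exposing a hook that any frontend (pip, conda-build, a distro tool) could call. The hook name follows the shape of what later became PEP 517; the project name and payload are placeholders.]

```python
# Illustrative sketch only: a minimal build backend exposing a standard hook.
# The hook signature follows the shape of what later became PEP 517; the
# project name and wheel contents here are placeholders, not real packaging.
import os
import zipfile

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """Build a wheel into wheel_directory and return its filename.

    A real backend (setuptools, flit, etc.) would compile extensions and
    generate proper metadata; the point is that pip, conda-build, or a
    distro tool can all call the same hook without knowing how the
    project builds itself.
    """
    name, version = "example_project", "0.0.1"
    dist_info = "{}-{}.dist-info".format(name, version)
    wheel_name = "{}-{}-py3-none-any.whl".format(name, version)
    with zipfile.ZipFile(os.path.join(wheel_directory, wheel_name), "w") as whl:
        whl.writestr("{}/__init__.py".format(name), "")  # placeholder payload
        whl.writestr(dist_info + "/METADATA",
                     "Metadata-Version: 2.1\nName: {}\nVersion: {}\n".format(name, version))
        whl.writestr(dist_info + "/WHEEL",
                     "Wheel-Version: 1.0\nGenerator: sketch\nRoot-Is-Purelib: true\nTag: py3-none-any\n")
        whl.writestr(dist_info + "/RECORD", "")
    return wheel_name
```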
If it were all up to me (which of course it's not) -- I'd say that keeping pip / PyPi fully supported for all the stuff it's good at -- pure python and small/no dependency extension modules -- and folks can go to conda when/if they need more.
After all, you can use pip from within a conda environment just fine :-)
I think this is pretty much where we're at and where we’re likely to stay for the general case as far as what is officially supported. We may gain some stuff to try and integrate better with whatever platform we’re on (for instance, the metadata that Nick suggested that would allow people to say “I need libgd” and have conda or apt or whatever provide that) but I don’t think we’re likely to go much further than that. I know that there is the pynativelib stuff which would cross into that threshold, but one of the things about that proposal is it doesn’t require anything from pip or PyPI or whatever to make it happen. That’s “just” some square pegging into a round hole to sort of try and close the gap a little bit but I doubt it’s ever going to be something that even approaches a solution like what Conda/apt/etc give. — Donald Stufft

On 2 November 2016 at 23:08, Chris Barker <chris.barker@noaa.gov> wrote:
After all, you can use pip from within a conda environment just fine :-)
In my experience (some time ago) doing so ended up with the "tangled mess of multiple systems" you mentioned. Conda can't uninstall something I installed with pip, and I have no obvious way of introspecting "which installer did I use to install X?" And if I pip uninstall something installed by conda, I break things. (Apologies if I misrepresent, it was a while ago, but I know I ended up deciding that I needed to keep a log of every package I installed and what I installed it with if I wanted to keep things straight...) Has that improved? I'm not trying to complain, I don't have any particular expectation that conda "has to" support using pip on a conda installation. But I do want to make sure we're clear on what works and what doesn't. Paul
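[Editorial aside, not part of the original thread: a rough sketch of one way to answer the "which installer did I use to install X?" question, relying on the INSTALLER file that pip records in each package's .dist-info directory (per PEP 376); packages installed by other means may lack the file, which is part of why the picture stays murky.]

```python
# Rough sketch: list installed distributions and the tool recorded as their
# installer. pip writes an INSTALLER file into each .dist-info directory;
# anything without one shows up as "unknown".
import pkg_resources

for dist in sorted(pkg_resources.working_set, key=lambda d: d.project_name.lower()):
    if dist.has_metadata("INSTALLER"):
        installer = dist.get_metadata("INSTALLER").strip() or "unknown"
    else:
        installer = "unknown"
    print("{:30} {:12} {}".format(dist.project_name, dist.version, installer))
```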

On Thu, Nov 3, 2016 at 2:34 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 2 November 2016 at 23:08, Chris Barker <chris.barker@noaa.gov> wrote:
After all, you can use pip from within a conda environment just fine :-)
In my experience (some time ago) doing so ended up with the "tangled mess of multiple systems" you mentioned.
well, yes, my experience too, as of two years ago -- which is why I've taken the time to make conda packages of all the pure-Python libs I need (and now that there is conda-forge, it's not a huge lift)
Conda can't uninstall something I installed with pip,
sure can't! -- though that's not too much of a barrier -- it'll give you an error, and then you use pip to uninstall it.
and I have no obvious way of introspecting "which installer did I use to install X?" And if I pip uninstall something installed by conda, I break things. (Apologies if I misrepresent, it was a while ago, but I know I ended up deciding that I needed to keep a log of every package I installed and what I installed it with if I wanted to keep things straight...)
Has that improved?
yes, it has -- and lots of folks seem to have it work fine for them, though I'd say in more of a one-way street approach:

- create a conda environment
- install all the conda packages you need
- install all the pip packages you need

and you're done. And most conda packages of Python packages are built "properly" with pip-compatible metadata, so a pip install won't get confused and think it needs to install a dependency that is already there. Then you can export your environment to a yaml file that keeps track of which packages are conda and which are pip, and you (and your users) can re-create that environment reliably -- even on a different platform.

I'm not trying to complain, I don't have any particular expectation
that conda "has to" support using pip on a conda installation. But I do want to make sure we're clear on what works and what doesn't.
I'm not the expert here -- honestly, I do try to avoid mixing pip and conda -- but support for it has gotten a lot better. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Am 02.11.2016 um 17:57 schrieb Donald Stufft:
On Nov 2, 2016, at 12:49 PM, Nick Coghlan <ncoghlan@gmail.com <mailto:ncoghlan@gmail.com>> wrote:
hmm -- I don't think that's the code-writer's job -- it's the deployer's job. Other than choosing which python *version* I want to use, I can happily develop with system python and pip, and then deploy with conda -- or vice versa. Indeed, I can develop on Windows and deploy on Linux, or....
You still need to decide which versions you're going to test against, and which bug reports you're going to accept as potentially valid feedback (e.g. very few people running upstream community projects will accept "doesn't run on Python 2.5" as a valid bug report any more, and RHEL/CentOS 7, Software Collections, and conda have been around for long enough now that most won't accept "doesn't run on 2.6" either)
though if you meant PyPy vs IronPython vs CPython when you said "runtime", then yes, with the dependency issue, you really do need to make that choice upfront.
I also mean 2.6 vs 2.7 vs 3.4 vs 3.5 vs 3.6, etc
There are still platform differences too; we regularly get bugs that are only exposed on Anaconda or on Ubuntu’s Python or on RHEL's Python or on Python.org’s OS X installers etc etc. Basically every variation has a chance to introduce a bug of some kind, and if you’re around long enough and used widely enough you’ll run into them on every system. As someone writing that code you have to decide where you draw the line for what you support or not (for instance, you may support Ubuntu/RHEL/Anaconda, but you may decide that any version of CPython running on HPUX is not supported).
I guess you are right. I will abandon my idea. Regards, Thomas Güttler -- http://www.thomas-guettler.de/

On Wed, Nov 2, 2016 at 9:49 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
No, as the post was about the fundamental and irreconcilable differences in capabilities, not the incidental ones that can be solved if folks choose (or are paid) to put in the necessary design and development time.
It's your post, but the external dependency issue is hardly an incidental one.
The post isn't written for beginners deciding which tool to use, it's written for intermediate folks that have already chosen one or the other for their own needs, and are wondering why the other still exists (or, worse, are telling people that have chosen the other tool for good reasons that they're wrong, and should switch).
again -- those reasons are very much about external dependency management!

- you need a system for specifying environmental *constraints* (like dynamically linked C libraries and command line applications you invoke)
- you need a system for asking the host environment if it can satisfy those constraints
and if it can't -- you're then done -- that's actually the easy part (and happens already at build or run time, yes?): I try to build libgdal, it'll fail if I don't have that huge pile of dependencies installed. I try to run a wheel someone else built -- it'll also fail. It'd be better if this could be hashed out before you hit a compilation or linking error, sure, but that's not going to do a whole lot except make the error messages nicer :-)
dnf/yum, apt, brew, conda, et al all *work around* the current lack of such a system by asking humans (aka "downstream package maintainers") to supply the additional information by hand in a platform specific format.
if conda is a "platform", then yes. but in that sense pip is a platform, too. I'll beat this drum again -- if you want to extend pip to solve all (most) of the problems conda solves, then you are re-inventing conda. If someone doesn't like the conda design, and has better ideas, great, but only re-invent the wheel if you really are going to make a better wheel. However, I haven't thought about it carefully -- maybe it would be possible to have a system than managed everything except python itself. But that would be difficult, and I don't see the point, except to satisfy brain dead IT security folks :-)
But the different audiences aren't "data science folks" vs "web developers" -- the different audiences are determined by deployment needs, not domain.
Deployment needs are strongly correlated with domain though, and there's a world of difference between the way folks do exploratory data analysis and the way production apps are managed in Heroku/OpenShift/Cloud Foundry/Lambda/etc.
sigh. not everyone that uses the complex scipy stack is doing "exploratory data analysis" -- a lot of us are building production software, much of it behind web services... and that's what I mean by deployment needs. However, if you're specifically interested in web service
development, then swapping in your own Python runtime rather than just using a PaaS provided one is really much lower level than most beginners are going to want to be worrying about these days - getting opinionated about that kind of thing comes later (if it happens at all).
it's not a matter of opinion, but needs -- if this "beginner" is doing stuff only with pure-Python packages, then great -- there are many easy options. But if they need some other stuff -- maybe this beginner needs to work with scientific data sets... then they're dead in the water with a platform that doesn't support what they need. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 3 November 2016 at 04:39, Chris Barker <chris.barker@noaa.gov> wrote:
On Wed, Nov 2, 2016 at 9:49 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
- you need a system for specifying environmental *constraints* (like dynamically linked C libraries and command line applications you invoke)
- you need a system for asking the host environment if it can satisfy those constraints

and if it can't -- you're then done -- that's actually the easy part (and happens already at build or run time, yes?):
When it comes to the kinds of environmental constraints most Python applications (even scientific ones) impose, conda, dnf/yum, and apt can meet them. If you're consistently running (for example) the Fedora Scientific Lab as your Linux distro of choice, and aren't worried about other platforms, then you don't need and probably don't want conda - your preferred binary dependency management community is the Fedora one, not the conda one, and your preferred app isolation technology will be Linux containers, not conda environments. Within the containers, if you use virtualenv at all, it will be with the system site-packages enabled.
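[Editorial aside, not part of the original thread: for concreteness, a minimal sketch of that last arrangement using the standard library; the path is a placeholder.]

```python
# Minimal sketch of the "virtualenv with system site-packages enabled" setup
# mentioned above, using the standard library venv module.
import venv

# The resulting environment can see distro-provided packages (e.g. the
# Fedora Scientific Lab stack) while still allowing per-project pip installs.
venv.create("/srv/app/venv", system_site_packages=True, with_pip=True)
```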
I try to build libgdal, it'll fail if I don't have that huge pile of dependencies installed.
I try to run a wheel someone else built -- it'll also fail.
It'd be better if this could be hashed out before you hit a compilation or linking error, sure, but that's not going to do a whole lot except make the error messages nicer :-)
This is why fixing this in the general case requires the ability for pip to ask the host platform to install additional components.
dnf/yum, apt, brew, conda, et al all *work around* the current lack of such a system by asking humans (aka "downstream package maintainers") to supply the additional information by hand in a platform specific format.
if conda is a "platform", then yes. but in that sense pip is a platform, too.
No, it's not, as it still wouldn't have the ability to install external dependencies itself, only the ability to request them from the host platform. You'd still need a plugin interacting with something like conda or PackageKit underneath to actually do the installation (you'd want something like PackageKit rather than raw dnf/yum/apt on Linux, as PackageKit is what powers features like PackageKit-command-not-found, which lets you install CLI commands from system repos).
I'll beat this drum again -- if you want to extend pip to solve all (most) of the problems conda solves, then you are re-inventing conda.
I don't. I want publishers of Python packages to be able to express their external dependencies in a platform independent way such that the following processes can be almost completely and reliably automated:

* converting PyPI packages to conda packages
* converting PyPI packages to RPMs
* converting PyPI packages to deb packages
* requesting external dependencies from the host system (whether that's conda, a Linux distro, or something else) when installing Python packages with pip
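[Editorial aside, not part of the original thread: to make the scale of that automation problem concrete, a hypothetical sketch of the name-mapping step such converters would need. Neither the "external_libraries" field nor the table contents are part of any accepted standard; they are assumptions for illustration only.]

```python
# Hypothetical illustration only: the kind of name mapping an automated
# PyPI -> conda/RPM/deb converter would need. Neither the metadata field
# "external_libraries" nor this table is an accepted standard.
EXTERNAL_NAME_MAP = {
    "libxml2": {"conda": "libxml2", "rpm": "libxml2-devel", "deb": "libxml2-dev"},
    "libxslt": {"conda": "libxslt", "rpm": "libxslt-devel", "deb": "libxslt1-dev"},
}

def external_requires(pypi_metadata, target):
    """Translate declared external library names into target platform packages."""
    mapped, unmapped = [], []
    for name in pypi_metadata.get("external_libraries", []):
        candidates = EXTERNAL_NAME_MAP.get(name, {})
        if target in candidates:
            mapped.append(candidates[target])
        else:
            unmapped.append(name)  # still needs a human, per the thread
    return mapped, unmapped

print(external_requires({"external_libraries": ["libxml2", "libyaml"]}, "deb"))
# (['libxml2-dev'], ['libyaml'])
```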
If someone doesn't like the conda design, and has better ideas, great, but only re-invent the wheel if you really are going to make a better wheel.
It wouldn't be conda, because it still wouldn't give you a way to publish Python runtimes and other external dependencies - only a way to specify what you need at install time, so that pip can ask the host environment to provide it (and fail noisily if it says it can't oblige) and so format converters like "conda skeleton" and "pyp2rpm" can automatically emit the correct external dependencies.
However, I haven't thought about it carefully -- maybe it would be possible to have a system than managed everything except python itself. But that would be difficult, and I don't see the point, except to satisfy brain dead IT security folks :-)
The IT security folks aren't brain dead, they have obligations to manage organisational risk, and that means keeping some level of control over what's being run with privileged access to network resources (and in most organisations running "inside the firewall" still implicitly grants some level of privileged information access) Beyond that "because IT said so" case, we'd kinda like Fedora system components to be running in the Fedora system Python (and ditto for other Linux distributions), while folks running in Heroku/OpenShift/Lambda should really be using the platform provided runtimes so they benefit from automated security updates. More esoterically, Python interpreter implementers and other folks need a way to obtain Python packages that isn't contingent on the interpreter being published in any particular format, since it may not be published at all. Victor Stinner's "performance" benchmark suite is an example of that, since it uses pip and virtualenv to compare the performance of different Python interpreters across a common set of benchmarks: https://github.com/python/performance
But the different audiences aren't "data science folks" vs "web developers" -- the different audiences are determined by deployment needs, not domain.
Deployment needs are strongly correlated with domain though, and there's a world of difference between the way folks do exploratory data analysis and the way production apps are managed in Heroku/OpenShift/Cloud Foundry/Lambda/etc.
sigh. not everyone that uses the complex scipy stack is doing "exploratory data analysis" -- a lot of us are building production software, much of it behind web services...
and that's what I mean by deployment needs.
Sure, and folks building Linux desktop apps aren't web developers either. Describing them that way is just shorthand caricatures that each attempt to encompass broad swathes of a complex ecosystem. It makes a *lot* of sense for folks using conda in some contexts (e.g. exploratory data analysis) to expand that to using conda everywhere. It also makes a lot of sense for folks that don't have a preference yet to start with conda for their own personal use. However, it *doesn't* make sense for folks to introduce conda into their infrastructure just for the sake of using conda. Instead, for folks in the first category and the last category, the key question they have to ask themselves is "Do I want to let the conda packaging community determine which Python runtimes are going to be available for me to use?". In the first case, there's a strong consistency argument in favour of doing so. However, in the last case, it really depends greatly on what you're doing, what your team prefers, and what kind of infrastructure setup you have.
However, if you're specifically interested in web service development, then swapping in your own Python runtime rather than just using a PaaS provided one is really much lower level than most beginners are going to want to be worrying about these days - getting opinionated about that kind of thing comes later (if it happens at all).
it's not a matter of opinion, but needs -- if this "beginner" is doing stuff only with pure-python packages, then great -- there are many easy options. But if they need some other stuff - maybe this beginner needs to work with scientific data sets.. then they're dead in the water with a Platform that doesn't support what they need.
Yep, and if that's the case, hopefully someone directs them towards a platform with that feature set, like conda, or Scientific Linux or the Fedora Scientific Lab, rather than a vanilla Python distro. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Nov 2, 2016 9:52 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
[...]
Aside from already needing a Python runtime, the inability to fully specify the target environment isn't an inherent design limitation though, the solution just looks different at a pip level:
- you need a system for specifying environmental *constraints* (like dynamically linked C libraries and command line applications you invoke)
- you need a system for asking the host environment if it can satisfy those constraints
Tennessee Leeuwenberg started a draft PEP for that first part last year: https://github.com/pypa/interoperability-peps/pull/30/files
dnf/yum, apt, brew, conda, et al all *work around* the current lack of such a system by asking humans (aka "downstream package maintainers") to supply the additional information by hand in a platform specific format.
To be fair, though, it's not yet clear whether such a system is actually possible. AFAIK no one has ever managed to reliably translate between different languages that Linux distros use for describing environment constraints, never mind handling the places where they're genuinely irreconcilable (e.g. the way different distro openssl packages have incompatible ABIs), plus other operating systems too. I mean, it would be awesome if someone pulls it off. But it's possible that this is like saying that it's not an inherent design limitation of my bike that it's only suited for terrestrial use, because I could always strap on wings and a rocket... -n

On 3 November 2016 at 05:28, Nathaniel Smith <njs@pobox.com> wrote:
On Nov 2, 2016 9:52 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
Tennessee Leeuwenberg started a draft PEP for that first part last year: https://github.com/pypa/interoperability-peps/pull/30/files
dnf/yum, apt, brew, conda, et al all *work around* the current lack of such a system by asking humans (aka "downstream package maintainers") to supply the additional information by hand in a platform specific format.
To be fair, though, it's not yet clear whether such a system is actually possible. AFAIK no one has ever managed to reliably translate between different languages that Linux distros use for describing environment constraints, never mind handling the places where they're genuinely irreconcilable (e.g. the way different distro openssl packages have incompatible ABIs), plus other operating systems too.
The main problem with mapping between Debian/RPM/conda etc in the general case is that the dependencies are generally expressed in terms of *names* rather than runtime actions, and you also encounter problems with different dependency management ecosystems splitting up (or aggregating!) different upstream components differently. This means you end up with a situation that's more like a lossy transcoding between MP3 and OGG Vorbis or vice-versa than it is a pristine encoding to either from a losslessly encoded FLAC or WAV file. The approach Tennessee and Robert Collins came up with (which still sounds sensible to me) is that instead of dependencies on particular external components, what we want to be able to express is instead a range of *actions* that the software is going to take:

- "I am going to dynamically load a library named <X>"
- "I am going to execute a subprocess for a command named <Y>"

And rather than expecting folks to figure that stuff out for themselves, you'd use tools like auditwheel and strace to find ways to generate it and/or validate it.
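[Editorial aside, not part of the original thread: a rough sketch of how a tool could check such declared actions against the host environment. The declaration format below is invented for illustration; only ctypes.util.find_library and shutil.which are real, and ABI compatibility is exactly the part such a check cannot verify.]

```python
# Rough sketch of checking "declared runtime actions" against the host
# environment. The declaration format is made up purely for illustration.
import shutil
from ctypes.util import find_library

declared_actions = {
    "dynamic_libraries": ["hdf5", "xml2"],      # "I am going to dlopen these"
    "subprocess_commands": ["gdal_translate"],  # "I am going to exec these"
}

missing = []
for lib in declared_actions["dynamic_libraries"]:
    if find_library(lib) is None:
        missing.append("library: " + lib)
for cmd in declared_actions["subprocess_commands"]:
    if shutil.which(cmd) is None:
        missing.append("command: " + cmd)

if missing:
    # A real installer would ask the host platform (conda, PackageKit, ...)
    # to provide these, or fail noisily.
    print("Unsatisfied external requirements:", ", ".join(missing))
```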
I mean, it would be awesome if someone pulls it off. But it's possible that this is like saying that it's not an inherent design limitation of my bike that it's only suited for terrestrial use, because I could always strap on wings and a rocket...
When it comes to things that stand in the way of fully automating the PyPI -> RPM pipeline, there are still a lot of *much* lower hanging fruit out there than this. However, the immediate incentives of folks working on package management line up in such a way that I'm reasonably confident that the lack of these capabilities is a matter of economic incentives rather than insurmountable technical barriers - it's a hard, tedious problem to automate, and manual repackaging into a platform specific format has long been a pretty easy thing for commercial open source redistributors to sell. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Nov 3, 2016 1:40 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
On 3 November 2016 at 05:28, Nathaniel Smith <njs@pobox.com> wrote:
On Nov 2, 2016 9:52 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
Tennessee Leeuwenberg started a draft PEP for that first part last year: https://github.com/pypa/interoperability-peps/pull/30/files
dnf/yum, apt, brew, conda, et al all *work around* the current lack of such a system by asking humans (aka "downstream package maintainers") to supply the additional information by hand in a platform specific format.
To be fair, though, it's not yet clear whether such a system is actually possible. AFAIK no one has ever managed to reliably translate between different languages that Linux distros use for describing environment constraints, never mind handling the places where they're genuinely irreconcilable (e.g. the way different distro openssl packages have incompatible ABIs), plus other operating systems too.
The main problem with mapping between Debian/RPM/conda etc in the general case is that the dependencies are generally expressed in terms of *names* rather than runtime actions, and you also encounter problems with different dependency management ecosystems splitting up (or aggregating!) different upstream components differently. This means you end up with a situation that's more like a lossy transcoding between MP3 and OGG Vorbis or vice-versa than it is a pristine encoding to either from a losslessly encoded FLAC or WAV file.
The approach Tennessee and Robert Collins came up with (which still sounds sensible to me) is that instead of dependencies on particular external components, what we want to be able to express is instead a range of *actions* that the software is going to take:

- "I am going to dynamically load a library named <X>"
- "I am going to execute a subprocess for a command named <Y>"
And rather than expecting folks to figure that stuff out for themselves, you'd use tools like auditwheel and strace to find ways to generate it and/or validate it.
I mean, it would be awesome if someone pulls it off. But it's possible
And then it segfaults because it turns out that your library named <X> is not ABI compatible with my library named <X>. Or it would have been if you had the right version, but distros don't agree on how to express version numbers either. (Just think about epochs.) Or we're on Windows, so it's interesting to know that we need a library named <X>, but what are we supposed to do with that information exactly? Again, I don't want to just be throwing around stop energy -- if people want to tackle these problems then I wish them luck. But I don't think we should be hand waving this as a basically solved problem that just needs a bit of coding, because that also can create stop energy that blocks an honest evaluation of alternative approaches.
that this is like saying that it's not an inherent design limitation of my bike that it's only suited for terrestrial use, because I could always strap on wings and a rocket...
When it comes to things that stand in the way of fully automating the PyPI -> RPM pipeline, there are still a lot of *much* lower hanging fruit out there than this. However, the immediate incentives of folks working on package management line up in such a way that I'm reasonably confident that the lack of these capabilities is a matter of economic incentives rather than insurmountable technical barriers - it's a hard, tedious problem to automate, and manual repackaging into a platform specific format has long been a pretty easy thing for commercial open source redistributors to sell.
The idea does seem much more plausible as a form of metadata hints attached to sdists that are used as inputs to a semi-automated-but-human-supervised distro package building system. What I'm skeptical about is specifically the idea that someday I'll be able to build a wheel, distribute it to end users, and then pip will do some negotiation with dnf/apt/pacman/chocolatey/whatever and make my wheel work everywhere -- and that this will be a viable alternative to conda. -n

On 3 November 2016 at 19:10, Nathaniel Smith <njs@pobox.com> wrote:
On Nov 3, 2016 1:40 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
The approach Tennessee and Robert Collins came up with (which still sounds sensible to me) is that instead of dependencies on particular external components, what we want to be able to express is instead a range of *actions* that the software is going to take:

- "I am going to dynamically load a library named <X>"
- "I am going to execute a subprocess for a command named <Y>"
And then it segfaults because it turns out that your library named <X> is not ABI compatible with my library named <X>. Or it would have been if you had the right version, but distros don't agree on how to express version numbers either. (Just think about epochs.) Or we're on Windows, so it's interesting to know that we need a library named <X>, but what are we supposed to do with that information exactly?
I don't think there's much chance of any of this ever working on Windows - conda will rule there, and rightly so. Mac OS X seems likely to go the same way, although there's an outside chance brew may pick up some of the otherwise Linux-only capabilities if they prove successful. Linux distros have larger package management communities than conda though, and excellent integration testing infrastructure on the commercial side of things, so we "just" need to get some of their software integration pipelines arranged in better configurations, and responding to the right software publication events :)
Again, I don't want to just be throwing around stop energy -- if people want to tackle these problems then I wish them luck. But I don't think we should be hand waving this as a basically solved problem that just needs a bit of coding, because that also can create stop energy that blocks an honest evaluation of alternative approaches.
There's a reason I'm not volunteering to work on this in my own time, and am instead letting the Fedora Modularity and Python maintenance folks decide when it's a major blocker to further process improvement efforts on their side of things, while I work on other non-Python-specific aspects of Red Hat's software supply chain management. What I'm mainly reacting to here is Chris's apparent position that he sees the entire effort as pointless. It's not pointless, just hard, since it involves working towards process improvements that span *multiple* open source communities, rather than being able to work with just the Python community or just the conda community. Even if automation can only cover 80-90% of legacy cases and 95% of future cases (numbers plucked out of thin air), that's still a huge improvement since folks generally aren't adding *new* huge-and-gnarly C/C++/FORTRAN dependencies to the mix of things pip users need to worry about, while Go and Rust have been paying much closer attention to supporting distributed automated builds from the start of their existence.
When it comes to things that stand in the way of fully automating the PyPI -> RPM pipeline, there are still a lot of *much* lower hanging fruit out there than this. However, the immediate incentives of folks working on package management line up in such a way that I'm reasonably confident that the lack of these capabilities is a matter of economic incentives rather than insurmountable technical barriers - it's a hard, tedious problem to automate, and manual repackaging into a platform specific format has long been a pretty easy thing for commercial open source redistributors to sell.
The idea does seem much more plausible as a form of metadata hints attached to sdists that are used as inputs to a semi-automated-but-human-supervised distro package building system. What I'm skeptical about is specifically the idea that someday I'll be able to build a wheel, distribute it to end users, and then pip will do some negotiation with dnf/apt/pacman/chocolatey/whatever and make my wheel work everywhere -- and that this will be a viable alternative to conda.
Not in the general case it won't, no, but it should be entirely feasible for distro-specific Linux wheels and manylinux1 wheels to be as user-friendly as conda in cases where folks on the distro side are willing to put in the effort to make the integration work properly. It may also be feasible to define an ABI for "linuxconda" that's broader than the manylinux1 ABI, so folks can publish conda wheels direct to PyPI, and then pip could define a way for distros to indicate their ABI is "conda compatible" somehow. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 3 November 2016 at 10:39, Nick Coghlan <ncoghlan@gmail.com> wrote:
It may also be feasible to define an ABI for "linuxconda" that's broader than the manylinux1 ABI, so folks can publish conda wheels direct to PyPI, and then pip could define a way for distros to indicate their ABI is "conda compatible" somehow.
Even on the "hard" cases like Windows, it may be possible to define a standard approach that goes something along the lines of defining a standard location that (somehow) gets added to the load path, and interested parties provide DLLs for external dependencies, which the users can then manually place in those locations. Or there's the option that's been mentioned before, but never (to my knowledge) developed into a complete proposal, for packaging up external dependencies as wheels. In some ways, Windows is actually an *easier* platform in this regard, as it's much more consistent in what it does provide - the CRT, and nothing else, basically. So all of the rest of the "external dependencies" need to be shipped, which is a bad thing, but there's no combinatorial explosion of system dependencies to worry about, which is good (as long as the "external dependencies" ecosystem maintains internal consistency). Paul

On Thu, Nov 3, 2016 at 3:50 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Even on the "hard" cases like Windows, it may be possible to define a standard approach that goes something along the lines of defining a standard location that (somehow) gets added to the load path, and interested parties provide DLLs for external dependencies, which the users can then manually place in those locations.
that's pretty much what conda is :-) though it adds the ability to handle multiple environments, and tools so you don't have to manually place anything. It would probably be feasible to make a conda-for-everything-but-python-itself. I'm just not sure that would buy you much. -CHB Or there's the
option that's been mentioned before, but never (to my knowledge) developed into a complete proposal, for packaging up external dependencies as wheels.
Folks were working on this at Pycon last spring -- any progress?
In some ways, Windows is actually an *easier* platform in this regard, as it's much more consistent in what it does provide - the CRT, and nothing else, basically. So all of the rest of the "external dependencies" need to be shipped, which is a bad thing, but there's no combinatorial explosion of system dependencies to worry about, which is good
and pretty much what manylinux is about, yes? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 3 November 2016 at 21:48, Chris Barker <chris.barker@noaa.gov> wrote:
On Thu, Nov 3, 2016 at 3:50 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Even on the "hard" cases like Windows, it may be possible to define a standard approach that goes something along the lines of defining a standard location that (somehow) gets added to the load path, and interested parties provide DLLs for external dependencies, which the users can then manually place in those locations.
that's pretty much what conda is :-)
Well, it doesn't feel like that. Maybe we're not understanding each other. Am I able to take a non-conda installation of Python, use conda to install *just* a set of non-Python DLLs (libxml, libyaml, ...) and then use pip to install wheels for lxml, pyyaml, etc? I understand that there currently isn't a way to *build* an lxml wheel that links to a conda-installed libxml, but that's not the point I'm making - if conda provided a way to manage external DLLs only, then it would be possible in theory to contribute a setup.py fix to a project like lxml that detected and linked to a conda-installed libxml. That single source could then be used to build *both* wheels and conda packages, avoiding the current situation where effort on getting lxml to build is duplicated, once for conda and once for wheel.
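[Editorial aside, not part of the original thread: a hedged sketch of that setup.py idea. Everything here -- extension name, sources, the decision to key off CONDA_PREFIX (set in activated conda environments) -- is illustrative rather than how lxml actually builds.]

```python
# Illustrative sketch only: a setup.py fragment that prefers a conda-provided
# libxml2 when one is visible via CONDA_PREFIX, falling back to whatever the
# compiler finds otherwise. The extension name and sources are placeholders.
import os
from setuptools import setup, Extension

include_dirs, library_dirs = [], []
conda_prefix = os.environ.get("CONDA_PREFIX")
if conda_prefix:
    if os.name == "nt":
        # On Windows, conda puts native headers/libs under Library\
        include_dirs.append(os.path.join(conda_prefix, "Library", "include"))
        library_dirs.append(os.path.join(conda_prefix, "Library", "lib"))
    else:
        include_dirs.append(os.path.join(conda_prefix, "include"))
        library_dirs.append(os.path.join(conda_prefix, "lib"))

setup(
    name="example-libxml2-binding",
    version="0.0.1",
    ext_modules=[
        Extension(
            "example._binding",
            sources=["src/binding.c"],
            libraries=["xml2"],
            include_dirs=include_dirs,
            library_dirs=library_dirs,
        )
    ],
)
```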
though it adds the ability to handle multiple environments, and tools so you don't have to manually place anything.
Well, there are other tools (virtualenv, venv) to do this as well, so again conda is to an extent offering a "buy into our infrastructure or you're on your own" proposition. That's absolutely the right thing for people wanting an integrated "just get on with it" solution, so don't assume this is a criticism - just an explanation of why conda doesn't suit *my* needs.
It would probably be feasible to make a conda-for-everything-but-python-itself. I'm just not sure that would buy you much.
It would buy *me* flexibility to use python.org build of Python, or my own builds. And not to have to wait for conda to release a build of a new version. Of course, I have specialised needs, so that's not an important consideration for the conda developers. Paul

On Thu, Nov 3, 2016 at 3:02 PM, Paul Moore <p.f.moore@gmail.com> wrote:
it may be possible to define a
standard approach that goes something along the lines of defining a standard location that (somehow) gets added to the load path, and interested parties provide DLLs for external dependencies, which the users can then manually place in those locations.
that's pretty much what conda is :-)
Well, it doesn't feel like that. Maybe we're not understanding each other. Am I able to take a non-conda installation of Python,
no -- not at all -- but there is no "system install" of Python on Windows that your IT folks are telling you that you should use. (except maybe IBM, if I recall from a David Beazley talk) So conda provides that as well. But anyway, I meant _conceptually_ it's the same thing. use conda
to install *just* a set of non-Python DLLs (libxml, libyaml, ...) and then use pip to install wheels for lxml, pyyaml, etc?
If you have a conda python already, then yes, you can install conda packages of various libs, and then build Python packages that depend on them. And now that I think about it -- you could probably: install conda (which WILL install python -- conda needs it itself...), do some munging of environment variables, and use another python install to build wheels, etc. If you got your environment variables set up right -- then building from anywhere should be able to find the conda stuff. But I don't know that you'd get any of the advantages of conda environments this way. I'm still trying to figure out why you'd want to do that on Windows, though -- why not let conda manage your python too? I understand that there currently isn't a way to *build* an lxml wheel that links
to a conda-installed libxml, but that's not the point I'm making - if conda provided a way to manage external DLLs only, then it would be possible in theory to contribute a setup.py fix to a project like lxml that detected and linked to a conda-installed libxml. That single source could then be used to build *both* wheels and conda packages, avoiding the current situation where effort on getting lxml to build is duplicated, once for conda and once for wheel.
see above -- conda is putting DLLs into a standard place -- no reason you couldn't teach other systems to find them. In fact, I did something like this on OS-X a long while back. There was a project (still is!) called "Unix Compatibility Frameworks": http://www.kyngchaos.com/software/frameworks It has a bunch of the dependencies needed for various python packages I wanted to support (matplotlib, PIL, gdal...). So I installed that, then built bdist_mpkgs against it (no wheel back then!). I distributed those packages. In the install instructions, I told folks to go install the "Unix Compatibility Frameworks", and then all my packages would work. Similar concept. However, one "trick" here is multiple dependency versions. A given conda environment can have only one version of a package. If you need different versions, you use different packages. But that would get ugly if you wanted your external wheels to link to a particular package. This all gets easier if you manage the whole stack with one tool -- that's why conda is built that way. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Thu, Nov 3, 2016 at 3:02 PM, Paul Moore <p.f.moore@gmail.com> wrote:
It would buy *me* flexibility to use python.org build of Python, or my own builds. And not to have to wait for conda to release a build of a new version.
you are perfectly free to make your own conda package of python -- it's not that hard. In fact, that's one of the key benefits of conda's approach -- conda manages python itself, so you can easily have a conda environment for the latest development release of python. Or, for that matter, just build python from source inside an environment. And if you want to build a custom python package, you don't even need your own recipe: https://github.com/conda-forge/python-feedstock/tree/master/recipe and conda can provide all the dependencies for you too :-)
Of course, I have specialised needs, so that's not an important consideration for the conda developers.
True -- though the particular need of using a custom-compiled python is actually pretty well supported! Caveat: I've never tried this -- and since conda itself uses Python, you may break conda if you install a python that can't run conda. But maybe not -- conda could be smart enough to use its "root" install of python regardless of what python you have in your environment. In fact, I think that is the case: when I am in an activated conda environment, the "conda" command is a link to the root conda command, so I'm pretty sure it will use the root python to run itself. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Final note after a long thread: Just like Nick pointed out in his original post (if I read it right), the pip vs the conda approach comes down to this: Do you want a system to manage the whole stack? Or do you want a system to manage Python packages? Personally, I think that no matter how you slice it, someone needs to manage the whole stack. I find it highly preferable to manage the whole stack with one tool. And even more so to manage the stack on multiple platforms with the same tool. But others have reasons to manage part of the stack (python itself, for instance) with a particular system, and thus only need a tool to manage the python packages part. Two different deployment use cases, that will probably always exist. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Thu, Nov 3, 2016 at 2:48 PM, Chris Barker <chris.barker@noaa.gov> wrote:
On Thu, Nov 3, 2016 at 3:50 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Or there's the option that's been mentioned before, but never (to my knowledge) developed into a complete proposal, for packaging up external dependencies as wheels.
Folks were working on this at Pycon last spring -- any progress?
I figured out how to make it work in principle; the problem is that it turns out we need to write some nasty code to munge Mach-O files first. It's totally doable, just a bit more of a hurdle than previously expected :-). And then this happened: https://vorpus.org/blog/emerging-from-the-underworld/ so I'm way behind on everything right now. (If anyone wants to help then speak up!) -n -- Nathaniel J. Smith -- https://vorpus.org

On Thu, Nov 3, 2016 at 3:39 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't think there's much chance of any of this ever working on Windows - conda will rule there, and rightly so. Mac OS X seems likely to go the same way, although there's an outside chance brew may pick up some of the otherwise Linux-only capabilities if they prove successful.
this is a really good point then -- we're building a "platform independent" system that, oh, actually is a "a few different Linux vendors" system. Which is a perfectly worthy goal, but it was not at all the goal of conda. I wasn't there, but I think conda started with the goal of supporting three core platforms (that are pretty different) with the same UI, and as much of the same code as possible. And if you want to do that -- it really makes a lot of sense to control as much of the system as possible.
What I'm mainly reacting to here is Chris's apparent position that he sees the entire effort as pointless.
I don't believe I actually ever said that -- though I can see why you'd think so. But for the record, I don't see it as pointless. It's not pointless, just hard,
That I do see -- I see it as hard, so hard that it's likely to hit roadblocks, so we could make better progress with other solutions if that's where people put their energies. And you said yourself earlier in this post that: "I don't think there's much chance of any of this ever working on Windows" One of the "points" for me is to support Windows, OS-X and Linux -- so in that sense -- I guess it is pointless ;-) By the way, I was quite disappointed when I discovered that conda has done literally nothing to make building cross-platform easier -- I understand why, but nonetheless, building can be a real pain. But it's also nice that it has made a nice clean distinction between package management and building, a distinction that is all too blurred in the distutils/setuptools/pip/wheel stack (which, I know, is slowly being untangled...) Not in the general case it won't, no, but it should be entirely
feasible for distro-specific Linux wheels and manylinux1 wheels to be as user-friendly as conda in cases where folks on the distro side are willing to put in the effort to make the integration work properly.
I agree -- though I think the work up front by the distributors will be more.
It may also be feasible to define an ABI for "linuxconda" that's broader than the manylinux1 ABI, so folks can publish conda wheels direct to PyPI, and then pip could define a way for distros to indicate their ABI is "conda compatible" somehow.
Hmm -- now THAT's an interesting idea... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 4 November 2016 at 07:44, Chris Barker <chris.barker@noaa.gov> wrote:
On Thu, Nov 3, 2016 at 3:39 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't think there's much chance of any of this ever working on Windows - conda will rule there, and rightly so. Mac OS X seems likely to go the same way, although there's an outside chance brew may pick up some of the otherwise Linux-only capabilities if they prove successful.
this is a really good point then -- we're building a "platform independent" system that, oh, actually is a "a few different Linux vendors" system. Which is a perfectly worthy goal, but it was not at all the goal of conda.
conda needs raw material from software publishers, and it gets most of that as software published in language specific formats on PyPI and CRAN (which it then redistributes in both source and binary form), *not* through original software being published specifically as conda packages. That latter does happen, but it's the exception rather than the rule. That's *exactly* the same redistribution model that Linux distros use, hence my semi-regular comparison between "I only want to publish and support conda packages" and "We only want to support CentOS/RHEL/Debian stable/Ubuntu LTS/insert your distro of choice". Constraining the platforms we will offer free support for is a 100% reasonable thing for volunteers to do, so it's entirely OK to tell folks that use another platform "Sorry, I can't reproduce that behaviour on a platform I support, so you're going to have to ask for help in a platform specific forum", and then point them at a community support forum like askubuntu.com or ask.fedoraproject.org, or else nudge them in the direction of their vendor (if they're running a commercially supported Python distribution). I do that myself - as a concrete example, I knew the interaction between the -m switch and multiprocessing on Windows was broken for years before I finally fixed it when support for the multiprocessing "spawn" mode behaviour that is used on Windows was added to the CPython Linux builds in Python 3.4.
From a conda perspective though, the difference between Linux and other deployment targets in the current situation is that one of the competing CPython ABI providers is the operating system itself, while on Windows it's mainly other commercial Python redistributors (ActiveState, Enthought), and on Mac OS X it's mainly a community driven package management community (brew) and a community run Python redistributor (pyenv).
You're also up against the fact that most folks using Linux on a client device are doing so by specific choice, and that Linux has been the default supported system for open source projects for 20+ years. This means that, despite what I said above, it's much easier in practice for Python project participants to say they don't support and can't help with PyPM/Enthought/brew/pyenv than it is to say that they can't help folks running their software directly in the Linux system Python. Putting my work hat back on for a moment, I actually wish more people *would* start saying that, as Red Hat actively want people to stop running their own applications in the system Python, and start using Software Collections (either directly or via the Docker images) instead. Sharing a language runtime environment between operating system components and end user applications creates all sorts of maintainability problems (not least of which is the inability to upgrade to new feature releases for fear of breaking end user applications and vice-versa), to the point where Fedora is planning to cede the "/usr/bin/python3" namespace to end users, and start migrating system components out to "/usr/bin/system-python": https://fedoraproject.org/wiki/Changes/System_Python
I wasn't there, but I think conda started with the goal of supporting three core platforms (that are pretty different) with the same UI, and as much of the same code as possible. And if you want to do that -- it really makes a lot of sense to control as much of the system as possible.
Yep, hence my use of the phrase "cross-platform platform" - conda defines a cross-platform API and build infrastructure for a few different ABIs (e.g. x86_64 Linux, x86/x86_64 Windows, Mac OS X). As a bit of additional background on this topic, folks that haven't seen it may want to take a look at my "Python beyond CPython: Adventures in Software Distribution" keynote from SciPy 2014, especially the part starting from around 7:10 about software redistribution networks: https://youtu.be/IVzjVqr_Bzs?t=458 The written speaker notes for that are available at https://bitbucket.org/ncoghlan/misc/src/default/talks/2014-07-scipy/talk.md
It's not pointless, just hard,
That I do see -- I see it as hard, so hard that it's likely to hit road blocks, so we could make better progress with other solutions if that's where people put their energies. And you said yourself earlier in this post that:
"I don't think there's much chance of any of this ever working on Windows"
One of the "points" for me it to support Windows, OS-X and Linux -- so in that sense -- I guess it is pointless ;-)
For Windows and Mac OS X, my typical advice is "just use conda" and pointing folks towards the Software Carpentry installation guide :) The only time I deviate from that advice is if I know someone is specifically interested in getting into professional Python development as a paid gig, as in that context you're likely to need to learn to be flexible about the base runtimes you work with. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2016-11-05 17:43:48 +1000 (+1000), Nick Coghlan wrote: [...]
Putting my work hat back on for a moment, I actually wish more people *would* start saying that, as Red Hat actively want people to stop running their own applications in the system Python, and start using Software Collections (either directly or via the Docker images) instead. Sharing a language runtime environment between operating system components and end user applications creates all sorts of maintainability problems (not least of which is the inability to upgrade to new feature releases for fear of breaking end user applications and vice-versa), to the point where Fedora is planning to cede the "/usr/bin/python3" namespace to end users, and start migrating system components out to "/usr/bin/system-python": https://fedoraproject.org/wiki/Changes/System_Python [...]
It's a grey area, complicated by the fact that many people are writing their software with the intent of also packaging it for/in common distros and so want to make sure it works with the "system Python" present within them. There it looks like Fedora is splitting their Python-using packages along some arbitrary line of "is it a system component?" vs. "is it a user-facing application?" which is probably tractable given the (relatively) limited number of packages in their distribution. I can't fathom how to go about trying to introduce a similar restructuring in, say, Debian. -- Jeremy Stanley

On 6 November 2016 at 00:45, Jeremy Stanley <fungi@yuggoth.org> wrote:
On 2016-11-05 17:43:48 +1000 (+1000), Nick Coghlan wrote: [...]
Putting my work hat back on for a moment, I actually wish more people *would* start saying that, as Red Hat actively want people to stop running their own applications in the system Python, and start using Software Collections (either directly or via the Docker images) instead. Sharing a language runtime environment between operating system components and end user applications creates all sorts of maintainability problems (not least of which is the inability to upgrade to new feature releases for fear of breaking end user applications and vice-versa), to the point where Fedora is planning to cede the "/usr/bin/python3" namespace to end users, and start migrating system components out to "/usr/bin/system-python": https://fedoraproject.org/wiki/Changes/System_Python [...]
It's a grey area, complicated by the fact that many people are writing their software with the intent of also packaging it for/in common distros and so want to make sure it works with the "system Python" present within them.
There's a lot more folks starting to challenge the idea of the monolithic all-inclusive distro as a sensible software management technique, though. The most popular client distro (Android) maintains a strict separation between the vendor provided platform and the sandboxed applications running on top of it, and that turns out to provide a lot of desirable characteristics so long as your build and publication infrastructure can cope with respinning the world for security updates. Go back 10 years and "rebuild the world on demand" simply wasn't feasible, but it's becoming a lot more viable now, especially as the mainstream Linux sandboxing capabilities continue to improve.
There it looks like Fedora is splitting their Python-using packages along some arbitrary line of "is it a system component?" vs. "is it a user-facing application?" which is probably tractable given the (relatively) limited number of packages in their distribution.
There are still more than five thousand Python-using packages in the Fedora repos, and most of them are likely to remain classified as non-system optional addons. However, actual core system components like dnf, the Anaconda installer, firewalld, SELinux policy management, etc are likely to migrate - the idea being that "sudo dnf uninstall python3" shouldn't render your system fundamentally unusable. (Part of the motivation here is also simply size - Fedora hasn't historically had the python/python-minimal split that Debian has, which is becoming more of a problem as folks try to minimise base image sizes)
I can't fathom how to go about trying to introduce a similar restructuring in, say, Debian.
The problem isn't so much with the components that are in the distro, as it is the end user applications and scripts that aren't published online anywhere at all, let alone as distro packages. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 3 November 2016 at 22:10, Nathaniel Smith <njs@pobox.com> wrote:
On Nov 3, 2016 1:40 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
And then it segfaults because it turns out that your library named <X> is not abi compatible with my library named <X>. Or it would have been if you had the right version, but distros don't agree on how to express version numbers either. (Just think about epochs.) Or we're on Windows, so it's interesting to know that we need a library named <X>, but what are we supposed to do with that information exactly?
Well, we weren't trying to fix the problem with incompatible ABIs: the thoughts that I recall were primarily around getting development header files in place, to permit building (and then caching) local binary wheels. The ports system, for example, works very robustly, even though (at least historically) it required building everything from scratch. That was my inspiration. The manylinux approach seems better to me for solving the 'install a binary wheel in many places' problem, because of the issues with variance across higher layered libraries you mention :).
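The "build once locally, then cache and reuse binary wheels" workflow is already expressible with pip alone; a minimal sketch (not from the thread - the package name and cache directory are placeholders) might look like:

    import subprocess
    import sys
    from pathlib import Path

    WHEEL_CACHE = Path.home() / ".cache" / "localwheels"  # placeholder cache location
    PACKAGE = "lxml"  # placeholder: any dependency you build yourself from source

    def build_and_cache(package):
        """Build binary wheels for the package and its dependencies into the local cache."""
        WHEEL_CACHE.mkdir(parents=True, exist_ok=True)
        subprocess.check_call([
            sys.executable, "-m", "pip", "wheel",
            "--wheel-dir", str(WHEEL_CACHE), package,
        ])

    def install_from_cache(package):
        """Install using only the locally built wheels, without contacting PyPI."""
        subprocess.check_call([
            sys.executable, "-m", "pip", "install",
            "--no-index", "--find-links", str(WHEEL_CACHE), package,
        ])

    if __name__ == "__main__":
        build_and_cache(PACKAGE)
        install_from_cache(PACKAGE)

The first run pays the compilation cost (so the development headers still need to be present); subsequent installs into other environments reuse the cached wheels.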
Again, I don't want to just be throwing around stop energy -- if people want to tackle these problems then I wish them luck. But I don't think we should be hand waving this as a basically solved problem that just needs a bit of coding, because that also can create stop energy that blocks an honest evaluation of alternative approaches.
+1. ...> dnf/apt/pacman/chocolatey/whatever and make my wheel work everywhere -- and
that this will be a viable alternative to conda.
As a distribution of sources, I think it's very viable with the approach above: indeed we do similar things using bindep in OpenStack and that seems to be working out pretty well. -Rob

On 6 November 2016 at 14:44, Robert Collins <robertc@robertcollins.net> wrote:
On 3 November 2016 at 22:10, Nathaniel Smith <njs@pobox.com> wrote: ...> dnf/apt/pacman/chocolatey/whatever and make my wheel work everywhere -- and
that this will be a viable alternative to conda.
As a distribution of sources, I think it's very viable with the approach above: indeed we do similar things using bindep in OpenStack and that seems to be working out pretty well.
Right, and I guess we should also clearly distinguish between "publisher provided binaries" and "redistributor provided binaries", as I think a potentially good way to go long term for Linux might look something like this:

* manylinuxN

A relatively narrow multi-distro ABI with updates driven mainly by when old CentOS versions go EOL.

We already have manylinux1, and it covers many cases, but not everything (e.g. lxml would need to statically link libxml2, libxslt and maybe a few other libraries to be manylinux1 compatible).

Most publishers aren't going to want to go beyond providing manylinux binaries, since the combinatorial explosion of possibilities beyond that point makes doing so impractical.

* linux_{arch}_{distro}_{version}

This is the scheme Nate Coraor adopted in order to migrate Galaxy Project to using wheel files for binary caching: https://docs.galaxyproject.org/en/master/admin/framework_dependencies.html#g...

The really neat thing about that approach is that it's primarily local config file driven: https://docs.galaxyproject.org/en/master/admin/framework_dependencies.html#w...

What that means is that if we decided to standardise that approach, then conda could readily declare itself as a distinct platform in its own right - wheels built with a conda Python install could get tagged as conda wheels (since they might depend on external libraries from the conda ecosystem), and folks running conda environments could consume those wheels without needing to rebuild them. You'd also be able to share a single wheelhouse between the system Python and conda installs without inadvertently clobbering cached wheel files with incompatible versions.

Being config file driven would also allow this to be extended to Mac OS X and Windows fairly readily, where the platform ABI implicitly defined by the python.org CPython binaries is even narrower than that of manylinux1.

This scheme would mostly be useful in the near term for the way Galaxy Project is using it: maintainers of a controlled environment using it for managed caching of built artifacts that can then be installed with pip. However, it *might* eventually see use in another context as well: redistributors using it to provide prebuilt venv-compatible wheel files for their particular Python distribution.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
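To make the linux_{arch}_{distro}_{version} idea a bit more concrete, here is a rough sketch (my own illustration only; the exact tag format used by the Galaxy scheme may differ) of deriving such a tag from /etc/os-release and the machine architecture using just the standard library:

    import platform
    import re

    def read_os_release(path="/etc/os-release"):
        """Parse ID and VERSION_ID from the os-release file (Linux only)."""
        info = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    info[key] = value.strip('"')
        return info

    def guess_platform_tag():
        """Build a tag shaped roughly like linux_{arch}_{distro}_{version}."""
        os_release = read_os_release()
        distro = os_release.get("ID", "unknown")
        version = os_release.get("VERSION_ID", "0")
        arch = platform.machine()  # e.g. x86_64
        tag = "linux_{}_{}_{}".format(arch, distro, version)
        # Wheel platform tags only allow alphanumerics and underscores,
        # so normalise anything else (e.g. the "." in "16.04").
        return re.sub(r"[^A-Za-z0-9_]", "_", tag)

    if __name__ == "__main__":
        print(guess_platform_tag())  # e.g. linux_x86_64_ubuntu_16_04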

On 3 November 2016 at 21:40, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 November 2016 at 05:28, Nathaniel Smith <njs@pobox.com> wrote:
On Nov 2, 2016 9:52 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
Tennessee Leeuwenberg started a draft PEP for that first part last year: https://github.com/pypa/interoperability-peps/pull/30/files
dnf/yum, apt, brew, conda, et al all *work around* the current lack of such a system by asking humans (aka "downstream package maintainers") to supply the additional information by hand in a platform specific format.
To be fair, though, it's not yet clear whether such a system is actually possible. AFAIK no one has ever managed to reliably translate between different languages that Linux distros use for describing environment constraints, never mind handling the places where they're genuinely irreconcilable (e.g. the way different distro openssl packages have incompatible ABIs), plus other operating systems too.
The main problem with mapping between Debian/RPM/conda etc in the general case is that the dependencies are generally expressed in terms of *names* rather than runtime actions, and you also encounter problems with different dependency management ecosystems splitting up (or aggregating!) different upstream components differently. This means you end up with a situation that's more like a lossy transcoding between MP3 and OGG Vorbis or vice-versa than it is a pristine encoding to either from a losslessly encoded FLAC or WAV file.
The approach Tennessee and Robert Collins came up with (which still sounds sensible to me) is that instead of dependencies on particular external components, what we want to be able express is instead a range of *actions* that the software is going to take:
- "I am going to dynamically load a library named <X>" - "I am going to execute a subprocess for a command named <Y>"
And rather than expecting folks to figure that stuff out for themselves, you'd use tools like auditwheel and strace to find ways to generate it and/or validate it.
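As a rough illustration of that "declared actions" idea (the declaration format below is purely hypothetical, not an agreed metadata standard), checking such declarations against a target system needs nothing more than the standard library:

    import ctypes.util
    import shutil

    # Hypothetical declarations of the two kinds of runtime actions described
    # above; this is an illustration, not a proposed metadata format.
    EXTERNAL_REQUIREMENTS = {
        "dynamic_libraries": ["xml2", "xslt"],   # "I am going to dlopen libxml2/libxslt"
        "subprocess_commands": ["git"],          # "I am going to run the 'git' command"
    }

    def check_external_requirements(reqs):
        """Report which declared external dependencies are missing on this system."""
        missing = []
        for name in reqs.get("dynamic_libraries", []):
            if ctypes.util.find_library(name) is None:
                missing.append("library: " + name)
        for cmd in reqs.get("subprocess_commands", []):
            if shutil.which(cmd) is None:
                missing.append("command: " + cmd)
        return missing

    if __name__ == "__main__":
        for problem in check_external_requirements(EXTERNAL_REQUIREMENTS):
            print("missing external dependency -", problem)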
I don't think that's what we had proposed, though it's similar. My recollection is proposing that dependencies on system components be expressed as dependencies on the specific files rather than on abstract names. Packaging systems generally have metadata to map files to packages - and so a packaging-specific driver can map 'libfoo1.2.so' into libfoo-1.2 on Debian, or libfoo1.2 on a slightly different naming style platform. The use of a tuple (type, relpath) was to abstract away from different filesystem hierarchy layouts. ABIs for libraries are well established and under the control of upstream, so there's no need for random audit tools :). -Rob
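For the file-based variant described here, the platform-specific drivers can be thin wrappers around the existing package manager queries; a minimal sketch (assuming dpkg or rpm is available, with a placeholder library path) might look like:

    import shutil
    import subprocess

    def owning_package(path):
        """Map a file path to the distro package that owns it, on dpkg or rpm systems.

        This mirrors the idea of resolving a (type, relpath) dependency through a
        platform-specific driver; error handling is deliberately minimal here.
        """
        if shutil.which("dpkg"):
            # dpkg -S output looks like "libxml2:amd64: /usr/lib/.../libxml2.so.2"
            out = subprocess.check_output(["dpkg", "-S", path], universal_newlines=True)
            return out.split(":")[0]
        if shutil.which("rpm"):
            # rpm -qf output is the owning package's name-version-release string
            out = subprocess.check_output(["rpm", "-qf", path], universal_newlines=True)
            return out.strip()
        raise RuntimeError("no known package manager driver for this platform")

    if __name__ == "__main__":
        # Placeholder path: a shared library the project intends to dlopen.
        print(owning_package("/usr/lib/x86_64-linux-gnu/libxml2.so.2"))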

Am 01.11.2016 um 10:50 schrieb Nick Coghlan:
On 1 November 2016 at 17:30, Thomas Güttler <guettliml@thomas-guettler.de> wrote:
Am 17.09.2016 um 12:29 schrieb Nick Coghlan:
Hi folks,
Prompted by a few posts I read recently about the current state of the Python packaging ecosystem, I figured it made sense to put together an article summarising my own perspective on the current state of things: http://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.ht...
Thank you for this summarizing article. Yes, a lot was done during the last months.
I liked the part "My core software ecosystem design philosophy" a lot, since it explains that both parties (software consumer and software publisher) want it to be simple and easy.
About conda: if pip and conda overlap in some point. Why not implement this in a reusable library which gets used by conda and pip?
For the parts where they genuinely overlap, conda is already able to just use pip, or else the same libraries that pip uses. For the platform management pieces (SAT solving for conda repositories, converting PyPI packages to conda ones, language independent environment management), what conda does is outside the scope of what pip supports anyway.
About funding: Looking for more income is one way to solve this. Why not look into the other direction: How to reduce costs?
Thanks to donated infrastructure, the direct costs to the PSF are incredibly low already. Donald went into some detail on that in https://caremad.io/posts/2016/05/powering-pypi/ and that's mostly still accurate (although his funded role with HPE ended recently)
Heading "Making the presence of a compiler on end user systems optional". Here I just can say: Thank you very much. I guess it was a lot of hard work to make this all simple and easy for the software consumers and publishers. Thank you.
I wrote some lines, but I deleted my thoughts about the topic "Automating wheel creation", since I am a afraid it could raise bad mood in this list again. That's not my goal.
I currently see 3 main ways that could eventually happen:
- the PSF sorts out its general sustainability concerns to the point where it believes it can credibly maintain such a service on the community's behalf
- the conda-forge folks branch out into offering wheel building as well (so it becomes a matter of "publish your Python projects for the user level conda platform, get platform independent Python wheels as well")
- someone builds such a service independently of the current PyPI infrastructure team, and convinces package publishers to start using it
There's also a 4th variant, which is getting to a point where someone figures out a pushbutton solution for a build pipeline in a public PaaS that offers a decent free tier. This is potentially one of the more promising options, since it means the sustainability risks related to future growth in demand accrue to the PaaS providers, rather than to the PSF. However, it's somewhat gated on the Warehouse migration, since you really want API token support for that kind of automation, which is something the current PyPI code base doesn't have, and isn't going to gain.
I like this 4th variant. I guess most companies which use Python in a professional way already run their own PyPI server. I am unsure whether a suitable public PaaS for this exists. Maybe a script which runs a container on Linux is enough - at least enough to build Linux-only wheels. I guess most people have root access to a Linux server somewhere. Regards, Thomas -- Thomas Guettler http://www.thomas-guettler.de/
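A script driving a container is indeed roughly all it takes for the Linux-only case; here is a minimal sketch (the project path is a placeholder, and the auditwheel step only matters for wheels containing compiled extensions) using the official manylinux1 build image:

    import subprocess

    PROJECT_DIR = "/home/me/src/myproject"    # placeholder: source checkout on the host
    IMAGE = "quay.io/pypa/manylinux1_x86_64"  # official manylinux1 build image

    # Inside the container: build wheels with one of the bundled CPython
    # interpreters, then run auditwheel so external shared libraries get
    # vendored in and the wheels are retagged as manylinux1.
    BUILD_SCRIPT = (
        "/opt/python/cp35-cp35m/bin/pip wheel /io -w /io/dist && "
        "for whl in /io/dist/*.whl; do auditwheel repair \"$whl\" -w /io/wheelhouse; done"
    )

    subprocess.check_call([
        "docker", "run", "--rm",
        "-v", PROJECT_DIR + ":/io",
        IMAGE, "bash", "-c", BUILD_SCRIPT,
    ])

The resulting wheelhouse/ directory on the host then just needs to be uploaded (e.g. with twine) or served from an internal index.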

Hi, On Tue, Nov 1, 2016 at 10:35 AM, Thomas Güttler <guettliml@thomas-guettler.de> wrote:
Am 01.11.2016 um 10:50 schrieb Nick Coghlan:
On 1 November 2016 at 17:30, Thomas Güttler <guettliml@thomas-guettler.de> wrote:
Am 17.09.2016 um 12:29 schrieb Nick Coghlan:
Hi folks,
Prompted by a few posts I read recently about the current state of the Python packaging ecosystem, I figured it made sense to put together an article summarising my own perspective on the current state of things:
http://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.ht...
Thank you for this summarizing article. Yes, a lot was done during the last months.
I liked the part "My core software ecosystem design philosophy" a lot, since it explains that both parties (software consumer and software publisher) want it to be simple and easy.
About conda: if pip and conda overlap in some point. Why not implement this in a reusable library which gets used by conda and pip?
For the parts where they genuinely overlap, conda is already able to just use pip, or else the same libraries that pip uses. For the platform management pieces (SAT solving for conda repositories, converting PyPI packages to conda ones, language independent environment management), what conda does is outside the scope of what pip supports anyway.
About funding: Looking for more income is one way to solve this. Why not look into the other direction: How to reduce costs?
Thanks to donated infrastructure, the direct costs to the PSF are incredibly low already. Donald went into some detail on that in https://caremad.io/posts/2016/05/powering-pypi/ and that's mostly still accurate (although his funded role with HPE ended recently)
Heading "Making the presence of a compiler on end user systems optional". Here I just can say: Thank you very much. I guess it was a lot of hard work to make this all simple and easy for the software consumers and publishers. Thank you.
I wrote some lines, but I deleted my thoughts about the topic "Automating wheel creation", since I am a afraid it could raise bad mood in this list again. That's not my goal.
I currently see 3 main ways that could eventually happen:
- the PSF sorts out its general sustainability concerns to the point where it believes it can credibly maintain such a service on the community's behalf
- the conda-forge folks branch out into offering wheel building as well (so it becomes a matter of "publish your Python projects for the user level conda platform, get platform independent Python wheels as well")
- someone builds such a service independently of the current PyPI infrastructure team, and convinces package publishers to start using it
There's also a 4th variant, which is getting to a point where someone figures out a pushbutton solution for a build pipeline in a public PaaS that offers a decent free tier. This is potentially one of the more promising options, since it means the sustainability risks related to future growth in demand accrue to the PaaS providers, rather than to the PSF. However, it's somewhat gated on the Warehouse migration, since you really want API token support for that kind of automation, which is something the current PyPI code base doesn't have, and isn't going to gain.
I like this 4th variant. I guess most companies which use Python in a professional way already run their own PyPI server.
I am unsure whether a suitable public PaaS for this exists. Maybe a script which runs a container on Linux is enough - at least enough to build Linux-only wheels. I guess most people have root access to a Linux server somewhere.
I wrote some scripts to do this:

https://github.com/matthew-brett/multibuild

See the README there for standard usage. For simple projects the project configuration is only a few lines. Quite a few projects use multibuild and travis-ci to build 32- and 64-bit manylinux and OSX wheels, here are some (more complex) examples:

https://travis-ci.org/MacPython/numpy-wheels
https://travis-ci.org/MacPython/scipy-wheels
https://travis-ci.org/MacPython/matplotlib-wheels
https://travis-ci.org/scikit-image/scikit-image-wheels
https://travis-ci.org/MacPython/pandas-wheels
https://travis-ci.org/python-pillow/pillow-wheels
https://travis-ci.org/MacPython/cython-wheels

The travis-ci jobs mostly upload to a CDN on a Rackspace account donated to scikit-learn, from which the release managers can upload to pypi with a few lines at the terminal.

So, for many of us scientific Python projects, and for Linux and OSX, this problem has a reasonable solution in principle.

One hurdle at the moment is that new projects have to either get an encrypted key to the scikit-learn account from one of us with the credentials, or roll their own mechanism for uploading the built wheels. It would be very helpful to have some standard pypa account mechanism for this. That would allow us to standardize the tools and permission mechanism.

Cheers, Matthew

Am 01.11.2016 um 17:50 schrieb Matthew Brett:
Hi,
On Tue, Nov 1, 2016 at 10:35 AM, Thomas Güttler <guettliml@thomas-guettler.de> wrote:
Am 01.11.2016 um 10:50 schrieb Nick Coghlan:
On 1 November 2016 at 17:30, Thomas Güttler <guettliml@thomas-guettler.de> wrote:
Am 17.09.2016 um 12:29 schrieb Nick Coghlan:
Hi folks,
Prompted by a few posts I read recently about the current state of the Python packaging ecosystem, I figured it made sense to put together an article summarising my own perspective on the current state of things:
http://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.ht...
Thank you for this summarizing article. Yes, a lot was done during the last months.
I liked the part "My core software ecosystem design philosophy" a lot, since it explains that both parties (software consumer and software publisher) want it to be simple and easy.
About conda: if pip and conda overlap in some point. Why not implement this in a reusable library which gets used by conda and pip?
For the parts where they genuinely overlap, conda is already able to just use pip, or else the same libraries that pip uses. For the platform management pieces (SAT solving for conda repositories, converting PyPI packages to conda ones, language independent environment management), what conda does is outside the scope of what pip supports anyway.
About funding: Looking for more income is one way to solve this. Why not look into the other direction: How to reduce costs?
Thanks to donated infrastructure, the direct costs to the PSF are incredibly low already. Donald went into some detail on that in https://caremad.io/posts/2016/05/powering-pypi/ and that's mostly still accurate (although his funded role with HPE ended recently)
Heading "Making the presence of a compiler on end user systems optional". Here I just can say: Thank you very much. I guess it was a lot of hard work to make this all simple and easy for the software consumers and publishers. Thank you.
I wrote some lines, but I deleted my thoughts about the topic "Automating wheel creation", since I am a afraid it could raise bad mood in this list again. That's not my goal.
I currently see 3 main ways that could eventually happen:
- the PSF sorts out its general sustainability concerns to the point where it believes it can credibly maintain such a service on the community's behalf
- the conda-forge folks branch out into offering wheel building as well (so it becomes a matter of "publish your Python projects for the user level conda platform, get platform independent Python wheels as well")
- someone builds such a service independently of the current PyPI infrastructure team, and convinces package publishers to start using it
There's also a 4th variant, which is getting to a point where someone figures out a pushbutton solution for a build pipeline in a public PaaS that offers a decent free tier. This is potentially one of the more promising options, since it means the sustainability risks related to future growth in demand accrue to the PaaS providers, rather than to the PSF. However, it's somewhat gated on the Warehouse migration, since you really want API token support for that kind of automation, which is something the current PyPI code base doesn't have, and isn't going to gain.
I like this 4th variant. I guess most companies which use Python in a professional way already run their own PyPI server.
I am unsure whether a suitable public PaaS for this exists. Maybe a script which runs a container on Linux is enough - at least enough to build Linux-only wheels. I guess most people have root access to a Linux server somewhere.
I wrote some scripts to do this:
https://github.com/matthew-brett/multibuild
See the README there for standard usage. For simple projects the project configuration is only a few lines. Quite a few projects use multibuild and travis-ci to build 32- and 64-bit manylinux and OSX wheels, here are some (more complex) examples:
https://travis-ci.org/MacPython/numpy-wheels
https://travis-ci.org/MacPython/scipy-wheels
https://travis-ci.org/MacPython/matplotlib-wheels
https://travis-ci.org/scikit-image/scikit-image-wheels
https://travis-ci.org/MacPython/pandas-wheels
https://travis-ci.org/python-pillow/pillow-wheels
https://travis-ci.org/MacPython/cython-wheels
The travis-ci jobs mostly upload to a CDN on a Rackspace account donated to scikit-learn, from which the release managers can upload to pypi with a few lines at the terminal.
So, for many of us scientific Python projects, and for Linux and OSX, this problem has a reasonable solution in principle.
One hurdle at the moment, is that new projects have to either get an encrypted key to the scikit-learn account from one of us with the credentials, or roll their own mechanism of uploading the built wheels. It would be very helpful to have some standard pypa account mechanism for this. That would allow us to standardize the tools and permission mechanism,
I see another hurdle. travis-ci is very widespread, but AFAIK it is not open source: https://travis-ci.com/plans -- http://www.thomas-guettler.de/

On Wed, Nov 2, 2016 at 1:00 AM, Thomas Güttler <guettliml@thomas-guettler.de> wrote:
I see another hurdle. travis-ci is very widespread, but AFAIK it is not open source:
It is open source: https://github.com/travis-ci Sadly, the infrastructure it takes to operate it is not Free.

On 2 November 2016 at 23:13, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
On Wed, Nov 2, 2016 at 1:00 AM, Thomas Güttler <guettliml@thomas-guettler.de> wrote:
I see another hurdle. travis-ci is very widespread, but AFAIK it is not open source:
It is open source: https://github.com/travis-ci
Sadly, the infrastructure it takes to operate it is not Free.
This is also an area where I'm fine with recommending freemium solutions if they're the lowest barrier to entry option for new users, and "Use GitHub + Travis CI" qualifies on that front. The CPython repo hosting debate forced me to seriously consider when in someone's journey into open source and free software it makes sense to introduce the notion of "Free software needs free tools", and I came to the conclusion that in most cases the right answer is "After they discover the concept on their own and want to know more about it". My hard requirement is that folks using completely open source/free software stacks be *supported*, and that's not at risk here (since there are plenty of ways to run your own private build service). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Nov 03, 2016, at 12:54 AM, Nick Coghlan wrote:
This is also an area where I'm fine with recommending freemium solutions if they're the lowest barrier to entry option for new users, and "Use GitHub + Travis CI" qualifies on that front.
I won't rehash the GitHub/GitLab debate, but in some of my projects (hosted on GH) I've had to ditch Travis because of limitations on that platform. Specifically, I needed to run various tests on an exact specification of various Ubuntu platforms, e.g. does X run on an up-to-date Ubuntu Y.Z? I originally used Docker for this, but our projects had additional constraints, such as needing to bind-mount, which aren't supported on the Travis+Docker platform. So we ended up ditching the Travis integration and hooking our test suite into the Ubuntu autopkgtest system (which is nearly identical to the Debian autopkgtest system but runs on Ubuntu infrastructure). Python may not be affected by similar constraints, but it is worth keeping in mind. Travis isn't a perfect technical solution for all projects, but it may be good enough for Python. Cheers, -Barry

On Nov 3, 2016, at 10:17 AM, Barry Warsaw <barry@python.org> wrote:
On Nov 03, 2016, at 12:54 AM, Nick Coghlan wrote:
This is also an area where I'm fine with recommending freemium solutions if they're the lowest barrier to entry option for new users, and "Use GitHub + Travis CI" qualifies on that front.
I won't rehash the GitHub/GitLab debate, but in some of my projects (hosted on GH) I've had to ditch Travis because of limitations on that platform. Specifically, I needed to run various tests on an exact specification of various Ubuntu platforms, e.g. does X run on an up-to-date Ubuntu Y.Z?
I originally used Docker for this, but our projects had additional constraints, such as needing to bind-mount, which aren't supported on the Travis+Docker platform. So we ended up ditching the Travis integration and hooking our test suite into the Ubuntu autopkgtest system (which is nearly identical to the Debian autopkgtest system but runs on Ubuntu infrastructure).
Python may not be affected by similar constraints, but it is worth keeping in mind. Travis isn't a perfect technical solution for all projects, but it may be good enough for Python.
I think phrasing this in terms of "perfect" and "good enough" presents a highly misleading framing. Examined in this fashion, of course we may reluctantly use the "good enough" option, but don't we want the best option? A better way to look at it is cost vs. benefit. How much does it cost you in terms of time and money to run and administer the full spectrum of "real" operating systems X.Z that you wish to support? How much does it cost in terms of waiting for all that extra build infrastructure to run all the time? How much additional confidence and assurance that it will work does that buy you, over the confidence of passing tests within a docker container? Is that additional confidence worth the investment of resources?

Of course, volunteer-driven projects are not concerned directly with top-level management allocation of ostensibly fungible resources, and so a hard "costly" solution that someone is interested in and committed to is far less expensive than a "cheap" solution that everyone finds boring, so we have to take that into account as well.

As it happens, Twisted has a massive investment in existing Buildbot CI infrastructure _as well as_ Travis and Appveyor. Travis and Appveyor address something that our CI can't, which is allowing unauthenticated builds from randos issuing their first pull requests. This gives contributors much faster feedback, which is adequate for the majority of changes. However, many of our ancillary projects, which do not have as many platform-sensitive components, are built using Travis only, and that's a very good compromise for them. It has allowed us to maintain a much larger and more diverse ecosystem with a much smaller team than we used to be able to.

In the future, we may have to move to a different CI service, but I can tell you for sure that 90% of the work involved in getting builds to run on Travis is transferrable to any platform that can run a shell script. There's a bit of YAML configuration we would need to replicate, and we might have to do some fancy dancing with Docker to get other ancillary services run on the backend in some other way, but I would not worry about vendor lock-in at all for this sort of service. Probably, the amount of time and energy on system maintenance that Travis saves us in a given week is enough to balance out all the possible future migration work.

-glyph

On Nov 03, 2016, at 11:08 AM, Glyph Lefkowitz wrote:
I think phrasing this in terms of "perfect" and "good enough" presents a highly misleading framing. Examined in this fashion, of course we may reluctantly use the "good enough" option, but don't we want the best option?
What are the criteria for "best"? I'm not saying don't use Travis, I'm just trying to express that there are technical limitations, which is almost definitely true of any CI infrastructure. If those limitations don't affect your project, great, go for it! Cheers, -Barry

I think we're drifting pretty far off topic here... IIRC the original discussion was about whether the travis-ci infrastructure could be suborned to provide an sdist->wheel autobuilding service for pypi. (Answer: maybe, though it would be pretty awkward, and no one seems to be jumping up to make it happen.) On Nov 3, 2016 11:28 AM, "Barry Warsaw" <barry@python.org> wrote:
On Nov 03, 2016, at 11:08 AM, Glyph Lefkowitz wrote:
I think phrasing this in terms of "perfect" and "good enough" presents a highly misleading framing. Examined in this fashion, of course we may reluctantly use the "good enough" option, but don't we want the best option?
What are the criteria for "best"?
I'm not saying don't use Travis, I'm just trying to express that there are technical limitations, which is almost definitely true of any CI infrastructure. If those limitations don't affect your project, great, go for it!
Cheers, -Barry

I don't know if it has been mentioned before, but Travis already provides a way to automatically package and upload sdists and wheels to PyPI: https://docs.travis-ci.com/user/deployment/pypi/ I've been using it myself in many projects and it has worked quite well. Granted, I haven't had to deal with binary wheels yet, but I think Travis should be able to handle creation of macOS and manylinux1 wheels. 03.11.2016, 22:07, Nathaniel Smith wrote:
I think we're drifting pretty far off topic here... IIRC the original discussion was about whether the travis-ci infrastructure could be suborned to provide an sdist->wheel autobuilding service for pypi. (Answer: maybe, though it would be pretty awkward, and no one seems to be jumping up to make it happen.)
On Nov 3, 2016 11:28 AM, "Barry Warsaw" <barry@python.org <mailto:barry@python.org>> wrote:
On Nov 03, 2016, at 11:08 AM, Glyph Lefkowitz wrote:
I think phrasing this in terms of "perfect" and "good enough" presents a highly misleading framing. Examined in this fashion, of course we may reluctantly use the "good enough" option, but don't we want the best option?
What are the criteria for "best"?
I'm not saying don't use Travis, I'm just trying to express that there are technical limitations, which is almost definitely true of any CI infrastructure. If those limitations don't affect your project, great, go for it!
Cheers, -Barry

On 4 November 2016 at 06:07, Nathaniel Smith <njs@pobox.com> wrote:
I think we're drifting pretty far off topic here... IIRC the original discussion was about whether the travis-ci infrastructure could be suborned to provide an sdist->wheel autobuilding service for pypi. (Answer: maybe, though it would be pretty awkward, and no one seems to be jumping up to make it happen.)
The hard part of designing any such system isn't so much the building process, it's the authentication, authorisation and trust management for the release publication step. At the moment, that amounts to "Give the service your PyPI password, so PyPI will 100% trust that they're you" due to limitations on the PyPI side of things, and "Hello web service developer, I grant you 100% authority to impersonate me to the rest of the world however you like" is a questionable idea in any circumstance, let alone when we're talking about publishing software.

Since we don't currently provide end-to-end package signing, PyPI initiated builds would solve the trust problem by having PyPI trust *itself*, and only require user credentials for the initial source upload. This has the major downside that "safely" running arbitrary code from unknown publishers is a Hard Problem, which is one of the big reasons that Linux distros put so many hurdles in the way of becoming a package maintainer (i.e. package maintainers get to run arbitrary code not just inside the distro build system but also on end user machines, usually with elevated privileges, so you want to establish a pretty high level of trust before letting people do it). If I understand correctly, conda-forge works on the same basic principle - reviewing the publishers before granting them publication access, rather than defending against arbitrarily malicious code at build time.

A more promising long term path is trust federation, which many folks will already be familiar with through granting other services access to their GitHub repositories, or using Twitter/Facebook/Google/et al to sign into various systems. That's not going to be a quick fix though, as it's contingent on sorting out the Warehouse migration challenges, and those are already significant enough without piling additional proposed changes on top of the already pending work.

However, something that could potentially be achieved in the near term given folks interested enough in the idea to set about designing it would be a default recommendation for a Travis CI + Appveyor + GitHub Releases based setup that automated everything *except* the final upload to PyPI, but then also offered a relatively simple way for folks to pull their built artifacts from GitHub and push them to PyPI (such that their login credentials never left their local system). Folks that care enough about who hosts their source code to want to avoid GitHub, or have complex enough build system needs that Travis CI isn't sufficient, are likely to be technically sophisticated enough to adapt service specific instructions to their particular circumstances, and any such design work now would be a useful template for full automation at some point in the future.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
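A rough sketch of that last workflow (the repository name and tag are placeholders; this assumes the requests library and the twine CLI are installed): CI attaches the built artifacts to a GitHub release, and the publisher later pulls them down and uploads from their own machine, so the PyPI credentials never leave that machine.

    import glob
    import os
    import subprocess

    import requests

    REPO = "example-org/example-project"  # placeholder repository
    TAG = "v1.0.0"                        # placeholder release tag
    DEST = "dist"

    def download_release_assets(repo, tag, dest):
        """Fetch the wheels/sdists that CI attached to a GitHub release."""
        os.makedirs(dest, exist_ok=True)
        url = "https://api.github.com/repos/{}/releases/tags/{}".format(repo, tag)
        release = requests.get(url).json()
        for asset in release.get("assets", []):
            name = asset["name"]
            if name.endswith((".whl", ".tar.gz")):
                data = requests.get(asset["browser_download_url"]).content
                with open(os.path.join(dest, name), "wb") as f:
                    f.write(data)

    if __name__ == "__main__":
        download_release_assets(REPO, TAG, DEST)
        # twine prompts for credentials (or reads ~/.pypirc) and does the upload locally.
        subprocess.check_call(["twine", "upload"] + glob.glob(os.path.join(DEST, "*")))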

On Saturday, November 5, 2016, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 4 November 2016 at 06:07, Nathaniel Smith <njs@pobox.com> wrote:
I think we're drifting pretty far off topic here... IIRC the original discussion was about whether the travis-ci infrastructure could be suborned to provide an sdist->wheel autobuilding service for pypi. (Answer: maybe, though it would be pretty awkward, and no one seems to be jumping up to make it happen.)
The hard part of designing any such system isn't so much the building process, it's the authentication, authorisation and trust management for the release publication step. At the moment, that amounts to "Give the service your PyPI password, so PyPI will 100% trust that they're you" due to limitations on the PyPI side of things, and "Hello web service developer, I grant you 100% authority to impersonate me to the rest of the world however you like" is a questionable idea in any circumstance, let alone when we're talking about publishing software.
Since we don't currently provide end-to-end package signing, PyPI initiated builds would solve the trust problem by having PyPI trust *itself*, and only require user credentials for the initial source upload. This has the major downside that "safely" running arbitrary code from unknown publishers is a Hard Problem, which is one of the big reasons that Linux distros put so many hurdles in the way of becoming a package maintainer (i.e. package maintainers get to run arbitrary code not just inside the distro build system but also on end user machines, usually with elevated privileges, so you want to establish a pretty high level of trust before letting people do it). If I understand correctly, conda-forge works on the same basic principle - reviewing the publishers before granting them publication access, rather than defending against arbitrarily malicious code at build time.
- https://conda-forge.github.io
- https://github.com/conda-forge
- https://github.com/conda-forge/feedstocks
- https://github.com/conda-forge/conda-smithy
A more promising long term path is trust federation, which many folks will already be familiar with through granting other services access to their GitHub repositories, or using Twitter/Facebook/Google/et al to sign into various systems. That's not going to be a quick fix though, as it's contingent on sorting out the Warehouse migration challenges, and those are already significant enough without piling additional proposed changes on top of the already pending work.
- [ ] Warehouse: ENH,SEC: A table, form, API for creating and revoking OAuth authz
  - (project, key, UPLOAD_RELEASE)
  - key renewal date

There are a few existing OAuth Server libraries for pyramid (Warehouse):

- https://github.com/sneridagh/osiris
- https://github.com/tilgovi/pyramid-oauthlib
- https://github.com/elliotpeele/pyramid_oauth2_provider

- [ ] CI Release utility secrets:
  - VCS commit signature checking keyring
  - Package signing key (GPG ASC)
  - Package signature pubkey

I just found these:

- https://gist.github.com/audreyr/5990987
- https://github.com/michaeljoseph/changes
- https://pypi.python.org/pypi/jarn.mkrelease
  - scripted GPG
- https://caremad.io/posts/2013/07/packaging-signing-not-holy-grail/
  - SHOULD have OOB keyring dist
- https://github.com/pypa/twine/issues/157
  - twine uploads *.asc if it exists
- http://pythonhosted.org/distlib/tutorial.html#signing-a-distribution
- http://pythonhosted.org/distlib/tutorial.html#verifying-signatures
  - SHOULD specify a limited keying-dir/
- https://packaging.python.org/distributing/
  - [ ] howto create an .asc signature
- https://docs.travis-ci.com/user/deployment/pypi/
- https://github.com/travis-ci/dpl/blob/master/lib/dpl/provider/pypi.rb
  - [x] https://github.com/travis-ci/dpl/issues/253
  - [ ] oauth key instead of pass
- https://docs.travis-ci.com/user/deployment/releases/
  - upload release to GitHub
- [ ] Is it possible to maintain a simple index of GitHub-hosted releases and .asc signatures w/ gh-pages? (for backups)
  - twine: Is uploading GitHub releases in scope?
  - https://pypi.org/search/?q=Github+release
However, something that could potentially be achieved in the near term given folks interested enough in the idea to set about designing it would be a default recommendation for a Travis CI + Appveyor + GitHub Releases based setup that automated everything *except* the final upload to PyPI, but then also offered a relatively simple way for folks to pull their built artifacts from GitHub and push them to PyPI (such that their login credentials never left their local system). Folks that care enough about who hosts their source code to want to avoid GitHub, or have complex enough build system needs that Travis CI isn't sufficient, are likely to be technically sophisticated enough to adapt service specific instructions to their particular circumstances, and any such design work now would be a useful template for full automation at some point in the future.
https://www.google.com/search&q=Travis+Appveyor+GitHub+pypi

... also useful:

- GitLab CI
  - .gitlab-ci.yml , config.toml
  - https://docs.gitlab.com/ce/ci/docker/using_docker_images.html
- Jenkins
  - https://wiki.jenkins-ci.org/display/JENKINS/Docker+Plugin
  - https://wiki.jenkins-ci.org/display/JENKINS/ShiningPanda+Plugin
  - https://wiki.jenkins-ci.org/display/JENKINS/GitHub+Plugin
  - https://wiki.jenkins-ci.org/display/JENKINS/Release+Plugin
- Buildbot
  - http://docs.buildbot.net/latest/manual/cfg-workers-docker.html
- GitFlow and HubFlow specify a release/ branch with actual release tags all on master/
  - https://datasift.github.io/gitflow/IntroducingGitFlow.html
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

For automated deployment / continuous deployment / "continuous delivery":

- pip maintains a local cache
- devpi can be configured as a transparent proxy cache (in front of pypi.org)
- GitLab CI can show a checkmark for a deploy pipeline stage

On Saturday, November 5, 2016, Wes Turner <wes.turner@gmail.com> wrote:
On Saturday, November 5, 2016, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 4 November 2016 at 06:07, Nathaniel Smith <njs@pobox.com> wrote:
I think we're drifting pretty far off topic here... IIRC the original discussion was about whether the travis-ci infrastructure could be suborned to provide an sdist->wheel autobuilding service for pypi. (Answer: maybe, though it would be pretty awkward, and no one seems to be jumping up to make it happen.)
The hard part of designing any such system isn't so much the building process, it's the authentication, authorisation and trust management for the release publication step. At the moment, that amounts to "Give the service your PyPI password, so PyPI will 100% trust that they're you" due to limitations on the PyPI side of things, and "Hello web service developer, I grant you 100% authority to impersonate me to the rest of the world however you like" is a questionable idea in any circumstance, let alone when we're talking about publishing software.
Since we don't currently provide end-to-end package signing, PyPI initiated builds would solve the trust problem by having PyPI trust *itself*, and only require user credentials for the initial source upload. This has the major downside that "safely" running arbitrary code from unknown publishers is a Hard Problem, which is one of the big reasons that Linux distros put so many hurdles in the way of becoming a package maintainer (i.e. package maintainers get to run arbitrary code not just inside the distro build system but also on end user machines, usually with elevated privileges, so you want to establish a pretty high level of trust before letting people do it). If I understand correctly, conda-forge works on the same basic principle - reviewing the publishers before granting them publication access, rather than defending against arbitrarily malicious code at build time.
- https://conda-forge.github.io - https://github.com/conda-forge - https://github.com/conda-forge/feedstocks - https://github.com/conda-forge/conda-smithy
A more promising long term path is trust federation, which many folks will already be familiar with through granting other services access to their GitHub repositories, or using Twitter/Facebook/Google/et al to sign into various systems. That's not going to be a quick fix though, as it's contingent on sorting out the Warehouse migration challenges, and those are already significant enough without piling additional proposed changes on top of the already pending work.
- [ ] Warehouse: ENH,SEC: A table, form, API for creating and revoking OAuth authz - (project, key, UPLOAD_RELEASE) - key renewal date
There are a few existing OAuth Server libraries for pyramid (Warehouse):
- https://github.com/sneridagh/osiris - https://github.com/tilgovi/pyramid-oauthlib - https://github.com/elliotpeele/pyramid_oauth2_provider
- [ ] CI Release utility secrets: - VCS commit signature checking keyring - Package signing key (GPG ASC) - Package signature pubkey
I just found these:
- https://gist.github.com/audreyr/5990987 - https://github.com/michaeljoseph/changes
- https://pypi.python.org/pypi/jarn.mkrelease - scripted GPG
- https://caremad.io/posts/2013/07/packaging-signing-not-holy-grail/ - SHOULD have OOB keyring dist
- https://github.com/pypa/twine/issues/157 - twine uploads *.asc if it exists
- http://pythonhosted.org/distlib/tutorial.html#signing-a-distribution - http://pythonhosted.org/distlib/tutorial.html#verifying-signatures - SHOULD specify a limited keying-dir/
- https://packaging.python.org/distributing/ - [ ] howto create an .asc signature
- https://docs.travis-ci.com/user/deployment/pypi/ - https://github.com/travis-ci/dpl/blob/master/lib/dpl/provider/pypi.rb - [x] https://github.com/travis-ci/dpl/issues/253 - [ ] oauth key instead of pass
- https://docs.travis-ci.com/user/deployment/releases/ - upload release to GitHub
- [ ] Is it possible to maintain a simple index of GitHub-hosted releases and .asc signatures w/ gh-pages? (for backups) - twine: Is uploading GitHub releases in scope? - https://pypi.org/search/?q=Github+release
However, something that could potentially be achieved in the near term given folks interested enough in the idea to set about designing it would be a default recommendation for a Travis CI + Appveyor + GitHub Releases based setup that automated everything *except* the final upload to PyPI, but then also offered a relatively simple way for folks to pull their built artifacts from GitHub and push them to PyPI (such that their login credentials never left their local system). Folks that care enough about who hosts their source code to want to avoid GitHub, or have complex enough build system needs that Travis CI isn't sufficient, are likely to be technically sophisticated enough to adapt service specific instructions to their particular circumstances, and any such design work now would be a useful template for full automation at some point in the future.
https://www.google.com/search&q=Travis+Appveyor+GitHub+pypi
... also useful:
- GitLab CI - .gitlab-ci.yml , config.toml - https://docs.gitlab.com/ce/ci/docker/using_docker_images.html
- Jenkins - https://wiki.jenkins-ci.org/display/JENKINS/Docker+Plugin - https://wiki.jenkins-ci.org/display/JENKINS/ShiningPanda+Plugin - https://wiki.jenkins-ci.org/display/JENKINS/GitHub+Plugin - https://wiki.jenkins-ci.org/display/JENKINS/Release+Plugin
- Buildbot - http://docs.buildbot.net/latest/manual/cfg-workers-docker.html
- GitFlow and HubFlow specify a release/ branch with actual release tags all on master/ - https://datasift.github.io/gitflow/IntroducingGitFlow.html
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Saturday, November 5, 2016, Wes Turner <wes.turner@gmail.com> wrote:
For automated deployment / continuous deployment / "continuous delivery":
- pip maintains a local cache - devpi can be configured as a transparent proxy cache (in front of pypi.org)
http://doc.devpi.net/latest/quickstart-pypimirror.html
- GitLab CI can show a checkmark for a deploy pipeline stage
On Saturday, November 5, 2016, Wes Turner <wes.turner@gmail.com> wrote: [...]

On Fri, Nov 4, 2016 at 11:29 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
If I understand correctly, conda-forge works on the same basic principle - reviewing the publishers before granting them publication access, rather than defending against arbitrarily malicious code at build time.
yup -- that's pretty much it. you need a conda-forge member to merge your PR before you get a "feedstock" tied into the system.

I'm confused though -- IIUC, ANYONE can put something up on PyPI with arbitrary code in it that will get run by someone when they pip install it. So how is allowing anyone to push something to PyPI that runs arbitrary code on a CI server, which then pushes arbitrary code to PyPI that gets run by anyone who pip installs it, any different?

Essentially, we have already said that there is no such thing as "trusting PyPI" -- you need to trust each individual package. So how is any sort of auto-build system going to change that?

-- Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R
(206) 526-6959 voice
(206) 526-6329 fax
(206) 526-6317 main reception
7600 Sand Point Way NE
Seattle, WA 98115
Chris.Barker@noaa.gov

On 7 November 2016 at 07:20, Chris Barker <chris.barker@noaa.gov> wrote:
So how is allowing anyone to push something to PyPi that will run arbitrary code on a CI server, that will push arbitrary code to PyPi that will then get run by anyone that pip installs it?
PyPI currently has the ability to impersonate any PyPI publisher, which makes it an enormous security threat in and of itself, so we need to limit the attack surfaces that it exposes.
Essentially, we have already said that there is no such thing as "trusting PyPi" -- you need to trust each individual package. So how in any sort of auto-build system going to change that??
Currently we're reasonably confident that the only folks that can compromise Django users (for example) are the Django devs and the PyPI service administrators. The former is an inherent problem in trusting any software publisher, while the latter we currently mitigate by tightly controlling admin access to the production PyPI service, and strictly limiting the server-side processing that PyPI performs on uploaded files to reduce the opportunities for privilege escalation attacks.

Once you start providing a server-side build service however, you're opening up additional attack vectors on the core publishing system, and getting any aspect of that wrong may lead to publishers being able to impersonate *each other*. Unfortunately, offering secure multi-tenancy in software services when you allow tenants to run arbitrary code is a really hard problem - it's the main reason that OpenShift v3 hasn't fully displaced OpenShift v2 yet, and that's with the likes of Red Hat, Google, CoreOS and Deis collaborating on the underlying Kubernetes infrastructure.

Linux distros and conda-forge duck that multi-tenancy problem by treating the build system itself as the publisher, with everyone with access to it being a relatively trusted co-tenant (think "share house with no locks on interior doors" rather than "apartment complex"). That approach works OK at smaller scales, but the gatekeeping involved in approving new co-publishers introduces off-putting friction for potential participants (hence both app developers and data analysts finding ways to bypass the sysadmin and OS developer dominated Linux packaging ecosystems).

For PyPI, we can mitigate the difficulty by getting the builds to happen somewhere else (like external CI services), but even then you still have a non-trivial service integration problem to manage, especially if you decide to tackle it through a "bring your own build service" approach (ala GitHub CI integration).

Whichever way you go though (native build service, or integration with external build services), you're signing up for a major ongoing maintenance task, as you're either now responsible for a shared build system serving tens of thousands of software publishers [1], or else you're responsible for maintaining a coherent publisher UX while also maintaining compatibility with multiple external systems that you don't directly control.

Cheers, Nick.

[1] There were ~35k distinct publisher accounts on PyPI when Donald last checked in August: https://github.com/pypa/warehouse/issues/1428

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Am 05.11.2016 um 07:29 schrieb Nick Coghlan:
On 4 November 2016 at 06:07, Nathaniel Smith <njs@pobox.com> wrote:
I think we're drifting pretty far off topic here... IIRC the original discussion was about whether the travis-ci infrastructure could be suborned to provide an sdist->wheel autobuilding service for pypi. (Answer: maybe, though it would be pretty awkward, and no one seems to be jumping up to make it happen.)
The hard part of designing any such system isn't so much the building process, it's the authentication, authorisation and trust management for the release publication step.
Yes, trust management is very hard. I think it can't be solved, and never will be. Things are different if you run a build server in your own LAN (self hosting). Regards, Thomas Güttler -- http://www.thomas-guettler.de/
participants (15):
- Alex Grönholm
- Barry Warsaw
- Chris Barker
- Donald Stufft
- Glyph Lefkowitz
- Ian Cordasco
- Jeremy Stanley
- Matthew Brett
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Ralf Gommers
- Robert Collins
- Thomas Güttler
- Wes Turner