[Distutils] Current Python packaging status (from my point of view)

Wed Nov 2 12:49:30 EDT 2016

On 3 November 2016 at 01:54, Chris Barker <chris.barker at noaa.gov> wrote:
> On Wed, Nov 2, 2016 at 7:32 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> > He mentioned that conda allows you to
>> > manage the python run-time itself, which is indeed a nice feature, but
>> > getting a python run-time has never been the hard part (maybe on Linux if
>> > you want a different one than your system supplies).
>>
>> I didn't miss it accidentally, I left it out because it wasn't
>> relevant to the post (which was about the ecosystem design direction,
>
> I would argue that it is quite relevant -- talking about design decisions
> without talking about the motivations and consequences of those decisions is
> missing much of the point.

No, as the post was about the fundamental and irreconcilable
differences in capabilities, not the incidental ones that can be
solved if folks choose (or are paid) to put in the necessary design
and development time.

> The issue here is that folks might read that post and think: "do I want to
> manage my python install or not?" and think that's the only question they
> need to ask to determine if pip or conda is right for them. But it's not at
> all.

The post isn't written for beginners deciding which tool to use, it's
written for intermediate folks that have already chosen one or the
other for their own needs, and are wondering why the other still
exists (or, worse, are telling people that have chosen the other tool
for good reasons that they're wrong, and should switch).

> pip is about managing stuff WITHIN python -- that's why it can work with any
> conforming python install. So that's the advantage of this design decision.
> But it leads to a major limitation, EVEN if you only care about python,
> because it can't (properly) manage the stuff outside of python that python
> packages may need. I honestly have no idea if the original motivation was
> specifically to have a system that could work with any python install
> (maybe), but it certainly was designed specifically to handle python
> packages, period.

Aside from already needing a Python runtime, the inability to fully
specify the target environment isn't an inherent design limitation,
though; the solution just looks different at a pip level:

- you need a system for specifying environmental *constraints* (like
dynamically linked C libraries and command line applications you
invoke)
- you need a system for asking the host environment if it can satisfy
those constraints
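To make those two pieces a bit more concrete, here is a minimal Python
sketch. The constraint format (a plain dict with hypothetical
`shared_libraries` and `commands` keys) and the `unsatisfied()` helper
are invented for illustration only and are not taken from any PEP or
existing tool. The host-side check just uses the standard library's
`ctypes.util.find_library()` and `shutil.which()`, which is far cruder
than what a real system would need.

```python
"""Sketch: declaring external constraints and asking the host to check them.

The constraint format below is hypothetical, invented for illustration only.
"""
import ctypes.util
import shutil

# Hypothetical constraint declarations a package might ship alongside its
# Python-level metadata: dynamically linked C libraries it needs, and
# command line applications it invokes at runtime.
CONSTRAINTS = {
    "shared_libraries": ["c"],
    "commands": ["python3"],
}


def unsatisfied(constraints):
    """Return (kind, name) pairs the host environment cannot satisfy.

    Uses ctypes.util.find_library for shared libraries and shutil.which
    for executables on PATH. A real system would also need version, ABI,
    and header checks, none of which are attempted here.
    """
    missing = []
    for lib in constraints.get("shared_libraries", []):
        if ctypes.util.find_library(lib) is None:
            missing.append(("shared_library", lib))
    for cmd in constraints.get("commands", []):
        if shutil.which(cmd) is None:
            missing.append(("command", cmd))
    return missing


if __name__ == "__main__":
    problems = unsatisfied(CONSTRAINTS)
    if problems:
        for kind, name in problems:
            print(f"missing {kind}: {name}")
    else:
        print("all declared constraints satisfied")
```

Which message it prints depends entirely on the host it runs on; the
point is only that the package declares *what* it needs and the host
answers *whether* it can provide it.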

Tennessee Leeuwenberg started a draft PEP for that first part last
year: https://github.com/pypa/interoperability-peps/pull/30/files

dnf/yum, apt, brew, conda, et al. all *work around* the current lack of
such a system by asking humans (aka "downstream package maintainers")
to supply the additional information by hand in a platform-specific
format.

> conda started with the motivation of managing complex non-python
> dependencies (initially, to support python) -- in order to do that
> effectively, it has to operate outside the python envelope, and that
> means that it really needs to manage python itself. I'm pretty sure that
> managing python itself was a consequence of the design goals, not a primary
> design goal.

Correct, just as managing Python runtimes specifically isn't a primary
goal of Linux distro package managers, it's just an artifact of
providing a language independent dependency declaration and management
system.

>> not the end user features that make the desire to use pip
>> incomprehensible to a lot of folks).
>>
>> Designing software assembly tools for interpreted software is a system
>> integrator's game, and when people are writing Python code, there is
>> one absolutely 100% unavoidable integration task: choosing which
>> Python runtimes they're going to support.
>
> hmm -- I don't think that's the code-writer's job -- it's the deployer's job.
> Other than choosing which python *version* I want to use, I can happily
> develop with system python and pip, and then deploy with conda -- or vice
> versa. Indeed, I can develop on Windows and deploy on Linux, or....

You still need to decide which versions you're going to test against,
and which bug reports you're going to accept as potentially valid
feedback (e.g. very few people running upstream community projects
will accept "doesn't run on Python 2.5" as a valid bug report any
more, and RHEL/CentOS 7, Software Collections, and conda have been
around for long enough now that most won't accept "doesn't run on 2.6"
either).

> though if you meant PyPy vs IronPython vs CPython when you said "runtime"
> then yes, with the dependency issue, you really do need to make that choice
> upfront.

I also mean 2.6 vs 2.7 vs 3.4 vs 3.5 vs 3.6, etc.

> But the different audiences aren't "data science folks" vs "web developers"
> -- the different audiences are determined by deployment needs, not domain.

Deployment needs are strongly correlated with domain though, and
there's a world of difference between the way folks do exploratory
data analysis and the way production apps are managed in
Heroku/OpenShift/Cloud Foundry/Lambda/etc.

You can certainly use conda to do production web service deployments,
and if you're still deploying to bare metal or full VMs, or building
your own container images from scratch, it's a decent option to
consider. However, if you're specifically interested in web service
development, then swapping in your own Python runtime rather than just
using a PaaS-provided one is really much lower level than most
beginners are going to want to be worrying about these days - getting
opinionated about that kind of thing comes later (if it happens at
all).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia