[Distutils] Current Python packaging status (from my point of view)

Nick Coghlan ncoghlan at gmail.com
Thu Nov 3 03:57:08 EDT 2016


On 3 November 2016 at 04:39, Chris Barker <chris.barker at noaa.gov> wrote:
> On Wed, Nov 2, 2016 at 9:49 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> - you need a system for specifying environmental *constraints* (like
>> dynamically linked C libraries and command line applications you
>> invoke)
>> - you need a system for asking the host environment if it can satisfy
>> those constraints
>
> and if it can't -- you're then done -- that's actually the easy part (and
> happens already at build or run time, yes?):

When it comes to the kinds of environmental constraints most Python
applications (even scientific ones) impose, conda, dnf/yum, and apt
can meet them.

If you're consistently running (for example) the Fedora Scientific Lab
as your Linux distro of choice, and aren't worried about other
platforms, then you don't need and probably don't want conda - your
preferred binary dependency management community is the Fedora one,
not the conda one, and your preferred app isolation technology will be
Linux containers, not conda environments. Within the containers, if
you use virtualenv at all, it will be with the system site-packages
enabled.
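As a minimal sketch of that last point, assuming the standard library's `venv` module as a stand-in for virtualenv (the two differ in details), an isolated environment that can still import the distro-provided packages might be created like this. The helper name is hypothetical:

```python
import pathlib
import venv


def make_env_with_system_packages(target: pathlib.Path) -> pathlib.Path:
    """Create a virtual environment that can also import system site-packages.

    Hypothetical helper, roughly equivalent to
    `python -m venv --system-site-packages <target>`.
    pip is not installed (with_pip=False) to keep the sketch fast and offline.
    """
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(target)
    # venv records the setting in pyvenv.cfg inside the new environment.
    return target / "pyvenv.cfg"
```

Calling `make_env_with_system_packages(pathlib.Path("env"))` leaves distro-managed packages visible inside `env`, while anything installed into `env` itself stays out of the system Python.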

> I try to build libgdal, it'll fail if I don't have that huge pile of
> dependencies installed.
>
> I try to run a wheel someone else built -- it'll also fail.
>
> It'd be better if this could be hashed out before a compilation or linking error,
> sure, but that's not going to do a whole lot except make the error messages
> nicer :-)

This is why fixing this in the general case requires the ability for
pip to ask the host platform to install additional components.

>> dnf/yum, apt, brew, conda, et al all *work around* the current lack of
>> such a system by asking humans (aka "downstream package maintainers")
>> to supply the additional information by hand in a platform specific
>> format.
>
> if conda is a "platform", then yes. But in that sense pip is a platform,
> too.

No, it's not, as it still wouldn't have the ability to install
external dependencies itself, only the ability to request them from
the host platform.

You'd still need a plugin interacting with something like conda or
PackageKit underneath to actually do the installation. (You'd want
something like PackageKit rather than raw dnf/yum/apt on Linux, as
PackageKit is what powers features like PackageKit-command-not-found,
which lets you install CLI commands from system repos.)


> I'll beat this drum again -- if you want to extend pip to solve all (most)
> of the problems conda solves, then you are re-inventing conda.

I don't. I want publishers of Python packages to be able to express
their external dependencies in a platform independent way such that
the following processes can be almost completely and reliably
automated:

* converting PyPI packages to conda packages
* converting PyPI packages to RPMs
* converting PyPI packages to deb packages
* requesting external dependencies from the host system (whether
that's conda, a Linux distro, or something else) when installing
Python packages with pip
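A minimal sketch of that idea, assuming a hypothetical metadata format: `ExternalDependency`, `NAME_MAPS` and `translate` are invented for illustration and exist in no current packaging standard. The publisher declares an abstract dependency once, and each converter maps it to its own ecosystem's package name. The concrete package names are illustrative only and may not match every distro release:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalDependency:
    """Hypothetical platform-neutral declaration of a non-Python dependency."""
    name: str  # abstract name, e.g. "gdal"
    kind: str  # "library" (dynamically linked) or "command" (CLI you invoke)


# Hypothetical per-ecosystem lookup tables; names are illustrative assumptions.
NAME_MAPS = {
    "conda": {"gdal": "libgdal", "git": "git"},
    "rpm": {"gdal": "gdal-libs", "git": "git"},
    "deb": {"gdal": "libgdal-dev", "git": "git"},
}


def translate(deps, ecosystem):
    """Map abstract external deps to one ecosystem's package names.

    Raises KeyError naming every untranslatable dependency, so a converter
    fails noisily rather than emitting incomplete package metadata.
    """
    table = NAME_MAPS[ecosystem]
    missing = [d.name for d in deps if d.name not in table]
    if missing:
        raise KeyError(f"no {ecosystem} mapping for: {', '.join(missing)}")
    return [table[d.name] for d in deps]


deps = [ExternalDependency("gdal", "library"), ExternalDependency("git", "command")]
for eco in ("conda", "rpm", "deb"):
    print(eco, translate(deps, eco))
```

Keeping the mapping outside the published package means the same upstream declaration could feed a conda recipe generator, an RPM spec generator, or a pip plugin without the publisher knowing every ecosystem's naming rules.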

> If someone
> doesn't like the conda design, and has better ideas, great, but only
> re-invent the wheel if you really are going to make a better wheel.

It wouldn't be conda, because it still wouldn't give you a way to
publish Python runtimes and other external dependencies - only a way
to specify what you need at install time so that pip can ask the host
environment to provide it (and fail noisily if it says it can't
oblige) and so format converters like "conda skeleton" and "pyp2rpm"
can automatically emit the correct external dependencies.
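The "ask the host, then fail noisily" behaviour could look roughly like the sketch below. `HostProvider`, `request_external_deps`, `ExternalDependencyError` and `FakeProvider` are hypothetical names invented for illustration; pip has no such plugin API, and a real backend would talk to conda or PackageKit rather than an in-memory set:

```python
from typing import Iterable, List, Protocol


class HostProvider(Protocol):
    """Hypothetical plugin interface a host environment would implement."""

    def can_satisfy(self, name: str) -> bool: ...

    def install(self, name: str) -> None: ...


class ExternalDependencyError(RuntimeError):
    """Raised when the host says it cannot provide a requirement."""


def request_external_deps(provider: HostProvider, names: Iterable[str]) -> None:
    wanted: List[str] = list(names)
    # Ask about everything first, so nothing is installed if any
    # requirement is unavailable.
    unavailable = [n for n in wanted if not provider.can_satisfy(n)]
    if unavailable:
        raise ExternalDependencyError(
            "host environment cannot provide: " + ", ".join(unavailable)
        )
    for n in wanted:
        provider.install(n)


class FakeProvider:
    """In-memory stand-in for a conda- or PackageKit-backed plugin."""

    def __init__(self, available: Iterable[str]) -> None:
        self.available = set(available)
        self.installed: List[str] = []

    def can_satisfy(self, name: str) -> bool:
        return name in self.available

    def install(self, name: str) -> None:
        self.installed.append(name)
```

Checking every requirement before installing any of them is a design choice of this sketch: it keeps a partial failure from leaving the environment half-modified.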

> However, I haven't thought about it carefully -- maybe it would be possible
> to have a system that managed everything except python itself. But that
> would be difficult, and I don't see the point, except to satisfy brain dead
> IT security folks :-)

The IT security folks aren't brain dead; they have obligations to
manage organisational risk, and that means keeping some level of
control over what's being run with privileged access to network
resources (and in most organisations, running "inside the firewall"
still implicitly grants some level of privileged information access).

Beyond that "because IT said so" case, we'd kinda like Fedora system
components to be running in the Fedora system Python (and ditto for
other Linux distributions), while folks running in
Heroku/OpenShift/Lambda should really be using the platform provided
runtimes so they benefit from automated security updates.

More esoterically, Python interpreter implementers and other folks
need a way to obtain Python packages that isn't contingent on the
interpreter being published in any particular format, since it may not
be published at all. Victor Stinner's "performance" benchmark suite is
an example of that, since it uses pip and virtualenv to compare the
performance of different Python interpreters across a common set of
benchmarks: https://github.com/python/performance

>> > But the different audiences aren't "data science folks" vs "web
>> > developers"
>> > -- the different audiences are determined by deployment needs, not
>> > domain.
>>
>> Deployment needs are strongly correlated with domain though, and
>> there's a world of difference between the way folks do exploratory
>> data analysis and the way production apps are managed in
>> Heroku/OpenShift/Cloud Foundry/Lambda/etc.
>
> sigh. not everyone that uses the complex scipy stack is doing "exploratory
> data analysis" -- a lot of us are building production software, much of it
> behind web services...
>
> and that's what I mean by deployment needs.

Sure, and folks building Linux desktop apps aren't web developers
either. Those labels are just shorthand caricatures, each attempting
to encompass broad swathes of a complex ecosystem.

It makes a *lot* of sense for folks using conda in some contexts (e.g.
exploratory data analysis) to expand that to using conda everywhere.
It also makes a lot of sense for folks that don't have a preference
yet to start with conda for their own personal use.
However, it *doesn't* make sense for folks to introduce conda into
their infrastructure just for the sake of using conda.

For folks in the first and last of those categories, the
key question they have to ask themselves is "Do I want to let the
conda packaging community determine which Python runtimes are going to
be available for me to use?". In the first case, there's a strong
consistency argument in favour of doing so. In the last case, however,
it depends greatly on what you're doing, what your team
prefers, and what kind of infrastructure setup you have.

>>  However, if you're specifically interested in web service
>> development, then swapping in your own Python runtime rather than just
>> using a PaaS provided one is really much lower level than most
>> beginners are going to want to be worrying about these days - getting
>> opinionated about that kind of thing comes later (if it happens at
>> all).
>
> it's not a matter of opinion, but needs -- if this "beginner" is doing stuff
> only with pure-python packages, then great -- there are many easy options.
> But if they need some other stuff - maybe this beginner needs to work with
> scientific data sets -- then they're dead in the water with a platform that
> doesn't support what they need.

Yep, and if that's the case, hopefully someone directs them towards a
platform with that feature set, like conda, or Scientific Linux or the
Fedora Scientific Lab, rather than a vanilla Python distro.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

