[Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

Chris Barker chris.barker at noaa.gov
Sat May 16 22:18:36 CEST 2015


On Sat, May 16, 2015 at 10:12 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> > Maybe, but it's a problem to be solved, and the Linux distros more or less
> > solve it for us, but OS-X and Windows have no such system built in (OS-X
> > does have Brew and macports....)
>
> Windows 10 has Chocolatey and OneGet:
>
> * https://chocolatey.org/
> * http://blogs.msdn.com/b/garretts/archive/2015/01/27/oneget-and-the-windows-10-preview.aspx
>

cool -- though I don't think we want the "official" python to depend on a
third party system, and OneGet won't be available for most users for a
LONG time...

The fact that OS-X users have to choose between fink, macports, homebrew or
roll-your-own is a MAJOR source of pain for supporting the OS-X community.
"More than one way to do it" is not the goal.

> conda and nix then fill the niche for language independent packaging
> at the user level rather than the system level.
>

yup -- conda is, indeed, pretty cool.

> > I think there is a bit of fuzz here -- cPython, at least, uses the "the
> > operating system provided C/C++
> > dynamic linking system" -- it's not a totally independent thing.
>
> I'm specifically referring to the *declaration* of dependencies here.
>

sure -- that's my point about the current "missing link" -- setuptools,
pip, etc, can only declare python-package-level dependencies, not
binary-level dependencies.

My idea is to bundle up a shared lib in a python package -- then, if you
declare a dependency on that package, you've handled the dep issue. The
trick is that a particular binary wheel depends on that other binary wheel
-- rather than the whole package depending on it. (that is, on linux, it
would have no dependency, on OS-X it would -- but then only the wheel built
for a non-macports build, etc....).
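
A minimal sketch of the loading side of that idea, assuming a hypothetical
carrier package that keeps its shared lib in a lib/ subdirectory. The names
shared_lib_filename, bundled_lib_path and load_bundled_lib are made up for
illustration and are not part of any PyPA tool:

```python
import ctypes
import os
import sys


def shared_lib_filename(name, platform=sys.platform):
    """Return the conventional file name of a shared library on a platform."""
    if platform.startswith("win"):
        return name + ".dll"
    if platform == "darwin":
        return "lib" + name + ".dylib"
    return "lib" + name + ".so"


def bundled_lib_path(package_dir, name, platform=sys.platform):
    """Path where the (hypothetical) carrier package keeps its shared lib."""
    return os.path.join(package_dir, "lib", shared_lib_filename(name, platform))


def load_bundled_lib(package_dir, name):
    """Load the bundled library before dependent extensions are imported."""
    path = bundled_lib_path(package_dir, name)
    if not os.path.exists(path):
        raise ImportError("bundled library not found: " + path)
    # RTLD_GLOBAL asks for the library's symbols to be globally visible on
    # POSIX platforms; the mode argument is ignored on Windows.
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
```

Whether pre-loading like this is enough for an extension module to resolve
its symbols depends on how that extension was linked, which is exactly the
part that would need experimenting with.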

I think we could hack around this by monkey-patching the wheel after it is
built, so it may be worth playing with to see how it works before proposing
any changes to the ecosystem.
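
One way the "patch the wheel after it is built" hack might look, as a sketch
only: add a Requires-Dist header to the text of the wheel's METADATA file. The
function name add_requires_dist and the requirement name used with it are
hypothetical; reading the file out of the .whl archive and writing it back are
left out.

```python
def add_requires_dist(metadata_text, requirement):
    """Return METADATA text with one extra Requires-Dist header.

    Headers end at the first blank line; anything after it is the long
    description and is left untouched.
    """
    lines = metadata_text.splitlines()
    try:
        end_of_headers = lines.index("")
    except ValueError:
        end_of_headers = len(lines)  # no body, headers run to the end
    new_header = "Requires-Dist: " + requirement
    if new_header in lines[:end_of_headers]:
        return metadata_text  # already declared, nothing to do
    lines.insert(end_of_headers, new_header)
    return "\n".join(lines) + "\n"
```

A real version would also have to update the hash and size recorded for
METADATA in the wheel's RECORD file, since the patched file no longer matches
what was recorded at build time.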

> > And if you are using something like conda you don't need pip
> > or wheels anyway!
>
> Correct, just as if you're relying solely on Linux system packages,
> you don't need pip or wheels. Aside from the fact that conda is
> cross-platform, the main difference between the conda community and a
> Linux distro is in the *kind* of software we're likely to have already
> done the integration work for.
>

sure. but the cross-platform thing is BIG -- we NEED pip and wheel because
rpm, or deb, or ... are all platform and distro dependent -- we want a way
for package maintainers to support a broad audience without having to deal
with 12 different package systems.

> The key to understanding the difference in the respective roles of pip
> and conda is realising that there are *two* basic distribution
> scenarios that we want to be able to cover (I go into this in more
> detail in
> https://www.python.org/dev/peps/pep-0426/#development-distribution-and-deployment-of-python-software
> ):
>

hmm -- sure, they are different, but is it impossible to support both with
one system?


> * software developer/publisher -> software integrator/service operator
> (or data analyst)
> * software developer/publisher -> software integrator -> service
> operator (or data analyst)
>
...

> On the consumption side, though, the nature of the PyPA tooling as a
> platform-independent software publication toolchain means that if you
> want to consume the PyPA formats directly, you need to be prepared to
> do your own integration work.


Exactly! and while Linux system admins can do their own system integration
work, everyday users (and many Windows sys admins) can't, and we shouldn't
expect them to.

And, in fact, the PyPA tooling does support the more casual user much of
the time -- for example, I'm in the third quarter of a Python certification
class -- Intro, Web development, Advanced topics -- and only half way
through the third class have I run into any problems with sticking with the
PyPA tools.

(except for pychecker -- not being on PyPI :-( )

> Many public web service developers are
> entirely happy with that deal, but most system administrators and data
> analysts trying to deal with components written in multiple
> programming languages aren't.
>

exactly -- but it's not because the audience is different in their role --
it's because different users need different python packages. The PyPA tools
support pure-python great -- and compiled extensions without deps pretty
well -- but there is a bit of a gap with extensions that require other deps.

It's a 90% (95%) solution... It'd be nice to get it to a 99% solution.

Where it really gets ugly is where you need stuff that has nothing to do
with python -- say a Julia run-time, or ...

Anaconda is there to support that: their philosophy is that if you are
trying to do full-on data analysis with python, you are likely to need
stuff strictly beyond the python ecosystem -- your own Fortran code, numba
(which requires LLVM), etc.

Maybe they are right -- but there is still a heck of a lot of stuff that
you can do and stay within python, and it would be good if it was easier
for web developers to use a bit of numpy, or matplotlib, or pandas in their
web apps -- without having to jump to the "scipy stack" ecosystem (which
does not support the web dev stuff that well yet...)

> If you look at those pipelines from the service operator/data analyst
> end, then the *first* question to ask is "Is there a software
> integrator that targets the audience I am a member of?".


I think that's part of my point here -- I bridge two communities -- the
scientific community says: just use Anaconda or Canopy or ...., but the web
developer community says "use python.org, pip, and pypi". If you need
both, there is a gap.


>  When it isn't, it's either
> a sign that those of us in the "software integrator" role aren't
> meeting the needs of our audience adequately,


sure -- but where does PyPA fit in here -- having binary wheels and pypi
puts us in the role of integrator -- and we aren't meeting the needs of as
broad an audience as we could -- that's my point there.

If we didn't want to be an "integrator", we could have not built pypi, or
pip, or wheel.... conda, rpm, macports, etc don't need those.

I think PyPA tools could meet a broader need with not much fudging. In some
sense, the only question I have at this point is whether there is a
compelling reason to better support dynamic libs -- if not, then, as Paul
pointed out, all we need is a more coordinated community effort (not easy,
but not a tooling question)

> > option (2) would be to extend python's import mechanism a bit to allow it to
> > do a raw "link in this arbitrary lib" action, so the lib would not have to
> > be wrapped in a python module -- I don't know how possible that is, or if it
> > would be worth it.
>
> Your option 2 is specifically the kind of thing I don't want to
> support, as it's incredibly hard to do right (to the tune of "people
> will pay you millions of dollars a year to reduce-or-eliminate their
> ABI compatibility concerns"), and has the potential to replace the
> current you-need-to-be-able-build-this-from-source-yourself issue with
> "oh, look, now you have a runtime ABI incompatibility, have fun
> debugging that one, buddy".
>

fair enough -- that could be a pretty ugly nightmare.

> Your option 1 seems somewhat more plausible, as I believe it should
> theoretically be possible to use the PyCObject/PyCapsule API (or even
> just normal Python objects) to pass the relevant shared library
> details from a "master" module that determines which versions of
> external libraries to link against, to other modules that always want
> to load them, in a way that ensures everything is linking against a
> version that it is ABI compatible with.
>
> That would require someone to actually work on the necessary tooling
> to help with that though, as you wouldn't be able to rely on the
> implicit dynamic linking provided by C/C++ toolchains any more.
> Probably the best positioned to tackle that idea would be the Cython
> community, since they could generate all the required cross-platform
> boilerplate code automatically.
>

good idea -- I'm tied in with those folks -- if I have to do any C stuff I
turn to Cython already...
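
A rough sketch of what such a "master" module could look like when it uses
plain Python objects rather than PyCapsule. Everything here is an assumption
made for illustration: the class name SharedLibRegistry, its methods, and the
idea of comparing a single ABI version string. Real tooling (for example
boilerplate generated by Cython) would need a much richer compatibility check.

```python
import ctypes


class SharedLibRegistry(object):
    """Hypothetical "master" module state: one loaded handle per library."""

    def __init__(self, loader=ctypes.CDLL):
        self._loader = loader   # injectable so the sketch can be exercised
        self._handles = {}      # name -> (abi_version, handle)

    def register(self, name, path, abi_version):
        """Load a library once and record which ABI version it provides."""
        if name in self._handles:
            raise ValueError("%s is already registered" % name)
        self._handles[name] = (abi_version, self._loader(path))

    def get(self, name, required_abi):
        """Hand out the shared handle, refusing an ABI mismatch up front."""
        abi_version, handle = self._handles[name]
        if abi_version != required_abi:
            raise ImportError(
                "%s provides ABI %s, but ABI %s was requested"
                % (name, abi_version, required_abi))
        return handle
```

The point of routing every consumer through get() is that a mismatch shows up
as an ImportError at load time instead of as a crash somewhere inside the
library later on.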


> > sure, but if it's ALSO a Python package manager, then why not? i.e. conda --
> > if we all used conda, we wouldn't need pip+wheel.
>
> conda's not a Python package manager, it's a language independent
> package manager that was born out of the Scientific Python community
> and includes Python as one of its supported languages, just like nix,
> deb, rpm, etc.
>

indeed -- but it does have a bunch of python-specific features....it was
built around the need to combine python with other systems.

> That makes it an interesting alternative to pip on the package
> *consumption* side for data analysts, but it isn't currently a good
> fit for any of pip's other use cases (e.g. one of the scenarios I'm
> personally most interested in is that pip is now part of the
> Fedora/RHEL/CentOS build pipeline for Python based RPM packages - we
> universally recommend using "pip install" in the %install phase over
> using "setup.py install" directly)
>

hmm -- conda generally uses "setup.py install" in its build scripts. And it
doesn't use pip install because it wants to handle the downloading and
dependencies itself (in fact, turning OFF setuptools dependency handling is
an annoyance..)

So I'm not sure why pip is needed here -- would it be THAT much harder to
build rpms of python packages if it didn't exist? (I do see why you
wouldn't want to use conda to build rpms..)

But while _maybe_ if conda had been around 5 years earlier we could have
not bothered with wheel, I'm not proposing that we drop it -- just that we
push pip and wheel a bit farther to broaden the supported user-base.


> Binary wheels already work for Python packages that have been
> developed with cross-platform maintainability and deployability taken
> into account as key design considerations (including pure Python
> wheels, where the binary format just serves as an installation
> accelerator). That category just happens to exclude almost all
> research and data analysis software, because it excludes the libraries
> at the bottom of that stack


It doesn't quite exclude those -- just makes it harder. And while depending
on Fortran, etc, is pretty unique to the data analysis stack, stuff like
libpng, libcurl, etc, etc, isn't -- non-system libs are not a rare thing.


> It's also the case that when you *are* doing your own system
> integration, wheels are a powerful tool for caching builds,


conda does this nicely as well :-) I'm not trying to argue, at all, that
binary wheels are useless, just that they could be a bit more useful.

> > Ah -- here is a key point -- because of that, we DO support binary packages
> > on PyPi -- but only for Windows and OS-X.. I'm just suggesting we find a way
> > to extend that to packages that require a non-system non-python dependency.
>
> At the point you're managing arbitrary external binary dependencies,
> you've lost all the constraints that let us get away with doing this
> for extension modules without adequate metadata, and are back to
> trying to solve the same arbitrary ABI problem that exists on Linux.
>

I still don't get that -- any binary extension needs to match the ABI of
the python it is used with -- a shared lib is the same problem.
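
For reference, the ABI a binary wheel claims to match is carried in its file
name as python, abi and platform tags. A small sketch of pulling those apart
(the helper name parse_wheel_filename and the WheelTags tuple are made up for
illustration; the file name layout is the one the wheel format defines):

```python
from collections import namedtuple

WheelTags = namedtuple(
    "WheelTags", "distribution version build python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel file name into its name, version and tag components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        parts.insert(2, None)  # the build tag is optional
    elif len(parts) != 6:
        raise ValueError("unexpected number of fields in: " + filename)
    return WheelTags(*parts)
```

Nothing in those tags says anything about a third-party shared library the
extension links against, which is the gap being discussed here.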

> The line is drawn at ABI compatibility management. We're able to fuzz
> that line a little bit in the case of Windows and Mac OS X extension
> modules because we have the python.org CPython releases to act as an
> anchor for the ABI definition.
>
> We don't have that at all on other *nix platforms, and we don't have
> it on Windows and Mac OS X either once we move beyond the CPython C
> ABI (which encompasses the underlying platform ABI)
>

Showing my ignorance here -- what else is there we want to support (fortran
ABI maybe?)

> We *might* be able to get to the point of being able to describe
> platform ABIs well enough to allow public wheels for arbitrary
> platforms,


That would be cool -- but not what I'm talking about here. I'm only talking
about the ABIs we already describe.

>
> > Are those the targets for binary wheels? I don't think so.
>
> Yes, they'll likely end up being one of Fedora's targets for prebuilt
> wheel files:
> https://fedoraproject.org/wiki/Env_and_Stacks/Projects/UserLevelPackageManagement
>

cool -- but it is Fedora that will be building those wheels -- so a systems
integrator.

> > but if you statically link, you need to build the static package right
> > anyway -- so it doesn't actually solve the problem at hand anyway.
>
> Yes it does - you just need to make sure your build environment
> suitably matches your target deployment environment.
>

"just?" -- that can actually be a major pain -- at least on OS-X.

> > So what would be good is a way to specify a "this build" dependency. That
> > can be hacked in, of course, but nicer not to have to.
>
> By the time you've solved all these problems I believe you'll find you
> have reinvented conda ;)
>

I really do have less lofty goals than that, but yes -- no point in going
down that route!

Anyway -- I've taken a lot of my time (and a bunch of others' on this list).
And where I think we are at is:

* No one else seems to think it's worth trying to extend the PyPA ecosystem
a bit more to better support dynamic libs. (except _maybe_ Enthought?)

* I still think it can be done with minimal changes, and hacked in to do
the proof of concept

* But I'm not sure it's something that's going to get to the top of my ToDo
list anyway -- I can get my needs met with conda anyway. My real production
work is deep in the SciPy stack.

* So I may or may not move my ideas forward -- if I do, I'll be back with
questions and maybe a more concrete proposal some day....

But I learned a lot from this conversation -- thanks!

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov