[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Wes Turner wes.turner at gmail.com
Mon Oct 12 15:12:46 CEST 2015


On Oct 11, 2015 11:07 PM, "Robert Collins" <robertc at robertcollins.net>
wrote:
>
> EWOW, huge thread.
>
> I've read nearly all of it but in order not to make it massively
> worse, I'm going to reply to all the points I think need raising in
> one mail :).
>
> Top level thoughts here, more point fashion with only rough editing
> below the fold.
>
> I realise many things - like the issue between different wheels of the
> same package consuming different numpy abis - have been touched on,
> but AFAICT they are entirely orthogonal to the proposal, which was to
> solve 'be able to use arbitrary build systems and still install with
> pip'.
>
> Of the actual problems with using arbitrary build systems, 99% of them
> seem to boil down to 'setup-requires isn't introspectable by pip'
> (https://github.com/pypa/pip/issues/1820). If it was, then
> alternative build systems could be depended on reasonably; and the
> mooted thunk from setuptools CLI to arbitrary build system would be
> viable.
>
> It is, in principle, a matter of one patch to teach pip *a* way to do
> this (and then any and all build systems that want to can utilise it).
> https://github.com/rbtcollins/pip/tree/declarative is a POC I did - my
> next steps on that were to discuss the right ecosystem stuff for it -
> e.g. should pip consume it via setuptools, or should pip support it as
> *the way* and other systems including setuptools can choose to use it?

as a standard RDF graph representation,
JSON-LD would be uniquely portable here.

"PEP 426: Define a JSON-LD context as part of the proposal"
https://github.com/pypa/interoperability-peps/issues/31

>
> A related but separate thing is being able to *exclusively* install
> things without setuptools present - I've filed
> https://github.com/pypa/pip/issues/3175 about that, but I think it's
> -much- lower priority than reliably enabling third party build tools.

peep may not need setuptools?
* SHA256
* --no-deps
* https://pypi.python.org/pypi/peep
* wheels
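
A minimal sketch of the hash-checking idea behind the SHA256 bullet, using only the standard library. This is not peep's own code or its on-disk hash format; `sha256_of` and `verify_archive` are made-up names, and the pinned digest is assumed to be a plain hex string.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at ``path``."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """True only if the archive on disk matches the pinned digest."""
    return sha256_of(path) == expected_hex.lower()
```

An installer following this pattern would refuse to unpack or build anything for which `verify_archive` returns False.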

>
> -Rob
>
> ----
>
>
> "
> solved many of the hard problems here -- e.g. it's no longer necessary
> that a build system also know about every possible installation
> configuration -- so pretty much all we really need from a build system
> is that it have some way to spit out standard-compliant wheels.
> "
>
> Actually pip still punts a *lot* here - we have bypasses to let things
> like C compiler flags be set during wheel build, and when that's done
> we don't cache the wheels (or even try to build wheels).
>
> "
> While ``distutils`` / ``setuptools`` have taken us a long way, they
> suffer from three serious problems: ...
> (c) you are forced to use them anyway, because they provide the
> standard interface for installing python packages expected by both
> users and installation tools like ``pip``."
>
> I don't understand the claim of (c) here - it's entirely possible to
> write a package that doesn't use setuptools and have it do the right
> thing - pip uses a subprocess to drive package installation, and the
> interface is documented. The interface might be fugly as, but it
> exists and works. It is missing setup-requires handling, but so is
> setup.py itself. The only thing we'd really need to do AFAICT is make
> our setuptools monkeypatching thunk handle setuptools not being
> installed (which would be a sensible thing to Just Do anyhow).
>
> "
> - query for build dependencies
> - run a build, producing wheels as output
> - set up the current source tree so that it can be placed on
>   ``sys.path`` in "develop mode"
> "
>
> So we have that already. setup.py egg_info, setup.py bdist_wheel,
> setup.py develop.
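
To make that mapping concrete, here is a rough sketch of how an installer could drive the three operations through a subprocess. The command names are the existing setuptools/wheel ones; `STEPS`, `build_command` and `run_step` are illustrative names only, and this ignores the setuptools shim pip wraps around its real invocation.

```python
import subprocess
import sys

# Operation from the quoted proposal -> existing setup.py command.
STEPS = {
    "query-metadata": ["egg_info"],
    "build-wheel": ["bdist_wheel"],
    "develop": ["develop"],
}


def build_command(step, python=sys.executable):
    """Return the argv that would run ``step`` in a source tree."""
    return [python, "setup.py"] + STEPS[step]


def run_step(step, source_dir):
    """Run one step inside ``source_dir``; raises CalledProcessError on failure."""
    return subprocess.check_call(build_command(step), cwd=source_dir)
```

Any build system that answers these three commands from its own `setup.py` stub would look the same to the caller.
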
>
> "A version 1-or-greater format source tree can be identified by the
> presence of a file ``_pypackage/_pypackage.cfg``.
> "
>
> I really don't like this. It's going to be with us forever, and it's
> intrusive (it's visible), and so far isn't shown to be fixing anything.
>
>
> "to scatter files around willy-nilly never works, so we adopt the
> convention that names starting with an underscore are reserved for
> official use, and non-underscored names are available for
> idiosyncratic use by individual projects."
>
> I can see the motivation here, but is it really solving a problem we have?
>
>
> On the specifics of the format: I don't want to kibitz over strawman
> aspects at this point.
>
> Having the extension mechanism be both pip specific and in Python
> means that we're going to face significant adoption issues: the former
> because pip is not by any means the only thing around - and some
> distros have until very recently been actively hostile to pip (which
> in turn means we need to wait a decade or two for them to age-out and
> stop being used). The latter because we'll face all the headaches of
> running arbitrary untrusted code and dealing with two deps with
> different versions of the same hook and so on: I think it's an
> intrinsically unsafe design.
>
> @dstufft "problem with numpy.distutils, as I know you’re aware!). We
> could do a minimal extension and add another defacto-ish standard of
> allowing pip and setuptools to process additional setup_requires like
> arguments from a setup.cfg to solve that problem though. The flip side
> to this is that, since it involves new capabilities in
> pip/setuptools/any other installer, you’ll have several
> years until you can depend on setup.cfg based setup_requires.
> "
>
> Well. For *any* proposal that involves modifying pip, we have to
> assume that all existing things keep working, and that anyone wanting
> to utilise the new thing will have to either a) include a local
> compatibility thunk, or b) error when being used from a too-old
> toolchain. I don't think that should really be a factor in design
> since it's intrinsic to the quagmire.
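
A sketch of what reading setup-requires declaratively could look like, so that an installer can learn the build dependencies without executing setup.py. The `[metadata]` section and the `setup_requires` key are assumptions for illustration, not an agreed format and not necessarily what the declarative branch uses; `read_setup_requires` is a made-up helper.

```python
import configparser

# Hypothetical setup.cfg content; the section and key names are assumptions.
EXAMPLE_CFG = """
[metadata]
setup_requires =
    pbr>=1.8
    setuptools_scm
"""


def read_setup_requires(cfg_text):
    """Return the declared build requirements without running setup.py."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("metadata", "setup_requires", fallback="")
    # One requirement per continuation line; ignore blank lines.
    return [line.strip() for line in raw.splitlines() if line.strip()]


print(read_setup_requires(EXAMPLE_CFG))
```

A too-old toolchain simply never looks at the key, which is why a project relying on it would still need a compatibility thunk or an explicit error.
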
>
> "Longer term, I think the answer is sdist 2.0 which has proper
> metadata inside of it (name, version, dependencies, etc) but which
> also includes a hook like this PEP has to specify the build system
> that should be used to build a wheel out of this source distribution."

a composed JSON-LD document indicating provenance (who, what, when) for
each part of the build chain [VCS archive, egg-info, sdist, wheel, bdist]

pydist.jsonld?
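
A sketch of such a composed document: one JSON-LD node per build artifact, chained with W3C PROV terms. The example.org URLs and the `artifact_node` and `compose` helpers are hypothetical; only the `prov:` terms come from an existing vocabulary, and the timestamp is left as a plain string for brevity.

```python
import json

CONTEXT = {"prov": "http://www.w3.org/ns/prov#"}


def artifact_node(artifact_id, derived_from=None, agent=None, when=None):
    """Describe one stage of the build chain as a JSON-LD node."""
    node = {"@id": artifact_id, "@type": "prov:Entity"}
    if derived_from:
        node["prov:wasDerivedFrom"] = {"@id": derived_from}  # what
    if agent:
        node["prov:wasAttributedTo"] = {"@id": agent}  # who
    if when:
        node["prov:generatedAtTime"] = when  # when
    return node


def compose(nodes):
    """Wrap the nodes in a single document with a shared context."""
    return {"@context": CONTEXT, "@graph": list(nodes)}


# Hypothetical identifiers for three stages of one release.
chain = ["vcs-archive", "sdist", "wheel"]
base = "https://example.org/pkg/1.0/"
nodes = [
    artifact_node(
        base + name,
        derived_from=(base + chain[i - 1]) if i else None,
        agent="https://example.org/people/builder",
        when="2015-10-12T15:12:46+02:00",
    )
    for i, name in enumerate(chain)
]
print(json.dumps(compose(nodes), indent=2, sort_keys=True))
```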

>
> Any reason that can't just be setup.cfg ?
>
> @Daniel "I thought Robert Collins had a working setup-requires
> implementation already? I have a worse but backwards compatible one
> too at https://bitbucket.org/dholth/setup-requires/src/tip/setup.py" -
> https://github.com/rbtcollins/pip/tree/declarative - I'll be updating
> that probably early next year at this rate - after issue-988 anyhow.
> The issue with your approach is that pip doesn't handle having
> concurrent installs done well - and in fact it will end up locking its
> environment somehow.
>
> @Paul "
> I can understand that a binary wheel may need a certain set of
> libraries installed - but that's about the platform tags that are part
> of the wheel definition, not about dependencies. Platform tags are an
> ongoing discussion, and a good example of a partial solution that" -
> that's where the draft PEP Tennessee and I started is aimed - at making
> those libraries be metadata, not platform tags.
>
> @Chris "
> A given package might depend on numpy, as you say, and it may work
> with all numpy versions 1.6 to 1.9. Fine, so we specify that in
> install_requires. And this should be the dependency in the sdist, too.
> If the package is pure python, this is fine and done.
>
> But if the package has some extension code that uses the numpy C API
> (a very common occurrence), then when it is built, it will only work
> (reliably) with the version of numpy it was built with.
>
> So the project itself, and the sdist depend on numpy >=1.6, but a
> built binary wheel depends on numpy == 1.7 (for instance).
>
> Which requires a binary (wheel) dependency that is somewhat different
> than the source dependency.
> " - so yes, that is where bdist_wheel should be creating different
> metadata for that wheel. The issue that arises is that we need unique
> file names so that they can coexist on PyPI or local archives - which
> is where wheel tags come in. I'd be in favour of not using semantic
> tags for this - rather hash the deps or something and just make a
> unique file name. Use actual metadata for metadata.
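
A sketch of the 'hash the deps' idea: the wheel keeps its exact build-time pins in its metadata, and only a short digest of them goes into the file name. `deps_digest` and `unique_wheel_name` are made-up helpers, and putting the digest into the build-tag slot of the wheel file name is an assumption for illustration (the wheel spec requires a build tag to start with a digit, hence the `0` prefix).

```python
import hashlib


def deps_digest(pinned_deps, length=8):
    """Short, order-independent digest of the exact build-time pins."""
    canonical = "\n".join(sorted(dep.strip().lower() for dep in pinned_deps))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def unique_wheel_name(name, version, pinned_deps,
                      python="cp34", abi="cp34m", platform="linux_x86_64"):
    """Build a wheel file name whose build tag encodes the pinned deps."""
    build_tag = "0" + deps_digest(pinned_deps)
    return "-".join([name, version, build_tag, python, abi, platform]) + ".whl"


# Same project and version, built against two different numpy releases.
print(unique_wheel_name("example", "1.0", ["numpy==1.7"]))
print(unique_wheel_name("example", "1.0", ["numpy==1.8"]))
```

The two names differ only in the digest, so both wheels can sit side by side in an index while the resolver reads the real `numpy==` pin from each wheel's metadata.
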
>
> @Nathaniel "I know that one unpleasant aspect of the current design is
> that the split between egg-info and actual building creates the possibility for
> time-of-definition-to-time-of-use bugs, where the final wheel
> hopefully matches what egg-info said it would, but in practice there
> could be skew. (Of course this is true in any system which represents"
> - actually see https://bugs.launchpad.net/pbr/+bug/1502692 for a bug
> where this 'skew' is desirable: for older environments we want
> tailored deps with no markers, for anything supporting markers we want
> them - so the wheel will have markers and egg_info won't.
>
> @Nathaniel "
> (Part of the intuition for the last part is that we also have a
> not-terribly-secret-conspiracy here for writing a PEP to get Linux
> wheels onto PyPI and at least achieve feature parity with Windows / OS
> X. Obviously there will always be weird platforms -- iOS and FreeBSD
> and Linux-without-glibc and ... -- but this should dramatically reduce
> the frequency with which people need sdist dependencies.)" - I think a
> distinction between sdist and binary names for dependencies would be a
> terrible mistake. It will raise complexity for reasoning and
> describing things without solving any concrete problem that I can see.
>
> @Nathaniel "I guess to make progress in this conversation I need some
> more detailed explanations. I totally get that there's a long history
> of thought and conversations behind the various assertions here like
> "a sdist is fundamentally different from a VCS checkout", "there must
> be a 1-1 mapping between sdists and wheels", "pip needs sdists that
> have full wheel metadata in static form", and I'm barging in from the
> outside with no context, but I literally have no idea why the specific
> design features you're asking for are desirable or even viable. Right
> now if I were to try and write the PEP you're asking for, then the
> rationale section would just be "because Donald said so" over and over
> :-). I couldn't write the motivation section, because I don't know any
> problems that the PEP you're describing would fix for me as a package
> author (which doesn't mean they don't exist, but!)." -- VCS trees are
> (generally) by-humans for humans. They are the primary source of data
> and can do things like inferring versions from commit data. sdists are
> derived from the VCS tree and can include extra data (such as
> statically defined version data). Wheels are derived from a tree on
> disk and can (today) be built from either VCS trees or sdists. I'm not
> sure that forcing an sdist step is beneficial - the egg-info step we
> have today is basically that without the cost of compressing and
> decompressing potentially large trees for no reason.
>
> @Jeremy "An sdist is an installable package which just happens to _look_ a
> lot like a source release tarball, but trying to pretend that
> downstream packagers will want to use it as such leads to a variety
> of pain points in the upstream/downstream relationship. For better
> or worse a lot of distros don't want generated files in upstream
> source code releases, since they need to confirm that they also ship
> the necessary tooling to regenerate any required files and that the
> generated files they ship match what their packaged tooling
> produces." - Well, pbr doesn't work if you just tar up or git export
> your VCS tree: it requires the chance to add metadata. And while
> distros have whinged about pbr in a number of contexts, that hasn't
> been one so far. Downstreams are pretty used to receiving tarballs
> with generated files in them - as long as they *have the option* to
> recreate those, so the source material isn't lost. [And for version
> data, 'grab from git' is a valid answer there]. OTOH perhaps
> ftpmaster just hasn't noticed and we're about to get a bug report ;)

another interesting use case for [not-] pip:
https://github.com/mitsuhiko/pipsi

> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> https://mail.python.org/mailman/listinfo/distutils-sig