[Distutils] PEP 426 moved back to Draft status

Nathaniel Smith njs at pobox.com
Tue Mar 14 12:05:23 EDT 2017

On Tue, Mar 14, 2017 at 12:34 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 14 March 2017 at 09:41, Nathaniel Smith <njs at pobox.com> wrote:
>> On Fri, Mar 10, 2017 at 7:55 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> > On 11 March 2017 at 00:52, Nathaniel Smith <njs at pobox.com> wrote:
>> > There's a lot to be said for treating the file as immutable, and instead
>> > adding *other* metadata files as a component moves through the
>> > distribution
>> > process. If so, then it may actually be more appropriate to call the
>> > rendered file "pysdist.json", since it contains the sdist metadata
>> > specifically, rather than arbitrary distribution metadata.
>> I guess there are three possible kinds of build dependencies:
>> - those that are known statically
>> - those that are determined by running some code at sdist creation time
>> - those that are determined by running some code at build time
>> But all the examples I can think of fall into either bucket A (which
>> pyproject.toml handles), or bucket C (which pydist.json can't handle).
>> So it seems like the metadata here is either going to be redundant or
>> wrong?
> pyproject.toml only handles the bootstrapping dependencies for the build
> system itself, it *doesn't* necessarily include all the build dependencies,
> which may be in tool specific files (like setup_requires in setup.py) or
> otherwise added by the build system without any record of it in
> pyproject.toml. The build system knows the latter when it generates the
> sdist, and it means PyPI can extract and republish them without having to
> actually invoke the build system.

Currently there are cases where people use setup_requires for what's
actually static metadata, sure, but that's just because there hasn't
been any alternative.

The main actual *needs* are:
- static build dependencies
- dynamic build dependencies determined at build time

So it seems to me that we should encourage people to move static
dependencies into the static metadata (pyproject.toml), and when they
don't then we can treat them like build-time dependencies, which is a
problem we need to solve anyway.
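Concretely, the static case already has a natural home: PEP 518's [build-system] table in pyproject.toml. A minimal declaration looks like this (the "cython" entry is just an illustrative dependency):

```toml
# pyproject.toml -- static build dependencies declared up front,
# per PEP 518's [build-system] table
[build-system]
requires = ["setuptools", "wheel", "cython"]
```

Anything that can't be written down statically like this falls into the build-time-dynamic bucket, which the build backend has to report at build time anyway.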

Having special metadata for "sdist creation-time dependencies" strikes
me as papering over the needless complexity of the current system by
adding more complexity on top. I can see how it'd have some short-term
benefits but it seems net-harmful in the long run IMHO.

(If we need a hack to cover the transition period from
secretly-static-setup_requires to actually-static-pyproject.toml,
maybe we could teach the setuptools sdist command to push
setup_requires into pyproject.toml? That'd be a pretty simple hack
that wouldn't increase the surface area of our interoperability

>> I'm not sure I understand the motivation for wanting wheels to have a
>> file which says "here's the metadata describing the sdist that you
>> would have, if you had an sdist (which you don't)"? I guess it doesn't
>> hurt anything, but it seems odd.
> Wheels still have a corresponding source artifact, even if it hasn't been
> published anywhere using the Python-specific sdist format. Accordingly, I
> don't think it makes sense to be able to tell just from looking at a wheel
> file whether the generation process was:
> * tree -> sdist -> wheel; or
> * tree -> wheel

My point is just that usually if I'm looking at artifact A, I don't
care about metadata about artifact B :-). Suppose someone has one of
these wheels with an sdist.json in it. My question is, under what
circumstances are you imagining that they'd look at that sdist.json?
What would they do with it?

The only case I can think of is for provenance tracking of various
kinds, but I don't think just throwing in the sdist metadata is a very
good solution to that. If we want source->binary provenance tracking
then I'd rather see something focused on that problem, like wheel
metadata fields Sdist-SHA256, Build-Host, Build-Time, etc. This isn't
what sdist metadata is designed for, so to the extent that it would
help solve that problem, it would only do so by accident and
incompletely.
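A provenance-focused design might be nothing more than a few extra fields in the wheel's METADATA file; the field names here are the hypothetical ones suggested above, and the values are placeholders, not part of any accepted spec:

```
Sdist-SHA256: <hash of the exact sdist this wheel was built from>
Build-Host: <identifier for the machine or service that ran the build>
Build-Time: <ISO 8601 timestamp of the build>
```

That directly answers "where did this wheel come from?", which a copy of the sdist's own metadata never quite does.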

>> > I'd also be fairly strongly opposed to converting extras from an
>> > optional
>> > dependency management system to a "let multiple PyPI packages target the
>> > same site-packages subdirectory" because we already know that's a
>> > nightmare
>> > from the Linux distro experience (having a clear "main" package that
>> > owns
>> > the parent directory with optional subpackages solves *some* of the
>> > problems, but my main reaction is still "Run awaaay").
>> The "let multiple PyPI packages target the same site-packages
>> directory" problem is orthogonal to the reified extras proposal. I
>> actually think we can't avoid handling the same site-packages
>> directory problem, but the solution is namespace packages and/or
>> better Conflicts: metadata.
>> Example illustrating why the site-packages conflict problem arises
>> independently of reified extras: people want to distribute numpy built
>> against different BLAS backends, especially MKL (which is good but
>> zero-cost but proprietary) versus OpenBLAS (which is not as good but is
>> free). Right now that's possible by distributing 'numpy' and
>> 'numpy-mkl' packages, but of course ugly stuff happens if you try to
>> install both; some sort of Conflicts: metadata would help. If we
>> instead have the packages be named 'numpy' and 'numpy[mkl]', then
>> they're in exactly the same position with respect to conflicts. The
>> very significant advantage is that we know that 'numpy[mkl]' "belongs
>> to" the numpy project, so 'numpy[mkl]' can say 'Provides-Dist: numpy'
>> without all the security issues that Provides-Dist otherwise runs
>> into.
> Do other components need to be rebuilt or relinked if the NumPy BLAS backend
> changes?
> If the answer is yes, then this is something I'd strongly prefer to leave to
> conda and other package management systems like Nix that better support
> parallel installation of multiple versions of C/C++ dependencies.
> If the answer is no, then it seems like a better solution might be to allow
> for rich dependencies, where numpy could depend on "_numpy_backends.openblas
> or _numpy_backends.mkl" and figure out the details of exactly what's
> available and which one it's going to use at import time.

The answer is no, and it's unlikely that numpy will massively rewrite
its internals because pip is missing a feature that every other
packaging system has.

> Either way, contorting the Extras system to try to cover such a
> significantly different set of needs doesn't seem like a good idea.

The advantage of the "reified extras" idea is that it actually
*removes* features and complexity while *also* solving a bunch of
problems that are intractable today. So from my point of view, it's
the status quo that's contorted :-).
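To make the numpy example concrete, the reified 'numpy[mkl]' package's metadata might look something like this (hypothetical usage; Conflicts-style metadata doesn't exist yet, and installers don't currently trust Provides-Dist this way):

```
Name: numpy[mkl]
Provides-Dist: numpy
Conflicts: numpy
```

Because 'numpy[mkl]' demonstrably belongs to the numpy project, an installer could safely let it satisfy a plain 'Requires-Dist: numpy', while the Conflicts line prevents co-installation with the default build.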

>> Example illustrating why reified extras are useful totally
>> independently of site-packages conflicts: it would be REALLY NICE if
>> numpy could say 'Provides-Dist: numpy[abi=7]' and then packages could
>> depend on 'numpy[abi=7]' and have that do something sensible. This
>> would be a pure virtual package.
> PEP 459 has a whole separate "python.constraints" extension rather than
> trying to cover environmental constraints within the existing Extras system:
> https://www.python.org/dev/peps/pep-0459/#the-python-constraints-extension

I feel like this is the old argument between whether the best way to
handle a complex problem space is with a complex solution, or with
several simple solutions that can be composed. We can't even get a
dependency resolver that handles simple dist-to-dist dependencies, and
you want to add a whole second kind of constraints with its own
semantics? (Or really third kind, b/c extras are already a second kind
once we start tracking them properly.)


Nathaniel J. Smith -- https://vorpus.org
