[Distutils] Installed Extras Metadata

Nathaniel Smith njs at pobox.com
Sun Feb 11 04:17:33 EST 2018

On Fri, Jan 26, 2018 at 8:37 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 27 January 2018 at 13:46, Nathaniel Smith <njs at pobox.com> wrote:
>> The advantages are:
>> - it's a simpler way to record the information you want
>> here, without adding more special cases to dist-info: most code
>> doesn't even have to know what 'extras' are, just what packages are
>> - it opens the door to lots of more advanced features, like
>> 'foo[test]' being a package that actually contains foo's tests, or
>> build variants like 'numpy[mkl]' being numpy built against the MKL
>> library, or maybe making it possible to track which version of numpy's
>> ABI different packages use. (The latter two cases need some kind of
>> provides: support, which is impossible right now because we don't want
>> to allow random-other-package to say 'provides-dist: cryptography';
>> but, it would be okay if 'numpy[mkl]' said 'provides-dist: numpy',
>> because we know 'numpy[mkl]' and 'numpy' are maintained by the same
>> people.)
>> I know there's a lot of precedent for this kind of clever use of
>> metadata-only packages in Debian (e.g. search for "metapackages"), and
>> I guess the RPM world probably has similar tricks.
> While I agree with this idea in principle, I'll note that RPM makes it
> relatively straightforward to have a single SRPM emit multiple RPMs, so
> defining a metapackage is just a few extra lines in a spec file. (I'm not
> sure how Debian's metapackages work, but I believe they're similarly simple
> on the publisher's side).
> We don't currently have a comparable mechanism to readily allow a single
> source project to expand to multiple package index entries that all share a
> common sdist, but include different subsets in their respective wheel files
> (defining one would definitely be possible, it's just a tricky migration
> problem to work out).

Yeah, the migration is indeed the tricky part. Here's one possible approach.

First, figure out what exactly an "extra" should become in the new
system. I think it's: if package $PACKAGE version $VERSION defines an
extra $EXTRA, then that corresponds to a wheel named
"$PACKAGE[$EXTRA]" (the brackets become part of the package name),
version $VERSION, and it has Requires-Dist: $PACKAGE == $VERSION, as
well as whatever requirements were originally part of the extra.
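
To make the formula above concrete, here is a minimal Python sketch. The
function name, the dict layout, and the example version and requirement
strings are all made up for illustration; this is not how setuptools or
bdist_wheel represent metadata today.

```python
def extra_packages(name, version, extras):
    """Map each extra to the metadata of its bracketed package.

    `extras` is a hypothetical dict such as
    {"security": ["pyOpenSSL>=0.14"]}: extra name -> its requirements.
    """
    result = {}
    for extra, requirements in extras.items():
        bracketed = f"{name}[{extra}]"  # the brackets are part of the name
        result[bracketed] = {
            "Name": bracketed,
            "Version": version,
            # Pin the base package to exactly the same version, then add
            # whatever requirements the extra originally carried.
            "Requires-Dist": [f"{name} == {version}"] + list(requirements),
        }
    return result


if __name__ == "__main__":
    # Illustrative values only.
    for pkg, meta in extra_packages(
        "requests", "2.18.4", {"security": ["pyOpenSSL>=0.14"]}
    ).items():
        print(pkg, meta)
```

A package with N extras yields N entries here, which together with the
base wheel gives the N+1 wheels mentioned below.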

Now, if we didn't have to worry about migrations, we'd extend
setuptools/bdist_wheel so that when they see the current syntax for
defining an extra, they generate extra wheels following the formula
above. (So 'setup.py bdist_wheel' generates N+1 wheels for a package
with N extras.) And we'd teach PyPI that packages named like
"$PACKAGE[$EXTRA]" should be collected together with packages named
"$PACKAGE" (e.g. the same access controls apply to both, and probably
you want to display them together in the UI when their versions
match). And we'd teach pip that square brackets are legal in package
names. And that'd be about it.
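
As a rough illustration of the "collected together" idea, the sketch
below groups uploads by base project and version. Both helper names and
the (name, version) tuple format are assumptions for the example, not
anything PyPI actually implements.

```python
from collections import defaultdict


def base_project(name):
    """Return 'requests' for 'requests[security]'; other names unchanged."""
    base, bracket, rest = name.partition("[")
    if bracket and rest.endswith("]") and rest[:-1]:
        return base
    return name


def group_for_display(releases):
    """Group (name, version) pairs by (base project, version).

    Packages in the same group would share access control and could be
    shown together in the UI.
    """
    groups = defaultdict(list)
    for name, version in releases:
        groups[(base_project(name), version)].append(name)
    return dict(groups)
```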

Of course, we do have to worry about migration, and in the first
instance, what we care about is making pip's database of installed
packages properly record these new wheels. So my proposal is:

- Requirements like 'requests[security,socks]' need to be expanded to
'requests[security], requests[socks]'. More specifically, when pip
processes a requirement like '$PACKAGE[$EXTRA1,$EXTRA2,...] $OP1
$VERSION1, $OP2 $VERSION2, ...', it expands it to multiple packages
and then applies the constraints to each of them: ['$PACKAGE[$EXTRA1]
$OP1 $VERSION1, $OP2 $VERSION2, ...', ...].
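
The expansion step could look roughly like the following. This is a
simplified sketch: the regular expression is a stand-in that handles
only plain "name[extras] constraints" strings (no environment markers
or URLs), and expand_extras is a hypothetical name, not a pip function.

```python
import re

# Simplified pattern: name, optional [extra,extra,...], then the rest.
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:\[([^\]]*)\])?\s*(.*)$")


def expand_extras(requirement):
    """Split 'pkg[a,b] <constraints>' into one requirement per extra."""
    match = _REQ.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, extras, constraints = match.groups()
    constraints = constraints.strip()
    extra_names = [e.strip() for e in (extras or "").split(",") if e.strip()]
    if extra_names:
        targets = [f"{name}[{extra}]" for extra in extra_names]
    else:
        targets = [name]  # no extras: nothing to expand
    # Every expanded package gets the same version constraints.
    return [f"{target} {constraints}".strip() for target in targets]


if __name__ == "__main__":
    print(expand_extras("requests[security,socks] >=2.0, <3"))
```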

- When pip needs to find a wheel like 'requests[security]', then it
first checks to see if this exact wheel (with the brackets) is
available on PyPI (or whatever package sources it has available). If
so, it uses that. If not, then it falls back to looking for a
'requests' wheel, and if it finds one, and that wheel has 'extra'
metadata, then it *uses that metadata to generate a wheel on the
spot*, and then carries on as if it had found it on PyPI.

  - Special case: when hash-checking mode is enabled and pip ends up
doing this fallback, then pip always checks the hash against the wheel
it found on PyPI – so 'requests[security] --hash=...' checks the hash
of requests.whl, not the auto-generated requests[security].whl.
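
The lookup-with-fallback logic, including the hash-checking special
case, can be sketched as below. The index is modelled as a plain dict
and every field name ("version", "hash", "requires", "extras") is an
assumption made for the example; real pip and PyPI data structures look
nothing like this.

```python
def find_wheel(requested, index):
    """Resolve a name like 'requests[security]' against a toy index.

    `index` maps package name -> {"version", "hash", "requires", "extras"}.
    Returns a dict describing the wheel to install, or None.
    """
    # 1. Prefer an exact match, brackets included.
    if requested in index:
        wheel = index[requested]
        return {
            "name": requested,
            "version": wheel["version"],
            "requires": list(wheel["requires"]),
            "hash_to_check": wheel["hash"],
            "generated": False,
        }

    # 2. Fall back to the base package and its 'extra' metadata.
    base, bracket, rest = requested.partition("[")
    if not bracket or not rest.endswith("]"):
        return None
    extra = rest[:-1]
    base_wheel = index.get(base)
    if base_wheel is None or extra not in base_wheel["extras"]:
        return None

    # 3. Generate the bracketed wheel's metadata on the spot.
    return {
        "name": requested,
        "version": base_wheel["version"],
        "requires": [f"{base} == {base_wheel['version']}"]
        + list(base_wheel["extras"][extra]),
        # Special case: hash checking uses the wheel actually downloaded.
        "hash_to_check": base_wheel["hash"],
        "generated": True,
    }
```

The caller carries on identically in both cases; only the "generated"
flag records whether the fallback was used.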

(There is some question to discuss here about how sdists should be
handled: in many cases, possibly all of them, it doesn't really make
sense to have separate sdists for different square-bracket packages.
'requests[security]' will probably always be generated from the
requests source tree, and for build variants like 'numpy[mkl]' you
definitely want to build that from numpy.tar.gz, with some special
flag telling it to use the "mkl" configuration. So maybe when pip
fails to find a wheel it should always go straight to the un-adorned
sdist like requests.tar.gz, instead of checking for
requests[security].tar.gz. But this isn't going to make-or-break a v1.)

If we implement just these two things, then I think that's enough for
pip to start immediately tracking all the proper metadata for existing
extras packages, and also provide a smooth onramp to the eventual
features. Once this was working, we could enable uploading
square-bracket packages to PyPI, and pip would start automatically
picking them up where present. Then we could flip the switch for
setuptools to start generating such packages. We'd probably also want
to tweak PEP 517 so that the build backend is informed of exactly
which package pip is looking for, to handle the case where
numpy.tar.gz is expected to produce numpy[mkl].whl.

And after that we could enable real Provides-Dist: and Provides-Dist:


Nathaniel J. Smith -- https://vorpus.org
