[Distutils] distlib and wheel metadata

Nick Coghlan ncoghlan at gmail.com
Wed Feb 15 06:33:41 EST 2017


On 14 February 2017 at 21:21, Vinay Sajip via Distutils-SIG
<distutils-sig at python.org> wrote:
>
>
>> I thought the current status was that it's called metadata.json
>> exactly *because* it's not standardized, and you *shouldn't* look at
>> it?
>
>
> Well, it was work-in-progress-standardised according to PEP 426 (since
> sometimes implementations have to work in parallel with working out the
> details of specifications). Given that PEP 426 wasn't done and dusted
> but being progressed, I would have thought it perfectly acceptable to
> use "pydist.json", as the only things that would be affected would be
> packaging tools working to the PEP.

I asked Daniel to *stop* using pydist.json, since wheel was emitting a
point-in-time snapshot of PEP 426 (which includes a lot of
potentially-nice-to-have things that nobody has actually implemented
so far, like the semantic dependency declarations and the enhancements
to the extras syntax), rather than the final version of the spec.

>> It's too bad that the JSON thing didn't work out, but I think we're
>> better off working on better specifying the one source of truth
>> everything already uses (METADATA) instead of bringing in *new*
>> partially-incompatible-and-poorly-specified formats.
>
> When you say "everything already uses", do you mean setuptools and wheel?
> If nobody else is allowed to play, that's one thing. But otherwise, there
> need to be standards for interoperability. The METADATA file, now - exactly
> which standard does it follow? The one in the dateutil wheel that Jim
> referred to doesn't appear to conform to any of the metadata PEPs. It was
> rejected by old metadata code in distlib (which came out of the Python 3.3
> era "packaging" package - not to be confused with Donald's of the same name -
> which is strict in its interpretation of those earlier PEPs).
>
> The METADATA format (key-value) is not really flexible enough for certain
> things which were in PEP 426 (e.g. dependency descriptions), and for these
> JSON seems a reasonable fit.

The current de facto standard set by setuptools and bdist_wheel is:

- dist-info/METADATA as defined at
https://packaging.python.org/specifications/#package-distribution-metadata
- dist-info/requires.txt runtime dependencies as defined at
http://setuptools.readthedocs.io/en/latest/formats.html#requires-txt
- dist-info/setup_requires.txt build time dependencies as defined at
http://setuptools.readthedocs.io/en/latest/formats.html#setup-requires-txt

The dependency fields in METADATA itself unfortunately aren't really
useful for anything.
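
As a rough illustration of what consuming that de facto format involves,
here is a minimal sketch of a reader for the sectioned requires.txt layout
described in the setuptools formats documentation. The function name and
the sample content are made up for illustration, and this is not how
setuptools itself implements the parsing:

```python
def parse_requires_txt(text):
    """Group requirement lines by section header.

    Lines before any header are unconditional runtime dependencies and are
    stored under the empty-string key. A header looks like "[extra]",
    "[extra:marker]" or "[:marker]" and is stored without the brackets.
    """
    sections = {"": []}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


# Hypothetical file content, not taken from any real project.
sample = """\
requests>=2.0
six

[test]
pytest

[:python_version < "3"]
futures
"""

parsed = parse_requires_txt(sample)
for section, specs in parsed.items():
    print(repr(section), specs)
```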

There's definitely still a place for a pydist.json created by going
through PEP 426, comparing it to what bdist_wheel already does to
populate metadata.json, and either changing the PEP to match the
existing practice, or else agreeing that we prefer what the PEP
recommends, that we want to move in that direction, and that there's a
definite commitment to implement the changes in at least setuptools
and bdist_wheel (plus a migration strategy that allows for reasonably
sensible consumption of old metadata).

Such an update would necessarily be a fairly ruthless process, where
we defer everything that can possibly be deferred. I already made one
pass at that when I split out the metadata extensions into PEP 459,
but at least one more such pass is needed before we can sign off on
the spec as metadata 2.0 - even beyond any "open for discussion"
questions, there are still things in there which were extracted and
standardised separately in PEP 508.

> There's no technical reason why "the JSON thing
> didn't work out", as far as I can see - it was just given up on for a more
> incremental approach (which has got no new PEPs other than 440, AFAICT).

Yep, it's a logistical problem rather than a technical problem per se
- new metadata formats need software publisher adoption to ensure the
design is sensible before we commit to them long term, but software
publishers are understandably reluctant to rely on new formats that
limit their target audience to folks running the latest versions of
the installation tools (outside constrained cases where the software
publisher is also the main consumer of that software).

For PEP 440 (version specifiers) and PEP 508 (dependency specifiers),
this was handled by focusing on documenting practices that people
already used (and checking existing PyPI projects for compatibility),
rather than trying to actively change those practices.

For pyproject.toml (e.g. enscons), the idea is to provide a setup.py
shim that can take care of bootstrapping the new approach for the
benefit of older tools that assume the use of setup.py (similar to
what was done with setup.cfg and d2to1).

The equivalent for PEP 426 would probably be legacy-to-pydist and
pydist-to-legacy converters that setuptools, bdist_wheel and other
publishing tools can use to ship legacy metadata alongside the
standardised format (and I believe Daniel already has at least the
former in order to generate metadata.json in bdist_wheel). With PEP
426 as currently written, a pydist-to-legacy converter isn't really
feasible, since pydist proposes new concepts that can't be readily
represented in the old format.

> I understand that social reasons are often more important than technical reasons
> when it comes to success or failure of an approach; I'm just not sure that
> in this case, it wasn't given up on too early.

I think of PEP 426 as "deferred indefinitely pending specific
practical problems to provide clearer design constraints" rather than
abandoned :)

There are two recent developments that I think may provide those
missing design constraints and hence motivation to finalise a metadata
2.0 specification:

1. the wheel-to-egg support in humpty (and hence zc.buildout). That
makes humpty a concrete non-traditional installer that would benefit
from a modernised standard metadata format, as well as common
tools both to convert legacy metadata to the agreed modern format and
to convert the modern format back to the legacy format for inclusion
in the generated egg files (as then humpty could just re-use the
shared tools, rather than having to maintain those capabilities
itself).

2. the new pipenv project to provide a simpler alternative to the
pip+virtualenv+pip-tools combination for environment management in web
service development (and similar layered application architectures).
As with the "install vs setup" split in setuptools, pipenv settled on
an "only two kinds of requirement (deployment and development)" model
for usability reasons, but it also distinguishes abstract dependencies
stored in Pipfile from pinned concrete dependencies stored in
Pipfile.lock.

If we put those together with the existing interest in automating
generation of policy compliant operating system distribution packages,
it makes it easier to go through the proposed semantic dependency
model in PEP 426 and ask "How would we populate these fields based on
the metadata that projects *already* publish?".

- "run requires": straightforward, as these are the standard
dependencies used in most projects. Not entirely clear how to gently
(or strongly!) discourage dependency pinning when publishing to PyPI
(although the Pipfile and Pipfile.lock model used in pipenv may help
with this)
- "meta requires": not clear at all, as this was added to handle cases
like PyObjC, where the main package is just a metapackage that makes a
particular set of versioned subpackages easy to install. This may be
better modeled as a separate "integrates" field, using a declaration
syntax more akin to that used for Pipfile.lock rather than that used
for normal requirements declarations.
- "dev requires": corresponds to "dev-packages" in pipenv
- "build requires": corresponds to "setup_requires" in setuptools, and to
"build-system.requires" plus any dynamic build dependencies in PEP 518
- "test requires": corresponds to "test" extra in
https://packaging.python.org/specifications/#provides-extra-multiple-use

The "doc" extra in
https://packaging.python.org/specifications/#provides-extra-multiple-use
would map to "build requires", but there's potential benefit to
redistributors in separating it out, as we often split the docs out
from the built software components (since there's little reason to
install documentation on headless servers that are only going to be
debugged remotely).

The main argument against "test requires" and "doc requires" is that
the extras system already works fine for those - "pip install
MyProject[test]" and "pip install MyProject[doc]" are both already
supported, so metadata 2.0 just needs to continue to reserve those as
semantically significant extras names.
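
To make the "reserved extras names" idea concrete, here is a small sketch
that reads the Provides-Extra fields out of a METADATA file with the
standard library's email parser and picks out the conventional names. The
sample metadata, the CONVENTIONAL_EXTRAS set and the function name are all
assumptions for illustration only:

```python
from email.parser import Parser

# Hypothetical METADATA content for an imaginary project.
sample_metadata = """\
Metadata-Version: 2.0
Name: MyProject
Version: 1.0
Provides-Extra: test
Provides-Extra: doc
Provides-Extra: speedups
"""

# Extras names that this sketch treats as semantically significant.
CONVENTIONAL_EXTRAS = {"test", "doc"}


def conventional_extras(metadata_text):
    """Return the declared extras that use a conventional name."""
    msg = Parser().parsestr(metadata_text)
    declared = msg.get_all("Provides-Extra", [])
    return [name for name in declared if name in CONVENTIONAL_EXTRAS]


print(conventional_extras(sample_metadata))
```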

"dev requires" could be handled the same way - anything you actually
need to *build* an sdist or wheel archive from a source repository
should be in "setup_requires" (setuptools) or "build-system.requires"
(pyproject.toml), so "dev" would just be a conventional extra name
rather than a top level field.

That just leaves "build_requires", which turns out to interact
awkwardly with the "extras" system: if you write "pip install
MyProject[test]", does it install all the "test" dependencies,
regardless of whether they're listed in run_requires or
build_requires?

If yes: then why are run_requires and build_requires separate?
If no: then how do you request installation of the "test" build extra?
Or are build extras prohibited entirely?

That suggests that perhaps "build" should just be a conventional extra
as well, and considered orthogonal to the other conventional extras.
(I'm sure this idea has been suggested before, but I don't recall who
suggested it or when)

And if build, test, doc, and dev are all handled as extras, then the
top level name "run_requires" no longer makes sense, and the field
name should go back to just being "requires".

Under that evaluation, we'd be left with only the following top level
fields defined for dependency declarations:

- "requires": list where entries are either a string containing a PEP
508 dependency specifier or else a hash map containing a "requires" key
plus "extra" or "environment" fields as qualifiers
- "integrates": replacement for "meta_requires" that only allows
pinned dependencies (i.e. hash maps with "name" & "version" fields, or
direct URL references, rather than a general PEP 508 specifier as a
string)
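
A sketch of how those two fields might look once serialised, written as a
Python dict and dumped to JSON. The project names and versions are
invented, the nested "requires" value is assumed here to be a list of
specifier strings, and the direct URL form is left out because its shape
isn't pinned down above:

```python
import json

# Hypothetical dependency section for a pydist.json file.
dependencies = {
    "requires": [
        # Plain PEP 508 specifier string
        "requests >= 2.0",
        # Qualified entries: a mapping with "requires" plus a qualifier
        {"requires": ["pytest"], "extra": "test"},
        {"requires": ["futures"], "environment": "python_version < '3'"},
    ],
    "integrates": [
        # Only pinned entries are allowed here
        {"name": "example-core", "version": "1.2.3"},
    ],
}


def check_integrates(entries):
    """Return True if every entry is a mapping pinned by name and version."""
    return all(
        isinstance(e, dict) and {"name", "version"} <= set(e) for e in entries
    )


print(json.dumps(dependencies, indent=2))
print(check_integrates(dependencies["integrates"]))
```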

For converting old metadata, any concrete dependencies that are
compatible with the "integrates" field format would be mapped that
way, while everything else would be converted to "requires" entries.
The semantic differences between normal runtime dependencies and
"dev", "test", "doc" and "build" requirements would be handled as
extras, regardless of whether you were using the old metadata format
or the new one.
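
A legacy-to-pydist conversion along those lines could be sketched roughly
as follows. Everything here is an assumption for illustration: the
function name, the input layout (extras name mapped to a list of
specifier strings, with "" for unconditional dependencies), the simple
regular expression used to spot exact pins, and the choice to only treat
unconditional pins as "integrates" candidates:

```python
import re

# Matches only a bare "name==version" pin: no wildcards, extras or markers.
_PINNED = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*"
    r"([A-Za-z0-9][A-Za-z0-9.!+_-]*)\s*$"
)


def convert_legacy(requirements_by_extra):
    """Split legacy requirement strings into "requires" and "integrates"."""
    requires, integrates = [], []
    for extra, specs in requirements_by_extra.items():
        for spec in specs:
            match = _PINNED.match(spec)
            if match and not extra:
                # Unconditional exact pin: fits the "integrates" format
                integrates.append(
                    {"name": match.group(1), "version": match.group(2)}
                )
            elif extra:
                # Dependencies of an extra stay qualified by that extra
                requires.append({"requires": [spec], "extra": extra})
            else:
                requires.append(spec)
    return {"requires": requires, "integrates": integrates}


# Hypothetical legacy metadata, keyed by extra name.
legacy = {
    "": ["requests>=2.0", "example-core==1.2.3", "example-web==1.*"],
    "test": ["pytest"],
}
print(convert_legacy(legacy))
```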

Going the other direction would be similarly straightforward since
(excluding extensions) the set of required conceptual entities has
been reduced back to the set that already exists in the current
metadata formats. While "requires" and "integrates" would be distinct
fields in pydist.json, the decomposed fields in the latter would map
back to their string-based counterparts in PEP 508 when converted to
the legacy metadata formats.
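
For the pydist-to-legacy direction, a minimal sketch might flatten both
fields back into PEP 508 strings like this. The function name and sample
data are invented, qualified entries are assumed to hold a list of
marker-free specifiers under "requires", and direct URL references are
not handled:

```python
def to_legacy_specifiers(pydist):
    """Flatten "requires" and "integrates" into PEP 508 strings."""
    lines = []
    for entry in pydist.get("requires", []):
        if isinstance(entry, str):
            lines.append(entry)
            continue
        # Combine the qualifiers into a single environment marker
        markers = []
        if "environment" in entry:
            markers.append("(%s)" % entry["environment"])
        if "extra" in entry:
            markers.append('extra == "%s"' % entry["extra"])
        suffix = "; " + " and ".join(markers) if markers else ""
        for spec in entry["requires"]:
            lines.append(spec + suffix)
    for entry in pydist.get("integrates", []):
        # Decomposed name/version pairs become exact pins
        lines.append("%s == %s" % (entry["name"], entry["version"]))
    return lines


# Hypothetical pydist.json dependency data.
pydist = {
    "requires": [
        "requests >= 2.0",
        {"requires": ["pytest"], "extra": "test"},
        {"requires": ["futures"], "environment": "python_version < '3'"},
    ],
    "integrates": [{"name": "example-core", "version": "1.2.3"}],
}
for line in to_legacy_specifiers(pydist):
    print(line)
```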

Cheers,
Nick.

P.S. I'm definitely open to a PR that amends the PEP 426 draft along
these lines. I'll get to it eventually myself, but there are some
other things I see as higher priority for my open source time at the
moment (specifically the C locale handling behaviour of Python 3.6 in
Fedora 26 and the related upstream proposal for Python 3.7 in PEP 538)

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

