[Distutils] distlib and wheel metadata

Wes Turner wes.turner at gmail.com
Wed Feb 15 08:00:59 EST 2017

On Wed, Feb 15, 2017 at 5:33 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 14 February 2017 at 21:21, Vinay Sajip via Distutils-SIG
> <distutils-sig at python.org> wrote:
> >
> >
> >> I thought the current status was that it's called metadata.json
> >> exactly *because* it's not standardized, and you *shouldn't* look at
> >> it?
> >
> >
> > Well, it was work-in-progress-standardised according to PEP 426 (since
> > sometimes implementations have to work in parallel with working out the
> > details of specifications). Given that PEP 426 wasn't done and dusted
> > but being progressed, I would have thought it perfectly acceptable to
> > use "pydist.json", as the only things that would be affected would be
> > packaging tools working to the PEP.
> I asked Daniel to *stop* using pydist.json, since wheel was emitting a
> point-in-time snapshot of PEP 426 (which includes a lot of
> potentially-nice-to-have things that nobody has actually implemented
> so far, like the semantic dependency declarations and the enhancements
> to the extras syntax), rather than the final version of the spec.

Would you send a link to the source for this?

> >> It's too bad that the JSON thing didn't work out, but I think we're
> >> better off working on better specifying the one source of truth
> >> everything already uses (METADATA) instead of bringing in *new*
> >> partially-incompatible-and-poorly-specified formats.
> >
> > When you say "everything already uses", do you mean setuptools and wheel?
> > If nobody else is allowed to play, that's one thing. But otherwise, there
> > need to be standards for interoperability. The METADATA file, now -
> > exactly which standard does it follow? The one in the dateutil wheel that
> > Jim referred to doesn't appear to conform to any of the metadata PEPs. It
> > was rejected by old metadata code in distlib (which came out of the
> > Python 3.3 era "packaging" package - not to be confused with Donald's
> > package of the same name - and which is strict in its interpretation of
> > those earlier PEPs).
> >
> > The METADATA format (key-value) is not really flexible enough for certain
> > things which were in PEP 426 (e.g. dependency descriptions), and for
> > these JSON seems a reasonable fit.
> The current de facto standard set by setuptools and bdist_wheel is:
> - dist-info/METADATA as defined at
> https://packaging.python.org/specifications/#package-distribution-metadata
> - dist-info/requires.txt runtime dependencies as defined at
> http://setuptools.readthedocs.io/en/latest/formats.html#requires-txt
> - dist-info/setup_requires.txt build time dependencies as defined at
> http://setuptools.readthedocs.io/en/latest/formats.html#setup-requires-txt
> The dependency fields in METADATA itself unfortunately aren't really
> useful for anything.
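For reference, that METADATA file is a key-value format that the stdlib
email parser already handles; a minimal sketch (the sample field values
below are invented):

```python
# Sketch: parsing dist-info/METADATA with the stdlib email parser.
# METADATA is an RFC 822 style key-value format, so header parsing
# works; multi-use keys such as Requires-Dist come back via get_all().
from email.parser import Parser

SAMPLE = """\
Metadata-Version: 2.0
Name: example-dist
Version: 1.0.0
Provides-Extra: test
Requires-Dist: requests (>=2.0)
Requires-Dist: pytest; extra == 'test'
"""

def parse_metadata(text):
    msg = Parser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_dist": msg.get_all("Requires-Dist") or [],
        "provides_extra": msg.get_all("Provides-Extra") or [],
    }

meta = parse_metadata(SAMPLE)
```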

Dependency metadata describes a graph: packages are nodes, requirements
are edges.

> There's definitely still a place for a pydist.json created by going
> through PEP 426, comparing it to what bdist_wheel already does to
> populate metadata.json, and either changing the PEP to match the
> existing practice, or else agreeing that we prefer what the PEP
> recommends, that we want to move in that direction, and that there's a
> definite commitment to implement the changes in at least setuptools
> and bdist_wheel (plus a migration strategy that allows for reasonably
> sensible consumption of old metadata).

Which function reads metadata.json?
Which function reads pydist.json?

> Such an update would necessarily be a fairly ruthless process, where
> we defer everything that can possibly be deferred. I already made one
> pass at that when I split out the metadata extensions into PEP 459,
> but at least one more such pass is needed before we can sign off on
> the spec as metadata 2.0 - even beyond any "open for discussion"
> questions, there are still things in there which were extracted and
> standardised separately in PEP 508.
> > There's no technical reason why "the JSON thing didn't work out", as far
> > as I can see - it was just given up on for a more incremental approach
> > (which has got no new PEPs other than 440, AFAICT).
> Yep, it's a logistical problem rather than a technical problem per se
> - new metadata formats need software publisher adoption to ensure the
> design is sensible before we commit to them long term, but software
> publishers are understandably reluctant to rely on new formats that
> limit their target audience to folks running the latest versions of
> the installation tools (outside constrained cases where the software
> publisher is also the main consumer of that software).

An RDFS Vocabulary contains Classes and Properties with rdfs:range and
rdfs:domain constraints.

There are many representations for RDF: RDF/XML, Turtle/N3, JSONLD.

RDF is implementation-neutral. JSONLD is implementation-neutral.

> For PEP 440 (version specifiers) and PEP 508 (dependency specifiers),
> this was handled by focusing on documenting practices that people
> already used (and checking existing PyPI projects for compatibility),
> rather than trying to actively change those practices.
> For pyproject.toml (e.g. enscons), the idea is to provide a setup.py
> shim that can take care of bootstrapping the new approach for the
> benefit of older tools that assume the use of setup.py (similar to
> what was done with setup.cfg and d2to1).
> The equivalent for PEP 426 would probably be legacy-to-pydist and
> pydist-to-legacy converters that setuptools, bdist_wheel and other
> publishing tools can use to ship legacy metadata alongside the
> standardised format (and I believe Daniel already has at least the
> former in order to generate metadata.json in bdist_wheel). With PEP
> 426 as currently written, a pydist-to-legacy converter isn't really
> feasible, since pydist proposes new concepts that can't be readily
> represented in the old format.

pydist-to-legacy would be a lossy transformation.

> > I understand that social reasons are often more important than technical
> > reasons when it comes to the success or failure of an approach; I'm just
> > not sure that in this case, it wasn't given up on too early.
> I think of PEP 426 as "deferred indefinitely pending specific
> practical problems to provide clearer design constraints" rather than
> abandoned :)

Is it too late to request lowercased property names without dashes?
If we're (I'm?) going to create @context URIs, compare:


"@context": {
    "default": "https://schema.python.org/#",
    "schema": "http://schema.org/",
    # "name": "http://schema.org/name",
    # "url": "http://schema.org/url",
    # "verstr":
    # "extra":
    # "requirements"
    # "requirementstr"
},
"@typeof": ["py:PythonPackage"],
"name": "IPython",
"url": ["https://pypi.python.org/pypi/IPython",
        "https://pypi.org/project/"],
"Provides-Extra": [
    {"@typeof": "Requirement",
     "name": "notebook",
     "extra": ["notebook"],
     "requirements": [],  # TODO
     "requirementstr": "extra == 'notebook'"},
    {"name": "numpy",
     "extra": ["test"],
     "requirements": [],  # TODO
     "requirementstr": "python_version >= \"3.4\" and extra == 'test'"}
]

> There are two recent developments that I think may provide those
> missing design constraints and hence motivation to finalise a metadata
> 2.0 specification:
> 1. the wheel-to-egg support in humpty (and hence zc.buildout). That
> makes humpty a concrete non-traditional installer that would benefit
> from both a modernised standard metadata format, as well as common
> tools both to convert legacy metadata to the agreed modern format and
> to convert the modern format back to the legacy format for inclusion
> in the generated egg files (as then humpty could just re-use the
> shared tools, rather than having to maintain those capabilities
> itself).

class PackageMetadata:
    def __init__(self):
        self.data = collections.OrderedDict()

    def read_legacy(self, path): ...
    def read_metadata_json(self, path): ...
    def read_pydist_json(self, path): ...
    def read_pyproject_toml(self, path): ...
    def read_jsonld(self, path): ...

    def to_legacy(self): ...
    def to_metadata_json(self): ...
    def to_pydist_json(self): ...
    def to_pyproject_toml(self): ...
    def to_jsonld(self): ...

    @classmethod
    def from_jsonld(cls, *args, **kwargs):
        obj = cls()
        obj.read_jsonld(*args, **kwargs)
        return obj

    # ... plus matching from_legacy(), from_metadata_json(),
    # from_pydist_json() and from_pyproject_toml() constructors -
    # or a single from_path(cls, path, ...) classmethod that
    # dispatches on the filename

... for maximum reusability, we really shouldn't need an adapter registry

> 2. the new pipenv project to provide a simpler alternative to the
> pip+virtualenv+pip-tools combination for environment management in web
> service development (and similar layered application architectures).
> As with the "install vs setup" split in setuptools, pipenv settled on
> an "only two kinds of requirement (deployment and development)" model
> for usability reasons, but it also distinguishes abstract dependencies
> stored in Pipfile from pinned concrete dependencies stored in
> Pipfile.lock.

Does the Pipfile/Pipfile.lock distinction overlap with 'integrates' as a
replacement for meta_requires?

> If we put those together with the existing interest in automating
> generation of policy compliant operating system distribution packages,

Downstream OS packaging could easily (and without needing permission)
include extra attributes (properties specified with full URIs) in JSONLD
metadata.
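For illustration, that might look like this (shown as a Python dict; the
downstream example-distro.org namespace URI is invented):

```python
# Hypothetical JSONLD metadata where a downstream packager adds its own
# fully-qualified property without upstream coordination. The
# example-distro.org namespace is invented for illustration.
package = {
    "@context": {"py": "https://schema.python.org/#"},
    "@type": "py:PythonPackage",
    "name": "example-dist",
    # Downstream-added property, namespaced by a full URI:
    "https://example-distro.org/ns#srpm-name": "python-example-dist",
}
```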

> it makes it easier to go through the proposed semantic dependency
> model in PEP 426 and ask "How would we populate these fields based on
> the metadata that projects *already* publish?".

See 'class PackageMetadata'

> - "run requires": straightforward, as these are the standard
> dependencies used in most projects. Not entirely clear how to gently
> (or strongly!) discourage dependency pinning when publishing to PyPI
> (although the Pipfile and Pipfile.lock model used in pipenv may help
> with this)
> - "meta requires": not clear at all, as this was added to handle cases
> like PyObjC, where the main package is just a metapackage that makes a
> particular set of versioned subpackages easy to install. This may be
> better modeled as a separate "integrates" field, using a declaration
> syntax more akin to that used for Pipfile.lock rather than that used
> for normal requirements declarations.
> - "dev requires": corresponds to "dev-packages" in pipenv
> - "build requires": corresponds to "setup_requires" in setuptools,
> "build-system.requires" + any dynamic build dependencies in PEP 518
> - "test requires": corresponds to "test" extra in
> https://packaging.python.org/specifications/#provides-extra-multiple-use
> The "doc" extra in
> https://packaging.python.org/specifications/#provides-extra-multiple-use
> would map to "build requires", but there's potential benefit to
> redistributors in separating it out, as we often split the docs out
> from the built software components (since there's little reason to
> install documentation on headless servers that are only going to be
> debugged remotely).
> The main argument against "test requires" and "doc requires" is that
> the extras system already works fine for those - "pip install
> MyProject[test]" and "pip install MyProject[doc]" are both already
> supported, so metadata 2.0 just needs to continue to reserve those as
> semantically significant extras names.
> "dev" requires could be handled the same way - anything you actually
> need to *build* an sdist or wheel archive from a source repository
> should be in "setup_requires" (setuptools) or "build-system.requires"
> (pyproject.toml), so "dev" would just be a conventional extra name
> rather than a top level field.
> That just leaves "build_requires", which turns out to interact
> awkwardly with the "extras" system: if you write "pip install
> MyProject[test]", does it install all the "test" dependencies,
> regardless of whether they're listed in run_requires or
> build_requires?
> If yes: then why are run_requires and build_requires separate?
> If no: then how do you request installation of the "test" build extra?
> Or are build extras prohibited entirely?
> That suggests that perhaps "build" should just be a conventional extra
> as well, and considered orthogonal to the other conventional extras.
> (I'm sure this idea has been suggested before, but I don't recall who
> suggested it or when)
> And if build, test, doc, and dev are all handled as extras, then the
> top level name "run_requires" no longer makes sense, and the field
> name should go back to just being "requires".

> Under that evaluation, we'd be left with only the following top level
> fields defined for dependency declarations:
> - "requires": list where entries are either a string containing a PEP
> 508 dependency specifier or else a hash map contain a "requires" key
> plus "extra" or "environment" fields as qualifiers


> - "integrates": replacement for "meta_requires" that only allows
> pinned dependencies (i.e. hash maps with "name" & "version" fields, or
> direct URL references, rather than a general PEP 508 specifier as a
> string)
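To make that split concrete, here is a hypothetical pydist.json fragment
(as a Python dict); the field names follow Nick's description above, but
the schema is not finalised and the package names/versions are invented:

```python
# Hypothetical pydist.json fragment illustrating the proposed split:
# "requires" entries are PEP 508 strings or hash maps with qualifiers,
# while "integrates" entries are pinned name/version maps.
pydist = {
    "requires": [
        "requests >= 2.0",
        {"requires": ["pytest"], "extra": "test"},
        {"requires": ["pywin32"], "environment": "sys_platform == 'win32'"},
    ],
    "integrates": [
        {"name": "pyobjc-core", "version": "3.2.1"},
    ],
}
```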


What happens here when something is listed in both requires and integrates?

Where (and how) would these get merged on the "name" attr as a key, given
a presumed namespace URI prefix (https://pypi.org/project/)?

> For converting old metadata, any concrete dependencies that are
> compatible with the "integrates" field format would be mapped that
> way, while everything else would be converted to "requires" entries.

What heuristic would help identify compatibility with the integrates field?
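One candidate heuristic (my assumption, not something from the PEP): treat
a legacy requirement as integrates-compatible only when it pins exactly
one version with "==":

```python
# Assumed heuristic: a legacy requirement maps to "integrates" only if
# it pins a single exact version with "=="; everything else stays in
# "requires". (Deliberately ignores URL references and "===" pins.)
import re

PINNED = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([A-Za-z0-9.*!+_-]+)\s*$")

def classify(requirement):
    match = PINNED.match(requirement)
    if match:
        return "integrates", {"name": match.group(1),
                              "version": match.group(2)}
    return "requires", requirement
```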

> The semantic differences between normal runtime dependencies and
> "dev", "test", "doc" and "build" requirements would be handled as
> extras, regardless of whether you were using the old metadata format
> or the new one.

+1 from me.

I can't recall whether I've used {"dev", "test", "doc", "build"} as
extras names in the past, though I can remember thinking "wouldn't it be
more intuitive to do it [that way]?"

Is this backward compatible? Extras still work as extras?

> Going the other direction would be similarly straightforward since
> (excluding extensions) the set of required conceptual entities has
> been reduced back to the set that already exists in the current
> metadata formats. While "requires" and "integrates" would be distinct
> fields in pydist.json, the decomposed fields in the latter would map
> back to their string-based counterparts in PEP 508 when converted to
> the legacy metadata formats.
> Cheers,
> Nick.
> P.S. I'm definitely open to a PR that amends the PEP 426 draft along
> these lines. I'll get to it eventually myself, but there are some
> other things I see as higher priority for my open source time at the
> moment (specifically the C locale handling behaviour of Python 3.6 in
> Fedora 26 and the related upstream proposal for Python 3.7 in PEP 538)

I need to find a job; my time commitment here is inconsistent.
I'm working on a project (nbmeta) for generating, displaying, and embedding
RDFa and JSONLD in Jupyter notebooks (w/ _repr_html_() and an OrderedDict)
which should refresh the JSONLD @context-writing skills necessary to define
the RDFS vocabulary we could/should have at https://schema.python.org/ .

- [ ] JSONLD PEP (<- PEP426)
  - [ ] examples / test cases
    - I've referenced IPython as an example package; are there other hard
test cases for python packaging metadata conversion? (i.e. one that uses
every feature of each metadata format)?
  - [ ] JSONLD @context
  - [ ] class PackageMetadata
  - [ ] wheel: (additionally) generate JSONLD metadata
  - [ ] schema.python.org: master, gh-pages (or e.g. "

- [ ] warehouse: add a ./jsonld view (to legacy?)


> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> https://mail.python.org/mailman/listinfo/distutils-sig