[Distutils] PEP 426: proposed metadata caching convention

Daniel Holth dholth at gmail.com
Thu Feb 28 15:00:53 CET 2013


On Thu, Feb 28, 2013 at 12:54 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Thu, Feb 28, 2013 at 7:59 AM, Daniel Holth <dholth at gmail.com> wrote:
>> My aim is to provide a hook mechanism that specifically does not say
>> anything about the way the cache is stored or even whether the hook
>> produces a cache at all. It will just run when pip is done.
>
> How does the following idea sound?
>
> New metadata field: "Post-Install"
> Format: a *single* callable reference in entry-points format (i.e.
> "module.name:callable.name")
> Call signature:
>
>     def post_install_hook(metadata, extras, previous_version=None):
>         ...
>
> "extras" would be a tuple indicating which extras were installed.
>
> For an upgrade, "previous_version" would be set to the version that
> was previously installed. For a clean installation, it would either be
> None or omitted entirely.
>
> The "metadata" argument would be the PEP 426 metadata, reformatted as
> JSON-compatible structured metadata. I had planned to postpone
> defining the algorithm for that conversion until after PEP 426
> acceptance, but if we're going to add a post-install hook mechanism to
> PEP 426, I think it makes more sense to define it up front:
>
> 1. The top level is a mapping, with lowercase versions of all PEP 426
> fields as keys. All multiple-use fields other than "requires-python"
> are pluralised (that one is only multiple use so you can depend on a
> different version of Python given different environment markers - for
> example, supporting Python 2.6 everywhere, but requiring Python 2.7 on
> Windows. Aside from those cases, you can collapse an arbitrarily
> complex version specifier down to a single line)
> 3. Every mandatory field is present, with a string value
> 4. If present, the "keywords" field references a list of keywords
> (created via str.split)
> 5. If present, the description is always stored under the
> "description" key, even if provided in the PEP 426 metadata payload
> 6. If any other optional field is present, it references a string value
> 7. If present, the "project-urls" key references a mapping of labels to URLs.
> 8. If present, the "extensions" key references a mapping of extension
> names to the extension's embedded JSON metadata. (Note: this is the
> key reason for my planned change to the extension format from
> arbitrary subfields to allowing only a single "json" subfield - it
> greatly simplifies this aspect of the translation to structured
> metadata, *and* makes it more flexible and powerful at the same time)
> 9. For any multi-use field that is present and supports environment
> markers, it is a reference to a mapping where each key is a
> whitespace-normalized (i.e. every sequence of whitespace converted to
> a single space) environment marker string that references a list of
> string values. The unqualified fields are referenced by the string
> "always". This breakdown allows each unique environment marker to be
> evaluated only once to determine whether or not it is applicable,
> regardless of how many times it was originally used.
> 10. If any other multi-use field is present, it references a list of
> string values.
>
> For example:
>
>     Metadata-Version: 2.0
>     Name: BeagleVote
>     Version: 1.0a2
>     Summary: A module for collecting votes from beagles.
>     Keywords: dog puppy voting election
>     Project-URL: Bug, Issue Tracker, http://bitbucket.org/tarek/distribute/issues/
>     Requires-Dist: pkginfo
>     Requires-Dist: PasteDeploy
>     Requires-Dist: zope.interface (>3.5.0)
>     Extension: Chili
>     Chili/json: {
>         "Type": "Poblano",
>         "Heat": "Mild"
>     }
>
>    Apparently, these beagles like their chili. (This is not a helpful
> description)
>
> Would become:
>
>     {
>         "metadata-version": "2.0",
>         "name": "BeagleVote",
>         "version": "1.0a2",
>         "summary": "A module for collecting votes from beagles.",
>         "description": "Apparently, these beagles like their chili. (This is not a helpful description)",
>         "keywords": ["dog", "puppy", "voting", "election"],
>         "project-urls": {
>             "Bug, Issue Tracker": "http://bitbucket.org/tarek/distribute/issues/"
>         },
>         "requires-dists": {
>             "always": ["pkginfo", "PasteDeploy", "zope.interface (>3.5.0)"]
>         },
>         "extensions": {
>             "Chili": {
>                 "Type": "Poblano",
>                 "Heat": "Mild"
>             }
>         }
>     }
>
> An apparently simpler alternative would be to rely on PEP 376 to
> retrieve the full metadata and only provide the distribution name and
> version to the hook:
>
>     def post_install_hook(distname, current_version, previous_version=None):
>         ...
>
> The key disadvantage of that seemingly simpler approach is it *only*
> works for post install and pre uninstall hooks, *and* requires that
> the post-install hook have the tools needed to read the PEP 376
> metadata. If we later want to add pre-install, build or archiving
> hooks, they would need the structured metadata format anyway, as
> relying on PEP 376 isn't an option for software that hasn't been
> installed yet. This "simpler" alternative also won't work for
> eventually decoupling the installation database from a particular
> filesystem layout (e.g. adding metadata support to import hooks or
> tunnelling the metadata through TUF).
>
> A third alternative would be to defer the task of defining the build
> hook signatures and the metadata conversion to a separate metadata
> extension (e.g. as is going to happen for entry points). I don't think
> that's appropriate - the metabuild system will be the way that the
> distribution ecosystem evolves in the future, so it makes more sense
> to me to use the core metadata standard to define it. If a particular
> installer doesn't understand a given extension, that's not supposed to
> matter, whereas ignoring the post-install hook would be a *big*
> problem. I did consider proposing the concept of "required extensions"
> instead, but that really runs counter to the idea of allowing end
> users to use whichever standards compliant installer they prefer.
>
> However, extensions *would* be a perfect way for installers like pip
> to experiment with additional build hooks (e.g. bypassing setup.py for
> wheel creation), based on the general style of interface I am
> proposing for the post-install hook.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

We will probably wind up with some JSON very much like that. I like
just exposing it as an ordered multidict with the same key names as
mentioned in the PEP. IMO the environment marker for "always" is just
"" (empty string).
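
To make the marker grouping concrete, here is a rough sketch of how
multi-use values could be bucketed by whitespace-normalized environment
marker, using "" for the unqualified entries as suggested above. The
function name group_by_marker, the always_key parameter, and the two
win32-only requirements are made up for illustration; nothing here is
specified by the PEP.

```python
import re
from collections import OrderedDict


def group_by_marker(values, always_key=""):
    """Group 'value; marker' strings by whitespace-normalized marker.

    Hypothetical helper: entries without a marker go under always_key.
    """
    grouped = OrderedDict()
    for raw in values:
        value, _, marker = raw.partition(";")
        # Collapse every run of whitespace in the marker to a single space.
        marker = re.sub(r"\s+", " ", marker).strip()
        key = marker if marker else always_key
        grouped.setdefault(key, []).append(value.strip())
    return grouped


# The last two requirements are invented to show marker normalization.
requires = [
    "pkginfo",
    "PasteDeploy",
    "zope.interface (>3.5.0)",
    "pywin32; sys.platform   ==  'win32'",
    "wmi;  sys.platform == 'win32'",
]
structured = group_by_marker(requires)
```

Both win32 entries end up under the single key "sys.platform == 'win32'",
so an installer would only need to evaluate that marker once. Passing
always_key="always" gives the spelling from the quoted proposal instead.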

My hook would be a literal Entry-Point. You would install a package
"twisted.plugins" that would register its interest in installation
changes by declaring the entry point "[packaging.hooks]
post_install=twisted.plugins:hook". Afterwards, every time you install
or uninstall another package, twisted.plugins.hook() would be called.
It would iterate over all installed distributions using some API like
pkg_resources.working_set or distlib's database and do whatever it
needed to do. It could be called once per pip invocation instead of
once per individual package.
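
As a minimal sketch of what such a hook body might look like: it takes
no arguments from the installer and simply rebuilds its cache from
whatever is installed. This uses the standard library's
importlib.metadata as a stand-in for pkg_resources.working_set or
distlib's database; the optional distributions parameter and the
StubDist class are assumptions added only so the example can be
exercised without touching a real environment.

```python
from importlib import metadata


def hook(distributions=None):
    """Hypothetical hook body: rebuild a name -> version cache from scratch."""
    if distributions is None:
        # Assumption: the installer passes no arguments, so the hook
        # discovers the installed distributions by itself.
        distributions = metadata.distributions()
    cache = {}
    for dist in distributions:
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if not name:
            continue  # skip broken or partially installed distributions
        cache[name.lower()] = meta.get("Version")
    return cache


class StubDist:
    """Stand-in for an installed distribution, for illustration only."""

    def __init__(self, name, version):
        self.metadata = {"Name": name, "Version": version}


example_cache = hook([StubDist("BeagleVote", "1.0a2"), StubDist(None, "0")])
```

Because the hook recomputes everything from the installed set, it does
not matter whether it runs once per package or once per pip invocation.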

The hook is not guaranteed to run. If you do not run the hook, you
should expect Twisted's plugin discovery process to take longer just
like it does today. In fact the packages available on sys.path are not
guaranteed to "have been installed" at all.

For comparison, in the wheel patch we call
pkg_resources.find_distributions(location) against the per-dist
temporary location pip uses for builds. The call yields the one dist
we are considering as a Distribution() object and then it's easy to
get the requirements.

https://github.com/pypa/pip/blob/wheel/pip/req.py#L1078

It could turn into a very long discussion but I think import hooks
have to grow a public listdir() someday...
http://hg.python.org/cpython/file/2.7/Lib/pkgutil.py#l331 shows that
the current method is to use the less-than-ideal API of
zipimport._zip_directory_cache
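
For illustration, the closest thing to a public listing API today is
pkgutil.iter_modules, which for zip archives is the wrapper around the
private cache shown in the linked source. The sketch below builds a
throwaway archive and lists its top-level modules; the archive and
module names are invented for the example, and list_zip_modules is a
hypothetical helper, not an existing API.

```python
import os
import pkgutil
import tempfile
import zipfile


def list_zip_modules(zip_path):
    """Return {module_name: is_package} for the top level of a zip archive."""
    return {info.name: info.ispkg for info in pkgutil.iter_modules([zip_path])}


# Build a small throwaway archive (file names are made up for the example).
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "plugins.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("mod_a.py", "VALUE = 1\n")
    zf.writestr("pkg/__init__.py", "")

found = list_zip_modules(archive)
```

This only enumerates importable modules, not arbitrary files, which is
part of why a real listdir() on import hooks would still be useful.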

