[Distutils] Changing the "install hooks" mechanism for PEP 426

PJ Eby pje at telecommunity.com
Thu Aug 15 17:39:16 CEST 2013


On Thu, Aug 15, 2013 at 9:21 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> On 15 Aug 2013 00:39, "Vinay Sajip" <vinay_sajip at yahoo.co.uk> wrote:
>>
>> PJ Eby <pje <at> telecommunity.com> writes:
>>
>> > The build system *should* reserve at least one (subdivisible)
>> > namespace for itself, and use that mechanism for its own extension,
>>
>> +1 - dog-food :-)
>
> Sounds fair - let's use "pydist", since we want these definitions to be
> somewhat independent of their reference implementation in distlib :)

I think that as part of the spec, we should either reserve multiple
prefixes for Python/stdlib use, or have a single, always-reserved
top-level prefix like 'py.' that can be subdivided in the future.
Extensions are a honking great idea, so the stdlib will probably do
more of them in the future.  Likewise, future standards and
informational PEPs will likely document specific extension protocols
of general and specialized interest.  (Notice, for example, that
extensions could be used to publicize what database drivers are
installed and available on a system.)
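
As a purely illustrative sketch (the group name, driver name, and
JSON layout below are all invented), a driver package might publish
something like:

    "extensions": {
        "database.drivers": {
            "postgresql": "somedriver.api:connect"
        }
    }

and a consumer could then enumerate the available drivers just by
scanning that group across the installed distributions.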


> Based on PJE's feedback, I'm also starting to think that the
> exports/extensions split is artificial and we should drop it. Instead, there
> should be a "validate" export hook that build tools can call to check for
> export validity, and the contents of an export group be permitted to be
> arbitrary JSON.

I think there is still something to be said for STASCTAP: simple
things are simple, complex things are possible.  (Also, flat is better
than nested.)  So I would suggest that an export can either be an
import identifier string, *or* a JSON object with arbitrary contents.

That would make it easier, I think, both to implement a full-featured
replacement for the setuptools entry point API and to keep simple
extensions simple.  It also means that simple exports can be defined
with a flatter syntax (a la setuptools' ini format) in tools that
generate the JSON.
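
To make that concrete (the group, names, and exact JSON shape below
are just a sketch, nothing specified), a tool could accept the flat
form:

    [someapp.plugins]
    fancy_widget = someapp_fancy.widgets:FancyWidget

and emit the corresponding metadata:

    "extensions": {
        "someapp.plugins": {
            "fancy_widget": "someapp_fancy.widgets:FancyWidget"
        }
    }

while a more complex extension could just as well carry an arbitrary
JSON object as its value.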

Given how many use cases are already met today by providing
import-based exports, ISTM that they are the 20% that provides 80% of
the value; arbitrary JSON is the 80% that only provides 20%, and so
should not be the entry point (no pun intended) for people dealing
with extensions.

Removing the extension/export split also raises a somewhat different
question, which is what to *call* them.  I'm sort of leaning towards
"extensions" as the general category, with "exports" being extensions
that consist of an importable object, and "JSON extensions" for ones
that are a JSON mapping object.

So the terminology would be:

Extension group - package-like names, subdivisible as a namespace,
which should have a prefix associated with the project that defines
the semantics of the extension group; analogous to Eclipse's notion
of an "extension point"

Extension name - arbitrary string, unique per distribution for a given
group, but not required to be globally unique even for the group.
Specific names or specific syntax for names may be specified by the
creators of the group, and may optionally be validated.

Extension object - either an "export string" specifying an importable
object, or a JSON object.  If a string, it must be syntactically valid
as an export; it is not required to reference a module in the
distribution that exports it, though it *should* live in that
distribution or one of its dependencies.
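
To put those three terms in one made-up snippet (names and layout
invented purely for illustration):

    "extensions": {
        "examplecorp.renderers": {
            "html": "examplecorp.render:HTMLRenderer",
            "pdf": {"backend": "external", "command": "render-pdf"}
        }
    }

Here "examplecorp.renderers" is the extension group, "html" and "pdf"
are extension names, and the values are the extension objects: the
first is an export string, the second a plain JSON mapping.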

So, an extension is machine-usable metadata published by a
distribution in order to be (optionally) consumed by other
distributions.  It can be either static JSON metadata, or an
importable object.  The semantics of an extension are defined by its
group, and other extensions can be used to validate those semantics.
Any project that wants to be able to use plugins or extensions of some
kind, can define its own groups, and publish extensions for validating
them.  Python itself will reserve and define a group namespace for
extending the build and installation system, including a sub-namespace
where the validators can be declared.


> So we would have "pydist.commands" and "pydist.export_hooks" as export
> groups, with "distlib" used as an example of how to define handlers for
> them.

Is 'commands' for scripts, or something else?   Following "flat is
better than nested", I would suggest not using arbitrary JSON for
these when it's easy to define new dotted groups.  (Keeping to such a
style will make it easier for humans to define this stuff in the first
place, before it's turned into JSON.)

(Note, btw, that having more dots in a name does not necessarily equal
"nested", whereas replacing those dots with nested JSON structures
most definitely *is* "nested"!)

Similarly, I'd just as soon see e.g. pydist.hooks.* subgroups, rather
than a dedicated data structure.  A 'pydist.validators' group would of
course also be needed for syntax validation, with extension names in
that group possibly allowing trailing '*' or '**' wildcards.
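
Roughly what I have in mind (the subgroup, project, and callable names
below are invented; only the pydist.hooks.* and pydist.validators
group names are the actual suggestion):

    "extensions": {
        "pydist.hooks.post_install": {
            "myproject": "myproject._hooks:post_install"
        },
        "pydist.validators": {
            "pydist.hooks.*": "pydist_support.validation:check_export"
        }
    }

where the trailing '*' entry would claim validation of the
pydist.hooks subgroups (exact wildcard semantics to be worked out).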

(There will of course need to be a validation API, which is why I
think that a separate PEP for the "extensions" system is probably
going to be needed, followed by a PEP for the specific extensions used
by the build system.)
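
Purely as a strawman for what such a validator callable might look
like (the signature and names here are invented, not anything
specified):

    def validate(group, name, value):
        # Hypothetical hook: called by the build tool for each
        # extension in the groups this validator has claimed.
        # `value` is either an export string or the parsed JSON
        # object; raise ValueError to reject the extension.
        if not isinstance(value, str):
            raise ValueError(
                "%s: %r must be an export string" % (group, name))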


> Something else I'm wondering: should the metabuild system be separate, or is
> it just some more export hooks and you define the appropriate export group
> to say which build system to invoke?

It's just extensions, IMO.  What else *is* there?  You *could* define
a core metadata field that says, "this is the distribution I depend on
for building", and then look for the right extension there.  (Or you
could define a builder name, and then look that up in an extension
group, to do a sort of provides-requires approach.)  But the actual
build process can be purely extension-driven, ISTM.
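
As a rough sketch of that first variant (every name below is invented
purely for illustration), the distribution being built might declare
something like:

    "build_requires": [{"requires": ["somebuildtool"]}]

and somebuildtool's own metadata would then publish the actual hook:

    "extensions": {
        "pydist.builders": {
            "somebuildtool": "somebuildtool.api:build"
        }
    }

so the installer only has to look up the build dependency's entry in
that group and import it.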

