[Distutils] API for finding plugins
Phillip J. Eby
pje at telecommunity.com
Wed Feb 8 06:36:25 CET 2006
At 09:40 PM 2/7/2006 -0600, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>I'm assuming here that the problem is needing to import each command to
>>get its description and display it?
>
>Oh, yes, that too ;) That probably is the bigger problem, and
>inevitable. That doesn't happen except with help. So maybe I am worried
>about nothing.
Probably. ;) A fix, however, would be to change your entry point names to
include a description. Currently, entry point names can contain any
characters you like besides '='. (And leading/trailing spaces are skipped.)
This means that you can define entry points like this, as long as you do
the name parsing in your own tools:
    commit (ci,checkin) - Commit the current version = some.module:commit
So, this entry point would have a name of "commit (ci,checkin) - Commit the
current version", and you can then parse it to get what you need. I plan
to use this format for "nest" command entry points in 0.7, with the
parenthesized names being used to identify aliases that refer to the same
command. This will let a help command operate without needing to import
anything, and in particular it means that any extras required for the
command won't have to be downloaded and installed unless you try to do
something with it more directly.
The downside to this approach is that you can't just look up a command name
directly; you have to iterate over all entry points in the command group
until you find a match. However, this search would take place anyway (it's
just hidden under the API), so the performance is the same apart from the
parsing overhead.
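A minimal sketch of the name parsing and linear lookup described above. The
regular expression and the function names parse_entry_point_name and
find_command are hypothetical, invented for illustration; they are not part of
pkg_resources, and the input is assumed to be a plain list of entry point name
strings for one group.

```python
import re

# Hypothetical pattern for names such as
# "commit (ci,checkin) - Commit the current version".
_NAME_RE = re.compile(
    r"^\s*(?P<command>[^\s(]+)"          # primary command name
    r"(?:\s*\((?P<aliases>[^)]*)\))?"    # optional "(alias1,alias2)"
    r"(?:\s+-\s+(?P<description>.*))?$"  # optional " - description"
)


def parse_entry_point_name(name):
    """Split an entry point name into (command, aliases, description)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError("unparseable entry point name: %r" % (name,))
    raw_aliases = match.group("aliases") or ""
    aliases = [a.strip() for a in raw_aliases.split(",") if a.strip()]
    description = (match.group("description") or "").strip()
    return match.group("command"), aliases, description


def find_command(entry_point_names, wanted):
    """Scan a group's entry point names for a command name or alias.

    Returns the full entry point name, or None if nothing matches.
    Nothing is imported; only the names are inspected.
    """
    for name in entry_point_names:
        command, aliases, _description = parse_entry_point_name(name)
        if wanted == command or wanted in aliases:
            return name
    return None
```

Under these assumptions, a help command could list every command with its
description by calling parse_entry_point_name on each name in the group, and
"ci" would resolve to the same entry point as "commit".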
>>I'm not aware of any existing standards, either, but I'm thinking that
>>what's needed is more of an API for resource retrieval keyed on some set
>>of simple values or wildcards, with some way to aggregate search results
>>from multiple sources (so that e.g. databases, eggs, and directory trees
>>can all "offer" resources on demand).
>>I'm thinking that you would call this API (at the low level) via
>>something like:
>>    my_page_source = resource_set.find_resource(
>>        resource=['my_page'], for_project=['MyProject'],
>>        locale=['en','de'], layer=['some_layer', 'other_layer'],
>>    ).as_string()
>
>Are the arguments arbitrary? I.e., can I add my own? Like
>domain=['blog.ianbicking.org', 'ianbicking.org'], or user=['ianb', None]?
Yeah, that's the idea.
>Are resources typed in any way? Similar to entry point groups...?
I think resources should only provide access to string/stream/filename (a la
pkg_resources resources) and their metadata attributes (like locale, layer,
etc.). If you want to have more elaborate typing, you can simply use
another attribute to define it. For example, a content_type attribute or
an attribute that says what entry point to use to adapt the resource to
some interface.
>>Of course, this would in most cases be wrapped by some higher-level API
>>that eliminates most of the parameters from needing to be specified (e.g.
>>by a framework that knows what locales and skin layers are in effect and
>>what project the requesting code is calling from). For performance, you
>>could extract subset resource sets and use them instead of querying a
>>top-level resource set.
>
>I often find myself wanting to just override one little bit. Subset
>resources would potentially break that, unless they are a subset that is
>resolved all at once instead of a subset that has to provide all the
>necessary resources. At least if you are describing what I think you are
>describing.
I'm not sure I follow you. If it's something you "override", then you'd
have to leave it out of your subset criteria. What I'm describing is the
ability to have a subset snapshot for performance reasons, not simply a
restricted view over a larger set. (Although that also sounds like a
useful thing to have.)
Mainly, my concerns about this approach are that, without tuning or hinting
for a particular access profile, it's going to be tough to have a fast data
structure that's also memory-efficient. Creating indexes on all attributes
means a space consumption of roughly one dictionary or set per unique
attribute value. That is, every relatively-unique key consumes a
dictionary of its own, consuming hundreds of bytes. Every resource will
have at least 1 relatively-unique attribute value, namely its ID.
On the plus side, even as the total number of resources grows due to
variants, the raw overhead for the dictionaries should remain the same,
since each new language or layer will only add one new unique key (the
language or layer). So, it's probably not as bad to just index everything
as I'm worrying it would be.
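To make the space argument concrete, here is a toy "index everything" structure
with one set of resource IDs per (attribute, value) pair. The class
ResourceIndex and its methods are hypothetical, made up for this sketch, and
resources are assumed to be identified by an arbitrary hashable ID.

```python
from collections import defaultdict


class ResourceIndex:
    """Hypothetical in-memory index: one set per (attribute, value) pair."""

    def __init__(self):
        self._index = defaultdict(set)  # (attr, value) -> {resource_id, ...}
        self._attrs = {}                # resource_id -> attribute dict

    def add(self, resource_id, **attrs):
        self._attrs[resource_id] = attrs
        for item in attrs.items():
            self._index[item].add(resource_id)

    def key_count(self):
        """Number of per-value sets, i.e. the fixed overhead being discussed."""
        return len(self._index)

    def find(self, **criteria):
        """Return IDs matching every criterion.

        Each criterion is a list of acceptable values for that attribute.
        """
        result = None
        for attr, values in criteria.items():
            matched = set()
            for value in values:
                matched |= self._index.get((attr, value), set())
            result = matched if result is None else result & matched
        return set(self._attrs) if result is None else result
```

In this sketch, four variants of one page (two locales times two layers)
produce six sets, and adding a third locale for both layers adds only one
more set, which is the growth pattern described above.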
Efficiently handling search precedence across multiple resource providers
is also an interesting problem. You really want the result precedence to
be based on stuff like the locale and layers, *not* on which provider found
the data. This means that searches like the example I gave either have to
be broken down into a series of single-value searches done in sequence,
each one executed in "parallel" against all backends, or there has to be a
kind of sort-merge done against results yielded by the backends to ensure
that the "best" results are yielded first.
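The sort-merge option could look roughly like the following, assuming each
backend already yields (rank, resource) pairs in ascending rank order, where a
lower rank means a better match. The function name merge_ranked and the rank
tuples are assumptions for illustration; heapq.merge is the standard library's
lazy merge of pre-sorted iterables.

```python
import heapq


def merge_ranked(*backend_results):
    """Lazily merge per-backend streams of (rank, resource) pairs.

    Each input must already be sorted by rank (lower is better). Results
    come out best-first regardless of which backend produced them; ties
    keep the order in which the backends were passed in.
    """
    merged = heapq.merge(*backend_results, key=lambda pair: pair[0])
    for _rank, resource in merged:
        yield resource
```

For example, if an egg backend yields ranks (0, 0) and (1, 1) and a database
backend yields rank (0, 1), the database result is yielded between the two egg
results rather than after them.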
Probably the best thing to do is going to be to require searches to be
prioritized on input, e.g.:
    find_resource(
        ('resource', ['my_page']),
        ('for_project', ['MyProject']),
        ('layer', ['some_layer','other_layer']),
        ('locale', ['en','de']),
        ...
    )
The above is saying, "first look for resources named my_page that are for
MyProject, and of those you find, give precedence to 'some_layer' over
'other_layer' ones. And within those, give precedence to locales of 'en'
then 'de'."
This approach has the benefit of allowing entire backends to be excluded
early from the search, since it doesn't matter what layers or locales an
egg has resources for if it doesn't have 'my_page' for 'MyProject'.
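A rough sketch of how prioritized criteria might drive both ranking and early
backend exclusion. Everything here is hypothetical: backends are modeled as
plain lists of attribute dictionaries, and rank and find_resources are
illustrative names, not an existing API. A real backend would presumably answer
the exclusion check from an index instead of scanning.

```python
def rank(attrs, criteria):
    """Return a sort key for one resource, or None if a criterion is unmet.

    The key holds, for each criterion in priority order, the position of
    the resource's value in that criterion's list of acceptable values.
    """
    key = []
    for name, wanted in criteria:
        try:
            key.append(wanted.index(attrs.get(name)))
        except ValueError:
            return None
    return tuple(key)


def find_resources(backends, *criteria):
    """Return matching resources from all backends, best match first."""
    first_name, first_values = criteria[0]
    hits = []
    for backend in backends:
        # Skip the whole backend if nothing in it satisfies the
        # highest-priority criterion.
        if not any(r.get(first_name) in first_values for r in backend):
            continue
        for attrs in backend:
            key = rank(attrs, criteria)
            if key is not None:
                hits.append((key, attrs))
    # Precedence comes from the criteria order, not from the backend order.
    hits.sort(key=lambda hit: hit[0])
    return [attrs for _key, attrs in hits]
```

Because the sort key is built in criteria order, listing 'layer' before
'locale' makes a 'some_layer' resource in 'de' outrank an 'other_layer'
resource in 'en', whichever backend each one came from.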