[Distutils] API for finding plugins

Wed Feb 8 19:07:47 CET 2006

At 10:40 AM 2/8/2006 -0600, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Probably the best thing to do is going to be to require searches to be 
>>prioritized on input, e.g.:
>>     find_resource(
>>        ('resource', ['my_page']),
>>        ('for_project', ['MyProject']),
>>        ('layer', ['some_layer','other_layer']),
>>        ('locale', ['en','de']),
>>     ...
>
>Thoughts about the return value:
>
>I think there should be a dictionary returned, that contains various metadata.

Actually, there should be a find_resources() that yields resources 
according to the precedence given, and find_resource() will simply return 
the first resource.
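
A minimal sketch of that split, assuming each resource is a plain mapping
from attribute name to a tuple of strings and that the catalog is an
in-memory list (both are illustrative assumptions; nothing here is an
existing API).  Earlier criteria, and earlier values within a criterion,
take precedence:

```python
def find_resources(resources, *criteria):
    """Yield matching resources, best match first.

    `criteria` are (attribute, [acceptable values]) pairs; earlier pairs
    and earlier values take precedence. Each resource is assumed to be a
    mapping from attribute name to a tuple of strings.
    """
    ranked = []
    for position, resource in enumerate(resources):
        rank = []
        for attribute, wanted in criteria:
            offered = resource.get(attribute, ())
            # Index of the most-preferred acceptable value this resource offers.
            best = next((i for i, v in enumerate(wanted) if v in offered), None)
            if best is None:
                break  # fails this criterion, so it is not a match at all
            rank.append(best)
        else:
            ranked.append((tuple(rank), position, resource))
    # Sort on (rank, position) only, so resources themselves are never compared.
    ranked.sort(key=lambda item: item[:2])
    for _rank, _position, resource in ranked:
        yield resource


def find_resource(resources, *criteria):
    """Return the first (highest-precedence) match, or None."""
    return next(find_resources(resources, *criteria), None)
```

Note that this sketch ranks every candidate before yielding the first one,
so it does nothing to avoid reading the whole catalog up front.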

>In this case it should contain at least {'resource': 'my_page',
>'for_project': 'MyProject', 'layer': 'other_layer', 'locale': 'en'}, to
>represent exactly what was found.

There would actually be a Resource object, with either a mapping or 
attribute interface to these things, as well as methods.  The attributes 
are going to be tuples of strings, however, since they can be multi-valued.
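
Roughly, and only as a sketch: the class below is hypothetical, and it
shows nothing beyond the two access styles and the tuple-of-strings
normalization.

```python
class Resource:
    """Hypothetical resource record with mapping and attribute access."""

    def __init__(self, **attributes):
        # Normalize every value to a tuple of strings, since attributes
        # can be multi-valued.
        self._attributes = {
            name: (value,) if isinstance(value, str) else tuple(value)
            for name, value in attributes.items()
        }

    def __getitem__(self, name):
        return self._attributes[name]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        attributes = self.__dict__.get('_attributes', {})
        if name in attributes:
            return attributes[name]
        raise AttributeError(name)

    def keys(self):
        return self._attributes.keys()


page = Resource(resource='my_page', layer='other_layer', locale=['en', 'de'])
```

Here page.locale and page['locale'] both give ('en', 'de'), and a
single-valued attribute such as page.layer is still a one-element tuple.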

>Other metadata can be useful, like the resource_location (the actual
>filename or other user-readable name).  Content-type is useful in many
>contexts -- for instance, you might want some kind of image, but then you
>need to know exactly what kind you got back.  If it gets a stream,
>knowing Content-length is useful as well.  Also, either encoding should
>be given (for text resources), or unicode returns should be allowed (I
>prefer unicode return values).  I think the resource container usually
>knows the encoding, not the entity requesting the resource.

Content type would probably need to be mime_major+mime_minor attributes, so 
you could request just 'image', without implementing more complex matches 
for searching.  Content length I don't see as a searchable attribute, but I 
can see having a method of some sort to query that, and the same is true 
for encoding.
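
To illustrate why two attributes are enough, here is a small sketch.  The
helper names are made up for this example; only the mime_major and
mime_minor attribute names come from the paragraph above.

```python
def split_content_type(content_type):
    """Split e.g. 'image/png' into searchable mime_major/mime_minor values."""
    # Drop any parameters such as '; charset=utf-8' before splitting.
    bare = content_type.split(';', 1)[0].strip().lower()
    major, _, minor = bare.partition('/')
    return {'mime_major': (major,), 'mime_minor': (minor,)}


def matches(attributes, **wanted):
    """True if every requested attribute value is offered by the resource."""
    return all(value in attributes.get(name, ())
               for name, value in wanted.items())


png = split_content_type('image/png')
any_image = matches(png, mime_major='image')
only_jpeg = matches(png, mime_major='image', mime_minor='jpeg')
```

A request for just 'image' is then an ordinary exact match on mime_major,
with no wildcard handling needed in the search itself.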

>That dictionary could also be a container for callables that produce 
>things like filenames and streams.  Or they could be returned in a 
>different way.  Or it could be an actual object, with methods for those 
>things.  Somehow I have become fond of dictionaries.

Heh.  I think objects are a reasonable way to go here; there isn't much 
call for wrapping resources themselves with middleware, and they don't do 
very much to begin with.

>I'm starting to get a better feel for how this overlaps with templates -- 
>coming at it from this direction is easier than from the WSGI direction.

So far the biggest architectural flaw (efficiency-wise) that I see in all 
this is that if your resources are all in eggs, and you have a *lot* of 
them, you have to read *all* of the eggs' resource indexes before you can 
return a single match.  While it's true that you could have a shorter list 
for each egg that indicates only what attribute/value combinations are 
offered by that egg, you still have to read *that* list for all of them for 
the first search, and for eggs with a small number of resources it'll be 
almost as fast to just read the full index to avoid multiple I/O operations.
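
The "shorter list" idea might look something like this sketch, where plain
dictionaries stand in for eggs and their indexes (the names and data
layout are assumptions for illustration only):

```python
def summarize(index):
    """Collect the (attribute, value) pairs that an egg's index offers."""
    return {
        (name, value)
        for entry in index
        for name, values in entry.items()
        for value in values
    }


def candidate_eggs(summaries, *criteria):
    """Yield names of eggs whose summary could satisfy every criterion."""
    for egg_name, summary in summaries.items():
        if all(any((attribute, value) in summary for value in wanted)
               for attribute, wanted in criteria):
            yield egg_name


# Hypothetical per-egg indexes; each entry maps attribute -> tuple of strings.
indexes = {
    'a.egg': [{'locale': ('en',), 'layer': ('base',)}],
    'b.egg': [{'locale': ('fr',), 'layer': ('base',)}],
}
summaries = {name: summarize(index) for name, index in indexes.items()}
to_read = list(candidate_eggs(summaries, ('locale', ['en', 'de'])))
```

Only the eggs in to_read need their full index loaded, but every summary
still has to be read once, and a summary can give false positives when the
matching pairs come from different resources in the same egg.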

Anyway, conceptually I think this is something that's useful for pretty 
much any extensible, localizable Python application, especially ones that 
are web-based.  I can see many potential implementations for how you get 
the resource data *in* to the system, too.  For example, peak.web already 
has an .ini file format that lets you set content type rules, using section 
headings that give filenames or wildcards, and then the entries list 
properties to be assigned.

I'm thinking that on the egg side, I'd use a new setuptools entry point for 
"resource finder" plugins.  Their job will be to scope out the distribution 
source for resources and add them to an index.  The index would then be 
written to the egg's metadata directory.  A 'resource_finders' keyword to
setup() would list the names of the entry points to use, so that you don't
have every possible resource finder chugging away and adding false
positives.  Or the keyword might instead be a dictionary (call it
'publish_resources'), e.g.:

     setup(
         ...
         publish_resources = {
             'peak.web': ['somepkg/foo', ...],
             'chandler.translations': {'someparcel':'foobar/LC_MESSAGES'},
             ...
         }
     )

That is, 'publish_resources' would be a dictionary mapping entry point 
names to arguments that define how those plugins should locate and index 
the resources.  The above would fire the peak.web and chandler.translations 
plugins from the 'setuptools.resource_finders' entry point group in order 
to index the available resources for publication.  (Notice that each 
resource finder can have a different parameter format, if it likes.)
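
As a rough sketch of that dispatch: a plain dictionary stands in for the
'setuptools.resource_finders' entry point group, and both finder functions
and their output format are hypothetical.

```python
def path_list_finder(paths):
    """Hypothetical finder whose argument is a list of paths."""
    return [{'resource': (path,)} for path in paths]


def translation_finder(parcels):
    """Hypothetical finder whose argument maps parcel name -> directory."""
    return [{'parcel': (parcel,), 'resource': (directory,)}
            for parcel, directory in parcels.items()]


def build_index(publish_resources, finders):
    """Run only the finders named in publish_resources and merge the results."""
    index = []
    for name, argument in publish_resources.items():
        try:
            finder = finders[name]
        except KeyError:
            raise LookupError('no resource finder named %r' % name) from None
        # Each finder interprets its own argument format.
        index.extend(finder(argument))
    return index


# Stand-in for the entry point group; a real version would load plugins.
FINDERS = {
    'peak.web': path_list_finder,
    'chandler.translations': translation_finder,
}
index = build_index(
    {
        'peak.web': ['somepkg/foo'],
        'chandler.translations': {'someparcel': 'foobar/LC_MESSAGES'},
    },
    FINDERS,
)
```

Finders that are registered but not named in the dictionary never run,
which is the point of making the keyword explicit.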

Hm.  After all this talk about the thing, I kind of want to just go 
implement it so *I* can use it, standards be damned.  :)  OTOH, I think it 
would be really good to get more feedback on the concept before doing 
that.  I'd be especially curious to hear from the framework developers, 
esp. of Zope, TurboGears, and Myghty.  Zope of course already has similar 
resource-finding abilities, but they don't tie to eggs.  They should have 
good feedback about both what you need to be able to search on, and what 
kind of performance issues are likely to arise.