[Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config

Sun Jul 24 18:49:03 CEST 2005

[cc:ed to distutils-sig because much of the below is about a new egg 
feature; follow-ups about the web stuff should stay on web-sig]

At 04:04 AM 7/24/2005 -0500, Ian Bicking wrote:
>So maybe here's a deployment spec we can start with.  It looks like:
>
>    [feature1]
>    someapplication.somemodule.some_function
>
>    [feature2]
>    someapplication.somemodule.some_function2
>
>You can't get dumber than that!  There should also be a "no-feature"
>section; maybe one without a section identifier, or some special section
>identifier.
>
>It goes in the .egg-info directory.  This way elsewhere you can say:
>
>    application = SomeApplication[feature1]

I like this a lot, although for a different purpose than the format Chris 
and I were talking about.  I see this fitting into that format as maybe:

    [feature1 from SomeApplication]
    # configuration here

>And it's quite unambiguous.  Note that there is *no* "configuration" in
>the egg-info file, because you can't put any configuration related to a
>deployment in an .egg-info directory, because it's not specific to any
>deployment.  Obviously we still need a way to get configuration in
>there, but let's say that's a different matter.

Easily fixed via what I've been thinking of as the "deployment descriptor"; 
I would call your proposal here the "import map".  Basically, an import map 
describes a mapping from some sort of feature name to qualified names in 
the code.

I have an extension that I would make, though.  Instead of using sections 
for features, I would use name/value pairs inside of sections named for the 
kind of import map.  E.g.:

     [wsgi.app_factories]
     feature1 = somemodule:somefunction
     feature2 = another.module:SomeClass
     ...

     [mime.parsers]
     application/atom+xml = something:atom_parser
     ...

In other words, feature maps could be a generic mechanism offered by 
setuptools, with a 'Distribution.load_entry_point(kind,name)' API to 
retrieve the desired object.  That way, we don't end up reinventing this 
idea for dozens of frameworks or pluggable applications that just need a 
way to find a few simple entry points into the code.
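To make the lookup concrete, here is a minimal sketch of reading an import map laid out like the sections above and finding the qualified name for a given kind and name.  The helper names `read_import_map` and `lookup` are hypothetical, and it uses the standard library's configparser rather than anything setuptools provides; it only illustrates the mapping, not the proposed `load_entry_point` API itself.

```python
import configparser
import textwrap

# Sample import map text, copied from the example sections above.
IMPORT_MAP = textwrap.dedent("""\
    [wsgi.app_factories]
    feature1 = somemodule:somefunction
    feature2 = another.module:SomeClass

    [mime.parsers]
    application/atom+xml = something:atom_parser
""")


def read_import_map(text):
    """Return {kind: {name: "module:attr"}} parsed from import map text."""
    # Only "=" separates a name from its value, because values contain ":".
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep names case-sensitive
    parser.read_string(text)
    return {kind: dict(parser.items(kind)) for kind in parser.sections()}


def lookup(import_map, kind, name):
    """Find the qualified name registered for `name` under `kind`."""
    return import_map[kind][name]
```

With this, `lookup(read_import_map(IMPORT_MAP), "mime.parsers", "application/atom+xml")` gives back the string "something:atom_parser", which still has to be imported before it is useful.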

In addition to specifying the entry point, each entry in the import map 
could optionally list the "extras" that are required if that entry point is 
used.  Loading the entry point could then also issue a 'require()' for those 
extras, so that any additional requirements listed for them in the 
extras_require dictionary are satisfied.

So, I'm thinking that this would be implemented with an entry_points.txt 
file in .egg-info, but supplied in setup.py like this:

     setup(
         ...
         entry_points = {
             "wsgi.app_factories": dict(
                 feature1 = "somemodule:somefunction",
                 feature2 = "another.module:SomeClass [extra1,extra2]",
             ),
             "mime.parsers": {
                 "application/atom+xml": "something:atom_parser [feedparser]"
             }
         },
         extras_require = dict(
             feedparser = [...],
             extra1 = [...],
             extra2 = [...],
         )
     )
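As a rough illustration of how the `entry_points` argument could end up in an entry_points.txt file, the sketch below renders the dictionary as one section per kind with name = value lines.  The function name `render_entry_points` and the exact file layout are assumptions that mirror the sections shown earlier; this is not a description of what setuptools writes.

```python
def render_entry_points(entry_points):
    """Render {kind: {name: spec}} as INI-style text, one section per kind."""
    lines = []
    for kind, entries in sorted(entry_points.items()):
        lines.append("[%s]" % kind)
        for name, spec in sorted(entries.items()):
            lines.append("%s = %s" % (name, spec))
        lines.append("")  # blank line after each section
    return "\n".join(lines)


# The same mapping as in the setup() call above.
entry_points = {
    "wsgi.app_factories": dict(
        feature1="somemodule:somefunction",
        feature2="another.module:SomeClass [extra1,extra2]",
    ),
    "mime.parsers": {
        "application/atom+xml": "something:atom_parser [feedparser]",
    },
}
text = render_entry_points(entry_points)
```

Sorting the kinds and names is only there to make the output stable from one build to the next.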

Anyway, this would make the most common use case for eggs-as-plugins very 
easy: an application or framework would simply define entry points, and 
plugin projects would declare the ones they offer in their setup script.

I think this is a fantastic idea and I'm about to leap into implementing 
it.  :)

>This puts complex middleware construction into the function that is
>referenced.  This function might be, in turn, an import from a
>framework.  Or it might be some complex setup specific to the
>application.  Whatever.
>
>The API would look like:
>
>    wsgiapp = wsgiref.get_egg_application('SomeApplication[feature1]')
>
>Which ultimately resolves to:
>
>    wsgiapp = some_function()
>
>get_egg_application could also take a pkg_resources.Distribution object.

Yeah, I'm thinking that this could be implemented as something like:

     import pkg_resources

     def get_wsgi_app(project_name, app_name, *args, **kw):
         dist = pkg_resources.require(project_name)[0]
         return dist.load_entry_point(
             'wsgi.app_factories', app_name
         )(*args, **kw)

with all the heavy lifting happening in the pkg_resources.Distribution 
class, along with maybe a new EntryPoint class (to handle parsing entry 
point specifiers and loading them).
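A minimal sketch of what such an EntryPoint class might look like, assuming the "module:attr [extra1,extra2]" specifier syntax used in the setup() example.  The class layout, the regular expression, and the method names are all hypothetical, and the 'require()' step for extras is left out; it uses only importlib from the standard library.

```python
import importlib
import re

# Specifier syntax assumed here: "module.path:attr.path [extra1,extra2]"
_SPEC = re.compile(r"^\s*([\w.]+)\s*:\s*([\w.]+)\s*(?:\[([^\]]*)\])?\s*$")


class EntryPoint:
    """Hypothetical sketch: parse a specifier and import the object it names."""

    def __init__(self, module_name, attrs, extras=()):
        self.module_name = module_name
        self.attrs = tuple(attrs)
        self.extras = tuple(extras)

    @classmethod
    def parse(cls, spec):
        match = _SPEC.match(spec)
        if match is None:
            raise ValueError("bad entry point specifier: %r" % (spec,))
        module_name, attr_path, extras = match.groups()
        extras = [e.strip() for e in (extras or "").split(",") if e.strip()]
        return cls(module_name, attr_path.split("."), extras)

    def load(self):
        # A fuller version would require() self.extras first; omitted here.
        obj = importlib.import_module(self.module_name)
        for attr in self.attrs:
            obj = getattr(obj, attr)
        return obj
```

For example, `EntryPoint.parse("os.path:join").load()` imports os.path and returns its `join` function, and the extras named in brackets are kept on the instance for whatever code ends up resolving them.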

>Open issues?  Yep, there's a bunch.  This requires the rest of the
>configuration to be done quite lazily.

Not sure I follow you; the deployment descriptor could contain all the 
configuration; see the Web-SIG post I made just previous to this one.

>   But I can fit this into source
>control; it is about *all* I can fit into source control (I can't have
>any filenames, I can't have any installation-specific pipelines, I can't
>have any other apps), but it is also enough that the deployment-specific
>parts can avoid many complexities of pipelining and factories and all
>that -- presumably the factory functions handle that.

+1.

>   I don't think
>this is useful without the other pieces (both in front of this
>configuration file and behind it) but maybe we can think about what
>those other pieces could look like.  I'm particularly open to
>suggestions that some_function() take some arguments, but I don't know
>what arguments.

At this point, I think this "entry points" concept weighs in favor of 
having the deployment descriptor configuration values be Python 
expressions, meaning that a WSGI application factory would accept keyword 
arguments that can be whatever you like in order to configure it.

However, after more thought, I think that the "next application" argument 
should be a keyword argument too, like 'wsgi_next' or some such.  This 
would allow a factory to have required arguments in its signature, e.g.:

     def some_factory(required_arg_x, required_arg_y, optional_arg="foo",
                      ....):
         ...

The problem with my original idea to have the "next app" be a positional 
argument is that it would prevent non-middleware applications from having 
any required arguments.
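The difference can be sketched as follows, under the assumption that the keyword is spelled 'wsgi_next' and that a pipeline is given as a list of (factory, keyword arguments) pairs, outermost first.  The factories and `build_pipeline` are made-up names for illustration only.

```python
def hello_factory(greeting, name="world"):
    """Plain application factory: `greeting` is a required argument."""
    body = ("%s, %s!" % (greeting, name)).encode("utf-8")

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]
    return app


def upper_factory(wsgi_next):
    """Middleware factory: receives the wrapped app as a keyword argument."""
    def app(environ, start_response):
        return [chunk.upper() for chunk in wsgi_next(environ, start_response)]
    return app


def build_pipeline(stages):
    """Build from the innermost app outward, passing each result as wsgi_next."""
    stages = list(stages)
    factory, kwargs = stages[-1]
    app = factory(**kwargs)
    for factory, kwargs in reversed(stages[:-1]):
        app = factory(wsgi_next=app, **kwargs)
    return app


pipeline = build_pipeline([
    (upper_factory, {}),
    (hello_factory, {"greeting": "Hello"}),
])
```

Because the next application arrives by keyword, `hello_factory` is free to declare `greeting` as a required positional argument without it being mistaken for an application to wrap.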

Anyway, I think we're now very close to being able to define a useful 
deployment descriptor format for establishing pipelines and setting 
options, one that leaves open the possibility of doing some very 
sophisticated things.

Hm.  Interesting thought...  we could have a function to read a deployment 
descriptor (from a string, stream, or filename) and then return the WSGI 
application object.  You could then wrap this in a simple WSGI app that 
does filesystem-based URL routing to serve up *.wsgi files from a 
directory.  This would let you bootstrap a deployment capability into 
existing WSGI servers, without them having to add their own support for 
it!  Web servers and frameworks that have some kind of file extension 
mapping mechanism could do this directly, of course.  I can envision 
putting *.wsgi files in my web directories and then configuring Apache to 
run them using either mod_python or FastCGI or even as a CGI, just by 
tweaking local .htaccess files.  However, once you have Apache tweaked the 
way you want, .wsgi files can be just dropped in and edited.
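A rough sketch of that bootstrap wrapper, assuming some `load_descriptor(filename)` callable exists that reads a descriptor and returns a WSGI application.  That callable, the name `make_wsgi_dir_app`, and the choice to serve only files directly inside one directory are assumptions; SCRIPT_NAME/PATH_INFO shifting is left out to keep it short.

```python
import os


def make_wsgi_dir_app(directory, load_descriptor):
    """Serve each NAME.wsgi file in `directory` at the URL path /NAME.wsgi.

    `load_descriptor(filename)` is a hypothetical callable that reads a
    deployment descriptor and returns a WSGI application.
    """
    root = os.path.realpath(directory)

    def dispatcher(environ, start_response):
        name = environ.get("PATH_INFO", "").lstrip("/")
        path = os.path.realpath(os.path.join(root, name))
        # Refuse anything that is not a *.wsgi file directly inside `root`.
        ok = (name.endswith(".wsgi")
              and os.path.dirname(path) == root
              and os.path.isfile(path))
        if not ok:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        return load_descriptor(path)(environ, start_response)

    return dispatcher
```

Since the dispatcher is itself an ordinary WSGI application, any existing server could mount it without knowing anything about descriptors.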

Of course, there are still some open design issues, like caching of .wsgi 
files (e.g. should the file be checked for changes on each hit?  I guess 
that could be a setting under "WSGI options", and would only work if the 
descriptor parser was given an actual filename to load from.)
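One possible shape for that caching setting, sketched under the assumption that the parser is handed a real filename and that a `load_descriptor(filename)` callable returns the WSGI application.  The class name and the `check_on_each_hit` flag are hypothetical stand-ins for whatever the "WSGI options" setting would be called.

```python
import os


class CachedDescriptor:
    """Reload a descriptor when its file's modification time changes."""

    def __init__(self, filename, load_descriptor, check_on_each_hit=True):
        self.filename = filename
        self.load_descriptor = load_descriptor  # hypothetical parser callable
        self.check_on_each_hit = check_on_each_hit
        self._mtime = None
        self._app = None

    def get_app(self):
        # Always load once; afterwards stat the file only if asked to.
        if self._app is None or self.check_on_each_hit:
            mtime = os.stat(self.filename).st_mtime_ns
            if self._app is None or mtime != self._mtime:
                self._app = self.load_descriptor(self.filename)
                self._mtime = mtime
        return self._app

    def __call__(self, environ, start_response):
        return self.get_app()(environ, start_response)
```

With `check_on_each_hit=False` the file is read once and never looked at again, which trades edit-and-reload convenience for one fewer stat() call per request.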