[Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config

Sun Jul 24 21:12:02 CEST 2005

Phillip J. Eby wrote:
>> It goes in the .egg-info directory.  This way elsewhere you can say:
>>
>>    application = SomeApplication[feature1]
> 
> 
> I like this a lot, although for a different purpose than the format 
> Chris and I were talking about.  

Yes, this proposal really just simplifies a part of that application 
deployment configuration; it doesn't replace it.  Though it might make 
other standardization less important.

> I see this fitting into that format as 
> maybe:
> 
>    [feature1 from SomeApplication]
>    # configuration here
> 
> 
>> And it's quite unambiguous.  Note that there is *no* "configuration" in
>> the egg-info file, because you can't put any configuration related to a
>> deployment in an .egg-info directory, because it's not specific to any
>> deployment.  Obviously we still need a way to get configuration in
>> there, but let's say that's a different matter.
> 
> 
> Easily fixed via what I've been thinking of as the "deployment 
> descriptor"; I would call your proposal here the "import map".  
> Basically, an import map describes a mapping from some sort of feature 
> name to qualified names in the code.

Yes, it really just gives you a shorthand for the factory configuration 
variable.

> I have an extension that I would make, though.  Instead of using 
> sections for features, I would use name/value pairs inside of sections 
> named for the kind of import map.  E.g.:
> 
>     [wsgi.app_factories]
>     feature1 = somemodule:somefunction
>     feature2 = another.module:SomeClass
>     ...
> 
>     [mime.parsers]
>     application/atom+xml = something:atom_parser
>     ...

I assume mime.parsers is just a theoretical example of another kind of 
service a package can provide?  But yes, this seems very reasonable, and 
even allows for loosely versioned specs (e.g., wsgi.app_factories02, 
which returns factories with a different interface; or maybe something 
like foo.configuration_schema, an optional entry point that returns the 
configuration schema for an application described elsewhere).
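
As a rough illustration of the layout quoted above, here is a minimal sketch of reading such an import map into nested dictionaries.  It uses the standard library's configparser as a stand-in; the helper name read_import_map is made up for this example and is not part of setuptools or any other packaging tool.

```python
import configparser

# Sample text copied from the quoted proposal (the "..." lines are left out).
SAMPLE = """\
[wsgi.app_factories]
feature1 = somemodule:somefunction
feature2 = another.module:SomeClass

[mime.parsers]
application/atom+xml = something:atom_parser
"""


def read_import_map(text):
    """Return {group: {name: "module:attr"}} from INI-style text.

    Hypothetical helper, written only to illustrate the idea.
    """
    # Only "=" separates name from value, so the ":" inside the value
    # is never mistaken for a delimiter.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry names case-sensitive
    parser.read_string(text)
    return {section: dict(parser.items(section))
            for section in parser.sections()}


import_map = read_import_map(SAMPLE)
print(import_map["mime.parsers"]["application/atom+xml"])
```

Each section name acts as the "kind" of import map, so a versioned spec such as wsgi.app_factories02 would simply be another section.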

This kind of addresses the issue where the module structure of a package 
becomes an often unintentional part of its external interface.  It feels 
a little crude in that respect... but maybe not.  Is it worse to do:

   from package.module import name

or:

   name = require('Package').load_entry_point('service_type', 'name')

OK, well clearly the second is worse ;)  But if that turned into a 
single function call:

   name = load_service('Package', 'service_type', 'name')

It's not that bad.  Maybe even:

   name = services['Package:service_type:name']

Though service_type feels extraneous to me.  I see the benefit of being 
explicit about what the factory provides, but I don't see the benefit of 
separating namespaces; the name should be unambiguous.  Well... unless 
you used the same name to group related services, like the configuration 
schema and the application factory itself.  So maybe I retract that 
criticism.
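
To make the two shorter spellings concrete, here is a sketch under stated assumptions.  IMPORT_MAPS is a hypothetical in-memory registry standing in for whatever each package's .egg-info would provide, and the entry points at a standard-library function only so the example runs as written.  Neither load_service nor services exists in any real library.

```python
import importlib

# Hypothetical registry: package -> service type -> name -> "module:attr".
IMPORT_MAPS = {
    "Package": {
        "service_type": {"name": "json:dumps"},
    },
}


def resolve(target):
    """Turn "module:attr" (attr may be dotted) into the object it names."""
    module_name, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_name)
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


def load_service(package, service_type, name):
    """The single-function-call spelling."""
    return resolve(IMPORT_MAPS[package][service_type][name])


class Services:
    """Supports the services['Package:service_type:name'] spelling."""

    def __getitem__(self, key):
        package, service_type, name = key.split(":", 2)
        return load_service(package, service_type, name)


services = Services()
name = load_service("Package", "service_type", "name")
same = services["Package:service_type:name"]
```

Because service_type is one level of the lookup, the same name can appear under two service types (say, an application factory and its configuration schema) without colliding.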

> In addition to specifying the entry point, each entry in the import map 
> could optionally list the "extras" that are required if that entry point 
> is used.
> It could also issue a 'require()' for the corresponding feature if it 
> has any additional requirements listed in the extras_require dictionary.

I figured each entry point would just map to a feature, so the 
extras_require dictionary would already have entries.

> So, I'm thinking that this would be implemented with an entry_points.txt 
> file in .egg-info, but supplied in setup.py like this:
> 
>     setup(
>         ...
>         entry_points = {
>             "wsgi.app_factories": dict(
>                 feature1 = "somemodule:somefunction",
>                 feature2 = "another.module:SomeClass [extra1,extra2]",
>             ),
>             "mime.parsers": {
>                 "application/atom+xml": "something:atom_parser [feedparser]"
>             }
>         },
>         extras_require = dict(
>             feedparser = [...],
>             extra1 = [...],
>             extra2 = [...],
>         )
>     )

I think I'd rather just put the canonical version in .egg-info instead 
of as an argument to setup(); this is one place where using Python 
expressions isn't a shining example of clarity.  But I guess this is 
fine too; for clarity I'll probably start writing my setup.py files with 
variable assignments, then a setup() call that just refers to those 
variables.
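
For reference, the "module:attr [extra1,extra2]" strings in the quoted setup() call could be split into their parts along these lines.  The grammar is an assumption inferred only from the examples in this thread, and parse_entry is a hypothetical name, not an existing API.

```python
import re

# Assumed grammar: "module.path:attr", optionally followed by
# a bracketed, comma-separated list of extras.
_ENTRY_RE = re.compile(
    r"^\s*(?P<module>[\w.]+):(?P<attr>[\w.]+)\s*"
    r"(?:\[(?P<extras>[^\]]*)\])?\s*$"
)


def parse_entry(spec):
    """Return (module, attr, extras) for one import-map value."""
    match = _ENTRY_RE.match(spec)
    if match is None:
        raise ValueError("not a valid entry: %r" % (spec,))
    extras = match.group("extras") or ""
    return (
        match.group("module"),
        match.group("attr"),
        [e.strip() for e in extras.split(",") if e.strip()],
    )


print(parse_entry("another.module:SomeClass [extra1,extra2]"))
print(parse_entry("somemodule:somefunction"))
```

The extras list would then be looked up in extras_require before the entry point is loaded.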

>> Open issues?  Yep, there's a bunch.  This requires the rest of the
>> configuration to be done quite lazily.
> 
> 
> Not sure I follow you; the deployment descriptor could contain all the 
> configuration; see the Web-SIG post I made just previous to this one.

Well, when I proposed that the factory be called with zero arguments, 
that wouldn't allow any configuration to be passed in.

>>   I don't think
>> this is useful without the other pieces (both in front of this
>> configuration file and behind it) but maybe we can think about what
>> those other pieces could look like.  I'm particularly open to
>> suggestions that some_function() take some arguments, but I don't know
>> what arguments.
> 
> 
> At this point, I think this "entry points" concept weighs in favor of 
> having the deployment descriptor configuration values be Python 
> expressions, meaning that a WSGI application factory would accept 
> keyword arguments that can be whatever you like in order to configure it.

Yes, I'd considered this as well.  I'm not a huge fan of Python 
expressions, because something like "allow_hosts=['127.0.0.1']" seems 
unnecessarily complex to me.  As a convention (maybe not a requirement; 
a SHOULD) I like it when configuration consumers handle strings specially, 
doing context-sensitive conversion (in this case maybe splitting on ',' 
or on whitespace).  It would make me sad to see something accept 
requests from the IP addresses ['1', '2', '7', '.', '0', '.', '0', '.', 
'1'].  This is the small sort of thing that I think makes the experience 
less pleasant.
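
A minimal sketch of that convention, assuming a hypothetical helper named as_list that a configuration consumer would call on any list-valued setting:

```python
def as_list(value):
    """Coerce a setting to a list of strings.

    A string is split on commas and/or whitespace; a real list or
    tuple is passed through unchanged.  Hypothetical helper.
    """
    if isinstance(value, str):
        return value.replace(",", " ").split()
    return list(value)


print(as_list("127.0.0.1"))
print(as_list("127.0.0.1, 10.0.0.1"))

# The failure mode described above: iterating over a bare string
# yields its characters, not one host.
print(list("127.0.0.1"))
```

With this in place, allow_hosts = 127.0.0.1 and allow_hosts = ['127.0.0.1'] both end up as the same one-element list.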

> However, after more thought, I think that the "next application" 
> argument should be a keyword argument too, like 'wsgi_next' or some 
> such.  This would allow a factory to have required arguments in its 
> signature, e.g.:
> 
>     def some_factory(required_arg_x, required_arg_y, optional_arg="foo", ....):
>         ...
> 
> The problem with my original idea to have the "next app" be a positional 
> argument is that it would prevent non-middleware applications from 
> having any required arguments.

I think it's fine to declare the next_app keyword argument as special, 
and promise (by convention) to always pass it in with that name.
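
Here is a sketch of how that convention might play out.  All of the names (hello_factory, header_factory, build_pipeline) are invented for illustration; the only thing taken from the discussion is that the wrapped application always arrives as the keyword argument next_app, which leaves positional and other keyword arguments free for the factory's own configuration.

```python
def hello_factory(greeting, name="world"):
    """Endpoint factory: 'greeting' is required, and no next_app is needed."""
    body = ("%s, %s!" % (greeting, name)).encode("utf-8")

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


def header_factory(header_value, next_app=None):
    """Middleware factory: the wrapped app is passed in as next_app."""
    def middleware(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            headers = list(headers) + [("X-Example", header_value)]
            return start_response(status, headers, exc_info)
        return next_app(environ, patched_start)
    return middleware


def build_pipeline(stages):
    """stages: [(factory, kwargs), ...] listed outermost first."""
    app = None
    for factory, kwargs in reversed(stages):
        if app is not None:
            # Every stage except the innermost gets the app built so far.
            kwargs = dict(kwargs, next_app=app)
        app = factory(**kwargs)
    return app


application = build_pipeline([
    (header_factory, {"header_value": "demo"}),
    (hello_factory, {"greeting": "Hello"}),
])
```

Since the endpoint never receives next_app, hello_factory can keep a required argument in its signature, which is the case the positional approach ruled out.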

> Anyway, I think we're now very close to being able to define a useful 
> deployment descriptor format for establishing pipelines and setting 
> options, that leaves open the possibility to do some very sophisticated 
> things.
> 
> Hm.  Interesting thought...  we could have a function to read a 
> deployment descriptor (from a string, stream, or filename) and then 
> return the WSGI application object.  You could then wrap this in a 
> simple WSGI app that does filesystem-based URL routing to serve up 
> *.wsgi files from a directory.  This would let you bootstrap a 
> deployment capability into existing WSGI servers, without them having to 
> add their own support for it!  Web servers and frameworks that have some 
> kind of file extension mapping mechanism could do this directly, of 
> course.  I can envision putting *.wsgi files in my web directories and 
> then configuring Apache to run them using either mod_python or FastCGI 
> or even as a CGI, just by tweaking local .htaccess files.  However, once 
> you have Apache tweaked the way you want, .wsgi files can be just 
> dropped in and edited.

Absolutely; I see no reason WSGI servers should have any dispatching 
logic in them, except in cases when they also dispatch to non-Python 
applications (like Apache).  So it seems natural that we present 
deployment as a single application factory that takes zero or one arguments.

> Of course, there are still some open design issues, like caching of 
> .wsgi files (e.g. should the file be checked for changes on each hit?  I 
> guess that could be a setting under "WSGI options", and would only work 
> if the descriptor parser was given an actual filename to load from.)

I don't know what we'd do if we checked the file and found it wasn't up 
to date.  In this particular case I suppose you could reload the 
configuration file, but if the change in the configuration file 
reflected a change in the source code, then you're stuck because 
reloading in Python is so infeasible.  I'm all for warnings, but I don't 
see how we can do the Right Thing here, as much as I wish it were otherwise.
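
The easy half of this, re-reading the configuration file when it changes, could look roughly like the sketch below.  DescriptorCache and its loader argument are hypothetical names; loader stands for whatever function turns descriptor text into an application object.  Note that this does nothing about changed source code, which is the hard part.

```python
import os


class DescriptorCache:
    """Re-read a descriptor file when its modification time changes."""

    def __init__(self, filename, loader):
        self.filename = filename
        self.loader = loader  # callable: descriptor text -> application
        self._mtime = None
        self._app = None

    def get(self):
        # One stat() per call; the file is only re-read if it changed.
        mtime = os.stat(self.filename).st_mtime_ns
        if mtime != self._mtime:
            with open(self.filename, encoding="utf-8") as f:
                self._app = self.loader(f.read())
            self._mtime = mtime
        return self._app
```

As noted in the quoted text, this only works when the parser is given an actual filename; a descriptor read from a string or stream has nothing to stat.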

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org