[Web-SIG] Standardized configuration

Tue Jul 19 19:15:00 CEST 2005

Chris McDonough wrote:
> On Mon, 2005-07-18 at 22:49 -0500, Ian Bicking wrote:
> 
>>In addition to the examples I gave in response to Graham, I wrote a 
>>document on this a while ago: 
>>http://pythonpaste.org/docs/url-parsing-with-wsgi.html
>>
>>The hard part about this is configuration; it's easy to configure a 
>>non-branching chain of middleware.  Once it branches the configuration 
>>becomes hard (like programming-hard; which isn't *hard*, but it quickly 
>>stops feeling like configuration).
> 
> 
> Yep.  I think I'm getting it.  For example, I see that Paste's URLParser
> seems to *construct* applications if they don't already exist based on
> the URL.  And I assume that these applications could themselves be
> middleware.  I don't think that is configurable declaratively if you
> want to decide which app to use based on arbitrary request parameters.
> 
> But if we already had the config for each app "instance" that URLParser
> wanted to consult laying around as files on disk, wouldn't it be just as
> easy to construct these app objects "eagerly" at startup time?  Then your
> URLParser could choose an already-configured app based on some sort of
> configuration file in the URLParser component itself.  The "apps"
> themselves may be pipelines, too, I realize that, but that is still
> configurable without coding.

That's what paste.urlmap is for:

   http://svn.pythonpaste.org/Paste/trunk/paste/urlmap.py

(I haven't actually tried using it much for practical things, so it's 
quite possible it has design mistakes in it)

The idea being that you do:

   urlmap['/myapp'] = MyApp()

But additionally (in PathProxyURLMap):

   urlmap['/myapp'] = 'myapp.conf'

And it builds the application from the configuration file.
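Here is one way a prefix-dispatching URL map that accepts both kinds of assignment might look. This is a hypothetical sketch and not paste.urlmap's actual code. The class name PrefixURLMap and the `loader` callable, which turns a config-file path into a WSGI application, are illustrative assumptions. In this version the loader runs eagerly, at assignment time.

```python
class PrefixURLMap:
    """Dispatch WSGI requests to applications by longest matching path prefix.

    Hypothetical sketch, not paste.urlmap's code. Values may be WSGI apps or
    config-file paths; paths are turned into apps by the `loader` callable.
    """

    def __init__(self, loader=None):
        self.apps = {}
        self.loader = loader  # hypothetical: config-file path -> WSGI app

    def __setitem__(self, prefix, app):
        if isinstance(app, str):
            if self.loader is None:
                raise ValueError('no loader configured for config-file entries')
            app = self.loader(app)  # built eagerly, at assignment time
        self.apps[prefix.rstrip('/')] = app

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        # Longest prefixes first, so '/myapp/admin' wins over '/myapp'.
        for prefix in sorted(self.apps, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + '/'):
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return self.apps[prefix](environ, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not Found']
```

The map moves the matched prefix from PATH_INFO to SCRIPT_NAME. As a result, the mounted application sees paths relative to its own mount point.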

> Maybe there'd be some concern about needing to stop the process in order
> to add new applications.  That's a use case I hadn't really considered.
> I suspect this could be done with a signal handler, though, which could
> tell the URLParser to reload its config file instead of potentially
> locating and creating a new application within every request.
> 
> This would make URLParser a kind of "decision" middleware, but it would
> choose from a static set of existing applications (or pipelines) for the
> lifetime of the process as opposed to constructing them lazily.
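Below is a minimal sketch of that kind of "decision" middleware. It assumes every application is built once at startup and that only the decision rules are re-read on SIGHUP. DecisionMiddleware, `load_rules`, and the (predicate, app_name) rule format are hypothetical names for illustration. A real version would parse its rules from its own declarative config file.

```python
import signal


class DecisionMiddleware:
    """Choose among already-constructed WSGI apps (hypothetical sketch).

    `apps` maps names to WSGI apps built at startup. `load_rules` is an
    assumed callable returning a list of (predicate, app_name) pairs.
    """

    def __init__(self, apps, load_rules, default):
        self.apps = apps
        self.load_rules = load_rules
        self.default = default
        self.rules = load_rules()

    def reload(self, signum=None, frame=None):
        # Re-read only the decision rules; the set of apps stays fixed.
        self.rules = self.load_rules()

    def install_reload_signal(self):
        # POSIX only: `kill -HUP <pid>` re-reads the rules without a restart.
        signal.signal(signal.SIGHUP, self.reload)

    def __call__(self, environ, start_response):
        for predicate, name in self.rules:
            if predicate(environ):
                return self.apps[name](environ, start_response)
        return self.apps[self.default](environ, start_response)
```

The handler only swaps the rule list, so a request never triggers application construction. Adding a brand-new application would still require either a restart or a reload path that also rebuilds `apps`.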

URLParser itself is just one parsing implementation, though maybe named 
too generically.  I don't think that particular code needs to grow many 
more features, but there's also room for many other parsers.  And it's 
also fairly easy to wrestle control from URLParser if that gets put in 
the stack (for instance, putting an application function in __init__.py 
will basically take over URL parsing for that directory).

>>>OTOH, I'm not sure that I want my framework to "find" an app for me.
>>>I'd like to be able to define pipelines that include my app, but I'd
>>>typically just want to statically declare it as the end point of a
>>>pipeline composed of service middleware.  I should look at Paste a
>>>little more to see if it has the same philosophy or if I'm
>>>misunderstanding you.
>>
>>Mostly I wanted to avoid lots of magical incantations for the simple 
>>case.  If you are used to Webware, well it has a very straight-forward 
>>way of finding your application -- you give it a directory name.  If 
>>Quixote or CherryPy, you give it a root object.  Maybe Zope would take a 
>>ZEO connection string, and so on.
> 
> 
> I think I understand now.
> 
> In general, I think I'd rather create "instance" locations of WSGI
> applications (which would essentially consist of a config file on disk
> plus any state info required by the app), configure and construct Python
> objects out of those instances eagerly at "startup time" and just choose
> between already-constructed apps in "decision middleware" that has
> its own declarative configuration if decisions need to be made about
> which app to use.

I think this is a laudable goal.  Right now, when I'm deploying 
applications written for Paste, I am reluctant to intermingle them in 
the same process and configuration... but that's because Paste is young, 
not because that's a bad idea.  But as a result I haven't tried it, and 
I only have a moderate concept of what it would mean in practice.

A neat feature would be to configure fairly seamlessly across process 
boundaries.  E.g., add a "fork=True" parameter to an application's 
configuration, and the server would fork a process (or delegate to an 
already forked worker process) for that application.  That's the sort of 
thing that could move Python into PHP-style hosting situations.

> This is mostly because I want the configuration info to live within the
> application/middleware instance and have some other "starter" import
> those configurations from application/middleware instance locations on
> the filesystem.  The "starter" would construct required instances as
> Python objects, and chain them together arbitrarily based on some other
> "pipeline configuration" file that lives with the "starter".  The first
> part of that (construct required instances) is described in a post I
> made to this list yesterday.
> 
> This is probably because I'd like there to be one well-understood way to
> declaratively configure pipelines as opposed to each piece of middleware
> potentially needing to manage app construction and having its own
> configuration to do so.
> 
> I don't know if this is reasonable for simpler requirements.  This is
> more of a "formal deployment spec" idea and of course is likely flawed
> in some subtle way I don't understand yet.
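The "starter" described above could be sketched roughly as follows. The sketch makes three assumptions:

- A pipeline is just a list of instance names, outermost middleware first and application last.
- Each instance's configuration has already been read into a dict.
- Application factories take `(config)` and middleware factories take `(app, config)`.

The function names and factory signatures are hypothetical and not part of any existing spec.

```python
def build_pipeline(pipeline, factories, configs):
    """Eagerly construct one pipeline (hypothetical sketch).

    pipeline  -- instance names, outermost middleware first, application last
    factories -- name -> factory; app factories are called as factory(config),
                 middleware factories as factory(app, config)
    configs   -- name -> config dict, e.g. read from each instance location
    """
    *filters, app_name = pipeline
    app = factories[app_name](configs.get(app_name, {}))
    # Wrap innermost-first so filters[0] ends up seeing each request first.
    for name in reversed(filters):
        app = factories[name](app, configs.get(name, {}))
    return app


def build_all(pipelines, factories, configs):
    # Everything is built once at startup; nothing is constructed per request.
    return {key: build_pipeline(p, factories, configs)
            for key, p in pipelines.items()}
```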

I think there's probably some room for separation.  In practice I end up 
with multiple configuration files for my projects -- one that's generic 
to the application, and one that's specific to the deployment.  But it's 
very hard to determine ahead of time what stuff goes where.  For 
instance, server options mostly go in the deployment configuration. 
Until I start building conventions about configuration information on 
the servers, at which time I expect configuration will migrate into 
common locations in the form of configuration-loading options.  E.g., 
where I now do:

   server = 'scgi_threaded'
   port = 4010

In the future I might do:

   import port_map
   port = port_map.find_port(app_name)

Where port_map is some global module where I keep the entire server's 
list of port mappings.  And being able to do stuff like this is what 
makes Python-syntax imperative configuration so nice... it's crude and 
annoying, but configuration that is more declarative becomes even worse 
when you try to build these kinds of features into it.
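A hypothetical port_map module along those lines could be as small as the following. The application names and port numbers are illustrative. The uniqueness check is an assumed extra, shown because one central map makes that kind of server-wide check easy.

```python
"""port_map.py: hypothetical server-wide registry of application ports."""

# One entry per deployed application on this server (illustrative values).
PORTS = {
    'blog': 4010,
    'wiki': 4011,
}


def find_port(app_name):
    """Return the port assigned to app_name, failing loudly if none is."""
    try:
        return PORTS[app_name]
    except KeyError:
        raise LookupError(f'no port assigned to {app_name!r}') from None


def check_unique():
    """Catch two applications accidentally configured on the same port."""
    seen = {}
    for name, port in PORTS.items():
        if port in seen:
            raise ValueError(f'{name!r} and {seen[port]!r} both use port {port}')
        seen[port] = name
```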

But I digress... the deployment configuration as I currently use it is 
usually something that overwrites the generic application configuration. 
They aren't two distinct things.  And the configuration doesn't belong 
to one or the other.  Is the location of session information server 
specific, application specific, profile specific?  It depends on your 
situation.  I might have a standard convention for the location of 
Javascript libraries that lives in my configuration; but on my 
development machine I override that because I'm doing development on one 
of those libraries.  There are all sorts of specific cases, and in 
declarative or well-partitioned configurations the configuration 
language has to include lots and lots of features.  Or you end up with 
configuration file generation or other nonsense.
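The simplest relationship described above is a deployment configuration that overrides the generic application configuration. The standard library's collections.ChainMap can sketch it. The setting names and values here are illustrative only. The sketch captures just the "overwrite" layering, not the harder question of which file a setting belongs in.

```python
from collections import ChainMap

# Generic settings shipped with the application (illustrative values).
app_defaults = {
    'session_dir': '/var/lib/myapp/sessions',
    'js_lib_url': '/static/js/',
    'debug': False,
}

# Deployment-specific overrides: only the keys that differ on this machine,
# e.g. pointing at a locally checked-out Javascript library under development.
dev_overrides = {
    'js_lib_url': 'http://localhost:8081/js/',
    'debug': True,
}

# Lookups search dev_overrides first, then fall back to app_defaults.
config = ChainMap(dev_overrides, app_defaults)
```

Writes to `config` go into the first mapping (`dev_overrides`), so the shipped defaults stay untouched.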

In the end, I think I have more faith in the general applicability of 
Python as a way to describe structures, combined with strong 
configuration-specific conventions and style guides.  Otherwise it feels 
like this embeds policy into the configuration-loading code, and I hate 
policy in code.

>>>I'm pretty sure you're not advocating it, but in case you are, I'm not
>>>sure it adds as much value as it removes to be able to have a "dynamic"
>>>middleware chain whereby new middleware elements can be added "on the
>>>fly" to a pipeline after a request has begun.  That is *very* "late
>>>binding" to me and it's impossible to configure declaratively.
>>
>>I'm comfortable with a little of both.  I don't even know *how* I'd stop 
>>dynamic middleware.  For instance, one of the methods I added to Wareweb 
>>recently allows any servlet to forward to any WSGI application; but from 
>>the outside the servlet looks like a normal WSGI application just like 
>>before.
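Here is a generic sketch of a servlet that forwards some requests to another WSGI application. It is not Wareweb's actual API, which isn't shown here. The Servlet class, the `/old/` routing rule, and legacy_app are hypothetical. From the outside it is still an ordinary WSGI callable, and the forwarding decision happens per request in ordinary code.

```python
def legacy_app(environ, start_response):
    # Stand-in for "any WSGI application" the servlet might forward to.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'legacy response']


class Servlet:
    """Hypothetical servlet-style WSGI app that can delegate a request."""

    def __init__(self, forward_to):
        self.forward_to = forward_to

    def __call__(self, environ, start_response):
        if environ.get('PATH_INFO', '').startswith('/old/'):
            # Hand over the whole request, including start_response.
            return self.forward_to(environ, start_response)
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'servlet response']
```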
> 
> 
> It's obviously fine if applications themselves want to do this.  I'm not
> sure that it would be possible to create a "deployment spec" that
> canonized *how* to do it because as you mentioned it's not really a
> configuration task, it's a programming task.
> 
> 
>>>I agree!  I'm a bit confused because one of the canonical examples of
>>>how WSGI middleware is useful seems to be the example of implementing a
>>>framework-agnostic sessioning service.  And for that sessioning service
>>>to be useful, your application has to be able to depend on its
>>>availability so it can't be "oblivious".
>>
>>This is where I'd like additional (incrementally agreed upon) standards. 
>>For instance, a standard for the interface of 'webapp01.session'. 
>>It's a requirement, certainly, but the requirement is merely "there must 
>>be a webapp01-compliant session installed".
> 
> 
> Yes... I think the best way to describe this sort of thing is through
> interfaces (at least notional, documented ones, if not formal ones that
> can be introspected at runtime).  But that will need to be fleshed out
> on a service-by-service basis, obviously.
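A typing.Protocol (Python 3.8+) is one way to write such a notional interface down so that it can also be checked at runtime. The method set below is only a guess at what a 'webapp01.session' contract might require, since no such standard exists. DictSession and require_session are hypothetical helpers.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Session(Protocol):
    """Notional 'webapp01.session' interface; the methods are assumptions."""

    def get(self, key: str, default: Any = None) -> Any: ...
    def __setitem__(self, key: str, value: Any) -> None: ...
    def save(self) -> None: ...


class DictSession(dict):
    """Trivial in-memory session that satisfies the notional interface."""

    def save(self) -> None:
        pass  # nothing to persist in this toy version


def require_session(environ):
    # "There must be a webapp01-compliant session installed."
    session = environ.get('webapp01.session')
    if not isinstance(session, Session):
        raise RuntimeError('a webapp01-compliant session must be installed')
    return session
```

The runtime check only verifies that the methods exist, not their signatures or behaviour. It is closer to documentation than to enforcement.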
> 
> FWIW, I'm also finding myself agreeing with Phillip's idea of allowing
> applications to have a context object which can help them find
> services, as opposed to implementing each service entirely as
> middleware.
> 
> Instead of obtaining the sessioning service via
> "environ['webapp01.session']" in an application's __call__, you might
> do "self.context.get_service('session')"... or maybe even
> "environ['services'].get_service('session')".  The latter would be
> easier to add because we'd be using an existing PEP 333 protocol.  We'd
> consume a single key within the environ namespace, but there would need
> to be no change to the WSGI spec.
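Below is a minimal sketch of the environ['services'] idea. It assumes one middleware installs a per-request lookup object under that single key, and that each service is built lazily from a registered factory. ServiceRegistry, BoundServices, and services_middleware are hypothetical names.

```python
class ServiceRegistry:
    """Process-wide table of service factories (hypothetical sketch)."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        # factory(environ) builds the per-request service object.
        self._factories[name] = factory

    def bind(self, environ):
        return BoundServices(self._factories, environ)


class BoundServices:
    """Per-request view: each service is built at most once per request."""

    def __init__(self, factories, environ):
        self._factories = factories
        self._environ = environ
        self._cache = {}

    def get_service(self, name):
        if name not in self._cache:
            try:
                factory = self._factories[name]
            except KeyError:
                raise LookupError(f'no service registered as {name!r}') from None
            self._cache[name] = factory(self._environ)
        return self._cache[name]


def services_middleware(app, registry):
    """A single middleware that consumes one environ key for every service."""
    def wrapper(environ, start_response):
        environ['services'] = registry.bind(environ)
        return app(environ, start_response)
    return wrapper
```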

I have to read over PJE's email some more.  It doesn't really remove the 
need for middleware, it's more like it could consolidate many services 
into one generic service middleware.  For instance, the session service 
still needs access to the response, and the only general way to access 
the response is through middleware.  The request, at least, can be 
generally accessed as the environment dictionary; but replacing 
middleware with contracts on what you must return from your application 
is a non-starter.  E.g., if an auth service requires something like:

   auth = get_service('auth')
   if not auth.allowed(app_context):
       forbidden = auth.forbidden()
       start_response(forbidden[0], forbidden[1])
       return forbidden[2]

Well... that's not very nice, is it?  And it's totally infeasible once 
your code is in the bowels of some framework.  You could do it with an 
exception (with some middleware that catches the exception).  You could 
do the session service with some middleware that collects extra headers 
and other response information.
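Both workarounds can be sketched as small pieces of middleware. The Forbidden exception and the 'myservices.extra_headers' environ key are hypothetical. Services must append their headers before the application calls start_response. The 403 path relies on start_response accepting exc_info, as PEP 333 allows.

```python
import sys


class Forbidden(Exception):
    """Raised anywhere inside an application to abort with a 403."""


def forbidden_middleware(app):
    def wrapper(environ, start_response):
        try:
            result = app(environ, start_response)
            try:
                # Buffer the body so exceptions raised lazily are caught too.
                return list(result)
            finally:
                close = getattr(result, 'close', None)
                if close is not None:
                    close()
        except Forbidden as exc:
            start_response('403 Forbidden', [('Content-Type', 'text/plain')],
                           sys.exc_info())
            return [str(exc).encode() or b'Forbidden']
    return wrapper


def extra_headers_middleware(app):
    """Let services deep inside a framework add response headers."""
    def wrapper(environ, start_response):
        extra = environ.setdefault('myservices.extra_headers', [])

        def patched_start_response(status, headers, exc_info=None):
            # Headers appended before this call are merged into the response.
            return start_response(status, list(headers) + extra, exc_info)
        return app(environ, patched_start_response)
    return wrapper
```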

And now that I'm thinking through an implementation, I realize it's 
something I've thought of before -- in my mind it was about 
lighter-weight filters and simpler configuration, but the implementation 
would be similar.

My only concern is if it confuses the order of filters.  If there's one 
generic service middleware, it's probably going to be invoked before 
some other middleware and after others.  But the services would 
communicate with that service middleware outside of the WSGI band (using 
callbacks or shared structures or something).  This makes it difficult 
for transforming middleware to be certain that it has full control to 
wrap applications.

> This would be pretty straightforward and a separate services framework
> could be implemented outside WSGI entirely perhaps taking some cues from
> PEAK and/or Zope 3 (or even [gasp] *code!*, god knows this problem has
> already been solved many times over ;-) -- for implementing service
> registration and lookup.  It could form the basis for a "WSGI services"
> spec without muddying the waters for PEP 333.
> 
> That said, if you're not interested in that because you think
> implementing services as middleware is "good enough" and you'd rather
> not implement another framework, I'd totally understand that.  At that
> point I probably wouldn't be interested either because you're the
> de facto champion of WSGI middleware as a lingua franca and the only
> reason to do any of this is for the sake of collaboration and code
> sharing.  But I do think it would be cleaner.

Well, I'm a fan of working code.  If services are a better way of doing 
some of this stuff, and they supersede code I've written or imagined, 
that's not that big a deal.  At this point I'd be interested to see how 
a Really Lame Implementation of Sessions (for instance) would be 
implemented with services.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org