[Web-SIG] Standardized configuration

Sun Jul 17 10:16:14 CEST 2005

Chris McDonough wrote:
>>Because middleware can't be introspected (generally), this makes things 
>>like configuration schemas very hard to implement.  It all needs to be 
>>late-bound.
> 
> 
> The pipeline itself isn't really late bound.  For instance, if I was to
> create a WSGI middleware pipeline something like this:
> 
>    server <--> session <--> identification <--> authentication <--> 
>    <--> challenge <--> application
> 
> ... session, identification, authentication, and challenge are
> middleware components (you'll need to imagine their implementations).
> And within a module that started a server, you might end up doing
> something like:
> 
> def configure_pipeline(app):
>     return SessionMiddleware(
>             IdentificationMiddleware(
>               AuthenticationMiddleware(
>                 ChallengeMiddleware(app))))
> 
> if __name__ == '__main__':
>     app = Application()
>     pipeline = configure_pipeline(app)
>     server = Server(pipeline)
>     server.serve()

This is what Paste does in configuration, like:

middleware.extend([
     SessionMiddleware, IdentificationMiddleware,
     AuthenticationMiddleware, ChallengeMiddleware])

This kind of middleware takes a single argument, which is the 
application it will wrap.  In practice, this means all the other 
parameters go into lazily-read configuration.
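
A minimal sketch of that shape, assuming nothing about Paste's real 
internals: the "example.config" environ key, PrefixMiddleware, and 
build() are all made up for illustration.  The point is only that the 
constructor takes the app and nothing else, and every other setting is 
looked up when a request arrives.

```python
class PrefixMiddleware:
    """Takes only the wrapped app; all other settings are read per request."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Hypothetical key: configuration is looked up lazily, at request time.
        config = environ.get("example.config", {})
        prefix = config.get("prefix", "").encode("utf-8")
        body = b"".join(self.app(environ, start_response))
        return [prefix + body]


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def build(app, middleware):
    """Wrap app with each factory; the first list entry ends up outermost."""
    for factory in reversed(middleware):
        app = factory(app)
    return app


pipeline = build(hello_app, [PrefixMiddleware])
```

Because the factories all share one signature, the list can be 
reordered or extended without touching any of the middleware itself.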

You can also define a "framework" (a plugin to Paste), which in addition 
to finding an "app" can also add middleware; basically embodying all the 
middleware that is typical for a framework.

Paste is really a deployment configuration.  Well, that as well as stuff 
to deploy.  And two frameworks.  And whatever else I feel a need or 
desire to throw in there.

Note also that parts of the pipeline are very much late bound.  For 
instance, the way I implemented Webware (and Wareweb) each servlet is a 
WSGI application.  So while there's one URLParser application, the 
application that actually handles the request differs per request.  If 
you start hanging more complete applications (that might have their own 
middleware) at different URLs, then this happens more generally.
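
To make the late binding concrete, here is a rough sketch of a 
dispatcher that picks the handling application per request.  It is not 
paste.urlparser or urlmap; PrefixDispatcher and make_app are 
hypothetical names, and the only real convention used is WSGI's 
SCRIPT_NAME/PATH_INFO split.

```python
def make_app(text):
    """Build a trivial WSGI app that always answers with the given text."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [text.encode("utf-8")]
    return app


class PrefixDispatcher:
    """Chooses the application per request, based on the URL prefix."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        # Try the longest prefix first so "/blog/admin" beats "/blog".
        for prefix in sorted(self.mapping, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                # Move the consumed prefix from PATH_INFO onto SCRIPT_NAME.
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return self.mapping[prefix](environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]


site = PrefixDispatcher({"/blog": make_app("blog"), "": make_app("root")})
```

Each mapped application could itself be a full middleware stack, which 
is where the pipeline stops being a single static chain.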

There's a newish poorly tested feature where you can do urlmap['/path'] 
= 'config_file.conf' and it'll hang the application described by that 
configuration file at that URL.

> The pipeline is static.  When a request comes in, the pipeline itself is
> already constructed.  I don't really want a way to prevent "improper"
> pipeline construction at startup time (right now anyway), because
> failures due to missing dependencies will be fairly obvious.

I think that's reasonable too; it's what Paste implements now.

> But some elements of the pipeline at this level of factoring do need to
> have dependencies on availability and pipeline placement of the other
> elements.  In this example, proper operation of the authentication
> component depends on the availability and pipeline placement of the
> identification component.  Likewise, the identification component may
> depend on values that need to be retrieved from the session component.

Yes; and potentially you could have several middlewares implementing the 
same functionality for a single request, e.g., if you had different kinds 
of authentication for part of your site/application; that might shadow 
authentication further up the stack.

> I've just seen Phillip's post where he implies that this kind of
> fine-grained component factoring wasn't really the initial purpose of
> WSGI middleware.  That's kind of a bummer. ;-)

Well, I don't understand the services he's proposing yet.  I'm quite 
happy with using middleware the way I have been, so I'm not seeing a 
problem with it, and there are lots of benefits.

> Factoring middleware components in this way seems to provide clear
> demarcation points for reuse and maintenance.  For example, I imagined a
> declarative security module that might be factored as a piece of
> middleware here:  http://www.plope.com/Members/chrism/decsec_proposal .

Yes, I read that before; I haven't quite figured out how to digest it, 
though.  This is probably in part because Zope is resource-oriented, 
while WSGI is application-based, where applications are rather opaque 
and defined only in terms of function.

> Of course, this sort of thing doesn't *need* to be middleware.  But
> making it middleware feels very right to me in terms of being able to
> deglom nice features inspired by Zope and other frameworks into pieces
> that are easy to recombine as necessary.  Implementation as WSGI
> middleware seems a nice way to move these kinds of features out of our
> respective applications and into more application-agnostic pieces that
> are very loosely coupled, but perhaps I'm taking it too far.

Certainly these pieces of code can apply to multiple applications and 
disparate systems.  The most obvious instance right now that I think of 
is a WSGI WebDAV server (and someone's working on that for Google Summer 
of Code), which should be implemented pretty framework-free, simply 
because a good WebDAV implementation works at a low level.  But 
obviously you want that to work with the same authentication as other 
parts of the system.

I guess this is how I come back to lazily introducing middleware.  For 
instance, some "application" (which might be a fairly small bit of 
functionality) might require a session.  If there's no session 
available, then it can probably make a reasonable session itself.  But 
it shouldn't shadow any session available to it, if that's already 
available.  This is doubly true for something more authoritative like 
authentication.
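
Roughly what I mean, as a sketch: the "example.session" key and 
ensure_session() are invented for this example, and the fallback 
session is just a dict that is not persisted anywhere.

```python
SESSION_KEY = "example.session"  # hypothetical environ key, not a standard


def ensure_session(app, make_session=dict):
    """Give app a session only if nothing upstream already provided one."""
    def wrapped(environ, start_response):
        if SESSION_KEY not in environ:
            # Fallback only: never shadow a session that is already there.
            environ[SESSION_KEY] = make_session()
        return app(environ, start_response)
    return wrapped


def counter_app(environ, start_response):
    session = environ[SESSION_KEY]
    session["hits"] = session.get("hits", 0) + 1
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str(session["hits"]).encode("ascii")]


app = ensure_session(counter_app)
```

Deployed behind real session middleware, the wrapper does nothing; 
deployed alone, the application still works.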

>> I think authorization is different, and is conflated in 
>>paste.login, but I don't have many use cases where it's a useful 
>>distinction.  I guess there's a number of ways of getting a username and 
>>password; and to some degree the  authenticator object works at that 
>>level of abstraction.  And there's a couple other ways of authenticating 
>>a user as well (public keys, IP address, etc).  I've generally used a 
>>"user manager" object for this kind of abstraction, with subclassing for 
>>different kinds of generality (e.g., the basic abstract class makes 
>>username/password logins simple, but a subclass can override that and 
>>authenticate based on anything in the request).
> 
> 
> Sure.  OTOH, Zope 2 has proven that inheritance makes for a pretty awful
> general reuse pattern when things become sufficiently complicated.

True.  But part of that is having a clear internal and external 
interface.  The external interface -- which you can implement without 
using the abstract (convenience) superclass -- should be small and 
explicit.  I've found interfaces a useful way of adding discipline in 
this way, even though I've never really used them at runtime.

But I think it's reasonable to use inheritance for convenience's sake, 
so long as you don't implement more than one thing in a class.

>>As long as it's properly partitioned, I don't think it's a terribly hard 
>>problem.  That is, with proper partitioning the pieces can be 
>>recombined, even if the implementations aren't general enough for all 
>>cases.  Apache and Zope 2 authentication being examples where the 
>>partitioning was done improperly.
> 
> 
> Yes.  I think it goes further than that.  For example, I'd like to
> be able to swap out implementations of the following kinds of components
> at a level somewhere above my application:
> 
> Sessioning

Yes; we need a standard interface for sessions, but that's pretty 
straight-forward.  There's other levels where a useful standard can be 
implemented as well; for instance, flup.middleware.session has 
SessionStore, which is where most of the parts of the session that you'd 
want to reimplement are implemented.

> Authentication/identification

This seems very doable right now, just by using REMOTE_USER.  This leads 
to rather dumb users -- just a string -- but it's a good 
lowest-common-denominator starting point.  More interesting interfaces 
-- like lists of roles/groups, or user objects -- can be added on 
incrementally.
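
A sketch of the lowest-common-denominator version, where the user is 
nothing more than a string in the REMOTE_USER variable (the CGI 
variable WSGI inherits for this).  fixed_user_middleware is a 
hypothetical stand-in for a real authenticator; it does no credential 
checking at all.

```python
def fixed_user_middleware(app, username):
    """Stand-in authenticator: supplies a user name unless one is already set."""
    def wrapped(environ, start_response):
        # setdefault keeps any user identified further up the stack.
        environ.setdefault("REMOTE_USER", username)
        return app(environ, start_response)
    return wrapped


def whoami_app(environ, start_response):
    user = environ.get("REMOTE_USER")
    if not user:
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", 'Basic realm="example"'),
        ])
        return [b"login required"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("hello " + user).encode("utf-8")]


app = fixed_user_middleware(whoami_app, "guest")
```

The application only ever looks at the string, so any authenticator 
that sets it is interchangeable with any other.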

> Authorization (via something like declarative security based on a path)

Sure; I can imagine a whole slew of ways to do authorization.  An 
application can do it simply by returning 403 Forbidden.  A front-end 
middleware could do it with simple pattern matching on the URL.  A URL 
parser (aka traversal) can look for security annotations.
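
The front-end pattern-matching variant might look something like this. 
PathAuthorization and its rule format are assumptions made up for the 
example, and it expects something earlier in the stack to have set 
REMOTE_USER.

```python
import re


class PathAuthorization:
    """Front-end check: rules are (path regex, allowed user names) pairs."""

    def __init__(self, app, rules):
        self.app = app
        self.rules = [(re.compile(p), frozenset(users)) for p, users in rules]

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        user = environ.get("REMOTE_USER")
        for pattern, allowed in self.rules:
            # match() anchors at the start of the path.
            if pattern.match(path) and user not in allowed:
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Forbidden"]
        return self.app(environ, start_response)


def ok_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


guarded = PathAuthorization(ok_app, [(r"/admin(/|$)", {"chris"})])
```

The wrapped application stays free to return 403 on its own for 
anything the URL patterns cannot express.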

> Virtual hosting awareness

I've never had a problem with this, except in Zope...

Anyway, to me this feels like a kind of URL parsing.  One of the 
mini-proposals I made before involved a way for URL parsers to add URL 
variables to the system (basically a standard WSGI key under which URL 
variables are stored as a dictionary).  So a pattern like:

   (?P<username>.*)\.myblogspace\.com/(?P<year>\d\d\d\d)/(?P<month>\d\d)/

Would add username, year, and month variables to the system.  But regex 
matching is just one way; the *result* of parsing is usually either in 
the object (e.g., you use domains to get entirely different sites), or 
in terms of these variables.
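
As a sketch of the regex flavor: the "example.url_vars" key is a 
placeholder (no key has been agreed on), and URLVarsMiddleware is a 
hypothetical name.  Python spells named groups (?P<name>...), and this 
version assumes HTTP_HOST carries no port number.

```python
import re

URL_VARS_KEY = "example.url_vars"  # hypothetical key; nothing is standardized

PATTERN = re.compile(
    r"(?P<username>[^.]+)\.myblogspace\.com"
    r"/(?P<year>\d{4})/(?P<month>\d{2})/"
)


class URLVarsMiddleware:
    """Matches host + path against a pattern and publishes the named groups."""

    def __init__(self, app, pattern=PATTERN):
        self.app = app
        self.pattern = pattern

    def __call__(self, environ, start_response):
        target = environ.get("HTTP_HOST", "") + environ.get("PATH_INFO", "")
        match = self.pattern.match(target)
        if match:
            # Merge with variables set by any parser further up the stack.
            environ.setdefault(URL_VARS_KEY, {}).update(match.groupdict())
        return self.app(environ, start_response)


def show_vars_app(environ, start_response):
    url_vars = environ.get(URL_VARS_KEY, {})
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [repr(sorted(url_vars.items())).encode("utf-8")]


app = URLVarsMiddleware(show_vars_app)
```

A traversal-style parser could fill the same dictionary without any 
regex, and the application would not know the difference.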

> View lookup
> View invocation

This I imagine happening either below WSGI entirely, or as part of a URL 
parser.  There's certainly a place for adaptation at different stages. 
For instance, paste.urlparser.URLParser.get_application() clearly is 
ripe for adaptation.  I imagine this wrapping the "resource" with 
something that renders it using a view.  But that assumes you have 
resources and views -- lots of (most?) frameworks use controllers and 
views, and view lookup tends to be controller-driven.  So it feels very 
framework-specific to me.

> Transformation during rendering

If you mean what I think -- e.g., rendering XSL -- I think WSGI is ripe 
for this sort of thing.  So far I've just done small things, like HTML 
checking, debugging log messages, etc.  But other things are very possible.

> Caching

Again, I think this is a very natural fit.  Well, at least for 
whole-page caching.  Partial page caching doesn't really fit well at 
all, I'm afraid, though both systems could use the same caching backend.
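
A whole-page cache as middleware can be sketched in a few lines. 
PageCache is a made-up name, and this deliberately ignores expiry, 
cache-control headers, and per-user variation; it only shows that the 
wrapped application needs no changes.

```python
class PageCache:
    """Whole-page cache for GET requests, keyed on path and query string."""

    def __init__(self, app):
        self.app = app
        self.pages = {}  # (path, query) -> (status, headers, body)

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD", "GET") != "GET":
            return self.app(environ, start_response)
        key = (environ.get("PATH_INFO", ""), environ.get("QUERY_STRING", ""))
        if key not in self.pages:
            page = self._render(environ)
            if not page[0].startswith("200"):
                # Pass errors and redirects through without storing them.
                start_response(page[0], page[1])
                return [page[2]]
            self.pages[key] = page
        status, headers, body = self.pages[key]
        start_response(status, headers)
        return [body]

    def _render(self, environ):
        """Run the wrapped app once and capture its complete response."""
        captured = {}
        written = []

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return written.append  # stands in for the write() callable

        result = self.app(environ, capture)
        try:
            chunks = list(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        return captured["status"], captured["headers"], b"".join(written + chunks)
```

The dictionary could be swapped for whatever backend a partial-page 
cache inside the framework already uses.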

> Essentially, as Phillip divined, I've been trying to construct
> a framework-neutral component system out of middleware pieces,
> but maybe I need to step back from that a bit.  It sure is tempting,
> though. ;-)

I've found it satisfyingly easy.  Maybe there's a "better" way... but 
"better" without "easier" doesn't excite me at all.  And we learn best 
by doing... which is my way of saying you should try it with code right 
now ;)

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org