[Web-SIG] Standardized configuration

Sun Jul 17 13:29:56 CEST 2005

On Sun, 2005-07-17 at 03:16 -0500, Ian Bicking wrote:
> This is what Paste does in configuration, like:
> 
> middleware.extend([
>      SessionMiddleware, IdentificationMiddleware,
>      AuthenticationMiddleware, ChallengeMiddleware])
> 
> This kind of middleware takes a single argument, which is the 
> application it will wrap.  In practice, this means all the other 
> parameters go into lazily-read configuration.

I'm finding it hard to imagine a reason to have another kind of
middleware.

Well, actually that's not true.  In noodling about this, I did think it
would be kind of neat in a twisted way to have "decision middleware"
like:

class DecisionMiddleware:
    def __init__(self, apps):
        self.apps = apps

    def __call__(self, environ, start_response):
        # Pick an application for this request and delegate to it.
        app = self.choose(environ)
        for chunk in app(environ, start_response):
            yield chunk

    def choose(self, environ):
        # some_decision_function is a placeholder for the selection logic.
        return some_decision_function(self.apps, environ)

I can imagine using this pattern as a decision point for a WSGI pipeline
serving multiple application end-points (perhaps based on URL matching
of the PATH_INFO in environ).
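
As a rough sketch of what such a decision function might look like (the
name choose_by_path_prefix and the idea of passing the apps as a dict of
URL prefixes are my own assumptions for illustration, nothing standard):

```python
def choose_by_path_prefix(apps, environ):
    """Return the app whose URL prefix matches PATH_INFO.

    `apps` is assumed to be a dict mapping prefixes such as '/blog' to
    WSGI applications; longer prefixes are tried first.
    """
    path = environ.get('PATH_INFO', '')
    for prefix in sorted(apps, key=len, reverse=True):
        # Match the prefix itself or anything below it, but not '/blogger'.
        if path == prefix or path.startswith(prefix.rstrip('/') + '/'):
            return apps[prefix]
    raise LookupError('no application registered for %r' % path)


def blog_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'blog']


def wiki_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'wiki']


apps = {'/blog': blog_app, '/wiki': wiki_app}
chosen = choose_by_path_prefix(apps, {'PATH_INFO': '/blog/2005/07'})
```

A real version would probably want to return a 404 application rather
than raise, but that is a detail.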

But by and large, most middleware components seem to be just wrappers
for the next application in the chain.  There seem to be two types of
middleware that take a single application object as a parameter to their
constructor: "decorator" middleware, which adds something to the
environment for an application to find later, and "action" middleware,
which rewrites the body or the response headers before the response is
sent back to the client.  Some middleware of this kind does both.
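
To make the distinction concrete, here is a minimal sketch of one of
each.  The class names, the 'example.userid' key and the 'X-Example'
header are all made up for illustration:

```python
class AddUserMiddleware:
    """'Decorator' middleware: adds a key to environ for downstream apps."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ['example.userid'] = 'anonymous'  # hypothetical key
        return self.app(environ, start_response)


class AddHeaderMiddleware:
    """'Action' middleware: rewrites response headers on the way out."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Append a header before handing off to the real start_response.
            headers = list(headers) + [('X-Example', 'wrapped')]
            return start_response(status, headers, exc_info)
        return self.app(environ, patched_start_response)


def hello_app(environ, start_response):
    body = ('hello %s' % environ['example.userid']).encode('ascii')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [body]


# Both kinds take exactly one argument: the application they wrap.
pipeline = AddHeaderMiddleware(AddUserMiddleware(hello_app))
```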

> You can also define a "framework" (a plugin to Paste), which in addition 
> to finding an "app" can also add middleware; basically embodying all the 
> middleware that is typical for a framework.

This appears to be what I'm trying to do too, which is why I'm intrigued
by Paste.

OTOH, I'm not sure that I want my framework to "find" an app for me.
I'd like to be able to define pipelines that include my app, but I'd
typically just want to statically declare it as the end point of a
pipeline composed of service middleware.  I should look at Paste a
little more to see if it has the same philosophy or if I'm
misunderstanding you.

> Paste is really a deployment configuration.  Well, that as well as stuff 
> to deploy.  And two frameworks.  And whatever else I feel a need or 
> desire to throw in there.

Yeah.  FWIW, as someone who has recently taken a brief look at Paste, I
think it would be helpful (at least for newbies) to partition out the
bits of Paste which are meant to be deployment configuration from the
bits that are meant to be deployed.  Zope 2 fell into the same trap
early on, and never recovered.  For example, ZPublisher (nee Bobo) was
always meant to be able to be useful outside of Zope, but in practice it
never happened because nobody could figure out how to disentangle it
from its ever-increasing dependencies on other software only found in a
Zope checkout.  In the end, nobody even remembered what its dependencies
were *supposed* to be.  If you ask ten people, you'd get ten different
answers.

I also think that the rigor of separating out different components helps
to make the software stronger and more easily understood in bite-sized
pieces.  Unfortunately, separating them makes configuration tough, but I
think figuring out how to do that "the right way" is exactly what we're
trying to do here.

> Note also that parts of the pipeline are very much late bound.  For 
> instance, the way I implemented Webware (and Wareweb) each servlet is a 
> WSGI application.  So while there's one URLParser application, the 
> application that actually handles the request differs per request.  If 
> you start hanging more complete applications (that might have their own 
> middleware) at different URLs, then this happens more generally.

Well, if you put the "decider" in middleware itself, all of the
middleware components in each pipeline could still be at least
constructed early.  I'm pretty sure this doesn't really strictly qualify
as "early binding" but it's not terribly dynamic either.  It also makes
configuration pretty straightforward.  At least I can imagine a
declarative syntax for configuring pipelines this way.
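
Purely as a sketch of what I mean by "declarative" (build_pipeline and
make_tagger are hypothetical helpers of my own, not part of Paste or
WSGI), the pipeline could be nothing more than an ordered list that gets
folded around the end-point application at startup:

```python
from functools import reduce


def build_pipeline(endpoint, middleware):
    """Wrap `endpoint` so that the first item in `middleware` is outermost."""
    return reduce(lambda app, factory: factory(app),
                  reversed(middleware), endpoint)


def make_tagger(name):
    """Return a middleware factory that records `name` in environ when run."""
    def factory(app):
        def middleware(environ, start_response):
            environ.setdefault('example.trace', []).append(name)
            return app(environ, start_response)
        return middleware
    return factory


def endpoint(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [', '.join(environ['example.trace']).encode('ascii')]


# The "declaration": an ordered list, constructed once, before any request.
PIPELINE = [
    make_tagger('session'),
    make_tagger('identification'),
    make_tagger('authentication'),
]
app = build_pipeline(endpoint, PIPELINE)
```

Everything is constructed before the first request arrives, so the
ordering dependencies between components are visible in one place.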

I'm pretty sure you're not advocating it, but in case you are, I'm not
sure it adds as much value as it removes to be able to have a "dynamic"
middleware chain whereby new middleware elements can be added "on the
fly" to a pipeline after a request has begun.  That is *very* "late
binding" to me and it's impossible to configure declaratively.

> > But some elements of the pipeline at this level of factoring do need to
> > have dependencies on availability and pipeline placement of the other
> > elements.  In this example, proper operation of the authentication
> > component depends on the availability and pipeline placement of the
> > identification component.  Likewise, the identification component may
> > depend on values that need to be retrieved from the session component.
> 
> Yes; and potentially you could have several middlewares implementing the 
> same functionality for a single request, e.g., if you had different kind 
> of authentication for part of your site/application; that might shadow 
> authentication further up the stack.

That's true.  In the Zope world, we'd call that a "placeful service".
I'd be tempted to model this with "decision middleware".

> > I've just seen Phillip's post where he implies that this kind of
> > fine-grained component factoring wasn't really the initial purpose of
> > WSGI middleware.  That's kind of a bummer. ;-)
> 
> Well, I don't understand the services he's proposing yet.  I'm quite 
> happy with using middleware the way I have been, so I'm not seeing a 
> problem with it, and there's lots of benefits.

I agree!  I'm a bit confused because one of the canonical examples of
how WSGI middleware is useful seems to be the example of implementing a
framework-agnostic sessioning service.  And for that sessioning service
to be useful, your application has to be able to depend on its
availability so it can't be "oblivious".

OTOH, the primary benefit -- to me, at least -- of modeling services as
WSGI middleware is the fact that someone else might be able to use my
service outside the scope of my projects (and thus help maintain it and
find bugs, etc).  So if I've got the wrong concept of what kinds of
middleware I can expect "normal" people to use, I don't want to go
very far down that road without listening carefully to Phillip.  Perhaps
I'll have a shot at influencing the direction of WSGI to make it more
appropriate for this sort of thing or maybe we'll come up with a better
way of doing it.

Zope 3 is a component system much like what I'm after, and I may just
end up using it wholesale.  But my immediate problem with Zope 3 is that
like Zope 2, it's a collection of libraries that have dependencies on
other libraries that are only included within its own checkout and don't
yet have much of a life of their own.  It's not really a technical
problem, it's a social one... I'd rather have a somewhat messy framework
with a lot of diversity composed of wildly differing component
implementations that have a life of their own than to be trapped in a
clean, pure world where all the components are used only within that
world.

I suspect there's a middle ground here somewhere.

> > Factoring middleware components in this way seems to provide clear
> > demarcation points for reuse and maintenance.  For example, I imagined a
> > declarative security module that might be factored as a piece of
> > middleware here:  http://www.plope.com/Members/chrism/decsec_proposal .
> 
> Yes, I read that before; I haven't quite figured out how to digest it, 
> though.  This is probably in part because of the resource-based 
> orientation of Zope, and WSGI is application-based, where applications 
> are rather opaque and defined only in terms of function.

Yes, it is a bit Zopeish because it assumes content lives at a path.
This isn't always the case, I know, but it often is.  Well, it's a bit
of a stretch, but an alternate decsec implementation might use a
"content identifier" to determine the protection of a resource instead
of a full path.

For example, if you're implementing an application that is very simple
and takes one and only one URL, but calls it with a different query
string variable to display different pieces of content (e.g.
'/blog?entry_num=1234'), you might have one ACL as the "root" ACL but
optionally protect each piece of content with a separate ACL if one can
be found.  Maybe the content-specific ACL would be 'entry_num=1234'
instead of a path.  A function that accepts a form post for displaying
or changing the blog entry for 1234 might look like this:

def blog(environ, start_response):
    acl = environ['acl']  # added by decsec middleware
    userid = environ['userid']  # added by an authentication middleware
    formvars = get_form_vars_from(environ)
    # Map the requested action onto the permission it requires.
    if formvars['action'] == "view":
        permission = 'view'
    elif formvars['action'] == "change":
        permission = 'edit'
    else:
        # Any other action has no permission mapping; reject it.
        start_response('400 Bad Request', [])
        return ['<html>Unknown action</html>']
    content = get_blog_entry(environ)  # pulls out the entry for 1234
    if not acl.check(userid, permission):
        start_response('401 Unauthorized', [])
        return ['<html>Unauthorized</html>']
    # [ ... further code to change or display the blog entry ... ]

The ACL could be the "root" ACL (say, all users can view, members of the
group "manager" could change, everything else is denied).  The "root"
ACL would be used if content did not have its own ACL.  But associating
an ACL with a content identifier would allow the developer or site
manager to protect individual blog entries (e.g. 1234, 5678, etc) with
different ACLs.  "Joe can view this one but he can't change it", "Jim
can view all of them and can change all of them", etc.. the sorts of
things useful for "staging" and workflow delegation without unduly
mucking up the actual application code.
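
The root-ACL fallback is simple enough to sketch.  This is only an
illustration under my own assumptions: the ACL class, find_acl and the
dict of per-content ACLs are hypothetical, and a real decsec would also
deal with groups rather than bare userids:

```python
class ACL:
    """A tiny access control list: maps a permission to allowed userids."""

    def __init__(self, grants):
        self.grants = grants  # e.g. {'view': {'joe'}, 'edit': {'jim'}}

    def check(self, userid, permission):
        return userid in self.grants.get(permission, set())


def find_acl(acls, content_id, root_acl):
    """Return the ACL registered for `content_id`, else the root ACL."""
    return acls.get(content_id, root_acl)


# Root ACL: joe and jim can view, only jim can edit.
root_acl = ACL({'view': {'joe', 'jim'}, 'edit': {'jim'}})

# Entry 1234 gets its own, stricter ACL: nobody can edit it.
acls = {'entry_num=1234': ACL({'view': {'joe'}, 'edit': set()})}

protected = find_acl(acls, 'entry_num=1234', root_acl)
unprotected = find_acl(acls, 'entry_num=5678', root_acl)
```

The middleware would do the lookup and drop the result into
environ['acl'], so the application only ever calls check().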

Decsec would also take into account the user's group memberships and so
forth during the "check" step, so you wouldn't have to write any of this
code either.  The "blog" example is stupid, of course; the concept is
more useful for higher-security apps.

Sorry, all of this is somewhat beside the point of this thread, but it
does provide an example of the kind of functionality I'd like to be able
to put into middleware.

> > Of course, this sort of thing doesn't *need* to be middleware.  But
> > making it middleware feels very right to me in terms of being able to
> > deglom nice features inspired by Zope and other frameworks into pieces
> > that are easy to recombine as necessary.  Implementations as WSGI
> > middleware seems a nice way to move these kinds of features out of our
> > respective applications and into more application-agnostic pieces that
> > are very loosely coupled, but perhaps I'm taking it too far.
> 
> Certainly these pieces of code can apply to multiple applications and 
> disparate systems.  The most obvious instance right now that I think of 
> is a WSGI WebDAV server (and someone's working on that for Google Summer 
> of Code), which should be implemented pretty framework-free, simply 
> because a good WebDAV implementation works at a low level.  But 
> obviously you want that to work with the same authentication as other 
> parts of the system.

Yes.  In particular, if you knew you were working with an application
that could resolve a path in terms of containers and contained pieces of
content (just like a filesystem does), it would be pretty easy to code
up a DAV "action middleware" component that rendered containerish things
as DAV "collections" and contentish things as DAV "resources", and which
could handle DAV locking and property rendering and so forth.

This kind of middleware might be tough, though, because it probably
requires explicit cooperation from the end-point application (it expects
to be talking to an actual filesystem, but that won't always be the case
at least without some sort of adaptation).

But in any case, it's a good example of how we could prevent people from
needing to reinvent the wheel... this guy appears to be coming up with
his own identification, authentication, authorization, and challenge
libraries entirely (http://cwho.blogspot.com/), which just feels very
wasteful.

> I guess this is how I come back to lazily introducing middleware.  For 
> instance, some "application" (which might be a fairly small bit of 
> functionality) might require a session.  If there's no session 
> available, then it can probably make a reasonable session itself.  But 
> it shouldn't shadow any session available to it, if that's already 
> available.  This is doubly true for something more authoritative like 
> authentication.

I'm not sure I know enough to be able to agree or disagree.  But this
seems definitely more in the realm of "late binding", which I'm a little
concerned about from a config perspective.
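
If I'm reading you right, the "don't shadow what's already there" idea
could be sketched roughly like this.  The 'example.session' key and the
ensure_session name are invented for illustration, and a plain dict is
standing in for a real session object:

```python
def ensure_session(app, key='example.session'):
    """Supply a session only if nothing upstream has already provided one."""
    def middleware(environ, start_response):
        if key not in environ:
            # No upstream session: fall back to a throwaway one.
            environ[key] = {}
        return app(environ, start_response)
    return middleware


def counter_app(environ, start_response):
    session = environ['example.session']
    session['hits'] = session.get('hits', 0) + 1
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [str(session['hits']).encode('ascii')]


app = ensure_session(counter_app)
```

Even in this form the fallback is decided per request, which is the part
that makes me nervous about describing it in a config file.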

> > Sure.  OTOH, Zope 2 has proven that inheritance makes for a pretty awful
> > general reuse pattern when things become sufficiently complicated.
> 
> True.  But part of that is having a clear internal and external 
> interface.  The external interface -- which you can implement without 
> using the abstract (convenience) superclass -- should be small and 
> explicit.  I've found interfaces a useful way of adding discipline in 
> this way, even though I've never really used them at runtime.
> 
> But I think it's reasonable to use inheritance for convenience sake, so 
> long as you don't implement more than one thing in a class.

I agree completely.

> > Yes.  I think it goes further than that.  For example, I'd like to be
> > able to swap out implementations of the following kinds of components
> > at a level somewhere above my application:
> > 
> > Sessioning
> 
> Yes; we need a standard interface for sessions, but that's pretty 
> straight-forward.  There's other levels where a useful standard can be 
> implemented as well; for instance, flup.middleware.session has 
> SessionStore, which is where most of the parts of the session that you'd 
> want to reimplement are implemented.

Yes.  Furthermore, if sessioning is a middleware component, anything can
be a middleware component as far as I can tell. ;-)

> > Authentication/identification
> 
> This seems very doable right now, just by using REMOTE_USER.  This leads 
> to rather dumb users -- just a string -- but it's a good 
> lowest-common-denominator starting point.  More interesting interfaces 
> -- like lists of roles/groups, or user objects -- can be added on 
> incrementally.

Sure.

> > Authorization (via something like declarative security based on a path)
> 
> Sure; I can imagine a whole slew of ways to do authorization.  An 
> application can do it simply by returning 403 Forbidden.  A front-end 
> middleware could do it with simple pattern matching on the URL.  A URL 
> parser (aka traversal) can look for security annotations.

Yes.  In the simplest case, security annotations for resources could be
kept statically in a Python module.  In more complicated cases, the
application itself would need to collaborate with "upstream" middleware
to do authorization.
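
For the simplest case, a sketch might look like the following.  The
SECURITY_ANNOTATIONS table and the AuthorizationMiddleware class are
hypothetical names of mine, and I'm assuming the userid arrives in
REMOTE_USER from some upstream authentication component:

```python
import re

# Hypothetical static annotations: (URL pattern, userids allowed).
SECURITY_ANNOTATIONS = [
    (re.compile(r'^/admin(/|$)'), {'manager'}),
]


class AuthorizationMiddleware:
    """Front-end middleware that denies requests by URL pattern matching."""

    def __init__(self, app, annotations=SECURITY_ANNOTATIONS):
        self.app = app
        self.annotations = annotations

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        user = environ.get('REMOTE_USER')
        for pattern, allowed in self.annotations:
            if pattern.search(path) and user not in allowed:
                start_response('403 Forbidden',
                               [('Content-Type', 'text/plain')])
                return [b'Forbidden']
        # No annotation objected, so pass the request through untouched.
        return self.app(environ, start_response)


def ok_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']


guarded = AuthorizationMiddleware(ok_app)
```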

> > Virtual hosting awareness
> 
> I've never had a problem with this, except in Zope...
> 
> Anyway, to me this feels like a kind of URL parsing.  One of the 
> mini-proposals I made before involved a way of URL parsers to add URL 
> variables to the system (basically a standard WSGI key to put URL 
> variables as a dictionary).  So a pattern like:
> 
>    (?<username>.*)\.myblogspace.com/(?<year>\d\d\d\d)/(?<month>\d\d)/
> 
> Would add username, year, and month variables to the system.  But regex 
> matching is just one way; the *result* of parsing is usually either in 
> the object (e.g., you use domains to get entirely different sites), or 
> in terms of these variables.

Yes, this seems to be more of a problem for Zope because it's a) a
long-running app with its own webserver b) has convenience functions for
generating URLs based on its internal containment graph and c) doesn't
deal well with relative URLs.  So if you want an application that lives
in a "subfolder" of your Zope object graph to behave as if it lives at
"http://example.com" instead of "http://example.com/subfolder", you need
to give it clues.
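
Your URL-variable idea seems easy enough to sketch as middleware.  Note
that Python's re module spells named groups (?P<name>...), so I've used
that form; the 'example.url_vars' key is a stand-in of mine, not a
proposed standard name, and matching against HTTP_HOST plus PATH_INFO is
likewise just my assumption:

```python
import re

URL_PATTERN = re.compile(
    r'(?P<username>[^.]+)\.myblogspace\.com'
    r'/(?P<year>\d{4})/(?P<month>\d{2})/')


def url_vars_middleware(app, pattern=URL_PATTERN, key='example.url_vars'):
    """Parse host + path and expose the named groups as a dict in environ."""
    def middleware(environ, start_response):
        target = environ.get('HTTP_HOST', '') + environ.get('PATH_INFO', '')
        match = pattern.match(target)
        environ[key] = match.groupdict() if match else {}
        return app(environ, start_response)
    return middleware


def show_vars(environ, start_response):
    url_vars = environ['example.url_vars']
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [repr(sorted(url_vars.items())).encode('ascii')]


app = url_vars_middleware(show_vars)
```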

> > View lookup
> > View invocation
> 
> This I imagine happening either below WSGI entirely, or as part of a URL 
> parser.  There's certainly a place for adaptation at different stages. 
> For instance, paste.urlparser.URLParser.get_application() clearly is 
> ripe for adaptation.  I imagine this wrapping the "resource" with 
> something that renders it using a view.  If you make resources and views 
> -- lots of (most?) frameworks use controllers and views, and view lookup 
> tends to be controller driven.  So it feels very framework-specific to me.

Yep, I suspect the same.  I think these things will end up in the
end-point application but it's kinda fun to try to think about
abstracting them.

> > Transformation during rendering
> 
> If you mean what I think -- e.g., rendering XSL -- I think WSGI is ripe 
> for this sort of thing.

Yes, that's what I meant.

>   So far I've just done small things, like HTML 
> checking, debugging log messages, etc.  But other things are very possible.
> 
> > Caching
> 
> Again, I think this is a very natural fit.  Well, at least for 
> whole-page caching.  Partial page caching doesn't really fit well at 
> all, I'm afraid, though both systems could use the same caching backend.
> 
> > Essentially, as Phillip divined, to do so, I've been trying to construct
> > a framework-neutral component system out of middleware pieces to do so,
> > but maybe I need to step back from that a bit.  It sure is tempting,
> > though. ;-)
> 
> I've found it satisfyingly easy.  Maybe there's a "better" way... but 
> "better" without "easier" doesn't excite me at all.  And we learn best 
> by doing... which is my way of saying you should try it with code right 
> now ;)

Yes, I should stop blathering and get to work.  I gotta admit that I'm
pretty excited about the possibilities.  It's just reassuring to know
that I'm not entirely insane, or at least that other people are just as
insane as I am. ;-)

- C