[Web-SIG] Standardized configuration

Sun Jul 17 12:04:48 CEST 2005

On 17/07/2005, at 6:16 PM, Ian Bicking wrote:
>> The pipeline itself isn't really late bound.  For instance, if I was to
>> create a WSGI middleware pipeline something like this:
>>
>>    server <--> session <--> identification <--> authentication <-->
>>    <--> challenge <--> application
>>
>> ... session, identification, authentication, and challenge are
>> middleware components (you'll need to imagine their implementations).
>> And within a module that started a server, you might end up doing
>> something like:
>>
>> def configure_pipeline(app):
>>     return SessionMiddleware(
>>             IdentificationMiddleware(
>>               AuthenticationMiddleware(
>>                 ChallengeMiddleware(app))))
>
> This is what Paste does in configuration, like:
>
> middleware.extend([
>      SessionMiddleware, IdentificationMiddleware,
>      AuthenticationMiddleware, ChallengeMiddleware])
>
> This kind of middleware takes a single argument, which is the
> application it will wrap.  In practice, this means all the other
> parameters go into lazily-read configuration.

Sorry, but you have given me a nice opening here to hijack this conversation
a bit and make some comments and pose some questions about WSGI that I have
been thinking on for a while.

My understanding from reading the WSGI PEP and examples like that above is
that the WSGI middleware stack concept is very much tree like, but where at
any specific node within the tree, one can only traverse into one child.
I.e., a parent middleware component could make a decision to defer to one
child or another, but there is no means of really trying out multiple
choices until you find one that is prepared to handle the request. The only
way around it seems to be to make the linear chain of nested applications
longer and longer, something which to me just doesn't sit right. In some
respects the need for the configuration scheme is in part to make that less
unwieldy.

To explain what I am going on about, I am going to use examples from some
work I have been doing with componentised construction of request handler
stacks in mod_python. I will not use the term middleware here, as I note
that someone in this discussion has already made the point that the
components being talked about here aren't really middleware, and in what
I have been doing I have been taking it to an even more fine grained level.

I believe I can draw a reasonable analogy to mod_python as, at the simplest,
a mod_python request handler and a WSGI application are both providing the
most basic function of providing the service for responding to a request,
they just do so in different ways.

Normally in mod_python a handler can return an OK response, an error response
or a DECLINED response. The DECLINED response is special and indicates to
mod_python that any further content handlers defined by mod_python should be
skipped and control passed back up to Apache so that it can potentially
serve up a matched static file.

What I am doing is making it acceptable for a handler to also return None.
If this were returned by the highest level handler, it would equate to being
the same as DECLINED, but within the context of middleware components it
has a slightly relaxed meaning. Specifically, it indicates that the handler
isn't returning a response, but not that the request as a whole is being
DECLINED, which would cause a return to Apache.

Doing this means that within the context of a tree based middleware stack,
one can introduce a list of handlers at a particular node in the stack. Each
handler in the list will in turn be tried to see if it wishes to handle the
request, returning either an error or valid response, or None. If it doesn't
return a response, the next handler in the list would be tried until one is
found, and if one isn't, then None is passed back to the parent middleware
component.

This all means I could write something like:

   handler = Handlers(
     IfLocationMatches(r"/_",NotFound()),
     IfLocationMatches(r"\.py(/.*)?$",NotFound()),
     PythonModule(),
   )

This handler might be associated with any access to a directory as a whole.
In iterating over each of the handlers it filters out requests to files
that we don't want to provide access to, with the final handler deferring
to a handler within a Python module associated with the actual resource
being requested. Although Apache provides means of filtering out requests,
it only works properly for physical files and not virtual resources specified
by way of the path info.

For example, a file "page.tmpl" (a Cheetah file) could have a "page.py"
file that defines:

   handler = Handlers(
     IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
     IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),
     IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
   )

Again, more filtering and finally a handler is triggered which knows how
to trigger a precompiled Cheetah template stored as a Python module.

All in all a similar tree like structure to WSGI, except you have the ability
to iterate through handlers at one level with them being able to explicitly
define that they aren't providing a response and instead allowing the next
handler to be tried.
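
To make the idea of a handler declining by returning None a bit more
concrete, here is a minimal sketch of how a conditional wrapper such as
"IfLocationMatches" might be written. It is not the actual implementation:
the FakeRequest class, matching against a "uri" attribute, and NotFound
returning the bare integer 404 are all assumptions made for illustration.

```python
import re


class FakeRequest:
    """Hypothetical stand-in for a request object; only 'uri' is used."""

    def __init__(self, uri):
        self.uri = uri


class NotFound:
    """Hypothetical handler that always answers with a 404 status code."""

    def __call__(self, req):
        return 404


class IfLocationMatches:
    """Run the wrapped handler only when the request location matches."""

    def __init__(self, pattern, handler):
        self._pattern = re.compile(pattern)
        self._handler = handler

    def __call__(self, req):
        if self._pattern.search(req.uri):
            return self._handler(req)
        # No response produced: the next handler in the list gets a turn.
        return None


guard = IfLocationMatches(r"\.py(/.*)?$", NotFound())
blocked = guard(FakeRequest("/app/page.py"))
passed = guard(FakeRequest("/app/page.html"))
```

Here "blocked" holds the 404 from NotFound, while "passed" is None because
the pattern did not match and the wrapper declined.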

My experience with this so far is that it has allowed more fine grained
components to be created which provide specific filtering without it
all turning into a mess due to having to nest each handler within another
in a big pipeline, as it seems things must be done in WSGI.

In mod_python one already has access to a table object storing configuration
options set within the Apache configuration for mod_python, plus the ability
to add Python objects into the mod_python request object itself as necessary.
In terms of configuration, this ability to have a list of handlers that
don't actually return a response seems to me to make it easier to avoid
having to have a separate configuration system for most stuff.

For example, I can have a handler "SetPythonOption" which sets an option in
the options table object and always returns None, thus passing control on to
the next handler. In the highest level handler, before the point where
control is dispatched off to a separate Python module or special purpose
handler, one can thus define the configuration as necessary.

   handler = Handlers(
     SetPythonOption("PythonDebug","1"),
     SetPythonOption("ApplicationPath","/application"),
     IfLocationMatches(r"/_",NotFound()),
     IfLocationMatches(r"\.py(/.*)?$",NotFound()),
     PythonModule(),
   )

In other words, the code itself contains the configuration and one doesn't
have to worry about where the configuration is found and working out what
you may need from it. Of course you could still have a separate configuration
object and provide a special purpose handler which merges that into the
environment of the request object in some way.

For this latter case, in line with how the mod_python request object is
used, you could have something like:

   config = getApplicationConfig()

   handler = Handlers(
     SetRequestAttribute("config",config),
     IfLocationMatches(r"/_",NotFound()),
     IfLocationMatches(r"\.py(/.*)?$",NotFound()),
     PythonModule(),
   )

Having done that, any later handler could access "req.config" to get access
to the configuration object and use it as necessary. In WSGI such things
would be placed into the "environ" dictionary and propagated to subsequent
applications.
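
As a rough illustration of the two styles side by side, the sketch below
shows a handler that attaches a value to the request and then declines,
next to a WSGI middleware that does the equivalent through "environ". The
FakeRequest and SetEnviron classes, the "example.config" key and the toy
application are hypothetical names made up for this example, and the
SetRequestAttribute shown is a guess at the behaviour rather than the real
code.

```python
class FakeRequest:
    """Hypothetical stand-in for the mod_python request object."""


class SetRequestAttribute:
    """Attach a value to the request, then decline by returning None."""

    def __init__(self, name, value):
        self._name = name
        self._value = value

    def __call__(self, req):
        setattr(req, self._name, self._value)
        return None  # no response produced; the next handler gets its turn


class SetEnviron:
    """Rough WSGI counterpart: middleware that adds a key to environ."""

    def __init__(self, app, key, value):
        self._app = app
        self._key = key
        self._value = value

    def __call__(self, environ, start_response):
        environ[self._key] = self._value
        return self._app(environ, start_response)


def show_config(environ, start_response):
    """Toy WSGI application that echoes the propagated configuration."""
    body = repr(environ["example.config"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


config = {"debug": True}

# mod_python style: the attribute is visible to any later handler.
req = FakeRequest()
declined = SetRequestAttribute("config", config)(req)

# WSGI style: the value travels in the environ dictionary instead.
statuses = []
wsgi_app = SetEnviron(show_config, "example.config", config)
body = b"".join(
    wsgi_app({}, lambda status, headers, exc_info=None: statuses.append(status))
)
```

Note the structural difference: the WSGI version has to wrap the next
application, whereas the handler version simply sits in a list beside it.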

One last example is what a session based login mechanism might look like,
since this was one of the examples posed in the initial discussion. Here you
might have a handler for a whole directory which contains:

_userDatabase = _users.UserDatabase()

handler = Handlers(
     IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
     IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),

     IfLocationIsADirectory(ExternalRedirect('index.html')),

     # Create session and stick it in request object.
     CreateUserSession(),

     # Login form shouldn't require user to be logged in to access it.
     IfLocationMatches(r"^/login\.html(/.*)?$",CheetahModule()),

     # Serve requests against login/logout URLs and otherwise
     # don't let request proceed if user not yet authenticated.
     # Will redirect to login form if not authenticated.
     FormAuthentication(_userDatabase,"login.html"),

     SetResponseHeader('Pragma','no-cache'),
     SetResponseHeader('Cache-Control','no-cache'),
     SetResponseHeader('Expires','-1'),

     IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
)

Again, one has done away with the need for configuration files as the code
itself specifies what is required, along with the constraints as to what
order things should be done in.

Another thing this example shows is that handlers, when they return None due
to not returning an actual response, can still add to the response headers,
whether by way of special cookies as required by sessions or headers
controlling caching etc.
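
A minimal sketch of that behaviour follows, assuming the request exposes a
dictionary-like "headers_out" attribute. The FakeRequest class, the
"run_in_turn" helper, "serve_page" and the OK placeholder are stand-ins
invented for the example; the real SetResponseHeader may well differ.

```python
OK = 0  # placeholder for apache.OK, used only in this sketch


class FakeRequest:
    """Hypothetical request object with a plain dict for outgoing headers."""

    def __init__(self):
        self.headers_out = {}


class SetResponseHeader:
    """Add a response header, then decline so later handlers still run."""

    def __init__(self, name, value):
        self._name = name
        self._value = value

    def __call__(self, req):
        req.headers_out[self._name] = self._value
        return None


def serve_page(req):
    """Final handler that actually produces a response."""
    return OK


def run_in_turn(req, handlers):
    """Try each handler in order and stop at the first non-None result."""
    for handler in handlers:
        result = handler(req)
        if result is not None:
            return result
    return None


req = FakeRequest()
status = run_in_turn(req, [
    SetResponseHeader("Pragma", "no-cache"),
    SetResponseHeader("Cache-Control", "no-cache"),
    serve_page,
])
```

By the time "serve_page" answers, both headers are already set on the
request even though the handlers that set them returned None.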

In terms of late binding of which handler is executed, the "PythonModule"
handler is one example in that it selects which Python module to load only
when the request is being handled. Another example of late construction of
an instance of a handler in what I am doing, albeit the same type, is:

   import cgi
   from mod_python import apache

   class Handler:

     def __init__(self,req):
       self.__req = req

     def __call__(self,name="value"):
       self.__req.content_type = "text/html"
       self.__req.send_http_header()
       self.__req.write("<html><body>")
       self.__req.write("<p>name=%r</p>"%cgi.escape(name))
       self.__req.write("</body></html>")
       return apache.OK

   handler = IfExtensionEquals("html",HandlerInstance(Handler))

First off, the "HandlerInstance" object is only triggered if the request
against this specific file based resource was by way of a ".html"
extension. When it is triggered, it is only at that point that an instance
of "Handler" is created, with the request object being supplied to the
constructor.
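
A stripped-down sketch of such a wrapper might look like the following. It
only illustrates the late construction: it makes no attempt to map form
fields onto the arguments of "__call__", and the FakeRequest and Greeting
classes are hypothetical test fixtures rather than anything from the real
code.

```python
class FakeRequest:
    """Hypothetical stand-in for the request object."""


class HandlerInstance:
    """Create a fresh handler object per request, then invoke it."""

    def __init__(self, factory):
        self._factory = factory  # a class or callable taking the request

    def __call__(self, req):
        instance = self._factory(req)  # constructed only when triggered
        return instance()


created = []  # records every instance so the late construction is visible


class Greeting:
    def __init__(self, req):
        self.req = req
        created.append(self)

    def __call__(self):
        return "handled"


wrapper = HandlerInstance(Greeting)
count_before = len(created)  # nothing constructed just by wrapping
first = wrapper(FakeRequest())
second = wrapper(FakeRequest())
```

Wrapping the class constructs nothing; each call to the wrapper builds a
new "Greeting" bound to that particular request.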

To round this off, the special "Handlers" handler only contains the following
code. Pretty simple, but makes construction of the component hierarchy a bit
easier in my mind when multiple things need to be done in turn where nesting
isn't strictly required.

   class Handlers:

     def __init__(self,*handlers):
         self.__handlers = handlers

     def __call__(self,req):
         # Try each handler in turn; the first non-None result wins.
         for handler in self.__handlers:
             result = _execute(req,handler,lazy=True)
             if result is not None:
                 return result
         # Every handler declined, so decline as well.
         return None

Would be very interested to see how people see this relating to what is
possible with WSGI. Could one instigate a similar sort of class to "Handlers"
in WSGI to sequence through WSGI applications until one generates a complete
response?

What has me thinking the answer is "no" is, first, that I recollect the PEP
saying that the "start_response" object can only be called once, which
precludes applications in a list adding to the response headers without
returning a valid status. Secondly, if the "start_response" object hasn't
been called when the parent starts to try and construct the response content
from the result of calling the application, it raises an error. But then, I
have a distinct lack of proper knowledge on WSGI so could be wrong.

If my thinking is correct, it could only be done by changing the WSGI
specification to support the concept of trying applications in sequence, by
way of allowing None as the status when "start_response" is called to
indicate the same as when I return None from a handler. I.e., the
application may have set headers, but otherwise the parent should where
possible move to a subsequent application and try it etc.
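
For comparison, a rough approximation that stays within the existing
interface might treat a particular status as meaning "declined". The sketch
below assumes a 404 status plays that role, which is purely a convention
chosen for this example and not something the PEP defines. It buffers each
response body, ignores the legacy write() callable, does not replay
"wsgi.input", and throws away any headers set by an application that
declines, so it does not give the header accumulation described above. The
class and function names are hypothetical.

```python
class FirstResponse:
    """Try WSGI applications in order; a 404 status is read as 'declined'."""

    def __init__(self, *apps):
        self._apps = apps

    @staticmethod
    def _run(app, environ):
        """Call one application, buffering its status, headers and body."""
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = headers
            # The legacy write() callable is not supported in this sketch.
            return lambda data: None

        result = app(environ, capture)
        try:
            body = list(result)  # consume fully so start_response has run
        finally:
            if hasattr(result, "close"):
                result.close()
        return captured.get("status"), captured.get("headers", []), body

    def __call__(self, environ, start_response):
        for app in self._apps:
            status, headers, body = self._run(app, environ)
            if status is not None and not status.startswith("404"):
                start_response(status, headers)
                return body
        # Nobody wanted the request.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]


def declines(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"nope"]


def answers(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


seen = []
output = b"".join(
    FirstResponse(declines, answers)(
        {}, lambda status, headers, exc_info=None: seen.append(status)
    )
)
```

The real "start_response" is called exactly once, with the status of the
first application that did not decline.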

Anyway, people may feel that this is totally contrary to what WSGI is all
about and not relevant, and that is fine. I am at least finding it an
interesting idea to play with in respect of mod_python.

BTW, WSGI itself could just become a pluggable component within this
mod_python middleware equivalent. :-)

   handler = Handlers(
     IfLocationMatches(r"/_",NotFound()),
     IfLocationMatches(r"\.py(/.*)?$",NotFound()),
     WSGIApplicationModule(),
   )

Feedback most welcome. I have been trying to work out how what I am doing
might be transferred to WSGI for a little while, but if people think it is a
stupid idea then I'll no longer waste my time on thinking about it and just
stick with mod_python.

Graham