[Web-SIG] Standardized configuration
Graham Dumpleton
grahamd at dscpl.com.au
Sun Jul 17 12:04:48 CEST 2005
On 17/07/2005, at 6:16 PM, Ian Bicking wrote:
>> The pipeline itself isn't really late bound. For instance, if I was to
>> create a WSGI middleware pipeline something like this:
>>
>> server <--> session <--> identification <--> authentication <-->
>> challenge <--> application
>>
>> ... session, identification, authentication, and challenge are
>> middleware components (you'll need to imagine their implementations).
>> And within a module that started a server, you might end up doing
>> something like:
>>
>>     def configure_pipeline(app):
>>         return SessionMiddleware(
>>             IdentificationMiddleware(
>>                 AuthenticationMiddleware(
>>                     ChallengeMiddleware(app))))
>
> This is what Paste does in configuration, like:
>
>     middleware.extend([
>         SessionMiddleware, IdentificationMiddleware,
>         AuthenticationMiddleware, ChallengeMiddleware])
>
> This kind of middleware takes a single argument, which is the
> application it will wrap. In practice, this means all the other
> parameters go into lazily-read configuration.

Sorry, but you have given me a nice opening here to hijack this
conversation a bit and make some comments and pose some questions about
WSGI that I have been thinking on for a while.

My understanding from reading the WSGI PEP and examples like that above
is that the WSGI middleware stack concept is very much tree like, but
where at any specific node within the tree, one can only traverse into
one child. I.e., a parent middleware component could make a decision to
defer to one child or another, but there is no means of really trying
out multiple choices until you find one that is prepared to handle the
request. The only way around it seems to be to make the linear chain of
nested applications longer and longer, something which to me just
doesn't sit right. In some respects the need for the configuration
scheme is in part to make that less unwieldy.

To explain what I am going on about, I am going to use examples from
some work I have been doing with componentised construction of request
handler stacks in mod_python. I will not use the term middleware here,
as I note that someone in this discussion has already made the point
that the components being talked about here aren't really middleware,
and in what I have been doing I have been taking it to an even more
fine grained level.

I believe I can draw a reasonable analogy to mod_python as, at the
simplest, a mod_python request handler and a WSGI application both
perform the most basic function of providing the service for responding
to a request; they just do so in different ways.

Normally in mod_python a handler can return an OK response, an error
response or a DECLINED response. The DECLINED response is special and
indicates to mod_python that any further content handlers defined by
mod_python should be skipped and control passed back up to Apache so
that it can potentially serve up a matched static file.

What I am doing is making it acceptable for a handler to also return
None. If this were returned by the highest level handler, it would
equate to being the same as DECLINED, but within the context of
middleware components it has a slightly relaxed meaning. Specifically,
it indicates that the handler isn't returning a response, but not that
the request as a whole is being DECLINED, causing a return to Apache.

Doing this means that within the context of a tree based middleware
stack, one can introduce a list of handlers at a particular node in the
stack. Each handler in the list will in turn be tried to see if it
wishes to handle the request, returning either an error or valid
response, or None. If it doesn't return a response, the next handler in
the list would be tried until one is found, and if one isn't, then None
is passed back to the parent middleware component.

This all means I could write something like:

    handler = Handlers(
        IfLocationMatches(r"/_",NotFound()),
        IfLocationMatches(r"\.py(/.*)?$",NotFound()),
        PythonModule(),
    )

This handler might be associated with any access to a directory as a
whole. In iterating over each of the handlers it filters out requests
to files that we don't want to provide access to, with the final handler
deferring to a handler within a Python module associated with the actual
resource being requested. Although Apache provides means of filtering
out requests, it only works properly for physical files and not virtual
resources specified by way of the path info.

For example, a file "page.tmpl" (a Cheetah file) could have a "page.py"
file that defines:

    handler = Handlers(
        IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
        IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),
        IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
    )

Again, more filtering, and finally a handler is triggered which knows
how to run a precompiled Cheetah template stored as a Python module.

All in all a similar tree like structure to WSGI, except you have the
ability to iterate through handlers at one level, with them being able
to explicitly indicate that they aren't providing a response and instead
allowing the next handler to be tried.
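
A conditional wrapper such as "IfLocationMatches" is not shown in this
message, so the following is only a guess at its shape, written as a
minimal sketch. The "FakeRequest" class, its "uri" attribute and the
plain 404 return value are assumptions made so that the example runs
without mod_python.

```python
import re

HTTP_NOT_FOUND = 404  # stand-in for mod_python's apache.HTTP_NOT_FOUND


class FakeRequest:
    """Hypothetical stand-in for the mod_python request object."""

    def __init__(self, uri):
        self.uri = uri


class NotFound:
    """Handler that always answers with a 'not found' status."""

    def __call__(self, req):
        return HTTP_NOT_FOUND


class IfLocationMatches:
    """Run the wrapped handler only when the request location matches."""

    def __init__(self, pattern, handler):
        self.pattern = re.compile(pattern)
        self.handler = handler

    def __call__(self, req):
        if self.pattern.search(req.uri):
            return self.handler(req)
        return None  # no response: let the next handler in the list try


block_python = IfLocationMatches(r"\.py(/.*)?$", NotFound())
```

The point of the sketch is that the wrapper has only two outcomes:
whatever the wrapped handler returns, or None so that the next handler
in the list gets its turn.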

My experience with this so far is that it has allowed more fine grained
components to be created which provide specific filtering without it
all turning into a mess due to having to nest each handler within
another in a big pipeline, as it seems things must be done in WSGI.

In mod_python one already has access to a table object storing
configuration options set within the Apache configuration for
mod_python, plus the ability to add Python objects into the mod_python
request object itself as necessary.

In terms of configuration, having a list of handlers where some don't
actually return a response seems to me to make it easier to avoid
having to have a separate configuration system for most stuff.

For example, I can have a handler "SetPythonOption" which sets an
option in the options table object and always returns None, thus
passing control onto the next handler. In the highest level handler,
before the point where control is dispatched off to a separate Python
module or special purpose handler, one can thus define the
configuration as necessary.

    handler = Handlers(
        SetPythonOption("PythonDebug","1"),
        SetPythonOption("ApplicationPath","/application"),
        IfLocationMatches(r"/_",NotFound()),
        IfLocationMatches(r"\.py(/.*)?$",NotFound()),
        PythonModule(),
    )

In other words, the code itself contains the configuration and one
doesn't have to worry about where the configuration is found and
working out what you may need from it. Of course you could still have a
separate configuration object and provide a special purpose handler
which merges that into the environment of the request object in some
way.

For this latter case, in line with how the mod_python request object is
used, you could have something like:

    config = getApplicationConfig()

    handler = Handlers(
        SetRequestAttribute("config",config),
        IfLocationMatches(r"/_",NotFound()),
        IfLocationMatches(r"\.py(/.*)?$",NotFound()),
        PythonModule(),
    )

Having done that, any later handler could access "req.config" to get
access to the configuration object and use it as necessary. In WSGI
such things would be placed into the "environ" dictionary and
propagated to subsequent applications.
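
For comparison, the WSGI side of that might be sketched roughly as
follows. It uses only the standard application calling convention from
the PEP. The "ConfigMiddleware" name and the "example.config" key are
made up for illustration and are not part of any standard or of Paste.

```python
class ConfigMiddleware:
    """Hypothetical WSGI middleware that exposes a config object to
    the wrapped application through the environ dictionary."""

    def __init__(self, app, config, key="example.config"):
        self.app = app
        self.config = config
        self.key = key  # the key name is an assumption, not a standard

    def __call__(self, environ, start_response):
        environ[self.key] = self.config
        return self.app(environ, start_response)


def application(environ, start_response):
    # A later application reads the configuration back out of environ.
    debug = environ["example.config"]["PythonDebug"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("debug=%s" % debug).encode("ascii")]


wrapped = ConfigMiddleware(application, {"PythonDebug": "1"})
```

Note that this still needs one level of nesting per piece of
configuration, which is the part the list of handlers avoids.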

One last example is what a session based login mechanism might look
like, since this was one of the examples posed in the initial
discussion. Here you might have a handler for a whole directory which
contains:

    _userDatabase = _users.UserDatabase()

    handler = Handlers(
        IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
        IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),

        IfLocationIsADirectory(ExternalRedirect('index.html')),

        # Create session and stick it in request object.
        CreateUserSession(),

        # Login form shouldn't require user to be logged in to access it.
        IfLocationMatches(r"^/login\.html(/.*)?$",CheetahModule()),

        # Serve requests against login/logout URLs and otherwise
        # don't let request proceed if user not yet authenticated.
        # Will redirect to login form if not authenticated.
        FormAuthentication(_userDatabase,"login.html"),

        SetResponseHeader('Pragma','no-cache'),
        SetResponseHeader('Cache-Control','no-cache'),
        SetResponseHeader('Expires','-1'),

        IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
    )

Again, one has done away with the need for configuration files, as the
code itself specifies what is required, along with the constraints as
to what order things should be done in.

Another thing this example shows is that handlers, when they return
None due to not returning an actual response, can still add to the
response headers in the way of special cookies as required by sessions,
or headers controlling caching etc.
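
A handler of this "side effect only" kind might look roughly like the
following sketch. The real "SetResponseHeader" and "SetPythonOption"
are not shown in this message, so the bodies here are guesses, and
"FakeRequest" with its "headers_out" and "options" dictionaries is a
stand-in so that the example runs without mod_python.

```python
class FakeRequest:
    """Hypothetical stand-in for the mod_python request object."""

    def __init__(self):
        self.headers_out = {}  # response headers (a table in mod_python)
        self.options = {}      # assumed home for PythonOption values


class SetResponseHeader:
    """Add one response header, then decline to produce a response."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __call__(self, req):
        req.headers_out[self.name] = self.value
        return None  # the next handler in the list gets its turn


class SetPythonOption:
    """Record one option on the request, then decline as well."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __call__(self, req):
        req.options[self.name] = self.value
        return None


req = FakeRequest()
steps = [
    SetPythonOption("PythonDebug", "1"),
    SetResponseHeader("Pragma", "no-cache"),
]
results = [step(req) for step in steps]
```

Every entry in "results" is None, yet the request object has been
changed along the way, which is all such a handler is for.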

In terms of late binding of which handler is executed, the
"PythonModule" handler is one example, in that it selects which Python
module to load only when the request is being handled. Another example,
this time of late construction of an instance of a handler, albeit
always of the same type, is:

    import cgi
    from mod_python import apache

    class Handler:

        def __init__(self,req):
            self.__req = req

        def __call__(self,name="value"):
            self.__req.content_type = "text/html"
            self.__req.send_http_header()
            self.__req.write("<html><body>")
            self.__req.write("<p>name=%r</p>"%cgi.escape(name))
            self.__req.write("</body></html>")
            return apache.OK

    handler = IfExtensionEquals("html",HandlerInstance(Handler))

First off, the "HandlerInstance" object is only triggered if the request
against this specific file based resource was by way of a ".html"
extension. When it is triggered, it is only at that point that an
instance of "Handler" is created, with the request object being supplied
to the constructor.
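
The late construction step can be sketched as below. This is not the
actual "HandlerInstance" code, only an assumption about its simplest
possible form: it calls the new instance with its default arguments and
does not try to map form fields onto them. "FakeRequest", "Greeting"
and the OK constant are stand-ins so that the example runs without
mod_python.

```python
OK = 0  # stand-in for mod_python's apache.OK


class HandlerInstance:
    """Hypothetical wrapper: build a fresh handler object per request."""

    def __init__(self, factory):
        self.factory = factory  # a class, not yet instantiated

    def __call__(self, req):
        instance = self.factory(req)  # constructed only at request time
        return instance()             # called with its default arguments


class FakeRequest:
    """Hypothetical stand-in for the mod_python request object."""

    def __init__(self):
        self.output = []


class Greeting:
    created = 0  # counts how many instances have been built

    def __init__(self, req):
        Greeting.created += 1
        self.req = req

    def __call__(self, name="value"):
        self.req.output.append("name=%s" % name)
        return OK


handler = HandlerInstance(Greeting)
```

Nothing is instantiated when "handler" is defined. One "Greeting"
object is built for each request that actually reaches it.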

To round this off, the special "Handlers" handler only contains the
following code. Pretty simple, but makes construction of the component
hierarchy a bit easier in my mind when multiple things need to be done
in turn where nesting isn't strictly required.

    class Handlers:

        def __init__(self,*handlers):
            self.__handlers = handlers

        def __call__(self,req):
            if len(self.__handlers) != 0:
                for handler in self.__handlers:
                    result = _execute(req,handler,lazy=True)
                    if result is not None:
                        return result

Would be very interested to see how people see this relating to what is
possible with WSGI. Could one create a similar sort of class to
"Handlers" in WSGI to sequence through WSGI applications until one
generates a complete response?

What has me thinking the answer is "no" is, first, that I recollect the
PEP saying that the "start_response" object can only be called once,
which precludes applications in a list adding to the response headers
without returning a valid status. Secondly, if the "start_response"
object hasn't been called when the parent starts to try and construct
the response content from the result of calling the application, it
raises an error. But then, I have a distinct lack of proper knowledge
on WSGI so could be wrong.

If my thinking is correct, it could only be done by changing the WSGI
specification to support the concept of trying applications in
sequence, by way of allowing None as the status when "start_response"
is called to indicate the same as when I return None from a handler.
I.e., the application may have set headers, but otherwise the parent
should where possible move to a subsequent application and try it etc.
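
To make the idea concrete, here is a rough sketch of what such a
sequencing component could look like if None were allowed as a status.
This is NOT valid under the current PEP; it only illustrates the
proposed extension. The "TryInTurn" name is made up, and the sketch
assumes each child calls "start_response" before it returns and does
not use the write() callable.

```python
class TryInTurn:
    """Non-standard sketch: try WSGI applications in order.

    A child signals "no response" by calling start_response(None, headers)
    and returning an empty iterable.
    """

    def __init__(self, *apps):
        self.apps = apps

    def __call__(self, environ, start_response):
        carried = []  # headers set by children that declined

        for app in self.apps:
            seen = {}

            def capture(status, headers, exc_info=None):
                seen["status"] = status
                seen["headers"] = list(headers)
                return lambda data: None  # write() is ignored in this sketch

            body = app(environ, capture)
            if seen.get("status") is None:
                # Child declined: keep its headers and try the next one.
                carried.extend(seen.get("headers", []))
                if hasattr(body, "close"):
                    body.close()
                continue
            start_response(seen["status"], carried + seen["headers"])
            return body

        # Nobody answered: pass None up to the parent, as with handlers.
        start_response(None, carried)
        return []


def no_cache(environ, start_response):
    start_response(None, [("Cache-Control", "no-cache")])
    return []


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


application = TryInTurn(no_cache, hello)
```

An outermost wrapper would still have to turn a final None into a real
status before anything reaches a standard WSGI server.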

Anyway, people may feel that this is totally contrary to what WSGI is
all about and not relevant, and that is fine; I am at least finding it
an interesting idea to play with in respect of mod_python.

BTW, WSGI itself could just become a pluggable component within this
mod_python middleware equivalent. :-)

    handler = Handlers(
        IfLocationMatches(r"/_",NotFound()),
        IfLocationMatches(r"\.py(/.*)?$",NotFound()),
        WSGIApplicationModule(),
    )

Feedback most welcome. I have been trying to work out how what I am
doing might be transferred to WSGI for a little while, but if people
think it is a stupid idea then I'll no longer waste my time on thinking
about it and just stick with mod_python.

Graham