From deelan at interplanet.it  Mon Jul  4 11:10:55 2005
From: deelan at interplanet.it (deelan)
Date: Mon, 04 Jul 2005 11:10:55 +0200
Subject: [Web-SIG] CSS selector parsing
In-Reply-To: <5b024817050602104429fd8668@mail.gmail.com>
References: <5b024817050602104429fd8668@mail.gmail.com>
Message-ID: <42C8FD1F.1080707@interplanet.it>

Sanghyeon Seo wrote:
> Hello, I am new here.
> 
> Web SIG charter says: "HTML and XML parsing are pretty solid, but a
> critical lack on the client side is the lack of a CSS parser."
> 
> Is there any progress on a CSS parser? Any prior art?

for the record, i've just noticed this:

"cssutils - CSS Cascading Style Sheets library for Python"
<http://cthedot.de/cssutils/>

From ianb at colorstudy.com  Mon Jul 11 20:57:43 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 11 Jul 2005 13:57:43 -0500
Subject: [Web-SIG] Standardized configuration
Message-ID: <42D2C127.5060706@colorstudy.com>

Lately I've been thinking about the role of Paste and WSGI and whatnot. 
  Much of what makes a Paste component Pastey is configuration; 
otherwise the bits are just independent pieces of middleware, WSGI 
applications, etc.  So, potentially if we can agree on configuration, we 
can start using each other's middleware more usefully.

I think we should avoid questions of configuration file syntax for now. 
  Lets instead simply consider configuration consumers.  A standard 
would consist of:

* A WSGI environment key (e.g., 'webapp01.config')
* A standard for what goes in that key (e.g., a dictionary object)
* A reference implementation of the middleware
* Maybe a non-WSGI-environment way to access the configuration (like 
paste.CONFIG, which is a global object that dispatches to per-request 
configuration objects) -- in practice this is really really useful, as 
you don't have to pass the configuration object around.

There's some other things we have to consider, as configuration syntaxes 
do effect the configuration objects significantly.  So, the standard for 
what goes in the key has to take into consideration some possible 
configuration syntaxes.

The obvious starting place is a dictionary-like object.  I would suggest 
that the keys should be valid Python identifiers.  Not all syntaxes 
require this, but some do.  This restriction simply means that 
configuration consumers should try to consume Python identifiers.

There's also a question about name conflicts (two consumers that are 
looking for the same key), and whether nested configuration should be 
preferred, and in what style.

Note that the standard we decide on here doesn't have to be the only way 
the object can be accessed.  For instance, you could make your 
configuration available through 'myframework.config', and create a 
compliant wrapper that lives in 'webapp01.config', perhaps even doing 
different kinds of mapping to fix convention differences.

There's also a question about what types of objects we can expect in the 
configuration.  Some input styles (e.g., INI and command line) only 
produce strings.  I think consumers should treat strings (or maybe a 
special string subclass) specially, performing conversions as necessary 
(e.g., 'yes'->True).

Thoughts?

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From chrism at plope.com  Sun Jul 17 05:37:35 2005
From: chrism at plope.com (Chris McDonough)
Date: Sat, 16 Jul 2005 23:37:35 -0400
Subject: [Web-SIG]  Standardized configuration
Message-ID: <1121571455.24386.171.camel@plope.dyndns.org>

I've also been putting a bit of thought into middleware configuration,
although maybe in a different direction.  I'm not too concerned yet
about being able to introspect the configuration of an individual
component.  Maybe that's because I haven't thought about the problem
enough to be concerned about it.  In the meantime, though, I *am*
concerned about being able to configure a middleware "pipeline" easily
and have it work.

I've been attempting to divine a declarative way to configure a pipeline
of WSGI middleware components.  This is simple enough through code,
except that at least in terms of how I'm attempting to factor my
middleware, some components in the pipeline may have dependencies on
other pipeline components.

For example, it would be useful in some circumstances to create separate
WSGI components for user identification and user authorization.  The
process of identification -- obtaining user credentials from a request
-- and user authorization  -- ensuring that the user is who he says he
is by comparing the credentials against a data source -- are really
pretty much distinct operations.  There might also be a "challenge"
component which forces a login dialog.

In practice, I don't know if this is a truly useful separation of
concerns that need to be implemented in terms of separate components in
the middleware pipeline (I see that paste.login conflates them), it's
just an example.  But at very least it would keep each component simpler
if the concerns were factored out into separate pieces.

But in the example I present, the "authentication" component depends
entirely on the result of the "identification" component.  It would be
simple enough to glom them together by using a distinct environment key
for the identification component results and have the authentication
component look for that key later in the middleware result chain, but
then it feels like you might as well have written the whole process
within one middleware component because the coupling is pretty strong.

I have a feeling that adapters fit in here somewhere, but I haven't
really puzzled that out yet.  I'm sure this has been discussed somewhere
in the lifetime of WSGI but I can't find much in this list's archives.

> Lately I've been thinking about the role of Paste and WSGI and
> whatnot. Much of what makes a Paste component Pastey is
> configuration;  otherwise the bits are just independent pieces of
> middleware, WSGI applications, etc.  So, potentially if we can agree
> on configuration, we can start using each other's middleware more
> usefully.
>
> I think we should avoid questions of configuration file syntax for
> now.  Lets instead simply consider configuration consumers.  A
> standard would consist of:
>
> * A WSGI environment key (e.g., 'webapp01.config')
> * A standard for what goes in that key (e.g., a dictionary object)
> * A reference implementation of the middleware
> * Maybe a non-WSGI-environment way to access the configuration (like 
> paste.CONFIG, which is a global object that dispatches to per-request 
> configuration objects) -- in practice this is really really useful, as 
> you don't have to pass the configuration object around.
>
> There's some other things we have to consider, as configuration syntaxes 
> do effect the configuration objects significantly.  So, the standard for 
> what goes in the key has to take into consideration some possible 
> configuration syntaxes.
>
> The obvious starting place is a dictionary-like object.  I would suggest 
> that the keys should be valid Python identifiers.  Not all syntaxes 
> require this, but some do.  This restriction simply means that 
> configuration consumers should try to consume Python identifiers.
>
> There's also a question about name conflicts (two consumers that are 
> looking for the same key), and whether nested configuration should be 
> preferred, and in what style.
>
> Note that the standard we decide on here doesn't have to be the only way 
> the object can be accessed.  For instance, you could make your 
> configuration available through 'myframework.config', and create a 
> compliant wrapper that lives in 'webapp01.config', perhaps even doing 
> different kinds of mapping to fix convention differences.
>
> There's also a question about what types of objects we can expect in the 
> configuration.  Some input styles (e.g., INI and command line) only 
> produce strings.  I think consumers should treat strings (or maybe a 
> special string subclass) specially, performing conversions as necessary 
> (e.g., 'yes'->True).
>
> Thoughts?


From exarkun at divmod.com  Sun Jul 17 05:52:45 2005
From: exarkun at divmod.com (Jp Calderone)
Date: Sat, 16 Jul 2005 23:52:45 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1121571455.24386.171.camel@plope.dyndns.org>
Message-ID: <20050717035245.26278.1537979134.divmod.quotient.13326@ohm>

http://twistedmatrix.com/pipermail/twisted-python/2005-July/010902.html might be of interest on this topic.

Jp

From ianb at colorstudy.com  Sun Jul 17 06:29:46 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 16 Jul 2005 23:29:46 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1121571455.24386.171.camel@plope.dyndns.org>
References: <1121571455.24386.171.camel@plope.dyndns.org>
Message-ID: <42D9DEBA.4080609@colorstudy.com>

Chris McDonough wrote:
> I've also been putting a bit of thought into middleware configuration,
> although maybe in a different direction.  I'm not too concerned yet
> about being able to introspect the configuration of an individual
> component.  Maybe that's because I haven't thought about the problem
> enough to be concerned about it.  In the meantime, though, I *am*
> concerned about being able to configure a middleware "pipeline" easily
> and have it work.

There's nothing in WSGI to facilitate introspection.  Sometimes that 
seems annoying, though I suspect lots of headaches are removed because 
of it, and I haven't found it to be a stopper yet.  The issue I'm 
interested in is just how to deliver configuration to middleware.

Because middleware can't be introspected (generally), this makes things 
like configuration schemas very hard to implement.  It all needs to be 
late-bound.

> I've been attempting to divine a declarative way to configure a pipeline
> of WSGI middleware components.  This is simple enough through code,
> except that at least in terms of how I'm attempting to factor my
> middleware, some components in the pipeline may have dependencies on
> other pipeline components.

At least in Paste, you just have to set up the stack properly.  It would 
be cool if middleware could detect the presence of its prerequesites, 
and add the prerequesites if they weren't present; I don't think that's 
terribly complicated, but I haven't actually tried it.  Mostly you'd 
test for a key, and if not present then you'd instantiate the middleware 
and reinvoke.

> For example, it would be useful in some circumstances to create separate
> WSGI components for user identification and user authorization.  The
> process of identification -- obtaining user credentials from a request
> -- and user authorization  -- ensuring that the user is who he says he
> is by comparing the credentials against a data source -- are really
> pretty much distinct operations.  There might also be a "challenge"
> component which forces a login dialog.

I've always thought that a 401 response is a good way of indicating 
that, but not everyone agrees.  (The idea being that the middleware 
catches the 401 and possibly translates it into a redirect or something.)

> In practice, I don't know if this is a truly useful separation of
> concerns that need to be implemented in terms of separate components in
> the middleware pipeline (I see that paste.login conflates them), it's
> just an example.  

Do you mean identification and authentication (you mention authorization 
above)?  I think authorization is different, and is conflated in 
paste.login, but I don't have any many use cases where it's a useful 
distinction.  I guess there's a number of ways of getting a username and 
password; and to some degree the  authenticator object works at that 
level of abstraction.  And there's a couple other ways of authenticating 
a user as well (public keys, IP address, etc).  I've generally used a 
"user manager" object for this kind of abstraction, with subclassing for 
different kinds of generality (e.g., the basic abstract class makes 
username/password logins simple, but a subclass can override that and 
authenticate based on anything in the request).

Maybe there's a better term, the fact these two words start with "auth" 
causes all kinds of confusion.  Conflating identification and 
authentication isn't so bad, but authentication and authorization is 
really bad (but common).

> But at very least it would keep each component simpler
> if the concerns were factored out into separate pieces.
> 
> But in the example I present, the "authentication" component depends
> entirely on the result of the "identification" component.  It would be
> simple enough to glom them together by using a distinct environment key
> for the identification component results and have the authentication
> component look for that key later in the middleware result chain, but
> then it feels like you might as well have written the whole process
> within one middleware component because the coupling is pretty strong.
> 
> I have a feeling that adapters fit in here somewhere, but I haven't
> really puzzled that out yet.  I'm sure this has been discussed somewhere
> in the lifetime of WSGI but I can't find much in this list's archives.

No, I don't think so.  It was something I experimented with in 
paste.login (purely intellectually, I haven't used it in a real app), 
and Aaron Lav did a little work on it as well, but until it gets some 
use it's hard to know how complete it is.

As long as it's properly partitioned, I don't think it's a terribly hard 
problem.  That is, with proper partitioning the pieces can be 
recombined, even if the implementations aren't general enough for all 
cases.  Apache and Zope 2 authentication being examples where the 
partitioning was done improperly.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From pje at telecommunity.com  Sun Jul 17 06:33:57 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 17 Jul 2005 00:33:57 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42D2C127.5060706@colorstudy.com>
Message-ID: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>

At 01:57 PM 7/11/2005 -0500, Ian Bicking wrote:
>Lately I've been thinking about the role of Paste and WSGI and whatnot.
>   Much of what makes a Paste component Pastey is configuration;
>otherwise the bits are just independent pieces of middleware, WSGI
>applications, etc.  So, potentially if we can agree on configuration, we
>can start using each other's middleware more usefully.

I'm going to go ahead and throw my hat in the ring here, even though I've 
been trying to avoid it.

Most of the stuff you are calling middleware really isn't, or at any rate 
it has no reason to be middleware.

What I think you actually need is a way to create WSGI application objects 
with a "context" object.  The "context" object would have a method like 
"get_service(name)", and if it didn't find the service, it would ask its 
parent context, and so on, until there's no parent context to get it 
from.  The web server would provide a way to configure a root or default 
context.

This would allow you to do early binding of services without needing to do 
lookups on every web hit.  E.g.::

     class MyApplication:
         def __init__(self, context):
             self.authenticate = context.get_service('security.authentication')
         def __call__(self, environ, start_response):
             user = self.authenticate(environ)

So, you would simply register an application *factory* with the web server 
instead of an application instance, and it invokes it on the context object 
in order to get the right thing.

Really, the only stuff that actually needs to be middleware, is stuff that 
wraps an *oblivious* application; i.e., the application doesn't know it's 
there.  If it's a service the application uses, then it makes more sense to 
create a service management mechanism for configuration and deployment of 
WSGI applications.

However, I think that the again the key part of configuration that actually 
relates to WSGI here is *deployment* configuration, such as which service 
implementations to use for the various kinds of services.  Configuration 
*of* the services can and should be private to those services, since 
they'll have implementation-specific needs.  (This doesn't mean, however, 
that a "configuration service" couldn't be part of the family of WSGI 
service interfaces.)

I hope this isn't too vague; I've been wanting to say something about this 
since I saw your blog post about doing transaction services in WSGI, as 
that was when I first understood why you were making everything into 
middleware.  (i.e., to create a poor man's substitute for "placeful" 
services and utilities as found in PEAK and Zope 3.)

Anyway, I don't have a problem with trying to create a framework-neutral 
(in theory, anyway) component system, but I think it would be a good idea 
to take lessons from ones that have solved this problem well, and then 
create an extremely scaled-down version, rather than kludging application 
configuration into what's really per-request data.


From chrism at plope.com  Sun Jul 17 07:31:20 2005
From: chrism at plope.com (Chris McDonough)
Date: Sun, 17 Jul 2005 01:31:20 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42D9DEBA.4080609@colorstudy.com>
References: <1121571455.24386.171.camel@plope.dyndns.org>
	<42D9DEBA.4080609@colorstudy.com>
Message-ID: <1121578280.24386.228.camel@plope.dyndns.org>

On Sat, 2005-07-16 at 23:29 -0500, Ian Bicking wrote:
> There's nothing in WSGI to facilitate introspection.  Sometimes that 
> seems annoying, though I suspect lots of headaches are removed because 
> of it, and I haven't found it to be a stopper yet.  The issue I'm 
> interested in is just how to deliver configuration to middleware.

Whew, I hoped you'd respond. ;-)

It appears that I haven't gotten as far as to want introspection into
the implementation or configuration of a middleware component.  Instead,
I want the ability to declaratively construct a pipeline out of largely
opaque and potentially interdependent (but loosely coupled) WSGI
middleware components, which is another problem entirely.  It seemed
cogent, so I just somewhat belligerently coopted this thread, sorry!

> Because middleware can't be introspected (generally), this makes things 
> like configuration schemas very hard to implement.  It all needs to be 
> late-bound.

The pipeline itself isn't really late bound.  For instance, if I was to
create a WSGI middleware pipeline something like this:

   server <--> session <--> identification <--> authentication <--> 
   <--> challenge <--> application

... session, identification, authentication, and challenge are
middleware components (you'll need to imagine their implementations).
And within a module that started a server, you might end up doing
something like:

def configure_pipeline(app):
    return SessionMiddleware(
            IdentificationMiddleware(
              AuthenticationMiddleware(
                ChallengeMiddleware(app)))))

if __name__ == '__main__':
    app = Application()
    pipeline = configure_pipeline(app)
    server = Server(pipeline)
    server.serve()

The pipeline is static.  When a request comes in, the pipeline itself is
already constructed.  I don't really want a way to prevent "improper"
pipeline construction at startup time (right now anyway), because
failures due to missing dependencies will be fairly obvious.

But some elements of the pipeline at this level of factoring do need to
have dependencies on availability and pipeline placement of the other
elements.  In this example, proper operation of the authentication
component depends on the availability and pipeline placement of the
identification component.  Likewise, the identification component may
depend on values that need to be retrieved from the session component.

I've just seen Phillip's post where he implies that this kind of
fine-grained component factoring wasn't really the initial purpose of
WSGI middleware.  That's kind of a bummer. ;-)

Factoring middleware components in this way seems to provide clear
demarcation points for reuse and maintenance.  For example, I imagined a
declarative security module that might be factored as a piece of
middleware here:  http://www.plope.com/Members/chrism/decsec_proposal .

Of course, this sort of thing doesn't *need* to be middleware.  But
making it middleware feels very right to me in terms of being able to
deglom nice features inspired by Zope and other frameworks into pieces
that are easy to recombine as necessary.  Implementations as WSGI
middleware seems a nice way to move these kinds of features out of our
respective applications and into more application-agnostic pieces that
are very loosely coupled, but perhaps I'm taking it too far.

> > For example, it would be useful in some circumstances to create separate
> > WSGI components for user identification and user authorization.  The
> > process of identification -- obtaining user credentials from a request
> > -- and user authorization  -- ensuring that the user is who he says he
> > is by comparing the credentials against a data source -- are really
> > pretty much distinct operations.  There might also be a "challenge"
> > component which forces a login dialog.
> 
> I've always thought that a 401 response is a good way of indicating 
> that, but not everyone agrees.  (The idea being that the middleware 
> catches the 401 and possibly translates it into a redirect or something.)

Yep.  That'd be a fine signaling mechanism.

> > In practice, I don't know if this is a truly useful separation of
> > concerns that need to be implemented in terms of separate components in
> > the middleware pipeline (I see that paste.login conflates them), it's
> > just an example.  
> 
> Do you mean identification and authentication (you mention authorization 
> above)? 

Aggh.  Yes, I meant to write authentication, sorry.

>  I think authorization is different, and is conflated in 
> paste.login, but I don't have any many use cases where it's a useful 
> distinction.  I guess there's a number of ways of getting a username and 
> password; and to some degree the  authenticator object works at that 
> level of abstraction.  And there's a couple other ways of authenticating 
> a user as well (public keys, IP address, etc).  I've generally used a 
> "user manager" object for this kind of abstraction, with subclassing for 
> different kinds of generality (e.g., the basic abstract class makes 
> username/password logins simple, but a subclass can override that and 
> authenticate based on anything in the request).

Sure.  OTOH, Zope 2 has proven that inheritance makes for a pretty awful
general reuse pattern when things become sufficiently complicated.

> As long as it's properly partitioned, I don't think it's a terribly hard 
> problem.  That is, with proper partitioning the pieces can be 
> recombined, even if the implementations aren't general enough for all 
> cases.  Apache and Zope 2 authentication being examples where the 
> partitioning was done improperly.

Yes.  I think it goes further than that.  For example, I'd like to have
be able to swap out implementations of the following kinds of components
at a level somewhere above my application:

Sessioning
Authentication/identification
Authorization (via something like declarative security based on a path)
Virtual hosting awareness
View lookup
View invocation
Transformation during rendering
Caching

Essentially, as Phillip divined, to do so, I've been trying to construct
a framework-neutral component system out of middleware pieces to do so,
but maybe I need to step back from that a bit.  It sure is tempting,
though. ;-)

- C


From ianb at colorstudy.com  Sun Jul 17 10:16:14 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 17 Jul 2005 03:16:14 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1121578280.24386.228.camel@plope.dyndns.org>
References: <1121571455.24386.171.camel@plope.dyndns.org>	
	<42D9DEBA.4080609@colorstudy.com>
	<1121578280.24386.228.camel@plope.dyndns.org>
Message-ID: <42DA13CE.2080208@colorstudy.com>

Chris McDonough wrote:
>>Because middleware can't be introspected (generally), this makes things 
>>like configuration schemas very hard to implement.  It all needs to be 
>>late-bound.
> 
> 
> The pipeline itself isn't really late bound.  For instance, if I was to
> create a WSGI middleware pipeline something like this:
> 
>    server <--> session <--> identification <--> authentication <--> 
>    <--> challenge <--> application
> 
> ... session, identification, authentication, and challenge are
> middleware components (you'll need to imagine their implementations).
> And within a module that started a server, you might end up doing
> something like:
> 
> def configure_pipeline(app):
>     return SessionMiddleware(
>             IdentificationMiddleware(
>               AuthenticationMiddleware(
>                 ChallengeMiddleware(app)))))
> 
> if __name__ == '__main__':
>     app = Application()
>     pipeline = configure_pipeline(app)
>     server = Server(pipeline)
>     server.serve()

This is what Paste does in configuration, like:

middleware.extend([
     SessionMiddleware, IdentificationMiddleware,
     AuthenticationMiddleware, ChallengeMiddleware])

This kind of middleware takes a single argument, which is the 
application it will wrap.  In practice, this means all the other 
parameters go into lazily-read configuration.

You can also define a "framework" (a plugin to Paste), which in addition 
to finding an "app" can also add middleware; basically embodying all the 
middleware that is typical for a framework.

Paste is really a deployment configuration.  Well, that as well as stuff 
to deploy.  And two frameworks.  And whatever else I feel a need or 
desire to throw in there.


Note also that parts of the pipeline are very much late bound.  For 
instance, the way I implemented Webware (and Wareweb) each servlet is a 
WSGI application.  So while there's one URLParser application, the 
application that actually handles the request differs per request.  If 
you start hanging more complete applications (that might have their own 
middleware) at different URLs, then this happens more generally.

There's a newish poorly tested feature where you can do urlmap['/path'] 
= 'config_file.conf' and it'll hang the application described by that 
configuration file at that URL.

> The pipeline is static.  When a request comes in, the pipeline itself is
> already constructed.  I don't really want a way to prevent "improper"
> pipeline construction at startup time (right now anyway), because
> failures due to missing dependencies will be fairly obvious.

I think that's reasonable too; it's what Paste implements now.

> But some elements of the pipeline at this level of factoring do need to
> have dependencies on availability and pipeline placement of the other
> elements.  In this example, proper operation of the authentication
> component depends on the availability and pipeline placement of the
> identification component.  Likewise, the identification component may
> depend on values that need to be retrieved from the session component.

Yes; and potentially you could have several middlewares implementing the 
same functionality for a single request, e.g., if you had different kind 
of authentication for part of your site/application; that might shadow 
authentication further up the stack.

> I've just seen Phillip's post where he implies that this kind of
> fine-grained component factoring wasn't really the initial purpose of
> WSGI middleware.  That's kind of a bummer. ;-)

Well, I don't understand the services he's proposing yet.  I'm quite 
happy with using middleware the way I have been, so I'm not seeing a 
problem with it, and there's lots of benefits.

> Factoring middleware components in this way seems to provide clear
> demarcation points for reuse and maintenance.  For example, I imagined a
> declarative security module that might be factored as a piece of
> middleware here:  http://www.plope.com/Members/chrism/decsec_proposal .

Yes, I read that before; I haven't quite figured out how to digest it, 
though.  This is probably in part because of the resource-based 
orientation of Zope, and WSGI is application-based, where applications 
are rather opaque and defined only in terms of function.

> Of course, this sort of thing doesn't *need* to be middleware.  But
> making it middleware feels very right to me in terms of being able to
> deglom nice features inspired by Zope and other frameworks into pieces
> that are easy to recombine as necessary.  Implementations as WSGI
> middleware seems a nice way to move these kinds of features out of our
> respective applications and into more application-agnostic pieces that
> are very loosely coupled, but perhaps I'm taking it too far.

Certainly these pieces of code can apply to multiple applications and 
disparate systems.  The most obvious instance right now that I think of 
is a WSGI WebDAV server (and someone's working on that for Google Summer 
of Code), which should be implemented pretty framework-free, simply 
because a good WebDAV implementation works at a low level.  But 
obviously you want that to work with the same authentication as other 
parts of the system.

I guess this is how I come back to lazily introducing middleware.  For 
instance, some "application" (which might be a fairly small bit of 
functionality) might require a session.  If there's no session 
available, then it can probably make a reasonable session itself.  But 
it shouldn't shadow any session available to it, if that's already 
available.  This is doubly true for something more authoritative like 
authentication.

>> I think authorization is different, and is conflated in 
>>paste.login, but I don't have any many use cases where it's a useful 
>>distinction.  I guess there's a number of ways of getting a username and 
>>password; and to some degree the  authenticator object works at that 
>>level of abstraction.  And there's a couple other ways of authenticating 
>>a user as well (public keys, IP address, etc).  I've generally used a 
>>"user manager" object for this kind of abstraction, with subclassing for 
>>different kinds of generality (e.g., the basic abstract class makes 
>>username/password logins simple, but a subclass can override that and 
>>authenticate based on anything in the request).
> 
> 
> Sure.  OTOH, Zope 2 has proven that inheritance makes for a pretty awful
> general reuse pattern when things become sufficiently complicated.

True.  But part of that is having a clear internal and external 
interface.  The external interface -- which you can implement without 
using the abstract (convenience) superclass -- should be small and 
explicit.  I've found interfaces a useful way of adding discipline in 
this way, even though I've never really used them at runtime.

But I think it's reasonable to use inheritance for convenience sake, so 
long as you don't implement more than one thing in a class.

>>As long as it's properly partitioned, I don't think it's a terribly hard 
>>problem.  That is, with proper partitioning the pieces can be 
>>recombined, even if the implementations aren't general enough for all 
>>cases.  Apache and Zope 2 authentication being examples where the 
>>partitioning was done improperly.
> 
> 
> Yes.  I think it goes further than that.  For example, I'd like to have
> be able to swap out implementations of the following kinds of components
> at a level somewhere above my application:
> 
> Sessioning

Yes; we need a standard interface for sessions, but that's pretty 
straight-forward.  There's other levels where a useful standard can be 
implemented as well; for instance, flup.middleware.session has 
SessionStore, which is where most of the parts of the session that you'd 
want to reimplement are implemented.

> Authentication/identification

This seems very doable right now, just by using SCRIPT_NAME.  This leads 
to rather dumb users -- just a string -- but it's a good 
lowest-common-denominator starting point.  More interesting interfaces 
-- like lists of roles/groups, or user objects -- can be added on 
incrementally.

> Authorization (via something like declarative security based on a path)

Sure; I can imagine a whole slew of ways to do authorization.  An 
application can do it simply by returning 403 Forbidden.  A front-end 
middleware could do it with simple pattern matching on the URL.  A URL 
parser (aka traversal) can look for security annotations.

> Virtual hosting awareness

I've never had a problem with this, except in Zope...

Anyway, to me this feels like a kind of URL parsing.  One of the 
mini-proposals I made before involved a way of URL parsers to add URL 
variables to the system (basically a standard WSGI key to put URL 
variables as a dictionary).  So a pattern like:

   (?<username>.*)\.myblogspace.com/(?<year>\d\d\d\d)/(?<month>\d\d)/

Would add username, year, and month variables to the system.  But regex 
matching is just one way; the *result* of parsing is usually either in 
the object (e.g., you use domains to get entirely different sites), or 
in terms of these variables.

> View lookup
> View invocation

This I imagine happening either below WSGI entirely, or as part of a URL 
parser.  There's certainly a place for adaptation at different stages. 
For instance, paste.urlparser.URLParser.get_application() clearly is 
ripe for adaptation.  I imagine this wrapping the "resource" with 
something that renders it using a view.  If you make resources and views 
-- lots of (most?) frameworks use controllers and views, and view lookup 
tends to be controller driven.  So it feels very framework-specific to me.

> Transformation during rendering

If you mean what I think -- e.g., rendering XSL -- I think WSGI is ripe 
for this sort of thing.  So far I've just done small things, like HTML 
checking, debugging log messages, etc.  But other things are very possible.

> Caching

Again, I think this is a very natural fit.  Well, at least for 
whole-page caching.  Partial page caching doesn't really fit well at 
all, I'm afraid, though both systems could use the same caching backend.

> Essentially, as Phillip divined, to do so, I've been trying to construct
> a framework-neutral component system out of middleware pieces to do so,
> but maybe I need to step back from that a bit.  It sure is tempting,
> though. ;-)

I've found it satisfyingly easy.  Maybe there's a "better" way... but 
"better" without "easier" doesn't excite me at all.  And we learn best 
by doing... which is my way of saying you should try it with code right 
now ;)

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From ianb at colorstudy.com  Sun Jul 17 10:28:05 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 17 Jul 2005 03:28:05 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>
References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>
Message-ID: <42DA1695.7020304@colorstudy.com>

Phillip J. Eby wrote:
> At 01:57 PM 7/11/2005 -0500, Ian Bicking wrote:
> 
>> Lately I've been thinking about the role of Paste and WSGI and whatnot.
>>   Much of what makes a Paste component Pastey is configuration;
>> otherwise the bits are just independent pieces of middleware, WSGI
>> applications, etc.  So, potentially if we can agree on configuration, we
>> can start using each other's middleware more usefully.
> 
> 
> I'm going to go ahead and throw my hat in the ring here, even though 
> I've been trying to avoid it.
> 
> Most of the stuff you are calling middleware really isn't, or at any 
> rate it has no reason to be middleware.

Well, it is if you implement it that way ;)  I think I'd prefer the term 
"filter" actually; less bad connotations for people.  But that's really 
unrelated to your point.

> What I think you actually need is a way to create WSGI application 
> objects with a "context" object.  The "context" object would have a 
> method like "get_service(name)", and if it didn't find the service, it 
> would ask its parent context, and so on, until there's no parent context 
> to get it from.  The web server would provide a way to configure a root 
> or default context.

I guess I'm treating the request environment as that context.  I don't 
really see the problem with that...?

> This would allow you to do early binding of services without needing to 
> do lookups on every web hit.  E.g.::
> 
>     class MyApplication:
>         def __init__(self, context):
>             self.authenticate = 
> context.get_service('security.authentication')
>         def __call__(self, environ, start_response):
>             user = self.authenticate(environ)
> 
> So, you would simply register an application *factory* with the web 
> server instead of an application instance, and it invokes it on the 
> context object in order to get the right thing.

I don't see the distinction between a factory and an instance.  Or at 
least, it's easy to translate from one to the other.

In many cases, the middleware is modifying or watching the application's 
output.  For instance, catching a 401 and turning that into the 
appropriate login -- which might mean producing a 401, a redirect, a 
login page via internal redirect, or whatever.

I guess you could make one Uber Middleware that could handle the 
services' needs to rewrite output, watch for errors and finalize 
resources, etc.  This isn't unreasonable, and I've kind of expected one 
to evolve at some point.  But you'll have to say more to get me to see 
how "services" is a better way to manage this.

> Really, the only stuff that actually needs to be middleware, is stuff 
> that wraps an *oblivious* application; i.e., the application doesn't 
> know it's there.  If it's a service the application uses, then it makes 
> more sense to create a service management mechanism for configuration 
> and deployment of WSGI applications.

Applications always care about the things around them, so any convention 
that middleware and applications be unaware of each other would rule out 
most middleware.

> However, I think that the again the key part of configuration that 
> actually relates to WSGI here is *deployment* configuration, such as 
> which service implementations to use for the various kinds of services.  
> Configuration *of* the services can and should be private to those 
> services, since they'll have implementation-specific needs.  (This 
> doesn't mean, however, that a "configuration service" couldn't be part 
> of the family of WSGI service interfaces.)
> 
> I hope this isn't too vague; I've been wanting to say something about 
> this since I saw your blog post about doing transaction services in 
> WSGI, as that was when I first understood why you were making everything 
> into middleware.  (i.e., to create a poor man's substitute for 
> "placeful" services and utilities as found in PEAK and Zope 3.)

What do they provide that middleware does not?

> Anyway, I don't have a problem with trying to create a framework-neutral 
> (in theory, anyway) component system, but I think it would be a good 
> idea to take lessons from ones that have solved this problem well, and 
> then create an extremely scaled-down version, rather than kludging 
> application configuration into what's really per-request data.

Per-request or not, from the application's side I don't see the 
difference.  It is convenient to put configuration into the request, 
though paste.CONFIG is also provided as a global variable that 
represents the current request's configuration.

In practice the configuration is usually identical for all requests, but 
I haven't seen any reason to enforce this.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From grahamd at dscpl.com.au  Sun Jul 17 12:04:48 2005
From: grahamd at dscpl.com.au (Graham Dumpleton)
Date: Sun, 17 Jul 2005 20:04:48 +1000
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42DA13CE.2080208@colorstudy.com>
References: <1121571455.24386.171.camel@plope.dyndns.org>	
	<42D9DEBA.4080609@colorstudy.com>
	<1121578280.24386.228.camel@plope.dyndns.org>
	<42DA13CE.2080208@colorstudy.com>
Message-ID: <669891ba7d6b3bb20d95f44ff112074a@dscpl.com.au>


On 17/07/2005, at 6:16 PM, Ian Bicking wrote:
>> The pipeline itself isn't really late bound.  For instance, if I was 
>> to
>> create a WSGI middleware pipeline something like this:
>>
>>    server <--> session <--> identification <--> authentication <-->
>>    <--> challenge <--> application
>>
>> ... session, identification, authentication, and challenge are
>> middleware components (you'll need to imagine their implementations).
>> And within a module that started a server, you might end up doing
>> something like:
>>
>> def configure_pipeline(app):
>>     return SessionMiddleware(
>>             IdentificationMiddleware(
>>               AuthenticationMiddleware(
>>                 ChallengeMiddleware(app)))))
>
> This is what Paste does in configuration, like:
>
> middleware.extend([
>      SessionMiddleware, IdentificationMiddleware,
>      AuthenticationMiddleware, ChallengeMiddleware])
>
> This kind of middleware takes a single argument, which is the
> application it will wrap.  In practice, this means all the other
> parameters go into lazily-read configuration.

Sorry, but you have given me a nice opening here to hijack this 
conversation
a bit and make some comments and pose some questions about WSGI that I 
have
been thinking on for a while.

My understanding from reading the WSGI PEP and examples like that above 
is
that the WSGI middleware stack concept is very much tree like, but 
where at
any specific node within the tree, one can only traverse into one 
child. Ie.,
a parent middleware component could make a decision to defer to one 
child or
another, but there is no means of really trying out multiple choices 
until
you find one that is prepared to handle the request. The only way 
around it
seems to be make the linear chain of nested applications longer and 
longer,
something which to me just doesn't sit right. In some respects the need 
for
the configuration scheme is in part to make that less unwieldy.

To explain what I am going on about, I am going to use examples from 
some
work I have been doing with componentised construction of request 
handler
stacks in mod_python. I will not use the term middleware here, as I 
note that
someone here in this discussion has already made the point of saying 
that
the components being talked about here aren't really middleware and in 
what
I have been doing I have been taking it to an even more fine grained 
level.

I believe I can draw a reasonable analogy to mod_python as at the 
simplest,
a mod_python request handler and a WSGI application are both providing 
the
most basic function of proving the service for responding to a request,
they just do so in different ways.

Normally in mod_python a handler can return an OK response, an error 
response
or a DECLINED response. The DECLINED response is special and indicates 
to
mod_python that any further content handlers defined by mod_python 
should be
skipped and control passed back up to Apache so that it can potentially
serve up a matched static file.

What I am doing is making it acceptable for a handler to also return 
None.
If this were returned by the highest level handler, it would equate to 
being
the same as DECLINED, but within the context of middleware components it
has a lightly relaxed meaning. Specifically, it indicates that that 
handler
isn't returning a response, but not that it is indicating that the 
request
as a whole is being DECLINED causing a return to Apache.

Doing this means that within the context of a tree based middleware 
stack,
at a particular node in the stack one can introduce a list of handlers 
at
a particular node. Each handler in the list will in turn be tried to see
if it wishes to handle the response, returning either an error or valid
response, or None. If it doesn't raise a response, the next handler in 
the
list would be tried until one is found, and if one isn't, then None is 
passed
back to the parent middleware component.

This all means I could write something like:

   handler = Handlers(
     IfLocationMatches(r"/_",NotFound()),
     IfLocationMatches(r"\.py(/.*)?$",NotFound()),
     PythonModule(),
   )

This handler might be associated with any access to a directory as a 
whole.
In iterating over each of the handlers it filters out requests to files
that we don't want to provide access to, with the final handler 
deferring
to a handler within a Python module associated with the actual resource
being requested. Although Apache provides means of filtering out 
requests,
it only works properly for physical files and not virtual resources 
specified
by way of the path info.

For example, a file "page.tmpl" (a Cheetah file) could have a "page.py"
file that defines:

   handler = Handlers(
     IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
     IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),
     IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
   )

Again, more filtering and finally a handler is triggered which knows how
to trigger a precompiled Cheetah template stored as a Python module.

All in all a similar tree like structure to WSGI, except you have the 
ability
to iterate through handlers at one level with them being able to 
explicitly
define that they aren't providing a response and instead allowing the 
next
handler to be tried.

My experience with this so far is that it has allowed more fine grained
components to be created which provide specific filtering without it
all turning into a mess due to having to nest each handler within 
another
in a big pipeline as things seem they must be done in WSGI.

In mod_python one already has access to a table object storing 
configuration
options set within the Apache configuration for mod_python, plus the 
ability
to add Python objects into the mod_python request object itself as 
necessary
In terms of configuration, using this ability of a list of handlers 
where
they don't actually return a response, seems to me to make it easier to
avoid having to have a separate configuration system for most stuff.

For example, I can have a handler "SetPythonOption" which sets an 
option in
the options table object and always returns None, thus passing control 
onto
the next handler. In the highest level handler before point where 
control
is dispatched off to a separate Python module or special purpose 
handler, one
can thus define the configuration as necessary.

   handler = Handlers(
     SetPythonOption("PythonDebug","1"),
     SetPythonOption("ApplicationPath","/application"),
     IfLocationMatches(r"/_",NotFound()),
     IfLocationMatches(r"\.py(/.*)?$",NotFound()),
     PythonModule(),
   )

In other words, the code itself contains the configuration and one 
doesn't
have to worry about where the configuration is found and working out 
what
you may need from it. Of course you could still have a separate 
configuration
object and provide a special purpose handler which merges that into the
environment of the request object in some way.

For this later case, inline with how its request object is used, you 
could
have something like:

   config = getApplicationConfig()

   handler = Handlers(
     SetRequestAttribute("config",config),
     IfLocationMatches(r"/_",NotFound()),
     IfLocationMatches(r"\.py(/.*)?$",NotFound()),
     PythonModule(),
   )

Having done that, any later handler could access "req.config" to get 
access
to the configuration object and use it as necessary. In WSGI such things
would be placed into the "environ" dictionary and propagated to 
subsequent
applications.

One last example, is what a session based login mechanism might look 
like
since this was one of the examples posed in the initial discussion. 
Here you
might have a handler for a whole directory which contains:

_userDatabase = _users.UserDatabase()

handler = Handlers(
     IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
     IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),

     IfLocationIsADirectory(ExternalRedirect('index.html')),

     # Create session and stick it in request object.
     CreateUserSession(),

     # Login form shouldn't require user to be logged in to access it.
     IfLocationMatches(r"^/login\.html(/.*)?$",CheetahModule()),

     # Serve requests against login/logout URLs and otherwise
     # don't let request proceed if user not yet authenticated.
     # Will redirect to login form if not authenticated.
     FormAuthentication(_userDatabase,"login.html"),

     SetResponseHeader('Pragma','no-cache'),
     SetResponseHeader('Cache-Control','no-cache'),
     SetResponseHeader('Expires','-1'),

     IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
)

Again, one has done away with the need for a configuration files as the 
code
itself specifies what is required, along with the constraints as to what
order things should be done in.

Another thing this example shows is that handlers when they return None 
due
to not returning an actual response, can still add to the response 
headers
in the way of special cookies as required by sessions, or headers 
controlling
caching etc.

In terms of late binding of which handler is executed, the 
"PythonModule"
handler is one example in that it selects which Python module to load 
only
when the request is being handled. Another example of late construction 
of
an instance of a handler in what I am doing, albeit the same type, is:

   class Handler:

     def __init__(self,req):
       self.__req = req

     def __call__(self,name="value"):
       self.__req.content_type = "text/html"
       self.__req.send_http_header()
       self.__req.write("<html><body>")
       self.__req.write("<p>name=%r</p>"%cgi.escape(name))
       self.__req.write("</body></html>")
       return apache.OK

   handler = IfExtensionEquals("html",HandlerInstance(Handler))

First off the "HandlerInstance" object is only triggered if the request
against this specific file based resource was by way of a ".html"
extension. When it is triggered, it is only at that point that an 
instance
of "Handler" is created, with the request object being supplied to the
constructor.

To round this off, the special "Handlers" handler only contains the 
following
code. Pretty simple, but makes construction of the component hierarchy 
a bit
easier in my mind when multiple things need to be done in turn where 
nesting
isn't strictly required.

   class Handlers:

     def __init__(self,*handlers):
         self.__handlers = handlers

     def __call__(self,req):
         if len(self.__handlers) != 0:
             for handler in self.__handlers:
                 result = _execute(req,handler,lazy=True)
                 if result is not None:
                     return result

Would be very interested to see how people see this relating to what is 
possible
with WSGI. Could one instigate a similar sort of class to "Handlers" in 
WSGI
to sequence through WSGI applications until one generates a complete 
response?

The areas that have me thinking the answer is "no" is that I recollect 
the PEP
saying that the "start_response" object can only be called once, which 
precludes
applications in a list adding to the response headers without returning 
a valid
status. Secondly, if "start_response" object hasn't been called when 
the parent
starts to try and construct the response content from the result of 
calling the
application, it raises an error. But then, I have a distinct lack of 
proper
knowledge on WSGI so could be wrong.

If my thinking is correct, it could only be done by changing the WSGI 
specification
to support the concept of trying applications in sequence, by way of 
allowing None
as the status when "start_response" is called to indicate the same as 
when I return
None from a handler. Ie., the application may have set headers, but 
otherwise the
parent should where possible move to a subsequence application and try 
it etc.

Anyway, people may feel that this is totally contrary to what WSGI is 
all about and
not relevant and that is fine, I am at least finding it an interesting 
idea to
play with in respect of mod_python at least.

BTW, WSGI itself could just become a plugable component within this 
mod_python
middleware equivalent. :-)

   handler = Handlers(
     IfLocationMatches(r"/_",NotFound()),
     IfLocationMatches(r"\.py(/.*)?$",NotFound()),
     WSGIApplicationModule(),
   )

Feedback most welcome. I have been trying to work out how what I am 
doing may
transfered to WSGI for a little while, but if people think it is a 
stupid idea
then I'll no longer waste my time on thinking about it and just stick 
with
mod_python.

Graham


From chrism at plope.com  Sun Jul 17 13:29:56 2005
From: chrism at plope.com (Chris McDonough)
Date: Sun, 17 Jul 2005 07:29:56 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42DA13CE.2080208@colorstudy.com>
References: <1121571455.24386.171.camel@plope.dyndns.org>
	<42D9DEBA.4080609@colorstudy.com>
	<1121578280.24386.228.camel@plope.dyndns.org>
	<42DA13CE.2080208@colorstudy.com>
Message-ID: <1121599799.24386.347.camel@plope.dyndns.org>

On Sun, 2005-07-17 at 03:16 -0500, Ian Bicking wrote:
> This is what Paste does in configuration, like:
> 
> middleware.extend([
>      SessionMiddleware, IdentificationMiddleware,
>      AuthenticationMiddleware, ChallengeMiddleware])
> 
> This kind of middleware takes a single argument, which is the 
> application it will wrap.  In practice, this means all the other 
> parameters go into lazily-read configuration.

I'm finding it hard to imagine a reason to have another kind of
middleware.

Well, actually that's not true.  In noodling about this, I did think it
would be kind of neat in a twisted way to have "decision middleware"
like:

class DecisionMiddleware:
     def __init__(self, apps):
         self.apps = apps

     def __call__(self, environ, start_response):
        app = self.choose(environ)
        for chunk in app(environ, start_response):
            yield chunk

     def choose(self, environ):
         app = some_decision_function(self.apps, environ)

I can imagine using this pattern as a decision point for a WSGI pipeline
serving multiple application end-points (perhaps based on URL matching
of the PATH_INFO in environ).

But by and large, most middleware components seem to be just wrappers
for the next application in the chain.  There seem to be two types of
middleware that takes a single application object as a parameter to its
constructor.  There is "decorator" middleware where you want to add
something to the environment for an application to find later and
"action" middleware that does some rewriting of the body or the response
headers before the response is sent back to the client.  Some of this
kind of middleware does both.

> You can also define a "framework" (a plugin to Paste), which in addition 
> to finding an "app" can also add middleware; basically embodying all the 
> middleware that is typical for a framework.

This appears to be what I'm trying to do too, which is why I'm intrigued
by Paste.

OTOH, I'm not sure that I want my framework to "find" an app for me.
I'd like to be able to define pipelines that include my app, but I'd
typically just want to statically declare it as the end point of a
pipeline composed of service middleware.  I should look at Paste a
little more to see if it has the same philosophy or if I'm
misunderstanding you.

> Paste is really a deployment configuration.  Well, that as well as stuff 
> to deploy.  And two frameworks.  And whatever else I feel a need or 
> desire to throw in there.

Yeah.  FWIW, as someone who has recently taken a brief look at Paste, I
think it would be helpful (at least for newbies) to partition out the
bits of Paste which are meant to be deployment configuration from the
bits that are meant to be deployed.  Zope 2 fell into the same trap
early on, and never recovered.  For example, ZPublisher (nee Bobo) was
always meant to be able to be useful outside of Zope, but in practice it
never happened because nobody could figure out how to disentangle it
from its ever-increasing dependencies on other software only found in a
Zope checkout.  In the end, nobody even remembered what its dependencies
were *supposed* to be.  If you ask ten people, you'd get ten different
answers.

I also think that the rigor of separating out different components helps
to make the software stronger and more easily understood in bite-sized
pieces.  Unfortunately, separating them makes configuration tough, but I
think that's what we're trying to find an answer about how to do "the
right way" here.

> Note also that parts of the pipeline are very much late bound.  For 
> instance, the way I implemented Webware (and Wareweb) each servlet is a 
> WSGI application.  So while there's one URLParser application, the 
> application that actually handles the request differs per request.  If 
> you start hanging more complete applications (that might have their own 
> middleware) at different URLs, then this happens more generally.

Well, if you put the "decider" in middleware itself, all of the
middleware components in each pipeline could still be at least
constructed early.  I'm pretty sure this doesn't really strictly qualify
as "early binding" but it's not terribly dynamic either.  It also makes
configuration pretty straightforward.  At least I can imagine a
declarative syntax for configuring pipelines this way.

I'm pretty sure you're not advocating it, but in case you are, I'm not
sure it adds as much value as it removes to be able to have a "dynamic"
middleware chain whereby new middleware elements can be added "on the
fly" to a pipeline after a request has begun.  That is *very* "late
binding" to me and it's impossible to configure declaratively.

> > But some elements of the pipeline at this level of factoring do need to
> > have dependencies on availability and pipeline placement of the other
> > elements.  In this example, proper operation of the authentication
> > component depends on the availability and pipeline placement of the
> > identification component.  Likewise, the identification component may
> > depend on values that need to be retrieved from the session component.
> 
> Yes; and potentially you could have several middlewares implementing the 
> same functionality for a single request, e.g., if you had different kind 
> of authentication for part of your site/application; that might shadow 
> authentication further up the stack.

That's true.  In the Zope world, we'd call that a "placeful service".
I'd be tempted to model this with "decision middleware".

> > I've just seen Phillip's post where he implies that this kind of
> > fine-grained component factoring wasn't really the initial purpose of
> > WSGI middleware.  That's kind of a bummer. ;-)
> 
> Well, I don't understand the services he's proposing yet.  I'm quite 
> happy with using middleware the way I have been, so I'm not seeing a 
> problem with it, and there's lots of benefits.

I agree!  I'm a bit confused because one of the canonical examples of
how WSGI middleware is useful seems to be the example of implementing a
framework-agnostic sessioning service.  And for that sessioning service
to be useful, your application has to be able to depend on its
availability so it can't be "oblivious".

OTOH, the primary benefit -- to me, at least -- of modeling services as
WSGI middleware is the fact that someone else might be able to use my
service outside the scope of my projects (and thus help maintain it and
find bugs, etc).  So if I've got the wrong concept of what kinds of
middleware that I can expect "normal" people to use, I don't want to go
very far down that road without listening carefully to Phillip.  Perhaps
I'll have a shot at influencing the direction of WSGI to make it more
appropriate for this sort of thing or maybe we'll come up with a better
way of doing it.

Zope 3 is a component system much like what I'm after, and I may just
end up using it wholesale.  But my immediate problem with Zope 3 is that
like Zope 2, it's a collection of libraries that have dependencies on
other libraries that are only included within its own checkout and don't
yet have much of a life of their own.  It's not really a technical
problem, it's a social one... I'd rather have a somewhat messy framework
with a lot of diversity composed of wildly differing component
implementations that have a life of their own than to be be trapped in a
clean, pure world where all the components are used only within that
world.

I suspect there's a middle ground here somewhere.

> > Factoring middleware components in this way seems to provide clear
> > demarcation points for reuse and maintenance.  For example, I imagined a
> > declarative security module that might be factored as a piece of
> > middleware here:  http://www.plope.com/Members/chrism/decsec_proposal .
> 
> Yes, I read that before; I haven't quite figured out how to digest it, 
> though.  This is probably in part because of the resource-based 
> orientation of Zope, and WSGI is application-based, where applications 
> are rather opaque and defined only in terms of function.

Yes, it is a bit Zopeish because it assumes content lives at a path.
This isn't always the case, I know, but it often is.  Well, it's a bit
of a stretch, but an alternate decsec implementation might use a
"content identifier" to determine the protection of a resource instead
of a full path.

For example, if you're implementing an application that is very simple
and takes one and only one URL, but calls it with a different query
string variable to display different pieces of content (e.g.
'/blog?entry_num=1234'), you might have one ACL as the "root" ACL but
optionally protect each piece of content with a separate ACL if one can
be found.  Maybe the content-specific ACL would be 'entry_num=1234'
instead of a path.  A function that accepts a form post for displaying
or changing the blog entry for 1234 might look like this:

def blog(environ, start_response):
    acl = environ['acl'] # added by decsec middleware
    userid = environ['userid'] # added by an authentication middleware
    formvars = get_form_vars_from(environ)
    if formvars['action'] == "view":
        permission = 'view'
    elif formvars['action'] == "change":
        permission = 'edit'
    content = get_blog_entry(environ)
    # pulls out the entry for 1234
    if not acl.check(userid, permission):
       start_response('401 Unauthorized', [])
       return ['<html>Unauthorized</html>']
   [ ... further code to change or display the blog entry ... ]

The ACL could be the "root" ACL (say, all users can view, members of the
group "manager" could change, everything else is denied).  The "root"
ACL would be used if content did not have its own ACL.  But associating
an ACL with a content identifier would allow the developer or site
manager to protect individual blog entries (e.g. 1234, 5678, etc) with
different ACLs.  "Joe can view this one but he can't change it", "Jim
can view all of them and can change all of them", etc.. the sorts of
things useful for "staging" and workflow delegation without unduly
mucking up the actual application code.

Decsec would also take into account the user's group memberships and so
forth during the "check" step, so you wouldn't have to write any of this
code either.  The "blog" example is stupid, of course, the concept is
more useful for higher-security apps.

Sorry, all of this is somewhat besides the point of this thread, but it
does provide an example of kind of functionality I'd like to be able to
put into middleware.

> > Of course, this sort of thing doesn't *need* to be middleware.  But
> > making it middleware feels very right to me in terms of being able to
> > deglom nice features inspired by Zope and other frameworks into pieces
> > that are easy to recombine as necessary.  Implementations as WSGI
> > middleware seems a nice way to move these kinds of features out of our
> > respective applications and into more application-agnostic pieces that
> > are very loosely coupled, but perhaps I'm taking it too far.
> 
> Certainly these pieces of code can apply to multiple applications and 
> disparate systems.  The most obvious instance right now that I think of 
> is a WSGI WebDAV server (and someone's working on that for Google Summer 
> of Code), which should be implemented pretty framework-free, simply 
> because a good WebDAV implementation works at a low level.  But 
> obviously you want that to work with the same authentication as other 
> parts of the system.

Yes.  In particular, if you knew you were working with an application
that could resolve a path in terms of containers and contained pieces of
content (just like a filesystem does), it would be pretty easy to code
up a DAV "action middleware" component that rendered containerish things
as DAV "collections" and contentish things as DAV "resources", and which
could handle DAV locking and property rendering and so forth.

This kind of middleware might be tough, though, because it probably
requires explicit cooperation from the end-point application (it expects
to be talking to an actual filesystem, but that won't always be the case
at least without some sort of adaptation).

But in any case, it's a good example of how we could prevent people from
needing to reinvent the wheel... this guy appears to be coming up with
his own identification, authentication, authorization, and challenge
libraries entirely http://cwho.blogspot.com/ which just feels very
wasteful.

> I guess this is how I come back to lazily introducing middleware.  For 
> instance, some "application" (which might be a fairly small bit of 
> functionality) might require a session.  If there's no session 
> available, then it can probably make a reasonable session itself.  But 
> it shouldn't shadow any session available to it, if that's already 
> available.  This is doubly true for something more authoritative like 
> authentication.

I'm not sure I know enough to be able to agree or disagree.  But this
seems definitely more in the realm of "late binding", which I'm a little
concerned about from a config perspective.

> > Sure.  OTOH, Zope 2 has proven that inheritance makes for a pretty awful
> > general reuse pattern when things become sufficiently complicated.
> 
> True.  But part of that is having a clear internal and external 
> interface.  The external interface -- which you can implement without 
> using the abstract (convenience) superclass -- should be small and 
> explicit.  I've found interfaces a useful way of adding discipline in 
> this way, even though I've never really used them at runtime.
> 
> But I think it's reasonable to use inheritance for convenience sake, so 
> long as you don't implement more than one thing in a class.

I agree completely.

> > Yes.  I think it goes further than that.  For example, I'd like to have
> > be able to swap out implementations of the following kinds of components
> > at a level somewhere above my application:
> > 
> > Sessioning
> 
> Yes; we need a standard interface for sessions, but that's pretty 
> straight-forward.  There's other levels where a useful standard can be 
> implemented as well; for instance, flup.middleware.session has 
> SessionStore, which is where most of the parts of the session that you'd 
> want to reimplement are implemented.

Yes.  Furthermore, if sessioning is a middleware component, anything can
be a middleware component as far as I can tell. ;-)

> > Authentication/identification
> 
> This seems very doable right now, just by using SCRIPT_NAME.  This leads 
> to rather dumb users -- just a string -- but it's a good 
> lowest-common-denominator starting point.  More interesting interfaces 
> -- like lists of roles/groups, or user objects -- can be added on 
> incrementally.

Sure.

> > Authorization (via something like declarative security based on a path)
> 
> Sure; I can imagine a whole slew of ways to do authorization.  An 
> application can do it simply by returning 403 Forbidden.
>   A front-end 
> middleware could do it with simple pattern matching on the URL.  A URL 
> parser (aka traversal) can look for security annotations.

Yes.  In the simplest case, security annotations for resources could be
kept statically in a Python module.  In more complicated cases, the
application itself would need to collaborate with "upstream" middleware
to do authorization.

> > Virtual hosting awareness
> 
> I've never had a problem with this, except in Zope...
> 
> Anyway, to me this feels like a kind of URL parsing.  One of the 
> mini-proposals I made before involved a way of URL parsers to add URL 
> variables to the system (basically a standard WSGI key to put URL 
> variables as a dictionary).  So a pattern like:
> 
>    (?<username>.*)\.myblogspace.com/(?<year>\d\d\d\d)/(?<month>\d\d)/
> 
> Would add username, year, and month variables to the system.  But regex 
> matching is just one way; the *result* of parsing is usually either in 
> the object (e.g., you use domains to get entirely different sites), or 
> in terms of these variables.

Yes, this seems to be more of a problem for Zope because it's a) a
long-running app with its own webserver b) has convenience functions for
generating URLs based on its internal containment graph and c) doesn't
deal well with relative URLs.  So if you want an application that lives
in a "subfolder" of your Zope object graph to behave as if it lives at
"http://example.com" instead of "http://example.com/subfolder", you need
to give it clues.

> > View lookup
> > View invocation
> 
> This I imagine happening either below WSGI entirely, or as part of a URL 
> parser.  There's certainly a place for adaptation at different stages. 
> For instance, paste.urlparser.URLParser.get_application() clearly is 
> ripe for adaptation.  I imagine this wrapping the "resource" with 
> something that renders it using a view.  If you make resources and views 
> -- lots of (most?) frameworks use controllers and views, and view lookup 
> tends to be controller driven.  So it feels very framework-specific to me.

Yep, I suspect the same.  I think these things will end up in the
end-point application but it's kinda fun to try to think about
abstracting them.

> > Transformation during rendering
> 
> If you mean what I think -- e.g., rendering XSL -- I think WSGI is ripe 
> for this sort of thing.

Yes, that's what I meant.

>   So far I've just done small things, like HTML 
> checking, debugging log messages, etc.  But other things are very possible.
> 
> > Caching
> 
> Again, I think this is a very natural fit.  Well, at least for 
> whole-page caching.  Partial page caching doesn't really fit well at 
> all, I'm afraid, though both systems could use the same caching backend.
> 
> > Essentially, as Phillip divined, to do so, I've been trying to construct
> > a framework-neutral component system out of middleware pieces to do so,
> > but maybe I need to step back from that a bit.  It sure is tempting,
> > though. ;-)
> 
> I've found it satisfyingly easy.  Maybe there's a "better" way... but 
> "better" without "easier" doesn't excite me at all.  And we learn best 
> by doing... which is my way of saying you should try it with code right 
> now ;)

Yes, I should stop blathering and get to work.  I gotta admit that I'm
pretty excited about the possibilities.  It's just reassuring to know
that I'm not entirely insane, or at least that other people are just as
insane as I am. ;-)

- C


From pje at telecommunity.com  Sun Jul 17 19:56:35 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 17 Jul 2005 13:56:35 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1121599799.24386.347.camel@plope.dyndns.org>
References: <42DA13CE.2080208@colorstudy.com>
	<1121571455.24386.171.camel@plope.dyndns.org>
	<42D9DEBA.4080609@colorstudy.com>
	<1121578280.24386.228.camel@plope.dyndns.org>
	<42DA13CE.2080208@colorstudy.com>
Message-ID: <5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com>

At 07:29 AM 7/17/2005 -0400, Chris McDonough wrote:
>I'm a bit confused because one of the canonical examples of
>how WSGI middleware is useful seems to be the example of implementing a
>framework-agnostic sessioning service.  And for that sessioning service
>to be useful, your application has to be able to depend on its
>availability so it can't be "oblivious".

Exactly.  As soon as you start trying to have configured services, you are 
creating Yet Another Framework.  Which isn't a bad thing per se, except 
that it falls outside the scope of  PEP 333.  It deserves a separate PEP, I 
think, and a separate implementation mechanism than being crammed into the 
request environment.  These things should be allowed to be static, so that 
an application can do some reasonable setup, and so that you don't have 
per-request overhead to shove ninety services into the environment.

Also, because we are dealing not with basic plumbing but with making a nice 
kitchen, it seems to me we can afford to make the fixtures nice.  That is, 
for an add-on specification to WSGI we don't need to adhere to the "let it 
be ugly for apps if it makes the server easier" principle that guided PEP 
333.  The assumption there was that people would mostly port existing 
wrappers over HTTP/CGI to be wrappers over WSGI.  But for services, we are 
talking about an actual framework to be used by application developers 
directly, so more user-friendliness is definitely in order.

For WSGI itself, the server-side implementation has to be very server 
specific.  But the bulk of a service stack could be implemented once (e.g. 
as part of wsgiref), and then just used by servers.  So, we don't have to 
worry as much about making it easy for server people to implement, except 
for any server-specific choices about how configuration might be 
stacked.  (For example, in a filesystem-oriented server like Apache, you 
might want subdirectories to inherit services defined in parent directories.)


>OTOH, the primary benefit -- to me, at least -- of modeling services as
>WSGI middleware is the fact that someone else might be able to use my
>service outside the scope of my projects (and thus help maintain it and
>find bugs, etc).  So if I've got the wrong concept of what kinds of
>middleware that I can expect "normal" people to use, I don't want to go
>very far down that road without listening carefully to Phillip.  Perhaps
>I'll have a shot at influencing the direction of WSGI to make it more
>appropriate for this sort of thing or maybe we'll come up with a better
>way of doing it.
>
>Zope 3 is a component system much like what I'm after, and I may just
>end up using it wholesale.  But my immediate problem with Zope 3 is that
>like Zope 2, it's a collection of libraries that have dependencies on
>other libraries that are only included within its own checkout and don't
>yet have much of a life of their own.  It's not really a technical
>problem, it's a social one... I'd rather have a somewhat messy framework
>with a lot of diversity composed of wildly differing component
>implementations that have a life of their own than to be be trapped in a
>clean, pure world where all the components are used only within that
>world.
>
>I suspect there's a middle ground here somewhere.

Right; I'm suggesting that we grow a "WSGI Deployment" or "WSGI Stack" 
specification that includes a simple way to obtain services (using the Zope 
3 definition of "service" as simply a named component).  This would form 
the basis for various "WSGI Service" specifications.  And, for existing 
frameworks there's at least some potential possibility of integrating with 
this stack, since PEAK and Zope 3 both already have ways to define and 
acquire named services, so it might be possible to define the spec in such 
a way that their implementations could be reused by wrapping them in a thin 
"WSGI Stack" adapter.  Similarly, if there are any other frameworks out 
there that offer similar functionality, then they ought to be able to play 
too, at least in principle.


From pje at telecommunity.com  Sun Jul 17 20:23:46 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 17 Jul 2005 14:23:46 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42DA1695.7020304@colorstudy.com>
References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>
	<5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com>

At 03:28 AM 7/17/2005 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>What I think you actually need is a way to create WSGI application 
>>objects with a "context" object.  The "context" object would have a 
>>method like "get_service(name)", and if it didn't find the service, it 
>>would ask its parent context, and so on, until there's no parent context 
>>to get it from.  The web server would provide a way to configure a root 
>>or default context.
>
>I guess I'm treating the request environment as that context.  I don't 
>really see the problem with that...?

It puts a layer in the request call stack for each service you want to 
offer, versus *no* layers for an arbitrary number of services.  It adds 
work to every request to put stuff into the environment, then take it out 
again, versus just getting what you want in the first place.


>In many cases, the middleware is modifying or watching the application's 
>output.  For instance, catching a 401 and turning that into the 
>appropriate login -- which might mean producing a 401, a redirect, a login 
>page via internal redirect, or whatever.

And that would be legitimate middleware, except I don't think that's what 
you really want for that use case.  What you want is an "authentication 
service" that you just call to say, "I need a login" and get the login 
information from, and return its return value so that it does 
start_response for you and sends the right output.

The difference is obliviousness; if you want to *wrap* an application not 
written to use WSGI services, then it makes sense to make it 
middleware.  If you're writing a new application, just have it use 
components instead of mocking up a 401 just so you can use the existing 
middleware.

Notice, by the way, that it's trivial to create middleware that detects the 
401 and then *invokes the service*.  So, it's more reusable to make 
services be services, and middleware be wrappers to apply services to 
oblivious applications.


>I guess you could make one Uber Middleware that could handle the services' 
>needs to rewrite output, watch for errors and finalize resources, etc.

Um, it's called a library of functions.  :)  WSGI was designed to make it 
easy to use library calls to do stuff.  If you don't need the 
obliviousness, then library calls (or service calls) are the Obvious Way To 
Do It.


>   This isn't unreasonable, and I've kind of expected one to evolve at 
> some point.  But you'll have to say more to get me to see how "services" 
> is a better way to manage this.

I'm saying that middleware can use services, and applications can use 
services.  Making applications *have to* use middleware in order to use the 
services is wasteful of both computer time and developer brainpower.  Just 
let them use services directly when the situation calls for it, and you can 
always write middleware to use the services when you encounter the 
occasional (and ever-rarer with time) oblivious application.


>>Really, the only stuff that actually needs to be middleware, is stuff 
>>that wraps an *oblivious* application; i.e., the application doesn't know 
>>it's there.  If it's a service the application uses, then it makes more 
>>sense to create a service management mechanism for configuration and 
>>deployment of WSGI applications.
>
>Applications always care about the things around them, so any convention 
>that middleware and applications be unaware of each other would rule out 
>most middleware.

Yes, exactly!  Now you understand me.  :)  If the application is what wants 
the service, let it just call the service.  Middleware is *overhead* in 
that case.


>>I hope this isn't too vague; I've been wanting to say something about 
>>this since I saw your blog post about doing transaction services in WSGI, 
>>as that was when I first understood why you were making everything into 
>>middleware.  (i.e., to create a poor man's substitute for "placeful" 
>>services and utilities as found in PEAK and Zope 3.)
>
>What do they provide that middleware does not?

Well, some services may be things the application needs only when it's 
being initially configured.  Or maybe the service is something like a 
scheduler that gives timed callbacks.  There are lots of non-per-request 
services that make sense, so forcing service access to be only through the 
environment makes for cruftier code, since you now have to keep track of 
whether you've been called before, and then do any setup during your first 
web hit.  For that matter, some service configuration might need to be 
dynamically determined, based on the application object requesting it.

But the main thing they provide that middleware does not is simplicity and 
ease of use.  I understand your desire to preserve the appearance of 
neutrality, but you are creating new web frameworks here, and making them 
ugly doesn't make them any less of a framework.  :)

What's worse is that by tying the service access mechanism to the request 
environment, you're effectively locking out frameworks like PEAK and Zope 3 
from being able to play, and that goes against (IMO) the goals of WSGI, 
which is to get more and more frameworks to be able to play, and give them 
*incentive* to merge and dissolve and be assimilated into the primordial 
soup of WSGI-based integration, or at least to be competitors for various 
implementation/use case niches in the WSGI ecosystem.

See also my message to Chris just now about why a WSGI service spec can and 
should follow different rules of engagement than the WSGI spec did; it 
really isn't necessary to make services ugly for applications in order to 
make it easy for server implementors, as it was for the WSGI core spec.  In 
fact, the opposite condition applies: the service stack should make it easy 
and clean for applications to use WSGI services, because they're the things 
that will let them hide WSGI implementation details in the absence of an 
existing web framework.


From chrism at plope.com  Mon Jul 18 06:57:26 2005
From: chrism at plope.com (Chris McDonough)
Date: Mon, 18 Jul 2005 00:57:26 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com>
References: <42DA13CE.2080208@colorstudy.com>
	<1121571455.24386.171.camel@plope.dyndns.org>
	<42D9DEBA.4080609@colorstudy.com>
	<1121578280.24386.228.camel@plope.dyndns.org>
	<42DA13CE.2080208@colorstudy.com>
	<5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com>
Message-ID: <1121662646.24386.462.camel@plope.dyndns.org>

I tried to think of this today in terms of creating a "deployment spec"
but boy, it gets complicated if you want a lot of useful features out of
it.  I have about four or five pages of a straw man "deployment
configuration" proposal, but it makes way too many assumptions.

So I tried to boil the problem down into its parts.  There seem to be
three distinct categories of configuration:

- Server/gateway/application instance configuration.  This is the
  kind of configuration that may be exposed to deployers by
  application authors.  Creating an instance configuration results
  in an instance of an application or gateway or maybe even
  a server.

- "Wiring" configuration which allows you to string together a
  "stack" out of instances.   I like calling it a "pipeline" better,
  but when in Rome... This is the kind of configuration that
  would be useful if you already have a bunch of instance configurations
  from the step above laying around and you want to create a stack
  out of them for deployment purposes.

- "Service" configuration which allows you create bits of 
  context that can be used by applications in the stack, but which
  aren't inserted into the stack itself.

I suspect we should stick to the first category of configuration first,
but I'll note that the desire for the other two categories might impose
some design constraints on the first.  The last kind of configuration
definitely ventures far out into framework land and though it'd be
terribly useful and seems to be where a lot of people think the value of
WSGI is, it might be something other than WSGI entirely.

So, anyway, towards the first category, I'll throw something out to the
wolves.  Note that below when I say "component" I mean a WSGI server,
gateway, or application:

  Each Python package which includes one or more WSGI components may
  optionally include descriptions of these components'
  "meta-configuration".  This meta-configuration would take the form
  of one or more "schemas".  Each schema would enumerate the
  configurable elements of a single WSGI component implementation.
  A schema for a component defines *the minimal number* of typed,
  component-specific keys and values that may be used to create
  instances of this component.

  >>> # load the schemas
  >>> server_schema  = loadSchema('components/server/server.schema')
  >>> gateway_schema = loadSchema('components/gateway/gateway.schema')
  >>> app_schema     = loadSchema('components/app/app.schema')

  >>> # create the instances; any one of these steps would fail
  >>> # if the config file violated its schema.
  >>> server_factory  = loadConfig('instances/server/server.conf',
                                  schema = server_schema)
  >>> gateway_factory = loadConfig('instances/gateway/gateway.conf',
                                  schema = gateway_schema)
  >>> app_factory     = loadConfig('instances/app/app.conf',
                                    schema = app_schema)

  >>> # create instances from the factories
  >>> server = server_factory.create()
  >>> gateway = gateway_factory.create()
  >>> app = app_factory.create()

  # configure the instances into a pipeline
  >>> pipeline = server(gateway(app))

  # serve up the pipeline (notionally)
  >>> server.serve()

Of course this is just a more declarative way to do what is already
possible in code except for the schema-checking part, which presumably
would supply the deployer with clues if he had screwed up a config file.

I purposely didn't attempt to describe the syntax of the configuration
or schema files, but I suspect it would be best to make them both
ConfigParser files.  FWIW, ZConfig already does this exact thing, and
it's already written, but introducing dependencies on non-stdlib things
seems problematic.

Is this more or less what people have in mind for deployment
configuration or am I out in left field?

On Sun, 2005-07-17 at 13:56 -0400, Phillip J. Eby wrote:
> At 07:29 AM 7/17/2005 -0400, Chris McDonough wrote:
> >I'm a bit confused because one of the canonical examples of
> >how WSGI middleware is useful seems to be the example of implementing a
> >framework-agnostic sessioning service.  And for that sessioning service
> >to be useful, your application has to be able to depend on its
> >availability so it can't be "oblivious".
> 
> Exactly.  As soon as you start trying to have configured services, you are 
> creating Yet Another Framework.  Which isn't a bad thing per se, except 
> that it falls outside the scope of  PEP 333.  It deserves a separate PEP, I 
> think, and a separate implementation mechanism than being crammed into the 
> request environment.  These things should be allowed to be static, so that 
> an application can do some reasonable setup, and so that you don't have 
> per-request overhead to shove ninety services into the environment.
> 
> Also, because we are dealing not with basic plumbing but with making a nice 
> kitchen, it seems to me we can afford to make the fixtures nice.  That is, 
> for an add-on specification to WSGI we don't need to adhere to the "let it 
> be ugly for apps if it makes the server easier" principle that guided PEP 
> 333.  The assumption there was that people would mostly port existing 
> wrappers over HTTP/CGI to be wrappers over WSGI.  But for services, we are 
> talking about an actual framework to be used by application developers 
> directly, so more user-friendliness is definitely in order.
> 
> For WSGI itself, the server-side implementation has to be very server 
> specific.  But the bulk of a service stack could be implemented once (e.g. 
> as part of wsgiref), and then just used by servers.  So, we don't have to 
> worry as much about making it easy for server people to implement, except 
> for any server-specific choices about how configuration might be 
> stacked.  (For example, in a filesystem-oriented server like Apache, you 
> might want subdirectories to inherit services defined in parent directories.)
> 
> 
> >OTOH, the primary benefit -- to me, at least -- of modeling services as
> >WSGI middleware is the fact that someone else might be able to use my
> >service outside the scope of my projects (and thus help maintain it and
> >find bugs, etc).  So if I've got the wrong concept of what kinds of
> >middleware that I can expect "normal" people to use, I don't want to go
> >very far down that road without listening carefully to Phillip.  Perhaps
> >I'll have a shot at influencing the direction of WSGI to make it more
> >appropriate for this sort of thing or maybe we'll come up with a better
> >way of doing it.
> >
> >Zope 3 is a component system much like what I'm after, and I may just
> >end up using it wholesale.  But my immediate problem with Zope 3 is that
> >like Zope 2, it's a collection of libraries that have dependencies on
> >other libraries that are only included within its own checkout and don't
> >yet have much of a life of their own.  It's not really a technical
> >problem, it's a social one... I'd rather have a somewhat messy framework
> >with a lot of diversity composed of wildly differing component
> >implementations that have a life of their own than to be be trapped in a
> >clean, pure world where all the components are used only within that
> >world.
> >
> >I suspect there's a middle ground here somewhere.
> 
> Right; I'm suggesting that we grow a "WSGI Deployment" or "WSGI Stack" 
> specification that includes a simple way to obtain services (using the Zope 
> 3 definition of "service" as simply a named component).  This would form 
> the basis for various "WSGI Service" specifications.  And, for existing 
> frameworks there's at least some potential possibility of integrating with 
> this stack, since PEAK and Zope 3 both already have ways to define and 
> acquire named services, so it might be possible to define the spec in such 
> a way that their implementations could be reused by wrapping them in a thin 
> "WSGI Stack" adapter.  Similarly, if there are any other frameworks out 
> there that offer similar functionality, then they ought to be able to play 
> too, at least in principle.
> 


From mso at oz.net  Mon Jul 18 23:11:51 2005
From: mso at oz.net (mso@oz.net)
Date: Mon, 18 Jul 2005 14:11:51 -0700 (PDT)
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com>
References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>
	<5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>
	<5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com>
Message-ID: <32994.161.55.66.121.1121721111.squirrel@www.oz.net>

A couple things I don't understand in this discussion.

Phillip J. Eby said:
> At 03:28 AM 7/17/2005 -0500, Ian Bicking wrote:
>>I guess I'm treating the request environment as that context.  I don't
>>really see the problem with that...?
>
> It puts a layer in the request call stack for each service you want to
> offer, versus *no* layers for an arbitrary number of services.  It adds
> work to every request to put stuff into the environment, then take it out
> again, versus just getting what you want in the first place.

But the "overhead" is adding one dictionary item and reading it again.
The most insignificant thing imaginable.  More important is the ugliness
of accessing an arbitrarily-named key in the application, but even that is
minor.

> The difference is obliviousness; if you want to *wrap* an application not
> written to use WSGI services, then it makes sense to make it
> middleware.  If you're writing a new application, just have it use
> components instead of mocking up a 401 just so you can use the existing
> middleware.

That seems to suggest the whole PEP 333 excersise was a waste of time. 
(I'm not saying it is, just that it seems to be the logical conclusion of
your statement.)  WSGI is just "backward compatibility" for existing
applications?  Practically all the interesting middleware falls into this
"component" category.  I'm having a hard time seeing what middleware a
naive CGI/legacy application would benefit from, besides access to
alternative webservers.  (But at this point, none of these are "better"
than the frameworks' native servers.)  Especially since legacy apps access
their services in a framework-specific way and would need specific
middleware or patching.

If a new API is in order, it seems high priority to get a PEP out soon, or
at least some reference implementations.  Otherwise the middleware way
will become a de facto standard.

-- 
-- Mike Orr <mso at oz.net>


From ianb at colorstudy.com  Tue Jul 19 04:57:40 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 18 Jul 2005 21:57:40 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <669891ba7d6b3bb20d95f44ff112074a@dscpl.com.au>
References: <1121571455.24386.171.camel@plope.dyndns.org>	
	<42D9DEBA.4080609@colorstudy.com>
	<1121578280.24386.228.camel@plope.dyndns.org>
	<42DA13CE.2080208@colorstudy.com>
	<669891ba7d6b3bb20d95f44ff112074a@dscpl.com.au>
Message-ID: <42DC6C24.3080905@colorstudy.com>

Graham Dumpleton wrote:
> My understanding from reading the WSGI PEP and examples like that above is
> that the WSGI middleware stack concept is very much tree like, but where at
> any specific node within the tree, one can only traverse into one child. 
> Ie.,
> a parent middleware component could make a decision to defer to one 
> child or
> another, but there is no means of really trying out multiple choices until
> you find one that is prepared to handle the request. The only way around it
> seems to be make the linear chain of nested applications longer and longer,
> something which to me just doesn't sit right. In some respects the need for
> the configuration scheme is in part to make that less unwieldy.

It's not at all limited to this, but these are simply the ones that are 
easy to configure, and can be inserted into a stack without changing the 
stack very much.

> What I am doing is making it acceptable for a handler to also return None.
> If this were returned by the highest level handler, it would equate to 
> being
> the same as DECLINED, but within the context of middleware components it
> has a lightly relaxed meaning. Specifically, it indicates that that handler
> isn't returning a response, but not that it is indicating that the request
> as a whole is being DECLINED causing a return to Apache.

Incidentally, I'd typically use an exception when the return value 
didn't include the semantics I wanted, but that might not be a problem here.

> One last example, is what a session based login mechanism might look like
> since this was one of the examples posed in the initial discussion. Here 
> you
> might have a handler for a whole directory which contains:
> 
> _userDatabase = _users.UserDatabase()
> 
> handler = Handlers(
>     IfLocationMatches(r"\.bak(/.*)?$",NotFound()),
>     IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()),
> 
>     IfLocationIsADirectory(ExternalRedirect('index.html')),
> 
>     # Create session and stick it in request object.
>     CreateUserSession(),
> 
>     # Login form shouldn't require user to be logged in to access it.
>     IfLocationMatches(r"^/login\.html(/.*)?$",CheetahModule()),
> 
>     # Serve requests against login/logout URLs and otherwise
>     # don't let request proceed if user not yet authenticated.
>     # Will redirect to login form if not authenticated.
>     FormAuthentication(_userDatabase,"login.html"),
> 
>     SetResponseHeader('Pragma','no-cache'),
>     SetResponseHeader('Cache-Control','no-cache'),
>     SetResponseHeader('Expires','-1'),
> 
>     IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()),
> )
> 
> Again, one has done away with the need for a configuration files as the 
> code
> itself specifies what is required, along with the constraints as to what
> order things should be done in.
> 
> Another thing this example shows is that handlers when they return None due
> to not returning an actual response, can still add to the response headers
> in the way of special cookies as required by sessions, or headers 
> controlling
> caching etc.

This is not possible in WSGI middleware if handled in a chain-like 
fashion.  Nested middleware can do this, of course.

This kind of chaining would be necessary if "services" were used, as 
many services have to effect the response, and there's no WSGI-related 
spec about where or how they would do that.  Though I haven't digested 
all the long emails lately...

> In terms of late binding of which handler is executed, the "PythonModule"
> handler is one example in that it selects which Python module to load only
> when the request is being handled. Another example of late construction of
> an instance of a handler in what I am doing, albeit the same type, is:
> 
>   class Handler:
> 
>     def __init__(self,req):
>       self.__req = req
> 
>     def __call__(self,name="value"):
>       self.__req.content_type = "text/html"
>       self.__req.send_http_header()
>       self.__req.write("<html><body>")
>       self.__req.write("<p>name=%r</p>"%cgi.escape(name))
>       self.__req.write("</body></html>")
>       return apache.OK
> 
>   handler = IfExtensionEquals("html",HandlerInstance(Handler))
> 
> First off the "HandlerInstance" object is only triggered if the request
> against this specific file based resource was by way of a ".html"
> extension. When it is triggered, it is only at that point that an instance
> of "Handler" is created, with the request object being supplied to the
> constructor.

Incidentally, I'm doing something a little like that with the 
filebrowser example in Paste:

http://svn.pythonpaste.org/Paste/trunk/examples/filebrowser/web/__init__.py

Looking at it now, it's not clear where that's happening, but (in 
application()) context.path(path) creates a WSGI application using a 
class based on the extension/expected mime type.  So the dispatching is 
similar.

> To round this off, the special "Handlers" handler only contains the 
> following
> code. Pretty simple, but makes construction of the component hierarchy a 
> bit
> easier in my mind when multiple things need to be done in turn where 
> nesting
> isn't strictly required.
> 
>   class Handlers:
> 
>     def __init__(self,*handlers):
>         self.__handlers = handlers
> 
>     def __call__(self,req):
>         if len(self.__handlers) != 0:
>             for handler in self.__handlers:
>                 result = _execute(req,handler,lazy=True)
>                 if result is not None:
>                     return result
> 
> Would be very interested to see how people see this relating to what is 
> possible
> with WSGI. Could one instigate a similar sort of class to "Handlers" in 
> WSGI
> to sequence through WSGI applications until one generates a complete 
> response?
> 
> The areas that have me thinking the answer is "no" is that I recollect 
> the PEP
> saying that the "start_response" object can only be called once, which 
> precludes
> applications in a list adding to the response headers without returning 
> a valid
> status. Secondly, if "start_response" object hasn't been called when the 
> parent
> starts to try and construct the response content from the result of 
> calling the
> application, it raises an error. But then, I have a distinct lack of proper
> knowledge on WSGI so could be wrong.

When you just want to add headers (like with a session) you can use 
wrapping middleware, which appends to its application's response 
headers, but doesn't create a full response on its own.

As for the order, when there's an issue you can cache the call.  For 
instance, if I want to look at what gets passed to start_response before 
passing it up to the server, I create a fake start_response that just 
saves the values.  Or sometimes a start_response that merely watches the 
values, like when I want to check the content-type to see if I can 
insert information into the page (since you can't append text to an 
image, for instance).

> If my thinking is correct, it could only be done by changing the WSGI 
> specification
> to support the concept of trying applications in sequence, by way of 
> allowing None
> as the status when "start_response" is called to indicate the same as 
> when I return
> None from a handler. Ie., the application may have set headers, but 
> otherwise the
> parent should where possible move to a subsequence application and try 
> it etc.

There's several conventions that could be used for trying applications 
in-sequence.  For instance, you could do something like this (untested) 
for delegating to different apps until one of them doesn't respond with 
a 404:

class FirstFound(object):
     """Try apps in sequence until one doesn't return 404"""
     def __init__(self, apps):
         self.apps = apps
     def __call__(self, environ, start_response):
         def replacement_start_response(status, headers):
             if int(status.split()[0]) == 404:
                 raise HTTPNotFound
             return start_response(status, headers)
         for app in self.apps[:-1]:
             try:
                 return app(environ, replacement_start_response)
             except HTTPNotFound:
                 pass
         # If the last one responds with 404, so be it
         return self.apps[-1](environ, start_response)

> Anyway, people may feel that this is totally contrary to what WSGI is 
> all about and
> not relevant and that is fine, I am at least finding it an interesting 
> idea to
> play with in respect of mod_python at least.

It's very relevent, at least in my opinion.  This is exactly the sort of 
architecture I've been attracted to, and the kind of middleware I've 
been adding to Paste.  The biggest difference is that mod_python uses an 
actual list and return values, where WSGI uses nested function calls.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From ianb at colorstudy.com  Tue Jul 19 05:49:44 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 18 Jul 2005 22:49:44 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1121599799.24386.347.camel@plope.dyndns.org>
References: <1121571455.24386.171.camel@plope.dyndns.org>	
	<42D9DEBA.4080609@colorstudy.com>	
	<1121578280.24386.228.camel@plope.dyndns.org>	
	<42DA13CE.2080208@colorstudy.com>
	<1121599799.24386.347.camel@plope.dyndns.org>
Message-ID: <42DC7858.80007@colorstudy.com>

Chris McDonough wrote:
> On Sun, 2005-07-17 at 03:16 -0500, Ian Bicking wrote:
> 
>>This is what Paste does in configuration, like:
>>
>>middleware.extend([
>>     SessionMiddleware, IdentificationMiddleware,
>>     AuthenticationMiddleware, ChallengeMiddleware])
>>
>>This kind of middleware takes a single argument, which is the 
>>application it will wrap.  In practice, this means all the other 
>>parameters go into lazily-read configuration.
> 
> 
> I'm finding it hard to imagine a reason to have another kind of
> middleware.
> 
> Well, actually that's not true.  In noodling about this, I did think it
> would be kind of neat in a twisted way to have "decision middleware"
> like:

In addition to the examples I gave in response to Graham, I wrote a 
document on this a while ago: 
http://pythonpaste.org/docs/url-parsing-with-wsgi.html

The hard part about this is configuration; it's easy to configure a 
non-branching chain of middleware.  Once it branches the configuration 
becomes hard (like programming-hard; which isn't *hard*, but it quickly 
stops feeling like configuration).

>>You can also define a "framework" (a plugin to Paste), which in addition 
>>to finding an "app" can also add middleware; basically embodying all the 
>>middleware that is typical for a framework.
> 
> 
> This appears to be what I'm trying to do too, which is why I'm intrigued
> by Paste.
> 
> OTOH, I'm not sure that I want my framework to "find" an app for me.
> I'd like to be able to define pipelines that include my app, but I'd
> typically just want to statically declare it as the end point of a
> pipeline composed of service middleware.  I should look at Paste a
> little more to see if it has the same philosophy or if I'm
> misunderstanding you.

Mostly I wanted to avoid lots of magical incantations for the simple 
case.  If you are used to Webware, well it has a very straight-forward 
way of finding your application -- you give it a directory name.  If 
Quixote or CherryPy, you give it a root object.  Maybe Zope would take a 
ZEO connection string, and so on.

>>Paste is really a deployment configuration.  Well, that as well as stuff 
>>to deploy.  And two frameworks.  And whatever else I feel a need or 
>>desire to throw in there.
> 
> 
> Yeah.  FWIW, as someone who has recently taken a brief look at Paste, I
> think it would be helpful (at least for newbies) to partition out the
> bits of Paste which are meant to be deployment configuration from the
> bits that are meant to be deployed.  Zope 2 fell into the same trap
> early on, and never recovered.  For example, ZPublisher (nee Bobo) was
> always meant to be able to be useful outside of Zope, but in practice it
> never happened because nobody could figure out how to disentangle it
> from its ever-increasing dependencies on other software only found in a
> Zope checkout.  In the end, nobody even remembered what its dependencies
> were *supposed* to be.  If you ask ten people, you'd get ten different
> answers.

Maybe with setuptools' namespace packages I can try this sometime.  It's 
not a high priority, though if splitting pieces out would make them more 
appealing then I could do that.

Deployment doesn't actually interest me, it's just a pain in the ass and 
I wanted to give it a go.  There's no real competition that I know of, 
because it's a boring and annoying problem ;)  So if I split it off, it 
might become accidentally orphaned...

> I also think that the rigor of separating out different components helps
> to make the software stronger and more easily understood in bite-sized
> pieces.  Unfortunately, separating them makes configuration tough, but I
> think that's what we're trying to find an answer about how to do "the
> right way" here.

Yes, you've reminded me why I brought this up, for that exact reason, 
though we've digressed a great deal.  Lots of pieces of Paste have zero 
(or close to it) dependencies, except for configuration.  That's what 
distinguishes a Paste component from a generic WSGI component, and I'm 
just as happy if there is no distinction.

>>Note also that parts of the pipeline are very much late bound.  For 
>>instance, the way I implemented Webware (and Wareweb) each servlet is a 
>>WSGI application.  So while there's one URLParser application, the 
>>application that actually handles the request differs per request.  If 
>>you start hanging more complete applications (that might have their own 
>>middleware) at different URLs, then this happens more generally.
> 
> 
> Well, if you put the "decider" in middleware itself, all of the
> middleware components in each pipeline could still be at least
> constructed early.  I'm pretty sure this doesn't really strictly qualify
> as "early binding" but it's not terribly dynamic either.  It also makes
> configuration pretty straightforward.  At least I can imagine a
> declarative syntax for configuring pipelines this way.

This is close to how Paste works now.  The typical middleware stack does 
everything but find the terminal application object, though with hooks 
if you are inclined to add yet more middleware (like the Paste 
examples.filebrowser.web.__init__.application() object I mentioned before).

> I'm pretty sure you're not advocating it, but in case you are, I'm not
> sure it adds as much value as it removes to be able to have a "dynamic"
> middleware chain whereby new middleware elements can be added "on the
> fly" to a pipeline after a request has begun.  That is *very* "late
> binding" to me and it's impossible to configure declaratively.

I'm comfortable with a little of both.  I don't even know *how* I'd stop 
dynamic middleware.  For instance, one of the methods I added to Wareweb 
recently allows any servlet to forward to any WSGI application; but from 
the outside the servlet looks like a normal WSGI application just like 
before.

I guess this is part of the advantage (and disadvantage) of completely 
opaque applications; you don't and can't know what they do.

>>>I've just seen Phillip's post where he implies that this kind of
>>>fine-grained component factoring wasn't really the initial purpose of
>>>WSGI middleware.  That's kind of a bummer. ;-)
>>
>>Well, I don't understand the services he's proposing yet.  I'm quite 
>>happy with using middleware the way I have been, so I'm not seeing a 
>>problem with it, and there's lots of benefits.
> 
> 
> I agree!  I'm a bit confused because one of the canonical examples of
> how WSGI middleware is useful seems to be the example of implementing a
> framework-agnostic sessioning service.  And for that sessioning service
> to be useful, your application has to be able to depend on its
> availability so it can't be "oblivious".

This is where I'd like additional (incrementally agreed upon) standards. 
  For instance, a standard for the interface of 'webapp01.session'. 
It's a requirement, certainly, but the requirement is merely "there must 
be a webapp01-compliant session installed".

> OTOH, the primary benefit -- to me, at least -- of modeling services as
> WSGI middleware is the fact that someone else might be able to use my
> service outside the scope of my projects (and thus help maintain it and
> find bugs, etc).  So if I've got the wrong concept of what kinds of
> middleware that I can expect "normal" people to use, I don't want to go
> very far down that road without listening carefully to Phillip.  Perhaps
> I'll have a shot at influencing the direction of WSGI to make it more
> appropriate for this sort of thing or maybe we'll come up with a better
> way of doing it.

Well, you can go some ways.  If you are distributing an application -- 
which can be very fine-grained -- you can always resort to invoking 
middleware yourself.  If you are distributing middleware or a library 
that depends on middleware, then dependencies are part of the deployment 
configuration.  Which has always been the case.

Also, a smart middleware can pretend to be many kinds of middleware, by 
putting objects with different (wrapper) interfaces in multiple keys. 
So if we have an explosion of incompatible session middlewares, for 
instance, we can ultimately create an ubersession that maintains 
backward compatibility and provides a forward-compatible interface.

> Zope 3 is a component system much like what I'm after, and I may just
> end up using it wholesale.  But my immediate problem with Zope 3 is that
> like Zope 2, it's a collection of libraries that have dependencies on
> other libraries that are only included within its own checkout and don't
> yet have much of a life of their own.  It's not really a technical
> problem, it's a social one... I'd rather have a somewhat messy framework
> with a lot of diversity composed of wildly differing component
> implementations that have a life of their own than to be be trapped in a
> clean, pure world where all the components are used only within that
> world.

My personal critique would be that Zope 3 adds novel concepts more than 
libraries, and they are better concepts than in Zope 2 (where "concept" 
was just whatever got thrown into the most base classes), but there's 
still a lot of concept there.  Some of them deserve to become part of 
the wider Python knowledge base.  I think some of them don't.  But 
there's no survival of the fittest, since the concepts depend on each other.

> I suspect there's a middle ground here somewhere.
> 
> 
>>>Factoring middleware components in this way seems to provide clear
>>>demarcation points for reuse and maintenance.  For example, I imagined a
>>>declarative security module that might be factored as a piece of
>>>middleware here:  http://www.plope.com/Members/chrism/decsec_proposal .
>>
>>Yes, I read that before; I haven't quite figured out how to digest it, 
>>though.  This is probably in part because of the resource-based 
>>orientation of Zope, and WSGI is application-based, where applications 
>>are rather opaque and defined only in terms of function.
> 
> 
> Yes, it is a bit Zopeish because it assumes content lives at a path.
> This isn't always the case, I know, but it often is.  Well, it's a bit
> of a stretch, but an alternate decsec implementation might use a
> "content identifier" to determine the protection of a resource instead
> of a full path.
> 
> For example, if you're implementing an application that is very simple
> and takes one and only one URL, but calls it with a different query
> string variable to display different pieces of content (e.g.
> '/blog?entry_num=1234'), you might have one ACL as the "root" ACL but
> optionally protect each piece of content with a separate ACL if one can
> be found.  Maybe the content-specific ACL would be 'entry_num=1234'
> instead of a path.  

Zope really puts a lot of importance in paths; though I don't think 
typical Zope applications have any better URLs as a result.  I don't 
know if that's something specific to Zope, or merely the inevitable 
result that when you make something Important you make it Hard and 
Fragile.  I'd actually go for the latter, which is why I'd be very 
reluctant to make URL-based permissions anything more than one tool 
among many.

Something like services seem more practical in this case, or perhaps an 
advisory object that gets placed in the request if we're seeing what we 
can do without services.  The advisory object doesn't know what the 
entry_num=1234 object is, but the application can figure out how that 
object maps to what the advisory object knows about (e.g., owners and 
editors and whatnot).

But oh! that's exactly what you describe below.  With all these long 
emails I don't have the room in my brain to read ahead, because it all 
becomes a jumble of WSGIness.  Which is good, just hard...

> A function that accepts a form post for displaying
> or changing the blog entry for 1234 might look like this:
> 
> def blog(environ, start_response):
>     acl = environ['acl'] # added by decsec middleware
>     userid = environ['userid'] # added by an authentication middleware
>     formvars = get_form_vars_from(environ)
>     if formvars['action'] == "view":
>         permission = 'view'
>     elif formvars['action'] == "change":
>         permission = 'edit'
>     content = get_blog_entry(environ)
>     # pulls out the entry for 1234
>     if not acl.check(userid, permission):
>        start_response('401 Unauthorized', [])
>        return ['<html>Unauthorized</html>']
>    [ ... further code to change or display the blog entry ... ]
> 
> The ACL could be the "root" ACL (say, all users can view, members of the
> group "manager" could change, everything else is denied).  The "root"
> ACL would be used if content did not have its own ACL.  But associating
> an ACL with a content identifier would allow the developer or site
> manager to protect individual blog entries (e.g. 1234, 5678, etc) with
> different ACLs.  "Joe can view this one but he can't change it", "Jim
> can view all of them and can change all of them", etc.. the sorts of
> things useful for "staging" and workflow delegation without unduly
> mucking up the actual application code.
> 
> Decsec would also take into account the user's group memberships and so
> forth during the "check" step, so you wouldn't have to write any of this
> code either.  The "blog" example is stupid, of course, the concept is
> more useful for higher-security apps.
> 
> Sorry, all of this is somewhat besides the point of this thread, but it
> does provide an example of kind of functionality I'd like to be able to
> put into middleware.
> 
> 
>>>Of course, this sort of thing doesn't *need* to be middleware.  But
>>>making it middleware feels very right to me in terms of being able to
>>>deglom nice features inspired by Zope and other frameworks into pieces
>>>that are easy to recombine as necessary.  Implementations as WSGI
>>>middleware seems a nice way to move these kinds of features out of our
>>>respective applications and into more application-agnostic pieces that
>>>are very loosely coupled, but perhaps I'm taking it too far.
>>
>>Certainly these pieces of code can apply to multiple applications and 
>>disparate systems.  The most obvious instance right now that I think of 
>>is a WSGI WebDAV server (and someone's working on that for Google Summer 
>>of Code), which should be implemented pretty framework-free, simply 
>>because a good WebDAV implementation works at a low level.  But 
>>obviously you want that to work with the same authentication as other 
>>parts of the system.
> 
> 
> Yes.  In particular, if you knew you were working with an application
> that could resolve a path in terms of containers and contained pieces of
> content (just like a filesystem does), it would be pretty easy to code
> up a DAV "action middleware" component that rendered containerish things
> as DAV "collections" and contentish things as DAV "resources", and which
> could handle DAV locking and property rendering and so forth.
> 
> This kind of middleware might be tough, though, because it probably
> requires explicit cooperation from the end-point application (it expects
> to be talking to an actual filesystem, but that won't always be the case
> at least without some sort of adaptation).

I think WebDAV is very unripe for WSGI abstractions.  And even if I 
remember the Zope WebDAV code I briefly looked at, it special cases all 
sorts of things (e.g., based on user agent) because there's so much more 
semantics than with a normal web page.  It's the kind of place where 
introspection really would be helpful; though maybe the discipline of 
enforced decoupling would still help.

> But in any case, it's a good example of how we could prevent people from
> needing to reinvent the wheel... this guy appears to be coming up with
> his own identification, authentication, authorization, and challenge
> libraries entirely http://cwho.blogspot.com/ which just feels very
> wasteful.

Yes; I'm his advisor.  I've encouraged him to look at reusing stuff, but 
I really have to give stronger direction.

>>>Virtual hosting awareness
>>
>>I've never had a problem with this, except in Zope...
>>
>>Anyway, to me this feels like a kind of URL parsing.  One of the 
>>mini-proposals I made before involved a way of URL parsers to add URL 
>>variables to the system (basically a standard WSGI key to put URL 
>>variables as a dictionary).  So a pattern like:
>>
>>   (?<username>.*)\.myblogspace.com/(?<year>\d\d\d\d)/(?<month>\d\d)/
>>
>>Would add username, year, and month variables to the system.  But regex 
>>matching is just one way; the *result* of parsing is usually either in 
>>the object (e.g., you use domains to get entirely different sites), or 
>>in terms of these variables.
> 
> 
> Yes, this seems to be more of a problem for Zope because it's a) a
> long-running app with its own webserver b) has convenience functions for
> generating URLs based on its internal containment graph and c) doesn't
> deal well with relative URLs.  So if you want an application that lives
> in a "subfolder" of your Zope object graph to behave as if it lives at
> "http://example.com" instead of "http://example.com/subfolder", you need
> to give it clues.

Incidentally, since this is frequently a problem, for my applications 
I've been using something bookmark-like; at some point in the request 
(often just before URLParser is invoked) I store the SCRIPT_NAME and 
give it some name (like 'app_name.base_url').  Then I can construct all 
my URLs relative to that.  This still involves information I keep in my 
head (like how internal URLs are constructed), but at least it gets it 
right without hardcoding/configuring URLs, or being clever and getting 
it wrong.

>>>Transformation during rendering
>>
>>If you mean what I think -- e.g., rendering XSL -- I think WSGI is ripe 
>>for this sort of thing.
> 
> 
> Yes, that's what I meant.

Incidentally someone just did an XSLT middleware today: 
http://www.decafbad.com/blog/2005/07/18/discovering_wsgi_and_xslt_as_middleware

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From grahamd at dscpl.com.au  Tue Jul 19 06:02:10 2005
From: grahamd at dscpl.com.au (Graham Dumpleton)
Date: Tue, 19 Jul 2005 00:02:10 -0400
Subject: [Web-SIG] Standardized configuration
Message-ID: <1121745730.17225@dscpl.user.openhosting.com>

Ian Bicking wrote ..
> There's several conventions that could be used for trying applications
> in-sequence.  For instance, you could do something like this (untested)
> for delegating to different apps until one of them doesn't respond with
> a 404:
> 
> class FirstFound(object):
>      """Try apps in sequence until one doesn't return 404"""
>      def __init__(self, apps):
>          self.apps = apps
>      def __call__(self, environ, start_response):
>          def replacement_start_response(status, headers):
>              if int(status.split()[0]) == 404:
>                  raise HTTPNotFound
>              return start_response(status, headers)
>          for app in self.apps[:-1]:
>              try:
>                  return app(environ, replacement_start_response)
>              except HTTPNotFound:
>                  pass
>          # If the last one responds with 404, so be it
>          return self.apps[-1](environ, start_response)
> 
> > Anyway, people may feel that this is totally contrary to what WSGI is
> > all about and
> > not relevant and that is fine, I am at least finding it an interesting
> > idea to
> > play with in respect of mod_python at least.

As far as using 404 to indicate this, I had thought of that, but it then
precludes one of those applications actually raising that as a real
response. I often return NotFound as opposed to Forbidden when
access is to files such as ".py" files. Return forbidden still gives a
clue as to what implementation language is used where as returning not
found doesn't. I do this, perhaps in a misguided way, as by not exposing
how something is implemented, I feel it makes it just a bit harder for
people to work out how to breach your security. :-)

If one was going to use a specific error code to indicate next
application object should be tried, maybe it might be more appropriate
to use 303 (See Other) with there being no redirect URL specified. Ie.,
something that doesn't necessarily overlap with something that might
be valid for a application object to do.

> It's very relevent, at least in my opinion.  This is exactly the sort of
> architecture I've been attracted to, and the kind of middleware I've 
> been adding to Paste.  The biggest difference is that mod_python uses an
> actual list and return values, where WSGI uses nested function calls.

To say that mod_python uses an actual list is only really true at the
level of Apache configuration where one defines the PythonHandler
directive and can specify multiple handlers to run in succession. Most
people would only have the one.

At the level I am working where I use "Handlers()", not a part of
mod_python itself, I am using both sequences of handlers as well as
recursive nesting. The "IfLocationMatches()" object in my examples was
wrapping the "NotFound()" object, but it could equally have wrapped a
"Handlers()" or another "If" object, which in turn wraps lower level
objects. Even the "PythonModule()" object wrapped objects indirectly,
they just happen to be loaded at run time much like the URLParser
example for Paste.

Thus I am using both lists and nested callable objects in the way of
wrappers. WSGI seems to focus mainly on the latter of using only nested
calls in all the examples I have seen, although you do show above one
way perhaps of having a lineal search for an application object.

Anyway, the point I was trying to make was that to me, the lineal
search through a list of handlers (or application objects) seems
to be an easier way of dealing with things in some cases and looks
simpler in code than having a long nested chain of objects, yet WSGI
doesn't seem to make any real use of that approach to composing
together middleware components.

I'll leave it at that for the moment. I guess I'll just have to show whether
one way works better and is easier to understand than the other by way
of example at some point. :-)

Thanks for the response.

Graham

From chrism at plope.com  Tue Jul 19 08:39:11 2005
From: chrism at plope.com (Chris McDonough)
Date: Tue, 19 Jul 2005 02:39:11 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42DC7858.80007@colorstudy.com>
References: <1121571455.24386.171.camel@plope.dyndns.org>
	<42D9DEBA.4080609@colorstudy.com>
	<1121578280.24386.228.camel@plope.dyndns.org>
	<42DA13CE.2080208@colorstudy.com>
	<1121599799.24386.347.camel@plope.dyndns.org>
	<42DC7858.80007@colorstudy.com>
Message-ID: <1121755151.13123.70.camel@plope.dyndns.org>

On Mon, 2005-07-18 at 22:49 -0500, Ian Bicking wrote:
> In addition to the examples I gave in response to Graham, I wrote a 
> document on this a while ago: 
> http://pythonpaste.org/docs/url-parsing-with-wsgi.html
> 
> The hard part about this is configuration; it's easy to configure a 
> non-branching chain of middleware.  Once it branches the configuration 
> becomes hard (like programming-hard; which isn't *hard*, but it quickly 
> stops feeling like configuration).

Yep.  I think I'm getting it.  For example, I see that Paste's URLParser
seems to *construct* applications if they don't already exist based on
the URL.  And I assume that these applications could themselves be
middleware.  I don't think that is configurable declaratively if you
want to decide which app to use based on arbitrary request parameters.

But if we already had the config for each app "instance" that URLParser
wanted to consult laying around as files on disk, wouldn't it be just as
easy to construct these app objects "eagerly" at startup time?  Then you
URLParser could choose an already-configured app based on some sort of
configuration file in the URLParser component itself.  The "apps"
themselves may be pipelines, too, I realize that, but that is still
configurable without coding.

Maybe there'd be some concern about needing to stop the process in order
to add new applications.  That's a use case I hadn't really considered.
I suspect this could be done with a signal handler, though, which could
tell the URLParser to reload its config file instead of potentially
locating a and creating a new application within every request.

This would make URLParser a kind of "decision" middleware, but it would
choose from a static set of existing applications (or pipelines) for the
lifetime of the process as opposed to constructing them lazily.

> > OTOH, I'm not sure that I want my framework to "find" an app for me.
> > I'd like to be able to define pipelines that include my app, but I'd
> > typically just want to statically declare it as the end point of a
> > pipeline composed of service middleware.  I should look at Paste a
> > little more to see if it has the same philosophy or if I'm
> > misunderstanding you.
> 
> Mostly I wanted to avoid lots of magical incantations for the simple 
> case.  If you are used to Webware, well it has a very straight-forward 
> way of finding your application -- you give it a directory name.  If 
> Quixote or CherryPy, you give it a root object.  Maybe Zope would take a 
> ZEO connection string, and so on.

I think I understand now.

In general, I think I'd rather create "instance" locations of WSGI
applications (which would essentially consist of a config file on disk
plus any state info required by the app), configure and construct Python
objects out of those instances eagerly at "startup time" and just choose
between already-constructed apps if in "decision middleware" that has
its own declarative configuration if decisions need to be made about
which app to use.

This is mostly because I want the configuration info to live within the
application/middleware instance and have some other "starter" import
those configurations from application/middleware instance locations on
the filesystem.  The "starter" would construct required instances as
Python objects, and chain them together arbitrarily based on some other
"pipeline configuration" file that lives with the "starter".  The first
part of that (construct required instances) is described in a post I
made to this list yesterday.

This is probably because I'd like there to be one well-understood way to
declaratively configure pipelines as opposed to each piece of middleware
potentially needing to manage app construction and having its own
configuration to do so.

I don't know if this is reasonable for simpler requirements.  This is
more of a "formal deployment spec" idea and of course is likely flawed
in some subtle way I don't understand yet.

> > I'm pretty sure you're not advocating it, but in case you are, I'm not
> > sure it adds as much value as it removes to be able to have a "dynamic"
> > middleware chain whereby new middleware elements can be added "on the
> > fly" to a pipeline after a request has begun.  That is *very* "late
> > binding" to me and it's impossible to configure declaratively.
> 
> I'm comfortable with a little of both.  I don't even know *how* I'd stop 
> dynamic middleware.  For instance, one of the methods I added to Wareweb 
> recently allows any servlet to forward to any WSGI application; but from 
> the outside the servlet looks like a normal WSGI application just like 
> before.

It's obviously fine if applications themselves want to do this.  I'm not
sure that it would be possible to create a "deployment spec" that
canonized *how* to do it because as you mentioned it's not really a
configuration task, it's a programming task.

> > I agree!  I'm a bit confused because one of the canonical examples of
> > how WSGI middleware is useful seems to be the example of implementing a
> > framework-agnostic sessioning service.  And for that sessioning service
> > to be useful, your application has to be able to depend on its
> > availability so it can't be "oblivious".
> 
> This is where I'd like additional (incrementally agreed upon) standards. 
>   For instance, a standard for the interface of 'webapp01.session'. 
> It's a requirement, certainly, but the requirement is merely "there must 
> be a webapp01-compliant session installed".

Yes... I think the best way to describe this sort of thing is through
interfaces (at least notional, documented ones, if not formal ones that
can be introspected at runtime).  But that will need to be fleshed out
on a service-by-service basis, obviously.

FWIW, I'm also finding myself agreeing with Phillip's idea of allowing
applications to have a context object to which can help them find
services, as opposed to implementing each service entirely as
middleware.

Instead of obtaining the sessioning service via
"environ['webapp01.session']" in an application's __call__ , you might
do "self.context.get_service('session')"... or maybe even
"environ['services'].get_service('session')".  The latter would be
easier to add because we'd be using an existing PEP 333 protocol.  We'd
consume a single key within the environ namespace, but there would need
to be no change to the WSGI spec.

This would be pretty straightforward and a separate services framework
could be implemented outside WSGI entirely perhaps taking some cues from
PEAK and/or Zope 3 ( or even [gasp] *code!*, god knows this problem has
already been solved many times over ;-) -- for implementing service
registration and lookup.  It could form the basis for a "WSGI services"
spec without muddying the waters for PEP 333.

That said, if you're not interested in that because you think
implementing services as middleware is "good enough" and you'd rather
not implement another framework, I'd totally understand that.  At that
point I probably wouldn't be interested either because you're the
defacto champion of WSGI middleware as a lingua franca and the only
reason to do any of this is for the sake of collaboration and code
sharing.  But I do think it would be cleaner.

Anyway, lots of good ideas and tips in your further responses, thanks,
but for the sake of brevity and keeping the thread somewhat on topic, I
won't respond to them.

- C


From ianb at colorstudy.com  Tue Jul 19 19:15:00 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 19 Jul 2005 12:15:00 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1121755151.13123.70.camel@plope.dyndns.org>
References: <1121571455.24386.171.camel@plope.dyndns.org>	
	<42D9DEBA.4080609@colorstudy.com>	
	<1121578280.24386.228.camel@plope.dyndns.org>	
	<42DA13CE.2080208@colorstudy.com>	
	<1121599799.24386.347.camel@plope.dyndns.org>	
	<42DC7858.80007@colorstudy.com>
	<1121755151.13123.70.camel@plope.dyndns.org>
Message-ID: <42DD3514.5080100@colorstudy.com>

Chris McDonough wrote:
> On Mon, 2005-07-18 at 22:49 -0500, Ian Bicking wrote:
> 
>>In addition to the examples I gave in response to Graham, I wrote a 
>>document on this a while ago: 
>>http://pythonpaste.org/docs/url-parsing-with-wsgi.html
>>
>>The hard part about this is configuration; it's easy to configure a 
>>non-branching chain of middleware.  Once it branches the configuration 
>>becomes hard (like programming-hard; which isn't *hard*, but it quickly 
>>stops feeling like configuration).
> 
> 
> Yep.  I think I'm getting it.  For example, I see that Paste's URLParser
> seems to *construct* applications if they don't already exist based on
> the URL.  And I assume that these applications could themselves be
> middleware.  I don't think that is configurable declaratively if you
> want to decide which app to use based on arbitrary request parameters.
> 
> But if we already had the config for each app "instance" that URLParser
> wanted to consult laying around as files on disk, wouldn't it be just as
> easy to construct these app objects "eagerly" at startup time?  Then you
> URLParser could choose an already-configured app based on some sort of
> configuration file in the URLParser component itself.  The "apps"
> themselves may be pipelines, too, I realize that, but that is still
> configurable without coding.

That's what paste.urlmap is for:

   http://svn.pythonpaste.org/Paste/trunk/paste/urlmap.py

(I haven't actually tried using it much for practical things, so it's 
quite possible it has design mistakes in it)

The idea being that you do:

   urlmap['/myapp'] = MyApp()

But additionally (in PathProxyURLMap):

   urlmap['/myapp'] = 'myapp.conf'

And it builds the application from the configuration file.

> Maybe there'd be some concern about needing to stop the process in order
> to add new applications.  That's a use case I hadn't really considered.
> I suspect this could be done with a signal handler, though, which could
> tell the URLParser to reload its config file instead of potentially
> locating a and creating a new application within every request.
> 
> This would make URLParser a kind of "decision" middleware, but it would
> choose from a static set of existing applications (or pipelines) for the
> lifetime of the process as opposed to constructing them lazily.

URLParser itself is just one parsing implementation, though maybe named 
too generically.  I don't think that particular code needs to grow many 
more features, but there's also room for many other parsers.  And it's 
also fairly easy to wrestle control from URLParser if that gets put in 
the stack (for instance, putting an application function in __init__.py 
will basically take over URL parsing for that  directory).

>>>OTOH, I'm not sure that I want my framework to "find" an app for me.
>>>I'd like to be able to define pipelines that include my app, but I'd
>>>typically just want to statically declare it as the end point of a
>>>pipeline composed of service middleware.  I should look at Paste a
>>>little more to see if it has the same philosophy or if I'm
>>>misunderstanding you.
>>
>>Mostly I wanted to avoid lots of magical incantations for the simple 
>>case.  If you are used to Webware, well it has a very straight-forward 
>>way of finding your application -- you give it a directory name.  If 
>>Quixote or CherryPy, you give it a root object.  Maybe Zope would take a 
>>ZEO connection string, and so on.
> 
> 
> I think I understand now.
> 
> In general, I think I'd rather create "instance" locations of WSGI
> applications (which would essentially consist of a config file on disk
> plus any state info required by the app), configure and construct Python
> objects out of those instances eagerly at "startup time" and just choose
> between already-constructed apps if in "decision middleware" that has
> its own declarative configuration if decisions need to be made about
> which app to use.

I think this is a laudible goal.  Right now, when I'm deploying 
applications written for Paste, I am reluctant to intermingle them in 
the same process and configuration... but that's because Paste is young, 
not because that's a bad idea.  But as a result I haven't tried it, and 
I only have a moderate concept of what it would mean in practice.

A neat feature would be to configure fairly seemlessly across process 
boundaries.  E.g., add a "fork=True" parameter to an application's 
configuration, and the server would fork a process (or delegate to an 
already forked worker process) for that application.  That's the sort of 
thing that could move Python into PHP-style hosting situations.

> This is mostly because I want the configuration info to live within the
> application/middleware instance and have some other "starter" import
> those configurations from application/middleware instance locations on
> the filesystem.  The "starter" would construct required instances as
> Python objects, and chain them together arbitrarily based on some other
> "pipeline configuration" file that lives with the "starter".  The first
> part of that (construct required instances) is described in a post I
> made to this list yesterday.
> 
> This is probably because I'd like there to be one well-understood way to
> declaratively configure pipelines as opposed to each piece of middleware
> potentially needing to manage app construction and having its own
> configuration to do so.
> 
> I don't know if this is reasonable for simpler requirements.  This is
> more of a "formal deployment spec" idea and of course is likely flawed
> in some subtle way I don't understand yet.

I think there's probably some room for separation.  In practice I end up 
with multiple configuration files for my projects -- one that's generic 
to the application, and one that's specific to the deployment.  But it's 
very hard to determine ahead of time what stuff goes where.  For 
instance, server options mostly go in the deployment configuration. 
Until I start building conventions about configuration information on 
the servers, at which time I expect configuration will migrate into 
common locations in the form of configuration-loading options.  E.g., 
where I now do:

   server = 'scgi_threaded'
   port = 4010

In the future I might do:

   import port_map
   port = port_map.find_port(app_name)

Where port_map is some global module where I keep the entire server's 
list of ports mappings.  And being able to do stuff like this is what 
makes Python-syntax imperative configuration so nice... it's crude and 
annoying, but configuration that is more declarative becomes even worse 
when you try to build these kind of features into it.

But I digress... the deployment configuration as I currently use it is 
usually something that overwrites the generic application configuration. 
  They aren't two distinct things.  And the configuration doesn't belong 
to one or the other.  Is the location of session information server 
specific, application specific, profile specific?  It depends on your 
situation.  I might have a standard convention for the location of 
Javascript libraries that lives in my configuration; but on my 
development machine I override that because I'm doing development on one 
of those libraries.  There's all sorts of specific cases, and in 
declarative or well-partitioned configurations the configuration 
language has to include lots and lots of features.  Or you end up with 
configuration file generation or other nonsense.

In the end, I think I have more faith in the general applicability of 
Python as a way to describe structures, combined with strong 
configuration-specific conventions and style guides.  Otherwise it feels 
like this embeds policy into the configuration-loading code, and I hate 
policy in code.

>>>I'm pretty sure you're not advocating it, but in case you are, I'm not
>>>sure it adds as much value as it removes to be able to have a "dynamic"
>>>middleware chain whereby new middleware elements can be added "on the
>>>fly" to a pipeline after a request has begun.  That is *very* "late
>>>binding" to me and it's impossible to configure declaratively.
>>
>>I'm comfortable with a little of both.  I don't even know *how* I'd stop 
>>dynamic middleware.  For instance, one of the methods I added to Wareweb 
>>recently allows any servlet to forward to any WSGI application; but from 
>>the outside the servlet looks like a normal WSGI application just like 
>>before.
> 
> 
> It's obviously fine if applications themselves want to do this.  I'm not
> sure that it would be possible to create a "deployment spec" that
> canonized *how* to do it because as you mentioned it's not really a
> configuration task, it's a programming task.
> 
> 
>>>I agree!  I'm a bit confused because one of the canonical examples of
>>>how WSGI middleware is useful seems to be the example of implementing a
>>>framework-agnostic sessioning service.  And for that sessioning service
>>>to be useful, your application has to be able to depend on its
>>>availability so it can't be "oblivious".
>>
>>This is where I'd like additional (incrementally agreed upon) standards. 
>>  For instance, a standard for the interface of 'webapp01.session'. 
>>It's a requirement, certainly, but the requirement is merely "there must 
>>be a webapp01-compliant session installed".
> 
> 
> Yes... I think the best way to describe this sort of thing is through
> interfaces (at least notional, documented ones, if not formal ones that
> can be introspected at runtime).  But that will need to be fleshed out
> on a service-by-service basis, obviously.
> 
> FWIW, I'm also finding myself agreeing with Phillip's idea of allowing
> applications to have a context object to which can help them find
> services, as opposed to implementing each service entirely as
> middleware.
> 
> Instead of obtaining the sessioning service via
> "environ['webapp01.session']" in an application's __call__ , you might
> do "self.context.get_service('session')"... or maybe even
> "environ['services'].get_service('session')".  The latter would be
> easier to add because we'd be using an existing PEP 333 protocol.  We'd
> consume a single key within the environ namespace, but there would need
> to be no change to the WSGI spec.

I have to read over PJE's email some more.  It doesn't really remove the 
need for middleware, it's more like it could consolidate many services 
into one generic service middleware.  For instance, the session service 
still needs access to the response, and the only general way to access 
the response is through middleware.  The request, at least, can be 
generally accessed as the environment dictionary; but replacing 
middleware with contracts on what you must return from your application 
is a non-starter.  E.g., if an auth service requires something like:

auth = get_service('auth')
if not auth.allowed(app_context):
     forbidden = auth.forbidden()
     start_response(forbidden[0], forbidden[1])
     return forbidden[2]

Well... that's not very nice, is it?  And it's totally infeasible once 
your code is in the bowls of some framework.  You could do it with an 
exception (with some middleware that catches the exception).  You could 
do the session service with some middleware that collects extra headers 
and other response information.

And now that I'm thinking through an implementation, I realize it's 
something I've thought of before -- in my mind it was about 
lighter-weight filters and simpler configuration, but the implementation 
would be similar.

My only concern is if it confuses the order of filters.  If there's one 
generic service middleware, it's probably going to be invoked before 
some other middleware and after others.  But the services would 
communicate with that service middleware outside of the WSGI band (using 
callbacks or shared structures or something).  This makes it difficult 
for transforming middleware to be certain that it has full control to 
wrap applications.

> This would be pretty straightforward and a separate services framework
> could be implemented outside WSGI entirely perhaps taking some cues from
> PEAK and/or Zope 3 ( or even [gasp] *code!*, god knows this problem has
> already been solved many times over ;-) -- for implementing service
> registration and lookup.  It could form the basis for a "WSGI services"
> spec without muddying the waters for PEP 333.
> 
> That said, if you're not interested in that because you think
> implementing services as middleware is "good enough" and you'd rather
> not implement another framework, I'd totally understand that.  At that
> point I probably wouldn't be interested either because you're the
> defacto champion of WSGI middleware as a lingua franca and the only
> reason to do any of this is for the sake of collaboration and code
> sharing.  But I do think it would be cleaner.

Well, I'm a fan of working code.  If services are a better way of doing 
some of this stuff, and they supercede code I've written or imagined, 
that's not that big a deal.  At this point I'd be interested to see how 
a Really Lame Implementation of Sessions (for instance) would be 
implemented with services.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From ianb at colorstudy.com  Tue Jul 19 19:28:21 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 19 Jul 2005 12:28:21 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com>
References: <42DA13CE.2080208@colorstudy.com>
	<1121571455.24386.171.camel@plope.dyndns.org>
	<42D9DEBA.4080609@colorstudy.com>
	<1121578280.24386.228.camel@plope.dyndns.org>
	<42DA13CE.2080208@colorstudy.com>
	<5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com>
Message-ID: <42DD3835.1040300@colorstudy.com>

Phillip J. Eby wrote:
> At 07:29 AM 7/17/2005 -0400, Chris McDonough wrote:
> 
>> I'm a bit confused because one of the canonical examples of
>> how WSGI middleware is useful seems to be the example of implementing a
>> framework-agnostic sessioning service.  And for that sessioning service
>> to be useful, your application has to be able to depend on its
>> availability so it can't be "oblivious".
> 
> 
> Exactly.  As soon as you start trying to have configured services, you 
> are creating Yet Another Framework.  Which isn't a bad thing per se, 
> except that it falls outside the scope of  PEP 333.  It deserves a 
> separate PEP, I think, and a separate implementation mechanism than 
> being crammed into the request environment.  These things should be 
> allowed to be static, so that an application can do some reasonable 
> setup, and so that you don't have per-request overhead to shove ninety 
> services into the environment.

The services themselves can be fairly lazy; though unfortunately you 
can't be trickly and add laziness when a service was originally written 
in a very concrete way, since that would require fake dictionaries and 
other things WSGI disallows.

But there's not a lot of overhead to environ['paste.session.factory']() 
-- it's just a stub object stuck in a particulra key, that knows the 
context in which it was created so it can communicate with that context 
later.

> Also, because we are dealing not with basic plumbing but with making a 
> nice kitchen, it seems to me we can afford to make the fixtures nice.  
> That is, for an add-on specification to WSGI we don't need to adhere to 
> the "let it be ugly for apps if it makes the server easier" principle 
> that guided PEP 333.  The assumption there was that people would mostly 
> port existing wrappers over HTTP/CGI to be wrappers over WSGI.  But for 
> services, we are talking about an actual framework to be used by 
> application developers directly, so more user-friendliness is definitely 
> in order.

My own vision for most middleware is that it get wrapped by frameworks. 
  In fact, that it be so godawful ugly you can't help but wrap it ;) 
Well, not deliberately horrible for no good reason... but at least that 
it doesn't matter that much, because the frameworks will want to wrap it 
anyway.

This is the "aesthetically neutral" aspect of middleware that I've 
mentioned before.  People get all bothered if you use underscores 
instead of mixed case, or vice versa, even though that's one of the 
least important aspects of the features being implemented.

Of course, there are real problems with wrapping.  Like it reduces the 
transparency -- middleware becomes this magic part of the system because 
it's not something people deal with day-to-day, and if your first chance 
to work with middleware is to write it, that's intimidating.  There's 
also the leaky abstraction problem; though I think well-defined 
middleware helps avoid this.

Really, if you are building user-visible standard libraries, you are 
building a framework.  And maybe I'm just too pessimistic about a 
standard framework... but, well, I am certainly not optimistic about it. 
  On the other hand, it's not like people are breaking down my door with 
their enthusiasm to use Paste middleware either.  So I dunno.

I can only say a good strategy clearly has to build on developer's 
laziness, their fear of new things, and their reluctance to learn new 
things.  Well, that's the negative way of saying it.  It has to build on 
the likelihood that their attention is primarily focused on their 
domain, that it builds on their existing knowledge, and that it presents 
a minimal set of new concepts.


-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From jjinux at gmail.com  Tue Jul 19 19:46:25 2005
From: jjinux at gmail.com (Shannon -jj Behrens)
Date: Tue, 19 Jul 2005 10:46:25 -0700
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1121571455.24386.171.camel@plope.dyndns.org>
References: <1121571455.24386.171.camel@plope.dyndns.org>
Message-ID: <c41f67b905071910465763e330@mail.gmail.com>

It seems to me that authentication and authorization should be a put
into a library that isn't bound to the Web at all.  I thought that
those guys reimplementing J2EE in Python did that. :-/

Oh well,
-jj

On 7/16/05, Chris McDonough <chrism at plope.com> wrote:
> I've also been putting a bit of thought into middleware configuration,
> although maybe in a different direction.  I'm not too concerned yet
> about being able to introspect the configuration of an individual
> component.  Maybe that's because I haven't thought about the problem
> enough to be concerned about it.  In the meantime, though, I *am*
> concerned about being able to configure a middleware "pipeline" easily
> and have it work.
> 
> I've been attempting to divine a declarative way to configure a pipeline
> of WSGI middleware components.  This is simple enough through code,
> except that at least in terms of how I'm attempting to factor my
> middleware, some components in the pipeline may have dependencies on
> other pipeline components.
> 
> For example, it would be useful in some circumstances to create separate
> WSGI components for user identification and user authorization.  The
> process of identification -- obtaining user credentials from a request
> -- and user authorization  -- ensuring that the user is who he says he
> is by comparing the credentials against a data source -- are really
> pretty much distinct operations.  There might also be a "challenge"
> component which forces a login dialog.
> 
> In practice, I don't know if this is a truly useful separation of
> concerns that need to be implemented in terms of separate components in
> the middleware pipeline (I see that paste.login conflates them), it's
> just an example.  But at very least it would keep each component simpler
> if the concerns were factored out into separate pieces.
> 
> But in the example I present, the "authentication" component depends
> entirely on the result of the "identification" component.  It would be
> simple enough to glom them together by using a distinct environment key
> for the identification component results and have the authentication
> component look for that key later in the middleware result chain, but
> then it feels like you might as well have written the whole process
> within one middleware component because the coupling is pretty strong.
> 
> I have a feeling that adapters fit in here somewhere, but I haven't
> really puzzled that out yet.  I'm sure this has been discussed somewhere
> in the lifetime of WSGI but I can't find much in this list's archives.
> 
> > Lately I've been thinking about the role of Paste and WSGI and
> > whatnot. Much of what makes a Paste component Pastey is
> > configuration;  otherwise the bits are just independent pieces of
> > middleware, WSGI applications, etc.  So, potentially if we can agree
> > on configuration, we can start using each other's middleware more
> > usefully.
> >
> > I think we should avoid questions of configuration file syntax for
> > now.  Lets instead simply consider configuration consumers.  A
> > standard would consist of:
> >
> > * A WSGI environment key (e.g., 'webapp01.config')
> > * A standard for what goes in that key (e.g., a dictionary object)
> > * A reference implementation of the middleware
> > * Maybe a non-WSGI-environment way to access the configuration (like
> > paste.CONFIG, which is a global object that dispatches to per-request
> > configuration objects) -- in practice this is really really useful, as
> > you don't have to pass the configuration object around.
> >
> > There's some other things we have to consider, as configuration syntaxes
> > do effect the configuration objects significantly.  So, the standard for
> > what goes in the key has to take into consideration some possible
> > configuration syntaxes.
> >
> > The obvious starting place is a dictionary-like object.  I would suggest
> > that the keys should be valid Python identifiers.  Not all syntaxes
> > require this, but some do.  This restriction simply means that
> > configuration consumers should try to consume Python identifiers.
> >
> > There's also a question about name conflicts (two consumers that are
> > looking for the same key), and whether nested configuration should be
> > preferred, and in what style.
> >
> > Note that the standard we decide on here doesn't have to be the only way
> > the object can be accessed.  For instance, you could make your
> > configuration available through 'myframework.config', and create a
> > compliant wrapper that lives in 'webapp01.config', perhaps even doing
> > different kinds of mapping to fix convention differences.
> >
> > There's also a question about what types of objects we can expect in the
> > configuration.  Some input styles (e.g., INI and command line) only
> > produce strings.  I think consumers should treat strings (or maybe a
> > special string subclass) specially, performing conversions as necessary
> > (e.g., 'yes'->True).
> >
> > Thoughts?
> 
> 
> 
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/jjinux%40gmail.com
> 


-- 
I have decided to switch to Gmail, but messages to my Yahoo account will
still get through.

From ianb at colorstudy.com  Tue Jul 19 19:56:02 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 19 Jul 2005 12:56:02 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com>
References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>
	<5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>
	<5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com>
Message-ID: <42DD3EB2.4090605@colorstudy.com>

Phillip J. Eby wrote:
>> In many cases, the middleware is modifying or watching the 
>> application's output.  For instance, catching a 401 and turning that 
>> into the appropriate login -- which might mean producing a 401, a 
>> redirect, a login page via internal redirect, or whatever.
> 
> 
> And that would be legitimate middleware, except I don't think that's 
> what you really want for that use case.  What you want is an 
> "authentication service" that you just call to say, "I need a login" and 
> get the login information from, and return its return value so that it 
> does start_response for you and sends the right output.

Like I mentioned in my response to Chris, this kind of contract about 
return values is a difficult one to implement.  A "return 401 status" 
contract is pretty simple, in that it's vague in a way that fits with 
typical frameworks -- they all have a way of changing the status, and 
most have a way of aborting with that kind of error.

> The difference is obliviousness; if you want to *wrap* an application 
> not written to use WSGI services, then it makes sense to make it 
> middleware.  If you're writing a new application, just have it use 
> components instead of mocking up a 401 just so you can use the existing 
> middleware.

Who's writing new applications?  OK... I guess a lot of people are.  I 
may be more focused on retrofitting compared to other people.

> Notice, by the way, that it's trivial to create middleware that detects 
> the 401 and then *invokes the service*.  So, it's more reusable to make 
> services be services, and middleware be wrappers to apply services to 
> oblivious applications.

Yes, this would be the single-middleware-multiple-service model.  I 
don't understand exactly how services work myself, so I can't write 
that, but I'm certainly interested in examples.  Well... I'll throw out 
one just for the heck of it:

class ServiceMiddleware(object):

     def __init__(self, app):
         self.app = app
     def __call__(self, environ, start_response):
         context = environ['webapp.service_context'] = ServiceContext()
         # You could also do some thread-local registering of this
         # context at this point
         def replacement_start_response(status, headers):
             status, headers, writer = context.start_response(
                 start_response, status, headers)
             return writer
         app_iter = self.app(environ, start_response)
         return context.app_iter(app_iter)

class ServiceContext(object):
     def __init__(self):
         self.services = []
     def get_service(self, name):
         ... something I don't understand ...
         self.services.append(service)
         return service
     def start_response(self, start_response, status, headers):
         for service in self.services:
             if hasattr(service, 'munge_start_response'):
                 status, headers = service.munge_start_response(status, 
headers)
         return start_response(status, headers)
     def app_iter(self, app_iter):
         return app_iter


And ServiceContext should also ask services if they care to munge_body 
or something, and then pipe all calls to the writer and all the parts of 
app_iter into that service if so.  And it should let services catch 
exceptions.

>> I guess you could make one Uber Middleware that could handle the 
>> services' needs to rewrite output, watch for errors and finalize 
>> resources, etc.
> 
> 
> Um, it's called a library of functions.  :)  WSGI was designed to make 
> it easy to use library calls to do stuff.  If you don't need the 
> obliviousness, then library calls (or service calls) are the Obvious Way 
> To Do It.

I do use library calls when possible; and even when not possible I 
(generally) try to make the middleware as small as possible, just 
handling the logic of the transformation.  But mostly libraries don't 
need to be discussed here, because they are simple ;)

There are perhaps a few places where standardization of some library 
manipulations would be useful.  E.g., get_cookies() and 
parse_querystring() in paste.wsgilib 
(http://svn.pythonpaste.org/Paste/trunk/paste/wsgilib.py) could be 
standardized, and then WSGI-based libraries that were interested in the 
request could probably retrieve the frameworks' parsed version of URL 
and cookie parameters.

>>> Really, the only stuff that actually needs to be middleware, is stuff 
>>> that wraps an *oblivious* application; i.e., the application doesn't 
>>> know it's there.  If it's a service the application uses, then it 
>>> makes more sense to create a service management mechanism for 
>>> configuration and deployment of WSGI applications.
>>
>>
>> Applications always care about the things around them, so any 
>> convention that middleware and applications be unaware of each other 
>> would rule out most middleware.
> 
> 
> Yes, exactly!  Now you understand me.  :)  If the application is what 
> wants the service, let it just call the service.  Middleware is 
> *overhead* in that case.

Well, no, I don't really understand you, but if it makes you feel 
better... ;)

For instance, applications may be interested to know there's a piece of 
middleware that will catch unexpected exceptions.  For instance, it 
might see that and reraise unexpected exceptions instead of providing 
its own error report.  But it's not "overhead" or something the 
application wants handled lazily.  It's just useful information about 
the environment.

>>> I hope this isn't too vague; I've been wanting to say something about 
>>> this since I saw your blog post about doing transaction services in 
>>> WSGI, as that was when I first understood why you were making 
>>> everything into middleware.  (i.e., to create a poor man's substitute 
>>> for "placeful" services and utilities as found in PEAK and Zope 3.)
>>
>>
>> What do they provide that middleware does not?
> 
> 
> Well, some services may be things the application needs only when it's 
> being initially configured.  Or maybe the service is something like a 
> scheduler that gives timed callbacks.  There are lots of non-per-request 
> services that make sense, so forcing service access to be only through 
> the environment makes for cruftier code, since you now have to keep 
> track of whether you've been called before, and then do any setup during 
> your first web hit.  For that matter, some service configuration might 
> need to be dynamically determined, based on the application object 
> requesting it.
> 
> But the main thing they provide that middleware does not is simplicity 
> and ease of use.  I understand your desire to preserve the appearance of 
> neutrality, but you are creating new web frameworks here, and making 
> them ugly doesn't make them any less of a framework.  :)
> 
> What's worse is that by tying the service access mechanism to the 
> request environment, you're effectively locking out frameworks like PEAK 
> and Zope 3 from being able to play, and that goes against (IMO) the 
> goals of WSGI, which is to get more and more frameworks to be able to 
> play, and give them *incentive* to merge and dissolve and be assimilated 
> into the primordial soup of WSGI-based integration, or at least to be 
> competitors for various implementation/use case niches in the WSGI 
> ecosystem.

How is being request-oriented locking them out?  To me this mostly seems 
like an aesthetics and implementation discussion; mapping from one to 
the other doesn't seem that hard.  If you map from request to service, 
you do it by putting a little proxy in the request that calls the 
service.  If mapping from service to request, you keep the request 
around somewhere (threadlocal or something) and the service is 
implemented in terms of things found in the request.

> See also my message to Chris just now about why a WSGI service spec can 
> and should follow different rules of engagement than the WSGI spec did; 
> it really isn't necessary to make services ugly for applications in 
> order to make it easy for server implementors, as it was for the WSGI 
> core spec.  In fact, the opposite condition applies: the service stack 
> should make it easy and clean for applications to use WSGI services, 
> because they're the things that will let them hide WSGI implementation 
> details in the absence of an existing web framework.

With perhaps a couple exceptions, I don't think WSGI is that bad for the 
application side.  Not that you'll write to WSGI directly most of the 
time, but if you do it's still not that bad.  WSGI is dumb and crude, 
which is a feature.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From fuzzybr80 at gmail.com  Tue Jul 19 20:08:41 2005
From: fuzzybr80 at gmail.com (ChunWei Ho)
Date: Wed, 20 Jul 2005 02:08:41 +0800
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <mailman.7767.1121744822.10511.web-sig@python.org>
References: <mailman.7767.1121744822.10511.web-sig@python.org>
Message-ID: <31f07fc30507191108b01ba7d@mail.gmail.com>

Hi, I have been looking at WSGI for only a few weeks, but had some
ideas similar (I hope) to what is being discussed that I'll put down
here. I'm new to this so I beg your indulgence if this is heading down
the wrong track or wildly offtopic :)

It seems to me that a major drawback of WSGI middleware that is
preventing flexible configuration/chain paths is that the application
to be run has to be determined at init time. It is much flexible if we
were able to specify what application to run and configuration
information at call time - the middleware would be able to approximate
a service of sorts.

An example:
I have an WSGI application simulating a file-server, and I wish to
authenticate users and gzip served files where application. In a
middleware chain it would probably work out to be:
application = authmiddleware(gzipmiddleware(fileserverapp))

For example, a simplified gzipping middleware consists of:
class gzipmiddleware:
  def __init__(self, application, configparam):
     self._application = application
     ....
  def __call__(self, environ, start_response):
     do start_response
     call self._application(environ, start_response) as iterable    
     get each iterator output and zip and yield it.

and the fileserverapp, with doGET, doPUT, doPOST subapplications that
do the actual processing:
def fileserverapp(environ, start_response):
   if(GET): return doGET(environ, start_response)
   if(POST): return doPOST(environ, start_response)
   if(PUT): return doPUT(environ, start_response)

Now, the application-server is specific on what it wishes to gzip
(usually only on GET or POST entity responses and only if the mimetype
allows it). But this level of logic is not to be placed in the
gzipping middleware, since its configurable on the webserver. So in
order to tell the gzipmiddleware whether to gzip or not:

(a) Add a key in environ, say environ[gzip.do_gzip] = True or False to
inform the gzipmiddleware to do gzip or not. This does mean that
gzipmiddleware remains in the chain, irregardless of whether it is
needed or not.

(b) 
Have chain application = authmiddleware(fileserverapp)
Use Handlers, as Ian suggested, and in the fileserverapp's init:
Handlers(
  IfTest(method=GET,MimeOkForGzip=True, RunApp=gzipmiddleware(doGET)), 
  IfTest(method=GET,MimeOkForGzip=False, RunApp=doGET), 
  IfTest(method=POST,MimeOkForGzip=True, RunApp=gzipmiddleware(doPOST)), 
  IfTest(method=POST,MimeOkForGzip=False, RunApp=doPOST), 
  IfTest(method=PUT, RunApp=doPOST) 
) 

(c)
Make gzipmiddleware a service in the following form:
class gzipmiddleware:
  def __init__(self, application=None, configparam=None):
     self._application = application
     ....
  def __call__(self, environ, start_response, application=None,
configparam=None):
     if application and configparam is specified, use them instead of
the init values
         do start_response
         call self._application(environ, start_response) as iterable    
         get each iterator output and zip and yield it.

This "middleware" is still compatible with PEP-333, but can also be used as:
#on main application initialization, create a gzipservice and put it
in environ without
#specifying application or configparams for init():
environ['service.gzip'] = gzipmiddleware()

Modify fileserverapp to:
def fileserverapp(environ, start_response):
   if(GET): 
       if(mimetype ok for gzip):
           gzipservice = environ['service.gzip']
           return gzipservice(environ, start_response, doGET, gzipconfigparams) 
       else: return doGET(environ, start_response)
   if(POST): 
       if(mimetype ok for gzip):
           gzipservice = environ['service.gzip']
           return gzipservice(environ, start_response, doPOST,
gzipconfigparams)
       else: return doPOST(environ, start_response)
   if(PUT): doPUT(environ, start_response)

The main difference here is that you don't have to initialize full
application chains for each possible middleware-path for the request.
This would be very useful if you had many middleware in the chain with
many permutations as to which middleware are needed

You could also instead put a service factory object into environ, it
will return the gzipmiddleware object as a service if already exist,
otherwise it will create it and then return it.

From mike_mp at zzzcomputing.com  Tue Jul 19 20:25:04 2005
From: mike_mp at zzzcomputing.com (mike bayer)
Date: Tue, 19 Jul 2005 14:25:04 -0400 (EDT)
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42DD3835.1040300@colorstudy.com>
References: <42DA13CE.2080208@colorstudy.com>
	<1121571455.24386.171.camel@plope.dyndns.org>
	<42D9DEBA.4080609@colorstudy.com>
	<1121578280.24386.228.camel@plope.dyndns.org>
	<42DA13CE.2080208@colorstudy.com>
	<5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com>
	<42DD3835.1040300@colorstudy.com>
Message-ID: <6107.66.192.34.8.1121797504.squirrel@66.192.34.8>


While I'm not following every detail of this discussion, this line caught
my attention -

Ian Bicking said:
> Really, if you are building user-visible standard libraries, you are
> building a framework.

only because Fowler recently posted something that made me think about
this, where he distinguishes a "framework" as being something which
employs the "inversion of control" principle, as Paste does, versus a
"library" which does not: 
http://martinfowler.com/bliki/InversionOfControl.html .

I know theres a lot of discussion over "A Framework ? Not a Framework?"
lately, largely in response to the recent meme "more frameworks == BAD"
that seems to be getting around these days; perhaps Fowler's distinction
is helpful...I hadn't thought of it that way before.

From jjinux at gmail.com  Tue Jul 19 22:33:02 2005
From: jjinux at gmail.com (Shannon -jj Behrens)
Date: Tue, 19 Jul 2005 13:33:02 -0700
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com>
References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com>
	<42DA1695.7020304@colorstudy.com>
	<5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com>
Message-ID: <c41f67b90507191333659e63fe@mail.gmail.com>

Phillip, 

100% agreed.

Libraries are more flexible than middleware because you get to decide
when, if, and how they get called.  Middleware has its place, but it
doesn't make sense to try to package all library code as middleware. 
Even when you do write middleware, it should simply link in library
code so that you can use the library code in the absence of the
middleware.

Consider an XSLT middleware layer.  It makes sense to have such a
thing.  It doesn't make sense to only be able to use the XSLT code via
the middleware interface.  As much as possible, you want to be able to
interact with libraries directly.

Best Regards,
-jj

On 7/17/05, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 03:28 AM 7/17/2005 -0500, Ian Bicking wrote:
> >Phillip J. Eby wrote:
> >>What I think you actually need is a way to create WSGI application
> >>objects with a "context" object.  The "context" object would have a
> >>method like "get_service(name)", and if it didn't find the service, it
> >>would ask its parent context, and so on, until there's no parent context
> >>to get it from.  The web server would provide a way to configure a root
> >>or default context.
> >
> >I guess I'm treating the request environment as that context.  I don't
> >really see the problem with that...?
> 
> It puts a layer in the request call stack for each service you want to
> offer, versus *no* layers for an arbitrary number of services.  It adds
> work to every request to put stuff into the environment, then take it out
> again, versus just getting what you want in the first place.
> 
> 
> >In many cases, the middleware is modifying or watching the application's
> >output.  For instance, catching a 401 and turning that into the
> >appropriate login -- which might mean producing a 401, a redirect, a login
> >page via internal redirect, or whatever.
> 
> And that would be legitimate middleware, except I don't think that's what
> you really want for that use case.  What you want is an "authentication
> service" that you just call to say, "I need a login" and get the login
> information from, and return its return value so that it does
> start_response for you and sends the right output.
> 
> The difference is obliviousness; if you want to *wrap* an application not
> written to use WSGI services, then it makes sense to make it
> middleware.  If you're writing a new application, just have it use
> components instead of mocking up a 401 just so you can use the existing
> middleware.
> 
> Notice, by the way, that it's trivial to create middleware that detects the
> 401 and then *invokes the service*.  So, it's more reusable to make
> services be services, and middleware be wrappers to apply services to
> oblivious applications.
> 
> 
> >I guess you could make one Uber Middleware that could handle the services'
> >needs to rewrite output, watch for errors and finalize resources, etc.
> 
> Um, it's called a library of functions.  :)  WSGI was designed to make it
> easy to use library calls to do stuff.  If you don't need the
> obliviousness, then library calls (or service calls) are the Obvious Way To
> Do It.
> 
> 
> >   This isn't unreasonable, and I've kind of expected one to evolve at
> > some point.  But you'll have to say more to get me to see how "services"
> > is a better way to manage this.
> 
> I'm saying that middleware can use services, and applications can use
> services.  Making applications *have to* use middleware in order to use the
> services is wasteful of both computer time and developer brainpower.  Just
> let them use services directly when the situation calls for it, and you can
> always write middleware to use the services when you encounter the
> occasional (and ever-rarer with time) oblivious application.
> 
> 
> >>Really, the only stuff that actually needs to be middleware, is stuff
> >>that wraps an *oblivious* application; i.e., the application doesn't know
> >>it's there.  If it's a service the application uses, then it makes more
> >>sense to create a service management mechanism for configuration and
> >>deployment of WSGI applications.
> >
> >Applications always care about the things around them, so any convention
> >that middleware and applications be unaware of each other would rule out
> >most middleware.
> 
> Yes, exactly!  Now you understand me.  :)  If the application is what wants
> the service, let it just call the service.  Middleware is *overhead* in
> that case.
> 
> 
> >>I hope this isn't too vague; I've been wanting to say something about
> >>this since I saw your blog post about doing transaction services in WSGI,
> >>as that was when I first understood why you were making everything into
> >>middleware.  (i.e., to create a poor man's substitute for "placeful"
> >>services and utilities as found in PEAK and Zope 3.)
> >
> >What do they provide that middleware does not?
> 
> Well, some services may be things the application needs only when it's
> being initially configured.  Or maybe the service is something like a
> scheduler that gives timed callbacks.  There are lots of non-per-request
> services that make sense, so forcing service access to be only through the
> environment makes for cruftier code, since you now have to keep track of
> whether you've been called before, and then do any setup during your first
> web hit.  For that matter, some service configuration might need to be
> dynamically determined, based on the application object requesting it.
> 
> But the main thing they provide that middleware does not is simplicity and
> ease of use.  I understand your desire to preserve the appearance of
> neutrality, but you are creating new web frameworks here, and making them
> ugly doesn't make them any less of a framework.  :)
> 
> What's worse is that by tying the service access mechanism to the request
> environment, you're effectively locking out frameworks like PEAK and Zope 3
> from being able to play, and that goes against (IMO) the goals of WSGI,
> which is to get more and more frameworks to be able to play, and give them
> *incentive* to merge and dissolve and be assimilated into the primordial
> soup of WSGI-based integration, or at least to be competitors for various
> implementation/use case niches in the WSGI ecosystem.
> 
> See also my message to Chris just now about why a WSGI service spec can and
> should follow different rules of engagement than the WSGI spec did; it
> really isn't necessary to make services ugly for applications in order to
> make it easy for server implementors, as it was for the WSGI core spec.  In
> fact, the opposite condition applies: the service stack should make it easy
> and clean for applications to use WSGI services, because they're the things
> that will let them hide WSGI implementation details in the absence of an
> existing web framework.
> 
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/jjinux%40gmail.com
> 


-- 
I have decided to switch to Gmail, but messages to my Yahoo account will
still get through.

From fuzzybr80 at gmail.com  Wed Jul 20 05:34:07 2005
From: fuzzybr80 at gmail.com (ChunWei Ho)
Date: Wed, 20 Jul 2005 11:34:07 +0800
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <31f07fc30507191108b01ba7d@mail.gmail.com>
References: <mailman.7767.1121744822.10511.web-sig@python.org>
	<31f07fc30507191108b01ba7d@mail.gmail.com>
Message-ID: <31f07fc30507192034438a8617@mail.gmail.com>

> (b)
> Have chain application = authmiddleware(fileserverapp)
> Use Handlers, as Ian suggested, and in the fileserverapp's init:
> Handlers(
>   IfTest(method=GET,MimeOkForGzip=True, RunApp=gzipmiddleware(doGET)),
>   IfTest(method=GET,MimeOkForGzip=False, RunApp=doGET),
>   IfTest(method=POST,MimeOkForGzip=True, RunApp=gzipmiddleware(doPOST)),
>   IfTest(method=POST,MimeOkForGzip=False, RunApp=doPOST),
>   IfTest(method=PUT, RunApp=doPOST)
> )

It was Graham who suggested the use of Handlers initially. Sincere
apologies for my confusion.

> (c)
> Make gzipmiddleware a service in the following form:
> class gzipmiddleware:
>   def __init__(self, application=None, configparam=None):
>      self._application = application
>      ....
>   def __call__(self, environ, start_response, application=None,
> configparam=None):
>      if application and configparam is specified, use them instead of
> the init values
>          do start_response
>          call self._application(environ, start_response) as iterable
>          get each iterator output and zip and yield it.
> 
> This "middleware" is still compatible with PEP-333, but can also be used as:
> #on main application initialization, create a gzipservice and put it
> in environ without
> #specifying application or configparams for init():
> environ['service.gzip'] = gzipmiddleware()
> 
> Modify fileserverapp to:
> def fileserverapp(environ, start_response):
>    if(GET):
>        if(mimetype ok for gzip):
>            gzipservice = environ['service.gzip']
>            return gzipservice(environ, start_response, doGET, gzipconfigparams)
>        else: return doGET(environ, start_response)
>    if(POST):
>        if(mimetype ok for gzip):
>            gzipservice = environ['service.gzip']
>            return gzipservice(environ, start_response, doPOST,
> gzipconfigparams)
>        else: return doPOST(environ, start_response)
>    if(PUT): doPUT(environ, start_response)
> 
> The main difference here is that you don't have to initialize full
> application chains for each possible middleware-path for the request.
> This would be very useful if you had many middleware in the chain with
> many permutations as to which middleware are needed
>
> You could also instead put a service factory object into environ, it
> will return the gzipmiddleware object as a service if already exist,
> otherwise it will create it and then return it.
>

From mo.babaei at gmail.com  Thu Jul 21 13:05:37 2005
From: mo.babaei at gmail.com (mohammad babaei)
Date: Thu, 21 Jul 2005 15:35:37 +0430
Subject: [Web-SIG] Session Handling in python
Message-ID: <5bf3a41f05072104058ffffbb@mail.gmail.com>

Hi,
what is the best way for "Session Handling" in python for production use ?

regards
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/web-sig/attachments/20050721/930e1a59/attachment.htm

From mike_mp at zzzcomputing.com  Thu Jul 21 18:37:40 2005
From: mike_mp at zzzcomputing.com (mike bayer)
Date: Thu, 21 Jul 2005 12:37:40 -0400 (EDT)
Subject: [Web-SIG] Session Handling in python
In-Reply-To: <5bf3a41f05072104058ffffbb@mail.gmail.com>
References: <5bf3a41f05072104058ffffbb@mail.gmail.com>
Message-ID: <20114.66.192.34.8.1121963860.squirrel@66.192.34.8>

theres a mod_python FAQ entry on this which names several packages for
session management:

http://www.modpython.org/FAQ/faqw.py?req=show&file=faq03.008.htp

the first one mentioned is my own, which can adapt to mod_python, CGI and
WSGI interfaces.

mohammad babaei said:
> Hi,
> what is the best way for "Session Handling" in python for production use ?
>
> regards
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe:
> http://mail.python.org/mailman/options/web-sig/mike_mp%40zzzcomputing.com
>


From jjinux at gmail.com  Thu Jul 21 19:15:21 2005
From: jjinux at gmail.com (Shannon -jj Behrens)
Date: Thu, 21 Jul 2005 10:15:21 -0700
Subject: [Web-SIG] Session Handling in python
In-Reply-To: <20114.66.192.34.8.1121963860.squirrel@66.192.34.8>
References: <5bf3a41f05072104058ffffbb@mail.gmail.com>
	<20114.66.192.34.8.1121963860.squirrel@66.192.34.8>
Message-ID: <c41f67b905072110156036465d@mail.gmail.com>

If you use Aquarium, it has its own session infrastructure, supporting
in-memory sessions, database sessions, or whatever other backends you
want to plug in.  I think most of the other frameworks do the same.

Best Regards,
-jj

On 7/21/05, mike bayer <mike_mp at zzzcomputing.com> wrote:
> theres a mod_python FAQ entry on this which names several packages for
> session management:
> 
> http://www.modpython.org/FAQ/faqw.py?req=show&file=faq03.008.htp
> 
> the first one mentioned is my own, which can adapt to mod_python, CGI and
> WSGI interfaces.
> 
> mohammad babaei said:
> > Hi,
> > what is the best way for "Session Handling" in python for production use ?
> >
> > regards
> > _______________________________________________
> > Web-SIG mailing list
> > Web-SIG at python.org
> > Web SIG: http://www.python.org/sigs/web-sig
> > Unsubscribe:
> > http://mail.python.org/mailman/options/web-sig/mike_mp%40zzzcomputing.com
> >
> 
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/jjinux%40gmail.com
> 


-- 
I have decided to switch to Gmail, but messages to my Yahoo account will
still get through.

From chrism at plope.com  Fri Jul 22 22:38:07 2005
From: chrism at plope.com (Chris McDonough)
Date: Fri, 22 Jul 2005 16:38:07 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <31f07fc30507191108b01ba7d@mail.gmail.com>
References: <mailman.7767.1121744822.10511.web-sig@python.org>
	<31f07fc30507191108b01ba7d@mail.gmail.com>
Message-ID: <1122064687.8446.2.camel@localhost.localdomain>

I've had a stab at creating a simple WSGI deployment implementation.
I use the term "WSGI component" in here as shorthand to indicate all
types of WSGI implementations (server, application, gateway).

The primary deployment concern is to create a way to specify the
configuration of an instance of a WSGI component, preferably within a
declarative configuration file.  A secondary deployment concern is to
create a way to "wire up" components together into a specific
deployable "pipeline".  

A strawman implementation that solves both issues via the
"configurator", which would be presumed to live in "wsgiref". Currently
it lives in a package named "wsgiconfig" on my laptop.  This module
follows.

    """ Configurator for establishing a WSGI pipeline """

    from ConfigParser import ConfigParser
    import types

    def configure(path):
        config = ConfigParser()
        if isinstance(path, types.StringTypes):
            config.readfp(open(path))
        else:
            config.readfp(path)

        appsections = []

        for name in config.sections():
            if name.startswith('application:'):
                appsections.append(name)
            elif name == 'pipeline':
                pass
            else:
                raise ValueError, '%s is not a valid section name'

        app_defs = {}

        for appsection in appsections:
            app_config_file = config.get(appsection, 'config')
            app_factory_name = config.get(appsection, 'factory')
            app_name = appsection.split('application:')[1]
            if app_config_file is None:
                raise ValueError, ('application section %s requires a
"config" '
                                   'option' % app_config_file)
            if app_factory_name is None:
                raise ValueError, ('application %s requires a "factory"'
                                   ' option' % app_factory_name)
            app_defs[app_name] = {'config':app_config_file,
                                  'factory':app_factory_name}

        if not config.has_section('pipeline'):
            raise ValueError, 'must have a "pipeline" section in config'

        pipeline_str = config.get('pipeline', 'apps')
        if pipeline_str is None:
            raise ValueError, ('must have an "apps" definition in the '
                               'pipeline section')

        pipeline_def = pipeline_str.split()

        next = None

        while pipeline_def:
            app_name = pipeline_def.pop()
            app_def = app_defs.get(app_name)
            if app_def is None:
                raise ValueError, ('appname %s os defined in pipeline '
                                   '%s butno application is defined '
                                   'with that name')
            factory_name = app_def['factory']
            factory = import_by_name(factory_name)
            config_file = app_def['config']
            app_factory = factory(config_file)
            app = app_factory(next)
            next = app

        if not next:
            raise ValueError, 'no apps defined in pipeline'
        return next

    def import_by_name(name):
        if not "." in name:
            raise ValueError("unloadable name: " + `name`)
        components = name.split('.')
        start = components[0]
        g = globals()
        package = __import__(start, g, g)
        modulenames = [start]
        for component in components[1:]:
            modulenames.append(component)
            try:
                package = getattr(package, component)
            except AttributeError:
                n = '.'.join(modulenames)
                package = __import__(n, g, g, component)
        return package

  We configure a pipeline based on a config file, which
  creates and chains two "sample" WSGI applications together.

  To do this, we use a ConfigParser-format config file named
  'myapplication.conf' that looks like this::

    [application:sample1]
    config = sample1.conf
    factory = wsgiconfig.tests.sample_components.factory1

    [application:sample2]
    config = sample2.conf
    factory = wsgiconfig.tests.sample_components.factory2

    [pipeline]
    apps = sample1 sample2

  The configurator exposes a function that accepts a single argument,
  "configure".

    >>> from wsgiconfig.configurator import configure
    >>> appchain = configure('myapplication.conf')

  The "sample_components" module referred to in the
  'myapplication.conf' file application definitions might look like
  this::

      class sample1:
          """ middleware """
          def __init__(self, app):
              self.app = app
          def __call__(self, environ, start_response):
              environ['sample1'] = True
              return self.app(environ, start_response)

      class sample2:
           """ end-point app """
          def __init__(self, app):
              self.app = app

          def __call__(self, environ, start_response):
              environ['sample2'] = True
              return ['return value 2']

      def factory1(filename):
          # this app requires no configuration, but if it did, we would
          # parse the file represented by filename and do some config
          return sample1

      def factory2(filename):
          # this app requires no configuration, but if it did, we would
          # parse the file represented by filename and do some config
          return sample2

  The appchain represents an automatically constructed pipeline of
  WSGI components.  Each application in the chain is constructed from
  a factory.

    >>> appchain.__class__.__name__ # sample1 (middleware)
    'sample1'
    >>> appchain.app.__class__.__name__  # sample2 (application)
    'sample2'

  Calling the "appchain" in this example results in the keys "sample1"
  and "sample2" being available in the environment, and what is
  returned is the result of the application, which is the list
  ['return value 2'].

Potential points of contention

 - The WSGI configurator assumes that you are willing to write WSGI
   component factories which accept a filename as a config file.  This
   factory returns *another* factory (typically a class) that accepts
   "the next" application in the pipeline chain and returns a WSGI
   application instance.  This pattern is necessary to support
   argument currying across a declaratively configured pipeline,
   because the WSGI spec doesn't allow for it.  This is more contract
   than currently exists in the WSGI specification but it would be
   trivial to change existing WSGI components to adapt to this
   pattern.  Or we could adopt a pattern/convention that removed one
   of the factories, passing both the "next" application and the
   config file into a single factory function.  Whatever.  In any
   case, in order to do declarative pipeline configuration, some
   convention will need to be adopted.  The convention I'm advocating
   above seems to already have been for the current crop of middleware
   components (using a factory which accepts the application as the
   first argument).

 - Pipeline deployment configuration should be used only to configure
   essential information about pipeline and individual pipeline
   components.  Where complex service data configuration is necessary,
   the component which implements a service should provide its own
   external configuration mechanism.  For example, if an XSL service
   is implemented as a WSGI component, and it needs configuration
   knobs of some kind, these knobs should not live within the WSGI
   pipeline deployment file.  Instead, each component should have its
   own configuration file.  This is the purpose (undemonstrated above)
   of allowing an [application] section to specify a config filename.

 - Some people have seem to be arguing that there should be a single
   configuration format across all WSGI applications and gateways to
   configure everything about those components.  I don't think this is
   workable.  I think the only thing that is workable is to recommend
   to WSGI component authors that they make their components
   configurable using some configuration file or other type of path
   (URL, perhaps).  The composition, storage, and format of all other
   configuration data for the component should be chosen by the
   author.

 - Threads which discussed this earlier on the web-sig list included
   the idea that a server or gateway should be able to "find" an
   end-point application based on a lookup of source file/module +
   attrname specified in the server's configuration.  I'm suggesting
   instead that the mapping between servers, gateways, and
   applications be a pipeline and that the pipeline itself have a
   configuration definition that may live outside of any particular
   server, gateway, or application.  The pipeline definition(s) would
   wire up the servers, gateways, and applications itself.  The
   pipeline definition *could* be kept amongs the files representing a
   particular server instance on the filesystem (and this might be the
   default), but it wouldn't necessarily have to be.  This might just
   be semantics.

 - There were a few mentions of being able to configure/create a WSGI
   application at request time by passing name/value string pairs
   "through the pipeline" that would ostensibly be used to create a
   new application instance (thereby dynamically extending or
   modifying the pipeline).  I think it's fine if a particular
   component does this, but I'm suggesting that a canonization of the
   mechanism used to do this is not necessary and that it's useful to
   have the ability to define static pipelines for deployment.

 - If elements in the pipeline depend on "services" (ala
   Paste-as-not-a-chain-of-middleware-components), it may be
   advantageous to create a "service manager" instead of deploying
   each service as middleware.  The "service manager" idea is not a
   part of the deployment spec.  The service manager would itself
   likely be implemented as a piece of middleware or perhaps just a
   library.


On Wed, 2005-07-20 at 02:08 +0800, ChunWei Ho wrote:
> Hi, I have been looking at WSGI for only a few weeks, but had some
> ideas similar (I hope) to what is being discussed that I'll put down
> here. I'm new to this so I beg your indulgence if this is heading down
> the wrong track or wildly offtopic :)
> 
> It seems to me that a major drawback of WSGI middleware that is
> preventing flexible configuration/chain paths is that the application
> to be run has to be determined at init time. It is much flexible if we
> were able to specify what application to run and configuration
> information at call time - the middleware would be able to approximate
> a service of sorts.


....


From ianb at colorstudy.com  Sat Jul 23 00:26:01 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 22 Jul 2005 17:26:01 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1122064687.8446.2.camel@localhost.localdomain>
References: <mailman.7767.1121744822.10511.web-sig@python.org>	<31f07fc30507191108b01ba7d@mail.gmail.com>
	<1122064687.8446.2.camel@localhost.localdomain>
Message-ID: <42E17279.5040104@colorstudy.com>

Chris McDonough wrote:
> I've had a stab at creating a simple WSGI deployment implementation.
> I use the term "WSGI component" in here as shorthand to indicate all
> types of WSGI implementations (server, application, gateway).
> 
> The primary deployment concern is to create a way to specify the
> configuration of an instance of a WSGI component, preferably within a
> declarative configuration file.  A secondary deployment concern is to
> create a way to "wire up" components together into a specific
> deployable "pipeline".  
> 
> A strawman implementation that solves both issues via the
> "configurator", which would be presumed to live in "wsgiref". Currently
> it lives in a package named "wsgiconfig" on my laptop.  This module
> follows.

I have a weird problem reading unhighlighted source.  I dunno why.  But 
anyway, the configuration file is what interests me most...

>   To do this, we use a ConfigParser-format config file named
>   'myapplication.conf' that looks like this::
> 
>     [application:sample1]
>     config = sample1.conf
>     factory = wsgiconfig.tests.sample_components.factory1
> 
>     [application:sample2]
>     config = sample2.conf
>     factory = wsgiconfig.tests.sample_components.factory2
> 
>     [pipeline]
>     apps = sample1 sample2

I think it's confusing to call both these applications.  I think 
"middleware" or "filter" would be better.  I think people understand 
"filter" far better, so I'm inclined to use that.  So...

[application:sample2]
# What is this relative to?  I hate both absolute paths and
# paths relative to pwd equally...
config = sample1.conf
factory = wsgiconfig...

[filter:sample1]
config = sample1.conf
factory = ...

[pipeline]
# The app is unique and special...?
app = sample2
filters = sample1


Well, that's just a first refactoring; I'm having other inclinations...


> Potential points of contention
> 
>  - The WSGI configurator assumes that you are willing to write WSGI
>    component factories which accept a filename as a config file.  This
>    factory returns *another* factory (typically a class) that accepts
>    "the next" application in the pipeline chain and returns a WSGI
>    application instance.  This pattern is necessary to support
>    argument currying across a declaratively configured pipeline,
>    because the WSGI spec doesn't allow for it.  This is more contract
>    than currently exists in the WSGI specification but it would be
>    trivial to change existing WSGI components to adapt to this
>    pattern.  Or we could adopt a pattern/convention that removed one
>    of the factories, passing both the "next" application and the
>    config file into a single factory function.  Whatever.  In any
>    case, in order to do declarative pipeline configuration, some
>    convention will need to be adopted.  The convention I'm advocating
>    above seems to already have been for the current crop of middleware
>    components (using a factory which accepts the application as the
>    first argument).

I hate the proliferation of configuration files this implies.  I 
consider the filters an implementation detail; if they each have 
partitioned configuration then they become a highly exposed piece of the 
architecture.

It's also a lot of management overhead.  Typical middleware takes 0-5 
configuration parameters.  For instance, paste.profilemiddleware is 
perfectly usable with no configuration at all, and only has two parameters.

But this is reasonably easy to resolve -- there's a perfectly good 
configuration section sitting there, waiting to be used:

   [filter:profile]
   factory = paste.profilemiddleware.ProfileMiddleware
   # Show top 50 functions:
   limit = 50

This in no way precludes 'config', which is just a special case of this 
general configuration.  The only real problem is a possible conflict if 
we wanted to add new special names to the configuration, i.e., 
meta-filter-configuration.

Another option is indirection like:

   [filter:profile]
   factory = paste.profilemiddleware.ProfileMiddleware

   [config:profile]
   limit = 50

If we do something like this, the interface for these factories does 
become larger, as we're passing in objects that are more complex than 
strings.

Another thing this could allow is recursive configuration, like:

[application:urlmap]
factory = paste.urlmap.URLMapBuilder
app1 = blog
app1.url = /
app2 = statview
app2.url = /stats
app3 = cms
app3.host = dev.*

[application:blog]
factory = leonardo.wsgifactory
config = myblog.conf

[application:statview]
factory = statview
log_location = /var/logs/apache2

[application:cms]
factory = proxy
location = http://localhost:8080
map = / /cms.php

[pipeline]
app = urlmap


So URLMapBuilder needs the entire configuration file passed in, along 
with the name of the section it is building.  It then reads some keys, 
and builds some named applications, and creates an application that 
delegates based on patterns.  That's the kind of configuration file I 
could really use.

Of course, if I really wanted this I could implement:

[application:configurable]
factory = paste.configurable_pipeline
conf = abetterconffile.conf

But then the configuration file becomes a dummy configuration, and no 
one else gets to use my fancier middleware with the normal configuration 
file.

>  - Pipeline deployment configuration should be used only to configure
>    essential information about pipeline and individual pipeline
>    components.  Where complex service data configuration is necessary,
>    the component which implements a service should provide its own
>    external configuration mechanism.  For example, if an XSL service
>    is implemented as a WSGI component, and it needs configuration
>    knobs of some kind, these knobs should not live within the WSGI
>    pipeline deployment file.  Instead, each component should have its
>    own configuration file.  This is the purpose (undemonstrated above)
>    of allowing an [application] section to specify a config filename.

The intelligent finding of files is important to me with any references 
to filenames.  Working directory is, IMHO, fragile and unreliable. 
Absolute paths are reliable but fragile.

In some cases module names are a more robust way of location resources, 
if those modules are self-describing applications.  Mostly because 
there's a search path.  Several projects encourage this kind of system, 
though I'm not particularly fond of it because it mixes 
installation-specific files with code.

>  - Some people have seem to be arguing that there should be a single
>    configuration format across all WSGI applications and gateways to
>    configure everything about those components.  I don't think this is
>    workable.  I think the only thing that is workable is to recommend
>    to WSGI component authors that they make their components
>    configurable using some configuration file or other type of path
>    (URL, perhaps).  The composition, storage, and format of all other
>    configuration data for the component should be chosen by the
>    author.

While I appreciate the difficulty of agreeing on a configuration format, 
the way this proposal avoids that is by underpowering the deployment 
file so that authors are forced to create other configuration files.

>  - Threads which discussed this earlier on the web-sig list included
>    the idea that a server or gateway should be able to "find" an
>    end-point application based on a lookup of source file/module +
>    attrname specified in the server's configuration.  I'm suggesting
>    instead that the mapping between servers, gateways, and
>    applications be a pipeline and that the pipeline itself have a
>    configuration definition that may live outside of any particular
>    server, gateway, or application.  The pipeline definition(s) would
>    wire up the servers, gateways, and applications itself.  The
>    pipeline definition *could* be kept amongs the files representing a
>    particular server instance on the filesystem (and this might be the
>    default), but it wouldn't necessarily have to be.  This might just
>    be semantics.

I think it's mostly semantics.

>  - There were a few mentions of being able to configure/create a WSGI
>    application at request time by passing name/value string pairs
>    "through the pipeline" that would ostensibly be used to create a
>    new application instance (thereby dynamically extending or
>    modifying the pipeline).  I think it's fine if a particular
>    component does this, but I'm suggesting that a canonization of the
>    mechanism used to do this is not necessary and that it's useful to
>    have the ability to define static pipelines for deployment.

It does concern me that we allow for dynamic systems.  A dynamic system 
allows for more levels of abstraction in deployment, meaning more 
potential for automation.

I think this can be achieved simply by defining a standard based on the 
object interface, where the configuration file itself is a reference 
implementation (that we expect people will usually use).  Semantics from 
the configuration file will leak through, but it's lot easier to deal 
with (for example) a system that can only support string configuration 
values, than a system based on concrete files in a specific format.

>  - If elements in the pipeline depend on "services" (ala
>    Paste-as-not-a-chain-of-middleware-components), it may be
>    advantageous to create a "service manager" instead of deploying
>    each service as middleware.  The "service manager" idea is not a
>    part of the deployment spec.  The service manager would itself
>    likely be implemented as a piece of middleware or perhaps just a
>    library.

That might be best.  It's also quite possible for the factory to 
instantiate more middleware.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From ianb at colorstudy.com  Sat Jul 23 20:46:02 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 23 Jul 2005 13:46:02 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42E17279.5040104@colorstudy.com>
References: <mailman.7767.1121744822.10511.web-sig@python.org>	<31f07fc30507191108b01ba7d@mail.gmail.com>	<1122064687.8446.2.camel@localhost.localdomain>
	<42E17279.5040104@colorstudy.com>
Message-ID: <42E2906A.1060004@colorstudy.com>

>>  To do this, we use a ConfigParser-format config file named
>>  'myapplication.conf' that looks like this::
>>
>>    [application:sample1]
>>    config = sample1.conf
>>    factory = wsgiconfig.tests.sample_components.factory1
>>
>>    [application:sample2]
>>    config = sample2.conf
>>    factory = wsgiconfig.tests.sample_components.factory2
>>
>>    [pipeline]
>>    apps = sample1 sample2

On another tack, I think it's important we consider how 
setuptools/pkg_resources fits into this.  Specifically we should allow:

[application:sample1]
require = WSGIConfig
factory = ...

Since the factory might not be importable until require() is called. 
There's lots of other potential benefits to being able to get that 
information about requirements as well.

Another option is if, instead of a factory (or as an alternative 
alongside it) we make distributions publishable themselves, like:

[application:sample]
egg = MyAppSuite[filebrowser]

Which would require('MyAppSuite[filebrowser]'), and look in 
Paste.egg-info for a configuration file.  The [filebrowser] portion is 
pkg_resource's way of defining a feature, and I figure it can also 
identify a specific application if one package holds multiple 
applications.  However, that feature specification would be optional. 
What the configuration file in egg-info looks like, I don't know. 
Probably just like the original configuration file, except this time 
with a factory.

I don't like the configuration key "egg" though.  But eh, that's a detail.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From chrism at plope.com  Sun Jul 24 02:08:03 2005
From: chrism at plope.com (Chris McDonough)
Date: Sat, 23 Jul 2005 20:08:03 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42E17279.5040104@colorstudy.com>
References: <mailman.7767.1121744822.10511.web-sig@python.org>
	<31f07fc30507191108b01ba7d@mail.gmail.com>
	<1122064687.8446.2.camel@localhost.localdomain>
	<42E17279.5040104@colorstudy.com>
Message-ID: <1122163683.3650.132.camel@plope.dyndns.org>

On Fri, 2005-07-22 at 17:26 -0500, Ian Bicking wrote:

> >   To do this, we use a ConfigParser-format config file named
> >   'myapplication.conf' that looks like this::
> > 
> >     [application:sample1]
> >     config = sample1.conf
> >     factory = wsgiconfig.tests.sample_components.factory1
> > 
> >     [application:sample2]
> >     config = sample2.conf
> >     factory = wsgiconfig.tests.sample_components.factory2
> > 
> >     [pipeline]
> >     apps = sample1 sample2
> 
> I think it's confusing to call both these applications.  I think 
> "middleware" or "filter" would be better.  I think people understand 
> "filter" far better, so I'm inclined to use that.  So...

The reason I called them applications instead of filters is because all
of them implement the WSGI "application" API (they all implement "a
callable that accepts two parameters, environ and start_response").
Some happen to be gateways/filters/middleware/whatever but at least one
is just an application and does no delegation.  In my example above,
"sample2" is not a filter, it is the end-point application.  "sample1"
is a filter, but it's of course also an application too.

Would you maybe rather make it more explicit that some apps are also
gateways, e.g.:

[application:bleeb]
config = bleeb.conf
factory = bleeb.factory

[filter:blaz]
config = blaz.conf
factory = blaz.factory

?  I don't know that there's any way we could make use of the
distinction between the two types in the configurator other than
disallowing people to place an application "before" a filter in a
pipeline through validation.  Is there something else you had in mind?

> [application:sample2]
> # What is this relative to?  I hate both absolute paths and
> # paths relative to pwd equally...
> config = sample1.conf
> factory = wsgiconfig...

This was from a doctest I wrote so I could rely on relative paths,
sorry.  You're right.  Ummmm... we could probably cause use the
environment as "defaults" to ConfigParser inerpolation and set whatever
we need before the configurator is run:

$ export APP_ROOT=/home/chrism/myapplication
$ ./wsgi-configurator.py myapplication.conf

And in myapplication.conf:

[application:sample1]
config = %(APP_ROOT)s/sample1.conf
factory = myapp.sample1.factory

That would probably be the least-effort and most flexible thing to do
and doesn't mandate any particular directory structure.  Of course, we
could provide a convention for a recommended directory structure, but
this gives us an "out" from being painted in to that in specific cases.

> [pipeline]
> # The app is unique and special...?
> app = sample2
> filters = sample1
> 
> 
> 
> Well, that's just a first refactoring; I'm having other inclinations...

I'm not sure whether this is just a stylistic thing or if there's a
reason you want to treat the endpoint app specially.  By definition, in
my implementation, the endpoint app is just the last app mentioned in
the pipeline.

> > Potential points of contention
> > 
> >  - The WSGI configurator assumes that you are willing to write WSGI
> >    component factories which accept a filename as a config file.  This
> >    factory returns *another* factory (typically a class) that accepts
> >    "the next" application in the pipeline chain and returns a WSGI
> >    application instance.  This pattern is necessary to support
> >    argument currying across a declaratively configured pipeline,
> >    because the WSGI spec doesn't allow for it.  This is more contract
> >    than currently exists in the WSGI specification but it would be
> >    trivial to change existing WSGI components to adapt to this
> >    pattern.  Or we could adopt a pattern/convention that removed one
> >    of the factories, passing both the "next" application and the
> >    config file into a single factory function.  Whatever.  In any
> >    case, in order to do declarative pipeline configuration, some
> >    convention will need to be adopted.  The convention I'm advocating
> >    above seems to already have been for the current crop of middleware
> >    components (using a factory which accepts the application as the
> >    first argument).
> 
> I hate the proliferation of configuration files this implies.  I 
> consider the filters an implementation detail; if they each have 
> partitioned configuration then they become a highly exposed piece of the 
> architecture.
> 
> It's also a lot of management overhead.  Typical middleware takes 0-5 
> configuration parameters.  For instance, paste.profilemiddleware is 
> perfectly usable with no configuration at all, and only has two parameters.

True.  The config file param should be optional.  Apps might use the
environment to configure themselves.

> But this is reasonably easy to resolve -- there's a perfectly good 
> configuration section sitting there, waiting to be used:
> 
>    [filter:profile]
>    factory = paste.profilemiddleware.ProfileMiddleware
>    # Show top 50 functions:
>    limit = 50
> 
> This in no way precludes 'config', which is just a special case of this 
> general configuration.  The only real problem is a possible conflict if 
> we wanted to add new special names to the configuration, i.e., 
> meta-filter-configuration.

I think I'd maybe rather see configuration settings for apps that don't
require much configuration to come in as environment variables (maybe
not necessarily in the "environ" namespace that is implied by the WSGI
callable interface but instead in os.environ).  Envvars are
uncontroversial, so they don't cost us any coding time, PEP time, or
brain cycles.

But if you really do want a bunch of config to happen in the pipeline
deployment file itself (definitely to be able to visually inspect it all
in one place would be nice), maybe there could be one optional section
in the pipeline deployment config file that sets keys and values into
os.environ before creating any application instances:

[environment]
app1.hosed = true
app2.disabled = false

... apps could just look for these keys and values in os.environ within
their factories and configure themselves appropriately.  If you didn't
particularly want this, you could not define the section and just do:

$ app1.hosed=true app2.hosed=false ./wsgi-configurator.py \ 
    myapplication.conf

or run a shell script to export these things before running the
configurator.

> Another option is indirection like:
> 
>    [filter:profile]
>    factory = paste.profilemiddleware.ProfileMiddleware
> 
>    [config:profile]
>    limit = 50
> 
> If we do something like this, the interface for these factories does 
> become larger, as we're passing in objects that are more complex than 
> strings.

Sure.  If this were a democracy, I'd vote to use a single well-known
already-existing namespace (os.environ) as a config namespace for all
apps that don't require their own config files instead of baking the
idea of configuration sections for the apps themselves into the
configurator logic.  But I'd like to hear what others besides you and me
think.

> Another thing this could allow is recursive configuration, like:
> 
> [application:urlmap]
> factory = paste.urlmap.URLMapBuilder
> app1 = blog
> app1.url = /
> app2 = statview
> app2.url = /stats
> app3 = cms
> app3.host = dev.*
> 
> [application:blog]
> factory = leonardo.wsgifactory
> config = myblog.conf
> 
> [application:statview]
> factory = statview
> log_location = /var/logs/apache2
> 
> [application:cms]
> factory = proxy
> location = http://localhost:8080
> map = / /cms.php
> 
> [pipeline]
> app = urlmap
> 
> 
> So URLMapBuilder needs the entire configuration file passed in, along 
> with the name of the section it is building.  It then reads some keys, 
> and builds some named applications, and creates an application that 
> delegates based on patterns.  That's the kind of configuration file I 
> could really use.

Maybe one other (less flexible, but declaratively configurable and
simpler to code) way to do this might be by canonizing the idea of
"decision middleware", allowing one component in an otherwise static
pipeline to decide which is the "next" one by executing a Python
expression which runs in a context that exposes the WSGI environment.

[application:blog]
factory = leonardo.wsgifactory
config = myblog.conf

[application:statview]
factory = statview

[application:cms]
factory = proxy

[decision:urlmapper]
cms = environ['PATH_INFO'].startswith('/cms')
statview = environ['PATH_INFO'].startswith('/statview')
blog = environ['PATH_INFO'].startswith('/blog')

[environment]
statview.log_location = /var/logs/apache2
cms.location = http://localhost:8080
cms.map = / /cms.php

[pipeline]
apps = urlmapper

> Of course, if I really wanted this I could implement:
> 
> [application:configurable]
> factory = paste.configurable_pipeline
> conf = abetterconffile.conf
> 
> But then the configuration file becomes a dummy configuration, and no 
> one else gets to use my fancier middleware with the normal configuration 
> file.


> >  - Pipeline deployment configuration should be used only to configure
> >    essential information about pipeline and individual pipeline
> >    components.  Where complex service data configuration is necessary,
> >    the component which implements a service should provide its own
> >    external configuration mechanism.  For example, if an XSL service
> >    is implemented as a WSGI component, and it needs configuration
> >    knobs of some kind, these knobs should not live within the WSGI
> >    pipeline deployment file.  Instead, each component should have its
> >    own configuration file.  This is the purpose (undemonstrated above)
> >    of allowing an [application] section to specify a config filename.
> 
> The intelligent finding of files is important to me with any references 
> to filenames.  Working directory is, IMHO, fragile and unreliable. 
> Absolute paths are reliable but fragile.

Yup.  

> In some cases module names are a more robust way of location resources, 
> if those modules are self-describing applications.  Mostly because 
> there's a search path.  Several projects encourage this kind of system, 
> though I'm not particularly fond of it because it mixes 
> installation-specific files with code.
> 
> >  - Some people have seem to be arguing that there should be a single
> >    configuration format across all WSGI applications and gateways to
> >    configure everything about those components.  I don't think this is
> >    workable.  I think the only thing that is workable is to recommend
> >    to WSGI component authors that they make their components
> >    configurable using some configuration file or other type of path
> >    (URL, perhaps).  The composition, storage, and format of all other
> >    configuration data for the component should be chosen by the
> >    author.
> 
> While I appreciate the difficulty of agreeing on a configuration format, 
> the way this proposal avoids that is by underpowering the deployment 
> file so that authors are forced to create other configuration files.

I *think* promoting a convention of using environment variables to do
configuration and allowing envvars to be set in the main deployment file
solves this for apps that don't actually need their own config file.

> >  - There were a few mentions of being able to configure/create a WSGI
> >    application at request time by passing name/value string pairs
> >    "through the pipeline" that would ostensibly be used to create a
> >    new application instance (thereby dynamically extending or
> >    modifying the pipeline).  I think it's fine if a particular
> >    component does this, but I'm suggesting that a canonization of the
> >    mechanism used to do this is not necessary and that it's useful to
> >    have the ability to define static pipelines for deployment.
> 
> It does concern me that we allow for dynamic systems.  A dynamic system 
> allows for more levels of abstraction in deployment, meaning more 
> potential for automation.

Yes.  OTOH, when a certain level of dynamicism is reached, it's no
longer possible to configure things declaratively because it becomes a
programming task, and this proposal is (so far) just about being able to
configure things declaratively so I think we need some sort of
compromise.

> I think this can be achieved simply by defining a standard based on the 
> object interface, where the configuration file itself is a reference 
> implementation (that we expect people will usually use).  Semantics from 
> the configuration file will leak through, but it's lot easier to deal 
> with (for example) a system that can only support string configuration 
> values, than a system based on concrete files in a specific format.

Sorry, I can't parse that paragraph.

> >  - If elements in the pipeline depend on "services" (ala
> >    Paste-as-not-a-chain-of-middleware-components), it may be
> >    advantageous to create a "service manager" instead of deploying
> >    each service as middleware.  The "service manager" idea is not a
> >    part of the deployment spec.  The service manager would itself
> >    likely be implemented as a piece of middleware or perhaps just a
> >    library.
> 
> That might be best.  It's also quite possible for the factory to 
> instantiate more middleware.

Which factory?

Thanks,

- C


From pje at telecommunity.com  Sun Jul 24 02:21:13 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 23 Jul 2005 20:21:13 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1122163683.3650.132.camel@plope.dyndns.org>
References: <42E17279.5040104@colorstudy.com>
	<mailman.7767.1121744822.10511.web-sig@python.org>
	<31f07fc30507191108b01ba7d@mail.gmail.com>
	<1122064687.8446.2.camel@localhost.localdomain>
	<42E17279.5040104@colorstudy.com>
Message-ID: <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com>

At 08:08 PM 7/23/2005 -0400, Chris McDonough wrote:
>Would you maybe rather make it more explicit that some apps are also
>gateways, e.g.:
>
>[application:bleeb]
>config = bleeb.conf
>factory = bleeb.factory
>
>[filter:blaz]
>config = blaz.conf
>factory = blaz.factory

That looks backwards to me.  Why not just list the sections in pipeline 
order?  i.e., outermost middleware first, and the final application last?

For that matter, if you did that, you could specify the above as:

     [blaz.factory]
     config=blaz.conf

     [bleeb.factory]
     config=bleeb.conf

Which looks a lot nicer to me.  If you want global WSGI or server options 
for the stack, one could always use multi-word section names e.g.:

     [WSGI options]
     multi_thread = 0

     [mod_python options]
     blah = "feh"

and not treat these sections as part of the pipeline.  For Ian's idea about 
requiring particular projects to be available (via pkg_resources), I'd 
suggest making that sort of thing part of one of the options sections.


From chrism at plope.com  Sun Jul 24 02:41:43 2005
From: chrism at plope.com (Chris McDonough)
Date: Sat, 23 Jul 2005 20:41:43 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com>
References: <42E17279.5040104@colorstudy.com>
	<mailman.7767.1121744822.10511.web-sig@python.org>
	<31f07fc30507191108b01ba7d@mail.gmail.com>
	<1122064687.8446.2.camel@localhost.localdomain>
	<42E17279.5040104@colorstudy.com>
	<5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com>
Message-ID: <1122165703.3650.144.camel@plope.dyndns.org>

On Sat, 2005-07-23 at 20:21 -0400, Phillip J. Eby wrote:
> At 08:08 PM 7/23/2005 -0400, Chris McDonough wrote:
> >Would you maybe rather make it more explicit that some apps are also
> >gateways, e.g.:
> >
> >[application:bleeb]
> >config = bleeb.conf
> >factory = bleeb.factory
> >
> >[filter:blaz]
> >config = blaz.conf
> >factory = blaz.factory
> 
> That looks backwards to me.  Why not just list the sections in pipeline 
> order?  i.e., outermost middleware first, and the final application last?
> 
> For that matter, if you did that, you could specify the above as:
> 
>      [blaz.factory]
>      config=blaz.conf
> 
>      [bleeb.factory]
>      config=bleeb.conf

Guess that would work for me, but out of the box, ConfigParser doesn't
appear to preserve section ordering.  I'm sure we could make it do that.
Not a dealbreaker either, but if you ever did want a way to
declaratively configure something in the config file like the generic
"decision middleware" I described in that message, this wouldn't really
work.  I hadn't described it yet, but I can also imagine declaring
multiple pipelines in the config file and using decision middleware to
choose the first app in the next pipeline (as opposed to just an app).

- C


From ianb at colorstudy.com  Sun Jul 24 03:01:25 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 23 Jul 2005 20:01:25 -0500
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1122163683.3650.132.camel@plope.dyndns.org>
References: <mailman.7767.1121744822.10511.web-sig@python.org>	
	<31f07fc30507191108b01ba7d@mail.gmail.com>	
	<1122064687.8446.2.camel@localhost.localdomain>	
	<42E17279.5040104@colorstudy.com>
	<1122163683.3650.132.camel@plope.dyndns.org>
Message-ID: <42E2E865.2020702@colorstudy.com>

Chris McDonough wrote:
> On Fri, 2005-07-22 at 17:26 -0500, Ian Bicking wrote:
>>>  To do this, we use a ConfigParser-format config file named
>>>  'myapplication.conf' that looks like this::
>>>
>>>    [application:sample1]
>>>    config = sample1.conf
>>>    factory = wsgiconfig.tests.sample_components.factory1
>>>
>>>    [application:sample2]
>>>    config = sample2.conf
>>>    factory = wsgiconfig.tests.sample_components.factory2
>>>
>>>    [pipeline]
>>>    apps = sample1 sample2
>>
>>I think it's confusing to call both these applications.  I think 
>>"middleware" or "filter" would be better.  I think people understand 
>>"filter" far better, so I'm inclined to use that.  So...
> 
> 
> The reason I called them applications instead of filters is because all
> of them implement the WSGI "application" API (they all implement "a
> callable that accepts two parameters, environ and start_response").
> Some happen to be gateways/filters/middleware/whatever but at least one
> is just an application and does no delegation.  In my example above,
> "sample2" is not a filter, it is the end-point application.  "sample1"
> is a filter, but it's of course also an application too.

Well, the difference I see is that a filter accepts a next-application, 
where a plain application does not.  From the perspective of this 
configuration file, those seem ver different.  In fact, it could 
actually be:

   [application:sample1]
   config = sample1.conf
   factory = ...

   ...

   [application:real_sample1]
   pipeline = printdebug_app sample1

That is, a "pipeline" simply describes a new application.  And then -- 
perhaps with a conventional name, or through some more global 
configuration -- we indicate which application we are going to serve.

Hmm... thinking about it, this seems much more general, in a very useful 
way, since anyone can plugin in ways to compose applications. 
"pipeline" is just one use case for how to compose applications.

> Would you maybe rather make it more explicit that some apps are also
> gateways, e.g.:
> 
> [application:bleeb]
> config = bleeb.conf
> factory = bleeb.factory
> 
> [filter:blaz]
> config = blaz.conf
> factory = blaz.factory
> 
> ?  I don't know that there's any way we could make use of the
> distinction between the two types in the configurator other than
> disallowing people to place an application "before" a filter in a
> pipeline through validation.  Is there something else you had in mind?

I have forgotten what the actual factory interface was, but I think it 
should be different for the two.  Well, I think it *is* different, and 
passing in a next-application of None just covers up that difference.

>>[application:sample2]
>># What is this relative to?  I hate both absolute paths and
>># paths relative to pwd equally...
>>config = sample1.conf
>>factory = wsgiconfig...
> 
> 
> This was from a doctest I wrote so I could rely on relative paths,
> sorry.  You're right.  Ummmm... we could probably cause use the
> environment as "defaults" to ConfigParser inerpolation and set whatever
> we need before the configurator is run:
> 
> $ export APP_ROOT=/home/chrism/myapplication
> $ ./wsgi-configurator.py myapplication.conf
> 
> And in myapplication.conf:
> 
> [application:sample1]
> config = %(APP_ROOT)s/sample1.conf
> factory = myapp.sample1.factory

I hate %(APP_ROOT)s as a syntax; I think it's okay to simply say that 
the configuration loader (in some fashion) should determine the root 
(maybe with an environmental variable or command line parameter).

Though, realistically, there might be several app roots.  Apache's root 
directory configuration (for relative paths) isn't very useful to me, in 
practice, because it's not flexible enough nor allow more than one root.

>>But this is reasonably easy to resolve -- there's a perfectly good 
>>configuration section sitting there, waiting to be used:
>>
>>   [filter:profile]
>>   factory = paste.profilemiddleware.ProfileMiddleware
>>   # Show top 50 functions:
>>   limit = 50
>>
>>This in no way precludes 'config', which is just a special case of this 
>>general configuration.  The only real problem is a possible conflict if 
>>we wanted to add new special names to the configuration, i.e., 
>>meta-filter-configuration.
> 
> 
> I think I'd maybe rather see configuration settings for apps that don't
> require much configuration to come in as environment variables (maybe
> not necessarily in the "environ" namespace that is implied by the WSGI
> callable interface but instead in os.environ).  Envvars are
> uncontroversial, so they don't cost us any coding time, PEP time, or
> brain cycles.

Yikes!  Were you like the ZConfig holdout or something?  os.environ is 
way, way, way too inflexible.

Just the other day I was able to deploy a single application I wrote 
with two configurations in the same process, without having thought 
about that possibility ahead of time, and without doing any extra work 
or avoiding any particular shortcuts.  It worked absolutely seamlessly, 
because I wasn't using any global variables, and I had stuck to a 
convention where Paste nests configurations in a safe manner. 
os.environ is very global, very hard to work with from a UI perspective, 
and very invisible.  These configuration files should be totally 
encapsulated, and easy to nest.

There's a small number of places where I might be open to using 
environmental variables as an *optional* way to feed information, like 
APP_ROOT (but even there I feel strongly there should be a 
configuration-file-based way to say the same thing).  For middleware 
configuration it makes no sense at all -- configuration must be 
encapsulated in the file itself (or the files that are referenced).

>>Another thing this could allow is recursive configuration, like:
>>
>>[application:urlmap]
>>factory = paste.urlmap.URLMapBuilder
>>app1 = blog
>>app1.url = /
>>app2 = statview
>>app2.url = /stats
>>app3 = cms
>>app3.host = dev.*
>>
>>[application:blog]
>>factory = leonardo.wsgifactory
>>config = myblog.conf
>>
>>[application:statview]
>>factory = statview
>>log_location = /var/logs/apache2
>>
>>[application:cms]
>>factory = proxy
>>location = http://localhost:8080
>>map = / /cms.php
>>
>>[pipeline]
>>app = urlmap
>>
>>
>>So URLMapBuilder needs the entire configuration file passed in, along 
>>with the name of the section it is building.  It then reads some keys, 
>>and builds some named applications, and creates an application that 
>>delegates based on patterns.  That's the kind of configuration file I 
>>could really use.
> 
> 
> Maybe one other (less flexible, but declaratively configurable and
> simpler to code) way to do this might be by canonizing the idea of
> "decision middleware", allowing one component in an otherwise static
> pipeline to decide which is the "next" one by executing a Python
> expression which runs in a context that exposes the WSGI environment.
> 
> [application:blog]
> factory = leonardo.wsgifactory
> config = myblog.conf
> 
> [application:statview]
> factory = statview
> 
> [application:cms]
> factory = proxy
> 
> [decision:urlmapper]
> cms = environ['PATH_INFO'].startswith('/cms')
> statview = environ['PATH_INFO'].startswith('/statview')
> blog = environ['PATH_INFO'].startswith('/blog')

Well, that's hard to imagine working.  First, you'd need a way to import 
new functions, since a large number of use cases can't be handled 
without imports (like re).  But even then, these transformations 
typically modify the environment.  For instance, if you map /cms to an 
application, you have to put /cms onto SCRIPT_NAME, and take it off of 
PATH_INFO.  This keeps URL introspection sane.

But the example I gave seems just as declarative to me (moreso, even), 
and not hard to implement.  It just requires that the factory get a 
reference to the full parsed configuration file.

> [environment]
> statview.log_location = /var/logs/apache2
> cms.location = http://localhost:8080
> cms.map = / /cms.php
> 
> [pipeline]
> apps = urlmapper

> Yes.  OTOH, when a certain level of dynamicism is reached, it's no
> longer possible to configure things declaratively because it becomes a
> programming task, and this proposal is (so far) just about being able to
> configure things declaratively so I think we need some sort of
> compromise.
> 
> 
>>I think this can be achieved simply by defining a standard based on the 
>>object interface, where the configuration file itself is a reference 
>>implementation (that we expect people will usually use).  Semantics from 
>>the configuration file will leak through, but it's lot easier to deal 
>>with (for example) a system that can only support string configuration 
>>values, than a system based on concrete files in a specific format.
> 
> 
> Sorry, I can't parse that paragraph.

I mean that a standard should be in terms of what interface the 
factories must implement, and what objects they are given.  The actual 
implementation of a loader based on an INI configuration file is a 
useful reference library (and maybe the only library we need), but 
shouldn't be part of the standard.

>>> - If elements in the pipeline depend on "services" (ala
>>>   Paste-as-not-a-chain-of-middleware-components), it may be
>>>   advantageous to create a "service manager" instead of deploying
>>>   each service as middleware.  The "service manager" idea is not a
>>>   part of the deployment spec.  The service manager would itself
>>>   likely be implemented as a piece of middleware or perhaps just a
>>>   library.
>>
>>That might be best.  It's also quite possible for the factory to 
>>instantiate more middleware.
> 
> 
> Which factory?

The object referenced by the "factory" key in the configuration file.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From pje at telecommunity.com  Sun Jul 24 03:57:13 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 23 Jul 2005 21:57:13 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1122165703.3650.144.camel@plope.dyndns.org>
References: <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com>
	<42E17279.5040104@colorstudy.com>
	<mailman.7767.1121744822.10511.web-sig@python.org>
	<31f07fc30507191108b01ba7d@mail.gmail.com>
	<1122064687.8446.2.camel@localhost.localdomain>
	<42E17279.5040104@colorstudy.com>
	<5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050723212408.02878df0@mail.telecommunity.com>

At 08:41 PM 7/23/2005 -0400, Chris McDonough wrote:
>On Sat, 2005-07-23 at 20:21 -0400, Phillip J. Eby wrote:
> > At 08:08 PM 7/23/2005 -0400, Chris McDonough wrote:
> > >Would you maybe rather make it more explicit that some apps are also
> > >gateways, e.g.:
> > >
> > >[application:bleeb]
> > >config = bleeb.conf
> > >factory = bleeb.factory
> > >
> > >[filter:blaz]
> > >config = blaz.conf
> > >factory = blaz.factory
> >
> > That looks backwards to me.  Why not just list the sections in pipeline
> > order?  i.e., outermost middleware first, and the final application last?
> >
> > For that matter, if you did that, you could specify the above as:
> >
> >      [blaz.factory]
> >      config=blaz.conf
> >
> >      [bleeb.factory]
> >      config=bleeb.conf
>
>Guess that would work for me, but out of the box, ConfigParser doesn't
>appear to preserve section ordering.  I'm sure we could make it do that.
>Not a dealbreaker either, but if you ever did want a way to
>declaratively configure something in the config file like the generic
>"decision middleware" I described in that message, this wouldn't really
>work.  I hadn't described it yet, but I can also imagine declaring
>multiple pipelines in the config file and using decision middleware to
>choose the first app in the next pipeline (as opposed to just an app).

I consider this a YAGNI, myself.  But then again, most of the pipeline 
stuff seems like a YAGNI to me.

Probably that's because everything you guys are talking about implementing 
with pipelines of middleware, I'd use a single generic function for.  If I 
was wrapping oblivious or legacy apps, I'd just make one middleware object 
that then calls the generic function to do any and all dynamic 
requirements, because it would only take a little bit of syntax sugar to 
implement "configuration" scripts like:

     use_auth("/some/subdir", some_auth_service)
     mount_app("/other/path", some_app_object)

etc.  So, all the time spent on coming up with an uglier, less-powerful 
pseudo-framework to simulate these capabilities using crude .ini files and 
poking stuff into environ seems kind of wasteful to me, versus defining a 
powerful API to -- dare I say it -- "paste" applications together.  :)

However, such an API deserves to be both powerful and easy-to-use, not 
kludged together with .ini syntax.

That's not saying I don't think WSGI should have a deployment configuration 
format based on .ini syntax -- I still do!  I just don't think it should 
even attempt to allow anything complex.  A simple static pipeline and some 
server-defined and WSGI-defined options will do nicely for the "simple 
things are simple" case, and a Python file will do nicely for all the 
"complex things are possible" cases.

That's why I'd like to see this effort split into two parts: 1) simple 
deployment, and 2) a "pasting" API whose entire purpose in life is to 
stack, route, and multiplex "middleware" and "applications" without having 
to explicitly manage a pipeline.

This API would use *specificity* as a basis for establishing pipelines, 
because it's not at all scalable (developer-wise) to set up pipelines on a 
URL-by-URL basis for a complex application -- especially for applications 
that aren't page-based!  Usually, you'll need some kind of pipeline 
inheritance to manage that sort of thing.

There is little reason, however, why you can't configure a significant 
portion of a URL space using a single WSGI component, using an appropriate 
mechanism.  For example, recasting my earlier example:

     def factory(container):
         container.use_auth("some/subdir", some_auth_service)
         container.mount_app_factory("other/path", some_app_factory)

Then, the 'mount_app_factory()' call could invoke 
'some_app_factory(subcontainer)' where 'subcontainer' is a wrapper that 
prepends 'other/path' to URLs before delegating to 'container'.

In other words, once you have this "container API", there's no reason not 
to just use it to implement the whole stack in a single middleware object.

Anyway, this is why I think there should be a "WSGI Services" and/or "WSGI 
Container API" spec, distinct from a "WSGI Deployment Metadata" 
spec.  These two spheres are both valuable, but I think it'll take longer 
to get a "deployment" spec if we mix "container API" stuff into it -- and 
get a much less useful container API than if we set our minds on making a 
good container API, rather than a souped-up deployment descriptor.


From mo.babaei at gmail.com  Sun Jul 24 07:02:20 2005
From: mo.babaei at gmail.com (mohammad babaei)
Date: Sun, 24 Jul 2005 09:32:20 +0430
Subject: [Web-SIG] change "?" into "/" in url
Message-ID: <5bf3a41f050723220239b0eacb@mail.gmail.com>

Hi,
how can i change "?" into "/" in urls ?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/web-sig/attachments/20050724/511c0409/attachment.htm

From jonathan at carnageblender.com  Sun Jul 24 07:06:00 2005
From: jonathan at carnageblender.com (Jonathan Ellis)
Date: Sat, 23 Jul 2005 22:06:00 -0700
Subject: [Web-SIG] change "?" into "/" in url
In-Reply-To: <5bf3a41f050723220239b0eacb@mail.gmail.com>
References: <5bf3a41f050723220239b0eacb@mail.gmail.com>
Message-ID: <1122181560.12090.239102466@webmail.messagingengine.com>

On Sun, 24 Jul 2005 09:32:20 +0430, "mohammad babaei"
<mo.babaei at gmail.com> said:
> Hi,
> how can i change "?" into "/" in urls ?

It's quite platform-dependent...  if Apache is an option, mod_rewrite is
your friend.

Well, okay, mod_rewrite isn't really friendly even on a good day, but
it's a common solution. :)

-Jonathan

From chrism at plope.com  Sun Jul 24 09:38:40 2005
From: chrism at plope.com (Chris McDonough)
Date: Sun, 24 Jul 2005 03:38:40 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <5.1.1.6.0.20050723212408.02878df0@mail.telecommunity.com>
References: <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com>
	<42E17279.5040104@colorstudy.com>
	<mailman.7767.1121744822.10511.web-sig@python.org>
	<31f07fc30507191108b01ba7d@mail.gmail.com>
	<1122064687.8446.2.camel@localhost.localdomain>
	<42E17279.5040104@colorstudy.com>
	<5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com>
	<5.1.1.6.0.20050723212408.02878df0@mail.telecommunity.com>
Message-ID: <1122190720.3650.186.camel@plope.dyndns.org>

On Sat, 2005-07-23 at 21:57 -0400, Phillip J. Eby wrote:
> > > For that matter, if you did that, you could specify the above as:
> > >
> > >      [blaz.factory]
> > >      config=blaz.conf
> > >
> > >      [bleeb.factory]
> > >      config=bleeb.conf
> >
> >Guess that would work for me, but out of the box, ConfigParser doesn't
> >appear to preserve section ordering.  I'm sure we could make it do that.
> >Not a dealbreaker either, but if you ever did want a way to
> >declaratively configure something in the config file like the generic
> >"decision middleware" I described in that message, this wouldn't really
> >work.  I hadn't described it yet, but I can also imagine declaring
> >multiple pipelines in the config file and using decision middleware to
> >choose the first app in the next pipeline (as opposed to just an app).
> 
> I consider this a YAGNI, myself.  But then again, most of the pipeline 
> stuff seems like a YAGNI to me.
> 
> Probably that's because everything you guys are talking about implementing 
> with pipelines of middleware, I'd use a single generic function for. 

FWIW, I think I fall somewhere between you and Ian on this, and maybe
more towards you.

I believe that there are services that are usefully composed as
middleware ("oblivious" things like XSL renderering and caches).  But
sessioning and auth services and whatnot I wouldn't put into middleware.
Instead, I'd use some service library that would have a much nicer
configuration API.  But none of that should really be described within
the deployment spec, so I haven't done so.

I'm trying to be sensitive of Ian's desire to use middleware for all
kinds of services.  I also do think there is a place for middleware, so
it's useful to be able to compose pipelines declaratively even if they
are terribly simple.  OTOH, if I set up an actual deployment for a
customer, it would rarely consist of more than one or two gateways and
then the application and many times it would just be the application if
I had no need for "oblivious" middleware apps in the pipeline.

Anyway, back to the nitty gritty of config, I'd rather just use
ConfigParser "as is" right now than to come up with another .ini parser
that preserves section ordering, thus the non-dependence on ordering
within the deployment file.

>  If I 
> was wrapping oblivious or legacy apps, I'd just make one middleware object 
> that then calls the generic function to do any and all dynamic 
> requirements, because it would only take a little bit of syntax sugar to 
> implement "configuration" scripts like:
> 
>      use_auth("/some/subdir", some_auth_service)
>      mount_app("/other/path", some_app_object)
> 
> etc.  So, all the time spent on coming up with an uglier, less-powerful 
> pseudo-framework to simulate these capabilities using crude .ini files and 
> poking stuff into environ seems kind of wasteful to me, versus defining a 
> powerful API to -- dare I say it -- "paste" applications together.  :)
> 
> However, such an API deserves to be both powerful and easy-to-use, not 
> kludged together with .ini syntax.

I agree.

> That's not saying I don't think WSGI should have a deployment configuration 
> format based on .ini syntax -- I still do!  I just don't think it should 
> even attempt to allow anything complex.  A simple static pipeline and some 
> server-defined and WSGI-defined options will do nicely for the "simple 
> things are simple" case, and a Python file will do nicely for all the 
> "complex things are possible" cases.

That's fine by me.

> That's why I'd like to see this effort split into two parts: 1) simple 
> deployment, and 2) a "pasting" API whose entire purpose in life is to 
> stack, route, and multiplex "middleware" and "applications" without having 
> to explicitly manage a pipeline.
> 
> This API would use *specificity* as a basis for establishing pipelines, 
> because it's not at all scalable (developer-wise) to set up pipelines on a 
> URL-by-URL basis for a complex application -- especially for applications 
> that aren't page-based!  Usually, you'll need some kind of pipeline 
> inheritance to manage that sort of thing.
> 
> There is little reason, however, why you can't configure a significant 
> portion of a URL space using a single WSGI component, using an appropriate 
> mechanism.  For example, recasting my earlier example:
> 
>      def factory(container):
>          container.use_auth("some/subdir", some_auth_service)
>          container.mount_app_factory("other/path", some_app_factory)

Yes.  I hadn't thought about managing service context based on
containment like this (and I like that), but to me, this is a services
registration all the same.

> Then, the 'mount_app_factory()' call could invoke 
> 'some_app_factory(subcontainer)' where 'subcontainer' is a wrapper that 
> prepends 'other/path' to URLs before delegating to 'container'.
> 
> In other words, once you have this "container API", there's no reason not 
> to just use it to implement the whole stack in a single middleware object.

I'd agree.  I'd only like to use the deployment spec to compose a
pipeline out of very simple oblivious middleware apps and a single
endpoint app.

> Anyway, this is why I think there should be a "WSGI Services" and/or "WSGI 
> Container API" spec, distinct from a "WSGI Deployment Metadata" 
> spec.  These two spheres are both valuable, but I think it'll take longer 
> to get a "deployment" spec if we mix "container API" stuff into it -- and 
> get a much less useful container API than if we set our minds on making a 
> good container API, rather than a souped-up deployment descriptor.

+1.  This is the main reason that I'm trying to resist putting
arbitrarily complex configuration into the deployment file.  I don't
think there's anything about the proposal I sent over the other day that
advocates complexity in the config format.  As far as I'm concerned,
there isn't much configuration for middleware, and when there is, they
can use envvars or a separate config file.  Most of the more complex
configuration I'd tend to do via a services library.

- C


From chrism at plope.com  Sun Jul 24 10:05:43 2005
From: chrism at plope.com (Chris McDonough)
Date: Sun, 24 Jul 2005 04:05:43 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <42E2E865.2020702@colorstudy.com>
References: <mailman.7767.1121744822.10511.web-sig@python.org>
	<31f07fc30507191108b01ba7d@mail.gmail.com>
	<1122064687.8446.2.camel@localhost.localdomain>
	<42E17279.5040104@colorstudy.com>
	<1122163683.3650.132.camel@plope.dyndns.org>
	<42E2E865.2020702@colorstudy.com>
Message-ID: <1122192343.3650.203.camel@plope.dyndns.org>

Thanks for the response... I'm not going to respond point-by-point here
because probably nobody has time to read this stuff anyway.

But in general:

1) I'm for creating a simple deployment spec that allows you to define
static pipelines declaratively.  The decision middleware thing is just
an idea.  I'm not really sure it's even a good idea, but it's a stab at
a compromise which would allow for a bit of pipeline dynamicism.

2) I don't have a strong preference one way or another about what the
main config looks like other than it should be simple.  So I'd probably
be fine  with any of:

  [application:foo]
  factory = foo.factory
  config = foo.conf

  [application:bar]
  factory = bar.factory
  config = bar.conf

  [pipeline]
  apps = foo bar

- OR (assuming we have section ordering and we can live with a single
pipeline) -

  [foo.factory]
  config = foo.conf

  [bar.factory]
  config = bar.conf

- OR (if we passed the factory a namespace instead of a filename) -

  [foo.factory]
  arbitrarykey1 = arbitraryvalue1
  arbitrarykey2 = arbitraryvalue2

  [bar.factory]
  arbitrarykey1 = arbitraryvalue1
  arbitrarykey2 = arbitraryvalue2

  (Forget my ramblings about os.environ.  You're right.
  It all comes out the same.)

3) I don't have a strong opinion on whether middleware and endpoint
   apps should be treated differently in the config file.
   If we used section ordering in configparser to imply the pipeline, 
   I'd suspect they wouldn't be.

So where does that leave us?

- C

On Sat, 2005-07-23 at 20:01 -0500, Ian Bicking wrote:
> Chris McDonough wrote:
> > On Fri, 2005-07-22 at 17:26 -0500, Ian Bicking wrote:
> >>>  To do this, we use a ConfigParser-format config file named
> >>>  'myapplication.conf' that looks like this::
> >>>
> >>>    [application:sample1]
> >>>    config = sample1.conf
> >>>    factory = wsgiconfig.tests.sample_components.factory1
> >>>
> >>>    [application:sample2]
> >>>    config = sample2.conf
> >>>    factory = wsgiconfig.tests.sample_components.factory2
> >>>
> >>>    [pipeline]
> >>>    apps = sample1 sample2
> >>
> >>I think it's confusing to call both these applications.  I think 
> >>"middleware" or "filter" would be better.  I think people understand 
> >>"filter" far better, so I'm inclined to use that.  So...
> > 
> > 
> > The reason I called them applications instead of filters is because all
> > of them implement the WSGI "application" API (they all implement "a
> > callable that accepts two parameters, environ and start_response").
> > Some happen to be gateways/filters/middleware/whatever but at least one
> > is just an application and does no delegation.  In my example above,
> > "sample2" is not a filter, it is the end-point application.  "sample1"
> > is a filter, but it's of course also an application too.
> 
> Well, the difference I see is that a filter accepts a next-application, 
> where a plain application does not.  From the perspective of this 
> configuration file, those seem ver different.  In fact, it could 
> actually be:
> 
>    [application:sample1]
>    config = sample1.conf
>    factory = ...
> 
>    ...
> 
>    [application:real_sample1]
>    pipeline = printdebug_app sample1
> 
> That is, a "pipeline" simply describes a new application.  And then -- 
> perhaps with a conventional name, or through some more global 
> configuration -- we indicate which application we are going to serve.
> 
> Hmm... thinking about it, this seems much more general, in a very useful 
> way, since anyone can plugin in ways to compose applications. 
> "pipeline" is just one use case for how to compose applications.
> 
> > Would you maybe rather make it more explicit that some apps are also
> > gateways, e.g.:
> > 
> > [application:bleeb]
> > config = bleeb.conf
> > factory = bleeb.factory
> > 
> > [filter:blaz]
> > config = blaz.conf
> > factory = blaz.factory
> > 
> > ?  I don't know that there's any way we could make use of the
> > distinction between the two types in the configurator other than
> > disallowing people to place an application "before" a filter in a
> > pipeline through validation.  Is there something else you had in mind?
> 
> I have forgotten what the actual factory interface was, but I think it 
> should be different for the two.  Well, I think it *is* different, and 
> passing in a next-application of None just covers up that difference.
> 
> >>[application:sample2]
> >># What is this relative to?  I hate both absolute paths and
> >># paths relative to pwd equally...
> >>config = sample1.conf
> >>factory = wsgiconfig...
> > 
> > 
> > This was from a doctest I wrote so I could rely on relative paths,
> > sorry.  You're right.  Ummmm... we could probably cause use the
> > environment as "defaults" to ConfigParser inerpolation and set whatever
> > we need before the configurator is run:
> > 
> > $ export APP_ROOT=/home/chrism/myapplication
> > $ ./wsgi-configurator.py myapplication.conf
> > 
> > And in myapplication.conf:
> > 
> > [application:sample1]
> > config = %(APP_ROOT)s/sample1.conf
> > factory = myapp.sample1.factory
> 
> I hate %(APP_ROOT)s as a syntax; I think it's okay to simply say that 
> the configuration loader (in some fashion) should determine the root 
> (maybe with an environmental variable or command line parameter).
> 
> Though, realistically, there might be several app roots.  Apache's root 
> directory configuration (for relative paths) isn't very useful to me, in 
> practice, because it's not flexible enough nor allow more than one root.
> 
> >>But this is reasonably easy to resolve -- there's a perfectly good 
> >>configuration section sitting there, waiting to be used:
> >>
> >>   [filter:profile]
> >>   factory = paste.profilemiddleware.ProfileMiddleware
> >>   # Show top 50 functions:
> >>   limit = 50
> >>
> >>This in no way precludes 'config', which is just a special case of this 
> >>general configuration.  The only real problem is a possible conflict if 
> >>we wanted to add new special names to the configuration, i.e., 
> >>meta-filter-configuration.
> > 
> > 
> > I think I'd maybe rather see configuration settings for apps that don't
> > require much configuration to come in as environment variables (maybe
> > not necessarily in the "environ" namespace that is implied by the WSGI
> > callable interface but instead in os.environ).  Envvars are
> > uncontroversial, so they don't cost us any coding time, PEP time, or
> > brain cycles.
> 
> Yikes!  Were you like the ZConfig holdout or something?  os.environ is 
> way, way, way too inflexible.
> 
> Just the other day I was able to deploy a single application I wrote 
> with two configurations in the same process, without having thought 
> about that possibility ahead of time, and without doing any extra work 
> or avoiding any particular shortcuts.  It worked absolutely seamlessly, 
> because I wasn't using any global variables, and I had stuck to a 
> convention where Paste nests configurations in a safe manner. 
> os.environ is very global, very hard to work with from a UI perspective, 
> and very invisible.  These configuration files should be totally 
> encapsulated, and easy to nest.
> 
> There's a small number of places where I might be open to using 
> environmental variables as an *optional* way to feed information, like 
> APP_ROOT (but even there I feel strongly there should be a 
> configuration-file-based way to say the same thing).  For middleware 
> configuration it makes no sense at all -- configuration must be 
> encapsulated in the file itself (or the files that are referenced).
> 
> >>Another thing this could allow is recursive configuration, like:
> >>
> >>[application:urlmap]
> >>factory = paste.urlmap.URLMapBuilder
> >>app1 = blog
> >>app1.url = /
> >>app2 = statview
> >>app2.url = /stats
> >>app3 = cms
> >>app3.host = dev.*
> >>
> >>[application:blog]
> >>factory = leonardo.wsgifactory
> >>config = myblog.conf
> >>
> >>[application:statview]
> >>factory = statview
> >>log_location = /var/logs/apache2
> >>
> >>[application:cms]
> >>factory = proxy
> >>location = http://localhost:8080
> >>map = / /cms.php
> >>
> >>[pipeline]
> >>app = urlmap
> >>
> >>
> >>So URLMapBuilder needs the entire configuration file passed in, along 
> >>with the name of the section it is building.  It then reads some keys, 
> >>and builds some named applications, and creates an application that 
> >>delegates based on patterns.  That's the kind of configuration file I 
> >>could really use.
> > 
> > 
> > Maybe one other (less flexible, but declaratively configurable and
> > simpler to code) way to do this might be by canonizing the idea of
> > "decision middleware", allowing one component in an otherwise static
> > pipeline to decide which is the "next" one by executing a Python
> > expression which runs in a context that exposes the WSGI environment.
> > 
> > [application:blog]
> > factory = leonardo.wsgifactory
> > config = myblog.conf
> > 
> > [application:statview]
> > factory = statview
> > 
> > [application:cms]
> > factory = proxy
> > 
> > [decision:urlmapper]
> > cms = environ['PATH_INFO'].startswith('/cms')
> > statview = environ['PATH_INFO'].startswith('/statview')
> > blog = environ['PATH_INFO'].startswith('/blog')
> 
> Well, that's hard to imagine working.  First, you'd need a way to import 
> new functions, since a large number of use cases can't be handled 
> without imports (like re).  But even then, these transformations 
> typically modify the environment.  For instance, if you map /cms to an 
> application, you have to put /cms onto SCRIPT_NAME, and take it off of 
> PATH_INFO.  This keeps URL introspection sane.
> 
> But the example I gave seems just as declarative to me (moreso, even), 
> and not hard to implement.  It just requires that the factory get a 
> reference to the full parsed configuration file.
> 
> > [environment]
> > statview.log_location = /var/logs/apache2
> > cms.location = http://localhost:8080
> > cms.map = / /cms.php
> > 
> > [pipeline]
> > apps = urlmapper
> 
> > Yes.  OTOH, when a certain level of dynamicism is reached, it's no
> > longer possible to configure things declaratively because it becomes a
> > programming task, and this proposal is (so far) just about being able to
> > configure things declaratively so I think we need some sort of
> > compromise.
> > 
> > 
> >>I think this can be achieved simply by defining a standard based on the 
> >>object interface, where the configuration file itself is a reference 
> >>implementation (that we expect people will usually use).  Semantics from 
> >>the configuration file will leak through, but it's lot easier to deal 
> >>with (for example) a system that can only support string configuration 
> >>values, than a system based on concrete files in a specific format.
> > 
> > 
> > Sorry, I can't parse that paragraph.
> 
> I mean that a standard should be in terms of what interface the 
> factories must implement, and what objects they are given.  The actual 
> implementation of a loader based on an INI configuration file is a 
> useful reference library (and maybe the only library we need), but 
> shouldn't be part of the standard.
> 
> >>> - If elements in the pipeline depend on "services" (ala
> >>>   Paste-as-not-a-chain-of-middleware-components), it may be
> >>>   advantageous to create a "service manager" instead of deploying
> >>>   each service as middleware.  The "service manager" idea is not a
> >>>   part of the deployment spec.  The service manager would itself
> >>>   likely be implemented as a piece of middleware or perhaps just a
> >>>   library.
> >>
> >>That might be best.  It's also quite possible for the factory to 
> >>instantiate more middleware.
> > 
> > 
> > Which factory?
> 
> The object referenced by the "factory" key in the configuration file.
> 


From ianb at colorstudy.com  Sun Jul 24 11:04:43 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 24 Jul 2005 04:04:43 -0500
Subject: [Web-SIG] Scarecrow deployment config
Message-ID: <42E359AB.5010002@colorstudy.com>

So maybe here's a deployment spec we can start with.  It looks like:

   [feature1]
   someapplication.somemodule.some_function

   [feature2]
   someapplication.somemodule.some_function2

You can't get dumber than that!  There should also be a "no-feature" 
section; maybe one without a section identifier, or some special section 
identifier.

It goes in the .egg-info directory.  This way elsewhere you can say:

   application = SomeApplication[feature1]

And it's quite unambiguous.  Note that there is *no* "configuration" in 
the egg-info file, because you can't put any configuration related to a 
deployment in an .egg-info directory, because it's not specific to any 
deployment.  Obviously we still need a way to get configuration in 
there, but lets say that's a different matter.

This puts complex middleware construction into the function that is 
referenced.  This function might be, in turn, an import from a 
framework.  Or it might be some complex setup specific to the 
application.  Whatever.

The API would look like:

   wsgiapp = wsgiref.get_egg_application('SomeApplication[feature1]')

Which ultimately resolves to:

   wsgiapp = some_function()

get_egg_application could also take a pkg_resources.Distribution object.

Open issues?  Yep, there's a bunch.  This requires the rest of the 
configuration to be done quite lazily.  But I can fit this into source 
control; it is about *all* I can fit into source control (I can't have 
any filenames, I can't have any installation-specific pipelines, I can't 
have any other apps), but it is also enough that the deployment-specific 
parts can avoid many complexities of pipelining and factories and all 
that -- presumably the factory functions handle that.  I don't think 
this is useful without the other pieces (both in front of this 
configuration file and behind it) but maybe we can think about what 
those other pieces could look like.  I'm particularly open to 
suggestions that some_function() take some arguments, but I don't know 
what arguments.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From pje at telecommunity.com  Sun Jul 24 17:29:30 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 24 Jul 2005 11:29:30 -0400
Subject: [Web-SIG] Standardized configuration
In-Reply-To: <1122192343.3650.203.camel@plope.dyndns.org>
References: <42E2E865.2020702@colorstudy.com>
	<mailman.7767.1121744822.10511.web-sig@python.org>
	<31f07fc30507191108b01ba7d@mail.gmail.com>
	<1122064687.8446.2.camel@localhost.localdomain>
	<42E17279.5040104@colorstudy.com>
	<1122163683.3650.132.camel@plope.dyndns.org>
	<42E2E865.2020702@colorstudy.com>
Message-ID: <5.1.1.6.0.20050724111347.02733ff0@mail.telecommunity.com>

At 04:05 AM 7/24/2005 -0400, Chris McDonough wrote:
>- OR (if we passed the factory a namespace instead of a filename) -
>
>   [foo.factory]
>   arbitrarykey1 = arbitraryvalue1
>   arbitrarykey2 = arbitraryvalue2
>
>   [bar.factory]
>   arbitrarykey1 = arbitraryvalue1
>   arbitrarykey2 = arbitraryvalue2

This one's my favorite.  I'd say the semantics are that each factory gets 
passed the key/value pairs as keyword arguments, with a positional argument 
used to pass in the "next application".  The last factory in the file 
wouldn't get the positional argument.

If a section's name has len(sectionName.split())>1, then the second and 
subsequent words are directives that change the default interpretation of 
the section, so that we can have things like:

     [WSGI options]
     # WSGI options, like required eggs, threading mode, etc.

     [mod_python options]
     # mod_python-specific options

     [some.app object]
     # this app is an object, not a factory

I don't care that ConfigParser doesn't support any of this, because 
low-level .ini parsers are easy to write and I've previously written two: 
one for peak.config and one for pkg_resources.  And if the implementation 
can assume pkg_resources is available, it can use the one that's there to 
do the sequential section-splitting part of the job.

I'm not sure of this, but I tend towards thinking that the 
'arbitraryvalues' should be Python expressions, rather than raw strings.  I 
also think that we should support a source-encoding comment to allow for 
localization of Unicode literals, whether we treat values as raw strings or 
Python expressions.


From pje at telecommunity.com  Sun Jul 24 18:49:03 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 24 Jul 2005 12:49:03 -0400
Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow
 deployment config
In-Reply-To: <42E359AB.5010002@colorstudy.com>
Message-ID: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>

[cc:ed to distutils-sig because much of the below is about a new egg 
feature; follow-ups about the web stuff should stay on web-sig]

At 04:04 AM 7/24/2005 -0500, Ian Bicking wrote:
>So maybe here's a deployment spec we can start with.  It looks like:
>
>    [feature1]
>    someapplication.somemodule.some_function
>
>    [feature2]
>    someapplication.somemodule.some_function2
>
>You can't get dumber than that!  There should also be a "no-feature"
>section; maybe one without a section identifier, or some special section
>identifier.
>
>It goes in the .egg-info directory.  This way elsewhere you can say:
>
>    application = SomeApplication[feature1]

I like this a lot, although for a different purpose than the format Chris 
and I were talking about.  I see this fitting into that format as maybe:

    [feature1 from SomeApplication]
    # configuration here


>And it's quite unambiguous.  Note that there is *no* "configuration" in
>the egg-info file, because you can't put any configuration related to a
>deployment in an .egg-info directory, because it's not specific to any
>deployment.  Obviously we still need a way to get configuration in
>there, but lets say that's a different matter.

Easily fixed via what I've been thinking of as the "deployment descriptor"; 
I would call your proposal here the "import map".  Basically, an import map 
describes a mapping from some sort of feature name to qualified names in 
the code.

I have an extension that I would make, though.  Instead of using sections 
for features, I would use name/value pairs inside of sections named for the 
kind of import map.  E.g.:

     [wsgi.app_factories]
     feature1 = somemodule:somefunction
     feature2 = another.module:SomeClass
     ...

     [mime.parsers]
     application/atom+xml = something:atom_parser
     ...

In other words, feature maps could be a generic mechanism offered by 
setuptools, with a 'Distribution.load_entry_point(kind,name)' API to 
retrieve the desired object.  That way, we don't end up reinventing this 
idea for dozens of frameworks or pluggable applications that just need a 
way to find a few simple entry points into the code.

In addition to specifying the entry point, each entry in the import map 
could optionally list the "extras" that are required if that entry point is 
used.
It could also issue a 'require()' for the corresponding feature if it has 
any additional requirements listed in the extras_require dictionary.

So, I'm thinking that this would be implemented with an entry_points.txt 
file in .egg-info, but supplied in setup.py like this:

     setup(
         ...
         entry_points = {
             "wsgi.app_factories": dict(
                 feature1 = "somemodule:somefunction",
                 feature2 = "another.module:SomeClass [extra1,extra2]",
             ),
             "mime.parsers": {
                 "application/atom+xml": "something:atom_parser [feedparser]"
             }
         },
         extras_require = dict(
             feedparser = [...],
             extra1 = [...],
             extra2 = [...],
         )
     )

Anyway, this would make the most common use case for eggs-as-plugins very 
easy: an application or framework would simply define entry points, and 
plugin projects would declare the ones they offer in their setup script.

I think this is a fantastic idea and I'm about to leap into implementing 
it.  :)


>This puts complex middleware construction into the function that is
>referenced.  This function might be, in turn, an import from a
>framework.  Or it might be some complex setup specific to the
>application.  Whatever.
>
>The API would look like:
>
>    wsgiapp = wsgiref.get_egg_application('SomeApplication[feature1]')
>
>Which ultimately resolves to:
>
>    wsgiapp = some_function()
>
>get_egg_application could also take a pkg_resources.Distribution object.

Yeah, I'm thinking that this could be implemented as something like:

     import pkg_resources

     def get_wsgi_app(project_name, app_name, *args, **kw):
         dist = pkg_resources.require(project_name)[0]
         return dist.load_entry_point('wsgi.app_factories', 
app_name)(*args,**kw)

with all the heavy lifting happening in the pkg_resources.Distribution 
class, along with maybe a new EntryPoint class (to handle parsing entry 
point specifiers and to do the loading of them.


>Open issues?  Yep, there's a bunch.  This requires the rest of the
>configuration to be done quite lazily.

Not sure I follow you; the deployment descriptor could contain all the 
configuration; see the Web-SIG post I made just previous to this one.


>   But I can fit this into source
>control; it is about *all* I can fit into source control (I can't have
>any filenames, I can't have any installation-specific pipelines, I can't
>have any other apps), but it is also enough that the deployment-specific
>parts can avoid many complexities of pipelining and factories and all
>that -- presumably the factory functions handle that.

+1.


>   I don't think
>this is useful without the other pieces (both in front of this
>configuration file and behind it) but maybe we can think about what
>those other pieces could look like.  I'm particularly open to
>suggestions that some_function() take some arguments, but I don't know
>what arguments.

At this point, I think this "entry points" concept weighs in favor of 
having the deployment descriptor configuration values be Python 
expressions, meaning that a WSGI application factory would accept keyword 
arguments that can be whatever you like in order to configure it.

However, after more thought, I think that the "next application" argument 
should be a keyword argument too, like 'wsgi_next' or some such.  This 
would allow a factory to have required arguments in its signature, e.g.:

     def some_factory(required_arg_x, required_arg_y, optional_arg="foo", 
....):
         ...

The problem with my original idea to have the "next app" be a positional 
argument is that it would prevent non-middleware applications from having 
any required arguments.

Anyway, I think we're now very close to being able to define a useful 
deployment descriptor format for establishing pipelines and setting 
options, that leaves open the possibility to do some very sophisticated 
things.

Hm.  Interesting thought...  we could have a function to read a deployment 
descriptor (from a string, stream, or filename) and then return the WSGI 
application object.  You could then wrap this in a simple WSGI app that 
does filesystem-based URL routing to serve up *.wsgi files from a 
directory.  This would let you bootstrap a deployment capability into 
existing WSGI servers, without them having to add their own support for 
it!  Web servers and frameworks that have some kind of file extension 
mapping mechanism could do this directly, of course.  I can envision 
putting *.wsgi files in my web directories and then configuring Apache to 
run them using either mod_python or FastCGI or even as a CGI, just by 
tweaking local .htaccess files.  However, once you have Apache tweaked the 
way you want, .wsgi files can be just dropped in and edited.

Of course, there are still some open design issues, like caching of .wsgi 
files (e.g. should the file be checked for changes on each hit?  I guess 
that could be a setting under "WSGI options", and would only work if the 
descriptor parser was given an actual filename to load from.)


From ianb at colorstudy.com  Sun Jul 24 19:59:20 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 24 Jul 2005 12:59:20 -0500
Subject: [Web-SIG] Scarecrow deployment config
In-Reply-To: <42E359AB.5010002@colorstudy.com>
References: <42E359AB.5010002@colorstudy.com>
Message-ID: <42E3D6F8.1020905@colorstudy.com>

Did I say scarecrow?  Man it must have been late, I think I meant 
strawman ;)

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From ianb at colorstudy.com  Sun Jul 24 21:12:02 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 24 Jul 2005 14:12:02 -0500
Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow
 deployment config
In-Reply-To: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
Message-ID: <42E3E802.4030500@colorstudy.com>

Phillip J. Eby wrote:
>> It goes in the .egg-info directory.  This way elsewhere you can say:
>>
>>    application = SomeApplication[feature1]
> 
> 
> I like this a lot, although for a different purpose than the format 
> Chris and I were talking about.  

Yes, this proposal really just simplifies a part of that application 
deployment configuration, it doesn't replace it.  Though it might make 
other standardization less important.

> I see this fitting into that format as 
> maybe:
> 
>    [feature1 from SomeApplication]
>    # configuration here
> 
> 
>> And it's quite unambiguous.  Note that there is *no* "configuration" in
>> the egg-info file, because you can't put any configuration related to a
>> deployment in an .egg-info directory, because it's not specific to any
>> deployment.  Obviously we still need a way to get configuration in
>> there, but lets say that's a different matter.
> 
> 
> Easily fixed via what I've been thinking of as the "deployment 
> descriptor"; I would call your proposal here the "import map".  
> Basically, an import map describes a mapping from some sort of feature 
> name to qualified names in the code.

Yes, it really just gives you a shorthand for the factory configuration 
variable.

> I have an extension that I would make, though.  Instead of using 
> sections for features, I would use name/value pairs inside of sections 
> named for the kind of import map.  E.g.:
> 
>     [wsgi.app_factories]
>     feature1 = somemodule:somefunction
>     feature2 = another.module:SomeClass
>     ...
> 
>     [mime.parsers]
>     application/atom+xml = something:atom_parser
>     ...

I assume mime.parsers is just a theoretical example of another kind of 
service a package can provide?  But yes, this seems very reasonable, and 
even allows for loosely versioned specs (e.g., wsgi.app_factories02, 
which returns factories with a different interface; or maybe something 
like foo.configuration_schema, an optional entry point that returns the 
configuration schema for an application described elsewhere).

This kind of addresses the issue where the module structure of a package 
becomes an often unintentional part of its external interface.  It feels 
a little crude in that respect... but maybe not.  Is it worse to do:

   from package.module import name

or:

   name = require('Package').load_entry_point('service_type', 'name')

OK, well clearly the second is worse ;)  But if that turned into a 
single function call:

   name = load_service('Package', 'service_type', 'name')

It's not that bad.  Maybe even:

   name = services['Package:service_type:name']

Though service_type feels extraneous to me.  I see the benefit of being 
explicit about what the factory provides, but I don't see the benefit of 
separating namespaces; the name should be unambiguous.  Well... unless 
you used the same name to group related services, like the configuration 
schema and the application factory itself.  So maybe I retract that 
criticism.

> In addition to specifying the entry point, each entry in the import map 
> could optionally list the "extras" that are required if that entry point 
> is used.
> It could also issue a 'require()' for the corresponding feature if it 
> has any additional requirements listed in the extras_require dictionary.

I figured each entry point would just map to a feature, so the 
extra_require dictionary would already have entries.

> So, I'm thinking that this would be implemented with an entry_points.txt 
> file in .egg-info, but supplied in setup.py like this:
> 
>     setup(
>         ...
>         entry_points = {
>             "wsgi.app_factories": dict(
>                 feature1 = "somemodule:somefunction",
>                 feature2 = "another.module:SomeClass [extra1,extra2]",
>             ),
>             "mime.parsers": {
>                 "application/atom+xml": "something:atom_parser 
> [feedparser]"
>             }
>         },
>         extras_require = dict(
>             feedparser = [...],
>             extra1 = [...],
>             extra2 = [...],
>         )
>     )

I think I'd rather just put the canonical version in .egg-info instead 
of as an argument to setup(); this is one place where using Python 
expressions isn't a shining example of clarity.  But I guess this is 
fine too; for clarity I'll probably start writing my setup.py files with 
variable assignments, then a setup() call that just refers to those 
variables.

>> Open issues?  Yep, there's a bunch.  This requires the rest of the
>> configuration to be done quite lazily.
> 
> 
> Not sure I follow you; the deployment descriptor could contain all the 
> configuration; see the Web-SIG post I made just previous to this one.

Well, when I proposed that the factory be called with zero arguments, 
that wouldn't allow any configuration to be passed in.

>>   I don't think
>> this is useful without the other pieces (both in front of this
>> configuration file and behind it) but maybe we can think about what
>> those other pieces could look like.  I'm particularly open to
>> suggestions that some_function() take some arguments, but I don't know
>> what arguments.
> 
> 
> At this point, I think this "entry points" concept weighs in favor of 
> having the deployment descriptor configuration values be Python 
> expressions, meaning that a WSGI application factory would accept 
> keyword arguments that can be whatever you like in order to configure it.

Yes, I'd considered this as well.  I'm not a huge fan of Python 
expressions, because something like "allow_hosts=['127.0.0.1']" seems 
unnecessarily complex to me.  As a convention (maybe not a requirement; 
a SHOULD) I like if configuration consumers handle strings specially, 
doing context-sensitive conversion (in this case maybe splitting on ',' 
or on whitespace).  It would make me sad to see a something accept 
requests from the IP addresses ['1', '2', '7', '.', '0', '.', '0', '.', 
'1'].  This is the small sort of thing that I think makes the experience 
less pleasant.

> However, after more thought, I think that the "next application" 
> argument should be a keyword argument too, like 'wsgi_next' or some 
> such.  This would allow a factory to have required arguments in its 
> signature, e.g.:
> 
>     def some_factory(required_arg_x, required_arg_y, optional_arg="foo", 
> ....):
>         ...
> 
> The problem with my original idea to have the "next app" be a positional 
> argument is that it would prevent non-middleware applications from 
> having any required arguments.

I think it's fine to declare the next_app keyword argument as special, 
and promise (by convention) to always pass it in with that name.

> Anyway, I think we're now very close to being able to define a useful 
> deployment descriptor format for establishing pipelines and setting 
> options, that leaves open the possibility to do some very sophisticated 
> things.
> 
> Hm.  Interesting thought...  we could have a function to read a 
> deployment descriptor (from a string, stream, or filename) and then 
> return the WSGI application object.  You could then wrap this in a 
> simple WSGI app that does filesystem-based URL routing to serve up 
> *.wsgi files from a directory.  This would let you bootstrap a 
> deployment capability into existing WSGI servers, without them having to 
> add their own support for it!  Web servers and frameworks that have some 
> kind of file extension mapping mechanism could do this directly, of 
> course.  I can envision putting *.wsgi files in my web directories and 
> then configuring Apache to run them using either mod_python or FastCGI 
> or even as a CGI, just by tweaking local .htaccess files.  However, once 
> you have Apache tweaked the way you want, .wsgi files can be just 
> dropped in and edited.

Absolutely; I see no reason WSGI servers should have any dispatching 
logic in them, except in cases when they also dispatch to non-Python 
applications (like Apache).  So it seems natural that we present 
deployment as a single application factory that takes zero or one arguments.

> Of course, there are still some open design issues, like caching of 
> .wsgi files (e.g. should the file be checked for changes on each hit?  I 
> guess that could be a setting under "WSGI options", and would only work 
> if the descriptor parser was given an actual filename to load from.)

I don't know what we'd do if we checked the file and found it wasn't up 
to date.  In this particular case I suppose you could reload the 
configuration file, but if the change in the configuration file 
reflected a change in the source code, then you're stuck because 
reloading in Python is so infeasible.  I'm all for warnings, but I don't 
see how we can do the Right Thing here, as much as I wish it were otherwise.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From pje at telecommunity.com  Sun Jul 24 22:42:35 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 24 Jul 2005 16:42:35 -0400
Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow
 deployment config
In-Reply-To: <42E3E802.4030500@colorstudy.com>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com>

At 02:12 PM 7/24/2005 -0500, Ian Bicking wrote:
>This kind of addresses the issue where the module structure of a package 
>becomes an often unintentional part of its external interface.  It feels a 
>little crude in that respect... but maybe not.  Is it worse to do:
>
>   from package.module import name
>
>or:
>
>   name = require('Package').load_entry_point('service_type', 'name')
>
>OK, well clearly the second is worse ;)  But if that turned into a single 
>function call:
>
>   name = load_service('Package', 'service_type', 'name')
>
>It's not that bad.  Maybe even:
>
>   name = services['Package:service_type:name']

The actual API I have implemented in my CVS working copy is:

    the_object = load_entry_point('Project', 'group', 'name')

which seems pretty clean to me.  You can also use 
dist.load_entry_point('group','name') if you already have a distribution 
object for some reason.  (For example, if you use an activation listener to 
get callbacks when distributions are activated on sys.path.)

To introspect an entry point or check for its existence, you can use:

    entry_point = get_entry_info('Project', 'group', 'name')

which returns either None or an EntryPoint object with various 
attributes.  To list the entry points of a group, or to list the groups, 
you can use:

    # dictionary of group names to entry map for each kind
    group_names = get_entry_map('Project')

    # dictionary of entry names to corresponding EntryPoint object
    entry_names = get_entry_map('Project', 'group')

These are useful for dynamic entry points.


>Though service_type feels extraneous to me.  I see the benefit of being 
>explicit about what the factory provides, but I don't see the benefit of 
>separating namespaces; the name should be unambiguous.

You're making the assumption that the package author defines the entry 
point names, but that's not the case for application plugins; the 
application will define entry point names and group names for the 
application's use, and some applications will need multiple groups.  Groups 
might be keyed statically (i.e. a known set of entry point names) or 
dynamically (the keys are used to put things in a table, e.g. a file 
extension handler table).


>>In addition to specifying the entry point, each entry in the import map 
>>could optionally list the "extras" that are required if that entry point 
>>is used.
>>It could also issue a 'require()' for the corresponding feature if it has 
>>any additional requirements listed in the extras_require dictionary.
>
>I figured each entry point would just map to a feature, so the 
>extra_require dictionary would already have entries.

The problem with that is that asking for a feature that's not in 
extras_require is an InvalidOption error, so this would force you to define 
entries in extras_require even if you have no extras involved.  It would 
also make for redundancies when entry points share an extra.  I also don't 
expect extras to be used as frequently as entry points.


>>So, I'm thinking that this would be implemented with an entry_points.txt 
>>file in .egg-info, but supplied in setup.py like this:
>>     setup(
>>         ...
>>         entry_points = {
>>             "wsgi.app_factories": dict(
>>                 feature1 = "somemodule:somefunction",
>>                 feature2 = "another.module:SomeClass [extra1,extra2]",
>>             ),
>>             "mime.parsers": {
>>                 "application/atom+xml": "something:atom_parser [feedparser]"
>>             }
>>         },
>>         extras_require = dict(
>>             feedparser = [...],
>>             extra1 = [...],
>>             extra2 = [...],
>>         )
>>     )
>
>I think I'd rather just put the canonical version in .egg-info instead of 
>as an argument to setup(); this is one place where using Python 
>expressions isn't a shining example of clarity.  But I guess this is fine 
>too; for clarity I'll probably start writing my setup.py files with 
>variable assignments, then a setup() call that just refers to those variables.

The actual syntax I'm going to end up with is:

       entry_points = {
           "wsgi.app_factories": [
               "feature1 = somemodule:somefunction",
               "feature2 = another.module:SomeClass [extra1,extra2]",
           ]
       }

Which is still not great, but it's a bit simpler.  If you only have one 
entry point, you can use:

       entry_points = {
           "wsgi.app_factories": "feature = somemodule:somefunction",
       }

Or you can use a long string for each group:

       entry_points = {
           "wsgi.app_factories": """
               # define features for blah blah
               feature1 = somemodule:somefunction
               feature2 = another.module:SomeClass [extra1,extra2]
           """
       }

Or even list everything in one giant string:

       entry_points = """
           [wsgi.app_factories]
           # define features for blah blah
           feature1 = somemodule:somefunction
           feature2 = another.module:SomeClass [extra1,extra2]
       """

This last format is more readable than the others, I think, but there are 
likely to be setup scripts that will be generating some of this 
dynamically, and I'd rather not force them to use strings when lists or 
dictionaries would be more convenient for their use cases.

Anyway, I hope to check in a working implementation with tests later 
today.  Currently, the EntryPoint class works, but setuptools doesn't 
generate the entry_points.txt file yet, and I don't have any tests yet for 
the entry_points.txt parser or the API functions, although they're already 
implemented.


From pje at telecommunity.com  Mon Jul 25 01:20:20 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 24 Jul 2005 19:20:20 -0400
Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow
 deployment config
In-Reply-To: <42E3E802.4030500@colorstudy.com>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050724191150.027de240@mail.telecommunity.com>

At 02:12 PM 7/24/2005 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>However, after more thought, I think that the "next application" argument 
>>should be a keyword argument too, like 'wsgi_next' or some such.  This 
>>would allow a factory to have required arguments in its signature, e.g.:
>>     def some_factory(required_arg_x, required_arg_y, optional_arg="foo", 
>> ....):
>>         ...
>>The problem with my original idea to have the "next app" be a positional 
>>argument is that it would prevent non-middleware applications from having 
>>any required arguments.
>
>I think it's fine to declare the next_app keyword argument as special, and 
>promise (by convention) to always pass it in with that name.

Actually, now that we have the "entry points" capability in pkg_resources 
(I just checked it in), we could simply have middleware components looked 
up in 'wsgi.middleware_factories' and applications looked up in 
'wsgi.application_factories'.  If a factory can be used for both, you can 
always list it in both places.

Entry points have 1001 uses...  I can imagine applications defining entry 
point groups for URL namespaces.  For example, Trac has URLs like 
/changesets and /roadmap, and these could be defined via a trac.navigation 
entry point group, e.g.:

     [trac.navigation]
     changesets = some.module:foo
     roadmap = other.module:bar

And then people could easily create plugin projects that add additional 
navigation components.  (Trac already has an internal extension point 
system to do things rather like this, but entry points are automatically 
discoverable without any prior knowledge of what modules to import.)

There are other frameworks out there (e.g. PyBlosxom), both web and 
non-web, that could really do nicely with having a standard way to do this 
kind of thing, rather than having to roll their own.


From ianb at colorstudy.com  Mon Jul 25 02:26:22 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 24 Jul 2005 19:26:22 -0500
Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow
 deployment config
In-Reply-To: <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com>
Message-ID: <42E431AE.6070204@colorstudy.com>

Phillip J. Eby wrote:
> The actual syntax I'm going to end up with is:
> 
>       entry_points = {
>           "wsgi.app_factories": [
>               "feature1 = somemodule:somefunction",
>               "feature2 = another.module:SomeClass [extra1,extra2]",
>           ]
>       }

That seems weird to put the assignment inside a string, instead of:

entry_points = {
   'wsgi.app_factories': {
     'app': 'somemodule:somefunction',
   },
}

Also, is there any default name?  Like for a package that distributes 
only one application.  Or these just different spellings for the same thing?


-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From chrism at plope.com  Mon Jul 25 02:35:08 2005
From: chrism at plope.com (Chris McDonough)
Date: Sun, 24 Jul 2005 20:35:08 -0400
Subject: [Web-SIG] Entry points and import maps (was Re:
	Scarecrow	deployment config
In-Reply-To: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
Message-ID: <1122251708.3650.241.camel@plope.dyndns.org>

Sorry, I think I may have lost track of where we were going wrt the
deployment spec.  Specifically, I don't know how we got to using eggs
(which I'd really like to, BTW, they're awesome conceptually!) from
where we were in the discussion about configuring a WSGI pipeline.  What
is a "feature"?  What is an "import map"? "Entry point"?  Should I just
get more familiar with eggs to understand what's being discussed here or
did I miss a few posts?

On Sun, 2005-07-24 at 12:49 -0400, Phillip J. Eby wrote:
> [cc:ed to distutils-sig because much of the below is about a new egg 
> feature; follow-ups about the web stuff should stay on web-sig]
> 
> At 04:04 AM 7/24/2005 -0500, Ian Bicking wrote:
> >So maybe here's a deployment spec we can start with.  It looks like:
> >
> >    [feature1]
> >    someapplication.somemodule.some_function
> >
> >    [feature2]
> >    someapplication.somemodule.some_function2
> >
> >You can't get dumber than that!  There should also be a "no-feature"
> >section; maybe one without a section identifier, or some special section
> >identifier.
> >
> >It goes in the .egg-info directory.  This way elsewhere you can say:
> >
> >    application = SomeApplication[feature1]
> 
> I like this a lot, although for a different purpose than the format Chris 
> and I were talking about.  I see this fitting into that format as maybe:
> 
>     [feature1 from SomeApplication]
>     # configuration here
> 
> 
> >And it's quite unambiguous.  Note that there is *no* "configuration" in
> >the egg-info file, because you can't put any configuration related to a
> >deployment in an .egg-info directory, because it's not specific to any
> >deployment.  Obviously we still need a way to get configuration in
> >there, but lets say that's a different matter.
> 
> Easily fixed via what I've been thinking of as the "deployment descriptor"; 
> I would call your proposal here the "import map".  Basically, an import map 
> describes a mapping from some sort of feature name to qualified names in 
> the code.
> 
> I have an extension that I would make, though.  Instead of using sections 
> for features, I would use name/value pairs inside of sections named for the 
> kind of import map.  E.g.:
> 
>      [wsgi.app_factories]
>      feature1 = somemodule:somefunction
>      feature2 = another.module:SomeClass
>      ...
> 
>      [mime.parsers]
>      application/atom+xml = something:atom_parser
>      ...
> 
> In other words, feature maps could be a generic mechanism offered by 
> setuptools, with a 'Distribution.load_entry_point(kind,name)' API to 
> retrieve the desired object.  That way, we don't end up reinventing this 
> idea for dozens of frameworks or pluggable applications that just need a 
> way to find a few simple entry points into the code.
> 
> In addition to specifying the entry point, each entry in the import map 
> could optionally list the "extras" that are required if that entry point is 
> used.
> It could also issue a 'require()' for the corresponding feature if it has 
> any additional requirements listed in the extras_require dictionary.
> 
> So, I'm thinking that this would be implemented with an entry_points.txt 
> file in .egg-info, but supplied in setup.py like this:
> 
>      setup(
>          ...
>          entry_points = {
>              "wsgi.app_factories": dict(
>                  feature1 = "somemodule:somefunction",
>                  feature2 = "another.module:SomeClass [extra1,extra2]",
>              ),
>              "mime.parsers": {
>                  "application/atom+xml": "something:atom_parser [feedparser]"
>              }
>          },
>          extras_require = dict(
>              feedparser = [...],
>              extra1 = [...],
>              extra2 = [...],
>          )
>      )
> 
> Anyway, this would make the most common use case for eggs-as-plugins very 
> easy: an application or framework would simply define entry points, and 
> plugin projects would declare the ones they offer in their setup script.
> 
> I think this is a fantastic idea and I'm about to leap into implementing 
> it.  :)
> 
> 
> >This puts complex middleware construction into the function that is
> >referenced.  This function might be, in turn, an import from a
> >framework.  Or it might be some complex setup specific to the
> >application.  Whatever.
> >
> >The API would look like:
> >
> >    wsgiapp = wsgiref.get_egg_application('SomeApplication[feature1]')
> >
> >Which ultimately resolves to:
> >
> >    wsgiapp = some_function()
> >
> >get_egg_application could also take a pkg_resources.Distribution object.
> 
> Yeah, I'm thinking that this could be implemented as something like:
> 
>      import pkg_resources
> 
>      def get_wsgi_app(project_name, app_name, *args, **kw):
>          dist = pkg_resources.require(project_name)[0]
>          return dist.load_entry_point('wsgi.app_factories', 
> app_name)(*args,**kw)
> 
> with all the heavy lifting happening in the pkg_resources.Distribution 
> class, along with maybe a new EntryPoint class (to handle parsing entry 
> point specifiers and to do the loading of them.
> 
> 
> >Open issues?  Yep, there's a bunch.  This requires the rest of the
> >configuration to be done quite lazily.
> 
> Not sure I follow you; the deployment descriptor could contain all the 
> configuration; see the Web-SIG post I made just previous to this one.
> 
> 
> >   But I can fit this into source
> >control; it is about *all* I can fit into source control (I can't have
> >any filenames, I can't have any installation-specific pipelines, I can't
> >have any other apps), but it is also enough that the deployment-specific
> >parts can avoid many complexities of pipelining and factories and all
> >that -- presumably the factory functions handle that.
> 
> +1.
> 
> 
> >   I don't think
> >this is useful without the other pieces (both in front of this
> >configuration file and behind it) but maybe we can think about what
> >those other pieces could look like.  I'm particularly open to
> >suggestions that some_function() take some arguments, but I don't know
> >what arguments.
> 
> At this point, I think this "entry points" concept weighs in favor of 
> having the deployment descriptor configuration values be Python 
> expressions, meaning that a WSGI application factory would accept keyword 
> arguments that can be whatever you like in order to configure it.
> 
> However, after more thought, I think that the "next application" argument 
> should be a keyword argument too, like 'wsgi_next' or some such.  This 
> would allow a factory to have required arguments in its signature, e.g.:
> 
>      def some_factory(required_arg_x, required_arg_y, optional_arg="foo", 
> ....):
>          ...
> 
> The problem with my original idea to have the "next app" be a positional 
> argument is that it would prevent non-middleware applications from having 
> any required arguments.
> 
> Anyway, I think we're now very close to being able to define a useful 
> deployment descriptor format for establishing pipelines and setting 
> options, that leaves open the possibility to do some very sophisticated 
> things.
> 
> Hm.  Interesting thought...  we could have a function to read a deployment 
> descriptor (from a string, stream, or filename) and then return the WSGI 
> application object.  You could then wrap this in a simple WSGI app that 
> does filesystem-based URL routing to serve up *.wsgi files from a 
> directory.  This would let you bootstrap a deployment capability into 
> existing WSGI servers, without them having to add their own support for 
> it!  Web servers and frameworks that have some kind of file extension 
> mapping mechanism could do this directly, of course.  I can envision 
> putting *.wsgi files in my web directories and then configuring Apache to 
> run them using either mod_python or FastCGI or even as a CGI, just by 
> tweaking local .htaccess files.  However, once you have Apache tweaked the 
> way you want, .wsgi files can be just dropped in and edited.
> 
> Of course, there are still some open design issues, like caching of .wsgi 
> files (e.g. should the file be checked for changes on each hit?  I guess 
> that could be a setting under "WSGI options", and would only work if the 
> descriptor parser was given an actual filename to load from.)
> 
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/chrism%40plope.com
> 


From ianb at colorstudy.com  Mon Jul 25 03:49:22 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 24 Jul 2005 20:49:22 -0500
Subject: [Web-SIG] Entry points and import maps (was Re:
 Scarecrow	deployment config
In-Reply-To: <1122251708.3650.241.camel@plope.dyndns.org>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<1122251708.3650.241.camel@plope.dyndns.org>
Message-ID: <42E44522.3090602@colorstudy.com>

Chris McDonough wrote:
> Sorry, I think I may have lost track of where we were going wrt the
> deployment spec.  Specifically, I don't know how we got to using eggs
> (which I'd really like to, BTW, they're awesome conceptually!) from
> where we were in the discussion about configuring a WSGI pipeline.  What
> is a "feature"?  What is an "import map"? "Entry point"?  Should I just
> get more familiar with eggs to understand what's being discussed here or
> did I miss a few posts?

It wouldn't hurt to read up on eggs.  It's not obvious how they fit 
here, and it's taken me a while to figure it out.  But specifically:

* Eggs are packages.  Packages can have optional features.  Those 
features can have additional requirements (external packages) that the 
base package does not have.  Package specifications are spelled like 
"PackageName>=VERSION_NUMBER[FeatureName]"

* Import maps and entry points are new things we're discussing now. 
They are kind of the same thing; basically an entry point maps a logical 
specification (like a 'wsgi.app_factory' named 'foo') to a actual import 
statement.  That's the configuration file:

   [wsgi.app_factory]
   app = mymodule.wsgi:make_app

Which means to get an object "app" which fulfills the spec 
"wsgi.app_factory" you would do "from mymodule.wsgi import make_app"

Eggs have an PackageName.egg-info directory, where configuration files 
can go, and pkg_resources (which is part of setuptools, and associated 
with easy_install, and defines the require() function) can find and 
parse them.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From pje at telecommunity.com  Mon Jul 25 04:08:41 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 24 Jul 2005 22:08:41 -0400
Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow
 deployment config
In-Reply-To: <42E431AE.6070204@colorstudy.com>
References: <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050724220544.026e50b0@mail.telecommunity.com>

At 07:26 PM 7/24/2005 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>The actual syntax I'm going to end up with is:
>>       entry_points = {
>>           "wsgi.app_factories": [
>>               "feature1 = somemodule:somefunction",
>>               "feature2 = another.module:SomeClass [extra1,extra2]",
>>           ]
>>       }
>
>That seems weird to put the assignment inside a string, instead of:
>
>entry_points = {
>   'wsgi.app_factories': {
>     'app': 'somemodule:somefunction',
>   },
>}

It turned out that EntryPoint objects really want to know their 'name' for 
ease of use in various APIs, and it also made it really easy to do stuff 
like "map(EntryPoint.parse, lines)" to get a list of entry points from a 
list of lines.


>Also, is there any default name?

Huh?


>   Like for a package that distributes only one application.  Or these 
> just different spellings for the same thing?

I don't understand you.  The most minimal way to specify a single entry 
point in setup() is with:

     entry_points = """
         [groupname.here]
         entryname = some.thing:here
     """


From ianb at colorstudy.com  Mon Jul 25 04:21:56 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 24 Jul 2005 21:21:56 -0500
Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow
 deployment config
In-Reply-To: <5.1.1.6.0.20050724220544.026e50b0@mail.telecommunity.com>
References: <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com>
	<5.1.1.6.0.20050724220544.026e50b0@mail.telecommunity.com>
Message-ID: <42E44CC4.4000807@colorstudy.com>

Phillip J. Eby wrote:
>>   Like for a package that distributes only one application.  Or these 
>> just different spellings for the same thing?
> 
> 
> I don't understand you.  The most minimal way to specify a single entry 
> point in setup() is with:
> 
>     entry_points = """
>         [groupname.here]
>         entryname = some.thing:here
>     """

Basically, in the (I think common) case where a package only provides 
one entry point, do we have to choose an arbitrary entry name.  Like, a 
package that implements one web application; it seems like that 
application would have to be named.  Maybe that name could match the 
package name, or a fixed name we agree upon, but otherwise it adds 
another name to the mix.


-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From pje at telecommunity.com  Mon Jul 25 04:24:28 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 24 Jul 2005 22:24:28 -0400
Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow
 deployment config
In-Reply-To: <1122251708.3650.241.camel@plope.dyndns.org>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>

At 08:35 PM 7/24/2005 -0400, Chris McDonough wrote:
>Sorry, I think I may have lost track of where we were going wrt the
>deployment spec.  Specifically, I don't know how we got to using eggs
>(which I'd really like to, BTW, they're awesome conceptually!) from
>where we were in the discussion about configuring a WSGI pipeline.  What
>is a "feature"?  What is an "import map"? "Entry point"?  Should I just
>get more familiar with eggs to understand what's being discussed here or
>did I miss a few posts?

I suggest this post as the shortest architectural introduction to the whole 
egg thang:

     http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html

It explains pretty much all of the terminology I'm currently using, except 
for the new terms invented today...

Entry points are a new concept, invented today by Ian and myself.  Ian 
proposed having a mapping file (which I dubbed an "import map") included in 
an egg's metadata, and then referring to named entries from a pipeline 
descriptor, so that you don't have to know or care about the exact name to 
import.  The application or middleware factory name would be looked up in 
the egg's import map in order to find the actual factory object.

I took Ian's proposal and did two things:

1) Generalized the idea to a concept of "entry points".  An entry point is 
a name that corresponds to an import specification, and an optional list of 
"extras" (see terminology link above) that the entry point may 
require.  Entry point names exist in a namespace called an "entry point 
group", and I implied that the WSGI deployment spec would define two such 
groups: wsgi.applications and wsgi.middleware, but a vast number of other 
possibilities for entry points and groups exist.  In fact, I went ahead and 
implemented them in setuptools today, and realized I could use them to 
register setup commands with setuptools, making it extensible by any 
project that registers entry points in a 'distutils.commands' group.

2) I then proposed that we extend our deployment descriptor (.wsgi file) 
syntax so that you can do things like:

     [foo from SomeProject]
     # configuration here

What this does is tell the WSGI deployment API to look up the "foo" entry 
point in either the wsgi.middleware or wsgi.applications entry point group 
for the named project, according to whether it's the last item in the .wsgi 
file.  It then invokes the factory as before, with the configuration values 
as keyword arguments.

This proposal is of course an *extension*; it should still be possible to 
use regular dotted names as section headings, if you haven't yet drunk the 
setuptools kool-aid.  But, it makes for interesting possibilities because 
we could now have a tool that reads a WSGI deployment descriptor and runs 
easy_install to find and download the right projects.  So, you could 
potentially just write up a descriptor that lists what you want and the 
server could install it, although I think I personally would want to run a 
tool explicitly; maybe I'll eventually add a --wsgi=FILENAME option to 
EasyInstall that would tell it to find out what to install from a WSGI 
deployment descriptor.

That would actually be pretty cool, when you realize it means that all you 
have to do to get an app deployed across a bunch of web servers is to copy 
the deployment descriptor and tell 'em to install stuff.  You can always 
create an NFS-mounted cache directory where you put pre-built eggs, and 
EasyInstall would just fetch and extract them in that case.

Whew.  Almost makes me wish I was back in my web apps shop, where this kind 
of thing would've been *really* useful to have.


From ianb at colorstudy.com  Mon Jul 25 04:33:53 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 24 Jul 2005 21:33:53 -0500
Subject: [Web-SIG] WSGI deployment part 2: factory API
Message-ID: <42E44F91.2040703@colorstudy.com>

OK, so lets assume we have a way (entry points) to get an object that 
represents the package's WSGI application, as a factory.  What do we do 
with that factory?  That is, how do we make an application out of the 
factory?  Well, it seems rather obvious that we call the factory, so 
what do we pass?  Also, consider that there might be two separate but 
similar APIs, one for filters and another for applications.

We could go free-form, and you call application factories with keyword 
arguments that are dependent on the application.  This serves as 
configuration.  You can call filter factories with keyword arguments, 
and one special (required?) keyword argument "next_app".

Another option is we pass in a single dictionary that represents the 
entire configuration.  This leaves room to add more arguments later, 
where if we use keyword arguments for configuration then there's really 
no room at all (the entire signature of the factory is taken up by 
application-specific configuration).

Another part of the API that I can see as useful is passing in the 
distribution object itself.  This way a function in paste (or wherever) 
could serve as the loader for any application with the proper 
framework-specific metadata (and so probably this could devolve into 
per-framework loaders).  This would perhaps preclude non-setuptools 
factories, though you could also pass in None for the distribution for 
those cases.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From pje at telecommunity.com  Mon Jul 25 04:33:31 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 24 Jul 2005 22:33:31 -0400
Subject: [Web-SIG] Entry points and import maps (was Re:
 Scarecrow	deployment config
In-Reply-To: <42E44522.3090602@colorstudy.com>
References: <1122251708.3650.241.camel@plope.dyndns.org>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<1122251708.3650.241.camel@plope.dyndns.org>
Message-ID: <5.1.1.6.0.20050724222434.02865820@mail.telecommunity.com>

At 08:49 PM 7/24/2005 -0500, Ian Bicking wrote:
>Chris McDonough wrote:
>>Sorry, I think I may have lost track of where we were going wrt the
>>deployment spec.  Specifically, I don't know how we got to using eggs
>>(which I'd really like to, BTW, they're awesome conceptually!) from
>>where we were in the discussion about configuring a WSGI pipeline.  What
>>is a "feature"?  What is an "import map"? "Entry point"?  Should I just
>>get more familiar with eggs to understand what's being discussed here or
>>did I miss a few posts?
>
>It wouldn't hurt to read up on eggs.  It's not obvious how they fit here, 
>and it's taken me a while to figure it out.  But specifically:
>
>* Eggs are packages.  Packages can have optional features.

I've taken to using the term "project" to mean a collection of packages, 
scripts, data files, etc., wrapped with a setup script.  In order to avoid 
confusion with other kinds of "features" and "options", the official term 
for those things is now "extras".  An "extra" is some optional capability 
of a project that may incur additional requirements.


>   Those features can have additional requirements (external packages) 
> that the base package does not have.  Package specifications are spelled 
> like "PackageName>=VERSION_NUMBER[FeatureName]"

Actually, it's "ProjectName[extra,...]>=version", and you can list multiple 
version operators, like "FooBar>1.2,<2.1,==2.6,>3.0" to mean versions 
between 1.2 and 2.1 exclusive, and anything *after* 3.0, but 2.6 was okay 
too.  :)

I'm proposing that for WSGI entry points, we allow everything but the 
[extras_list] in a section heading, e.g.:

     [wiki from FooBarWiki>=2.0]

would mean what it looks like it does.  By the way, all this version 
parsing, dependency checking, PyPI-finding, auto-download and build from 
source or binary stuff already exists; it's not a hypothetical 
pie-in-the-sky proposal.


>* Import maps and entry points are new things we're discussing now. They 
>are kind of the same thing; basically an entry point maps a logical 
>specification (like a 'wsgi.app_factory' named 'foo') to a actual import 
>statement.  That's the configuration file:
>
>   [wsgi.app_factory]
>   app = mymodule.wsgi:make_app
>
>Which means to get an object "app" which fulfills the spec 
>"wsgi.app_factory" you would do "from mymodule.wsgi import make_app"
>
>Eggs have an PackageName.egg-info directory, where configuration files can 
>go, and pkg_resources (which is part of setuptools, and associated with 
>easy_install, and defines the require() function) can find and parse them.

Yes, and with the CVS HEAD version of setuptools you can now specify a 
project's entry point map in it setup script, and it will generate the 
entry point file in the project's .egg-info directory, and parse it at 
runtime when you request lookup of an entry point.  There's an API in 
pkg_resources that lets you do:

     factory = load_entry_point("ProjectName", "wsgi.app_factory", "app")

which will do the same as if you had said "from mymodule.wsgi import 
make_app as factory".


From pje at telecommunity.com  Mon Jul 25 05:06:32 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 24 Jul 2005 23:06:32 -0400
Subject: [Web-SIG] WSGI deployment part 2: factory API
In-Reply-To: <42E44F91.2040703@colorstudy.com>
Message-ID: <5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com>

At 09:33 PM 7/24/2005 -0500, Ian Bicking wrote:
>We could go free-form, and you call application factories with keyword
>arguments that are dependent on the application.  This serves as
>configuration.  You can call filter factories with keyword arguments,
>and one special (required?) keyword argument "next_app".

I think we can just go positional on the next-app argument, since only 
filter factories can do anything with it.


>Another option is we pass in a single dictionary that represents the
>entire configuration.  This leaves room to add more arguments later,
>where if we use keyword arguments for configuration then there's really
>no room at all (the entire signature of the factory is taken up by
>application-specific configuration).

YAGNI; We don't have any place for this theoretical extra configuration to 
come from, and no use cases that can't be met by just adding it to the 
configuration.  Early error trapping is important, so I think it's better 
to let factories use normal Python argument validation to have required 
arguments, optional values, and to reject unrecognized arguments.


>Another part of the API that I can see as useful is passing in the
>distribution object itself.

Which distribution?  The one the entry point came from?  It already knows 
(or can find out) what distribution it's in.


>   This way a function in paste (or wherever)
>could serve as the loader for any application with the proper
>framework-specific metadata (and so probably this could devolve into
>per-framework loaders).

I don't understand.


>   This would perhaps preclude non-setuptools
>factories, though you could also pass in None for the distribution for
>those cases.

Huh?

I propose that we allow import specs as factory designators, so that the 
default case works fine without setuptools.  You only need setuptools if 
you use factory specs of the form "[feature from Project...]".  Of course, 
they're so cool that everybody will *want* to use them...  ;)


From ianb at colorstudy.com  Mon Jul 25 05:26:29 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 24 Jul 2005 22:26:29 -0500
Subject: [Web-SIG] WSGI deployment part 2: factory API
In-Reply-To: <5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com>
References: <5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com>
Message-ID: <42E45BE5.9050807@colorstudy.com>

Phillip J. Eby wrote:
>> Another option is we pass in a single dictionary that represents the
>> entire configuration.  This leaves room to add more arguments later,
>> where if we use keyword arguments for configuration then there's really
>> no room at all (the entire signature of the factory is taken up by
>> application-specific configuration).
> 
> 
> YAGNI; We don't have any place for this theoretical extra configuration 
> to come from, and no use cases that can't be met by just adding it to 
> the configuration.  Early error trapping is important, so I think it's 
> better to let factories use normal Python argument validation to have 
> required arguments, optional values, and to reject unrecognized arguments.

I think in practice I'll always take **kw, because I otherwise I'd have 
to enumerate all the configuration all the middleware takes, and that's 
impractical.  I suppose I could later assemble the middleware, determine 
what configuration the actual set of middleware+application takes, then 
check for extras.  But I doubt I will.  And even if I do, it's 
incidental -- I'm quite sure I won't use using the function signature 
for parameter checking.

>> Another part of the API that I can see as useful is passing in the
>> distribution object itself.
> 
> 
> Which distribution?  The one the entry point came from?  It already 
> knows (or can find out) what distribution it's in.

I mean like:

   [wsgi.app_factory]
   filebrowser = paste.wareweb:make_app

Where paste.wareweb.make_app knows how to build an application from 
filename conventions in the package itself, even though the 
paste.wareweb module isn't in the project itself.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From chrism at plope.com  Mon Jul 25 08:33:43 2005
From: chrism at plope.com (Chris McDonough)
Date: Mon, 25 Jul 2005 02:33:43 -0400
Subject: [Web-SIG] Entry points and import maps (was Re:
	Scarecrow	deployment config
In-Reply-To: <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>
Message-ID: <1122273223.8767.18.camel@localhost.localdomain>

Thanks...

I'm still confused about high level requirements so please try to be
patient with me as I try get back on track.

These are the requirements as I understand them:

1.  We want to be able to distribute WSGI applications and middleware
    (presumably in a format supported by setuptools).

3.  We want to be able to configure a WSGI application in order
    to create an application instance.

2.  We want a way to combine configured instances of those
    applications into pipelines and start an "instance" of a pipeline.

Are these requirements the ones being discussed?  If so, which of the
config file formats we've been discussing matches which requirement?

Thanks,

- C

On Sun, 2005-07-24 at 22:24 -0400, Phillip J. Eby wrote:
> At 08:35 PM 7/24/2005 -0400, Chris McDonough wrote:
> >Sorry, I think I may have lost track of where we were going wrt the
> >deployment spec.  Specifically, I don't know how we got to using eggs
> >(which I'd really like to, BTW, they're awesome conceptually!) from
> >where we were in the discussion about configuring a WSGI pipeline.  What
> >is a "feature"?  What is an "import map"? "Entry point"?  Should I just
> >get more familiar with eggs to understand what's being discussed here or
> >did I miss a few posts?
> 
> I suggest this post as the shortest architectural introduction to the whole 
> egg thang:
> 
>      http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html
> 
> It explains pretty much all of the terminology I'm currently using, except 
> for the new terms invented today...
> 
> Entry points are a new concept, invented today by Ian and myself.  Ian 
> proposed having a mapping file (which I dubbed an "import map") included in 
> an egg's metadata, and then referring to named entries from a pipeline 
> descriptor, so that you don't have to know or care about the exact name to 
> import.  The application or middleware factory name would be looked up in 
> the egg's import map in order to find the actual factory object.
> 
> I took Ian's proposal and did two things:
> 
> 1) Generalized the idea to a concept of "entry points".  An entry point is 
> a name that corresponds to an import specification, and an optional list of 
> "extras" (see terminology link above) that the entry point may 
> require.  Entry point names exist in a namespace called an "entry point 
> group", and I implied that the WSGI deployment spec would define two such 
> groups: wsgi.applications and wsgi.middleware, but a vast number of other 
> possibilities for entry points and groups exist.  In fact, I went ahead and 
> implemented them in setuptools today, and realized I could use them to 
> register setup commands with setuptools, making it extensible by any 
> project that registers entry points in a 'distutils.commands' group.
> 
> 2) I then proposed that we extend our deployment descriptor (.wsgi file) 
> syntax so that you can do things like:
> 
>      [foo from SomeProject]
>      # configuration here
> 
> What this does is tell the WSGI deployment API to look up the "foo" entry 
> point in either the wsgi.middleware or wsgi.applications entry point group 
> for the named project, according to whether it's the last item in the .wsgi 
> file.  It then invokes the factory as before, with the configuration values 
> as keyword arguments.
> 
> This proposal is of course an *extension*; it should still be possible to 
> use regular dotted names as section headings, if you haven't yet drunk the 
> setuptools kool-aid.  But, it makes for interesting possibilities because 
> we could now have a tool that reads a WSGI deployment descriptor and runs 
> easy_install to find and download the right projects.  So, you could 
> potentially just write up a descriptor that lists what you want and the 
> server could install it, although I think I personally would want to run a 
> tool explicitly; maybe I'll eventually add a --wsgi=FILENAME option to 
> EasyInstall that would tell it to find out what to install from a WSGI 
> deployment descriptor.
> 
> That would actually be pretty cool, when you realize it means that all you 
> have to do to get an app deployed across a bunch of web servers is to copy 
> the deployment descriptor and tell 'em to install stuff.  You can always 
> create an NFS-mounted cache directory where you put pre-built eggs, and 
> EasyInstall would just fetch and extract them in that case.
> 
> Whew.  Almost makes me wish I was back in my web apps shop, where this kind 
> of thing would've been *really* useful to have.
> 


From chrism at plope.com  Mon Jul 25 08:40:49 2005
From: chrism at plope.com (Chris McDonough)
Date: Mon, 25 Jul 2005 02:40:49 -0400
Subject: [Web-SIG] Entry points and import maps (was
	Re:	Scarecrow	deployment config
In-Reply-To: <1122273223.8767.18.camel@localhost.localdomain>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>
	<1122273223.8767.18.camel@localhost.localdomain>
Message-ID: <1122273649.8767.25.camel@localhost.localdomain>

BTW, a simple example that includes proposed solutions for all of these
requirements would go a long way towards helping me (and maybe others)
understand how all the pieces fit together.  Maybe something like:

- Define two simple WSGI components:  a WSGI middleware and a WSGI
  application.

- Describe how to package each as an indpendent egg.

- Describe how to configure an instance of the application.

- Describe how to configure an instance of the middleware

- Describe how to string them together into a pipeline.

- C


On Mon, 2005-07-25 at 02:33 -0400, Chris McDonough wrote:
> Thanks...
> 
> I'm still confused about high level requirements so please try to be
> patient with me as I try get back on track.
> 
> These are the requirements as I understand them:
> 
> 1.  We want to be able to distribute WSGI applications and middleware
>     (presumably in a format supported by setuptools).
> 
> 3.  We want to be able to configure a WSGI application in order
>     to create an application instance.
> 
> 2.  We want a way to combine configured instances of those
>     applications into pipelines and start an "instance" of a pipeline.
> 
> Are these requirements the ones being discussed?  If so, which of the
> config file formats we've been discussing matches which requirement?
> 
> Thanks,
> 
> - C
> 
> On Sun, 2005-07-24 at 22:24 -0400, Phillip J. Eby wrote:
> > At 08:35 PM 7/24/2005 -0400, Chris McDonough wrote:
> > >Sorry, I think I may have lost track of where we were going wrt the
> > >deployment spec.  Specifically, I don't know how we got to using eggs
> > >(which I'd really like to, BTW, they're awesome conceptually!) from
> > >where we were in the discussion about configuring a WSGI pipeline.  What
> > >is a "feature"?  What is an "import map"? "Entry point"?  Should I just
> > >get more familiar with eggs to understand what's being discussed here or
> > >did I miss a few posts?
> > 
> > I suggest this post as the shortest architectural introduction to the whole 
> > egg thang:
> > 
> >      http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html
> > 
> > It explains pretty much all of the terminology I'm currently using, except 
> > for the new terms invented today...
> > 
> > Entry points are a new concept, invented today by Ian and myself.  Ian 
> > proposed having a mapping file (which I dubbed an "import map") included in 
> > an egg's metadata, and then referring to named entries from a pipeline 
> > descriptor, so that you don't have to know or care about the exact name to 
> > import.  The application or middleware factory name would be looked up in 
> > the egg's import map in order to find the actual factory object.
> > 
> > I took Ian's proposal and did two things:
> > 
> > 1) Generalized the idea to a concept of "entry points".  An entry point is 
> > a name that corresponds to an import specification, and an optional list of 
> > "extras" (see terminology link above) that the entry point may 
> > require.  Entry point names exist in a namespace called an "entry point 
> > group", and I implied that the WSGI deployment spec would define two such 
> > groups: wsgi.applications and wsgi.middleware, but a vast number of other 
> > possibilities for entry points and groups exist.  In fact, I went ahead and 
> > implemented them in setuptools today, and realized I could use them to 
> > register setup commands with setuptools, making it extensible by any 
> > project that registers entry points in a 'distutils.commands' group.
> > 
> > 2) I then proposed that we extend our deployment descriptor (.wsgi file) 
> > syntax so that you can do things like:
> > 
> >      [foo from SomeProject]
> >      # configuration here
> > 
> > What this does is tell the WSGI deployment API to look up the "foo" entry 
> > point in either the wsgi.middleware or wsgi.applications entry point group 
> > for the named project, according to whether it's the last item in the .wsgi 
> > file.  It then invokes the factory as before, with the configuration values 
> > as keyword arguments.
> > 
> > This proposal is of course an *extension*; it should still be possible to 
> > use regular dotted names as section headings, if you haven't yet drunk the 
> > setuptools kool-aid.  But, it makes for interesting possibilities because 
> > we could now have a tool that reads a WSGI deployment descriptor and runs 
> > easy_install to find and download the right projects.  So, you could 
> > potentially just write up a descriptor that lists what you want and the 
> > server could install it, although I think I personally would want to run a 
> > tool explicitly; maybe I'll eventually add a --wsgi=FILENAME option to 
> > EasyInstall that would tell it to find out what to install from a WSGI 
> > deployment descriptor.
> > 
> > That would actually be pretty cool, when you realize it means that all you 
> > have to do to get an app deployed across a bunch of web servers is to copy 
> > the deployment descriptor and tell 'em to install stuff.  You can always 
> > create an NFS-mounted cache directory where you put pre-built eggs, and 
> > EasyInstall would just fetch and extract them in that case.
> > 
> > Whew.  Almost makes me wish I was back in my web apps shop, where this kind 
> > of thing would've been *really* useful to have.
> > 
> 
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/chrism%40plope.com
> 


From chrism at plope.com  Mon Jul 25 09:02:27 2005
From: chrism at plope.com (Chris McDonough)
Date: Mon, 25 Jul 2005 03:02:27 -0400
Subject: [Web-SIG] Entry points and import maps
	(was	Re:	Scarecrow	deployment config
In-Reply-To: <1122273649.8767.25.camel@localhost.localdomain>
References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>
	<1122273223.8767.18.camel@localhost.localdomain>
	<1122273649.8767.25.camel@localhost.localdomain>
Message-ID: <1122274948.8767.40.camel@localhost.localdomain>

Actually, let me give this a shot.

We package up an egg called helloworld.egg.  It happens to contain
something that can be used as a WSGI component.  Let's say it's a WSGI
application that always returns 'Hello World'.  And let's say it also
contains middleware that lowercases anything that passes through before
it's returned.

The implementations of these components could be as follows:

class HelloWorld:
    def __init__(self, app, **kw):
        pass # nothing to configure

    def __call__(self, environ, start_response):
        start_response('200 OK', [])
        return ['Hello World']

class Lowercaser:
    def __init__(self, app, **kw):
        self.app = app
        # nothing else to configure

    def __call__(self, environ, start_response):
        for chunk in self.app(environ, start_response):
            yield chunk.lower()

An import map would ship inside of the egg-info dir:

[wsgi.app_factories]
helloworld = helloworld:HelloWorld
lowercaser = helloworld:Lowercaser

So we install the egg and this does nothing except allow it to be used
from within Python.
  
But when we create a "deployment descriptor" like so in a text editor:

[helloworld from helloworld]

[lowercaser from helloworld]

... and run some "starter" script that parses that as a pipeline,
creates the two instances, wires them together, and we get a running
pipeline?

Am I on track?

OK, back to Battlestar Galactica ;-)


On Mon, 2005-07-25 at 02:40 -0400, Chris McDonough wrote:
> BTW, a simple example that includes proposed solutions for all of these
> requirements would go a long way towards helping me (and maybe others)
> understand how all the pieces fit together.  Maybe something like:
> 
> - Define two simple WSGI components:  a WSGI middleware and a WSGI
>   application.
> 
> - Describe how to package each as an indpendent egg.
> 
> - Describe how to configure an instance of the application.
> 
> - Describe how to configure an instance of the middleware
> 
> - Describe how to string them together into a pipeline.
> 
> - C
> 
> 
> On Mon, 2005-07-25 at 02:33 -0400, Chris McDonough wrote:
> > Thanks...
> > 
> > I'm still confused about high level requirements so please try to be
> > patient with me as I try get back on track.
> > 
> > These are the requirements as I understand them:
> > 
> > 1.  We want to be able to distribute WSGI applications and middleware
> >     (presumably in a format supported by setuptools).
> > 
> > 3.  We want to be able to configure a WSGI application in order
> >     to create an application instance.
> > 
> > 2.  We want a way to combine configured instances of those
> >     applications into pipelines and start an "instance" of a pipeline.
> > 
> > Are these requirements the ones being discussed?  If so, which of the
> > config file formats we've been discussing matches which requirement?
> > 
> > Thanks,
> > 
> > - C
> > 
> > On Sun, 2005-07-24 at 22:24 -0400, Phillip J. Eby wrote:
> > > At 08:35 PM 7/24/2005 -0400, Chris McDonough wrote:
> > > >Sorry, I think I may have lost track of where we were going wrt the
> > > >deployment spec.  Specifically, I don't know how we got to using eggs
> > > >(which I'd really like to, BTW, they're awesome conceptually!) from
> > > >where we were in the discussion about configuring a WSGI pipeline.  What
> > > >is a "feature"?  What is an "import map"? "Entry point"?  Should I just
> > > >get more familiar with eggs to understand what's being discussed here or
> > > >did I miss a few posts?
> > > 
> > > I suggest this post as the shortest architectural introduction to the whole 
> > > egg thang:
> > > 
> > >      http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html
> > > 
> > > It explains pretty much all of the terminology I'm currently using, except 
> > > for the new terms invented today...
> > > 
> > > Entry points are a new concept, invented today by Ian and myself.  Ian 
> > > proposed having a mapping file (which I dubbed an "import map") included in 
> > > an egg's metadata, and then referring to named entries from a pipeline 
> > > descriptor, so that you don't have to know or care about the exact name to 
> > > import.  The application or middleware factory name would be looked up in 
> > > the egg's import map in order to find the actual factory object.
> > > 
> > > I took Ian's proposal and did two things:
> > > 
> > > 1) Generalized the idea to a concept of "entry points".  An entry point is 
> > > a name that corresponds to an import specification, and an optional list of 
> > > "extras" (see terminology link above) that the entry point may 
> > > require.  Entry point names exist in a namespace called an "entry point 
> > > group", and I implied that the WSGI deployment spec would define two such 
> > > groups: wsgi.applications and wsgi.middleware, but a vast number of other 
> > > possibilities for entry points and groups exist.  In fact, I went ahead and 
> > > implemented them in setuptools today, and realized I could use them to 
> > > register setup commands with setuptools, making it extensible by any 
> > > project that registers entry points in a 'distutils.commands' group.
> > > 
> > > 2) I then proposed that we extend our deployment descriptor (.wsgi file) 
> > > syntax so that you can do things like:
> > > 
> > >      [foo from SomeProject]
> > >      # configuration here
> > > 
> > > What this does is tell the WSGI deployment API to look up the "foo" entry 
> > > point in either the wsgi.middleware or wsgi.applications entry point group 
> > > for the named project, according to whether it's the last item in the .wsgi 
> > > file.  It then invokes the factory as before, with the configuration values 
> > > as keyword arguments.
> > > 
> > > This proposal is of course an *extension*; it should still be possible to 
> > > use regular dotted names as section headings, if you haven't yet drunk the 
> > > setuptools kool-aid.  But, it makes for interesting possibilities because 
> > > we could now have a tool that reads a WSGI deployment descriptor and runs 
> > > easy_install to find and download the right projects.  So, you could 
> > > potentially just write up a descriptor that lists what you want and the 
> > > server could install it, although I think I personally would want to run a 
> > > tool explicitly; maybe I'll eventually add a --wsgi=FILENAME option to 
> > > EasyInstall that would tell it to find out what to install from a WSGI 
> > > deployment descriptor.
> > > 
> > > That would actually be pretty cool, when you realize it means that all you 
> > > have to do to get an app deployed across a bunch of web servers is to copy 
> > > the deployment descriptor and tell 'em to install stuff.  You can always 
> > > create an NFS-mounted cache directory where you put pre-built eggs, and 
> > > EasyInstall would just fetch and extract them in that case.
> > > 
> > > Whew.  Almost makes me wish I was back in my web apps shop, where this kind 
> > > of thing would've been *really* useful to have.
> > > 
> > 
> > _______________________________________________
> > Web-SIG mailing list
> > Web-SIG at python.org
> > Web SIG: http://www.python.org/sigs/web-sig
> > Unsubscribe: http://mail.python.org/mailman/options/web-sig/chrism%40plope.com
> > 
> 
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/chrism%40plope.com
> 


From pje at telecommunity.com  Mon Jul 25 16:39:48 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 25 Jul 2005 10:39:48 -0400
Subject: [Web-SIG] Entry points and import maps (was
 Re:	Scarecrow	deployment config
In-Reply-To: <1122274948.8767.40.camel@localhost.localdomain>
References: <1122273649.8767.25.camel@localhost.localdomain>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>
	<1122273223.8767.18.camel@localhost.localdomain>
	<1122273649.8767.25.camel@localhost.localdomain>
Message-ID: <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com>

At 03:02 AM 7/25/2005 -0400, Chris McDonough wrote:
>Actually, let me give this a shot.
>
>We package up an egg called helloworld.egg.  It happens to contain
>something that can be used as a WSGI component.  Let's say it's a WSGI
>application that always returns 'Hello World'.  And let's say it also
>contains middleware that lowercases anything that passes through before
>it's returned.
>
>The implementations of these components could be as follows:
>
>class HelloWorld:
>     def __init__(self, app, **kw):
>         pass # nothing to configure
>
>     def __call__(self, environ, start_response):
>         start_response('200 OK', [])
>         return ['Hello World']

I'm thinking that an application like this wouldn't take an 'app' 
constuctor parameter, and if it takes no configuration parameters it 
doesn't need **kw, but good so far.


>class Lowercaser:
>     def __init__(self, app, **kw):
>         self.app = app
>         # nothing else to configure
>
>     def __call__(self, environ, start_response):
>         for chunk in self.app(environ, start_response):
>             yield chunk.lower()

Again, no need for **kw if it doesn't take any configuration, but okay.


>An import map would ship inside of the egg-info dir:
>
>[wsgi.app_factories]
>helloworld = helloworld:HelloWorld
>lowercaser = helloworld:Lowercaser

I'm thinking it would be more like:

     [wsgi.middleware]
     lowercaser = helloworld:Lowercaser

     [wsgi.apps]
     helloworld = helloworld:HelloWorld

and you'd specify it in the setup script as something like this:

     setup(
         #...
         entry_points = {
             'wsgi.apps': ['helloworld = helloworld:HelloWorld']
             'wsgi.middleware': ['lowercaser = helloworld:Lowercaser']
         }
     )

(And the CVS version of setuptools already supports this.)


>So we install the egg and this does nothing except allow it to be used
>from within Python.
>
>But when we create a "deployment descriptor" like so in a text editor:
>
>[helloworld from helloworld]
>
>[lowercaser from helloworld]

Opposite order, though; the lowercaser comes first because it's the 
middleware; the application would always come last, because they're listed 
in the order in which they receive data, just like a pipes-and-filters 
command line.


>... and run some "starter" script that parses that as a pipeline,

... possibly using a #! line if you're using CGI or FastCGI with Apache or 
some other non-Python webserver.


>creates the two instances, wires them together, and we get a running
>pipeline?
>
>Am I on track?

Definitely.


From pje at telecommunity.com  Mon Jul 25 16:40:49 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 25 Jul 2005 10:40:49 -0400
Subject: [Web-SIG] WSGI deployment part 2: factory API
In-Reply-To: <42E45BE5.9050807@colorstudy.com>
References: <5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com>
	<5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050724233258.026e69a8@mail.telecommunity.com>

At 10:26 PM 7/24/2005 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>>Another option is we pass in a single dictionary that represents the
>>>entire configuration.  This leaves room to add more arguments later,
>>>where if we use keyword arguments for configuration then there's really
>>>no room at all (the entire signature of the factory is taken up by
>>>application-specific configuration).
>>
>>YAGNI; We don't have any place for this theoretical extra configuration 
>>to come from, and no use cases that can't be met by just adding it to the 
>>configuration.  Early error trapping is important, so I think it's better 
>>to let factories use normal Python argument validation to have required 
>>arguments, optional values, and to reject unrecognized arguments.
>
>I think in practice I'll always take **kw, because I otherwise I'd have to 
>enumerate all the configuration all the middleware takes, and that's 
>impractical.  I suppose I could later assemble the middleware, determine 
>what configuration the actual set of middleware+application takes, then 
>check for extras.  But I doubt I will.  And even if I do, it's incidental 
>-- I'm quite sure I won't use using the function signature for parameter 
>checking.

Well, I'm sure I will for simple things.  For more complex things, I'll use 
the pattern of checking **kw against class attributes to make sure they 
exist.  PEAK, for example, already has this ability built-in, so it's 
definitely the path of least resistance for implementing a middleware 
component in PEAK; just subclass binding.Component and add attribute 
bindings for everything needed.  I'd hate to give that up for a theoretical 
argument that someday we might need some kind of arguments that aren't 
arguments.  It's not as if we couldn't define a new protocol, and a 
different modifier in the deployment descriptor, if that day ever actually 
arrived.


>>>Another part of the API that I can see as useful is passing in the
>>>distribution object itself.
>>
>>Which distribution?  The one the entry point came from?  It already knows 
>>(or can find out) what distribution it's in.
>
>I mean like:
>
>   [wsgi.app_factory]
>   filebrowser = paste.wareweb:make_app
>
>Where paste.wareweb.make_app knows how to build an application from 
>filename conventions in the package itself, even though the paste.wareweb 
>module isn't in the project itself.

Oh.  I think I get you now; you want to be able to define an entry point 
that wraps itself in something else.  I don't see though why I can't just 
put the wrapper code in myself, like this:

     def my_app(*args, **kw):
         return paste.wareweb.make_app(
             pkg_resources.get_provider(__name__), *args, **kw
         )

And then just make the entry point refer to this.  Or, if you want to be fancy:

     my_app = paste.wareweb.app_maker(__name__)

This seems more than sufficient for the use case.


From chrism at plope.com  Mon Jul 25 18:59:22 2005
From: chrism at plope.com (Chris McDonough)
Date: Mon, 25 Jul 2005 12:59:22 -0400
Subject: [Web-SIG] Entry points and import maps
	(was	Re:	Scarecrow	deployment config
In-Reply-To: <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com>
References: <1122273649.8767.25.camel@localhost.localdomain>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>
	<1122273223.8767.18.camel@localhost.localdomain>
	<1122273649.8767.25.camel@localhost.localdomain>
	<5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com>
Message-ID: <1122310762.3898.26.camel@plope.dyndns.org>

Great.  Given that, I've created the beginnings of a more formal
specification:

WSGI Deployment Specification
-----------------------------

  I use the term "WSGI component" in here as shorthand to indicate all
  types of WSGI implementations (application, middleware).

  The primary deployment concern is to create a way to specify the
  configuration of an instance of a WSGI component within a
  declarative configuration file.  A secondary deployment concern is
  to create a way to "wire up" components together into a specific
  deployable "pipeline".

Pipeline Descriptors
--------------------

  Pipeline descriptors are file representations of a particular WSGI
  "pipeline".  They include enough information to configure,
  instantiate, and wire together WSGI apps and middleware components
  into one pipeline for use by a WSGI server.  Installation of the
  software which composes those components is handled separately.

  In order to define a pipeline, we use a ".ini"-format configuration
  file conventionally named '<something>.wsgi'.  This file may
  optionally be marked as executable and associated with a simple UNIX
  interpreter via a leading hash-bang line to allow servers which
  employ stdin and stdout streams (ala CGI) to run the pipeline
  directly without any intermediation.  For example, a deployment
  descriptor named 'myapplication.wsgi' might be composed of the
  following text::

    #!/usr/bin/runwsgi

    [mypackage.mymodule.factory1]
    quux = arbitraryvalue
    eekx = arbitraryvalue

    [mypackage.mymodule.factory2]
    foo = arbitraryvalue
    bar = arbitraryvalue

  Section names are Python-dotted-path names (or setuptools "entry
  point names" described in a later section) which represent
  factories.  Key-value pairs within a given section are used as
  keyword arguments to the factory that can be used as configuration
  for the component being instantiated.

  All sections in the deployment descriptor describe 'middleware'
  except for the last section, which must describe an application.

  Factories which construct middleware must return something which is
  a WSGI "callable" by implementing the following API::

     def factory(next_app, [**kw]):
         """ next_app is the next application in the WSGI pipeline,
         **kw is optional, and accepts the key-value pairs
         that are used in the section as a dictionary, used
         for configuration """

  Factories which construct middleware must return something which is
  a WSGI "callable" by implementing the following API::

     def factory([**kw]):
         """" **kw is optional, and accepts the key-value pairs
          that are used in the section as a dictionary, used
          for configuration """

  A deployment descriptor can also be parsed from within Python.  An
  importable configurator which resides in 'wsgiref' exposes a
  function that accepts a single argument, "configure"::

    >>> from wsgiref.runwsgi import parse_deployment
    >>> appchain = parse_deployment('myapplication.wsgi')

  'appchain' will be an object representing the fully configured
  "pipeline".  'parse_deployment' is guaranteed to return something
  that implements the WSGI "callable" API described in PEP 333.

Entry Points

  <description of setuptools entry points goes here>


On Mon, 2005-07-25 at 10:39 -0400, Phillip J. Eby wrote:
> At 03:02 AM 7/25/2005 -0400, Chris McDonough wrote:
> >Actually, let me give this a shot.
> >
> >We package up an egg called helloworld.egg.  It happens to contain
> >something that can be used as a WSGI component.  Let's say it's a WSGI
> >application that always returns 'Hello World'.  And let's say it also
> >contains middleware that lowercases anything that passes through before
> >it's returned.
> >
> >The implementations of these components could be as follows:
> >
> >class HelloWorld:
> >     def __init__(self, app, **kw):
> >         pass # nothing to configure
> >
> >     def __call__(self, environ, start_response):
> >         start_response('200 OK', [])
> >         return ['Hello World']
> 
> I'm thinking that an application like this wouldn't take an 'app' 
> constuctor parameter, and if it takes no configuration parameters it 
> doesn't need **kw, but good so far.
> 
> 
> >class Lowercaser:
> >     def __init__(self, app, **kw):
> >         self.app = app
> >         # nothing else to configure
> >
> >     def __call__(self, environ, start_response):
> >         for chunk in self.app(environ, start_response):
> >             yield chunk.lower()
> 
> Again, no need for **kw if it doesn't take any configuration, but okay.
> 
> 
> >An import map would ship inside of the egg-info dir:
> >
> >[wsgi.app_factories]
> >helloworld = helloworld:HelloWorld
> >lowercaser = helloworld:Lowercaser
> 
> I'm thinking it would be more like:
> 
>      [wsgi.middleware]
>      lowercaser = helloworld:Lowercaser
> 
>      [wsgi.apps]
>      helloworld = helloworld:HelloWorld
> 
> and you'd specify it in the setup script as something like this:
> 
>      setup(
>          #...
>          entry_points = {
>              'wsgi.apps': ['helloworld = helloworld:HelloWorld']
>              'wsgi.middleware': ['lowercaser = helloworld:Lowercaser']
>          }
>      )
> 
> (And the CVS version of setuptools already supports this.)
> 
> 
> 
> >So we install the egg and this does nothing except allow it to be used
> >from within Python.
> >
> >But when we create a "deployment descriptor" like so in a text editor:
> >
> >[helloworld from helloworld]
> >
> >[lowercaser from helloworld]
> 
> Opposite order, though; the lowercaser comes first because it's the 
> middleware; the application would always come last, because they're listed 
> in the order in which they receive data, just like a pipes-and-filters 
> command line.
> 
> 
> >... and run some "starter" script that parses that as a pipeline,
> 
> ... possibly using a #! line if you're using CGI or FastCGI with Apache or 
> some other non-Python webserver.
> 
> 
> >creates the two instances, wires them together, and we get a running
> >pipeline?
> >
> >Am I on track?
> 
> Definitely.
> 


From pje at telecommunity.com  Mon Jul 25 19:35:11 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 25 Jul 2005 13:35:11 -0400
Subject: [Web-SIG] Entry points and import maps (was
 Re:	Scarecrow	deployment config
In-Reply-To: <1122310762.3898.26.camel@plope.dyndns.org>
References: <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com>
	<1122273649.8767.25.camel@localhost.localdomain>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>
	<1122273223.8767.18.camel@localhost.localdomain>
	<1122273649.8767.25.camel@localhost.localdomain>
	<5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050725131252.02789400@mail.telecommunity.com>

At 12:59 PM 7/25/2005 -0400, Chris McDonough wrote:
>   In order to define a pipeline, we use a ".ini"-format configuration

Ultimately I think the spec will need a formal description of what that 
means exactly, including such issues as a PEP 263-style "encoding" 
specifier, and the precise format of values.  But I'm fine with adding all 
that myself, since I'm going to have to specify it well enough to create a 
parser anyway.

With respect to the format, I'm actually leaning towards either treating 
the settings as Python assignment statements (syntactically speaking) or 
restricting the values to being single-token Python literals (i.e., 
numbers, strings, or True/False/None, but not tuples, lists, or other 
expressions).

Interestingly enough, I think you could actually define the entire format 
in terms of standard Python-language tokens, it's just the higher-level 
syntax that differs from Python's.  Although actually using the "tokenize" 
module to scan it would mean that all lines' content would need to start 
exactly at the left margin, with no indentation.  Probably not a big deal, 
though.

The syntax would probably be something like:

   pipeline    ::= section*
   section     ::= heading assignment*
   heading     ::= '[' qname trailer ']' NEWLINE
   assignment  ::= NAME '=' value NEWLINE

   qname       ::= NAME ('.' NAME) *
   trailer     ::= "from" requirement | "options"
   value       ::= NUMBER | STRING | "True" | "False" | "None"
   requirement ::= NAME versionlist?
   versionlist ::= versionspec (',' versionspec)*
   versionspec ::= relop STRING
   relop       ::= "<" | "<=" | "==" | "!=" | ">=" | ">"

The versions would have to be strings in order to avoid problems parsing 
e.g '2.1a4' as a number.  And if we were going to allow structures like 
tuples or lists or dictionaries, then we'd need to expand on 'value' a 
little bit, but not as much as if we allowed arbitrary expressions.


>   file conventionally named '<something>.wsgi'.  This file may
>   optionally be marked as executable and associated with a simple UNIX
>   interpreter via a leading hash-bang line to allow servers which
>   employ stdin and stdout streams (ala CGI) to run the pipeline
>   directly without any intermediation.

For that matter, while doing development and testing, the interpreter could 
be something like "#!invoke peak launch wsgifile", to launch the app in a 
web browser from a localhost http server.  (Assuming I added a "wsgifile" 
command to PEAK, of course.)


>   Factories which construct middleware must return something which is
>   a WSGI "callable" by implementing the following API::
>
>      def factory(next_app, [**kw]):
>          """ next_app is the next application in the WSGI pipeline,
>          **kw is optional, and accepts the key-value pairs
>          that are used in the section as a dictionary, used
>          for configuration """

Note that you can also just list the parameter names you take, or no 
parameter names at all.  I don't want to imply that you *have* to use kw, 
because it's fairly easy to envision simple middleware components that only 
take two or three parameters, or maybe even just one (e.g., their config 
file name).


>   Factories which construct middleware must return something which is
>   a WSGI "callable" by implementing the following API::

You probably meant "application" or "terminal application" here.  (Or 
whatever term we end up with for an application that isn't middleware.


>   A deployment descriptor can also be parsed from within Python.  An
>   importable configurator which resides in 'wsgiref' exposes a
>   function that accepts a single argument, "configure"::
>
>     >>> from wsgiref.runwsgi import parse_deployment
>     >>> appchain = parse_deployment('myapplication.wsgi')
>
>   'appchain' will be an object representing the fully configured
>   "pipeline".  'parse_deployment' is guaranteed to return something
>   that implements the WSGI "callable" API described in PEP 333.

Or raise SyntaxError for a malformed descriptor file, or ImportError if an 
application import failed or an entry point couldn't be found, or 
DistributionNotFound if a needed egg couldn't be found, or VersionConflict 
if it needs a conflicting version.

Or really it could raise anything if one of the factories failed, come to 
think of it.


From pje at telecommunity.com  Mon Jul 25 19:40:49 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 25 Jul 2005 13:40:49 -0400
Subject: [Web-SIG] Entry points and import maps (was
 Re:	Scarecrow	deployment config
In-Reply-To: <5.1.1.6.0.20050725131252.02789400@mail.telecommunity.com>
References: <1122310762.3898.26.camel@plope.dyndns.org>
	<5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com>
	<1122273649.8767.25.camel@localhost.localdomain>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>
	<5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>
	<1122273223.8767.18.camel@localhost.localdomain>
	<1122273649.8767.25.camel@localhost.localdomain>
	<5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050725134010.02809100@mail.telecommunity.com>

At 01:35 PM 7/25/2005 -0400, Phillip J. Eby wrote:
>    heading     ::= '[' qname trailer ']' NEWLINE

Oops. That should've been "trailer?", since the trailer is optional.


From ianb at colorstudy.com  Mon Jul 25 19:49:26 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 25 Jul 2005 12:49:26 -0500
Subject: [Web-SIG] Entry points and import maps (was
 Re:	Scarecrow	deployment config
In-Reply-To: <5.1.1.6.0.20050725131252.02789400@mail.telecommunity.com>
References: <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com>	<1122273649.8767.25.camel@localhost.localdomain>	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>	<5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com>	<5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com>	<1122273223.8767.18.camel@localhost.localdomain>	<1122273649.8767.25.camel@localhost.localdomain>	<5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com>
	<5.1.1.6.0.20050725131252.02789400@mail.telecommunity.com>
Message-ID: <42E52626.5080104@colorstudy.com>

Phillip J. Eby wrote:
> At 12:59 PM 7/25/2005 -0400, Chris McDonough wrote:
> 
>>  In order to define a pipeline, we use a ".ini"-format configuration
> 
> 
> Ultimately I think the spec will need a formal description of what that 
> means exactly, including such issues as a PEP 263-style "encoding" 
> specifier, and the precise format of values.  But I'm fine with adding all 
> that myself, since I'm going to have to specify it well enough to create a 
> parser anyway.

Incidentally I have a generic ini parser here:

   http://svn.w4py.org/home/ianb/wsgikit_old_config/iniparser.py

I suspect I'm doing the character decoding improperly (line-by-line 
instead of opening the file with the given character encoding), but 
otherwise it's been sufficiently generic and workable, and should allow 
for doing more extensive parsing of things like section headers.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From james at pythonweb.org  Mon Jul 25 23:54:08 2005
From: james at pythonweb.org (James Gardner)
Date: Mon, 25 Jul 2005 22:54:08 +0100
Subject: [Web-SIG] Entry points and import maps (was
 Re:	Scarecrow	deployment config
Message-ID: <42E55F80.7090300@pythonweb.org>

Hi All,

I'm a bit late coming to all this and didn't really see the benefits of 
the new format over what we already do so I set out to contrast new and 
old to demonstrate why it wasn't *that* useful. I've since changed my 
mind and think it is great but here is the contrasting I did anyway. I'd 
be pleased to hear all the glaring errors :-)

Here is a new example: we want to have an application that returns a 
GZip encoded "hello world" string after it has been made lowercase by 
case changer middleware taking a parameter newCase. The GZip middleware 
is an optional feature of the modules in wsgiFilters.egg and the 
CaseChanger middleware and HelloWorld application are in the helloworld.egg.

The classes look like this:

class HelloWorld:
    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-type','text/plain')])
        return ['Hello World']

class CaseChanger:
    def __init__(self, app, newCase):
        self.app = app
        self.newCase = newCase

    def __call__(self, environ, start_response):
        for chunk in self.app(environ, start_response):
            if self.newCase == 'lower':
                yield chunk.lower()
            else: 
                yield chunk

Class GZip:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Do clever things with headers here (omitted)
	for chunk in self.app(environ, start_response):
            yeild gzip(chunk)

The way we would write our application at the moment is as follows:

from pkg_resources import require

require('helloworld >= 0.2')
from helloworld import Helloworld

require('wsgiFilters[GZip] == 1.4.3')
from wsgiFilters import GZip

pipeline =  GZip(
                app = CaseChanger(
                    app = HelloWorld(),
                    newCase = 'lowercase',
                )
            )

With pipeline itself somehow being executed as a WSGI application.

The new way is like this (correct me if I'm wrong)

The modules have egg_info files like this respectively defining the 
"entry points":

wsgiFilters.egg:

[wsgi.middleware]
gzipper = GZip:GZip

helloworld.egg:

[wsgi.middleware]
cs = helloworld:CaseChanger

[wsgi.app]
myApp = helloworld:HelloWorld


We would then write an "import map" (below) based on the "deployment 
descriptors" in the .eggs used to describe the "entry points" into the 
eggs. The order the "pipeline" would be built is the same as in the 
Python example eg middleware first then application.

[gzipper from wsgiFilters[GZip] == 1.4.3]
[cs from helloworld  >= 0.2 ]
newCase = 'lower'
[myApp from helloworld >= 0.2]


It is loaded using an as yet unwritten modules which uses a factory 
returning a middleware pipeline equivalent to what would be produced in 
the Python example (is this very last bit correct?)

Doing things this new way has the following advantages:
* We have specified explicitly in the setup.py of the eggs that the 
middleware and applications we are importing are actually middleware and 
an application
* It is simpler for a non-technical user.
* There are lots of other applications for the ideas being discussed

It has the following disadvantages:
* We are limited as to what we can use as variable names. Existing 
middleware would need customising to only accept basic parameters.
* We require all WSGI coders to use the egg format.
* Users can't customise the middleware in the configuration file (eg by 
creating a derived class etc and you lose flexibility).
* If we use a Python file we can directly import and manipulate the 
pipeline (I guess you can do this anyway once your factory has returned 
the pipeline)

Both methods are the same in that
* We have specified the order of the pipeline and the middleware and 
applications involved
* Auto-downloading and installation of middleware and applications based 
on version requirements is possible (thanks to PJE's eggs)
* We have specified which versions of modules we require.
* Both could call a script such as wsgi_CGI.py wsgi_mod_python.py etc to 
execute the WSGI pipeline so both method's files could be distributed as 
a single file and would auto download their own dependencies.

Other ideas:

Is it really necessary to be able to give an entry point a name? If not 
because we know what we want to import anyway, we can combine the 
deployment descriptor into the import map:

[GZip:GZip from wsgiFilters[GZip] == 1.4.3]


We can then simplify the deployment descriptor like this:

[wsgi.middleware]
GZip:GZip

And then remove the colons and give a fully qualified Python-style path:

[GZip.GZip from wsgiFilters[GZip] == 1.4.3]

and

[wsgi.middleware]
GZip.GZip

Is this not better? Why do you need to assign names to entry points?

Although writing a middleware chain is dead easy for a Python 
programmer, it isn't for the end user and if you compare the end user 
files from this example I know which one I'd rather explain to someone. 
So although this deployment format seemed at first like overkill, I'm 
now very much in favour. I was personally considering YAML for doing my 
own configuration using a factory but frankly the new format is much 
cleaner and you don't need all the power of YAML anyway! Count me in!

James


From pje at telecommunity.com  Tue Jul 26 00:27:17 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 25 Jul 2005 18:27:17 -0400
Subject: [Web-SIG] Entry points and import maps (was
 Re:	Scarecrow	deployment config
In-Reply-To: <42E55F80.7090300@pythonweb.org>
Message-ID: <5.1.1.6.0.20050725180559.027bcfd8@mail.telecommunity.com>

At 10:54 PM 7/25/2005 +0100, James Gardner wrote:
>The new way is like this (correct me if I'm wrong)
>
>The modules have egg_info files like this respectively defining the
>"entry points":
>
>wsgiFilters.egg:
>
>[wsgi.middleware]
>gzipper = GZip:GZip

Almost; this one should be:

    [wsgi.middleware]
    gzipper = GZip:GZip [GZip]

So that using gzipper doesn't require specifying "extras" in the pipeline 
descriptor.  See below.


>helloworld.egg:
>
>[wsgi.middleware]
>cs = helloworld:CaseChanger
>
>[wsgi.app]
>myApp = helloworld:HelloWorld
>
>
>We would then write an "import map" (below) based on the "deployment
>descriptors" in the .eggs used to describe the "entry points" into the
>eggs.

Actually, the new thing you write is the deployment descriptor or pipeline 
descriptor.  The "import map" is the thing you put in the eggs' setup.py to 
list the entry points offered by the eggs.


>The order the "pipeline" would be built is the same as in the
>Python example eg middleware first then application.
>
>[gzipper from wsgiFilters[GZip] == 1.4.3]
>[cs from helloworld  >= 0.2 ]
>newCase = 'lower'
>[myApp from helloworld >= 0.2]

You wouldn't need the [GZip] part if it were declared with the entry point, 
as I showed above.


>It is loaded using an as yet unwritten modules which uses a factory
>returning a middleware pipeline equivalent to what would be produced in
>the Python example (is this very last bit correct?)

Yes.  The order in the file is the order in which the items are invoked by 
the controlling server.


>Doing things this new way has the following advantages:
>* We have specified explicitly in the setup.py of the eggs that the
>middleware and applications we are importing are actually middleware and
>an application
>* It is simpler for a non-technical user.
>* There are lots of other applications for the ideas being discussed
>
>It has the following disadvantages:
>* We are limited as to what we can use as variable names. Existing
>middleware would need customising to only accept basic parameters.

This depends a lot on the details of the .ini-like format, which are still 
up in the air.


>* We require all WSGI coders to use the egg format.

Not so; you can use [GZip.GZip] as a section header in order to do just a 
plain ol' import.


>* Users can't customise the middleware in the configuration file (eg by
>creating a derived class etc and you lose flexibility).

No, but all they have to do is create a Python file and refer to it, and 
they are thereby encouraged to separate code from configuration.  :)


>* If we use a Python file we can directly import and manipulate the
>pipeline (I guess you can do this anyway once your factory has returned
>the pipeline)

Yep.


>Both methods are the same in that
>* We have specified the order of the pipeline and the middleware and
>applications involved
>* Auto-downloading and installation of middleware and applications based
>on version requirements is possible (thanks to PJE's eggs)

One difference here: the .ini format is parseable to determine what eggs 
are needed without executing arbitrary code.


>Other ideas:
>
>Is it really necessary to be able to give an entry point a name?

Yes.  Entry points are a generic setuptools mechanism now, and they have 
names.  However, this doesn't mean they all have to be exported by an egg's 
import map.


>If not
>because we know what we want to import anyway, we can combine the
>deployment descriptor into the import map:
>
>[GZip:GZip from wsgiFilters[GZip] == 1.4.3]

We could perhaps still allow that format; the format is still being 
discussed.  However, this would just be a function of the .wsgi file, and 
doesn't affect the concept of "entry points".  It's just naming a factory 
directly instead of accessing it via an entry point.


>We can then simplify the deployment descriptor like this:
>
>[wsgi.middleware]
>GZip:GZip

If you don't care about the entry point, you can just not declare one.  But 
you can't opt out of naming them.


>Why do you need to assign names to entry points?

Because other things that use entry points need names.  For example, 
setuptools searches a "distutils.commands" entry point group to find 
commands that extend the normal setup commands.  It certainly doesn't know 
in advance what commands the eggs are going to provide.

The question you should be asking is, "Why do we have to use entry points 
to specify factories?", and the answer is, "we don't".  :)


>Although writing a middleware chain is dead easy for a Python
>programmer, it isn't for the end user and if you compare the end user
>files from this example I know which one I'd rather explain to someone.

Yep.  Ian gets the credit for further simplifying my "sequence of 
[factory.name] sections" proposal by coming up with the idea of having 
named entry points declared in an egg.  I then took the entry points idea 
to its logical conclusion, even refactoring setuptools to use them for its 
own extensibility.


>So although this deployment format seemed at first like overkill, I'm
>now very much in favour. I was personally considering YAML for doing my
>own configuration using a factory but frankly the new format is much
>cleaner and you don't need all the power of YAML anyway! Count me in!

There's one other advantage: this format will hopefully become as 
successful as WSGI itself in adoption by servers and 
applications.  Hopefully within a year or so, *the* normal way to deploy a 
Python web app will be using a .wsgi file.

Beyond that, we can hopefully begin to see "Python" rather than "framework 
X" as being what people write their web apps with.


From ianb at colorstudy.com  Tue Jul 26 01:40:14 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 25 Jul 2005 18:40:14 -0500
Subject: [Web-SIG] WSGI deployment use case
Message-ID: <42E5785E.1040900@colorstudy.com>

Well, I thought I'd chime in with everything I'd want in a deployment 
strategy; some of this clearly is the realm of practice, not code, but 
it all fits together.  It's not necessarily the Universal Use Case, but 
I don't think it's too strange.


Here's some types of applications:

* Things I (or someone in my company) codes.

* Full application someone else creates and I use.

* Applications that are more like a service, that use inside an 
application of mine.  These are end-point applications, like a REST 
service or something.  An application like this probably appears to live 
inside my application, through some form of delegation; unless it's a 
service many applications share...?

* Middleware written by me for my internal use, maybe application-specific.

* Middleware written specifically for the framework I (or someone else) 
uses.

* General-purpose middleware that could apply to anything.


Some kinds of deployments I want to do:

* Two clients with the same application, same version (like "latest 
stable version").

* Two clients with different versions; e.g., one client hasn't paid for 
an upgrade (which might mean upgrading).

* A client with branched code, i.e., we've tweaked one instance of the 
application just for them.

* Two installations of the same application, in the same process, with 
different URLs and different configurations.  This might be something as 
small as a formmail kind of script, or a large program.

* Sometimes apps go into different processes, but often they can go into 
the same process (especially if I start using Python WSGI for the kind 
of seldom-used apps that I now use CGI for).

* I have to mount these applications at some location.  This should be 
part of the deployment configuration; both path based and domain name based.


Here's some aspects of the configuration:

* Many applications have a lot of configuration.  Much of it is "just in 
case" configuration that I'd never want to tweak.  Some of that 
configuration may be derivative of things I do want to tweak, e.g., URL 
layouts where I configure the base URL, but all the other URLs could be 
derived from that.

* What appears to be an application from the outside might be composed 
of many applications.  Maybe an app includes an external formmail app 
for a "support" link.  That app requires configuration (like smtp server).

* I'd like to configure some things globally.  Like that smtp server. 
Or an email address to send unexpected exceptions to.

* I might want to override configuration locally, like that email 
address.  I might want to augment configuration, like just add an 
address to the list, not reset the whole value.

* I'd like to install some middleware globally as well.  Like a session 
handler, perhaps.  Or authentication.  Or an exception catcher -- I'd 
like everyone to use my well-configured exception catcher.  So not only 
am I adding middleware, I might be asking that middleware be excluded 
(or should simply short-circuit itself).

* And of course, all my applications take configuration, separate from 
middleware and frameworks.

* And usually there are non-WSGI pieces that need access to the exact 
same configuration; scripts and cronjobs and whatnot.  Usually they just 
need the application configuration, but nothing related to middleware or 
the web.


I think quite a bit of this is handled well by what we're talking about, 
even if it wasn't just a little while ago; versioning for instance. 
Branches I'm a little less sure about, since version numbers are linear.

But configuration and composition of multiple independent applications 
into a single process isn't.  I don't think we can solve these 
separately, because the Hard Problem is how to handle configuration 
alongside composition.  How can I apply configuration to a set of 
applications?  How can I make exceptions?  How can an application 
consume configuration as well as delegate configuration to a 
subapplication?  The pipeline is often more like a tree, so the logic is 
a little complex.  Or, rather, there's actual *logic* in how 
configuration is applied, almost all of which are viable.

I can figure out a bunch of ad hoc and formal ways of accomplishing this 
in Paste; most of it is already possible, and entry points alone clean 
up a lot of what's there (encouraging a separation between how an 
application is invoked generally, and install-specific configuration). 
But with a more limited and declarative configuration it is harder. 
Also when configuration is pushed into factories as keyword arguments, 
instead of being pulled out of a dictionary, it is much harder -- the 
configuration becomes unhackable.


-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From ianb at colorstudy.com  Tue Jul 26 01:49:36 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 25 Jul 2005 18:49:36 -0500
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <42E5785E.1040900@colorstudy.com>
References: <42E5785E.1040900@colorstudy.com>
Message-ID: <42E57A90.5060306@colorstudy.com>

I thought it would muck up the list of issues too much if I added too 
much commentary.  But then it's not that useful without it...

Ian Bicking wrote:
> Here's some types of applications:
> 
> * Things I (or someone in my company) codes.

I'm planning on making everything an egg.  I'm even thinking about how I 
can make my Javascript libraries eggs, though I'm not sure about that. 
We keep an internal index of these.

> * Full application someone else creates and I use.

I'm hoping these are eggable; if not we'll probably make them so, just 
so we can manage them.

> * Applications that are more like a service, that use inside an 
> application of mine.  These are end-point applications, like a REST 
> service or something.  An application like this probably appears to live 
> inside my application, through some form of delegation; unless it's a 
> service many applications share...?

This is more annoying.  Again, eggs.  But they get mounted somewhere, 
maybe based on configuration, maybe not.

If the application is nested, then my application will recursively use 
configuration to create these applications.

> * Middleware written by me for my internal use, maybe application-specific.

I'll probably apply these in my own factory functions.

> * Middleware written specifically for the framework I (or someone else) 
> uses.

Again, probably in a factory function.  Sometimes my own stuff will go 
above or below these.  Some pieces of the framework need to be aware of 
my special middleware (like a URL parser).

> * General-purpose middleware that could apply to anything.

This is stuff I want to configure globally; an open issue.

> Some kinds of deployments I want to do:
> 
> * Two clients with the same application, same version (like "latest 
> stable version").

I can potentially install these in a single process or in separate 
processes.  They each get a separate configuration.  Probably domain 
name based dispatching to the different configuration files.

> * Two clients with different versions; e.g., one client hasn't paid for 
> an upgrade (which might mean upgrading).

Definitely need two processes; otherwise no problem with Eggs -- that 
means I don't have to fiddle with PYTHONPATH, special package 
directories, etc.

> * A client with branched code, i.e., we've tweaked one instance of the 
> application just for them.

I don't know what to version such a branch as.  Maybe some version that 
goes before everything else, and use an explicit version requirement 
(==client_name_1.0)

> * Two installations of the same application, in the same process, with 
> different URLs and different configurations.  This might be something as 
> small as a formmail kind of script, or a large program.

Woops, same thing as before.  Anyway, I might use a pattern, like "/app" 
gets redirected in Apache to an application server that does further 
dispatching on URLs.  Or I might add specific rewriting and aliases to 
mount applications in their place.  These have to map to specific 
processes, maybe through some convention on port numbers, maybe 
filenames if I'm using something that talks over named sockets.

I guess potentially I could use an environmental variable to indicate 
which app I'm trying to point to (SetEnvIf style).  Or, rather, what 
configuration file I'm pointing to, since there's a many-to-one 
relationship between configuration files and applications.

> * Sometimes apps go into different processes, but often they can go into 
> the same process (especially if I start using Python WSGI for the kind 
> of seldom-used apps that I now use CGI for).

Deployment in these cases should be really light.  Definitely not a 
programming task.

> * I have to mount these applications at some location.  This should be 
> part of the deployment configuration; both path based and domain name based.
> 

... And then none of this configuration stuff is handled to my 
satisfaction...

> Here's some aspects of the configuration:
> 
> * Many applications have a lot of configuration.  Much of it is "just in 
> case" configuration that I'd never want to tweak.  Some of that 
> configuration may be derivative of things I do want to tweak, e.g., URL 
> layouts where I configure the base URL, but all the other URLs could be 
> derived from that.
> 
> * What appears to be an application from the outside might be composed 
> of many applications.  Maybe an app includes an external formmail app 
> for a "support" link.  That app requires configuration (like smtp server).
> 
> * I'd like to configure some things globally.  Like that smtp server. 
> Or an email address to send unexpected exceptions to.
> 
> * I might want to override configuration locally, like that email 
> address.  I might want to augment configuration, like just add an 
> address to the list, not reset the whole value.
> 
> * I'd like to install some middleware globally as well.  Like a session 
> handler, perhaps.  Or authentication.  Or an exception catcher -- I'd 
> like everyone to use my well-configured exception catcher.  So not only 
> am I adding middleware, I might be asking that middleware be excluded 
> (or should simply short-circuit itself).
> 
> * And of course, all my applications take configuration, separate from 
> middleware and frameworks.
> 
> * And usually there are non-WSGI pieces that need access to the exact 
> same configuration; scripts and cronjobs and whatnot.  Usually they just 
> need the application configuration, but nothing related to middleware or 
> the web.

From pje at telecommunity.com  Tue Jul 26 02:04:30 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 25 Jul 2005 20:04:30 -0400
Subject: [Web-SIG] Entry points and import maps (was
 Re:	Scarecrow	deployment config
In-Reply-To: <42E56F36.6040601@pythonweb.org>
References: <5.1.1.6.0.20050725180559.027bcfd8@mail.telecommunity.com>
	<5.1.1.6.0.20050725180559.027bcfd8@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050725194017.027f36d8@mail.telecommunity.com>

[cc:'d to distutils-sig because this is mostly about cool uses for the new 
EntryPoint facility of setuptools/pkg_resources]

At 12:01 AM 7/26/2005 +0100, James Gardner wrote:
>Hi Phillip,
>
>>There's one other advantage: this format will hopefully become as 
>>successful as WSGI itself in adoption by servers and applications.
>>Hopefully within a year or so, *the* normal way to deploy a Python web 
>>app will be using a .wsgi file.
>>
>>Beyond that, we can hopefully begin to see "Python" rather than 
>>"framework X" as being what people write their web apps with.
>
>Well that would be absolutely wonderful but also looking fairly likely 
>which is great news. I've got to say a massive thank you for the eggs 
>format and easy install as well.. Python was really crying out for it and 
>it will be phenomenally useful. I've split all my code up as a result 
>because there is no need to worry about people having to install lots of 
>packages if it is all done automatically.
>
>One thought: I'd ideally like to be able to backup a WSGI deployment to 
>allow it to be easily redeployed on another server with a different 
>configuration or just restored in the event of data loss. This would 
>probably just involve making a zip file of all data files (including an 
>SQL dump) and then redistributing it with the .wsgi file. Have you had any 
>thoughts on how that could be achieved or is that something you wouldn't 
>want the .wsgi file to be used for? Whatever software installed the 
>dependencies of the .wsgi file would need to be aware of the data file and 
>what to do with it, perhaps simply by calling an install handler 
>somewhere? Umm, all getting a bit complicated but I just wondered if you 
>had had any thoughts of that nature?

Well, you could define another set of entry point groups, like 
"wsgi.middleware.backup_handlers", which would contain entry points with 
the same names as in middleware, that would get called with the same 
configuration arguments as the application factories, but would then do 
some kind of backing up.  Similarly, you could have an entry point group 
for restoration functions.  These would have to defined by the same egg as 
the one with the factory, of course, although perhaps we could make the 
entry point names be the entry point targets instead of using the original 
entry point names.  That additional level of indirection would let one egg 
define backup and restore services for another's factories.  Perhaps the 
backup functions would return the name of a directory tree to back up, and 
the restore functions would receive some kind of zipfile or archive.

Obviously that's a totally unbaked idea that would need a fair amount of 
thought, but there's nothing stopping anybody from fleshing it out as a 
tool and publishing a spec for the entry points it uses.


>Oh sorry, another quick question: Is there any work underway auto-document 
>eggs using some of the docutils code if an appropriate specifier is made 
>in the egg_info file saying the egg is in restructured text or similar? 
>Would that be something you would be willing to include as a feature of 
>easy_install or is it a bit too messy? I'd love to be able to distribute a 
>.wsgi file and have all the documentation for the downloaded modules auto 
>created. If only some of the modules supported it it would still be quite 
>handy.

I'm having a little trouble envisioning what you mean exactly.  All that's 
really coming across is the idea that "there's some way to generate 
documentation from eggs".  I'd certainly like to be able to see tools like 
epydoc or pydoc support generating documentation for an egg.  However, 
there's a fair amount of balkanization in how you specify inputs for Python 
documentation tools, not unlike the previous balkanization of web servers 
and web apps/frameworks.  Maybe somebody will come up with a similar lingua 
franca for documentation tools.

With respect to adding more cool features to setup(), I plan to add a 
couple of entry point groups to setuptools that would support what you have 
in mind, though.  There's already a distutils.commands group that allows 
you to register setup commands, but I also plan to add egg_info.writers and 
distutils.setup_args.  The setup_args entry points would have the name of a 
parameter you'd like setup() to have, and would be a function that would 
get called on the Distribution object during its finalize_options() (so you 
can validate the argument).  The egg_info.writers group will define entry 
points for writing metadata files as part of the egg_info command.

Last, but not least, I'll add a 'setup_requires' argument to setup() that 
will specify eggs that need to be present for the setup script to run.

With these three things in place, tools like the build_utils or py2exe and 
py2app won't have to monkeypatch the distutils in order to install 
themselves; they can instead just define entry points for setup() args and 
the new commands they add.  And for your documentation concept, this could 
include document-specification arguments and an egg_info.writers entry 
point to put it in the EGG-INFO.  Packages using the arguments would have 
to use 'setup_requires', however, to list the eggs needed to process those 
arguments.

My idea for stuff like this was mainly to support frameworks; for example 
if an application needs plugin metadata other than entry points, it can 
define an egg that extends setuptools with the necessary setup arguments 
and metadata writers.  Then, when you're building a plugin for the tool, 
you just setup_requires=["AppSetup"], where "AppSetup" is the egg with the 
setuptools extensions for "App".  (Most apps will want their setuptools 
extensions in a separate egg, because the app itself may need the same 
extensions in order to be built, which would lead to a hairy 
chicken-and-egg problem.  setuptools itself was a little tricky to 
bootstrap since it finds its own commands via entry points now!)


>Thanks for the answers anyway, the whole thing looks tremendously exciting!

That's because it is.  :)


From pje at telecommunity.com  Tue Jul 26 02:15:05 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 25 Jul 2005 20:15:05 -0400
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <42E5785E.1040900@colorstudy.com>
Message-ID: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>

At 06:40 PM 7/25/2005 -0500, Ian Bicking wrote:
>But configuration and composition of multiple independent applications
>into a single process isn't.  I don't think we can solve these
>separately, because the Hard Problem is how to handle configuration
>alongside composition.  How can I apply configuration to a set of
>applications?  How can I make exceptions?  How can an application
>consume configuration as well as delegate configuration to a
>subapplication?  The pipeline is often more like a tree, so the logic is
>a little complex.  Or, rather, there's actual *logic* in how
>configuration is applied, almost all of which are viable.

We probably need something like a "site map" configuration, that can handle 
tree structure, and can specify pipelines on a per location basis, 
including the ability to specify pipeline components to be applied above 
everything under a certain URL pattern.  This is more or less the same as 
my "container API" concept, but we are a little closer to being able to 
think about such a thing.

Of course, I still think it's something that can be added *after* having a 
basic deployment spec.


>I can figure out a bunch of ad hoc and formal ways of accomplishing this
>in Paste; most of it is already possible, and entry points alone clean
>up a lot of what's there (encouraging a separation between how an
>application is invoked generally, and install-specific configuration).
>But with a more limited and declarative configuration it is harder.

But the tradeoff is greater ability to build tools that operate on the 
configuration to do something -- like James Gardner's ideas about 
backup/restore and documentation tools.


>Also when configuration is pushed into factories as keyword arguments,
>instead of being pulled out of a dictionary, it is much harder -- the
>configuration becomes unhackable.

But a **kw argument *is* a dictionary, so I don't understand what you mean 
here.


From renesd at gmail.com  Tue Jul 26 02:34:10 2005
From: renesd at gmail.com (Rene Dudfield)
Date: Tue, 26 Jul 2005 10:34:10 +1000
Subject: [Web-SIG] file system configuration.
Message-ID: <64ddb72c050725173435527944@mail.gmail.com>

What about apache style configuration, that uses the file system.

It works quite well, and can be understood by all those people using
apache already.

You can have the main configuration done where-ever, but allowing
people to add in specific configuration at any part in the url by
simply adding a .htaccess file can make doing things really easy.


eg. here is a basic website url structure.

/
/admin
/images

I drop a config file into / to implement the website.

Now I drop a config file into /admin which uses some sort of auth scheme.

I can drop a .htaccess into images to do:
 1. gallery application, to make viewing of the images by thumbnmails easier.
   - configure the gallery application(thumbnail size, etc)
 2. do not allow linking to images from other sites(by checking referer tags).


By making another directory inside the directory structure you create
another path which can be configured by these config files.  If the
top level application made a members/ url, and you want to add auth to
it, you make a members/ directory and edit the config files in it.

From ianb at colorstudy.com  Tue Jul 26 03:29:34 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 25 Jul 2005 20:29:34 -0500
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
Message-ID: <42E591FE.5040209@colorstudy.com>

Phillip J. Eby wrote:
> At 06:40 PM 7/25/2005 -0500, Ian Bicking wrote:
> 
>> But configuration and composition of multiple independent applications
>> into a single process isn't.  I don't think we can solve these
>> separately, because the Hard Problem is how to handle configuration
>> alongside composition.  How can I apply configuration to a set of
>> applications?  How can I make exceptions?  How can an application
>> consume configuration as well as delegate configuration to a
>> subapplication?  The pipeline is often more like a tree, so the logic is
>> a little complex.  Or, rather, there's actual *logic* in how
>> configuration is applied, almost all of which are viable.
> 
> 
> We probably need something like a "site map" configuration, that can 
> handle tree structure, and can specify pipelines on a per location 
> basis, including the ability to specify pipeline components to be 
> applied above everything under a certain URL pattern.  This is more or 
> less the same as my "container API" concept, but we are a little closer 
> to being able to think about such a thing.

It could also be something based on general matching rules, with some 
notion of precedence and how the rule effects SCRIPT_NAME/PATH_INFO.  Or 
something like that.

> Of course, I still think it's something that can be added *after* having 
> a basic deployment spec.

I feel a very strong need that this be resolved before settling on 
anything deployment related.  Not necessarily as a standard, but 
possibly as a set of practices.  Even a realistic and concrete use case 
might be enough.

>> I can figure out a bunch of ad hoc and formal ways of accomplishing this
>> in Paste; most of it is already possible, and entry points alone clean
>> up a lot of what's there (encouraging a separation between how an
>> application is invoked generally, and install-specific configuration).
>> But with a more limited and declarative configuration it is harder.
> 
> 
> But the tradeoff is greater ability to build tools that operate on the 
> configuration to do something -- like James Gardner's ideas about 
> backup/restore and documentation tools.

I can see that.  But I know my way works, which is a bit of a bonus. 
And really it's entirely possible to inspect it as well.

>> Also when configuration is pushed into factories as keyword arguments,
>> instead of being pulled out of a dictionary, it is much harder -- the
>> configuration becomes unhackable.
> 
> 
> But a **kw argument *is* a dictionary, so I don't understand what you 
> mean here.

It's about how configuration is delegated to contained applications and 
middleware, and what's the expectation of what that configuration looks 
like.  I think components that don't take **kw will be hard to work with.

Right now Paste hands around a fairly flat dictionary.  This dictionary 
is passed around in full (as part of the WSGI environment) to every 
piece of middleware, and actually to everything (via an import and 
threadlocal storage).  It gets used all over the place, and the ability 
to draw in configuration without passing it around is very important.  I 
know it seems like heavy coupling, but in practice it causes unstable 
APIs if it is passed around explicitly, and as long as you keep clever 
dynamic values out of the configuration it isn't a problem.

Anyway, every piece gets the full dictionary, so if any piece expected a 
constrained set of keys it would break.  Even ignoring that there are 
multiple consumers with different keys that they pull out, it is common 
to create intermediate configuration values to make the configuration 
more abstract.  E.g., I set a "base_dir", then derive "publish_dir" and 
"template_dir" from that.  Apache configuration is a good anti-example 
here; its lack of variables hurts me daily.  While some variables could 
be declared "abstract" somehow, that adds complexity where the 
unconstrained model avoids that complexity.

When one piece delegates to another, it passes the entire dictionary 
through (by convention, and by the fact it gets passed around 
implicitly).  It is certainly possible in some circumstances that a 
filtered version of the configuration should be passed in; that hasn't 
happened to me yet, but I can certainly imagine it being necessary 
(especially when a larger amount of more diverse software is running in 
the same process).

One downside of this is that there's no protection from name conflicts. 
  Though name conflicts can go both ways.  The Happy Coincidence is when 
two pieces use the same name for the same purpose (e.g., it's highly 
likely "smtp_server" would be the subject of a Happy Coincidence).  An 
Unhappy Coincidence is when two pieces use the same value for different 
purposes ("publish_dir" perhaps).  An Expected Coincidence is when the 
same code, invoked in two separate call stacks, consumes the same value. 
  Of course, I allow configuration to be overwritten depending on the 
request, so high collision names (like publish_dir) in practice are 
unlikely to be a problem.

The upside over anything that expects structure in the configuration 
(e.g., that configuration be targetted at a specific component) is that 
I can hide implementation.  This is extremely important to me, because I 
have lots of pieces.  Some of them are clearly different components from 
the inside, some are vague and the distinction would be based entirely 
on my mood.  For instance an application-specific middleware that could 
plausibly be used more widely -- does it consume the application 
configuration, or does it take its own configuration?  But even 
excluding those ambiguous situations, the way my middleware is factored 
is an internal implementation detail, and I don't feel comfortable 
pushing that structure into the configuration.

So that's the issue I'm concerned about.


-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From pje at telecommunity.com  Tue Jul 26 04:04:30 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 25 Jul 2005 22:04:30 -0400
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <42E591FE.5040209@colorstudy.com>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
	<5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com>

At 08:29 PM 7/25/2005 -0500, Ian Bicking wrote:
>Right now Paste hands around a fairly flat dictionary.  This dictionary is 
>passed around in full (as part of the WSGI environment) to every piece of 
>middleware, and actually to everything (via an import and threadlocal 
>storage).  It gets used all over the place, and the ability to draw in 
>configuration without passing it around is very important.  I know it 
>seems like heavy coupling, but in practice it causes unstable APIs if it 
>is passed around explicitly, and as long as you keep clever dynamic values 
>out of the configuration it isn't a problem.
>
>Anyway, every piece gets the full dictionary, so if any piece expected a 
>constrained set of keys it would break.  Even ignoring that there are 
>multiple consumers with different keys that they pull out, it is common to 
>create intermediate configuration values to make the configuration more 
>abstract.  E.g., I set a "base_dir", then derive "publish_dir" and 
>"template_dir" from that.  Apache configuration is a good anti-example 
>here; its lack of variables hurts me daily.  While some variables could be 
>declared "abstract" somehow, that adds complexity where the unconstrained 
>model avoids that complexity.

*shudder* I think someone just walked over my grave.  ;)

I'd rather add complexity to the deployment format (e.g. variables, 
interpolation, etc.) to handle this sort of thing than add complexity to 
the components.  I also find it hard to understand why e.g. multiple 
components would need the same "template_dir".  Why isn't there a template 
service component, for example?


>When one piece delegates to another, it passes the entire dictionary 
>through (by convention, and by the fact it gets passed around 
>implicitly).  It is certainly possible in some circumstances that a 
>filtered version of the configuration should be passed in; that hasn't 
>happened to me yet, but I can certainly imagine it being necessary 
>(especially when a larger amount of more diverse software is running in 
>the same process).
>
>One downside of this is that there's no protection from name 
>conflicts.  Though name conflicts can go both ways.  The Happy Coincidence 
>is when two pieces use the same name for the same purpose (e.g., it's 
>highly likely "smtp_server" would be the subject of a Happy 
>Coincidence).  An Unhappy Coincidence is when two pieces use the same 
>value for different purposes ("publish_dir" perhaps).  An Expected 
>Coincidence is when the same code, invoked in two separate call stacks, 
>consumes the same value.  Of course, I allow configuration to be 
>overwritten depending on the request, so high collision names (like 
>publish_dir) in practice are unlikely to be a problem.

I think you've just explained why this approach doesn't scale very well, 
even to a large team, let alone to inter-organization collaboration (i.e. 
open source projects).


>   For instance an application-specific middleware that could plausibly be 
> used more widely -- does it consume the application configuration, or 
> does it take its own configuration?  But even excluding those ambiguous 
> situations, the way my middleware is factored is an internal 
> implementation detail, and I don't feel comfortable pushing that 
> structure into the configuration.

That's what encapsulation is for.  Just create a factory that takes a set 
of application-level parameters (like template_dir, publish_dir, etc.) and 
then *passes* them to the lower level components.

Heck, we could even add that to the .wsgi format...

    # app template file
    [WSGI options]
    parameters = "template_dir", "publish_dir", ...

    [filter1 from foo]
    some_param = template_dir

    [filter2 from bar]
    other_param = publish_dir


    # deployment file
    [use file "app_template.wsgi"]
    template_dir = "/some/where"
    publish_dir = "/another/place"


>So that's the issue I'm concerned about.

I think the right way to fix it is parameterization; that way you don't 
push a global (and non type-checkable) namespace down into each 
component.  Components should have an extremely minimal configuration with 
fairly specific parameters, because it makes early error checking easier, 
and you don't have to search all over the place to find how a parameter is 
used, etc., etc.


From chrism at plope.com  Tue Jul 26 04:11:00 2005
From: chrism at plope.com (Chris McDonough)
Date: Mon, 25 Jul 2005 22:11:00 -0400
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <42E591FE.5040209@colorstudy.com>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
	<42E591FE.5040209@colorstudy.com>
Message-ID: <1122343861.3898.91.camel@plope.dyndns.org>

On Mon, 2005-07-25 at 20:29 -0500, Ian Bicking wrote:
> > We probably need something like a "site map" configuration, that can 
> > handle tree structure, and can specify pipelines on a per location 
> > basis, including the ability to specify pipeline components to be 
> > applied above everything under a certain URL pattern.  This is more or 
> > less the same as my "container API" concept, but we are a little closer 
> > to being able to think about such a thing.
> 
> It could also be something based on general matching rules, with some 
> notion of precedence and how the rule effects SCRIPT_NAME/PATH_INFO.  Or 
> something like that.
How much of this could be solved by using a web server's
directory/alias-mapping facility?

For instance, if you needed a single Apache webserver to support
multiple pipelines based on URL mapping, wouldn't it be possible in many
cases to compose that out of things like rewrite rules and script
aliases (the below assumes running them just as CGI scripts, obviously
it would be different with something using mod_python or what-have-you):

<VirtualHost *:80>
 ServerAdmin webmaster at plope.com
 ServerName plope.com
 ServerAlias plope.com
 ScriptAlias /viewcvs "/home/chrism/viewcvs.wsgi"
 ScriptAlias /blog "/home/chrism/blog.wsgi"
 RewriteEngine On
 RewriteRule ^/[^/]viewcvs*$ /home/chrism/viewcvs.wsgi [PT]
 RewriteRule ^/[^/]blog*$ /home/chrism/blog.wsgi [PT]
</VirtualHost>

Obviously it would mean some repetition in "wsgi" files if you needed to
repeat parts of a pipeline for each URL mapping.  But it does mean we
wouldn't need to invent more software.


> 
> > Of course, I still think it's something that can be added *after* having 
> > a basic deployment spec.
> 
> I feel a very strong need that this be resolved before settling on 
> anything deployment related.  Not necessarily as a standard, but 
> possibly as a set of practices.  Even a realistic and concrete use case 
> might be enough.


I *think* more complicated use cases may revolve around attempting to
use middleware as services that dynamize the pipeline instead of as
"oblivious" things.  I don't think there's anything really wrong with
that but I also don't think it can ever be specified with as much
clarity as what we've already got because IMHO it's a programming task.

I'm repeating myself, I'm sure, but I'm more apt to put a "service
manager" piece of middleware in the pipeline (or maybe just implement it
as a library) which would allow my endpoint app to use it to do
sessioning and auth and whatnot.  I realize that is essentially
"building a framework" (which is reviled lately) but since the endpoint
app needs to collaborate anyway, I don't see a better way to do it
except to rely completely on convention for service lookup (which is
what you seem to be struggling with in the later bits of your post).

- C


From ianb at colorstudy.com  Tue Jul 26 04:54:01 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 25 Jul 2005 21:54:01 -0500
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
	<5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
	<5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com>
Message-ID: <42E5A5C9.2050408@colorstudy.com>

Phillip J. Eby wrote:
> At 08:29 PM 7/25/2005 -0500, Ian Bicking wrote:
> 
>> Right now Paste hands around a fairly flat dictionary.  This 
>> dictionary is passed around in full (as part of the WSGI environment) 
>> to every piece of middleware, and actually to everything (via an 
>> import and threadlocal storage).  It gets used all over the place, and 
>> the ability to draw in configuration without passing it around is very 
>> important.  I know it seems like heavy coupling, but in practice it 
>> causes unstable APIs if it is passed around explicitly, and as long as 
>> you keep clever dynamic values out of the configuration it isn't a 
>> problem.
>>
>> Anyway, every piece gets the full dictionary, so if any piece expected 
>> a constrained set of keys it would break.  Even ignoring that there 
>> are multiple consumers with different keys that they pull out, it is 
>> common to create intermediate configuration values to make the 
>> configuration more abstract.  E.g., I set a "base_dir", then derive 
>> "publish_dir" and "template_dir" from that.  Apache configuration is a 
>> good anti-example here; its lack of variables hurts me daily.  While 
>> some variables could be declared "abstract" somehow, that adds 
>> complexity where the unconstrained model avoids that complexity.
> 
> 
> *shudder* I think someone just walked over my grave.  ;)
> 
> I'd rather add complexity to the deployment format (e.g. variables, 
> interpolation, etc.) to handle this sort of thing than add complexity to 
> the components.  I also find it hard to understand why e.g. multiple 
> components would need the same "template_dir".  Why isn't there a 
> template service component, for example?

In that case, no, multiple components are unlikely to usefully share 
template_dir.  But that's not an issue I'm really hitting -- though it 
does start to add importance to the order in which configuration files 
are loaded.

>> When one piece delegates to another, it passes the entire dictionary 
>> through (by convention, and by the fact it gets passed around 
>> implicitly).  It is certainly possible in some circumstances that a 
>> filtered version of the configuration should be passed in; that hasn't 
>> happened to me yet, but I can certainly imagine it being necessary 
>> (especially when a larger amount of more diverse software is running 
>> in the same process).
>>
>> One downside of this is that there's no protection from name 
>> conflicts.  Though name conflicts can go both ways.  The Happy 
>> Coincidence is when two pieces use the same name for the same purpose 
>> (e.g., it's highly likely "smtp_server" would be the subject of a 
>> Happy Coincidence).  An Unhappy Coincidence is when two pieces use the 
>> same value for different purposes ("publish_dir" perhaps).  An 
>> Expected Coincidence is when the same code, invoked in two separate 
>> call stacks, consumes the same value.  Of course, I allow 
>> configuration to be overwritten depending on the request, so high 
>> collision names (like publish_dir) in practice are unlikely to be a 
>> problem.
> 
> 
> I think you've just explained why this approach doesn't scale very well, 
> even to a large team, let alone to inter-organization collaboration 
> (i.e. open source projects).

I admit there's problems.  On the other hand, it's a similar problem as 
the fact that attributes on objects don't have namespaces.  It causes 
problems, but those problems aren't so bad in practice.

If you can offer something where configuration can be applied to a set 
of components without exposing the internal structure of those 
components, and without the frontend copying each piece destined for an 
internal application explicitly, then great.  I'm not closed to other 
ideas, but I'm not happy putting it off either.  Back when I started up 
this WSGI thread, it was about just this issue, so it's one of the 
things I'm fairly concerned about.

Unlike deployment, this issue of configuration touches all of my code. 
So I'm happier putting off deployment, which though it is suboptimal 
currently, I suspect my code will be forward-compatible to without great 
effort.

>>   For instance an application-specific middleware that could plausibly 
>> be used more widely -- does it consume the application configuration, 
>> or does it take its own configuration?  But even excluding those 
>> ambiguous situations, the way my middleware is factored is an internal 
>> implementation detail, and I don't feel comfortable pushing that 
>> structure into the configuration.
> 
> 
> That's what encapsulation is for.  Just create a factory that takes a 
> set of application-level parameters (like template_dir, publish_dir, 
> etc.) and then *passes* them to the lower level components.
> 
> Heck, we could even add that to the .wsgi format...
> 
>    # app template file
>    [WSGI options]
>    parameters = "template_dir", "publish_dir", ...
> 
>    [filter1 from foo]
>    some_param = template_dir
> 
>    [filter2 from bar]
>    other_param = publish_dir
> 
> 
>    # deployment file
>    [use file "app_template.wsgi"]
>    template_dir = "/some/where"
>    publish_dir = "/another/place"

I'm not clear exactly what you are proposing.  Let's use a more 
realistic example.  Components:

* Exception catcher.  Takes "email_errors", which is a list of addresses 
to email exceptions to.  I want to apply this globally.

* An application mounted on /, which takes "document_root" and serves up 
those files directly.

* An application mounted at /blog, takes "database" (a string) where all 
its information is kept.

* An application mounted at /admin.  Takes "document_root", which is 
where the editable files are located.  Around it goes two pieces of 
middleware...

* A authentication middleware, which takes "database", which is where 
user information is kept.  And...

* An authorization middleware, that takes "allowed_roles", and checks it 
against what the authentication middleware puts in.

How would I configure that?

>> So that's the issue I'm concerned about.
> 
> 
> I think the right way to fix it is parameterization; that way you don't 
> push a global (and non type-checkable) namespace down into each 
> component.  Components should have an extremely minimal configuration 
> with fairly specific parameters, because it makes early error checking 
> easier, and you don't have to search all over the place to find how a 
> parameter is used, etc., etc.

If we define schemas for the configuration that components take, that's 
fine with me.  I don't mind being explicit in the design of the 
components.  I just don't want to push all the internal structure into 
the deployment file, and I don't want changes to the design of a 
component to effect the design of anything that might wrap that component.


-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From ianb at colorstudy.com  Tue Jul 26 05:01:53 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 25 Jul 2005 22:01:53 -0500
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <1122343861.3898.91.camel@plope.dyndns.org>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>	
	<42E591FE.5040209@colorstudy.com>
	<1122343861.3898.91.camel@plope.dyndns.org>
Message-ID: <42E5A7A1.3030004@colorstudy.com>

Chris McDonough wrote:
> How much of this could be solved by using a web server's
> directory/alias-mapping facility?
> 
> For instance, if you needed a single Apache webserver to support
> multiple pipelines based on URL mapping, wouldn't it be possible in many
> cases to compose that out of things like rewrite rules and script
> aliases (the below assumes running them just as CGI scripts, obviously
> it would be different with something using mod_python or what-have-you):
> 
> <VirtualHost *:80>
>  ServerAdmin webmaster at plope.com
>  ServerName plope.com
>  ServerAlias plope.com
>  ScriptAlias /viewcvs "/home/chrism/viewcvs.wsgi"
>  ScriptAlias /blog "/home/chrism/blog.wsgi"
>  RewriteEngine On
>  RewriteRule ^/[^/]viewcvs*$ /home/chrism/viewcvs.wsgi [PT]
>  RewriteRule ^/[^/]blog*$ /home/chrism/blog.wsgi [PT]
> </VirtualHost>
> 
> Obviously it would mean some repetition in "wsgi" files if you needed to
> repeat parts of a pipeline for each URL mapping.  But it does mean we
> wouldn't need to invent more software.

No, we already have templating languages to generate those configuration 
files so it's no problem ;)

Messy configuration files (and RewriteRule for that matter) are my bane.

To be fair, in a shared hosting situation (websites maintained by 
customers, not the host) this would seem more workable than a 
centralized configuration.  Perhaps... it's not the kind of situation I 
deal with much anymore, so I've lost touch with that case.  And would 
that mean we'd start seeing ".wsgi" in URLs?  Hrm.

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From chrism at plope.com  Tue Jul 26 05:19:28 2005
From: chrism at plope.com (Chris McDonough)
Date: Mon, 25 Jul 2005 23:19:28 -0400
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <42E5A7A1.3030004@colorstudy.com>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
	<42E591FE.5040209@colorstudy.com>
	<1122343861.3898.91.camel@plope.dyndns.org>
	<42E5A7A1.3030004@colorstudy.com>
Message-ID: <1122347969.3898.99.camel@plope.dyndns.org>

On Mon, 2005-07-25 at 22:01 -0500, Ian Bicking wrote:
> > <VirtualHost *:80>
> >  ServerAdmin webmaster at plope.com
> >  ServerName plope.com
> >  ServerAlias plope.com
> >  ScriptAlias /viewcvs "/home/chrism/viewcvs.wsgi"
> >  ScriptAlias /blog "/home/chrism/blog.wsgi"
> >  RewriteEngine On
> >  RewriteRule ^/[^/]viewcvs*$ /home/chrism/viewcvs.wsgi [PT]
> >  RewriteRule ^/[^/]blog*$ /home/chrism/blog.wsgi [PT]
> > </VirtualHost>

> Messy configuration files (and RewriteRule for that matter) are my bane.

I agree.  In fact, I stole that snippet from my own server and modified
it.  It would probably do *something* but to be honest I'm not even sure
I remember exactly what. ;-)  But there's always the docs to fall back
on...

> To be fair, in a shared hosting situation (websites maintained by 
> customers, not the host) this would seem more workable than a 
> centralized configuration.  Perhaps... it's not the kind of situation I 
> deal with much anymore, so I've lost touch with that case.  And would 
> that mean we'd start seeing ".wsgi" in URLs?  Hrm.

No, I think I just remembered... that's what the RewriteRules are
for! ;-)

- C


From chrism at plope.com  Tue Jul 26 07:09:09 2005
From: chrism at plope.com (Chris McDonough)
Date: Tue, 26 Jul 2005 01:09:09 -0400
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <42E5A5C9.2050408@colorstudy.com>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
	<5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
	<5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com>
	<42E5A5C9.2050408@colorstudy.com>
Message-ID: <1122354549.3898.126.camel@plope.dyndns.org>

Just for a frame of reference, I'll say how I might do these things.
These all assume I'd use Apache and mod_python, for better or worse:

> I'm not clear exactly what you are proposing.  Let's use a more 
> realistic example.  Components:
> 
> * Exception catcher.  Takes "email_errors", which is a list of addresses 
> to email exceptions to.  I want to apply this globally.

I'd likely do this in my endpoint apps (maybe share some sort of library
between them to do it).  Errors that occur in middleware would be
diagnosable/detectable via mod_python's error logging facility and
something like snort.

> * An application mounted on /, which takes "document_root" and serves up 
> those files directly.

Use the webserver.

> * An application mounted at /blog, takes "database" (a string) where all 
> its information is kept.

Separate WSGI pipeline descriptor with rewrite rules or whatever
aliasing "/blog" to it.

> * An application mounted at /admin.  Takes "document_root", which is 
> where the editable files are located.  Around it goes two pieces of 
> middleware...

Same as above...

> * A authentication middleware, which takes "database", which is where 
> user information is kept.  And...

I'd probably make this into a service that would be consumable by
applications with a completely separate configuration outside of any
deployment spec.  For example, I might try to pull Zope's "Pluggable
Authentication Utility" out of Zope 3, leaving intact its
configurability through ZCML.

But if I did put it in middleware, I'd put it in each of my application
pipelines (implied by /blog, /admin) in an appropriate place.

> * An authorization middleware, that takes "allowed_roles", and checks it 
> against what the authentication middleware puts in.

This one I know wouldn't make into middleware.  Instead, I'd use a
library much like the thing I proposed as "decsec" (although at the time
I wrote that proposal, I did think it would be middleware; I changed my
mind).

- C


From ianb at colorstudy.com  Tue Jul 26 08:18:40 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 26 Jul 2005 01:18:40 -0500
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <1122354549.3898.126.camel@plope.dyndns.org>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>	
	<5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>	
	<5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com>	
	<42E5A5C9.2050408@colorstudy.com>
	<1122354549.3898.126.camel@plope.dyndns.org>
Message-ID: <42E5D5C0.9080102@colorstudy.com>

Well, the stack is really just an example, meant to be more realistic 
than "sample1" and "sample2".  I actually think it's a very reasonable 
example, but that's not really the point.  Presuming this stack, how 
would you configure it?


Chris McDonough wrote:
> Just for a frame of reference, I'll say how I might do these things.
> These all assume I'd use Apache and mod_python, for better or worse:
> 
> 
>>I'm not clear exactly what you are proposing.  Let's use a more 
>>realistic example.  Components:
>>
>>* Exception catcher.  Takes "email_errors", which is a list of addresses 
>>to email exceptions to.  I want to apply this globally.
> 
> 
> I'd likely do this in my endpoint apps (maybe share some sort of library
> between them to do it).  Errors that occur in middleware would be
> diagnosable/detectable via mod_python's error logging facility and
> something like snort.
> 
> 
>>* An application mounted on /, which takes "document_root" and serves up 
>>those files directly.
> 
> 
> Use the webserver.
> 
> 
>>* An application mounted at /blog, takes "database" (a string) where all 
>>its information is kept.
> 
> 
> Separate WSGI pipeline descriptor with rewrite rules or whatever
> aliasing "/blog" to it.
> 
> 
>>* An application mounted at /admin.  Takes "document_root", which is 
>>where the editable files are located.  Around it goes two pieces of 
>>middleware...
> 
> 
> Same as above...
> 
> 
>>* A authentication middleware, which takes "database", which is where 
>>user information is kept.  And...
> 
> 
> I'd probably make this into a service that would be consumable by
> applications with a completely separate configuration outside of any
> deployment spec.  For example, I might try to pull Zope's "Pluggable
> Authentication Utility" out of Zope 3, leaving intact its
> configurability through ZCML.
> 
> But if I did put it in middleware, I'd put it in each of my application
> pipelines (implied by /blog, /admin) in an appropriate place.
> 
> 
>>* An authorization middleware, that takes "allowed_roles", and checks it 
>>against what the authentication middleware puts in.
> 
> 
> This one I know wouldn't make into middleware.  Instead, I'd use a
> library much like the thing I proposed as "decsec" (although at the time
> I wrote that proposal, I did think it would be middleware; I changed my
> mind).

From chrism at plope.com  Tue Jul 26 09:55:27 2005
From: chrism at plope.com (Chris McDonough)
Date: Tue, 26 Jul 2005 03:55:27 -0400
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <42E5D5C0.9080102@colorstudy.com>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
	<5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>
	<5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com>
	<42E5A5C9.2050408@colorstudy.com>
	<1122354549.3898.126.camel@plope.dyndns.org>
	<42E5D5C0.9080102@colorstudy.com>
Message-ID: <1122364528.3898.148.camel@plope.dyndns.org>

On Tue, 2005-07-26 at 01:18 -0500, Ian Bicking wrote:
> Well, the stack is really just an example, meant to be more realistic 
> than "sample1" and "sample2".  I actually think it's a very reasonable 
> example, but that's not really the point.  Presuming this stack, how 
> would you configure it?

I typically roll out software to clients using a build mechanism (I
happens to use "pymake" at http://www.plope.com/software/pymake/ but
anything dependency-based works).

I write "generic" build scripts for all of the software components.  For
example, I might write makefiles that check out and build python,
openldap, mysql and so on (each into a "non-system" location).  I leave
a bit of room for customization in their build definitions that I can
override from within a "profile".  A "profile" is a set of customized
software builds for a specific purpose.

I might have, maybe, 3 different profiles for each customer where the
profile usually works out to be tied to machine function (load balancer,
app server, database server).  I mantain these build scripts and the
profiles in CVS for each customer.  I never install anything by hand, I
always change the buildout and rerun it if I need to get something set
up.

This usually works out pretty well because to roll out a new major
version of software, I just rerun the build scripts for a particular
profile and move the data over.  Usually the only thing that needs to
change frequently are a few bits of software that are checked out of
version control, so doing "cvs up" on those bits typically gets me where
I need to be unless it's a major revision.

So in this case, I'd likely write a build that either built Apache from
source or at least created an "httpd-includes" file meant to be
referenced from within the "system" Apache config file with the proper
stuff in it given the profile's purpose.  The build would also download
and install Python, it would get the the proper eggs and/or Python
software and the database, and so forth.  All the configuration would be
done via the "profile" which is in version control.

I don't know if this kind of thing works for everybody, but it has
worked well for me so far.  I do this all the time, and I have a good
library of buildout scripts already so it's less painful for me than it
might be for someone who is starting from scratch.  That said, it is
time-consuming and imperfect... upgrades are the most painful.  New
installs are simple, though.

So, anyway, the short answer is "I write a script to do the config for
me so I can repeat it on demand".

- C


From ianb at colorstudy.com  Thu Jul 28 18:09:40 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 28 Jul 2005 11:09:40 -0500
Subject: [Web-SIG] JS libs: MochiKit
Message-ID: <42E90344.1080300@colorstudy.com>

Since there was a bunch of interest in Javascript libraries here before, 
for anyone who hasn't seen it I thought I'd note MochiKit:

   http://mochikit.com/

It's a fairly recent entrant from our own Bob Ippolito 
(http://bob.pythonmac.org/).  Tests, docs, and a bit of the flavor of 
Python.  Just a bit, really -- there's only so much you can do in 
Javascript.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From brenocon at gmail.com  Thu Jul 28 19:22:09 2005
From: brenocon at gmail.com (Brendan O'Connor)
Date: Thu, 28 Jul 2005 10:22:09 -0700
Subject: [Web-SIG] JS libs: MochiKit
In-Reply-To: <42E90344.1080300@colorstudy.com>
References: <42E90344.1080300@colorstudy.com>
Message-ID: <op.sumvu7aprxfzpn@localhost.localdomain>

here's another: http://prototype.conio.net/

not as much documentation, not from a python-er, but I've heard it's  
pretty useful.

Brendan


On Thu, 28 Jul 2005 09:09:40 -0700, Ian Bicking <ianb at colorstudy.com>  
wrote:

> Since there was a bunch of interest in Javascript libraries here before,
> for anyone who hasn't seen it I thought I'd note MochiKit:
>
>    http://mochikit.com/
>
> It's a fairly recent entrant from our own Bob Ippolito
> (http://bob.pythonmac.org/).  Tests, docs, and a bit of the flavor of
> Python.  Just a bit, really -- there's only so much you can do in
> Javascript.
>


-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

From dangoor at gmail.com  Thu Jul 28 19:27:59 2005
From: dangoor at gmail.com (Kevin Dangoor)
Date: Thu, 28 Jul 2005 13:27:59 -0400
Subject: [Web-SIG] JS libs: MochiKit
In-Reply-To: <op.sumvu7aprxfzpn@localhost.localdomain>
References: <42E90344.1080300@colorstudy.com>
	<op.sumvu7aprxfzpn@localhost.localdomain>
Message-ID: <3f085ecd0507281027289fce8@mail.gmail.com>

Prototype does its business by mucking with Object.prototype, which
many people think is a no-no (it breaks certain things that you might
reasonably expect to work).

Prototype has gained a fair bit of acceptance because of Rails. So
much so that MochiKit even includes the $(elementid) shorthand for
document.getElementById(elementid) that Prototype has popularized.

If no one beats me to it, I'm hoping to port some of the visual
goodies from script.aculo.us and Rico, both of which are based on
Prototype.

Kevin

On 7/28/05, Brendan O'Connor <brenocon at gmail.com> wrote:
> here's another: http://prototype.conio.net/
> 
> not as much documentation, not from a python-er, but I've heard it's
> pretty useful.

From jonathan at carnageblender.com  Thu Jul 28 19:30:42 2005
From: jonathan at carnageblender.com (Jonathan Ellis)
Date: Thu, 28 Jul 2005 10:30:42 -0700
Subject: [Web-SIG] JS libs: MochiKit
In-Reply-To: <op.sumvu7aprxfzpn@localhost.localdomain>
References: <42E90344.1080300@colorstudy.com>
	<op.sumvu7aprxfzpn@localhost.localdomain>
Message-ID: <1122571842.19870.239456015@webmail.messagingengine.com>

On Thu, 28 Jul 2005 10:22:09 -0700, "Brendan O'Connor"
<brenocon at gmail.com> said:
> here's another: http://prototype.conio.net/
> 
> not as much documentation, not from a python-er, but I've heard it's  
> pretty useful.

Heh.

I don't think Bob would appreciate mochikit being mentioned in the same
breath as prototype. :)

http://bob.pythonmac.org/archives/2005/07/01/javascript-frameworks/

-Jonathan

From ianb at colorstudy.com  Fri Jul 29 00:40:04 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 28 Jul 2005 17:40:04 -0500
Subject: [Web-SIG] WSGI deployment: an experiment
Message-ID: <42E95EC4.9040906@colorstudy.com>

I've created a branch in Paste with a rough experiment in WSGI 
deployment, declarative but (I think) more general than what's been 
discussed.  The branch is at:

http://svn.pythonpaste.org/Paste/branches/wsgi-deployment-experiment/

All the specific modules for this stuff are in wsgi_*; wsgi_deploy.py 
being the main one.

And an application that is runnable with it is at:

http://svn.pythonpaste.org/Paste/apps/FileBrowser/trunk/

It's experimental.  It's far too bound to ConfigParser.  Maybe it's too 
closely bound to .ini files in general.  It doesn't handle multiple 
files or file references well at all.  Actually, not just not well, but 
just not at all.  But I think it's fairly simple and usable as a proof 
of concept.


And here's the deployment file, with some comments added:

# This is a special section for the server.  Probably it should
# just be named "server", but eh.  This is for when you use
# paste.wsgi_deploy.make_deployment -- you can also create an
# application from this file without serving it; it just happens
# to be that you can put both application sections and a server
# section in the same file without clashing...
[server:main]
# use: does pkg_resources.load_entry_point(spec, 'type...', name)
# you can also use "factory" to avoid eggishness.
# servers have a type of wsgi.server_factory00
# applications have a type of wsgi.app_factory00
# filters (aka middleware) have a type of wsgi.filter_factory00
use: Paste wsgiutils
port: 8080
host: 127.0.0.1

# "main" is the application that is loaded when this file is
# loaded.
[application: main]
# This is an application factory.  The application factory is passed
# app_factory(this_configparser_object, this_section), and returns
# the application.  In this case the pipeline factory will use other
# sections in the config file to compose middleware.
use: Paste pipeline
# These each refer to sections; the last item is an application, the
# others are filters.
pipeline: printdebug urlmap

# Here's that filter.
[filter: printdebug]
use: Paste printdebug

# This isn't a filter, even though it dispatches, because it doesn't
# dispatch to a single application.
[application: urlmap]
use: Paste urlmap
# Path like things are used to map to other named applications.
# In this case nothing is mapped to /, so you'll get a 404 unless
# you go to one of these paths.  But something could be mapped to /,
# of course.
/home = fb1
/other = fb2

# This is the first real application.
[application: fb1]
use: FileBrowser app

# This is a configuration parameter that is passed to the application.
# The actual passing happens in wsgi_deploy.make_paste_app, which
# is invoked by the 'app' entry point.  It uses the paste convention
# of a flat configuration.
browse_path = /home/ianb

# And the same app, but with different configuration.  Of course
# the pipeline app could also be used, or whatever.  Ideally it
# should be easier to point to other files, not just other sections.
[application: fb2]
use: FileBrowser app

browse_path = /home/rflosi


-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From ianb at colorstudy.com  Fri Jul 29 02:44:07 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 28 Jul 2005 19:44:07 -0500
Subject: [Web-SIG] WSGI deployment: an experiment
In-Reply-To: <42E95EC4.9040906@colorstudy.com>
References: <42E95EC4.9040906@colorstudy.com>
Message-ID: <42E97BD7.2020807@colorstudy.com>

Ian Bicking wrote:
> It's experimental.  It's far too bound to ConfigParser.  Maybe it's too 
> closely bound to .ini files in general.  It doesn't handle multiple 
> files or file references well at all.  Actually, not just not well, but 
> just not at all.  But I think it's fairly simple and usable as a proof 
> of concept.

I realize the code really wants is a couple callbacks into the 
configuration.  Applications should be able to construct other 
applications, and applications should be able to read the other 
variables in their section.  Where I'm passing around 
(config_parser_instance, section_name), I should be passing around 
(config_context, section_data), and config_context would be an object 
that could build other applications based on name (or filename).

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

From renesd at gmail.com  Fri Jul 29 03:14:56 2005
From: renesd at gmail.com (Rene Dudfield)
Date: Fri, 29 Jul 2005 11:14:56 +1000
Subject: [Web-SIG] WSGI deployment: an experiment
In-Reply-To: <42E95EC4.9040906@colorstudy.com>
References: <42E95EC4.9040906@colorstudy.com>
Message-ID: <64ddb72c05072818144e70ae16@mail.gmail.com>

Hey,

There is a lot of terminology here that would not be understood by
some random sys admin coming to have a look at the config file.

Below I pasted it here without the comments.  Sometimes it is good to
have a look at things without comments to see how readable they are.

Is this config file reusable?  Can I place it in a path of other apps,
and then it could live in say /app2 instead of at / Can it not care
about the server it is running on?


[server:main]
use: Paste wsgiutils
port: 8080
host: 127.0.0.1

[application: main]
use: Paste pipeline
pipeline: printdebug urlmap

[filter: printdebug]
use: Paste printdebug

[application: urlmap]
use: Paste urlmap
/home = fb1
/other = fb2

[application: fb1]
use: FileBrowser app
browse_path = /home/ianb

[application: fb2]
use: FileBrowser app
browse_path = /home/rflosi


On 7/29/05, Ian Bicking <ianb at colorstudy.com> wrote:
> I've created a branch in Paste with a rough experiment in WSGI
> deployment, declarative but (I think) more general than what's been
> discussed.  The branch is at:
> 
> http://svn.pythonpaste.org/Paste/branches/wsgi-deployment-experiment/
> 
> All the specific modules for this stuff are in wsgi_*; wsgi_deploy.py
> being the main one.
> 
> And an application that is runnable with it is at:
> 
> http://svn.pythonpaste.org/Paste/apps/FileBrowser/trunk/
> 
> It's experimental.  It's far too bound to ConfigParser.  Maybe it's too
> closely bound to .ini files in general.  It doesn't handle multiple
> files or file references well at all.  Actually, not just not well, but
> just not at all.  But I think it's fairly simple and usable as a proof
> of concept.
> 
> 
> And here's the deployment file, with some comments added:
> 
> # This is a special section for the server.  Probably it should
> # just be named "server", but eh.  This is for when you use
> # paste.wsgi_deploy.make_deployment -- you can also create an
> # application from this file without serving it; it just happens
> # to be that you can put both application sections and a server
> # section in the same file without clashing...
> [server:main]
> # use: does pkg_resources.load_entry_point(spec, 'type...', name)
> # you can also use "factory" to avoid eggishness.
> # servers have a type of wsgi.server_factory00
> # applications have a type of wsgi.app_factory00
> # filters (aka middleware) have a type of wsgi.filter_factory00
> use: Paste wsgiutils
> port: 8080
> host: 127.0.0.1
> 
> # "main" is the application that is loaded when this file is
> # loaded.
> [application: main]
> # This is an application factory.  The application factory is passed
> # app_factory(this_configparser_object, this_section), and returns
> # the application.  In this case the pipeline factory will use other
> # sections in the config file to compose middleware.
> use: Paste pipeline
> # These each refer to sections; the last item is an application, the
> # others are filters.
> pipeline: printdebug urlmap
> 
> # Here's that filter.
> [filter: printdebug]
> use: Paste printdebug
> 
> # This isn't a filter, even though it dispatches, because it doesn't
> # dispatch to a single application.
> [application: urlmap]
> use: Paste urlmap
> # Path like things are used to map to other named applications.
> # In this case nothing is mapped to /, so you'll get a 404 unless
> # you go to one of these paths.  But something could be mapped to /,
> # of course.
> /home = fb1
> /other = fb2
> 
> # This is the first real application.
> [application: fb1]
> use: FileBrowser app
> 
> # This is a configuration parameter that is passed to the application.
> # The actual passing happens in wsgi_deploy.make_paste_app, which
> # is invoked by the 'app' entry point.  It uses the paste convention
> # of a flat configuration.
> browse_path = /home/ianb
> 
> # And the same app, but with different configuration.  Of course
> # the pipeline app could also be used, or whatever.  Ideally it
> # should be easier to point to other files, not just other sections.
> [application: fb2]
> use: FileBrowser app
> 
> browse_path = /home/rflosi
> 
> 
> 
> --
> Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/renesd%40gmail.com
>

From ianb at colorstudy.com  Fri Jul 29 06:12:03 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 28 Jul 2005 23:12:03 -0500
Subject: [Web-SIG] WSGI deployment: an experiment
In-Reply-To: <64ddb72c05072818144e70ae16@mail.gmail.com>
References: <42E95EC4.9040906@colorstudy.com>
	<64ddb72c05072818144e70ae16@mail.gmail.com>
Message-ID: <2c144e0434fdb717f97f0cba0ea1c210@colorstudy.com>

On Jul 28, 2005, at 8:14 PM, Rene Dudfield wrote:
> There is a lot of terminology here that would not be understood by
> some random sys admin coming to have a look at the config file.

Yeah... I don't know.  I suppose if it looked like Apache it would feel 
more natural.  The "use" stuff is, IMHO, has simple as it can be made.  
The application configuration ("browse_path") is pretty much free-form. 
  So something like urlmap could have been like:

<Location "/home">
   Application FileBrowser app
</Location>

It's more special-case than I like, but maybe that's okay.  This would 
imply something ZConfig-based.  But still, there's no magic bullet for 
configuration, there's always something new to figure out, so IMHO the 
usability is more about error handling and the like.

> Below I pasted it here without the comments.  Sometimes it is good to
> have a look at things without comments to see how readable they are.
>
> Is this config file reusable?  Can I place it in a path of other apps,
> and then it could live in say /app2 instead of at / Can it not care
> about the server it is running on?

The application that this configuration file describes can be mounted 
anywhere; so you could reference it from another configuration file 
which put the whole batch at /app2.  What it doesn't do yet (but 
wouldn't be hard) would be something like:

   [application: urlmap]
   use: Paste urlmap
   /webmail: config_file.ini

Then that configuration file could in turn have another urlmap entry, 
dispatching to yet more applications.  urlmap incidentally supports 
virtual hosts as well as path dispatching, so you could do:

   http://foobar.com = foobar.ini

Well... that's where .ini syntax fails (the ":"), but we'll ignore 
that...


The server doesn't matter.  It just happens that the server and 
application configuration live in differently-named sections which 
don't clash, so they can go in the same configuration file.  You could 
have the files separate just as easily.  Not that my script supports 
that, but since it's a three-line frontend at this point...

> 
> [server:main]
> use: Paste wsgiutils
> port: 8080
> host: 127.0.0.1
>
> [application: main]
> use: Paste pipeline
> pipeline: printdebug urlmap
>
> [filter: printdebug]
> use: Paste printdebug
>
> [application: urlmap]
> use: Paste urlmap
> /home = fb1
> /other = fb2
>
> [application: fb1]
> use: FileBrowser app
> browse_path = /home/ianb
>
> [application: fb2]
> use: FileBrowser app
> browse_path = /home/rflosi

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org


From ianb at colorstudy.com  Fri Jul 29 18:22:42 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 29 Jul 2005 11:22:42 -0500
Subject: [Web-SIG] WSGI deployment: an experiment
In-Reply-To: <42E95EC4.9040906@colorstudy.com>
References: <42E95EC4.9040906@colorstudy.com>
Message-ID: <42EA57D2.1060902@colorstudy.com>

Ian Bicking wrote:
> I've created a branch in Paste with a rough experiment in WSGI 
> deployment, declarative but (I think) more general than what's been 
> discussed.  The branch is at:
> 
> http://svn.pythonpaste.org/Paste/branches/wsgi-deployment-experiment/

I've updated the implementation, taking ConfigParser out of the public 
interface, and cleaning things up some.  The config files stay the same 
(though now you can reference external files with file:, where you would 
have referenced other sections), but since the Python side is cleaned up 
here's an example of how the pipeline construct is implemented:

def make_pipeline(context):
     pipeline = context.app_config.get('pipeline', '').split()
     filters = pipeline[:-1]
     filters.reverse()
     app_name = pipeline[-1]
     deploy = context.deployment_config
     app = deploy.make_app(app_name)
     for filter_name in filters:
         wsgi_filter = deploy.make_filter(filter_name)
         app = wsgi_filter(app)
     return app

The context object has a reference both to the local configuration 
values (context.app_config), and the larger configuration file 
(context.deployment_config).


-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org