From deelan at interplanet.it Mon Jul 4 11:10:55 2005 From: deelan at interplanet.it (deelan) Date: Mon, 04 Jul 2005 11:10:55 +0200 Subject: [Web-SIG] CSS selector parsing In-Reply-To: <5b024817050602104429fd8668@mail.gmail.com> References: <5b024817050602104429fd8668@mail.gmail.com> Message-ID: <42C8FD1F.1080707@interplanet.it> Sanghyeon Seo wrote: > Hello, I am new here. > > Web SIG charter says: "HTML and XML parsing are pretty solid, but a > critical lack on the client side is the lack of a CSS parser." > > Is there any progress on a CSS parser? Any prior art? for the record, i've just noticed this: "cssutils - CSS Cascading Style Sheets library for Python" From ianb at colorstudy.com Mon Jul 11 20:57:43 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 11 Jul 2005 13:57:43 -0500 Subject: [Web-SIG] Standardized configuration Message-ID: <42D2C127.5060706@colorstudy.com> Lately I've been thinking about the role of Paste and WSGI and whatnot. Much of what makes a Paste component Pastey is configuration; otherwise the bits are just independent pieces of middleware, WSGI applications, etc. So, potentially if we can agree on configuration, we can start using each other's middleware more usefully. I think we should avoid questions of configuration file syntax for now. Lets instead simply consider configuration consumers. A standard would consist of: * A WSGI environment key (e.g., 'webapp01.config') * A standard for what goes in that key (e.g., a dictionary object) * A reference implementation of the middleware * Maybe a non-WSGI-environment way to access the configuration (like paste.CONFIG, which is a global object that dispatches to per-request configuration objects) -- in practice this is really really useful, as you don't have to pass the configuration object around. There's some other things we have to consider, as configuration syntaxes do effect the configuration objects significantly. So, the standard for what goes in the key has to take into consideration some possible configuration syntaxes. The obvious starting place is a dictionary-like object. I would suggest that the keys should be valid Python identifiers. Not all syntaxes require this, but some do. This restriction simply means that configuration consumers should try to consume Python identifiers. There's also a question about name conflicts (two consumers that are looking for the same key), and whether nested configuration should be preferred, and in what style. Note that the standard we decide on here doesn't have to be the only way the object can be accessed. For instance, you could make your configuration available through 'myframework.config', and create a compliant wrapper that lives in 'webapp01.config', perhaps even doing different kinds of mapping to fix convention differences. There's also a question about what types of objects we can expect in the configuration. Some input styles (e.g., INI and command line) only produce strings. I think consumers should treat strings (or maybe a special string subclass) specially, performing conversions as necessary (e.g., 'yes'->True). Thoughts? -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From chrism at plope.com Sun Jul 17 05:37:35 2005 From: chrism at plope.com (Chris McDonough) Date: Sat, 16 Jul 2005 23:37:35 -0400 Subject: [Web-SIG] Standardized configuration Message-ID: <1121571455.24386.171.camel@plope.dyndns.org> I've also been putting a bit of thought into middleware configuration, although maybe in a different direction. 
I'm not too concerned yet about being able to introspect the configuration of an individual component. Maybe that's because I haven't thought about the problem enough to be concerned about it. In the meantime, though, I *am* concerned about being able to configure a middleware "pipeline" easily and have it work. I've been attempting to divine a declarative way to configure a pipeline of WSGI middleware components. This is simple enough through code, except that at least in terms of how I'm attempting to factor my middleware, some components in the pipeline may have dependencies on other pipeline components. For example, it would be useful in some circumstances to create separate WSGI components for user identification and user authorization. The process of identification -- obtaining user credentials from a request -- and user authorization -- ensuring that the user is who he says he is by comparing the credentials against a data source -- are really pretty much distinct operations. There might also be a "challenge" component which forces a login dialog. In practice, I don't know if this is a truly useful separation of concerns that need to be implemented in terms of separate components in the middleware pipeline (I see that paste.login conflates them), it's just an example. But at very least it would keep each component simpler if the concerns were factored out into separate pieces. But in the example I present, the "authentication" component depends entirely on the result of the "identification" component. It would be simple enough to glom them together by using a distinct environment key for the identification component results and have the authentication component look for that key later in the middleware result chain, but then it feels like you might as well have written the whole process within one middleware component because the coupling is pretty strong. I have a feeling that adapters fit in here somewhere, but I haven't really puzzled that out yet. I'm sure this has been discussed somewhere in the lifetime of WSGI but I can't find much in this list's archives. > Lately I've been thinking about the role of Paste and WSGI and > whatnot. Much of what makes a Paste component Pastey is > configuration; otherwise the bits are just independent pieces of > middleware, WSGI applications, etc. So, potentially if we can agree > on configuration, we can start using each other's middleware more > usefully. > > I think we should avoid questions of configuration file syntax for > now. Lets instead simply consider configuration consumers. A > standard would consist of: > > * A WSGI environment key (e.g., 'webapp01.config') > * A standard for what goes in that key (e.g., a dictionary object) > * A reference implementation of the middleware > * Maybe a non-WSGI-environment way to access the configuration (like > paste.CONFIG, which is a global object that dispatches to per-request > configuration objects) -- in practice this is really really useful, as > you don't have to pass the configuration object around. > > There's some other things we have to consider, as configuration syntaxes > do effect the configuration objects significantly. So, the standard for > what goes in the key has to take into consideration some possible > configuration syntaxes. > > The obvious starting place is a dictionary-like object. I would suggest > that the keys should be valid Python identifiers. Not all syntaxes > require this, but some do. 
This restriction simply means that > configuration consumers should try to consume Python identifiers. > > There's also a question about name conflicts (two consumers that are > looking for the same key), and whether nested configuration should be > preferred, and in what style. > > Note that the standard we decide on here doesn't have to be the only way > the object can be accessed. For instance, you could make your > configuration available through 'myframework.config', and create a > compliant wrapper that lives in 'webapp01.config', perhaps even doing > different kinds of mapping to fix convention differences. > > There's also a question about what types of objects we can expect in the > configuration. Some input styles (e.g., INI and command line) only > produce strings. I think consumers should treat strings (or maybe a > special string subclass) specially, performing conversions as necessary > (e.g., 'yes'->True). > > Thoughts? From exarkun at divmod.com Sun Jul 17 05:52:45 2005 From: exarkun at divmod.com (Jp Calderone) Date: Sat, 16 Jul 2005 23:52:45 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1121571455.24386.171.camel@plope.dyndns.org> Message-ID: <20050717035245.26278.1537979134.divmod.quotient.13326@ohm> http://twistedmatrix.com/pipermail/twisted-python/2005-July/010902.html might be of interest on this topic. Jp From ianb at colorstudy.com Sun Jul 17 06:29:46 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sat, 16 Jul 2005 23:29:46 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1121571455.24386.171.camel@plope.dyndns.org> References: <1121571455.24386.171.camel@plope.dyndns.org> Message-ID: <42D9DEBA.4080609@colorstudy.com> Chris McDonough wrote: > I've also been putting a bit of thought into middleware configuration, > although maybe in a different direction. I'm not too concerned yet > about being able to introspect the configuration of an individual > component. Maybe that's because I haven't thought about the problem > enough to be concerned about it. In the meantime, though, I *am* > concerned about being able to configure a middleware "pipeline" easily > and have it work. There's nothing in WSGI to facilitate introspection. Sometimes that seems annoying, though I suspect lots of headaches are removed because of it, and I haven't found it to be a stopper yet. The issue I'm interested in is just how to deliver configuration to middleware. Because middleware can't be introspected (generally), this makes things like configuration schemas very hard to implement. It all needs to be late-bound. > I've been attempting to divine a declarative way to configure a pipeline > of WSGI middleware components. This is simple enough through code, > except that at least in terms of how I'm attempting to factor my > middleware, some components in the pipeline may have dependencies on > other pipeline components. At least in Paste, you just have to set up the stack properly. It would be cool if middleware could detect the presence of its prerequesites, and add the prerequesites if they weren't present; I don't think that's terribly complicated, but I haven't actually tried it. Mostly you'd test for a key, and if not present then you'd instantiate the middleware and reinvoke. > For example, it would be useful in some circumstances to create separate > WSGI components for user identification and user authorization. 
The > process of identification -- obtaining user credentials from a request > -- and user authorization -- ensuring that the user is who he says he > is by comparing the credentials against a data source -- are really > pretty much distinct operations. There might also be a "challenge" > component which forces a login dialog. I've always thought that a 401 response is a good way of indicating that, but not everyone agrees. (The idea being that the middleware catches the 401 and possibly translates it into a redirect or something.) > In practice, I don't know if this is a truly useful separation of > concerns that need to be implemented in terms of separate components in > the middleware pipeline (I see that paste.login conflates them), it's > just an example. Do you mean identification and authentication (you mention authorization above)? I think authorization is different, and is conflated in paste.login, but I don't have any many use cases where it's a useful distinction. I guess there's a number of ways of getting a username and password; and to some degree the authenticator object works at that level of abstraction. And there's a couple other ways of authenticating a user as well (public keys, IP address, etc). I've generally used a "user manager" object for this kind of abstraction, with subclassing for different kinds of generality (e.g., the basic abstract class makes username/password logins simple, but a subclass can override that and authenticate based on anything in the request). Maybe there's a better term, the fact these two words start with "auth" causes all kinds of confusion. Conflating identification and authentication isn't so bad, but authentication and authorization is really bad (but common). > But at very least it would keep each component simpler > if the concerns were factored out into separate pieces. > > But in the example I present, the "authentication" component depends > entirely on the result of the "identification" component. It would be > simple enough to glom them together by using a distinct environment key > for the identification component results and have the authentication > component look for that key later in the middleware result chain, but > then it feels like you might as well have written the whole process > within one middleware component because the coupling is pretty strong. > > I have a feeling that adapters fit in here somewhere, but I haven't > really puzzled that out yet. I'm sure this has been discussed somewhere > in the lifetime of WSGI but I can't find much in this list's archives. No, I don't think so. It was something I experimented with in paste.login (purely intellectually, I haven't used it in a real app), and Aaron Lav did a little work on it as well, but until it gets some use it's hard to know how complete it is. As long as it's properly partitioned, I don't think it's a terribly hard problem. That is, with proper partitioning the pieces can be recombined, even if the implementations aren't general enough for all cases. Apache and Zope 2 authentication being examples where the partitioning was done improperly. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Sun Jul 17 06:33:57 2005 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sun, 17 Jul 2005 00:33:57 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <42D2C127.5060706@colorstudy.com> Message-ID: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> At 01:57 PM 7/11/2005 -0500, Ian Bicking wrote: >Lately I've been thinking about the role of Paste and WSGI and whatnot. > Much of what makes a Paste component Pastey is configuration; >otherwise the bits are just independent pieces of middleware, WSGI >applications, etc. So, potentially if we can agree on configuration, we >can start using each other's middleware more usefully. I'm going to go ahead and throw my hat in the ring here, even though I've been trying to avoid it. Most of the stuff you are calling middleware really isn't, or at any rate it has no reason to be middleware. What I think you actually need is a way to create WSGI application objects with a "context" object. The "context" object would have a method like "get_service(name)", and if it didn't find the service, it would ask its parent context, and so on, until there's no parent context to get it from. The web server would provide a way to configure a root or default context. This would allow you to do early binding of services without needing to do lookups on every web hit. E.g.:: class MyApplication: def __init__(self, context): self.authenticate = context.get_service('security.authentication') def __call__(self, environ, start_response): user = self.authenticate(environ) So, you would simply register an application *factory* with the web server instead of an application instance, and it invokes it on the context object in order to get the right thing. Really, the only stuff that actually needs to be middleware, is stuff that wraps an *oblivious* application; i.e., the application doesn't know it's there. If it's a service the application uses, then it makes more sense to create a service management mechanism for configuration and deployment of WSGI applications. However, I think that the again the key part of configuration that actually relates to WSGI here is *deployment* configuration, such as which service implementations to use for the various kinds of services. Configuration *of* the services can and should be private to those services, since they'll have implementation-specific needs. (This doesn't mean, however, that a "configuration service" couldn't be part of the family of WSGI service interfaces.) I hope this isn't too vague; I've been wanting to say something about this since I saw your blog post about doing transaction services in WSGI, as that was when I first understood why you were making everything into middleware. (i.e., to create a poor man's substitute for "placeful" services and utilities as found in PEAK and Zope 3.) Anyway, I don't have a problem with trying to create a framework-neutral (in theory, anyway) component system, but I think it would be a good idea to take lessons from ones that have solved this problem well, and then create an extremely scaled-down version, rather than kludging application configuration into what's really per-request data. 
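
To make that concrete, a minimal sketch of the kind of context object described above might look like the following (the class names and the 'security.authentication' key are purely illustrative, not a proposed standard):

    class Context:
        """Minimal service-lookup context with parent fallback."""

        def __init__(self, parent=None):
            self.parent = parent
            self._services = {}

        def register_service(self, name, service):
            self._services[name] = service

        def get_service(self, name):
            if name in self._services:
                return self._services[name]
            if self.parent is not None:
                return self.parent.get_service(name)
            raise LookupError('no such service: %r' % name)

    class MyApplication:
        # Same shape as the example above: the service is looked up once,
        # at construction time, not on every request.
        def __init__(self, context):
            self.authenticate = context.get_service('security.authentication')

        def __call__(self, environ, start_response):
            user = self.authenticate(environ)
            start_response('200 OK', [('Content-Type', 'text/plain')])
            return ['hello, %s' % user]

    root = Context()
    root.register_service('security.authentication',
                          lambda environ: environ.get('REMOTE_USER', 'anonymous'))
    app_factory = MyApplication    # the server is handed the factory...
    app = app_factory(root)        # ...and calls it on the configured context

The server would own the root context and any per-application child contexts; the application binds its services once, at deployment time, instead of digging them out of the request environment on every hit.
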
From chrism at plope.com Sun Jul 17 07:31:20 2005 From: chrism at plope.com (Chris McDonough) Date: Sun, 17 Jul 2005 01:31:20 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <42D9DEBA.4080609@colorstudy.com> References: <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> Message-ID: <1121578280.24386.228.camel@plope.dyndns.org> On Sat, 2005-07-16 at 23:29 -0500, Ian Bicking wrote: > There's nothing in WSGI to facilitate introspection. Sometimes that > seems annoying, though I suspect lots of headaches are removed because > of it, and I haven't found it to be a stopper yet. The issue I'm > interested in is just how to deliver configuration to middleware. Whew, I hoped you'd respond. ;-) It appears that I haven't gotten as far as to want introspection into the implementation or configuration of a middleware component. Instead, I want the ability to declaratively construct a pipeline out of largely opaque and potentially interdependent (but loosely coupled) WSGI middleware components, which is another problem entirely. It seemed cogent, so I just somewhat belligerently coopted this thread, sorry! > Because middleware can't be introspected (generally), this makes things > like configuration schemas very hard to implement. It all needs to be > late-bound. The pipeline itself isn't really late bound. For instance, if I was to create a WSGI middleware pipeline something like this: server <--> session <--> identification <--> authentication <--> <--> challenge <--> application ... session, identification, authentication, and challenge are middleware components (you'll need to imagine their implementations). And within a module that started a server, you might end up doing something like: def configure_pipeline(app): return SessionMiddleware( IdentificationMiddleware( AuthenticationMiddleware( ChallengeMiddleware(app))))) if __name__ == '__main__': app = Application() pipeline = configure_pipeline(app) server = Server(pipeline) server.serve() The pipeline is static. When a request comes in, the pipeline itself is already constructed. I don't really want a way to prevent "improper" pipeline construction at startup time (right now anyway), because failures due to missing dependencies will be fairly obvious. But some elements of the pipeline at this level of factoring do need to have dependencies on availability and pipeline placement of the other elements. In this example, proper operation of the authentication component depends on the availability and pipeline placement of the identification component. Likewise, the identification component may depend on values that need to be retrieved from the session component. I've just seen Phillip's post where he implies that this kind of fine-grained component factoring wasn't really the initial purpose of WSGI middleware. That's kind of a bummer. ;-) Factoring middleware components in this way seems to provide clear demarcation points for reuse and maintenance. For example, I imagined a declarative security module that might be factored as a piece of middleware here: http://www.plope.com/Members/chrism/decsec_proposal . Of course, this sort of thing doesn't *need* to be middleware. But making it middleware feels very right to me in terms of being able to deglom nice features inspired by Zope and other frameworks into pieces that are easy to recombine as necessary. 
Implementations as WSGI middleware seems a nice way to move these kinds of features out of our respective applications and into more application-agnostic pieces that are very loosely coupled, but perhaps I'm taking it too far. > > For example, it would be useful in some circumstances to create separate > > WSGI components for user identification and user authorization. The > > process of identification -- obtaining user credentials from a request > > -- and user authorization -- ensuring that the user is who he says he > > is by comparing the credentials against a data source -- are really > > pretty much distinct operations. There might also be a "challenge" > > component which forces a login dialog. > > I've always thought that a 401 response is a good way of indicating > that, but not everyone agrees. (The idea being that the middleware > catches the 401 and possibly translates it into a redirect or something.) Yep. That'd be a fine signaling mechanism. > > In practice, I don't know if this is a truly useful separation of > > concerns that need to be implemented in terms of separate components in > > the middleware pipeline (I see that paste.login conflates them), it's > > just an example. > > Do you mean identification and authentication (you mention authorization > above)? Aggh. Yes, I meant to write authentication, sorry. > I think authorization is different, and is conflated in > paste.login, but I don't have any many use cases where it's a useful > distinction. I guess there's a number of ways of getting a username and > password; and to some degree the authenticator object works at that > level of abstraction. And there's a couple other ways of authenticating > a user as well (public keys, IP address, etc). I've generally used a > "user manager" object for this kind of abstraction, with subclassing for > different kinds of generality (e.g., the basic abstract class makes > username/password logins simple, but a subclass can override that and > authenticate based on anything in the request). Sure. OTOH, Zope 2 has proven that inheritance makes for a pretty awful general reuse pattern when things become sufficiently complicated. > As long as it's properly partitioned, I don't think it's a terribly hard > problem. That is, with proper partitioning the pieces can be > recombined, even if the implementations aren't general enough for all > cases. Apache and Zope 2 authentication being examples where the > partitioning was done improperly. Yes. I think it goes further than that. For example, I'd like to have be able to swap out implementations of the following kinds of components at a level somewhere above my application: Sessioning Authentication/identification Authorization (via something like declarative security based on a path) Virtual hosting awareness View lookup View invocation Transformation during rendering Caching Essentially, as Phillip divined, to do so, I've been trying to construct a framework-neutral component system out of middleware pieces to do so, but maybe I need to step back from that a bit. It sure is tempting, though. 
;-) - C From ianb at colorstudy.com Sun Jul 17 10:16:14 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 17 Jul 2005 03:16:14 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1121578280.24386.228.camel@plope.dyndns.org> References: <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> Message-ID: <42DA13CE.2080208@colorstudy.com> Chris McDonough wrote: >>Because middleware can't be introspected (generally), this makes things >>like configuration schemas very hard to implement. It all needs to be >>late-bound. > > > The pipeline itself isn't really late bound. For instance, if I was to > create a WSGI middleware pipeline something like this: > > server <--> session <--> identification <--> authentication <--> > <--> challenge <--> application > > ... session, identification, authentication, and challenge are > middleware components (you'll need to imagine their implementations). > And within a module that started a server, you might end up doing > something like: > > def configure_pipeline(app): > return SessionMiddleware( > IdentificationMiddleware( > AuthenticationMiddleware( > ChallengeMiddleware(app))))) > > if __name__ == '__main__': > app = Application() > pipeline = configure_pipeline(app) > server = Server(pipeline) > server.serve() This is what Paste does in configuration, like: middleware.extend([ SessionMiddleware, IdentificationMiddleware, AuthenticationMiddleware, ChallengeMiddleware]) This kind of middleware takes a single argument, which is the application it will wrap. In practice, this means all the other parameters go into lazily-read configuration. You can also define a "framework" (a plugin to Paste), which in addition to finding an "app" can also add middleware; basically embodying all the middleware that is typical for a framework. Paste is really a deployment configuration. Well, that as well as stuff to deploy. And two frameworks. And whatever else I feel a need or desire to throw in there. Note also that parts of the pipeline are very much late bound. For instance, the way I implemented Webware (and Wareweb) each servlet is a WSGI application. So while there's one URLParser application, the application that actually handles the request differs per request. If you start hanging more complete applications (that might have their own middleware) at different URLs, then this happens more generally. There's a newish poorly tested feature where you can do urlmap['/path'] = 'config_file.conf' and it'll hang the application described by that configuration file at that URL. > The pipeline is static. When a request comes in, the pipeline itself is > already constructed. I don't really want a way to prevent "improper" > pipeline construction at startup time (right now anyway), because > failures due to missing dependencies will be fairly obvious. I think that's reasonable too; it's what Paste implements now. > But some elements of the pipeline at this level of factoring do need to > have dependencies on availability and pipeline placement of the other > elements. In this example, proper operation of the authentication > component depends on the availability and pipeline placement of the > identification component. Likewise, the identification component may > depend on values that need to be retrieved from the session component. 
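
As an aside, the "test for a key, and if not present then instantiate the missing middleware and reinvoke" idea mentioned earlier might come out roughly like this sketch (the 'example.*' keys and the credential check are made up for illustration):

    class IdentificationMiddleware:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            # Pull credentials out of the request; here just the server-set user.
            environ['example.identity'] = environ.get('REMOTE_USER', '')
            return self.app(environ, start_response)

    class AuthenticationMiddleware:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            if 'example.identity' not in environ:
                # Prerequisite missing: add it on the fly and reinvoke ourselves.
                return IdentificationMiddleware(self)(environ, start_response)
            identity = environ['example.identity']
            environ['example.user'] = identity or 'anonymous'   # stand-in check
            return self.app(environ, start_response)

It keeps the coupling between the two components down to a single agreed environ key, which seems to be the crux of the dependency problem being discussed.
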
Yes; and potentially you could have several middlewares implementing the same functionality for a single request, e.g., if you had different kind of authentication for part of your site/application; that might shadow authentication further up the stack. > I've just seen Phillip's post where he implies that this kind of > fine-grained component factoring wasn't really the initial purpose of > WSGI middleware. That's kind of a bummer. ;-) Well, I don't understand the services he's proposing yet. I'm quite happy with using middleware the way I have been, so I'm not seeing a problem with it, and there's lots of benefits. > Factoring middleware components in this way seems to provide clear > demarcation points for reuse and maintenance. For example, I imagined a > declarative security module that might be factored as a piece of > middleware here: http://www.plope.com/Members/chrism/decsec_proposal . Yes, I read that before; I haven't quite figured out how to digest it, though. This is probably in part because of the resource-based orientation of Zope, and WSGI is application-based, where applications are rather opaque and defined only in terms of function. > Of course, this sort of thing doesn't *need* to be middleware. But > making it middleware feels very right to me in terms of being able to > deglom nice features inspired by Zope and other frameworks into pieces > that are easy to recombine as necessary. Implementations as WSGI > middleware seems a nice way to move these kinds of features out of our > respective applications and into more application-agnostic pieces that > are very loosely coupled, but perhaps I'm taking it too far. Certainly these pieces of code can apply to multiple applications and disparate systems. The most obvious instance right now that I think of is a WSGI WebDAV server (and someone's working on that for Google Summer of Code), which should be implemented pretty framework-free, simply because a good WebDAV implementation works at a low level. But obviously you want that to work with the same authentication as other parts of the system. I guess this is how I come back to lazily introducing middleware. For instance, some "application" (which might be a fairly small bit of functionality) might require a session. If there's no session available, then it can probably make a reasonable session itself. But it shouldn't shadow any session available to it, if that's already available. This is doubly true for something more authoritative like authentication. >> I think authorization is different, and is conflated in >>paste.login, but I don't have any many use cases where it's a useful >>distinction. I guess there's a number of ways of getting a username and >>password; and to some degree the authenticator object works at that >>level of abstraction. And there's a couple other ways of authenticating >>a user as well (public keys, IP address, etc). I've generally used a >>"user manager" object for this kind of abstraction, with subclassing for >>different kinds of generality (e.g., the basic abstract class makes >>username/password logins simple, but a subclass can override that and >>authenticate based on anything in the request). > > > Sure. OTOH, Zope 2 has proven that inheritance makes for a pretty awful > general reuse pattern when things become sufficiently complicated. True. But part of that is having a clear internal and external interface. 
The external interface -- which you can implement without using the abstract (convenience) superclass -- should be small and explicit. I've found interfaces a useful way of adding discipline in this way, even though I've never really used them at runtime. But I think it's reasonable to use inheritance for convenience sake, so long as you don't implement more than one thing in a class. >>As long as it's properly partitioned, I don't think it's a terribly hard >>problem. That is, with proper partitioning the pieces can be >>recombined, even if the implementations aren't general enough for all >>cases. Apache and Zope 2 authentication being examples where the >>partitioning was done improperly. > > > Yes. I think it goes further than that. For example, I'd like to have > be able to swap out implementations of the following kinds of components > at a level somewhere above my application: > > Sessioning Yes; we need a standard interface for sessions, but that's pretty straight-forward. There's other levels where a useful standard can be implemented as well; for instance, flup.middleware.session has SessionStore, which is where most of the parts of the session that you'd want to reimplement are implemented. > Authentication/identification This seems very doable right now, just by using SCRIPT_NAME. This leads to rather dumb users -- just a string -- but it's a good lowest-common-denominator starting point. More interesting interfaces -- like lists of roles/groups, or user objects -- can be added on incrementally. > Authorization (via something like declarative security based on a path) Sure; I can imagine a whole slew of ways to do authorization. An application can do it simply by returning 403 Forbidden. A front-end middleware could do it with simple pattern matching on the URL. A URL parser (aka traversal) can look for security annotations. > Virtual hosting awareness I've never had a problem with this, except in Zope... Anyway, to me this feels like a kind of URL parsing. One of the mini-proposals I made before involved a way of URL parsers to add URL variables to the system (basically a standard WSGI key to put URL variables as a dictionary). So a pattern like: (?.*)\.myblogspace.com/(?\d\d\d\d)/(?\d\d)/ Would add username, year, and month variables to the system. But regex matching is just one way; the *result* of parsing is usually either in the object (e.g., you use domains to get entirely different sites), or in terms of these variables. > View lookup > View invocation This I imagine happening either below WSGI entirely, or as part of a URL parser. There's certainly a place for adaptation at different stages. For instance, paste.urlparser.URLParser.get_application() clearly is ripe for adaptation. I imagine this wrapping the "resource" with something that renders it using a view. If you make resources and views -- lots of (most?) frameworks use controllers and views, and view lookup tends to be controller driven. So it feels very framework-specific to me. > Transformation during rendering If you mean what I think -- e.g., rendering XSL -- I think WSGI is ripe for this sort of thing. So far I've just done small things, like HTML checking, debugging log messages, etc. But other things are very possible. > Caching Again, I think this is a very natural fit. Well, at least for whole-page caching. Partial page caching doesn't really fit well at all, I'm afraid, though both systems could use the same caching backend. 
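
For what it's worth, a sketch of the URL-variables idea a few paragraphs up might look like this; it only handles the path part of the pattern, and 'example.url_vars' is just a stand-in for whatever key would eventually be agreed on:

    import re

    # Named groups become the URL variables (cf. the blog pattern above).
    PATTERN = re.compile(r'^/(?P<username>[^/]+)/(?P<year>\d{4})/(?P<month>\d{2})/')

    class UrlVarsMiddleware:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            url_vars = dict(environ.get('example.url_vars', {}))
            match = PATTERN.match(environ.get('PATH_INFO', ''))
            if match:
                url_vars.update(match.groupdict())
            environ['example.url_vars'] = url_vars
            return self.app(environ, start_response)

A URL parser doing traversal rather than regex matching would fill in the same dictionary by other means; the only standardized piece is the key.
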
> Essentially, as Phillip divined, to do so, I've been trying to construct > a framework-neutral component system out of middleware pieces to do so, > but maybe I need to step back from that a bit. It sure is tempting, > though. ;-) I've found it satisfyingly easy. Maybe there's a "better" way... but "better" without "easier" doesn't excite me at all. And we learn best by doing... which is my way of saying you should try it with code right now ;) -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Sun Jul 17 10:28:05 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 17 Jul 2005 03:28:05 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> Message-ID: <42DA1695.7020304@colorstudy.com> Phillip J. Eby wrote: > At 01:57 PM 7/11/2005 -0500, Ian Bicking wrote: > >> Lately I've been thinking about the role of Paste and WSGI and whatnot. >> Much of what makes a Paste component Pastey is configuration; >> otherwise the bits are just independent pieces of middleware, WSGI >> applications, etc. So, potentially if we can agree on configuration, we >> can start using each other's middleware more usefully. > > > I'm going to go ahead and throw my hat in the ring here, even though > I've been trying to avoid it. > > Most of the stuff you are calling middleware really isn't, or at any > rate it has no reason to be middleware. Well, it is if you implement it that way ;) I think I'd prefer the term "filter" actually; less bad connotations for people. But that's really unrelated to your point. > What I think you actually need is a way to create WSGI application > objects with a "context" object. The "context" object would have a > method like "get_service(name)", and if it didn't find the service, it > would ask its parent context, and so on, until there's no parent context > to get it from. The web server would provide a way to configure a root > or default context. I guess I'm treating the request environment as that context. I don't really see the problem with that...? > This would allow you to do early binding of services without needing to > do lookups on every web hit. E.g.:: > > class MyApplication: > def __init__(self, context): > self.authenticate = > context.get_service('security.authentication') > def __call__(self, environ, start_response): > user = self.authenticate(environ) > > So, you would simply register an application *factory* with the web > server instead of an application instance, and it invokes it on the > context object in order to get the right thing. I don't see the distinction between a factory and an instance. Or at least, it's easy to translate from one to the other. In many cases, the middleware is modifying or watching the application's output. For instance, catching a 401 and turning that into the appropriate login -- which might mean producing a 401, a redirect, a login page via internal redirect, or whatever. I guess you could make one Uber Middleware that could handle the services' needs to rewrite output, watch for errors and finalize resources, etc. This isn't unreasonable, and I've kind of expected one to evolve at some point. But you'll have to say more to get me to see how "services" is a better way to manage this. > Really, the only stuff that actually needs to be middleware, is stuff > that wraps an *oblivious* application; i.e., the application doesn't > know it's there. 
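
Concretely, treating the environment as the context only takes a trivial piece of middleware. A sketch using the 'webapp01.config' key from the original proposal (the configuration values are invented, and this does none of the per-request dispatching that paste.CONFIG does):

    config = {'database': 'postgres://localhost/blog', 'debug': True}

    class ConfigMiddleware:
        """Puts a configuration dictionary where the wrapped app can find it."""

        def __init__(self, app, config):
            self.app = app
            self.config = config

        def __call__(self, environ, start_response):
            environ['webapp01.config'] = self.config
            return self.app(environ, start_response)

    def application(environ, start_response):
        conf = environ['webapp01.config']
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['debug is %r' % conf.get('debug')]

    pipeline = ConfigMiddleware(application, config)
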
If it's a service the application uses, then it makes > more sense to create a service management mechanism for configuration > and deployment of WSGI applications. Applications always care about the things around them, so any convention that middleware and applications be unaware of each other would rule out most middleware. > However, I think that the again the key part of configuration that > actually relates to WSGI here is *deployment* configuration, such as > which service implementations to use for the various kinds of services. > Configuration *of* the services can and should be private to those > services, since they'll have implementation-specific needs. (This > doesn't mean, however, that a "configuration service" couldn't be part > of the family of WSGI service interfaces.) > > I hope this isn't too vague; I've been wanting to say something about > this since I saw your blog post about doing transaction services in > WSGI, as that was when I first understood why you were making everything > into middleware. (i.e., to create a poor man's substitute for > "placeful" services and utilities as found in PEAK and Zope 3.) What do they provide that middleware does not? > Anyway, I don't have a problem with trying to create a framework-neutral > (in theory, anyway) component system, but I think it would be a good > idea to take lessons from ones that have solved this problem well, and > then create an extremely scaled-down version, rather than kludging > application configuration into what's really per-request data. Per-request or not, from the application's side I don't see the difference. It is convenient to put configuration into the request, though paste.CONFIG is also provided as a global variable that represents the current request's configuration. In practice the configuration is usually identical for all requests, but I haven't seen any reason to enforce this. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From grahamd at dscpl.com.au Sun Jul 17 12:04:48 2005 From: grahamd at dscpl.com.au (Graham Dumpleton) Date: Sun, 17 Jul 2005 20:04:48 +1000 Subject: [Web-SIG] Standardized configuration In-Reply-To: <42DA13CE.2080208@colorstudy.com> References: <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> Message-ID: <669891ba7d6b3bb20d95f44ff112074a@dscpl.com.au> On 17/07/2005, at 6:16 PM, Ian Bicking wrote: >> The pipeline itself isn't really late bound. For instance, if I was >> to >> create a WSGI middleware pipeline something like this: >> >> server <--> session <--> identification <--> authentication <--> >> <--> challenge <--> application >> >> ... session, identification, authentication, and challenge are >> middleware components (you'll need to imagine their implementations). >> And within a module that started a server, you might end up doing >> something like: >> >> def configure_pipeline(app): >> return SessionMiddleware( >> IdentificationMiddleware( >> AuthenticationMiddleware( >> ChallengeMiddleware(app))))) > > This is what Paste does in configuration, like: > > middleware.extend([ > SessionMiddleware, IdentificationMiddleware, > AuthenticationMiddleware, ChallengeMiddleware]) > > This kind of middleware takes a single argument, which is the > application it will wrap. In practice, this means all the other > parameters go into lazily-read configuration. 
Sorry, but you have given me a nice opening here to hijack this conversation a bit and make some comments and pose some questions about WSGI that I have been thinking on for a while. My understanding from reading the WSGI PEP and examples like that above is that the WSGI middleware stack concept is very much tree like, but where at any specific node within the tree, one can only traverse into one child. Ie., a parent middleware component could make a decision to defer to one child or another, but there is no means of really trying out multiple choices until you find one that is prepared to handle the request. The only way around it seems to be make the linear chain of nested applications longer and longer, something which to me just doesn't sit right. In some respects the need for the configuration scheme is in part to make that less unwieldy. To explain what I am going on about, I am going to use examples from some work I have been doing with componentised construction of request handler stacks in mod_python. I will not use the term middleware here, as I note that someone here in this discussion has already made the point of saying that the components being talked about here aren't really middleware and in what I have been doing I have been taking it to an even more fine grained level. I believe I can draw a reasonable analogy to mod_python as at the simplest, a mod_python request handler and a WSGI application are both providing the most basic function of proving the service for responding to a request, they just do so in different ways. Normally in mod_python a handler can return an OK response, an error response or a DECLINED response. The DECLINED response is special and indicates to mod_python that any further content handlers defined by mod_python should be skipped and control passed back up to Apache so that it can potentially serve up a matched static file. What I am doing is making it acceptable for a handler to also return None. If this were returned by the highest level handler, it would equate to being the same as DECLINED, but within the context of middleware components it has a lightly relaxed meaning. Specifically, it indicates that that handler isn't returning a response, but not that it is indicating that the request as a whole is being DECLINED causing a return to Apache. Doing this means that within the context of a tree based middleware stack, at a particular node in the stack one can introduce a list of handlers at a particular node. Each handler in the list will in turn be tried to see if it wishes to handle the response, returning either an error or valid response, or None. If it doesn't raise a response, the next handler in the list would be tried until one is found, and if one isn't, then None is passed back to the parent middleware component. This all means I could write something like: handler = Handlers( IfLocationMatches(r"/_",NotFound()), IfLocationMatches(r"\.py(/.*)?$",NotFound()), PythonModule(), ) This handler might be associated with any access to a directory as a whole. In iterating over each of the handlers it filters out requests to files that we don't want to provide access to, with the final handler deferring to a handler within a Python module associated with the actual resource being requested. Although Apache provides means of filtering out requests, it only works properly for physical files and not virtual resources specified by way of the path info. 
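
For reference, the nested configure_pipeline() call and the flat list quoted above amount to the same thing; a small sketch of the fold that turns the list into the nested stack:

    def build_pipeline(app, factories):
        # Wrap from the inside out, so the first factory in the list ends
        # up outermost -- the same ordering as the nested configure_pipeline().
        for factory in reversed(factories):
            app = factory(app)
        return app

    # e.g., with the (still imaginary) classes from the earlier example:
    # pipeline = build_pipeline(Application(), [
    #     SessionMiddleware, IdentificationMiddleware,
    #     AuthenticationMiddleware, ChallengeMiddleware])
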
For example, a file "page.tmpl" (a Cheetah file) could have a "page.py" file that defines: handler = Handlers( IfLocationMatches(r"\.bak(/.*)?$",NotFound()), IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()), IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()), ) Again, more filtering and finally a handler is triggered which knows how to trigger a precompiled Cheetah template stored as a Python module. All in all a similar tree like structure to WSGI, except you have the ability to iterate through handlers at one level with them being able to explicitly define that they aren't providing a response and instead allowing the next handler to be tried. My experience with this so far is that it has allowed more fine grained components to be created which provide specific filtering without it all turning into a mess due to having to nest each handler within another in a big pipeline as things seem they must be done in WSGI. In mod_python one already has access to a table object storing configuration options set within the Apache configuration for mod_python, plus the ability to add Python objects into the mod_python request object itself as necessary In terms of configuration, using this ability of a list of handlers where they don't actually return a response, seems to me to make it easier to avoid having to have a separate configuration system for most stuff. For example, I can have a handler "SetPythonOption" which sets an option in the options table object and always returns None, thus passing control onto the next handler. In the highest level handler before point where control is dispatched off to a separate Python module or special purpose handler, one can thus define the configuration as necessary. handler = Handlers( SetPythonOption("PythonDebug","1"), SetPythonOption("ApplicationPath","/application"), IfLocationMatches(r"/_",NotFound()), IfLocationMatches(r"\.py(/.*)?$",NotFound()), PythonModule(), ) In other words, the code itself contains the configuration and one doesn't have to worry about where the configuration is found and working out what you may need from it. Of course you could still have a separate configuration object and provide a special purpose handler which merges that into the environment of the request object in some way. For this later case, inline with how its request object is used, you could have something like: config = getApplicationConfig() handler = Handlers( SetRequestAttribute("config",config), IfLocationMatches(r"/_",NotFound()), IfLocationMatches(r"\.py(/.*)?$",NotFound()), PythonModule(), ) Having done that, any later handler could access "req.config" to get access to the configuration object and use it as necessary. In WSGI such things would be placed into the "environ" dictionary and propagated to subsequent applications. One last example, is what a session based login mechanism might look like since this was one of the examples posed in the initial discussion. Here you might have a handler for a whole directory which contains: _userDatabase = _users.UserDatabase() handler = Handlers( IfLocationMatches(r"\.bak(/.*)?$",NotFound()), IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()), IfLocationIsADirectory(ExternalRedirect('index.html')), # Create session and stick it in request object. CreateUserSession(), # Login form shouldn't require user to be logged in to access it. IfLocationMatches(r"^/login\.html(/.*)?$",CheetahModule()), # Serve requests against login/logout URLs and otherwise # don't let request proceed if user not yet authenticated. 
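
As an aside, the nearest WSGI analogue of this "try each handler in turn until one produces a response" arrangement would probably be a wrapper along these lines, using a 404 status as the "declined" signal since WSGI has no notion of a None response (a rough sketch that ignores applications which delay calling start_response until their iterable is consumed, or which use the write() callable):

    class Cascade:
        """Try each WSGI application in turn; the first one that does not
        answer 404 produces the response."""

        def __init__(self, *apps):
            self.apps = apps

        def __call__(self, environ, start_response):
            for app in self.apps[:-1]:
                captured = []
                def capture(status, headers, exc_info=None):
                    captured[:] = [status, headers]
                    return lambda data: None    # ignore write() while probing
                body = app(environ, capture)
                if captured and not captured[0].startswith('404'):
                    start_response(captured[0], captured[1])
                    return body
                if hasattr(body, 'close'):      # this one declined; clean up
                    body.close()
            # The last application is the fallback and runs normally.
            return self.apps[-1](environ, start_response)
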
# Will redirect to login form if not authenticated. FormAuthentication(_userDatabase,"login.html"), SetResponseHeader('Pragma','no-cache'), SetResponseHeader('Cache-Control','no-cache'), SetResponseHeader('Expires','-1'), IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()), ) Again, one has done away with the need for a configuration files as the code itself specifies what is required, along with the constraints as to what order things should be done in. Another thing this example shows is that handlers when they return None due to not returning an actual response, can still add to the response headers in the way of special cookies as required by sessions, or headers controlling caching etc. In terms of late binding of which handler is executed, the "PythonModule" handler is one example in that it selects which Python module to load only when the request is being handled. Another example of late construction of an instance of a handler in what I am doing, albeit the same type, is: class Handler: def __init__(self,req): self.__req = req def __call__(self,name="value"): self.__req.content_type = "text/html" self.__req.send_http_header() self.__req.write("") self.__req.write("
name=%r
"%cgi.escape(name)) self.__req.write("") return apache.OK handler = IfExtensionEquals("html",HandlerInstance(Handler)) First off the "HandlerInstance" object is only triggered if the request against this specific file based resource was by way of a ".html" extension. When it is triggered, it is only at that point that an instance of "Handler" is created, with the request object being supplied to the constructor. To round this off, the special "Handlers" handler only contains the following code. Pretty simple, but makes construction of the component hierarchy a bit easier in my mind when multiple things need to be done in turn where nesting isn't strictly required. class Handlers: def __init__(self,*handlers): self.__handlers = handlers def __call__(self,req): if len(self.__handlers) != 0: for handler in self.__handlers: result = _execute(req,handler,lazy=True) if result is not None: return result Would be very interested to see how people see this relating to what is possible with WSGI. Could one instigate a similar sort of class to "Handlers" in WSGI to sequence through WSGI applications until one generates a complete response? The areas that have me thinking the answer is "no" is that I recollect the PEP saying that the "start_response" object can only be called once, which precludes applications in a list adding to the response headers without returning a valid status. Secondly, if "start_response" object hasn't been called when the parent starts to try and construct the response content from the result of calling the application, it raises an error. But then, I have a distinct lack of proper knowledge on WSGI so could be wrong. If my thinking is correct, it could only be done by changing the WSGI specification to support the concept of trying applications in sequence, by way of allowing None as the status when "start_response" is called to indicate the same as when I return None from a handler. Ie., the application may have set headers, but otherwise the parent should where possible move to a subsequence application and try it etc. Anyway, people may feel that this is totally contrary to what WSGI is all about and not relevant and that is fine, I am at least finding it an interesting idea to play with in respect of mod_python at least. BTW, WSGI itself could just become a plugable component within this mod_python middleware equivalent. :-) handler = Handlers( IfLocationMatches(r"/_",NotFound()), IfLocationMatches(r"\.py(/.*)?$",NotFound()), WSGIApplicationModule(), ) Feedback most welcome. I have been trying to work out how what I am doing may transfered to WSGI for a little while, but if people think it is a stupid idea then I'll no longer waste my time on thinking about it and just stick with mod_python. Graham From chrism at plope.com Sun Jul 17 13:29:56 2005 From: chrism at plope.com (Chris McDonough) Date: Sun, 17 Jul 2005 07:29:56 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <42DA13CE.2080208@colorstudy.com> References: <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> Message-ID: <1121599799.24386.347.camel@plope.dyndns.org> On Sun, 2005-07-17 at 03:16 -0500, Ian Bicking wrote: > This is what Paste does in configuration, like: > > middleware.extend([ > SessionMiddleware, IdentificationMiddleware, > AuthenticationMiddleware, ChallengeMiddleware]) > > This kind of middleware takes a single argument, which is the > application it will wrap. 
In practice, this means all the other > parameters go into lazily-read configuration. I'm finding it hard to imagine a reason to have another kind of middleware. Well, actually that's not true. In noodling about this, I did think it would be kind of neat in a twisted way to have "decision middleware" like: class DecisionMiddleware: def __init__(self, apps): self.apps = apps def __call__(self, environ, start_response): app = self.choose(environ) for chunk in app(environ, start_response): yield chunk def choose(self, environ): app = some_decision_function(self.apps, environ) I can imagine using this pattern as a decision point for a WSGI pipeline serving multiple application end-points (perhaps based on URL matching of the PATH_INFO in environ). But by and large, most middleware components seem to be just wrappers for the next application in the chain. There seem to be two types of middleware that takes a single application object as a parameter to its constructor. There is "decorator" middleware where you want to add something to the environment for an application to find later and "action" middleware that does some rewriting of the body or the response headers before the response is sent back to the client. Some of this kind of middleware does both. > You can also define a "framework" (a plugin to Paste), which in addition > to finding an "app" can also add middleware; basically embodying all the > middleware that is typical for a framework. This appears to be what I'm trying to do too, which is why I'm intrigued by Paste. OTOH, I'm not sure that I want my framework to "find" an app for me. I'd like to be able to define pipelines that include my app, but I'd typically just want to statically declare it as the end point of a pipeline composed of service middleware. I should look at Paste a little more to see if it has the same philosophy or if I'm misunderstanding you. > Paste is really a deployment configuration. Well, that as well as stuff > to deploy. And two frameworks. And whatever else I feel a need or > desire to throw in there. Yeah. FWIW, as someone who has recently taken a brief look at Paste, I think it would be helpful (at least for newbies) to partition out the bits of Paste which are meant to be deployment configuration from the bits that are meant to be deployed. Zope 2 fell into the same trap early on, and never recovered. For example, ZPublisher (nee Bobo) was always meant to be able to be useful outside of Zope, but in practice it never happened because nobody could figure out how to disentangle it from its ever-increasing dependencies on other software only found in a Zope checkout. In the end, nobody even remembered what its dependencies were *supposed* to be. If you ask ten people, you'd get ten different answers. I also think that the rigor of separating out different components helps to make the software stronger and more easily understood in bite-sized pieces. Unfortunately, separating them makes configuration tough, but I think that's what we're trying to find an answer about how to do "the right way" here. > Note also that parts of the pipeline are very much late bound. For > instance, the way I implemented Webware (and Wareweb) each servlet is a > WSGI application. So while there's one URLParser application, the > application that actually handles the request differs per request. If > you start hanging more complete applications (that might have their own > middleware) at different URLs, then this happens more generally. 
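
For illustration, the two shapes described above -- "decorator" middleware that adds to the environment on the way in, and "action" middleware that adjusts the response on the way out -- might look like this in their simplest form (the key and header names are invented):

    class DecoratorMiddleware:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            environ['example.request_id'] = 42    # made-up key, for illustration
            return self.app(environ, start_response)

    class ActionMiddleware:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            def modified_start_response(status, headers, exc_info=None):
                return start_response(status,
                                      headers + [('X-Example', 'yes')],
                                      exc_info)
            return self.app(environ, modified_start_response)
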
Well, if you put the "decider" in middleware itself, all of the middleware components in each pipeline could still be at least constructed early. I'm pretty sure this doesn't really strictly qualify as "early binding" but it's not terribly dynamic either. It also makes configuration pretty straightforward. At least I can imagine a declarative syntax for configuring pipelines this way. I'm pretty sure you're not advocating it, but in case you are, I'm not sure it adds as much value as it removes to be able to have a "dynamic" middleware chain whereby new middleware elements can be added "on the fly" to a pipeline after a request has begun. That is *very* "late binding" to me and it's impossible to configure declaratively. > > But some elements of the pipeline at this level of factoring do need to > > have dependencies on availability and pipeline placement of the other > > elements. In this example, proper operation of the authentication > > component depends on the availability and pipeline placement of the > > identification component. Likewise, the identification component may > > depend on values that need to be retrieved from the session component. > > Yes; and potentially you could have several middlewares implementing the > same functionality for a single request, e.g., if you had different kind > of authentication for part of your site/application; that might shadow > authentication further up the stack. That's true. In the Zope world, we'd call that a "placeful service". I'd be tempted to model this with "decision middleware". > > I've just seen Phillip's post where he implies that this kind of > > fine-grained component factoring wasn't really the initial purpose of > > WSGI middleware. That's kind of a bummer. ;-) > > Well, I don't understand the services he's proposing yet. I'm quite > happy with using middleware the way I have been, so I'm not seeing a > problem with it, and there's lots of benefits. I agree! I'm a bit confused because one of the canonical examples of how WSGI middleware is useful seems to be the example of implementing a framework-agnostic sessioning service. And for that sessioning service to be useful, your application has to be able to depend on its availability so it can't be "oblivious". OTOH, the primary benefit -- to me, at least -- of modeling services as WSGI middleware is the fact that someone else might be able to use my service outside the scope of my projects (and thus help maintain it and find bugs, etc). So if I've got the wrong concept of what kinds of middleware that I can expect "normal" people to use, I don't want to go very far down that road without listening carefully to Phillip. Perhaps I'll have a shot at influencing the direction of WSGI to make it more appropriate for this sort of thing or maybe we'll come up with a better way of doing it. Zope 3 is a component system much like what I'm after, and I may just end up using it wholesale. But my immediate problem with Zope 3 is that like Zope 2, it's a collection of libraries that have dependencies on other libraries that are only included within its own checkout and don't yet have much of a life of their own. It's not really a technical problem, it's a social one... I'd rather have a somewhat messy framework with a lot of diversity composed of wildly differing component implementations that have a life of their own than to be be trapped in a clean, pure world where all the components are used only within that world. I suspect there's a middle ground here somewhere. 
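
Returning for a moment to the "decision middleware" idea, a slightly fleshed-out sketch of a path-based decider, with all of the candidate pipelines constructed up front, could be (the prefix mapping is invented for the example):

    class DecisionMiddleware:
        """Chooses among fully-constructed pipelines per request,
        here by longest matching PATH_INFO prefix."""

        def __init__(self, pipelines, default):
            self.pipelines = pipelines    # e.g. {'/blog': blog_pipeline, ...}
            self.default = default

        def __call__(self, environ, start_response):
            return self.choose(environ)(environ, start_response)

        def choose(self, environ):
            path = environ.get('PATH_INFO', '')
            for prefix in sorted(self.pipelines, key=len, reverse=True):
                if path.startswith(prefix):
                    return self.pipelines[prefix]
            return self.default

Everything is early-bound and easy to describe declaratively; only the choice of pipeline happens per request.
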
> > Factoring middleware components in this way seems to provide clear > > demarcation points for reuse and maintenance. For example, I imagined a > > declarative security module that might be factored as a piece of > > middleware here: http://www.plope.com/Members/chrism/decsec_proposal . > > Yes, I read that before; I haven't quite figured out how to digest it, > though. This is probably in part because of the resource-based > orientation of Zope, and WSGI is application-based, where applications > are rather opaque and defined only in terms of function. Yes, it is a bit Zopeish because it assumes content lives at a path. This isn't always the case, I know, but it often is. Well, it's a bit of a stretch, but an alternate decsec implementation might use a "content identifier" to determine the protection of a resource instead of a full path. For example, if you're implementing an application that is very simple and takes one and only one URL, but calls it with a different query string variable to display different pieces of content (e.g. '/blog?entry_num=1234'), you might have one ACL as the "root" ACL but optionally protect each piece of content with a separate ACL if one can be found. Maybe the content-specific ACL would be 'entry_num=1234' instead of a path. A function that accepts a form post for displaying or changing the blog entry for 1234 might look like this: def blog(environ, start_response): acl = environ['acl'] # added by decsec middleware userid = environ['userid'] # added by an authentication middleware formvars = get_form_vars_from(environ) if formvars['action'] == "view": permission = 'view' elif formvars['action'] == "change": permission = 'edit' content = get_blog_entry(environ) # pulls out the entry for 1234 if not acl.check(userid, permission): start_response('401 Unauthorized', []) return ['Unauthorized'] [ ... further code to change or display the blog entry ... ] The ACL could be the "root" ACL (say, all users can view, members of the group "manager" could change, everything else is denied). The "root" ACL would be used if content did not have its own ACL. But associating an ACL with a content identifier would allow the developer or site manager to protect individual blog entries (e.g. 1234, 5678, etc) with different ACLs. "Joe can view this one but he can't change it", "Jim can view all of them and can change all of them", etc.. the sorts of things useful for "staging" and workflow delegation without unduly mucking up the actual application code. Decsec would also take into account the user's group memberships and so forth during the "check" step, so you wouldn't have to write any of this code either. The "blog" example is stupid, of course, the concept is more useful for higher-security apps. Sorry, all of this is somewhat besides the point of this thread, but it does provide an example of kind of functionality I'd like to be able to put into middleware. > > Of course, this sort of thing doesn't *need* to be middleware. But > > making it middleware feels very right to me in terms of being able to > > deglom nice features inspired by Zope and other frameworks into pieces > > that are easy to recombine as necessary. Implementations as WSGI > > middleware seems a nice way to move these kinds of features out of our > > respective applications and into more application-agnostic pieces that > > are very loosely coupled, but perhaps I'm taking it too far. > > Certainly these pieces of code can apply to multiple applications and > disparate systems. 
The most obvious instance right now that I think of > is a WSGI WebDAV server (and someone's working on that for Google Summer > of Code), which should be implemented pretty framework-free, simply > because a good WebDAV implementation works at a low level. But > obviously you want that to work with the same authentication as other > parts of the system. Yes. In particular, if you knew you were working with an application that could resolve a path in terms of containers and contained pieces of content (just like a filesystem does), it would be pretty easy to code up a DAV "action middleware" component that rendered containerish things as DAV "collections" and contentish things as DAV "resources", and which could handle DAV locking and property rendering and so forth. This kind of middleware might be tough, though, because it probably requires explicit cooperation from the end-point application (it expects to be talking to an actual filesystem, but that won't always be the case at least without some sort of adaptation). But in any case, it's a good example of how we could prevent people from needing to reinvent the wheel... this guy appears to be coming up with his own identification, authentication, authorization, and challenge libraries entirely http://cwho.blogspot.com/ which just feels very wasteful. > I guess this is how I come back to lazily introducing middleware. For > instance, some "application" (which might be a fairly small bit of > functionality) might require a session. If there's no session > available, then it can probably make a reasonable session itself. But > it shouldn't shadow any session available to it, if that's already > available. This is doubly true for something more authoritative like > authentication. I'm not sure I know enough to be able to agree or disagree. But this seems definitely more in the realm of "late binding", which I'm a little concerned about from a config perspective. > > Sure. OTOH, Zope 2 has proven that inheritance makes for a pretty awful > > general reuse pattern when things become sufficiently complicated. > > True. But part of that is having a clear internal and external > interface. The external interface -- which you can implement without > using the abstract (convenience) superclass -- should be small and > explicit. I've found interfaces a useful way of adding discipline in > this way, even though I've never really used them at runtime. > > But I think it's reasonable to use inheritance for convenience sake, so > long as you don't implement more than one thing in a class. I agree completely. > > Yes. I think it goes further than that. For example, I'd like to have > > be able to swap out implementations of the following kinds of components > > at a level somewhere above my application: > > > > Sessioning > > Yes; we need a standard interface for sessions, but that's pretty > straight-forward. There's other levels where a useful standard can be > implemented as well; for instance, flup.middleware.session has > SessionStore, which is where most of the parts of the session that you'd > want to reimplement are implemented. Yes. Furthermore, if sessioning is a middleware component, anything can be a middleware component as far as I can tell. ;-) > > Authentication/identification > > This seems very doable right now, just by using SCRIPT_NAME. This leads > to rather dumb users -- just a string -- but it's a good > lowest-common-denominator starting point. 
More interesting interfaces > -- like lists of roles/groups, or user objects -- can be added on > incrementally. Sure. > > Authorization (via something like declarative security based on a path) > > Sure; I can imagine a whole slew of ways to do authorization. An > application can do it simply by returning 403 Forbidden. > A front-end > middleware could do it with simple pattern matching on the URL. A URL > parser (aka traversal) can look for security annotations. Yes. In the simplest case, security annotations for resources could be kept statically in a Python module. In more complicated cases, the application itself would need to collaborate with "upstream" middleware to do authorization. > > Virtual hosting awareness > > I've never had a problem with this, except in Zope... > > Anyway, to me this feels like a kind of URL parsing. One of the > mini-proposals I made before involved a way of URL parsers to add URL > variables to the system (basically a standard WSGI key to put URL > variables as a dictionary). So a pattern like: > > (?.*)\.myblogspace.com/(?\d\d\d\d)/(?\d\d)/ > > Would add username, year, and month variables to the system. But regex > matching is just one way; the *result* of parsing is usually either in > the object (e.g., you use domains to get entirely different sites), or > in terms of these variables. Yes, this seems to be more of a problem for Zope because it's a) a long-running app with its own webserver b) has convenience functions for generating URLs based on its internal containment graph and c) doesn't deal well with relative URLs. So if you want an application that lives in a "subfolder" of your Zope object graph to behave as if it lives at "http://example.com" instead of "http://example.com/subfolder", you need to give it clues. > > View lookup > > View invocation > > This I imagine happening either below WSGI entirely, or as part of a URL > parser. There's certainly a place for adaptation at different stages. > For instance, paste.urlparser.URLParser.get_application() clearly is > ripe for adaptation. I imagine this wrapping the "resource" with > something that renders it using a view. If you make resources and views > -- lots of (most?) frameworks use controllers and views, and view lookup > tends to be controller driven. So it feels very framework-specific to me. Yep, I suspect the same. I think these things will end up in the end-point application but it's kinda fun to try to think about abstracting them. > > Transformation during rendering > > If you mean what I think -- e.g., rendering XSL -- I think WSGI is ripe > for this sort of thing. Yes, that's what I meant. > So far I've just done small things, like HTML > checking, debugging log messages, etc. But other things are very possible. > > > Caching > > Again, I think this is a very natural fit. Well, at least for > whole-page caching. Partial page caching doesn't really fit well at > all, I'm afraid, though both systems could use the same caching backend. > > > Essentially, as Phillip divined, to do so, I've been trying to construct > > a framework-neutral component system out of middleware pieces to do so, > > but maybe I need to step back from that a bit. It sure is tempting, > > though. ;-) > > I've found it satisfyingly easy. Maybe there's a "better" way... but > "better" without "easier" doesn't excite me at all. And we learn best > by doing... which is my way of saying you should try it with code right > now ;) Yes, I should stop blathering and get to work. 
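As an aside, the URL-variables idea a few paragraphs up is easy to sketch as middleware. The class below and the environ key name are hypothetical (the key just follows the 'webapp01.*' naming used elsewhere in this thread), and the pattern uses ordinary named regex groups:

    import re

    class URLVarsMiddleware:
        """Match host + path against a regex and expose named groups to the app."""

        def __init__(self, app, pattern, key='webapp01.url_vars'):
            self.app = app
            self.regex = re.compile(pattern)
            self.key = key

        def __call__(self, environ, start_response):
            target = environ.get('HTTP_HOST', '') + environ.get('PATH_INFO', '')
            match = self.regex.match(target)
            if match:
                environ[self.key] = match.groupdict()
            else:
                environ[self.key] = {}
            return self.app(environ, start_response)

    # e.g., a hypothetical blog-hosting pattern:
    # pattern = r'(?P<username>[^./]+)\.myblogspace\.com/(?P<year>\d{4})/(?P<month>\d{2})/'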
I gotta admit that I'm pretty excited about the possibilities. It's just reassuring to know that I'm not entirely insane, or at least that other people are just as insane as I am. ;-) - C From pje at telecommunity.com Sun Jul 17 19:56:35 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 17 Jul 2005 13:56:35 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1121599799.24386.347.camel@plope.dyndns.org> References: <42DA13CE.2080208@colorstudy.com> <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> Message-ID: <5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com> At 07:29 AM 7/17/2005 -0400, Chris McDonough wrote: >I'm a bit confused because one of the canonical examples of >how WSGI middleware is useful seems to be the example of implementing a >framework-agnostic sessioning service. And for that sessioning service >to be useful, your application has to be able to depend on its >availability so it can't be "oblivious". Exactly. As soon as you start trying to have configured services, you are creating Yet Another Framework. Which isn't a bad thing per se, except that it falls outside the scope of PEP 333. It deserves a separate PEP, I think, and a separate implementation mechanism than being crammed into the request environment. These things should be allowed to be static, so that an application can do some reasonable setup, and so that you don't have per-request overhead to shove ninety services into the environment. Also, because we are dealing not with basic plumbing but with making a nice kitchen, it seems to me we can afford to make the fixtures nice. That is, for an add-on specification to WSGI we don't need to adhere to the "let it be ugly for apps if it makes the server easier" principle that guided PEP 333. The assumption there was that people would mostly port existing wrappers over HTTP/CGI to be wrappers over WSGI. But for services, we are talking about an actual framework to be used by application developers directly, so more user-friendliness is definitely in order. For WSGI itself, the server-side implementation has to be very server specific. But the bulk of a service stack could be implemented once (e.g. as part of wsgiref), and then just used by servers. So, we don't have to worry as much about making it easy for server people to implement, except for any server-specific choices about how configuration might be stacked. (For example, in a filesystem-oriented server like Apache, you might want subdirectories to inherit services defined in parent directories.) >OTOH, the primary benefit -- to me, at least -- of modeling services as >WSGI middleware is the fact that someone else might be able to use my >service outside the scope of my projects (and thus help maintain it and >find bugs, etc). So if I've got the wrong concept of what kinds of >middleware that I can expect "normal" people to use, I don't want to go >very far down that road without listening carefully to Phillip. Perhaps >I'll have a shot at influencing the direction of WSGI to make it more >appropriate for this sort of thing or maybe we'll come up with a better >way of doing it. > >Zope 3 is a component system much like what I'm after, and I may just >end up using it wholesale. 
But my immediate problem with Zope 3 is that >like Zope 2, it's a collection of libraries that have dependencies on >other libraries that are only included within its own checkout and don't >yet have much of a life of their own. It's not really a technical >problem, it's a social one... I'd rather have a somewhat messy framework >with a lot of diversity composed of wildly differing component >implementations that have a life of their own than to be be trapped in a >clean, pure world where all the components are used only within that >world. > >I suspect there's a middle ground here somewhere. Right; I'm suggesting that we grow a "WSGI Deployment" or "WSGI Stack" specification that includes a simple way to obtain services (using the Zope 3 definition of "service" as simply a named component). This would form the basis for various "WSGI Service" specifications. And, for existing frameworks there's at least some potential possibility of integrating with this stack, since PEAK and Zope 3 both already have ways to define and acquire named services, so it might be possible to define the spec in such a way that their implementations could be reused by wrapping them in a thin "WSGI Stack" adapter. Similarly, if there are any other frameworks out there that offer similar functionality, then they ought to be able to play too, at least in principle. From pje at telecommunity.com Sun Jul 17 20:23:46 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 17 Jul 2005 14:23:46 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <42DA1695.7020304@colorstudy.com> References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com> At 03:28 AM 7/17/2005 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>What I think you actually need is a way to create WSGI application >>objects with a "context" object. The "context" object would have a >>method like "get_service(name)", and if it didn't find the service, it >>would ask its parent context, and so on, until there's no parent context >>to get it from. The web server would provide a way to configure a root >>or default context. > >I guess I'm treating the request environment as that context. I don't >really see the problem with that...? It puts a layer in the request call stack for each service you want to offer, versus *no* layers for an arbitrary number of services. It adds work to every request to put stuff into the environment, then take it out again, versus just getting what you want in the first place. >In many cases, the middleware is modifying or watching the application's >output. For instance, catching a 401 and turning that into the >appropriate login -- which might mean producing a 401, a redirect, a login >page via internal redirect, or whatever. And that would be legitimate middleware, except I don't think that's what you really want for that use case. What you want is an "authentication service" that you just call to say, "I need a login" and get the login information from, and return its return value so that it does start_response for you and sends the right output. The difference is obliviousness; if you want to *wrap* an application not written to use WSGI services, then it makes sense to make it middleware. If you're writing a new application, just have it use components instead of mocking up a 401 just so you can use the existing middleware. 
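For what it's worth, a bare-bones sketch of the "context" object described above might look like the following; the class and method names are made up here and are not part of any existing spec, but it shows how lookup can fall back to a parent context rather than going through the request environment:

    class ServiceContext:
        """Named-service lookup that defers to a parent context on a miss."""

        def __init__(self, parent=None):
            self.parent = parent
            self._services = {}

        def register_service(self, name, service):
            self._services[name] = service

        def get_service(self, name, default=None):
            if name in self._services:
                return self._services[name]
            if self.parent is not None:
                return self.parent.get_service(name, default)
            return default

A server could hand an application a root (or per-directory) context at setup time; the application would then call something like context.get_service('session') directly, with no per-request traffic through environ.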
Notice, by the way, that it's trivial to create middleware that detects the 401 and then *invokes the service*. So, it's more reusable to make services be services, and middleware be wrappers to apply services to oblivious applications. >I guess you could make one Uber Middleware that could handle the services' >needs to rewrite output, watch for errors and finalize resources, etc. Um, it's called a library of functions. :) WSGI was designed to make it easy to use library calls to do stuff. If you don't need the obliviousness, then library calls (or service calls) are the Obvious Way To Do It. > This isn't unreasonable, and I've kind of expected one to evolve at > some point. But you'll have to say more to get me to see how "services" > is a better way to manage this. I'm saying that middleware can use services, and applications can use services. Making applications *have to* use middleware in order to use the services is wasteful of both computer time and developer brainpower. Just let them use services directly when the situation calls for it, and you can always write middleware to use the services when you encounter the occasional (and ever-rarer with time) oblivious application. >>Really, the only stuff that actually needs to be middleware, is stuff >>that wraps an *oblivious* application; i.e., the application doesn't know >>it's there. If it's a service the application uses, then it makes more >>sense to create a service management mechanism for configuration and >>deployment of WSGI applications. > >Applications always care about the things around them, so any convention >that middleware and applications be unaware of each other would rule out >most middleware. Yes, exactly! Now you understand me. :) If the application is what wants the service, let it just call the service. Middleware is *overhead* in that case. >>I hope this isn't too vague; I've been wanting to say something about >>this since I saw your blog post about doing transaction services in WSGI, >>as that was when I first understood why you were making everything into >>middleware. (i.e., to create a poor man's substitute for "placeful" >>services and utilities as found in PEAK and Zope 3.) > >What do they provide that middleware does not? Well, some services may be things the application needs only when it's being initially configured. Or maybe the service is something like a scheduler that gives timed callbacks. There are lots of non-per-request services that make sense, so forcing service access to be only through the environment makes for cruftier code, since you now have to keep track of whether you've been called before, and then do any setup during your first web hit. For that matter, some service configuration might need to be dynamically determined, based on the application object requesting it. But the main thing they provide that middleware does not is simplicity and ease of use. I understand your desire to preserve the appearance of neutrality, but you are creating new web frameworks here, and making them ugly doesn't make them any less of a framework. 
:) What's worse is that by tying the service access mechanism to the request environment, you're effectively locking out frameworks like PEAK and Zope 3 from being able to play, and that goes against (IMO) the goals of WSGI, which is to get more and more frameworks to be able to play, and give them *incentive* to merge and dissolve and be assimilated into the primordial soup of WSGI-based integration, or at least to be competitors for various implementation/use case niches in the WSGI ecosystem. See also my message to Chris just now about why a WSGI service spec can and should follow different rules of engagement than the WSGI spec did; it really isn't necessary to make services ugly for applications in order to make it easy for server implementors, as it was for the WSGI core spec. In fact, the opposite condition applies: the service stack should make it easy and clean for applications to use WSGI services, because they're the things that will let them hide WSGI implementation details in the absence of an existing web framework. From chrism at plope.com Mon Jul 18 06:57:26 2005 From: chrism at plope.com (Chris McDonough) Date: Mon, 18 Jul 2005 00:57:26 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com> References: <42DA13CE.2080208@colorstudy.com> <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> <5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com> Message-ID: <1121662646.24386.462.camel@plope.dyndns.org> I tried to think of this today in terms of creating a "deployment spec" but boy, it gets complicated if you want a lot of useful features out of it. I have about four or five pages of a straw man "deployment configuration" proposal, but it makes way too many assumptions. So I tried to boil the problem down into its parts. There seem to be three distinct categories of configuration: - Server/gateway/application instance configuration. This is the kind of configuration that may be exposed to deployers by application authors. Creating an instance configuration results in an instance of an application or gateway or maybe even a server. - "Wiring" configuration which allows you to string together a "stack" out of instances. I like calling it a "pipeline" better, but when in Rome... This is the kind of configuration that would be useful if you already have a bunch of instance configurations from the step above laying around and you want to create a stack out of them for deployment purposes. - "Service" configuration which allows you create bits of context that can be used by applications in the stack, but which aren't inserted into the stack itself. I suspect we should stick to the first category of configuration first, but I'll note that the desire for the other two categories might impose some design constraints on the first. The last kind of configuration definitely ventures far out into framework land and though it'd be terribly useful and seems to be where a lot of people think the value of WSGI is, it might be something other than WSGI entirely. So, anyway, towards the first category, I'll throw something out to the wolves. Note that below when I say "component" I mean a WSGI server, gateway, or application: Each Python package which includes one or more WSGI components may optionally include descriptions of these components' "meta-configuration". 
This meta-configuration would take the form of one or more "schemas". Each schema would enumerate the configurable elements of a single WSGI component implementation. A schema for a component defines *the minimal number* of typed, component-specific keys and values that may be used to create instances of this component.

    >>> # load the schemas
    >>> server_schema = loadSchema('components/server/server.schema')
    >>> gateway_schema = loadSchema('components/gateway/gateway.schema')
    >>> app_schema = loadSchema('components/app/app.schema')

    >>> # create the factories; any one of these steps would fail
    >>> # if the config file violated its schema.
    >>> server_factory = loadConfig('instances/server/server.conf', schema=server_schema)
    >>> gateway_factory = loadConfig('instances/gateway/gateway.conf', schema=gateway_schema)
    >>> app_factory = loadConfig('instances/app/app.conf', schema=app_schema)

    >>> # create instances from the factories
    >>> server = server_factory.create()
    >>> gateway = gateway_factory.create()
    >>> app = app_factory.create()

    >>> # configure the instances into a pipeline
    >>> pipeline = server(gateway(app))

    >>> # serve up the pipeline (notionally)
    >>> server.serve()

Of course this is just a more declarative way to do what is already possible in code, except for the schema-checking part, which presumably would supply the deployer with clues if he had screwed up a config file. I purposely didn't attempt to describe the syntax of the configuration or schema files, but I suspect it would be best to make them both ConfigParser files. FWIW, ZConfig already does this exact thing, and it's already written, but introducing dependencies on non-stdlib things seems problematic.

Is this more or less what people have in mind for deployment configuration, or am I out in left field?

On Sun, 2005-07-17 at 13:56 -0400, Phillip J. Eby wrote:
> At 07:29 AM 7/17/2005 -0400, Chris McDonough wrote:
> >I'm a bit confused because one of the canonical examples of
> >how WSGI middleware is useful seems to be the example of implementing a
> >framework-agnostic sessioning service. And for that sessioning service
> >to be useful, your application has to be able to depend on its
> >availability so it can't be "oblivious".
>
> Exactly. As soon as you start trying to have configured services, you are
> creating Yet Another Framework. Which isn't a bad thing per se, except
> that it falls outside the scope of PEP 333. It deserves a separate PEP, I
> think, and a separate implementation mechanism than being crammed into the
> request environment. These things should be allowed to be static, so that
> an application can do some reasonable setup, and so that you don't have
> per-request overhead to shove ninety services into the environment.
>
> Also, because we are dealing not with basic plumbing but with making a nice
> kitchen, it seems to me we can afford to make the fixtures nice. That is,
> for an add-on specification to WSGI we don't need to adhere to the "let it
> be ugly for apps if it makes the server easier" principle that guided PEP
> 333. The assumption there was that people would mostly port existing
> wrappers over HTTP/CGI to be wrappers over WSGI. But for services, we are
> talking about an actual framework to be used by application developers
> directly, so more user-friendliness is definitely in order.
>
> For WSGI itself, the server-side implementation has to be very server
> specific. But the bulk of a service stack could be implemented once (e.g.
> as part of wsgiref), and then just used by servers.
So, we don't have to > worry as much about making it easy for server people to implement, except > for any server-specific choices about how configuration might be > stacked. (For example, in a filesystem-oriented server like Apache, you > might want subdirectories to inherit services defined in parent directories.) > > > >OTOH, the primary benefit -- to me, at least -- of modeling services as > >WSGI middleware is the fact that someone else might be able to use my > >service outside the scope of my projects (and thus help maintain it and > >find bugs, etc). So if I've got the wrong concept of what kinds of > >middleware that I can expect "normal" people to use, I don't want to go > >very far down that road without listening carefully to Phillip. Perhaps > >I'll have a shot at influencing the direction of WSGI to make it more > >appropriate for this sort of thing or maybe we'll come up with a better > >way of doing it. > > > >Zope 3 is a component system much like what I'm after, and I may just > >end up using it wholesale. But my immediate problem with Zope 3 is that > >like Zope 2, it's a collection of libraries that have dependencies on > >other libraries that are only included within its own checkout and don't > >yet have much of a life of their own. It's not really a technical > >problem, it's a social one... I'd rather have a somewhat messy framework > >with a lot of diversity composed of wildly differing component > >implementations that have a life of their own than to be be trapped in a > >clean, pure world where all the components are used only within that > >world. > > > >I suspect there's a middle ground here somewhere. > > Right; I'm suggesting that we grow a "WSGI Deployment" or "WSGI Stack" > specification that includes a simple way to obtain services (using the Zope > 3 definition of "service" as simply a named component). This would form > the basis for various "WSGI Service" specifications. And, for existing > frameworks there's at least some potential possibility of integrating with > this stack, since PEAK and Zope 3 both already have ways to define and > acquire named services, so it might be possible to define the spec in such > a way that their implementations could be reused by wrapping them in a thin > "WSGI Stack" adapter. Similarly, if there are any other frameworks out > there that offer similar functionality, then they ought to be able to play > too, at least in principle. > From mso at oz.net Mon Jul 18 23:11:51 2005 From: mso at oz.net (mso@oz.net) Date: Mon, 18 Jul 2005 14:11:51 -0700 (PDT) Subject: [Web-SIG] Standardized configuration In-Reply-To: <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com> References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com> Message-ID: <32994.161.55.66.121.1121721111.squirrel@www.oz.net> A couple things I don't understand in this discussion. Phillip J. Eby said: > At 03:28 AM 7/17/2005 -0500, Ian Bicking wrote: >>I guess I'm treating the request environment as that context. I don't >>really see the problem with that...? > > It puts a layer in the request call stack for each service you want to > offer, versus *no* layers for an arbitrary number of services. It adds > work to every request to put stuff into the environment, then take it out > again, versus just getting what you want in the first place. But the "overhead" is adding one dictionary item and reading it again. 
The most insignificant thing imaginable. More important is the ugliness of accessing an arbitrarily-named key in the application, but even that is minor. > The difference is obliviousness; if you want to *wrap* an application not > written to use WSGI services, then it makes sense to make it > middleware. If you're writing a new application, just have it use > components instead of mocking up a 401 just so you can use the existing > middleware. That seems to suggest the whole PEP 333 excersise was a waste of time. (I'm not saying it is, just that it seems to be the logical conclusion of your statement.) WSGI is just "backward compatibility" for existing applications? Practically all the interesting middleware falls into this "component" category. I'm having a hard time seeing what middleware a naive CGI/legacy application would benefit from, besides access to alternative webservers. (But at this point, none of these are "better" than the frameworks' native servers.) Especially since legacy apps access their services in a framework-specific way and would need specific middleware or patching. If a new API is in order, it seems high priority to get a PEP out soon, or at least some reference implementations. Otherwise the middleware way will become a de facto standard. -- -- Mike Orr From ianb at colorstudy.com Tue Jul 19 04:57:40 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 18 Jul 2005 21:57:40 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <669891ba7d6b3bb20d95f44ff112074a@dscpl.com.au> References: <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> <669891ba7d6b3bb20d95f44ff112074a@dscpl.com.au> Message-ID: <42DC6C24.3080905@colorstudy.com> Graham Dumpleton wrote: > My understanding from reading the WSGI PEP and examples like that above is > that the WSGI middleware stack concept is very much tree like, but where at > any specific node within the tree, one can only traverse into one child. > Ie., > a parent middleware component could make a decision to defer to one > child or > another, but there is no means of really trying out multiple choices until > you find one that is prepared to handle the request. The only way around it > seems to be make the linear chain of nested applications longer and longer, > something which to me just doesn't sit right. In some respects the need for > the configuration scheme is in part to make that less unwieldy. It's not at all limited to this, but these are simply the ones that are easy to configure, and can be inserted into a stack without changing the stack very much. > What I am doing is making it acceptable for a handler to also return None. > If this were returned by the highest level handler, it would equate to > being > the same as DECLINED, but within the context of middleware components it > has a lightly relaxed meaning. Specifically, it indicates that that handler > isn't returning a response, but not that it is indicating that the request > as a whole is being DECLINED causing a return to Apache. Incidentally, I'd typically use an exception when the return value didn't include the semantics I wanted, but that might not be a problem here. > One last example, is what a session based login mechanism might look like > since this was one of the examples posed in the initial discussion. 
Here > you > might have a handler for a whole directory which contains: > > _userDatabase = _users.UserDatabase() > > handler = Handlers( > IfLocationMatches(r"\.bak(/.*)?$",NotFound()), > IfLocationMatches(r"\.tmpl(/.*)?$",NotFound()), > > IfLocationIsADirectory(ExternalRedirect('index.html')), > > # Create session and stick it in request object. > CreateUserSession(), > > # Login form shouldn't require user to be logged in to access it. > IfLocationMatches(r"^/login\.html(/.*)?$",CheetahModule()), > > # Serve requests against login/logout URLs and otherwise > # don't let request proceed if user not yet authenticated. > # Will redirect to login form if not authenticated. > FormAuthentication(_userDatabase,"login.html"), > > SetResponseHeader('Pragma','no-cache'), > SetResponseHeader('Cache-Control','no-cache'), > SetResponseHeader('Expires','-1'), > > IfLocationMatches(r"/.*\.html(/.*)?$",CheetahModule()), > ) > > Again, one has done away with the need for a configuration files as the > code > itself specifies what is required, along with the constraints as to what > order things should be done in. > > Another thing this example shows is that handlers when they return None due > to not returning an actual response, can still add to the response headers > in the way of special cookies as required by sessions, or headers > controlling > caching etc. This is not possible in WSGI middleware if handled in a chain-like fashion. Nested middleware can do this, of course. This kind of chaining would be necessary if "services" were used, as many services have to effect the response, and there's no WSGI-related spec about where or how they would do that. Though I haven't digested all the long emails lately... > In terms of late binding of which handler is executed, the "PythonModule" > handler is one example in that it selects which Python module to load only > when the request is being handled. Another example of late construction of > an instance of a handler in what I am doing, albeit the same type, is: > > class Handler: > > def __init__(self,req): > self.__req = req > > def __call__(self,name="value"): > self.__req.content_type = "text/html" > self.__req.send_http_header() > self.__req.write("") > self.__req.write("
<p>name=%r</p>
"%cgi.escape(name)) > self.__req.write("") > return apache.OK > > handler = IfExtensionEquals("html",HandlerInstance(Handler)) > > First off the "HandlerInstance" object is only triggered if the request > against this specific file based resource was by way of a ".html" > extension. When it is triggered, it is only at that point that an instance > of "Handler" is created, with the request object being supplied to the > constructor. Incidentally, I'm doing something a little like that with the filebrowser example in Paste: http://svn.pythonpaste.org/Paste/trunk/examples/filebrowser/web/__init__.py Looking at it now, it's not clear where that's happening, but (in application()) context.path(path) creates a WSGI application using a class based on the extension/expected mime type. So the dispatching is similar. > To round this off, the special "Handlers" handler only contains the > following > code. Pretty simple, but makes construction of the component hierarchy a > bit > easier in my mind when multiple things need to be done in turn where > nesting > isn't strictly required. > > class Handlers: > > def __init__(self,*handlers): > self.__handlers = handlers > > def __call__(self,req): > if len(self.__handlers) != 0: > for handler in self.__handlers: > result = _execute(req,handler,lazy=True) > if result is not None: > return result > > Would be very interested to see how people see this relating to what is > possible > with WSGI. Could one instigate a similar sort of class to "Handlers" in > WSGI > to sequence through WSGI applications until one generates a complete > response? > > The areas that have me thinking the answer is "no" is that I recollect > the PEP > saying that the "start_response" object can only be called once, which > precludes > applications in a list adding to the response headers without returning > a valid > status. Secondly, if "start_response" object hasn't been called when the > parent > starts to try and construct the response content from the result of > calling the > application, it raises an error. But then, I have a distinct lack of proper > knowledge on WSGI so could be wrong. When you just want to add headers (like with a session) you can use wrapping middleware, which appends to its application's response headers, but doesn't create a full response on its own. As for the order, when there's an issue you can cache the call. For instance, if I want to look at what gets passed to start_response before passing it up to the server, I create a fake start_response that just saves the values. Or sometimes a start_response that merely watches the values, like when I want to check the content-type to see if I can insert information into the page (since you can't append text to an image, for instance). > If my thinking is correct, it could only be done by changing the WSGI > specification > to support the concept of trying applications in sequence, by way of > allowing None > as the status when "start_response" is called to indicate the same as > when I return > None from a handler. Ie., the application may have set headers, but > otherwise the > parent should where possible move to a subsequence application and try > it etc. There's several conventions that could be used for trying applications in-sequence. 
For instance, you could do something like this (untested) for delegating to different apps until one of them doesn't respond with a 404: class FirstFound(object): """Try apps in sequence until one doesn't return 404""" def __init__(self, apps): self.apps = apps def __call__(self, environ, start_response): def replacement_start_response(status, headers): if int(status.split()[0]) == 404: raise HTTPNotFound return start_response(status, headers) for app in self.apps[:-1]: try: return app(environ, replacement_start_response) except HTTPNotFound: pass # If the last one responds with 404, so be it return self.apps[-1](environ, start_response) > Anyway, people may feel that this is totally contrary to what WSGI is > all about and > not relevant and that is fine, I am at least finding it an interesting > idea to > play with in respect of mod_python at least. It's very relevent, at least in my opinion. This is exactly the sort of architecture I've been attracted to, and the kind of middleware I've been adding to Paste. The biggest difference is that mod_python uses an actual list and return values, where WSGI uses nested function calls. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Tue Jul 19 05:49:44 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 18 Jul 2005 22:49:44 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1121599799.24386.347.camel@plope.dyndns.org> References: <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> <1121599799.24386.347.camel@plope.dyndns.org> Message-ID: <42DC7858.80007@colorstudy.com> Chris McDonough wrote: > On Sun, 2005-07-17 at 03:16 -0500, Ian Bicking wrote: > >>This is what Paste does in configuration, like: >> >>middleware.extend([ >> SessionMiddleware, IdentificationMiddleware, >> AuthenticationMiddleware, ChallengeMiddleware]) >> >>This kind of middleware takes a single argument, which is the >>application it will wrap. In practice, this means all the other >>parameters go into lazily-read configuration. > > > I'm finding it hard to imagine a reason to have another kind of > middleware. > > Well, actually that's not true. In noodling about this, I did think it > would be kind of neat in a twisted way to have "decision middleware" > like: In addition to the examples I gave in response to Graham, I wrote a document on this a while ago: http://pythonpaste.org/docs/url-parsing-with-wsgi.html The hard part about this is configuration; it's easy to configure a non-branching chain of middleware. Once it branches the configuration becomes hard (like programming-hard; which isn't *hard*, but it quickly stops feeling like configuration). >>You can also define a "framework" (a plugin to Paste), which in addition >>to finding an "app" can also add middleware; basically embodying all the >>middleware that is typical for a framework. > > > This appears to be what I'm trying to do too, which is why I'm intrigued > by Paste. > > OTOH, I'm not sure that I want my framework to "find" an app for me. > I'd like to be able to define pipelines that include my app, but I'd > typically just want to statically declare it as the end point of a > pipeline composed of service middleware. I should look at Paste a > little more to see if it has the same philosophy or if I'm > misunderstanding you. Mostly I wanted to avoid lots of magical incantations for the simple case. 
If you are used to Webware, well it has a very straight-forward way of finding your application -- you give it a directory name. If Quixote or CherryPy, you give it a root object. Maybe Zope would take a ZEO connection string, and so on. >>Paste is really a deployment configuration. Well, that as well as stuff >>to deploy. And two frameworks. And whatever else I feel a need or >>desire to throw in there. > > > Yeah. FWIW, as someone who has recently taken a brief look at Paste, I > think it would be helpful (at least for newbies) to partition out the > bits of Paste which are meant to be deployment configuration from the > bits that are meant to be deployed. Zope 2 fell into the same trap > early on, and never recovered. For example, ZPublisher (nee Bobo) was > always meant to be able to be useful outside of Zope, but in practice it > never happened because nobody could figure out how to disentangle it > from its ever-increasing dependencies on other software only found in a > Zope checkout. In the end, nobody even remembered what its dependencies > were *supposed* to be. If you ask ten people, you'd get ten different > answers. Maybe with setuptools' namespace packages I can try this sometime. It's not a high priority, though if splitting pieces out would make them more appealing then I could do that. Deployment doesn't actually interest me, it's just a pain in the ass and I wanted to give it a go. There's no real competition that I know of, because it's a boring and annoying problem ;) So if I split it off, it might become accidentally orphaned... > I also think that the rigor of separating out different components helps > to make the software stronger and more easily understood in bite-sized > pieces. Unfortunately, separating them makes configuration tough, but I > think that's what we're trying to find an answer about how to do "the > right way" here. Yes, you've reminded me why I brought this up, for that exact reason, though we've digressed a great deal. Lots of pieces of Paste have zero (or close to it) dependencies, except for configuration. That's what distinguishes a Paste component from a generic WSGI component, and I'm just as happy if there is no distinction. >>Note also that parts of the pipeline are very much late bound. For >>instance, the way I implemented Webware (and Wareweb) each servlet is a >>WSGI application. So while there's one URLParser application, the >>application that actually handles the request differs per request. If >>you start hanging more complete applications (that might have their own >>middleware) at different URLs, then this happens more generally. > > > Well, if you put the "decider" in middleware itself, all of the > middleware components in each pipeline could still be at least > constructed early. I'm pretty sure this doesn't really strictly qualify > as "early binding" but it's not terribly dynamic either. It also makes > configuration pretty straightforward. At least I can imagine a > declarative syntax for configuring pipelines this way. This is close to how Paste works now. The typical middleware stack does everything but find the terminal application object, though with hooks if you are inclined to add yet more middleware (like the Paste examples.filebrowser.web.__init__.application() object I mentioned before). 
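Whatever the configuration syntax ends up being, the mechanics of a framework-supplied, non-branching stack reduce to repeated wrapping. A rough sketch (the helper and the component names are illustrative, not Paste's actual API):

    def build_stack(app, middleware_factories):
        """Wrap app in each factory; the last factory in the list ends up innermost."""
        for factory in reversed(middleware_factories):
            app = factory(app)
        return app

    # hypothetical usage, mirroring the list quoted earlier:
    # app = build_stack(my_app, [SessionMiddleware, IdentificationMiddleware,
    #                            AuthenticationMiddleware, ChallengeMiddleware])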
> I'm pretty sure you're not advocating it, but in case you are, I'm not > sure it adds as much value as it removes to be able to have a "dynamic" > middleware chain whereby new middleware elements can be added "on the > fly" to a pipeline after a request has begun. That is *very* "late > binding" to me and it's impossible to configure declaratively. I'm comfortable with a little of both. I don't even know *how* I'd stop dynamic middleware. For instance, one of the methods I added to Wareweb recently allows any servlet to forward to any WSGI application; but from the outside the servlet looks like a normal WSGI application just like before. I guess this is part of the advantage (and disadvantage) of completely opaque applications; you don't and can't know what they do. >>>I've just seen Phillip's post where he implies that this kind of >>>fine-grained component factoring wasn't really the initial purpose of >>>WSGI middleware. That's kind of a bummer. ;-) >> >>Well, I don't understand the services he's proposing yet. I'm quite >>happy with using middleware the way I have been, so I'm not seeing a >>problem with it, and there's lots of benefits. > > > I agree! I'm a bit confused because one of the canonical examples of > how WSGI middleware is useful seems to be the example of implementing a > framework-agnostic sessioning service. And for that sessioning service > to be useful, your application has to be able to depend on its > availability so it can't be "oblivious". This is where I'd like additional (incrementally agreed upon) standards. For instance, a standard for the interface of 'webapp01.session'. It's a requirement, certainly, but the requirement is merely "there must be a webapp01-compliant session installed". > OTOH, the primary benefit -- to me, at least -- of modeling services as > WSGI middleware is the fact that someone else might be able to use my > service outside the scope of my projects (and thus help maintain it and > find bugs, etc). So if I've got the wrong concept of what kinds of > middleware that I can expect "normal" people to use, I don't want to go > very far down that road without listening carefully to Phillip. Perhaps > I'll have a shot at influencing the direction of WSGI to make it more > appropriate for this sort of thing or maybe we'll come up with a better > way of doing it. Well, you can go some ways. If you are distributing an application -- which can be very fine-grained -- you can always resort to invoking middleware yourself. If you are distributing middleware or a library that depends on middleware, then dependencies are part of the deployment configuration. Which has always been the case. Also, a smart middleware can pretend to be many kinds of middleware, by putting objects with different (wrapper) interfaces in multiple keys. So if we have an explosion of incompatible session middlewares, for instance, we can ultimately create an ubersession that maintains backward compatibility and provides a forward-compatible interface. > Zope 3 is a component system much like what I'm after, and I may just > end up using it wholesale. But my immediate problem with Zope 3 is that > like Zope 2, it's a collection of libraries that have dependencies on > other libraries that are only included within its own checkout and don't > yet have much of a life of their own. It's not really a technical > problem, it's a social one... 
I'd rather have a somewhat messy framework > with a lot of diversity composed of wildly differing component > implementations that have a life of their own than to be be trapped in a > clean, pure world where all the components are used only within that > world. My personal critique would be that Zope 3 adds novel concepts more than libraries, and they are better concepts than in Zope 2 (where "concept" was just whatever got thrown into the most base classes), but there's still a lot of concept there. Some of them deserve to become part of the wider Python knowledge base. I think some of them don't. But there's no survival of the fittest, since the concepts depend on each other. > I suspect there's a middle ground here somewhere. > > >>>Factoring middleware components in this way seems to provide clear >>>demarcation points for reuse and maintenance. For example, I imagined a >>>declarative security module that might be factored as a piece of >>>middleware here: http://www.plope.com/Members/chrism/decsec_proposal . >> >>Yes, I read that before; I haven't quite figured out how to digest it, >>though. This is probably in part because of the resource-based >>orientation of Zope, and WSGI is application-based, where applications >>are rather opaque and defined only in terms of function. > > > Yes, it is a bit Zopeish because it assumes content lives at a path. > This isn't always the case, I know, but it often is. Well, it's a bit > of a stretch, but an alternate decsec implementation might use a > "content identifier" to determine the protection of a resource instead > of a full path. > > For example, if you're implementing an application that is very simple > and takes one and only one URL, but calls it with a different query > string variable to display different pieces of content (e.g. > '/blog?entry_num=1234'), you might have one ACL as the "root" ACL but > optionally protect each piece of content with a separate ACL if one can > be found. Maybe the content-specific ACL would be 'entry_num=1234' > instead of a path. Zope really puts a lot of importance in paths; though I don't think typical Zope applications have any better URLs as a result. I don't know if that's something specific to Zope, or merely the inevitable result that when you make something Important you make it Hard and Fragile. I'd actually go for the latter, which is why I'd be very reluctant to make URL-based permissions anything more than one tool among many. Something like services seem more practical in this case, or perhaps an advisory object that gets placed in the request if we're seeing what we can do without services. The advisory object doesn't know what the entry_num=1234 object is, but the application can figure out how that object maps to what the advisory object knows about (e.g., owners and editors and whatnot). But oh! that's exactly what you describe below. With all these long emails I don't have the room in my brain to read ahead, because it all becomes a jumble of WSGIness. Which is good, just hard... 
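To sketch the middleware half of that arrangement (purely illustrative; the finder callable and the ACL object are assumptions, while the 'acl' and 'userid' environ keys are the ones the quoted function below already consumes):

    class ACLMiddleware:
        """Look up an ACL for the request and expose it as environ['acl']."""

        def __init__(self, app, acl_finder, root_acl):
            self.app = app
            self.acl_finder = acl_finder   # callable(environ) -> ACL or None
            self.root_acl = root_acl

        def __call__(self, environ, start_response):
            acl = self.acl_finder(environ)
            if acl is None:
                acl = self.root_acl        # fall back to the "root" ACL
            environ['acl'] = acl
            return self.app(environ, start_response)

The acl_finder could key off PATH_INFO, or off a content identifier pulled from the query string ('entry_num=1234'), which is the alternate decsec flavor described here; the application code stays the same either way.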
> A function that accepts a form post for displaying > or changing the blog entry for 1234 might look like this: > > def blog(environ, start_response): > acl = environ['acl'] # added by decsec middleware > userid = environ['userid'] # added by an authentication middleware > formvars = get_form_vars_from(environ) > if formvars['action'] == "view": > permission = 'view' > elif formvars['action'] == "change": > permission = 'edit' > content = get_blog_entry(environ) > # pulls out the entry for 1234 > if not acl.check(userid, permission): > start_response('401 Unauthorized', []) > return ['Unauthorized'] > [ ... further code to change or display the blog entry ... ] > > The ACL could be the "root" ACL (say, all users can view, members of the > group "manager" could change, everything else is denied). The "root" > ACL would be used if content did not have its own ACL. But associating > an ACL with a content identifier would allow the developer or site > manager to protect individual blog entries (e.g. 1234, 5678, etc) with > different ACLs. "Joe can view this one but he can't change it", "Jim > can view all of them and can change all of them", etc.. the sorts of > things useful for "staging" and workflow delegation without unduly > mucking up the actual application code. > > Decsec would also take into account the user's group memberships and so > forth during the "check" step, so you wouldn't have to write any of this > code either. The "blog" example is stupid, of course, the concept is > more useful for higher-security apps. > > Sorry, all of this is somewhat besides the point of this thread, but it > does provide an example of kind of functionality I'd like to be able to > put into middleware. > > >>>Of course, this sort of thing doesn't *need* to be middleware. But >>>making it middleware feels very right to me in terms of being able to >>>deglom nice features inspired by Zope and other frameworks into pieces >>>that are easy to recombine as necessary. Implementations as WSGI >>>middleware seems a nice way to move these kinds of features out of our >>>respective applications and into more application-agnostic pieces that >>>are very loosely coupled, but perhaps I'm taking it too far. >> >>Certainly these pieces of code can apply to multiple applications and >>disparate systems. The most obvious instance right now that I think of >>is a WSGI WebDAV server (and someone's working on that for Google Summer >>of Code), which should be implemented pretty framework-free, simply >>because a good WebDAV implementation works at a low level. But >>obviously you want that to work with the same authentication as other >>parts of the system. > > > Yes. In particular, if you knew you were working with an application > that could resolve a path in terms of containers and contained pieces of > content (just like a filesystem does), it would be pretty easy to code > up a DAV "action middleware" component that rendered containerish things > as DAV "collections" and contentish things as DAV "resources", and which > could handle DAV locking and property rendering and so forth. > > This kind of middleware might be tough, though, because it probably > requires explicit cooperation from the end-point application (it expects > to be talking to an actual filesystem, but that won't always be the case > at least without some sort of adaptation). I think WebDAV is very unripe for WSGI abstractions. 
And even if I remember the Zope WebDAV code I briefly looked at, it special cases all sorts of things (e.g., based on user agent) because there's so much more semantics than with a normal web page. It's the kind of place where introspection really would be helpful; though maybe the discipline of enforced decoupling would still help. > But in any case, it's a good example of how we could prevent people from > needing to reinvent the wheel... this guy appears to be coming up with > his own identification, authentication, authorization, and challenge > libraries entirely http://cwho.blogspot.com/ which just feels very > wasteful. Yes; I'm his advisor. I've encouraged him to look at reusing stuff, but I really have to give stronger direction. >>>Virtual hosting awareness >> >>I've never had a problem with this, except in Zope... >> >>Anyway, to me this feels like a kind of URL parsing. One of the >>mini-proposals I made before involved a way of URL parsers to add URL >>variables to the system (basically a standard WSGI key to put URL >>variables as a dictionary). So a pattern like: >> >> (?.*)\.myblogspace.com/(?\d\d\d\d)/(?\d\d)/ >> >>Would add username, year, and month variables to the system. But regex >>matching is just one way; the *result* of parsing is usually either in >>the object (e.g., you use domains to get entirely different sites), or >>in terms of these variables. > > > Yes, this seems to be more of a problem for Zope because it's a) a > long-running app with its own webserver b) has convenience functions for > generating URLs based on its internal containment graph and c) doesn't > deal well with relative URLs. So if you want an application that lives > in a "subfolder" of your Zope object graph to behave as if it lives at > "http://example.com" instead of "http://example.com/subfolder", you need > to give it clues. Incidentally, since this is frequently a problem, for my applications I've been using something bookmark-like; at some point in the request (often just before URLParser is invoked) I store the SCRIPT_NAME and give it some name (like 'app_name.base_url'). Then I can construct all my URLs relative to that. This still involves information I keep in my head (like how internal URLs are constructed), but at least it gets it right without hardcoding/configuring URLs, or being clever and getting it wrong. >>>Transformation during rendering >> >>If you mean what I think -- e.g., rendering XSL -- I think WSGI is ripe >>for this sort of thing. > > > Yes, that's what I meant. Incidentally someone just did an XSLT middleware today: http://www.decafbad.com/blog/2005/07/18/discovering_wsgi_and_xslt_as_middleware -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From grahamd at dscpl.com.au Tue Jul 19 06:02:10 2005 From: grahamd at dscpl.com.au (Graham Dumpleton) Date: Tue, 19 Jul 2005 00:02:10 -0400 Subject: [Web-SIG] Standardized configuration Message-ID: <1121745730.17225@dscpl.user.openhosting.com> Ian Bicking wrote .. > There's several conventions that could be used for trying applications > in-sequence. 
> For instance, you could do something like this (untested) for delegating to different apps until one of them doesn't respond with a 404:
>
>     class HTTPNotFound(Exception):
>         # internal signal: the wrapped app answered 404, try the next one
>         pass
>
>     class FirstFound(object):
>         """Try apps in sequence until one doesn't return 404"""
>         def __init__(self, apps):
>             self.apps = apps
>         def __call__(self, environ, start_response):
>             def replacement_start_response(status, headers):
>                 if int(status.split()[0]) == 404:
>                     raise HTTPNotFound
>                 return start_response(status, headers)
>             for app in self.apps[:-1]:
>                 try:
>                     return app(environ, replacement_start_response)
>                 except HTTPNotFound:
>                     pass
>             # If the last one responds with 404, so be it
>             return self.apps[-1](environ, start_response)

> > Anyway, people may feel that this is totally contrary to what WSGI is all about and not relevant and that is fine, I am at least finding it an interesting idea to play with in respect of mod_python at least.

As far as using 404 to indicate this, I had thought of that, but it then precludes one of those applications actually raising that as a real response. I often return NotFound as opposed to Forbidden when access is to files such as ".py" files. Returning Forbidden still gives a clue as to what implementation language is used, whereas returning Not Found doesn't. I do this, perhaps in a misguided way, because by not exposing how something is implemented, I feel it makes it just a bit harder for people to work out how to breach your security. :-)

If one were going to use a specific error code to indicate that the next application object should be tried, it might be more appropriate to use 303 (See Other) with no redirect URL specified. I.e., something that doesn't necessarily overlap with something that might be valid for an application object to do.

> It's very relevant, at least in my opinion. This is exactly the sort of architecture I've been attracted to, and the kind of middleware I've been adding to Paste. The biggest difference is that mod_python uses an actual list and return values, whereas WSGI uses nested function calls.

To say that mod_python uses an actual list is only really true at the level of Apache configuration, where one defines the PythonHandler directive and can specify multiple handlers to run in succession. Most people would only have the one.

At the level I am working, where I use "Handlers()", not a part of mod_python itself, I am using both sequences of handlers as well as recursive nesting. The "IfLocationMatches()" object in my examples was wrapping the "NotFound()" object, but it could equally have wrapped a "Handlers()" or another "If" object, which in turn wraps lower-level objects. Even the "PythonModule()" object wrapped objects indirectly; they just happen to be loaded at run time, much like the URLParser example for Paste. Thus I am using both lists and nested callable objects by way of wrappers.

WSGI seems to focus mainly on the latter (using only nested calls) in all the examples I have seen, although you do show above one possible way of doing a linear search for an application object.

Anyway, the point I was trying to make was that to me, the linear search through a list of handlers (or application objects) seems to be an easier way of dealing with things in some cases, and looks simpler in code than having a long nested chain of objects, yet WSGI doesn't seem to make any real use of that approach to composing middleware components. I'll leave it at that for the moment.
I guess I'll just have to show whether one way works better and is easier to understand than the other by way of example at some point. :-) Thanks for the response. Graham From chrism at plope.com Tue Jul 19 08:39:11 2005 From: chrism at plope.com (Chris McDonough) Date: Tue, 19 Jul 2005 02:39:11 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <42DC7858.80007@colorstudy.com> References: <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> <1121599799.24386.347.camel@plope.dyndns.org> <42DC7858.80007@colorstudy.com> Message-ID: <1121755151.13123.70.camel@plope.dyndns.org> On Mon, 2005-07-18 at 22:49 -0500, Ian Bicking wrote: > In addition to the examples I gave in response to Graham, I wrote a > document on this a while ago: > http://pythonpaste.org/docs/url-parsing-with-wsgi.html > > The hard part about this is configuration; it's easy to configure a > non-branching chain of middleware. Once it branches the configuration > becomes hard (like programming-hard; which isn't *hard*, but it quickly > stops feeling like configuration). Yep. I think I'm getting it. For example, I see that Paste's URLParser seems to *construct* applications if they don't already exist based on the URL. And I assume that these applications could themselves be middleware. I don't think that is configurable declaratively if you want to decide which app to use based on arbitrary request parameters. But if we already had the config for each app "instance" that URLParser wanted to consult laying around as files on disk, wouldn't it be just as easy to construct these app objects "eagerly" at startup time? Then you URLParser could choose an already-configured app based on some sort of configuration file in the URLParser component itself. The "apps" themselves may be pipelines, too, I realize that, but that is still configurable without coding. Maybe there'd be some concern about needing to stop the process in order to add new applications. That's a use case I hadn't really considered. I suspect this could be done with a signal handler, though, which could tell the URLParser to reload its config file instead of potentially locating a and creating a new application within every request. This would make URLParser a kind of "decision" middleware, but it would choose from a static set of existing applications (or pipelines) for the lifetime of the process as opposed to constructing them lazily. > > OTOH, I'm not sure that I want my framework to "find" an app for me. > > I'd like to be able to define pipelines that include my app, but I'd > > typically just want to statically declare it as the end point of a > > pipeline composed of service middleware. I should look at Paste a > > little more to see if it has the same philosophy or if I'm > > misunderstanding you. > > Mostly I wanted to avoid lots of magical incantations for the simple > case. If you are used to Webware, well it has a very straight-forward > way of finding your application -- you give it a directory name. If > Quixote or CherryPy, you give it a root object. Maybe Zope would take a > ZEO connection string, and so on. I think I understand now. 
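A minimal sketch of the "decision" middleware idea from a few paragraphs back -- a static map of already-constructed applications, reloaded on a signal rather than rebuilt per request -- could look like this (StaticDispatcher and load_app_map are invented names for illustration, this is not how Paste's URLParser works, and prefix handling is deliberately simplified):

    import signal

    class StaticDispatcher(object):
        """Choose among already-constructed WSGI apps by path prefix."""
        def __init__(self, load_app_map):
            # load_app_map() is assumed to read some config file and return a
            # dict such as {'/blog': blog_app, '/wiki': wiki_app}
            self.load_app_map = load_app_map
            self.app_map = load_app_map()
            # SIGHUP (Unix only) rebuilds the map without restarting the process
            signal.signal(signal.SIGHUP, self.reload)

        def reload(self, signum=None, frame=None):
            self.app_map = self.load_app_map()

        def __call__(self, environ, start_response):
            path = environ.get('PATH_INFO', '')
            for prefix, app in self.app_map.items():
                if path.startswith(prefix):
                    # a real dispatcher would also shift the prefix from
                    # PATH_INFO onto SCRIPT_NAME before delegating
                    return app(environ, start_response)
            start_response('404 Not Found', [('Content-type', 'text/plain')])
            return ['Not Found']

The point is that everything the dispatcher can route to already exists before the first request arrives; the per-request work is only a lookup.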
In general, I think I'd rather create "instance" locations of WSGI applications (which would essentially consist of a config file on disk plus any state info required by the app), configure and construct Python objects out of those instances eagerly at "startup time" and just choose between already-constructed apps if in "decision middleware" that has its own declarative configuration if decisions need to be made about which app to use. This is mostly because I want the configuration info to live within the application/middleware instance and have some other "starter" import those configurations from application/middleware instance locations on the filesystem. The "starter" would construct required instances as Python objects, and chain them together arbitrarily based on some other "pipeline configuration" file that lives with the "starter". The first part of that (construct required instances) is described in a post I made to this list yesterday. This is probably because I'd like there to be one well-understood way to declaratively configure pipelines as opposed to each piece of middleware potentially needing to manage app construction and having its own configuration to do so. I don't know if this is reasonable for simpler requirements. This is more of a "formal deployment spec" idea and of course is likely flawed in some subtle way I don't understand yet. > > I'm pretty sure you're not advocating it, but in case you are, I'm not > > sure it adds as much value as it removes to be able to have a "dynamic" > > middleware chain whereby new middleware elements can be added "on the > > fly" to a pipeline after a request has begun. That is *very* "late > > binding" to me and it's impossible to configure declaratively. > > I'm comfortable with a little of both. I don't even know *how* I'd stop > dynamic middleware. For instance, one of the methods I added to Wareweb > recently allows any servlet to forward to any WSGI application; but from > the outside the servlet looks like a normal WSGI application just like > before. It's obviously fine if applications themselves want to do this. I'm not sure that it would be possible to create a "deployment spec" that canonized *how* to do it because as you mentioned it's not really a configuration task, it's a programming task. > > I agree! I'm a bit confused because one of the canonical examples of > > how WSGI middleware is useful seems to be the example of implementing a > > framework-agnostic sessioning service. And for that sessioning service > > to be useful, your application has to be able to depend on its > > availability so it can't be "oblivious". > > This is where I'd like additional (incrementally agreed upon) standards. > For instance, a standard for the interface of 'webapp01.session'. > It's a requirement, certainly, but the requirement is merely "there must > be a webapp01-compliant session installed". Yes... I think the best way to describe this sort of thing is through interfaces (at least notional, documented ones, if not formal ones that can be introspected at runtime). But that will need to be fleshed out on a service-by-service basis, obviously. FWIW, I'm also finding myself agreeing with Phillip's idea of allowing applications to have a context object to which can help them find services, as opposed to implementing each service entirely as middleware. Instead of obtaining the sessioning service via "environ['webapp01.session']" in an application's __call__ , you might do "self.context.get_service('session')"... 
or maybe even "environ['services'].get_service('session')". The latter would be easier to add because we'd be using an existing PEP 333 protocol. We'd consume a single key within the environ namespace, but there would need to be no change to the WSGI spec. This would be pretty straightforward and a separate services framework could be implemented outside WSGI entirely perhaps taking some cues from PEAK and/or Zope 3 ( or even [gasp] *code!*, god knows this problem has already been solved many times over ;-) -- for implementing service registration and lookup. It could form the basis for a "WSGI services" spec without muddying the waters for PEP 333. That said, if you're not interested in that because you think implementing services as middleware is "good enough" and you'd rather not implement another framework, I'd totally understand that. At that point I probably wouldn't be interested either because you're the defacto champion of WSGI middleware as a lingua franca and the only reason to do any of this is for the sake of collaboration and code sharing. But I do think it would be cleaner. Anyway, lots of good ideas and tips in your further responses, thanks, but for the sake of brevity and keeping the thread somewhat on topic, I won't respond to them. - C From ianb at colorstudy.com Tue Jul 19 19:15:00 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 19 Jul 2005 12:15:00 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1121755151.13123.70.camel@plope.dyndns.org> References: <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> <1121599799.24386.347.camel@plope.dyndns.org> <42DC7858.80007@colorstudy.com> <1121755151.13123.70.camel@plope.dyndns.org> Message-ID: <42DD3514.5080100@colorstudy.com> Chris McDonough wrote: > On Mon, 2005-07-18 at 22:49 -0500, Ian Bicking wrote: > >>In addition to the examples I gave in response to Graham, I wrote a >>document on this a while ago: >>http://pythonpaste.org/docs/url-parsing-with-wsgi.html >> >>The hard part about this is configuration; it's easy to configure a >>non-branching chain of middleware. Once it branches the configuration >>becomes hard (like programming-hard; which isn't *hard*, but it quickly >>stops feeling like configuration). > > > Yep. I think I'm getting it. For example, I see that Paste's URLParser > seems to *construct* applications if they don't already exist based on > the URL. And I assume that these applications could themselves be > middleware. I don't think that is configurable declaratively if you > want to decide which app to use based on arbitrary request parameters. > > But if we already had the config for each app "instance" that URLParser > wanted to consult laying around as files on disk, wouldn't it be just as > easy to construct these app objects "eagerly" at startup time? Then you > URLParser could choose an already-configured app based on some sort of > configuration file in the URLParser component itself. The "apps" > themselves may be pipelines, too, I realize that, but that is still > configurable without coding. 
That's what paste.urlmap is for: http://svn.pythonpaste.org/Paste/trunk/paste/urlmap.py (I haven't actually tried using it much for practical things, so it's quite possible it has design mistakes in it) The idea being that you do: urlmap['/myapp'] = MyApp() But additionally (in PathProxyURLMap): urlmap['/myapp'] = 'myapp.conf' And it builds the application from the configuration file. > Maybe there'd be some concern about needing to stop the process in order > to add new applications. That's a use case I hadn't really considered. > I suspect this could be done with a signal handler, though, which could > tell the URLParser to reload its config file instead of potentially > locating a and creating a new application within every request. > > This would make URLParser a kind of "decision" middleware, but it would > choose from a static set of existing applications (or pipelines) for the > lifetime of the process as opposed to constructing them lazily. URLParser itself is just one parsing implementation, though maybe named too generically. I don't think that particular code needs to grow many more features, but there's also room for many other parsers. And it's also fairly easy to wrestle control from URLParser if that gets put in the stack (for instance, putting an application function in __init__.py will basically take over URL parsing for that directory). >>>OTOH, I'm not sure that I want my framework to "find" an app for me. >>>I'd like to be able to define pipelines that include my app, but I'd >>>typically just want to statically declare it as the end point of a >>>pipeline composed of service middleware. I should look at Paste a >>>little more to see if it has the same philosophy or if I'm >>>misunderstanding you. >> >>Mostly I wanted to avoid lots of magical incantations for the simple >>case. If you are used to Webware, well it has a very straight-forward >>way of finding your application -- you give it a directory name. If >>Quixote or CherryPy, you give it a root object. Maybe Zope would take a >>ZEO connection string, and so on. > > > I think I understand now. > > In general, I think I'd rather create "instance" locations of WSGI > applications (which would essentially consist of a config file on disk > plus any state info required by the app), configure and construct Python > objects out of those instances eagerly at "startup time" and just choose > between already-constructed apps if in "decision middleware" that has > its own declarative configuration if decisions need to be made about > which app to use. I think this is a laudible goal. Right now, when I'm deploying applications written for Paste, I am reluctant to intermingle them in the same process and configuration... but that's because Paste is young, not because that's a bad idea. But as a result I haven't tried it, and I only have a moderate concept of what it would mean in practice. A neat feature would be to configure fairly seemlessly across process boundaries. E.g., add a "fork=True" parameter to an application's configuration, and the server would fork a process (or delegate to an already forked worker process) for that application. That's the sort of thing that could move Python into PHP-style hosting situations. > This is mostly because I want the configuration info to live within the > application/middleware instance and have some other "starter" import > those configurations from application/middleware instance locations on > the filesystem. 
The "starter" would construct required instances as > Python objects, and chain them together arbitrarily based on some other > "pipeline configuration" file that lives with the "starter". The first > part of that (construct required instances) is described in a post I > made to this list yesterday. > > This is probably because I'd like there to be one well-understood way to > declaratively configure pipelines as opposed to each piece of middleware > potentially needing to manage app construction and having its own > configuration to do so. > > I don't know if this is reasonable for simpler requirements. This is > more of a "formal deployment spec" idea and of course is likely flawed > in some subtle way I don't understand yet. I think there's probably some room for separation. In practice I end up with multiple configuration files for my projects -- one that's generic to the application, and one that's specific to the deployment. But it's very hard to determine ahead of time what stuff goes where. For instance, server options mostly go in the deployment configuration. Until I start building conventions about configuration information on the servers, at which time I expect configuration will migrate into common locations in the form of configuration-loading options. E.g., where I now do: server = 'scgi_threaded' port = 4010 In the future I might do: import port_map port = port_map.find_port(app_name) Where port_map is some global module where I keep the entire server's list of ports mappings. And being able to do stuff like this is what makes Python-syntax imperative configuration so nice... it's crude and annoying, but configuration that is more declarative becomes even worse when you try to build these kind of features into it. But I digress... the deployment configuration as I currently use it is usually something that overwrites the generic application configuration. They aren't two distinct things. And the configuration doesn't belong to one or the other. Is the location of session information server specific, application specific, profile specific? It depends on your situation. I might have a standard convention for the location of Javascript libraries that lives in my configuration; but on my development machine I override that because I'm doing development on one of those libraries. There's all sorts of specific cases, and in declarative or well-partitioned configurations the configuration language has to include lots and lots of features. Or you end up with configuration file generation or other nonsense. In the end, I think I have more faith in the general applicability of Python as a way to describe structures, combined with strong configuration-specific conventions and style guides. Otherwise it feels like this embeds policy into the configuration-loading code, and I hate policy in code. >>>I'm pretty sure you're not advocating it, but in case you are, I'm not >>>sure it adds as much value as it removes to be able to have a "dynamic" >>>middleware chain whereby new middleware elements can be added "on the >>>fly" to a pipeline after a request has begun. That is *very* "late >>>binding" to me and it's impossible to configure declaratively. >> >>I'm comfortable with a little of both. I don't even know *how* I'd stop >>dynamic middleware. For instance, one of the methods I added to Wareweb >>recently allows any servlet to forward to any WSGI application; but from >>the outside the servlet looks like a normal WSGI application just like >>before. 
> It's obviously fine if applications themselves want to do this. I'm not sure that it would be possible to create a "deployment spec" that canonized *how* to do it because, as you mentioned, it's not really a configuration task, it's a programming task.
>
>>>I agree! I'm a bit confused because one of the canonical examples of how WSGI middleware is useful seems to be the example of implementing a framework-agnostic sessioning service. And for that sessioning service to be useful, your application has to be able to depend on its availability so it can't be "oblivious".
>>
>>This is where I'd like additional (incrementally agreed upon) standards. For instance, a standard for the interface of 'webapp01.session'. It's a requirement, certainly, but the requirement is merely "there must be a webapp01-compliant session installed".
>
> Yes... I think the best way to describe this sort of thing is through interfaces (at least notional, documented ones, if not formal ones that can be introspected at runtime). But that will need to be fleshed out on a service-by-service basis, obviously.
>
> FWIW, I'm also finding myself agreeing with Phillip's idea of allowing applications to have a context object which can help them find services, as opposed to implementing each service entirely as middleware.
>
> Instead of obtaining the sessioning service via "environ['webapp01.session']" in an application's __call__, you might do "self.context.get_service('session')"... or maybe even "environ['services'].get_service('session')". The latter would be easier to add because we'd be using an existing PEP 333 protocol. We'd consume a single key within the environ namespace, but there would need to be no change to the WSGI spec.

I have to read over PJE's email some more. It doesn't really remove the need for middleware; it's more like it could consolidate many services into one generic service middleware. For instance, the session service still needs access to the response, and the only general way to access the response is through middleware. The request, at least, can be generally accessed as the environment dictionary; but replacing middleware with contracts on what you must return from your application is a non-starter. E.g., if an auth service requires something like:

    auth = get_service('auth')
    if not auth.allowed(app_context):
        forbidden = auth.forbidden()
        start_response(forbidden[0], forbidden[1])
        return forbidden[2]

Well... that's not very nice, is it? And it's totally infeasible once your code is in the bowels of some framework. You could do it with an exception (with some middleware that catches the exception). You could do the session service with some middleware that collects extra headers and other response information. And now that I'm thinking through an implementation, I realize it's something I've thought of before -- in my mind it was about lighter-weight filters and simpler configuration, but the implementation would be similar.

My only concern is whether it confuses the order of filters. If there's one generic service middleware, it's probably going to be invoked before some other middleware and after others. But the services would communicate with that service middleware outside of the WSGI band (using callbacks or shared structures or something). This makes it difficult for transforming middleware to be certain that it has full control to wrap applications.
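To make the exception-based variant concrete (all of the names here -- NotAllowed, CatchNotAllowed, auth.require -- are invented for illustration, not an existing API), the service raises, and a thin piece of middleware turns the exception into a response, so application code never has to do the start_response dance for the failure case:

    class NotAllowed(Exception):
        """Raised by a (hypothetical) auth service when a check fails."""
        def __init__(self, status='403 Forbidden', body='Forbidden'):
            Exception.__init__(self, status)
            self.status = status
            self.body = body

    class CatchNotAllowed(object):
        """Middleware that converts NotAllowed into an HTTP response."""
        def __init__(self, app):
            self.app = app
        def __call__(self, environ, start_response):
            try:
                return self.app(environ, start_response)
            except NotAllowed as e:
                # assumes the exception is raised before any output is sent;
                # otherwise exc_info would have to be passed to start_response
                start_response(e.status, [('Content-type', 'text/plain')])
                return [e.body]

    # Inside an application the check then collapses to a single call:
    #     auth = get_service('auth')
    #     auth.require(app_context)   # raises NotAllowed on failure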
> This would be pretty straightforward and a separate services framework > could be implemented outside WSGI entirely perhaps taking some cues from > PEAK and/or Zope 3 ( or even [gasp] *code!*, god knows this problem has > already been solved many times over ;-) -- for implementing service > registration and lookup. It could form the basis for a "WSGI services" > spec without muddying the waters for PEP 333. > > That said, if you're not interested in that because you think > implementing services as middleware is "good enough" and you'd rather > not implement another framework, I'd totally understand that. At that > point I probably wouldn't be interested either because you're the > defacto champion of WSGI middleware as a lingua franca and the only > reason to do any of this is for the sake of collaboration and code > sharing. But I do think it would be cleaner. Well, I'm a fan of working code. If services are a better way of doing some of this stuff, and they supercede code I've written or imagined, that's not that big a deal. At this point I'd be interested to see how a Really Lame Implementation of Sessions (for instance) would be implemented with services. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Tue Jul 19 19:28:21 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 19 Jul 2005 12:28:21 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com> References: <42DA13CE.2080208@colorstudy.com> <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> <5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com> Message-ID: <42DD3835.1040300@colorstudy.com> Phillip J. Eby wrote: > At 07:29 AM 7/17/2005 -0400, Chris McDonough wrote: > >> I'm a bit confused because one of the canonical examples of >> how WSGI middleware is useful seems to be the example of implementing a >> framework-agnostic sessioning service. And for that sessioning service >> to be useful, your application has to be able to depend on its >> availability so it can't be "oblivious". > > > Exactly. As soon as you start trying to have configured services, you > are creating Yet Another Framework. Which isn't a bad thing per se, > except that it falls outside the scope of PEP 333. It deserves a > separate PEP, I think, and a separate implementation mechanism than > being crammed into the request environment. These things should be > allowed to be static, so that an application can do some reasonable > setup, and so that you don't have per-request overhead to shove ninety > services into the environment. The services themselves can be fairly lazy; though unfortunately you can't be trickly and add laziness when a service was originally written in a very concrete way, since that would require fake dictionaries and other things WSGI disallows. But there's not a lot of overhead to environ['paste.session.factory']() -- it's just a stub object stuck in a particulra key, that knows the context in which it was created so it can communicate with that context later. > Also, because we are dealing not with basic plumbing but with making a > nice kitchen, it seems to me we can afford to make the fixtures nice. > That is, for an add-on specification to WSGI we don't need to adhere to > the "let it be ugly for apps if it makes the server easier" principle > that guided PEP 333. 
The assumption there was that people would mostly > port existing wrappers over HTTP/CGI to be wrappers over WSGI. But for > services, we are talking about an actual framework to be used by > application developers directly, so more user-friendliness is definitely > in order. My own vision for most middleware is that it get wrapped by frameworks. In fact, that it be so godawful ugly you can't help but wrap it ;) Well, not deliberately horrible for no good reason... but at least that it doesn't matter that much, because the frameworks will want to wrap it anyway. This is the "aesthetically neutral" aspect of middleware that I've mentioned before. People get all bothered if you use underscores instead of mixed case, or vice versa, even though that's one of the least important aspects of the features being implemented. Of course, there are real problems with wrapping. Like it reduces the transparency -- middleware becomes this magic part of the system because it's not something people deal with day-to-day, and if your first chance to work with middleware is to write it, that's intimidating. There's also the leaky abstraction problem; though I think well-defined middleware helps avoid this. Really, if you are building user-visible standard libraries, you are building a framework. And maybe I'm just too pessimistic about a standard framework... but, well, I am certainly not optimistic about it. On the other hand, it's not like people are breaking down my door with their enthusiasm to use Paste middleware either. So I dunno. I can only say a good strategy clearly has to build on developer's laziness, their fear of new things, and their reluctance to learn new things. Well, that's the negative way of saying it. It has to build on the likelihood that their attention is primarily focused on their domain, that it builds on their existing knowledge, and that it presents a minimal set of new concepts. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From jjinux at gmail.com Tue Jul 19 19:46:25 2005 From: jjinux at gmail.com (Shannon -jj Behrens) Date: Tue, 19 Jul 2005 10:46:25 -0700 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1121571455.24386.171.camel@plope.dyndns.org> References: <1121571455.24386.171.camel@plope.dyndns.org> Message-ID: It seems to me that authentication and authorization should be a put into a library that isn't bound to the Web at all. I thought that those guys reimplementing J2EE in Python did that. :-/ Oh well, -jj On 7/16/05, Chris McDonough wrote: > I've also been putting a bit of thought into middleware configuration, > although maybe in a different direction. I'm not too concerned yet > about being able to introspect the configuration of an individual > component. Maybe that's because I haven't thought about the problem > enough to be concerned about it. In the meantime, though, I *am* > concerned about being able to configure a middleware "pipeline" easily > and have it work. > > I've been attempting to divine a declarative way to configure a pipeline > of WSGI middleware components. This is simple enough through code, > except that at least in terms of how I'm attempting to factor my > middleware, some components in the pipeline may have dependencies on > other pipeline components. > > For example, it would be useful in some circumstances to create separate > WSGI components for user identification and user authorization. 
The > process of identification -- obtaining user credentials from a request > -- and user authorization -- ensuring that the user is who he says he > is by comparing the credentials against a data source -- are really > pretty much distinct operations. There might also be a "challenge" > component which forces a login dialog. > > In practice, I don't know if this is a truly useful separation of > concerns that need to be implemented in terms of separate components in > the middleware pipeline (I see that paste.login conflates them), it's > just an example. But at very least it would keep each component simpler > if the concerns were factored out into separate pieces. > > But in the example I present, the "authentication" component depends > entirely on the result of the "identification" component. It would be > simple enough to glom them together by using a distinct environment key > for the identification component results and have the authentication > component look for that key later in the middleware result chain, but > then it feels like you might as well have written the whole process > within one middleware component because the coupling is pretty strong. > > I have a feeling that adapters fit in here somewhere, but I haven't > really puzzled that out yet. I'm sure this has been discussed somewhere > in the lifetime of WSGI but I can't find much in this list's archives. > > > Lately I've been thinking about the role of Paste and WSGI and > > whatnot. Much of what makes a Paste component Pastey is > > configuration; otherwise the bits are just independent pieces of > > middleware, WSGI applications, etc. So, potentially if we can agree > > on configuration, we can start using each other's middleware more > > usefully. > > > > I think we should avoid questions of configuration file syntax for > > now. Lets instead simply consider configuration consumers. A > > standard would consist of: > > > > * A WSGI environment key (e.g., 'webapp01.config') > > * A standard for what goes in that key (e.g., a dictionary object) > > * A reference implementation of the middleware > > * Maybe a non-WSGI-environment way to access the configuration (like > > paste.CONFIG, which is a global object that dispatches to per-request > > configuration objects) -- in practice this is really really useful, as > > you don't have to pass the configuration object around. > > > > There's some other things we have to consider, as configuration syntaxes > > do effect the configuration objects significantly. So, the standard for > > what goes in the key has to take into consideration some possible > > configuration syntaxes. > > > > The obvious starting place is a dictionary-like object. I would suggest > > that the keys should be valid Python identifiers. Not all syntaxes > > require this, but some do. This restriction simply means that > > configuration consumers should try to consume Python identifiers. > > > > There's also a question about name conflicts (two consumers that are > > looking for the same key), and whether nested configuration should be > > preferred, and in what style. > > > > Note that the standard we decide on here doesn't have to be the only way > > the object can be accessed. For instance, you could make your > > configuration available through 'myframework.config', and create a > > compliant wrapper that lives in 'webapp01.config', perhaps even doing > > different kinds of mapping to fix convention differences. 
> > > > There's also a question about what types of objects we can expect in the > > configuration. Some input styles (e.g., INI and command line) only > > produce strings. I think consumers should treat strings (or maybe a > > special string subclass) specially, performing conversions as necessary > > (e.g., 'yes'->True). > > > > Thoughts? > > > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/jjinux%40gmail.com > -- I have decided to switch to Gmail, but messages to my Yahoo account will still get through. From ianb at colorstudy.com Tue Jul 19 19:56:02 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 19 Jul 2005 12:56:02 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com> References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com> Message-ID: <42DD3EB2.4090605@colorstudy.com> Phillip J. Eby wrote: >> In many cases, the middleware is modifying or watching the >> application's output. For instance, catching a 401 and turning that >> into the appropriate login -- which might mean producing a 401, a >> redirect, a login page via internal redirect, or whatever. > > > And that would be legitimate middleware, except I don't think that's > what you really want for that use case. What you want is an > "authentication service" that you just call to say, "I need a login" and > get the login information from, and return its return value so that it > does start_response for you and sends the right output. Like I mentioned in my response to Chris, this kind of contract about return values is a difficult one to implement. A "return 401 status" contract is pretty simple, in that it's vague in a way that fits with typical frameworks -- they all have a way of changing the status, and most have a way of aborting with that kind of error. > The difference is obliviousness; if you want to *wrap* an application > not written to use WSGI services, then it makes sense to make it > middleware. If you're writing a new application, just have it use > components instead of mocking up a 401 just so you can use the existing > middleware. Who's writing new applications? OK... I guess a lot of people are. I may be more focused on retrofitting compared to other people. > Notice, by the way, that it's trivial to create middleware that detects > the 401 and then *invokes the service*. So, it's more reusable to make > services be services, and middleware be wrappers to apply services to > oblivious applications. Yes, this would be the single-middleware-multiple-service model. I don't understand exactly how services work myself, so I can't write that, but I'm certainly interested in examples. Well... 
I'll throw out one just for the heck of it:

    class ServiceMiddleware(object):
        def __init__(self, app):
            self.app = app
        def __call__(self, environ, start_response):
            context = environ['webapp.service_context'] = ServiceContext()
            # You could also do some thread-local registering of this
            # context at this point
            def replacement_start_response(status, headers):
                # give the registered services a chance to adjust the status
                # and headers before they reach the real start_response
                return context.start_response(start_response, status, headers)
            app_iter = self.app(environ, replacement_start_response)
            return context.app_iter(app_iter)

    class ServiceContext(object):
        def __init__(self):
            self.services = []
        def get_service(self, name):
            ... something I don't understand ...
            self.services.append(service)
            return service
        def start_response(self, start_response, status, headers):
            for service in self.services:
                if hasattr(service, 'munge_start_response'):
                    status, headers = service.munge_start_response(status, headers)
            return start_response(status, headers)
        def app_iter(self, app_iter):
            return app_iter

And ServiceContext should also ask services if they care to munge_body or something, and then pipe all calls to the writer and all the parts of app_iter into that service if so. And it should let services catch exceptions.

>> I guess you could make one Uber Middleware that could handle the services' needs to rewrite output, watch for errors and finalize resources, etc.
>
> Um, it's called a library of functions. :) WSGI was designed to make it easy to use library calls to do stuff. If you don't need the obliviousness, then library calls (or service calls) are the Obvious Way To Do It.

I do use library calls when possible; and even when not possible I (generally) try to make the middleware as small as possible, just handling the logic of the transformation. But mostly libraries don't need to be discussed here, because they are simple ;) There are perhaps a few places where standardization of some library manipulations would be useful. E.g., get_cookies() and parse_querystring() in paste.wsgilib (http://svn.pythonpaste.org/Paste/trunk/paste/wsgilib.py) could be standardized, and then WSGI-based libraries that were interested in the request could probably retrieve the frameworks' parsed version of URL and cookie parameters.

>>> Really, the only stuff that actually needs to be middleware, is stuff that wraps an *oblivious* application; i.e., the application doesn't know it's there. If it's a service the application uses, then it makes more sense to create a service management mechanism for configuration and deployment of WSGI applications.
>>
>> Applications always care about the things around them, so any convention that middleware and applications be unaware of each other would rule out most middleware.
>
> Yes, exactly! Now you understand me. :) If the application is what wants the service, let it just call the service. Middleware is *overhead* in that case.

Well, no, I don't really understand you, but if it makes you feel better... ;) For instance, applications may be interested to know that there's a piece of middleware that will catch unexpected exceptions. An application might then reraise unexpected exceptions instead of providing its own error report. But it's not "overhead" or something the application wants handled lazily. It's just useful information about the environment.
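For the library-call side of this, helpers along the lines of get_cookies() and parse_querystring() are small enough to sketch directly against the environ (this is a standalone illustration using the Python 2 standard library, not the actual paste.wsgilib code, and the 'myhelpers.*' cache keys are invented):

    from Cookie import SimpleCookie
    from cgi import parse_qsl

    def get_cookies(environ):
        """Parse HTTP_COOKIE once per request, caching it in the environ."""
        if 'myhelpers.cookies' not in environ:
            cookies = SimpleCookie()
            cookies.load(environ.get('HTTP_COOKIE', ''))
            environ['myhelpers.cookies'] = cookies
        return environ['myhelpers.cookies']

    def parse_querystring(environ):
        """Return the query string as a list of (name, value) pairs."""
        if 'myhelpers.querystring' not in environ:
            environ['myhelpers.querystring'] = parse_qsl(
                environ.get('QUERY_STRING', ''), keep_blank_values=True)
        return environ['myhelpers.querystring']

Because the environ doubles as a per-request cache, a framework and a low-level library can share the parsed result without either one having to be middleware.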
>>> I hope this isn't too vague; I've been wanting to say something about >>> this since I saw your blog post about doing transaction services in >>> WSGI, as that was when I first understood why you were making >>> everything into middleware. (i.e., to create a poor man's substitute >>> for "placeful" services and utilities as found in PEAK and Zope 3.) >> >> >> What do they provide that middleware does not? > > > Well, some services may be things the application needs only when it's > being initially configured. Or maybe the service is something like a > scheduler that gives timed callbacks. There are lots of non-per-request > services that make sense, so forcing service access to be only through > the environment makes for cruftier code, since you now have to keep > track of whether you've been called before, and then do any setup during > your first web hit. For that matter, some service configuration might > need to be dynamically determined, based on the application object > requesting it. > > But the main thing they provide that middleware does not is simplicity > and ease of use. I understand your desire to preserve the appearance of > neutrality, but you are creating new web frameworks here, and making > them ugly doesn't make them any less of a framework. :) > > What's worse is that by tying the service access mechanism to the > request environment, you're effectively locking out frameworks like PEAK > and Zope 3 from being able to play, and that goes against (IMO) the > goals of WSGI, which is to get more and more frameworks to be able to > play, and give them *incentive* to merge and dissolve and be assimilated > into the primordial soup of WSGI-based integration, or at least to be > competitors for various implementation/use case niches in the WSGI > ecosystem. How is being request-oriented locking them out? To me this mostly seems like an aesthetics and implementation discussion; mapping from one to the other doesn't seem that hard. If you map from request to service, you do it by putting a little proxy in the request that calls the service. If mapping from service to request, you keep the request around somewhere (threadlocal or something) and the service is implemented in terms of things found in the request. > See also my message to Chris just now about why a WSGI service spec can > and should follow different rules of engagement than the WSGI spec did; > it really isn't necessary to make services ugly for applications in > order to make it easy for server implementors, as it was for the WSGI > core spec. In fact, the opposite condition applies: the service stack > should make it easy and clean for applications to use WSGI services, > because they're the things that will let them hide WSGI implementation > details in the absence of an existing web framework. With perhaps a couple exceptions, I don't think WSGI is that bad for the application side. Not that you'll write to WSGI directly most of the time, but if you do it's still not that bad. WSGI is dumb and crude, which is a feature. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From fuzzybr80 at gmail.com Tue Jul 19 20:08:41 2005 From: fuzzybr80 at gmail.com (ChunWei Ho) Date: Wed, 20 Jul 2005 02:08:41 +0800 Subject: [Web-SIG] Standardized configuration In-Reply-To: References: Message-ID: <31f07fc30507191108b01ba7d@mail.gmail.com> Hi, I have been looking at WSGI for only a few weeks, but had some ideas similar (I hope) to what is being discussed that I'll put down here. 
I'm new to this, so I beg your indulgence if this is heading down the wrong track or wildly off-topic :)

It seems to me that a major drawback of WSGI middleware, one that prevents flexible configuration/chain paths, is that the application to be run has to be determined at init time. It would be much more flexible if we were able to specify the application to run and its configuration information at call time - the middleware would then be able to approximate a service of sorts.

An example: I have a WSGI application simulating a file server, and I wish to authenticate users and gzip served files where applicable. In a middleware chain it would probably work out to be:

    application = authmiddleware(gzipmiddleware(fileserverapp))

For example, a simplified gzipping middleware consists of:

    class gzipmiddleware:
        def __init__(self, application, configparam):
            self._application = application
            ....
        def __call__(self, environ, start_response):
            # do start_response
            # call self._application(environ, start_response) as iterable
            # get each iterator output and zip and yield it

and the fileserverapp, with doGET, doPUT, doPOST subapplications that do the actual processing:

    def fileserverapp(environ, start_response):
        if(GET): return doGET(environ, start_response)
        if(POST): return doPOST(environ, start_response)
        if(PUT): return doPUT(environ, start_response)

Now, the application server is specific about what it wishes to gzip (usually only GET or POST entity responses, and only if the mimetype allows it). But this level of logic is not to be placed in the gzipping middleware, since it's configurable on the webserver. So in order to tell the gzipmiddleware whether to gzip or not:

(a) Add a key in environ, say environ['gzip.do_gzip'] = True or False, to inform the gzipmiddleware whether to do gzip or not. This does mean that gzipmiddleware remains in the chain, regardless of whether it is needed or not.

(b) Have the chain application = authmiddleware(fileserverapp), use Handlers, as Ian suggested, and in the fileserverapp's init:

    Handlers(
        IfTest(method=GET, MimeOkForGzip=True, RunApp=gzipmiddleware(doGET)),
        IfTest(method=GET, MimeOkForGzip=False, RunApp=doGET),
        IfTest(method=POST, MimeOkForGzip=True, RunApp=gzipmiddleware(doPOST)),
        IfTest(method=POST, MimeOkForGzip=False, RunApp=doPOST),
        IfTest(method=PUT, RunApp=doPUT)
    )

(c) Make gzipmiddleware a service in the following form:

    class gzipmiddleware:
        def __init__(self, application=None, configparam=None):
            self._application = application
            ....
        def __call__(self, environ, start_response, application=None, configparam=None):
            # if application and configparam are specified, use them
            # instead of the init values
            # do start_response
            # call self._application(environ, start_response) as iterable
            # get each iterator output and zip and yield it
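Fleshed out, the service form of the gzipping middleware above might look roughly like this (Python 2 / PEP 333 style to match the thread; it naively buffers and compresses the whole response in one go, and the class name GzipService is just illustrative):

    import gzip
    from StringIO import StringIO

    class GzipService(object):
        """Usable as ordinary middleware (wrap an app at init time) or as a
        per-call service (pass the app to compress at call time)."""
        def __init__(self, application=None):
            self._application = application

        def __call__(self, environ, start_response, application=None):
            app = application or self._application
            captured = {}
            body = []
            def capture_start_response(status, headers, exc_info=None):
                captured['status'] = status
                captured['headers'] = headers
                return body.append   # write() output is collected too
            app_iter = app(environ, capture_start_response)
            try:
                body.extend(app_iter)
            finally:
                if hasattr(app_iter, 'close'):
                    app_iter.close()
            # compress the buffered body in one go
            buf = StringIO()
            zfile = gzip.GzipFile(mode='wb', fileobj=buf)
            zfile.write(''.join(body))
            zfile.close()
            zipped = buf.getvalue()
            headers = [(k, v) for (k, v) in captured['headers']
                       if k.lower() != 'content-length']
            headers.append(('Content-Encoding', 'gzip'))
            headers.append(('Content-Length', str(len(zipped))))
            start_response(captured['status'], headers)
            return [zipped]

A streaming version would compress chunk by chunk instead of buffering, but the buffered form keeps the "choose the application at call time" idea easy to see.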
This "middleware" is still compatible with PEP-333, but can also be used as: #on main application initialization, create a gzipservice and put it in environ without #specifying application or configparams for init(): environ['service.gzip'] = gzipmiddleware() Modify fileserverapp to: def fileserverapp(environ, start_response): if(GET): if(mimetype ok for gzip): gzipservice = environ['service.gzip'] return gzipservice(environ, start_response, doGET, gzipconfigparams) else: return doGET(environ, start_response) if(POST): if(mimetype ok for gzip): gzipservice = environ['service.gzip'] return gzipservice(environ, start_response, doPOST, gzipconfigparams) else: return doPOST(environ, start_response) if(PUT): doPUT(environ, start_response) The main difference here is that you don't have to initialize full application chains for each possible middleware-path for the request. This would be very useful if you had many middleware in the chain with many permutations as to which middleware are needed You could also instead put a service factory object into environ, it will return the gzipmiddleware object as a service if already exist, otherwise it will create it and then return it. From mike_mp at zzzcomputing.com Tue Jul 19 20:25:04 2005 From: mike_mp at zzzcomputing.com (mike bayer) Date: Tue, 19 Jul 2005 14:25:04 -0400 (EDT) Subject: [Web-SIG] Standardized configuration In-Reply-To: <42DD3835.1040300@colorstudy.com> References: <42DA13CE.2080208@colorstudy.com> <1121571455.24386.171.camel@plope.dyndns.org> <42D9DEBA.4080609@colorstudy.com> <1121578280.24386.228.camel@plope.dyndns.org> <42DA13CE.2080208@colorstudy.com> <5.1.1.6.0.20050717134029.0269b730@mail.telecommunity.com> <42DD3835.1040300@colorstudy.com> Message-ID: <6107.66.192.34.8.1121797504.squirrel@66.192.34.8> While I'm not following every detail of this discussion, this line caught my attention - Ian Bicking said: > Really, if you are building user-visible standard libraries, you are > building a framework. only because Fowler recently posted something that made me think about this, where he distinguishes a "framework" as being something which employs the "inversion of control" principle, as Paste does, versus a "library" which does not: http://martinfowler.com/bliki/InversionOfControl.html . I know theres a lot of discussion over "A Framework ? Not a Framework?" lately, largely in response to the recent meme "more frameworks == BAD" that seems to be getting around these days; perhaps Fowler's distinction is helpful...I hadn't thought of it that way before. From jjinux at gmail.com Tue Jul 19 22:33:02 2005 From: jjinux at gmail.com (Shannon -jj Behrens) Date: Tue, 19 Jul 2005 13:33:02 -0700 Subject: [Web-SIG] Standardized configuration In-Reply-To: <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com> References: <5.1.1.6.0.20050717001919.029efe40@mail.telecommunity.com> <42DA1695.7020304@colorstudy.com> <5.1.1.6.0.20050717135650.026a0428@mail.telecommunity.com> Message-ID: Phillip, 100% agreed. Libraries are more flexible than middleware because you get to decide when, if, and how they get called. Middleware has its place, but it doesn't make sense to try to package all library code as middleware. Even when you do write middleware, it should simply link in library code so that you can use the library code in the absence of the middleware. Consider an XSLT middleware layer. It makes sense to have such a thing. It doesn't make sense to only be able to use the XSLT code via the middleware interface. 
As much as possible, you want to be able to interact with libraries directly. Best Regards, -jj On 7/17/05, Phillip J. Eby wrote: > At 03:28 AM 7/17/2005 -0500, Ian Bicking wrote: > >Phillip J. Eby wrote: > >>What I think you actually need is a way to create WSGI application > >>objects with a "context" object. The "context" object would have a > >>method like "get_service(name)", and if it didn't find the service, it > >>would ask its parent context, and so on, until there's no parent context > >>to get it from. The web server would provide a way to configure a root > >>or default context. > > > >I guess I'm treating the request environment as that context. I don't > >really see the problem with that...? > > It puts a layer in the request call stack for each service you want to > offer, versus *no* layers for an arbitrary number of services. It adds > work to every request to put stuff into the environment, then take it out > again, versus just getting what you want in the first place. > > > >In many cases, the middleware is modifying or watching the application's > >output. For instance, catching a 401 and turning that into the > >appropriate login -- which might mean producing a 401, a redirect, a login > >page via internal redirect, or whatever. > > And that would be legitimate middleware, except I don't think that's what > you really want for that use case. What you want is an "authentication > service" that you just call to say, "I need a login" and get the login > information from, and return its return value so that it does > start_response for you and sends the right output. > > The difference is obliviousness; if you want to *wrap* an application not > written to use WSGI services, then it makes sense to make it > middleware. If you're writing a new application, just have it use > components instead of mocking up a 401 just so you can use the existing > middleware. > > Notice, by the way, that it's trivial to create middleware that detects the > 401 and then *invokes the service*. So, it's more reusable to make > services be services, and middleware be wrappers to apply services to > oblivious applications. > > > >I guess you could make one Uber Middleware that could handle the services' > >needs to rewrite output, watch for errors and finalize resources, etc. > > Um, it's called a library of functions. :) WSGI was designed to make it > easy to use library calls to do stuff. If you don't need the > obliviousness, then library calls (or service calls) are the Obvious Way To > Do It. > > > > This isn't unreasonable, and I've kind of expected one to evolve at > > some point. But you'll have to say more to get me to see how "services" > > is a better way to manage this. > > I'm saying that middleware can use services, and applications can use > services. Making applications *have to* use middleware in order to use the > services is wasteful of both computer time and developer brainpower. Just > let them use services directly when the situation calls for it, and you can > always write middleware to use the services when you encounter the > occasional (and ever-rarer with time) oblivious application. > > > >>Really, the only stuff that actually needs to be middleware, is stuff > >>that wraps an *oblivious* application; i.e., the application doesn't know > >>it's there. If it's a service the application uses, then it makes more > >>sense to create a service management mechanism for configuration and > >>deployment of WSGI applications. 
> > > >Applications always care about the things around them, so any convention > >that middleware and applications be unaware of each other would rule out > >most middleware. > > Yes, exactly! Now you understand me. :) If the application is what wants > the service, let it just call the service. Middleware is *overhead* in > that case. > > > >>I hope this isn't too vague; I've been wanting to say something about > >>this since I saw your blog post about doing transaction services in WSGI, > >>as that was when I first understood why you were making everything into > >>middleware. (i.e., to create a poor man's substitute for "placeful" > >>services and utilities as found in PEAK and Zope 3.) > > > >What do they provide that middleware does not? > > Well, some services may be things the application needs only when it's > being initially configured. Or maybe the service is something like a > scheduler that gives timed callbacks. There are lots of non-per-request > services that make sense, so forcing service access to be only through the > environment makes for cruftier code, since you now have to keep track of > whether you've been called before, and then do any setup during your first > web hit. For that matter, some service configuration might need to be > dynamically determined, based on the application object requesting it. > > But the main thing they provide that middleware does not is simplicity and > ease of use. I understand your desire to preserve the appearance of > neutrality, but you are creating new web frameworks here, and making them > ugly doesn't make them any less of a framework. :) > > What's worse is that by tying the service access mechanism to the request > environment, you're effectively locking out frameworks like PEAK and Zope 3 > from being able to play, and that goes against (IMO) the goals of WSGI, > which is to get more and more frameworks to be able to play, and give them > *incentive* to merge and dissolve and be assimilated into the primordial > soup of WSGI-based integration, or at least to be competitors for various > implementation/use case niches in the WSGI ecosystem. > > See also my message to Chris just now about why a WSGI service spec can and > should follow different rules of engagement than the WSGI spec did; it > really isn't necessary to make services ugly for applications in order to > make it easy for server implementors, as it was for the WSGI core spec. In > fact, the opposite condition applies: the service stack should make it easy > and clean for applications to use WSGI services, because they're the things > that will let them hide WSGI implementation details in the absence of an > existing web framework. > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/jjinux%40gmail.com > -- I have decided to switch to Gmail, but messages to my Yahoo account will still get through. 
From fuzzybr80 at gmail.com Wed Jul 20 05:34:07 2005 From: fuzzybr80 at gmail.com (ChunWei Ho) Date: Wed, 20 Jul 2005 11:34:07 +0800 Subject: [Web-SIG] Standardized configuration In-Reply-To: <31f07fc30507191108b01ba7d@mail.gmail.com> References: <31f07fc30507191108b01ba7d@mail.gmail.com> Message-ID: <31f07fc30507192034438a8617@mail.gmail.com> > (b) > Have chain application = authmiddleware(fileserverapp) > Use Handlers, as Ian suggested, and in the fileserverapp's init: > Handlers( > IfTest(method=GET,MimeOkForGzip=True, RunApp=gzipmiddleware(doGET)), > IfTest(method=GET,MimeOkForGzip=False, RunApp=doGET), > IfTest(method=POST,MimeOkForGzip=True, RunApp=gzipmiddleware(doPOST)), > IfTest(method=POST,MimeOkForGzip=False, RunApp=doPOST), > IfTest(method=PUT, RunApp=doPOST) > ) It was Graham who suggested the use of Handlers initially. Sincere apologies for my confusion. > (c) > Make gzipmiddleware a service in the following form: > class gzipmiddleware: > def __init__(self, application=None, configparam=None): > self._application = application > .... > def __call__(self, environ, start_response, application=None, > configparam=None): > if application and configparam is specified, use them instead of > the init values > do start_response > call self._application(environ, start_response) as iterable > get each iterator output and zip and yield it. > > This "middleware" is still compatible with PEP-333, but can also be used as: > #on main application initialization, create a gzipservice and put it > in environ without > #specifying application or configparams for init(): > environ['service.gzip'] = gzipmiddleware() > > Modify fileserverapp to: > def fileserverapp(environ, start_response): > if(GET): > if(mimetype ok for gzip): > gzipservice = environ['service.gzip'] > return gzipservice(environ, start_response, doGET, gzipconfigparams) > else: return doGET(environ, start_response) > if(POST): > if(mimetype ok for gzip): > gzipservice = environ['service.gzip'] > return gzipservice(environ, start_response, doPOST, > gzipconfigparams) > else: return doPOST(environ, start_response) > if(PUT): doPUT(environ, start_response) > > The main difference here is that you don't have to initialize full > application chains for each possible middleware-path for the request. > This would be very useful if you had many middleware in the chain with > many permutations as to which middleware are needed > > You could also instead put a service factory object into environ, it > will return the gzipmiddleware object as a service if already exist, > otherwise it will create it and then return it. > From mo.babaei at gmail.com Thu Jul 21 13:05:37 2005 From: mo.babaei at gmail.com (mohammad babaei) Date: Thu, 21 Jul 2005 15:35:37 +0430 Subject: [Web-SIG] Session Handling in python Message-ID: <5bf3a41f05072104058ffffbb@mail.gmail.com> Hi, what is the best way for "Session Handling" in python for production use ? regards -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/web-sig/attachments/20050721/930e1a59/attachment.htm From mike_mp at zzzcomputing.com Thu Jul 21 18:37:40 2005 From: mike_mp at zzzcomputing.com (mike bayer) Date: Thu, 21 Jul 2005 12:37:40 -0400 (EDT) Subject: [Web-SIG] Session Handling in python In-Reply-To: <5bf3a41f05072104058ffffbb@mail.gmail.com> References: <5bf3a41f05072104058ffffbb@mail.gmail.com> Message-ID: <20114.66.192.34.8.1121963860.squirrel@66.192.34.8> theres a mod_python FAQ entry on this which names several packages for session management: http://www.modpython.org/FAQ/faqw.py?req=show&file=faq03.008.htp the first one mentioned is my own, which can adapt to mod_python, CGI and WSGI interfaces. mohammad babaei said: > Hi, > what is the best way for "Session Handling" in python for production use ? > > regards > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-sig/mike_mp%40zzzcomputing.com > From jjinux at gmail.com Thu Jul 21 19:15:21 2005 From: jjinux at gmail.com (Shannon -jj Behrens) Date: Thu, 21 Jul 2005 10:15:21 -0700 Subject: [Web-SIG] Session Handling in python In-Reply-To: <20114.66.192.34.8.1121963860.squirrel@66.192.34.8> References: <5bf3a41f05072104058ffffbb@mail.gmail.com> <20114.66.192.34.8.1121963860.squirrel@66.192.34.8> Message-ID: If you use Aquarium, it has its own session infrastructure, supporting in-memory sessions, database sessions, or whatever other backends you want to plug in. I think most of the other frameworks do the same. Best Regards, -jj On 7/21/05, mike bayer wrote: > theres a mod_python FAQ entry on this which names several packages for > session management: > > http://www.modpython.org/FAQ/faqw.py?req=show&file=faq03.008.htp > > the first one mentioned is my own, which can adapt to mod_python, CGI and > WSGI interfaces. > > mohammad babaei said: > > Hi, > > what is the best way for "Session Handling" in python for production use ? > > > > regards > > _______________________________________________ > > Web-SIG mailing list > > Web-SIG at python.org > > Web SIG: http://www.python.org/sigs/web-sig > > Unsubscribe: > > http://mail.python.org/mailman/options/web-sig/mike_mp%40zzzcomputing.com > > > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/jjinux%40gmail.com > -- I have decided to switch to Gmail, but messages to my Yahoo account will still get through. From chrism at plope.com Fri Jul 22 22:38:07 2005 From: chrism at plope.com (Chris McDonough) Date: Fri, 22 Jul 2005 16:38:07 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <31f07fc30507191108b01ba7d@mail.gmail.com> References: <31f07fc30507191108b01ba7d@mail.gmail.com> Message-ID: <1122064687.8446.2.camel@localhost.localdomain> I've had a stab at creating a simple WSGI deployment implementation. I use the term "WSGI component" in here as shorthand to indicate all types of WSGI implementations (server, application, gateway). The primary deployment concern is to create a way to specify the configuration of an instance of a WSGI component, preferably within a declarative configuration file. A secondary deployment concern is to create a way to "wire up" components together into a specific deployable "pipeline". 
Here is a strawman implementation that solves both issues via the "configurator", which would be presumed to live in "wsgiref". Currently it lives in a package named "wsgiconfig" on my laptop. This module follows.

""" Configurator for establishing a WSGI pipeline """

from ConfigParser import ConfigParser
import types

def configure(path):
    config = ConfigParser()
    if isinstance(path, types.StringTypes):
        config.readfp(open(path))
    else:
        config.readfp(path)

    appsections = []
    for name in config.sections():
        if name.startswith('application:'):
            appsections.append(name)
        elif name == 'pipeline':
            pass
        else:
            raise ValueError, '%s is not a valid section name' % name

    app_defs = {}
    for appsection in appsections:
        app_config_file = config.get(appsection, 'config')
        app_factory_name = config.get(appsection, 'factory')
        app_name = appsection.split('application:')[1]
        if app_config_file is None:
            raise ValueError, ('application section %s requires a "config" '
                               'option' % appsection)
        if app_factory_name is None:
            raise ValueError, ('application section %s requires a "factory" '
                               'option' % appsection)
        app_defs[app_name] = {'config': app_config_file,
                              'factory': app_factory_name}

    if not config.has_section('pipeline'):
        raise ValueError, 'must have a "pipeline" section in config'
    pipeline_str = config.get('pipeline', 'apps')
    if pipeline_str is None:
        raise ValueError, ('must have an "apps" definition in the '
                           'pipeline section')

    pipeline_def = pipeline_str.split()
    next = None
    while pipeline_def:
        app_name = pipeline_def.pop()
        app_def = app_defs.get(app_name)
        if app_def is None:
            raise ValueError, ('appname %s is defined in the pipeline but '
                               'no application is defined with that name'
                               % app_name)
        factory_name = app_def['factory']
        factory = import_by_name(factory_name)
        config_file = app_def['config']
        app_factory = factory(config_file)
        app = app_factory(next)
        next = app

    if not next:
        raise ValueError, 'no apps defined in pipeline'
    return next

def import_by_name(name):
    if not "." in name:
        raise ValueError("unloadable name: " + `name`)
    components = name.split('.')
    start = components[0]
    g = globals()
    package = __import__(start, g, g)
    modulenames = [start]
    for component in components[1:]:
        modulenames.append(component)
        try:
            package = getattr(package, component)
        except AttributeError:
            n = '.'.join(modulenames)
            package = __import__(n, g, g, component)
    return package

We configure a pipeline based on a config file, which creates and chains two "sample" WSGI applications together. To do this, we use a ConfigParser-format config file named 'myapplication.conf' that looks like this::

[application:sample1]
config = sample1.conf
factory = wsgiconfig.tests.sample_components.factory1

[application:sample2]
config = sample2.conf
factory = wsgiconfig.tests.sample_components.factory2

[pipeline]
apps = sample1 sample2

The configurator exposes a single function, "configure", which accepts one argument: the path to (or an open file object for) the deployment config file.
>>> from wsgiconfig.configurator import configure
>>> appchain = configure('myapplication.conf')

The "sample_components" module referred to in the 'myapplication.conf' file application definitions might look like this::

class sample1:
    """ middleware """
    def __init__(self, app):
        self.app = app
    def __call__(self, environ, start_response):
        environ['sample1'] = True
        return self.app(environ, start_response)

class sample2:
    """ end-point app """
    def __init__(self, app):
        self.app = app
    def __call__(self, environ, start_response):
        environ['sample2'] = True
        return ['return value 2']

def factory1(filename):
    # this app requires no configuration, but if it did, we would
    # parse the file represented by filename and do some config
    return sample1

def factory2(filename):
    # this app requires no configuration, but if it did, we would
    # parse the file represented by filename and do some config
    return sample2

The appchain represents an automatically constructed pipeline of WSGI components. Each application in the chain is constructed from a factory.

>>> appchain.__class__.__name__ # sample1 (middleware)
'sample1'
>>> appchain.app.__class__.__name__ # sample2 (application)
'sample2'

Calling the "appchain" in this example results in the keys "sample1" and "sample2" being available in the environment, and what is returned is the result of the end-point application: the list ['return value 2'].

Potential points of contention

- The WSGI configurator assumes that you are willing to write WSGI component factories which accept a filename as a config file. This factory returns *another* factory (typically a class) that accepts "the next" application in the pipeline chain and returns a WSGI application instance. This pattern is necessary to support argument currying across a declaratively configured pipeline, because the WSGI spec doesn't allow for it. This is more contract than currently exists in the WSGI specification, but it would be trivial to change existing WSGI components to adapt to this pattern. Or we could adopt a pattern/convention that removed one of the factories, passing both the "next" application and the config file into a single factory function. Whatever. In any case, in order to do declarative pipeline configuration, some convention will need to be adopted. The convention I'm advocating above seems to already have been adopted by the current crop of middleware components (using a factory which accepts the application as the first argument).

- Pipeline deployment configuration should be used only to configure essential information about the pipeline and individual pipeline components. Where complex service data configuration is necessary, the component which implements a service should provide its own external configuration mechanism. For example, if an XSL service is implemented as a WSGI component, and it needs configuration knobs of some kind, these knobs should not live within the WSGI pipeline deployment file. Instead, each component should have its own configuration file. This is the purpose (undemonstrated above) of allowing an [application] section to specify a config filename.

- Some people seem to be arguing that there should be a single configuration format across all WSGI applications and gateways to configure everything about those components. I don't think this is workable. I think the only thing that is workable is to recommend to WSGI component authors that they make their components configurable using some configuration file or other type of path (URL, perhaps).
The composition, storage, and format of all other configuration data for the component should be chosen by the author.

- Threads which discussed this earlier on the web-sig list included the idea that a server or gateway should be able to "find" an end-point application based on a lookup of source file/module + attrname specified in the server's configuration. I'm suggesting instead that the mapping between servers, gateways, and applications be a pipeline, and that the pipeline itself have a configuration definition that may live outside of any particular server, gateway, or application. The pipeline definitions themselves would wire up the servers, gateways, and applications. The pipeline definition *could* be kept amongst the files representing a particular server instance on the filesystem (and this might be the default), but it wouldn't necessarily have to be. This might just be semantics.

- There were a few mentions of being able to configure/create a WSGI application at request time by passing name/value string pairs "through the pipeline" that would ostensibly be used to create a new application instance (thereby dynamically extending or modifying the pipeline). I think it's fine if a particular component does this, but I'm suggesting that a canonization of the mechanism used to do this is not necessary and that it's useful to have the ability to define static pipelines for deployment.

- If elements in the pipeline depend on "services" (ala Paste-as-not-a-chain-of-middleware-components), it may be advantageous to create a "service manager" instead of deploying each service as middleware. The "service manager" idea is not a part of the deployment spec. The service manager would itself likely be implemented as a piece of middleware or perhaps just a library.

On Wed, 2005-07-20 at 02:08 +0800, ChunWei Ho wrote:
> Hi, I have been looking at WSGI for only a few weeks, but had some > ideas similar (I hope) to what is being discussed that I'll put down > here. I'm new to this so I beg your indulgence if this is heading down > the wrong track or wildly offtopic :) > > It seems to me that a major drawback of WSGI middleware that is > preventing flexible configuration/chain paths is that the application > to be run has to be determined at init time. It is much flexible if we > were able to specify what application to run and configuration > information at call time - the middleware would be able to approximate > a service of sorts. ....

From ianb at colorstudy.com Sat Jul 23 00:26:01 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 22 Jul 2005 17:26:01 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1122064687.8446.2.camel@localhost.localdomain> References: <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> Message-ID: <42E17279.5040104@colorstudy.com>

Chris McDonough wrote:
> I've had a stab at creating a simple WSGI deployment implementation. > I use the term "WSGI component" in here as shorthand to indicate all > types of WSGI implementations (server, application, gateway). > > The primary deployment concern is to create a way to specify the > configuration of an instance of a WSGI component, preferably within a > declarative configuration file. A secondary deployment concern is to > create a way to "wire up" components together into a specific > deployable "pipeline". > > A strawman implementation that solves both issues via the > "configurator", which would be presumed to live in "wsgiref".
Currently > it lives in a package named "wsgiconfig" on my laptop. This module > follows. I have a weird problem reading unhighlighted source. I dunno why. But anyway, the configuration file is what interests me most... > To do this, we use a ConfigParser-format config file named > 'myapplication.conf' that looks like this:: > > [application:sample1] > config = sample1.conf > factory = wsgiconfig.tests.sample_components.factory1 > > [application:sample2] > config = sample2.conf > factory = wsgiconfig.tests.sample_components.factory2 > > [pipeline] > apps = sample1 sample2 I think it's confusing to call both these applications. I think "middleware" or "filter" would be better. I think people understand "filter" far better, so I'm inclined to use that. So... [application:sample2] # What is this relative to? I hate both absolute paths and # paths relative to pwd equally... config = sample1.conf factory = wsgiconfig... [filter:sample1] config = sample1.conf factory = ... [pipeline] # The app is unique and special...? app = sample2 filters = sample1 Well, that's just a first refactoring; I'm having other inclinations... > Potential points of contention > > - The WSGI configurator assumes that you are willing to write WSGI > component factories which accept a filename as a config file. This > factory returns *another* factory (typically a class) that accepts > "the next" application in the pipeline chain and returns a WSGI > application instance. This pattern is necessary to support > argument currying across a declaratively configured pipeline, > because the WSGI spec doesn't allow for it. This is more contract > than currently exists in the WSGI specification but it would be > trivial to change existing WSGI components to adapt to this > pattern. Or we could adopt a pattern/convention that removed one > of the factories, passing both the "next" application and the > config file into a single factory function. Whatever. In any > case, in order to do declarative pipeline configuration, some > convention will need to be adopted. The convention I'm advocating > above seems to already have been for the current crop of middleware > components (using a factory which accepts the application as the > first argument). I hate the proliferation of configuration files this implies. I consider the filters an implementation detail; if they each have partitioned configuration then they become a highly exposed piece of the architecture. It's also a lot of management overhead. Typical middleware takes 0-5 configuration parameters. For instance, paste.profilemiddleware is perfectly usable with no configuration at all, and only has two parameters. But this is reasonably easy to resolve -- there's a perfectly good configuration section sitting there, waiting to be used: [filter:profile] factory = paste.profilemiddleware.ProfileMiddleware # Show top 50 functions: limit = 50 This in no way precludes 'config', which is just a special case of this general configuration. The only real problem is a possible conflict if we wanted to add new special names to the configuration, i.e., meta-filter-configuration. Another option is indirection like: [filter:profile] factory = paste.profilemiddleware.ProfileMiddleware [config:profile] limit = 50 If we do something like this, the interface for these factories does become larger, as we're passing in objects that are more complex than strings. 
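To make the point above about passing section options to factories concrete, here is a rough sketch (under assumed names, not Paste's actual API): the loader hands each factory the section's key/value pairs as a plain dict of strings, and the factory converts what it needs.

class ProfileFilter:
    def __init__(self, app, limit=50):
        self.app = app
        self.limit = limit

    def __call__(self, environ, start_response):
        # real profiling of self.app is elided; this sketch just delegates
        return self.app(environ, start_response)

def profile_filter_factory(options):
    # "options" would be the section's keys as strings,
    # e.g. {'limit': '50'} from a hypothetical [filter:profile] section
    limit = int(options.get('limit', '50'))
    def make_filter(next_app):
        return ProfileFilter(next_app, limit=limit)
    return make_filter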
Another thing this could allow is recursive configuration, like: [application:urlmap] factory = paste.urlmap.URLMapBuilder app1 = blog app1.url = / app2 = statview app2.url = /stats app3 = cms app3.host = dev.* [application:blog] factory = leonardo.wsgifactory config = myblog.conf [application:statview] factory = statview log_location = /var/logs/apache2 [application:cms] factory = proxy location = http://localhost:8080 map = / /cms.php [pipeline] app = urlmap So URLMapBuilder needs the entire configuration file passed in, along with the name of the section it is building. It then reads some keys, and builds some named applications, and creates an application that delegates based on patterns. That's the kind of configuration file I could really use. Of course, if I really wanted this I could implement: [application:configurable] factory = paste.configurable_pipeline conf = abetterconffile.conf But then the configuration file becomes a dummy configuration, and no one else gets to use my fancier middleware with the normal configuration file. > - Pipeline deployment configuration should be used only to configure > essential information about pipeline and individual pipeline > components. Where complex service data configuration is necessary, > the component which implements a service should provide its own > external configuration mechanism. For example, if an XSL service > is implemented as a WSGI component, and it needs configuration > knobs of some kind, these knobs should not live within the WSGI > pipeline deployment file. Instead, each component should have its > own configuration file. This is the purpose (undemonstrated above) > of allowing an [application] section to specify a config filename. The intelligent finding of files is important to me with any references to filenames. Working directory is, IMHO, fragile and unreliable. Absolute paths are reliable but fragile. In some cases module names are a more robust way of location resources, if those modules are self-describing applications. Mostly because there's a search path. Several projects encourage this kind of system, though I'm not particularly fond of it because it mixes installation-specific files with code. > - Some people have seem to be arguing that there should be a single > configuration format across all WSGI applications and gateways to > configure everything about those components. I don't think this is > workable. I think the only thing that is workable is to recommend > to WSGI component authors that they make their components > configurable using some configuration file or other type of path > (URL, perhaps). The composition, storage, and format of all other > configuration data for the component should be chosen by the > author. While I appreciate the difficulty of agreeing on a configuration format, the way this proposal avoids that is by underpowering the deployment file so that authors are forced to create other configuration files. > - Threads which discussed this earlier on the web-sig list included > the idea that a server or gateway should be able to "find" an > end-point application based on a lookup of source file/module + > attrname specified in the server's configuration. I'm suggesting > instead that the mapping between servers, gateways, and > applications be a pipeline and that the pipeline itself have a > configuration definition that may live outside of any particular > server, gateway, or application. The pipeline definition(s) would > wire up the servers, gateways, and applications itself. 
The > pipeline definition *could* be kept amongs the files representing a > particular server instance on the filesystem (and this might be the > default), but it wouldn't necessarily have to be. This might just > be semantics. I think it's mostly semantics. > - There were a few mentions of being able to configure/create a WSGI > application at request time by passing name/value string pairs > "through the pipeline" that would ostensibly be used to create a > new application instance (thereby dynamically extending or > modifying the pipeline). I think it's fine if a particular > component does this, but I'm suggesting that a canonization of the > mechanism used to do this is not necessary and that it's useful to > have the ability to define static pipelines for deployment. It does concern me that we allow for dynamic systems. A dynamic system allows for more levels of abstraction in deployment, meaning more potential for automation. I think this can be achieved simply by defining a standard based on the object interface, where the configuration file itself is a reference implementation (that we expect people will usually use). Semantics from the configuration file will leak through, but it's lot easier to deal with (for example) a system that can only support string configuration values, than a system based on concrete files in a specific format. > - If elements in the pipeline depend on "services" (ala > Paste-as-not-a-chain-of-middleware-components), it may be > advantageous to create a "service manager" instead of deploying > each service as middleware. The "service manager" idea is not a > part of the deployment spec. The service manager would itself > likely be implemented as a piece of middleware or perhaps just a > library. That might be best. It's also quite possible for the factory to instantiate more middleware. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Sat Jul 23 20:46:02 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sat, 23 Jul 2005 13:46:02 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <42E17279.5040104@colorstudy.com> References: <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> <42E17279.5040104@colorstudy.com> Message-ID: <42E2906A.1060004@colorstudy.com> >> To do this, we use a ConfigParser-format config file named >> 'myapplication.conf' that looks like this:: >> >> [application:sample1] >> config = sample1.conf >> factory = wsgiconfig.tests.sample_components.factory1 >> >> [application:sample2] >> config = sample2.conf >> factory = wsgiconfig.tests.sample_components.factory2 >> >> [pipeline] >> apps = sample1 sample2 On another tack, I think it's important we consider how setuptools/pkg_resources fits into this. Specifically we should allow: [application:sample1] require = WSGIConfig factory = ... Since the factory might not be importable until require() is called. There's lots of other potential benefits to being able to get that information about requirements as well. Another option is if, instead of a factory (or as an alternative alongside it) we make distributions publishable themselves, like: [application:sample] egg = MyAppSuite[filebrowser] Which would require('MyAppSuite[filebrowser]'), and look in Paste.egg-info for a configuration file. The [filebrowser] portion is pkg_resource's way of defining a feature, and I figure it can also identify a specific application if one package holds multiple applications. 
However, that feature specification would be optional. What the configuration file in egg-info looks like, I don't know. Probably just like the original configuration file, except this time with a factory. I don't like the configuration key "egg" though. But eh, that's a detail. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From chrism at plope.com Sun Jul 24 02:08:03 2005 From: chrism at plope.com (Chris McDonough) Date: Sat, 23 Jul 2005 20:08:03 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <42E17279.5040104@colorstudy.com> References: <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> <42E17279.5040104@colorstudy.com> Message-ID: <1122163683.3650.132.camel@plope.dyndns.org> On Fri, 2005-07-22 at 17:26 -0500, Ian Bicking wrote: > > To do this, we use a ConfigParser-format config file named > > 'myapplication.conf' that looks like this:: > > > > [application:sample1] > > config = sample1.conf > > factory = wsgiconfig.tests.sample_components.factory1 > > > > [application:sample2] > > config = sample2.conf > > factory = wsgiconfig.tests.sample_components.factory2 > > > > [pipeline] > > apps = sample1 sample2 > > I think it's confusing to call both these applications. I think > "middleware" or "filter" would be better. I think people understand > "filter" far better, so I'm inclined to use that. So... The reason I called them applications instead of filters is because all of them implement the WSGI "application" API (they all implement "a callable that accepts two parameters, environ and start_response"). Some happen to be gateways/filters/middleware/whatever but at least one is just an application and does no delegation. In my example above, "sample2" is not a filter, it is the end-point application. "sample1" is a filter, but it's of course also an application too. Would you maybe rather make it more explicit that some apps are also gateways, e.g.: [application:bleeb] config = bleeb.conf factory = bleeb.factory [filter:blaz] config = blaz.conf factory = blaz.factory ? I don't know that there's any way we could make use of the distinction between the two types in the configurator other than disallowing people to place an application "before" a filter in a pipeline through validation. Is there something else you had in mind? > [application:sample2] > # What is this relative to? I hate both absolute paths and > # paths relative to pwd equally... > config = sample1.conf > factory = wsgiconfig... This was from a doctest I wrote so I could rely on relative paths, sorry. You're right. Ummmm... we could probably cause use the environment as "defaults" to ConfigParser inerpolation and set whatever we need before the configurator is run: $ export APP_ROOT=/home/chrism/myapplication $ ./wsgi-configurator.py myapplication.conf And in myapplication.conf: [application:sample1] config = %(APP_ROOT)s/sample1.conf factory = myapp.sample1.factory That would probably be the least-effort and most flexible thing to do and doesn't mandate any particular directory structure. Of course, we could provide a convention for a recommended directory structure, but this gives us an "out" from being painted in to that in specific cases. > [pipeline] > # The app is unique and special...? > app = sample2 > filters = sample1 > > > > Well, that's just a first refactoring; I'm having other inclinations... I'm not sure whether this is just a stylistic thing or if there's a reason you want to treat the endpoint app specially. 
By definition, in my implementation, the endpoint app is just the last app mentioned in the pipeline. > > Potential points of contention > > > > - The WSGI configurator assumes that you are willing to write WSGI > > component factories which accept a filename as a config file. This > > factory returns *another* factory (typically a class) that accepts > > "the next" application in the pipeline chain and returns a WSGI > > application instance. This pattern is necessary to support > > argument currying across a declaratively configured pipeline, > > because the WSGI spec doesn't allow for it. This is more contract > > than currently exists in the WSGI specification but it would be > > trivial to change existing WSGI components to adapt to this > > pattern. Or we could adopt a pattern/convention that removed one > > of the factories, passing both the "next" application and the > > config file into a single factory function. Whatever. In any > > case, in order to do declarative pipeline configuration, some > > convention will need to be adopted. The convention I'm advocating > > above seems to already have been for the current crop of middleware > > components (using a factory which accepts the application as the > > first argument). > > I hate the proliferation of configuration files this implies. I > consider the filters an implementation detail; if they each have > partitioned configuration then they become a highly exposed piece of the > architecture. > > It's also a lot of management overhead. Typical middleware takes 0-5 > configuration parameters. For instance, paste.profilemiddleware is > perfectly usable with no configuration at all, and only has two parameters. True. The config file param should be optional. Apps might use the environment to configure themselves. > But this is reasonably easy to resolve -- there's a perfectly good > configuration section sitting there, waiting to be used: > > [filter:profile] > factory = paste.profilemiddleware.ProfileMiddleware > # Show top 50 functions: > limit = 50 > > This in no way precludes 'config', which is just a special case of this > general configuration. The only real problem is a possible conflict if > we wanted to add new special names to the configuration, i.e., > meta-filter-configuration. I think I'd maybe rather see configuration settings for apps that don't require much configuration to come in as environment variables (maybe not necessarily in the "environ" namespace that is implied by the WSGI callable interface but instead in os.environ). Envvars are uncontroversial, so they don't cost us any coding time, PEP time, or brain cycles. But if you really do want a bunch of config to happen in the pipeline deployment file itself (definitely to be able to visually inspect it all in one place would be nice), maybe there could be one optional section in the pipeline deployment config file that sets keys and values into os.environ before creating any application instances: [environment] app1.hosed = true app2.disabled = false ... apps could just look for these keys and values in os.environ within their factories and configure themselves appropriately. If you didn't particularly want this, you could not define the section and just do: $ app1.hosed=true app2.hosed=false ./wsgi-configurator.py \ myapplication.conf or run a shell script to export these things before running the configurator. 
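A sketch of the optional [environment] section idea (my reading of the proposal above; this is an assumption, not part of wsgiconfig): the configurator would copy those keys into os.environ before any application factory runs.

import os

def apply_environment(config):
    # "config" is the ConfigParser instance for the deployment file
    if config.has_section('environment'):
        for key, value in config.items('environment'):
            os.environ[key] = value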
> Another option is indirection like: > > [filter:profile] > factory = paste.profilemiddleware.ProfileMiddleware > > [config:profile] > limit = 50 > > If we do something like this, the interface for these factories does > become larger, as we're passing in objects that are more complex than > strings. Sure. If this were a democracy, I'd vote to use a single well-known already-existing namespace (os.environ) as a config namespace for all apps that don't require their own config files instead of baking the idea of configuration sections for the apps themselves into the configurator logic. But I'd like to hear what others besides you and me think. > Another thing this could allow is recursive configuration, like: > > [application:urlmap] > factory = paste.urlmap.URLMapBuilder > app1 = blog > app1.url = / > app2 = statview > app2.url = /stats > app3 = cms > app3.host = dev.* > > [application:blog] > factory = leonardo.wsgifactory > config = myblog.conf > > [application:statview] > factory = statview > log_location = /var/logs/apache2 > > [application:cms] > factory = proxy > location = http://localhost:8080 > map = / /cms.php > > [pipeline] > app = urlmap > > > So URLMapBuilder needs the entire configuration file passed in, along > with the name of the section it is building. It then reads some keys, > and builds some named applications, and creates an application that > delegates based on patterns. That's the kind of configuration file I > could really use. Maybe one other (less flexible, but declaratively configurable and simpler to code) way to do this might be by canonizing the idea of "decision middleware", allowing one component in an otherwise static pipeline to decide which is the "next" one by executing a Python expression which runs in a context that exposes the WSGI environment. [application:blog] factory = leonardo.wsgifactory config = myblog.conf [application:statview] factory = statview [application:cms] factory = proxy [decision:urlmapper] cms = environ['PATH_INFO'].startswith('/cms') statview = environ['PATH_INFO'].startswith('/statview') blog = environ['PATH_INFO'].startswith('/blog') [environment] statview.log_location = /var/logs/apache2 cms.location = http://localhost:8080 cms.map = / /cms.php [pipeline] apps = urlmapper > Of course, if I really wanted this I could implement: > > [application:configurable] > factory = paste.configurable_pipeline > conf = abetterconffile.conf > > But then the configuration file becomes a dummy configuration, and no > one else gets to use my fancier middleware with the normal configuration > file. > > - Pipeline deployment configuration should be used only to configure > > essential information about pipeline and individual pipeline > > components. Where complex service data configuration is necessary, > > the component which implements a service should provide its own > > external configuration mechanism. For example, if an XSL service > > is implemented as a WSGI component, and it needs configuration > > knobs of some kind, these knobs should not live within the WSGI > > pipeline deployment file. Instead, each component should have its > > own configuration file. This is the purpose (undemonstrated above) > > of allowing an [application] section to specify a config filename. > > The intelligent finding of files is important to me with any references > to filenames. Working directory is, IMHO, fragile and unreliable. > Absolute paths are reliable but fragile. Yup. 
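One possible way to soften the relative-path problem described above (a suggestion, not something proposed in the thread): resolve relative "config =" values against the directory containing the deployment file, so neither the working directory nor absolute paths are required.

import os

def resolve(deployment_file, value):
    # leave absolute paths alone; anchor relative ones at the deployment file
    if os.path.isabs(value):
        return value
    base = os.path.dirname(os.path.abspath(deployment_file))
    return os.path.join(base, value)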
> In some cases module names are a more robust way of location resources, > if those modules are self-describing applications. Mostly because > there's a search path. Several projects encourage this kind of system, > though I'm not particularly fond of it because it mixes > installation-specific files with code. > > > - Some people have seem to be arguing that there should be a single > > configuration format across all WSGI applications and gateways to > > configure everything about those components. I don't think this is > > workable. I think the only thing that is workable is to recommend > > to WSGI component authors that they make their components > > configurable using some configuration file or other type of path > > (URL, perhaps). The composition, storage, and format of all other > > configuration data for the component should be chosen by the > > author. > > While I appreciate the difficulty of agreeing on a configuration format, > the way this proposal avoids that is by underpowering the deployment > file so that authors are forced to create other configuration files. I *think* promoting a convention of using environment variables to do configuration and allowing envvars to be set in the main deployment file solves this for apps that don't actually need their own config file. > > - There were a few mentions of being able to configure/create a WSGI > > application at request time by passing name/value string pairs > > "through the pipeline" that would ostensibly be used to create a > > new application instance (thereby dynamically extending or > > modifying the pipeline). I think it's fine if a particular > > component does this, but I'm suggesting that a canonization of the > > mechanism used to do this is not necessary and that it's useful to > > have the ability to define static pipelines for deployment. > > It does concern me that we allow for dynamic systems. A dynamic system > allows for more levels of abstraction in deployment, meaning more > potential for automation. Yes. OTOH, when a certain level of dynamicism is reached, it's no longer possible to configure things declaratively because it becomes a programming task, and this proposal is (so far) just about being able to configure things declaratively so I think we need some sort of compromise. > I think this can be achieved simply by defining a standard based on the > object interface, where the configuration file itself is a reference > implementation (that we expect people will usually use). Semantics from > the configuration file will leak through, but it's lot easier to deal > with (for example) a system that can only support string configuration > values, than a system based on concrete files in a specific format. Sorry, I can't parse that paragraph. > > - If elements in the pipeline depend on "services" (ala > > Paste-as-not-a-chain-of-middleware-components), it may be > > advantageous to create a "service manager" instead of deploying > > each service as middleware. The "service manager" idea is not a > > part of the deployment spec. The service manager would itself > > likely be implemented as a piece of middleware or perhaps just a > > library. > > That might be best. It's also quite possible for the factory to > instantiate more middleware. Which factory? Thanks, - C From pje at telecommunity.com Sun Jul 24 02:21:13 2005 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sat, 23 Jul 2005 20:21:13 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1122163683.3650.132.camel@plope.dyndns.org> References: <42E17279.5040104@colorstudy.com> <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> <42E17279.5040104@colorstudy.com> Message-ID: <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com> At 08:08 PM 7/23/2005 -0400, Chris McDonough wrote: >Would you maybe rather make it more explicit that some apps are also >gateways, e.g.: > >[application:bleeb] >config = bleeb.conf >factory = bleeb.factory > >[filter:blaz] >config = blaz.conf >factory = blaz.factory That looks backwards to me. Why not just list the sections in pipeline order? i.e., outermost middleware first, and the final application last? For that matter, if you did that, you could specify the above as: [blaz.factory] config=blaz.conf [bleeb.factory] config=bleeb.conf Which looks a lot nicer to me. If you want global WSGI or server options for the stack, one could always use multi-word section names e.g.: [WSGI options] multi_thread = 0 [mod_python options] blah = "feh" and not treat these sections as part of the pipeline. For Ian's idea about requiring particular projects to be available (via pkg_resources), I'd suggest making that sort of thing part of one of the options sections. From chrism at plope.com Sun Jul 24 02:41:43 2005 From: chrism at plope.com (Chris McDonough) Date: Sat, 23 Jul 2005 20:41:43 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com> References: <42E17279.5040104@colorstudy.com> <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> <42E17279.5040104@colorstudy.com> <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com> Message-ID: <1122165703.3650.144.camel@plope.dyndns.org> On Sat, 2005-07-23 at 20:21 -0400, Phillip J. Eby wrote: > At 08:08 PM 7/23/2005 -0400, Chris McDonough wrote: > >Would you maybe rather make it more explicit that some apps are also > >gateways, e.g.: > > > >[application:bleeb] > >config = bleeb.conf > >factory = bleeb.factory > > > >[filter:blaz] > >config = blaz.conf > >factory = blaz.factory > > That looks backwards to me. Why not just list the sections in pipeline > order? i.e., outermost middleware first, and the final application last? > > For that matter, if you did that, you could specify the above as: > > [blaz.factory] > config=blaz.conf > > [bleeb.factory] > config=bleeb.conf Guess that would work for me, but out of the box, ConfigParser doesn't appear to preserve section ordering. I'm sure we could make it do that. Not a dealbreaker either, but if you ever did want a way to declaratively configure something in the config file like the generic "decision middleware" I described in that message, this wouldn't really work. I hadn't described it yet, but I can also imagine declaring multiple pipelines in the config file and using decision middleware to choose the first app in the next pipeline (as opposed to just an app). 
- C From ianb at colorstudy.com Sun Jul 24 03:01:25 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sat, 23 Jul 2005 20:01:25 -0500 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1122163683.3650.132.camel@plope.dyndns.org> References: <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> <42E17279.5040104@colorstudy.com> <1122163683.3650.132.camel@plope.dyndns.org> Message-ID: <42E2E865.2020702@colorstudy.com> Chris McDonough wrote: > On Fri, 2005-07-22 at 17:26 -0500, Ian Bicking wrote: >>> To do this, we use a ConfigParser-format config file named >>> 'myapplication.conf' that looks like this:: >>> >>> [application:sample1] >>> config = sample1.conf >>> factory = wsgiconfig.tests.sample_components.factory1 >>> >>> [application:sample2] >>> config = sample2.conf >>> factory = wsgiconfig.tests.sample_components.factory2 >>> >>> [pipeline] >>> apps = sample1 sample2 >> >>I think it's confusing to call both these applications. I think >>"middleware" or "filter" would be better. I think people understand >>"filter" far better, so I'm inclined to use that. So... > > > The reason I called them applications instead of filters is because all > of them implement the WSGI "application" API (they all implement "a > callable that accepts two parameters, environ and start_response"). > Some happen to be gateways/filters/middleware/whatever but at least one > is just an application and does no delegation. In my example above, > "sample2" is not a filter, it is the end-point application. "sample1" > is a filter, but it's of course also an application too. Well, the difference I see is that a filter accepts a next-application, where a plain application does not. From the perspective of this configuration file, those seem ver different. In fact, it could actually be: [application:sample1] config = sample1.conf factory = ... ... [application:real_sample1] pipeline = printdebug_app sample1 That is, a "pipeline" simply describes a new application. And then -- perhaps with a conventional name, or through some more global configuration -- we indicate which application we are going to serve. Hmm... thinking about it, this seems much more general, in a very useful way, since anyone can plugin in ways to compose applications. "pipeline" is just one use case for how to compose applications. > Would you maybe rather make it more explicit that some apps are also > gateways, e.g.: > > [application:bleeb] > config = bleeb.conf > factory = bleeb.factory > > [filter:blaz] > config = blaz.conf > factory = blaz.factory > > ? I don't know that there's any way we could make use of the > distinction between the two types in the configurator other than > disallowing people to place an application "before" a filter in a > pipeline through validation. Is there something else you had in mind? I have forgotten what the actual factory interface was, but I think it should be different for the two. Well, I think it *is* different, and passing in a next-application of None just covers up that difference. >>[application:sample2] >># What is this relative to? I hate both absolute paths and >># paths relative to pwd equally... >>config = sample1.conf >>factory = wsgiconfig... > > > This was from a doctest I wrote so I could rely on relative paths, > sorry. You're right. Ummmm... 
we could probably cause use the > environment as "defaults" to ConfigParser inerpolation and set whatever > we need before the configurator is run: > > $ export APP_ROOT=/home/chrism/myapplication > $ ./wsgi-configurator.py myapplication.conf > > And in myapplication.conf: > > [application:sample1] > config = %(APP_ROOT)s/sample1.conf > factory = myapp.sample1.factory I hate %(APP_ROOT)s as a syntax; I think it's okay to simply say that the configuration loader (in some fashion) should determine the root (maybe with an environmental variable or command line parameter). Though, realistically, there might be several app roots. Apache's root directory configuration (for relative paths) isn't very useful to me, in practice, because it's not flexible enough nor allow more than one root. >>But this is reasonably easy to resolve -- there's a perfectly good >>configuration section sitting there, waiting to be used: >> >> [filter:profile] >> factory = paste.profilemiddleware.ProfileMiddleware >> # Show top 50 functions: >> limit = 50 >> >>This in no way precludes 'config', which is just a special case of this >>general configuration. The only real problem is a possible conflict if >>we wanted to add new special names to the configuration, i.e., >>meta-filter-configuration. > > > I think I'd maybe rather see configuration settings for apps that don't > require much configuration to come in as environment variables (maybe > not necessarily in the "environ" namespace that is implied by the WSGI > callable interface but instead in os.environ). Envvars are > uncontroversial, so they don't cost us any coding time, PEP time, or > brain cycles. Yikes! Were you like the ZConfig holdout or something? os.environ is way, way, way too inflexible. Just the other day I was able to deploy a single application I wrote with two configurations in the same process, without having thought about that possibility ahead of time, and without doing any extra work or avoiding any particular shortcuts. It worked absolutely seamlessly, because I wasn't using any global variables, and I had stuck to a convention where Paste nests configurations in a safe manner. os.environ is very global, very hard to work with from a UI perspective, and very invisible. These configuration files should be totally encapsulated, and easy to nest. There's a small number of places where I might be open to using environmental variables as an *optional* way to feed information, like APP_ROOT (but even there I feel strongly there should be a configuration-file-based way to say the same thing). For middleware configuration it makes no sense at all -- configuration must be encapsulated in the file itself (or the files that are referenced). >>Another thing this could allow is recursive configuration, like: >> >>[application:urlmap] >>factory = paste.urlmap.URLMapBuilder >>app1 = blog >>app1.url = / >>app2 = statview >>app2.url = /stats >>app3 = cms >>app3.host = dev.* >> >>[application:blog] >>factory = leonardo.wsgifactory >>config = myblog.conf >> >>[application:statview] >>factory = statview >>log_location = /var/logs/apache2 >> >>[application:cms] >>factory = proxy >>location = http://localhost:8080 >>map = / /cms.php >> >>[pipeline] >>app = urlmap >> >> >>So URLMapBuilder needs the entire configuration file passed in, along >>with the name of the section it is building. It then reads some keys, >>and builds some named applications, and creates an application that >>delegates based on patterns. 
That's the kind of configuration file I >>could really use. > > > Maybe one other (less flexible, but declaratively configurable and > simpler to code) way to do this might be by canonizing the idea of > "decision middleware", allowing one component in an otherwise static > pipeline to decide which is the "next" one by executing a Python > expression which runs in a context that exposes the WSGI environment. > > [application:blog] > factory = leonardo.wsgifactory > config = myblog.conf > > [application:statview] > factory = statview > > [application:cms] > factory = proxy > > [decision:urlmapper] > cms = environ['PATH_INFO'].startswith('/cms') > statview = environ['PATH_INFO'].startswith('/statview') > blog = environ['PATH_INFO'].startswith('/blog') Well, that's hard to imagine working. First, you'd need a way to import new functions, since a large number of use cases can't be handled without imports (like re). But even then, these transformations typically modify the environment. For instance, if you map /cms to an application, you have to put /cms onto SCRIPT_NAME, and take it off of PATH_INFO. This keeps URL introspection sane. But the example I gave seems just as declarative to me (moreso, even), and not hard to implement. It just requires that the factory get a reference to the full parsed configuration file. > [environment] > statview.log_location = /var/logs/apache2 > cms.location = http://localhost:8080 > cms.map = / /cms.php > > [pipeline] > apps = urlmapper > Yes. OTOH, when a certain level of dynamicism is reached, it's no > longer possible to configure things declaratively because it becomes a > programming task, and this proposal is (so far) just about being able to > configure things declaratively so I think we need some sort of > compromise. > > >>I think this can be achieved simply by defining a standard based on the >>object interface, where the configuration file itself is a reference >>implementation (that we expect people will usually use). Semantics from >>the configuration file will leak through, but it's lot easier to deal >>with (for example) a system that can only support string configuration >>values, than a system based on concrete files in a specific format. > > > Sorry, I can't parse that paragraph. I mean that a standard should be in terms of what interface the factories must implement, and what objects they are given. The actual implementation of a loader based on an INI configuration file is a useful reference library (and maybe the only library we need), but shouldn't be part of the standard. >>> - If elements in the pipeline depend on "services" (ala >>> Paste-as-not-a-chain-of-middleware-components), it may be >>> advantageous to create a "service manager" instead of deploying >>> each service as middleware. The "service manager" idea is not a >>> part of the deployment spec. The service manager would itself >>> likely be implemented as a piece of middleware or perhaps just a >>> library. >> >>That might be best. It's also quite possible for the factory to >>instantiate more middleware. > > > Which factory? The object referenced by the "factory" key in the configuration file. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Sun Jul 24 03:57:13 2005 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sat, 23 Jul 2005 21:57:13 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1122165703.3650.144.camel@plope.dyndns.org> References: <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com> <42E17279.5040104@colorstudy.com> <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> <42E17279.5040104@colorstudy.com> <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050723212408.02878df0@mail.telecommunity.com> At 08:41 PM 7/23/2005 -0400, Chris McDonough wrote: >On Sat, 2005-07-23 at 20:21 -0400, Phillip J. Eby wrote: > > At 08:08 PM 7/23/2005 -0400, Chris McDonough wrote: > > >Would you maybe rather make it more explicit that some apps are also > > >gateways, e.g.: > > > > > >[application:bleeb] > > >config = bleeb.conf > > >factory = bleeb.factory > > > > > >[filter:blaz] > > >config = blaz.conf > > >factory = blaz.factory > > > > That looks backwards to me. Why not just list the sections in pipeline > > order? i.e., outermost middleware first, and the final application last? > > > > For that matter, if you did that, you could specify the above as: > > > > [blaz.factory] > > config=blaz.conf > > > > [bleeb.factory] > > config=bleeb.conf > >Guess that would work for me, but out of the box, ConfigParser doesn't >appear to preserve section ordering. I'm sure we could make it do that. >Not a dealbreaker either, but if you ever did want a way to >declaratively configure something in the config file like the generic >"decision middleware" I described in that message, this wouldn't really >work. I hadn't described it yet, but I can also imagine declaring >multiple pipelines in the config file and using decision middleware to >choose the first app in the next pipeline (as opposed to just an app). I consider this a YAGNI, myself. But then again, most of the pipeline stuff seems like a YAGNI to me. Probably that's because everything you guys are talking about implementing with pipelines of middleware, I'd use a single generic function for. If I was wrapping oblivious or legacy apps, I'd just make one middleware object that then calls the generic function to do any and all dynamic requirements, because it would only take a little bit of syntax sugar to implement "configuration" scripts like: use_auth("/some/subdir", some_auth_service) mount_app("/other/path", some_app_object) etc. So, all the time spent on coming up with an uglier, less-powerful pseudo-framework to simulate these capabilities using crude .ini files and poking stuff into environ seems kind of wasteful to me, versus defining a powerful API to -- dare I say it -- "paste" applications together. :) However, such an API deserves to be both powerful and easy-to-use, not kludged together with .ini syntax. That's not saying I don't think WSGI should have a deployment configuration format based on .ini syntax -- I still do! I just don't think it should even attempt to allow anything complex. A simple static pipeline and some server-defined and WSGI-defined options will do nicely for the "simple things are simple" case, and a Python file will do nicely for all the "complex things are possible" cases. That's why I'd like to see this effort split into two parts: 1) simple deployment, and 2) a "pasting" API whose entire purpose in life is to stack, route, and multiplex "middleware" and "applications" without having to explicitly manage a pipeline. 
This API would use *specificity* as a basis for establishing pipelines, because it's not at all scalable (developer-wise) to set up pipelines on a URL-by-URL basis for a complex application -- especially for applications that aren't page-based! Usually, you'll need some kind of pipeline inheritance to manage that sort of thing. There is little reason, however, why you can't configure a significant portion of a URL space using a single WSGI component, using an appropriate mechanism. For example, recasting my earlier example: def factory(container): container.use_auth("some/subdir", some_auth_service) container.mount_app_factory("other/path", some_app_factory) Then, the 'mount_app_factory()' call could invoke 'some_app_factory(subcontainer)' where 'subcontainer' is a wrapper that prepends 'other/path' to URLs before delegating to 'container'. In other words, once you have this "container API", there's no reason not to just use it to implement the whole stack in a single middleware object. Anyway, this is why I think there should be a "WSGI Services" and/or "WSGI Container API" spec, distinct from a "WSGI Deployment Metadata" spec. These two spheres are both valuable, but I think it'll take longer to get a "deployment" spec if we mix "container API" stuff into it -- and get a much less useful container API than if we set our minds on making a good container API, rather than a souped-up deployment descriptor. From mo.babaei at gmail.com Sun Jul 24 07:02:20 2005 From: mo.babaei at gmail.com (mohammad babaei) Date: Sun, 24 Jul 2005 09:32:20 +0430 Subject: [Web-SIG] change "?" into "/" in url Message-ID: <5bf3a41f050723220239b0eacb@mail.gmail.com> Hi, how can i change "?" into "/" in urls ? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/web-sig/attachments/20050724/511c0409/attachment.htm From jonathan at carnageblender.com Sun Jul 24 07:06:00 2005 From: jonathan at carnageblender.com (Jonathan Ellis) Date: Sat, 23 Jul 2005 22:06:00 -0700 Subject: [Web-SIG] change "?" into "/" in url In-Reply-To: <5bf3a41f050723220239b0eacb@mail.gmail.com> References: <5bf3a41f050723220239b0eacb@mail.gmail.com> Message-ID: <1122181560.12090.239102466@webmail.messagingengine.com> On Sun, 24 Jul 2005 09:32:20 +0430, "mohammad babaei" said: > Hi, > how can i change "?" into "/" in urls ? It's quite platform-dependent... if Apache is an option, mod_rewrite is your friend. Well, okay, mod_rewrite isn't really friendly even on a good day, but it's a common solution. :) -Jonathan From chrism at plope.com Sun Jul 24 09:38:40 2005 From: chrism at plope.com (Chris McDonough) Date: Sun, 24 Jul 2005 03:38:40 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <5.1.1.6.0.20050723212408.02878df0@mail.telecommunity.com> References: <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com> <42E17279.5040104@colorstudy.com> <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> <42E17279.5040104@colorstudy.com> <5.1.1.6.0.20050723201631.027d8628@mail.telecommunity.com> <5.1.1.6.0.20050723212408.02878df0@mail.telecommunity.com> Message-ID: <1122190720.3650.186.camel@plope.dyndns.org> On Sat, 2005-07-23 at 21:57 -0400, Phillip J. 
Eby wrote: > > > For that matter, if you did that, you could specify the above as: > > > > > > [blaz.factory] > > > config=blaz.conf > > > > > > [bleeb.factory] > > > config=bleeb.conf > > > >Guess that would work for me, but out of the box, ConfigParser doesn't > >appear to preserve section ordering. I'm sure we could make it do that. > >Not a dealbreaker either, but if you ever did want a way to > >declaratively configure something in the config file like the generic > >"decision middleware" I described in that message, this wouldn't really > >work. I hadn't described it yet, but I can also imagine declaring > >multiple pipelines in the config file and using decision middleware to > >choose the first app in the next pipeline (as opposed to just an app). > > I consider this a YAGNI, myself. But then again, most of the pipeline > stuff seems like a YAGNI to me. > > Probably that's because everything you guys are talking about implementing > with pipelines of middleware, I'd use a single generic function for. FWIW, I think I fall somewhere between you and Ian on this, and maybe more towards you. I believe that there are services that are usefully composed as middleware ("oblivious" things like XSL renderering and caches). But sessioning and auth services and whatnot I wouldn't put into middleware. Instead, I'd use some service library that would have a much nicer configuration API. But none of that should really be described within the deployment spec, so I haven't done so. I'm trying to be sensitive of Ian's desire to use middleware for all kinds of services. I also do think there is a place for middleware, so it's useful to be able to compose pipelines declaratively even if they are terribly simple. OTOH, if I set up an actual deployment for a customer, it would rarely consist of more than one or two gateways and then the application and many times it would just be the application if I had no need for "oblivious" middleware apps in the pipeline. Anyway, back to the nitty gritty of config, I'd rather just use ConfigParser "as is" right now than to come up with another .ini parser that preserves section ordering, thus the non-dependence on ordering within the deployment file. > If I > was wrapping oblivious or legacy apps, I'd just make one middleware object > that then calls the generic function to do any and all dynamic > requirements, because it would only take a little bit of syntax sugar to > implement "configuration" scripts like: > > use_auth("/some/subdir", some_auth_service) > mount_app("/other/path", some_app_object) > > etc. So, all the time spent on coming up with an uglier, less-powerful > pseudo-framework to simulate these capabilities using crude .ini files and > poking stuff into environ seems kind of wasteful to me, versus defining a > powerful API to -- dare I say it -- "paste" applications together. :) > > However, such an API deserves to be both powerful and easy-to-use, not > kludged together with .ini syntax. I agree. > That's not saying I don't think WSGI should have a deployment configuration > format based on .ini syntax -- I still do! I just don't think it should > even attempt to allow anything complex. A simple static pipeline and some > server-defined and WSGI-defined options will do nicely for the "simple > things are simple" case, and a Python file will do nicely for all the > "complex things are possible" cases. That's fine by me. 
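Chris's preference here -- stock ConfigParser plus an explicit [pipeline] section, so nothing hinges on section order -- can be sketched as a small loader. The section layout follows the examples in this thread; the factory calling convention (an options dict, plus the next application for middleware) is an assumption, not anything settled:

    import ConfigParser  # Python 2.x stdlib, as of this thread

    def _resolve(dotted_name):
        module_name, attr = dotted_name.rsplit('.', 1)
        return getattr(__import__(module_name, {}, {}, [attr]), attr)

    def load_pipeline(conf_path):
        parser = ConfigParser.ConfigParser()
        parser.read([conf_path])
        names = parser.get('pipeline', 'apps').split()   # e.g. "foo bar"
        app = None
        for name in reversed(names):                     # endpoint first, wrap outward
            section = 'application:%s' % name
            factory = _resolve(parser.get(section, 'factory'))
            options = dict(parser.items(section))
            options.pop('factory', None)                 # not configuration itself
            if app is None:
                app = factory(options)                   # the endpoint application
            else:
                app = factory(options, app)              # middleware gets the next app
        return app
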
> That's why I'd like to see this effort split into two parts: 1) simple > deployment, and 2) a "pasting" API whose entire purpose in life is to > stack, route, and multiplex "middleware" and "applications" without having > to explicitly manage a pipeline. > > This API would use *specificity* as a basis for establishing pipelines, > because it's not at all scalable (developer-wise) to set up pipelines on a > URL-by-URL basis for a complex application -- especially for applications > that aren't page-based! Usually, you'll need some kind of pipeline > inheritance to manage that sort of thing. > > There is little reason, however, why you can't configure a significant > portion of a URL space using a single WSGI component, using an appropriate > mechanism. For example, recasting my earlier example: > > def factory(container): > container.use_auth("some/subdir", some_auth_service) > container.mount_app_factory("other/path", some_app_factory) Yes. I hadn't thought about managing service context based on containment like this (and I like that), but to me, this is a services registration all the same. > Then, the 'mount_app_factory()' call could invoke > 'some_app_factory(subcontainer)' where 'subcontainer' is a wrapper that > prepends 'other/path' to URLs before delegating to 'container'. > > In other words, once you have this "container API", there's no reason not > to just use it to implement the whole stack in a single middleware object. I'd agree. I'd only like to use the deployment spec to compose a pipeline out of very simple oblivious middleware apps and a single endpoint app. > Anyway, this is why I think there should be a "WSGI Services" and/or "WSGI > Container API" spec, distinct from a "WSGI Deployment Metadata" > spec. These two spheres are both valuable, but I think it'll take longer > to get a "deployment" spec if we mix "container API" stuff into it -- and > get a much less useful container API than if we set our minds on making a > good container API, rather than a souped-up deployment descriptor. +1. This is the main reason that I'm trying to resist putting arbitrarily complex configuration into the deployment file. I don't think there's anything about the proposal I sent over the other day that advocates complexity in the config format. As far as I'm concerned, there isn't much configuration for middleware, and when there is, they can use envvars or a separate config file. Most of the more complex configuration I'd tend to do via a services library. - C From chrism at plope.com Sun Jul 24 10:05:43 2005 From: chrism at plope.com (Chris McDonough) Date: Sun, 24 Jul 2005 04:05:43 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <42E2E865.2020702@colorstudy.com> References: <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> <42E17279.5040104@colorstudy.com> <1122163683.3650.132.camel@plope.dyndns.org> <42E2E865.2020702@colorstudy.com> Message-ID: <1122192343.3650.203.camel@plope.dyndns.org> Thanks for the response... I'm not going to respond point-by-point here because probably nobody has time to read this stuff anyway. But in general: 1) I'm for creating a simple deployment spec that allows you to define static pipelines declaratively. The decision middleware thing is just an idea. I'm not really sure it's even a good idea, but it's a stab at a compromise which would allow for a bit of pipeline dynamicism. 
2) I don't have a strong preference one way or another about what the main config looks like other than it should be simple. So I'd probably be fine with any of: [application:foo] factory = foo.factory config = foo.conf [application:bar] factory = bar.factory config = bar.conf [pipeline] apps = foo bar - OR (assuming we have section ordering and we can live with a single pipeline) - [foo.factory] config = foo.conf [bar.factory] config = bar.conf - OR (if we passed the factory a namespace instead of a filename) - [foo.factory] arbitrarykey1 = arbitraryvalue1 arbitrarykey2 = arbitraryvalue2 [bar.factory] arbitrarykey1 = arbitraryvalue1 arbitrarykey2 = arbitraryvalue2 (Forget my ramblings about os.environ. You're right. It all comes out the same.) 3) I don't have a strong opinion on whether middleware and endpoint apps should be treated differently in the config file. If we used section ordering in configparser to imply the pipeline, I'd suspect they wouldn't be. So where does that leave us? - C On Sat, 2005-07-23 at 20:01 -0500, Ian Bicking wrote: > Chris McDonough wrote: > > On Fri, 2005-07-22 at 17:26 -0500, Ian Bicking wrote: > >>> To do this, we use a ConfigParser-format config file named > >>> 'myapplication.conf' that looks like this:: > >>> > >>> [application:sample1] > >>> config = sample1.conf > >>> factory = wsgiconfig.tests.sample_components.factory1 > >>> > >>> [application:sample2] > >>> config = sample2.conf > >>> factory = wsgiconfig.tests.sample_components.factory2 > >>> > >>> [pipeline] > >>> apps = sample1 sample2 > >> > >>I think it's confusing to call both these applications. I think > >>"middleware" or "filter" would be better. I think people understand > >>"filter" far better, so I'm inclined to use that. So... > > > > > > The reason I called them applications instead of filters is because all > > of them implement the WSGI "application" API (they all implement "a > > callable that accepts two parameters, environ and start_response"). > > Some happen to be gateways/filters/middleware/whatever but at least one > > is just an application and does no delegation. In my example above, > > "sample2" is not a filter, it is the end-point application. "sample1" > > is a filter, but it's of course also an application too. > > Well, the difference I see is that a filter accepts a next-application, > where a plain application does not. From the perspective of this > configuration file, those seem ver different. In fact, it could > actually be: > > [application:sample1] > config = sample1.conf > factory = ... > > ... > > [application:real_sample1] > pipeline = printdebug_app sample1 > > That is, a "pipeline" simply describes a new application. And then -- > perhaps with a conventional name, or through some more global > configuration -- we indicate which application we are going to serve. > > Hmm... thinking about it, this seems much more general, in a very useful > way, since anyone can plugin in ways to compose applications. > "pipeline" is just one use case for how to compose applications. > > > Would you maybe rather make it more explicit that some apps are also > > gateways, e.g.: > > > > [application:bleeb] > > config = bleeb.conf > > factory = bleeb.factory > > > > [filter:blaz] > > config = blaz.conf > > factory = blaz.factory > > > > ? I don't know that there's any way we could make use of the > > distinction between the two types in the configurator other than > > disallowing people to place an application "before" a filter in a > > pipeline through validation. 
Is there something else you had in mind? > > I have forgotten what the actual factory interface was, but I think it > should be different for the two. Well, I think it *is* different, and > passing in a next-application of None just covers up that difference. > > >>[application:sample2] > >># What is this relative to? I hate both absolute paths and > >># paths relative to pwd equally... > >>config = sample1.conf > >>factory = wsgiconfig... > > > > > > This was from a doctest I wrote so I could rely on relative paths, > > sorry. You're right. Ummmm... we could probably cause use the > > environment as "defaults" to ConfigParser inerpolation and set whatever > > we need before the configurator is run: > > > > $ export APP_ROOT=/home/chrism/myapplication > > $ ./wsgi-configurator.py myapplication.conf > > > > And in myapplication.conf: > > > > [application:sample1] > > config = %(APP_ROOT)s/sample1.conf > > factory = myapp.sample1.factory > > I hate %(APP_ROOT)s as a syntax; I think it's okay to simply say that > the configuration loader (in some fashion) should determine the root > (maybe with an environmental variable or command line parameter). > > Though, realistically, there might be several app roots. Apache's root > directory configuration (for relative paths) isn't very useful to me, in > practice, because it's not flexible enough nor allow more than one root. > > >>But this is reasonably easy to resolve -- there's a perfectly good > >>configuration section sitting there, waiting to be used: > >> > >> [filter:profile] > >> factory = paste.profilemiddleware.ProfileMiddleware > >> # Show top 50 functions: > >> limit = 50 > >> > >>This in no way precludes 'config', which is just a special case of this > >>general configuration. The only real problem is a possible conflict if > >>we wanted to add new special names to the configuration, i.e., > >>meta-filter-configuration. > > > > > > I think I'd maybe rather see configuration settings for apps that don't > > require much configuration to come in as environment variables (maybe > > not necessarily in the "environ" namespace that is implied by the WSGI > > callable interface but instead in os.environ). Envvars are > > uncontroversial, so they don't cost us any coding time, PEP time, or > > brain cycles. > > Yikes! Were you like the ZConfig holdout or something? os.environ is > way, way, way too inflexible. > > Just the other day I was able to deploy a single application I wrote > with two configurations in the same process, without having thought > about that possibility ahead of time, and without doing any extra work > or avoiding any particular shortcuts. It worked absolutely seamlessly, > because I wasn't using any global variables, and I had stuck to a > convention where Paste nests configurations in a safe manner. > os.environ is very global, very hard to work with from a UI perspective, > and very invisible. These configuration files should be totally > encapsulated, and easy to nest. > > There's a small number of places where I might be open to using > environmental variables as an *optional* way to feed information, like > APP_ROOT (but even there I feel strongly there should be a > configuration-file-based way to say the same thing). For middleware > configuration it makes no sense at all -- configuration must be > encapsulated in the file itself (or the files that are referenced). 
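For what it's worth, the %(APP_ROOT)s trick Chris describes is just ConfigParser interpolation with the process environment supplied as defaults; whether that is a good idea is exactly what is being argued above. A minimal sketch (the file and section names are the hypothetical ones from the thread):

    import os
    import ConfigParser  # Python 2.x spelling

    # Seed interpolation defaults from the process environment, so
    #   export APP_ROOT=/home/chrism/myapplication
    # makes %(APP_ROOT)s usable in any section.  SafeConfigParser folds the
    # interpolation name through optionxform, so the case difference between
    # APP_ROOT and the stored default key does not matter.
    parser = ConfigParser.SafeConfigParser(defaults=dict(os.environ))
    parser.read(['myapplication.conf'])

    # With a section like
    #   [application:sample1]
    #   config = %(APP_ROOT)s/sample1.conf
    # the value comes back fully expanded:
    config_path = parser.get('application:sample1', 'config')
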
> > >>Another thing this could allow is recursive configuration, like: > >> > >>[application:urlmap] > >>factory = paste.urlmap.URLMapBuilder > >>app1 = blog > >>app1.url = / > >>app2 = statview > >>app2.url = /stats > >>app3 = cms > >>app3.host = dev.* > >> > >>[application:blog] > >>factory = leonardo.wsgifactory > >>config = myblog.conf > >> > >>[application:statview] > >>factory = statview > >>log_location = /var/logs/apache2 > >> > >>[application:cms] > >>factory = proxy > >>location = http://localhost:8080 > >>map = / /cms.php > >> > >>[pipeline] > >>app = urlmap > >> > >> > >>So URLMapBuilder needs the entire configuration file passed in, along > >>with the name of the section it is building. It then reads some keys, > >>and builds some named applications, and creates an application that > >>delegates based on patterns. That's the kind of configuration file I > >>could really use. > > > > > > Maybe one other (less flexible, but declaratively configurable and > > simpler to code) way to do this might be by canonizing the idea of > > "decision middleware", allowing one component in an otherwise static > > pipeline to decide which is the "next" one by executing a Python > > expression which runs in a context that exposes the WSGI environment. > > > > [application:blog] > > factory = leonardo.wsgifactory > > config = myblog.conf > > > > [application:statview] > > factory = statview > > > > [application:cms] > > factory = proxy > > > > [decision:urlmapper] > > cms = environ['PATH_INFO'].startswith('/cms') > > statview = environ['PATH_INFO'].startswith('/statview') > > blog = environ['PATH_INFO'].startswith('/blog') > > Well, that's hard to imagine working. First, you'd need a way to import > new functions, since a large number of use cases can't be handled > without imports (like re). But even then, these transformations > typically modify the environment. For instance, if you map /cms to an > application, you have to put /cms onto SCRIPT_NAME, and take it off of > PATH_INFO. This keeps URL introspection sane. > > But the example I gave seems just as declarative to me (moreso, even), > and not hard to implement. It just requires that the factory get a > reference to the full parsed configuration file. > > > [environment] > > statview.log_location = /var/logs/apache2 > > cms.location = http://localhost:8080 > > cms.map = / /cms.php > > > > [pipeline] > > apps = urlmapper > > > Yes. OTOH, when a certain level of dynamicism is reached, it's no > > longer possible to configure things declaratively because it becomes a > > programming task, and this proposal is (so far) just about being able to > > configure things declaratively so I think we need some sort of > > compromise. > > > > > >>I think this can be achieved simply by defining a standard based on the > >>object interface, where the configuration file itself is a reference > >>implementation (that we expect people will usually use). Semantics from > >>the configuration file will leak through, but it's lot easier to deal > >>with (for example) a system that can only support string configuration > >>values, than a system based on concrete files in a specific format. > > > > > > Sorry, I can't parse that paragraph. > > I mean that a standard should be in terms of what interface the > factories must implement, and what objects they are given. The actual > implementation of a loader based on an INI configuration file is a > useful reference library (and maybe the only library we need), but > shouldn't be part of the standard. 
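Ian's objection above is concrete: a real URL mapper has to move the matched prefix from PATH_INFO onto SCRIPT_NAME, which a bare boolean expression in a [decision:...] section cannot do. A sketch of that shifting (the mapping format and app names are illustrative only):

    def make_url_mapper(mapping, default_app):
        # mapping: list of (prefix, wsgi_app) pairs, e.g. [('/cms', cms_app)]
        def mapper(environ, start_response):
            path = environ.get('PATH_INFO', '')
            for prefix, app in mapping:
                if path == prefix or path.startswith(prefix + '/'):
                    # shift the matched prefix so URL introspection stays sane
                    environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                    environ['PATH_INFO'] = path[len(prefix):]
                    return app(environ, start_response)
            return default_app(environ, start_response)
        return mapper
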
> > >>> - If elements in the pipeline depend on "services" (ala > >>> Paste-as-not-a-chain-of-middleware-components), it may be > >>> advantageous to create a "service manager" instead of deploying > >>> each service as middleware. The "service manager" idea is not a > >>> part of the deployment spec. The service manager would itself > >>> likely be implemented as a piece of middleware or perhaps just a > >>> library. > >> > >>That might be best. It's also quite possible for the factory to > >>instantiate more middleware. > > > > > > Which factory? > > The object referenced by the "factory" key in the configuration file. > From ianb at colorstudy.com Sun Jul 24 11:04:43 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 24 Jul 2005 04:04:43 -0500 Subject: [Web-SIG] Scarecrow deployment config Message-ID: <42E359AB.5010002@colorstudy.com> So maybe here's a deployment spec we can start with. It looks like: [feature1] someapplication.somemodule.some_function [feature2] someapplication.somemodule.some_function2 You can't get dumber than that! There should also be a "no-feature" section; maybe one without a section identifier, or some special section identifier. It goes in the .egg-info directory. This way elsewhere you can say: application = SomeApplication[feature1] And it's quite unambiguous. Note that there is *no* "configuration" in the egg-info file, because you can't put any configuration related to a deployment in an .egg-info directory, because it's not specific to any deployment. Obviously we still need a way to get configuration in there, but lets say that's a different matter. This puts complex middleware construction into the function that is referenced. This function might be, in turn, an import from a framework. Or it might be some complex setup specific to the application. Whatever. The API would look like: wsgiapp = wsgiref.get_egg_application('SomeApplication[feature1]') Which ultimately resolves to: wsgiapp = some_function() get_egg_application could also take a pkg_resources.Distribution object. Open issues? Yep, there's a bunch. This requires the rest of the configuration to be done quite lazily. But I can fit this into source control; it is about *all* I can fit into source control (I can't have any filenames, I can't have any installation-specific pipelines, I can't have any other apps), but it is also enough that the deployment-specific parts can avoid many complexities of pipelining and factories and all that -- presumably the factory functions handle that. I don't think this is useful without the other pieces (both in front of this configuration file and behind it) but maybe we can think about what those other pieces could look like. I'm particularly open to suggestions that some_function() take some arguments, but I don't know what arguments. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Sun Jul 24 17:29:30 2005 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sun, 24 Jul 2005 11:29:30 -0400 Subject: [Web-SIG] Standardized configuration In-Reply-To: <1122192343.3650.203.camel@plope.dyndns.org> References: <42E2E865.2020702@colorstudy.com> <31f07fc30507191108b01ba7d@mail.gmail.com> <1122064687.8446.2.camel@localhost.localdomain> <42E17279.5040104@colorstudy.com> <1122163683.3650.132.camel@plope.dyndns.org> <42E2E865.2020702@colorstudy.com> Message-ID: <5.1.1.6.0.20050724111347.02733ff0@mail.telecommunity.com> At 04:05 AM 7/24/2005 -0400, Chris McDonough wrote: >- OR (if we passed the factory a namespace instead of a filename) - > > [foo.factory] > arbitrarykey1 = arbitraryvalue1 > arbitrarykey2 = arbitraryvalue2 > > [bar.factory] > arbitrarykey1 = arbitraryvalue1 > arbitrarykey2 = arbitraryvalue2 This one's my favorite. I'd say the semantics are that each factory gets passed the key/value pairs as keyword arguments, with a positional argument used to pass in the "next application". The last factory in the file wouldn't get the positional argument. If a section's name has len(sectionName.split())>1, then the second and subsequent words are directives that change the default interpretation of the section, so that we can have things like: [WSGI options] # WSGI options, like required eggs, threading mode, etc. [mod_python options] # mod_python-specific options [some.app object] # this app is an object, not a factory I don't care that ConfigParser doesn't support any of this, because low-level .ini parsers are easy to write and I've previously written two: one for peak.config and one for pkg_resources. And if the implementation can assume pkg_resources is available, it can use the one that's there to do the sequential section-splitting part of the job. I'm not sure of this, but I tend towards thinking that the 'arbitraryvalues' should be Python expressions, rather than raw strings. I also think that we should support a source-encoding comment to allow for localization of Unicode literals, whether we treat values as raw strings or Python expressions. From pje at telecommunity.com Sun Jul 24 18:49:03 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 24 Jul 2005 12:49:03 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <42E359AB.5010002@colorstudy.com> Message-ID: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> [cc:ed to distutils-sig because much of the below is about a new egg feature; follow-ups about the web stuff should stay on web-sig] At 04:04 AM 7/24/2005 -0500, Ian Bicking wrote: >So maybe here's a deployment spec we can start with. It looks like: > > [feature1] > someapplication.somemodule.some_function > > [feature2] > someapplication.somemodule.some_function2 > >You can't get dumber than that! There should also be a "no-feature" >section; maybe one without a section identifier, or some special section >identifier. > >It goes in the .egg-info directory. This way elsewhere you can say: > > application = SomeApplication[feature1] I like this a lot, although for a different purpose than the format Chris and I were talking about. I see this fitting into that format as maybe: [feature1 from SomeApplication] # configuration here >And it's quite unambiguous. Note that there is *no* "configuration" in >the egg-info file, because you can't put any configuration related to a >deployment in an .egg-info directory, because it's not specific to any >deployment. Obviously we still need a way to get configuration in >there, but lets say that's a different matter. 
Easily fixed via what I've been thinking of as the "deployment descriptor"; I would call your proposal here the "import map". Basically, an import map describes a mapping from some sort of feature name to qualified names in the code. I have an extension that I would make, though. Instead of using sections for features, I would use name/value pairs inside of sections named for the kind of import map. E.g.: [wsgi.app_factories] feature1 = somemodule:somefunction feature2 = another.module:SomeClass ... [mime.parsers] application/atom+xml = something:atom_parser ... In other words, feature maps could be a generic mechanism offered by setuptools, with a 'Distribution.load_entry_point(kind,name)' API to retrieve the desired object. That way, we don't end up reinventing this idea for dozens of frameworks or pluggable applications that just need a way to find a few simple entry points into the code. In addition to specifying the entry point, each entry in the import map could optionally list the "extras" that are required if that entry point is used. It could also issue a 'require()' for the corresponding feature if it has any additional requirements listed in the extras_require dictionary. So, I'm thinking that this would be implemented with an entry_points.txt file in .egg-info, but supplied in setup.py like this: setup( ... entry_points = { "wsgi.app_factories": dict( feature1 = "somemodule:somefunction", feature2 = "another.module:SomeClass [extra1,extra2]", ), "mime.parsers": { "application/atom+xml": "something:atom_parser [feedparser]" } }, extras_require = dict( feedparser = [...], extra1 = [...], extra2 = [...], ) ) Anyway, this would make the most common use case for eggs-as-plugins very easy: an application or framework would simply define entry points, and plugin projects would declare the ones they offer in their setup script. I think this is a fantastic idea and I'm about to leap into implementing it. :) >This puts complex middleware construction into the function that is >referenced. This function might be, in turn, an import from a >framework. Or it might be some complex setup specific to the >application. Whatever. > >The API would look like: > > wsgiapp = wsgiref.get_egg_application('SomeApplication[feature1]') > >Which ultimately resolves to: > > wsgiapp = some_function() > >get_egg_application could also take a pkg_resources.Distribution object. Yeah, I'm thinking that this could be implemented as something like: import pkg_resources def get_wsgi_app(project_name, app_name, *args, **kw): dist = pkg_resources.require(project_name)[0] return dist.load_entry_point('wsgi.app_factories', app_name)(*args,**kw) with all the heavy lifting happening in the pkg_resources.Distribution class, along with maybe a new EntryPoint class (to handle parsing entry point specifiers and to do the loading of them. >Open issues? Yep, there's a bunch. This requires the rest of the >configuration to be done quite lazily. Not sure I follow you; the deployment descriptor could contain all the configuration; see the Web-SIG post I made just previous to this one. > But I can fit this into source >control; it is about *all* I can fit into source control (I can't have >any filenames, I can't have any installation-specific pipelines, I can't >have any other apps), but it is also enough that the deployment-specific >parts can avoid many complexities of pipelining and factories and all >that -- presumably the factory functions handle that. +1. 
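Putting the two proposals together: a deployment-descriptor section like [feature1 from SomeApplication] would name an entry point, and the section's remaining options would become the factory's keyword arguments. A speculative sketch using the pkg_resources API described above (the group name, section syntax, and calling convention are all still up in the air at this point in the thread):

    import pkg_resources

    def load_descriptor_section(section_name, options):
        # section_name: e.g. 'feature1 from SomeApplication'
        # options: the section's key/value pairs, minus any reserved keys
        entry_name, project = section_name.split(' from ')
        dist = pkg_resources.get_distribution(project)
        factory = dist.load_entry_point('wsgi.app_factories', entry_name)
        return factory(**options)
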
> I don't think >this is useful without the other pieces (both in front of this >configuration file and behind it) but maybe we can think about what >those other pieces could look like. I'm particularly open to >suggestions that some_function() take some arguments, but I don't know >what arguments. At this point, I think this "entry points" concept weighs in favor of having the deployment descriptor configuration values be Python expressions, meaning that a WSGI application factory would accept keyword arguments that can be whatever you like in order to configure it. However, after more thought, I think that the "next application" argument should be a keyword argument too, like 'wsgi_next' or some such. This would allow a factory to have required arguments in its signature, e.g.: def some_factory(required_arg_x, required_arg_y, optional_arg="foo", ....): ... The problem with my original idea to have the "next app" be a positional argument is that it would prevent non-middleware applications from having any required arguments. Anyway, I think we're now very close to being able to define a useful deployment descriptor format for establishing pipelines and setting options, that leaves open the possibility to do some very sophisticated things. Hm. Interesting thought... we could have a function to read a deployment descriptor (from a string, stream, or filename) and then return the WSGI application object. You could then wrap this in a simple WSGI app that does filesystem-based URL routing to serve up *.wsgi files from a directory. This would let you bootstrap a deployment capability into existing WSGI servers, without them having to add their own support for it! Web servers and frameworks that have some kind of file extension mapping mechanism could do this directly, of course. I can envision putting *.wsgi files in my web directories and then configuring Apache to run them using either mod_python or FastCGI or even as a CGI, just by tweaking local .htaccess files. However, once you have Apache tweaked the way you want, .wsgi files can be just dropped in and edited. Of course, there are still some open design issues, like caching of .wsgi files (e.g. should the file be checked for changes on each hit? I guess that could be a setting under "WSGI options", and would only work if the descriptor parser was given an actual filename to load from.) From ianb at colorstudy.com Sun Jul 24 19:59:20 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 24 Jul 2005 12:59:20 -0500 Subject: [Web-SIG] Scarecrow deployment config In-Reply-To: <42E359AB.5010002@colorstudy.com> References: <42E359AB.5010002@colorstudy.com> Message-ID: <42E3D6F8.1020905@colorstudy.com> Did I say scarecrow? Man it must have been late, I think I meant strawman ;) -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Sun Jul 24 21:12:02 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 24 Jul 2005 14:12:02 -0500 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> Message-ID: <42E3E802.4030500@colorstudy.com> Phillip J. Eby wrote: >> It goes in the .egg-info directory. This way elsewhere you can say: >> >> application = SomeApplication[feature1] > > > I like this a lot, although for a different purpose than the format > Chris and I were talking about. 
Yes, this proposal really just simplifies a part of that application deployment configuration, it doesn't replace it. Though it might make other standardization less important. > I see this fitting into that format as > maybe: > > [feature1 from SomeApplication] > # configuration here > > >> And it's quite unambiguous. Note that there is *no* "configuration" in >> the egg-info file, because you can't put any configuration related to a >> deployment in an .egg-info directory, because it's not specific to any >> deployment. Obviously we still need a way to get configuration in >> there, but lets say that's a different matter. > > > Easily fixed via what I've been thinking of as the "deployment > descriptor"; I would call your proposal here the "import map". > Basically, an import map describes a mapping from some sort of feature > name to qualified names in the code. Yes, it really just gives you a shorthand for the factory configuration variable. > I have an extension that I would make, though. Instead of using > sections for features, I would use name/value pairs inside of sections > named for the kind of import map. E.g.: > > [wsgi.app_factories] > feature1 = somemodule:somefunction > feature2 = another.module:SomeClass > ... > > [mime.parsers] > application/atom+xml = something:atom_parser > ... I assume mime.parsers is just a theoretical example of another kind of service a package can provide? But yes, this seems very reasonable, and even allows for loosely versioned specs (e.g., wsgi.app_factories02, which returns factories with a different interface; or maybe something like foo.configuration_schema, an optional entry point that returns the configuration schema for an application described elsewhere). This kind of addresses the issue where the module structure of a package becomes an often unintentional part of its external interface. It feels a little crude in that respect... but maybe not. Is it worse to do: from package.module import name or: name = require('Package').load_entry_point('service_type', 'name') OK, well clearly the second is worse ;) But if that turned into a single function call: name = load_service('Package', 'service_type', 'name') It's not that bad. Maybe even: name = services['Package:service_type:name'] Though service_type feels extraneous to me. I see the benefit of being explicit about what the factory provides, but I don't see the benefit of separating namespaces; the name should be unambiguous. Well... unless you used the same name to group related services, like the configuration schema and the application factory itself. So maybe I retract that criticism. > In addition to specifying the entry point, each entry in the import map > could optionally list the "extras" that are required if that entry point > is used. > It could also issue a 'require()' for the corresponding feature if it > has any additional requirements listed in the extras_require dictionary. I figured each entry point would just map to a feature, so the extra_require dictionary would already have entries. > So, I'm thinking that this would be implemented with an entry_points.txt > file in .egg-info, but supplied in setup.py like this: > > setup( > ... 
> entry_points = { > "wsgi.app_factories": dict( > feature1 = "somemodule:somefunction", > feature2 = "another.module:SomeClass [extra1,extra2]", > ), > "mime.parsers": { > "application/atom+xml": "something:atom_parser > [feedparser]" > } > }, > extras_require = dict( > feedparser = [...], > extra1 = [...], > extra2 = [...], > ) > ) I think I'd rather just put the canonical version in .egg-info instead of as an argument to setup(); this is one place where using Python expressions isn't a shining example of clarity. But I guess this is fine too; for clarity I'll probably start writing my setup.py files with variable assignments, then a setup() call that just refers to those variables. >> Open issues? Yep, there's a bunch. This requires the rest of the >> configuration to be done quite lazily. > > > Not sure I follow you; the deployment descriptor could contain all the > configuration; see the Web-SIG post I made just previous to this one. Well, when I proposed that the factory be called with zero arguments, that wouldn't allow any configuration to be passed in. >> I don't think >> this is useful without the other pieces (both in front of this >> configuration file and behind it) but maybe we can think about what >> those other pieces could look like. I'm particularly open to >> suggestions that some_function() take some arguments, but I don't know >> what arguments. > > > At this point, I think this "entry points" concept weighs in favor of > having the deployment descriptor configuration values be Python > expressions, meaning that a WSGI application factory would accept > keyword arguments that can be whatever you like in order to configure it. Yes, I'd considered this as well. I'm not a huge fan of Python expressions, because something like "allow_hosts=['127.0.0.1']" seems unnecessarily complex to me. As a convention (maybe not a requirement; a SHOULD) I like if configuration consumers handle strings specially, doing context-sensitive conversion (in this case maybe splitting on ',' or on whitespace). It would make me sad to see a something accept requests from the IP addresses ['1', '2', '7', '.', '0', '.', '0', '.', '1']. This is the small sort of thing that I think makes the experience less pleasant. > However, after more thought, I think that the "next application" > argument should be a keyword argument too, like 'wsgi_next' or some > such. This would allow a factory to have required arguments in its > signature, e.g.: > > def some_factory(required_arg_x, required_arg_y, optional_arg="foo", > ....): > ... > > The problem with my original idea to have the "next app" be a positional > argument is that it would prevent non-middleware applications from > having any required arguments. I think it's fine to declare the next_app keyword argument as special, and promise (by convention) to always pass it in with that name. > Anyway, I think we're now very close to being able to define a useful > deployment descriptor format for establishing pipelines and setting > options, that leaves open the possibility to do some very sophisticated > things. > > Hm. Interesting thought... we could have a function to read a > deployment descriptor (from a string, stream, or filename) and then > return the WSGI application object. You could then wrap this in a > simple WSGI app that does filesystem-based URL routing to serve up > *.wsgi files from a directory. This would let you bootstrap a > deployment capability into existing WSGI servers, without them having to > add their own support for it! 
Web servers and frameworks that have some > kind of file extension mapping mechanism could do this directly, of > course. I can envision putting *.wsgi files in my web directories and > then configuring Apache to run them using either mod_python or FastCGI > or even as a CGI, just by tweaking local .htaccess files. However, once > you have Apache tweaked the way you want, .wsgi files can be just > dropped in and edited. Absolutely; I see no reason WSGI servers should have any dispatching logic in them, except in cases when they also dispatch to non-Python applications (like Apache). So it seems natural that we present deployment as a single application factory that takes zero or one arguments. > Of course, there are still some open design issues, like caching of > .wsgi files (e.g. should the file be checked for changes on each hit? I > guess that could be a setting under "WSGI options", and would only work > if the descriptor parser was given an actual filename to load from.) I don't know what we'd do if we checked the file and found it wasn't up to date. In this particular case I suppose you could reload the configuration file, but if the change in the configuration file reflected a change in the source code, then you're stuck because reloading in Python is so infeasible. I'm all for warnings, but I don't see how we can do the Right Thing here, as much as I wish it were otherwise. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Sun Jul 24 22:42:35 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 24 Jul 2005 16:42:35 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <42E3E802.4030500@colorstudy.com> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com> At 02:12 PM 7/24/2005 -0500, Ian Bicking wrote: >This kind of addresses the issue where the module structure of a package >becomes an often unintentional part of its external interface. It feels a >little crude in that respect... but maybe not. Is it worse to do: > > from package.module import name > >or: > > name = require('Package').load_entry_point('service_type', 'name') > >OK, well clearly the second is worse ;) But if that turned into a single >function call: > > name = load_service('Package', 'service_type', 'name') > >It's not that bad. Maybe even: > > name = services['Package:service_type:name'] The actual API I have implemented in my CVS working copy is: the_object = load_entry_point('Project', 'group', 'name') which seems pretty clean to me. You can also use dist.load_entry_point('group','name') if you already have a distribution object for some reason. (For example, if you use an activation listener to get callbacks when distributions are activated on sys.path.) To introspect an entry point or check for its existence, you can use: entry_point = get_entry_info('Project', 'group', 'name') which returns either None or an EntryPoint object with various attributes. To list the entry points of a group, or to list the groups, you can use: # dictionary of group names to entry map for each kind group_names = get_entry_map('Project') # dictionary of entry names to corresponding EntryPoint object entry_names = get_entry_map('Project', 'group') These are useful for dynamic entry points. >Though service_type feels extraneous to me. 
I see the benefit of being >explicit about what the factory provides, but I don't see the benefit of >separating namespaces; the name should be unambiguous. You're making the assumption that the package author defines the entry point names, but that's not the case for application plugins; the application will define entry point names and group names for the application's use, and some applications will need multiple groups. Groups might be keyed statically (i.e. a known set of entry point names) or dynamically (the keys are used to put things in a table, e.g. a file extension handler table). >>In addition to specifying the entry point, each entry in the import map >>could optionally list the "extras" that are required if that entry point >>is used. >>It could also issue a 'require()' for the corresponding feature if it has >>any additional requirements listed in the extras_require dictionary. > >I figured each entry point would just map to a feature, so the >extra_require dictionary would already have entries. The problem with that is that asking for a feature that's not in extras_require is an InvalidOption error, so this would force you to define entries in extras_require even if you have no extras involved. It would also make for redundancies when entry points share an extra. I also don't expect extras to be used as frequently as entry points. >>So, I'm thinking that this would be implemented with an entry_points.txt >>file in .egg-info, but supplied in setup.py like this: >> setup( >> ... >> entry_points = { >> "wsgi.app_factories": dict( >> feature1 = "somemodule:somefunction", >> feature2 = "another.module:SomeClass [extra1,extra2]", >> ), >> "mime.parsers": { >> "application/atom+xml": "something:atom_parser [feedparser]" >> } >> }, >> extras_require = dict( >> feedparser = [...], >> extra1 = [...], >> extra2 = [...], >> ) >> ) > >I think I'd rather just put the canonical version in .egg-info instead of >as an argument to setup(); this is one place where using Python >expressions isn't a shining example of clarity. But I guess this is fine >too; for clarity I'll probably start writing my setup.py files with >variable assignments, then a setup() call that just refers to those variables. The actual syntax I'm going to end up with is: entry_points = { "wsgi.app_factories": [ "feature1 = somemodule:somefunction", "feature2 = another.module:SomeClass [extra1,extra2]", ] } Which is still not great, but it's a bit simpler. If you only have one entry point, you can use: entry_points = { "wsgi.app_factories": "feature = somemodule:somefunction", } Or you can use a long string for each group: entry_points = { "wsgi.app_factories": """ # define features for blah blah feature1 = somemodule:somefunction feature2 = another.module:SomeClass [extra1,extra2] """ } Or even list everything in one giant string: entry_points = """ [wsgi.app_factories] # define features for blah blah feature1 = somemodule:somefunction feature2 = another.module:SomeClass [extra1,extra2] """ This last format is more readable than the others, I think, but there are likely to be setup scripts that will be generating some of this dynamically, and I'd rather not force them to use strings when lists or dictionaries would be more convenient for their use cases. Anyway, I hope to check in a working implementation with tests later today. 
Currently, the EntryPoint class works, but setuptools doesn't generate the entry_points.txt file yet, and I don't have any tests yet for the entry_points.txt parser or the API functions, although they're already implemented. From pje at telecommunity.com Mon Jul 25 01:20:20 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 24 Jul 2005 19:20:20 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <42E3E802.4030500@colorstudy.com> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050724191150.027de240@mail.telecommunity.com> At 02:12 PM 7/24/2005 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>However, after more thought, I think that the "next application" argument >>should be a keyword argument too, like 'wsgi_next' or some such. This >>would allow a factory to have required arguments in its signature, e.g.: >> def some_factory(required_arg_x, required_arg_y, optional_arg="foo", >> ....): >> ... >>The problem with my original idea to have the "next app" be a positional >>argument is that it would prevent non-middleware applications from having >>any required arguments. > >I think it's fine to declare the next_app keyword argument as special, and >promise (by convention) to always pass it in with that name. Actually, now that we have the "entry points" capability in pkg_resources (I just checked it in), we could simply have middleware components looked up in 'wsgi.middleware_factories' and applications looked up in 'wsgi.application_factories'. If a factory can be used for both, you can always list it in both places. Entry points have 1001 uses... I can imagine applications defining entry point groups for URL namespaces. For example, Trac has URLs like /changesets and /roadmap, and these could be defined via a trac.navigation entry point group, e.g.: [trac.navigation] changesets = some.module:foo roadmap = other.module:bar And then people could easily create plugin projects that add additional navigation components. (Trac already has an internal extension point system to do things rather like this, but entry points are automatically discoverable without any prior knowledge of what modules to import.) There are other frameworks out there (e.g. PyBlosxom), both web and non-web, that could really do nicely with having a standard way to do this kind of thing, rather than having to roll their own. From ianb at colorstudy.com Mon Jul 25 02:26:22 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 24 Jul 2005 19:26:22 -0500 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com> Message-ID: <42E431AE.6070204@colorstudy.com> Phillip J. Eby wrote: > The actual syntax I'm going to end up with is: > > entry_points = { > "wsgi.app_factories": [ > "feature1 = somemodule:somefunction", > "feature2 = another.module:SomeClass [extra1,extra2]", > ] > } That seems weird to put the assignment inside a string, instead of: entry_points = { 'wsgi.app_factories': { 'app': 'somemodule:somefunction', }, } Also, is there any default name? Like for a package that distributes only one application. Or these just different spellings for the same thing? 
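The trac.navigation example is really the general plugin-discovery pattern: the hosting application enumerates whatever entries installed eggs have registered under an agreed group name, without importing anything up front. A sketch of the consuming side (the group and entry names are the hypothetical ones from the message):

    import pkg_resources

    def load_navigation_handlers():
        handlers = {}
        for entry_point in pkg_resources.iter_entry_points('trac.navigation'):
            # entry_point.name is e.g. 'changesets'; load() imports the object
            # named by 'some.module:foo' and returns it
            handlers[entry_point.name] = entry_point.load()
        return handlers
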
-- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From chrism at plope.com Mon Jul 25 02:35:08 2005 From: chrism at plope.com (Chris McDonough) Date: Sun, 24 Jul 2005 20:35:08 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> Message-ID: <1122251708.3650.241.camel@plope.dyndns.org> Sorry, I think I may have lost track of where we were going wrt the deployment spec. Specifically, I don't know how we got to using eggs (which I'd really like to, BTW, they're awesome conceptually!) from where we were in the discussion about configuring a WSGI pipeline. What is a "feature"? What is an "import map"? "Entry point"? Should I just get more familiar with eggs to understand what's being discussed here or did I miss a few posts? On Sun, 2005-07-24 at 12:49 -0400, Phillip J. Eby wrote: > [cc:ed to distutils-sig because much of the below is about a new egg > feature; follow-ups about the web stuff should stay on web-sig] > > At 04:04 AM 7/24/2005 -0500, Ian Bicking wrote: > >So maybe here's a deployment spec we can start with. It looks like: > > > > [feature1] > > someapplication.somemodule.some_function > > > > [feature2] > > someapplication.somemodule.some_function2 > > > >You can't get dumber than that! There should also be a "no-feature" > >section; maybe one without a section identifier, or some special section > >identifier. > > > >It goes in the .egg-info directory. This way elsewhere you can say: > > > > application = SomeApplication[feature1] > > I like this a lot, although for a different purpose than the format Chris > and I were talking about. I see this fitting into that format as maybe: > > [feature1 from SomeApplication] > # configuration here > > > >And it's quite unambiguous. Note that there is *no* "configuration" in > >the egg-info file, because you can't put any configuration related to a > >deployment in an .egg-info directory, because it's not specific to any > >deployment. Obviously we still need a way to get configuration in > >there, but lets say that's a different matter. > > Easily fixed via what I've been thinking of as the "deployment descriptor"; > I would call your proposal here the "import map". Basically, an import map > describes a mapping from some sort of feature name to qualified names in > the code. > > I have an extension that I would make, though. Instead of using sections > for features, I would use name/value pairs inside of sections named for the > kind of import map. E.g.: > > [wsgi.app_factories] > feature1 = somemodule:somefunction > feature2 = another.module:SomeClass > ... > > [mime.parsers] > application/atom+xml = something:atom_parser > ... > > In other words, feature maps could be a generic mechanism offered by > setuptools, with a 'Distribution.load_entry_point(kind,name)' API to > retrieve the desired object. That way, we don't end up reinventing this > idea for dozens of frameworks or pluggable applications that just need a > way to find a few simple entry points into the code. > > In addition to specifying the entry point, each entry in the import map > could optionally list the "extras" that are required if that entry point is > used. > It could also issue a 'require()' for the corresponding feature if it has > any additional requirements listed in the extras_require dictionary. 
> > So, I'm thinking that this would be implemented with an entry_points.txt > file in .egg-info, but supplied in setup.py like this: > > setup( > ... > entry_points = { > "wsgi.app_factories": dict( > feature1 = "somemodule:somefunction", > feature2 = "another.module:SomeClass [extra1,extra2]", > ), > "mime.parsers": { > "application/atom+xml": "something:atom_parser [feedparser]" > } > }, > extras_require = dict( > feedparser = [...], > extra1 = [...], > extra2 = [...], > ) > ) > > Anyway, this would make the most common use case for eggs-as-plugins very > easy: an application or framework would simply define entry points, and > plugin projects would declare the ones they offer in their setup script. > > I think this is a fantastic idea and I'm about to leap into implementing > it. :) > > > >This puts complex middleware construction into the function that is > >referenced. This function might be, in turn, an import from a > >framework. Or it might be some complex setup specific to the > >application. Whatever. > > > >The API would look like: > > > > wsgiapp = wsgiref.get_egg_application('SomeApplication[feature1]') > > > >Which ultimately resolves to: > > > > wsgiapp = some_function() > > > >get_egg_application could also take a pkg_resources.Distribution object. > > Yeah, I'm thinking that this could be implemented as something like: > > import pkg_resources > > def get_wsgi_app(project_name, app_name, *args, **kw): > dist = pkg_resources.require(project_name)[0] > return dist.load_entry_point('wsgi.app_factories', > app_name)(*args,**kw) > > with all the heavy lifting happening in the pkg_resources.Distribution > class, along with maybe a new EntryPoint class (to handle parsing entry > point specifiers and to do the loading of them. > > > >Open issues? Yep, there's a bunch. This requires the rest of the > >configuration to be done quite lazily. > > Not sure I follow you; the deployment descriptor could contain all the > configuration; see the Web-SIG post I made just previous to this one. > > > > But I can fit this into source > >control; it is about *all* I can fit into source control (I can't have > >any filenames, I can't have any installation-specific pipelines, I can't > >have any other apps), but it is also enough that the deployment-specific > >parts can avoid many complexities of pipelining and factories and all > >that -- presumably the factory functions handle that. > > +1. > > > > I don't think > >this is useful without the other pieces (both in front of this > >configuration file and behind it) but maybe we can think about what > >those other pieces could look like. I'm particularly open to > >suggestions that some_function() take some arguments, but I don't know > >what arguments. > > At this point, I think this "entry points" concept weighs in favor of > having the deployment descriptor configuration values be Python > expressions, meaning that a WSGI application factory would accept keyword > arguments that can be whatever you like in order to configure it. > > However, after more thought, I think that the "next application" argument > should be a keyword argument too, like 'wsgi_next' or some such. This > would allow a factory to have required arguments in its signature, e.g.: > > def some_factory(required_arg_x, required_arg_y, optional_arg="foo", > ....): > ... > > The problem with my original idea to have the "next app" be a positional > argument is that it would prevent non-middleware applications from having > any required arguments. 
> > Anyway, I think we're now very close to being able to define a useful > deployment descriptor format for establishing pipelines and setting > options, that leaves open the possibility to do some very sophisticated > things. > > Hm. Interesting thought... we could have a function to read a deployment > descriptor (from a string, stream, or filename) and then return the WSGI > application object. You could then wrap this in a simple WSGI app that > does filesystem-based URL routing to serve up *.wsgi files from a > directory. This would let you bootstrap a deployment capability into > existing WSGI servers, without them having to add their own support for > it! Web servers and frameworks that have some kind of file extension > mapping mechanism could do this directly, of course. I can envision > putting *.wsgi files in my web directories and then configuring Apache to > run them using either mod_python or FastCGI or even as a CGI, just by > tweaking local .htaccess files. However, once you have Apache tweaked the > way you want, .wsgi files can be just dropped in and edited. > > Of course, there are still some open design issues, like caching of .wsgi > files (e.g. should the file be checked for changes on each hit? I guess > that could be a setting under "WSGI options", and would only work if the > descriptor parser was given an actual filename to load from.) > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/chrism%40plope.com > From ianb at colorstudy.com Mon Jul 25 03:49:22 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 24 Jul 2005 20:49:22 -0500 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <1122251708.3650.241.camel@plope.dyndns.org> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <1122251708.3650.241.camel@plope.dyndns.org> Message-ID: <42E44522.3090602@colorstudy.com> Chris McDonough wrote: > Sorry, I think I may have lost track of where we were going wrt the > deployment spec. Specifically, I don't know how we got to using eggs > (which I'd really like to, BTW, they're awesome conceptually!) from > where we were in the discussion about configuring a WSGI pipeline. What > is a "feature"? What is an "import map"? "Entry point"? Should I just > get more familiar with eggs to understand what's being discussed here or > did I miss a few posts? It wouldn't hurt to read up on eggs. It's not obvious how they fit here, and it's taken me a while to figure it out. But specifically: * Eggs are packages. Packages can have optional features. Those features can have additional requirements (external packages) that the base package does not have. Package specifications are spelled like "PackageName>=VERSION_NUMBER[FeatureName]" * Import maps and entry points are new things we're discussing now. They are kind of the same thing; basically an entry point maps a logical specification (like a 'wsgi.app_factory' named 'foo') to a actual import statement. 
That's the configuration file: [wsgi.app_factory] app = mymodule.wsgi:make_app Which means to get an object "app" which fulfills the spec "wsgi.app_factory" you would do "from mymodule.wsgi import make_app" Eggs have an PackageName.egg-info directory, where configuration files can go, and pkg_resources (which is part of setuptools, and associated with easy_install, and defines the require() function) can find and parse them. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Mon Jul 25 04:08:41 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 24 Jul 2005 22:08:41 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <42E431AE.6070204@colorstudy.com> References: <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050724220544.026e50b0@mail.telecommunity.com> At 07:26 PM 7/24/2005 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>The actual syntax I'm going to end up with is: >> entry_points = { >> "wsgi.app_factories": [ >> "feature1 = somemodule:somefunction", >> "feature2 = another.module:SomeClass [extra1,extra2]", >> ] >> } > >That seems weird to put the assignment inside a string, instead of: > >entry_points = { > 'wsgi.app_factories': { > 'app': 'somemodule:somefunction', > }, >} It turned out that EntryPoint objects really want to know their 'name' for ease of use in various APIs, and it also made it really easy to do stuff like "map(EntryPoint.parse, lines)" to get a list of entry points from a list of lines. >Also, is there any default name? Huh? > Like for a package that distributes only one application. Or these > just different spellings for the same thing? I don't understand you. The most minimal way to specify a single entry point in setup() is with: entry_points = """ [groupname.here] entryname = some.thing:here """ From ianb at colorstudy.com Mon Jul 25 04:21:56 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 24 Jul 2005 21:21:56 -0500 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <5.1.1.6.0.20050724220544.026e50b0@mail.telecommunity.com> References: <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724160148.02874f10@mail.telecommunity.com> <5.1.1.6.0.20050724220544.026e50b0@mail.telecommunity.com> Message-ID: <42E44CC4.4000807@colorstudy.com> Phillip J. Eby wrote: >> Like for a package that distributes only one application. Or these >> just different spellings for the same thing? > > > I don't understand you. The most minimal way to specify a single entry > point in setup() is with: > > entry_points = """ > [groupname.here] > entryname = some.thing:here > """ Basically, in the (I think common) case where a package only provides one entry point, do we have to choose an arbitrary entry name. Like, a package that implements one web application; it seems like that application would have to be named. Maybe that name could match the package name, or a fixed name we agree upon, but otherwise it adds another name to the mix. 
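For illustration only (the names here are invented): with a fixed convention such as "main", a one-application package might declare

    entry_points = {'wsgi.app_factories': ['main = blogapp:make_app']}

in its setup script, and a deployment descriptor could then just say

    [main from BlogApp]

without having to invent a separate application name.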
-- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Mon Jul 25 04:24:28 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 24 Jul 2005 22:24:28 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <1122251708.3650.241.camel@plope.dyndns.org> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> At 08:35 PM 7/24/2005 -0400, Chris McDonough wrote: >Sorry, I think I may have lost track of where we were going wrt the >deployment spec. Specifically, I don't know how we got to using eggs >(which I'd really like to, BTW, they're awesome conceptually!) from >where we were in the discussion about configuring a WSGI pipeline. What >is a "feature"? What is an "import map"? "Entry point"? Should I just >get more familiar with eggs to understand what's being discussed here or >did I miss a few posts? I suggest this post as the shortest architectural introduction to the whole egg thang: http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html It explains pretty much all of the terminology I'm currently using, except for the new terms invented today... Entry points are a new concept, invented today by Ian and myself. Ian proposed having a mapping file (which I dubbed an "import map") included in an egg's metadata, and then referring to named entries from a pipeline descriptor, so that you don't have to know or care about the exact name to import. The application or middleware factory name would be looked up in the egg's import map in order to find the actual factory object. I took Ian's proposal and did two things: 1) Generalized the idea to a concept of "entry points". An entry point is a name that corresponds to an import specification, and an optional list of "extras" (see terminology link above) that the entry point may require. Entry point names exist in a namespace called an "entry point group", and I implied that the WSGI deployment spec would define two such groups: wsgi.applications and wsgi.middleware, but a vast number of other possibilities for entry points and groups exist. In fact, I went ahead and implemented them in setuptools today, and realized I could use them to register setup commands with setuptools, making it extensible by any project that registers entry points in a 'distutils.commands' group. 2) I then proposed that we extend our deployment descriptor (.wsgi file) syntax so that you can do things like: [foo from SomeProject] # configuration here What this does is tell the WSGI deployment API to look up the "foo" entry point in either the wsgi.middleware or wsgi.applications entry point group for the named project, according to whether it's the last item in the .wsgi file. It then invokes the factory as before, with the configuration values as keyword arguments. This proposal is of course an *extension*; it should still be possible to use regular dotted names as section headings, if you haven't yet drunk the setuptools kool-aid. But, it makes for interesting possibilities because we could now have a tool that reads a WSGI deployment descriptor and runs easy_install to find and download the right projects. 
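Just to sketch what such a tool might look like -- hypothetical code, nothing like this exists yet -- it only has to pull the requirement out of each "[name from Requirement]" heading and hand it to EasyInstall before the descriptor is actually loaded:

    import os, re

    def install_descriptor_requirements(path):
        # match headings of the form "[entryname from ProjectName >= 1.0]"
        heading = re.compile(r'^\[\s*\S+\s+from\s+(.+?)\s*\]\s*$')
        for line in open(path):
            match = heading.match(line.strip())
            if match:
                # let EasyInstall locate, download and install the project
                os.system('easy_install "%s"' % match.group(1))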
So, you could potentially just write up a descriptor that lists what you want and the server could install it, although I think I personally would want to run a tool explicitly; maybe I'll eventually add a --wsgi=FILENAME option to EasyInstall that would tell it to find out what to install from a WSGI deployment descriptor. That would actually be pretty cool, when you realize it means that all you have to do to get an app deployed across a bunch of web servers is to copy the deployment descriptor and tell 'em to install stuff. You can always create an NFS-mounted cache directory where you put pre-built eggs, and EasyInstall would just fetch and extract them in that case. Whew. Almost makes me wish I was back in my web apps shop, where this kind of thing would've been *really* useful to have. From ianb at colorstudy.com Mon Jul 25 04:33:53 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 24 Jul 2005 21:33:53 -0500 Subject: [Web-SIG] WSGI deployment part 2: factory API Message-ID: <42E44F91.2040703@colorstudy.com> OK, so lets assume we have a way (entry points) to get an object that represents the package's WSGI application, as a factory. What do we do with that factory? That is, how do we make an application out of the factory? Well, it seems rather obvious that we call the factory, so what do we pass? Also, consider that there might be two separate but similar APIs, one for filters and another for applications. We could go free-form, and you call application factories with keyword arguments that are dependent on the application. This serves as configuration. You can call filter factories with keyword arguments, and one special (required?) keyword argument "next_app". Another option is we pass in a single dictionary that represents the entire configuration. This leaves room to add more arguments later, where if we use keyword arguments for configuration then there's really no room at all (the entire signature of the factory is taken up by application-specific configuration). Another part of the API that I can see as useful is passing in the distribution object itself. This way a function in paste (or wherever) could serve as the loader for any application with the proper framework-specific metadata (and so probably this could devolve into per-framework loaders). This would perhaps preclude non-setuptools factories, though you could also pass in None for the distribution for those cases. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Mon Jul 25 04:33:31 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 24 Jul 2005 22:33:31 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <42E44522.3090602@colorstudy.com> References: <1122251708.3650.241.camel@plope.dyndns.org> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <1122251708.3650.241.camel@plope.dyndns.org> Message-ID: <5.1.1.6.0.20050724222434.02865820@mail.telecommunity.com> At 08:49 PM 7/24/2005 -0500, Ian Bicking wrote: >Chris McDonough wrote: >>Sorry, I think I may have lost track of where we were going wrt the >>deployment spec. Specifically, I don't know how we got to using eggs >>(which I'd really like to, BTW, they're awesome conceptually!) from >>where we were in the discussion about configuring a WSGI pipeline. What >>is a "feature"? What is an "import map"? "Entry point"? Should I just >>get more familiar with eggs to understand what's being discussed here or >>did I miss a few posts? 
> >It wouldn't hurt to read up on eggs. It's not obvious how they fit here, >and it's taken me a while to figure it out. But specifically: > >* Eggs are packages. Packages can have optional features. I've taken to using the term "project" to mean a collection of packages, scripts, data files, etc., wrapped with a setup script. In order to avoid confusion with other kinds of "features" and "options", the official term for those things is now "extras". An "extra" is some optional capability of a project that may incur additional requirements. > Those features can have additional requirements (external packages) > that the base package does not have. Package specifications are spelled > like "PackageName>=VERSION_NUMBER[FeatureName]" Actually, it's "ProjectName[extra,...]>=version", and you can list multiple version operators, like "FooBar>1.2,<2.1,==2.6,>3.0" to mean versions between 1.2 and 2.1 exclusive, and anything *after* 3.0, but 2.6 was okay too. :) I'm proposing that for WSGI entry points, we allow everything but the [extras_list] in a section heading, e.g.: [wiki from FooBarWiki>=2.0] would mean what it looks like it does. By the way, all this version parsing, dependency checking, PyPI-finding, auto-download and build from source or binary stuff already exists; it's not a hypothetical pie-in-the-sky proposal. >* Import maps and entry points are new things we're discussing now. They >are kind of the same thing; basically an entry point maps a logical >specification (like a 'wsgi.app_factory' named 'foo') to a actual import >statement. That's the configuration file: > > [wsgi.app_factory] > app = mymodule.wsgi:make_app > >Which means to get an object "app" which fulfills the spec >"wsgi.app_factory" you would do "from mymodule.wsgi import make_app" > >Eggs have an PackageName.egg-info directory, where configuration files can >go, and pkg_resources (which is part of setuptools, and associated with >easy_install, and defines the require() function) can find and parse them. Yes, and with the CVS HEAD version of setuptools you can now specify a project's entry point map in it setup script, and it will generate the entry point file in the project's .egg-info directory, and parse it at runtime when you request lookup of an entry point. There's an API in pkg_resources that lets you do: factory = load_entry_point("ProjectName", "wsgi.app_factory", "app") which will do the same as if you had said "from mymodule.wsgi import make_app as factory". From pje at telecommunity.com Mon Jul 25 05:06:32 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 24 Jul 2005 23:06:32 -0400 Subject: [Web-SIG] WSGI deployment part 2: factory API In-Reply-To: <42E44F91.2040703@colorstudy.com> Message-ID: <5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com> At 09:33 PM 7/24/2005 -0500, Ian Bicking wrote: >We could go free-form, and you call application factories with keyword >arguments that are dependent on the application. This serves as >configuration. You can call filter factories with keyword arguments, >and one special (required?) keyword argument "next_app". I think we can just go positional on the next-app argument, since only filter factories can do anything with it. >Another option is we pass in a single dictionary that represents the >entire configuration. This leaves room to add more arguments later, >where if we use keyword arguments for configuration then there's really >no room at all (the entire signature of the factory is taken up by >application-specific configuration). 
YAGNI; We don't have any place for this theoretical extra configuration to come from, and no use cases that can't be met by just adding it to the configuration. Early error trapping is important, so I think it's better to let factories use normal Python argument validation to have required arguments, optional values, and to reject unrecognized arguments. >Another part of the API that I can see as useful is passing in the >distribution object itself. Which distribution? The one the entry point came from? It already knows (or can find out) what distribution it's in. > This way a function in paste (or wherever) >could serve as the loader for any application with the proper >framework-specific metadata (and so probably this could devolve into >per-framework loaders). I don't understand. > This would perhaps preclude non-setuptools >factories, though you could also pass in None for the distribution for >those cases. Huh? I propose that we allow import specs as factory designators, so that the default case works fine without setuptools. You only need setuptools if you use factory specs of the form "[feature from Project...]". Of course, they're so cool that everybody will *want* to use them... ;) From ianb at colorstudy.com Mon Jul 25 05:26:29 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 24 Jul 2005 22:26:29 -0500 Subject: [Web-SIG] WSGI deployment part 2: factory API In-Reply-To: <5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com> References: <5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com> Message-ID: <42E45BE5.9050807@colorstudy.com> Phillip J. Eby wrote: >> Another option is we pass in a single dictionary that represents the >> entire configuration. This leaves room to add more arguments later, >> where if we use keyword arguments for configuration then there's really >> no room at all (the entire signature of the factory is taken up by >> application-specific configuration). > > > YAGNI; We don't have any place for this theoretical extra configuration > to come from, and no use cases that can't be met by just adding it to > the configuration. Early error trapping is important, so I think it's > better to let factories use normal Python argument validation to have > required arguments, optional values, and to reject unrecognized arguments. I think in practice I'll always take **kw, because I otherwise I'd have to enumerate all the configuration all the middleware takes, and that's impractical. I suppose I could later assemble the middleware, determine what configuration the actual set of middleware+application takes, then check for extras. But I doubt I will. And even if I do, it's incidental -- I'm quite sure I won't use using the function signature for parameter checking. >> Another part of the API that I can see as useful is passing in the >> distribution object itself. > > > Which distribution? The one the entry point came from? It already > knows (or can find out) what distribution it's in. I mean like: [wsgi.app_factory] filebrowser = paste.wareweb:make_app Where paste.wareweb.make_app knows how to build an application from filename conventions in the package itself, even though the paste.wareweb module isn't in the project itself. 
-- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From chrism at plope.com Mon Jul 25 08:33:43 2005 From: chrism at plope.com (Chris McDonough) Date: Mon, 25 Jul 2005 02:33:43 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> Message-ID: <1122273223.8767.18.camel@localhost.localdomain> Thanks... I'm still confused about high level requirements so please try to be patient with me as I try get back on track. These are the requirements as I understand them: 1. We want to be able to distribute WSGI applications and middleware (presumably in a format supported by setuptools). 3. We want to be able to configure a WSGI application in order to create an application instance. 2. We want a way to combine configured instances of those applications into pipelines and start an "instance" of a pipeline. Are these requirements the ones being discussed? If so, which of the config file formats we've been discussing matches which requirement? Thanks, - C On Sun, 2005-07-24 at 22:24 -0400, Phillip J. Eby wrote: > At 08:35 PM 7/24/2005 -0400, Chris McDonough wrote: > >Sorry, I think I may have lost track of where we were going wrt the > >deployment spec. Specifically, I don't know how we got to using eggs > >(which I'd really like to, BTW, they're awesome conceptually!) from > >where we were in the discussion about configuring a WSGI pipeline. What > >is a "feature"? What is an "import map"? "Entry point"? Should I just > >get more familiar with eggs to understand what's being discussed here or > >did I miss a few posts? > > I suggest this post as the shortest architectural introduction to the whole > egg thang: > > http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html > > It explains pretty much all of the terminology I'm currently using, except > for the new terms invented today... > > Entry points are a new concept, invented today by Ian and myself. Ian > proposed having a mapping file (which I dubbed an "import map") included in > an egg's metadata, and then referring to named entries from a pipeline > descriptor, so that you don't have to know or care about the exact name to > import. The application or middleware factory name would be looked up in > the egg's import map in order to find the actual factory object. > > I took Ian's proposal and did two things: > > 1) Generalized the idea to a concept of "entry points". An entry point is > a name that corresponds to an import specification, and an optional list of > "extras" (see terminology link above) that the entry point may > require. Entry point names exist in a namespace called an "entry point > group", and I implied that the WSGI deployment spec would define two such > groups: wsgi.applications and wsgi.middleware, but a vast number of other > possibilities for entry points and groups exist. In fact, I went ahead and > implemented them in setuptools today, and realized I could use them to > register setup commands with setuptools, making it extensible by any > project that registers entry points in a 'distutils.commands' group. 
> > 2) I then proposed that we extend our deployment descriptor (.wsgi file) > syntax so that you can do things like: > > [foo from SomeProject] > # configuration here > > What this does is tell the WSGI deployment API to look up the "foo" entry > point in either the wsgi.middleware or wsgi.applications entry point group > for the named project, according to whether it's the last item in the .wsgi > file. It then invokes the factory as before, with the configuration values > as keyword arguments. > > This proposal is of course an *extension*; it should still be possible to > use regular dotted names as section headings, if you haven't yet drunk the > setuptools kool-aid. But, it makes for interesting possibilities because > we could now have a tool that reads a WSGI deployment descriptor and runs > easy_install to find and download the right projects. So, you could > potentially just write up a descriptor that lists what you want and the > server could install it, although I think I personally would want to run a > tool explicitly; maybe I'll eventually add a --wsgi=FILENAME option to > EasyInstall that would tell it to find out what to install from a WSGI > deployment descriptor. > > That would actually be pretty cool, when you realize it means that all you > have to do to get an app deployed across a bunch of web servers is to copy > the deployment descriptor and tell 'em to install stuff. You can always > create an NFS-mounted cache directory where you put pre-built eggs, and > EasyInstall would just fetch and extract them in that case. > > Whew. Almost makes me wish I was back in my web apps shop, where this kind > of thing would've been *really* useful to have. > From chrism at plope.com Mon Jul 25 08:40:49 2005 From: chrism at plope.com (Chris McDonough) Date: Mon, 25 Jul 2005 02:40:49 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <1122273223.8767.18.camel@localhost.localdomain> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> <1122273223.8767.18.camel@localhost.localdomain> Message-ID: <1122273649.8767.25.camel@localhost.localdomain> BTW, a simple example that includes proposed solutions for all of these requirements would go a long way towards helping me (and maybe others) understand how all the pieces fit together. Maybe something like: - Define two simple WSGI components: a WSGI middleware and a WSGI application. - Describe how to package each as an indpendent egg. - Describe how to configure an instance of the application. - Describe how to configure an instance of the middleware - Describe how to string them together into a pipeline. - C On Mon, 2005-07-25 at 02:33 -0400, Chris McDonough wrote: > Thanks... > > I'm still confused about high level requirements so please try to be > patient with me as I try get back on track. > > These are the requirements as I understand them: > > 1. We want to be able to distribute WSGI applications and middleware > (presumably in a format supported by setuptools). > > 3. We want to be able to configure a WSGI application in order > to create an application instance. > > 2. We want a way to combine configured instances of those > applications into pipelines and start an "instance" of a pipeline. > > Are these requirements the ones being discussed? If so, which of the > config file formats we've been discussing matches which requirement? 
> > Thanks, > > - C > > On Sun, 2005-07-24 at 22:24 -0400, Phillip J. Eby wrote: > > At 08:35 PM 7/24/2005 -0400, Chris McDonough wrote: > > >Sorry, I think I may have lost track of where we were going wrt the > > >deployment spec. Specifically, I don't know how we got to using eggs > > >(which I'd really like to, BTW, they're awesome conceptually!) from > > >where we were in the discussion about configuring a WSGI pipeline. What > > >is a "feature"? What is an "import map"? "Entry point"? Should I just > > >get more familiar with eggs to understand what's being discussed here or > > >did I miss a few posts? > > > > I suggest this post as the shortest architectural introduction to the whole > > egg thang: > > > > http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html > > > > It explains pretty much all of the terminology I'm currently using, except > > for the new terms invented today... > > > > Entry points are a new concept, invented today by Ian and myself. Ian > > proposed having a mapping file (which I dubbed an "import map") included in > > an egg's metadata, and then referring to named entries from a pipeline > > descriptor, so that you don't have to know or care about the exact name to > > import. The application or middleware factory name would be looked up in > > the egg's import map in order to find the actual factory object. > > > > I took Ian's proposal and did two things: > > > > 1) Generalized the idea to a concept of "entry points". An entry point is > > a name that corresponds to an import specification, and an optional list of > > "extras" (see terminology link above) that the entry point may > > require. Entry point names exist in a namespace called an "entry point > > group", and I implied that the WSGI deployment spec would define two such > > groups: wsgi.applications and wsgi.middleware, but a vast number of other > > possibilities for entry points and groups exist. In fact, I went ahead and > > implemented them in setuptools today, and realized I could use them to > > register setup commands with setuptools, making it extensible by any > > project that registers entry points in a 'distutils.commands' group. > > > > 2) I then proposed that we extend our deployment descriptor (.wsgi file) > > syntax so that you can do things like: > > > > [foo from SomeProject] > > # configuration here > > > > What this does is tell the WSGI deployment API to look up the "foo" entry > > point in either the wsgi.middleware or wsgi.applications entry point group > > for the named project, according to whether it's the last item in the .wsgi > > file. It then invokes the factory as before, with the configuration values > > as keyword arguments. > > > > This proposal is of course an *extension*; it should still be possible to > > use regular dotted names as section headings, if you haven't yet drunk the > > setuptools kool-aid. But, it makes for interesting possibilities because > > we could now have a tool that reads a WSGI deployment descriptor and runs > > easy_install to find and download the right projects. So, you could > > potentially just write up a descriptor that lists what you want and the > > server could install it, although I think I personally would want to run a > > tool explicitly; maybe I'll eventually add a --wsgi=FILENAME option to > > EasyInstall that would tell it to find out what to install from a WSGI > > deployment descriptor. 
> > > > That would actually be pretty cool, when you realize it means that all you > > have to do to get an app deployed across a bunch of web servers is to copy > > the deployment descriptor and tell 'em to install stuff. You can always > > create an NFS-mounted cache directory where you put pre-built eggs, and > > EasyInstall would just fetch and extract them in that case. > > > > Whew. Almost makes me wish I was back in my web apps shop, where this kind > > of thing would've been *really* useful to have. > > > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/chrism%40plope.com > From chrism at plope.com Mon Jul 25 09:02:27 2005 From: chrism at plope.com (Chris McDonough) Date: Mon, 25 Jul 2005 03:02:27 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <1122273649.8767.25.camel@localhost.localdomain> References: <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> <1122273223.8767.18.camel@localhost.localdomain> <1122273649.8767.25.camel@localhost.localdomain> Message-ID: <1122274948.8767.40.camel@localhost.localdomain> Actually, let me give this a shot. We package up an egg called helloworld.egg. It happens to contain something that can be used as a WSGI component. Let's say it's a WSGI application that always returns 'Hello World'. And let's say it also contains middleware that lowercases anything that passes through before it's returned. The implementations of these components could be as follows: class HelloWorld: def __init__(self, app, **kw): pass # nothing to configure def __call__(self, environ, start_response): start_response('200 OK', []) return ['Hello World'] class Lowercaser: def __init__(self, app, **kw): self.app = app # nothing else to configure def __call__(self, environ, start_response): for chunk in self.app(environ, start_response): yield chunk.lower() An import map would ship inside of the egg-info dir: [wsgi.app_factories] helloworld = helloworld:HelloWorld lowercaser = helloworld:Lowercaser So we install the egg and this does nothing except allow it to be used from within Python. But when we create a "deployment descriptor" like so in a text editor: [helloworld from helloworld] [lowercaser from helloworld] ... and run some "starter" script that parses that as a pipeline, creates the two instances, wires them together, and we get a running pipeline? Am I on track? OK, back to Battlestar Galactica ;-) On Mon, 2005-07-25 at 02:40 -0400, Chris McDonough wrote: > BTW, a simple example that includes proposed solutions for all of these > requirements would go a long way towards helping me (and maybe others) > understand how all the pieces fit together. Maybe something like: > > - Define two simple WSGI components: a WSGI middleware and a WSGI > application. > > - Describe how to package each as an indpendent egg. > > - Describe how to configure an instance of the application. > > - Describe how to configure an instance of the middleware > > - Describe how to string them together into a pipeline. > > - C > > > On Mon, 2005-07-25 at 02:33 -0400, Chris McDonough wrote: > > Thanks... > > > > I'm still confused about high level requirements so please try to be > > patient with me as I try get back on track. 
> > > > These are the requirements as I understand them: > > > > 1. We want to be able to distribute WSGI applications and middleware > > (presumably in a format supported by setuptools). > > > > 3. We want to be able to configure a WSGI application in order > > to create an application instance. > > > > 2. We want a way to combine configured instances of those > > applications into pipelines and start an "instance" of a pipeline. > > > > Are these requirements the ones being discussed? If so, which of the > > config file formats we've been discussing matches which requirement? > > > > Thanks, > > > > - C > > > > On Sun, 2005-07-24 at 22:24 -0400, Phillip J. Eby wrote: > > > At 08:35 PM 7/24/2005 -0400, Chris McDonough wrote: > > > >Sorry, I think I may have lost track of where we were going wrt the > > > >deployment spec. Specifically, I don't know how we got to using eggs > > > >(which I'd really like to, BTW, they're awesome conceptually!) from > > > >where we were in the discussion about configuring a WSGI pipeline. What > > > >is a "feature"? What is an "import map"? "Entry point"? Should I just > > > >get more familiar with eggs to understand what's being discussed here or > > > >did I miss a few posts? > > > > > > I suggest this post as the shortest architectural introduction to the whole > > > egg thang: > > > > > > http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html > > > > > > It explains pretty much all of the terminology I'm currently using, except > > > for the new terms invented today... > > > > > > Entry points are a new concept, invented today by Ian and myself. Ian > > > proposed having a mapping file (which I dubbed an "import map") included in > > > an egg's metadata, and then referring to named entries from a pipeline > > > descriptor, so that you don't have to know or care about the exact name to > > > import. The application or middleware factory name would be looked up in > > > the egg's import map in order to find the actual factory object. > > > > > > I took Ian's proposal and did two things: > > > > > > 1) Generalized the idea to a concept of "entry points". An entry point is > > > a name that corresponds to an import specification, and an optional list of > > > "extras" (see terminology link above) that the entry point may > > > require. Entry point names exist in a namespace called an "entry point > > > group", and I implied that the WSGI deployment spec would define two such > > > groups: wsgi.applications and wsgi.middleware, but a vast number of other > > > possibilities for entry points and groups exist. In fact, I went ahead and > > > implemented them in setuptools today, and realized I could use them to > > > register setup commands with setuptools, making it extensible by any > > > project that registers entry points in a 'distutils.commands' group. > > > > > > 2) I then proposed that we extend our deployment descriptor (.wsgi file) > > > syntax so that you can do things like: > > > > > > [foo from SomeProject] > > > # configuration here > > > > > > What this does is tell the WSGI deployment API to look up the "foo" entry > > > point in either the wsgi.middleware or wsgi.applications entry point group > > > for the named project, according to whether it's the last item in the .wsgi > > > file. It then invokes the factory as before, with the configuration values > > > as keyword arguments. 
> > > > > > This proposal is of course an *extension*; it should still be possible to > > > use regular dotted names as section headings, if you haven't yet drunk the > > > setuptools kool-aid. But, it makes for interesting possibilities because > > > we could now have a tool that reads a WSGI deployment descriptor and runs > > > easy_install to find and download the right projects. So, you could > > > potentially just write up a descriptor that lists what you want and the > > > server could install it, although I think I personally would want to run a > > > tool explicitly; maybe I'll eventually add a --wsgi=FILENAME option to > > > EasyInstall that would tell it to find out what to install from a WSGI > > > deployment descriptor. > > > > > > That would actually be pretty cool, when you realize it means that all you > > > have to do to get an app deployed across a bunch of web servers is to copy > > > the deployment descriptor and tell 'em to install stuff. You can always > > > create an NFS-mounted cache directory where you put pre-built eggs, and > > > EasyInstall would just fetch and extract them in that case. > > > > > > Whew. Almost makes me wish I was back in my web apps shop, where this kind > > > of thing would've been *really* useful to have. > > > > > > > _______________________________________________ > > Web-SIG mailing list > > Web-SIG at python.org > > Web SIG: http://www.python.org/sigs/web-sig > > Unsubscribe: http://mail.python.org/mailman/options/web-sig/chrism%40plope.com > > > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/chrism%40plope.com > From pje at telecommunity.com Mon Jul 25 16:39:48 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 25 Jul 2005 10:39:48 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <1122274948.8767.40.camel@localhost.localdomain> References: <1122273649.8767.25.camel@localhost.localdomain> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> <1122273223.8767.18.camel@localhost.localdomain> <1122273649.8767.25.camel@localhost.localdomain> Message-ID: <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com> At 03:02 AM 7/25/2005 -0400, Chris McDonough wrote: >Actually, let me give this a shot. > >We package up an egg called helloworld.egg. It happens to contain >something that can be used as a WSGI component. Let's say it's a WSGI >application that always returns 'Hello World'. And let's say it also >contains middleware that lowercases anything that passes through before >it's returned. > >The implementations of these components could be as follows: > >class HelloWorld: > def __init__(self, app, **kw): > pass # nothing to configure > > def __call__(self, environ, start_response): > start_response('200 OK', []) > return ['Hello World'] I'm thinking that an application like this wouldn't take an 'app' constuctor parameter, and if it takes no configuration parameters it doesn't need **kw, but good so far. >class Lowercaser: > def __init__(self, app, **kw): > self.app = app > # nothing else to configure > > def __call__(self, environ, start_response): > for chunk in self.app(environ, start_response): > yield chunk.lower() Again, no need for **kw if it doesn't take any configuration, but okay. 
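Spelled out, the slimmed-down versions would be just (a sketch only):

    class HelloWorld:
        # a terminal application: no 'app' argument, no configuration
        def __call__(self, environ, start_response):
            start_response('200 OK', [])
            return ['Hello World']

    class Lowercaser:
        # middleware: wraps the next application, still no configuration
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            for chunk in self.app(environ, start_response):
                yield chunk.lower()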
>An import map would ship inside of the egg-info dir: > >[wsgi.app_factories] >helloworld = helloworld:HelloWorld >lowercaser = helloworld:Lowercaser I'm thinking it would be more like: [wsgi.middleware] lowercaser = helloworld:Lowercaser [wsgi.apps] helloworld = helloworld:HelloWorld and you'd specify it in the setup script as something like this: setup( #... entry_points = { 'wsgi.apps': ['helloworld = helloworld:HelloWorld'] 'wsgi.middleware': ['lowercaser = helloworld:Lowercaser'] } ) (And the CVS version of setuptools already supports this.) >So we install the egg and this does nothing except allow it to be used >from within Python. > >But when we create a "deployment descriptor" like so in a text editor: > >[helloworld from helloworld] > >[lowercaser from helloworld] Opposite order, though; the lowercaser comes first because it's the middleware; the application would always come last, because they're listed in the order in which they receive data, just like a pipes-and-filters command line. >... and run some "starter" script that parses that as a pipeline, ... possibly using a #! line if you're using CGI or FastCGI with Apache or some other non-Python webserver. >creates the two instances, wires them together, and we get a running >pipeline? > >Am I on track? Definitely. From pje at telecommunity.com Mon Jul 25 16:40:49 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 25 Jul 2005 10:40:49 -0400 Subject: [Web-SIG] WSGI deployment part 2: factory API In-Reply-To: <42E45BE5.9050807@colorstudy.com> References: <5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com> <5.1.1.6.0.20050724230048.02865e98@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050724233258.026e69a8@mail.telecommunity.com> At 10:26 PM 7/24/2005 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>>Another option is we pass in a single dictionary that represents the >>>entire configuration. This leaves room to add more arguments later, >>>where if we use keyword arguments for configuration then there's really >>>no room at all (the entire signature of the factory is taken up by >>>application-specific configuration). >> >>YAGNI; We don't have any place for this theoretical extra configuration >>to come from, and no use cases that can't be met by just adding it to the >>configuration. Early error trapping is important, so I think it's better >>to let factories use normal Python argument validation to have required >>arguments, optional values, and to reject unrecognized arguments. > >I think in practice I'll always take **kw, because I otherwise I'd have to >enumerate all the configuration all the middleware takes, and that's >impractical. I suppose I could later assemble the middleware, determine >what configuration the actual set of middleware+application takes, then >check for extras. But I doubt I will. And even if I do, it's incidental >-- I'm quite sure I won't use using the function signature for parameter >checking. Well, I'm sure I will for simple things. For more complex things, I'll use the pattern of checking **kw against class attributes to make sure they exist. PEAK, for example, already has this ability built-in, so it's definitely the path of least resistance for implementing a middleware component in PEAK; just subclass binding.Component and add attribute bindings for everything needed. I'd hate to give that up for a theoretical argument that someday we might need some kind of arguments that aren't arguments. 
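For anyone who hasn't seen it, the pattern is roughly this (an illustrative sketch, not actual PEAK code; the option names are made up):

    class SomeMiddleware:
        # class attributes double as the set of recognized options,
        # with their default values
        prefix = '/'
        debug = False

        def __init__(self, app, **kw):
            self.app = app
            for name, value in kw.items():
                if not hasattr(self.__class__, name):
                    raise TypeError("Unknown configuration option: %r" % name)
                setattr(self, name, value)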
It's not as if we couldn't define a new protocol, and a different modifier in the deployment descriptor, if that day ever actually arrived. >>>Another part of the API that I can see as useful is passing in the >>>distribution object itself. >> >>Which distribution? The one the entry point came from? It already knows >>(or can find out) what distribution it's in. > >I mean like: > > [wsgi.app_factory] > filebrowser = paste.wareweb:make_app > >Where paste.wareweb.make_app knows how to build an application from >filename conventions in the package itself, even though the paste.wareweb >module isn't in the project itself. Oh. I think I get you now; you want to be able to define an entry point that wraps itself in something else. I don't see though why I can't just put the wrapper code in myself, like this: def my_app(*args, **kw): return paste.wareweb.make_app( pkg_resources.get_provider(__name__), *args, **kw ) And then just make the entry point refer to this. Or, if you want to be fancy: my_app = paste.wareweb.app_maker(__name__) This seems more than sufficient for the use case. From chrism at plope.com Mon Jul 25 18:59:22 2005 From: chrism at plope.com (Chris McDonough) Date: Mon, 25 Jul 2005 12:59:22 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com> References: <1122273649.8767.25.camel@localhost.localdomain> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> <1122273223.8767.18.camel@localhost.localdomain> <1122273649.8767.25.camel@localhost.localdomain> <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com> Message-ID: <1122310762.3898.26.camel@plope.dyndns.org> Great. Given that, I've created the beginnings of a more formal specification: WSGI Deployment Specification ----------------------------- I use the term "WSGI component" in here as shorthand to indicate all types of WSGI implementations (application, middleware). The primary deployment concern is to create a way to specify the configuration of an instance of a WSGI component within a declarative configuration file. A secondary deployment concern is to create a way to "wire up" components together into a specific deployable "pipeline". Pipeline Descriptors -------------------- Pipeline descriptors are file representations of a particular WSGI "pipeline". They include enough information to configure, instantiate, and wire together WSGI apps and middleware components into one pipeline for use by a WSGI server. Installation of the software which composes those components is handled separately. In order to define a pipeline, we use a ".ini"-format configuration file conventionally named '.wsgi'. This file may optionally be marked as executable and associated with a simple UNIX interpreter via a leading hash-bang line to allow servers which employ stdin and stdout streams (ala CGI) to run the pipeline directly without any intermediation. For example, a deployment descriptor named 'myapplication.wsgi' might be composed of the following text:: #!/usr/bin/runwsgi [mypackage.mymodule.factory1] quux = arbitraryvalue eekx = arbitraryvalue [mypackage.mymodule.factory2] foo = arbitraryvalue bar = arbitraryvalue Section names are Python-dotted-path names (or setuptools "entry point names" described in a later section) which represent factories. 
Key-value pairs within a given section are used as keyword arguments to the factory that can be used as configuration for the component being instantiated. All sections in the deployment descriptor describe 'middleware' except for the last section, which must describe an application. Factories which construct middleware must return something which is a WSGI "callable" by implementing the following API:: def factory(next_app, [**kw]): """ next_app is the next application in the WSGI pipeline, **kw is optional, and accepts the key-value pairs that are used in the section as a dictionary, used for configuration """ Factories which construct middleware must return something which is a WSGI "callable" by implementing the following API:: def factory([**kw]): """" **kw is optional, and accepts the key-value pairs that are used in the section as a dictionary, used for configuration """ A deployment descriptor can also be parsed from within Python. An importable configurator which resides in 'wsgiref' exposes a function that accepts a single argument, "configure":: >>> from wsgiref.runwsgi import parse_deployment >>> appchain = parse_deployment('myapplication.wsgi') 'appchain' will be an object representing the fully configured "pipeline". 'parse_deployment' is guaranteed to return something that implements the WSGI "callable" API described in PEP 333. Entry Points On Mon, 2005-07-25 at 10:39 -0400, Phillip J. Eby wrote: > At 03:02 AM 7/25/2005 -0400, Chris McDonough wrote: > >Actually, let me give this a shot. > > > >We package up an egg called helloworld.egg. It happens to contain > >something that can be used as a WSGI component. Let's say it's a WSGI > >application that always returns 'Hello World'. And let's say it also > >contains middleware that lowercases anything that passes through before > >it's returned. > > > >The implementations of these components could be as follows: > > > >class HelloWorld: > > def __init__(self, app, **kw): > > pass # nothing to configure > > > > def __call__(self, environ, start_response): > > start_response('200 OK', []) > > return ['Hello World'] > > I'm thinking that an application like this wouldn't take an 'app' > constuctor parameter, and if it takes no configuration parameters it > doesn't need **kw, but good so far. > > > >class Lowercaser: > > def __init__(self, app, **kw): > > self.app = app > > # nothing else to configure > > > > def __call__(self, environ, start_response): > > for chunk in self.app(environ, start_response): > > yield chunk.lower() > > Again, no need for **kw if it doesn't take any configuration, but okay. > > > >An import map would ship inside of the egg-info dir: > > > >[wsgi.app_factories] > >helloworld = helloworld:HelloWorld > >lowercaser = helloworld:Lowercaser > > I'm thinking it would be more like: > > [wsgi.middleware] > lowercaser = helloworld:Lowercaser > > [wsgi.apps] > helloworld = helloworld:HelloWorld > > and you'd specify it in the setup script as something like this: > > setup( > #... > entry_points = { > 'wsgi.apps': ['helloworld = helloworld:HelloWorld'] > 'wsgi.middleware': ['lowercaser = helloworld:Lowercaser'] > } > ) > > (And the CVS version of setuptools already supports this.) > > > > >So we install the egg and this does nothing except allow it to be used > >from within Python. 
> > > >But when we create a "deployment descriptor" like so in a text editor: > > > >[helloworld from helloworld] > > > >[lowercaser from helloworld] > > Opposite order, though; the lowercaser comes first because it's the > middleware; the application would always come last, because they're listed > in the order in which they receive data, just like a pipes-and-filters > command line. > > > >... and run some "starter" script that parses that as a pipeline, > > ... possibly using a #! line if you're using CGI or FastCGI with Apache or > some other non-Python webserver. > > > >creates the two instances, wires them together, and we get a running > >pipeline? > > > >Am I on track? > > Definitely. > From pje at telecommunity.com Mon Jul 25 19:35:11 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 25 Jul 2005 13:35:11 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <1122310762.3898.26.camel@plope.dyndns.org> References: <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com> <1122273649.8767.25.camel@localhost.localdomain> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> <1122273223.8767.18.camel@localhost.localdomain> <1122273649.8767.25.camel@localhost.localdomain> <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050725131252.02789400@mail.telecommunity.com> At 12:59 PM 7/25/2005 -0400, Chris McDonough wrote: > In order to define a pipeline, we use a ".ini"-format configuration Ultimately I think the spec will need a formal description of what that means exactly, including such issues as a PEP 263-style "encoding" specifier, and the precise format of values. But I'm fine with adding all that myself, since I'm going to have to specify it well enough to create a parser anyway. With respect to the format, I'm actually leaning towards either treating the settings as Python assignment statements (syntactically speaking) or restricting the values to being single-token Python literals (i.e., numbers, strings, or True/False/None, but not tuples, lists, or other expressions). Interestingly enough, I think you could actually define the entire format in terms of standard Python-language tokens, it's just the higher-level syntax that differs from Python's. Although actually using the "tokenize" module to scan it would mean that all lines' content would need to start exactly at the left margin, with no indentation. Probably not a big deal, though. The syntax would probably be something like: pipeline ::= section* section ::= heading assignment* heading ::= '[' qname trailer ']' NEWLINE assignment ::= NAME '=' value NEWLINE qname ::= NAME ('.' NAME) * trailer ::= "from" requirement | "options" value ::= NUMBER | STRING | "True" | "False" | "None" requirement ::= NAME versionlist? versionlist ::= versionspec (',' versionspec)* versionspec ::= relop STRING relop ::= "<" | "<=" | "==" | "!=" | ">=" | ">" The versions would have to be strings in order to avoid problems parsing e.g '2.1a4' as a number. And if we were going to allow structures like tuples or lists or dictionaries, then we'd need to expand on 'value' a little bit, but not as much as if we allowed arbitrary expressions. > file conventionally named '.wsgi'. 
This file may > optionally be marked as executable and associated with a simple UNIX > interpreter via a leading hash-bang line to allow servers which > employ stdin and stdout streams (ala CGI) to run the pipeline > directly without any intermediation. For that matter, while doing development and testing, the interpreter could be something like "#!invoke peak launch wsgifile", to launch the app in a web browser from a localhost http server. (Assuming I added a "wsgifile" command to PEAK, of course.) > Factories which construct middleware must return something which is > a WSGI "callable" by implementing the following API:: > > def factory(next_app, [**kw]): > """ next_app is the next application in the WSGI pipeline, > **kw is optional, and accepts the key-value pairs > that are used in the section as a dictionary, used > for configuration """ Note that you can also just list the parameter names you take, or no parameter names at all. I don't want to imply that you *have* to use kw, because it's fairly easy to envision simple middleware components that only take two or three parameters, or maybe even just one (e.g., their config file name). > Factories which construct middleware must return something which is > a WSGI "callable" by implementing the following API:: You probably meant "application" or "terminal application" here. (Or whatever term we end up with for an application that isn't middleware. > A deployment descriptor can also be parsed from within Python. An > importable configurator which resides in 'wsgiref' exposes a > function that accepts a single argument, "configure":: > > >>> from wsgiref.runwsgi import parse_deployment > >>> appchain = parse_deployment('myapplication.wsgi') > > 'appchain' will be an object representing the fully configured > "pipeline". 'parse_deployment' is guaranteed to return something > that implements the WSGI "callable" API described in PEP 333. Or raise SyntaxError for a malformed descriptor file, or ImportError if an application import failed or an entry point couldn't be found, or DistributionNotFound if a needed egg couldn't be found, or VersionConflict if it needs a conflicting version. Or really it could raise anything if one of the factories failed, come to think of it. From pje at telecommunity.com Mon Jul 25 19:40:49 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 25 Jul 2005 13:40:49 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <5.1.1.6.0.20050725131252.02789400@mail.telecommunity.com> References: <1122310762.3898.26.camel@plope.dyndns.org> <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com> <1122273649.8767.25.camel@localhost.localdomain> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> <1122273223.8767.18.camel@localhost.localdomain> <1122273649.8767.25.camel@localhost.localdomain> <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050725134010.02809100@mail.telecommunity.com> At 01:35 PM 7/25/2005 -0400, Phillip J. Eby wrote: > heading ::= '[' qname trailer ']' NEWLINE Oops. That should've been "trailer?", since the trailer is optional. 
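To give a concrete (made-up) example, a descriptor like the following would be acceptable under the corrected grammar -- one heading with a "from" requirement trailer, one plain dotted-name heading, and values restricted to single-token literals (with version numbers as strings):

    [lowercaser from helloworld >= "0.2"]

    [mypackage.mymodule.app_factory]
    greeting = "Hello World"
    retries = 3
    debug = True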
From ianb at colorstudy.com Mon Jul 25 19:49:26 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 25 Jul 2005 12:49:26 -0500 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <5.1.1.6.0.20050725131252.02789400@mail.telecommunity.com> References: <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com> <1122273649.8767.25.camel@localhost.localdomain> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724114256.0272b7e0@mail.telecommunity.com> <5.1.1.6.0.20050724220847.0284ea10@mail.telecommunity.com> <1122273223.8767.18.camel@localhost.localdomain> <1122273649.8767.25.camel@localhost.localdomain> <5.1.1.6.0.20050725103138.02879238@mail.telecommunity.com> <5.1.1.6.0.20050725131252.02789400@mail.telecommunity.com> Message-ID: <42E52626.5080104@colorstudy.com> Phillip J. Eby wrote: > At 12:59 PM 7/25/2005 -0400, Chris McDonough wrote: > >> In order to define a pipeline, we use a ".ini"-format configuration > > > Ultimately I think the spec will need a formal description of what that > means exactly, including such issues as a PEP 263-style "encoding" > specifier, and the precise format of values. But I'm fine with adding all > that myself, since I'm going to have to specify it well enough to create a > parser anyway. Incidentally I have a generic ini parser here: http://svn.w4py.org/home/ianb/wsgikit_old_config/iniparser.py I suspect I'm doing the character decoding improperly (line-by-line instead of opening the file with the given character encoding), but otherwise it's been sufficiently generic and workable, and should allow for doing more extensive parsing of things like section headers. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From james at pythonweb.org Mon Jul 25 23:54:08 2005 From: james at pythonweb.org (James Gardner) Date: Mon, 25 Jul 2005 22:54:08 +0100 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config Message-ID: <42E55F80.7090300@pythonweb.org> Hi All, I'm a bit late coming to all this and didn't really see the benefits of the new format over what we already do so I set out to contrast new and old to demonstrate why it wasn't *that* useful. I've since changed my mind and think it is great but here is the contrasting I did anyway. I'd be pleased to hear all the glaring errors :-) Here is a new example: we want to have an application that returns a GZip encoded "hello world" string after it has been made lowercase by case changer middleware taking a parameter newCase. The GZip middleware is an optional feature of the modules in wsgiFilters.egg and the CaseChanger middleware and HelloWorld application are in the helloworld.egg. 
The classes look like this:

class HelloWorld:
    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-type','text/plain')])
        return ['Hello World']

class CaseChanger:
    def __init__(self, app, newCase):
        self.app = app
        self.newCase = newCase
    def __call__(self, environ, start_response):
        for chunk in self.app(environ, start_response):
            if self.newCase == 'lower':
                yield chunk.lower()
            else:
                yield chunk

class GZip:
    def __init__(self, app):
        self.app = app
    def __call__(self, environ, start_response):
        # Do clever things with headers here (omitted)
        for chunk in self.app(environ, start_response):
            yield gzip(chunk)

The way we would write our application at the moment is as follows:

from pkg_resources import require
require('helloworld >= 0.2')
from helloworld import HelloWorld, CaseChanger
require('wsgiFilters[GZip] == 1.4.3')
from wsgiFilters import GZip

pipeline = GZip(
    app = CaseChanger(
        app = HelloWorld(),
        newCase = 'lower',
    )
)

With pipeline itself somehow being executed as a WSGI application.

The new way is like this (correct me if I'm wrong). The modules have egg_info files like this respectively defining the "entry points":

wsgiFilters.egg:

[wsgi.middleware]
gzipper = GZip:GZip

helloworld.egg:

[wsgi.middleware]
cs = helloworld:CaseChanger

[wsgi.app]
myApp = helloworld:HelloWorld

We would then write an "import map" (below) based on the "deployment descriptors" in the .eggs used to describe the "entry points" into the eggs. The order the "pipeline" would be built in is the same as in the Python example, i.e. middleware first, then application.

[gzipper from wsgiFilters[GZip] == 1.4.3]
[cs from helloworld >= 0.2]
newCase = 'lower'
[myApp from helloworld >= 0.2]

It is loaded using an as-yet-unwritten module which uses a factory returning a middleware pipeline equivalent to what would be produced in the Python example (is this very last bit correct?)

Doing things this new way has the following advantages:
* We have specified explicitly in the setup.py of the eggs that the middleware and applications we are importing are actually middleware and an application
* It is simpler for a non-technical user.
* There are lots of other applications for the ideas being discussed

It has the following disadvantages:
* We are limited as to what we can use as variable names. Existing middleware would need customising to only accept basic parameters.
* We require all WSGI coders to use the egg format.
* Users can't customise the middleware in the configuration file (e.g. by creating a derived class etc.) and you lose flexibility.
* If we use a Python file we can directly import and manipulate the pipeline (I guess you can do this anyway once your factory has returned the pipeline)

Both methods are the same in that:
* We have specified the order of the pipeline and the middleware and applications involved
* Auto-downloading and installation of middleware and applications based on version requirements is possible (thanks to PJE's eggs)
* We have specified which versions of modules we require.
* Both could call a script such as wsgi_CGI.py, wsgi_mod_python.py etc. to execute the WSGI pipeline, so both methods' files could be distributed as a single file and would auto-download their own dependencies.

Other ideas:

Is it really necessary to be able to give an entry point a name? 
If not because we know what we want to import anyway, we can combine the deployment descriptor into the import map: [GZip:GZip from wsgiFilters[GZip] == 1.4.3] We can then simplify the deployment descriptor like this: [wsgi.middleware] GZip:GZip And then remove the colons and give a fully qualified Python-style path: [GZip.GZip from wsgiFilters[GZip] == 1.4.3] and [wsgi.middleware] GZip.GZip Is this not better? Why do you need to assign names to entry points? Although writing a middleware chain is dead easy for a Python programmer, it isn't for the end user and if you compare the end user files from this example I know which one I'd rather explain to someone. So although this deployment format seemed at first like overkill, I'm now very much in favour. I was personally considering YAML for doing my own configuration using a factory but frankly the new format is much cleaner and you don't need all the power of YAML anyway! Count me in! James From pje at telecommunity.com Tue Jul 26 00:27:17 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 25 Jul 2005 18:27:17 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <42E55F80.7090300@pythonweb.org> Message-ID: <5.1.1.6.0.20050725180559.027bcfd8@mail.telecommunity.com> At 10:54 PM 7/25/2005 +0100, James Gardner wrote: >The new way is like this (correct me if I'm wrong) > >The modules have egg_info files like this respectively defining the >"entry points": > >wsgiFilters.egg: > >[wsgi.middleware] >gzipper = GZip:GZip Almost; this one should be: [wsgi.middleware] gzipper = GZip:GZip [GZip] So that using gzipper doesn't require specifying "extras" in the pipeline descriptor. See below. >helloworld.egg: > >[wsgi.middleware] >cs = helloworld:CaseChanger > >[wsgi.app] >myApp = helloworld:HelloWorld > > >We would then write an "import map" (below) based on the "deployment >descriptors" in the .eggs used to describe the "entry points" into the >eggs. Actually, the new thing you write is the deployment descriptor or pipeline descriptor. The "import map" is the thing you put in the eggs' setup.py to list the entry points offered by the eggs. >The order the "pipeline" would be built is the same as in the >Python example eg middleware first then application. > >[gzipper from wsgiFilters[GZip] == 1.4.3] >[cs from helloworld >= 0.2 ] >newCase = 'lower' >[myApp from helloworld >= 0.2] You wouldn't need the [GZip] part if it were declared with the entry point, as I showed above. >It is loaded using an as yet unwritten modules which uses a factory >returning a middleware pipeline equivalent to what would be produced in >the Python example (is this very last bit correct?) Yes. The order in the file is the order in which the items are invoked by the controlling server. >Doing things this new way has the following advantages: >* We have specified explicitly in the setup.py of the eggs that the >middleware and applications we are importing are actually middleware and >an application >* It is simpler for a non-technical user. >* There are lots of other applications for the ideas being discussed > >It has the following disadvantages: >* We are limited as to what we can use as variable names. Existing >middleware would need customising to only accept basic parameters. This depends a lot on the details of the .ini-like format, which are still up in the air. >* We require all WSGI coders to use the egg format. Not so; you can use [GZip.GZip] as a section header in order to do just a plain ol' import. 
>* Users can't customise the middleware in the configuration file (eg by >creating a derived class etc and you lose flexibility). No, but all they have to do is create a Python file and refer to it, and they are thereby encouraged to separate code from configuration. :) >* If we use a Python file we can directly import and manipulate the >pipeline (I guess you can do this anyway once your factory has returned >the pipeline) Yep. >Both methods are the same in that >* We have specified the order of the pipeline and the middleware and >applications involved >* Auto-downloading and installation of middleware and applications based >on version requirements is possible (thanks to PJE's eggs) One difference here: the .ini format is parseable to determine what eggs are needed without executing arbitrary code. >Other ideas: > >Is it really necessary to be able to give an entry point a name? Yes. Entry points are a generic setuptools mechanism now, and they have names. However, this doesn't mean they all have to be exported by an egg's import map. >If not >because we know what we want to import anyway, we can combine the >deployment descriptor into the import map: > >[GZip:GZip from wsgiFilters[GZip] == 1.4.3] We could perhaps still allow that format; the format is still being discussed. However, this would just be a function of the .wsgi file, and doesn't affect the concept of "entry points". It's just naming a factory directly instead of accessing it via an entry point. >We can then simplify the deployment descriptor like this: > >[wsgi.middleware] >GZip:GZip If you don't care about the entry point, you can just not declare one. But you can't opt out of naming them. >Why do you need to assign names to entry points? Because other things that use entry points need names. For example, setuptools searches a "distutils.commands" entry point group to find commands that extend the normal setup commands. It certainly doesn't know in advance what commands the eggs are going to provide. The question you should be asking is, "Why do we have to use entry points to specify factories?", and the answer is, "we don't". :) >Although writing a middleware chain is dead easy for a Python >programmer, it isn't for the end user and if you compare the end user >files from this example I know which one I'd rather explain to someone. Yep. Ian gets the credit for further simplifying my "sequence of [factory.name] sections" proposal by coming up with the idea of having named entry points declared in an egg. I then took the entry points idea to its logical conclusion, even refactoring setuptools to use them for its own extensibility. >So although this deployment format seemed at first like overkill, I'm >now very much in favour. I was personally considering YAML for doing my >own configuration using a factory but frankly the new format is much >cleaner and you don't need all the power of YAML anyway! Count me in! There's one other advantage: this format will hopefully become as successful as WSGI itself in adoption by servers and applications. Hopefully within a year or so, *the* normal way to deploy a Python web app will be using a .wsgi file. Beyond that, we can hopefully begin to see "Python" rather than "framework X" as being what people write their web apps with. 
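To make the setup.py side of this concrete, here is a rough sketch (not anything specified yet) of how the hypothetical wsgiFilters egg from the example above might declare its entry point and the [GZip] extra, using the setuptools entry_points and extras_require arguments:

from setuptools import setup, find_packages

setup(
    name="wsgiFilters",
    version="1.4.3",
    packages=find_packages(),
    # the "GZip" extra would list any dependencies the gzip filter needs
    extras_require={"GZip": []},
    entry_points={
        "wsgi.middleware": [
            # name = module:attribute [extras]
            "gzipper = GZip:GZip [GZip]",
        ],
    },
)

The helloworld egg would similarly list its CaseChanger factory under "wsgi.middleware" and its HelloWorld factory under "wsgi.app".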
From ianb at colorstudy.com  Tue Jul 26 01:40:14 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 25 Jul 2005 18:40:14 -0500
Subject: [Web-SIG] WSGI deployment use case
Message-ID: <42E5785E.1040900@colorstudy.com>

Well, I thought I'd chime in with everything I'd want in a deployment strategy; some of this clearly is the realm of practice, not code, but it all fits together. It's not necessarily the Universal Use Case, but I don't think it's too strange.

Here are some types of applications:

* Things I (or someone in my company) code.
* Full applications someone else creates and I use.
* Applications that are more like a service, that I use inside an application of mine. These are end-point applications, like a REST service or something. An application like this probably appears to live inside my application, through some form of delegation; unless it's a service many applications share...?
* Middleware written by me for my internal use, maybe application-specific.
* Middleware written specifically for the framework I (or someone else) uses.
* General-purpose middleware that could apply to anything.

Some kinds of deployments I want to do:

* Two clients with the same application, same version (like "latest stable version").
* Two clients with different versions; e.g., one client hasn't paid for an upgrade (which might mean upgrading).
* A client with branched code, i.e., we've tweaked one instance of the application just for them.
* Two installations of the same application, in the same process, with different URLs and different configurations. This might be something as small as a formmail kind of script, or a large program.
* Sometimes apps go into different processes, but often they can go into the same process (especially if I start using Python WSGI for the kind of seldom-used apps that I now use CGI for).
* I have to mount these applications at some location. This should be part of the deployment configuration; both path based and domain name based.

Here are some aspects of the configuration:

* Many applications have a lot of configuration. Much of it is "just in case" configuration that I'd never want to tweak. Some of that configuration may be derivative of things I do want to tweak, e.g., URL layouts where I configure the base URL, but all the other URLs could be derived from that.
* What appears to be an application from the outside might be composed of many applications. Maybe an app includes an external formmail app for a "support" link. That app requires configuration (like an SMTP server).
* I'd like to configure some things globally. Like that SMTP server. Or an email address to send unexpected exceptions to.
* I might want to override configuration locally, like that email address. I might want to augment configuration, like just add an address to the list, not reset the whole value.
* I'd like to install some middleware globally as well. Like a session handler, perhaps. Or authentication. Or an exception catcher -- I'd like everyone to use my well-configured exception catcher. So not only am I adding middleware, I might be asking that middleware be excluded (or should simply short-circuit itself).
* And of course, all my applications take configuration, separate from middleware and frameworks.
* And usually there are non-WSGI pieces that need access to the exact same configuration; scripts and cronjobs and whatnot. 
I think quite a bit of this is handled well by what we're talking about, even if it wasn't just a little while ago; versioning for instance. Branches I'm a little less sure about, since version numbers are linear. But configuration and composition of multiple independent applications into a single process isn't. I don't think we can solve these separately, because the Hard Problem is how to handle configuration alongside composition. How can I apply configuration to a set of applications? How can I make exceptions? How can an application consume configuration as well as delegate configuration to a subapplication? The pipeline is often more like a tree, so the logic is a little complex. Or, rather, there's actual *logic* in how configuration is applied, almost all of which are viable. I can figure out a bunch of ad hoc and formal ways of accomplishing this in Paste; most of it is already possible, and entry points alone clean up a lot of what's there (encouraging a separation between how an application is invoked generally, and install-specific configuration). But with a more limited and declarative configuration it is harder. Also when configuration is pushed into factories as keyword arguments, instead of being pulled out of a dictionary, it is much harder -- the configuration becomes unhackable. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Tue Jul 26 01:49:36 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 25 Jul 2005 18:49:36 -0500 Subject: [Web-SIG] WSGI deployment use case In-Reply-To: <42E5785E.1040900@colorstudy.com> References: <42E5785E.1040900@colorstudy.com> Message-ID: <42E57A90.5060306@colorstudy.com> I thought it would muck up the list of issues too much if I added too much commentary. But then it's not that useful without it... Ian Bicking wrote: > Here's some types of applications: > > * Things I (or someone in my company) codes. I'm planning on making everything an egg. I'm even thinking about how I can make my Javascript libraries eggs, though I'm not sure about that. We keep an internal index of these. > * Full application someone else creates and I use. I'm hoping these are eggable; if not we'll probably make them so, just so we can manage them. > * Applications that are more like a service, that use inside an > application of mine. These are end-point applications, like a REST > service or something. An application like this probably appears to live > inside my application, through some form of delegation; unless it's a > service many applications share...? This is more annoying. Again, eggs. But they get mounted somewhere, maybe based on configuration, maybe not. If the application is nested, then my application will recursively use configuration to create these applications. > * Middleware written by me for my internal use, maybe application-specific. I'll probably apply these in my own factory functions. > * Middleware written specifically for the framework I (or someone else) > uses. Again, probably in a factory function. Sometimes my own stuff will go above or below these. Some pieces of the framework need to be aware of my special middleware (like a URL parser). > * General-purpose middleware that could apply to anything. This is stuff I want to configure globally; an open issue. > Some kinds of deployments I want to do: > > * Two clients with the same application, same version (like "latest > stable version"). I can potentially install these in a single process or in separate processes. 
They each get a separate configuration. Probably domain name based dispatching to the different configuration files. > * Two clients with different versions; e.g., one client hasn't paid for > an upgrade (which might mean upgrading). Definitely need two processes; otherwise no problem with Eggs -- that means I don't have to fiddle with PYTHONPATH, special package directories, etc. > * A client with branched code, i.e., we've tweaked one instance of the > application just for them. I don't know what to version such a branch as. Maybe some version that goes before everything else, and use an explicit version requirement (==client_name_1.0) > * Two installations of the same application, in the same process, with > different URLs and different configurations. This might be something as > small as a formmail kind of script, or a large program. Woops, same thing as before. Anyway, I might use a pattern, like "/app" gets redirected in Apache to an application server that does further dispatching on URLs. Or I might add specific rewriting and aliases to mount applications in their place. These have to map to specific processes, maybe through some convention on port numbers, maybe filenames if I'm using something that talks over named sockets. I guess potentially I could use an environmental variable to indicate which app I'm trying to point to (SetEnvIf style). Or, rather, what configuration file I'm pointing to, since there's a many-to-one relationship between configuration files and applications. > * Sometimes apps go into different processes, but often they can go into > the same process (especially if I start using Python WSGI for the kind > of seldom-used apps that I now use CGI for). Deployment in these cases should be really light. Definitely not a programming task. > * I have to mount these applications at some location. This should be > part of the deployment configuration; both path based and domain name based. > ... And then none of this configuration stuff is handled to my satisfaction... > Here's some aspects of the configuration: > > * Many applications have a lot of configuration. Much of it is "just in > case" configuration that I'd never want to tweak. Some of that > configuration may be derivative of things I do want to tweak, e.g., URL > layouts where I configure the base URL, but all the other URLs could be > derived from that. > > * What appears to be an application from the outside might be composed > of many applications. Maybe an app includes an external formmail app > for a "support" link. That app requires configuration (like smtp server). > > * I'd like to configure some things globally. Like that smtp server. > Or an email address to send unexpected exceptions to. > > * I might want to override configuration locally, like that email > address. I might want to augment configuration, like just add an > address to the list, not reset the whole value. > > * I'd like to install some middleware globally as well. Like a session > handler, perhaps. Or authentication. Or an exception catcher -- I'd > like everyone to use my well-configured exception catcher. So not only > am I adding middleware, I might be asking that middleware be excluded > (or should simply short-circuit itself). > > * And of course, all my applications take configuration, separate from > middleware and frameworks. > > * And usually there are non-WSGI pieces that need access to the exact > same configuration; scripts and cronjobs and whatnot. 
Usually they just > need the application configuration, but nothing related to middleware or > the web. From pje at telecommunity.com Tue Jul 26 02:04:30 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 25 Jul 2005 20:04:30 -0400 Subject: [Web-SIG] Entry points and import maps (was Re: Scarecrow deployment config In-Reply-To: <42E56F36.6040601@pythonweb.org> References: <5.1.1.6.0.20050725180559.027bcfd8@mail.telecommunity.com> <5.1.1.6.0.20050725180559.027bcfd8@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050725194017.027f36d8@mail.telecommunity.com> [cc:'d to distutils-sig because this is mostly about cool uses for the new EntryPoint facility of setuptools/pkg_resources] At 12:01 AM 7/26/2005 +0100, James Gardner wrote: >Hi Phillip, > >>There's one other advantage: this format will hopefully become as >>successful as WSGI itself in adoption by servers and applications. >>Hopefully within a year or so, *the* normal way to deploy a Python web >>app will be using a .wsgi file. >> >>Beyond that, we can hopefully begin to see "Python" rather than >>"framework X" as being what people write their web apps with. > >Well that would be absolutely wonderful but also looking fairly likely >which is great news. I've got to say a massive thank you for the eggs >format and easy install as well.. Python was really crying out for it and >it will be phenomenally useful. I've split all my code up as a result >because there is no need to worry about people having to install lots of >packages if it is all done automatically. > >One thought: I'd ideally like to be able to backup a WSGI deployment to >allow it to be easily redeployed on another server with a different >configuration or just restored in the event of data loss. This would >probably just involve making a zip file of all data files (including an >SQL dump) and then redistributing it with the .wsgi file. Have you had any >thoughts on how that could be achieved or is that something you wouldn't >want the .wsgi file to be used for? Whatever software installed the >dependencies of the .wsgi file would need to be aware of the data file and >what to do with it, perhaps simply by calling an install handler >somewhere? Umm, all getting a bit complicated but I just wondered if you >had had any thoughts of that nature? Well, you could define another set of entry point groups, like "wsgi.middleware.backup_handlers", which would contain entry points with the same names as in middleware, that would get called with the same configuration arguments as the application factories, but would then do some kind of backing up. Similarly, you could have an entry point group for restoration functions. These would have to defined by the same egg as the one with the factory, of course, although perhaps we could make the entry point names be the entry point targets instead of using the original entry point names. That additional level of indirection would let one egg define backup and restore services for another's factories. Perhaps the backup functions would return the name of a directory tree to back up, and the restore functions would receive some kind of zipfile or archive. Obviously that's a totally unbaked idea that would need a fair amount of thought, but there's nothing stopping anybody from fleshing it out as a tool and publishing a spec for the entry points it uses. 
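Purely as an illustration of that unbaked idea (none of these group, package, or function names exist anywhere), an egg might pair each factory with a backup handler by reusing the entry point name in a parallel group:

from setuptools import setup

setup(
    name="blogapp",
    version="1.0",
    packages=["blogapp"],
    entry_points={
        "wsgi.app": [
            "blog = blogapp.factory:make_app",
        ],
        # hypothetical parallel group: same entry point name as the factory
        "wsgi.app.backup_handlers": [
            "blog = blogapp.maintenance:backup",
        ],
    },
)

A deployment tool could then look up "blog" in the backup_handlers group, call it with the same configuration arguments as the factory, and archive whatever directory tree it returns.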
>Oh sorry, another quick question: Is there any work underway auto-document >eggs using some of the docutils code if an appropriate specifier is made >in the egg_info file saying the egg is in restructured text or similar? >Would that be something you would be willing to include as a feature of >easy_install or is it a bit too messy? I'd love to be able to distribute a >.wsgi file and have all the documentation for the downloaded modules auto >created. If only some of the modules supported it it would still be quite >handy. I'm having a little trouble envisioning what you mean exactly. All that's really coming across is the idea that "there's some way to generate documentation from eggs". I'd certainly like to be able to see tools like epydoc or pydoc support generating documentation for an egg. However, there's a fair amount of balkanization in how you specify inputs for Python documentation tools, not unlike the previous balkanization of web servers and web apps/frameworks. Maybe somebody will come up with a similar lingua franca for documentation tools. With respect to adding more cool features to setup(), I plan to add a couple of entry point groups to setuptools that would support what you have in mind, though. There's already a distutils.commands group that allows you to register setup commands, but I also plan to add egg_info.writers and distutils.setup_args. The setup_args entry points would have the name of a parameter you'd like setup() to have, and would be a function that would get called on the Distribution object during its finalize_options() (so you can validate the argument). The egg_info.writers group will define entry points for writing metadata files as part of the egg_info command. Last, but not least, I'll add a 'setup_requires' argument to setup() that will specify eggs that need to be present for the setup script to run. With these three things in place, tools like the build_utils or py2exe and py2app won't have to monkeypatch the distutils in order to install themselves; they can instead just define entry points for setup() args and the new commands they add. And for your documentation concept, this could include document-specification arguments and an egg_info.writers entry point to put it in the EGG-INFO. Packages using the arguments would have to use 'setup_requires', however, to list the eggs needed to process those arguments. My idea for stuff like this was mainly to support frameworks; for example if an application needs plugin metadata other than entry points, it can define an egg that extends setuptools with the necessary setup arguments and metadata writers. Then, when you're building a plugin for the tool, you just setup_requires=["AppSetup"], where "AppSetup" is the egg with the setuptools extensions for "App". (Most apps will want their setuptools extensions in a separate egg, because the app itself may need the same extensions in order to be built, which would lead to a hairy chicken-and-egg problem. setuptools itself was a little tricky to bootstrap since it finds its own commands via entry points now!) >Thanks for the answers anyway, the whole thing looks tremendously exciting! That's because it is. :) From pje at telecommunity.com Tue Jul 26 02:15:05 2005 From: pje at telecommunity.com (Phillip J. 
Eby)
Date: Mon, 25 Jul 2005 20:15:05 -0400
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <42E5785E.1040900@colorstudy.com>
Message-ID: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com>

At 06:40 PM 7/25/2005 -0500, Ian Bicking wrote:
>But configuration and composition of multiple independent applications
>into a single process isn't. I don't think we can solve these
>separately, because the Hard Problem is how to handle configuration
>alongside composition. How can I apply configuration to a set of
>applications? How can I make exceptions? How can an application
>consume configuration as well as delegate configuration to a
>subapplication? The pipeline is often more like a tree, so the logic is
>a little complex. Or, rather, there's actual *logic* in how
>configuration is applied, almost all of which are viable.

We probably need something like a "site map" configuration, that can handle tree structure, and can specify pipelines on a per-location basis, including the ability to specify pipeline components to be applied above everything under a certain URL pattern. This is more or less the same as my "container API" concept, but we are a little closer to being able to think about such a thing.

Of course, I still think it's something that can be added *after* having a basic deployment spec.

>I can figure out a bunch of ad hoc and formal ways of accomplishing this
>in Paste; most of it is already possible, and entry points alone clean
>up a lot of what's there (encouraging a separation between how an
>application is invoked generally, and install-specific configuration).
>But with a more limited and declarative configuration it is harder.

But the tradeoff is greater ability to build tools that operate on the configuration to do something -- like James Gardner's ideas about backup/restore and documentation tools.

>Also when configuration is pushed into factories as keyword arguments,
>instead of being pulled out of a dictionary, it is much harder -- the
>configuration becomes unhackable.

But a **kw argument *is* a dictionary, so I don't understand what you mean here.

From renesd at gmail.com  Tue Jul 26 02:34:10 2005
From: renesd at gmail.com (Rene Dudfield)
Date: Tue, 26 Jul 2005 10:34:10 +1000
Subject: [Web-SIG] file system configuration.
Message-ID: <64ddb72c050725173435527944@mail.gmail.com>

What about Apache-style configuration that uses the file system? It works quite well, and can be understood by all those people using Apache already. You can have the main configuration done wherever, but allowing people to add in specific configuration at any part of the URL by simply adding a .htaccess file can make doing things really easy.

E.g. here is a basic website URL structure:

/
/admin
/images

I drop a config file into / to implement the website. Now I drop a config file into /admin which uses some sort of auth scheme. I can drop a .htaccess into /images to do:

1. gallery application, to make viewing of the images by thumbnails easier.
   - configure the gallery application (thumbnail size, etc.)
2. do not allow linking to images from other sites (by checking referer tags).

By making another directory inside the directory structure you create another path which can be configured by these config files. If the top-level application made a members/ URL, and you want to add auth to it, you make a members/ directory and edit the config files in it. 
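A minimal sketch of how such per-directory configuration might be collected for a request path (the ".wsgiaccess" filename and the collect_config function are invented for illustration; nothing like this is specified anywhere):

import os
from ConfigParser import ConfigParser  # "configparser" in later Pythons

def collect_config(docroot, path_info, filename='.wsgiaccess'):
    # Walk from the document root down to the requested path, merging any
    # per-directory config files found along the way; settings in deeper
    # directories override shallower ones, .htaccess-style.
    merged = {}
    current = docroot
    for part in [''] + [p for p in path_info.split('/') if p]:
        current = os.path.join(current, part)
        candidate = os.path.join(current, filename)
        if os.path.isfile(candidate):
            parser = ConfigParser()
            parser.read(candidate)
            for section in parser.sections():
                for key, value in parser.items(section):
                    merged['%s.%s' % (section, key)] = value
    return merged

For example, collect_config('/var/www/site', '/admin/users') would merge config files found in /var/www/site, /var/www/site/admin and /var/www/site/admin/users, in that order.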
From ianb at colorstudy.com Tue Jul 26 03:29:34 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 25 Jul 2005 20:29:34 -0500 Subject: [Web-SIG] WSGI deployment use case In-Reply-To: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> Message-ID: <42E591FE.5040209@colorstudy.com> Phillip J. Eby wrote: > At 06:40 PM 7/25/2005 -0500, Ian Bicking wrote: > >> But configuration and composition of multiple independent applications >> into a single process isn't. I don't think we can solve these >> separately, because the Hard Problem is how to handle configuration >> alongside composition. How can I apply configuration to a set of >> applications? How can I make exceptions? How can an application >> consume configuration as well as delegate configuration to a >> subapplication? The pipeline is often more like a tree, so the logic is >> a little complex. Or, rather, there's actual *logic* in how >> configuration is applied, almost all of which are viable. > > > We probably need something like a "site map" configuration, that can > handle tree structure, and can specify pipelines on a per location > basis, including the ability to specify pipeline components to be > applied above everything under a certain URL pattern. This is more or > less the same as my "container API" concept, but we are a little closer > to being able to think about such a thing. It could also be something based on general matching rules, with some notion of precedence and how the rule effects SCRIPT_NAME/PATH_INFO. Or something like that. > Of course, I still think it's something that can be added *after* having > a basic deployment spec. I feel a very strong need that this be resolved before settling on anything deployment related. Not necessarily as a standard, but possibly as a set of practices. Even a realistic and concrete use case might be enough. >> I can figure out a bunch of ad hoc and formal ways of accomplishing this >> in Paste; most of it is already possible, and entry points alone clean >> up a lot of what's there (encouraging a separation between how an >> application is invoked generally, and install-specific configuration). >> But with a more limited and declarative configuration it is harder. > > > But the tradeoff is greater ability to build tools that operate on the > configuration to do something -- like James Gardner's ideas about > backup/restore and documentation tools. I can see that. But I know my way works, which is a bit of a bonus. And really it's entirely possible to inspect it as well. >> Also when configuration is pushed into factories as keyword arguments, >> instead of being pulled out of a dictionary, it is much harder -- the >> configuration becomes unhackable. > > > But a **kw argument *is* a dictionary, so I don't understand what you > mean here. It's about how configuration is delegated to contained applications and middleware, and what's the expectation of what that configuration looks like. I think components that don't take **kw will be hard to work with. Right now Paste hands around a fairly flat dictionary. This dictionary is passed around in full (as part of the WSGI environment) to every piece of middleware, and actually to everything (via an import and threadlocal storage). It gets used all over the place, and the ability to draw in configuration without passing it around is very important. 
I know it seems like heavy coupling, but in practice it causes unstable APIs if it is passed around explicitly, and as long as you keep clever dynamic values out of the configuration it isn't a problem. Anyway, every piece gets the full dictionary, so if any piece expected a constrained set of keys it would break. Even ignoring that there are multiple consumers with different keys that they pull out, it is common to create intermediate configuration values to make the configuration more abstract. E.g., I set a "base_dir", then derive "publish_dir" and "template_dir" from that. Apache configuration is a good anti-example here; its lack of variables hurts me daily. While some variables could be declared "abstract" somehow, that adds complexity where the unconstrained model avoids that complexity. When one piece delegates to another, it passes the entire dictionary through (by convention, and by the fact it gets passed around implicitly). It is certainly possible in some circumstances that a filtered version of the configuration should be passed in; that hasn't happened to me yet, but I can certainly imagine it being necessary (especially when a larger amount of more diverse software is running in the same process). One downside of this is that there's no protection from name conflicts. Though name conflicts can go both ways. The Happy Coincidence is when two pieces use the same name for the same purpose (e.g., it's highly likely "smtp_server" would be the subject of a Happy Coincidence). An Unhappy Coincidence is when two pieces use the same value for different purposes ("publish_dir" perhaps). An Expected Coincidence is when the same code, invoked in two separate call stacks, consumes the same value. Of course, I allow configuration to be overwritten depending on the request, so high collision names (like publish_dir) in practice are unlikely to be a problem. The upside over anything that expects structure in the configuration (e.g., that configuration be targetted at a specific component) is that I can hide implementation. This is extremely important to me, because I have lots of pieces. Some of them are clearly different components from the inside, some are vague and the distinction would be based entirely on my mood. For instance an application-specific middleware that could plausibly be used more widely -- does it consume the application configuration, or does it take its own configuration? But even excluding those ambiguous situations, the way my middleware is factored is an internal implementation detail, and I don't feel comfortable pushing that structure into the configuration. So that's the issue I'm concerned about. -- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Tue Jul 26 04:04:30 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 25 Jul 2005 22:04:30 -0400 Subject: [Web-SIG] WSGI deployment use case In-Reply-To: <42E591FE.5040209@colorstudy.com> References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com> At 08:29 PM 7/25/2005 -0500, Ian Bicking wrote: >Right now Paste hands around a fairly flat dictionary. This dictionary is >passed around in full (as part of the WSGI environment) to every piece of >middleware, and actually to everything (via an import and threadlocal >storage). 
It gets used all over the place, and the ability to draw in >configuration without passing it around is very important. I know it >seems like heavy coupling, but in practice it causes unstable APIs if it >is passed around explicitly, and as long as you keep clever dynamic values >out of the configuration it isn't a problem. > >Anyway, every piece gets the full dictionary, so if any piece expected a >constrained set of keys it would break. Even ignoring that there are >multiple consumers with different keys that they pull out, it is common to >create intermediate configuration values to make the configuration more >abstract. E.g., I set a "base_dir", then derive "publish_dir" and >"template_dir" from that. Apache configuration is a good anti-example >here; its lack of variables hurts me daily. While some variables could be >declared "abstract" somehow, that adds complexity where the unconstrained >model avoids that complexity. *shudder* I think someone just walked over my grave. ;) I'd rather add complexity to the deployment format (e.g. variables, interpolation, etc.) to handle this sort of thing than add complexity to the components. I also find it hard to understand why e.g. multiple components would need the same "template_dir". Why isn't there a template service component, for example? >When one piece delegates to another, it passes the entire dictionary >through (by convention, and by the fact it gets passed around >implicitly). It is certainly possible in some circumstances that a >filtered version of the configuration should be passed in; that hasn't >happened to me yet, but I can certainly imagine it being necessary >(especially when a larger amount of more diverse software is running in >the same process). > >One downside of this is that there's no protection from name >conflicts. Though name conflicts can go both ways. The Happy Coincidence >is when two pieces use the same name for the same purpose (e.g., it's >highly likely "smtp_server" would be the subject of a Happy >Coincidence). An Unhappy Coincidence is when two pieces use the same >value for different purposes ("publish_dir" perhaps). An Expected >Coincidence is when the same code, invoked in two separate call stacks, >consumes the same value. Of course, I allow configuration to be >overwritten depending on the request, so high collision names (like >publish_dir) in practice are unlikely to be a problem. I think you've just explained why this approach doesn't scale very well, even to a large team, let alone to inter-organization collaboration (i.e. open source projects). > For instance an application-specific middleware that could plausibly be > used more widely -- does it consume the application configuration, or > does it take its own configuration? But even excluding those ambiguous > situations, the way my middleware is factored is an internal > implementation detail, and I don't feel comfortable pushing that > structure into the configuration. That's what encapsulation is for. Just create a factory that takes a set of application-level parameters (like template_dir, publish_dir, etc.) and then *passes* them to the lower level components. Heck, we could even add that to the .wsgi format... # app template file [WSGI options] parameters = "template_dir", "publish_dir", ... [filter1 from foo] some_param = template_dir [filter2 from bar] other_param = publish_dir # deployment file [use file "app_template.wsgi"] template_dir = "/some/where" publish_dir = "/another/place" >So that's the issue I'm concerned about. 
I think the right way to fix it is parameterization; that way you don't push a global (and non type-checkable) namespace down into each component. Components should have an extremely minimal configuration with fairly specific parameters, because it makes early error checking easier, and you don't have to search all over the place to find how a parameter is used, etc., etc. From chrism at plope.com Tue Jul 26 04:11:00 2005 From: chrism at plope.com (Chris McDonough) Date: Mon, 25 Jul 2005 22:11:00 -0400 Subject: [Web-SIG] WSGI deployment use case In-Reply-To: <42E591FE.5040209@colorstudy.com> References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <42E591FE.5040209@colorstudy.com> Message-ID: <1122343861.3898.91.camel@plope.dyndns.org> On Mon, 2005-07-25 at 20:29 -0500, Ian Bicking wrote: > > We probably need something like a "site map" configuration, that can > > handle tree structure, and can specify pipelines on a per location > > basis, including the ability to specify pipeline components to be > > applied above everything under a certain URL pattern. This is more or > > less the same as my "container API" concept, but we are a little closer > > to being able to think about such a thing. > > It could also be something based on general matching rules, with some > notion of precedence and how the rule effects SCRIPT_NAME/PATH_INFO. Or > something like that. How much of this could be solved by using a web server's directory/alias-mapping facility? For instance, if you needed a single Apache webserver to support multiple pipelines based on URL mapping, wouldn't it be possible in many cases to compose that out of things like rewrite rules and script aliases (the below assumes running them just as CGI scripts, obviously it would be different with something using mod_python or what-have-you): ServerAdmin webmaster at plope.com ServerName plope.com ServerAlias plope.com ScriptAlias /viewcvs "/home/chrism/viewcvs.wsgi" ScriptAlias /blog "/home/chrism/blog.wsgi" RewriteEngine On RewriteRule ^/[^/]viewcvs*$ /home/chrism/viewcvs.wsgi [PT] RewriteRule ^/[^/]blog*$ /home/chrism/blog.wsgi [PT] Obviously it would mean some repetition in "wsgi" files if you needed to repeat parts of a pipeline for each URL mapping. But it does mean we wouldn't need to invent more software. > > > Of course, I still think it's something that can be added *after* having > > a basic deployment spec. > > I feel a very strong need that this be resolved before settling on > anything deployment related. Not necessarily as a standard, but > possibly as a set of practices. Even a realistic and concrete use case > might be enough. I *think* more complicated use cases may revolve around attempting to use middleware as services that dynamize the pipeline instead of as "oblivious" things. I don't think there's anything really wrong with that but I also don't think it can ever be specified with as much clarity as what we've already got because IMHO it's a programming task. I'm repeating myself, I'm sure, but I'm more apt to put a "service manager" piece of middleware in the pipeline (or maybe just implement it as a library) which would allow my endpoint app to use it to do sessioning and auth and whatnot. I realize that is essentially "building a framework" (which is reviled lately) but since the endpoint app needs to collaborate anyway, I don't see a better way to do it except to rely completely on convention for service lookup (which is what you seem to be struggling with in the later bits of your post). 
- C From ianb at colorstudy.com Tue Jul 26 04:54:01 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 25 Jul 2005 21:54:01 -0500 Subject: [Web-SIG] WSGI deployment use case In-Reply-To: <5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com> References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com> Message-ID: <42E5A5C9.2050408@colorstudy.com> Phillip J. Eby wrote: > At 08:29 PM 7/25/2005 -0500, Ian Bicking wrote: > >> Right now Paste hands around a fairly flat dictionary. This >> dictionary is passed around in full (as part of the WSGI environment) >> to every piece of middleware, and actually to everything (via an >> import and threadlocal storage). It gets used all over the place, and >> the ability to draw in configuration without passing it around is very >> important. I know it seems like heavy coupling, but in practice it >> causes unstable APIs if it is passed around explicitly, and as long as >> you keep clever dynamic values out of the configuration it isn't a >> problem. >> >> Anyway, every piece gets the full dictionary, so if any piece expected >> a constrained set of keys it would break. Even ignoring that there >> are multiple consumers with different keys that they pull out, it is >> common to create intermediate configuration values to make the >> configuration more abstract. E.g., I set a "base_dir", then derive >> "publish_dir" and "template_dir" from that. Apache configuration is a >> good anti-example here; its lack of variables hurts me daily. While >> some variables could be declared "abstract" somehow, that adds >> complexity where the unconstrained model avoids that complexity. > > > *shudder* I think someone just walked over my grave. ;) > > I'd rather add complexity to the deployment format (e.g. variables, > interpolation, etc.) to handle this sort of thing than add complexity to > the components. I also find it hard to understand why e.g. multiple > components would need the same "template_dir". Why isn't there a > template service component, for example? In that case, no, multiple components are unlikely to usefully share template_dir. But that's not an issue I'm really hitting -- though it does start to add importance to the order in which configuration files are loaded. >> When one piece delegates to another, it passes the entire dictionary >> through (by convention, and by the fact it gets passed around >> implicitly). It is certainly possible in some circumstances that a >> filtered version of the configuration should be passed in; that hasn't >> happened to me yet, but I can certainly imagine it being necessary >> (especially when a larger amount of more diverse software is running >> in the same process). >> >> One downside of this is that there's no protection from name >> conflicts. Though name conflicts can go both ways. The Happy >> Coincidence is when two pieces use the same name for the same purpose >> (e.g., it's highly likely "smtp_server" would be the subject of a >> Happy Coincidence). An Unhappy Coincidence is when two pieces use the >> same value for different purposes ("publish_dir" perhaps). An >> Expected Coincidence is when the same code, invoked in two separate >> call stacks, consumes the same value. Of course, I allow >> configuration to be overwritten depending on the request, so high >> collision names (like publish_dir) in practice are unlikely to be a >> problem. 
> > > I think you've just explained why this approach doesn't scale very well, > even to a large team, let alone to inter-organization collaboration > (i.e. open source projects). I admit there's problems. On the other hand, it's a similar problem as the fact that attributes on objects don't have namespaces. It causes problems, but those problems aren't so bad in practice. If you can offer something where configuration can be applied to a set of components without exposing the internal structure of those components, and without the frontend copying each piece destined for an internal application explicitly, then great. I'm not closed to other ideas, but I'm not happy putting it off either. Back when I started up this WSGI thread, it was about just this issue, so it's one of the things I'm fairly concerned about. Unlike deployment, this issue of configuration touches all of my code. So I'm happier putting off deployment, which though it is suboptimal currently, I suspect my code will be forward-compatible to without great effort. >> For instance an application-specific middleware that could plausibly >> be used more widely -- does it consume the application configuration, >> or does it take its own configuration? But even excluding those >> ambiguous situations, the way my middleware is factored is an internal >> implementation detail, and I don't feel comfortable pushing that >> structure into the configuration. > > > That's what encapsulation is for. Just create a factory that takes a > set of application-level parameters (like template_dir, publish_dir, > etc.) and then *passes* them to the lower level components. > > Heck, we could even add that to the .wsgi format... > > # app template file > [WSGI options] > parameters = "template_dir", "publish_dir", ... > > [filter1 from foo] > some_param = template_dir > > [filter2 from bar] > other_param = publish_dir > > > # deployment file > [use file "app_template.wsgi"] > template_dir = "/some/where" > publish_dir = "/another/place" I'm not clear exactly what you are proposing. Let's use a more realistic example. Components: * Exception catcher. Takes "email_errors", which is a list of addresses to email exceptions to. I want to apply this globally. * An application mounted on /, which takes "document_root" and serves up those files directly. * An application mounted at /blog, takes "database" (a string) where all its information is kept. * An application mounted at /admin. Takes "document_root", which is where the editable files are located. Around it goes two pieces of middleware... * A authentication middleware, which takes "database", which is where user information is kept. And... * An authorization middleware, that takes "allowed_roles", and checks it against what the authentication middleware puts in. How would I configure that? >> So that's the issue I'm concerned about. > > > I think the right way to fix it is parameterization; that way you don't > push a global (and non type-checkable) namespace down into each > component. Components should have an extremely minimal configuration > with fairly specific parameters, because it makes early error checking > easier, and you don't have to search all over the place to find how a > parameter is used, etc., etc. If we define schemas for the configuration that components take, that's fine with me. I don't mind being explicit in the design of the components. 
I just don't want to push all the internal structure into the deployment file, and I don't want changes to the design of a component to affect the design of anything that might wrap that component.

-- 
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org

From ianb at colorstudy.com  Tue Jul 26 05:01:53 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 25 Jul 2005 22:01:53 -0500
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <1122343861.3898.91.camel@plope.dyndns.org>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <42E591FE.5040209@colorstudy.com> <1122343861.3898.91.camel@plope.dyndns.org>
Message-ID: <42E5A7A1.3030004@colorstudy.com>

Chris McDonough wrote:
> How much of this could be solved by using a web server's
> directory/alias-mapping facility?
>
> For instance, if you needed a single Apache webserver to support
> multiple pipelines based on URL mapping, wouldn't it be possible in many
> cases to compose that out of things like rewrite rules and script
> aliases (the below assumes running them just as CGI scripts, obviously
> it would be different with something using mod_python or what-have-you):
>
> ServerAdmin webmaster at plope.com
> ServerName plope.com
> ServerAlias plope.com
> ScriptAlias /viewcvs "/home/chrism/viewcvs.wsgi"
> ScriptAlias /blog "/home/chrism/blog.wsgi"
> RewriteEngine On
> RewriteRule ^/[^/]viewcvs*$ /home/chrism/viewcvs.wsgi [PT]
> RewriteRule ^/[^/]blog*$ /home/chrism/blog.wsgi [PT]
>
> Obviously it would mean some repetition in "wsgi" files if you needed to
> repeat parts of a pipeline for each URL mapping. But it does mean we
> wouldn't need to invent more software.

No, we already have templating languages to generate those configuration files so it's no problem ;)  Messy configuration files (and RewriteRule for that matter) are my bane.

To be fair, in a shared hosting situation (websites maintained by customers, not the host) this would seem more workable than a centralized configuration. Perhaps... it's not the kind of situation I deal with much anymore, so I've lost touch with that case. And would that mean we'd start seeing ".wsgi" in URLs? Hrm.

-- 
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org

From chrism at plope.com  Tue Jul 26 05:19:28 2005
From: chrism at plope.com (Chris McDonough)
Date: Mon, 25 Jul 2005 23:19:28 -0400
Subject: [Web-SIG] WSGI deployment use case
In-Reply-To: <42E5A7A1.3030004@colorstudy.com>
References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <42E591FE.5040209@colorstudy.com> <1122343861.3898.91.camel@plope.dyndns.org> <42E5A7A1.3030004@colorstudy.com>
Message-ID: <1122347969.3898.99.camel@plope.dyndns.org>

On Mon, 2005-07-25 at 22:01 -0500, Ian Bicking wrote:
> > ServerAdmin webmaster at plope.com
> > ServerName plope.com
> > ServerAlias plope.com
> > ScriptAlias /viewcvs "/home/chrism/viewcvs.wsgi"
> > ScriptAlias /blog "/home/chrism/blog.wsgi"
> > RewriteEngine On
> > RewriteRule ^/[^/]viewcvs*$ /home/chrism/viewcvs.wsgi [PT]
> > RewriteRule ^/[^/]blog*$ /home/chrism/blog.wsgi [PT]
>
> Messy configuration files (and RewriteRule for that matter) are my bane.

I agree. In fact, I stole that snippet from my own server and modified it. It would probably do *something* but to be honest I'm not even sure I remember exactly what. ;-)  But there's always the docs to fall back on... 
> To be fair, in a shared hosting situation (websites maintained by > customers, not the host) this would seem more workable than a > centralized configuration. Perhaps... it's not the kind of situation I > deal with much anymore, so I've lost touch with that case. And would > that mean we'd start seeing ".wsgi" in URLs? Hrm. No, I think I just remembered... that's what the RewriteRules are for! ;-) - C From chrism at plope.com Tue Jul 26 07:09:09 2005 From: chrism at plope.com (Chris McDonough) Date: Tue, 26 Jul 2005 01:09:09 -0400 Subject: [Web-SIG] WSGI deployment use case In-Reply-To: <42E5A5C9.2050408@colorstudy.com> References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com> <42E5A5C9.2050408@colorstudy.com> Message-ID: <1122354549.3898.126.camel@plope.dyndns.org> Just for a frame of reference, I'll say how I might do these things. These all assume I'd use Apache and mod_python, for better or worse: > I'm not clear exactly what you are proposing. Let's use a more > realistic example. Components: > > * Exception catcher. Takes "email_errors", which is a list of addresses > to email exceptions to. I want to apply this globally. I'd likely do this in my endpoint apps (maybe share some sort of library between them to do it). Errors that occur in middleware would be diagnosable/detectable via mod_python's error logging facility and something like snort. > * An application mounted on /, which takes "document_root" and serves up > those files directly. Use the webserver. > * An application mounted at /blog, takes "database" (a string) where all > its information is kept. Separate WSGI pipeline descriptor with rewrite rules or whatever aliasing "/blog" to it. > * An application mounted at /admin. Takes "document_root", which is > where the editable files are located. Around it goes two pieces of > middleware... Same as above... > * A authentication middleware, which takes "database", which is where > user information is kept. And... I'd probably make this into a service that would be consumable by applications with a completely separate configuration outside of any deployment spec. For example, I might try to pull Zope's "Pluggable Authentication Utility" out of Zope 3, leaving intact its configurability through ZCML. But if I did put it in middleware, I'd put it in each of my application pipelines (implied by /blog, /admin) in an appropriate place. > * An authorization middleware, that takes "allowed_roles", and checks it > against what the authentication middleware puts in. This one I know wouldn't make into middleware. Instead, I'd use a library much like the thing I proposed as "decsec" (although at the time I wrote that proposal, I did think it would be middleware; I changed my mind). - C From ianb at colorstudy.com Tue Jul 26 08:18:40 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 26 Jul 2005 01:18:40 -0500 Subject: [Web-SIG] WSGI deployment use case In-Reply-To: <1122354549.3898.126.camel@plope.dyndns.org> References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com> <42E5A5C9.2050408@colorstudy.com> <1122354549.3898.126.camel@plope.dyndns.org> Message-ID: <42E5D5C0.9080102@colorstudy.com> Well, the stack is really just an example, meant to be more realistic than "sample1" and "sample2". 
I actually think it's a very reasonable example, but that's not really the point. Presuming this stack, how would you configure it? Chris McDonough wrote: > Just for a frame of reference, I'll say how I might do these things. > These all assume I'd use Apache and mod_python, for better or worse: > > >>I'm not clear exactly what you are proposing. Let's use a more >>realistic example. Components: >> >>* Exception catcher. Takes "email_errors", which is a list of addresses >>to email exceptions to. I want to apply this globally. > > > I'd likely do this in my endpoint apps (maybe share some sort of library > between them to do it). Errors that occur in middleware would be > diagnosable/detectable via mod_python's error logging facility and > something like snort. > > >>* An application mounted on /, which takes "document_root" and serves up >>those files directly. > > > Use the webserver. > > >>* An application mounted at /blog, takes "database" (a string) where all >>its information is kept. > > > Separate WSGI pipeline descriptor with rewrite rules or whatever > aliasing "/blog" to it. > > >>* An application mounted at /admin. Takes "document_root", which is >>where the editable files are located. Around it goes two pieces of >>middleware... > > > Same as above... > > >>* A authentication middleware, which takes "database", which is where >>user information is kept. And... > > > I'd probably make this into a service that would be consumable by > applications with a completely separate configuration outside of any > deployment spec. For example, I might try to pull Zope's "Pluggable > Authentication Utility" out of Zope 3, leaving intact its > configurability through ZCML. > > But if I did put it in middleware, I'd put it in each of my application > pipelines (implied by /blog, /admin) in an appropriate place. > > >>* An authorization middleware, that takes "allowed_roles", and checks it >>against what the authentication middleware puts in. > > > This one I know wouldn't make into middleware. Instead, I'd use a > library much like the thing I proposed as "decsec" (although at the time > I wrote that proposal, I did think it would be middleware; I changed my > mind). From chrism at plope.com Tue Jul 26 09:55:27 2005 From: chrism at plope.com (Chris McDonough) Date: Tue, 26 Jul 2005 03:55:27 -0400 Subject: [Web-SIG] WSGI deployment use case In-Reply-To: <42E5D5C0.9080102@colorstudy.com> References: <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <5.1.1.6.0.20050725200735.02806d98@mail.telecommunity.com> <5.1.1.6.0.20050725214457.026ec148@mail.telecommunity.com> <42E5A5C9.2050408@colorstudy.com> <1122354549.3898.126.camel@plope.dyndns.org> <42E5D5C0.9080102@colorstudy.com> Message-ID: <1122364528.3898.148.camel@plope.dyndns.org> On Tue, 2005-07-26 at 01:18 -0500, Ian Bicking wrote: > Well, the stack is really just an example, meant to be more realistic > than "sample1" and "sample2". I actually think it's a very reasonable > example, but that's not really the point. Presuming this stack, how > would you configure it? I typically roll out software to clients using a build mechanism (I happens to use "pymake" at http://www.plope.com/software/pymake/ but anything dependency-based works). I write "generic" build scripts for all of the software components. For example, I might write makefiles that check out and build python, openldap, mysql and so on (each into a "non-system" location). 
I leave a bit of room for customization in their build definitions that I can override from within a "profile". A "profile" is a set of customized software builds for a specific purpose. I might have, maybe, 3 different profiles for each customer, where the profile usually works out to be tied to machine function (load balancer, app server, database server). I maintain these build scripts and the profiles in CVS for each customer. I never install anything by hand; I always change the buildout and rerun it if I need to get something set up.

This usually works out pretty well because to roll out a new major version of software, I just rerun the build scripts for a particular profile and move the data over. Usually the only things that need to change frequently are a few bits of software that are checked out of version control, so doing "cvs up" on those bits typically gets me where I need to be unless it's a major revision.

So in this case, I'd likely write a build that either built Apache from source or at least created an "httpd-includes" file meant to be referenced from within the "system" Apache config file with the proper stuff in it given the profile's purpose. The build would also download and install Python, it would get the proper eggs and/or Python software and the database, and so forth. All the configuration would be done via the "profile", which is in version control.

I don't know if this kind of thing works for everybody, but it has worked well for me so far. I do this all the time, and I have a good library of buildout scripts already, so it's less painful for me than it might be for someone who is starting from scratch. That said, it is time-consuming and imperfect... upgrades are the most painful. New installs are simple, though.

So, anyway, the short answer is "I write a script to do the config for me so I can repeat it on demand".

- C

From ianb at colorstudy.com Thu Jul 28 18:09:40 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 28 Jul 2005 11:09:40 -0500
Subject: [Web-SIG] JS libs: MochiKit
Message-ID: <42E90344.1080300@colorstudy.com>

Since there was a bunch of interest in Javascript libraries here before, for anyone who hasn't seen it I thought I'd note MochiKit:

http://mochikit.com/

It's a fairly recent entrant from our own Bob Ippolito (http://bob.pythonmac.org/). Tests, docs, and a bit of the flavor of Python. Just a bit, really -- there's only so much you can do in Javascript.

--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org

From brenocon at gmail.com Thu Jul 28 19:22:09 2005
From: brenocon at gmail.com (Brendan O'Connor)
Date: Thu, 28 Jul 2005 10:22:09 -0700
Subject: [Web-SIG] JS libs: MochiKit
In-Reply-To: <42E90344.1080300@colorstudy.com>
References: <42E90344.1080300@colorstudy.com>
Message-ID:

here's another: http://prototype.conio.net/

not as much documentation, not from a python-er, but I've heard it's pretty useful.

Brendan

On Thu, 28 Jul 2005 09:09:40 -0700, Ian Bicking wrote:
> Since there was a bunch of interest in Javascript libraries here before,
> for anyone who hasn't seen it I thought I'd note MochiKit:
>
> http://mochikit.com/
>
> It's a fairly recent entrant from our own Bob Ippolito
> (http://bob.pythonmac.org/). Tests, docs, and a bit of the flavor of
> Python. Just a bit, really -- there's only so much you can do in
> Javascript.
> -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ From dangoor at gmail.com Thu Jul 28 19:27:59 2005 From: dangoor at gmail.com (Kevin Dangoor) Date: Thu, 28 Jul 2005 13:27:59 -0400 Subject: [Web-SIG] JS libs: MochiKit In-Reply-To: References: <42E90344.1080300@colorstudy.com> Message-ID: <3f085ecd0507281027289fce8@mail.gmail.com> Prototype does its business by mucking with Object.prototype, which many people think is a no-no (it breaks certain things that you might reasonably expect to work). Prototype has gained a fair bit of acceptance because of Rails. So much so that MochiKit even includes the $(elementid) shorthand for document.getElementById(elementid) that Prototype has popularized. If no one beats me to it, I'm hoping to port some of the visual goodies from script.aculo.us and Rico, both of which are based on Prototype. Kevin On 7/28/05, Brendan O'Connor wrote: > here's another: http://prototype.conio.net/ > > not as much documentation, not from a python-er, but I've heard it's > pretty useful. From jonathan at carnageblender.com Thu Jul 28 19:30:42 2005 From: jonathan at carnageblender.com (Jonathan Ellis) Date: Thu, 28 Jul 2005 10:30:42 -0700 Subject: [Web-SIG] JS libs: MochiKit In-Reply-To: References: <42E90344.1080300@colorstudy.com> Message-ID: <1122571842.19870.239456015@webmail.messagingengine.com> On Thu, 28 Jul 2005 10:22:09 -0700, "Brendan O'Connor" said: > here's another: http://prototype.conio.net/ > > not as much documentation, not from a python-er, but I've heard it's > pretty useful. Heh. I don't think Bob would appreciate mochikit being mentioned in the same breath as prototype. :) http://bob.pythonmac.org/archives/2005/07/01/javascript-frameworks/ -Jonathan From ianb at colorstudy.com Fri Jul 29 00:40:04 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 28 Jul 2005 17:40:04 -0500 Subject: [Web-SIG] WSGI deployment: an experiment Message-ID: <42E95EC4.9040906@colorstudy.com> I've created a branch in Paste with a rough experiment in WSGI deployment, declarative but (I think) more general than what's been discussed. The branch is at: http://svn.pythonpaste.org/Paste/branches/wsgi-deployment-experiment/ All the specific modules for this stuff are in wsgi_*; wsgi_deploy.py being the main one. And an application that is runnable with it is at: http://svn.pythonpaste.org/Paste/apps/FileBrowser/trunk/ It's experimental. It's far too bound to ConfigParser. Maybe it's too closely bound to .ini files in general. It doesn't handle multiple files or file references well at all. Actually, not just not well, but just not at all. But I think it's fairly simple and usable as a proof of concept. And here's the deployment file, with some comments added: # This is a special section for the server. Probably it should # just be named "server", but eh. This is for when you use # paste.wsgi_deploy.make_deployment -- you can also create an # application from this file without serving it; it just happens # to be that you can put both application sections and a server # section in the same file without clashing... [server:main] # use: does pkg_resources.load_entry_point(spec, 'type...', name) # you can also use "factory" to avoid eggishness. # servers have a type of wsgi.server_factory00 # applications have a type of wsgi.app_factory00 # filters (aka middleware) have a type of wsgi.filter_factory00 use: Paste wsgiutils port: 8080 host: 127.0.0.1 # "main" is the application that is loaded when this file is # loaded. 
[application: main]
# This is an application factory. The application factory is passed
# app_factory(this_configparser_object, this_section), and returns
# the application. In this case the pipeline factory will use other
# sections in the config file to compose middleware.
use: Paste pipeline
# These each refer to sections; the last item is an application, the
# others are filters.
pipeline: printdebug urlmap

# Here's that filter.
[filter: printdebug]
use: Paste printdebug

# This isn't a filter, even though it dispatches, because it doesn't
# dispatch to a single application.
[application: urlmap]
use: Paste urlmap
# Path like things are used to map to other named applications.
# In this case nothing is mapped to /, so you'll get a 404 unless
# you go to one of these paths. But something could be mapped to /,
# of course.
/home = fb1
/other = fb2

# This is the first real application.
[application: fb1]
use: FileBrowser app

# This is a configuration parameter that is passed to the application.
# The actual passing happens in wsgi_deploy.make_paste_app, which
# is invoked by the 'app' entry point. It uses the paste convention
# of a flat configuration.
browse_path = /home/ianb

# And the same app, but with different configuration. Of course
# the pipeline app could also be used, or whatever. Ideally it
# should be easier to point to other files, not just other sections.
[application: fb2]
use: FileBrowser app

browse_path = /home/rflosi

--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org

From ianb at colorstudy.com Fri Jul 29 02:44:07 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 28 Jul 2005 19:44:07 -0500
Subject: [Web-SIG] WSGI deployment: an experiment
In-Reply-To: <42E95EC4.9040906@colorstudy.com>
References: <42E95EC4.9040906@colorstudy.com>
Message-ID: <42E97BD7.2020807@colorstudy.com>

Ian Bicking wrote:
> It's experimental. It's far too bound to ConfigParser. Maybe it's too
> closely bound to .ini files in general. It doesn't handle multiple
> files or file references well at all. Actually, not just not well, but
> just not at all. But I think it's fairly simple and usable as a proof
> of concept.

I realize what the code really wants is a couple of callbacks into the configuration. Applications should be able to construct other applications, and applications should be able to read the other variables in their section. Where I'm passing around (config_parser_instance, section_name), I should be passing around (config_context, section_data), and config_context would be an object that could build other applications based on name (or filename).

--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org

From renesd at gmail.com Fri Jul 29 03:14:56 2005
From: renesd at gmail.com (Rene Dudfield)
Date: Fri, 29 Jul 2005 11:14:56 +1000
Subject: [Web-SIG] WSGI deployment: an experiment
In-Reply-To: <42E95EC4.9040906@colorstudy.com>
References: <42E95EC4.9040906@colorstudy.com>
Message-ID: <64ddb72c05072818144e70ae16@mail.gmail.com>

Hey,

There is a lot of terminology here that would not be understood by some random sys admin coming to have a look at the config file.

Below I pasted it here without the comments. Sometimes it is good to have a look at things without comments to see how readable they are.

Is this config file reusable? Can I place it in a path of other apps, and then it could live in say /app2 instead of at /. Can it not care about the server it is running on?
[server:main] use: Paste wsgiutils port: 8080 host: 127.0.0.1 [application: main] use: Paste pipeline pipeline: printdebug urlmap [filter: printdebug] use: Paste printdebug [application: urlmap] use: Paste urlmap /home = fb1 /other = fb2 [application: fb1] use: FileBrowser app browse_path = /home/ianb [application: fb2] use: FileBrowser app browse_path = /home/rflosi On 7/29/05, Ian Bicking wrote: > I've created a branch in Paste with a rough experiment in WSGI > deployment, declarative but (I think) more general than what's been > discussed. The branch is at: > > http://svn.pythonpaste.org/Paste/branches/wsgi-deployment-experiment/ > > All the specific modules for this stuff are in wsgi_*; wsgi_deploy.py > being the main one. > > And an application that is runnable with it is at: > > http://svn.pythonpaste.org/Paste/apps/FileBrowser/trunk/ > > It's experimental. It's far too bound to ConfigParser. Maybe it's too > closely bound to .ini files in general. It doesn't handle multiple > files or file references well at all. Actually, not just not well, but > just not at all. But I think it's fairly simple and usable as a proof > of concept. > > > And here's the deployment file, with some comments added: > > # This is a special section for the server. Probably it should > # just be named "server", but eh. This is for when you use > # paste.wsgi_deploy.make_deployment -- you can also create an > # application from this file without serving it; it just happens > # to be that you can put both application sections and a server > # section in the same file without clashing... > [server:main] > # use: does pkg_resources.load_entry_point(spec, 'type...', name) > # you can also use "factory" to avoid eggishness. > # servers have a type of wsgi.server_factory00 > # applications have a type of wsgi.app_factory00 > # filters (aka middleware) have a type of wsgi.filter_factory00 > use: Paste wsgiutils > port: 8080 > host: 127.0.0.1 > > # "main" is the application that is loaded when this file is > # loaded. > [application: main] > # This is an application factory. The application factory is passed > # app_factory(this_configparser_object, this_section), and returns > # the application. In this case the pipeline factory will use other > # sections in the config file to compose middleware. > use: Paste pipeline > # These each refer to sections; the last item is an application, the > # others are filters. > pipeline: printdebug urlmap > > # Here's that filter. > [filter: printdebug] > use: Paste printdebug > > # This isn't a filter, even though it dispatches, because it doesn't > # dispatch to a single application. > [application: urlmap] > use: Paste urlmap > # Path like things are used to map to other named applications. > # In this case nothing is mapped to /, so you'll get a 404 unless > # you go to one of these paths. But something could be mapped to /, > # of course. > /home = fb1 > /other = fb2 > > # This is the first real application. > [application: fb1] > use: FileBrowser app > > # This is a configuration parameter that is passed to the application. > # The actual passing happens in wsgi_deploy.make_paste_app, which > # is invoked by the 'app' entry point. It uses the paste convention > # of a flat configuration. > browse_path = /home/ianb > > # And the same app, but with different configuration. Of course > # the pipeline app could also be used, or whatever. Ideally it > # should be easier to point to other files, not just other sections. 
> [application: fb2]
> use: FileBrowser app
>
> browse_path = /home/rflosi
>
>
> --
> Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/renesd%40gmail.com
>

From ianb at colorstudy.com Fri Jul 29 06:12:03 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 28 Jul 2005 23:12:03 -0500
Subject: [Web-SIG] WSGI deployment: an experiment
In-Reply-To: <64ddb72c05072818144e70ae16@mail.gmail.com>
References: <42E95EC4.9040906@colorstudy.com>
	<64ddb72c05072818144e70ae16@mail.gmail.com>
Message-ID: <2c144e0434fdb717f97f0cba0ea1c210@colorstudy.com>

On Jul 28, 2005, at 8:14 PM, Rene Dudfield wrote:
> There is a lot of terminology here that would not be understood by
> some random sys admin coming to have a look at the config file.

Yeah... I don't know. I suppose if it looked like Apache it would feel more natural. The "use" stuff is, IMHO, as simple as it can be made. The application configuration ("browse_path") is pretty much free-form. So something like urlmap could have been like:

    Application FileBrowser app

It's more special-case than I like, but maybe that's okay. This would imply something ZConfig-based. But still, there's no magic bullet for configuration, there's always something new to figure out, so IMHO the usability is more about error handling and the like.

> Below I pasted it here without the comments. Sometimes it is good to
> have a look at things without comments to see how readable they are.
>
> Is this config file reusable? Can I place it in a path of other apps,
> and then it could live in say /app2 instead of at / Can it not care
> about the server it is running on?

The application that this configuration file describes can be mounted anywhere; so you could reference it from another configuration file which put the whole batch at /app2. What it doesn't do yet (but wouldn't be hard) would be something like:

[application: urlmap]
use: Paste urlmap
/webmail: config_file.ini

Then that configuration file could in turn have another urlmap entry, dispatching to yet more applications. urlmap incidentally supports virtual hosts as well as path dispatching, so you could do:

http://foobar.com = foobar.ini

Well... that's where .ini syntax fails (the ":"), but we'll ignore that...

The server doesn't matter. It just happens that the server and application configuration live in differently-named sections which don't clash, so they can go in the same configuration file. You could have the files separate just as easily. Not that my script supports that, but since it's a three-line frontend at this point...
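Incidentally, there's nothing exotic about the path dispatching that urlmap does, at the WSGI level anyway. A minimal sketch of prefix dispatch -- purely an illustration, not Paste's actual urlmap code, and ignoring the virtual host side of it:

    def make_urlmap(mounts, default_app):
        # "mounts" maps path prefixes to WSGI applications, e.g.
        # {'/home': fb1_app, '/other': fb2_app}; longer prefixes win.
        prefixes = sorted(mounts, key=len, reverse=True)

        def urlmap_app(environ, start_response):
            path = environ.get('PATH_INFO', '')
            for prefix in prefixes:
                if path == prefix or path.startswith(prefix + '/'):
                    # Shift the matched prefix from PATH_INFO onto
                    # SCRIPT_NAME so the mounted application sees URLs
                    # relative to its mount point.
                    environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                    environ['PATH_INFO'] = path[len(prefix):]
                    return mounts[prefix](environ, start_response)
            return default_app(environ, start_response)

        return urlmap_app

Mounting the whole file at /app2 is then just a matter of an outer dispatcher doing the same SCRIPT_NAME/PATH_INFO shuffle before handing the request to the application this file describes.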
>
> [server:main]
> use: Paste wsgiutils
> port: 8080
> host: 127.0.0.1
>
> [application: main]
> use: Paste pipeline
> pipeline: printdebug urlmap
>
> [filter: printdebug]
> use: Paste printdebug
>
> [application: urlmap]
> use: Paste urlmap
> /home = fb1
> /other = fb2
>
> [application: fb1]
> use: FileBrowser app
> browse_path = /home/ianb
>
> [application: fb2]
> use: FileBrowser app
> browse_path = /home/rflosi

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From ianb at colorstudy.com Fri Jul 29 18:22:42 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 29 Jul 2005 11:22:42 -0500
Subject: [Web-SIG] WSGI deployment: an experiment
In-Reply-To: <42E95EC4.9040906@colorstudy.com>
References: <42E95EC4.9040906@colorstudy.com>
Message-ID: <42EA57D2.1060902@colorstudy.com>

Ian Bicking wrote:
> I've created a branch in Paste with a rough experiment in WSGI
> deployment, declarative but (I think) more general than what's been
> discussed. The branch is at:
>
> http://svn.pythonpaste.org/Paste/branches/wsgi-deployment-experiment/

I've updated the implementation, taking ConfigParser out of the public interface, and cleaning things up some. The config files stay the same (though now you can reference external files with file:, where you would have referenced other sections), but since the Python side is cleaned up here's an example of how the pipeline construct is implemented:

    def make_pipeline(context):
        pipeline = context.app_config.get('pipeline', '').split()
        filters = pipeline[:-1]
        filters.reverse()
        app_name = pipeline[-1]
        deploy = context.deployment_config
        app = deploy.make_app(app_name)
        for filter_name in filters:
            wsgi_filter = deploy.make_filter(filter_name)
            app = wsgi_filter(app)
        return app

The context object has a reference both to the local configuration values (context.app_config), and the larger configuration file (context.deployment_config).

--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org