From graham.dumpleton at gmail.com Thu May 24 06:43:36 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 24 May 2007 14:43:36 +1000
Subject: [Web-SIG] Internal redirect using Location with absolute path and status of 200.
Message-ID: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com>

The CGI specification allows for a CGI script to return a 'Location' header which refers to a location within the local web server. Quoting the RFC:

"""If the Location value is a path, then the server will generate the response that it would have produced in response to a request containing the URL"""

In Apache this is honoured when the Status returned by the CGI script is also 200. The end result is that rather than sending a redirect back to the web client, Apache will trigger a new sub request against the path (as a GET request) and return the result of that to the web client.

Although the WSGI specification doesn't mention any requirement for a WSGI adapter for a web server to do the same thing, it may be an interesting thing to consider for a future version of the WSGI specification.

Does anyone know of any WSGI adapter which currently implements this, besides CGI/WSGI adapters which by virtue of the CGI specification should implement it?

Has anyone used this convention internal to a WSGI stack as a means of performing local redirection, thereby avoiding forcing the client to do the redirect?

Does anyone think this would be a nice extension for a WSGI adapter written against the current specification to implement, even if not necessarily portable?

Graham

From ianb at colorstudy.com Thu May 24 08:19:21 2007
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 24 May 2007 01:19:21 -0500
Subject: [Web-SIG] Internal redirect using Location with absolute path and status of 200.
In-Reply-To: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com>
References: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com>
Message-ID: <46552E69.8000807@colorstudy.com>

Graham Dumpleton wrote:
> Does anyone know of any WSGI adapter which currently implements this,
> besides CGI/WSGI adapters which by virtue of the CGI specification should
> implement it?

Not that I know of.

> Has anyone used this convention internal to a WSGI stack as a means of
> performing local redirection, thereby avoiding forcing the client to
> do the redirect?

Nope. I use a hook to the top-most parent application, and then call into that manually.

> Does anyone think this would be a nice extension for a WSGI adapter
> written against the current specification to implement, even if not
> necessarily portable?

Eh. In the context of mod_wsgi, I think it would be more interesting to provide a WSGI application that called back into Apache (basically wrapping Apache's normal subrequest machinery in a WSGI exterior).

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
            | Write code, do good | http://topp.openplans.org/careers

From ianb at colorstudy.com Thu May 24 08:23:12 2007
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 24 May 2007 01:23:12 -0500
Subject: [Web-SIG] Internal redirect using Location with absolute path and status of 200.
In-Reply-To: <46552E69.8000807@colorstudy.com>
References: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com> <46552E69.8000807@colorstudy.com>
Message-ID: <46552F50.9080207@colorstudy.com>

Ian Bicking wrote:
>> Does anyone think this would be a nice extension for a WSGI adapter
>> written against the current specification to implement, even if not
>> necessarily portable?
>
> Eh.

To add to this, I never found the CGI functionality useful. Why would I do that and not a real redirect? If there are links they'll be broken, because the client and the resource won't agree on what the real request URL was.
If the script/app is the consumer, the Location header stuff doesn't work -- you don't get back the response. So it doesn't work for web service style internal requests. If it's something like authentication, that doesn't work either -- you are giving a request path back, which anyone could access, and you haven't added any information to it to specifically permit access (unless there's something in the environment that makes the subrequest clear; I never looked closely enough, nor is it specified).

From graham.dumpleton at gmail.com Thu May 24 08:49:11 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 24 May 2007 16:49:11 +1000
Subject: [Web-SIG] Internal redirect using Location with absolute path and status of 200.
In-Reply-To: <46552E69.8000807@colorstudy.com>
References: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com> <46552E69.8000807@colorstudy.com>
Message-ID: <88e286470705232349i5b0c21cqdd3ff29cb497f2dd@mail.gmail.com>

On 24/05/07, Ian Bicking wrote:
> Graham Dumpleton wrote:
> > Does anyone think this would be a nice extension for a WSGI adapter
> > written against the current specification to implement, even if not
> > necessarily portable?
>
> Eh. In the context of mod_wsgi, I think it would be more interesting to
> provide a WSGI application that called back into Apache (basically
> wrapping Apache's normal subrequest machinery in a WSGI exterior).

I was trying to avoid as much as possible having mod_wsgi provide any sort of hooks which would allow one to perform actions against internals of Apache. I had two reasons for this.

The first reason is that it would only be available when using mod_wsgi, and thus an application written to that interface would not be a portable WSGI application.

The second reason is that I can't provide that feature for both modes that mod_wsgi operates in.
The first mode I speak of here is 'embedded' mode, which is where everything runs in the Apache child processes just like mod_python, and thus has access to all the redirection machinery. The second mode, and the one for which I wouldn't be able to provide this feature, is 'daemon' mode, which is similar to FastCGI/SCGI solutions, where the WSGI request handling is done within the context of a separate process. Although this daemon process is forked from the main Apache process and managed by Apache, it is only capable of handling the specific WSGI request handed off to it from the Apache child process that received the request; it can't do internal redirects or other complicated stuff involving internal Apache APIs.

Because in 'daemon' mode an application can run as a distinct user and not the Apache user, and because it moves the memory bloat of a WSGI application out of the Apache child processes, this mode is likely to be the preferred way of using mod_wsgi. If the feature weren't available in this mode, it is questionable whether one should provide it for 'embedded' mode.

The 'Location' header based redirect I can do in both modes without problem, though, as the 'Location' header redirect would be handled in the Apache child process when handling the response from the daemon process. It can only redirect to a GET request, as any POST data would already have been consumed in order to send it through to the daemon process in case it was required, just as if it were a CGI script.

In the longer term I am looking at doing an alternate Apache module for using Python, similar to mod_python, which gives full Apache API access using SWIG rather than a hand-crafted API like mod_python's. This Apache module may be a better vehicle for hosting WSGI applications where you want to hook into Apache internals.
Thus, mod_wsgi might be seen as the average user's solution and suitable (safe) for commodity web hosting, whereas this other Apache module I speak of might be seen as more for power users who dedicate an Apache instance to the application.

Graham

From graham.dumpleton at gmail.com Thu May 24 08:52:47 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 24 May 2007 16:52:47 +1000
Subject: [Web-SIG] Internal redirect using Location with absolute path and status of 200.
In-Reply-To: <46552F50.9080207@colorstudy.com>
References: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com> <46552E69.8000807@colorstudy.com> <46552F50.9080207@colorstudy.com>
Message-ID: <88e286470705232352v41628913j8fffed7c11843fb0@mail.gmail.com>

On 24/05/07, Ian Bicking wrote:
> Ian Bicking wrote:
> >> Does anyone think this would be a nice extension for a WSGI adapter
> >> written against the current specification to implement, even if not
> >> necessarily portable?
> >
> > Eh.
>
> To add to this, I never found the CGI functionality useful. Why would I
> do that and not a real redirect? If there are links they'll be broken,
> because the client and the resource won't agree on what the real request
> URL was. If the script/app is the consumer, the Location header stuff
> doesn't work -- you don't get back the response. So it doesn't work for
> web service style internal requests. If it's something like
> authentication, that doesn't work either -- you are giving a request
> path back, which anyone could access, and you haven't added any
> information to it to specifically permit access (unless there's
> something in the environment that makes the subrequest clear; I never
> looked closely enough, nor is it specified).

Understood. Although CGI specifies it, if it isn't in practice useful for much then there is no point doing it. I'll thus not worry about it unless someone comes along with a very compelling argument for how it may actually be useful.
Graham

From ianb at colorstudy.com Thu May 24 17:07:43 2007
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 24 May 2007 10:07:43 -0500
Subject: [Web-SIG] Internal redirect using Location with absolute path and status of 200.
In-Reply-To: <88e286470705232349i5b0c21cqdd3ff29cb497f2dd@mail.gmail.com>
References: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com> <46552E69.8000807@colorstudy.com> <88e286470705232349i5b0c21cqdd3ff29cb497f2dd@mail.gmail.com>
Message-ID: <4655AA3F.4010208@colorstudy.com>

Graham Dumpleton wrote:
> On 24/05/07, Ian Bicking wrote:
>> Graham Dumpleton wrote:
>> > Does anyone think this would be a nice extension for a WSGI adapter
>> > written against the current specification to implement, even if not
>> > necessarily portable?
>>
>> Eh. In the context of mod_wsgi, I think it would be more interesting to
>> provide a WSGI application that called back into Apache (basically
>> wrapping Apache's normal subrequest machinery in a WSGI exterior).
>
> I was trying to avoid as much as possible having mod_wsgi provide any
> sort of hooks which would allow one to perform actions against
> internals of Apache. I had two reasons for this.

This is a much more constrained hook into Apache than what mod_python provides. For instance, you could provide much the same thing, but where subrequests actually go out over HTTP. There's quite a bit of data you couldn't share over HTTP, so it's not entirely equivalent, but it's still pretty close (especially if there was something on the Apache side to fix up the slightly-richer-than-HTTP environment based on special headers).

From graham.dumpleton at gmail.com Sat May 26 06:33:38 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Sat, 26 May 2007 14:33:38 +1000
Subject: [Web-SIG] Internal redirect using Location with absolute path and status of 200.
In-Reply-To: <4655AA3F.4010208@colorstudy.com>
References: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com> <46552E69.8000807@colorstudy.com> <88e286470705232349i5b0c21cqdd3ff29cb497f2dd@mail.gmail.com> <4655AA3F.4010208@colorstudy.com>
Message-ID: <88e286470705252133v12ccc010n80bd7fdeb9c4ad72@mail.gmail.com>

On 25/05/07, Ian Bicking wrote:
> Graham Dumpleton wrote:
> > On 24/05/07, Ian Bicking wrote:
> >> Graham Dumpleton wrote:
> >> > Does anyone think this would be a nice extension for a WSGI adapter
> >> > written against the current specification to implement, even if not
> >> > necessarily portable?
> >>
> >> Eh. In the context of mod_wsgi, I think it would be more interesting to
> >> provide a WSGI application that called back into Apache (basically
> >> wrapping Apache's normal subrequest machinery in a WSGI exterior).
> >
> > I was trying to avoid as much as possible having mod_wsgi provide any
> > sort of hooks which would allow one to perform actions against
> > internals of Apache. I had two reasons for this.
>
> This is a much more constrained hook into Apache than what mod_python
> provides. For instance, you could provide much the same thing, but
> where subrequests actually go out over HTTP. There's quite a bit of
> data you couldn't share over HTTP, so it's not entirely equivalent, but
> it's still pretty close (especially if there was something on the Apache
> side to fix up the slightly-richer-than-HTTP environment based on
> special headers).

If I am going to be providing any sort of way of interacting with Apache internals, though, I don't want to be in the business of having to write custom wrappers for performing specific tasks. This would turn mod_wsgi into just another framework rather than the absolute minimal WSGI adapter it is.
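Ian's earlier alternative, keeping a hook to the top-level application and calling into it directly, needs no Apache internals at all. Below is a minimal pure-WSGI sketch under stated assumptions: the environ keys example.root_app and example.subrequest and the helpers expose_root and subrequest are hypothetical names, and the target path is assumed to be relative to the server root. Unlike the Location convention, the caller gets the status, headers and body back, and the marker key lets the target tell an in-process call from an external one, which addresses Ian's authentication concern.

```python
import io

ROOT_KEY = "example.root_app"        # hypothetical environ key
INTERNAL_KEY = "example.subrequest"  # hypothetical marker for in-process calls


def expose_root(app):
    """Outermost middleware: publish the top-level app in the environ."""
    def wrapper(environ, start_response):
        environ[ROOT_KEY] = wrapper
        return app(environ, start_response)
    return wrapper


def subrequest(environ, path, query=""):
    """GET `path` from the top-level app; return (status, headers, body)."""
    env = dict(environ)
    env.update({
        "REQUEST_METHOD": "GET",
        "SCRIPT_NAME": "",
        "PATH_INFO": path,
        "QUERY_STRING": query,
        "CONTENT_LENGTH": "0",
        "wsgi.input": io.BytesIO(b""),
        INTERNAL_KEY: True,  # lets the target recognise an internal call
    })
    env.pop("CONTENT_TYPE", None)
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers
        return written.append

    result = environ[ROOT_KEY](env, start_response)
    try:
        chunks = list(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], b"".join(written + chunks)


def site(environ, start_response):
    """Toy site: '/internal/data' only answers in-process subrequests."""
    if environ["PATH_INFO"] == "/internal/data":
        if not environ.get(INTERNAL_KEY):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"internal only"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"42"]
    status, headers, body = subrequest(environ, "/internal/data")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"answer=" + body]


site_app = expose_root(site)
```

The marker relies on the server exposing client-supplied headers only as HTTP_* environ keys, as PEP 333 specifies, so a client cannot set example.subrequest directly.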
When I did look at allowing it to be more than just an adapter, the approach I looked at for providing what you want to do was to simply pass through the SWIG wrapping for the Apache request object (request_rec) in the WSGI environment. This wouldn't be something that I saw happening by default though, as doing so would make mod_wsgi depend on all the SWIG bindings for Apache, and would also be a way of circumventing the locking down of what the user is allowed to do with mod_wsgi. That is, I am trying to make mod_wsgi as safe as possible so that web hosting companies might consider it for use in shared hosting environments. If it were to have just as many problems and unknowns as mod_python, the whole exercise in writing it would be a waste of time.

All this means is that to enable the feature you would first need to specify a configuration directive in the main Apache configuration, something like:

    WSGIExtensions RequestRec

This would just indicate that passing a request object would be allowed; you would still then need to enable it for a specific application (part of the URL namespace):

    WSGIPassRequestRec On

That done, the request object could be accessed as 'apache.request_rec' in the WSGI environment. Although only the request object is being passed, that is enough, as the separate SWIG bindings for the Apache API, which would not even be a part of mod_wsgi but a separate package, would then provide everything else. The SWIG bindings would though just be a direct mapping to the C API, with no real wrapping giving it a Pythonic feel. Thus, for example, your internal redirect would be written something like:

    from apache.http_request import *

    def application(environ, start_response):
        r = environ['apache.request_rec']
        ap_internal_redirect('/some/other/path', r)

        # Dummy WSGI response, as the redirect has already sent the response.
        start_response('200 OK', [])
        return []

If desired, people could then write WSGI component objects which wrap up such low-level Apache API calls. One example is obviously an internal redirect, but another may be to use apache.mod_ssl.ssl_var_lookup() to look up specific properties of a client-side SSL certificate which wouldn't otherwise be available to a WSGI application.

For cases like accessing SSL certificate information using the API there wouldn't be a big problem, but one problem with something like internal redirects is that the way WSGI applications return a response isn't a direct mapping to the lower-level Apache handler response; it is more complicated than that. Thus you end up having to use some sort of dummy response which wouldn't add to what a sub request may have already returned. Alternatively, you have to provide other stuff in the WSGI environment which the application could use to raise an exception that would then be caught by mod_wsgi and taken to mean that the normal WSGI application response processing doesn't need to be done, but that a normal Apache API status value of OK should still be returned (different to HTTP_OK).

In other words, the mismatch in the APIs, and the fact that the WSGI interface is not as rich as the Apache handler API in how a handler response and the HTTP status can be indicated, can make it all just a bit messy. Also, some of the things one can do through the Apache API step outside of the flow of operations with WSGI applications.
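The exception-based alternative just described might look roughly like the sketch below. Everything in it is hypothetical: ResponseAlreadySent, the environ key example.response_already_sent and the toy run_handler loop stand in for whatever mod_wsgi might actually provide, and OK = 0 is assumed to mirror Apache's OK handler return value (as distinct from the HTTP 200 status). The gateway places the exception class in the environ; an application that has already produced its response through server internals raises it, and the gateway skips normal WSGI response processing while still reporting OK.

```python
OK = 0  # assumed to mirror Apache's OK handler result (not the HTTP 200 status)
SENT_KEY = "example.response_already_sent"  # hypothetical environ key


class ResponseAlreadySent(Exception):
    """Raised by an app whose response was already produced by the server."""


def run_handler(app, environ):
    """Toy gateway loop returning (handler_result, response_or_None)."""
    environ = dict(environ)
    environ[SENT_KEY] = ResponseAlreadySent  # expose the exception to the app
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers
        return written.append

    try:
        result = app(environ, start_response)
        try:
            chunks = list(result)
        finally:
            if hasattr(result, "close"):
                result.close()
    except ResponseAlreadySent:
        # Skip normal WSGI response processing, but still report success.
        return OK, None
    body = b"".join(written + chunks)
    return OK, (captured["status"], captured["headers"], body)


def redirected_app(environ, start_response):
    # A real app would first trigger a server-side internal redirect here.
    raise environ[SENT_KEY]()


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello world!\n"]
```

The design choice is that the signal travels through the environ rather than through a module import, so an application only depends on it when the gateway actually offers it.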
Just for comparison, if using the Apache API directly, it would have been written as:

    from apache.httpd import *
    from apache.http_request import *

    def handler(r):
        ap_internal_redirect('/some/other/path', r)
        return DONE

Important here is that the value DONE is being returned, which indicates to Apache that a complete response has been provided, by virtue of the sub request, and that for the parent handler processing nothing more should be done, even if there did happen to be a further handler registered for the response handler phase. This is in contrast to a standard Apache type handler, which might have been:

    from apache.httpd import *
    from apache.http_protocol import *

    def handler(r):
        content = 'hello world!\n'
        ap_set_content_type(r, 'text/plain')
        ap_set_content_length(r, len(content))
        ap_rwrite(content, r)
        return OK

So in practice they are quite different worlds, and my feeling is that allowing WSGI applications to call back into the Apache internals may just cause more problems than it's worth, especially since you can't properly represent the low-level Apache handler response in the response from the WSGI application.

Another mismatch, and one I am already having to contend with in mod_wsgi, is that an HTTP error status can be indicated in two ways. The first is:

    from apache.httpd import *
    from apache.http_protocol import *

    def handler(r):
        content = 'NOT FOUND. GO AWAY.\n'
        r.status = HTTP_NOT_FOUND
        ap_set_content_type(r, 'text/plain')
        ap_set_content_length(r, len(content))
        ap_rwrite(content, r)
        return OK

By returning OK here and using r.status to indicate the HTTP status code, it tells Apache that I have already provided a response body, and thus it shouldn't try to provide one by processing ErrorDocument directives or by adding its own default.
The other option is:

    from apache.httpd import *

    def handler(r):
        return HTTP_NOT_FOUND

In this case, because the HTTP status code was returned as the actual handler result, it indicates that I haven't provided a response body, and thus Apache should instead try to provide one.

In WSGI, when something like:

    start_response('404 Not Found', [])

is used, it is still up to the WSGI application to provide a response body with the content of the page to be displayed in the browser. If it doesn't, the browser will present a blank page.

What I don't know is whether mod_wsgi should be trying to be smart and pick up where an HTTP error response is returned but there is no response body, and instead of doing the equivalent of setting r.status = HTTP_NOT_FOUND and returning OK, just return HTTP_NOT_FOUND as the result, as in the second example, so that Apache gets the chance to provide an error page, since the WSGI application didn't actually provide one.

I know I am rambling, but if you have got this far and followed what I am going on about in the last example, I might ask what you do about error pages. How do you ensure a consistent error page layout across a whole WSGI application containing many disparate components? Do you try to pass down through the WSGI environment some special hook an application can call to generate error pages with the same style, but then create a dependency on this hook existing? Do you instead allow a WSGI application to return an empty error page and have a higher-up WSGI middleware component catch that and substitute its own based on the error type, i.e., in a similar style to Apache and the ErrorDocument directive? Or do you do something else?

The problem then as above is what does one do at the boundary between a WSGI application and the web server hosting it? Do you just always assume a WSGI application provides an error page, or allow some way that a WSGI application can defer to the web server the task of generating an error page instead?
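The "smart" behaviour Graham is unsure about amounts to a small decision at the gateway boundary, sketched below. This is an illustration only, not documented mod_wsgi behaviour: apache_result is a hypothetical name, and OK = 0 is assumed to mirror Apache's OK handler return value. An error status with an empty body is handed back to Apache as the handler result (like the `return HTTP_NOT_FOUND` example), so ErrorDocument or Apache's default page can apply; anything else sets the status and returns OK (like the `r.status = HTTP_NOT_FOUND` example).

```python
OK = 0  # assumed to mirror Apache's OK handler return value


def apache_result(status_line, body):
    """Map a WSGI status line and buffered body to (handler_result, r_status).

    r_status is the value that would be assigned to r.status, or None when
    the status code itself is returned so that Apache supplies the page.
    """
    code = int(status_line.split(" ", 1)[0])
    if code >= 400 and not body:
        return code, None  # like 'return HTTP_NOT_FOUND'
    return OK, code        # like 'r.status = ...; return OK'
```

One trade-off: an application that deliberately sends an empty 4xx body (say, a bare 405 to an API client) would get Apache's page instead, which is exactly the kind of server-side correction the next message argues against.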
Graham

From fumanchu at amor.org Sat May 26 18:05:25 2007
From: fumanchu at amor.org (Robert Brewer)
Date: Sat, 26 May 2007 09:05:25 -0700
Subject: [Web-SIG] Internal redirect using Location with absolute path and status of 200.
References: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com> <46552E69.8000807@colorstudy.com> <88e286470705232349i5b0c21cqdd3ff29cb497f2dd@mail.gmail.com> <4655AA3F.4010208@colorstudy.com> <88e286470705252133v12ccc010n80bd7fdeb9c4ad72@mail.gmail.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224DAC@ex9.hostedexchange.local>

Graham Dumpleton wrote:
> The problem then as above is what does one do at
> the boundary between a WSGI application and the
> web server hosting it? Do you just always assume
> a WSGI application provides an error page, or allow
> some way that a WSGI application can defer to the
> web server the task of generating an error page
> instead?

The web server should not be trying to correct what
the application gives it, except as absolutely necessary
to get the response on the wire, or obey the HTTP spec
(and even that's questionable). If you want consistent
error pages across an entire application or site, write
a piece of WSGI middleware which does that and explicitly
include it in your middleware graph. To quote the PEP:

    In general, servers and gateways should "play dumb" and allow the
    application complete control over its output. They should only make
    changes that do not alter the effective semantics of the application's
    response. It is always possible for the application developer to add
    middleware components to supply additional features, so server/gateway
    developers should be conservative in their implementation. In a sense,
    a server should consider itself to be like an HTTP "gateway server",
    with the application being an HTTP "origin server". (See RFC 2616,
    section 1.3, for the definition of these terms.)
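Robert's suggestion, which is also the middleware option in Graham's question, might be sketched as follows. error_page_middleware, render_error and missing_app are hypothetical names. The sketch buffers the whole response, ignores exc_info, and only substitutes a page when a 4xx/5xx response arrives with an empty body, so applications that render their own error pages are left alone.

```python
def error_page_middleware(app, render):
    """Substitute a site-wide page for empty-bodied 4xx/5xx responses."""

    def wrapper(environ, start_response):
        captured = {}
        written = []

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers
            return written.append

        result = app(environ, capture)
        try:
            chunks = list(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        body = b"".join(written + chunks)
        status, headers = captured["status"], captured["headers"]
        code, _, reason = status.partition(" ")
        if int(code) >= 400 and not body:
            body = render(int(code), reason)
            # Drop headers describing the old (empty) body before replacing it.
            headers = [(k, v) for k, v in headers
                       if k.lower() not in ("content-type", "content-length")]
            headers += [("Content-Type", "text/html; charset=utf-8"),
                        ("Content-Length", str(len(body)))]
        start_response(status, headers)
        return [body]

    return wrapper


def render_error(code, reason):
    """Hypothetical site-wide error template."""
    return ("<h1>%d %s</h1>" % (code, reason)).encode("utf-8")


def missing_app(environ, start_response):
    start_response("404 Not Found", [])
    return []
```

Because the middleware sits inside the WSGI stack, the server keeps "playing dumb" as the PEP asks; the trade-off is that the error styling stays within the Python application rather than being shared with other Apache-hosted components.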
Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From graham.dumpleton at gmail.com Sun May 27 00:25:06 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Sun, 27 May 2007 08:25:06 +1000
Subject: [Web-SIG] Internal redirect using Location with absolute path and status of 200.
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224DAC@ex9.hostedexchange.local>
References: <88e286470705232143wdc0d1dcl7d947b955d3de2f0@mail.gmail.com> <46552E69.8000807@colorstudy.com> <88e286470705232349i5b0c21cqdd3ff29cb497f2dd@mail.gmail.com> <4655AA3F.4010208@colorstudy.com> <88e286470705252133v12ccc010n80bd7fdeb9c4ad72@mail.gmail.com> <435DF58A933BA74397B42CDEB8145A86224DAC@ex9.hostedexchange.local>
Message-ID: <88e286470705261525p2cb775fk249a8d2b9ef44d79@mail.gmail.com>

On 27/05/07, Robert Brewer wrote:
>
> Graham Dumpleton wrote:
> > The problem then as above is what does one do at
> > the boundary between a WSGI application and the
> > web server hosting it? Do you just always assume
> > a WSGI application provides an error page, or allow
> > some way that a WSGI application can defer to the
> > web server the task of generating an error page
> > instead?
>
> The web server should not be trying to correct what
> the application gives it, except as absolutely necessary
> to get the response on the wire, or obey the HTTP spec
> (and even that's questionable). If you want consistent
> error pages across an entire application or site, write
> a piece of WSGI middleware which does that and explicitly
> include it in your middleware graph.

You're thinking just WSGI, though. The whole thing with Apache is that it is possible to bring together application components from many different sources.
Thus Apache in this case is more than just a web server; in some ways it is an application framework itself. That is, your application could be a combination of CGI scripts written in Perl, Python, etc., handlers or page templating systems making use of PHP, mod_perl, mod_python, etc., as well as WSGI components. Apache already provides ways for all these components to signal that error page generation be done by the Apache core so that it is consistent, but the way WSGI is defined, one can't use it.

Thus, I am looking at the bigger picture of those people who want to use Apache as the framework for building a web application and who don't want to restrict themselves to just Python and WSGI.

Graham