From paul.boddie at ementor.no Mon Aug 2 13:01:38 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Aug 2 13:01:46 2004 Subject: [Web-SIG] AMK's "Web applications (again)" Message-ID: Hello, Having just caught up with the Daily Python-URL after being away for a few days, I saw that there had been some commentary on writing Web applications in Python. Has anyone given any more thought to the various standardisation activities that were discussed on this list? Paul From pje at telecommunity.com Mon Aug 2 19:23:09 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 2 19:19:08 2004 Subject: [Web-SIG] AMK's "Web applications (again)" In-Reply-To: Message-ID: <5.1.1.6.0.20040802131446.053f00f0@mail.telecommunity.com> At 01:01 PM 8/2/04 +0200, Paul Boddie wrote: >Hello, > >Having just caught up with the Daily Python-URL after being away for a >few days, I saw that there had been some commentary on writing Web >applications in Python. Has anyone given any more thought to the various >standardisation activities that were discussed on this list? One comment on the blog caught my eye: """ You know, this rant (that Python has too much vs. Java) always bugged me, exactly for the remark you made: who can make sense out of what's available *just in Jakarta* without pulling hairs? The Java developers I work with choose their "faith", and that's the road they travel on, and rarely track the other frameworks. But: all these frameworks *deploy* in a standard fashion, and (I think) the frameworks can happily co-exist in the same deployment. That's the part that I find lacking in Python: all the apps have their own deployment strategies, and often seem to relish in Python's ease of setting up micro-servers. Posted by Roger Espinosa at July 28, 2004 06:32 AM """ This is what the WSGI proposal is meant to tackle. 
I'm currently still putting off a rewrite of the proposal to address the issues raised by folks on this list, and to extend it slightly to better support architectures that want to either be asynchronous or to pipeline request preprocessors or response postprocessors. From paul.boddie at ementor.no Tue Aug 3 14:29:17 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Tue Aug 3 14:29:21 2004 Subject: [Web-SIG] AMK's "Web applications (again)" Message-ID: Phillip J. Eby [mailto:pje@telecommunity.com] wrote: > [Roger Espinosa] > > But: all these frameworks *deploy* in a standard fashion, and (I think) > > the frameworks can happily co-exist in the same deployment. > > That's the part that I find lacking in Python: all the apps have their > > own deployment strategies, and often seem to relish in Python's ease of > > setting up micro-servers. > > This is what the WSGI proposal is meant to tackle. I'm currently still > putting off a rewrite of the proposal to address the issues raised by > folks on this list, and to extend it slightly to better support > architectures that want to either be asynchronous or to pipeline request > preprocessors or response postprocessors. Originally, when I looked at the proposal, I interpreted it as a means to run different server frameworks on top of a common "transport" container - a kind of multiplexing arrangement. However, looking at the text now, what the WSGI proposal [1] also seems to advocate (looking at certain examples [2] for clarification) is the ability of applications to use existing framework APIs but then to have those applications deployed on other frameworks - to a Webware application, for example, all frameworks look like Webware. But then, looking deeper at the proposal, I wonder how the WSGI-defined concepts fit in with those framework APIs. 
If input, output, errors and environ have standardised semantics, it occurs to me that the semantics of the framework-specific objects used by the applications must be translated to the WSGI semantics as defined in the proposal. Otherwise, it appears that these semantics get exposed to the applications themselves, making them non-standard within the context of the framework API they are using. Anyway, I'm curious as to how the described interface relates to the WebStack API in purpose and functionality. The principal objective of WebStack is to provide applications with a common API across different underlying server frameworks - to WebStack applications, all frameworks appear the same (they provide the WebStack API). Perhaps there is some overlap between the semantic translation part of WSGI and the objects provided by WebStack. Moreover, it might also be interesting to contrast the concept of a WebStack framework adapter with parts of the WSGI proposal. On the subject of deployment, however, the Java-style standardised deployment with things like .war files and descriptors doesn't necessarily deliver everything that the hype would suggest, as anyone who has had to work with more than one application server would know. Moreover, the more Web applications appear like normal Python packages and programs, the easier they are likely to be to deploy, especially if the means of deployment doesn't involve uploading some archive file to some Web application and clicking a "restart" button - something that turned many people away from Zope, I'd wager. In certain circumstances, the lightweight "micro-servers" really do have their advantages... Paul [1] http://mail.python.org/pipermail/web-sig/2003-December/000394.html [2] http://mail.python.org/pipermail/web-sig/2003-December/000417.html From pje at telecommunity.com Tue Aug 3 17:12:51 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue Aug 3 17:08:50 2004 Subject: [Web-SIG] AMK's "Web applications (again)" In-Reply-To: Message-ID: <5.1.1.6.0.20040803110315.01ea4ec0@mail.telecommunity.com> At 02:29 PM 8/3/04 +0200, Paul Boddie wrote: >Phillip J. Eby [mailto:pje@telecommunity.com] wrote: > > > >[Roger Espinosa] > > > > But: all these frameworks *deploy* in a standard fashion, and (I >think) > > > the frameworks can happily co-exist in the same deployment. > > > That's the part that I find lacking in Python: all the apps have >their > > > own deployment strategies, and often seem to relish in Python's ease >of > > > setting up micro-servers. > > > > This is what the WSGI proposal is meant to tackle. I'm currently >still > > putting off a rewrite of the proposal to address the issues raised by > > folks on this list, and to extend it slightly to better support > > architectures that want to either be asynchronous or to pipeline >request > > preprocessors or response postprocessors. > >Originally, when I looked at the proposal, I interpreted it as a means >to run different server frameworks on top of a common "transport" >container - a kind of multiplexing arrangement. However, looking at the >text now, what the WSGI proposal [1] also seems to advocate (looking at >certain examples [2] for clarification) is the ability of applications >to use existing framework APIs but then to have those applications >deployed on other frameworks - to a Webware application, for example, >all frameworks look like Webware. Right, that's not a bad way of putting it. When I redo the PEP for WSGI, the terminology will hopefully be clearer. In fact, "WSGI" now stands for "Web Server Gateway Interface", so it might be more correct to say that "a Webware application could run on any web server that supports WSGI." >But then, looking deeper at the proposal, I wonder how the WSGI-defined >concepts fit in with those framework APIs. 
If input, output, errors and >environ have standardised semantics, it occurs to me that the semantics >of the framework-specific objects used by the applications must be >translated to the WSGI semantics as defined in the proposal. Absolutely. However, those are by-and-large the semantics of HTTP and CGI, which form the basis for most existing web servers and gateway protocols. Any framework that supports being run under CGI (or FastCGI, or any of the FastCGI clones) is relatively simple to adapt to WSGI. >Anyway, I'm curious as to how the described interface relates to the >WebStack API in purpose and functionality. The principal objective of >WebStack is to provide applications with a common API across different >underlying server frameworks - to WebStack applications, all frameworks >appear the same (they provide the WebStack API). Perhaps there is some >overlap between the semantic translation part of WSGI and the objects >provided by WebStack. Moreover, it might also be interesting to contrast >the concept of a WebStack framework adapter with parts of the WSGI >proposal. I took a brief look at WebStack yesterday; my impression is that under WSGI, your framework adapters would be unnecessary, because you'd just have one for WSGI. From WSGI's point of view, WebStack is just another Python web framework. >On the subject of deployment, however, the Java-style standardised >deployment with things like .war files and descriptors doesn't >necessarily deliver everything that the hype would suggest, as anyone >who has had to work with more than one application server would know. Well, WSGI isn't trying to standardize to *that* level as yet. :) That would be a different PEP at some later stage after there's some field experience with the interface. 
>Moreover, the more Web applications appear like normal Python packages
>and programs, the easier they are likely to be to deploy, especially if
>the means of deployment doesn't involve uploading some archive file to
>some Web application and clicking a "restart" button - something that
>turned many people away from Zope, I'd wager. In certain circumstances,
>the lightweight "micro-servers" really do have their advantages...

Well, all of the frameworks are free to "innovate" as much as they like in this respect. Different systems have different user audiences. Personally, I wouldn't mind someday seeing a standardized .zip format for deploying applications to a web server... as long as you can build it with the distutils!

From pje at telecommunity.com Thu Aug 5 18:19:50 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Aug 5 18:15:45 2004
Subject: [Web-SIG] Asynchronous streaming in WSGI
Message-ID: <5.1.1.6.0.20040805120355.01ed23e0@mail.telecommunity.com>

I've been looking at a possible change to the WSGI protocol to address some issues raised by Grisha and Ian. But I'm not sure that the change is best, given the range of existing platforms and applications that may *currently* use asynchronous streaming of responses, even though in many ways the change would handle asynchronous streaming *better*. Let me explain.

The previous WSGI proposal was based on an interface like:

    def runCGI(inp,out,err,env):
        # do everything

The modified interface that I've been playing with in peak.web is:

    def handle_http(env):
        return status_string,header_list,output_iterable

The ideas that changed here are:

* Separate status from headers and output
* Don't require servers to parse headers or create an output buffer
* Allow lengthy output to be streamed *after* the function returns, to avoid tying up a task thread in multi-threaded servers
* Allow non-CGI variables (e.g. 'wsgi.input_stream', 'wsgi.error_stream', 'wsgi.version', 'wsgi.multi_threaded', etc.)
  in the environment to avoid a separate configuration method and simplify chaining of processors

As a result of these changes, it should also be much easier to write request preprocessors, response postprocessors, and other kinds of intermediaries between the web server and the actual application/frameworks, because less parsing and buffering are required.

Last, but not least, an interface like this should be easier to implement in asynchronous web servers, because they can just invoke 'iterator.next()' when they need another block to send out.

I think these are improvements in the direction that folks requested, *except* for one issue: unbuffered streaming output in existing code can't use this. A prime example is Zope, whose response.write() method does streaming output. Under the revised WSGI, there's nothing to write *to*, so such existing code would have to run in a separate thread from the web server and communicate via a queue. This doesn't seem like a great idea.

So, there are several possible ways to deal with this:

1) Stick with the old interface
2) Go with the newer interface, and try to lobby frameworks that support this type of "push" to make changes to support it
3) Publish both interfaces, and push for a stdlib module that can convert between them
4) Some other idea I haven't thought of :)

Opinions? Questions? Ideas?

From pje at telecommunity.com Mon Aug 9 01:59:14 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 9 02:12:15 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
Message-ID: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>

This version is an almost complete rewrite, based on a new interface approach developed by Tony Lownds and me. As you'll see, it tries to address as much of the list's feedback as I could absorb and remember. So, please be patient with me if I missed taking something into account.

As always, your comments and feedback are appreciated.
PEP: XXX
Title: Python Web Server Gateway Interface v1.0
Version: $Revision: 1.1 $
Last-Modified: $Date: 2004/08/08 19:48:42 $
Author: Phillip J. Eby
Discussions-To: Python Web-SIG
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 07-Dec-2003
Post-History: 07-Dec-2003, 08-Aug-2004


Abstract
========

This document specifies a proposed standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers.


Rationale
=========

Python currently boasts a wide variety of web application frameworks, such as Zope, Quixote, Webware, Skunkware, PSO, and Twisted Web -- to name just a few [1]_. This wide variety of choices can be a problem for new Python users, because generally speaking, their choice of web framework will limit their choice of usable web servers, and vice versa.

By contrast, although Java has just as many web application frameworks available, Java's "servlet" API makes it possible for applications written with any Java web application framework to run in any web server that supports the servlet API.

The availability and widespread use of such an API in web servers for Python -- whether those servers are written in Python (e.g. Medusa), embed Python (e.g. mod_python), or invoke Python via a gateway protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of framework from choice of web server, freeing users to choose a pairing that suits them, while freeing framework and server developers to focus on their area of specialty.

This PEP, therefore, proposes a simple and universal interface between web servers and web applications or frameworks: the Python Web Server Gateway Interface (WSGI).

But the mere existence of a WSGI spec does nothing to address the existing state of servers and frameworks for Python web applications. Server and framework authors and maintainers must actually implement WSGI for there to be any effect.
However, since no existing servers or frameworks support WSGI, there is little immediate reward for an author who implements WSGI support. Thus, WSGI *must* be easy to implement, so that an author's initial investment in the interface can be reasonably low.

Accordingly, simplicity of implementation on *both* the server and framework sides of the interface is absolutely critical to the utility of the WSGI interface, and is therefore the principal criterion for any design decisions. (It should also be easy to create request preprocessors, response postprocessors, and other "middleware" components that look like an application to their containing server, while acting as a server for their contained applications.)

Note, however, that simplicity of implementation for a framework author is not the same thing as ease of use for a web application author. WSGI presents an absolutely "no frills" interface to the framework author, because bells and whistles like response objects and cookie handling would just get in the way of existing frameworks' handling of these issues. Again, the goal of WSGI is to facilitate easy interconnection of existing servers and applications or frameworks, not to create a new web framework.

Note also that this goal precludes WSGI from requiring anything that is not already available in deployed versions of Python. Therefore, new standard library modules are not proposed or required by this specification, and nothing in WSGI requires a Python version greater than 1.5.2. (It would be a good idea, however, for future versions of Python to include support for this interface in web servers provided by the standard library.)

Finally, the current version of WSGI does not prescribe any particular mechanism for "deploying" an application for use with a web server or server gateway. At the present time, this is necessarily implementation-defined by the server or gateway.
After a sufficient number of servers and frameworks have implemented WSGI to provide field experience with varying deployment requirements, it may make sense to create another PEP, describing a deployment standard for WSGI servers and application frameworks.


Specification Overview
======================

The WSGI interface has two sides: the "server" or "gateway" side, and the "application" side. The server side invokes a callable object that is provided by the application side. The specifics of how that object is provided are up to the server or gateway. It is assumed that some servers or gateways will require an application's deployer to write a short script to create an instance of the server or gateway, and supply it with the application object. Other servers and gateways may use configuration files or other mechanisms to specify where the application object should be imported from.

The application object is simply a callable object that accepts two arguments. The term "object" should not be misconstrued as requiring an actual object instance: a function, method, class, or instance with a ``__call__`` method are all acceptable for use as an application object. Here are two example application objects; one is a function, and the other is a class::

    def simple_app(environ, start_response):
        """Simplest possible application object"""
        status = '200 OK'
        headers = [('Content-type','text/plain')]
        write = start_response(status, headers)
        write('Hello world!\n')

    class AppClass:
        """Much the same thing, but as a class"""

        def __init__(self, environ, start_response):
            self.environ = environ
            self.start = start_response

        def __iter__(self):
            # Calling AppClass(environ, start_response) returns an
            # instance; the server then iterates over it for output.
            status = '200 OK'
            headers = [('Content-type','text/plain')]
            self.start(status, headers)
            yield "Hello world!\n"
            for i in range(1,11):
                yield "Extra line %s\n" % i

The server or gateway invokes the application once for each request it receives from a web browser.
To illustrate, here is a simple CGI gateway, implemented as a function taking an application object (all error handling omitted)::

    import os, sys

    def run_with_cgi(application):

        environ = {}
        environ.update(os.environ)
        environ['wsgi.input'] = sys.stdin
        environ['wsgi.errors'] = sys.stderr
        environ['wsgi.version'] = '1.0'
        environ['wsgi.multithread'] = False
        environ['wsgi.multiprocess'] = True

        def start_response(status,headers):
            print "Status:", status
            for key,val in headers:
                print "%s: %s" % (key,val)
            print    # a blank line ends the CGI response headers
            return sys.stdout.write

        result = application(environ, start_response)
        if result:
            try:
                for data in result:
                    sys.stdout.write(data)
            finally:
                if hasattr(result,'close'):
                    result.close()

In the next section, we will specify the precise semantics that these illustrations are examples of.


Specification Details
=====================

The application object must accept two positional arguments. For the sake of illustration, we have named them ``environ`` and ``start_response``, but they are not required to have these names. A server or gateway *must* invoke the application object using positional (not keyword) arguments.

The first parameter is a dictionary object, containing CGI-style environment variables. This object *must* be a builtin Python dictionary (*not* a subclass, ``UserDict`` or other dictionary emulation), and the application is allowed to modify the dictionary in any way it desires. The dictionary must also include certain WSGI-required variables (described in a later section), and may also include server-specific extension variables, named according to a convention that will be described below.

The second parameter is a callable accepting two positional arguments: a status string of the form ``"999 Message here"``, and a list of ``(header_name,header_value)`` tuples describing the HTTP response headers. This callable must return another callable that takes one parameter: a string to write as part of the HTTP response body.
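To make the calling contract concrete, here is a minimal sketch of an in-memory test gateway that invokes an application the way this draft describes: positional arguments, a ``start_response`` that returns a ``write`` callable, optional iteration over the result, and a ``close()`` call that runs even if iteration fails. It assumes Python 3, where the draft's byte strings are ``bytes``; the names ``run_in_memory``, ``push_app`` and ``pull_app`` are hypothetical and not part of the draft.

```python
import io
import sys


def run_in_memory(application, extra_environ=None):
    """Call an application as this draft describes and capture the response.

    Hypothetical test gateway (not part of the draft): returns a
    (status, headers, body) tuple, with the body joined into one bytes object.
    """
    environ = {
        'REQUEST_METHOD': 'GET',
        'wsgi.version': '1.0',
        'wsgi.input': io.BytesIO(b''),
        'wsgi.errors': sys.stderr,
        'wsgi.multithread': False,
        'wsgi.multiprocess': False,
    }
    if extra_environ:
        environ.update(extra_environ)

    response = {}
    chunks = []

    def start_response(status, headers):
        # Record the status line and headers, then hand back write().
        response['status'] = status
        response['headers'] = list(headers)
        return chunks.append

    # Positional arguments only, as the draft requires.
    result = application(environ, start_response)
    if result is not None:
        try:
            for data in result:
                chunks.append(data)
        finally:
            # close(), if present, must run whether or not iteration succeeded.
            if hasattr(result, 'close'):
                result.close()
    return response.get('status'), response.get('headers'), b''.join(chunks)


def push_app(environ, start_response):
    # "Push" style: write the body through the callable start_response returns.
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write(b'Hello world!\n')


def pull_app(environ, start_response):
    # "Pull" style: return an iterable of body strings instead.
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello ', b'world!\n']
```

Both sample applications yield the same status, headers and body through this gateway, which is the point of allowing either ``write()`` or an iterable result; a real server would send the data to a client rather than collect it in a list.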
The application object may return either ``None`` (indicating that there is no additional output), or it may return a non-empty iterable yielding strings. (For example, it could be a generator-iterator that yields strings, or it could be a sequence such as a list of strings.) If the application returns an iterable, and the iterable has a ``close()`` method, the server or gateway *must* call that method upon completion of the current request, whether the request was completed normally, or terminated early due to an error. (This is to support resource release by the application. The specific protocol is intended to support PEP 325, and also the simple case of an application returning an open text file.)


``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain CGI environment variables, as defined by the Common Gateway Interface specification [2]_. In addition, it must contain the following WSGI-defined variables:

===================== ============================================
Variable              Value
===================== ============================================
``wsgi.version``      The string ``"1.0"``

``wsgi.input``        An input stream from which the HTTP request
                      body can be read.

``wsgi.errors``       An output stream to which error output can be
                      written. For most servers, this will be the
                      server's error log.

``wsgi.multithread``  This value should be true if the application
                      object may be simultaneously invoked by another
                      thread in the same process, and false otherwise.

``wsgi.multiprocess`` This value should be true if an equivalent
                      application object may be simultaneously
                      invoked by another process, and false otherwise.
===================== ============================================

Finally, the ``environ`` dictionary may also contain server-defined variables. These variables should be named using only lower-case letters, numbers, dots, and underscores, and should be prefixed with a name that is unique to the defining server or gateway.
For example, ``mod_python`` might define variables with names like ``mod_python.some_variable``.

Note: missing variables (such as ``REMOTE_USER`` when no authentication has occurred) should be left out of the ``environ`` dictionary. Also note that CGI-defined variables must be strings, if they are present at all. It is a violation of this specification for a CGI variable's value to be of any type other than ``str``.


Input and Error Streams
~~~~~~~~~~~~~~~~~~~~~~~

The input and error streams provided by the server must support the following methods:

=================== ========== ========
Method              Stream     Notes
=================== ========== ========
``read(size)``      ``input``
``readline()``      ``input``  1
``readlines(hint)`` ``input``  2
``__iter__()``      ``input``
``flush()``         ``errors`` 3
``write(str)``      ``errors``
``writelines(seq)`` ``errors``
=================== ========== ========

The semantics of each method are as documented in the Python Library Reference, except for these notes as listed in the table above:

1. The optional "size" argument to ``readline()`` is not supported, as it may be complex for server authors to implement, and is not often used in practice.

2. Note that the ``hint`` argument to ``readlines()`` is optional for both caller and implementer. The application is free not to supply it, and the server or gateway is free to ignore it.

3. Since the ``errors`` stream may not be rewound, a container is free to forward write operations immediately, without buffering. In this case, the ``flush()`` method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that ``flush()`` is a no-op. They must call ``flush()`` if they need to ensure that output has in fact been written. (For example, to minimize intermingling of data from multiple processes writing to the same error log.)

The methods listed in the table above *must* be supported by all servers conforming to this specification.
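As an illustration of staying within the required methods, the following sketch reads the request body with ``read(size)`` only, logs to ``wsgi.errors`` with an explicit ``flush()``, and leaves both streams open. It assumes Python 3, with ``io.BytesIO`` and ``io.StringIO`` standing in for a server's input and error streams; ``body_size_app`` is a hypothetical name, and treating a missing or malformed ``CONTENT_LENGTH`` as zero is a choice made for this example, not a rule from this draft.

```python
import io


def body_size_app(environ, start_response):
    """Hypothetical app: report how many request-body bytes it read."""
    # CONTENT_LENGTH is a CGI variable; it may be absent or empty.
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0

    # Use only read(size), and never read past CONTENT_LENGTH.
    body = environ['wsgi.input'].read(length) if length > 0 else b''

    errors = environ['wsgi.errors']
    errors.write('body_size_app: read %d bytes\n' % len(body))
    errors.flush()  # output may be buffered, so flush explicitly (note 3)
    # Neither stream is closed here: closing them is not the app's job.

    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write(('Received %d bytes\n' % len(body)).encode('ascii'))


# Exercise it with in-memory streams standing in for the server's.
request_body = io.BytesIO(b'hello world')
error_log = io.StringIO()
written = []


def start_response(status, headers):
    return written.append


body_size_app({'CONTENT_LENGTH': '5',
               'wsgi.input': request_body,
               'wsgi.errors': error_log},
              start_response)
```

Because the application stops at ``CONTENT_LENGTH``, the remaining bytes in the stand-in input stream are left unread, and the error stream is still open for the server to use afterwards.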
Applications conforming to this specification *must not* use any other methods or attributes of the ``input`` or ``errors`` objects. In particular, applications *must not* attempt to close these streams, even if they possess ``close()`` methods.


The ``start_response()`` Callable
---------------------------------

The second parameter passed to the application object is itself a two-argument callable, used to begin the HTTP response and return a ``write()`` function.

The first parameter it takes is a "status" string, of the form ``"999 Message here"``, where ``999`` is replaced with the HTTP status code, and ``Message here`` is replaced with the appropriate message text. The string *must* be pure 7-bit ASCII, containing no control characters. In particular, it must not be terminated with a carriage return or linefeed.

The second parameter accepted by the ``start_response()`` callable must be a sequence of ``(header_name,header_value)`` tuples. Each ``header_name`` must be a valid HTTP header name, without a trailing colon or other punctuation. Each ``header_value`` *must not* include a trailing carriage return or linefeed: it should be a raw header value. (These requirements are to minimize the complexity of parsing required by servers, gateways, and intermediate response processors that need to inspect or modify response headers.)

The return value of the ``start_response()`` callable is a one-argument callable that accepts strings to write as part of the HTTP response body.


Implementation/Application Notes
================================

Unicode
-------

HTTP does not directly support Unicode, and neither does this interface. All encoding/decoding must be handled by the application; all strings and streams passed to or from the server must be standard Python byte strings, not Unicode objects. The result of using a Unicode object where a string object is required is undefined.
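A minimal sketch of what this rule means for an application, assuming Python 3 (where ``str`` is text and ``bytes`` is the byte-string type): the application keeps text internally, encodes it itself, and declares the charset in its headers. ``greeting_app`` is a hypothetical name, and UTF-8 is just one reasonable encoding choice.

```python
def greeting_app(environ, start_response):
    """Hypothetical app: encode text to bytes before handing it over."""
    text = u'Gr\u00fc\u00dfe, world!\n'  # Unicode text inside the app
    # Tell the client which encoding the bytes use.
    start_response('200 OK', [('Content-type', 'text/plain; charset=utf-8')])
    # Only encoded bytes cross the interface.
    return [text.encode('utf-8')]


captured = []


def start_response(status, headers):
    captured.append((status, headers))
    return captured.append  # write() is not used by this app


body = greeting_app({}, start_response)
```

Declaring the charset is not something this draft requires, but without it the client has to guess how to decode the bytes it receives.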

Multiple Invocations
--------------------

Application objects must be able to be invoked more than once, since virtually all servers/gateways will make such requests.


Error Handling
--------------

Servers *should* trap and log exceptions raised by applications, and *may* continue to execute, or attempt to shut down gracefully. Applications *should* avoid allowing exceptions to escape their execution scope, since the result of uncaught exceptions is server-defined.


Thread Support
--------------

Thread support, or lack thereof, is also server-dependent. Servers that can run multiple requests in parallel *should* also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server.


Application Configuration
-------------------------

This specification does not define how a server selects or obtains an application to invoke. These and other configuration options are highly server-specific matters. It is expected that server/gateway authors will document how to configure the server to execute a particular application object, and with what options (such as threading options).

Framework authors, on the other hand, should document how to create an application object that wraps their framework's functionality. The user, who has chosen both the server and the application framework, must connect the two together. However, since both the framework and the server now have a common interface, this should be merely a mechanical matter, rather than a significant engineering effort for each new server/framework pair.


Middleware
----------

Note that a single object may play the role of a server with respect to some application(s), while also acting as an application with respect to some server(s). Such "middleware" components can perform such functions as:

* Routing a request to different application objects based on the target URL, after rewriting the ``environ`` accordingly.
* Allowing multiple applications or frameworks to run side-by-side in the same process
* Load balancing and remote processing, by forwarding requests and responses over a network
* Performing content postprocessing, such as applying XSL stylesheets

Given the existence of applications and servers conforming to this specification, the appearance of such reusable middleware becomes a possibility.


Questions and Answers
=====================

1. Why must ``environ`` be a dictionary? What's wrong with using a subclass?

   The rationale for requiring a dictionary is to maximize portability between servers. The alternative would be to define some subset of a dictionary's methods as being the standard and portable interface. In practice, however, most servers will probably find a dictionary adequate to their needs, and thus framework authors will come to expect the full set of dictionary features to be available, since they will be there more often than not. But, if some server chooses *not* to use a dictionary, then there will be interoperability problems despite that server's "conformance" to spec. Therefore, making a dictionary mandatory simplifies the specification and guarantees interoperability.

   Note that this does not prevent server or framework developers from offering specialized services as custom variables *inside* the ``environ`` dictionary. This is the recommended approach for offering any such value-added services.

2. Why can you call ``write()`` *and* yield strings/return an iterator? Shouldn't we pick just one way?

   If we supported only the iteration approach, then current frameworks that assume the availability of "push" would suffer. But, if we supported only pushing via ``write()``, then server performance would suffer for transmission of e.g. large files (if a worker thread can't start on a new request until all of the output has been sent).
   Thus, this compromise allows an application framework to support both approaches, as appropriate, but with only a little more burden to the server implementor than a push-only approach would require.

3. What's the ``close()`` for?

   When writes are done during the execution of an application object, the application can ensure that resources are released using a try/finally block. But, if the application returns an iterator, any resources used will not be released until the iterator is garbage collected. The ``close()`` idiom allows an application to release critical resources at the end of a request, and it's forward-compatible with the support for try/finally in generators that's proposed by PEP 325.

4. Why is this interface so low-level? I want feature X! (e.g. cookies, sessions, persistence, ...)

   This isn't Yet Another Python Web Framework. It's just a way for frameworks to talk to web servers, and vice versa. If you want these features, you need to pick a web framework that provides the features you want. And if that framework lets you create a WSGI application, you should be able to run it in most WSGI-supporting servers. Also, some WSGI servers may offer additional services via objects provided in their ``environ`` dictionary; see the applicable server documentation for details. (Of course, applications that use such extensions will not be portable to other WSGI-based servers.)


Acknowledgements
================

Thanks go to the many folks on the Web-SIG mailing list whose thoughtful feedback made this revised draft possible. Especially:

* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up on the first draft as not offering any advantages over "plain old CGI", thus encouraging me to look for a better approach.
* Ian Bicking, who helped nag me into properly specifying the multithreading and multiprocess options, as well as badgering me to provide a mechanism for servers to supply custom extension data to an application.
* Tony Lownds, who came up with the concept of a ``start_response`` function that took the status and headers, returning a ``write`` function. References ========== .. [1] The Python Wiki "Web Programming" topic (http://www.python.org/cgi-bin/moinmoin/WebProgramming) .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: From smulloni at smullyan.org Mon Aug 9 04:03:23 2004 From: smulloni at smullyan.org (Jacob Smullyan) Date: Mon Aug 9 04:03:25 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP In-Reply-To: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> Message-ID: <20040809020323.GA21842@smullyan.org> On Sun, Aug 08, 2004 at 07:59:14PM -0400, Phillip J. Eby wrote: > Python currently boasts a wide variety of web application > frameworks, such as Zope, Quixote, Webware, Skunkware, PSO, > and Twisted Web -- to name just a few [1]_. If you must continue to call SkunkWeb SkunkWare, please be consistent and call Webware Wareweb. Cheers, Jacob Smullyan
From pje at telecommunity.com Mon Aug 9 06:00:30 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Mon Aug 9 05:56:17 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP In-Reply-To: <20040809020323.GA21842@smullyan.org> References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040808235422.029de490@mail.telecommunity.com> At 10:03 PM 8/8/04 -0400, Jacob Smullyan wrote: >On Sun, Aug 08, 2004 at 07:59:14PM -0400, Phillip J. Eby wrote: > > Python currently boasts a wide variety of web application > > frameworks, such as Zope, Quixote, Webware, Skunkware, PSO, > > and Twisted Web -- to name just a few [1]_. > >If you must continue to call SkunkWeb SkunkWare, please be consistent >and call Webware Wareweb. > >Cheers, > >Jacob Smullyan Crap. Sorry about that. Changing it in my file copy now. (I'm surprised you didn't mention it back in December, though, as that error was in the first draft, too.) On the bright side: out of dozens of frameworks, at least yours got mentioned. ;) Maybe I shouldn't have named any, since yours was the second framework I got the name wrong on. From ianb at colorstudy.com Wed Aug 11 08:42:22 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 11 08:42:27 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP In-Reply-To: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> Message-ID: <4119BFCE.4080207@colorstudy.com> It looks great to me. Of course, I got all my wishes. A couple smaller things, and some possible clarifications: Phillip J. Eby wrote: > Specification Overview > ====================== > > The WSGI interface has two sides: the "server" or "gateway" side, > and the "application" side. The server side invokes a callable > object that is provided by the application side. The specifics > of how that object is provided are up to the server or gateway. 
>
> It is assumed that some servers or gateways will require an
> application's deployer to write a short script to create an
> instance of the server or gateway, and supply it with the
> application object. Other servers and gateways may use
> configuration files or other mechanisms to specify where the
> application object should be imported from.
>
> The application object is simply a callable object that accepts
> two arguments. The term "object" should not be misconstrued as
> requiring an actual object instance: a function, method, class,
> or instance with a ``__call__`` method are all acceptable for
> use as an application object. Here are two example application
> objects; one is a function, and the other is a class::
>
>     def simple_app(environ, start_response):
>         """Simplest possible application object"""
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         write = start_response(status, headers)
>         write('Hello world!\n')

The callables are a little confusing to me. The application is a callable. Start_response is a callable. It returns a callable. Of course, if it wasn't a callable, it would be an object with only one method, which is kind of boring.

A contrary example to this would be iterators, which have basically one method in their interface (next); yet they are not simply callables.

I'm not of strong opinion, but the callables definitely make it harder to understand.

> ``environ`` Variables
> ---------------------
>
> The ``environ`` dictionary is required to contain CGI environment
> variables, as defined by the Common Gateway Interface specification
> [2]_. In addition, it must contain the following WSGI-defined
> variables:
>
> ==================== =============================================
> Variable             Value
> ==================== =============================================
> ``wsgi.version``     The string ``"1.0"``

Would it make sense for this to be a tuple, like (1, 0), like sys.version_info?
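To make the chain of callables concrete, here is a minimal sketch of both sides of the draft protocol quoted above: a function application that pushes output through ``write()``, a class whose instances are iterables with a ``close()`` method, and a toy in-process server. The names ``LineApp`` and ``run_once`` are hypothetical illustrations, not part of the draft, and the explicit ``return []`` in ``simple_app`` is an assumption (the quoted draft code simply ends after ``write()``). The sketch keeps the draft's two-argument ``start_response`` and text strings; the later PEP 3333 adds an optional ``exc_info`` argument and requires bytes.

```python
import io


def simple_app(environ, start_response):
    """Push style: the body is sent through the write() callable."""
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write('Hello world!\n')
    return []  # assumption: nothing left to iterate once write() is done


class LineApp:
    """Pull style: calling the class returns an iterable that has close()."""

    def __init__(self, environ, start_response):
        start_response('200 OK', [('Content-type', 'text/plain')])
        # Stand-in for a real resource such as an open file or DB cursor.
        self.stream = io.StringIO('line one\nline two\n')

    def __iter__(self):
        return iter(self.stream)  # yields one line at a time

    def close(self):
        # The server calls this when the response is finished, so the
        # resource is released promptly instead of at garbage collection.
        self.stream.close()


def run_once(application, environ):
    """Toy server: invoke the app once, collect status, headers and body."""
    sent = {'status': None, 'headers': None}
    body = []

    def start_response(status, headers):
        sent['status'] = status
        sent['headers'] = headers
        return body.append  # this is the write() callable handed back

    result = application(environ, start_response)
    try:
        for chunk in result:
            body.append(chunk)
    finally:
        close = getattr(result, 'close', None)
        if close is not None:
            close()
    return sent['status'], sent['headers'], ''.join(body)
```

From the server's side the function and the class are interchangeable: both are simply called with ``(environ, start_response)``, and whatever comes back is iterated and then closed if it has a ``close()`` method.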
> ``wsgi.input``       An input stream from which the HTTP request
>                      body can be read.
>
> ``wsgi.errors``      An output stream to which error output can
>                      be written. For most servers, this will be
>                      the server's error log.
>
> ``wsgi.multithread`` This value should be true if the application
>                      object may be simultaneously invoked by
>                      another thread in the same process, and
>                      false otherwise.
>
> ``wsgi.multiprocess`` This value should be true if an equivalent
>                      application object may be simultaneously
>                      invoked by another process, and false
>                      otherwise.
> ==================== =============================================

Another useful one I brought up last time would be some indication that the application was definitely not going to be reused, i.e., it's being invoked in a CGI context. The performance issues there are completely different than in other environments. Webware has a CGI interface, but it suffers from being really slow. It could be faster, but everything is optimized toward the long-running case. I think CGI could be made to perform better, and putting in information about when to do those optimizations would leave that door open.

Another common use case would be sessions. It's best to preserve sessions over server restarts, but you might keep sessions in memory and only write to disk when the server shuts down. If it's a CGI request, you can skip all that and just write to disk immediately.

> .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
>    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)

I think before we discussed being explicit about a couple of variables. Specifically, SCRIPT_NAME should refer to the application's root, and PATH_INFO to everything that comes after. This is in contrast to a situation where SCRIPT_NAME points to the WSGI server, and PATH_INFO to the application (in a case where the server hosts multiple applications at different URLs).
Your CGI example avoids this issue because it only supports one application, but a naive extension of that example to support more applications might improperly set these variables.

Should there be any policy about path segments containing //, ./, or ../?

Hmm... what should the server do if it gets a Location header with no Status? I think Apache does an internal redirect, sometimes. Should there be any notion of an internal redirect? The CGI spec seems to require internal redirects in this case.

The CGI spec says servers should change the current working directory to the resource being run. I think this won't be that common for WSGI servers, though. I wonder if this will be an issue with imports. Specifically, relative imports. Eh, I guess that's an application issue.

Will GATEWAY_INTERFACE be defined? If so, what value? "WSGI/1.0"?

I assume SERVER_SOFTWARE will be up to the WSGI server. Should they be sure to rewrite this value if these servers are nested? E.g., should your CGI example rewrite that value? It seems like each piece adds another name to the end in the format "name/version_number", where the name has no spaces. And it might optionally have more information in parentheses after the version, which may contain spaces. Maybe this should be a suggestion.

Is there any non-parsed header form? This would be difficult to support in some environments. Easy in BaseHTTPServer, but hard with a CGI server.

This is from the CGI spec:

   Scripts MUST be prepared to handled URL-encoded values in
   metavariables. In addition, they MUST recognise both "+" and
   "%20" in URL-encoded quantities as representing the space
   character. (See section 3.1.)

That seems weird; I've never URL-decoded values besides QUERY_STRING.

The CGI spec doesn't seem to mention REQUEST_URI. That's surprising.
Here are the Apache CGI variables it doesn't mention:

SERVER_SIGNATURE (pretty boring)
SERVER_ADDR (seems very basic)
DOCUMENT_ROOT (doesn't seem appropriate)
SCRIPT_FILENAME (also often not appropriate)
SERVER_ADMIN (boring)
SCRIPT_URI
REQUEST_URI (I don't understand the distinction)
REMOTE_PORT (boring, though I guess if you wanted to add an ident check it would be useful)
UNIQUE_ID (not needed)

I think SERVER_ADDR and REMOTE_PORT are easy to add, and potentially useful. SCRIPT_URI and REQUEST_URI might be good.

For middleware applications/servers, it might be suggested that they use mod_rewrite's extra variables (http://httpd.apache.org/docs/mod/mod_rewrite.html#EnvVar):

   This module keeps track of two additional (non-standard) CGI/SSI
   environment variables named SCRIPT_URL and SCRIPT_URI. These contain
   the logical Web-view to the current resource, while the standard
   CGI/SSI variables SCRIPT_NAME and SCRIPT_FILENAME contain the
   physical System-view.

   Notice: These variables hold the URI/URL as they were initially
   requested, i.e., before any rewriting. This is important because
   the rewriting process is primarily used to rewrite logical URLs to
   physical pathnames.

   Example:

   SCRIPT_NAME=/sw/lib/w3s/tree/global/u/rse/.www/index.html
   SCRIPT_FILENAME=/u/rse/.www/index.html
   SCRIPT_URL=/u/rse/
   SCRIPT_URI=http://en1.engelschall.com/u/rse/

From fredrik at pythonware.com Wed Aug 11 13:04:50 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed Aug 11 13:20:38 2004
Subject: [Web-SIG] Re: The rewritten WSGI pre-PEP
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
Message-ID:

Phillip J. Eby wrote:

> As always, your comments and feedback are appreciated.
> def run_with_cgi(application):
>
>     environ = {}
>     envrion.update(os.environ)

NameError

>     environ['wsgi.input'] = sys.stdin
>     environ['wsgi.errors'] = sys.stderr
>     environ['wsgi.version'] = '1.0'
>     environ['wsgi.multithread'] = False
>     environ['wsgi.multiprocess'] = True

The answer's probably hidden somewhere in the mailing list archives, but why do you mix WSGI variables with external CGI environment variables?

I'd prefer

    def application(context, environ, start_response)

where context is an object of a server-defined type, with attributes for input, errors, etc:

    context = MyApplicationServerContext()
    context.input = sys.stdin
    context.errors = sys.stderr
    context.version = "1.0" (or (1, 0))
    etc

Advantages:
- contexts can (probably) be reused
- attributes can be lazily initialized (via properties or getattr hooks)
- the user code looks nicer
- future safe: more attributes and methods can be added to the context
  object in future revisions of this specification, without changing the
  function signatures

Disadvantages:
- one more argument; but if that's really a problem, why not make
  start_response a method of the context class?

    def application(context, environ):
        ...
        context.start(status, headers)

> The second parameter passed to the application object is itself a
> two-argument callable, used to begin the HTTP response and return
> a ``write()`` function. The first parameter it takes is a "status"
> string, of the form ``"999 Message here"``, where ``999`` is replaced
> with the HTTP status code, and ``Message here`` is replaced with the
> appropriate message text.

To make life easier for users, you might wish to accept either an integer status code (e.g. start(200, headers)) or a string. In case a status code is provided, the server can fill in a suitable string value (as per the HTTP specification).

Except for those small nits, I'm totally +1 on this proposal.
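The integer-status suggestion can be illustrated with the standard library's ``http.HTTPStatus`` enum (Python 3.5+). ``make_status`` and ``not_found_app`` are hypothetical names, not part of the draft: the helper accepts either an int or an already formatted status string and always produces the ``"999 Message here"`` form the draft requires. In this sketch the lookup happens on the application side, which is where Phillip argues later in the thread that such a table belongs.

```python
from http import HTTPStatus


def make_status(code_or_status):
    """Return a status line in the draft's "999 Message here" form.

    An int is expanded with the reason phrase from http.HTTPStatus
    (raising ValueError for codes the enum does not know); a str is
    assumed to be a complete status line and is passed through unchanged.
    """
    if isinstance(code_or_status, int):
        return f'{code_or_status} {HTTPStatus(code_or_status).phrase}'
    return code_or_status


def not_found_app(environ, start_response):
    """Example application that does the status lookup on its own side."""
    write = start_response(make_status(404), [('Content-type', 'text/plain')])
    write('No such page\n')
    return []
```

Because the server only ever sees a finished string, it needs no status table of its own; the cost is one small helper in each framework that wants the convenience.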
From ianb at colorstudy.com Wed Aug 11 17:54:36 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 11 17:55:29 2004
Subject: [Web-SIG] Re: The rewritten WSGI pre-PEP
In-Reply-To:
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
Message-ID: <411A413C.2050803@colorstudy.com>

Fredrik Lundh wrote:
> Disadvantages:
> - one more argument; but if that's really a problem, why not make
>   start_response a method of the context class?
>
>     def application(context, environ):
>         ...
>         context.start(status, headers)

This would solve the too-many-callables problem as well.

However, because the context could have a complex implementation, it would be hard to rewrite the context if you forward the request. OTOH, most of the pieces of the context shouldn't be forwarded on. For instance, if mod_python gives access to the apache module, or the original request object, should middleware pass through that access? It would probably be incorrect, as the middleware is doing some filtering and the mod_python extensions would bypass that filtering.

Which is to say, middleware shouldn't pass through extensions by default, but with a dictionary implementation it would be common to do so.

One positive aspect of a dictionary is that introspection is easier. There's no reliable equivalent of .keys() for an arbitrary object.

And, if we package things into an object, environ could also become an attribute of context.

-- 
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From pje at telecommunity.com Wed Aug 11 18:44:18 2004
From: pje at telecommunity.com (Phillip J.
Eby) Date: Wed Aug 11 18:44:23 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP In-Reply-To: <4119BFCE.4080207@colorstudy.com> References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com> At 01:42 AM 8/11/04 -0500, Ian Bicking wrote: >The callables are a little confusing to me. The application is a >callable. Start_response is a callable. It returns a callable. Of >course, if it wasn't a callable, it would be an object with only one >method, which is kind of boring. > >A contrary example to this would be iterators, which have basically one >method in their interface (next); yet they are not simply callables. It's assumed that iterators may have other behaviors. In any case, I certainly made use of iterators and methods where appropriate, i.e. in the return value of the application, which can support __iter__(), next(), and close() if they are needed. >I'm not of strong opinion, but the callables definitely make it harder to >understand. ...but easier to implement, since everything can be done with functions and closures. Do you think you would have difficulty creating a conforming implementation, or are you just saying it took you a while to grasp how you would do so? >>==================== ============================================= >>Variable Value >>==================== ============================================= >>``wsgi.version`` The string ``"1.0"`` > >Would it make sense for this to be a tuple, like (1, 0), like >sys.version_info? Maybe. I'm not sure it makes any difference. I could just as soon drop versioning altogether and just use the presence or absence of feature keys as the means of determining the version. >Another useful one I brought up last time would be some indication that >the application was definitely not going to be reused, i.e., it's being >invoked in a CGI context. 
The performance issues there are completely >different than in other environments. Okay... how about 'wsgi.last_call', which is a true value if this invocation of the application will *probably* be the last? IOW, the server need not guarantee that the app will *not* be called again; this is just a "suggestion". >>.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft >> (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt) > >I think before we discussed being explicit about a couple variables. >Specifically that SCRIPT_NAME should refer to the application's root, and >PATH_INFO to everything that comes after. Good point; I'll update this. >Should there be any policy about path segments containing //, ./, or ../? What do you have in mind? >Hmm... what should the server do if it gets a Location header with no Status? There's no such thing; there's always a status under this spec. However, what happens to the HTTP headers passed to 'start_response()' could perhaps be made clearer. >The CGI spec says servers should change the current working directory to >the resource being run. I think this won't be that common for WSGI >servers, though. Do you think this needs to be stated? WSGI only references CGI with respect to environment variables. >Will GATEWAY_INTERFACE be defined? If so, what value? "WSGI/1.0"? I >assume SERVER_SOFTWARE will be up to the WSGI server. Should they be sure >to rewrite this value if these servers are nested? E.g., should your CGI >example rewrite that value? It seems like each piece adds another name to >the end in the format "name/version_number", where the name has no >spaces. And it might optionally have more information in parenthesis >after the version, which may contain spaces. Maybe this should be a >suggestion. The normal value of the CGI variables should be server-defined. WSGI variables should be out-of-band. >Is there any non-parsed header form? The entire thing is "non-parsed headers". They're a list of tuples. 
If you mean, can you stop a web server from adding/changing headers according to its whims, then no, you can't. >This is from the CGI spec: > > Scripts MUST be prepared to handled URL-encoded values in > metavariables. In addition, they MUST recognise both "+" and > "%20" in URL-encoded quantities as representing the space > character. (See section 3.1.) > >That seems weird; I've never URL-decoded values besides QUERY_STRING. That's probably an addition to the 1.1 spec. However, ISTM I've seen code in Zope that expects to decode path segments. I could be wrong. >The CGI spec doesn't seem to mention REQUEST_URI. That's surprising. >Here's the Apache CGI variables it doesn't mention: > >SERVER_SIGNATURE (pretty boring) >SERVER_ADDR (seems very basic) >DOCUMENT_ROOT (doesn't seem appropriate) >SCRIPT_FILENAME (also often not appropriate) >SERVER_ADMIN (boring) >SCRIPT_URI >REQUEST_URI (I don't understand the distinction) >REMOTE_PORT (boring, though I guess if you wanted to add an ident check it >would be useful) >UNIQUE_ID (not needed) > > >I think SERVER_ADDR and REMOTE_PORT are easy to add, and potentially >useful. SCRIPT_URI and REQUEST_URI might be good. Sigh. I guess maybe I'll have to go back and pick out variables one by one. However, I don't think *any* of the variables you listed should be required to exist. For one thing, it's much easier to write middleware if you only have to munge SCRIPT_NAME and PATH_INFO during traversals. From pje at telecommunity.com Wed Aug 11 18:52:25 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 11 18:52:29 2004 Subject: [Web-SIG] Re: The rewritten WSGI pre-PEP In-Reply-To: References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040811124440.03a6cd20@mail.telecommunity.com> At 01:04 PM 8/11/04 +0200, Fredrik Lundh wrote: >Phillip J. Eby wrote: > > > As always, your comments and feedback are appreciated. 
> > def run_with_cgi(application):
> >
> >     environ = {}
> >     envrion.update(os.environ)
>
>NameError

[added to to-do list]

> >     environ['wsgi.input'] = sys.stdin
> >     environ['wsgi.errors'] = sys.stderr
> >     environ['wsgi.version'] = '1.0'
> >     environ['wsgi.multithread'] = False
> >     environ['wsgi.multiprocess'] = True
>
>The answer's probably hidden somewhere in the mailing list archives, but why
>do you mix WSGI variables with external CGI environment variables?
>
>I'd prefer
>
>     def application(context, environ, start_response)
>
>where context is an object of a server-defined type, with attributes for
>input, errors, etc:
>
>     context = MyApplicationServerContext()
>     context.input = sys.stdin
>     context.errors = sys.stderr
>     context.version = "1.0" (or (1, 0))
>     etc
>
>Advantages:
>- contexts can (probably) be reused
>- attributes can be lazily initialized (via properties or getattr hooks)
>- the user code looks nicer
>- future safe: more attributes and methods can be added to the context
>  object in future revisions of this specification, without changing the
>  function signatures

All of these advantages also apply to an object supplied in the dictionary, i.e.:

     environ['some_server.context'] = context_object

>Disadvantages:
>- one more argument; but if that's really a problem, why not make
>  start_response a method of the context class?
>
>     def application(context, environ):
>         ...
>         context.start(status, headers)

The advantage is simplicity of implementation. It's possible to write middleware (application that's also a server) without creating any new classes. In essence, WSGI is an almost pure-functional architecture, which makes it (IMO) easier to reason about.

> > The second parameter passed to the application object is itself a
> > two-argument callable, used to begin the HTTP response and return
> > a ``write()`` function.
> > The first parameter it takes is a "status"
> > string, of the form ``"999 Message here"``, where ``999`` is replaced
> > with the HTTP status code, and ``Message here`` is replaced with the
> > appropriate message text.
>
>To make life easier for users, you might wish to accept either an integer
>status code (e.g. start(200, headers)) or a string. In case a status code
>is provided, the server can fill in a suitable string value (as per the HTTP
>specification).

I thought about this, but the difference between '200' and '"200 OK"' is so trivial as to be unuseful compared to the scope creep for the server's implementation. That is, allowing this means the server software has to have a list of the numeric statuses, versus an application author looking up the few that they actually want to use. Also, web frameworks often already have such a lookup table, so it seems to me that putting the responsibility on the application side is the better balance.

>Except for those small nits, I'm totally +1 on this proposal.

Thanks. I'll add your questions to the Q&A section.

From pje at telecommunity.com Wed Aug 11 18:57:30 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 11 18:57:35 2004
Subject: [Web-SIG] Re: The rewritten WSGI pre-PEP
In-Reply-To: <411A413C.2050803@colorstudy.com>
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040811125244.03a790d0@mail.telecommunity.com>

At 10:54 AM 8/11/04 -0500, Ian Bicking wrote:
>However, because the context could have a complex implementation, it would
>be hard to rewrite the context if you forward the request. OTOH, most of
>the pieces of the context shouldn't be forwarded on. For instance, if
>mod_python gives access to the apache module, or the original request
>object, should middleware pass through that access? It would probably be
>incorrect, as the middleware is doing some filtering and the mod_python
>extensions would bypass that filtering.
> >Which is to say, middleware shouldn't pass through extensions by default, >but with a dictionionary implementation it would be common to do so. Actually, the idea behind the naming convention is that middleware can filter out extensions if it needs to. It need only delete any lowercase key that doesn't begin with 'wsgi.' to remove all extensions, or it can be more specific, according to its needs. I didn't actually mention this in the spec, though, so I'll need to fix that. >One positive aspect of a dictionary is that introspection is easier. >There's no reliable equivalent of .keys() for an arbitrary object. > >And, if we package things into an object, environ could also become an >attribute of context. I'm -1 on making an object out of it. It will make the spec even longer than it already is, and it will increase the number of things to discuss. (E.g. names of the methods). From fredrik at pythonware.com Wed Aug 11 19:15:16 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Aug 11 19:40:21 2004 Subject: [Web-SIG] Re: Re: The rewritten WSGI pre-PEP References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040811124440.03a6cd20@mail.telecommunity.com> Message-ID: Phillip J. Eby wrote: >>Advantages: >>- contexts can (probably) be reused >>- attributes can be lazily initialized (via properties or getattr hooks) >>- the user code looks nicer >>- future safe: more attributes and methods can be added to the context >> object in future revisions of this specification, without changing the >> function signatures > > All of these advantages also apply to an object supplied in the dictionary, i.e.: > > environ['some_server.context'] = context_object that's obviously not true: environment dictionaries cannot be reused, environment items cannot be lazily initialized (since you require apps to use a PyDict), and the code using WSGI variables has to use dict access syntax (x["y"]) instead of standard attribute access (x.y). 
> The advantage is simplicity of implementation. It's possible to write middleware (application > that's also a server) without creating any new classes. so "def a(b)" is easy to write, but "class a" is hard to write? you're obviously not interested in feedback from experienced Python programmers. I'm sorry I wasted everybody's time. From ianb at colorstudy.com Wed Aug 11 19:51:56 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 11 19:52:47 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP In-Reply-To: <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com> References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com> Message-ID: <411A5CBC.7080306@colorstudy.com> Phillip J. Eby wrote: >> I'm not of strong opinion, but the callables definitely make it harder >> to understand. > > > ...but easier to implement, since everything can be done with functions > and closures. > > Do you think you would have difficulty creating a conforming > implementation, or are you just saying it took you a while to grasp how > you would do so? No, I don't think it would make it any harder to implement. Mostly it's just harder to talk about. >>> ==================== ============================================= >>> Variable Value >>> ==================== ============================================= >>> ``wsgi.version`` The string ``"1.0"`` >> >> >> Would it make sense for this to be a tuple, like (1, 0), like >> sys.version_info? > > > Maybe. I'm not sure it makes any difference. I could just as soon drop > versioning altogether and just use the presence or absence of feature > keys as the means of determining the version. I think of the version as something of a contract. The WSGI server author can't deny that they intended to implement the full spec if they include the version number. 
Also it could be used like HTTP 1.1 sometimes is, like you must include a Host header if you claim to be talking 1.1. Similarly, applications could require certain features if the server claims to talk, say, WSGI 1.1.

>> Another useful one I brought up last time would be some indication
>> that the application was definitely not going to be reused, i.e., it's
>> being invoked in a CGI context. The performance issues there are
>> completely different than in other environments.
>
> Okay... how about 'wsgi.last_call', which is a true value if this
> invocation of the application will *probably* be the last? IOW, the
> server need not guarantee that the app will *not* be called again; this
> is just a "suggestion".

Yes, that sounds good.

>> Should there be any policy about path segments containing //, ./, or ../?
>
>
> What do you have in mind?

I don't know. Normalization, perhaps -- remove empty path segments, and resolve any relative paths. Which would mean something like:

    import re

    # Dots are escaped so they match a literal '.', not any character.
    path = re.sub(r'/[^/]*/\.\./', '/', path)
    path = re.sub(r'/\./', '/', path)
    path = re.sub(r'//+', '/', path)

I dunno... that should probably be up to the application.

>> Hmm... what should the server do if it gets a Location header with no
>> Status?
>
> There's no such thing; there's always a status under this spec.
> However, what happens to the HTTP headers passed to 'start_response()'
> could perhaps be made clearer.

Okay, that's fine. Though any internal redirect would have to be done through an extension in that case. In practice, internal redirects are kind of complicated to deal with anyway: lots of linking confusion, lost headers, etc.

>> The CGI spec says servers should change the current working directory
>> to the resource being run. I think this won't be that common for WSGI
>> servers, though.
>
> Do you think this needs to be stated? WSGI only references CGI with
> respect to environment variables.

Probably it's no big deal.
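Regular expressions applied once cannot fully resolve chained segments: ``re.sub`` replaces only the non-overlapping matches found in a single left-to-right scan, so ``/a/b/../../c`` comes out as ``/a/../c``. A segment-by-segment pass handles that case. This is a minimal sketch under assumptions the thread never settled: empty segments are dropped, ``..`` above the root is discarded, the input is an already-decoded path, and a trailing slash is kept when the last segment was empty, ``.`` or ``..``. ``normalize_path`` is a hypothetical helper name.

```python
def normalize_path(path):
    """Collapse repeated slashes and resolve '.' and '..' segments in a URL path."""
    parts = path.split('/')
    segments = []
    for seg in parts:
        if seg in ('', '.'):
            continue            # empty segment (from '//') or current directory
        if seg == '..':
            if segments:
                segments.pop()  # go up one level; '..' at the root is dropped
            continue
        segments.append(seg)
    result = '/' + '/'.join(segments)
    # '/a/b/' and '/a/b/..' both name a directory, so keep the trailing slash.
    if segments and parts[-1] in ('', '.', '..'):
        result += '/'
    return result
```

The standard library's ``posixpath.normpath`` is close but not identical for this purpose: it preserves a leading ``//`` (as POSIX permits) and drops trailing slashes. Since the thread leans toward leaving normalization to the application, a helper like this would live on the application side.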
>> This is from the CGI spec: >> >> Scripts MUST be prepared to handled URL-encoded values in >> metavariables. In addition, they MUST recognise both "+" and >> "%20" in URL-encoded quantities as representing the space >> character. (See section 3.1.) >> >> That seems weird; I've never URL-decoded values besides QUERY_STRING. > > > That's probably an addition to the 1.1 spec. However, ISTM I've seen > code in Zope that expects to decode path segments. I could be wrong. I would assume in that case it was decoding something that was encoded on the server side. E.g.: I/O library As opposed to the CGI gateway encoding any of its values. Even QUERY_STRING is encoded by the browser, not the gateway. Maybe this is just a case of HTTP issues leaking into the CGI spec. >> The CGI spec doesn't seem to mention REQUEST_URI. That's surprising. >> Here's the Apache CGI variables it doesn't mention: >> >> SERVER_SIGNATURE (pretty boring) >> SERVER_ADDR (seems very basic) >> DOCUMENT_ROOT (doesn't seem appropriate) >> SCRIPT_FILENAME (also often not appropriate) >> SERVER_ADMIN (boring) >> SCRIPT_URI >> REQUEST_URI (I don't understand the distinction) >> REMOTE_PORT (boring, though I guess if you wanted to add an ident >> check it would be useful) >> UNIQUE_ID (not needed) >> >> >> I think SERVER_ADDR and REMOTE_PORT are easy to add, and potentially >> useful. SCRIPT_URI and REQUEST_URI might be good. > > > Sigh. I guess maybe I'll have to go back and pick out variables one by > one. However, I don't think *any* of the variables you listed should be > required to exist. For one thing, it's much easier to write middleware > if you only have to munge SCRIPT_NAME and PATH_INFO during traversals. I've had constant problems trying to backtrack through middleware (like mod_rewrite) to figure out how to create a URL that is internal to the application. 
I'd like to keep around some artifact indicating what the original URI was (e.g., REQUEST_URI); something that middleware specifically should not rewrite. Nor is there any real reason for it to be rewritten. SERVER_ADDR and REMOTE_PORT also don't require any rewriting, and should just be passed through any middleware.

Hmm... the CGI spec also leaves out any SSL variables. Those are, of course, all optional. But if the user connected via SSL, I think HTTPS=on should be required.

-- 
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From tali.wang at gmail.com Wed Aug 11 20:19:51 2004
From: tali.wang at gmail.com (Taliesin Wang)
Date: Wed Aug 11 20:19:56 2004
Subject: [Web-SIG] my view of python web app server
Message-ID:

Hi all,

I'm new here, and my English is not good, so forgive me for any grammar or spelling mistakes.

My view on a Python web app server:

1. Trying to implement an app server the Java way is not valuable. Going this way, the best result we could achieve is a "Tomcat" clone.

2. The advantage of Python, I think, is in ORM. In Java, we need to do a lot of work to avoid strong-typing limitations, but in Python it could be much easier.

3. For high-traffic websites, caching is used to avoid heavy DB operations. In Java, disk/memory operations are not flexible enough, but in Python it's easier.

Based on the ideas above, I suggest:

1. Start with ORM: have a Python version of Hibernate first, since 80% of web programming is based on DB operations (adding/editing/removing a record). These objects could be named PWOs (Python Web Objects).

2. Give the main space to those PWOs, and treat request/response/session concepts as "helper" classes that serve the PWOs.

3. Integrate the DB driver tightly into the app server. (An embedded template engine is welcome, too.)

The final goal is that web app developers need not know what DB they use or what the database schema is: just write PWOs, deploy them, then organize them by business logic.
-- Wang ----------------------------------------------- Email:tali.wang@gmail.com Mobile:+86-136-3281-4194 ----------------------------------------------- From pje at telecommunity.com Wed Aug 11 20:30:49 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 11 20:30:57 2004 Subject: [Web-SIG] Re: Re: The rewritten WSGI pre-PEP In-Reply-To: References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040811124440.03a6cd20@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040811140110.02a03770@mail.telecommunity.com> At 07:15 PM 8/11/04 +0200, Fredrik Lundh wrote: >Phillip J. Eby wrote: > > >>Advantages: > >>- contexts can (probably) be reused > >>- attributes can be lazily initialized (via properties or getattr hooks) > >>- the user code looks nicer > >>- future safe: more attributes and methods can be added to the context > >> object in future revisions of this specification, without changing the > >> function signatures > > > > All of these advantages also apply to an object supplied in the > dictionary, i.e.: > > > > environ['some_server.context'] = context_object > >that's obviously not true: environment dictionaries cannot be reused, >environment items cannot be lazily initialized (since you require apps >to use a PyDict), and the code using WSGI variables has to use dict >access syntax (x["y"]) instead of standard attribute access (x.y). I meant that a server-specific 'context_object' can have all those advantages, not that the dictionary would. In other words, I was suggesting that WSGI extensions could make use of all of these things, but I'd prefer that the core WSGI variables weren't presented that way. Given that all the WSGI-defined keys are strings or booleans, except for 'input' and 'errors', I don't see the advantage of lazy initialization for the spec-defined values. I will agree that user code would look nicer as attributes, but there are other ways to accomplish that, such as using constants for keys, e.g. 
'environ[INPUT]', or functions 'input_of(environ)'. As for future safety, you can add as many framework- or server-specific keys as you like, as long as you follow the naming convention. And those entries can be objects of whatever nature is desired. So really, the only thing that an object *adds* is a '.' syntax. But, this syntax doesn't easily allow for namespaces: if server A and server B both define a 'foo' method, but with different signatures, how can an application tell what kind of 'foo' it is? At least with a dictionary, the application object can look for 'server_A.foo' and 'server_B.foo' keys. Finally, although I do want it to be simple on both the server and app sides, please remember that this is primarily intended to be a server-to-framework protocol, not an API for writing applications. It's expected that normally the only code dealing directly with the WSGI 'environ' is either framework code, "middleware", or a server. > > The advantage is simplicity of implementation. It's possible to write > middleware (application > > that's also a server) without creating any new classes. > >so "def a(b)" is easy to write, but "class a" is hard to write? No, it's that one can write code to transform the dictionary in-place, while supplying an altered context object will require not just an extra class, but potentially error-prone code to copy attributes and delegate methods to the previous context object. That is, unless the spec requires that the context object allow arbitrary attributes to be set, or provides some extension dictionary, there's no way for a "middleware" component to add new behaviors. Whereas, under the present spec, it's just 'environ[somekey]=value', followed by passing the call on to the next "application" object. >you're obviously not interested in feedback from experienced Python >programmers. I'm sorry I wasted everybody's time. Huh? I thought we were just getting started.
Ian argued with me for weeks to get a lot of the stuff in this draft that he wanted, so I'm not closed to feedback, just thickheaded. :) I'm also prone to not communicating all of my assumptions/conclusions about a design, because I think they're "obvious". So, feedback like yours forces me to elaborate on them, and if you can get me to understand your actual use case I'll try to incorporate it. If it means redoing the whole spec, so be it -- if you search for my first posting of the spec last December, you'll notice that this version is almost *nothing* like the original, which had objects and methods galore, by comparison. From pje at telecommunity.com Wed Aug 11 20:40:56 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 11 20:41:02 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP In-Reply-To: <411A5CBC.7080306@colorstudy.com> References: <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com> <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040811143129.02974c30@mail.telecommunity.com> At 12:51 PM 8/11/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>>>==================== ============================================= >>>>Variable Value >>>>==================== ============================================= >>>>``wsgi.version`` The string ``"1.0"`` >>> >>> >>>Would it make sense for this to be a tuple, like (1, 0), like >>>sys.version_info? >> >>Maybe. I'm not sure it makes any difference. I could just as soon drop >>versioning altogether and just use the presence or absence of feature >>keys as the means of determining the version. > >I think of the version as something of a contract. The WSGI server author >can't deny that they intended to implement the full spec if they include >the version number. 
Also it could be used like HTTP 1.1 sometimes is, >like you must include a Host header if you claim to be talking >1.1. Similarly applications could require certain features if the server >claims to talk, say, WSGI 1.1. Fair enough. Unless anybody else has any input one way or the other, we'll make it the tuple (1,0). >I've had constant problems trying to backtrack through middleware (like >mod_rewrite) to figure out how to create a URL that is internal to the >application. I'd like to keep around some artifact indicating what the >original URI was (e.g., REQUEST_URI); something that middleware >specifically should not rewrite. Nor is there any real reason for it to >be rewritten. Hm. And SCRIPT_NAME is insufficient for this? I think I can see why mod_rewrite would make this a problem, but ISTM that a Python middleware component could do rewrites that left SCRIPT_NAME "logically correct". I'm more concerned that the presence of such a variable would encourage people to use it in ways that would ignore "rewritten" variables, thus breaking middleware. Meanwhile, the common solution I've seen to this issue in web applications is to have configuration for where the application is in URL-space. >SERVER_ADDR and REMOTE_PORT also don't require any rewriting, and should >just be passed through any middleware. Are you sure? SERVER_ADDR might be different if the request is forwarded to another machine, mightn't it? I seem to recall that mod_backhand does some stuff with this. In any case it highlights the trouble with trying to precisely pin down things that are already inherently implementation-defined. Unfortunately, WSGI isn't really going to eliminate all the environment introspecting and munging code that lives in the various existing apps and frameworks today. > Hmm... the CGI spec also leaves out any SSL variables. Those are, of > course, all optional. But if the user connected via SSL, I think > HTTPS=on should be required.
I'll add something about this, and maybe some sort of a general note about the inherent implementation-specificness of CGI variables. :( From ianb at colorstudy.com Wed Aug 11 21:20:23 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 11 21:21:11 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP In-Reply-To: <5.1.1.6.0.20040811143129.02974c30@mail.telecommunity.com> References: <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com> <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com> <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com> <5.1.1.6.0.20040811143129.02974c30@mail.telecommunity.com> Message-ID: <411A7177.5040108@colorstudy.com> Phillip J. Eby wrote: >> I've had constant problems trying to backtrack through middleware >> (like mod_rewrite) to figure out how to create a URL that is internal >> to the application. I'd like to keep around some artifact indicating >> what the original URI was (e.g., REQUEST_URI); something that >> middleware specifically should not rewrite. Nor is there any real >> reason for it to be rewritten. > > > Hm. And SCRIPT_NAME is insufficient for this? I think I can see why > mod_rewrite would make this a problem, but ISTM that Python middleware > component could do rewrites that left SCRIPT_NAME "logically correct". I suppose it could, i.e., http:// + SERVER_NAME + ":" + SERVER_PORT + SCRIPT_NAME + PATH_INFO + "?" + QUERY_STRING is the complete URL. If that's the expectation, then that too should be in the spec. But, if only because of the existence of mod_rewrite, that's not likely to be true. REQUEST_URI just seems like a natural part of the request description -- it says exactly what the client asked for, without the extra meaning that SCRIPT_NAME and PATH_INFO have. In the end I've come to dislike mod_rewrite because of these issues, but given its existence...
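A sketch of the kind of rewrite that keeps SCRIPT_NAME "logically correct", assuming Python 3's standard-library helper `wsgiref.util.shift_path_info`, which moves one segment from PATH_INFO onto SCRIPT_NAME. The names `mount`, `show_paths` and `blog` are hypothetical. SCRIPT_NAME + PATH_INFO still spell the same path after the shift, while REQUEST_URI, if a server supplies it, is passed through untouched as Ian asks.

```python
from wsgiref.util import shift_path_info


def mount(segment, app):
    """Hypothetical middleware: send requests whose first path segment is
    `segment` to `app`, moving that segment from PATH_INFO to SCRIPT_NAME.

    SCRIPT_NAME + PATH_INFO spells the same path before and after the
    shift; REQUEST_URI (if the server supplied one) is left untouched.
    """
    def middleware(environ, start_response):
        environ = dict(environ)  # work on a copy; don't mutate the caller's dict
        if shift_path_info(environ) != segment:
            start_response('404 Not Found', [('Content-Type', 'text/plain')])
            return [b'Not Found\n']
        return app(environ, start_response)
    return middleware


def show_paths(environ, start_response):
    """Demo application that reports how it sees its own location."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    text = '%s|%s|%s' % (environ['SCRIPT_NAME'], environ['PATH_INFO'],
                         environ.get('REQUEST_URI', ''))
    return [text.encode('ascii')]


blog = mount('blog', show_paths)
```

Called with PATH_INFO '/blog/2004/08', the wrapped application sees SCRIPT_NAME '/blog' and PATH_INFO '/2004/08', so URLs it builds from SCRIPT_NAME point back into its own URL-space.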
>> SERVER_ADDR and REMOTE_PORT also don't require any rewriting, and >> should just be passed through any middleware. > > > Are you sure? SERVER_ADDR might be different if the request is > forwarded to another machine, mightn't it? I seem to recall that > mod_backhand does some stuff with this. In any case it highlights the > trouble with trying to precisely pin down things that are already > inherently implementation-defined. Unfortunately, WSGI isn't really > going to eliminate all the environment introspecting and munging code > that lives in the various existing apps and frameworks today. If SERVER_ADDR needs to be rewritten, then SERVER_NAME would be rewritten at the same time. I think I've also seen some inconsistencies between SERVER_NAME and HTTP_HOST. SERVER_NAME tends to be the canonical name of the host, ignoring any named virtual hosts (at least in Apache). So really if you are going to construct a URL it should use (environ.get("HTTP_HOST") or environ.get("SERVER_NAME")). Maybe it would be good to include how the URL is supposed to be split up, at least informationally. Like, you can reconstruct the URL by doing:

    if environ.get('HTTPS') == 'on':
        url = 'https://'
    else:
        url = 'http://'

    if environ.get('HTTP_HOST'):
        # The Host header already carries any non-default port
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        # Append the port only when it differs from the scheme's default
        if environ.get('HTTPS') == 'on':
            if environ['SERVER_PORT'] != '443':
                url += ':' + environ['SERVER_PORT']
        else:
            if environ['SERVER_PORT'] != '80':
                url += ':' + environ['SERVER_PORT']

    url += environ['SCRIPT_NAME']
    url += environ.get('PATH_INFO', '')
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']

This should never fail (no missing keys), and should always be accurate except for details like a ? without a query string, an explicit port that matches the default, or a path that the server has optionally normalized.
If it can't be accurate -- e.g., because SCRIPT_NAME or PATH_INFO have been muddled (or even QUERY_STRING) -- then I'd like to have a REQUEST_URI which is accurate. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From angryhicKclown at netscape.net Sat Aug 14 19:42:22 2004 From: angryhicKclown at netscape.net (angryhicKclown@netscape.net) Date: Sat Aug 14 19:42:28 2004 Subject: [Web-SIG] WSGI - alternative ideas Message-ID: <1B619C81.6DDEF2D7.519F8DB3@netscape.net> Hi, I've just subscribed to this list, but I've read much of the archives. Python is in dire and immediate need of WSGI. I think WSGI needs to be essentially very similar to jonpy (jonpy.sf.net), except without the templating. Jonpy exposes an interface very similar to Java servlets, and can run on cgi, fastcgi, and mod_python by changing one line of code. WSGI, I believe, should be a higher-level interface than what has been currently outlined. For Python to succeed as a web language (and I believe that it will), it needs to support the following out of the box:

- a clean servlet interface, see jonpy's Handler classes
- support for a multitude of different platforms easily
- sessions
- database connection pooling
- caching

The syntax for something like this would be as follows:

-------------------------
import wsgi

class MyServlet(wsgi.Servlet):  # perhaps a different name than Servlet?
    def handle(self, req, **formargs):
        pass

wsgi.main(MyServlet())
------------------

The wsgi module should automatically detect whether it's running under CGI, mod_python, FastCGI, PyWX, or even IIS ASP with Python ActiveX scripting or ISAPI. The request args are passed as key=value, unless there are multiple values for one key, in which case the values are passed as a list. The request object would support sessions via a "req.sessions" dict. WSGI would pick the storage method it uses depending on what platform it is run on. It would also support a database pool by using a "req.pool" object.
I believe it should support pooling of any type of class. Here's an idea for syntax: req.pool['database'] = (MySQLdb.connect, {'user':'example','passwd':'secret','db':'example'}) And a call to req.pool['database'] would check out a connection to that database, and would be automatically returned at the end of the request. Or am I taking this at too high a level? Perhaps it should simply clone the cgi module for different platforms (i.e. from wsgi import cgi, from wsgi import mod_python), or, perhaps the wsgi module will expose the same interface as the cgi module, and autodetect the platform and act accordingly. Thanks for reading, Peter Hunt From pje at telecommunity.com Sat Aug 14 19:53:51 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Aug 14 19:54:20 2004 Subject: [Web-SIG] WSGI - alternative ideas In-Reply-To: <1B619C81.6DDEF2D7.519F8DB3@netscape.net> Message-ID: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> At 01:42 PM 8/14/04 -0400, angryhicKclown@netscape.net wrote: >Hi, I've just subscribed to this list, but I've read much of the archives. >Python is in dire and immediate need of WSGI. > >I think WSGI needs to be essentially very similar to jonpy (jonpy.sf.net), >except without the templating. Jonpy exposes an interface very similar to >Java servlets, and can run on cgi, fastcgi, and mod_python by changing one >line of code. WSGI, I believe, should be a higher-level interface than >what has been currently outlined.
For Python to succeed as a web language >(and I believe that it will), it needs to support the following out of the box: > >- a clean servlet interface, see jonpy's Handler classes >- support for a multitude of different platforms easily >- sessions >- database connection pooling >- caching These needs are already served by dozens of Python web frameworks. To duplicate even *one* of these facilities in the WSGI specification simply adds to the number of existing web frameworks, without fixing anything. WSGI is *intentionally* primitive, to minimize the number of things that different frameworks disagree on. Unfortunately, *everybody* wants to write the "framework to end all frameworks", but this always just results in the existence of framework number N+1. To really change the status quo, there *must* exist something which is *not* a framework. WSGI can reach critical mass if a sufficient number of popular frameworks and servers support it. By contrast, a new framework must successfully "recruit" *individual* users of existing frameworks who have (potentially) already written quite a lot of code to that framework's API. A new framework also threatens the value of the investments existing framework authors have made, and therefore does not encourage their participation in "cannibalizing" their work! From mnot at mnot.net Sat Aug 14 20:25:25 2004 From: mnot at mnot.net (Mark Nottingham) Date: Sat Aug 14 20:25:30 2004 Subject: [Web-SIG] WSGI - alternative ideas In-Reply-To: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> References: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> Message-ID: <4FE70A4B-EE1F-11D8-9BC6-000A95BD86C0@mnot.net> +1 FWIW, I really like this; I'm going to code something up and see how it goes, but from a first look, this is *exactly* what the world needs. 
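To make the "not a framework" point concrete, here is a minimal sketch using the application(environ, start_response) calling convention that PEP 3333 later standardized (the drafts under discussion here may differ in detail). The middleware `add_request_id` and its 'example.request_id' key are hypothetical; they show a component adding its own namespaced entry to environ and passing the call on, with no classes or framework involved.

```python
import itertools


def hello_app(environ, start_response):
    """A bare WSGI-style application: a plain callable, no base class."""
    body = 'Hello, request #%d\n' % environ['example.request_id']
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return [body.encode('utf-8')]


def add_request_id(app):
    """Hypothetical middleware: put a value under a namespaced environ key,
    then pass the call on to the wrapped application unchanged."""
    counter = itertools.count(1)

    def middleware(environ, start_response):
        environ['example.request_id'] = next(counter)
        return app(environ, start_response)
    return middleware


application = add_request_id(hello_app)
```

With Python 3's standard library, `wsgiref.simple_server.make_server('localhost', 8000, application).serve_forever()` would serve this stack; any other server speaking the same protocol could host it unchanged.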
My comments so far (based on revision 1.1): - in general, it would be helpful if references to external specs and constructs they define (e.g., CGI, HTTP, URI) had explicit links and section numbers, so we all are talking about the same things. - in "The start_response() Callable", trailing CRs or LFs are forbidden; what about those inside the text? Multi-line HTTP headers are legal... - it would be helpful if you gave distinguished names to the different callables flying around, and perhaps included an illustration; it gets confusing. - could you talk a bit about the choice of using an environment dictionary for requests? In particular, I understand that CGI-style environment variables makes things easy for CGI implementations, potentially at the expense of others; why not do a list of tuples -- in the style that you describe for response headers? Cheers, On Aug 14, 2004, at 10:53 AM, Phillip J. Eby wrote: > At 01:42 PM 8/14/04 -0400, angryhicKclown@netscape.net wrote: >> Hi, I've just subscribed to this list, but I've read much of the >> archives. Python is in dire and immediate need of WSGI. >> >> I think WSGI needs to be essentially very similar to jonpy >> (jonpy.sf.net), except without the templating. Jonpy exposes an >> interface very similar to Java servlets, and can run on cgi, fastcgi, >> and mod_python by changing one line of code. WSGI, I believe, should >> be a higher-level interface than what has been currently outlined. >> For Python to succeed as a web language (and I believe that it will), >> it needs to support the following out of the box: >> >> - a clean servlet interface, see jonpy's Handler classes >> - support for a multitude of different platforms easily >> - sessions >> - database connection pooling >> - caching > > These needs are already served by dozens of Python web frameworks. To > duplicate even *one* of these facilities in the WSGI specification > simply adds to the number of existing web frameworks, without fixing > anything. 
WSGI is *intentionally* primitive, to minimize the number > of things that different frameworks disagree on. > > Unfortunately, *everybody* wants to write the "framework to end all > frameworks", but this always just results in the existence of > framework number N+1. To really change the status quo, there *must* > exist something which is *not* a framework. > > WSGI can reach critical mass if a sufficient number of popular > frameworks and servers support it. By contrast, a new framework must > successfully "recruit" *individual* users of existing frameworks who > have (potentially) already written quite a lot of code to that > framework's API. > > A new framework also threatens the value of the investments existing > framework authors have made, and therefore does not encourage their > participation in "cannibalizing" their work! > > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-sig/mnot%40mnot.net > -- Mark Nottingham http://www.mnot.net/ From pje at telecommunity.com Sat Aug 14 21:05:56 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Aug 14 21:06:21 2004 Subject: [Web-SIG] WSGI - alternative ideas In-Reply-To: <4FE70A4B-EE1F-11D8-9BC6-000A95BD86C0@mnot.net> References: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com> At 11:25 AM 8/14/04 -0700, Mark Nottingham wrote: >+1 > >FWIW, I really like this; I'm going to code something up and see how it >goes, but from a first look, this is *exactly* what the world needs. 
> >My comments so far (based on revision 1.1): > > - in general, it would be helpful if references to external specs and > constructs they define (e.g., CGI, HTTP, URI) had explicit links and > section numbers, so we all are talking about the same things. > >- in "The start_response() Callable", trailing CRs or LFs are forbidden; >what about those inside the text? Multi-line HTTP headers are legal... In the next draft, I'll be drilling into these issues more, per Ian and Fredrik's comments earlier this week. Specifically, I'm going to go into a lot more detail about *which* CGI variables are required to be available, and what headers must be supplied by the application object versus those which must be supplied by the server if not present. >- it would be helpful if you gave distinguished names to the different >callables flying around, and perhaps included an illustration; it gets >confusing. Well, the only one that doesn't have an explicit name is the 'write' callable, and I can fix that by calling it "the 'write' callable". Some editing should ensure that all callables are referred to with some kind of name nearby. >- could you talk a bit about the choice of using an environment dictionary >for requests? In particular, I understand that CGI-style environment >variables makes things easy for CGI implementations, potentially at the >expense of others; why not do a list of tuples -- in the style that you >describe for response headers? Because CGI variables aren't ordered. If the request input were required to be HTTP headers, this would make it impossible for CGI, FastCGI and other gateways defined in terms of CGI to serve as valid WSGI implementations. 
From mnot at mnot.net Sat Aug 14 22:21:15 2004 From: mnot at mnot.net (Mark Nottingham) Date: Sat Aug 14 22:21:19 2004 Subject: [Web-SIG] WSGI - alternative ideas In-Reply-To: <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com> References: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com> Message-ID: <7E38FB6A-EE2F-11D8-9BC6-000A95BD86C0@mnot.net> On Aug 14, 2004, at 12:05 PM, Phillip J. Eby wrote: > Well, the only one that doesn't have an explicit name is the 'write' > callable, and I can fix that by calling it "the 'write' callable". > Some editing should ensure that all callables are referred to with > some kind of name nearby. Great. Maybe something more descriptive, like writeResponseBody? >> - could you talk a bit about the choice of using an environment >> dictionary for requests? In particular, I understand that CGI-style >> environment variables makes things easy for CGI implementations, >> potentially at the expense of others; why not do a list of tuples -- >> in the style that you describe for response headers? > > Because CGI variables aren't ordered. If the request input were > required to be HTTP headers, this would make it impossible for CGI, > FastCGI and other gateways defined in terms of CGI to serve as valid > WSGI implementations. Sorry, I don't follow. HTTP headers aren't ordered, except within a particular header field-name. It's trivial to map from a dictionary like {"HTTP_REFERER": "http://www.example.com/", "HTTP_HOST": "www.example.org"} to [("referer", "http://www.example.com/"), ("host", "www.example.org")]. No information is lost, and it's easier for non-CGI implementations to work with. This isn't to say that the entire environment should be in this style; I'm just concerned about the HTTP headers. 
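The dictionary-to-tuples mapping Mark describes can be sketched as follows. The function name `request_headers` is hypothetical; the key rules follow CGI's convention of upper-casing each header name, turning '-' into '_' and adding an HTTP_ prefix, with CONTENT_TYPE and CONTENT_LENGTH carried without the prefix.

```python
def request_headers(environ):
    """Turn CGI-style request metavariables back into (name, value) pairs.

    CGI upper-cases each request header name, turns '-' into '_' and adds
    an HTTP_ prefix; CONTENT_TYPE and CONTENT_LENGTH carry no prefix.
    """
    pairs = []
    for key, value in environ.items():
        if key.startswith('HTTP_'):
            name = key[len('HTTP_'):]
        elif key in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
            name = key
        else:
            continue  # SCRIPT_NAME, PATH_INFO, etc. are not request headers
        pairs.append((name.replace('_', '-').lower(), value))
    return sorted(pairs)  # CGI variables are unordered; sort for stability


environ = {
    'HTTP_REFERER': 'http://www.example.com/',
    'HTTP_HOST': 'www.example.org',
    'CONTENT_TYPE': 'text/plain',
    'PATH_INFO': '/index',
}
print(request_headers(environ))
# [('content-type', 'text/plain'), ('host', 'www.example.org'), ('referer', 'http://www.example.com/')]
```

The reverse mapping cannot tell whether a client sent '_' or '-' in a field name, since both end up as '_', and the original capitalization is gone (harmless, as field names are case-insensitive). Also, RFC 3875 has the server fold repeated header fields into a single value before the application ever sees them.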
One other thing -- as far as I can tell, this interface can't accommodate Expect/100-Continue interactions, as specified in RFC2616 section 8.2.3. I.e., to support this, the application needs to be given access to the request headers before reading the request body, so that it can send a 100 Continue, then read the request body, then send a normal response. I think it would be possible to support this feature without unduly burdening implementations by saying that start_response() can be called a second time, IF there is an expect: 100-continue header in the request. Server implementations which don't support this behaviour, or automatically handle it themselves, can filter out that header. E.g., if the request environment contains the expect: 100-continue header, the application can do one of three things; 1) respond as normal; i.e., call start_response() and send a successful, redirect, or error response (possibly blocking until the request body is received) 2) respond with a 417 Expectation Failed status code 3) respond with a 100 Continue in the first call to start_response(), and then call start_response() again to make the actual response. Thoughts? Keep up the good work! -- Mark Nottingham http://www.mnot.net/ From pje at telecommunity.com Sun Aug 15 00:11:10 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Aug 15 00:11:37 2004 Subject: [Web-SIG] WSGI - alternative ideas In-Reply-To: <7E38FB6A-EE2F-11D8-9BC6-000A95BD86C0@mnot.net> References: <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com> <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040814175553.02681ab0@mail.telecommunity.com> At 01:21 PM 8/14/04 -0700, Mark Nottingham wrote: >On Aug 14, 2004, at 12:05 PM, Phillip J. Eby wrote: >>Because CGI variables aren't ordered. 
If the request input were required >>to be HTTP headers, this would make it impossible for CGI, FastCGI and >>other gateways defined in terms of CGI to serve as valid WSGI implementations. > >Sorry, I don't follow. HTTP headers aren't ordered, except within a >particular header field-name. It's trivial to map from a dictionary like >{"HTTP_REFERER": "http://www.example.com/", "HTTP_HOST": >"www.example.org"} to [("referer", "http://www.example.com/"), ("host", >"www.example.org")]. No information is lost, and it's easier for non-CGI >implementations to work with. Hm. How many such frameworks are there? I'm making the implicit assumption that web servers that know how to create CGI variables are more common than frameworks that don't use CGI variables. If that's not a valid assumption, then perhaps the decision should be revisited. >One other thing -- as far as I can tell, this interface can't accommodate >Expect/100-Continue interactions, as specified in RFC2616 section 8.2.3. >I.e., to support this, the application needs to be given access to the >request headers before reading the request body, so that it can send a 100 >Continue, then read the request body, then send a normal response. Actually, this *could* work under the current spec, so long as the application manages the second response. However, there are pieces that need to be better defined, such as what happens if the application writes more data than is present in the outgoing 'Content-Length' header. >I think it would be possible to support this feature without unduly >burdening implementations by saying that start_response() can be called a >second time, IF there is an expect: 100-continue header in the request. >Server implementations which don't support this behaviour, or >automatically handle it themselves, can filter out that header. 
> >E.g., if the request environment contains the expect: 100-continue header, >the application can do one of three things; > >1) respond as normal; i.e., call start_response() and send a successful, >redirect, or error response (possibly blocking until the request body is >received) > >2) respond with a 417 Expectation Failed status code > >3) respond with a 100 Continue in the first call to start_response(), and >then call start_response() again to make the actual response. > >Thoughts? Unfortunately, I have no experience with this aspect of HTTP/1.1. I fear I shall end up having to study the RFC extensively before drawing a conclusion on this. :( It seems to me that another approach is possible, though... couldn't the web server just send a 100 Continue response if there's an "expect: 100-continue" header in the request, and you attempt to read from the input stream before you've called the 'start_response' callable? At first glance, this sounds like a reasonable way to handle it, that wouldn't require any explicit handling by the application code. Then, WSGI could also require that such an "expect" header must NOT appear in the request passed to an application. From mnot at mnot.net Sun Aug 15 01:01:21 2004 From: mnot at mnot.net (Mark Nottingham) Date: Sun Aug 15 01:01:27 2004 Subject: [Web-SIG] WSGI - alternative ideas In-Reply-To: <5.1.1.6.0.20040814175553.02681ab0@mail.telecommunity.com> References: <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com> <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com> <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com> <5.1.1.6.0.20040814175553.02681ab0@mail.telecommunity.com> Message-ID: On Aug 14, 2004, at 3:11 PM, Phillip J. Eby wrote: > It seems to me that another approach is possible, though... 
couldn't > the web server just send a 100 Continue response if there's an > "expect: 100-continue" header in the request, and you attempt to read > from the input stream before you've called the 'start_response' > callable? At first glance, this sounds like a reasonable way to > handle it, that wouldn't require any explicit handling by the > application code. Then, WSGI could also require that such an "expect" > header must NOT appear in the request passed to an application. That sounds very reasonable... -- Mark Nottingham http://www.mnot.net/ From angryhicKclown at netscape.net Sun Aug 15 02:27:28 2004 From: angryhicKclown at netscape.net (angryhicKclown@netscape.net) Date: Sun Aug 15 02:27:32 2004 Subject: [Web-SIG] Re: WSGI - alternative ideas Message-ID: <77A88906.512D2578.519F8DB3@netscape.net> Thanks for replying. Python does need sessions, pooling, and caching, but, as I understand it, they would be implemented as separate modules on top of WSGI? And how about simply creating a wsgi module that emulates the cgi module, except works across different web platforms? From pje at telecommunity.com Sun Aug 15 06:26:43 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Aug 15 06:27:12 2004 Subject: [Web-SIG] Re: WSGI - alternative ideas In-Reply-To: <77A88906.512D2578.519F8DB3@netscape.net> Message-ID: <5.1.1.6.0.20040815002103.035e7ec0@mail.telecommunity.com> At 08:27 PM 8/14/04 -0400, angryhicKclown@netscape.net wrote: >Thanks for replying.
> >Python does need sessions, pooling, and caching, but, as I understand it, >they would be implemented as separate modules on top of WSGI? More precisely, the idea is to convince authors of existing frameworks that provide those services, to enable their frameworks to be run under various web servers, and the authors of web servers, to support WSGI so those frameworks can run. To a limited extent, WSGI itself can support new "ultralight" frameworks, in the sense that WSGI is intended to allow easy creation of "middleware" components. For example, one could create a WSGI "session manager" that looks at a request and adds a session object to the 'environ' dictionary under a special key. The point is that since it's a standardized API, you can plug together whatever components you want or need. >And how about simply creating a wsgi module that emulates the cgi module, >except works across different web platforms? That's not in scope for the WSGI, whose goals specifically state that the specification must *not* require anything added to the standard library. This does not preclude separate proposals for standard library enhancements based on WSGI; it's just that they're not a part of *this* proposal. From paul.boddie at ementor.no Mon Aug 16 10:13:32 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Aug 16 10:13:37 2004 Subject: [Web-SIG] WSGI - alternative ideas Message-ID: > angryhicKclown@netscape.net wrote: > > Hi, I've just subscribed to this list, but I've read much of the archives. > Python is in dire and immediate need of WSGI. As later messages have suggested, it isn't so much WSGI that you're looking for, but a standardised API for application development. > I think WSGI needs to be essentially very similar to jonpy (jonpy.sf.net), > except without the templating. Jonpy exposes an interface very similar to > Java servlets, and can run on cgi, fastcgi, and mod_python by changing one > line of code. 
WSGI, I believe, should be a higher-level interface than > what has been currently outlined. For Python to succeed as a web language > (and I believe that it will), it needs to support the following out of the > box: > > - a clean servlet interface, see jonpy's Handler classes > - support for a multitude of different platforms easily So far, this is what WebStack [1] does. I suppose I could have either extended jonpy or adopted the API, but I have tried to implement something which is more complete from the lowest levels upward. > - sessions > - database connection pooling > - caching Things like shared resources aren't yet supported by WebStack, but I'm thinking of ways to expose framework functionality in a uniform fashion. > The syntax for something like this would be as follows: > > ------------------------- > > import wsgi > > class MyServlet(wsgi.Servlet): # perhaps a different name than Servlet? > def handle(self, req, **formargs): > pass > > wsgi.main(MyServlet()) This is a lot like WebStack except that the initialisation of resources (servlets in the above example) varies across frameworks. Therefore, you wouldn't initialise resources in the same place as they are defined - see the examples for WebStack for more details. > ------------------ > > The wsgi module should automatically detect if its running under CGI, > mod_python, fastcgi, PyWX, or even IIS ASP with Python activex script or > ISAPI. The request args are passed as key=value, unless there are multiple > values for one key, in which case the values are passed as a list. See the WebStack framework support and the WebStack.Generic module for the API. I've been very conservative with the multiple values per parameter issue, always returning a list of values whether you intended there to be just one or not, mostly because developers should be aware of such issues that, if exploited by mischievous users, could make their solutions less robust. 
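The "always a list of values per parameter" policy Paul describes is also how Python's standard `urllib.parse.parse_qs` behaves. The following is a minimal sketch of that behaviour only; `form_params` is a hypothetical helper, not WebStack's actual API:

```python
from urllib.parse import parse_qs


def form_params(query_string):
    """Return every parameter as a list of values, even single-valued ones."""
    # keep_blank_values=True keeps "a=" as [""] instead of silently dropping it.
    return parse_qs(query_string, keep_blank_values=True)


params = form_params("name=alice&tag=x&tag=y")
print(params["name"])  # ['alice']
print(params["tag"])   # ['x', 'y']
```

Because the caller has to write `params["name"][0]` explicitly, the possibility of a repeated parameter stays visible in the application code instead of being hidden by the request API.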
> The request object would support sessions via a "req.sessions" dict. WSGI > would pick the storage method it uses depending on what platform it is run > on. This is the general idea for WebStack's eventual session support. > It would also support a database pool by using a "req.pool" object. I > believe it should support pooling of any type of class. Here's an idea for > syntax: > > req.pool['database'] = (MySQLdb.connect, > {'user':'example','passwd':'secret','db':'example'}) > > And a call to req.pool['database'] would check out a connection to that > database, and would be automatically returned at the end of the request. I'm inclined to utilise a general database pooling package rather than invent an API which in eventual hindsight could seem to be inadequate. > Or am I taking this at too high a level? Perhaps it should simply clone > the cgi module for different platforms (i.e. from wsgi import cgi, from > wsgi import mod_python), or, perhaps the wsgi module will expose the same > interface as the cgi module, and autodetect the platform and act > accordingly. I think you're looking at the right level for standardisation. What WSGI is meant for, as far as I've discovered through reading this list and by the occasional question, is the deployment of existing applications on top of frameworks or servers which do not natively support the API employed by those applications; as I noted, to a Webware application running on some WSGI layer, all frameworks and servers look like Webware. The problem with (or rather the problem avoided by) WSGI is that it doesn't provide any coherency for people writing applications or higher-level frameworks - by the latter, I'm talking about things which do form handling and templating - you still have to choose your favourite framework and then hope that the tricks you've employed will work on WSGI. This means that newcomers still have to stare down that recently pruned list on the WebProgramming page [2]. 
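Session support as discussed in this thread could also live in a separate WSGI "session manager" middleware that puts a session object into `environ` under a special key, as Phillip suggested earlier. What follows is a rough sketch under stated assumptions: an in-memory store, a hypothetical cookie name `example_session_id`, and a hypothetical environ key `example.session`. It is not a design proposed on the list.

```python
import uuid
from http.cookies import SimpleCookie


class SessionMiddleware:
    """Hypothetical middleware that puts a per-client dict into environ.

    Sessions live in an in-process dict: lost on restart, not shared between
    processes. The cookie name and environ key are illustrative only.
    """

    cookie_name = "example_session_id"
    environ_key = "example.session"

    def __init__(self, app):
        self.app = app
        self.store = {}

    def __call__(self, environ, start_response):
        cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        morsel = cookie.get(self.cookie_name)
        sid = morsel.value if morsel is not None else None
        if sid not in self.store:
            # Unknown or missing id: start a fresh, empty session.
            sid = uuid.uuid4().hex
            self.store[sid] = {}
        environ[self.environ_key] = self.store[sid]

        def session_start_response(status, headers, *args):
            # Add the cookie, then defer to the real server's start_response.
            headers = list(headers) + [
                ("Set-Cookie", "%s=%s; Path=/" % (self.cookie_name, sid))]
            return start_response(status, headers, *args)

        return self.app(environ, session_start_response)


def counter_app(environ, start_response):
    # The wrapped application only sees a plain dict under the agreed key.
    session = environ["example.session"]
    session["hits"] = session.get("hits", 0) + 1
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("hits=%d" % session["hits"]).encode("ascii")]
```

The application itself knows nothing about cookies or storage. Swapping the in-memory dict for a persistent store would change only the middleware.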
Paul [1] http://www.python.org/pypi?%3Aaction=search&name=WebStack [2] http://www.python.org/cgi-bin/moinmoin/WebProgramming From pje at telecommunity.com Mon Aug 16 17:30:32 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 16 17:31:17 2004 Subject: [Web-SIG] WSGI - alternative ideas In-Reply-To: Message-ID: <5.1.1.6.0.20040816111206.036d47a0@mail.telecommunity.com> At 10:13 AM 8/16/04 +0200, Paul Boddie wrote: >The problem with (or rather the problem avoided by) WSGI is that it >doesn't >provide any coherency for people writing applications or higher-level >frameworks - by the latter, I'm talking about things which do form >handling >and templating - you still have to choose your favourite framework and >then >hope that the tricks you've employed will work on WSGI. This means that >newcomers still have to stare down that recently pruned list on the >WebProgramming page [2]. Well, at least it doesn't *add* a new choice to that list. ;) It *does*, however, create an environment that allows for "non-framework" frameworks, since middleware components can add arbitrary data and service objects to the 'environ'. (And, there's also nothing stopping components from being distributed as non-middleware functions or objects that one supplies the 'environ' to, in order to obtain data or do things.) So, even though WSGI itself doesn't provide a higher-level API, its existence and popularity should eventually allow users to choose framework services on a piece-by-piece rather than framework-at-a-time basis. But, we won't get there if WSGI doesn't get implemented in web servers, and it won't be attractive for server authors unless there's a "market" for WSGI web servers. And there won't be a significant market for them unless existing software, under existing frameworks, can run on WSGI. Anyway, once WSGI middleware components are popular, there's then a market for framework authors to allow WSGI components to be plugged in *below* their frameworks, e.g. 
as objects in a Zope "folder", as Webware "resources", etc. Once this happens, I expect some framework authors may see the value in refactoring their framework as a collection of WSGI middleware components... at which point frameworks disappear, and components reign supreme. Ultimate choice and flexibility now belongs to the user, and we all live happily ever after in the land of happy happy web programmers, or something like that. That is a *long* way off, however. The reality today is that nothing is going to change without a clear win for the framework authors whose frameworks own the bulk of the market share in Python web applications. Trying to directly create a new, competing API is quite simply an attack on their investment, and it's not going to get us anywhere. At the least, such a new API doesn't do anything positive for them. In principle, WSGI will let their apps run on more servers, and is simple enough for server and framework authors to try it out as an experiment. From ianb at colorstudy.com Thu Aug 19 06:50:54 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Aug 19 06:50:57 2004 Subject: [Web-SIG] WSGI uses Message-ID: <412431AE.9050909@colorstudy.com> I was playing around with making a WSGI server, and I'm starting to think that some really neat stuff could be done with middleware. For instance, I was thinking about setting up something for Medusa with WSGI. But though I think asynchronous code seems like a good server architecture, I'm not that interested in it for applications. But this iteration of the WSGI spec allows for async pretty well; you can tell you are in that situation when wsgi.multiprocess is false and wsgi.multithread is false, and the iterator output can produce the data fairly well. I then realized that threading itself could be a piece of middleware -- you just have to do the proper buffering with input and output. 
An intelligent application that realizes it can't run as an async process could install this middleware itself when necessary. Even multiprocess could really be implemented as a piece of middleware, either running CGI scripts, or forking worker processes. It could get out of hand if every application in a multi-application system had its own middleware; but the extension mechanism could also allow you to lazily implement these models, providing callbacks to access existing thread or worker process pools. Another useful piece of middleware would be something for error reporting; it would basically pass everything through, but wrap everything in try:except:. Then you could develop and plug a nice debugger into whatever architecture, and the basic server can do just the most minimal error logging (basically print a traceback to the error log). Anyway, I'm pretty excited about the possibilities. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Thu Aug 19 07:28:30 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Aug 19 07:28:22 2004 Subject: [Web-SIG] WSGI uses In-Reply-To: <412431AE.9050909@colorstudy.com> Message-ID: <5.1.1.6.0.20040819012310.02bad720@mail.telecommunity.com> At 11:50 PM 8/18/04 -0500, Ian Bicking wrote: >I was playing around with making a WSGI server, and I'm starting to think >that some really neat stuff could be done with middleware. Indeed. In recent months, when I was refactoring peak.web to remove its dependency on Zope X3, I came up with a variant of the current WSGI interface as a strictly internal coupling mechanism for peak.web. It turned out to be a delightfully simple way to connect internal components. A peak.web user had previously complained about the difficulty of wrapping arbitrary postprocessing around pages, so I invented a more "functional" protocol for coupling peak.web components. 
The main difference between it and WSGI now, was that it returned the status, headers, and iterator all in one tuple. Tony Lownds proposed the write = start_response(status,headers) part, and I merged it with returning an iterator to form the new interface. It's a reasonably lightweight API for both the server and application sides, but it's *remarkably* lightweight for middleware. From fumanchu at amor.org Thu Aug 19 10:56:44 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Aug 19 11:02:03 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E0B@exchange.hqamor.amorhq.net> > The ``environ`` dictionary is required to contain CGI environment > variables, as defined by the Common Gateway Interface specification > [2]_. In addition, it must contain the following WSGI-defined > variables: > Finally, the ``environ`` dictionary may also contain server-defined > variables. These variables should be named using only lower-case > letters, numbers, dots, and underscores, and should be prefixed with > a name that is unique to the defining server or gateway. For > example, ``mod_python`` might define variables with names like > ``mod_python.some_variable``. I'm all for simplicity, but also for ubiquity; I'd like to see a standard "uploads" entry in the environ dict. I'd really hate to see environ['mod_python.uploaded_files'] which is different from, say, environ['iis_asp.files_which_have_been_uploaded'] when they don't need to be specialized. Example:

    environ['uploads'] = {supplied_filename: read_func, ...}

mod_python, for example, would populate it via:

    for param in (util.FieldStorage(req, 1).list or []):
        if param.filename:
            environ['uploads'][param.filename] = param.file.read
        else:
            # handle non-file param
            ...

Perhaps there are other candidates for standardized (but not required) entries? Robert Brewer MIS Amor Ministries fumanchu@amor.org * Introduction at the end: Hello, all. I'm fairly new to Python (~ 1 year).
I just replaced the existing core business webapp at my company (which I wrote in VB4 !) with a more enterprise-level Python one. So I've rolled my own framework (templating, ORM, and multi-webserver), at least once. ;) Oh, and I also wrote a wiki-like app to run on the same framework. From and-py at doxdesk.com Thu Aug 19 15:16:08 2004 From: and-py at doxdesk.com (Andrew Clover) Date: Thu Aug 19 15:15:32 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022E0B@exchange.hqamor.amorhq.net> References: <3A81C87DC164034AA4E2DDFE11D258E3022E0B@exchange.hqamor.amorhq.net> Message-ID: <4124A818.2070907@doxdesk.com> Robert Brewer wrote: > I'm all for simplicity, but also for ubiquity; I'd like to see a > standard "uploads" entry in the environ dict. I wouldn't! WSGI should not touch the HTTP request input stream, and definitely should not attempt to parse a form submission to get fields and file uploads out of it. That's the job of the framework (or other form-reading package not necessarily part of a complete framework). There are multiple existing form-reading implementations with different ways of working. -- Andrew Clover mailto:and@doxdesk.com http://www.doxdesk.com/ From fumanchu at amor.org Thu Aug 19 16:35:21 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Aug 19 16:40:40 2004 Subject: [Web-SIG] The rewritten WSGI pre-PEP Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E0C@exchange.hqamor.amorhq.net> Andrew Clover wrote: > Robert Brewer wrote: > > > I'm all for simplicity, but also for ubiquity; I'd like to see a > > standard "uploads" entry in the environ dict. > > I wouldn't! WSGI should not touch the HTTP request input stream, and > definitely should not attempt to parse a form submission to > get fields and file uploads out of it. > > That's the job of the framework (or other form-reading package not > necessarily part of a complete framework). 
There are multiple > existing form-reading implementations with different ways of > working. Fair enough. That would really break chaining components, as well, now that I think about it. Idea retracted. Robert Brewer MIS Amor Ministries fumanchu@amor.org From tony at lownds.com Thu Aug 19 21:37:01 2004 From: tony at lownds.com (tony@lownds.com) Date: Thu Aug 19 21:52:30 2004 Subject: [Web-SIG] WSGI uses In-Reply-To: <412431AE.9050909@colorstudy.com> References: <412431AE.9050909@colorstudy.com> Message-ID: <51261.204.162.121.54.1092944221.squirrel@*> > For instance, I was thinking about setting up something for Medusa with > WSGI. But though I think asynchronous code seems like a good server > architecture, I'm not that interested in it for applications. But this > iteration of the WSGI spec allows for async pretty well; you can tell > you are in that situation when wsgi.multiprocess is false and > wsgi.multithread is false, and the iterator output can produce the data > fairly well. > How do you decide when to actually send the data back to the client? On every yield? That could perform badly if one does def application(...): ... return open(filename) ...that usage is actually suggested in the spec. In a similar vein, if servers/gateways send data back on every call to write, and applications don't take that into account, they could also suffer in performance. It seems like an object with write() and flush() makes it easier to provide guarantees about streaming -- which I think WSGI ought to do. > I then realized that threading itself could be a piece of middleware -- > you just have to do the proper buffering with input and output. An > intelligent application that realizes it can't run as an async process > could install this middleware itself when necessary. > Did you find that an async server has to provide a new buffer for every request to implement the write() function correctly? 
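Tony's concern about `return open(filename)` is that iterating a file object yields one *line* per step, so a server that sends on every yield may end up doing many small writes. A minimal sketch of returning fixed-size blocks instead is shown below. The class name and block size are assumptions. The `close()` method reflects the final PEP 333, which asks servers to call `close()` on the returned iterable if it has one; that PEP also added an optional `wsgi.file_wrapper` for this purpose.

```python
import io


class FileBlocks:
    """Iterate over a binary file in fixed-size blocks instead of by lines."""

    def __init__(self, fileobj, block_size=8192):
        self.fileobj = fileobj
        self.block_size = block_size

    def __iter__(self):
        # iter(callable, sentinel) calls read() until it returns b"" (EOF).
        return iter(lambda: self.fileobj.read(self.block_size), b"")

    def close(self):
        # PEP 333 servers call close() on the returned iterable when done.
        self.fileobj.close()


def download_app(environ, start_response):
    # Hypothetical app; a real one would open a file on disk instead.
    data = io.BytesIO(b"abcdefghij")
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return FileBlocks(data, block_size=4)


chunks = list(download_app({}, lambda status, headers: None))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```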
Although I suggested the (env, start_response) -> write() protocol, it just can't adapt to future needs. As soon as more than one function/method is needed, the API is broken -- and can't be fixed. For instance, having one method to start the response and NOT get a write() function could allow servers/gateways to avoid some work... What about passing in a class with class methods in place of the start_response method? i.e.

    class ContextLogic:
        @classmethod
        def start_writing(cls, env, status, headers):
            cls.start(env, status, headers)
            # prepare output object
            return output.write

        @classmethod
        def start(cls, env, status, headers):
            ...

        @classmethod
        def request_url(cls, env):
            ...

        @classmethod
        def get_input_stream(cls, env):
            ...

Contexts can be re-used, and middleware does not have to delegate (it just subclasses on the fly).

    class Pooler:
        class PoolingLogicMixin:
            @classmethod
            def get_pool(cls, env):
                ...

        def __init__(self, subapp, ...):
            self.subapp = subapp

        def __call__(self, context, env):
            class NewContext(self.PoolingLogicMixin, context):
                pass
            return self.subapp(NewContext, env)

One more thought: how about using the term WSGI "driver" instead of server/gateway? -Tony From fumanchu at amor.org Thu Aug 19 22:25:55 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Aug 19 22:31:15 2004 Subject: [Web-SIG] WSGI uses Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E0D@exchange.hqamor.amorhq.net> Tony Lownds wrote: > One more thought: how about using the term WSGI "driver" instead of > server/gateway? +1. And "provider" or some word besides the extremely-overused "application", which usually has already been used by any given framework. Robert Brewer MIS Amor Ministries fumanchu@amor.org From pje at telecommunity.com Thu Aug 19 22:59:23 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Thu Aug 19 22:59:15 2004 Subject: [Web-SIG] WSGI uses In-Reply-To: <51261.204.162.121.54.1092944221.squirrel@*> References: <412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com> Message-ID: <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> At 12:37 PM 8/19/04 -0700, tony@lownds.com wrote: > > For instance, I was thinking about setting up something for Medusa with > > WSGI. But though I think asynchronous code seems like a good server > > architecture, I'm not that interested in it for applications. But this > > iteration of the WSGI spec allows for async pretty well; you can tell > > you are in that situation when wsgi.multiprocess is false and > > wsgi.multithread is false, and the iterator output can produce the data > > fairly well. > > > >How do you decide when to actually send the data back to the client? On >every yield? That's up to the server to decide. >In a similar vein, if servers/gateways send data back on every call to >write, and applications don't take that into account, they could also >suffer in performance. It seems like an object with write() and flush() >makes it easier to provide guarantees about streaming -- which I think >WSGI ought to do. If you need to avoid creating data before the client is ready for it, you should use the async interface (yielding data) rather than the push interface (write() calls). An asynchronous server should avoid moving the iterator forward when the outgoing socket isn't ready for data to be sent. > > I then realized that threading itself could be a piece of middleware -- > > you just have to do the proper buffering with input and output. An > > intelligent application that realizes it can't run as an async process > > could install this middleware itself when necessary. > > > >Did you find that an async server has to provide a new buffer for every >request to implement the write() function correctly? I'm not sure I'm following either you or Ian here. 
>Although I suggested the (env, start_response) -> write() protocol, it >just can't adapt to future needs. As soon as more than one function/method >is needed, the API is broken -- and can't be fixed. Actually, there are several extension routes available, such as adding optional or keyword parameters to start_response() and write(). >For instance, having one method to start the response and NOT get a >write() function could allow server/gateways avoid some work... In order to be compliant, the server *must* support the write() facility, so there's no point to making it optional. >What about passing in a class with class methods in place of the >start_response method? i.e. > >class ContextLogic: > @classmethod > .... > >Contexts can be re-used, and middleware does not have to delegate (it just >subclasses on the fly). Interesting concept, although it means that servers would also be subclassing on-the-fly if they need per-request data on the context. Though I suppose all methods could be required to take the environment as a parameter. So far, I'm -0.5 on the idea, as I'd *really* like to keep the whole thing super-minimalistic. Everything that expands the scope increases the range of ways that people can accidentally write implementations that don't interoperate. >One more thought: how about using the term WSGI "driver" instead of >server/gateway? But servers and gateways are what they *are*. They're not "drivers" in any sense that I understand, at least. 
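Tony's context-class proposal, and Phillip's point that every method would then need to take the environment, can be made concrete with a runnable toy version. It is only an illustration of the idea as discussed. The proposal was not what the thread settled on, and all class names and `example.*` environ keys are hypothetical. Per-request state lives in `env`, because classmethods share state across every request that uses the same class.

```python
class BaseContext:
    """Hypothetical server-supplied context in the style Tony proposed.

    All methods are classmethods that take env, so one context class can be
    reused across requests; per-request state lives in env, not on the class.
    """

    @classmethod
    def start(cls, env, status, headers):
        env["example.status"] = status
        env["example.headers"] = list(headers)

    @classmethod
    def start_writing(cls, env, status, headers):
        cls.start(env, status, headers)
        body = env.setdefault("example.body", [])  # stands in for the socket
        return body.append


class PoolingLogicMixin:
    @classmethod
    def get_pool(cls, env):
        # Toy "pool" created per request; a real one would be shared.
        return env.setdefault("example.pool", {"db": "fake-connection"})


class Pooler:
    """Middleware that extends the context by subclassing it on the fly."""

    def __init__(self, subapp):
        self.subapp = subapp

    def __call__(self, context, env):
        class NewContext(PoolingLogicMixin, context):
            pass
        return self.subapp(NewContext, env)


def app(context, env):
    conn = context.get_pool(env)["db"]
    write = context.start_writing(env, "200 OK", [("Content-Type", "text/plain")])
    write(("using %s" % conn).encode("ascii"))
    return []


env = {}
Pooler(app)(BaseContext, env)
print(env["example.status"], env["example.body"])  # 200 OK [b'using fake-connection']
```

The middleware never wraps or delegates `start_writing()`. It only adds `get_pool()` through the on-the-fly subclass, which is the benefit Tony claimed. The cost Phillip noted also shows up: a new class object is created for every request.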
From tony at lownds.com Thu Aug 19 23:33:59 2004 From: tony at lownds.com (tony@lownds.com) Date: Thu Aug 19 23:49:28 2004 Subject: [Web-SIG] WSGI uses In-Reply-To: <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> References: <412431AE.9050909@colorstudy.com><412431AE.9050909@colorstudy.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> Message-ID: <52525.204.162.121.54.1092951239.squirrel@*> [Phillip] > If you need to avoid creating data before the client is ready for it, you > should use the async interface (yielding data) rather than the push > interface (write() calls). An asynchronous server should avoid moving the > iterator forward when the outgoing socket isn't ready for data to be sent. > The use case I had in mind was the application sending a partial response, then doing a lot of work, then sending the rest of the response. I guess you are saying that WSGI apps shouldn't use write() in that case. I wonder when they should use write() then. If it's a second class citizen to the iterator why not force all applications to provide their own buffering? >>Although I suggested the (env, start_response) -> write() protocol, it >>just can't adapt to future needs. As soon as more than one >> function/method >>is needed, the API is broken -- and can't be fixed. > > Actually, there are several extension routes available, such as adding > optional or keyword parameters to start_response() and write(). > True. That's less than elegant though. >>What about passing in a class with class methods in place of the >>start_response method? i.e. > > Interesting concept, although it means that servers would also be > subclassing on-the-fly if they need per-request data on the > context. Though I suppose all methods could be required to take the > environment as a parameter. > Yes, all methods would need to take env. > So far, I'm -0.5 on the idea, as I'd *really* like to keep the whole thing > super-minimalistic.
Everything that expands the scope increases the range > of ways that people can accidentally write implementations that don't > interoperate. > It just seems too minimal. It's hard to see how a server could cleanly implement a more powerful API than WSGI 1.0 and still be backwards compatible with apps/frameworks that use the WSGI 1.0 interface. > >>One more thought: how about using the term WSGI "driver" instead of >>server/gateway? > > But servers and gateways are what they *are*. They're not "drivers" in > any > sense that I understand, at least. > The way I see it, server is apache, or mod_python -- there would be a piece of code that implements the WSGI interface on top of the server. That's the driver. -Tony From pje at telecommunity.com Thu Aug 19 23:59:17 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Aug 19 23:59:09 2004 Subject: [Web-SIG] WSGI uses In-Reply-To: <52525.204.162.121.54.1092951239.squirrel@*> References: <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> At 02:33 PM 8/19/04 -0700, tony@lownds.com wrote: >[Phillip] > > If you need to avoid creating data before the client is ready for it, you > > should use the async interface (yielding data) rather than the push > > interface (write() calls). An asynchronous server should avoid moving the > > iterator forward when the outgoing socket isn't ready for data to be sent. > > > >The use case I had in mind was the application sending a partial response, >then doing a lot of work, then sending the rest of the response. I guess >you are saying that WSGI apps shouldn't use write() in that case. I wonder >when they should use write() then. If it's a second class citizen to the >iterator why not force all applications to provide their own buffering? 
I don't see what you mean about buffering. As for it being a second-class citizen, it certainly is. But application frameworks that currently provide some analogue to write() in their API today, can't live without it. So, the write() functionality has to be there or WSGI is DOA. > > So far, I'm -0.5 on the idea, as I'd *really* like to keep the whole thing > > super-minimalistic. Everything that expands the scope increases the range > > of ways that people can accidentally write implementations that don't > > interoperate. > > > >It just seems too minimal. It's hard to see how a server could cleanly >implement a more powerful API than WSGI 1.0 and still be backwards >compatible with apps/frameworks that use the WSGI 1.0 interface. What would this "more powerful API" consist of? WSGI is a paper-thin abstraction of HTTP; that's its sole purpose. > > But servers and gateways are what they *are*. They're not "drivers" in > > any > > sense that I understand, at least. > > > >The way I see it, server is apache, or mod_python -- there would be a >piece of code that implements the WSGI interface on top of the server. >That's the driver. I've written a prototype WSGI server based on the previous draft: all it does is serve WSGI apps, so there's no "driver" involved. I expect there will be other such fully-integrated servers. A CGI-based gateway also isn't *part* of the server it runs under, so it's a gateway, not a driver. Thus, it seems to me there are only servers and gateways. That some gateways may be implemented as a driver within a server seems like obscuring the *purpose* of the API (allowing an application to run in a server or a gateway thereto) in favor of an implementation detail that doesn't even always apply. 
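Phillip's argument that `write()` must stay, so that frameworks which push output while they run still work, can be shown with a toy synchronous runner that honours both the `write()` callable and the returned iterable. This is a sketch under stated assumptions: `run_app` is a hypothetical name, and the runner omits error handling, `exc_info`, and Content-Length. Its `write()` sends and flushes before returning, which matches the unbuffered-write rule Phillip proposes later in this thread.

```python
import io


def run_app(app, environ, out):
    """Toy synchronous runner: writes a raw HTTP/1.0 response to `out`.

    Illustrative only; no error handling, exc_info, or Content-Length.
    """
    state = {"status": None, "headers": [], "headers_sent": False}

    def send_headers():
        if not state["headers_sent"]:
            out.write(("HTTP/1.0 %s\r\n" % state["status"]).encode("latin-1"))
            for name, value in state["headers"]:
                out.write(("%s: %s\r\n" % (name, value)).encode("latin-1"))
            out.write(b"\r\n")
            state["headers_sent"] = True

    def write(data):
        # Push interface: the data is sent (and flushed) before write() returns.
        send_headers()
        out.write(data)
        out.flush()

    def start_response(status, headers):
        state["status"] = status
        state["headers"] = list(headers)
        return write

    result = app(environ, start_response)
    try:
        for chunk in result:   # pull interface: iterate the returned body
            if chunk:
                write(chunk)
        send_headers()         # headers still go out if the body was empty
    finally:
        if hasattr(result, "close"):
            result.close()


def legacy_style_app(environ, start_response):
    # A framework that pushes output while it runs.
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"Hello, ")
    write(b"world")
    return []


def iterator_style_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, ", b"world"]


a, b = io.BytesIO(), io.BytesIO()
run_app(legacy_style_app, {}, a)
run_app(iterator_style_app, {}, b)
print(a.getvalue() == b.getvalue())  # True
```

Both styles produce byte-identical responses, so a server author pays only for one small `write()` closure to support existing push-style frameworks.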
From tony at lownds.com Fri Aug 20 01:35:30 2004 From: tony at lownds.com (tony@lownds.com) Date: Fri Aug 20 01:51:00 2004 Subject: [Web-SIG] WSGI uses In-Reply-To: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> References: <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com><412431AE.9050909@colorstudy.com><412431AE.9050909@colorstudy.com><5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> Message-ID: <53474.204.162.121.54.1092958530.squirrel@*> >> If it's a second class citizen to the >>iterator why not force all applications to provide their own buffering? > > I don't see what you mean about buffering. s/buffering/write()/ An application framework can provide its own write() very easily

    def app_framework(env, start_response):
        start_response(...)
        buffer = []
        write = buffer.append
        ...
        return buffer

> As for it being a second-class > citizen, it certainly is. But application frameworks that currently > provide some analogue to write() in their API today, can't live without > it. So, the write() functionality has to be there or WSGI is DOA. > > Ok > What would this "more powerful API" consist of? WSGI is a paper-thin > abstraction of HTTP; that's its sole purpose. > One example: redirecting to a resource internal to the server (like Location: for CGI) I suppose you could use a specific status code or a header. > Thus, it seems to me there are only servers and gateways. Ok -Tony From pje at telecommunity.com Fri Aug 20 18:04:12 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Fri Aug 20 18:04:09 2004 Subject: [Web-SIG] Write buffering (was Re: WSGI uses) In-Reply-To: <53474.204.162.121.54.1092958530.squirrel@*> References: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> At 04:35 PM 8/19/04 -0700, tony@lownds.com wrote: > >> If it's a second class citizen to the > >>iterator why not force all applications to provide their own buffering? > > > > I don't see what you mean about buffering. > >s/buffering/write()/ > >An application framework can provide its own write() very easily > >def app_framework(env, start_response): > start_response(...) > buffer = [] > write = buffer.append > ... > return buffer Ah. But that doesn't allow *streaming* writes. The specific use case I had in mind for using your write() idea was to allow frameworks that currently allow streaming writes as a function/method invocation during the request execution, to still work under WSGI. In effect, 'write()' is a backward compatibility mechanism for existing code that expects to be able to stream data to the client during request execution, and is not currently written in the form of an iterator/producer. (It's also an acceptable mechanism for small/fast requests, and frameworks that normally buffer their I/O and would only call 'write()' once anyway.) Still, your comments have illustrated to me that there does need to be better definition of how flushing is expected to occur, although there is only one use case I can think of for it. 
Specifically, the only time an application needs to ensure that all its pending output has been sent to the client, is when it is about to perform some lengthy calculation and is using "server push" to display a "please wait" screen before returning the real result. In this case, if I/O is single-threaded (i.e. only happens when write() calls are made), and write() isn't guaranteed to be flushed (e.g. it's buffered and sent in blocks), then the application would need to have a way to say, "no, really, please send it *now*." On the other hand, if I/O is single-threaded in that fashion, then the server should be required to finish every write() before the write() call returns. The write() function should only be allowed to buffer the data if another thread is emptying the buffer continuously. I'll add this to the spec, unless anybody knows of any other use cases for either buffering or not buffering writes. From ianb at colorstudy.com Fri Aug 20 18:13:21 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Aug 20 18:14:40 2004 Subject: [Web-SIG] Write buffering (was Re: WSGI uses) In-Reply-To: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> References: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> Message-ID: <41262321.5080306@colorstudy.com> Phillip J. Eby wrote: > Still, your comments have illustrated to me that there does need to be > better definition of how flushing is expected to occur, although there > is only one use case I can think of for it. 
Specifically, the only time > an application needs to ensure that all its pending output has been sent > to the client, is when it is about to perform some lengthy calculation > and is using "server push" to display a "please wait" screen before > returning the real result. In this case, if I/O is single-threaded > (i.e. only happens when write() calls are made), and write() isn't > guaranteed to be flushed (e.g. it's buffered and sent in blocks), then > the application would need to have a way to say, "no, really, please > send it *now*." In some environments (e.g., CGI) I don't believe there's any way to ensure that the data gets sent immediately. The buffering is rather opaque in those cases. So all we can do is try, we can't really guarantee that data will be sent. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Fri Aug 20 18:37:54 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 20 18:37:46 2004 Subject: [Web-SIG] Write buffering (was Re: WSGI uses) In-Reply-To: <41262321.5080306@colorstudy.com> References: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com> At 11:13 AM 8/20/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>Still, your comments have illustrated to me that there does need to be >>better definition of how flushing is expected to occur, although there is >>only one use case I can think of for it.
Specifically, the only time an >>application needs to ensure that all its pending output has been sent to >>the client, is when it is about to perform some lengthy calculation and >>is using "server push" to display a "please wait" screen before returning >>the real result. In this case, if I/O is single-threaded (i.e. only >>happens when write() calls are made), and write() isn't guaranteed to be >>flushed (e.g. it's buffered and sent in blocks), then the application >>would need to have a way to say, "no, really, please send it *now*." > >In some environments (e.g., CGI) I don't believe there's any way to ensure >that the data gets sent immediately. The buffering is rather opaque in >those cases. So all we can do is try, we can't really guarantee that data >will be sent. True enough. I've not seen a problem with CGI myself, but I believe some CGI-based protocols (not FastCGI, but some clones of it) buffer entire requests no matter what you do. WSGI servers or gateways that can't do streaming should document that fact. Or perhaps there should be two compliance levels: WSGI Basic and WSGI Streaming. From ianb at colorstudy.com Fri Aug 20 18:43:10 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Aug 20 18:44:22 2004 Subject: [Web-SIG] Write buffering (was Re: WSGI uses) In-Reply-To: <5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com> References: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> <5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com> Message-ID: <41262A1E.5020508@colorstudy.com> Phillip J. Eby wrote: > True enough.
I've not seen a problem with CGI myself, but I believe > some CGI-based protocols (not FastCGI, but some clones of it) buffer > entire requests no matter what you do. WSGI servers or gateways that > can't do streaming should document that fact. Or perhaps there should > be two compliance levels: WSGI Basic and WSGI Streaming. It could just be something like a 'wsgi.streaming' key in the environment, no? Gateways should be encouraged to set that to false until they've confirmed that streaming really works consistently. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From tony at lownds.com Fri Aug 20 18:55:40 2004 From: tony at lownds.com (tony@lownds.com) Date: Fri Aug 20 19:11:23 2004 Subject: [Web-SIG] Write buffering (was Re: WSGI uses) In-Reply-To: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> References: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com><5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com><412431AE.9050909@colorstudy.com><412431AE.9050909@colorstudy.com><5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com><5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> Message-ID: <60553.68.122.70.234.1093020940.squirrel@*> > Still, your comments have illustrated to me that there does need to be > better definition of how flushing is expected to occur Thanks for deciphering those comments. This is what I was hoping for. >... the application would need > to > have a way to say, "no, really, please send it *now*." > Are you considering requiring start_response() return an object with .write() and .flush() methods? -Tony From pje at telecommunity.com Fri Aug 20 19:21:05 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri Aug 20 19:20:56 2004 Subject: [Web-SIG] Write buffering (was Re: WSGI uses) In-Reply-To: <60553.68.122.70.234.1093020940.squirrel@*> References: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040820131610.02bbcec0@mail.telecommunity.com> At 09:55 AM 8/20/04 -0700, tony@lownds.com wrote: > > Still, your comments have illustrated to me that there does need to be > > better definition of how flushing is expected to occur > >Thanks for deciphering those comments. This is what I was hoping for. > > >... the application would need > > to > > have a way to say, "no, really, please send it *now*." > > > >Are you considering requiring start_response() return an object with >.write() and .flush() methods? No, I'm suggesting that write() should be guaranteed to either: 1) Flush all output before returning, or 2) Put data in a buffer that will be emptied by another thread or by the operating system To be a conforming implementation, a server/gateway must do one or the other. The same rules should apply for data yielded by a returned iterator, i.e. the data must be sent or buffered for continuous sending before the iterator's next() method is called again. From pje at telecommunity.com Sat Aug 21 00:42:53 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sat Aug 21 00:42:48 2004 Subject: [Web-SIG] Write buffering (was Re: WSGI uses) In-Reply-To: <41262A1E.5020508@colorstudy.com> References: <5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com> <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com> <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com> <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com> <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com> <5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040820183539.02964050@mail.telecommunity.com> At 11:43 AM 8/20/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>True enough. I've not seen a problem with CGI myself, but I believe some >>CGI-based protocols (not FastCGI, but some clones of it) buffer entire >>requests no matter what you do. WSGI servers or gateways that can't do >>streaming should document that fact. Or perhaps there should be two >>compliance levels: WSGI Basic and WSGI Streaming. > >It could just be something like a 'wsgi.streaming' key in the environment, >no? Gateways should be encouraged to set that to false until they've >confirmed that streaming really works consistently. I think I've convinced myself that servers or gateways must *always* attempt to stream data passed to write() or yielded by the iterator. The only time this can cause any problems is if the application sends lots of small strings, and the I/O is single-threaded and unbuffered. As a practical matter, TCP/IP stacks usually have at least a K or two of outbound buffering for a connection, don't they? So until that fills up, the application will continue to execute normally. It's not as good as something better, but it'll do. 
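A minimal sketch of the two conforming write() strategies discussed in this thread: finishing each chunk before write() returns (single-threaded I/O), or queueing chunks for a separate thread that empties the buffer continuously. It assumes modern Python 3, byte-string bodies, and a file-like output stream with write() and flush(); the names make_flushing_write and BackgroundWriter are illustrative only and are not part of any WSGI draft.

```python
import io
import queue
import threading


def make_flushing_write(stream):
    """Single-threaded I/O: finish sending each chunk before returning."""
    def write(data):
        stream.write(data)
        stream.flush()  # push the chunk out of any user-space buffer now
    return write


class BackgroundWriter:
    """Threaded I/O: write() only queues the chunk; a dedicated thread
    empties the queue continuously while the application keeps running."""

    def __init__(self, stream):
        self.stream = stream
        self.chunks = queue.Queue()
        self.thread = threading.Thread(target=self._drain, daemon=True)
        self.thread.start()

    def _drain(self):
        while True:
            data = self.chunks.get()
            if data is None:  # sentinel: the response is complete
                break
            self.stream.write(data)
            self.stream.flush()

    def write(self, data):
        self.chunks.put(data)

    def finish(self):
        """Mark the end of the response and wait until everything is sent."""
        self.chunks.put(None)
        self.thread.join()


if __name__ == "__main__":
    out = io.BytesIO()
    write = make_flushing_write(out)
    write(b"Please wait...\n")
    print(out.getvalue())

    out = io.BytesIO()
    writer = BackgroundWriter(out)
    writer.write(b"part one, ")
    writer.write(b"part two\n")
    writer.finish()
    print(out.getvalue())
```

Either way, flush() on a socket file only hands the data to the operating system, which (as noted above) may still hold it in the TCP send buffer for a while.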
So, I'm updating the spec and recommending that applications do buffering of their own for "moderately sized" responses that are neither too large for buffering nor too small to worry about it. I know that Zope, for example, normally generates its body output as one big block anyway, and I think the common pattern for e.g. Python page templating systems is to produce their output as a single string, rather than in pieces. So, it would seem in common cases that there will only be one write() call anyway. (Especially since buffered dynamic output gives an application an error-handling advantage: it can send an error page rather than dumping error garbage into the middle of a partially-completed response.) From pje at telecommunity.com Sat Aug 21 01:20:49 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Aug 21 01:42:15 2004 Subject: [Web-SIG] Latest WSGI Draft Message-ID: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> Once again, please pardon me if I missed an update, and gently remind me with a clue by four if need be. :) Or better yet, by supplying a patch implementing your suggested changes. :) I was going to post a diff, but even a unified diff is about as long as the previous version was, and the new draft is almost 50% longer than the old one, as lots of new material has been added about streaming, URL determination, required CGI variables, etc. etc. There's even some extra material in the Rationale and Goals about using WSGI middleware to better modularize frameworks, allowing more mix-and-match between them. I think this is just about ready to submit as an official PEP, get a numbering, and post to c.l.py and Python-Dev, but of course I could be wrong. Your feedback is appreciated. PEP: XXX Title: Python Web Server Gateway Interface v1.0 Version: $Revision: 1.1 $ Last-Modified: $Date: 2004/08/20 19:11:27 $ Author: Phillip J. 
Eby Discussions-To: Python Web-SIG Status: Draft Type: Informational Content-Type: text/x-rst Created: 07-Dec-2003 Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004 Abstract ======== This document specifies a proposed standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers. Rationale and Goals =================== Python currently boasts a wide variety of web application frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to name just a few [1]_. This wide variety of choices can be a problem for new Python users, because generally speaking, their choice of web framework will limit their choice of usable web servers, and vice versa. By contrast, although Java has just as many web application frameworks available, Java's "servlet" API makes it possible for applications written with any Java web application framework to run in any web server that supports the servlet API. The availability and widespread use of such an API in web servers for Python -- whether those servers are written in Python (e.g. Medusa), embed Python (e.g. mod_python), or invoke Python via a gateway protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of framework from choice of web server, freeing users to choose a pairing that suits them, while freeing framework and server developers to focus on their area of specialty. This PEP, therefore, proposes a simple and universal interface between web servers and web applications or frameworks: the Python Web Server Gateway Interface (WSGI). But the mere existence of a WSGI spec does nothing to address the existing state of servers and frameworks for Python web applications. Server and framework authors and maintainers must actually implement WSGI for there to be any effect. However, since no existing servers or frameworks support WSGI, there is little immediate reward for an author who implements WSGI support. 
Thus, WSGI *must* be easy to implement, so that an author's initial investment in the interface can be reasonably low. Thus, simplicity of implementation on *both* the server and framework sides of the interface is absolutely critical to the utility of the WSGI interface, and is therefore the principal criterion for any design decisions. Note, however, that simplicity of implementation for a framework author is not the same thing as ease of use for a web application author. WSGI presents an absolutely "no frills" interface to the framework author, because bells and whistles like response objects and cookie handling would just get in the way of existing frameworks' handling of these issues. Again, the goal of WSGI is to facilitate easy interconnection of existing servers and applications or frameworks, not to create a new web framework. Note also that this goal precludes WSGI from requiring anything that is not already available in deployed versions of Python. Therefore, new standard library modules are not proposed or required by this specification, and nothing in WSGI requires a Python version greater than 1.5.2. (It would be a good idea, however, for future versions of Python to include support for this interface in web servers provided by the standard library.) In addition to ease of implementation for existing and future frameworks and servers, it should also be easy to create request preprocessors, response postprocessors, and other WSGI-based "middleware" components that look like an application to their containing server, while acting as a server for their contained applications. If middleware can be both simple and robust, and WSGI is widely available in servers and frameworks, it allows for the possibility of an entirely new kind of Python web application framework: one consisting of loosely-coupled WSGI middleware components. 
Indeed, existing framework authors may even choose to refactor their frameworks' existing services to be provided in this way, becoming more like libraries used with WSGI, and less like monolithic frameworks. This would then allow application developers to choose "best-of-breed" components for specific functionality, rather than having to commit to all the pros and cons of a single framework. Of course, as of this writing, that day is doubtless quite far off. In the meantime, it is a sufficient short-term goal for WSGI to enable the use of any framework with any server. Finally, it should be mentioned that the current version of WSGI does not prescribe any particular mechanism for "deploying" an application for use with a web server or server gateway. At the present time, this is necessarily implementation-defined by the server or gateway. After a sufficient number of servers and frameworks have implemented WSGI to provide field experience with varying deployment requirements, it may make sense to create another PEP, describing a deployment standard for WSGI servers and application frameworks. Specification Overview ====================== The WSGI interface has two sides: the "server" or "gateway" side, and the "application" side. The server side invokes a callable object that is provided by the application side. The specifics of how that object is provided are up to the server or gateway. It is assumed that some servers or gateways will require an application's deployer to write a short script to create an instance of the server or gateway, and supply it with the application object. Other servers and gateways may use configuration files or other mechanisms to specify where the application object should be imported from. The application object is simply a callable object that accepts two arguments. 
The term "object" should not be misconstrued as requiring an actual object instance: a function, method, class, or instance with a ``__call__`` method are all acceptable for use as an application object. Here are two example application objects; one is a function, and the other is a class::

    def simple_app(environ, start_response):
        """Simplest possible application object"""
        status = '200 OK'
        headers = [('Content-type','text/plain')]
        write = start_response(status, headers)
        write('Hello world!\n')

    class AppClass:
        """Much the same thing, but as a class"""

        def __init__(self, environ, start_response):
            self.environ = environ
            self.start = start_response

        def __iter__(self):
            status = '200 OK'
            headers = [('Content-type','text/plain')]
            self.start(status, headers)
            yield "Hello world!\n"
            for i in range(1,11):
                yield "Extra line %s\n" % i

The server or gateway invokes the application once for each request it receives from a web browser. To illustrate, here is a simple CGI gateway, implemented as a function taking an application object (all error handling omitted)::

    import os, sys

    def run_with_cgi(application):

        environ = {}
        environ.update(os.environ)
        environ['wsgi.input']        = sys.stdin
        environ['wsgi.errors']       = sys.stderr
        environ['wsgi.version']      = (1,0)
        environ['wsgi.multithread']  = False
        environ['wsgi.multiprocess'] = True
        environ['wsgi.last_call']    = True

        def start_response(status,headers):
            print "Status:", status
            for key,val in headers:
                print "%s: %s" % (key,val)
            print    # a blank line ends the CGI response header
            return sys.stdout.write

        result = application(environ, start_response)
        if result:
            try:
                for data in result:
                    sys.stdout.write(data)
            finally:
                if hasattr(result,'close'):
                    result.close()

In the next section, we will specify the precise semantics that these illustrations are examples of.

Specification Details
=====================

The application object must accept two positional arguments. For the sake of illustration, we have named them ``environ``, and ``start_response``, but they are not required to have these names.
A server or gateway *must* invoke the application object using positional (not keyword) arguments. The first parameter is a dictionary object, containing CGI-style environment variables. This object *must* be a builtin Python dictionary (*not* a subclass, ``UserDict`` or other dictionary emulation), and the application is allowed to modify the dictionary in any way it desires. The dictionary must also include certain WSGI-required variables (described in a later section), and may also include server-specific extension variables, named according to a convention that will be described below. The second parameter is a callable accepting two positional arguments: a status string of the form ``"999 Message here"``, and a list of ``(header_name,header_value)`` tuples describing the HTTP response header. This callable must return another callable that takes one parameter: a string to write as part of the HTTP response body. The application object may return either ``None`` (indicating that there is no additional output), or it may return a non-empty iterable yielding strings. (For example, it could be a generator-iterator that yields strings, or it could be a sequence such as a list of strings.) The server or gateway will treat the strings yielded by the iterable as if they had been passed to the ``write()`` method. Also, if the application returns an iterable, and the iterable has a ``close()`` method, the server or gateway *must* call that method upon completion of the current request, whether the request was completed normally, or terminated early due to an error. (This is to support resource release by the application. This protocol is intended to support PEP 325, and also the simple case of an application returning an open text file.) ``environ`` Variables --------------------- The ``environ`` dictionary is required to contain these CGI environment variables, as defined by the Common Gateway Interface specification [2]_. 
The following variables *must* be present, but *may* be an empty string, if there is no more appropriate value for them:

* ``REQUEST_METHOD``
* ``SCRIPT_NAME`` (The initial portion of the request URL's "path" that corresponds to the application object, so that the application knows its virtual "location".)
* ``PATH_INFO`` (The remainder of the request URL's "path", designating the virtual "location" of the request's target within the application.)
* ``QUERY_STRING``
* ``CONTENT_TYPE``
* ``CONTENT_LENGTH``
* ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the URL.)
* Variables corresponding to the client-supplied HTTP headers (i.e., variables whose names begin with ``"HTTP_"``).

In general, a server or gateway should attempt to provide as many other CGI variables as are applicable, including e.g. the nonstandard SSL variables such as ``HTTPS=on``, if an SSL connection is in effect. However, an application that uses any variables other than the ones listed above is necessarily non-portable to web servers that do not support the relevant extensions.

A WSGI-compliant server or gateway *should* document what variables it provides, along with their definitions as appropriate. Applications *should* check for the presence of any nonstandard variables they require, and have a fallback plan in the event such a variable is absent.

Note: missing variables (such as ``REMOTE_USER`` when no authentication has occurred) should be left out of the ``environ`` dictionary. Also note that CGI-defined variables must be strings, if they are present at all. It is a violation of this specification for a CGI variable's value to be of any type other than ``str``.
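As a rough illustration of the requirements above, here is a sketch of a checker a server author might run against the ``environ`` it builds. It is written for modern Python 3, where ``str`` is the text type; ``check_cgi_variables`` and ``REQUIRED_CGI_KEYS`` are hypothetical names, and treating every all-uppercase key as a CGI variable is an assumption made only for this sketch.

```python
# Variables the draft says must be present (possibly as empty strings).
REQUIRED_CGI_KEYS = (
    'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING',
    'CONTENT_TYPE', 'CONTENT_LENGTH', 'SERVER_NAME', 'SERVER_PORT',
)


def check_cgi_variables(environ):
    """Return a list of problems with environ's CGI variables; [] if none."""
    if type(environ) is not dict:  # subclasses and UserDict are not allowed
        return ['environ must be a builtin dict, not %s' % type(environ).__name__]
    problems = []
    for key in REQUIRED_CGI_KEYS:
        if key not in environ:
            problems.append('missing required variable %s' % key)
    for key in sorted(environ):
        value = environ[key]
        # Assumption: all-uppercase keys (including HTTP_*) are CGI variables.
        if key.isupper() and type(value) is not str:
            problems.append('%s must be a str, got %s' % (key, type(value).__name__))
    return problems
```

Missing optional variables such as ``REMOTE_USER`` are not reported, because the draft asks servers to omit them rather than supply placeholders.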
In addition to the CGI-defined variables, the ``environ`` dictionary must also contain the following WSGI-defined variables:

=====================  ==============================================
Variable               Value
=====================  ==============================================
``wsgi.version``       The tuple ``(1,0)``, representing WSGI
                       version 1.0.

``wsgi.input``         An input stream from which the HTTP request
                       body can be read.

``wsgi.errors``        An output stream to which error output can be
                       written. For most servers, this will be the
                       server's error log.

``wsgi.multithread``   This value should be true if the application
                       object may be simultaneously invoked by another
                       thread in the same process, and false
                       otherwise.

``wsgi.multiprocess``  This value should be true if an equivalent
                       application object may be simultaneously
                       invoked by another process, and false
                       otherwise.

``wsgi.last_call``     This value should be true if this is expected
                       to be the last invocation of the application in
                       this process. This is provided to allow
                       applications to optimize their setup for
                       long-running vs. short-running scenarios. This
                       flag should normally only be true for CGI
                       applications, or while a server is doing some
                       kind of "graceful shutdown". Note that a server
                       or gateway is still allowed to invoke the
                       application again; this flag is only a
                       "suggestion" to the application that it is
                       unlikely to be reinvoked.
=====================  ==============================================

Finally, the ``environ`` dictionary may also contain server-defined variables. These variables should be named using only lower-case letters, numbers, dots, and underscores, and should be prefixed with a name that is unique to the defining server or gateway. For example, ``mod_python`` might define variables with names like ``mod_python.some_variable``. This naming convention allows "middleware" components to safely filter out extensions that they do not understand. (E.g.
by deleting all keys from ``environ`` that are all-lowercase and do not begin with ``"wsgi."``.)

Input and Error Streams
~~~~~~~~~~~~~~~~~~~~~~~

The input and error streams provided by the server must support the following methods:

===================  ==========  ========
Method               Files       Notes
===================  ==========  ========
``read(size)``       ``input``
``readline()``       ``input``   1
``readlines(hint)``  ``input``   2
``__iter__()``       ``input``
``flush()``          ``errors``  3
``write(str)``       ``errors``
``writelines(seq)``  ``errors``
===================  ==========  ========

The semantics of each method are as documented in the Python Library Reference, except for these notes as listed in the table above:

1. The optional "size" argument to ``readline()`` is not supported, as it may be complex for server authors to implement, and is not often used in practice.

2. Note that the ``hint`` argument to ``readlines()`` is optional for both caller and implementer. The application is free not to supply it, and the server or gateway is free to ignore it.

3. Since the ``errors`` stream may not be rewound, a container is free to forward write operations immediately, without buffering. In this case, the ``flush()`` method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that ``flush()`` is a no-op. They must call ``flush()`` if they need to ensure that output has in fact been written (for example, to minimize intermingling of data from multiple processes writing to the same error log).

The methods listed in the table above *must* be supported by all servers conforming to this specification. Applications conforming to this specification *must not* use any other methods or attributes of the ``input`` or ``errors`` objects. In particular, applications *must not* attempt to close these streams, even if they possess ``close()`` methods.
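The extension-naming convention described above lets middleware hide server extensions it does not understand. A minimal sketch, assuming modern Python 3: ``strip_unknown_extensions`` and ``filtering_middleware`` are hypothetical helpers, and ``known_prefixes`` is an illustrative parameter for extensions the middleware chooses to keep.

```python
def strip_unknown_extensions(environ, known_prefixes=()):
    """Return a copy of environ without server extension variables.

    An extension is taken to be any all-lowercase key that does not start
    with "wsgi." (e.g. "mod_python.some_variable"); keys starting with one
    of known_prefixes are kept because the middleware understands them.
    """
    keep = tuple(known_prefixes)
    filtered = {}
    for key, value in environ.items():
        is_extension = key.islower() and not key.startswith('wsgi.')
        if is_extension and not key.startswith(keep):
            continue  # drop it so the application cannot bypass the middleware
        filtered[key] = value
    return filtered


def filtering_middleware(application, known_prefixes=()):
    """Wrap a WSGI application so it only sees the filtered environ."""
    def wrapped(environ, start_response):
        return application(strip_unknown_extensions(environ, known_prefixes),
                           start_response)
    return wrapped
```

Building a filtered copy, rather than deleting keys in place, leaves the server's own dictionary untouched; the application still receives a plain builtin ``dict``, as the draft requires.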
The ``start_response()`` Callable --------------------------------- The second parameter passed to the application object is itself a two-argument callable, used to begin the HTTP response and return a ``write()`` callable. The first parameter the ``start_response()`` callable takes is a "status" string, of the form ``"999 Message here"``, where ``999`` is replaced with the HTTP status code, and ``Message here`` is replaced with the appropriate message text. The string *must* be pure 7-bit ASCII, containing no control characters. In particular, it must not be terminated with a carriage return or linefeed. The second parameter accepted by the ``start_response()`` callable must be a sequence of ``(header_name,header_value)`` tuples. Each ``header_name`` must be a valid HTTP header name, without a trailing colon or other punctuation. Each ``header_value`` *must not* include carriage returns or linefeeds: it should be a raw *unfolded* header value. If the HTTP spec calls for folding of a particular header, the server shall be responsible for performing the folding. (These requirements are to minimize the complexity of parsing required by servers, gateways, and intermediate response processors that need to inspect or modify response headers.) In general, the server or gateway is responsible for ensuring that correct headers are sent to the client: if the application omits a needed header, the server or gateway *should* add it. For example, the HTTP ``Date:`` and ``Server:`` headers would normally be supplied by the server or gateway. If the application supplies a header that the server would ordinarily supply, or that contradicts the server's intended behavior (e.g. supplying a different ``Connection:`` header), the server or gateway *may* discard the conflicting header, provided that its action is recorded for the benefit of the application author. 
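To make the ``start_response()`` rules above concrete, here is a sketch of an in-memory test harness that validates the status and headers, returns a ``write()`` callable, drains any returned iterable, and calls ``close()``. It assumes modern Python 3 with byte-string bodies (the draft itself uses plain ``str``); ``run_in_memory`` is a hypothetical name, and restricting header names to RFC 2616 token characters is a conservative assumption, not a rule stated in this draft.

```python
import re

# "999 Message here": three digits, a space, then printable 7-bit ASCII only.
STATUS_RE = re.compile(r"[0-9]{3} [\x20-\x7e]+")
# Header names as RFC 2616 tokens (assumption: no colon, space, or controls).
HEADER_NAME_RE = re.compile(r"[A-Za-z0-9!#$%&'*+.^_`|~-]+")


def run_in_memory(application, environ):
    """Invoke a WSGI application once; return (status, headers, body)."""
    response = {}
    chunks = []

    def start_response(status, headers):
        if not STATUS_RE.fullmatch(status):
            raise ValueError('bad status string: %r' % (status,))
        for name, value in headers:
            if not HEADER_NAME_RE.fullmatch(name):
                raise ValueError('bad header name: %r' % (name,))
            if '\r' in value or '\n' in value:
                raise ValueError('header %s must be unfolded' % name)
        response['status'] = status
        response['headers'] = list(headers)
        return chunks.append  # this is the write() callable

    result = application(environ, start_response)
    if result is not None:
        try:
            for data in result:
                chunks.append(data)  # yielded data is treated like write()
        finally:
            if hasattr(result, 'close'):
                result.close()  # required even if iteration fails
    return response.get('status'), response.get('headers'), b''.join(chunks)
```

Because the iterable is consumed inside ``try``/``finally``, ``close()`` runs even when iteration raises, which is what the draft asks of servers.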
The ``write()`` Callable
------------------------

The return value of the ``start_response()`` callable is a one-argument ``write()`` callable that accepts strings to write as part of the HTTP response body.

Note that the purpose of the ``write()`` callable is primarily to support existing application frameworks that support a streaming "push" API. Therefore, strings passed to ``write()`` *must* be sent to the client *as soon as possible*; they must *not* be buffered unless the buffer will be emptied in parallel with the application's continuing execution (e.g. by a separate I/O thread). If the server or gateway does not have a separate I/O thread available, it *must* finish writing the supplied string before it returns from each ``write()`` invocation.

If the application returns an iterable, each string produced by the iterable must be treated as though it had been passed to ``write()``, with the data sent in an "as soon as possible" manner. That is, the iterable should not be asked for a new string until the previous string has been sent to the client, or is buffered for such sending by a parallel thread.

Notice that these rules discourage the generation of content before a client is ready for it, in excess of the buffer sizes provided by the server and operating system. For this reason, some applications may wish to buffer data internally before passing any of it to ``write()`` or yielding it from an iterator, in order to avoid waiting for the client to catch up with their output. This approach may yield better throughput for dynamically generated pages of moderate size, since the application is then freed for other tasks.

In addition to improved performance, buffering all of an application's output has an advantage for error handling: the buffered output can be thrown away and replaced by an error page, rather than dumping an error message in the middle of some partially-completed output.
For this and other reasons, many existing Python frameworks already accumulate their output for a single write, unless the application explicitly requests streaming, or the expected output is larger than practical for buffering (e.g. multi-megabyte PDFs). So, these application frameworks are already a natural fit for the WSGI streaming model: for most requests they will only call ``write()`` once anyway!

Implementation/Application Notes
================================

Unicode
-------

HTTP does not directly support Unicode, and neither does this interface. All encoding/decoding must be handled by the application; all strings and streams passed to or from the server must be standard Python byte strings, not Unicode objects. The result of using a Unicode object where a string object is required is undefined.

Multiple Invocations
--------------------

Application objects must be able to be invoked more than once, since virtually all servers/gateways will make such requests.

Error Handling
--------------

Servers *should* trap and log exceptions raised by applications, and *may* continue to execute, or attempt to shut down gracefully. Applications *should* avoid allowing exceptions to escape their execution scope, since the result of uncaught exceptions is server-defined.

Thread Support
--------------

Thread support, or lack thereof, is also server-dependent. Servers that can run multiple requests in parallel *should* also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server.
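One way a multithreaded server might offer the single-threaded option described above is to serialize calls with a lock. This is a sketch under assumptions: modern Python 3, a hypothetical ``serialize_calls`` helper, and a server that honors the rule that ``close()`` is called on any returned iterable. The lock is held until then, because iterating the result can still run application code.

```python
import threading


class _SerializedResult:
    """Iterable wrapper that releases the lock exactly once, on close()."""

    def __init__(self, result, lock):
        self._result = result
        self._lock = lock
        self._closed = False

    def __iter__(self):
        for data in self._result:
            yield data

    def close(self):
        if self._closed:
            return
        self._closed = True
        try:
            if hasattr(self._result, 'close'):
                self._result.close()
        finally:
            self._lock.release()


def serialize_calls(application):
    """Let at most one request at a time run inside a non-thread-safe app."""
    lock = threading.Lock()

    def wrapped(environ, start_response):
        lock.acquire()
        try:
            # From the application's point of view, calls no longer overlap.
            environ['wsgi.multithread'] = False
            result = application(environ, start_response)
        except BaseException:
            lock.release()
            raise
        if result is None:  # all output went through write(); we are done
            lock.release()
            return None
        return _SerializedResult(result, lock)

    return wrapped
```

A plain ``threading.Lock`` is used rather than an ``RLock`` so that ``close()`` may release it even if the server finishes the response on a different thread from the one that invoked the application.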
URL Reconstruction
------------------

If an application wishes to reconstruct a request's complete URL, it may do so using the following algorithm, contributed by Ian Bicking::

    if environ.get('HTTPS') == 'on':
        url = 'https://'
    else:
        url = 'http://'

    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']   # already includes any non-default port
    else:
        url += environ['SERVER_NAME']

        if environ.get('HTTPS') == 'on':
            if environ['SERVER_PORT'] != '443':
                url += ':' + environ['SERVER_PORT']
        else:
            if environ['SERVER_PORT'] != '80':
                url += ':' + environ['SERVER_PORT']

    url += environ['SCRIPT_NAME']
    url += environ['PATH_INFO']
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']

Note that such a reconstructed URL may not be precisely the same URI as requested by the client. Server rewrite rules, for example, may have modified the client's originally requested URL to place it in a canonical form.

Application Configuration
-------------------------

This specification does not define how a server selects or obtains an application to invoke. These and other configuration options are highly server-specific matters. It is expected that server/gateway authors will document how to configure the server to execute a particular application object, and with what options (such as threading options).

Framework authors, on the other hand, should document how to create an application object that wraps their framework's functionality. The user, who has chosen both the server and the application framework, must connect the two together. However, since both the framework and the server now have a common interface, this should be merely a mechanical matter, rather than a significant engineering effort for each new server/framework pair.

Middleware
----------

Note that a single object may play the role of a server with respect to some application(s), while also acting as an application with respect to some server(s).
Such "middleware" components can perform such functions as:

* Routing a request to different application objects based on the target URL, after rewriting the ``environ`` accordingly.

* Allowing multiple applications or frameworks to run side-by-side in the same process.

* Load balancing and remote processing, by forwarding requests and responses over a network.

* Performing content postprocessing, such as applying XSL stylesheets.

Given the existence of applications and servers conforming to this specification, the appearance of such reusable middleware becomes a possibility.

Middleware components that transform the request or response data should in general remove WSGI extension data from the ``environ`` that the middleware does not understand, to prevent applications from inadvertently bypassing the middleware's mediation of the interaction by use of a server extension. The simplest way to do this is to just delete keys from ``environ`` that are all lowercase and do not begin with ``"wsgi."``, before passing the ``environ`` on to the application.


HTTP 1.1 Expect/Continue
------------------------

Servers and gateways *must* provide transparent support for HTTP 1.1's "expect/continue" mechanism, if they implement HTTP 1.1. This may be done in any of several ways:

1. Reject all client requests containing an ``Expect: 100-continue`` header with a "417 Expectation failed" error. Such requests will not be forwarded to an application object.

2. Respond to requests containing an ``Expect: 100-continue`` header with an immediate "100 Continue" response, and proceed normally.

3. Proceed with the request normally, but provide the application with a ``wsgi.input`` stream that will send the "100 Continue" response if/when the application first attempts to read from the input stream. The read request must then remain blocked until the client responds.
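Option 3 above can be sketched as a thin wrapper around the server's real input stream. This is a Python 3 illustration under stated assumptions, not any server's actual code: `ContinueInput`, `raw_input` (the client's request-body stream) and `raw_output` (the stream back to the client) are hypothetical names, and the interim response is written lazily, exactly once, just before the first read. Only the read-side methods the draft requires of ``wsgi.input`` are provided, and the ``readlines()`` hint is ignored, which the draft permits.

```python
import io


class ContinueInput:
    """wsgi.input wrapper that sends "100 Continue" on first read (sketch)."""

    def __init__(self, raw_input, raw_output):
        self._in = raw_input     # hypothetical: client request-body stream
        self._out = raw_output   # hypothetical: stream back to the client
        self._sent = False

    def _maybe_continue(self):
        if not self._sent:
            # An interim response is a bare status line plus a blank line.
            self._out.write(b"HTTP/1.1 100 Continue\r\n\r\n")
            self._out.flush()
            self._sent = True

    def read(self, size=-1):
        self._maybe_continue()
        # On a real socket this read blocks until the client sends the body.
        return self._in.read(size)

    def readline(self):
        self._maybe_continue()
        return self._in.readline()

    def readlines(self, hint=None):
        self._maybe_continue()
        return self._in.readlines()  # hint ignored, as the draft allows

    def __iter__(self):
        self._maybe_continue()
        return iter(self._in)


if __name__ == "__main__":
    sent = io.BytesIO()
    body = ContinueInput(io.BytesIO(b"a=1&b=2"), sent)
    print(sent.getvalue())  # b''
    print(body.read())      # b'a=1&b=2'
    print(sent.getvalue())  # b'HTTP/1.1 100 Continue\r\n\r\n'
```

An application that never touches ``wsgi.input`` therefore never triggers the interim response, which is the point of this option.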
Note that this behavior restriction does not apply to HTTP 1.0 requests, or to requests that are not directed to an application object. For more information on HTTP 1.1 Expect/Continue, see RFC 2616, sections 8.2.3 and 10.1.1.


Questions and Answers
=====================

1. Why must ``environ`` be a dictionary? What's wrong with using a subclass?

   The rationale for requiring a dictionary is to maximize portability between servers. The alternative would be to define some subset of a dictionary's methods as being the standard and portable interface. In practice, however, most servers will probably find a dictionary adequate to their needs, and thus framework authors will come to expect the full set of dictionary features to be available, since they will be there more often than not. But, if some server chooses *not* to use a dictionary, then there will be interoperability problems despite that server's "conformance" to the spec. Therefore, making a dictionary mandatory simplifies the specification and guarantees interoperability.

   Note that this does not prevent server or framework developers from offering specialized services as custom variables *inside* the ``environ`` dictionary. This is the recommended approach for offering any such value-added services.

2. Why can you call ``write()`` *and* yield strings/return an iterator? Shouldn't we pick just one way?

   If we supported only the iteration approach, then current frameworks that assume the availability of "push" would suffer. But, if we only support pushing via ``write()``, then server performance suffers for transmission of e.g. large files (if a worker thread can't begin work on a new request until all of the output has been sent). Thus, this compromise allows an application framework to support both approaches, as appropriate, but with only a little more burden to the server implementor than a push-only approach would require.

3. What's the ``close()`` for?

   When writes are done during the execution of an application object, the application can ensure that resources are released using a try/finally block. But, if the application returns an iterator, any resources used will not be released until the iterator is garbage collected. The ``close()`` idiom allows an application to release critical resources at the end of a request, and it's forward-compatible with the support for try/finally in generators that's proposed by PEP 325.

4. Why is this interface so low-level? I want feature X! (e.g. cookies, sessions, persistence, ...)

   This isn't Yet Another Python Web Framework. It's just a way for frameworks to talk to web servers, and vice versa. If you want these features, you need to pick a web framework that provides the features you want. And if that framework lets you create a WSGI application, you should be able to run it in most WSGI-supporting servers. Also, some WSGI servers may offer additional services via objects provided in their ``environ`` dictionary; see the applicable server documentation for details. (Of course, applications that use such extensions will not be portable to other WSGI-based servers.)

5. Why use CGI variables instead of good old HTTP headers? And why mix them in with WSGI-defined variables?

   Many existing web frameworks are built heavily upon the CGI spec, and existing web servers know how to generate CGI variables. In contrast, alternative ways of representing inbound HTTP information are fragmented and lack market share. Thus, using the CGI "standard" seems like a good way to leverage existing implementations. As for mixing them with WSGI variables, separating them would just require two dictionary arguments to be passed around, while providing no real benefits.

6. What about the status string? Can't we just use the number, passing in ``200`` instead of ``"200 OK"``?

   Doing this would complicate the server or gateway, by requiring them to have a table of numeric statuses and corresponding messages. By contrast, it is easy for an application or framework author to type the extra text to go with the specific response code they are using, and existing frameworks often already have a table containing the needed messages. So, on balance it seems better to make the application/framework responsible, rather than the server or gateway.


Acknowledgements
================

Thanks go to the many folks on the Web-SIG mailing list whose thoughtful feedback made this revised draft possible. Especially:

* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up on the first draft as not offering any advantages over "plain old CGI", thus encouraging me to look for a better approach.

* Ian Bicking, who helped nag me into properly specifying the multithreading and multiprocess options, as well as badgering me to provide a mechanism for servers to supply custom extension data to an application.

* Tony Lownds, who came up with the concept of a ``start_response`` function that took the status and headers, returning a ``write`` function.


References
==========

.. [1] The Python Wiki "Web Programming" topic
   (http://www.python.org/cgi-bin/moinmoin/WebProgramming)

.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
   (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:

From pje at telecommunity.com Sat Aug 21 20:28:06 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 21 20:27:52 2004
Subject: [Web-SIG] Latest WSGI Draft
Message-ID: <5.1.1.6.0.20040821142742.027283d0@mail.telecommunity.com>

At 10:51 AM 8/21/04 -0700, tony@lownds.com wrote:
> > I think this is just about ready to submit as an official PEP, get a
> > numbering, and post to c.l.py and Python-Dev, but of course I could be
> > wrong. Your feedback is appreciated.
>
>+1 on PEPing it
>
>This diff addresses one typo, qualifies the claim of 1.5.2 support a bit,
>and adds some language that I imagine server implementors who want to
>support Keep-alive would find useful. None of these changes are that
>important to me, if you disagree with them in the PEP I could still
>comment on them.

I think I'm going to expand on the chunked-encoding part a bit, because you've got some good stuff in there about when the server can and can't supply an omitted 'Content-Length'. I think it should actually go in the section about the server supplying omitted headers, rather than buried in a later note. The "application note" should just mention the option of using chunked encoding as an alternative to closing the connection when Content-Length isn't supplied by the app.

As for checking the length of the iterable, I'm okay with that, but it should be wrapped in a try: because it shouldn't be required that the iterable have a __len__ method.

Finally, regarding 1.5.2, I'm fine with dropping that claim altogether, if it saves us having to spell out the pre-2.2 iteration protocol (__len__/__getitem__/IndexError).

From ianb at colorstudy.com Sun Aug 22 06:29:03 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun Aug 22 06:55:50 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
References: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
Message-ID: <4128210F.3050901@colorstudy.com>

Phillip J. Eby wrote:
> Once again, please pardon me if I missed an update, and gently remind me
> with a clue by four if need be. :) Or better yet, by supplying a patch
> implementing your suggested changes. :)
>
> I was going to post a diff, but even a unified diff is about as long as
> the previous version was, and the new draft is almost 50% longer than
> the old one, as lots of new material has been added about streaming, URL
> determination, required CGI variables, etc. etc. There's even some
> extra material in the Rationale and Goals about using WSGI middleware to
> better modularize
> frameworks, allowing more mix-and-match between them.
>
> I think this is just about ready to submit as an official PEP, get a
> numbering, and post to c.l.py and Python-Dev, but of course I could be
> wrong. Your feedback is appreciated.

I think it's ready as well. I have only a couple small comments, which are mostly about language. There's going to be more discussion later anyway, so why not get started with the second round.

> PEP: XXX

For some reason this got caught as spam. I blame it on these triple Xs.

> Title: Python Web Server Gateway Interface v1.0
> Version: $Revision: 1.1 $
> Last-Modified: $Date: 2004/08/20 19:11:27 $
> Author: Phillip J. Eby
> Discussions-To: Python Web-SIG
> Status: Draft
> Type: Informational
> Content-Type: text/x-rst
> Created: 07-Dec-2003
> Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004
>
>
> Abstract
> ========
>
> This document specifies a proposed standard interface between web
> servers and Python web applications or frameworks, to promote
> web application portability across a variety of web servers.
>
>
> Rationale and Goals
> ===================
>
> Python currently boasts a wide variety of web application
> frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO,
> and Twisted Web -- to name just a few [1]_.
> This wide variety
> of choices can be a problem for new Python users, because
> generally speaking, their choice of web framework will limit
> their choice of usable web servers, and vice versa.
>
> By contrast, although Java has just as many web application
> frameworks available, Java's "servlet" API makes it possible
> for applications written with any Java web application framework
> to run in any web server that supports the servlet API.
>
> The availability and widespread use of such an API in web
> servers for Python -- whether those servers are written in
> Python (e.g. Medusa), embed Python (e.g. mod_python), or
> invoke Python via a gateway protocol (e.g. CGI, FastCGI,
> etc.) -- would separate choice of framework from choice
> of web server, freeing users to choose a pairing that suits
> them, while freeing framework and server developers to focus
> on their area of specialty.
>
> This PEP, therefore, proposes a simple and universal interface
> between web servers and web applications or frameworks: the
> Python Web Server Gateway Interface (WSGI).
>
> But the mere existence of a WSGI spec does nothing to address the
> existing state of servers and frameworks for Python web applications.
> Server and framework authors and maintainers must actually implement
> WSGI for there to be any effect.
>
> However, since no existing servers or frameworks support WSGI, there
> is little immediate reward for an author who implements WSGI support.
> Thus, WSGI *must* be easy to implement, so that an author's initial
> investment in the interface can be reasonably low.
>
> Thus, simplicity of implementation on *both* the server and framework
> sides of the interface is absolutely critical to the utility of the
> WSGI interface, and is therefore the principal criterion for any
> design decisions.
>
> Note, however, that simplicity of implementation for a framework
> author is not the same thing as ease of use for a web application
> author.
> WSGI presents an absolutely "no frills" interface to the
> framework author, because bells and whistles like response objects
> and cookie handling would just get in the way of existing frameworks'
> handling of these issues. Again, the goal of WSGI is to facilitate
> easy interconnection of existing servers and applications or
> frameworks, not to create a new web framework.
>
> Note also that this goal precludes WSGI from requiring anything that
> is not already available in deployed versions of Python. Therefore,
> new standard library modules are not proposed or required by this
> specification, and nothing in WSGI requires a Python version greater
> than 1.5.2. (It would be a good idea, however, for future versions
> of Python to include support for this interface in web servers
> provided by the standard library.)

Like you said, maybe 1.5.2 is optimistic. The spec works for 1.5.2, but most servers and applications will have higher requirements, and the iteration is annoying to handle in those versions.

> In addition to ease of implementation for existing and future
> frameworks and servers, it should also be easy to create request
> preprocessors, response postprocessors, and other WSGI-based
> "middleware" components that look like an application to their
> containing server, while acting as a server for their contained
> applications.
>
> If middleware can be both simple and robust, and WSGI is widely
> available in servers and frameworks, it allows for the possibility
> of an entirely new kind of Python web application framework: one
> consisting of loosely-coupled WSGI middleware components. Indeed,
> existing framework authors may even choose to refactor their
> frameworks' existing services to be provided in this way, becoming
> more like libraries used with WSGI, and less like monolithic
> frameworks.
> This would then allow application developers to choose
> "best-of-breed" components for specific functionality, rather than
> having to commit to all the pros and cons of a single framework.
>
> Of course, as of this writing, that day is doubtless quite far off.
> In the meantime, it is a sufficient short-term goal for WSGI to
> enable the use of any framework with any server.

That's an awfully pessimistic paragraph ;)

> Finally, it should be mentioned that the current version of WSGI
> does not prescribe any particular mechanism for "deploying" an
> application for use with a web server or server gateway. At the
> present time, this is necessarily implementation-defined by the
> server or gateway. After a sufficient number of servers and
> frameworks have implemented WSGI to provide field experience with
> varying deployment requirements, it may make sense to create
> another PEP, describing a deployment standard for WSGI servers and
> application frameworks.
>
>
>
> Specification Overview
> ======================
>
> The WSGI interface has two sides: the "server" or "gateway" side,
> and the "application" side. The server side invokes a callable
> object that is provided by the application side. The specifics
> of how that object is provided are up to the server or gateway.
> It is assumed that some servers or gateways will require an
> application's deployer to write a short script to create an
> instance of the server or gateway, and supply it with the
> application object. Other servers and gateways may use
> configuration files or other mechanisms to specify where the
> application object should be imported from.

Maybe "gateway" is just distracting.

> The application object is simply a callable object that accepts
> two arguments. The term "object" should not be misconstrued as
> requiring an actual object instance: a function, method, class,
> or instance with a ``__call__`` method are all acceptable for
> use as an application object.
> Here are two example application
> objects; one is a function, and the other is a class::
>
>     def simple_app(environ, start_response):
>         """Simplest possible application object"""
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         write = start_response(status, headers)
>         write('Hello world!\n')
>
>
>     class AppClass:
>         """Much the same thing, but as a class"""
>
>         def __init__(self, environ, start_response):
>             self.environ = environ
>             self.start = start_response
>
>         def __iter__(self):
>             status = '200 OK'
>             headers = [('Content-type','text/plain')]
>             self.start(status, headers)
>
>             yield "Hello world!\n"
>             for i in range(1,11):
>                 yield "Extra line %s\n" % i

This second example confuses me. Though as I reread it I realize more clearly what it's doing; __init__ is the callable (in essence), but self is automatically returned. I think an instance with a __call__ method would be easier to understand. OTOH, there's more concurrency overhead. I dunno. Anyway, that one confused me.

> The server or gateway invokes the application once for each request
> it receives from a web browser. To illustrate, here is a simple
> CGI gateway, implemented as a function taking an application object
> (all error handling omitted)::
>
>     import os, sys
>
>     def run_with_cgi(application):
>
>         environ = {}
>         environ.update(os.environ)
>         environ['wsgi.input'] = sys.stdin
>         environ['wsgi.errors'] = sys.stderr
>         environ['wsgi.version'] = (1,0)
>         environ['wsgi.multithread'] = False
>         environ['wsgi.multiprocess'] = True
>         environ['wsgi.last_call'] = True
>
>         def start_response(status,headers):
>             print "Status:", status
>             for key,val in headers:
>                 print "%s: %s" % (key,val)
>             print   # a blank line ends the CGI response headers
>             return sys.stdout.write
>
>         result = application(environ, start_response)
>         if result:
>             try:
>                 for data in result:
>                     sys.stdout.write(data)
>             finally:
>                 if hasattr(result,'close'):
>                     result.close()
>
> In the next section, we will specify the precise semantics that
> these illustrations are examples of.
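Ian's alternative, an instance with a `__call__` method, might look like the following Python 3 sketch (it returns `bytes` rather than the draft's Python 2 strings; `HelloApp` and `extra_lines` are hypothetical names). One instance is created up front and called once per request, so `start_response` runs at call time rather than on first iteration, and per-request data stays in local variables. The trade-off is that a single shared instance serves every request, so any mutable state kept on `self` must be safe when `wsgi.multithread` is true.

```python
class HelloApp:
    """Application object as a callable instance, created once (sketch)."""

    def __init__(self, extra_lines=10):
        # Configuration only: shared by every request this instance serves.
        self.extra_lines = extra_lines

    def __call__(self, environ, start_response):
        # Called once per request; start_response runs immediately.
        start_response("200 OK", [("Content-type", "text/plain")])
        return self._body()

    def _body(self):
        # A generator: the server pulls one chunk at a time.
        yield b"Hello world!\n"
        for i in range(1, self.extra_lines + 1):
            yield b"Extra line %d\n" % i


application = HelloApp()
```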
>
>
> Specification Details
> =====================
>
> The application object must accept two positional arguments. For
> the sake of illustration, we have named them ``environ``, and
> ``start_response``, but they are not required to have these names.
> A server or gateway *must* invoke the application object using
> positional (not keyword) arguments.
>
> The first parameter is a dictionary object, containing CGI-style
> environment variables.

I think the spec is easier to understand if you use names here, i.e., "environ is a dictionary object". Or remind the reader of the invocation, i.e., note application(environ, start_response) is called.

> This object *must* be a builtin Python
> dictionary (*not* a subclass, ``UserDict`` or other dictionary
> emulation), and the application is allowed to modify the dictionary
> in any way it desires. The dictionary must also include certain
> WSGI-required variables (described in a later section), and may
> also include server-specific extension variables, named according
> to a convention that will be described below.
>
> The second parameter is a callable accepting two positional
> arguments: a status string of the form ``"999 Message here"``,
> and a list of ``(header_name,header_value)`` tuples describing the
> HTTP response header. This callable must return another callable
> that takes one parameter: a string to write as part of the HTTP
> response body.

"This callable must return a writing function: a function that takes a single string as an argument, which is written as the HTTP response body." I guess "function" is more specific than "callable", but it seems easier to understand. Though honestly, I find the CGI example the easiest way to understand this, so maybe being more accurate here is fine.

> The application object may return either ``None`` (indicating that
> there is no additional output), or it may return a non-empty
> iterable yielding strings.
> (For example, it could be a
> generator-iterator that yields strings, or it could be a
> sequence such as a list of strings.) The server or gateway will
> treat the strings yielded by the iterable as if they had been
> passed to the ``write()`` method.
>
> Also, if the application returns an iterable, and the iterable has a
> ``close()`` method, the server or gateway *must* call that method
> upon completion of the current request, whether the request was
> completed normally, or terminated early due to an error. (This is to
> support resource release by the application. This protocol is
> intended to support PEP 325, and also the simple case of an
> application returning an open text file.)
>
>
> ``environ`` Variables
> ---------------------
>
> The ``environ`` dictionary is required to contain these CGI environment
> variables, as defined by the Common Gateway Interface specification
> [2]_. The following variables *must* be present, but *may* be an empty
> string, if there is no more appropriate value for them:
>
> * ``REQUEST_METHOD``
>
> * ``SCRIPT_NAME`` (The initial portion of the request URL's "path" that
>   corresponds to the application object, so that the application knows
>   its virtual "location".)
>
> * ``PATH_INFO`` (The remainder of the request URL's "path", designating
>   the virtual "location" of the request's target within the application)
>
> * ``QUERY_STRING``
>
> * ``CONTENT_TYPE``
>
> * ``CONTENT_LENGTH``
>
> * ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with
>   ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the

You forgot to finish your sentence. Also SERVER_NAME is a fallback if HTTP_HOST isn't present; generally SERVER_NAME indicates the canonical host name, not necessarily the actual host name.

> * Variables corresponding to the client-supplied HTTP headers (i.e.,
>   variables whose names begin with ``"HTTP_"``).
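The ``HTTP_`` naming follows the CGI convention: the header name is upper-cased, hyphens become underscores, and ``HTTP_`` is prefixed, while `Content-Type` and `Content-Length` appear as ``CONTENT_TYPE`` and ``CONTENT_LENGTH`` instead. A minimal Python 3 sketch of how a server might build these keys; `headers_to_environ` is a hypothetical name, and joining repeated headers with commas is an assumption of this sketch rather than something the draft specifies.

```python
def headers_to_environ(headers):
    """Map (name, value) request headers to CGI-style environ keys (sketch)."""
    environ = {}
    for name, value in headers:
        key = name.upper().replace("-", "_")
        # CGI carries these two without the HTTP_ prefix.
        if key not in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            key = "HTTP_" + key
        if key in environ:
            # Assumption: repeated headers are combined into a comma list.
            environ[key] += "," + value
        else:
            environ[key] = value
    return environ
```

With a `Host: example.org:8080` header, for example, the port ends up inside ``HTTP_HOST``, which is why ``SERVER_NAME`` and ``SERVER_PORT`` serve only as the fallback Ian describes.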
>
> In general, a server or gateway should attempt to provide as many
> other CGI variables as are applicable, including e.g. the nonstandard
> SSL variables such as ``HTTPS=on``, if an SSL connection is in effect.
> However, an application that uses any variables other than the ones
> listed above are necessarily non-portable to web servers that do not
> support the relevant extensions.
>
> A WSGI-compliant server or gateway *should* document what variables
> it provides, along with their definitions as appropriate. Applications
> *should* check for the presence of any nonstandard variables they
> require, and have a fallback plan in the event such a variable is
> absent.
>
> Note: missing variables (such as ``REMOTE_USER`` when no
> authentication has occurred) should be left out of the ``environ``
> dictionary. Also note that CGI-defined variables must be strings,
> if they are present at all. It is a violation of this specification
> for a CGI variable's value to be of any type other than ``str``.
>
> In addition to the CGI-defined variables, the ``environ`` dictionary
> must also contain the following WSGI-defined variables:
>
> ===================== ==============================================
> Variable              Value
> ===================== ==============================================
> ``wsgi.version``      The tuple ``(1,0)``, representing WSGI
>                       version 1.0.
>
> ``wsgi.input``        An input stream from which the HTTP request
>                       body can be read.
>
> ``wsgi.errors``       An output stream to which error output can
>                       be written. For most servers, this will be
>                       the server's error log.
>
> ``wsgi.multithread``  This value should be true if the application
>                       object may be simultaneously invoked by
>                       another thread in the same process, and
>                       false otherwise.
>
> ``wsgi.multiprocess`` This value should be true if an equivalent
>                       application object may be simultaneously
>                       invoked by another process, and false
>                       otherwise.
>
> ``wsgi.last_call``    This value should be true if this is expected
>                       to be the last invocation of the application
>                       in this process. This is provided to allow
>                       applications to optimize their setup for
>                       long-running vs. short-running scenarios.
>                       This flag should normally only be true for
>                       CGI applications, or while a server is doing
>                       some kind of "graceful shutdown". Note that
>                       a server or gateway is still allowed to invoke
>                       the application again; this flag is only
>                       a "suggestion" to the application that it is
>                       unlikely to be reinvoked.

wsgi.last_call seems too complicated from this. Really, it's for CGI and nothing else. Maybe just wsgi.cgi? wsgi.run_once? I think the semantics shouldn't be any more general than that. Then we can also guarantee that it won't be called again.

> ===================== ==============================================
>
> Finally, the ``environ`` dictionary may also contain server-defined
> variables. These variables should be named using only lower-case
> letters, numbers, dots, and underscores, and should be prefixed with
> a name that is unique to the defining server or gateway. For
> example, ``mod_python`` might define variables with names like
> ``mod_python.some_variable``. This naming convention allows
> "middleware" components to safely filter out extensions that they
> do not understand. (E.g. by deleting all keys from ``environ`` that
> are all-lowercase and do not begin with ``"wsgi."``.)
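The filtering rule just described can be sketched as a middleware wrapper (Python 3; `strip_extensions` is a hypothetical name). Where the draft says to delete keys, this sketch builds a filtered copy instead, so the server's own dictionary is left untouched; CGI variables survive because they are upper-case, and ``wsgi.*`` keys survive explicitly.

```python
def strip_extensions(app):
    """Wrap `app`, hiding server extension keys it does not expect (sketch)."""
    def middleware(environ, start_response):
        filtered = {
            key: value
            for key, value in environ.items()
            # Drop all-lowercase keys unless they are WSGI-defined.
            if not (key.islower() and not key.startswith("wsgi."))
        }
        return app(filtered, start_response)
    return middleware
```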
>
>
> Input and Error Streams
> ~~~~~~~~~~~~~~~~~~~~~~~
>
> The input and error streams provided by the server must support
> the following methods:
>
> =================== ========== ========
> Method              Files      Notes
> =================== ========== ========
> ``read(size)``      ``input``
> ``readline()``      ``input``  1
> ``readlines(hint)`` ``input``  2
> ``__iter__()``      ``input``
> ``flush()``         ``errors`` 3
> ``write(str)``      ``errors``
> ``writelines(seq)`` ``errors``
> =================== ========== ========
>
> The semantics of each method are as documented in the Python Library
> Reference, except for these notes as listed in the table above:
>
> 1. The optional "size" argument to ``readline()`` is not supported,
>    as it may be complex for server authors to implement, and is not
>    often used in practice.
>
> 2. Note that the ``hint`` argument to ``readlines()`` is optional for
>    both caller and implementer. The application is free not to
>    supply it, and the server or gateway is free to ignore it.
>
> 3. Since the ``errors`` stream may not be rewound, a container is
>    free to forward write operations immediately, without buffering.
>    In this case, the ``flush()`` method may be a no-op. Portable
>    applications, however, cannot assume that output is unbuffered
>    or that ``flush()`` is a no-op. They must call ``flush()`` if
>    they need to ensure that output has in fact been written. (For
>    example, to minimize intermingling of data from multiple processes
>    writing to the same error log.)
>
> The methods listed in the table above *must* be supported by all
> servers conforming to this specification. Applications conforming
> to this specification *must not* use any other methods or attributes
> of the ``input`` or ``errors`` objects. In particular, applications
> *must not* attempt to close these streams, even if they possess
> ``close()`` methods.
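A hedged Python 3 sketch of an application that stays within the permitted methods: it reads exactly ``CONTENT_LENGTH`` bytes with ``read(size)``, reports a malformed length through ``errors`` with ``write()`` followed by ``flush()``, and never closes either stream. `echo_length_app` and `make_test_environ` are hypothetical names, and `io.BytesIO`/`io.StringIO` merely stand in for a real server's streams.

```python
import io


def echo_length_app(environ, start_response):
    """Report how many body bytes arrived, using only permitted methods."""
    raw = environ.get("CONTENT_LENGTH")
    try:
        length = int(raw or 0)
    except ValueError:
        errors = environ["wsgi.errors"]
        errors.write("bad CONTENT_LENGTH: %r\n" % (raw,))
        errors.flush()  # portable code cannot assume errors is unbuffered
        length = 0
    # Read exactly CONTENT_LENGTH bytes; never close() wsgi.input.
    body = environ["wsgi.input"].read(length) if length > 0 else b""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"received %d bytes\n" % len(body)]


def make_test_environ(body, content_length=None):
    """Minimal environ with in-memory streams (testing aid, not a server)."""
    if content_length is None:
        content_length = str(len(body))
    return {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": content_length,
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": io.StringIO(),
    }
```

Reading only ``CONTENT_LENGTH`` bytes matters on a real connection, where asking for more could block waiting for data the client never sends.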
>
>
> The ``start_response()`` Callable
> ---------------------------------
>
> The second parameter passed to the application object is itself a
> two-argument callable, used to begin the HTTP response and return
> a ``write()`` callable.

"The second parameter passed to the application object (start_response) is a callable, used like ``start_response(status, headers)``. The status argument is a string like "404 Not Found" or "200 OK". This string must be pure 7-bit ASCII, containing no control characters, and not terminated with a return or linefeed. The headers argument is a sequence of ``(header_name, header_value)`` tuples. Each ``header_name`` must be a valid... (and continuing on with your text).

Though I'm not clear what "folding" means. I'm guessing you mean:

    Header: blah
        continuing Header content

Does the HTTP spec care about folding? Seems like a distraction to mention it.

> The first parameter the ``start_response()``
> callable takes is a "status" string, of the form ``"999 Message here"``,
> where ``999`` is replaced with the HTTP status code, and ``Message here``
> is replaced with the appropriate message text. The string *must* be
> pure 7-bit ASCII, containing no control characters. In particular,
> it must not be terminated with a carriage return or linefeed.
>
> The second parameter accepted by the ``start_response()`` callable
> must be a sequence of ``(header_name,header_value)`` tuples. Each
> ``header_name`` must be a valid HTTP header name, without a
> trailing colon or other punctuation. Each ``header_value``
> *must not* include carriage returns or linefeeds: it should be a raw
> *unfolded* header value. If the HTTP spec calls for folding of a
> particular header, the server shall be responsible for performing the
> folding. (These requirements are to minimize the complexity of parsing
> required by servers, gateways, and intermediate response processors
> that need to inspect or modify response headers.)
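Because the status and header rules are mechanical, a server or middleware could check them before anything is sent. In this Python 3 sketch, `check_response_start` is a hypothetical name, header names are checked against the RFC 2616 `token` grammar (an assumption: the draft only says "a valid HTTP header name"), and raising `ValueError` is the sketch's own choice. `fullmatch` is used so that a trailing newline cannot slip past a `$` anchor.

```python
import re

# RFC 2616 "token" characters: visible ASCII minus the separators.
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
# Three digits, one space, then printable 7-bit ASCII (no CR, LF or other
# control characters).
_STATUS = re.compile(r"[0-9]{3} [\x20-\x7e]*")


def check_response_start(status, headers):
    """Raise ValueError if status/headers break the draft's rules (sketch)."""
    if not _STATUS.fullmatch(status):
        raise ValueError("bad status line: %r" % (status,))
    for name, value in headers:
        if not _TOKEN.fullmatch(name):
            raise ValueError("bad header name: %r" % (name,))
        if "\r" in value or "\n" in value:
            raise ValueError("header %r must be unfolded (no CR/LF)" % (name,))
```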
>
> In general, the server or gateway is responsible for ensuring that
> correct headers are sent to the client: if the application omits
> a needed header, the server or gateway *should* add it. For example,
> the HTTP ``Date:`` and ``Server:`` headers would normally be supplied
> by the server or gateway. If the application supplies a header that
> the server would ordinarily supply, or that contradicts the server's
> intended behavior (e.g. supplying a different ``Connection:`` header),
> the server or gateway *may* discard the conflicting header, provided
> that its action is recorded for the benefit of the application author.
>
>
> The ``write()`` Callable
> ------------------------
>
> The return value of the ``start_response()`` callable is a one-argument
> ``write()`` callable, that accepts strings to write as part of the
> HTTP response body.
>
> Note that the purpose of the ``write()`` callable is primarily to
> support existing application frameworks that support a streaming "push"
> API. Therefore, strings passed to ``write()`` *must* be sent to the
> client *as soon as possible*; they must *not* be buffered unless the
> buffer will be emptied in parallel with the application's continuing
> execution (e.g. by a separate I/O thread). If the server or gateway
> does not have a separate I/O thread available, it *must* finish
> writing the supplied string before it returns from each ``write()``
> invocation.
>
> If the application returns an iterable, each string produced by the
> iterable must be treated as though it had been passed to ``write()``,
> with the data sent in an "as soon as possible" manner. That is,
> the iterable should not be asked for a new string until the previous
> string has been sent to the client, or is buffered for such sending
> by a parallel thread.
>
> Notice that these rules discourage the generation of content before a
> client is ready for it, in excess of the buffer sizes provided by the
> server and operating system.
> For this reason, some applications may
> wish to buffer data internally before passing any of it to ``write()``
> or yielding it from an iterator, in order to avoid waiting for the
> client to catch up with their output. This approach may yield better
> throughput for dynamically generated pages of moderate size, since the
> application is then freed for other tasks.
>
> In addition to improved performance, buffering all of an application's
> output has an advantage for error handling: the buffered output can
> be thrown away and replaced by an error page, rather than dumping an
> error message in the middle of some partially-completed output. For
> this and other reasons, many existing Python frameworks already
> accumulate their output for a single write, unless the application
> explicitly requests streaming, or the expected output is larger than
> practical for buffering (e.g. multi-megabyte PDFs). So, these
> application frameworks are already a natural fit for the WSGI
> streaming model: for most requests they will only call ``write()``
> once anyway!
>
>
> Implementation/Application Notes
> ================================
>
>
> Unicode
> -------
>
> HTTP does not directly support Unicode, and neither does this
> interface. All encoding/decoding must be handled by the application;
> all strings and streams passed to or from the server must be standard
> Python byte strings, not Unicode objects. The result of using a
> Unicode object where a string object is required, is undefined.
>
>
> Multiple Invocations
> --------------------
>
> Application objects must be able to be invoked more than once, since
> virtually all servers/gateways will make such requests.
>
>
> Error Handling
> --------------
>
> Servers *should* trap and log exceptions raised by
> applications, and *may* continue to execute, or attempt to shut down
> gracefully.
Applications *should* avoid allowing exceptions to > escape their execution scope, since the result of uncaught exceptions > is server-defined. > > > Thread Support > -------------- > > Thread support, or lack thereof, is also server-dependent. > Servers that can run multiple requests in parallel *should* also > provide the option of running an application in a single-threaded > fashion, so that applications or frameworks that are not thread-safe > may still be used with that server. > > > URL Reconstruction > ------------------ > > If an application wishes to reconstruct a request's complete URL, > it may do so using the following algorithm, contributed by Ian > Bicking:: > > if environ.get('HTTPS') == 'on': > url = 'https://' > else: > url = 'http://' > > if environ.get('HTTP_HOST'): > url += environ['HTTP_HOST'] > else: > url += environ['SERVER_NAME'] > > if environ.get('HTTPS') == 'on': > if environ['SERVER_PORT'] != '443': > url += ':' + environ['SERVER_PORT'] > else: > if environ['SERVER_PORT'] != '80': > url += ':' + environ['SERVER_PORT'] > > url += environ['SCRIPT_NAME'] > url += environ['PATH_INFO'] > if environ.get('QUERY_STRING'): > url += '?' + environ['QUERY_STRING'] > > Note that such a reconstructed URL may not be precisely the > same URI as requested by the client. Server rewrite rules, for > example, may have modified the client's originally requested URL > to place it in a canonical form. > > > Application Configuration > ------------------------- > > This specification does not define how a server selects or > obtains an application to invoke. These and other configuration > options are highly server-specific matters. It is expected that > server/gateway authors will document how to configure the server to > execute a particular application object, and with what options (such > as threading options). > > Framework authors, on the other hand, should document how to create > an application object that wraps their framework's functionality.
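As a hedged sketch (modern Python 3, not part of the draft), the URL reconstruction algorithm quoted above can be wrapped in a function. One deliberate change is made: the port is appended only when falling back to SERVER_NAME, because an HTTP/1.1 Host header already carries any non-default port. The name reconstruct_url is illustrative only, and the .get() defaults for SCRIPT_NAME and PATH_INFO are an added assumption for environs that omit them.

```python
def reconstruct_url(environ):
    """Rebuild the request URL from a CGI-style environ dictionary.

    Assumes HTTPS is signalled by environ['HTTPS'] == 'on', as in the
    quoted draft algorithm.
    """
    https = environ.get('HTTPS') == 'on'
    url = 'https://' if https else 'http://'

    if environ.get('HTTP_HOST'):
        # The Host header already includes a non-default port, if any.
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        default_port = '443' if https else '80'
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']

    # Treat missing SCRIPT_NAME/PATH_INFO as empty strings.
    url += environ.get('SCRIPT_NAME', '')
    url += environ.get('PATH_INFO', '')
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url
```

For example, an environ with SERVER_NAME 'example.com', SERVER_PORT '8080', SCRIPT_NAME '/app', PATH_INFO '/page' and QUERY_STRING 'x=1' yields 'http://example.com:8080/app/page?x=1'.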
> The user, who has chosen both the server and the application > framework, must connect the two together. However, since both the > framework and the server now have a common interface, this should > be merely a mechanical matter, rather than a significant engineering > effort for each new server/framework pair. > > > Middleware > ---------- > > Note that a single object may play the role of a server with respect > to some application(s), while also acting as an application with > respect to some server(s). Such "middleware" components can perform > such functions as: > > * Routing a request to different application objects based on the > target URL, after rewriting the ``environ`` accordingly. > > * Allowing multiple applications or frameworks to run side-by-side > in the same process > > * Load balancing and remote processing, by forwarding requests and > responses over a network > > * Performing content postprocessing, such as applying XSL stylesheets > > Given the existence of applications and servers conforming to this > specification, the appearance of such reusable middleware becomes > a possibility. > > Middleware components that transform the request or response data > should in general remove WSGI extension data from the ``environ`` > that the middleware does not understand, to prevent applications > from inadvertently bypassing the middleware's mediation of the > interaction by use of a server extension. The simplest way to do > this is to just delete keys from ``environ`` that are all lowercase > and do not begin with ``"wsgi."``, before passing the ``environ`` > on to the application. I don't understand this. To me it seems more reasonable that middleware leave the extra arguments in place. For instance, let's say I have a URL redirecting middleware. There's a chance I need to look at the parsed form of QUERY_STRING, and I cache the result as a dictionary in, say, webkit.query_vars. That's just as valid later.
Oh, well, unless someone rewrites QUERY_STRING. So to be safe, I put the query string I parsed in webkit.query_string. But maybe I have some other middleware that handles configuration. It runs after the URL parser, for localized configuration. It doesn't necessarily know about the query string, or about the other piece of middleware. And it shouldn't know about it, because what would be the point of that? They are decoupled. But I don't want it throwing away that information. In that case, it's just some lost time reparsing the URL, but I can imagine more important things, and a lot of pieces of middleware where the only point is that they add something to the environ dictionary. E.g., a session-handling middleware. There's no point to these if other middleware is going to throw information away. If there are reliability issues -- like middleware rewriting QUERY_STRING, but passing through a cached parse of the old QUERY_STRING that it didn't know about -- these can be handled pretty easily. But if one middleware throws away keys it doesn't know about, it messes up the whole stack. > HTTP 1.1 Expect/Continue > ------------------------ > > Servers and gateways *must* provide transparent support for HTTP 1.1's > "expect/continue" mechanism, if they implement HTTP 1.1. This may be > done in any of several ways: > > 1. Reject all client requests containing an ``Expect: 100-continue`` > header with a "417 Expectation Failed" error. Such requests will > not be forwarded to an application object. > > 2. Respond to requests containing an ``Expect: 100-continue`` header > with an immediate "100 Continue" response, and proceed normally. > > 3. Proceed with the request normally, but provide the application with > a ``wsgi.input`` stream that will send the "100 Continue" response > if/when the application first attempts to read from the input > stream. The read request must then remain blocked until the client > responds.
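Option 3 might look roughly like the following minimal sketch, written for modern Python 3 where request bodies are bytes. It assumes the server hands the wrapper a readable request stream and a writable response stream; here both are simulated with io.BytesIO. The ContinueInput class is a hypothetical name, not something the draft defines, and on a real socket read() would simply block until the client sends the body.

```python
import io


class ContinueInput:
    """Hypothetical wsgi.input wrapper implementing option 3.

    The interim "100 Continue" response is written only when the
    application first reads the request body, so an application that
    never reads the body never triggers it.
    """

    def __init__(self, rfile, wfile):
        self.rfile = rfile            # stream the request body arrives on
        self.wfile = wfile            # stream responses are written to
        self.sent_continue = False

    def _send_continue(self):
        if not self.sent_continue:
            self.wfile.write(b"HTTP/1.1 100 Continue\r\n\r\n")
            self.wfile.flush()
            self.sent_continue = True

    def read(self, size=-1):
        self._send_continue()
        # On a real socket this blocks until the client sends the body.
        return self.rfile.read(size)

    def readline(self, size=-1):
        self._send_continue()
        return self.rfile.readline(size)


# Simulated connection: the client's body is waiting in an in-memory stream.
response = io.BytesIO()
body = ContinueInput(io.BytesIO(b"name=value"), response)
```

Nothing is written to the response stream until the first body.read(); after that, further reads do not resend the interim response.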
> > Note that this behavior restriction does not apply for HTTP 1.0 requests, > or for requests that are not directed to an application object. For more > information on HTTP 1.1 Expect/Continue, see RFC 2616, sections 8.2.3 > and 10.1.1. > > > > > Questions and Answers > ===================== > > 1. Why must ``environ`` be a dictionary? What's wrong with using > a subclass? > > The rationale for requiring a dictionary is to maximize > portability between servers. The alternative would be to define > some subset of a dictionary's methods as being the standard and > portable interface. In practice, however, most servers will > probably find a dictionary adequate to their needs, and thus > framework authors will come to expect the full set of dictionary > features to be available, since they will be there more often > than not. But, if some server chooses *not* to use a dictionary, > then there will be interoperability problems despite that > server's "conformance" to spec. Therefore, making a dictionary > mandatory simplifies the specification and guarantees > interoperability. > > Note that this does not prevent server or framework developers > from offering specialized services as custom variables *inside* > the ``environ`` dictionary. This is the recommended approach > for offering any such value-added services. > > 2. Why can you call ``write()`` *and* yield strings/return an > iterator? Shouldn't we pick just one way? > > If we supported only the iteration approach, then current > frameworks that assume the availability of "push" would suffer. > But, if we only support pushing via ``write()``, then > server performance suffers for transmission of e.g. large > files (if a worker thread can't begin work on a new request > until all of the output has been sent). 
Thus, this compromise > allows an application framework to support both approaches, as > appropriate, but with only a little more burden to the server > implementor than a push-only approach would require. > > 3.
What's the ``close()`` for? > > When writes are done during the execution of an application > object, the application can ensure that resources are released > using a try/finally block. But, if the application returns an > iterator, any resources used will not be released until the > iterator is garbage collected. The ``close()`` idiom allows > an application to release critical resources at the end of a > request, and it's forward-compatible with the support for > try/finally in generators that's proposed by PEP 325. > > 4. Why is this interface so low-level? I want feature X! (e.g. > cookies, sessions, persistence, ...) > > This isn't Yet Another Python Web Framework. It's just a way > for frameworks to talk to web servers, and vice versa. If you > want these features, you need to pick a web framework that > provides the features you want. And if that framework lets > you create a WSGI application, you should be able to run it > in most WSGI-supporting servers. Also, some WSGI servers may > offer additional services via objects provided in their > ``environ`` dictionary; see the applicable server documentation > for details. (Of course, applications that use such extensions > will not be portable to other WSGI-based servers.) > > 5. Why use CGI variables instead of good old HTTP headers? And > why mix them in with WSGI-defined variables? > > Many existing web frameworks are built heavily upon the CGI spec, > and existing web servers know how to generate CGI variables. In > contrast, alternative ways of representing inbound HTTP information > are fragmented and lack market share. Thus, using the CGI > "standard" seems like a good way to leverage existing > implementations. As for mixing them with WSGI variables, separating > them would just require two dictionary arguments to be passed > around, while providing no real benefits. > > 6. What about the status string? Can't we just use the number, > passing in ``200`` instead of ``"200 OK"``?
> > Doing this would complicate the server or gateway, by requiring > them to have a table of numeric statuses and corresponding > messages. By contrast, it is easy for an application or framework > author to type the extra text to go with the specific response code > they are using, and existing frameworks often already have a table > containing the needed messages. So, on balance it seems better to > make the application/framework responsible, rather than the server > or gateway. > > > Acknowledgements > ================ > > Thanks go to the many folks on the Web-SIG mailing list whose > thoughtful feedback made this revised draft possible. Especially: > > * Gregory "Grisha" Trubetskoy, author of ``mod_python``, who > beat up on the first draft as not offering any advantages > over "plain old CGI", thus encouraging me to look for a > better approach. > > * Ian Bicking, who helped nag me into properly specifying > the multithreading and multiprocess options, as well as > badgering me to provide a mechanism for servers to supply > custom extension data to an application. > > * Tony Lownds, who came up with the concept of a ``start_response`` > function that took the status and headers, returning a ``write`` > function. > > > References > ========== > > .. [1] The Python Wiki "Web Programming" topic > (http://www.python.org/cgi-bin/moinmoin/WebProgramming) > > .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft > (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt) > > > Copyright > ========= > > This document has been placed in the public domain. > > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > End: > From pje at telecommunity.com Sun Aug 22 19:12:02 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sun Aug 22 19:11:54 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <4128210F.3050901@colorstudy.com> References: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> At 11:29 PM 8/21/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>Note also that this goal precludes WSGI from requiring anything that >>is not already available in deployed versions of Python. Therefore, >>new standard library modules are not proposed or required by this >>specification, and nothing in WSGI requires a Python version greater >>than 1.5.2. (It would be a good idea, however, for future versions >>of Python to include support for this interface in web servers >>provided by the standard library.) > >Like you said, maybe 1.5.2 is optimistic. The spec works for 1.5.2, but >most servers and applications will have higher requirements, and the >iteration is annoying to handle in those versions. Fine, we'll say 2.2.2, since that version had True and False as well as __iter__. >>If middleware can be both simple and robust, and WSGI is widely >>available in servers and frameworks, it allows for the possibility >>of an entirely new kind of Python web application framework: one >>consisting of loosely-coupled WSGI middleware components. Indeed, >>existing framework authors may even choose to refactor their >>frameworks' existing services to be provided in this way, becoming >>more like libraries used with WSGI, and less like monolithic >>frameworks. This would then allow application developers to choose >>"best-of-breed" components for specific functionality, rather than >>having to commit to all the pros and cons of a single framework. >>Of course, as of this writing, that day is doubtless quite far off. >>In the meantime, it is a sufficient short-term goal for WSGI to >>enable the use of any framework with any server. 
> >That's a awfully pessimistic paragraph ;) Are you being ironic? I'm not sure I follow you here. >>The WSGI interface has two sides: the "server" or "gateway" side, >>and the "application" side. The server side invokes a callable >>object that is provided by the application side. The specifics >>of how that object is provided are up to the server or gateway. >>It is assumed that some servers or gateways will require an >>application's deployer to write a short script to create an >>instance of the server or gateway, and supply it with the >>application object. Other servers and gateways may use >>configuration files or other mechanisms to specify where the >>application object should be imported from. > >Maybe "gateway" is just distracting. Do you have a specific suggestion here? >> class AppClass: >> """Much the same thing, but as a class""" >> def __init__(self, environ, start_response): >> self.environ = environ >> self.start = start_response >> def __iter__(self): >> status = '200 OK' >> headers = [('Content-type','text/plain')] >> self.start(status, headers) >> yield "Hello world!\n" >> for i in range(1,11): >> yield "Extra line %s\n" % i > >This second example confuses me. Though as I reread it I realize more >clearly what it's doing; __init__ is the callable (in essence), but self >is automatically returned. I think an instance with a __call__ method >would be easier to understand. OTOH, there's more concurrency >overhead. I dunno. Anyway, that one confused me. Perhaps you could suggest some text to add to the docstring that would have prevented your initial confusion? >>The application object must accept two positional arguments. For >>the sake of illustration, we have named them ``environ``, and >>``start_response``, but they are not required to have these names. >>A server or gateway *must* invoke the application object using >>positional (not keyword) arguments. >>The first parameter is a dictionary object, containing CGI-style >>environment variables. 
> >I think the spec is easier to understand if you use names here, i.e., >"environ is a dictionary object". Or remind the reader of the invocation, >i.e., note application(environ, start_response) is called. I'll try to do something with this. >>The second parameter is a callable accepting two positional >>arguments: a status string of the form ``"999 Message here"``, >>and a list of ``(header_name,header_value)`` tuples describing the >>HTTP response header. This callable must return another callable >>that takes one parameter: a string to write as part of the HTTP >>response body. > >"This callable must return a writing function: a function that takes a >single string as an argument, which is written as the HTTP response body." I'll work on this one too. >I guess "function" is more specific than "callable", but it seems easier >to understand. Though honestly, I find the CGI example the easiest way to >understand this, so maybe being more accurate here is fine. I've got to explain *somewhere* that these are any callable. Maybe I should preface the overview with an explanation of what "a callable" means, and reinforce it once or twice in the form "such and such is a callable (function, method, class, callable instance, etc.) that blah blah blah". >> * ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with >> ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the > >You forgot to finish your sentence. Also SERVER_NAME is a fallback if >HTTP_HOST isn't present; generally SERVER_NAME indicates the canonical >host name, not necessarily the actual host name. Ah yes. Tony already provided a patch for the typo, but I'll add something about HTTP_HOST. >>``wsgi.last_call`` This value should be true if this is expected >> to be the last invocation of the application >> in this process. This is provided to allow >> applications to optimize their setup for >> long-running vs. short-running scenarios. 
>> This flag should normally only be true for >> CGI applications, or while a server is doing >> some kind of "graceful shutdown". Note that >> a server or gateway is still allowed to invoke >> the application again; this flag is only >> a "suggestion" to the application that it is >> unlikely to be reinvoked. > >wsgi.last_call seems to complicated from this. It's precisely what you agreed to as a solution for your issue. Granted, I was also surprised by how long the "official" explanation of the feature turned out to be. > Really, it's for CGI and nothing else. Maybe just > wsgi.cgi? wsgi.run_once? I think the semantics shouldn't be any more > general than that. Then we can also guarantee that it won't be called again. I'm really reluctant to require the server to make such a guarantee. My understanding of your use case is really more like, "I'm not likely to run you again for a while, so don't optimize for frequent execution." Hm. Now that I'm thinking about it more, it seems to me that this could be just as easily handled by application/framework-side configuration, and I'm inclined to remove it from the spec altogether. >>The ``start_response()`` Callable >>--------------------------------- >>The second parameter passed to the application object is itself a >>two-argument callable, used to begin the HTTP response and return >>a ``write()`` callable. > >"The second parameters passed to the application object (start_response) >is a callable, used like ``start_response(status, headers)``. I'll work on this. >The status argument is a string like "404 Not Found" or "200 OK". This >string must be pure 7-bit ASCII, containing no control characters, and not >terminated with a return or linefeed. > >The headers argument is a sequence of ``(header_name, header_value)`` >tuples. Each ``header_name`` must be a valid... (and continuing on with >your text). I'll work on this. >Though I'm not clear what "folding" means. 
I'm guessing you mean: > >Header: blah > continuing Header content Yes. >Does the HTTP spec care about folding? Seems like a distraction to >mention it. I'll check. >>Middleware components that transform the request or response data >>should in general remove WSGI extension data from the ``environ`` >>that the middleware does not understand, to prevent applications >>from inadvertently bypassing the middleware's mediation of the >>interaction by use of a server extension. The simplest way to do >>this is to just delete keys from ``environ`` that are all lowercase >>and do not begin with ``"wsgi."``, before passing the ``environ`` >>on to the application. > >I don't understand this. To me it seems more reasonable that middleware >leave the extra arguments in place. > >For instance, lets say I have a URL redirecting middleware. There's a >chance I need to look at the parsed form of QUERY_STRING, and I cache the >result as a dictionary in, say, webkit.query_vars. That's just as valid >later. Oh, well, unless someone rewrites QUERY_STRING. So to be safe, I >put the query string I parsed in webkit.query_string. > >But maybe I have some other middleware that handles configuration. It >runs after the URL parser, for localized configuration. It doesn't >necessarily know about the query string, or about the other piece of >middleware. And it shouldn't know about it, because what would be the >point of that? They are decoupled. But I don't want it throwing away >that information. > >In that case, it's just some lost time reparsing the URL, but I can >imagine more important things, and a lot of pieces of middleware where the >only point is that they add something to the environ dictionary. E.g., a >session-handling middleware. There's not point to these if other >middleware is going to throw information away. 
> >If there's reliability issues -- like middleware rewriting QUERY_STRING, >but passing through a cached parse of the old QUERY_STRING that it didn't >know about -- these can be handled pretty easily. But if one middleware >throws away keys it doesn't know about, it messes up the whole stack. You're right. The extension mechanism needs to be clearer. Instead of throwing away everything, there needs to be a way to identify that a server-supplied value may be used in place of some WSGI functionality, so that middleware can remove only those items, rather than every item. Hmmm. Maybe we should have a 'wsgi.extensions' key that contains a dictionary for items that middleware *must* either understand, or not pass through. If a framework or middleware author did your hypothetical query string parsing, he would have to place it in 'wsgi.extensions' if he did not implement the cross-check you describe. Sigh. This will probably need to be a new section on "WSGI Extensions and Middleware". From pje at telecommunity.com Sun Aug 22 20:16:29 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Aug 22 20:16:22 2004 Subject: [Web-SIG] HTTP header canonicalization? Message-ID: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> While reviewing the HTTP/1.1 spec (RFC 2616) for information on header folding, I noticed an interesting bit under section "4.2 Message Headers": Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. 
The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded. So, although I've defined the headers sent by the application as a list of name/value pairs, it seems that we *could* use a dictionary instead, if we required that multiple headers not be used, and that some canonical form (e.g. all lower-case) be used for the names. Does anybody see any issues with this? The upside is that it makes it easy for servers/gateways to add missing headers (using 'headerdict.setdefault()'), and it should also be easier for application/framework developers to build up their headers incrementally in the same way. The only downsides I see that could possibly come up are: * There's some reason to have headers with different names in a specific order, even though the spec is adamant that such an ordering is insignificant and not to be relied upon. * There's some reason to split multi-value headers into separate header lines, even though the spec is adamant that the forms are equivalent, and that HTTP has no limitations on line length. Does anybody know whether any HTTP clients in practice are affected by these matters? From ianb at colorstudy.com Sun Aug 22 21:18:52 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Sun Aug 22 21:18:57 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> References: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> Message-ID: <4128F19C.1060500@colorstudy.com> Phillip J. Eby wrote: >> That's a awfully pessimistic paragraph ;) > > > Are you being ironic? I'm not sure I follow you here. I don't know if I was being ironic. 
But it was just an offhanded comment, not a suggestion to change anything. >>> The WSGI interface has two sides: the "server" or "gateway" side, >>> and the "application" side. The server side invokes a callable >>> object that is provided by the application side. The specifics >>> of how that object is provided are up to the server or gateway. >>> It is assumed that some servers or gateways will require an >>> application's deployer to write a short script to create an >>> instance of the server or gateway, and supply it with the >>> application object. Other servers and gateways may use >>> configuration files or other mechanisms to specify where the >>> application object should be imported from. >> >> >> Maybe "gateway" is just distracting. > > > Do you have a specific suggestion here? Use only the term "server". >>> class AppClass: >>> """Much the same thing, but as a class""" >>> def __init__(self, environ, start_response): >>> self.environ = environ >>> self.start = start_response >>> def __iter__(self): >>> status = '200 OK' >>> headers = [('Content-type','text/plain')] >>> self.start(status, headers) >>> yield "Hello world!\n" >>> for i in range(1,11): >>> yield "Extra line %s\n" % i >> >> >> This second example confuses me. Though as I reread it I realize more >> clearly what it's doing; __init__ is the callable (in essence), but >> self is automatically returned. I think an instance with a __call__ >> method would be easier to understand. OTOH, there's more concurrency >> overhead. I dunno. Anyway, that one confused me. > > > Perhaps you could suggest some text to add to the docstring that would > have prevented your initial confusion? I think it makes sense when you see it in action, i.e.,: AppClass *is* the application object (*not* instances of AppClass). AppClass(environ, start_response) starts the response; it returns an instance of itself, which is an iterator that produces the content. I see what really confused me. 
Shouldn't that be more like:

class AppClass:
    def __init__(self, environ, start_response):
        self.environ = environ
        status = '200 OK'
        headers = [('Content-type', 'text/plain')]
        start_response(status, headers)
        # return self is implicit
    def __iter__(self):
        yield "Hello world!\n"
        for i in range(1, 11):
            yield "Extra line %s\n" % i

Running start_response in __iter__ seems strange to me. Maybe it's correct, but I expect the call sequence to be:

    application(environ, start_response)
    start_response(status, headers) returns write()
    possible write() calls
    application returns iterable
    server uses iterable

In this example, the write() function is only created after you start the iteration. Maybe that's fine, I'm not sure -- it's a little odd, because when you start the iteration you expect to be getting the body, but the headers haven't been sent yet. Of course, you ensure the headers get sent, but it definitely confuses me. >> I guess "function" is more specific than "callable", but it seems >> easier to understand. Though honestly, I find the CGI example the >> easiest way to understand this, so maybe being more accurate here is >> fine. > > > I've got to explain *somewhere* that these are any callable. Maybe I > should preface the overview with an explanation of what "a callable" > means, and reinforce it once or twice in the form "such and such is a > callable (function, method, class, callable instance, etc.) that blah > blah blah". Sure. But it might not be that big a deal -- I think just using names more often might help. "The write callable", for instance, instead of "a callable". >>> ``wsgi.last_call`` This value should be true if this is expected >>> to be the last invocation of the application >>> in this process. This is provided to allow >>> applications to optimize their setup for >>> long-running vs. short-running scenarios. >>> This flag should normally only be true for >>> CGI applications, or while a server is doing >>> some kind of "graceful shutdown".
Note that >>> a server or gateway is still allowed to invoke >>> the application again; this flag is only >>> a "suggestion" to the application that it is >>> unlikely to be reinvoked. >> >> >> wsgi.last_call seems to complicated from this. > > > It's precisely what you agreed to as a solution for your issue. > Granted, I was also surprised by how long the "official" explanation of > the feature turned out to be. Yes, it's what I agreed to. But looking at the length of the description, I think I was wrong; it shouldn't be that complicated to explain. >> Really, it's for CGI and nothing else. Maybe just wsgi.cgi? >> wsgi.run_once? I think the semantics shouldn't be any more general >> than that. Then we can also guarantee that it won't be called again. > > > I'm really reluctant to require the server to make such a guarantee. My > understanding of your use case is really more like, "I'm not likely to > run you again for a while, so don't optimize for frequent execution." > > Hm. Now that I'm thinking about it more, it seems to me that this could > be just as easily handled by application/framework-side configuration, > and I'm inclined to remove it from the spec altogether. That was initially how multithreaded and multiprocess were going to be handled too, but I think it's really important that those be specified. CGI is the only realistic use case for this feature, but it's a really common use case (since it's really just a widely supported standard that we are building on), and it presents a distinct set of problems for Python. I don't see any reason not to just be explicit about being in a CGI environment -- every server will clearly know if it's in a CGI environment, every application can ignore it if it chooses, everyone will know exactly what it means in the spec.
>>> Middleware components that transform the request or response data >>> should in general remove WSGI extension data from the ``environ`` >>> that the middleware does not understand, to prevent applications >>> from inadvertently bypassing the middleware's mediation of the >>> interaction by use of a server extension. The simplest way to do >>> this is to just delete keys from ``environ`` that are all lowercase >>> and do not begin with ``"wsgi."``, before passing the ``environ`` >>> on to the application. >> >> >> I don't understand this. To me it seems more reasonable that >> middleware leave the extra arguments in place. >> >> For instance, lets say I have a URL redirecting middleware. There's a >> chance I need to look at the parsed form of QUERY_STRING, and I cache >> the result as a dictionary in, say, webkit.query_vars. That's just as >> valid later. Oh, well, unless someone rewrites QUERY_STRING. So to >> be safe, I put the query string I parsed in webkit.query_string. >> >> But maybe I have some other middleware that handles configuration. It >> runs after the URL parser, for localized configuration. It doesn't >> necessarily know about the query string, or about the other piece of >> middleware. And it shouldn't know about it, because what would be the >> point of that? They are decoupled. But I don't want it throwing away >> that information. >> >> In that case, it's just some lost time reparsing the URL, but I can >> imagine more important things, and a lot of pieces of middleware where >> the only point is that they add something to the environ dictionary. >> E.g., a session-handling middleware. There's not point to these if >> other middleware is going to throw information away. >> >> If there's reliability issues -- like middleware rewriting >> QUERY_STRING, but passing through a cached parse of the old >> QUERY_STRING that it didn't know about -- these can be handled pretty >> easily. 
But if one middleware throws away keys it doesn't know about, >> it messes up the whole stack. > > > You're right. The extension mechanism needs to be clearer. Instead of > throwing away everything, there needs to be a way to identify that a > server-supplied value may be used in place of some WSGI functionality, > so that middleware can remove only those items, rather than every item. > > Hmmm. Maybe we should have a 'wsgi.extensions' key that contains a > dictionary for items that middleware *must* either understand, or not > pass through. If a framework or middleware author did your hypothetical > query string parsing, he would have to place it in 'wsgi.extensions' if > he did not implement the cross-check you describe. I'm quite comfortable with solving this on an ad hoc basis. Generally the issue is middleware that rewrites the environment, but some extension depends on a value in the environment and isn't simultaneously updated. In general, keeping a note about what the value of the key was will work fine, in the small number of cases where it is an issue. Then it's up to the extension-using application (and middleware) to agree on a reliable way to do things, and other pieces of middleware don't need to worry about any of it. I guess the problem is that someone might build in a dependency, but not be careful about it, and bugs would only arise in the presence of some middleware which the author didn't test with. It's the same issue if the author doesn't set wsgi.extensions properly, though that's more explicit and maybe harder to miss. > Sigh. This will probably need to be a new section on "WSGI Extensions and Middleware". -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From mnot at mnot.net Mon Aug 23 00:10:40 2004 From: mnot at mnot.net (Mark Nottingham) Date: Mon Aug 23 00:10:44 2004 Subject: [Web-SIG] HTTP header canonicalization?
In-Reply-To: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> References: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> Message-ID: <1A41660E-F488-11D8-82BE-000A95BD86C0@mnot.net> The only problem I'm aware of is Set-Cookie, which can have an unquoted expires date in it; e.g.,

    Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT

If you have two of these, the comma after the day (here, "Wednesday") makes parsing problematic. Note that this is only specified in the original netscape cookie spec [1], not the State Management RFC [2]. See section 10.1.2 of [2] for more discussion of this issue. So, you *shouldn't* see these, especially since WSGI is about the server side. All the same, I'll ask around to see how often they're still seen in the wild. It would also be interesting to hear from people working on WSGI application frameworks to find out how many expect to set multiple cookies with expires (as opposed to max-age) in at least one; it might be best to simply disallow doing so, or to require quoting. Regarding ordering of headers with different names: I don't think so. Note that HTTP says """it is "good practice" to send general-header fields first, followed by request-header or response-header fields, and ending with the entity-header fields.""" This isn't very strict, though. WRT header length limitations, most people start to get nervous when they get larger than 2048 characters; some proxies (esp. older ones) did limit there, or even at 1024 characters. Note that headers can be split into multiple lines as well as multiple instances; e.g.,

    Example: foo, bar

is equivalent to

    Example: foo
    Example: bar

and

    Example: foo,
     bar

Overall, I think that modelling headers as a dictionary in the application and passing them in that form to a server is a good thing, as long as the Set-Cookie issue is kept in mind.
Servers might have to modify their serialisation on the wire to account for line lengths and aesthetics (generally, the only time you run into line length problems is when you're extending HTTP to do non-browsing things), but that doesn't need to be exposed to the application. Cheers, 1. http://wp.netscape.com/newsref/std/cookie_spec.html 2. http://rfc2109.x42.com/ On Aug 22, 2004, at 11:16 AM, Phillip J. Eby wrote: > Does anybody see any issues with this? The upside is that it makes it > easy for servers/gateways to add missing headers (using > 'headerdict.setdefault()'), and it should also be easier for > application/framework developers to build up their headers > incrementally in the same way. > > The only downsides I see that could possibly come up are: > > * There's some reason to have headers with different names in a > specific order, even though the spec is adamant that such an ordering > is insignificant and not to be relied upon. > > * There's some reason to split multi-value headers into separate > header lines, even though the spec is adamant that the forms are > equivalent, and that HTTP has no limitations on line length. > > Does anybody know whether any HTTP clients in practice are affected by > these matters? -- Mark Nottingham http://www.mnot.net/ From pje at telecommunity.com Mon Aug 23 00:14:43 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 23 00:14:28 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <4128F19C.1060500@colorstudy.com> References: <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> At 02:18 PM 8/22/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >> >>Do you have a specific suggestion here? > >Use only the term "server". 
I'm rather reluctant to do that, because CGI, FastCGI, and many other such systems are "gateways" rather than servers per se. Technically, I would only consider a web server that's written in Python, or embeds Python, to be capable of being a "server" per the spec. Other servers must be accessed via a "gateway" written in Python. Certainly it doesn't make sense to talk about a CGI "server", for example. >running start_response in __iter__ seems strange to me. Maybe it's >correct, but I expect the call sequence to be: > >application(environ, start_response) > start_response(status_code, environ) returns write() > possible write() calls >application returns iterable >server uses iterable > >In this example, the write() function only is created after you start the >iteration. Maybe that's fine, I'm not sure -- it's a little odd, because >when you start the iteration you expect to be getting the body, but the >headers haven't been sent yet. Of course, you ensure the headers get >sent, but it definitely confuses me. Darn. I guess now I'll have to explain this part, too. :) The intent of the spec is to allow start_response() to be called during the first iteration of the iterator. That is, you must have called start_response() at least by the time the first body part is yielded from the iterator. I illustrated this in the example, but forgot to mention it in the text. I'm correcting this now. >>> Really, it's for CGI and nothing else. Maybe just wsgi.cgi? >>>wsgi.run_once? I think the semantics shouldn't be any more general than >>>that. Then we can also guarantee that it won't be called again. >> >>I'm really reluctant to require the server to make such a guarantee. My >>understanding of your use case is really more like, "I'm not likely to >>run you again for a while, so don't optimize for frequent execution." >>Hm. 
Now that I'm thinking about it more, it seems to me that this could >>be just as easily handled by application/framework-side configuration, >>and I'm inclined to remove it from the spec altogether. > >That was initially how multithreaded and multiprocess were going to be >handled too, but I think it's really important that those be >specified. CGI is the only realistic use case for this feature, but it's >a really common use case (since it's really just a widely supported >standard that we are building on), and it presents a distinct set of >problems for Python. I don't see any reason not to just be explicit about >being in a CGI environment -- every server will clearly know if it's in a >CGI environment, every application can ignore it if it chooses, everyone >will know exactly what it means in the spec. Alright. Let's make it 'wsgi.run_once'. Here's my attempt at a shorter explanation: ``wsgi.run_once`` This value should be true if the server/gateway expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a gateway based on CGI (or something similar). >>You're right. The extension mechanism needs to be clearer. Instead of >>throwing away everything, there needs to be a way to identify that a >>server-supplied value may be used in place of some WSGI functionality, so >>that middleware can remove only those items, rather than every item. >>Hmmm. Maybe we should have a 'wsgi.extensions' key that contains a >>dictionary for items that middleware *must* either understand, or not >>pass through. If a framework or middleware author did your hypothetical >>query string parsing, he would have to place it in 'wsgi.extensions' if >>he did not implement the cross-check you describe. > >I'm quite comfortable with solving this on an ad hoc basis.
Generally the >issue is middleware that rewrites the environment, but some extension >depends on a value in the environment and isn't simultaneously >updated. In general, keeping a note about what the value of the key was >will work fine, in those small number of cases where it is an issue. Then >it's up to the extension-using application (and middleware) to agree on a >reliable way to do things, and other pieces of middleware don't need to >worry about any of it. > >I guess the problem is that someone might build in a dependency, but not >be careful about it, and bugs would only arise in the presence of some >middleware which the author didn't test with. It's the same issue if the >author doesn't set wsgi.extensions properly, though that's more explicit >and maybe harder to miss. Here's the use case I'm thinking of. Suppose mod_python wants to expose some nifty super-duper API that an application can use in place of pure WSGI, if it's present. But, this interface maybe bypasses certain features that a particular piece of middleware is intended to intercept. So, my idea here is that if mod_python puts that API into a key in 'wsgi.extensions', then any middleware will know it's safely "intercepting communications" if it discards any 'wsgi.extensions'. This is different from the sort of scenario you're talking about, where you can have cached data include a record of its dependencies to ensure correctness. 
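Ian's "keep a note about what the value of the key was" approach, which is the cached-data scenario Phillip distinguishes here, might look like the following minimal sketch. The 'webkit.query_vars' and 'webkit.query_string' keys come from Ian's hypothetical example earlier in the thread; the function name is illustrative, and urllib.parse.parse_qs is the modern location of the parser (cgi.parse_qs at the time).

```python
from urllib.parse import parse_qs  # cgi.parse_qs in the Python of 2004


def get_query_vars(environ):
    """Return the parsed query string, reparsing if QUERY_STRING changed.

    The parse is cached under 'webkit.query_vars' together with the raw
    string it came from ('webkit.query_string'), so middleware that
    rewrites QUERY_STRING without knowing about the cache cannot make
    the cached value stale.
    """
    current = environ.get('QUERY_STRING', '')
    if environ.get('webkit.query_string') != current:
        environ['webkit.query_vars'] = parse_qs(current)
        environ['webkit.query_string'] = current
    return environ['webkit.query_vars']
```

Middleware that knows nothing about this cache can pass it through untouched; the dependency check is done by whoever reads it.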
So here's the idea: * If you provide an alternative mechanism or extension to a WSGI-supplied facility, you place it in the 'wsgi.extensions' dictionary * If you're middleware that simply adds additional data to the 'environ', do so, recording your dependencies if any, to avoid becoming "stale" if other middleware changes things * If you're middleware that makes changes to existing variables, or intercepts any WSGI operations, do 'environ["wsgi.extensions"].clear()' or delete any extensions you can't intercept, to prevent the underlying application from "going around" you. Your thoughts? From pje at telecommunity.com Mon Aug 23 00:30:12 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 23 00:29:57 2004 Subject: [Web-SIG] HTTP header canonicalization? In-Reply-To: <1A41660E-F488-11D8-82BE-000A95BD86C0@mnot.net> References: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com> At 03:10 PM 8/22/04 -0700, Mark Nottingham wrote: >The only problem I'm aware of is Set-Cookie, which can have an unquoted >expires date in it; e.g., > > Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, > 09-Nov-99 23:12:40 GMT > >If you have two of these, the comma after the day (here, "Wednesday") >makes parsing problematic. > >Note that this is only specified in the original netscape cookie spec [1], >not the State Management RFC [2]. See section 10.1.2 of [2] for more >discussion of this issue. > >So, you *shouldn't* see these, especially since WSGI is about the server >side. All the same, I'll ask around to see how often they're still seen in >the wild. Unfortunately, this seems like something that's awfully likely to be present in Python frameworks "in the wild". >Regarding ordering of headers with different names; I don't think so. 
Note >that HTTP says > >"""it is "good practice" to send general-header fields first, followed by >request-header or response-header fields, and ending with the >entity-header fields.""" > >This isn't very strict, though. I was thinking that servers that want to follow "good practice" could just have a list of headers in the desirable order, pulling them out of the dictionary first. In practice, *not* doing this simply means that every application or framework has to know what order headers "belong" in, so this doesn't seem like a terrible thing. >Overall, I think that modelling headers as dictionary in the application >and passing them in that form to a server is a good thing, as long as the >Set-Cookie issue is kept in mind. Servers might have to modify their >serialisation on the wire to account for line lengths and aesthetics >(generally, the only time you run into line length problems is when you're >extending HTTP to do non-browsing things), but that doesn't need to be >exposed to the application. Maybe a dictionary of lists would work? That is, the ``headers`` field would look like: {'content-type': ['text/plain'], 'content-length': ['1234'], ...} This would be perhaps annoying for specifying simpler fields, but it would still be easy to write utility functions to manipulate headers. For the content, I'm thinking we should still prohibit embedded control characters, but note that the server is allowed to "fold" long header lines if it wishes (by replacing one or more whitespace characters with '\r\n '). 
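A minimal sketch of how a server might serialize the proposed dictionary of lists, assuming lowercase keys and one header line per list item. Following the "good practice" Mark quotes, general headers go first and response headers next; everything else (entity headers such as Content-Type, and extensions such as Set-Cookie) follows in alphabetical order. The function names and ordering tuples are illustrative; the tuples list the general-header and response-header names from RFC 2616. Embedded control characters are rejected, as suggested above.

```python
GENERAL = ('cache-control', 'connection', 'date', 'pragma', 'trailer',
           'transfer-encoding', 'upgrade', 'via', 'warning')
RESPONSE = ('accept-ranges', 'age', 'etag', 'location', 'proxy-authenticate',
            'retry-after', 'server', 'vary', 'www-authenticate')
_RANK = {name: i for i, name in enumerate(GENERAL + RESPONSE)}


def _canonical(name):
    # 'content-type' -> 'Content-Type' (header names are case-insensitive)
    return '-'.join(part.capitalize() for part in name.split('-'))


def serialize_headers(headers):
    """Turn {lowercase-name: [values]} into a list of 'Name: value' lines."""
    lines = []
    ordered = sorted(headers, key=lambda n: (_RANK.get(n, len(_RANK)), n))
    for name in ordered:
        for value in headers[name]:
            # Refuse CR, LF and other control characters outright.
            if any(ord(ch) < 32 or ord(ch) == 127 for ch in value):
                raise ValueError('control character in %s header' % name)
            lines.append('%s: %s' % (_canonical(name), value))
    return lines
```

Emitting each list item as its own line means two Set-Cookie values never have to be comma-joined, which sidesteps the unquoted expires problem.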
From ianb at colorstudy.com Mon Aug 23 00:41:31 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 23 00:41:36 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> References: <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> Message-ID: <4129211B.2020101@colorstudy.com> Phillip J. Eby wrote: > At 02:18 PM 8/22/04 -0500, Ian Bicking wrote: > >> Phillip J. Eby wrote: >> >>> >>> Do you have a specific suggestion here? >> >> >> Use only the term "server". > > > I'm rather reluctant to do that, because CGI, FastCGI, and many other > such systems are "gateways" rather than servers per se. Technically, I > would only consider a web server that's written in Python, or embeds > Python, to be capable of being a "server" per the spec. Other servers > must be accessed via a "gateway" written in Python. Certainly it > doesn't make sense to talk about a CGI "server", for example. Okay, that's fine then. >>>> Really, it's for CGI and nothing else. Maybe just wsgi.cgi? >>>> wsgi.run_once? I think the semantics shouldn't be any more general >>>> than that. Then we can also guarantee that it won't be called again. >>> >>> >>> I'm really reluctant to require the server to make such a guarantee. >>> My understanding of your use case is really more like, "I'm not >>> likely to run you again for a while, so don't optimize for frequent >>> execution." >>> Hm. Now that I'm thinking about it more, it seems to me that this >>> could be just as easily handled by application/framework-side >>> configuration, and I'm inclined to remove it from the spec altogether. 
>> >> >> That was initially how multithreaded and multiprocess were going to be >> handled too, but I think it's really important that those be >> specified. CGI is the only realistic use case for this feature, but >> it's a really common use case (since it's really just a widely >> supported standard that we are building on), and it presents a >> distinct set of problems for Python. I don't see any reason not to >> just be explicit about being in a CGI environment -- every server will >> clearly know if it's in a CGI environment, every application can >> ignore it if it chooses, everyone will know exactly what it means in >> the spec. > > > Alright. Let's make it 'wsgi.run_once'. Here's my attempt at a shorter > explanation: > > ``wsgi.run_once`` This value should be true if the server/gateway > expects (but does not guarantee!) that the > application will only be invoked this one time > during the life of its containing process. > Normally, this will only be true for a gateway > based on CGI (or something similar). Is there a reason it can't be guaranteed? >>> You're right. The extension mechanism needs to be clearer. Instead >>> of throwing away everything, there needs to be a way to identify that >>> a server-supplied value may be used in place of some WSGI >>> functionality, so that middleware can remove only those items, rather >>> than every item. >>> Hmmm. Maybe we should have a 'wsgi.extensions' key that contains a >>> dictionary for items that middleware *must* either understand, or not >>> pass through. If a framework or middleware author did your >>> hypothetical query string parsing, he would have to place it in >>> 'wsgi.extensions' if he did not implement the cross-check you describe. >> >> >> I'm quite comfortable with solving this on an ad hoc basis. Generally >> the issue is middleware that rewrites the environment, but some >> extension depends on a value in the environment and isn't >> simultaneously updated.
In general, keeping a note about what the >> value of the key was will work fine, in those small number of cases >> where it is an issue. Then it's up to the extension-using application >> (and middleware) to agree on a reliable way to do things, and other >> pieces of middleware don't need to worry about any of it. >> >> I guess the problem is that someone might build in a dependency, but >> not be careful about it, and bugs would only arise in the presence of >> some middleware which the author didn't test with. It's the same >> issue if the author doesn't set wsgi.extensions properly, though >> that's more explicit and maybe harder to miss. > > > Here's the use case I'm thinking of. Suppose mod_python wants to expose > some nifty super-duper API that an application can use in place of pure > WSGI, if it's present. But, this interface maybe bypasses certain > features that a particular piece of middleware is intended to > intercept. So, my idea here is that if mod_python puts that API into a > key in 'wsgi.extensions', then any middleware will know it's safely > "intercepting communications" if it discards any 'wsgi.extensions'. > > This is different from the sort of scenario you're talking about, where > you can have cached data include a record of its dependencies to ensure > correctness. > > So here's the idea: > > * If you provide an alternative mechanism or extension to a > WSGI-supplied facility, you place it in the 'wsgi.extensions' dictionary > > * If you're middleware that simply adds additional data to the > 'environ', do so, recording your dependencies if any, to avoid becoming > "stale" if other middleware changes things > > * If you're middleware that makes changes to existing variables, or > intercepts any WSGI operations, do 'environ["wsgi.extensions"].clear()' > or delete any extensions you can't intercept, to prevent the underlying > application from "going around" you. > > Your thoughts? Okay, that seems reasonable. 
For instance, I could imagine mod_python putting its Apache request object in an extension. Something like an exception-catching middleware wouldn't really care about this sort of thing, so it wouldn't clear the extensions, but a middleware that filtered the output wouldn't want that extension around. I guess a general rule would be that any extension that provided a route around input/output should be in wsgi.extensions, and any middleware that relies on input and output should clear those extensions. Should that rule also apply to the other environmental variables? -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From mnot at mnot.net Mon Aug 23 00:41:48 2004 From: mnot at mnot.net (Mark Nottingham) Date: Mon Aug 23 00:41:52 2004 Subject: [Web-SIG] HTTP header canonicalization? In-Reply-To: <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com> References: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com> Message-ID: <73D3244D-F48C-11D8-82BE-000A95BD86C0@mnot.net> On Aug 22, 2004, at 3:30 PM, Phillip J. Eby wrote: > At 03:10 PM 8/22/04 -0700, Mark Nottingham wrote: >> The only problem I'm aware of is Set-Cookie, which can have an >> unquoted expires date in it; e.g., >> >> Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, >> 09-Nov-99 23:12:40 GMT >> >> If you have two of these, the comma after the day (here, "Wednesday") >> makes parsing problematic. >> >> Note that this is only specified in the original netscape cookie spec >> [1], not the State Management RFC [2]. See section 10.1.2 of [2] for >> more discussion of this issue. >> >> So, you *shouldn't* see these, especially since WSGI is about the >> server side. All the same, I'll ask around to see how often they're >> still seen in the wild. 
> > Unfortunately, this seems like something that's awfully likely to be > present in Python frameworks "in the wild". I'm honestly not sure. That was my assumption until recently, but I'm hopeful that RFC2109 may have reduced the need to accommodate this. Since it's a server-side framework, it can enforce conformance to the RFCs (there are other problems with using Expires on cookies anyway, esp. WRT caching) if it so chooses, as long as the application frameworks are willing to accept that. >> Regarding ordering of headers with different names; I don't think so. >> Note that HTTP says >> >> """it is "good practice" to send general-header fields first, >> followed by request-header or response-header fields, and ending with >> the entity-header fields.""" >> >> This isn't very strict, though. > > I was thinking that servers that want to follow "good practice" could > just have a list of headers in the desirable order, pulling them out > of the dictionary first. In practice, *not* doing this simply means > that every application or framework has to know what order headers > "belong" in, so this doesn't seem like a terrible thing. Agreed. > Maybe a dictionary of lists would work? That is, the ``headers`` > field would look like: > > {'content-type': ['text/plain'], 'content-length': ['1234'], ...} > > This would be perhaps annoying for specifying simpler fields, but it > would still be easy to write utility functions to manipulate headers. Would implementations be required to separate multiple header values into different list items? > For the content, I'm thinking we should still prohibit embedded > control characters, but note that the server is allowed to "fold" long > header lines if it wishes (by replacing one or more whitespace > characters with '\r\n '). That *may* get tricky if it does so in the middle of quoted content, e.g., Example: foo="bar baz" if whitespace is significant inside the quotes. 
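Mark's concern suggests a conservative rule: fold only at whitespace outside a quoted-string. A minimal sketch, assuming the value is folded on its own (the "Name: " prefix is not counted toward the width) and that double quotes with backslash escapes are the only quoting the server needs to understand; the function name and the 76-column default are illustrative. When no safe break exists, the value is left long rather than split.

```python
def fold_header_value(value, width=76):
    """Fold a header value with CRLF + space, never inside double quotes."""
    # Collect positions of spaces that sit outside quoted-strings.
    breaks = []
    in_quotes = False
    escaped = False
    for i, ch in enumerate(value):
        if escaped:
            escaped = False          # character after a backslash is literal
        elif in_quotes and ch == '\\':
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == ' ' and not in_quotes:
            breaks.append(i)

    lines = []
    start = 0          # index where the current output line begins
    candidate = None   # most recent safe break on the current line
    for pos in breaks + [len(value)]:
        if pos - start > width and candidate is not None and candidate > start:
            lines.append(value[start:candidate])
            start = candidate + 1    # the dropped space is re-supplied by '\r\n '
            candidate = None
        if pos < len(value):
            candidate = pos
    lines.append(value[start:])
    return '\r\n '.join(lines)
```

Replacing each '\r\n ' with a single space recovers the original value, which is the property a "safe" fold needs.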
-- Mark Nottingham http://www.mnot.net/ From ianb at colorstudy.com Mon Aug 23 00:59:04 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 23 00:59:07 2004 Subject: [Web-SIG] HTTP header canonicalization? In-Reply-To: <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com> References: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com> Message-ID: <41292538.8070400@colorstudy.com> Phillip J. Eby wrote: > At 03:10 PM 8/22/04 -0700, Mark Nottingham wrote: > >> The only problem I'm aware of is Set-Cookie, which can have an >> unquoted expires date in it; e.g., >> >> Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, >> 09-Nov-99 23:12:40 GMT >> >> If you have two of these, the comma after the day (here, "Wednesday") >> makes parsing problematic. >> >> Note that this is only specified in the original netscape cookie spec >> [1], not the State Management RFC [2]. See section 10.1.2 of [2] for >> more discussion of this issue. >> >> So, you *shouldn't* see these, especially since WSGI is about the >> server side. All the same, I'll ask around to see how often they're >> still seen in the wild. > > > Unfortunately, this seems like something that's awfully likely to be > present in Python frameworks "in the wild". I don't know if that's true. Most (all?) frameworks have an explicit way of setting cookies, rather than having applications generate Set-Cookie headers on their own. Since they have to be modified for WSGI, changing this might not be so bad. Though right now the standard Cookie class does create multiple headers. Many (most?) frameworks also use a dictionary representation for headers as well, sometimes with distinct methods for adding and setting headers (where adding creates a list of values, but only if it has to). Several independent response implementations seem to work this way, so it's pretty common. 
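The framework-side pattern Ian describes, separate "set" and "add" operations over a name-to-values mapping, maps directly onto a dictionary of lists. A minimal sketch; the class and function names are hypothetical, and unlike some frameworks it always stores lists rather than switching representation only when a second value arrives. The cookie helper uses the standard library's SimpleCookie (the Cookie module in Python 2, http.cookies today) and adds each morsel as its own Set-Cookie value instead of calling output(), which renders complete header lines.

```python
from http.cookies import SimpleCookie  # the 'Cookie' module in Python 2


class ResponseHeaders:
    """Hypothetical response-header store: lowercase name -> list of values."""

    def __init__(self):
        self._headers = {}

    def set(self, name, value):
        # Replace whatever was there before.
        self._headers[name.lower()] = [value]

    def add(self, name, value):
        # Keep existing values; each list item becomes its own header line.
        self._headers.setdefault(name.lower(), []).append(value)

    def as_dict(self):
        # Copy the lists so callers cannot mutate internal state.
        return {name: list(values) for name, values in self._headers.items()}


def add_cookies(headers, cookie):
    """Add one Set-Cookie value per morsel in a SimpleCookie."""
    for morsel in cookie.values():
        headers.add('Set-Cookie', morsel.OutputString())
```

Since each cookie stays a separate list item, a Netscape-style expires date with its embedded comma never has to share a line with another cookie.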
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Mon Aug 23 01:26:49 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 23 01:26:34 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <4129211B.2020101@colorstudy.com> References: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> At 05:41 PM 8/22/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>Alright. Let's make it 'wsgi.run_once'. Here's my attempt at a shorter >>explanation: >>``wsgi.run_once`` This value should be true if the server/gateway >> expects (but does not guarantee!) that the >> application will only be invoked this one time >> during the life of its containing process. >> Normally, this will only be true for a gateway >> based on CGI (or something similar). > >Is there a reason it can't be guaranteed? Is there a reason it *should* be guaranteed? :) The last time we had this discussion (December?), I thought you'd decided that the standard library's "atexit" facility was sufficient to cover your use case if a guarantee was needed here. (I only just remembered the "atexit" discussion, or I'd have suggested that as the solution instead of introducing 'wsgi.last_call' a few days ago.) >>Here's the use case I'm thinking of. Suppose mod_python wants to expose >>some nifty super-duper API that an application can use in place of pure >>WSGI, if it's present. But, this interface maybe bypasses certain >>features that a particular piece of middleware is intended to >>intercept. 
So, my idea here is that if mod_python puts that API into a >>key in 'wsgi.extensions', then any middleware will know it's safely >>"intercepting communications" if it discards any 'wsgi.extensions'. >>This is different from the sort of scenario you're talking about, where >>you can have cached data include a record of its dependencies to ensure >>correctness. >>So here's the idea: >> * If you provide an alternative mechanism or extension to a >> WSGI-supplied facility, you place it in the 'wsgi.extensions' dictionary >> * If you're middleware that simply adds additional data to the >> 'environ', do so, recording your dependencies if any, to avoid becoming >> "stale" if other middleware changes things >> * If you're middleware that makes changes to existing variables, or >> intercepts any WSGI operations, do 'environ["wsgi.extensions"].clear()' >> or delete any extensions you can't intercept, to prevent the underlying >> application from "going around" you. >>Your thoughts? > >Okay, that seems reasonable. For instance, I could imagine mod_python >putting its Apache request object in an extension. Something like an >exception-catching middleware wouldn't really care about this sort of >thing, so it wouldn't clear the extensions, but a middleware that filtered >the output wouldn't want that extension around. > >I guess a general rule would be that any extension that provided a route >around input/output should be in wsgi.extensions, and any middleware that >relies on input and output should clear those extensions. Should that >rule also apply to the other environmental variables? Actually, there's another way to handle this. Suppose we put the burden on server authors to provide safe extensions? 
Specifically, if a server provides an extension that can be used in place of, or as an extension to, any native WSGI facility (request data, response management, environment, etc.), then that facility *must* respect any changes made by middleware, or generate an appropriate error. An example would be that if mod_python wanted to supply its request object as an extension, it would have to supply a variable like 'mod_python.get_request', which would be a callable taking 'environ' and 'start_response'. If any 'environ' contents supplied by mod_python had changed, or 'start_response' wasn't the 'start_response' it gave to the application, it would have to either provide an alternative object, or raise an error, or return None, or something of that sort. In other words, the burden of verification is on the extender. This would simplify the spec somewhat, since we wouldn't need to introduce 'wsgi.extensions', and we can also drop the suggestion for middleware authors to delete extensions. Middleware is simpler too, it just changes what it needs to and moves on with life. :) We would just have to add a section on how to build "safe" extensions to the spec. From pje at telecommunity.com Mon Aug 23 01:33:05 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 23 01:32:49 2004 Subject: [Web-SIG] HTTP header canonicalization? In-Reply-To: <73D3244D-F48C-11D8-82BE-000A95BD86C0@mnot.net> References: <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com> <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com> <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040822192723.02214ec0@mail.telecommunity.com> At 03:41 PM 8/22/04 -0700, Mark Nottingham wrote: >On Aug 22, 2004, at 3:30 PM, Phillip J. Eby wrote: >>Maybe a dictionary of lists would work? 
That is, the ``headers`` field >>would look like: >> >> {'content-type': ['text/plain'], 'content-length': ['1234'], ...} >> >>This would be perhaps annoying for specifying simpler fields, but it >>would still be easy to write utility functions to manipulate headers. > >Would implementations be required to separate multiple header values into >different list items? No. Readers would be required to look at all list items. >>For the content, I'm thinking we should still prohibit embedded control >>characters, but note that the server is allowed to "fold" long header >>lines if it wishes (by replacing one or more whitespace characters with >>'\r\n '). > >That *may* get tricky if it does so in the middle of quoted content, e.g., > >Example: foo="bar > baz" > >if whitespace is significant inside the quotes. I think I'm going to punt on this by saying that the server can split or fold headers only if it can do so *safely*, where "safely" means, "the server has sufficient understanding of the header's format or semantics". :( A possible alternative is to allow applications to fold their own headers, but I'm reluctant to do this because I fear people using e.g. '\n' when they should use '\r\n' and suchlike. Banning control characters means the server can easily detect when a supplied header is broken, *and* the server knows it always adds a single CRLF to the end of each header. From angryhicKclown at netscape.net Mon Aug 23 06:44:47 2004 From: angryhicKclown at netscape.net (angryhicKclown@netscape.net) Date: Mon Aug 23 06:44:53 2004 Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI Message-ID: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net> Now that I understand what WSGI is intended to be used for, I like it a lot. However, I do have a few suggestions. Although it means more typing, I think the API is too cryptic as-is. I think that applications should be callable, but should have a single parameter: gateway. 
The gateway parameter contains attributes and methods such as environ, start_response(), and write(). This way, it's clearer to the end-user, both in documentation (removing many instances of "callable" and confusion with __init__), and much more natural to many programmers. Finally, I think the most important reason this change should be implemented is that it allows the interface to be easily upgraded without breaking compatibility with older versions. Perhaps (just an example), in the future, there will be a need for a flush() method, in addition to the write() method. In the current version, start_response() would return a tuple of write() and flush(), which would break current compatibility. The only other way I see of doing this using the current spec would be passing a default parameter of the version of the API used, which is ugly. With the enhancement I propose, it is simply a matter of adding a method to the gateway parameter. Here's the example as it is now:

    def simple_app(environ, start_response):
        """Simplest possible application object"""
        status = '200 OK'
        headers = [('Content-type','text/plain')]
        write = start_response(status, headers)
        write('Hello world!\n')

With my enhancements, it would now look like:

    def simple_app(gateway):
        status = '200 OK'
        headers = [('Content-type','text/plain')]
        gateway.start_response(status, headers)
        gateway.write('Hello world!\n')

In my opinion, my proposal looks a bit clearer. My other idea (which follows the previous proposal) is to scrap start_response() entirely, and instead set gateway.status and gateway.headers attributes. The simple app would now look like:

    def simple_app(gateway):
        gateway.status = '200 OK'
        gateway.headers = [('Content-type','text/plain')]  # perhaps gateway.set_header('Content-type','text/plain')?
        gateway.write('Hello world!\n')

Any comments/criticisms are appreciated.

From ianb at colorstudy.com Mon Aug 23 07:24:20 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 23 07:24:24 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> References: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> Message-ID: <41297F84.1090801@colorstudy.com> Phillip J. Eby wrote: > At 05:41 PM 8/22/04 -0500, Ian Bicking wrote: > >> Phillip J. Eby wrote: >> >>> Alright. Let's make it 'wsgi.run_once'. Here's my attempt at a >>> shorter explanation: >>> ``wsgi.run_once`` This value should be true if the server/gateway >>> expects (but does not guarantee!) that the >>> application will only be invoked this one time >>> during the life of its containing process. >>> Normally, this will only be true for a gateway >>> based on CGI (or something similar). >> >> >> Is there a reason it can't be guaranteed? > > > Is there a reason it *should* be guaranteed? :) The last time we had > this discussion (December?), I thought you'd decided that the standard > library's "atexit" facility was sufficient to cover your use case if a > guarantee was needed here. (I only just remembered the "atexit" > discussion, or I'd have suggested that as the solution instead of > introducing 'wsgi.last_call' a few days ago.) atexit was a different discussion.
I don't know if there's a reason it should be guaranteed, but then I don't know if there's any situation where it wouldn't be guaranteed. I can't imagine it being used outside of a CGI context, and it is guaranteed for CGI. > Actually, there's another way to handle this. Suppose we put the burden > on server authors to provide safe extensions? Specifically, if a server > provides an extension that can be used in place of, or as an extension > to, any native WSGI facility (request data, response management, > environment, etc.), then that facility *must* respect any changes made > by middleware, or generate an appropriate error. > > An example would be that if mod_python wanted to supply its request > object as an extension, it would have to supply a variable like > 'mod_python.get_request', which would be a callable taking 'environ' and > 'start_response'. If any 'environ' contents supplied by mod_python had > changed, or 'start_response' wasn't the 'start_response' it gave to the > application, it would have to either provide an alternative object, or > raise an error, or return None, or something of that sort. In other > words, the burden of verification is on the extender. I can see that working for extensions to the request, but what about extensions to the response? E.g., some mod_python extension could allow for internal redirects -- a useful feature that won't fit into WSGI. There's nothing the extension could do to check for middleware that would be interested, as the middleware that's interested is going to modify the output, not the request. > This would simplify the spec somewhat, since we wouldn't need to > introduce 'wsgi.extensions', and we can also drop the suggestion for > middleware authors to delete extensions. Middleware is simpler too, it > just changes what it needs to and moves on with life. :) We would just > have to add a section on how to build "safe" extensions to the spec. I do like the idea of simplifying this part of the spec. 
If it works. It's also something people can work out on their own. I expect the vast majority of these servers and applications to be open source, and if some pieces don't work together at first there's a feedback loop to fix that. Also, I don't think any of these discussions need to be resolved before this becomes a real PEP. There's going to be more discussion then (no matter how much we discuss now), and this discussion can just be part of that process. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Mon Aug 23 07:24:29 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 23 07:24:32 2004 Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI In-Reply-To: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net> References: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net> Message-ID: <41297F8D.3070907@colorstudy.com> angryhicKclown@netscape.net wrote:
> Here's the example as it is now:
>
>     def simple_app(environ, start_response):
>         """Simplest possible application object"""
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         write = start_response(status, headers)
>         write('Hello world!\n')
>
> With my enhancements, it would now look like:
>
>     def simple_app(gateway):
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         gateway.start_response(status, headers)
>         gateway.write('Hello world!\n')

That does look easier to understand. There'd be no particular reason to put the input stream inside the environ dictionary either. I assume it would simply be an error to use gateway.write before start_response.

> In my opinion, my proposal looks a bit clearer.
>
> My other idea (which follows the previous proposal) is to scrap start_response() entirely, and instead set gateway.status and gateway.headers attributes.
> The simple app would now look like:
>
>     def simple_app(gateway):
>         gateway.status = '200 OK'
>         gateway.headers = [('Content-type','text/plain')]  # perhaps gateway.set_header('Content-type','text/plain')?
>         gateway.write('Hello world!\n')

This is harder to implement and understand. start_response is likely to be an actual action on the part of the gateway; with this model you'd have to detect when both status and headers were set, or on the first call to write, or something like that. I think an explicit start_response is the best idea, whether a method or function. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From jim-web-sig at jimdabell.com Mon Aug 23 07:30:06 2004 From: jim-web-sig at jimdabell.com (Jim Dabell) Date: Mon Aug 23 07:24:45 2004 Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI In-Reply-To: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net> References: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net> Message-ID: <200408230630.06907.jim-web-sig@jimdabell.com> On Monday 23 August 2004 05:44, angryhicKclown@netscape.net wrote: > Although it means more typing, I think the API is too cryptic as-is. I > think that applications should be callable, but should have a single > parameter: gateway. The gateway parameter contains attributes and methods > such as environ, start_response(), and write(). This way, it's clear to the > end-user both in documentation (removing many instances of "callable" and > confusion with __init__) and also is very much more natural to many > programmers. That's the first thing I thought when skimming the draft. Why bother moving tuples around when you can organise the relevant information into an object and simply send that around instead? It's less to keep track of, and an easily extendable interface. > My other idea (which follows the previous proposal) is to scrap > start_response() entirely, and instead set gateway.status and > gateway.headers attributes.
> The simple app would now look like:
>
>     def simple_app(gateway):
>         gateway.status = '200 OK'
>         gateway.headers = [('Content-type','text/plain')]  # perhaps gateway.set_header('Content-type','text/plain')?
>         gateway.write('Hello world!\n')

That's starting to look a lot like a mod_python handler. My only other comment for the time being is that if the status argument to the start_response function was changed to an integer instead of a string, it would be marginally easier to compare and branch on. A custom "reason phrase" that comes after the integer in the response status line can be provided by other means, perhaps gateway.reason_phrase, if desired. -- Jim Dabell From ianb at colorstudy.com Mon Aug 23 07:27:06 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 23 07:27:09 2004 Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI In-Reply-To: <200408230630.06907.jim-web-sig@jimdabell.com> References: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net> <200408230630.06907.jim-web-sig@jimdabell.com> Message-ID: <4129802A.2060102@colorstudy.com> Jim Dabell wrote: > My only other comment for the time being is that if the status argument to the > start_response function was changed to an integer instead of a string, it > would be marginally easier to compare and branch on. A custom "reason > phrase" that comes after the integer in the response status line can be > provided by other means, perhaps gateway.reason_phrase, if desired. I've been thinking: is there anything, anywhere, that pays any attention to the reason string?
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From jim-web-sig at jimdabell.com Mon Aug 23 07:45:36 2004 From: jim-web-sig at jimdabell.com (Jim Dabell) Date: Mon Aug 23 07:40:15 2004 Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI In-Reply-To: <4129802A.2060102@colorstudy.com> References: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net> <200408230630.06907.jim-web-sig@jimdabell.com> <4129802A.2060102@colorstudy.com> Message-ID: <200408230645.37129.jim-web-sig@jimdabell.com> On Monday 23 August 2004 06:27, Ian Bicking wrote: > Jim Dabell wrote: > > My only other comment for the time being is that if the status argument > > to the start_response function was changed to an integer instead of a > > string, it would be marginally easier to compare and branch on. A custom > > "reason phrase" that comes after the integer in the response status line > > can be provided by other means, perhaps gateway.reason_phrase, if > > desired. > > I've been thinking: is there anything, anywhere, that pays any attention > to the reason string? If there is, then it's broken. According to RFC 2616, the reason string is intended for humans, giving localisation as an example of when it may vary. "The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase." "The individual values of the numeric status codes defined for HTTP/1.1, and an example set of corresponding Reason-Phrase's, are presented below. The reason phrases listed here are only recommendations -- they MAY be replaced by local equivalents without affecting the protocol." I don't think I've come across anything that pays attention to the reason phrase, but it's a useful reminder to developers when they are debugging something I suppose. 
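The reason phrase is informational only, so code that branches on the response status can split the string once and compare integers. The sketch below does that. It assumes the status has the form `<3-digit code> <reason phrase>`, as in the draft's `'200 OK'` example. The function name `split_status` is hypothetical.

```python
def split_status(status):
    """Split a status string like '404 Not Found' into (404, 'Not Found').

    The reason phrase is returned untouched and should not drive any
    logic; RFC 2616 allows it to be replaced by local equivalents.
    """
    code, _, reason = status.partition(' ')
    if len(code) != 3 or not code.isdigit():
        raise ValueError("malformed status: %r" % status)
    return int(code), reason


code, reason = split_status('200 OK')
if 200 <= code < 300:
    outcome = 'success'
else:
    outcome = 'other'
```

The conversion also works in reverse: `'%d %s' % (code, reason)` rebuilds the string form. That makes an integer-plus-reason API and the draft's single string easy to convert between.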
-- Jim Dabell From jim-web-sig at jimdabell.com Mon Aug 23 07:46:27 2004 From: jim-web-sig at jimdabell.com (Jim Dabell) Date: Mon Aug 23 07:41:06 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> References: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> Message-ID: <200408230646.28097.jim-web-sig@jimdabell.com> On Monday 23 August 2004 00:26, Phillip J. Eby wrote: > At 05:41 PM 8/22/04 -0500, Ian Bicking wrote: > >Phillip J. Eby wrote: > >>Alright. Let's make it 'wsgi.run_once'. Here's my attempt at a shorter > >>explanation: > >>``wsgi.run_once`` This value should be true if the server/gateway > >> expects (but does not guarantee!) that the > >> application will only be invoked this one time > >> during the life of its containing process. > >> Normally, this will only be true for a gateway > >> based on CGI (or something similar). > > > >Is there a reason it can't be guaranteed? > > Is there a reason it *should* be guaranteed? :) Clarity? I don't know about anybody else, but I would assume something called run_once would only run once - and write code that also assumed this :). -- Jim Dabell From pje at telecommunity.com Mon Aug 23 07:52:35 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 23 07:52:21 2004 Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI In-Reply-To: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net> Message-ID: <5.1.1.6.0.20040823011330.0315e050@mail.telecommunity.com> At 12:44 AM 8/23/04 -0400, angryhicKclown@netscape.net wrote: >Now that I understand what WSGI is intended to be used for, I like it a >lot. However, I do have a few suggestions. > >Although it means more typing, I think the API is too cryptic as-is. So, now that you understand the API, you think it's too cryptic. 
:) All kidding aside, I've made some attempts to make the spec more readable with respect to the various callables, as you'll see in my next draft posting. >I think that applications should be callable, but should have a single >parameter: gateway. The gateway parameter contains attributes and methods >such as environ, start_response(), and write(). This way, it's clear to >the end-user both in documentation (removing many instances of "callable" >and confusion with __init__) and also is very much more natural to many >programmers. I agree that it's more natural, but I disagree that "naturalness" is an important goal for the WSGI spec. The reason is that most of WSGI's initial audience will be implementing exactly *one* server/gateway and/or application, in order to add support for it to their server or application framework. They will thus have "spec in hand" when implementing. It's more important that they be able to easily implement the spec. The second audience for WSGI will be people creating "middleware" components, and they will appreciate the bare-bones nature of WSGI even more, because they will not need to implement a "gateway" class in order to intercept inputs, outputs, or variables. Many fairly sophisticated pieces of middleware will be written as a single function (maybe with one or two nested functions). Best of all, these functions will be *very* explicit as to what they are modifying, because they will not contain code that's needed to emulate functions they aren't replacing. Using multi-functional objects like your "gateway" proposal means that middleware components have to implement the full gateway interface. >Finally, I think the most important reason this change should be >implemented is because it allows the interface to be easily upgraded >without breaking compatibility with older versions. Actually, the current interface includes *numerous* routes for extension, including additional 'wsgi.' keys, and keyword arguments to callables. 
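Phillip describes middleware written as a single function with a few nested functions. Below is a sketch of that style, written against the calling convention of the examples quoted in this thread, where `start_response(status, headers)` returns a `write` callable. Upper-casing the output stands in for a real transformation such as an XSLT step. The names `upcase_middleware`, `hello_app`, and `run` are hypothetical. `run` is a toy in-memory stand-in for a server, not a real one.

```python
def upcase_middleware(app):
    """Wrap an app so its text output is upper-cased (stand-in transform)."""

    def wrapped_app(environ, start_response):
        def my_start_response(status, headers):
            # The transform may change the body length, so drop any
            # Content-Length the inner app supplied.
            headers = [(name, value) for name, value in headers
                       if name.lower() != 'content-length']
            upstream_write = start_response(status, headers)

            def my_write(data):
                # Transform each chunk, then pass it to the server's write().
                upstream_write(data.upper())

            return my_write

        # environ is passed through untouched; only two callables are wrapped.
        result = app(environ, my_start_response)
        # Output may also arrive as an iterable (collected eagerly for brevity).
        return [chunk.upper() for chunk in result]

    return wrapped_app


def hello_app(environ, start_response):
    write = start_response('200 OK', [('Content-type', 'text/plain'),
                                      ('Content-Length', '13')])
    write('Hello world!\n')
    return []


def run(app):
    """Tiny in-memory stand-in for a server, for illustration only."""
    sent = {}
    body = []

    def start_response(status, headers):
        sent['status'] = status
        sent['headers'] = headers
        return body.append

    for chunk in app({}, start_response):
        body.append(chunk)
    return sent['status'], sent['headers'], ''.join(body)
```

Nothing in `upcase_middleware` has to emulate the parts of the interface it leaves alone. That is the contrast Phillip draws with a full gateway object.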
> Perhaps (just an example), in the future, there will be a need for a > flush() method, in addition to the write() method. In the current > version, start_response() would return a tuple of write() and flush(), > which would break current compatibility. The only other way I see of > doing this using the current spec would be passing a default parameter of > the version of the API used, which is ugly. It would be simple to add a 'wsgi.flush' key to the environ to supply this functionality, were it needed. (Of course, flush() isn't actually needed, because WSGI requires write buffers to always be emptied ASAP.) >In my opinion, my proposal looks a bit clearer. I agree with you, but as I said, it's not a primary goal. WSGI will rarely be used directly by an application developer; it's much more likely that you will use some other Python Web API layered atop WSGI. In other words, the intended audience is developers of servers, frameworks, and middleware. And most framework and server authors will only code to the spec once, probably with the spec in hand so they can check their compliance. I think it's better for them to have an absolutely unequivocal spec that's simple to implement and whose correctness is easy to verify. For example, did you use a dictionary? That's a trivial yes-or-no thing to check, compared to, e.g., "did I implement a sufficiently dictionary-like object?" >My other idea (which follows the previous proposal) is to scrap >start_response() entirely, and instead set gateway.status and >gateway.headers attributes. The simple app would now look like:
>
>     def simple_app(gateway):
>         gateway.status = '200 OK'
>         gateway.headers = [('Content-type','text/plain')]  # perhaps gateway.set_header('Content-type','text/plain')?
>         gateway.write('Hello world!\n')

To properly evaluate your proposal, it's inappropriate to use the application-side code as a basis for comparison.
Compare the *server-side* code, and the code needed to implement various forms of middleware. You will find that the relatively small gain on the application-side code is *rapidly* counterbalanced by the expanding complexity of servers and middleware. For example, to implement a middleware component that applies an XSLT stylesheet, you'll need to create a class that implements all the WSGI methods, and delegates the ones it doesn't need to the previous gateway object. It will also need properties so it can observe the setting of status and headers, and delegate those as well, while tracking what it needs. By comparison, the functional architecture of WSGI allows a middleware component to simply pass through to the next component whatever it doesn't need to change. For example, a middleware component for applying an XSLT stylesheet would only need to define 'start_response' and 'write' replacements, where the 'start_response' simply munged the headers for content type and length, and the 'write' would pump data into the stylesheet mechanism, and call the old write function with any output. These changes are clearly connected to the functionality: there is no overhead being added just so the next component downstream gets a more "object-oriented" interface. (I'm wondering if I should add any of this to the spec, but it already has a paragraph in the Rationale section saying the API is intentionally no-frills, and another one in the Q&A saying "Why is this interface so low-level?". I'm not sure how much more I can add without it seeming overdefensive, although I'm sure I'll get ten times as many more "why don't you use an object" protests once this hits c.l.py. Oh well.) From pje at telecommunity.com Mon Aug 23 08:03:16 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon Aug 23 08:03:02 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <41297F84.1090801@colorstudy.com> References: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> At 12:24 AM 8/23/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>At 05:41 PM 8/22/04 -0500, Ian Bicking wrote: >> >>>Phillip J. Eby wrote: >>> >>>>Alright. Let's make it 'wsgi.run_once'. Here's my attempt at a >>>>shorter explanation: >>>>``wsgi.run_once`` This value should be true if the server/gateway >>>> expects (but does not guarantee!) that the >>>> application will only be invoked this one time >>>> during the life of its containing process. >>>> Normally, this will only be true for a gateway >>>> based on CGI (or something similar). >>> >>> >>>Is there a reason it can't be guaranteed? >> >>Is there a reason it *should* be guaranteed? :) The last time we had >>this discussion (December?), I thought you'd decided that the standard >>library's "atexit" facility was sufficient to cover your use case if a >>guarantee was needed here. (I only just remembered the "atexit" >>discussion, or I'd have suggested that as the solution instead of >>introducing 'wsgi.last_call' a few days ago.) > >atexit was a different discussion. Really? I thought you were asking for something to be called upon exit as a way of addressing this exact same issue: i.e., the app knowing when to clean up after itself. 
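For the clean-up concern Phillip mentions, the standard library's `atexit` module lets an application register a process-exit hook once, however many times it is invoked. This is a minimal sketch. The `cache` resource and all function names are hypothetical. Note that `atexit` hooks do not run if the process is killed by an unhandled signal or exits through `os._exit()`.

```python
import atexit

cache = {}
state = {'cleanup_registered': False}


def cleanup():
    # Runs at normal interpreter exit, however many requests were served.
    cache.clear()


def cleanup_app(environ, start_response):
    if not state['cleanup_registered']:
        # Register the exit hook only on the first invocation.
        atexit.register(cleanup)
        state['cleanup_registered'] = True
    cache[environ.get('PATH_INFO', '/')] = True
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write('cached %d path(s)\n' % len(cache))
    return []
```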
> I don't know if there's a reason it should be guaranteed, but then I > don't know if there's any situation where it wouldn't be guaranteed. I > can't imagine it being used outside of a CGI context, and it is > guaranteed for CGI. Fine. I just don't like it being anything other than a heuristic. Suppose I'm running acceptance tests? My CGI runner will say "you're being run only once", except then I'll run it again when the acceptance test tests another input. But, I want the acceptance test to test the operation of the application when it's in "cgi mode", effectively. So, what I'm saying is that any app that I wrote, I would want 'run_once' or 'last_run' or whatever it was called to *not* be a guarantee of never running again, but only a suggestion to "rig for infrequent running". If my code *actually* relied upon it being a guarantee, then testing scenarios are hosed. >I can see that working for extensions to the request, but what about >extensions to the response? E.g., some mod_python extension could allow >for internal redirects -- a useful feature that won't fit into WSGI. Really? Why not? Let's say that mod_python provides the function, the app calls it, doesn't call 'start_response', and doesn't return an iterator. What does middleware do? Well, presumably it does nothing. Definitely it does nothing if it's an output transformer, or if it just adds things to the request. So, where's the problem? For other kinds of responses, the behavior is as I outlined before: if the extension is replacing the existing functionality, one should have to call a function to get it, passing in the existing functionality (e.g. environ or start_response) so that the extender can verify that critical functions aren't being mediated by middleware. >>This would simplify the spec somewhat, since we wouldn't need to >>introduce 'wsgi.extensions', and we can also drop the suggestion for >>middleware authors to delete extensions. 
Middleware is simpler too, it >>just changes what it needs to and moves on with life. :) We would just >>have to add a section on how to build "safe" extensions to the spec. > >I do like the idea of simplifying this part of the spec. If it works. >It's also something people can work out on their own. I expect the vast >majority of these servers and applications to be open source, and if some >pieces don't work together at first there's a feedback loop to fix that. > >Also, I don't think any of these discussions need to be resolved before >this becomes a real PEP. There's going to be more discussion then (no >matter how much we discuss now), and this discussion can just be part of >that process. Alas, that's all too true. :( From ianb at colorstudy.com Mon Aug 23 08:53:33 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 23 08:53:39 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> References: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> Message-ID: <4129946D.3020408@colorstudy.com> Phillip J. Eby wrote: >> atexit was a different discussion. > > > Really? I thought you were asking for something to be called upon exit > as a way of addressing this exact same issue: i.e., the app knowing when > to clean up after itself. I'm not really that concerned about the cleanup with CGI. 
I don't think it's that important that the application is used only once in a CGI context, but then I don't think it matters much the other way either. But I don't see any advantage for some other server to set run_once when it *thinks* this is the last request (but doesn't know it for sure). In fact, I don't even see a reason for wsgi.run_once if the server *knows* this is the last request, except in the case where the last request is also the first request (i.e., CGI). I just think wsgi.run_once and CGI are the same, and there's no reason to state it any differently than that. And CGI guarantees your application won't be rerun. Or the spec could simply be silent on the matter, without stressing the issue one way or the other. >> I don't know if there's a reason it should be guaranteed, but then I >> don't know if there's any situation where it wouldn't be guaranteed. >> I can't imagine it being used outside of a CGI context, and it is >> guaranteed for CGI. > > > Fine. I just don't like it being anything other than a heuristic. > Suppose I'm running acceptance tests? My CGI runner will say "you're > being run only once", except then I'll run it again when the acceptance > test tests another input. But, I want the acceptance test to test the > operation of the application when it's in "cgi mode", effectively. If you're running multiple unit tests in a single process, you aren't in CGI mode, and you shouldn't set that key. You're in some other mode. If CGI mode really matters, the only test that is accurate is one where you are actually launching a separate process. > So, what I'm saying is that any app that I wrote, I would want > 'run_once' or 'last_run' or whatever it was called to *not* be a > guarantee of never running again, but only a suggestion to "rig for > infrequent running". If my code *actually* relied upon it being a > guarantee, then testing scenarios are hosed. 
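Phillip argues that `wsgi.run_once` should be a hint rather than a guarantee, and this sketch follows that reading. When the flag is set, the application skips building a reusable cache. It stays correct if it is called again anyway, so a test harness can invoke it repeatedly with the flag true. The key name is the one proposed in this thread. `build_lookup_table`, `cached`, and `hint_app` are hypothetical.

```python
cached = {'table': None}


def build_lookup_table():
    # Hypothetical stand-in for an expensive, reusable setup step.
    return {'greeting': 'Hello world!'}


def hint_app(environ, start_response):
    if environ.get('wsgi.run_once'):
        # Probably a one-shot CGI-style process: skip caching, but do not
        # assume this call is the last one (a test harness may call again).
        table = build_lookup_table()
    else:
        if cached['table'] is None:
            cached['table'] = build_lookup_table()
        table = cached['table']
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write(table['greeting'] + '\n')
    return []
```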
> >> I can see that working for extensions to the request, but what about >> extensions to the response? E.g., some mod_python extension could >> allow for internal redirects -- a useful feature that won't fit into >> WSGI. > > > Really? Why not? Let's say that mod_python provides the function, the > app calls it, doesn't call 'start_response', and doesn't return an > iterator. What does middleware do? Well, presumably it does nothing. > Definitely it does nothing if it's an output transformer, or if it just > adds things to the request. So, where's the problem? Well, let's say mod_python adds two extensions. One is to do a local redirect, the other is to do a recursive call. The local redirect would be in wsgi.extensions (if it existed), but the recursive call would not. With wsgi.extensions, the middleware would eliminate the local redirect, and the application would be forced to use the recursive call and write out the result of that. Which is what you would want, because then the middleware would have an opportunity to modify the output. I still can't think of a good way to define wsgi.extensions or give rules for what should go in there. I can see some case for it, but since it's vague I don't think it should be included in the spec. There's room to add it later if it turns out to be important. > For other kinds of responses, the behavior is as I outlined before: if > the extension is replacing the existing functionality, one should have > to call a function to get it, passing in the existing functionality > (e.g. environ or start_response) so that the extender can verify that > critical functions aren't being mediated by middleware. 
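Phillip proposes that the burden of verification fall on the extender. Here is a minimal sketch of how a server might do that. Everything in it is hypothetical: `mod_python.get_request` is only the example name from the thread, and `FakeServer`, `NativeRequest`, `direct_app`, and `path_rewriter` are invented stand-ins. The extension hands out its native request object only under two conditions. The caller must pass back the exact `start_response` the server supplied, and every `environ` entry the server set must be unchanged. Otherwise it returns `None`, one of the options Phillip lists.

```python
class NativeRequest(object):
    """Hypothetical stand-in for a server's own request object."""

    def __init__(self, environ):
        self.path = environ.get('PATH_INFO', '/')


class FakeServer(object):
    """Hypothetical server offering a self-verifying request extension."""

    def __init__(self, base_environ):
        self.base_environ = dict(base_environ)

    def call(self, app):
        environ = dict(self.base_environ)
        supplied = dict(environ)  # snapshot of what the server set

        def start_response(status, headers):
            return lambda data: None  # output is discarded in this sketch

        def get_request(env, sr):
            # Refuse if middleware replaced start_response ...
            if sr is not start_response:
                return None
            # ... or changed any value the server put in environ.
            for key, value in supplied.items():
                if env.get(key) != value:
                    return None
            return NativeRequest(env)

        environ['mod_python.get_request'] = get_request
        return app(environ, start_response)


def direct_app(environ, start_response):
    # For demonstration, return whatever the extension hands back.
    return environ['mod_python.get_request'](environ, start_response)


def path_rewriter(app):
    # Middleware that alters environ before calling the inner app.
    def wrapped(environ, start_response):
        return app(dict(environ, PATH_INFO='/rewritten'), start_response)
    return wrapped
```

Ian's objection about response-side extensions such as internal redirects is not addressed by this check. It only guards the request data and `start_response` that the extension depends on.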
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Mon Aug 23 08:59:51 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 23 08:59:54 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <4129946D.3020408@colorstudy.com> References: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> <4129946D.3020408@colorstudy.com> Message-ID: <412995E7.50605@colorstudy.com> Ian Bicking wrote: >> Fine. I just don't like it being anything other than a heuristic. >> Suppose I'm running acceptance tests? My CGI runner will say "you're >> being run only once", except then I'll run it again when the >> acceptance test tests another input. But, I want the acceptance test >> to test the operation of the application when it's in "cgi mode", >> effectively. > > > If you're running multiple unit tests in a single process, you aren't in > CGI mode, and you shouldn't set that key. You're in some other mode. If > CGI mode really matters, the only test that is accurate is one where you > are actually launching a separate process. Now that I think about it, maybe it does make sense for testing purposes that run_once doesn't mean that it's the last run -- it would be annoyingly slow to start a process for each test, and might make it hard to do real unit tests, but if you have a different code path when wsgi.run_once is true then it's important to test that. 
OTOH, if I'm testing a project, I can make sure that my code doesn't require the process to terminate; code and tests are hardly decoupled after all. Anyway, I guess I retract my concern over this issue. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From andrew at andreweland.org Mon Aug 23 11:37:29 2004 From: andrew at andreweland.org (Andrew Eland) Date: Mon Aug 23 11:45:56 2004 Subject: [Web-SIG] WSGI and sendfile() Message-ID: <4129BAD9.3080104@andreweland.org> The WSGI draft seems to be progressing well; it's great to see some effort at standardisation in this area. I had a couple of thoughts: If write() accepted an object implementing the fileno() method as a parameter, an implementation would be free to use the sendfile() syscall to efficiently send the entire contents of a file descriptor to the client. I don't know whether others think this is useful enough functionality to warrant the extra implementation complexity. Even ignoring the possible efficiency gains (say, if sendfile() is emulated by the implementation), it still reduces the amount of code that needs to be written to serve a static file. There's an asymmetry in streaming. Although the use of iterators allows a single-threaded implementation to stream a response to many clients simultaneously with something like select(), it doesn't work the other way around. If the only access to the request body is via the wsgi.input stream, all reads will be blocking. Although processing many large uploads simultaneously isn't such a common use case when developing websites, it can be one when developing web services. -- Andrew Eland (http://www.andreweland.org) From angryhicKclown at netscape.net Mon Aug 23 16:32:00 2004 From: angryhicKclown at netscape.net (angryhicKclown@netscape.net) Date: Mon Aug 23 16:32:09 2004 Subject: [Web-SIG] RE: Comments/stylistic ideas regarding WSGI Message-ID: <64804A9E.49192330.519F8DB3@netscape.net> >Date: Mon, 23 Aug 2004 01:52:35 -0400 >From: "Phillip J.
Eby" >Subject: Re: [Web-SIG] Comments/stylistic ideas regarding WSGI >To: angryhicKclown@netscape.net, web-sig@python.org >Message-ID: <5.1.1.6.0.20040823011330.0315e050@mail.telecommunity.com> >Content-Type: text/plain; charset="us-ascii"; format=flowed > >At 12:44 AM 8/23/04 -0400, angryhicKclown@netscape.net wrote: >>Now that I understand what WSGI is intended to be used for, I like it a >>lot. However, I do have a few suggestions. >> >>Although it means more typing, I think the API is too cryptic as-is. > >So, now that you understand the API, you think it's too cryptic. :) > >All kidding aside, I've made some attempts to make the spec more readable >with respect to the various callables, as you'll see in my next draft posting. > > >>I think that applications should be callable, but should have a single >>parameter: gateway. The gateway parameter contains attributes and methods >>such as environ, start_response(), and write(). This way, it's clear to >>the end-user both in documentation (removing many instances of "callable" >>and confusion with __init__) and also is very much more natural to many >>programmers. > >I agree that it's more natural, but I disagree that "naturalness" is an >important goal for the WSGI spec. The reason is that most of WSGI's >initial audience will be implementing exactly *one* server/gateway and/or >application, in order to add support for it to their server or application >framework. They will thus have "spec in hand" when implementing. It's >more important that they be able to easily implement the spec. I agree, however "a callable that is passed a callable which returns a callable" could be mind-bending for some people. > >The second audience for WSGI will be people creating "middleware" >components, and they will appreciate the bare-bones nature of WSGI even >more, because they will not need to implement a "gateway" class in order to >intercept inputs, outputs, or variables.
Many fairly sophisticated pieces >of middleware will be written as a single function (maybe with one or two >nested functions). > >Best of all, these functions will be *very* explicit as to what they are >modifying, because they will not contain code that's needed to emulate >functions they aren't replacing. Using multi-functional objects like your >"gateway" proposal means that middleware components have to implement the >full gateway interface. > Not necessarily. They could extend something like this: class Gateway(object): def __init__(self, parent=None): self.parent = parent def __getattribute__(self, key): try: return object.__getattribute__(self, key) except AttributeError: if self.parent != None: return getattr(self.parent, key) else: raise def write(self, data): raise NotImplementedError # ... more standard API functions here and instantiate it with the gateway they were passed from the caller. > >>Finally, I think the most important reason this change should be >>implemented is because it allows the interface to be easily upgraded >>without breaking compatibility with older versions. > >Actually, the current interface includes *numerous* routes for extension, >including additional 'wsgi.' keys, and keyword arguments to callables. > I don't see why this can't be solved with OOP. > >> Perhaps (just an example), in the future, there will be a need for a >> flush() method, in addition to the write() method. In the current >> version, start_response() would return a tuple of write() and flush(), >> which would break current compatibility. The only other way I see of >> doing this using the current spec would be passing a default parameter of >> the version of the API used, which is ugly. > >It would be simple to add a 'wsgi.flush' key to the environ to supply this >functionality, were it needed. (Of course, flush() isn't actually needed, >because WSGI requires write buffers to always be emptied ASAP.)
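A minimal Python 3 sketch of how an optional environ key can add a capability without changing any call signature. 'wsgi.flush' is the hypothetical key from the paragraph above, and the buffering server is invented purely for illustration. The draft itself requires write() to send data immediately, so a real server would not need this. The application feature-detects the key and falls back to a no-op, and a middleware that knows nothing about the key simply passes environ through unchanged.

```python
def app(environ, start_response):
    write = start_response('200 OK', {'content-type': ['text/plain']})
    # Feature-detect the optional extension; fall back to a no-op.
    flush = environ.get('wsgi.flush', lambda: None)
    write('part 1\n')
    flush()
    write('part 2\n')
    return None


def serve(application, offer_flush=True):
    """Toy buffering server; returns the chunks it actually sent."""
    sent, pending = [], []

    def flush():
        if pending:
            sent.append(''.join(pending))
            pending.clear()

    def start_response(status, headers):
        return pending.append  # the draft's write() callable

    environ = {'wsgi.flush': flush} if offer_flush else {}
    application(environ, start_response)
    flush()  # drain whatever is left at the end of the request
    return sent
```

With the key offered, the body goes out in two pieces. Without it, the same application still works and the server sends everything at the end.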
Fair enough, however I think we're trying to solve a problem (extensions) which has already been solved by inheritance. > >>In my opinion, my proposal looks a bit clearer. > >I agree with you, but as I said, it's not a primary goal. WSGI will rarely >be used directly by an application developer; it's much more likely that >you will use some other Python Web API layered atop WSGI. In other words, >the intended audience is developers of servers, frameworks, and >middleware. And most framework and server authors will only code to the >spec once, probably with the spec in hand so they can check their >compliance. I think it's better for them to have an absolutely unequivocal >spec, that's simple to implement and easy to verify the correctness >of. For example, did you use a dictionary? That's a trivial yes-or-no >thing to check, compared to, e.g., "did I implement a sufficiently >dictionary-like object?" "Did I override the write() method?" >>My other idea (which follows the previous proposal) is to scrap >>start_response() entirely, and instead set gateway.status and >>gateway.headers attributes. The simple app would now look like: >> >>     def simple_app(gateway): >>         gateway.status = '200 OK' >>         gateway.headers = [('Content-type','text/plain')] # perhaps >> gateway.set_header('Content-type','text/plain')? >>         gateway.write('Hello world!\n') > >To properly evaluate your proposal, it's inappropriate to use the >application-side code as a basis for comparison. Compare the *server-side* >code, and the code needed to implement various forms of middleware. You >will find that the relatively small gain on the application-side code is >*rapidly* counterbalanced by the expanding complexity of servers and >middleware. For example, to implement a middleware component that applies >an XSLT stylesheet, you'll need to create a class that implements all the >WSGI methods, and delegates the ones it doesn't need to the previous >gateway object.
It will also need properties so it can observe the setting >of status and headers, and delegate those as well, while tracking what it >needs. Proposal withdrawn. >By comparison, the functional architecture of WSGI allows a middleware >component to simply pass through to the next component whatever it doesn't >need to change. For example, a middleware component for applying an XSLT >stylesheet would only need to define 'start_response' and 'write' >replacements, where the 'start_response' simply munged the headers for >content type and length, and the 'write' would pump data into the >stylesheet mechanism, and call the old write function with any output. > >These changes are clearly connected to the functionality: there is no >overhead being added just so the next component downstream gets a more >"object-oriented" interface. OK. >(I'm wondering if I should add any of this to the spec, but it already has >a paragraph in the Rationale section saying the API is intentionally >no-frills, and another one in the Q&A saying "Why is this interface so >low-level?". I'm not sure how much more I can add without it seeming >overdefensive, although I'm sure I'll get ten times as many more "why don't >you use an object" protests once this hits c.l.py. Oh well.) I'd say you should write a short paragraph under "Questions and Answers" regarding it. It's a great proposal thus far; I just think it's not as clean as it could be. From pje at telecommunity.com Mon Aug 23 17:37:03 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Mon Aug 23 17:36:49 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <4129946D.3020408@colorstudy.com> References: <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com> At 01:53 AM 8/23/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>Fine. I just don't like it being anything other than a heuristic. >>Suppose I'm running acceptance tests? My CGI runner will say "you're >>being run only once", except then I'll run it again when the acceptance >>test tests another input. But, I want the acceptance test to test the >>operation of the application when it's in "cgi mode", effectively. > >If you're running multiple unit tests in a single process, you aren't in >CGI mode, and you shouldn't set that key. You're in some other mode. If >CGI mode really matters, the only test that is accurate is one where you >are actually launching a separate process. Not if the purpose is to test the code branch that e.g. saves your sessions when it's run in CGI mode. What I'm getting at here is that the purpose of this "CGI mode" is to tell the app to perform certain behaviors on a different heuristic pattern. There's nothing about that, that requires a guarantee of being only run once. I say that because a CGI application is going to get run more than once, anyway, so obviously whatever it does can be done more than once. 
And, it's hard to test it if you need to run a new process every time it runs. >>>I can see that working for extensions to the request, but what about >>>extensions to the response? E.g., some mod_python extension could allow >>>for internal redirects -- a useful feature that won't fit into WSGI. >> >>Really? Why not? Let's say that mod_python provides the function, the >>app calls it, doesn't call 'start_response', and doesn't return an >>iterator. What does middleware do? Well, presumably it does nothing. >>Definitely it does nothing if it's an output transformer, or if it just >>adds things to the request. So, where's the problem? > >Well, let's say mod_python adds two extensions. One is to do a local >redirect, the other is to do a recursive call. The local redirect would >be in wsgi.extensions (if it existed), but the recursive call would >not. With wsgi.extensions, the middleware would eliminate the local >redirect, and the application would be forced to use the recursive call >and write out the result of that. Which is what you would want, because >then the middleware would have an opportunity to modify the output. In that case, why not have the local_redirect function require the start_response callable as one of its parameters? It can then refuse if the output has been captured by middleware. >I still can't think of a good way to define wsgi.extensions or give rules >for what should go in there. I can see some case for it, but since it's >vague I don't think it should be included in the spec. There's room to add >it later if it turns out to be important. We don't need wsgi.extensions, we just need for servers and gateways to make their extension APIs middleware-safe, by verifying that things the APIs depend on haven't been changed by middleware. I'll write up an explanation of this in the spec. 
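A minimal Python 3 sketch of the "middleware-safe extension" idea: the server's extension requires the caller to hand back start_response, and it refuses if that is not the exact callable the server issued. This follows the explanation that a middleware only replaces start_response when it wants control of the output. The environ key 'x-example.local_redirect', the RedirectRefused exception and the toy server are hypothetical names, not part of any draft. Bodies are plain strings, as in the thread's examples.

```python
class RedirectRefused(Exception):
    """Raised when an output-capturing middleware sits in between."""


def make_request(server_log):
    """Build the environ and start_response a toy server would pass in."""

    def start_response(status, headers):
        server_log.append(('status', status))

        def write(data):
            server_log.append(('body', data))
        return write

    def local_redirect(caller_start_response, path):
        # Only honour the redirect if the caller still holds the exact
        # start_response this server handed out; otherwise a middleware
        # has wrapped it and wants to see (and possibly change) the output.
        if caller_start_response is not start_response:
            raise RedirectRefused(path)
        server_log.append(('redirect', path))

    environ = {'x-example.local_redirect': local_redirect}
    return environ, start_response


def app(environ, start_response):
    redirect = environ.get('x-example.local_redirect')
    if redirect is not None:
        try:
            redirect(start_response, '/elsewhere')
            return None  # the server handles the rest
        except RedirectRefused:
            pass  # fall back to producing the output ourselves
    write = start_response('200 OK', {})
    write('fallback body\n')
    return None


def upper_middleware(application):
    """Output-munging middleware: replaces start_response to see the body."""
    def middleware(environ, start_response):
        def upper_start_response(status, headers):
            write = start_response(status, headers)
            return lambda data: write(data.upper())
        return application(environ, upper_start_response)
    return middleware
```

Called directly, the application gets its internal redirect. Wrapped in upper_middleware, the redirect is refused and the fallback body passes through the middleware's transform, so the extension never bypasses middleware.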
From ianb at colorstudy.com Mon Aug 23 17:43:17 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 23 17:44:37 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com> References: <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> <5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com> Message-ID: <412A1095.2050603@colorstudy.com> Phillip J. Eby wrote: >> Well, let's say mod_python adds two extensions. One is to do a local >> redirect, the other is to do a recursive call. The local redirect >> would be in wsgi.extensions (if it existed), but the recursive call >> would not. With wsgi.extensions, the middleware would eliminate the >> local redirect, and the application would be forced to use the >> recursive call and write out the result of that. Which is what you >> would want, because then the middleware would have an opportunity to >> modify the output. > > > In that case, why not have the local_redirect function require the > start_response callable as one of its parameters? It can then refuse if > the output has been captured by middleware. How can it tell output is going to be captured? -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Mon Aug 23 17:56:53 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon Aug 23 17:56:39 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <412995E7.50605@colorstudy.com> References: <4129946D.3020408@colorstudy.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> <4129946D.3020408@colorstudy.com> Message-ID: <5.1.1.6.0.20040823115633.028a0d80@mail.telecommunity.com> At 01:59 AM 8/23/04 -0500, Ian Bicking wrote: >Ian Bicking wrote: >>>Fine. I just don't like it being anything other than a heuristic. >>>Suppose I'm running acceptance tests? My CGI runner will say "you're >>>being run only once", except then I'll run it again when the acceptance >>>test tests another input. But, I want the acceptance test to test the >>>operation of the application when it's in "cgi mode", effectively. >> >>If you're running multiple unit tests in a single process, you aren't in >>CGI mode, and you shouldn't set that key. You're in some other mode. If >>CGI mode really matters, the only test that is accurate is one where you >>are actually launching a separate process. > >Now that I think about it, maybe it does make sense for testing purposes >that run_once doesn't mean that it's the last run -- it would be >annoyingly slow to start a process for each test, and might make it hard >to do real unit tests, but if you have a different code path when >wsgi.run_once is true then it's important to test that. OTOH, if I'm >testing a project, I can make sure that my code doesn't require the >process to terminate; code and tests are hardly decoupled after all. 
> >Anyway, I guess I retract my concern over this issue. So, leave 'wsgi.run_once' the way I last proposed it? From ianb at colorstudy.com Mon Aug 23 17:56:18 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 23 17:57:38 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <5.1.1.6.0.20040823115633.028a0d80@mail.telecommunity.com> References: <4129946D.3020408@colorstudy.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> <4129946D.3020408@colorstudy.com> <5.1.1.6.0.20040823115633.028a0d80@mail.telecommunity.com> Message-ID: <412A13A2.2030401@colorstudy.com> Phillip J. Eby wrote: > So, leave 'wsgi.run_once' the way I last proposed it? Yep. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Mon Aug 23 18:02:11 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 23 18:01:58 2004 Subject: [Web-SIG] WSGI and sendfile() In-Reply-To: <4129BAD9.3080104@andreweland.org> Message-ID: <5.1.1.6.0.20040823115702.033d7c80@mail.telecommunity.com> At 10:37 AM 8/23/04 +0100, Andrew Eland wrote: >The WSGI draft seems to be progressing well, it's great to see some effort >at standardisation in this area. > >I had a couple of thoughts: > >If write() allowed an object implementing the fileno() method as a >parameter, then an implementation is free to use the sendfile() syscall to >efficiently send the entire contents of a file descriptor to the client. 
>I don't know whether others think this is useful enough functionality to >warrant the extra implementation complexity. >If you ignore the possible efficiency gains, and sendfile() is emulated by >the implementation, it still reduces the amount of code that needs to be >written to serve a static file. If the use case is just to send *one* file, this could be supported by the application returning a file object; we could amend the spec to indicate that if the returned iterable has a 'fileno()' attribute, the server *may* use OS facilities to read data directly from the descriptor, but must still call the iterable's close() method, rather than closing the file descriptor. >There's an asymmetry in streaming. Although the use of iterators allows >a single-threaded implementation to stream a response to many clients >simultaneously with something like select(), it doesn't work the other way >around. If the only access to the request body is via the wsgi.input >stream, all reads will be blocking. Although processing many large uploads >simultaneously isn't such a common use case when developing websites, it >can be when developing web services. Perhaps it should be mentioned that the server *is* allowed to buffer the input stream to e.g. a temporary file, *before* invoking the application. From pje at telecommunity.com Mon Aug 23 18:13:30 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 23 18:13:14 2004 Subject: [Web-SIG] RE: Comments/stylistic ideas regarding WSGI In-Reply-To: <64804A9E.49192330.519F8DB3@netscape.net> Message-ID: <5.1.1.6.0.20040823120237.036d36b0@mail.telecommunity.com> At 10:32 AM 8/23/04 -0400, angryhicKclown@netscape.net wrote: >I agree, however "a callable that is passed a callable which returns a callable" could be mind-bending for some people. As I said, I've done some work on that. When the next draft comes out, if you can provide diffs of what you'd like it to say in those spots, it'll be helpful. >Not necessarily.
They could extend something like this: > >class Gateway(object): > def __init__(self, parent=None): > self.parent = parent > def __getattribute__(self, key): > try: > return object.__getattribute__(self, key) > except AttributeError: > if self.parent != None: > return getattr(self.parent, key) > else: > raise > def write(self, data): > raise NotImplementedError > # ... more standard API functions here > >and instantiate it with the gateway they were passed from the caller. And all of that code is pure "excise"... a tax on the implementor that doesn't provide *them* with any benefit. It's what the Zope folks call a "dead chicken": boilerplate code that everybody has to use, but nobody understands, copied mindlessly from one implementation to another, subject to subtle bugs that then will only be fixed in some of the implementations and not in others. > >>>Finally, I think the most important reason this change should be > >>implemented is because it allows the interface to be easily upgraded > >>without breaking compatibility with older versions. > > > >Actually, the current interface includes *numerous* routes for extension, > >including additional 'wsgi.' keys, and keyword arguments to callables. > > > >I don't see why this can't be solved with OOP. Because the point of OOP is to encapsulate functions into one object; WSGI wants the functions to be as separate as possible so middleware can selectively replace functions and delegate to the old ones. Thus, OOP does not "solve" anything here; it introduces more problems. I'm ordinarily a very OOPish person, but this is one of those cases where it is the exact opposite of a solution. >Fair enough, however I think we're trying to solve a problem (extensions) >which has already been solved by inheritance. No, the appropriate solution is the "Chain Of Responsibility" pattern, if you're familiar with the GOF patterns. 
It's just that since Python functions are first-class objects, it's trivial to implement a Chain Of Responsibility with functions, rather than creating several objects to each house one function. From pje at telecommunity.com Mon Aug 23 18:16:32 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 23 18:16:15 2004 Subject: [Web-SIG] Latest WSGI Draft In-Reply-To: <412A1095.2050603@colorstudy.com> References: <5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com> <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com> <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com> <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com> <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com> <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com> <5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040823121344.036de8b0@mail.telecommunity.com> At 10:43 AM 8/23/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>>Well, let's say mod_python adds two extensions. One is to do a local >>>redirect, the other is to do a recursive call. The local redirect would >>>be in wsgi.extensions (if it existed), but the recursive call would >>>not. With wsgi.extensions, the middleware would eliminate the local >>>redirect, and the application would be forced to use the recursive call >>>and write out the result of that. Which is what you would want, because >>>then the middleware would have an opportunity to modify the output. >> >>In that case, why not have the local_redirect function require the >>start_response callable as one of its parameters? It can then refuse if >>the output has been captured by middleware. 
> >How can it tell output is going to be captured? If 'start_response' is a different 'start_response' than the one it gave the application. A middleware component has no need to replace 'start_response' unless it needs to control the output in some way. Thus, using any extension API that allows direct output would be bypassing middleware in that case. From andrew at andreweland.org Tue Aug 24 12:57:32 2004 From: andrew at andreweland.org (Andrew Eland) Date: Tue Aug 24 13:06:09 2004 Subject: [Web-SIG] WSGI and sendfile() In-Reply-To: <5.1.1.6.0.20040823115702.033d7c80@mail.telecommunity.com> References: <5.1.1.6.0.20040823115702.033d7c80@mail.telecommunity.com> Message-ID: <412B1F1C.4010209@andreweland.org> Phillip J. Eby wrote: > At 10:37 AM 8/23/04 +0100, Andrew Eland wrote: > > we could amend the spec to indicate that if the returned iterable has a > 'fileno()' attribute, the server *may* use OS facilities to read data directly > from the descriptor, but must still call the iterable's close() method, rather > than closing the file descriptor. That sounds fine to me. > Perhaps it should be mentioned that the server *is* allowed to buffer > the input stream to e.g. a temporary file, *before* invoking the > application. Another solution would be to feed the request body to the application as it arrives, via some callback function. It's probably not worth the extra complexity, as the number of applications that stream a response based on incremental processing of the request body will be pretty small. -- Andrew (http://www.andreweland.org) From floydophone at gmail.com Tue Aug 24 16:52:00 2004 From: floydophone at gmail.com (Peter Hunt) Date: Tue Aug 24 16:52:02 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <20040824100006.1E15F1E400A@bag.python.org> References: <20040824100006.1E15F1E400A@bag.python.org> Message-ID: <6654eac4040824075242be15dd@mail.gmail.com> Is there a "Hello, world!" type of middleware that I could take a look at? 
From pje at telecommunity.com Tue Aug 24 17:34:55 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Aug 24 17:34:50 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <6654eac4040824075242be15dd@mail.gmail.com> References: <20040824100006.1E15F1E400A@bag.python.org> <20040824100006.1E15F1E400A@bag.python.org> Message-ID: <5.1.1.6.0.20040824113118.0329e5b0@mail.telecommunity.com> At 10:52 AM 8/24/04 -0400, Peter Hunt wrote: >Is there a "Hello, world!" type of middleware that I could take a look at? How about this: def make_middleware(application): def middleware(environ, start_response): def extra_response(status,headers): write = start_response(status, headers) write('Hello world!\n') return write return application(environ, extra_response) return middleware Calling 'make_middleware(some_application)' creates a new "application" object that can be supplied to a server, that prepends "Hello world" to the body of every response issued by the original application object. From ianb at colorstudy.com Tue Aug 24 18:26:27 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Tue Aug 24 18:28:21 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <6654eac4040824075242be15dd@mail.gmail.com> References: <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> Message-ID: <412B6C33.9080102@colorstudy.com> Peter Hunt wrote: > Is there a "Hello, world!" type of middleware that I could take a look at? I haven't tested this (but I'll try to tonight), but here's perhaps a more realistic middleware. 
This compresses (with gzip) the response from the application (if it is allowed to): import gzip from cStringIO import StringIO class gzip_middleware(object): def __init__(self, application, compress_level=5): self.application = application self.compress_level = compress_level def __call__(self, environ, start_response): if 'gzip' not in environ.get('HTTP_ACCEPT'): # nothing for us to do, so this middleware will # be a no-op: return application(environ, start_response) response = GzipResponse(start_response, self.compress_level) app_iter = self.application(environ, response.gzip_start_response) response.finish_response(app_iter) return None class GzipResponse(object): def __iter__(self, start_response, compress_level): self.start_response = start_response self.compress_level = compress_level self.gzip_fileobj = None def gzip_start_response(self, status, headers): # This isn't part of the spec yet: if headers.has_key('content-encoding'): # we won't double-encode return self.start_response(status, headers) headers['content-encoding'] = 'gzip' raw_writer = self.start_response(status, headers) dummy_fileobj = object() dummy_fileobj.write = raw_writer self.gzip_fileobj = GzipFile('', 'wb', self.compress_level, dummy_fileobj) return self.gzip_fileobj.write def finish_response(self, app_iter): try: for s in app_iter: self.gzip_fileobj.write(s) finally: if hasattr(app_iter, 'close'): app_iter.close() self.gzip_fileobj.close() Hmm... For a very simple filter, I actually found that surprisingly difficult to write. And I think it should take advantage of its server's iteration, but currently it only uses the "push" (write function) aspect of the server. But I'm not sure how exactly I would do that, especially so that the iteration actually had any beneficial properties. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Tue Aug 24 19:26:47 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue Aug 24 19:26:34 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <412B6C33.9080102@colorstudy.com> References: <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> Message-ID: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com> At 11:26 AM 8/24/04 -0500, Ian Bicking wrote: >Peter Hunt wrote: >>Is there a "Hello, world!" type of middleware that I could take a look at? > >I haven't tested this (but I'll try to tonight), but here's perhaps a more >realistic middleware. This compresses (with gzip) the response from the >application (if it is allowed to): > >import gzip >from cStringIO import StringIO > >class gzip_middleware(object): > > def __init__(self, application, compress_level=5): > self.application = application > self.compress_level = compress_level > > def __call__(self, environ, start_response): > if 'gzip' not in environ.get('HTTP_ACCEPT'): > # nothing for us to do, so this middleware will > # be a no-op: > return application(environ, start_response) > response = GzipResponse(start_response, self.compress_level) > app_iter = self.application(environ, > response.gzip_start_response) > response.finish_response(app_iter) > return None > >class GzipResponse(object): > > def __iter__(self, start_response, compress_level): I think you meant '__init__' here. 
> self.start_response = start_response > self.compress_level = compress_level > self.gzip_fileobj = None > > def gzip_start_response(self, status, headers): > # This isn't part of the spec yet: > if headers.has_key('content-encoding'): > # we won't double-encode > return self.start_response(status, headers) > > headers['content-encoding'] = 'gzip' > raw_writer = self.start_response(status, headers) > dummy_fileobj = object() > dummy_fileobj.write = raw_writer > self.gzip_fileobj = GzipFile('', 'wb', self.compress_level, > dummy_fileobj) > return self.gzip_fileobj.write > > def finish_response(self, app_iter): > try: > for s in app_iter: > self.gzip_fileobj.write(s) > finally: > if hasattr(app_iter, 'close'): > app_iter.close() > self.gzip_fileobj.close() > > > > >Hmm... For a very simple filter, I actually found that surprisingly >difficult to write. Maybe because you used classes unnecessarily? class GzipOutput(object): pass def gzip_middleware(application, compress_level=5): def do_gzip(environ, start_response): writer = [] if 'gzip' not in environ.get('HTTP_ACCEPT'): # nothing for us to do, so this middleware will # be a no-op: return application(environ, start_response) def gzip_start_response(status, headers): if 'content-encoding' in headers: writer.append(start_response(status,headers)) else: headers['content-encoding'] = gzip raw_writer = start_response(status,headers) dummy_fileobj = GzipOutput() dummy_fileobj.write = raw_writer gzip_file = GzipFile('','wb',compress_level,dummy_fileobj) writer.append(gzip_file.write) return writer[0] app_iter = application(environ,gzip_start_response) if app_iter and writer: try: map(writer[0],app_iter) finally: if hasattr(app_iter,'close'): app_iter.close() else: return app_iter return do_gzip Hm. That's only slightly less complicated. Still, the only "excise" is handling the try/finally for the close -- virtually everything else is directly connected to the required functionality. 
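For comparison, here is a minimal Python 3 sketch of the same push-style gzip filter in the closure style, written so that it actually runs. It assumes the draft conventions used in this thread: start_response(status, headers) returns a write callable, headers is a dictionary mapping lower-case names to lists of values (the thread has not settled this), and response bodies are bytes. It never iterates a None return and uses a small writable class rather than a bare object(). It also reads the Accept-Encoding request header from HTTP_ACCEPT_ENCODING, its CGI-style name, with a default, because both versions above test HTTP_ACCEPT, which carries the Accept header instead. Dropping Content-Length after compression is an added assumption. _WriteAdapter is a hypothetical helper name.

```python
import gzip


class _WriteAdapter:
    """File-like shim so GzipFile can push compressed bytes to write()."""

    def __init__(self, write):
        self.write = write

    def flush(self):
        pass  # write() is assumed to send data immediately


def gzip_middleware(application, compress_level=5):
    def do_gzip(environ, start_response):
        # The Accept-Encoding request header arrives as HTTP_ACCEPT_ENCODING.
        if 'gzip' not in environ.get('HTTP_ACCEPT_ENCODING', ''):
            return application(environ, start_response)  # no-op

        state = {}

        def gzip_start_response(status, headers):
            if 'content-encoding' in headers:
                # Already encoded: pass through without double-encoding.
                state['write'] = start_response(status, headers)
            else:
                headers['content-encoding'] = ['gzip']
                headers.pop('content-length', None)  # no longer accurate
                raw_write = start_response(status, headers)
                state['gzip'] = gzip.GzipFile(
                    filename='', mode='wb', compresslevel=compress_level,
                    fileobj=_WriteAdapter(raw_write))
                state['write'] = state['gzip'].write
            return state['write']

        app_iter = application(environ, gzip_start_response)
        try:
            if app_iter is not None:  # the app may have used write() only
                for chunk in app_iter:
                    state['write'](chunk)
        finally:
            if hasattr(app_iter, 'close'):
                app_iter.close()
            if 'gzip' in state:
                state['gzip'].close()  # flushes and writes the gzip trailer
        return None

    return do_gzip
```

Like both versions in the thread, this only uses the server's write() ("push") path. A pull-based variant would collect compressed output in a list and yield it from a generator instead.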
(By the way, your implementation tries to iterate even if the app returns None, and you can't set arbitrary attributes on 'object' instances.) It may be that the PEP should contain a list of suggested utility functions, like this one: def finish_response(write_func,app_return): if app_return: try: map(write_func,app_return) finally: if hasattr(app_return,'close'): app_return.close() Such a routine would come in handy for response-munging middleware. > And I think it should take advantage of its server's iteration, but > currently it only uses the "push" (write function) aspect of the > server. But I'm not sure how exactly I would do that, especially so that > the iteration actually had any beneficial properties. For the given application, it's not important. Gzipping a server push stream probably doesn't make a lot of sense. :) If you *really* want to support it, you could do something like: def iter_response(transformer, queue, app_return): for data in app_return: transformer(data) if queue: yield ''.join(queue) queue[:] = [] Where "queue" is a list appended to by the 'transformer'. For your example, you could set it up like this: queue = [] dummy_fileobj.write = queue.append I'll leave the rest as an exercise for the reader. :) From pje at telecommunity.com Tue Aug 24 20:07:52 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Aug 24 20:07:36 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com> References: <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> Message-ID: <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> At 01:26 PM 8/24/04 -0400, Phillip J. Eby wrote: >At 11:26 AM 8/24/04 -0500, Ian Bicking wrote: >> headers['content-encoding'] = 'gzip' > headers['content-encoding'] = gzip Oops. 
We both goofed: this should be: headers['content-encoding'] = ['gzip'] From ianb at colorstudy.com Wed Aug 25 02:37:01 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 25 02:37:07 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> References: <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> Message-ID: <412BDF2D.5090404@colorstudy.com> Phillip J. Eby wrote: > Oops. We both goofed: this should be: > > headers['content-encoding'] = ['gzip'] Was there any resolution on how headers are going to work? While it's certainly more confusing to deal with a list of headers, as opposed to a dictionary of headers, I feel like the whole thing is a little vague at this point. Must all values be lists? Other sequences? Is it an error to put a string there? I fear I'd see a lot of: content-encoding: g content-encoding: z content-encoding: i content-encoding: p Must all keys be lower case? If not, headers aren't going to be any easier to work with as a dictionary than as a list. If they are required to be lower case, again it seems like a fragile part of the spec. It all makes me think that it'd just be easier to write the four or so functions to make lists of headers easy to deal with. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Wed Aug 25 02:49:34 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 25 02:49:42 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. 
Eby) In-Reply-To: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com> References: <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com> Message-ID: <412BE21E.20504@colorstudy.com> Phillip J. Eby wrote: >> Hmm... For a very simple filter, I actually found that surprisingly >> difficult to write. > > > Maybe because you used classes unnecessarily? It wasn't so much the result, as the process -- keeping track of the state was difficult for me, with one function outside of the application (the application wrapper) and another inside of the application (the start_process wrapper). I had to remember which function fit what part of the process, and how to best keep the state around, and that was more difficult than I expected. > class GzipOutput(object): > pass > > def gzip_middleware(application, compress_level=5): > > def do_gzip(environ, start_response): > > writer = [] Using a list to simulate mutable inner scopes is hardly what I'd consider a Hello World class of example! While the trick works, it's not something that I would do without a compelling reason; certainly not just to save creating one class. 
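For comparison, here is a rough present-day sketch of the same gzip filter. It keeps the per-request state on a small object, as Ian prefers, rather than in a list inside a closure. It assumes the later PEP 3333 conventions (byte-string bodies, headers as a list of (name, value) tuples, an optional exc_info argument) instead of the dictionary-of-headers draft discussed here, and it reads HTTP_ACCEPT_ENCODING, the CGI variable that carries the client's Accept-Encoding header. The names _GzipState and gzip_middleware are illustrative only. The GzipFile is closed in a finally block, because it writes its trailer only on close().

```python
import gzip
import io


class _GzipState:
    """Per-request state shared by the start_response wrapper and the
    code that drains the application's iterable."""

    def __init__(self):
        self.buffer = io.BytesIO()   # receives the compressed bytes
        self.gzip_file = None        # created only if we decide to compress
        self.passthrough = False     # True once the body was returned unchanged


def gzip_middleware(application, compress_level=5):
    def wrapped(environ, start_response):
        if 'gzip' not in environ.get('HTTP_ACCEPT_ENCODING', ''):
            # Client cannot accept gzip: behave as a no-op.
            return application(environ, start_response)

        state = _GzipState()

        def gzip_start_response(status, headers, exc_info=None):
            already_encoded = any(
                name.lower() == 'content-encoding' for name, _ in headers)
            if state.passthrough or already_encoded:
                # Never double-encode, and never compress a body that has
                # already been handed back to the server unchanged.
                return start_response(status, headers, exc_info)
            # The compressed length differs, so drop any Content-Length.
            headers = [(name, value) for name, value in headers
                       if name.lower() != 'content-length']
            headers.append(('Content-Encoding', 'gzip'))
            start_response(status, headers, exc_info)
            state.gzip_file = gzip.GzipFile(
                mode='wb', compresslevel=compress_level, fileobj=state.buffer)
            # Data sent through the write() callable is compressed as well.
            return state.gzip_file.write

        app_iter = application(environ, gzip_start_response)
        if state.gzip_file is None:
            # start_response not called yet (e.g. a generator app) or the
            # body is already encoded: pass the response through as-is.
            state.passthrough = True
            return app_iter
        try:
            for chunk in app_iter:
                state.gzip_file.write(chunk)
        finally:
            if hasattr(app_iter, 'close'):
                app_iter.close()
            state.gzip_file.close()   # flushes the gzip trailer
        return [state.buffer.getvalue()]

    return wrapped
```

This version buffers the whole compressed body and returns it as one chunk, which is the simplest correct behaviour; a streaming variant would drain the BytesIO after each write, along the lines of the iter_response idea earlier in the thread.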
> if 'gzip' not in environ.get('HTTP_ACCEPT'): > # nothing for us to do, so this middleware will > # be a no-op: > return application(environ, start_response) > > def gzip_start_response(status, headers): > if 'content-encoding' in headers: > writer.append(start_response(status,headers)) > else: > headers['content-encoding'] = gzip > raw_writer = start_response(status,headers) > dummy_fileobj = GzipOutput() > dummy_fileobj.write = raw_writer > gzip_file = > GzipFile('','wb',compress_level,dummy_fileobj) > writer.append(gzip_file.write) > return writer[0] > > app_iter = application(environ,gzip_start_response) > > if app_iter and writer: > try: > map(writer[0],app_iter) > finally: > if hasattr(app_iter,'close'): > app_iter.close() > else: > return app_iter > > return do_gzip > > Hm. That's only slightly less complicated. Still, the only "excise" is > handling the try/finally for the close -- virtually everything else is > directly connected to the required functionality. (By the way, your > implementation tries to iterate even if the app returns None, and you > can't set arbitrary attributes on 'object' instances.) > > It may be that the PEP should contain a list of suggested utility > functions, like this one: > > def finish_response(write_func,app_return): > if app_return: > try: > map(write_func,app_return) > finally: > if hasattr(app_return,'close'): > app_return.close() > > Such a routine would come in handy for response-munging middleware. I believe you also have to close the GzipFile, as it won't flush its final output until that happens. So the finally block has to include that as well. That makes finish_response a bit less of a win. And again, map is clever but something of an abuse of the function, and not appropriate for any example code. >> And I think it should take advantage of its server's iteration, but >> currently it only uses the "push" (write function) aspect of the >> server. 
But I'm not sure how exactly I would do that, especially so >> that the iteration actually had any beneficial properties. > > > For the given application, it's not important. Gzipping a server push > stream probably doesn't make a lot of sense. :) How so? -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Wed Aug 25 03:53:08 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 25 03:53:11 2004 Subject: [Web-SIG] WSGI sample applications Message-ID: <412BF104.6010200@colorstudy.com> I've started writing some sample code using WSGI. So far just a working version of the gzip-encoder, a hello world app, the CGI server example from the WSGI PEP, and a small URL dispatcher. svn://colorstudy.com/trunk/WSGI/ http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/ I'll be trying to make some other applications as time goes by, kind of according to http://blog.colorstudy.com/ianb/weblog/2004/08/22.html#P150 -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From floydophone at gmail.com Wed Aug 25 04:56:36 2004 From: floydophone at gmail.com (Peter Hunt) Date: Wed Aug 25 04:56:41 2004 Subject: [Web-SIG] Where do sessions fit in? Message-ID: <6654eac40408241956694d9916@mail.gmail.com> I've realized now how middleware works. Now, I'm wondering where sessions would fit in. Would they be a piece of middleware, or an extension? If so, what would the interface look like? From ianb at colorstudy.com Wed Aug 25 05:50:00 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 25 05:50:03 2004 Subject: [Web-SIG] Where do sessions fit in? In-Reply-To: <6654eac40408241956694d9916@mail.gmail.com> References: <6654eac40408241956694d9916@mail.gmail.com> Message-ID: <412C0C68.4050305@colorstudy.com> Peter Hunt wrote: > I've realized now how middleware works. Now, I'm wondering where > sessions would fit in. Would they be a piece of middleware, or an > extension? If so, what would the interface look like? 
There's a good chance the session would be implemented by the application/framework sitting on top of WSGI, so WSGI wouldn't factor in at all. But middleware could implement the session. It would change the environment dictionary, adding a key (like 'middleware_name.session'), which would be the session object. The new key would be an "extension" of sorts (at least, that's the only extension WSGI has). The session object then looks like, well, whatever the middleware makes it look like. The advantage of having it in the middleware, is that if several frameworks agree on an interface for the session object, it can be created early and then shared between all the applications, even if the applications otherwise work very differently. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Wed Aug 25 06:32:44 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 25 06:32:38 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <412BDF2D.5090404@colorstudy.com> References: <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> At 07:37 PM 8/24/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>Oops. We both goofed: this should be: >> headers['content-encoding'] = ['gzip'] > >Was there any resolution on how headers are going to work? While it's >certainly more confusing to deal with a list of headers, as opposed to a >dictionary of headers, I feel like the whole thing is a little vague at >this point. > >Must all values be lists? Other sequences? Is it an error to put a >string there? 
I fear I'd see a lot of: > >content-encoding: g >content-encoding: z >content-encoding: i >content-encoding: p I was thinking lists-only, so it's an error to use a string for *any* header. If it's based on some kind of semantics, it's not easily extended, and if there's any mixed typing it increases the chances of messing it up. >Must all keys be lower case? Yes. > If not, headers aren't going to be any easier to work with as a > dictionary than as a list. If they are required to be lower case, again > it seems like a fragile part of the spec. > >It all makes me think that it'd just be easier to write the four or so >functions to make lists of headers easy to deal with. You could equally well write the functions to work on the dictionary of lists. ;) OTOH, I think it's probably best if the spec is strengthened to, "the server *must* report an immediate error if any of the header keys contain non-lowercase letters, or if any values are not lists." That would help flush out any programming errors. From pje at telecommunity.com Wed Aug 25 06:41:34 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 25 06:41:17 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <412BE21E.20504@colorstudy.com> References: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040825003303.02abd070@mail.telecommunity.com> At 07:49 PM 8/24/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >> class GzipOutput(object): >> pass >> def gzip_middleware(application, compress_level=5): >> def do_gzip(environ, start_response): >> writer = [] > >Using a list to simulate mutable inner scopes is hardly what I'd consider >a Hello World class of example! 
While the trick works, it's not something >that I would do without a compelling reason; certainly not just to save >creating one class. Hm. To me the mutable inner scope thingy is more natural. I'd blame it on my Lisp background, except I don't *have* a Lisp background... :) >>It may be that the PEP should contain a list of suggested utility >>functions, like this one: >> def finish_response(write_func,app_return): >> if app_return: >> try: >> map(write_func,app_return) >> finally: >> if hasattr(app_return,'close'): >> app_return.close() >>Such a routine would come in handy for response-munging middleware. > >I believe you also have to close the GzipFile, as it won't flush its final >output until that happens. So the finally block has to include that as >well. That makes finish_response a bit less of a win. And again, map is >clever but something of an abuse of the function, and not appropriate for >any example code. Abuse of the function? That's what map() is *for*: to apply a function to each item in a sequence. It's more compact and to the point than a list comprehension when all you're doing is applying a single function to a sequence of single arguments. Perhaps I should also blame this on my imaginary Lisp background, where map is considered a primitive. :) (Actually, it's my 7 years of Python showing, since 'map()' was king before the advent of listcomps.) >>For the given application, it's not important. Gzipping a server push >>stream probably doesn't make a lot of sense. :) > >How so? Don't the subsequent responses have their own headers and transfer encodings? (By server push I mean a multipart response, which is also the main scenario for calling write() more than once or yielding more than one value and wanting the data to be immediately flushed.) From ianb at colorstudy.com Wed Aug 25 06:46:36 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 25 06:46:41 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J.
Eby) In-Reply-To: <5.1.1.6.0.20040825003303.02abd070@mail.telecommunity.com> References: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com> <5.1.1.6.0.20040825003303.02abd070@mail.telecommunity.com> Message-ID: <412C19AC.6000500@colorstudy.com> Phillip J. Eby wrote: >> I believe you also have to close the GzipFile, as it won't flush its >> final output until that happens. So the finally block has to include >> that as well. That makes finish_response a bit less of a win. And >> again, map is clever but something of an abuse of the function, and >> not appropriate for any example code. > > > Abuse of the function? That's what map() is *for*: to apply a function > to each item in a sequence. It's more compact and to the point than a > list comprehension when all you're doing is applying a single function > to a sequence of single arguments. Perhaps I should also blame this on > my imaginary Lisp background, where map is considered a primitive. :) > (Actually, it's my 7 years of Python showing, since 'map()' was king > before the advent of listcomps.) Map (and list comprehensions) imply you are doing something with the results, but you aren't, you just want to throw the results away. >>> For the given application, it's not important. Gzipping a server >>> push stream probably doesn't make a lot of sense. :) >> >> >> How so? > > > Don't the subsequent responses have their own headers and transfer > encodings? (By server push I mean a multipart response, which is also > the main scenario for calling write() more than once or yielding more > than one value and wanting the data to be immediately flushed. Oh, now I'm confused. By push I just meant where the application pushes data to the server (the write callable) vs. 
the case where the server pulls from the application (the iterable). -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Wed Aug 25 06:52:28 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 25 06:52:32 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> References: <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> Message-ID: <412C1B0C.4020107@colorstudy.com> Phillip J. Eby wrote: > I was thinking lists-only, so it's an error to use a string for *any* > header. If it's based on some kind of semantics, it's not easily > extended, and if there's any mixed typing it increases the chances of > messing it up. > > >> Must all keys be lower case? > > > Yes. [...] > OTOH, I think it's probably best if the spec is strengthened to, "the > server *must* report an immediate error if any of the header keys > contain non-lowercase letters, or if any values are not lists." That > would help flush out any programming errors. All of these requirements make me wary. It's not that hard to deal with a list of headers, and we don't have to make any of these requirements, and if the server doesn't check something you won't get bizarre bugs (like four content-encoding fields). Keys can be any case, all values will always be strings (which aren't compound, and so people aren't likely to mess up). The issues with a dictionary are just too great, without significant gain. I'd be okay if we used a dictionary-like object that enforced these requirements, kind of like rfc822 defines, but that doesn't seem to be the direction WSGI is going. 
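A minimal sketch of the "four or so functions" for headers kept as a plain list of (name, value) tuples. The function names are hypothetical, not part of any WSGI draft. Name matching is case-insensitive, and because each value is a single string, there is no way to end up with one header line per character.

```python
def get_header(headers, name, default=None):
    """Return the first value for `name` (case-insensitive), or `default`."""
    lname = name.lower()
    for key, value in headers:
        if key.lower() == lname:
            return value
    return default


def get_all_headers(headers, name):
    """Return every value for `name`, e.g. several Set-Cookie lines."""
    lname = name.lower()
    return [value for key, value in headers if key.lower() == lname]


def remove_header(headers, name):
    """Delete every occurrence of `name`, modifying the list in place."""
    lname = name.lower()
    headers[:] = [(key, value) for key, value in headers
                  if key.lower() != lname]


def replace_header(headers, name, value):
    """Drop any existing values for `name`, then append a single new one."""
    remove_header(headers, name)
    headers.append((name, value))
```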
I've been writing my middleware using lists of headers, and it's really not a problem. There are some other annoyances, but that isn't one of them. I'll write about the annoyances later, once I've actually got it all working, but those relate to other parts of the system. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Wed Aug 25 07:00:34 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 25 07:00:41 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <412C1B0C.4020107@colorstudy.com> References: <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040825005443.02ac8060@mail.telecommunity.com> At 11:52 PM 8/24/04 -0500, Ian Bicking wrote: >I'd be okay if we used a dictionary-like object that enforced these >requirements, kind of like rfc822 defines, but that doesn't seem to be the >direction WSGI is going. If there's an implementation already available in the stdlib for 2.2 and up, that's not constantly in flux (like the 'email' package), I'd consider it. I just *really* don't want another long thread about what the methods should be named and what their precise semantics should be. :) In the meantime, I'm fine with headers remaining as they were in the previous draft: i.e. a sequence of tuples. From pje at telecommunity.com Wed Aug 25 07:22:28 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 25 07:22:12 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. 
Eby) In-Reply-To: <5.1.1.6.0.20040825005443.02ac8060@mail.telecommunity.com> References: <412C1B0C.4020107@colorstudy.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> At 01:00 AM 8/25/04 -0400, Phillip J. Eby wrote: >At 11:52 PM 8/24/04 -0500, Ian Bicking wrote: >>I'd be okay if we used a dictionary-like object that enforced these >>requirements, kind of like rfc822 defines, but that doesn't seem to be >>the direction WSGI is going. > >If there's an implementation already available in the stdlib for 2.2 and >up, that's not constantly in flux (like the 'email' package), I'd consider >it. I just *really* don't want another long thread about what the methods >should be named and what their precise semantics should be. :) > >In the meantime, I'm fine with headers remaining as they were in the >previous draft: i.e. a sequence of tuples. Hm. Looking at 'email.Message', actually, it has all the semantics needed for header management, and it looks like the interface at least is stable across 2.2 and 2.3 (I haven't checked 2.4.) The code is relatively brief, and I think I'd be okay with using it as the type for 'headers'. Anybody have any objections? Here's sample usage: from email.Message import Message def application(env, start): headers = Message() headers.set_type("text/plain") headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE", path="/foobar") start("200 OK", headers)("Hello world!") One of the nice things about it is that it makes it easier to do MIME and HTTP headers that have parameter info. 
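The sample above can be tried against today's standard library with two assumptions: the module is email.message (Python 2.5 renamed email.Message, and the capitalised name survived only as a Python 2 alias), and start_response is handed the flattened list of (name, value) tuples from items() rather than the Message object itself. Note that set_type() also inserts a MIME-Version: 1.0 header ahead of Content-Type.

```python
from email.message import Message


def build_headers():
    headers = Message()
    headers.set_type('text/plain')   # also adds "MIME-Version: 1.0"
    # Keyword arguments become quoted parameters on the header value.
    headers.add_header('Set-Cookie', 'CUSTOMER=WILE_E_COYOTE', path='/foobar')
    return headers


def application(environ, start_response):
    headers = build_headers()
    # items() flattens the Message into a list of (name, value) tuples.
    start_response('200 OK', headers.items())
    return [b'Hello world!']
```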
From ianb at colorstudy.com Wed Aug 25 08:25:33 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 25 08:25:38 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> References: <412C1B0C.4020107@colorstudy.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> Message-ID: <412C30DD.4060401@colorstudy.com> Phillip J. Eby wrote: > Hm. Looking at 'email.Message', actually, it has all the semantics > needed for header management, and it looks like the interface at least > is stable across 2.2 and 2.3 (I haven't checked 2.4.) > > The code is relatively brief, and I think I'd be okay with using it as > the type for 'headers'. Anybody have any objections? Here's sample usage: > > from email.Message import Message > > def application(env, start): > headers = Message() > headers.set_type("text/plain") > headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE", > path="/foobar") > start("200 OK", headers)("Hello world!") > > One of the nice things about it is that it makes it easier to do MIME > and HTTP headers that have parameter info. Seems like an appropriate object. This part certainly should be stable, since they are deprecating mimetools and rfc822, with email replacing those. At first it seemed a little annoying that content-type was handled differently, but because it's the one required header it actually seems pretty reasonable. 
It seems like there are a couple things that are a little inappropriate for HTTP: multipart, unixfrom, attach, payload, filename, boundary, preamble, epilogue. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Wed Aug 25 08:39:43 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 25 08:39:47 2004 Subject: [Web-SIG] WSGI: catching exception Message-ID: <412C342F.70203@colorstudy.com> I wrote a couple pieces of middleware that catch exceptions. One defines exceptions like HTTPTemporaryRedirect, so you can raise those exceptions and it catches them and turns them into a proper HTTP response. The other catches all unhandled exceptions and formats them with cgitb. (Obviously the two have to be nested in the right order.) Anyway, it felt difficult to handle exceptions, for two reasons: One place is around the application invocation, looks like: try: return application(environ, start_response) except: blah blah Except "blah blah" almost certainly depends on whether start_response has been called, so it knows if it has to call start_response, or just deal with a partially completed response. So I had to wrap start_response in another function that detected if it had been called. This also would create what I believe is a false negative if you were comparing start_response at different points in the request, as we discussed for certain output-shortcutting extensions. So, it would be nice if there was an easier way to tell where in the request we were, i.e., if headers had been sent. The other hard part is dealing with the iterator. I had to wrap the iterator, with something like: def wrap_iter(app_iter): try: for s in app_iter: yield s except: blah blah There we know headers have been sent. But it's a bit annoying that the except has to be done twice.
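One way to avoid writing the except clause twice is to route both phases through a single handler that consults a flag set by a wrapped start_response. This is a sketch under assumptions: bodies are byte strings, start_response takes (status, headers) plus any optional extra arguments, and the plain-text traceback stands in for cgitb output. The names error_catcher and _Progress are hypothetical.

```python
import traceback


class _Progress:
    """Per-request flag: has the application called start_response yet?"""

    def __init__(self):
        self.started = False


def error_catcher(application):
    """Hypothetical middleware: report unhandled exceptions as plain text."""

    def wrapped(environ, start_response):
        progress = _Progress()

        def tracking_start_response(status, headers, *args):
            progress.started = True
            return start_response(status, headers, *args)

        def error_chunks():
            # Shared by both phases; must be called inside an except block.
            text = traceback.format_exc().encode('utf-8')
            if not progress.started:
                # start_response not called yet: we can still pick the status.
                start_response('500 Internal Server Error',
                               [('Content-Type', 'text/plain; charset=utf-8')])
                return [text]
            # Status and headers are already fixed: append to the body instead.
            return [b'\n\n' + text]

        try:
            app_iter = application(environ, tracking_start_response)
        except Exception:
            return error_chunks()

        def guarded():
            try:
                for chunk in app_iter:
                    yield chunk
            except Exception:
                yield from error_chunks()
            finally:
                if hasattr(app_iter, 'close'):
                    app_iter.close()

        return guarded()

    return wrapped
```

Once start_response has been called, this handler can only append to the body. PEP 333 as finally published added an exc_info argument to start_response for exactly this situation, letting the server decide whether the status can still be replaced.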
I was also getting some behavior I have yet to understand when I was nesting gzipper and cgitb_catcher, using a URL like: .../WSGI/dispatch.cgi/cgitb_catcher.middle/gzipper.middle/httpexceptions.middle/echo?error=iter This is using the modules in svn://colorstudy.com/trunk/WSGI (symlinking dispatch.py to dispatch.cgi). Actually, now that I look at it, I think it's an issue with gzipper not dealing well with exceptions, though I guess that's another exception issue. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From tony at lownds.com Wed Aug 25 17:05:42 2004 From: tony at lownds.com (tony@lownds.com) Date: Wed Aug 25 17:22:58 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> References: <412C1B0C.4020107@colorstudy.com><5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com><5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com><412B6C33.9080102@colorstudy.com><6654eac4040824075242be15dd@mail.gmail.com><20040824100006.1E15F1E400A@bag.python.org><6654eac4040824075242be15dd@mail.gmail.com><5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com><5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> Message-ID: <49996.67.124.88.63.1093446342.squirrel@*> >>In the meantime, I'm fine with headers remaining as they were in the >>previous draft: i.e. a sequence of tuples. > +1 > Hm. Looking at 'email.Message', actually, it has all the semantics needed > for header management, and it looks like the interface at least is stable > across 2.2 and 2.3 (I haven't checked 2.4.) > > The code is relatively brief, and I think I'd be okay with using it as the > type for 'headers'. Anybody have any objections? Here's sample usage: > It's a nice idea, and it would probably simplify both server and application code and the spec. But, it forces an implementation. 
I think inclusion in the PEP as a possible change before 1.0, will give the idea plenty of discussion time. > from email.Message import Message > > def application(env, start): > headers = Message() > headers.set_type("text/plain") > headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE", > path="/foobar") > start("200 OK", headers)("Hello world!") Just call the items() method, and WSGI remains the same start("200 OK", headers.items())("Hello world!") > > One of the nice things about it is that it makes it easier to do MIME and > HTTP headers that have parameter info. > One issue: after the m.set_type call, an extra MIME-Version: 1.0 header is present. According to the HTTP 1.1 spec, when that header is present, whatever the server sends must be in "full compliance" with the MIME protocol. From reading the MIME spec, I guess adding Content-transfer-encoding: binary would take care of that... -Tony From pje at telecommunity.com Wed Aug 25 17:55:39 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 25 17:55:27 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <412C30DD.4060401@colorstudy.com> References: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> <412C1B0C.4020107@colorstudy.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040825115314.02c17140@mail.telecommunity.com> At 01:25 AM 8/25/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>Hm. 
Looking at 'email.Message', actually, it has all the semantics >>needed for header management, and it looks like the interface at least is >>stable across 2.2 and 2.3 (I haven't checked 2.4.) >>The code is relatively brief, and I think I'd be okay with using it as >>the type for 'headers'. Anybody have any objections? Here's sample usage: >> from email.Message import Message >> def application(env, start): >> headers = Message() >> headers.set_type("text/plain") >> headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE", >> path="/foobar") >> start("200 OK", headers)("Hello world!") >>One of the nice things about it is that it makes it easier to do MIME and >>HTTP headers that have parameter info. > >Seems like an appropriate object. This part certainly should be stable, >since they are deprecating mimetools and rfc822, with email replacing those. > >At first it seemed a little annoying that content-type was handled >differently, but because it's the one required header it actually seems >pretty reasonable. Actually, there's nothing stopping you from using the normal features to manipulate content-type; but 'set_type()' is more convenient. >It seems like there are a couple things that are a little inappropriate >for HTTP: multipart, unifrom, attach, payload, filename, boundary, >preamble, epilogue. I don't really see an issue there; if need be we can list the "approved" methods. From pje at telecommunity.com Wed Aug 25 18:04:28 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 25 18:04:11 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. 
Eby) In-Reply-To: <49996.67.124.88.63.1093446342.squirrel@*> References: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> <412C1B0C.4020107@colorstudy.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com> At 08:05 AM 8/25/04 -0700, tony@lownds.com wrote: > >>In the meantime, I'm fine with headers remaining as they were in the > >>previous draft: i.e. a sequence of tuples. > > > >+1 > > > Hm. Looking at 'email.Message', actually, it has all the semantics needed > > for header management, and it looks like the interface at least is stable > > across 2.2 and 2.3 (I haven't checked 2.4.) > > > > The code is relatively brief, and I think I'd be okay with using it as the > > type for 'headers'. Anybody have any objections? Here's sample usage: > > > >It's a nice idea, and it would probably simplify both server and >application code >and the spec. But, it forces an implementation. But it's available in the standard library, and therefore will be *one* implementation, and thus have only one set of bugs to work around per Python version. :) >Just call the items() method, and WSGI remains the same > > start("200 OK", headers.items())("Hello world!") Quite so. But turning the items back into headers is more complex, if middleware wants to manipulate them, e.g.: for n,v in headers: msg.add_header(n,v) In any case, email.Message is actually a very thin wrapper over a list of name,value pairs! It just provides the needed functionality to manipulate the headers. 
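The claim that a Message is a thin layer over a list of (name, value) pairs is easy to check: middleware that receives the tuple list can rebuild a Message with add_header, edit it, and flatten it again with items(). The function names below are hypothetical, and the sketch uses the current module name email.message.

```python
from email.message import Message


def headers_to_message(header_list):
    """Rebuild a Message from (name, value) tuples, keeping duplicates."""
    msg = Message()
    for name, value in header_list:
        msg.add_header(name, value)   # appends; never replaces
    return msg


def drop_content_length(header_list):
    """Example middleware step, e.g. before compressing the body."""
    msg = headers_to_message(header_list)
    del msg['Content-Length']   # case-insensitive; no error if absent
    return msg.items()
```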
>One issue: after the m.set_type call, an extra MIME-Version: 1.0 header is >present. >According to the HTTP 1.1 spec, when that header is present, whatever the >server sends must be in "full compliance" with the MIME protocol. From >reading the MIME spec, I guess adding Content-transfer-encoding: binary >would take care of that... We could require the server to add a c-t-e header if it's missing and MIME-Version is present, i.e.: if ('MIME-Version' in headers and 'Content-Transfer-Encoding' not in headers ): headers['Content-Transfer-Encoding'] = "whatever" From ianb at colorstudy.com Wed Aug 25 18:05:52 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Aug 25 18:07:29 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby) In-Reply-To: <49996.67.124.88.63.1093446342.squirrel@*> References: <412C1B0C.4020107@colorstudy.com><5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com><5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com><412B6C33.9080102@colorstudy.com><6654eac4040824075242be15dd@mail.gmail.com><20040824100006.1E15F1E400A@bag.python.org><6654eac4040824075242be15dd@mail.gmail.com><5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com><5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> <49996.67.124.88.63.1093446342.squirrel@*> Message-ID: <412CB8E0.7040502@colorstudy.com> tony@lownds.com wrote: > It's a nice idea, and it would probably simplify both server and > application code > and the spec. But, it forces an implementation. I think inclusion in the > PEP as a possible > change before 1.0, will give the idea plenty of discussion time. I agree, I don't think this need to be resolved before making it an official PEP. 
> >> from email.Message import Message >> >> def application(env, start): >> headers = Message() >> headers.set_type("text/plain") >> headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE", >>path="/foobar") >> start("200 OK", headers)("Hello world!") > > > Just call the items() method, and WSGI remains the same > > start("200 OK", headers.items())("Hello world!") > > >>One of the nice things about it is that it makes it easier to do MIME and >>HTTP headers that have parameter info. >> > > > One issue: after the m.set_type call, an extra MIME-Version: 1.0 header is > present. > According to the HTTP 1.1 spec, when that header is present, whatever the > server sends must be in "full compliance" with the MIME protocol. From > reading the MIME spec, I guess adding Content-transfer-encoding: binary > would take care of that... Is it only after set_type then, not add_header('content-type',...)? Adding that header implicitly is rather annoying. It's too bad there's not a simpler superclass to email.Message that implements just the header part, and not the email/MIME part. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Wed Aug 25 18:54:50 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 25 18:54:33 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. 
Eby) In-Reply-To: <50583.67.124.88.63.1093450925.squirrel@*> References: <5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com> <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> <412C1B0C.4020107@colorstudy.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> <5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040825124719.0231c280@mail.telecommunity.com> (Tony: I'm assuming this was intended for web-sig; discussions like this should be archived "for the record". I hope that's not an issue for you.) At 09:22 AM 8/25/04 -0700, tony@lownds.com wrote: > > > >>Just call the items() method, and WSGI remains the same > >> > >> start("200 OK", headers.items())("Hello world!") > > > > Quite so. But turning the items back into headers is more complex, if > > middleware wants to manipulate them, e.g.: > > > > for n,v in headers: > > msg.add_header(n,v) > > > >Yes, that is an advantage. But applications with their own header >manipulation library would need to do that as well, if a Message() >instance was required. But how many of those applications currently use a list of key,value pairs as their data structure? They're going to have to loop over whatever they actually use, or build it up piece by piece, or do whatever it is that they do. I'm pretty much assuming that current apps/frameworks will need to generate a 'headers' structure, so as long as it's a simple loop, it doesn't much matter what goes in the body of that loop. 
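The server-side fix-up Phillip proposes for the MIME-Version side effect can be sketched against the same class. This is a sketch under assumptions: the function name `ensure_transfer_encoding` is hypothetical, and the default value `binary` follows Tony's suggestion rather than anything settled in the thread (Phillip's snippet just says "whatever").

```python
from email.message import Message


def ensure_transfer_encoding(headers, encoding="binary"):
    """Add Content-Transfer-Encoding when MIME-Version is present without one.

    Hypothetical server-side helper; 'binary' is an assumed default.
    """
    # Message.__contains__ compares header names case-insensitively.
    if "MIME-Version" in headers and "Content-Transfer-Encoding" not in headers:
        # Message.__setitem__ appends a header; the guard above prevents duplicates.
        headers["Content-Transfer-Encoding"] = encoding
    return headers
```

Setting Content-Type by plain assignment or with `add_header()` does not add MIME-Version, so in that case the helper leaves the headers untouched.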
Also, frameworks are usually going to have only one place to create WSGI headers, but middleware is by definition intended to be stacked. And, it's more likely that a single author will write multiple middleware components, than WSGI wrappers for multiple frameworks. So, simplifying the job of middleware authors, if it doesn't significantly burden framework authors, is a good thing here, I think. (Since the only framework authors who would be burdened by the change are those who already use a precisely compliant data structure; everyone else had to write a loop anyway.) > > We could require the server to add a c-t-e header if it's missing and > > MIME-Version is present, i.e.: > > > > if ('MIME-Version' in headers and > > 'Content-Transfer-Encoding' not in headers > > ): > > headers['Content-Transfer-Encoding'] = "whatever" > > > > > >I think that warning applications about this implication of set_type would >be sufficient. > >Content-transfer-encoding is assumed to be 7bit if not present, its not >required by MIME. 7bit would be wrong for a lot of HTTP responses though. In the current spec, the server is already required to ensure validity of the headers; this would just be a specific mention of one example of that. From pje at telecommunity.com Wed Aug 25 19:04:28 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Aug 25 19:04:11 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. 
Eby) In-Reply-To: <412CB8E0.7040502@colorstudy.com> References: <49996.67.124.88.63.1093446342.squirrel@*> <412C1B0C.4020107@colorstudy.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com> <49996.67.124.88.63.1093446342.squirrel@*> Message-ID: <5.1.1.6.0.20040825125547.0231e0d0@mail.telecommunity.com> At 11:05 AM 8/25/04 -0500, Ian Bicking wrote: >tony@lownds.com wrote: >>It's a nice idea, and it would probably simplify both server and >>application code >>and the spec. But, it forces an implementation. I think inclusion in the >>PEP as a possible >>change before 1.0, will give the idea plenty of discussion time. > >I agree, I don't think this need to be resolved before making it an >official PEP. I'll mark it as an "Open Issue" in the PEP, providing sample code to show how it's used. Might as well have *something* left for folks to argue about. Maybe it'll provide a nice distraction from PEP 318. :) >Is it only after set_type then, not add_header('content-type',...)? Adding >that header implicitly is rather annoying. Yeah, but not hard for the server to fix, either. While I dislike forcing either side to have any "boilerplate" code, there will be fewer servers/gateways than middleware and frameworks, and being able to use email.Message should make response header manipulation as easy for middleware as request header manipulation is now. From tony at lownds.com Wed Aug 25 19:35:31 2004 From: tony at lownds.com (tony@lownds.com) Date: Wed Aug 25 19:52:36 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. 
Eby) In-Reply-To: <5.1.1.6.0.20040825124719.0231c280@mail.telecommunity.com> References: <5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com><5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com><412C1B0C.4020107@colorstudy.com><5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com><5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com><412B6C33.9080102@colorstudy.com><6654eac4040824075242be15dd@mail.gmail.com><20040824100006.1E15F1E400A@bag.python.org><6654eac4040824075242be15dd@mail.gmail.com><5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com><5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com><5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com><5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com> <5.1.1.6.0.20040825124719.0231c280@mail.telecommunity.com> Message-ID: <51123.204.162.121.54.1093455331.squirrel@*> > (Tony: I'm assuming this was intended for web-sig; discussions like this > should be archived "for the record". I hope that's not an issue for you.) > Simple oversight, sorry! > So, simplifying the job of > middleware authors, if it doesn't significantly burden framework authors, > is a good thing here, I think. (Since the only framework authors who > would > be burdened by the change are those who already use a precisely compliant > data structure; everyone else had to write a loop anyway.) > I agree with that. I liked the simplicity and non-mutability of a sequence of tuples. Look forward to hearing what a wider audience thinks, after PEPing! >>Content-transfer-encoding is assumed to be 7bit if not present, its not >>required by MIME. 7bit would be wrong for a lot of HTTP responses though. > > In the current spec, the server is already required to ensure validity of > the headers; this would just be a specific mention of one example of that. 
> Ok -Tony From brsizer at kylotan.eidosnet.co.uk Thu Aug 26 20:30:22 2004 From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer) Date: Thu Aug 26 20:28:52 2004 Subject: [Web-SIG] Regarding the WSGI draft Message-ID: <412E2C3E.7000900@kylotan.eidosnet.co.uk> I've read through the draft and most of the messages on this list that followed it. However, I have a basic problem with it which I will attempt to summarise below. The focus seems to be on making frameworks more portable. The abstract reads "This document specifies a proposed standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers." This is all well and good, but the implications from that point onwards are that we're firmly dealing with frameworks rather than applications. Phillip J. Eby has commented on Ian Bicking's blog that "at this stage, the benefits of WSGI are primarily for web *framework* authors, and web *server* authors, not web *application* authors. This is *not* an application API, it's a framework-to-server glue API." This immediately strikes me as odd, because from my previous development experience frameworks are not that important. In fact, I'm heavily inclined to believe that Python only has a proliferation of frameworks because of the currently poor degree of higher level support for web development in general, and the various frameworks attempt to bridge that gap. Create better general web support for Python, and frameworks will only be necessary for the really heavy duty applications. Create the ability to make frameworks more portable, and all you do is encourage more people to develop more frameworks. Focusing on making life easier for framework developers is solving the wrong problem, in my opinion. I come from an ASP and PHP background and generally speaking, a developer doesn't want or need a framework between their code and the web-scripting language when developing on those platforms. 
On the rare occasions that you do use a framework (such as PHP-Nuke) it's because you want to simplify high level activities like news management and user lists, and allow people to add content without needing to know HTML. By contrast Python's frameworks tend to address the trivial, low level things that should fall under the 'batteries included' philosophy that Python subscribes to. The front page of this Python Web-SIG suggests, "pick a Web framework that already exists, make a functionality checklist from it, and add that functionality to a new webserver module." I think that's what is needed most of all - some sort of standard approach that new Python programmers can jump right in and use, which doesn't require choosing one of several different frameworks. What I'd like to see is something mirroring the Python Database API. For instance, I might have to change "import MySQLdb" to "import pyPgSQL" but I know that 99% of the rest of the database code will work fine. As a web developer I would like to be able to change "import cgi" to "import mod_python" or "import fastcgi" and know that, if I follow a standard set of calls, I will have a simple and standard way of producing a web document. The standardised access to the output and input streams in the current draft is all well and good but there's little point in me making use of that abstraction if I still have to rely on extra modules for access to useful higher-level concepts such as: - dispatching control flow based on the URI - session management and cookies - GET/query string parsing - POST/form parsing - ASP + PHP style templating If these things are coming soon in future WSGI drafts, then great! But I got the impression that these features were being delegated out to the legion of frameworks. I am aware that this all sounds very negative, and I don't mean to criticise the hard work that Phillip and others have put into this draft specification. 
I just worry that it diverts attention from what I consider to be the real issue facing Python on the web, which is making life easier for web application developers, not framework developers. -- Ben Sizer From mnot at mnot.net Thu Aug 26 20:51:06 2004 From: mnot at mnot.net (Mark Nottingham) Date: Thu Aug 26 20:51:11 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <412E2C3E.7000900@kylotan.eidosnet.co.uk> References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> Message-ID: Hi Ben, I understand where you're coming from, but I think we're in a different situation here. There are a lot of different ways that you can construct an application framework; there is no "one true way," because people have varying requirements for a Web application. Contrast this with databases, which are for the most part a commodity; you can plug in different databases because they all have the same conceptual model of how a database works. There has been some progress towards convergence on a common view of what a Web application is, but I still think we have a ways to go, and much to learn, before any one application framework can declare victory. That being the case, WSGI provides something that's incredibly valuable; as long as it maintains the right level of abstraction, it allows application frameworks to avoid worrying about the details of a particular server implementation. I'm pleased as punch with it, because it lets me avoid doing that when I write my own application framework (details forthcoming ;). Cheers, On Aug 26, 2004, at 11:30 AM, Ben Sizer wrote: > I've read through the draft and most of the messages on this list that > followed it. However, I have a basic problem with it which I will > attempt to summarise below. > > The focus seems to be on making frameworks more portable. 
The abstract > reads "This document specifies a proposed standard interface between > web servers and Python web applications or frameworks, to promote web > application portability across a variety of web servers." This is all > well and good, but the implications from that point onwards are that > we're firmly dealing with frameworks rather than applications. Phillip > J. Eby has commented on Ian Bicking's blog that "at this stage, the > benefits of WSGI are primarily for web *framework* authors, and web > *server* authors, not web *application* authors. This is *not* an > application API, it's a framework-to-server glue API." > > This immediately strikes me as odd, because from my previous > development experience frameworks are not that important. In fact, I'm > heavily inclined to believe that Python only has a proliferation of > frameworks because of the currently poor degree of higher level > support for web development in general, and the various frameworks > attempt to bridge that gap. Create better general web support for > Python, and frameworks will only be necessary for the really heavy > duty applications. Create the ability to make frameworks more > portable, and all you do is encourage more people to develop more > frameworks. Focusing on making life easier for framework developers is > solving the wrong problem, in my opinion. > > I come from an ASP and PHP background and generally speaking, a > developer doesn't want or need a framework between their code and the > web-scripting language when developing on those platforms. On the rare > occasions that you do use a framework (such as PHP-Nuke) it's because > you want to simplify high level activities like news management and > user lists, and allow people to add content without needing to know > HTML. By contrast Python's frameworks tend to address the trivial, low > level things that should fall under the 'batteries included' > philosophy that Python subscribes to. 
> > The front page of this Python Web-SIG suggests, "pick a Web framework > that already exists, make a functionality checklist from it, and add > that functionality to a new webserver module." I think that's what is > needed most of all - some sort of standard approach that new Python > programmers can jump right in and use, which doesn't require choosing > one of several different frameworks. > > What I'd like to see is something mirroring the Python Database API. > For instance, I might have to change "import MySQLdb" to "import > pyPgSQL" but I know that 99% of the rest of the database code will > work fine. As a web developer I would like to be able to change > "import cgi" to "import mod_python" or "import fastcgi" and know that, > if I follow a standard set of calls, I will have a simple and standard > way of producing a web document. The standardised access to the output > and input streams in the current draft is all well and good but > there's little point in me making use of that abstraction if I still > have to rely on extra modules for access to useful higher-level > concepts such as: > > - dispatching control flow based on the URI > - session management and cookies > - GET/query string parsing > - POST/form parsing > - ASP + PHP style templating > > If these things are coming soon in future WSGI drafts, then great! But > I got the impression that these features were being delegated out to > the legion of frameworks. > > I am aware that this all sounds very negative, and I don't mean to > criticise the hard work that Phillip and others have put into this > draft specification. I just worry that it diverts attention from what > I consider to be the real issue facing Python on the web, which is > making life easier for web application developers, not framework > developers. 
> > -- > Ben Sizer > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-sig/mnot%40mnot.net > -- Mark Nottingham http://www.mnot.net/ From brsizer at kylotan.eidosnet.co.uk Thu Aug 26 21:46:07 2004 From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer) Date: Thu Aug 26 21:48:28 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> Message-ID: <412E3DFF.2000605@kylotan.eidosnet.co.uk> Mark Nottingham wrote: > I understand where you're coming from, but I think we're in a different > situation here. There are a lot of different ways > that you can construct an application framework; there is no "one true > way," because people have varying requirements for a Web application. ... > There has been some progress towards convergence on a common view of > what a Web application is, but I still think we have a ways to go, and > much to learn, before any one application framework can declare victory. Although what you say makes sense on the surface, the fact remains that technologies such as ASP and PHP are popular and useful because they present a simple and standard interface to the user, whether that user is writing a 4 line script, a small application, or a large framework upon which to base other applications. With Python you seem stuck with two equally unappealing options: slow CGI if you want a simple script, where simple is relative since you need to fool around with os.environ, printing your own headers, etc - or a complex and idiosyncratic framework if you want anything non-trivial, but which is often just as complex as PHP straight out of the box, except with a much smaller user base and generally less documentation. For example, you know that $_GET[varName] is going to be the standard way of accessing a querystring variable in PHP. 
Yet in Python it could be part of a request.form dictionary, or cgi.parse_qs(os.environ['QUERY_STRING']), or modpython.util.parse_qs(req.parsed_uri[7]), etc. Yet we know that query strings are part of the RFC2396 standard, so why not have a standard module or interface to present to the user? I don't see any good reason for this sort of variance, except that there's a bias towards accommodating these existing frameworks rather than enabling simpler applications of the future, and which I think is a symptom of the problem rather than part of the solution. -- Ben Sizer. From pje at telecommunity.com Thu Aug 26 21:59:07 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Aug 26 21:58:48 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <412E2C3E.7000900@kylotan.eidosnet.co.uk> Message-ID: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> At 07:30 PM 8/26/04 +0100, Ben Sizer wrote: >I just worry that it diverts attention from what I consider to be the real >issue facing Python on the web, which is making life easier for web >application developers, not framework developers. Unfortunately, every effort to date to create a "framework to end all frameworks" has simply resulted in the existence of framework N+1. Why? Because the creation of a *new* framework means that there is no existing code that uses it. And if the framework only provides features that others already have, there's no compelling reason to switch. Any approach that ignores the economic reality of present-day Python web apps, and provides no way for them to migrate gradually to a new standard, is doomed to niche status at best. (Comparison to ASP and PHP is misleading: both had standards for dispatching, sessions, cookies, form parsing, and templating *when they were created*, so there was no legacy codebase using alternative solutions that had to be migrated.) 
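For comparison with Ben's `$_GET[varName]` example, the sketch below reads query parameters from a WSGI-style `environ` dictionary using only the standard library. It assumes Python 3, where `urllib.parse.parse_qs` takes the place of the old `cgi.parse_qs`; the helper names are hypothetical, and having `get_param` return the first value is a design choice, not a WSGI rule.

```python
from urllib.parse import parse_qs


def query_params(environ):
    """Parse QUERY_STRING from a WSGI-style environ into {name: [values]}."""
    # parse_qs keeps repeated names as lists and drops blank values by default.
    return parse_qs(environ.get("QUERY_STRING", ""))


def get_param(environ, name, default=None):
    """Return the first value for name, roughly like PHP's $_GET[name]."""
    values = query_params(environ).get(name)
    return values[0] if values else default
```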
And so, the only way we're going to "steal" the marketshare of existing frameworks is with the consent and co-operation of the developers of those frameworks. That means there has to be enough benefit for them to justify the effort of getting on board. So, please allow me to reveal my top-secret plan for total world domination... :) First, the current situation. Choice of framework is a high investment for users, because once they choose, they are stuck with that framework and possibly server. The cost to switch is extremely high. It's almost as though every plumbing manufacturer makes their own sizes of pipes and connectors, so once you choose a vendor, you're stuck with them. WSGI changes this scenario by introducing competitive pressure to the server/framework choice. As soon as enough framework and server developers participate, the others are pushed by network effects to do the same. Users ask, "Why can't I use your framework in any WSGI server?" and "Why can't I use any WSGI framework in your server?", pushing the slower adopters to either join up or be marginalized. But this is just the first phase: standardizing on a size for one kind of pipe. It's not very glamorous, but it fundamentally changes the marketplace, and causes many things to appear to spontaneously happen "on their own". First, users can experiment with other frameworks, especially if those frameworks are lightweight. This builds competitive pressure in the direction of lightweight, easy-to-integrate frameworks. So framework developers begin to break their monolithic approaches down into smaller pieces that operate on segments of WSGI. For example, a session service that you pass the incoming 'environ' and outgoing 'headers' to, in order for it to read and set cookies. (Notice that this *isn't* a WSGI-defined or standardized service, just a service implemented *in terms of* WSGI.) 
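A minimal sketch of the kind of session service described here, one implemented in terms of WSGI rather than defined by it, might look like this. Everything in it is hypothetical: the class name, the `session_id` cookie name, the in-memory store, and the assumption that outgoing headers are a list of (name, value) tuples as in the earlier draft. It relies on the standard library's `http.cookies` and `secrets` modules (Python 3).

```python
import secrets
from http.cookies import SimpleCookie

SESSION_COOKIE = "session_id"  # hypothetical cookie name


class SessionService:
    """Hypothetical in-memory session service built on top of WSGI data.

    It reads the request's cookies from environ and, when it creates a new
    session, appends a Set-Cookie header to the outgoing header list.
    """

    def __init__(self):
        self._store = {}  # session id -> session dict (lost on restart)

    def load(self, environ, headers):
        """Return the session dict for this request, creating one if needed."""
        cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        morsel = cookie.get(SESSION_COOKIE)
        if morsel is not None and morsel.value in self._store:
            return self._store[morsel.value]

        # Unknown or missing id: start a fresh session and tell the client.
        sid = secrets.token_hex(16)
        self._store[sid] = {}
        out = SimpleCookie()
        out[SESSION_COOKIE] = sid
        out[SESSION_COOKIE]["path"] = "/"
        headers.append(("Set-Cookie", out[SESSION_COOKIE].OutputString()))
        return self._store[sid]
```

A real service would also need expiry and a persistent store; the point is only that the interface it depends on is `environ` plus the outgoing headers.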
Such a service makes little sense to implement today, but people will spontaneously begin developing such services once WSGI is a ubiquitous part of the Python web development landscape. It's the most natural thing in the world for them to do so, not only because it means a wider audience for their service, but because they're likely developing it for a WSGI-based environment they're already using. What other platform would they write it for? Because these services will be interchangeable to some degree, lock-in is limited and competition will determine a winner or winners. Then, if the winners are sufficiently similar to allow useful standardization, that's the natural next step. But, for some services, the differences will be important qualitative differences, and standardization would reduce meaningful choice. We don't know in advance what these services should be, and we don't know enough to standardize on them now. For someone with an ASP or PHP background, that last statement at least might sound like sheer lunacy. But, Python web frameworks have often pioneered techniques years ahead of their appearance in ASP, PHP, and Java frameworks. I would hate for us to lose our next great innovation to premature standardization. But luckily, I don't need to worry: there's simply no way you'll get enough Python framework developers (and their users) to agree on such a standardization. For one thing, it's not in their best interests to do so. (Don't let me discourage you from trying, though, if that's what you want to do. I just don't think you'll have much success, and am not interested in trying it myself.) Anyway, there it is. My secret plan to fundamentally alter the Python web programming universe through secret mind-control market manipulation and social engineering. You found me out. Now I'll have to kill you.* :) * "And I'd have gotten away with it too, if it hadn't been for those meddling kids..." 
(Disclaimer for non-US readers: the above is a humorous reference to an American TV cartoon that featured a different character saying this line each week, after their nefarious plans were foiled. It's not me calling anybody a meddling kid, or threatening to actually kill anyone!) From rjkimble at alum.mit.edu Thu Aug 26 22:11:38 2004 From: rjkimble at alum.mit.edu (Bob Kimble) Date: Thu Aug 26 22:11:47 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <412E3DFF.2000605@kylotan.eidosnet.co.uk> References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> <412E3DFF.2000605@kylotan.eidosnet.co.uk> Message-ID: <200408261611.38389.rjkimble@alum.mit.edu> On Thursday 26 August 2004 03:46 pm, Ben Sizer wrote: > Mark Nottingham wrote: > > I understand where you're coming from, but I think we're in a different > > situation here. There are a lot of different ways > > that you can construct an application framework; there is no "one true > > way," because people have varying requirements for a Web application. > > ... > > > There has been some progress towards convergence on a common view of > > what a Web application is, but I still think we have a ways to go, and > > much to learn, before any one application framework can declare victory. > > Although what you say makes sense on the surface, the fact remains that > technologies such as ASP and PHP are popular and useful because they > present a simple and standard interface to the user, whether that user > is writing a 4 line script, a small application, or a large framework > upon which to base other applications. 
With Python you seem stuck with > two equally unappealing options: slow CGI if you want a simple script, > where simple is relative since you need to fool around with os.environ, > printing your own headers, etc - or a complex and idiosyncratic > framework if you want anything non-trivial, but which is often just as > complex as PHP straight out of the box, except with a much smaller user > base and generally less documentation. > > For example, you know that $_GET[varName] is going to be the standard > way of accessing a querystring variable in PHP. Yet in Python it could > be part of a request.form dictionary, or > cgi.parse_qs(os.environ['QUERY_STRING']), or > modpython.util.parse_qs(req.parsed_uri[7]), etc. Yet we know that query > strings are part of the RFC2396 standard, so why not have a standard > module or interface to present to the user? > > I don't see any good reason for this sort of variance, except that > there's a bias towards accommodating these existing frameworks rather > than enabling simpler applications of the future, and which I think is a > symptom of the problem rather than part of the solution. I have been reading this thread for a while now, and I haven't commented because I have done absolutely no web development using Python. However, Mark's comments strike me as being dead on. I'm used to the Java Servlet API, which creates an API for servlets and JSP pages. The fact that there are several high quality application servers that all support this API suggests to me that creating something similar for Python makes a lot of sense. I have written JSP's and servlets and run them under Tomcat, but I know that I could just as easily run them under WebSphere, WebLogic, JRun, or any others that support the API. It seems to me that creating a similar API for Python would be terrific. Of course, somebody would also have to write an application server to support the API, but I suspect some of the existing frameworks could be revamped to support it. 
Anyway, that's my 2 cents. I would love to see something similar to Tomcat and the Java Servlet API for Python. From titus at caltech.edu Thu Aug 26 22:25:10 2004 From: titus at caltech.edu (Titus Brown) Date: Thu Aug 26 22:23:03 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <200408261611.38389.rjkimble@alum.mit.edu> References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> <412E3DFF.2000605@kylotan.eidosnet.co.uk> <200408261611.38389.rjkimble@alum.mit.edu> Message-ID: <20040826202510.GA5704@caltech.edu> -> I have been reading this thread for a while now, and I haven't commented -> because I have done absolutely no web development using Python. However, -> Mark's comments strike me as being dead on. I'm used to the Java Servlet API, -> which creates an API for servlets and JSP pages. The fact that there are -> several high quality application servers that all support this API suggests -> to me that creating something similar for Python makes a lot of sense. I have -> written JSP's and servlets and run them under Tomcat, but I know that I could -> just as easily run them under WebSphere, WebLogic, JRun, or any others that -> support the API. It seems to me that creating a similar API for Python would -> be terrific. Of course, somebody would also have to write an application -> server to support the API, but I suspect some of the existing frameworks -> could be revamped to support it. Anyway, that's my 2 cents. I would love to -> see something similar to Tomcat and the Java Servlet API for Python. I've implemented packages at the adapter level (PyWX), the framework level (crud that was never released because I found Quixote first), and the content level (based variously on CGI, WebWare, and Quixote). I'm moderately skeptical of the short term use of the API being developed on this list, because in practice it is relatively easy to implement a framework that fits on top of all of the existing adapters (CGI, mod_python, etc.) 
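To make the adapter layer concrete, here is a hypothetical CGI-style gateway for the draft calling convention used in this thread, where `start(status, headers)` returns a `write()` callable. It assumes headers arrive as a list of (name, value) tuples (a `Message` would be passed as `headers.items()`, as Tony suggests) and writes to any text stream so it can be exercised without a web server. It is a sketch, not the interface the final PEP settled on.

```python
def run_draft_app(application, environ, out):
    """Run a draft-style application, writing a CGI response to the stream out."""

    def start(status, headers):
        # CGI scripts report the HTTP status through a "Status:" header.
        out.write("Status: %s\r\n" % status)
        for name, value in headers:
            out.write("%s: %s\r\n" % (name, value))
        out.write("\r\n")  # blank line ends the header block
        return out.write   # the application writes body data through this

    # Only the write() style from the sample is handled; a return value is ignored.
    application(environ, start)
```

In a real CGI script `environ` would be `dict(os.environ)` and `out` would be `sys.stdout`; each additional server needs only its own version of this one function, rather than one adapter per framework.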
Medium term, I think it will lead to a welcome homogenization of server <--> adapter <--> framework interaction, and so I think it's a valuable concept. The idea of having a single framework (like Java's "servlets") is, I think, silly. Having implemented sites in several of the existing frameworks, it is clear that there are several different ways to conceptualize the development of Web sites: the Quixote style and the WebWare style are two very distinct examples. Anything that cuts down on the variety of available frameworks is going to restrict the options, which is bad. However, I think it is incumbent upon the developers and users of the different frameworks to clearly distinguish between the various options. Right now it is very confusing to me, and I've been developing Web sites in Python for 5 years ;). I'm very confused as to why you need multiple servlet implementations in Java. Wouldn't one do just as well as 10? It sounds like having 5 different implementations of the 'os' module in Python... --titus From pje at telecommunity.com Thu Aug 26 22:45:01 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Aug 26 22:44:39 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <20040826202510.GA5704@caltech.edu> References: <200408261611.38389.rjkimble@alum.mit.edu> <412E2C3E.7000900@kylotan.eidosnet.co.uk> <412E3DFF.2000605@kylotan.eidosnet.co.uk> <200408261611.38389.rjkimble@alum.mit.edu> Message-ID: <5.1.1.6.0.20040826163917.01efdec0@mail.telecommunity.com> At 01:25 PM 8/26/04 -0700, Titus Brown wrote: >in practice it is relatively easy >to implement a framework that fits on top of all of the existing >adapters (CGI, mod_python, etc.) How about FastCGI? Medusa? Twisted? ZServer? SCGI and PCGI? ReadyExec? It's only "relatively easy" in that you can define your own WSGI-like protocol, make adapters for some subset of the existing servers and gateways, and then document that protocol. 
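A concrete adapter makes the N*M argument easier to see. The sketch below is a single gateway, plain CGI, written once so that any WSGI application can run behind it. It is closely modelled on the `run_with_cgi` example later published in PEP 333 (the final form of this proposal, not the August 2004 draft discussed here) and has been adapted to Python 3 byte strings. The optional `environ`, `stdin` and `stdout` parameters are this sketch's own addition, so it can be exercised without a real CGI environment.

```python
import os
import sys


def run_with_cgi(application, environ=None, stdin=None, stdout=None):
    """Serve one CGI request through a WSGI application (sketch after PEP 333)."""
    base = dict(os.environ if environ is None else environ)
    stdin = stdin if stdin is not None else sys.stdin.buffer
    stdout = stdout if stdout is not None else sys.stdout.buffer

    base["wsgi.input"] = stdin
    base["wsgi.errors"] = sys.stderr
    base["wsgi.version"] = (1, 0)
    base["wsgi.multithread"] = False
    base["wsgi.multiprocess"] = True
    base["wsgi.run_once"] = True
    base["wsgi.url_scheme"] = "https" if base.get("HTTPS", "off") in ("on", "1") else "http"

    headers_set = []   # status/headers passed to start_response
    headers_sent = []  # status/headers actually written to stdout

    def write(data):
        if not headers_set:
            raise AssertionError("write() before start_response()")
        if not headers_sent:
            # CGI reports the status as a "Status:" header, then a blank line.
            status, response_headers = headers_sent[:] = headers_set
            stdout.write(("Status: %s\r\n" % status).encode("latin-1"))
            for name, value in response_headers:
                stdout.write(("%s: %s\r\n" % (name, value)).encode("latin-1"))
            stdout.write(b"\r\n")
        stdout.write(data)
        stdout.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Too late to change the response: re-raise the original error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # avoid a reference cycle
        elif headers_set:
            raise AssertionError("start_response() called twice")
        headers_set[:] = [status, response_headers]
        return write

    result = application(base, start_response)
    try:
        for data in result:
            if data:
                write(data)
        if not headers_sent:
            write(b"")  # send the headers even if the body was empty
    finally:
        if hasattr(result, "close"):
            result.close()
```

Each server or gateway needs one such adapter, and each framework needs only to be a WSGI application. That replaces one adapter per framework/server pair.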
There's no sense in duplicating those efforts for each framework and each server or gateway in an N*M explosion, especially since the coverage in practice is quite incomplete, despite being "relatively easy" in principle. From fumanchu at amor.org Thu Aug 26 22:50:12 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Aug 26 22:55:40 2004 Subject: [Web-SIG] Regarding the WSGI draft Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E8C@exchange.hqamor.amorhq.net> Phillip J. Eby wrote: > First, the current situation. Choice of framework is a high > investment for users, because once they choose, they are stuck > with that framework and possibly server. The cost to switch > is extremely high. It's almost as though every plumbing > manufacturer makes their own sizes of pipes and connectors, > so once you choose a vendor, you're stuck with them. > > WSGI changes this scenario by introducing competitive pressure to the > server/framework choice. As soon as enough framework and > server developers participate, the others are pushed by network > effects to do the same. Users ask, "Why can't I use your > framework in any WSGI server?" and "Why can't I use any WSGI > framework in your server?", pushing the slower adopters to either > join up or be marginalized. It's on my to-do list for *my* framework already... ;) just-a-data-point-in-support-ly-yrs, Robert Brewer MIS Amor Ministries fumanchu@amor.org From mnot at mnot.net Thu Aug 26 23:27:58 2004 From: mnot at mnot.net (Mark Nottingham) Date: Thu Aug 26 23:28:02 2004 Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. 
Eby) In-Reply-To: <5.1.1.6.0.20040825005443.02ac8060@mail.telecommunity.com> References: <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <412B6C33.9080102@colorstudy.com> <6654eac4040824075242be15dd@mail.gmail.com> <20040824100006.1E15F1E400A@bag.python.org> <6654eac4040824075242be15dd@mail.gmail.com> <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com> <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com> <5.1.1.6.0.20040825005443.02ac8060@mail.telecommunity.com> Message-ID: On Aug 24, 2004, at 10:00 PM, Phillip J. Eby wrote: > In the meantime, I'm fine with headers remaining as they were in the > previous draft: i.e. a sequence of tuples. +1 to this or the email.Message solution; there are lots of different ways to add value to the way that headers are exposed, but let's keep it simple and conservative in WSGI. Cheers, -- Mark Nottingham http://www.mnot.net/ From ianb at colorstudy.com Fri Aug 27 00:08:37 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Aug 27 00:10:40 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <412E2C3E.7000900@kylotan.eidosnet.co.uk> References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> Message-ID: <412E5F65.9080508@colorstudy.com> Responding more generally on this thread; or, more generally, here's What The WSGI Means To Me... It's not so much that you can attach servers and frameworks independently. That's nice, but it's not a huge deal. WSGI is, to me, the beginning of a common language about HTTP requests, a standard way to represent that request. It's not the most awesome, easiest to use representation of these objects, but I don't think that's a reasonable goal, those qualities are too subjective. WSGI's request and response are what we can manage, trying to make everyone happy. And it's not so bad, because while it's not featureful, it's *really simple*. That's a decent compromise. 
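Ian's point that WSGI's request and response are *really simple* maps almost one-to-one onto code. The following is a minimal sketch in the shape WSGI eventually took in PEP 333; the draft under discussion here may differ in details. The request is just the `environ` dictionary. The response is a status string, headers as a list of `(name, value)` tuples (the sequence-of-tuples form Mark endorses above), and an iterable of byte strings for the body. `hello_app` is a hypothetical name.

```python
def hello_app(environ, start_response):
    """A minimal WSGI application (hypothetical example)."""
    # The request: everything the server knows arrives in the environ dict.
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    body = ("%s %s\n" % (method, path)).encode("utf-8")

    # The response: a status string, headers as (name, value) tuples,
    # and an iterable of byte strings for the body.
    status = "200 OK"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

Any WSGI server can host it. For local experiments, the standard library's `wsgiref.simple_server.make_server("", 8000, hello_app).serve_forever()` is enough.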
The request is the environment dictionary that WSGI defines; the response is the status plus headers plus written body plus iterable body. And it's okay that it's this simple, because it's a straightforward mapping of HTTP with little information lost, and HTTP is obviously fairly central to this all. But even though it's simple and adds no real features, nor does it enable anything new, it's still interesting because it gives us a standard way of communicating (programmatically). We don't have that right now. Ben's right, there's a lot of work to be done to make a good, simple, Python web development environment. WSGI makes it possible to work towards that goal incrementally and in a distributed fashion, without competing. Right now everyone who develops on a framework is competing with everyone developing on some other framework. It's just too big of a problem space to have to compete on a large scale, with the entire environment being take-it-or-leave-it. But I don't think the developers actually *want* to compete, it's just been a technical necessity. So, a bit like Phillip, I think WSGI isn't an end in itself, but it could be key in enabling further progress. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From mnot at mnot.net Fri Aug 27 00:57:17 2004 From: mnot at mnot.net (Mark Nottingham) Date: Fri Aug 27 00:57:22 2004 Subject: [Web-SIG] Other kinds of environment variables Message-ID: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net> One thing that seems to me to be missing in WSGI is the communication of the delineation between what the server does and what the application does. The latest draft says: [[[ In general, the server or gateway is responsible for ensuring that correct headers are sent to the client: if an application omits a needed header, the server or gateway *should* add it. [...] If the application supplies a header that the server would ordinarily supply, or that contradicts the server's intended behaviour [...]
the server or gateway *may* discard the conflicting header, provided that its action is recorded for the benefit of the application author. ]]] I'm a bit uncomfortable with this, because there's no standard way for the action to be "recorded for the benefit of the application author." IMO this is one of the major problems with CGI. In other words, there's a laundry list of HTTP features that may or may not be handled by the server on behalf of the application, depending on how it's written and configured. Giving the application some idea of what it can expect the server to do, and how it will do it, would help application frameworks decide which tasks they need to take on themselves. For example:

* HTTP auth - does the server make the Authorization header available? Automatically generate 401s when configured to require auth? If the application framework wishes to perform auth on its own, will it have the appropriate information available?
* chunked encoding - does the server chunk the body when appropriate?
* content-length - does the server automatically calculate it?
* cache validation - does the server handle If-Modified-Since and If-None-Match requests appropriately (e.g., with a 304)?
* content-encoding - does the server apply content-encoding in requests and/or responses as appropriate, and what schemes does it support?
* transfer-encoding - same as content-encoding

Some servers (e.g., CGI) may not be able to supply all of this information reliably, but others will, and it would be quite useful to frameworks to know the capabilities of the server in a generic fashion. I know that this can be addressed by server-specific environment entries now, but I think there might be some low-hanging fruit for common functions like the ones above. It might be that they'd be better in a separate document, so they're not part of the 'core' WSGI, but I think there's real value in having some common ones. Thoughts? If there's interest, I'll make a proposal.
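In principle, Mark's capability question could be answered through the environ itself. The sketch below is purely illustrative. The key `x.server_capabilities` and the feature names in it are hypothetical and are not defined by WSGI or by any server. Only the idea comes from the message above: the application fills in a header only when the server does not promise to. The sketch shows the Content-Length case.

```python
# HYPOTHETICAL: no WSGI draft defines this key; it only illustrates the idea
# of a server advertising which HTTP chores it performs itself.
CAPS_KEY = "x.server_capabilities"


def server_handles(environ, feature):
    """True only if the server explicitly advertises that it handles `feature`."""
    return bool(environ.get(CAPS_KEY, {}).get(feature, False))


def respond(environ, start_response, status, headers, chunks):
    """Send a response, computing Content-Length only when the server won't."""
    headers = list(headers)
    present = {name.lower() for name, _ in headers}
    if "content-length" not in present and not server_handles(environ, "content-length"):
        # No promise from the server: buffer the body and measure it ourselves.
        body = b"".join(chunks)
        headers.append(("Content-Length", str(len(body))))
        chunks = [body]
    start_response(status, headers)
    return chunks
```

The same pattern would extend to the other items on the list (chunking, conditional GET, content-encoding), each behind its own advertised flag.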
-- Mark Nottingham http://www.mnot.net/ From mnot at mnot.net Fri Aug 27 01:03:36 2004 From: mnot at mnot.net (Mark Nottingham) Date: Fri Aug 27 01:03:39 2004 Subject: [Web-SIG] expect/continue Message-ID: <291B974B-F7B4-11D8-82BE-000A95BD86C0@mnot.net> Phillip, I like the Expect/Continue language in the latest draft -- thanks! One thing: the first bullet point gives servers and gateways the option of "Reject[ing] all client requests containing an Expect: 100-continue header with a '417 Expectation failed' error." This doesn't seem like a good thing to allow, because it makes server implementations that take this path reject ALL requests that use expect/continue, with no recourse. The intent of Expect/Continue is that it should fall back to normal operation (the request gets sent and processed) unless it is explicitly rejected. So, I think this option should be removed. I can see some scenarios where the server can and will be configured to reject all requests over a certain size, etc., but rejecting all requests that use this mechanism indiscriminately doesn't seem to fall into that case. If an implementation doesn't want to deal with expect/continue at all, it has two choices: 1) don't claim to be HTTP/1.1 conformant; 2) wait until the client decides you don't support expect/continue, and sends the request body (this is suboptimal, for obvious reasons). Cheers, -- Mark Nottingham http://www.mnot.net/ From floydophone at gmail.com Fri Aug 27 01:19:24 2004 From: floydophone at gmail.com (Peter Hunt) Date: Fri Aug 27 01:19:34 2004 Subject: [Web-SIG] Re: Web-SIG Digest, Vol 10, Issue 26 In-Reply-To: <20040826202307.7DCE81E400A@bag.python.org> References: <20040826202307.7DCE81E400A@bag.python.org> Message-ID: <6654eac404082616195fc15079@mail.gmail.com> I believe we can achieve the best of both worlds. We should implement a Servlet-like interface which works atop WSGI, which includes session management, caching, and pooling.
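A servlet-style interface layered on WSGI could start small: a base class whose instances are WSGI applications and which dispatches on `REQUEST_METHOD` to `do_GET`, `do_POST` and so on, much like `doGet`/`doPost` in the Java Servlet API. The following is a hypothetical sketch. `Servlet` and `HelloServlet` are invented names, and session management, caching and pooling are left out. It is not jonpy, Webware or any other existing framework.

```python
class Servlet:
    """Hypothetical servlet-style base class; each instance is a WSGI app."""

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        handler = getattr(self, "do_" + method, None)
        if handler is None:
            # Advertise the methods this servlet does implement.
            allowed = sorted(name[3:] for name in dir(self) if name.startswith("do_"))
            start_response("405 Method Not Allowed",
                           [("Content-Type", "text/plain"),
                            ("Allow", ", ".join(allowed))])
            return [b"Method Not Allowed"]
        # Handlers return (status, headers, body_bytes).
        status, headers, body = handler(environ)
        start_response(status, headers)
        return [body]


class HelloServlet(Servlet):
    """Example subclass: only GET is implemented."""

    def do_GET(self, environ):
        return "200 OK", [("Content-Type", "text/plain")], b"Hello"
```

Because each servlet instance is an ordinary WSGI callable, it could be served by any WSGI server without that server knowing anything about servlets.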
We should include this in the standard Python distribution and call it the official framework. This servlet library should have the exact same interface as a currently existing framework. At first glance, I'd say we should just port jonpy to WSGI and include it in the Python distribution, but other viable alternatives are WebWare servlets, Snakelets, and WebStack. What do you think? On Thu, 26 Aug 2004 22:23:07 +0200 (CEST), web-sig-request@python.org wrote: > Today's Topics: > > 1. Regarding the WSGI draft (Ben Sizer) > 2. Re: Regarding the WSGI draft (Mark Nottingham) > 3. Re: Regarding the WSGI draft (Ben Sizer) > 4. Re: Regarding the WSGI draft (Phillip J. Eby) > 5. Re: Regarding the WSGI draft (Bob Kimble) > 6. Re: Regarding the WSGI draft (Titus Brown) > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 26 Aug 2004 19:30:22 +0100 > From: Ben Sizer > Subject: [Web-SIG] Regarding the WSGI draft > To: web-sig@python.org > Message-ID: <412E2C3E.7000900@kylotan.eidosnet.co.uk> > Content-Type: text/plain; charset=us-ascii; format=flowed > > I've read through the draft and most of the messages on this list that > followed it. However, I have a basic problem with it which I will > attempt to summarise below. > > The focus seems to be on making frameworks more portable.
The abstract > reads "This document specifies a proposed standard interface between web > servers and Python web applications or frameworks, to promote web > application portability across a variety of web servers." This is all > well and good, but the implications from that point onwards are that > we're firmly dealing with frameworks rather than applications. Phillip > J. Eby has commented on Ian Bicking's blog that "at this stage, the > benefits of WSGI are primarily for web *framework* authors, and web > *server* authors, not web *application* authors. This is *not* an > application API, it's a framework-to-server glue API." > > This immediately strikes me as odd, because from my previous development > experience frameworks are not that important. In fact, I'm heavily > inclined to believe that Python only has a proliferation of frameworks > because of the currently poor degree of higher level support for web > development in general, and the various frameworks attempt to bridge > that gap. Create better general web support for Python, and frameworks > will only be necessary for the really heavy duty applications. Create > the ability to make frameworks more portable, and all you do is > encourage more people to develop more frameworks. Focusing on making > life easier for framework developers is solving the wrong problem, in my > opinion. > > I come from an ASP and PHP background and generally speaking, a > developer doesn't want or need a framework between their code and the > web-scripting language when developing on those platforms. On the rare > occasions that you do use a framework (such as PHP-Nuke) it's because > you want to simplify high level activities like news management and user > lists, and allow people to add content without needing to know HTML. By > contrast Python's frameworks tend to address the trivial, low level > things that should fall under the 'batteries included' philosophy that > Python subscribes to. 
> > The front page of this Python Web-SIG suggests, "pick a Web framework > that already exists, make a functionality checklist from it, and add > that functionality to a new webserver module." I think that's what is > needed most of all - some sort of standard approach that new Python > programmers can jump right in and use, which doesn't require choosing > one of several different frameworks. > > What I'd like to see is something mirroring the Python Database API. For > instance, I might have to change "import MySQLdb" to "import pyPgSQL" > but I know that 99% of the rest of the database code will work fine. As > a web developer I would like to be able to change "import cgi" to > "import mod_python" or "import fastcgi" and know that, if I follow a > standard set of calls, I will have a simple and standard way of > producing a web document. The standardised access to the output and > input streams in the current draft is all well and good but there's > little point in me making use of that abstraction if I still have to > rely on extra modules for access to useful higher-level concepts such as: > > - dispatching control flow based on the URI > - session management and cookies > - GET/query string parsing > - POST/form parsing > - ASP + PHP style templating > > If these things are coming soon in future WSGI drafts, then great! But I > got the impression that these features were being delegated out to the > legion of frameworks. > > I am aware that this all sounds very negative, and I don't mean to > criticise the hard work that Phillip and others have put into this draft > specification. I just worry that it diverts attention from what I > consider to be the real issue facing Python on the web, which is making > life easier for web application developers, not framework developers. 
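The first item on that list, dispatching on the URI, is the kind of piece that can be written once on top of WSGI instead of once per framework. The following is a minimal sketch. It assumes the PEP 333 convention that `SCRIPT_NAME` holds the part of the path already consumed and `PATH_INFO` holds the remainder. `make_dispatcher` and the route table are hypothetical. The standard library's `wsgiref.util.shift_path_info` performs a similar shift of one path segment.

```python
def make_dispatcher(routes, not_found=None):
    """Return a WSGI app that dispatches on the first PATH_INFO segment.

    `routes` maps a segment (e.g. "blog") to another WSGI application.
    """

    def default_not_found(environ, start_response):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]

    fallback = not_found or default_not_found

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        # "/blog/2004/08" -> segment "blog", remainder "/2004/08"
        segment, sep, rest = path.lstrip("/").partition("/")
        app = routes.get(segment)
        if app is None:
            return fallback(environ, start_response)
        # Move the matched segment from PATH_INFO onto SCRIPT_NAME so the
        # sub-application sees paths relative to where it is mounted.
        environ = dict(environ)
        environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + "/" + segment
        environ["PATH_INFO"] = sep + rest
        return app(environ, start_response)

    return dispatcher
```

Because the dispatcher is itself a WSGI application, dispatchers can be nested, and the routed applications can come from different frameworks.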
> > -- > Ben Sizer > > ------------------------------ > > Message: 2 > Date: Thu, 26 Aug 2004 11:51:06 -0700 > From: Mark Nottingham > Subject: Re: [Web-SIG] Regarding the WSGI draft > To: Ben Sizer > Cc: web-sig@python.org > Message-ID: > Content-Type: text/plain; charset=US-ASCII; format=flowed > > Hi Ben, > > I understand where you're coming from, but I think we're in a different > situation here. There are a lot of different ways > that you can construct an application framework; there is no "one true > way," because people have varying requirements for a Web application. > > Contrast this with databases, which are for the most part a commodity; > you can plug in different databases because they all have the same > conceptual model of how a database works. > > There has been some progress towards convergence on a common view of > what a Web application is, but I still think we have a ways to go, and > much to learn, before any one application framework can declare > victory. > > That being the case, WSGI provides something that's incredibly > valuable; as long as it maintains the right level of abstraction, it > allows application frameworks to avoid worrying about the details of a > particular server implementation. > > I'm pleased as punch with it, because it lets me avoid doing that when > I write my own application framework (details forthcoming ;). > > Cheers, > > -- > Mark Nottingham http://www.mnot.net/ > > ------------------------------ > > Message: 3 > Date: Thu, 26 Aug 2004 20:46:07 +0100 > From: Ben Sizer > Subject: Re: [Web-SIG] Regarding the WSGI draft > To: Mark Nottingham > Cc: web-sig@python.org > Message-ID: <412E3DFF.2000605@kylotan.eidosnet.co.uk> > Content-Type: text/plain; charset=us-ascii; format=flowed > > Mark Nottingham wrote: > > > I understand where you're coming from, but I think we're in a different > > situation here. There are a lot of different ways > > that you can construct an application framework; there is no "one true > > way," because people have varying requirements for a Web application. > > .... > > > There has been some progress towards convergence on a common view of > > what a Web application is, but I still think we have a ways to go, and > > much to learn, before any one application framework can declare victory. > > Although what you say makes sense on the surface, the fact remains that > technologies such as ASP and PHP are popular and useful because they > present a simple and standard interface to the user, whether that user > is writing a 4 line script, a small application, or a large framework > upon which to base other applications. With Python you seem stuck with
> > For example, you know that $_GET[varName] is going to be the standard > way of accessing a querystring variable in PHP. Yet in Python it could > be part of a request.form dictionary, or > cgi.parse_qs(os.environ['QUERY_STRING']), or > modpython.util.parse_qs(req.parsed_uri[7]), etc. Yet we know that query > strings are part of the RFC2396 standard, so why not have a standard > module or interface to present to the user? > > I don't see any good reason for this sort of variance, except that > there's a bias towards accommodating these existing frameworks rather > than enabling simpler applications of the future, and which I think is a > symptom of the problem rather than part of the solution. > > -- > Ben Sizer. > > ------------------------------ > > Message: 4 > Date: Thu, 26 Aug 2004 15:59:07 -0400 > From: "Phillip J. Eby" > Subject: Re: [Web-SIG] Regarding the WSGI draft > To: Ben Sizer , web-sig@python.org > Message-ID: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> > Content-Type: text/plain; charset="us-ascii"; format=flowed > > At 07:30 PM 8/26/04 +0100, Ben Sizer wrote: > >I just worry that it diverts attention from what I consider to be the real > >issue facing Python on the web, which is making life easier for web > >application developers, not framework developers. > > Unfortunately, every effort to date to create a "framework to end all > frameworks" has simply resulted in the existence of framework > N+1. Why? Because the creation of a *new* framework means that there is > no existing code that uses it. And if the framework only provides features > that others already have, there's no compelling reason to switch. > > Any approach that ignores the economic reality of present-day Python web > apps, and provides no way for them to migrate gradually to a new standard, > is doomed to niche status at best. 
(Comparison to ASP and PHP is > misleading: both had standards for dispatching, sessions, cookies, form > parsing, and templating *when they were created*, so there was no legacy > codebase using alternative solutions that had to be migrated.) > > And so, the only way we're going to "steal" the marketshare of existing > frameworks is with the consent and co-operation of the developers of those > frameworks. That means there has to be enough benefit for them to justify > the effort of getting on board. > > So, please allow me to reveal my top-secret plan for total world > domination... :) > > First, the current situation. Choice of framework is a high investment for > users, because once they choose, they are stuck with that framework and > possibly server. The cost to switch is extremely high. It's almost as > though every plumbing manufacturer makes their own sizes of pipes and > connectors, so once you choose a vendor, you're stuck with them. > > WSGI changes this scenario by introducing competitive pressure to the > server/framework choice. As soon as enough framework and server developers > participate, the others are pushed by network effects to do the > same. Users ask, "Why can't I use your framework in any WSGI server?" and > "Why can't I use any WSGI framework in your server?", pushing the slower > adopters to either join up or be marginalized. > > But this is just the first phase: standardizing on a size for one kind of > pipe. It's not very glamorous, but it fundamentally changes the > marketplace, and causes many things to appear to spontaneously happen "on > their own". > > First, users can experiment with other frameworks, especially if those > frameworks are lightweight. This builds competitive pressure in the > direction of lightweight, easy-to-integrate frameworks. So framework > developers begin to break their monolithic approaches down into smaller > pieces that operate on segments of WSGI. 
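A service built *in terms of* WSGI rather than defined by it might look roughly like the following. This one is handed the incoming `environ` and the outgoing `headers` list so that it can read and set cookies. It is a toy sketch, not an existing library. `SessionService`, the in-memory store and the `sid` cookie name are all hypothetical, and a real service would need persistence, expiry and locking. It uses only the standard library's `http.cookies.SimpleCookie` and `secrets`.

```python
import secrets
from http.cookies import SimpleCookie


class SessionService:
    """Toy session store that works purely on WSGI data (hypothetical)."""

    def __init__(self, cookie_name="sid"):
        self.cookie_name = cookie_name
        self._store = {}  # session id -> session dict (in memory only)

    def load(self, environ, headers):
        """Return (session_id, session); append Set-Cookie if a new id is issued."""
        incoming = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        morsel = incoming.get(self.cookie_name)
        sid = morsel.value if morsel is not None else None
        if sid not in self._store:
            # Missing or unknown id: start a fresh session and tell the client.
            sid = secrets.token_hex(16)
            self._store[sid] = {}
            outgoing = SimpleCookie()
            outgoing[self.cookie_name] = sid
            outgoing[self.cookie_name]["path"] = "/"
            outgoing[self.cookie_name]["httponly"] = True
            headers.append(("Set-Cookie", outgoing[self.cookie_name].OutputString()))
        return sid, self._store[sid]


sessions = SessionService()


def counter_app(environ, start_response):
    """Example WSGI app: counts requests per session."""
    headers = [("Content-Type", "text/plain")]
    sid, session = sessions.load(environ, headers)
    session["hits"] = session.get("hits", 0) + 1
    start_response("200 OK", headers)
    return [str(session["hits"]).encode("ascii")]
```

Nothing in the service depends on a particular server or framework. It needs only the environ dictionary and a headers list, which is what makes such pieces interchangeable.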
For example, a session service > that you pass the incoming 'environ' and outgoing 'headers' to, in order > for it to read and set cookies. (Notice that this *isn't* a WSGI-defined > or standardized service, just a service implemented *in terms of* WSGI.) > > Such a service makes little sense to implement today, but people will > spontaneously begin developing such services once WSGI is a ubiquitous part > of the Python web development landscape. It's the most natural thing in > the world for them to do so, not only because it means a wider audience for > their service, but because they're likely developing it for a WSGI-based > environment they're already using. What other platform would they write it > for? > > Because these services will be interchangeable to some degree, lock-in is > limited and competition will determine a winner or winners. Then, if the > winners are sufficiently similar to allow useful standardization, that's > the natural next step. But, for some services, the differences will be > important qualitative differences, and standardization would reduce > meaningful choice. We don't know in advance what these services should be, > and we don't know enough to standardize on them now. > > For someone with an ASP or PHP background, that last statement at least > might sound like sheer lunacy. But, Python web frameworks have often > pioneered techniques years ahead of their appearance in ASP, PHP, and Java > frameworks. I would hate for us to lose our next great innovation to > premature standardization. > > But luckily, I don't need to worry: there's simply no way you'll get enough > Python framework developers (and their users) to agree on such a > standardization. For one thing, it's not in their best interests to do > so. (Don't let me discourage you from trying, though, if that's what you > want to do. I just don't think you'll have much success, and am not > interested in trying it myself.) > > Anyway, there it is. 
My secret plan to fundamentally alter the Python web > programming universe through secret mind-control market manipulation and > social engineering. You found me out. Now I'll have to kill you.* :) > > * "And I'd have gotten away with it too, if it hadn't been for those > meddling kids..." > > (Disclaimer for non-US readers: the above is a humorous reference to an > American TV cartoon that featured a different character saying this line > each week, after their nefarious plans were foiled. It's not me calling > anybody a meddling kid, or threatening to actually kill anyone!) > > ------------------------------ > > Message: 5 > Date: Thu, 26 Aug 2004 16:11:38 -0400 > From: Bob Kimble > Subject: Re: [Web-SIG] Regarding the WSGI draft > To: web-sig@python.org > Message-ID: <200408261611.38389.rjkimble@alum.mit.edu> > Content-Type: text/plain; charset="iso-8859-1" > > On Thursday 26 August 2004 03:46 pm, Ben Sizer wrote: > > Mark Nottingham wrote: > > > I understand where you're coming from, but I think we're in a different > > > situation here. There are a lot of different ways > > > that you can construct an application framework; there is no "one true > > > way," because people have varying requirements for a Web application. > > > > ... > > > > > There has been some progress towards convergence on a common view of > > > what a Web application is, but I still think we have a ways to go, and > > > much to learn, before any one application framework can declare victory. > > > > Although what you say makes sense on the surface, the fact remains that > > technologies such as ASP and PHP are popular and useful because they > > present a simple and standard interface to the user, whether that user > > is writing a 4 line script, a small application, or a large framework > > upon which to base other applications. 
With Python you seem stuck with > > two equally unappealing options: slow CGI if you want a simple script, > > where simple is relative since you need to fool around with os.environ, > > printing your own headers, etc - or a complex and idiosyncratic > > framework if you want anything non-trivial, but which is often just as > > complex as PHP straight out of the box, except with a much smaller user > > base and generally less documentation. > > > > For example, you know that $_GET[varName] is going to be the standard > > way of accessing a querystring variable in PHP. Yet in Python it could > > be part of a request.form dictionary, or > > cgi.parse_qs(os.environ['QUERY_STRING']), or > > modpython.util.parse_qs(req.parsed_uri[7]), etc. Yet we know that query > > strings are part of the RFC2396 standard, so why not have a standard > > module or interface to present to the user? > > > > I don't see any good reason for this sort of variance, except that > > there's a bias towards accommodating these existing frameworks rather > > than enabling simpler applications of the future, and which I think is a > > symptom of the problem rather than part of the solution. > > I have been reading this thread for a while now, and I haven't commented > because I have done absolutely no web development using Python. However, > Mark's comments strike me as being dead on. I'm used to the Java Servlet API, > which creates an API for servlets and JSP pages. The fact that there are > several high quality application servers that all support this API suggests > to me that creating something similar for Python makes a lot of sense. I have > written JSP's and servlets and run them under Tomcat, but I know that I could > just as easily run them under WebSphere, WebLogic, JRun, or any others that > support the API. It seems to me that creating a similar API for Python would > be terrific. 
Of course, somebody would also have to write an application > server to support the API, but I suspect some of the existing frameworks > could be revamped to support it. Anyway, that's my 2 cents. I would love to > see something similar to Tomcat and the Java Servlet API for Python. > > ------------------------------ > > Message: 6 > Date: Thu, 26 Aug 2004 13:25:10 -0700 > From: Titus Brown > Subject: Re: [Web-SIG] Regarding the WSGI draft > To: Bob Kimble > Cc: web-sig@python.org > Message-ID: <20040826202510.GA5704@caltech.edu> > Content-Type: text/plain; charset=us-ascii > > -> I have been reading this thread for a while now, and I haven't commented > -> because I have done absolutely no web development using Python. However, > -> Mark's comments strike me as being dead on. I'm used to the Java Servlet API, > -> which creates an API for servlets and JSP pages. The fact that there are > -> several high quality application servers that all support this API suggests > -> to me that creating something similar for Python makes a lot of sense. I have > -> written JSP's and servlets and run them under Tomcat, but I know that I could > -> just as easily run them under WebSphere, WebLogic, JRun, or any others that > -> support the API. It seems to me that creating a similar API for Python would > -> be terrific. Of course, somebody would also have to write an application > -> server to support the API, but I suspect some of the existing frameworks > -> could be revamped to support it. Anyway, that's my 2 cents. I would love to > -> see something similar to Tomcat and the Java Servlet API for Python. > > > > I've implemented packages at the adapter level (PyWX), the framework > level (crud that was never released because I found Quixote first), and > the content level (based variously on CGI, WebWare, and Quixote). 
> > I'm moderately skeptical of the short term use of the API being > developed on this list, because in practice it is relatively easy > to implement a framework that fits on top of all of the existing > adapters (CGI, mod_python, etc.) Medium term, I think it will lead > to a welcome homogenization of server <--> adapter <--> framework > interaction, and so I think it's a valuable concept. > > The idea of having a single framework (like Java's "servlets") is, I > think, silly. Having implemented sites in several of the existing > frameworks, it is clear that there are several different ways to > conceptualize the development of Web sites: the Quixote style and > the WebWare style are two very distinct examples. Anything that cuts > down on the variety of available frameworks is going to restrict the > options, which is bad. > > However, I think it is incumbent upon the developers and users of the > different frameworks to clearly distinguish between the various options. > Right now it is very confusing to me, and I've been developing Web sites > in Python for 5 years ;). > > I'm very confused as to why you need multiple servlet implementations in > Java. Wouldn't one do just as well as 10? It sounds like having 5 > different implementations of the 'os' module in Python... 
> > --titus > > ------------------------------ > > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > http://mail.python.org/mailman/listinfo/web-sig > > End of Web-SIG Digest, Vol 10, Issue 26 > *************************************** > From jim-web-sig at jimdabell.com Fri Aug 27 02:03:08 2004 From: jim-web-sig at jimdabell.com (Jim Dabell) Date: Fri Aug 27 01:56:15 2004 Subject: [Web-SIG] Re: Web-SIG Digest, Vol 10, Issue 26 In-Reply-To: <6654eac404082616195fc15079@mail.gmail.com> References: <20040826202307.7DCE81E400A@bag.python.org> <6654eac404082616195fc15079@mail.gmail.com> Message-ID: <200408270103.08455.jim-web-sig@jimdabell.com> [Please trim responses in future, you didn't have to quote the whole digest to us.] > We should implement a Servlet-like interface which works atop WSGI, Fair enough, but as it would sit on top of WSGI, nobody need concern themselves with it until WSGI is finished. Otherwise you are trying to hit a moving target. The point of WSGI isn't that development stops after it's finished, but rather that everyone is on the same page before attempting something more ambitious like you describe. Larger projects take more time to mature and give more scope for fundamental disagreements. Something like WSGI can be specified, implemented and standardised relatively quickly, meaning there are incremental, measurable improvements, rather than everybody waiting around for the perfect system to be born. Obviously your proposed servlet-like interface's requirements are a factor in what should go into WSGI, but I see no reason to believe your servlet-like interface would have significantly different requirements to all the other frameworks.
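To make the layering concrete, here is a rough sketch of a servlet-style request object that sits entirely on top of the WSGI `environ` and contains no server-specific code. It also shows one uniform way to get at query-string variables, which is the `$_GET` comparison raised in the quoted digest. The `Request` class and its methods are hypothetical. The sketch assumes only the CGI-style keys `REQUEST_METHOD`, `PATH_INFO` and `QUERY_STRING`, plus Python 3's `urllib.parse.parse_qs`.

```python
from urllib.parse import parse_qs


class Request:
    """Hypothetical servlet-style request object layered over a WSGI environ."""

    def __init__(self, environ):
        self.environ = environ
        self.method = environ.get("REQUEST_METHOD", "GET")
        self.path = environ.get("PATH_INFO", "")
        # Each value is a list, because a name may repeat in a query string.
        self.args = parse_qs(environ.get("QUERY_STRING", ""))

    def get(self, name, default=None):
        """Return the first value for `name`, like PHP's $_GET[name]."""
        values = self.args.get(name)
        return values[0] if values else default
```

Because the class depends only on `environ`, the same code would run under any server or gateway that supplies it.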
-- Jim Dabell From ianb at colorstudy.com Fri Aug 27 02:03:36 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Aug 27 02:03:42 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net> References: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net> Message-ID: <412E7A58.6030800@colorstudy.com> Mark Nottingham wrote: > One thing that seems to be missing in WSGI to me is the communication of > the delineation between what the server does and what the application does. > > The latest draft says; > > [[[ > In general, the server or gateway is responsible for ensuring that > correct headers are sent to the client: if an application omits a needed > header, the server or gateway *should* add it. [...] If the application > supplies a header that the server would ordinarily supply, or that > contradicts the server's intended behaviour [...] the server or gateway > *may* discard the conflicting header, provided that its action is > recorded for the benefit of the application author. > ]]] > > I'm a bit uncomfortable with this, because there's no standard way for > the action to be "recorded for the benefit of the application author." > IMO this is one of the major problems with CGI. The closest thing to a standard would be, I think, environ['wsgi.error']. I would expect to see errors about the application to be sent there. I also think it's reasonable not to specify it further than this -- many error logging facilities are possible, and it's all very server-specific. > In other words, there's a laundry list of HTTP features that may or may > not be handled by the server on behalf of the application, depending on > how it's written and configured. Giving the application some idea of > what it can expect the server to do, and how it will do it, would help > application frameworks decide what tasks it needs to take on itself. But then this is a different issue.
I think Phillip likes the idea of "configuration" for this. I give it scare quotes because I think Phillip thinks about configuration somewhat differently than most people, and configuration plays a different sort of role in PEAK (and Zope 3). It's a way of plugging pieces together, rather than just a way of indicating installation-specific values. But, an earlier WSGI interface didn't have wsgi.threaded or wsgi.multiprocess, and I think it would actually be hard to work without these. > For example; > > * HTTP auth - does the server make the Authentication header available? > Automatically generate 401s when configured to require auth? If the > application framework wishes to perform auth on its own, will it have > the appropriate information available? If the server does not provide the Authentication header, that would be useful to know. Of course, sometimes you can't know that -- a CGI script doesn't know how its parent is configured. Using Apache, you can configure it both ways for CGI scripts (and I think they even make this easier and more explicit in Apache 2, so you shouldn't just expect it to always be off). But I can appreciate the annoyance when you don't know if HTTP auth will work, or you're new to this (or come from someplace like PHP) you just go nuts trying to figure out why the software won't let you log in. > * chunked encoding - does the server chunk the body when appropriate? > * content-length - does the server automatically calculate it? These seem useful. > * cache validation - does the server handle If-Modified-Since and > If-None-Match requests appropriately (e.g., with a 304)? I would almost certainly expect this to be false. There may be some WSGI servers that have an extended notion of the application, so they can look at things like the modification date. But those are likely to be uncommon -- more likely only applications will know the necessary information. 
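Here is one way a server could answer the chunked-encoding and Content-Length questions. Suppose the application never calls `write()` and returns an iterable whose `len()` is 1. Then the body length is known up front. Otherwise, if both ends speak HTTP/1.1, each piece can be sent as a chunk prefixed by its size in hexadecimal. If neither applies, the server closes the connection after the body. The helper names below are hypothetical, and this is a sketch, not a compliant HTTP/1.1 server.

```python
def choose_framing(result, write_called, client_is_http11):
    """Hypothetical server helper: pick how to delimit a response body.

    Returns "content-length", "chunked", or "close".
    """
    if not write_called:
        try:
            if len(result) == 1:
                return "content-length"  # the single string's length is the body length
        except TypeError:
            pass  # e.g. a generator has no len(), so the length isn't known in advance
    if client_is_http11:
        return "chunked"
    return "close"  # fallback: closing the connection marks the end of the body


def chunked(pieces):
    """Encode byte strings with HTTP/1.1 chunked transfer coding."""
    for piece in pieces:
        if piece:  # skip empty pieces; a zero-size chunk would end the body early
            yield b"%X\r\n%s\r\n" % (len(piece), piece)
    yield b"0\r\n\r\n"  # last-chunk followed by an empty trailer section
```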
> * content-encoding - does the server apply content-encoding in requests > and/or responses as appropriate, and what schemes does it support? > * transfer-encoding - same as content-encoding Again, seems useful. What harm would there be if you assume they don't, or assume they do? I haven't thought this part through. When I think of middleware, I can think of many things like this. In most cases, I'd add a key, and if the key wasn't present I'd know it was false. But it can be odd. Say I have a middleware that catches exceptions, because that's my one example at the moment. If it is present, it would be nice if other applications didn't catch exceptions, and let them propagate all the way up. So, the application looks for environ.get('ianb_middleware.exception_catcher')? That's weird, because someone else comes along and makes their own exception catcher that works like mine; what key do they use? It would be nice if we used the same key. But then, at this point I might suggest we use 'webapp0.exception_catcher', leading up to a Web App standard that defines the meaning for a bunch more keys ('webapp1.exception_catcher' once we agree on a standard). Anyway, that's my theory on how this might go. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From jim-web-sig at jimdabell.com Fri Aug 27 02:20:30 2004 From: jim-web-sig at jimdabell.com (Jim Dabell) Date: Fri Aug 27 02:13:38 2004 Subject: [Web-SIG] Servers ignoring application-supplied headers Message-ID: <200408270120.30569.jim-web-sig@jimdabell.com> > In general, the server or gateway is responsible for ensuring that correct > headers are sent to the client: if the application omits a needed header, > the server or gateway *should* add it. For example, the HTTP ``Date:`` and > ``Server:`` headers would normally be supplied by the server or gateway. If > the application supplies a header that the server would ordinarily supply, > or that contradicts the server's intended behavior (e.g. 
supplying a > different ``Connection:`` header), the server or gateway *may* discard the > conflicting header, provided that its action is recorded for the benefit of > the application author. Is this wise? It's not really the WSGI's job to nanny the application and make sure it does the right thing. I can see the case for supplying default values, but simply throwing away something it's specifically been asked to use seems rather shortsighted. WSGI authors aren't perfect, and it's far too easy to end up in a situation where application developers are stuck behind a clueless WSGI that insists on ignoring certain things because it thinks it's the right thing to do. It seems to me that if the application developers want to do something, WSGI shouldn't make it intentionally impossible for them to do. The worst that is likely to happen is the application developer tries something and it breaks, so he doesn't try it again, right? -- Jim Dabell From mnot at mnot.net Fri Aug 27 02:29:25 2004 From: mnot at mnot.net (Mark Nottingham) Date: Fri Aug 27 02:29:29 2004 Subject: [Web-SIG] Servers ignoring application-supplied headers In-Reply-To: <200408270120.30569.jim-web-sig@jimdabell.com> References: <200408270120.30569.jim-web-sig@jimdabell.com> Message-ID: <26460C34-F7C0-11D8-82BE-000A95BD86C0@mnot.net> I assume that this part was written with CGI in mind. Not to say that we shouldn't do better than CGI when possible... On Aug 26, 2004, at 5:20 PM, Jim Dabell wrote: >> In general, the server or gateway is responsible for ensuring that >> correct >> headers are sent to the client: if the application omits a needed >> header, >> the server or gateway *should* add it. For example, the HTTP >> ``Date:`` and >> ``Server:`` headers would normally be supplied by the server or >> gateway. If >> the application supplies a header that the server would ordinarily >> supply, >> or that contradicts the server's intended behavior (e.g.
supplying a >> different ``Connection:`` header), the server or gateway *may* >> discard the >> conflicting header, provided that its action is recorded for the >> benefit of >> the application author. > > Is this wise? It's not really the WSGI's job to nanny the application > and > make sure it does the right thing. I can see the case for supplying > default > values, but simply throwing away something it's specifically been > asked to > use seems rather shortsighted. WSGI authors aren't perfect, and it's > far too > easy to end up in a situation where application developers are stuck > behind a > clueless WSGI that insists on ignoring certain things because it > thinks it's > the right thing to do. It seems to me that if the application > developers > want to do something, WSGI shouldn't make it intentionally impossible > for > them to do. > > The worst that is likely to happen is the application developer tries > something and it breaks, so he doesn't try it again, right? -- Mark Nottingham http://www.mnot.net/ From pje at telecommunity.com Fri Aug 27 03:42:22 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 03:42:00 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net> Message-ID: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com> At 03:57 PM 8/26/04 -0700, Mark Nottingham wrote: >* HTTP auth - does the server make the Authentication header available? >Automatically generate 401s when configured to require auth? If the >application framework wishes to perform auth on its own, will it have the >appropriate information available? This is already a problem today, I'm afraid. For example, Apache 1.x doesn't normally supply this header to CGI applications at least. (Which is really silly, IMO, because using REMOTE_USER instead can lead to serious security issues in shared hosting environments.)
Anyway, I think this is one that has to remain an unspecified deployment-specific issue. No sane framework targeting multiple web servers is going to rely solely on HTTP basic-auth if it can avoid it anyway. Basic-auth sucks on far too many levels. I'm not saying that it doesn't have its niche, I'm just saying that I don't think we can make any guarantees about it in the WSGI spec without breaking something. >* chunked encoding - does the server chunk the body when appropriate? >* content-length - does the server automatically calculate it? There's a section on both of these in the current draft, just not the last one I posted. I sent a copy to peps@python.org, but haven't gotten a reply yet. Here's the relevant section from the latest draft: """Handling the ``Content-Length`` Header ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If the application does not supply a ``Content-Length`` header, a server or gateway may choose one of several approaches to handling it. The simplest of these is to close the client connection when the response is completed. Under some circumstances, however, the server or gateway may be able to either generate a ``Content-Length`` header, or at least avoid the need to close the client connection. If the application does *not* call the ``write()`` callable, and returns an iterable whose ``len()`` is 1, then the server can automatically determine ``Content-Length`` by taking the length of the first string yielded by the iterable. And, if the server and client both support HTTP/1.1 "chunked encoding" [3]_, then the server *may* use chunked encoding to send a chunk for each ``write()`` call or string yielded by the iterable, thus generating a ``Content-Length`` header for each chunk. This allows the server to keep the client connection alive, if it wishes to do so. Note that the server *must* comply fully with RFC 2616 when doing this, or else fall back to one of the other strategies for dealing with the absence of ``Content-Length``. 
""" >* cache validation - does the server handle If-Modified-Since and >If-None-Match requests appropriately (e.g., with a 304)? IMO this is an application responsibility; given dynamic content, how can the server verify these? >* content-encoding - does the server apply content-encoding in requests >and/or responses as appropriate, and what schemes does it support? >* transfer-encoding - same as content-encoding Do you have any suggestions? My assumption is that the server should "first do no harm". That is, the server shouldn't silently "value-add" encodings unless it's absolutely sure it's okay to do so, or a human has configured it to do so. >I know that this can be addressed by server-specific environment now, but >I think there might be some low-hanging fruit for common functions like >the ones above. It might be that they'd be better in a separate document, >so they're not part of the 'core' WSGI, but I think there's real value in >having some common ones. I think it certainly would be useful to have a comprehensive set of guidelines for how to use, provide, or apply HTTP/1.1 features in WSGI. Judging from your input so far, I'd say you have a better handle on the subject than I do, so your contribution would be very welcome. It may indeed make sense to create a separate PEP for them, since they will mainly be needed by server authors and by people who need to make use of some set of HTTP/1.1 features. Other areas that need to be addressed within HTTP/1.1 probably also includes things like byte ranges. From pje at telecommunity.com Fri Aug 27 03:43:37 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 03:43:14 2004 Subject: [Web-SIG] Re: expect/continue In-Reply-To: <291B974B-F7B4-11D8-82BE-000A95BD86C0@mnot.net> Message-ID: <5.1.1.6.0.20040826214248.02ed37b0@mail.telecommunity.com> At 04:03 PM 8/26/04 -0700, Mark Nottingham wrote: >So, I think this option should be removed. 
I can see some scenarios where >the server can and will be configured to reject all requests over a >certain size, etc. but rejecting all requests that use this mechanism >indiscriminately doesn't seem to fall into that case. If an implementation >doesn't want to deal with expect/continue at all, it has two choices; > >1) don't claim to be HTTP/1.1 conformant > >2) wait until the client decides you don't support expect/continue, and >sends the request body (this is suboptimal, for obvious reasons). Sounds pretty good to me; why don't you just pull all the HTTP/1.1 stuff from WSGI and use it as a skeleton for starting your HTTP/1.1 guidelines document? :) From jim-web-sig at jimdabell.com Fri Aug 27 04:17:02 2004 From: jim-web-sig at jimdabell.com (Jim Dabell) Date: Fri Aug 27 04:10:36 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com> References: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com> Message-ID: <200408270317.02905.jim-web-sig@jimdabell.com> On Friday 27 August 2004 02:42, Phillip J. Eby wrote: > At 03:57 PM 8/26/04 -0700, Mark Nottingham wrote: > >* cache validation - does the server handle If-Modified-Since and > >If-None-Match requests appropriately (e.g., with a 304)? > > IMO this is an application responsibility; given dynamic content, how can > the server verify these? In my opinion this is a middleware responsibility. Look at the headers supplied by the client, put off beginning the response until all the response headers are retrieved from the application, and respond with a 304 where appropriate. Are there any situations you can think of where an application would want to generate a matching Last-Modified or ETag header but not generate a 304? If that happens, what stops an intermediate proxy from throwing the response body away and responding with a 304 itself? 
> >* content-encoding - does the server apply content-encoding in requests > >and/or responses as appropriate, and what schemes does it support? > >* transfer-encoding - same as content-encoding > > Do you have any suggestions? My assumption is that the server should > "first do no harm". That is, the server shouldn't silently "value-add" > encodings unless it's absolutely sure it's okay to do so, or a human has > configured it to do so. I think the constraints RFC 2616 puts on HTTP proxies should apply to servers/middleware because that's essentially what they are. Basically, any transformation can occur as long as the server/middleware understands the relevant parts of the protocol, even to the point of transforming from one media type to another (as long as cache-control: no-transform isn't encountered, of course). If a downstream proxy can make comprehensive changes to the message without any authorisation beyond sitting between the two parties, I think servers/middleware should be at least as free. > >I know that this can be addressed by server-specific environment now, but > >I think there might be some low-hanging fruit for common functions like > >the ones above. It might be that they'd be better in a separate document, > >so they're not part of the 'core' WSGI, but I think there's real value in > >having some common ones. > > I think it certainly would be useful to have a comprehensive set of > guidelines for how to use, provide, or apply HTTP/1.1 features in > WSGI. I agree that having this in a separate document is the best approach, but I don't think that it's something specific to WSGI. Last time I checked, the Atom guys also felt the need to have a best practices document in relation to HTTP usage, so perhaps a collaboration is in order? I seem to remember a "common pitfalls" type document from the W3C from a few years ago as well, but I've failed to dig anything up so far. 
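Here is a sketch of the proxy-style freedom described above, applied to content-encoding. The middleware gzips a response only when all of these hold: the response is a 200, the client advertises gzip support, the application hasn't already set a Content-Encoding, and the response doesn't carry `Cache-Control: no-transform`. Those checks are one reading of "first do no harm". The name `gzip_middleware` is hypothetical. The `Accept-Encoding` check is deliberately naive, because it ignores q-values such as `gzip;q=0`. The whole body is buffered.

```python
import gzip


def gzip_middleware(app):
    """Hypothetical middleware: gzip a 200 response, proxy-style, when safe."""
    def wrapped(environ, start_response):
        captured = {}
        body = []

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, list(headers)
            return body.append  # buffer anything passed to write()

        result = app(environ, capture)
        try:
            body.extend(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        status, headers = captured["status"], captured["headers"]
        lookup = {name.lower(): value for name, value in headers}
        # Naive: a client sending "gzip;q=0" would still be treated as accepting gzip.
        accepts_gzip = "gzip" in environ.get("HTTP_ACCEPT_ENCODING", "").lower()
        no_transform = "no-transform" in lookup.get("cache-control", "").lower()
        if (not status.startswith("200") or not accepts_gzip or no_transform
                or "content-encoding" in lookup):
            start_response(status, headers)  # pass through untouched
            return body

        data = gzip.compress(b"".join(body))
        headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
        headers += [("Content-Encoding", "gzip"),
                    ("Content-Length", str(len(data))),
                    ("Vary", "Accept-Encoding")]
        start_response(status, headers)
        return [data]
    return wrapped
```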
-- Jim Dabell From mnot at mnot.net Fri Aug 27 05:44:38 2004 From: mnot at mnot.net (Mark Nottingham) Date: Fri Aug 27 05:44:42 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com> References: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com> Message-ID: <6BBA3664-F7DB-11D8-82BE-000A95BD86C0@mnot.net> On Aug 26, 2004, at 6:42 PM, Phillip J. Eby wrote: >> * HTTP auth - does the server make the Authentication header >> available? Automatically generate 401s when configured to require >> auth? If the application framework wishes to perform auth on its own, >> will it have the appropriate information available? > > This is already a problem today, I'm afraid. For example, Apache 1.x > doesn't normally supply this header to CGI applications at least. > (Which is really silly, IMO, because using REMOTE_USER instead can > lead to serious security issues in shared hosting environments.) > > Anyway, I think this is one that has to remain an unspecified > deployment-specific issue. No sane framework targeting multiple web > servers is going to rely solely on HTTP basic-auth if it can avoid it > anyway. Basic-auth sucks on far too many levels. I'm not saying that > it doesn't have its niche, I'm just saying that I don't think we can > make any guarantees about it in the WSGI spec without breaking > something. Digest auth sucks much less, and also uses REMOTE_USER. >> * chunked encoding - does the server chunk the body when appropriate? >> * content-length - does the server automatically calculate it? > > There's a section on both of these in the current draft, just not the > last one I posted. I sent a copy to peps@python.org, but haven't > gotten a reply yet.
> > Here's the relevant section from the latest draft: > > """Handling the ``Content-Length`` Header > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > If the application does not supply a ``Content-Length`` header, a > server or gateway may choose one of several approaches to handling > it. The simplest of these is to close the client connection when > the response is completed. > > Under some circumstances, however, the server or gateway may be > able to either generate a ``Content-Length`` header, or at least > avoid the need to close the client connection. If the application > does *not* call the ``write()`` callable, and returns an iterable > whose ``len()`` is 1, then the server can automatically determine > ``Content-Length`` by taking the length of the first string yielded > by the iterable. > > And, if the server and client both support HTTP/1.1 "chunked > encoding" [3]_, then the server *may* use chunked encoding to send > a chunk for each ``write()`` call or string yielded by the iterable, > thus generating a ``Content-Length`` header for each chunk. This > allows the server to keep the client connection alive, if it wishes > to do so. Note that the server *must* comply fully with RFC 2616 when > doing this, or else fall back to one of the other strategies for > dealing with the absence of ``Content-Length``. > """ Looks good. >> I know that this can be addressed by server-specific environment now, >> but I think there might be some low-hanging fruit for common >> functions like the ones above. It might be that they'd be better in a >> separate document, so they're not part of the 'core' WSGI, but I >> think there's real value in having some common ones. > > I think it certainly would be useful to have a comprehensive set of > guidelines for how to use, provide, or apply HTTP/1.1 features in > WSGI. Judging from your input so far, I'd say you have a better > handle on the subject than I do, so your contribution would be very > welcome. 
It may indeed make sense to create a separate PEP for them, > since they will mainly be needed by server authors and by people who > need to make use of some set of HTTP/1.1 features. OK, I'll take that as a challenge :) I agree that it doesn't make sense to put this onto the critical path for WSGI getting into a PEP. > Other areas that need to be addressed within HTTP/1.1 probably also > includes things like byte ranges. Ah, yes. -- Mark Nottingham http://www.mnot.net/ From mnot at mnot.net Fri Aug 27 05:44:45 2004 From: mnot at mnot.net (Mark Nottingham) Date: Fri Aug 27 05:44:54 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <412E7A58.6030800@colorstudy.com> References: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net> <412E7A58.6030800@colorstudy.com> Message-ID: <6FF369B1-F7DB-11D8-82BE-000A95BD86C0@mnot.net> On Aug 26, 2004, at 5:03 PM, Ian Bicking wrote: >> * cache validation - does the server handle If-Modified-Since and >> If-None-Match requests appropriately (e.g., with a 304)? > > I would almost certainly expect this to be false. There may be some > WSGI servers that have an extended notion of the application, so they > can look at things like the modification date. But those are likely > to be uncommon -- more likely only applications will know the > necessary information. Apache CGI does it; i.e., if you set a Last-Modified header, it'll automagically handle validation for you. This is pretty old, but gives an indication of what Web servers do in this and other regards: http://www.mnot.net/papers/capabilities.pdf > When I think of middleware, I can think of many things like this. In > most cases, I'd add a key, and if the key wasn't present I'd know it > was false. But it can be odd. Say I have a middleware that catches > exceptions, because that's my one example at the moment. If it is > present, it would be nice if other applications didn't catch > exceptions, and let them propagate all the way up. 
So, the > application looks for > environ.get('ianb_middleware.exception_catcher')? That's weird, > because someone else comes along and makes their own exception catcher > that works like mine; what key do they use? It would be nice if we > used the same key. > > But then, at this point I might suggest we use > 'webapp0.exception_catcher', leading up to a Web App standard that > defines the meaning for a bunch more keys ('webapp1.exception_catcher' > once we agree on a standard). > > Anyway, that's my theory on how this might go. I can totally see this stuff happening on a more ad hoc basis. We did similar things at Akamai; i.e., putting together a dynamically-configured pipeline of handlers to implement HTTP functionality, as well as content transforms. Very useful and very cool. -- Mark Nottingham http://www.mnot.net/ From pje at telecommunity.com Fri Aug 27 06:02:45 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 06:02:24 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <6FF369B1-F7DB-11D8-82BE-000A95BD86C0@mnot.net> References: <412E7A58.6030800@colorstudy.com> <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net> <412E7A58.6030800@colorstudy.com> Message-ID: <5.1.1.6.0.20040826235336.023b1ab0@mail.telecommunity.com> At 08:44 PM 8/26/04 -0700, Mark Nottingham wrote: >On Aug 26, 2004, at 5:03 PM, Ian Bicking wrote: > >>>* cache validation - does the server handle If-Modified-Since and >>>If-None-Match requests appropriately (e.g., with a 304)? >> >>I would almost certainly expect this to be false. There may be some WSGI >>servers that have an extended notion of the application, so they can look >>at things like the modification date. But those are likely to be >>uncommon -- more likely only applications will know the necessary information. > >Apache CGI does it; i.e., if you set a Last-Modified header, it'll >automagically handle validation for you. 
I guess the relevance of this depends on whether bandwidth or CPU is the scarcer resource. If you want to save CPU, the application should do this, so it doesn't have to produce a response body it doesn't need. If all you care about is bandwidth, then certainly the server can truncate the body. I'm inclined to make this guideline permissive: a server *may* treat write() as a no-op and change the status if it can do so safely. But I don't think servers should be required to do this. [Ian:] >>When I think of middleware, I can think of many things like this. In >>most cases, I'd add a key, and if the key wasn't present I'd know it was >>false. But it can be odd. Say I have a middleware that catches >>exceptions, because that's my one example at the moment. If it is >>present, it would be nice if other applications didn't catch exceptions, >>and let them propagate all the way up. So, the application looks for >>environ.get('ianb_middleware.exception_catcher')? That's weird, because >>someone else comes along and makes their own exception catcher that works >>like mine; what key do they use? It would be nice if we used the same key. I'm somewhat negative on this concept; to me an application should be responsible for catching its own exceptions, or require a middleware wrapping for it. The server/gateway *has* to be responsible for catching any otherwise uncaught exceptions. I don't really get the concept of wanting to *not* catch exceptions. If you have a two-layer model (app+exception catcher), just put the handler you want to use in place as middleware. If the app has its own exception handling, surely it knows better how to handle the exception than anything else, so why change? [Mark:] >I can totally see this stuff happening on a more ad hoc basis. We did >similar things at Akamai; i.e., putting together a dynamically-configured >pipeline of handlers to implement HTTP functionality, as well as content >transforms. Very useful and very cool. 
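Here is a sketch of the two ideas being weighed here. The first is an exception-catching middleware. The second is Ian's convention of advertising that middleware through an `environ` key whose absence means "not present". The key name `example.exception_catcher` is made up for illustration, and it is exactly the kind of name the thread says would need agreeing on. Tracebacks go to `wsgi.errors`, which is the name the final WSGI spec gives that stream. Only exceptions raised while calling the application are caught, not ones raised later while the body is being iterated.

```python
import sys
import traceback

# Hypothetical key; absence of the key means "no catcher installed".
CATCHER_KEY = "example.exception_catcher"


def exception_catcher(app):
    """Hypothetical middleware: turn uncaught exceptions into a 500 page."""
    def wrapped(environ, start_response):
        environ[CATCHER_KEY] = True  # advertise ourselves to the application
        try:
            return app(environ, start_response)
        except Exception:
            errors = environ.get("wsgi.errors")
            if errors is not None:
                traceback.print_exc(file=errors)  # record it for the app author
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")], sys.exc_info())
            return [b"Internal Server Error\n"]
    return wrapped


def flaky_app(environ, start_response):
    """Hypothetical app: handles its own errors only if no catcher is installed."""
    try:
        raise ValueError("something went wrong")  # stand-in for real work
    except Exception:
        if environ.get(CATCHER_KEY):
            raise  # a catcher is installed upstream: let the error propagate
        start_response("500 Internal Server Error",
                       [("Content-Type", "text/plain")], sys.exc_info())
        return [b"handled by the application\n"]
```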
Ah, now I see why you know so much about HTTP/1.1 issues "in the field". :) From pje at telecommunity.com Fri Aug 27 06:07:14 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 06:06:49 2004 Subject: [Web-SIG] Servers ignoring application-supplied headers In-Reply-To: <200408270120.30569.jim-web-sig@jimdabell.com> Message-ID: <5.1.1.6.0.20040827000412.02399e40@mail.telecommunity.com> At 01:20 AM 8/27/04 +0100, Jim Dabell wrote: > > In general, the server or gateway is responsible for ensuring that correct > > headers are sent to the client: if the application omits a needed header, > > the server or gateway *should* add it. For example, the HTTP ``Date:`` and > > ``Server:`` headers would normally be supplied by the server or > gateway. If > > the application supplies a header that the server would ordinarily supply, > > or that contradicts the server's intended behavior (e.g. supplying a > > different ``Connection:`` header), the server or gateway *may* discard the > > conflicting header, provided that its action is recorded for the benefit of > > the application author. > >Is this wise? It's not really the WSGI's job to nanny the application and >make sure it does the right thing. I can see the case for supplying default >values, but simply throwing away something it's specifically been asked to >use seems rather shortsighted. WSGI authors aren't perfect, and it's far too >easy to end up in a situation where application developers are stuck behind a >clueless WSGI that insists on ignoring certain things because it thinks it's >the right thing to do. It seems to me that if the application developers >want to do something, WSGI shouldn't make it intentionally impossible for >them to do. > >The worst that is likely to happen is the application developer tries >something and it breaks, so he doesn't try it again, right? Fair enough. I should probably narrow that phrasing more specifically to the issue I had in mind.
Specifically, it shouldn't be the application's job to control whether the connection will persist or not. That's something that (IMO) belongs squarely in the server/gateway's bailiwick. I guess I was just trying to get away without studying the keep-alive/connection header specs enough to be more specific. :) It may be that there are other response headers that similarly should be the exclusive preserve of the server, but I don't know what they are at present. From pje at telecommunity.com Fri Aug 27 06:11:49 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 06:11:24 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <6BBA3664-F7DB-11D8-82BE-000A95BD86C0@mnot.net> References: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com> <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040827000752.0239b2b0@mail.telecommunity.com> At 08:44 PM 8/26/04 -0700, Mark Nottingham wrote: >Digest auth sucks much less, and also uses REMOTE_USER. As I said, REMOTE_USER in a CGI environment leads to nasty local-system security holes: potentially a local user can just set REMOTE_USER=whoeverIwantToBe and invoke the application. Maybe we should, however, have a configuration key for 'wsgi.auth_available' that indicates the availability of the HTTP_AUTHORIZATION header. Absence of 'wsgi.auth_available' would mean that the availability is unknown, while true or false would indicate definite availability or lack thereof. 
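Below is a sketch of how an application might read the proposed 'wsgi.auth_available' key as a tri-state flag, where an absent key means "unknown". The key exists only as the proposal above, so the sketch treats it as hypothetical, and the helper names are illustrative. The sketch assumes the later PEP 3333 convention that environ values are native strings. It parses an HTTP Basic Authorization header directly and never consults REMOTE_USER, for the CGI spoofing reason given above.

```python
import base64
import binascii

# Hypothetical key, as proposed in the message above; not a finished spec.
AUTH_KEY = "wsgi.auth_available"


def auth_header_status(environ):
    """Tri-state: True/False if the server says so, None if unknown (key absent)."""
    if AUTH_KEY not in environ:
        return None
    return bool(environ[AUTH_KEY])


def basic_credentials(environ):
    """Return (user, password) from an HTTP Basic Authorization header, or None."""
    if auth_header_status(environ) is False:
        # The server declared it does not pass Authorization through;
        # don't trust a value that appears anyway.
        return None
    header = environ.get("HTTP_AUTHORIZATION", "")
    scheme, _, param = header.partition(" ")
    if scheme.lower() != "basic" or not param:
        return None
    try:
        decoded = base64.b64decode(param.strip(), validate=True).decode("latin-1")
    except (binascii.Error, ValueError):
        return None  # malformed base64
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    return user, password
```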
From ianb at colorstudy.com Fri Aug 27 06:11:28 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Aug 27 06:11:34 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <5.1.1.6.0.20040826235336.023b1ab0@mail.telecommunity.com> References: <412E7A58.6030800@colorstudy.com> <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net> <412E7A58.6030800@colorstudy.com> <5.1.1.6.0.20040826235336.023b1ab0@mail.telecommunity.com> Message-ID: <412EB470.7010407@colorstudy.com> Phillip J. Eby wrote: >>> When I think of middleware, I can think of many things like this. In >>> most cases, I'd add a key, and if the key wasn't present I'd know it >>> was false. But it can be odd. Say I have a middleware that catches >>> exceptions, because that's my one example at the moment. If it is >>> present, it would be nice if other applications didn't catch >>> exceptions, and let them propagate all the way up. So, the >>> application looks for >>> environ.get('ianb_middleware.exception_catcher')? That's weird, >>> because someone else comes along and makes their own exception >>> catcher that works like mine; what key do they use? It would be nice >>> if we used the same key. > > > I'm somewhat negative on this concept; to me an application should be > responsible for catching its own exceptions, or require a middleware > wrapping for it. The server/gateway *has* to be responsible for > catching any otherwise uncaught exceptions. I don't really get the > concept of wanting to *not* catch exceptions. If you have a two-layer > model (app+exception catcher), just put the handler you want to use in > place as middleware. If the app has its own exception handling, surely > it knows better how to handle the exception than anything else, so why > change? Generally the app doesn't know how to best handle unexpected exceptions. There's no "right" way to handle unexpected exceptions, because they are unexpected. 
Handling unexpected exceptions is usually installation-specific.
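The centralized exception catcher under discussion can be written as ordinary WSGI middleware. This is a minimal sketch that assumes the PEP 333/3333 start_response(status, headers, exc_info) convention. The shared key reuses the hypothetical 'webapp0.exception_catcher' name floated earlier in the thread. fragile_app is an invented example of an application that defers to the catcher when that key is present.

```python
import sys
import traceback

# Hypothetical shared key, following the 'webapp0.exception_catcher' idea.
CATCHER_KEY = "webapp0.exception_catcher"


class ExceptionCatcher:
    """WSGI middleware that logs unexpected exceptions and returns a 500."""

    def __init__(self, app, log=None):
        self.app = app
        self.log = log if log is not None else sys.stderr

    def __call__(self, environ, start_response):
        # Advertise ourselves so wrapped apps may let exceptions propagate.
        environ[CATCHER_KEY] = True
        try:
            result = self.app(environ, start_response)
            try:
                # Buffer the body so errors raised during iteration are caught
                # before any headers reach the client (trades away streaming).
                body = b"".join(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            exc_info = sys.exc_info()
            self.log.write("".join(traceback.format_exception(*exc_info)))
            # Passing exc_info lets us replace a status the app already set.
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")], exc_info)
            return [b"Internal error; details have been logged.\n"]
        return [body]


def fragile_app(environ, start_response):
    """Hypothetical app: lets errors propagate when a catcher is present."""
    try:
        raise RuntimeError("database unavailable")
    except RuntimeError:
        if environ.get(CATCHER_KEY):
            raise  # leave reporting to the centralized middleware
        start_response("500 Internal Server Error", [("Content-Type", "text/plain")])
        return [b"fragile_app failed\n"]
```

Buffering the whole body is the simplest way to guarantee that the 500 can still replace the original status. A streaming variant would have to accept that an error raised after the first chunk has been sent can only be logged.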
Imagining a heterogeneous setup with multiple applications, it would be annoying to configure each application when you could group them, and to deal with some applications having poor support for debugging vs. others. E.g., a good exception catcher will log lots of information for post-mortem debugging, notify the appropriate person, etc. A poor exception catcher just prints out the traceback for the user. Blech. Certainly this could also be done as a library. Maybe that's better, but I still like the idea of centralizing it. I don't think it's so bad. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Fri Aug 27 06:34:17 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Aug 27 06:34:22 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <6FF369B1-F7DB-11D8-82BE-000A95BD86C0@mnot.net> References: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net> <412E7A58.6030800@colorstudy.com> <6FF369B1-F7DB-11D8-82BE-000A95BD86C0@mnot.net> Message-ID: <412EB9C9.5030103@colorstudy.com> Mark Nottingham wrote: > On Aug 26, 2004, at 5:03 PM, Ian Bicking wrote: > >>> * cache validation - does the server handle If-Modified-Since and >>> If-None-Match requests appropriately (e.g., with a 304)? >> >> >> I would almost certainly expect this to be false. There may be some >> WSGI servers that have an extended notion of the application, so they >> can look at things like the modification date. But those are likely >> to be uncommon -- more likely only applications will know the >> necessary information. > > > Apache CGI does it; i.e., if you set a Last-Modified header, it'll > automagically handle validation for you. That seems... well, Apache is putting forward effort, but obviously it's not a terribly efficient way to go about it. I think it would be fine if a server did that when it could, but I wouldn't leave it up to the server if the application was able to handle it on its own. 
So it's not particularly important for the application to know if the server is going to do this, as it wouldn't change what the application does. (So long as the application is giving an accurate Last-Modified header, which I think we should expect.) But this made me think, the WSGI spec leaves lots of ways for the server to add extensions to the request, but not many ways to extend the application. Presumably if you wanted the server to be able to handle this, the server would have to be able to query the application in some way. In this case it would be sufficient to have the server implicitly query the application by looking at the headers it produces. From there, it would want to abort the application (though it might be reasonable for the application to refuse to be aborted). There's no allowance for this, nor can I think of an extension to allow it. This is where application starts to blend into resource, which isn't the way WSGI looks at things (reasonably, since resources are much more complex and full of structure compared to applications). As I think about it, I'm kind of talking myself out of the whole thing... but there are places where middleware could make good use by looking ahead into the application, but I don't think WSGI could be extended in that direction. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From paul.boddie at ementor.no Fri Aug 27 12:32:58 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Fri Aug 27 12:33:10 2004 Subject: [Web-SIG] Regarding the WSGI draft Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net> Bob Kimble wrote: -> -> I have been reading this thread for a while now, and I haven't commented -> because I have done absolutely no web development using Python. However, -> Mark's comments strike me as being dead on. I'm used to the Java Servlet -> API, which creates an API for servlets and JSP pages. 
The fact that there -> are several high quality application servers that all support this API -> suggests to me that creating something similar for Python makes a lot of -> sense. I have written JSP's and servlets and run them under Tomcat, but I -> know that I could just as easily run them under WebSphere, WebLogic, -> JRun, or any others that support the API. Once the deployment gymnastics and library conflicts are dealt with, yes. ;-) It's an interesting point that I'll hint at briefly below that it isn't exactly coincidence that the most popular Java frameworks are all based on the Servlet API in some way. -> It seems to me that creating a similar API for Python would -> be terrific. Of course, somebody would also have to write an application -> server to support the API, but I suspect some of the existing frameworks -> could be revamped to support it. Anyway, that's my 2 cents. I would love -> to see something similar to Tomcat and the Java Servlet API for Python. Well, Webware was created with the Java Servlet API in mind, amongst other inspirations, and there are certainly plenty of frameworks which follow the same pattern. However, having looked into implementing the high-level functionality that Mark Nottingham and Phillip Eby are presumably referring to, and having looked into the differences between frameworks before (which led to the increasingly incoherent WebProgramming Wiki page), any future work of mine in that area will be done on top of WebStack: http://www.python.org/pypi?%3Aaction=search&name=WebStack Clearly, by even mentioning it I'm pushing some kind of agenda, but should I want to develop some kind of Web application or framework, I'd rather have a reasonably sane API which works across the major technologies (and does so pretty well right now).
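The Servlet analogy maps onto WSGI fairly directly. The application is a single callable, and any compliant server or gateway can host it unchanged. Below is a minimal sketch that uses the standard library's wsgiref package, which was added in Python 2.5, after this thread. hello_app, the host and the port are illustrative choices.

```python
from wsgiref.handlers import CGIHandler
from wsgiref.simple_server import make_server


def hello_app(environ, start_response):
    """A plain WSGI application: knows nothing about the server hosting it."""
    name = environ.get("PATH_INFO", "").strip("/") or "world"
    body = f"Hello, {name}!\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve_http(app, host="127.0.0.1", port=8000):
    """Deploy via the stdlib's standalone HTTP server (blocks forever)."""
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()


def serve_cgi(app):
    """Deploy the same callable as a CGI script (one request per process)."""
    CGIHandler().run(app)
```

Calling serve_http(hello_app) from a script runs a standalone server. A CGI script that calls serve_cgi(hello_app) deploys the identical callable behind a conventional web server.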
Titus Brown wrote: > > I'm moderately skeptical of the short term use of the API being > developed on this list, because in practice it is relatively easy > to implement a framework that fits on top of all of the existing > adapters (CGI, mod_python, etc.) Medium term, I think it will lead > to a welcome homogenization of server <--> adapter <--> framework > interaction, and so I think it's a valuable concept. I think it depends how many frameworks you want to support and which ones you choose. The work may be intellectually straightforward, but it isn't necessarily trivial. As for the value of the WSGI concept, if it provides a better foundation for higher-level frameworks and applications, then it's obviously a good thing. I'm not totally convinced that lots of people might want to run Webware on top of Twisted, for example, and that the Twisted people will get excited by this very notion and do the work to make it happen. (Although having now said that, they might rise to the challenge.) Moreover, when it comes to "co-locating" applications, there exist some pretty adequate solutions for doing so right now through Apache and other generic Web server solutions. > The idea of having a single framework (like Java's "servlets") is, I > think, silly. Having implemented sites in several of the existing > frameworks, it is clear that there are several different ways to > conceptualize the development of Web sites: the Quixote style and > the WebWare style are two very distinct examples. Anything that cuts > down on the variety of available frameworks is going to restrict the > options, which is bad. There are a variety of Java frameworks which are based on the Servlet API and which offer a range of fairly diverse development styles. Few people really want to code applications directly against that API, but it's misleading (if not wrong) to state that a standard API at such a low level will somehow strongly constrain how you develop your applications.
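The sketch below illustrates the claim that a low-level standard need not dictate development style. Two differently styled handlers sit behind one URL dispatcher, and all of them speak plain WSGI. The Servlet base class and the publish decorator are hypothetical and are not the real Webware or Quixote APIs. Only wsgiref.util.shift_path_info comes from the standard library.

```python
from wsgiref.util import shift_path_info


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found\n"]


class PrefixDispatcher:
    """Route on the first path segment; the URL policy is the deployer's choice."""

    def __init__(self, mounts, default=not_found):
        self.mounts = dict(mounts)
        self.default = default

    def __call__(self, environ, start_response):
        # Moves one segment from PATH_INFO to SCRIPT_NAME, so mounted apps
        # see paths relative to where they are mounted.
        segment = shift_path_info(environ)
        app = self.mounts.get(segment, self.default)
        return app(environ, start_response)


class Servlet:
    """Hypothetical servlet-style base class: one do_<METHOD> per HTTP verb."""

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        handler = getattr(self, "do_" + method, None)
        if handler is None:
            start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
            return [b"Method not allowed\n"]
        status, body = handler(environ)
        start_response(status, [("Content-Type", "text/plain")])
        return [body]


class Hello(Servlet):
    def do_GET(self, environ):
        return "200 OK", b"hello from a servlet-style handler\n"


def publish(func):
    """Hypothetical function-publishing style: func(remaining_path) -> str."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [func(environ.get("PATH_INFO", "")).encode("utf-8")]
    return app


@publish
def echo(rest):
    return f"echo: {rest}\n"


root = PrefixDispatcher({"servlet": Hello(), "echo": echo})
```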
As for the diversity of styles within Python Web frameworks, one has to ask whether such a standard API is useful and can support things like Quixote, SkunkWeb, Webware and Zope. If you split this analysis into the dispatching of requests and the request objects themselves, the area of "sensible" diversity is the dispatching mechanism - where Python frameworks differ in their treatment of request and response information tends to be in how comprehensive and consistent (or the opposite) the APIs for such concepts are. Where dispatching is concerned, I dislike the way many Python frameworks decide on one's behalf how the URLs are going to be interpreted, and I welcome things like Ian Bicking's enhancements to Webware that have removed such inconveniences since 0.8.1; much diversity is arguably arbitrary in this area, anyway. > However, I think it is incumbent upon the developers and users of the > different frameworks to clearly distinguish between the various options. > Right now it is very confusing to me, and I've been developing Web sites > in Python for 5 years ;). The problem is that the average developer has to choose something to start out on, and having looked at most of the main frameworks I can say with confidence that the average developer has to make a pretty big compromise between things like API sanity, API popularity and deployment flexibility. WSGI does its thing to make deployment less of an issue (along somewhat with API popularity), but avoids the burning issue of the standard API that many people within the Python frameworks scene insist isn't necessary whilst also hinting that it could be a good thing. Certainly, I'd regard a discussion on the need for such a thing as significantly more important than the decorator discussion at the very least. 
Paul From brsizer at kylotan.eidosnet.co.uk Fri Aug 27 14:56:17 2004 From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer) Date: Fri Aug 27 14:55:00 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> Message-ID: <412F2F71.4040608@kylotan.eidosnet.co.uk> Phillip J. Eby wrote: > Unfortunately, every effort to date to create a "framework to end all > frameworks" has simply resulted in the existence of framework N+1. > Why? Because the creation of a *new* framework means that there is no > existing code that uses it. And if the framework only provides features > that others already have, there's no compelling reason to switch. To quote http://www.python.org/sigs/web-sig/ again, "pick a Web framework that already exists". To pick an example from my minimal experience here, mod_python might make a good baseline because it's reasonably low-level already and runs on Unix and Win32, yet provides templating, dispatching, session handling, etc. Maybe if someone could replicate a large subset of mod_python's functionality on IIS then you'd have something very useful. > Any approach that ignores the economic reality of present-day Python web > apps, and provides no way for them to migrate gradually to a new > standard, is doomed to niche status at best. I see your point but I look at this from the other side; any approach that is focused on the current niche status of Python web apps, is doomed to perpetuate that niche status. No new framework or API or 'standard' Python web service is going to break existing code, just provide an alternative. Why therefore is there such a focus on accommodating existing users and having them migrate over? This sounds too much like preaching to the converted to me. 
> And so, the only way we're going to "steal" the marketshare of > existing frameworks is with the consent and co-operation of the > developers of those frameworks. I don't think the idea is to steal the marketshare of existing frameworks, as you put it. Rather, I'd think it would be about capturing the imagination of the average developer who would appreciate Python as a language. Some people won't use ASP because of the Microsoft aspect, or won't use PHP because of the Perl/C syntax. These are people who would probably be very interested in using an open language such as Python for this sort of thing. > First, the current situation. Choice of framework is a high investment > for users, because once they choose, they are stuck with that framework > and possibly server. This is why I would like Python to have web support in the standard library that is on a high enough level that you don't necessarily need a framework to achieve something useful. I readily agree that something such as WSGI would nicely form the backbone of the interchangeable modules. All I disagree with is that you should then /need/ one of these competing frameworks on top of that before you can do anything useful. Hence my worry about the insistence on such frameworks. As it stands, web development is pretty much the only commonplace task that I can't achieve with Python using either the standard library or an obvious 3rd party package. > Because these services will be interchangeable to some degree, lock-in > is limited and competition will determine a winner or winners. Then, if > the winners are sufficiently similar to allow useful standardization, > that's the natural next step. But, for some services, the differences > will be important qualitative differences, and standardization would > reduce meaningful choice. We don't know in advance what these services > should be, and we don't know enough to standardize on them now.
> > For someone with an ASP or PHP background, that last statement at least > might sound like sheer lunacy. It sounds more like refusing to sell ready meals in stores because of the insistence that everybody likes their food cooked in different ways. If we provided a simpler and effective baseline, even if that standard only featured 90% of the power and flexibility of existing services, then I expect we'd see a rapid take-up of that technology. In no way do I think that the current services and frameworks are useless. I just worry that there's this focus on what I'd vaguely call 'enterprise' level web development that shuts out the majority of developers who are trying to do something simpler. -- Ben Sizer. From brsizer at kylotan.eidosnet.co.uk Fri Aug 27 15:18:57 2004 From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer) Date: Fri Aug 27 15:17:19 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net> References: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net> Message-ID: <412F34C1.7020704@kylotan.eidosnet.co.uk> Paul Boddie wrote: > It's an interesting point that I'll hint at briefly below that > it isn't exactly coincidence that the most popular > Java frameworks are all based on the Servlet API in some way. I'd certainly argue that it's no coincidence that the Servlet API - from what I can see - has built in support for sessions, handling form data, query strings, etc. What worries me about the talk on this list is that people are aspiring to give Python web development all the complexity of the Java methodology with almost none of the convenience! Personally I think that Java Servlets are still too low-level for your average web developer, so to see the implication that they're too high-level and therefore somehow limiting framework diversity is worrying. 
-- Ben Sizer From pje at telecommunity.com Fri Aug 27 15:40:15 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 15:39:55 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <412F2F71.4040608@kylotan.eidosnet.co.uk> References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com> At 01:56 PM 8/27/04 +0100, Ben Sizer wrote: >Phillip J. Eby wrote: >>Any approach that ignores the economic reality of present-day Python web >>apps, and provides no way for them to migrate gradually to a new >>standard, is doomed to niche status at best. > >I see your point but I look at this from the other side; any approach >that is focused on the current niche status of Python web apps, Huh? I meant niche status *within* the Python community. My point is that trying to promote another framework isn't going to get much past the noise level in communicating about *current* frameworks. >No new framework or API or >'standard' Python web service is going to break existing code, just >provide an alternative. Why therefore is there such a focus on >accommodating existing users and having them migrate over? Because if providing an alternative would actually change anything, then things should have already changed by now. Simply providing a new framework will not create any technical or social network effects, so that leaves marketing as the force to drive adoption. And that marketing will be limited to either 1) new users, or 2) people you can get to switch from existing frameworks. >>And so, the only way we're going to "steal" the marketshare of >>existing frameworks is with the consent and co-operation of the >>developers of those frameworks. > >I don't think the idea is to steal the marketshare of existing >frameworks, as you put it. 
Rather, I'd think it would be about capturing >the imagination of the average developer who would appreciate Python as >a language. Okay, so you want to recruit non-Python developers; fine. Go for it. That's entirely orthogonal to what I'm trying to do with WSGI. But, I think you'll have an easier time of it once WSGI is ubiquitous and APIs emerge that you can then use as a standard. >>First, the current situation. Choice of framework is a high investment >>for users, because once they choose, they are stuck with that framework >>and possibly server. > >This is why I would like Python to have web support in the standard >library that is on a high enough level that you don't necessarily need a >framework to achieve something useful. As a practical matter, you'll need community support to get something like that in the standard library, and the political reality of the community is that you'll have to show why accepting your new framework N+1 doesn't mean that frameworks 1 through N should also be included. >I readily agree that something such as WSGI would nicely form the >backbone of the interchangeable modules. All I disagree with is that you >should then /need/ one of these competing frameworks on top of that >before you can do anything useful. It's not that you should *need* one of them; it's merely that if adding such features gets in the way of WSGI goals, then we don't add such features. And adding those features gets in the way of the goals if it adds unwarranted complexity to servers, gateways, or middleware. Therefore, it may make sense for those wishing to devise a higher-level or "friendlier" API, to build it atop WSGI in parallel with the current standardization efforts, and then propose it as a stdlib addition once WSGI is stable and has seen some adoption, perhaps as part of an effort to upgrade the stdlib-included web servers and gateways to support WSGI. 
Such efforts would be easier to spin as "tools for WSGI" rather than "web framework N+1".
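Below is one possible shape for such "tools for WSGI": a thin convenience layer that sits entirely on top of the environ dictionary and asks nothing extra of servers, gateways or middleware. Request and simple_app are hypothetical names used only for illustration. Query parsing uses urllib.parse.parse_qs rather than the cgi module, which was removed from the standard library in Python 3.13.

```python
import html
from urllib.parse import parse_qs


class Request:
    """Hypothetical read-only convenience view of a WSGI environ."""

    def __init__(self, environ):
        self.environ = environ
        self.method = environ.get("REQUEST_METHOD", "GET")
        self.path = environ.get("PATH_INFO", "")
        self.query = parse_qs(environ.get("QUERY_STRING", ""))

    def param(self, name, default=None):
        """First value of a query parameter, or default if absent."""
        values = self.query.get(name)
        return values[0] if values else default


def simple_app(func):
    """Hypothetical decorator: turns func(request) -> str into a WSGI app."""
    def application(environ, start_response):
        body = func(Request(environ)).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return application


@simple_app
def greet(request):
    # Escape user input before putting it into HTML.
    return f"<p>Hello, {html.escape(request.param('name', 'stranger'))}!</p>"
```

Because greet is still an ordinary WSGI callable, it can be wrapped by any middleware or hosted by any server without either one knowing that the convenience layer exists.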
>It sounds more like refusing to sell ready meals in stores because of >the insistence that everybody likes their food cooked in different ways. Not at all; we have dozens of lines of ready meals with names like Albatross, CherryPy, SkunkWeb, Quixote, and so on. It's merely that the marketplace is already crowded with such manufacturers and launching a new line to compete with them isn't likely to be a profitable venture. Instead, we've decided to standardize packaging materials and sell boxes and trays and suchlike to all the existing meal manufacturers. :) >If we provided a simpler and effective baseline, even if that standard >only featured 90% of the power and flexibility of existing services, >then I expect we'd see a rapid take-up of that technology. You must be making some assumptions that aren't clear to me. If existing services provide 100% of those capabilities, why hasn't one of them already taken the lead? Perhaps you think that endorsement by the Web-SIG is all that's needed. Maybe that could work, I don't know. But, how will you obtain the endorsement of the Web-SIG? Keep in mind that a lot of the people actually doing any work on the Web-SIG are authors of existing frameworks, which means to get buy-in you have to support their goals. To support their goals, you need an API that allows them to continue to scratch whatever itches prompted them to write their particular framework in the first place, not to mention avoid losing their investment in application code already written to that API. But, the higher the level of abstraction in the API, the greater the chance that some facility on which they depend will not be expressible in the high-level API so as to allow them to continue to use code based on their existing APIs, and therefore the more difficult it will be to get the support of those participants. WSGI lets us bypass all this, by beginning with something that everybody can use, because everybody's using HTTP, and WSGI only deals with HTTP.
>In no way do I think that the current services and frameworks are >useless. I just worry that there's this focus on what I'd vaguely call >'enterprise' level web development that shuts out the majority of >developers who are trying to do something simpler. Fair enough. But personally, if that were my goal, I'd design *two* APIs: one to emulate ASP, and the other to emulate PHP. Then I'd write translators to do the mechanical work of translating most of the syntax to e.g. PSP. That would do *much* more to bring in non-Python developers than any new Python framework would. For one thing, the mere existence of such a tool for ASP or PHP applications would create vast amounts of publicity (blog postings, articles, etc.) that money couldn't buy, and they'd be in exactly the right places: where those non-Python programmers will see them. By contrast, announcing another Python framework seems to me unlikely to create a splash even *within* the Python community, let alone outside it. From ianb at colorstudy.com Fri Aug 27 18:23:02 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Aug 27 18:24:36 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <412F2F71.4040608@kylotan.eidosnet.co.uk> References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <412F2F71.4040608@kylotan.eidosnet.co.uk> Message-ID: <412F5FE6.5000504@colorstudy.com> Ben Sizer wrote: >> Any approach that ignores the economic reality of present-day Python >> web apps, and provides no way for them to migrate gradually to a new >> standard, is doomed to niche status at best. > > I see your point but I look at this from the other side; any approach > that is focused on the current niche status of Python web apps, is > doomed to perpetuate that niche status. No new framework or API or > 'standard' Python web service is going to break existing code, just > provide an alternative. Why therefore is there such a focus on > accommodating existing users and having them migrate over? 
This sounds > too much like preaching to the converted to me. Well, from a purely sociological tack: We need to pay attention to current frameworks because this is open source -- it's not just a matter of converting users, it's a matter of converting contributing developers. Getting a bunch of users only helps an open source project if those users contribute back to the project. Past performance is some indication of future success, so it's best to try to get *current* open source web developers on board. >> And so, the only way we're going to "steal" the marketshare of >> existing frameworks is with the consent and co-operation of the >> developers of those frameworks. > > I don't think the idea is to steal the marketshare of existing > frameworks, as you put it. Rather, I'd think it would be about capturing > the imagination of the average developer who would appreciate Python as > a language. Some people won't use ASP because of the Microsoft aspect, > or won't use PHP because of the Perl/C syntax. These are people who > would probably be very interested in using an open language such as > Python for this sort of thing. > >> First, the current situation. Choice of framework is a high >> investment for users, because once they choose, they are stuck with >> that framework and possibly server. > > > This is why I would like Python to have web support in the standard > library that is on a high enough level that you don't necessarily need a > framework to achieve something useful. We already have that support, and it even works pretty well with WSGI: the cgi module. What, the cgi module is stupid and annoying, you say? (Well, if you won't say it I will.) To me that's evidence that Just Any Old Thing won't do. > I readily agree that something such as WSGI would nicely form the > backbone of the interchangeable modules. All I disagree with is that you > should then /need/ one of these competing frameworks on top of that > before you can do anything useful.
Hence my worry about the insistence > on such frameworks. As it stands, web development is pretty much the > only commonplace task that I can't achieve with Python using either the > standard library or an obvious 3rd party package. Why is WSGI's limited scope a problem? I feel fairly certain that we can get WSGI approved and start building things on it fairly soon, but anything more expansive will take much, much longer to move forward on. Your more expansive desires have been out there for a long time, if not proposed by yourself, proposed by other people (including me). And yet there's been little forward movement in terms of any standard. "Little" is probably an overstatement in that last sentence. We are moving ahead with a smaller step -- that's still much more forward progress than before. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Fri Aug 27 18:27:53 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Aug 27 18:28:36 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net> References: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net> Message-ID: <412F6109.30807@colorstudy.com> Paul Boddie wrote: > I think it depends how many frameworks you want to support and which > ones you choose. The work may be intellectually straightforward, but > it isn't necessarily trivial. As for the value of the WSGI concept, > if it provides a better foundation for higher-level frameworks and > applications, then it's obviously a good thing. I'm not totally > convinced that lots of people might want to run Webware on top of > Twisted, for example, and that the Twisted people will get excited by > this very notion and do the work to make it happen. (Although having > now said that, they might rise to the challenge.) 
Moreover, when it > comes to "co-locating" applications, there exist some pretty > adequate solutions for doing so right now through Apache and other > generic Web server solutions. This is open source -- the Twisted people don't have to be very excited about Webware in order for Webware to run on Twisted. *Someone* has to be excited about it, that's all. But WSGI takes it one further -- instead of the NxM problem which you are addressing with WebStack (well, Nx1 in that case, but NxM if you started nesting arbitrary frameworks), simply by making Webware run on WSGI, and making Twisted into a WSGI server, they could be used together. So I think there's more reason to be optimistic about the possibilities. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From amk at amk.ca Fri Aug 27 18:33:40 2004 From: amk at amk.ca (A.M. Kuchling) Date: Fri Aug 27 18:34:03 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <412F5FE6.5000504@colorstudy.com> References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <412F2F71.4040608@kylotan.eidosnet.co.uk> <412F5FE6.5000504@colorstudy.com> Message-ID: <20040827163340.GA29076@rogue.amk.ca> On Fri, Aug 27, 2004 at 11:23:02AM -0500, Ian Bicking wrote: > Why is WSGI's limited scope a problem? I feel fairly certain that we > can get WSGI approved and start building things on it fairly soon, but > anything more expansive will take much, much longer to move forward on. Definite agreement. When faced with a problem, it doesn't take very long to build a large list of requirements, a list large enough to frighten away all potential implementors. Python developers seem to suffer from this problem to an extreme degree. It's unfortunate that WSGI probably isn't going to be finished in time for Python 2.4, so that BaseHTTPServer or some similar class could support it in the stdlib.
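For reference, a server of this kind did eventually reach the standard library: Python 2.5 added the `wsgiref` package, whose `simple_server` module builds a WSGI server on top of `BaseHTTPServer` (renamed `http.server` in Python 3). The sketch below targets that later module and the final PEP 3333 interface, not the draft being discussed in this thread. `hello_app`, `QuietHandler`, and `serve_once` are illustrative names only, and serving a single request on an ephemeral port is just a convenient way to exercise the server.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


def hello_app(environ, start_response):
    """Minimal WSGI app: status and headers go to start_response,
    the body is returned as an iterable of bytes."""
    body = b"Hello from " + environ["PATH_INFO"].encode("latin-1")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class QuietHandler(WSGIRequestHandler):
    """Suppress the per-request access log normally written to stderr."""
    def log_message(self, format, *args):
        pass


def serve_once(app, path="/"):
    """Serve exactly one request on an ephemeral localhost port and
    return the response body (a test-harness style helper)."""
    with make_server("127.0.0.1", 0, app, handler_class=QuietHandler) as server:
        server.timeout = 5  # don't block forever if no request arrives
        worker = threading.Thread(target=server.handle_request, daemon=True)
        worker.start()
        # http.client talks to the socket directly, ignoring proxy settings.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        try:
            conn.request("GET", path)
            body = conn.getresponse().read()
        finally:
            conn.close()
        worker.join()
    return body


if __name__ == "__main__":
    print(serve_once(hello_app, "/wsgi"))
```

For a long-running server you would call `server.serve_forever()` instead of handling a single request.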
2.4alpha3 is scheduled for September 3rd, and is planned to be the last alpha; no new features are introduced at the beta stage, so that means WSGI support would have to wait until Python 2.5. --amk From pje at telecommunity.com Fri Aug 27 18:43:56 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 18:43:33 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <20040827163340.GA29076@rogue.amk.ca> References: <412F5FE6.5000504@colorstudy.com> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <412F2F71.4040608@kylotan.eidosnet.co.uk> <412F5FE6.5000504@colorstudy.com> Message-ID: <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com> At 12:33 PM 8/27/04 -0400, A.M. Kuchling wrote: >It's unfortunate that WSGI probably isn't going to be finished in time >for Python 2.4, so that BaseHTTPServer or some similar class could >support it in the stdlib. 2.4alpha3 is scheduled for September 3rd, >and is planned to be the last alpha; no new features are introduced at >the beta stage, so that means WSGI support would have to wait until >Python 2.5. That's one week: BaseHTTPServer is HTTP/1.0-based if I recall correctly, so whipping up support shouldn't take too long. I have a draft WSGIServer based on the December draft of the PEP, so it'd just have to be beefed up. Also, I think a CGI-based gateway (with some kind of error handling) should go in, and perhaps the utility functions we discussed previously. Documentation is an issue, though, and perhaps tests as well. Also, I sent in the PEP the day before yesterday and still don't have a PEP number. So getting community support for the PEP in the time remaining might be tough, too. From amk at amk.ca Fri Aug 27 19:11:31 2004 From: amk at amk.ca (A.M. 
Kuchling) Date: Fri Aug 27 19:11:56 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com> References: <412F5FE6.5000504@colorstudy.com> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <412F2F71.4040608@kylotan.eidosnet.co.uk> <412F5FE6.5000504@colorstudy.com> <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com> Message-ID: <20040827171131.GC29144@rogue.amk.ca> On Fri, Aug 27, 2004 at 12:43:56PM -0400, Phillip J. Eby wrote: > Documentation is an issue, though, and perhaps tests as well. Also, I sent > in the PEP the day before yesterday and still don't have a PEP number. Editors pinged. Coincidentally, it'll probably be PEP 333. PEP 222 was also web-related, so no more web PEPs until #444. IMHO it should have become a PEP much earlier. That gives a single place to point at the current draft, rather than having to point to a particular message in the Web-SIG list archive. It doesn't matter if the draft is incomplete -- we have PEPs that are just titles, so the WSGI spec was ahead of the game from the beginning. --amk From brsizer at kylotan.eidosnet.co.uk Fri Aug 27 19:26:05 2004 From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer) Date: Fri Aug 27 19:24:31 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com> References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com> Message-ID: <412F6EAD.9050702@kylotan.eidosnet.co.uk> Phillip J. Eby wrote: > At 01:56 PM 8/27/04 +0100, Ben Sizer wrote: >> No new framework or API or >> 'standard' Python web service is going to break existing code, just >> provide an alternative. Why therefore is there such a focus on >> accommodating existing users and having them migrate over? 
> > Because if providing an alternative would actually change anything, then > things should have already changed by now. Simply providing a new > framework will not create any technical or social network effects, so > that leaves marketing as the force to drive adoption. But if that framework is distributed as a standard library module, surely it will immediately gain wider recognition - and thus adoption - and also momentum towards improving it. >> This is why I would like Python to have web support in the standard >> library that is on a high enough level that you don't necessarily need a >> framework to achieve something useful. > > As a practical matter, you'll need community support to get something > like that in the standard library, and the political reality of the > community is that you'll have to show why accepting your new framework > N+1 doesn't mean that frameworks 1 through N should also be included. Does the suggestion in the Web-SIG charter no longer hold true then? I'm genuinely interested in the answer to that because the implication from reading it is that Python needs at least one 'good-enough' system in the standard library. However the implication from this list is that Web-SIG is more interested in catering for those who have already solved this problem and just want a bit more interoperability. > Not at all; we have dozens of lines of ready meals with names like > Albatross, CherryPy, SkunkWeb, Quixote, and so on. It's merely that the > marketplace is already crowded with such manufacturers and launching a > new line to compete with them isn't likely to be a profitable venture. Sadly, each of these seems to be subtly different, often with no real benefit to the user in those differences. For instance, all the different templating styles - looking at the difference between Cheetah, PSP, jonpy.wt, and Spyce, is there really any need for all of them? It seems to be a case of different syntax yet same semantics in 90% of cases.
All of these packages seem to be of high quality and are notable achievements in themselves, yet I don't see that they're /really/ offering anything so unique that standardisation would handicap the end user. >> If we provided a simpler and effective baseline, even if that standard >> only featured 90% of the power and flexibility of existing services, >> then I expect we'd see a rapid take-up of that technology. > > You must be making some assumptions that aren't clear to me. If > existing services provide 100% of those capabilities, why hasn't one of > them already taken the lead? In my opinion, it's because they're underdocumented and/or overcomplex, and non-standard. I wouldn't know where to start with Zope. It took me a while to work out how to get something useful out of mod_python. Webware looks nice but provides an awful lot, 90% of which most people won't need, making it hard for beginners to get to grips with. And so on. I expect all of these and more besides could do /everything/ I would ever need from a Python web development platform. The question is whether it's worthwhile, given the other languages and tools available to me. > But, how will you obtain the endorsement of the Web-SIG? Keep in mind > that a lot of the people actually doing any work on the Web-SIG are > authors of existing frameworks, which means to get buy-in you have to > support their goals. I'm not interested in SIG politics, to be honest. If everybody goes away from this and decides I'm wrong or that my points are irrelevant to their needs, that's fair enough. I just didn't want to be the guy complaining 6 months from now and getting told, "well, you should have brought this up on Web-SIG earlier". I like Python as a language and just wished that there wasn't this paradox where such a simple and clean language doesn't have the simple and clean access to web objects that ASP and PHP do. -- Ben Sizer. 
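Ben's wish for simple, clean access to request data, and Ian's point earlier in the thread that the standard library already covers the basics when combined with WSGI, can be illustrated without any framework. This is a minimal sketch against the final PEP 3333 interface; it uses `urllib.parse.parse_qs` rather than the `cgi` module, which was later deprecated and removed from the standard library in Python 3.13. `read_params` and `form_app` are hypothetical names, and only `application/x-www-form-urlencoded` POST bodies are handled (no multipart uploads).

```python
import html
import io
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def read_params(environ):
    """Merge query-string and urlencoded POST parameters into one dict of lists."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    content_type = environ.get("CONTENT_TYPE", "")
    if (environ.get("REQUEST_METHOD") == "POST"
            and content_type.startswith("application/x-www-form-urlencoded")):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        # Read exactly CONTENT_LENGTH bytes; urlencoded bodies are ASCII.
        body = environ["wsgi.input"].read(length).decode("latin-1")
        for key, values in parse_qs(body).items():
            params.setdefault(key, []).extend(values)
    return params


def form_app(environ, start_response):
    params = read_params(environ)
    name = params.get("name", ["world"])[0]
    # Escape user input before echoing it back into HTML.
    body = f"<p>Hello, {html.escape(name)}!</p>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    env = {}
    setup_testing_defaults(env)  # fills in the required CGI/WSGI keys
    env.update(REQUEST_METHOD="POST", QUERY_STRING="lang=en",
               CONTENT_TYPE="application/x-www-form-urlencoded",
               CONTENT_LENGTH="10")
    env["wsgi.input"] = io.BytesIO(b"name=Grace")
    print(read_params(env))
```

The same `environ` dictionary is what any WSGI server would pass in, so the helper does not depend on which server hosts the application.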
From brsizer at kylotan.eidosnet.co.uk Fri Aug 27 19:52:33 2004 From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer) Date: Fri Aug 27 19:50:51 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <412F5FE6.5000504@colorstudy.com> References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <412F2F71.4040608@kylotan.eidosnet.co.uk> <412F5FE6.5000504@colorstudy.com> Message-ID: <412F74E1.9060508@kylotan.eidosnet.co.uk> Ian Bicking wrote: > Ben Sizer wrote: >> This is why I would like Python to have web support in the standard >> library that is on a high enough level that you don't necessarily need a >> framework to achieve something useful. > > We already have that support, and it even works pretty well with WSGI: > the cgi module. What, the cgi module is stupid and annoying you say? > (Well, if you won't say it I will.) To me that's evidence that Just Any > Old Thing won't do. I agree that the cgi module won't do, but that's because I disagree that the cgi module is "on a high enough level". I do think that support for sessions, query strings, form handling, templating, and various url-parsing and html-escaping requirements need to be in that module for it to be considered high-level by my (admittedly subjective) standards. > Why is WSGI's limited scope a problem? I feel fairly certain that we > can get WSGI approved and start building things on it fairly soon, but > anything more expansive will take much, much longer to move forward on. > Your more expansive desires have been out there for a long time, if not > proposed by yourself, proposed by other people (including me). The only reason I think the limited scope is a problem is because it doesn't get me significantly closer to being able to say to my friends "Python is a great language for developing web sites with". It's a shame because I can say that about Python regarding almost any other application area. 
Maybe things will change as WSGI develops, but I can only comment on the draft that I see. -- Ben Sizer. From amk at amk.ca Fri Aug 27 19:51:08 2004 From: amk at amk.ca (A.M. Kuchling) Date: Fri Aug 27 19:51:31 2004 Subject: [Web-SIG] SIG charter In-Reply-To: <412F6EAD.9050702@kylotan.eidosnet.co.uk> References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com> <412F6EAD.9050702@kylotan.eidosnet.co.uk> Message-ID: <20040827175108.GA29376@rogue.amk.ca> On Fri, Aug 27, 2004 at 06:26:05PM +0100, Ben Sizer wrote: > Does the suggestion in the Web-SIG charter no longer hold true then? I'm I think the charter was written by Bill Janssen, who doesn't seem to be actively participating on the list any more. The charter doesn't necessarily bear any relevance to what the individuals in the SIG are actually doing. For example, the charter talks about client-side HTTP, too, but no one is working on that aspect (even though there's no real competition in this space the way there is for server-side things). Is it worth updating the charter? I have no idea what a new charter would say... --amk From steve at holdenweb.com Fri Aug 27 20:27:44 2004 From: steve at holdenweb.com (Steve Holden) Date: Fri Aug 27 20:30:13 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com> References: <412F5FE6.5000504@colorstudy.com> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <412F2F71.4040608@kylotan.eidosnet.co.uk> <412F5FE6.5000504@colorstudy.com> <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com> Message-ID: <412F7D20.5020201@holdenweb.com> Phillip J. Eby wrote: > At 12:33 PM 8/27/04 -0400, A.M. Kuchling wrote: > >> It's unfortunate that WSGI probably isn't going to be finished in time >> for Python 2.4, so that BaseHTTPServer or some similar class could >> support it in the stdlib. 
2.4alpha3 is scheduled for September 3rd, >> and is planned to be the last alpha; no new features are introduced at >> the beta stage, so that means WSGI support would have to wait until >> Python 2.5. > > > That's one week: BaseHTTPServer is HTTP/1.0-based if I recall correctly, > so whipping up support shouldn't take too long. I have a draft > WSGIServer based on the December draft of the PEP, so it'd just have to > be beefed up. Also, I think a CGI-based gateway (with some kind of > error handling) should go in, and perhaps the utility functions we > discussed previously. > [...] I am not sure that's correct. My 2.3.4 version contains the following comment: """HTTP server base class. Note: the class in this module doesn't implement any HTTP request; see SimpleHTTPServer for simple implementations of GET, HEAD and POST (including CGI scripts). It does, however, optionally implement HTTP/1.1 persistent connections, as of version 0.3. """ and there's code in there that only complains if the HTTP version is greater than 1.1. Would be neat if you could do it, though it's a demanding and error-prone task to generate code on such short notice. Good luck! regards Steve -- XXX Please note recent change of email address From neel at mediapulse.com Fri Aug 27 21:31:11 2004 From: neel at mediapulse.com (Michael C. Neel) Date: Fri Aug 27 21:27:55 2004 Subject: [Web-SIG] SIG charter In-Reply-To: <20040827175108.GA29376@rogue.amk.ca> References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com> <412F6EAD.9050702@kylotan.eidosnet.co.uk> <20040827175108.GA29376@rogue.amk.ca> Message-ID: <1093635070.1239.198.camel@mike.mediapulse.com> On Fri, 2004-08-27 at 13:51, A.M. Kuchling wrote: > On Fri, Aug 27, 2004 at 06:26:05PM +0100, Ben Sizer wrote: > > Does the suggestion in the Web-SIG charter no longer hold true then? 
I'm > > I think the charter was written by Bill Janssen, who doesn't seem to > be actively participating on the list any more. The charter doesn't > necessarily bear any relevance to what the individuals in the SIG are > actually doing. > When the list first started, there were a lot of ideas thrown about, but I think there was also a lot of 'sniping' going on, which led to a very quiet period on the list. I don't think most are here to fight for ideals, just to write code and solve some common problems. The current charter is too narrow in focus, and carries a slight bias toward suggested solutions. If I were asked what I think the charter should say, it would be along the lines of reviewing, updating, and adding modules to the Python standard library that relate to the web and web-based technologies, and at the same time defining recommended standards around these technologies. The list is doing the latter, but not the former, which is a shame. There are several "common domain" problems that could be addressed, but aren't. One example is cookies: there is a Python stdlib module for cookies, and yet mod_python has its own cookie module -- this points to a reason to review the stdlib module, because it isn't providing what mod_python needs. Considering any framework will need to address cookies, this is something that makes sense to address on a Web SIG. There have also been ideas for servers and clients in the stdlib (some already exist, so they would be built upon and expanded) and even a mention of Python applets. I think all of these should also fall into the Web SIG charter, and at least merit discussion. I think WSGI is a good concept and idea, but not a burning issue. In practice, I have never had the need to port an application / framework across servers and platforms. If that was part of the scope, it came along with a complete rewrite, so I really didn't want to keep the prior application's code and possibly its framework.
Also, many frameworks have placed the server specific code into a controller class that can be quickly subclassed and taken to a new server. Mike From pje at telecommunity.com Fri Aug 27 21:32:18 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 21:31:58 2004 Subject: [Web-SIG] Regarding the WSGI draft In-Reply-To: <412F7D20.5020201@holdenweb.com> References: <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com> <412F5FE6.5000504@colorstudy.com> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> <412F2F71.4040608@kylotan.eidosnet.co.uk> <412F5FE6.5000504@colorstudy.com> <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040827152156.03835660@mail.telecommunity.com> At 02:27 PM 8/27/04 -0400, Steve Holden wrote: >I am not sure that's correct. My 2.3.4 version contains the following comment: > >"""HTTP server base class. > >Note: the class in this module doesn't implement any HTTP request; see >SimpleHTTPServer for simple implementations of GET, HEAD and POST >(including CGI scripts). It does, however, optionally implement HTTP/1.1 >persistent connections, as of version 0.3. >""" > >and there's code in there that only complains if the HTTP version is >greater than 1.1. It's not anywhere *near* RFC-compliant, though, based on our discussions of RFC 2616 here as regards e.g. "100 expect/continue". >Would be neat if you could do it, though it's a demanding and error-prone >task to generate code on such short notice. It wouldn't be that short; there's already a WSGIServer.py in my CVS based on the December draft; the differences between that and today's WSGI are minor when it comes to the semantics. It doesn't really offer decent error handling, but then again neither does BaseHTTPServer or CGIServer. From pje at telecommunity.com Fri Aug 27 21:49:30 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri Aug 27 21:49:04 2004 Subject: [Web-SIG] FYI: PEP 333 posted to python-dev and python-list Message-ID: <5.1.1.6.0.20040827154658.01efaec0@mail.telecommunity.com> Anything we talked about in the last two days or so isn't in it yet, as this is the version I submitted to the PEP editors. From pf_moore at yahoo.co.uk Fri Aug 27 22:40:50 2004 From: pf_moore at yahoo.co.uk (Paul Moore) Date: Fri Aug 27 22:40:40 2004 Subject: [Web-SIG] Re: Regarding the WSGI draft References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> Message-ID: Ben Sizer writes: > I've read through the draft and most of the messages on this list that > followed it. However, I have a basic problem with it which I will > attempt to summarise below. [...] > What I'd like to see is something mirroring the Python Database API. For > instance, I might have to change "import MySQLdb" to "import pyPgSQL" > but I know that 99% of the rest of the database code will work fine. As > a web developer I would like to be able to change "import cgi" to > "import mod_python" or "import fastcgi" and know that, if I follow a > standard set of calls, I will have a simple and standard way of > producing a web document. I have some reservations, as well. My perspective is as a web application *consumer* rather than a developer. I have a server which runs MoinMoin and Roundup (among other web apps). MoinMoin runs under mod_python, whereas Roundup runs as its own server, accessed via Apache and mod_proxy. If I wanted to add PyBlosxom, I'd need to run it as CGI (which, given the server hardware, is horribly slow). The variety of servers and backends gets hard to manage (and that's just with 3 applications!) I'd much prefer to only use one underlying architecture (probably mod_python), but Roundup and PyBlosxom don't support it. 
Ben's idea of application writers being able to easily support multiple servers, much like the DB API supports multiple backends, would be a real bonus for me, as it would make it far more likely that I could do something like this. (Either because application writers would include additional support, or because it would be simple enough for me to add it myself). I get the impression that the WSGI idea of layering and middleware might make this more likely in the longer term, but I don't see how it might happen. It certainly doesn't make it seem like something I could do for myself with an existing application. Maybe I'm missing something crucial here, but I'd certainly like to see this clarified, if it's the case. Paul -- "Bother," said the Borg, "We've assimilated Pooh." From pf_moore at yahoo.co.uk Fri Aug 27 22:50:26 2004 From: pf_moore at yahoo.co.uk (Paul Moore) Date: Fri Aug 27 22:50:16 2004 Subject: [Web-SIG] Re: Regarding the WSGI draft References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> Message-ID: "Phillip J. Eby" writes: > First, users can experiment with other frameworks, especially if those > frameworks are lightweight. This builds competitive pressure in the > direction of lightweight, easy-to-integrate frameworks. So framework > developers begin to break their monolithic approaches down into smaller > pieces that operate on segments of WSGI. For example, a session service > that you pass the incoming 'environ' and outgoing 'headers' to, in order > for it to read and set cookies. (Notice that this *isn't* a WSGI-defined > or standardized service, just a service implemented *in terms of* WSGI.) I think this starts to address the question I raised in my previous posting, about "run anywhere" applications. If an application is written to use WSGI-compliant services, it could run on any WSGI-compliant server. But doesn't this raise a complementary issue? 
With 10 applications running, I have one server. But I also have 5 session handling services, 8 authentication services, 3 error handling services, etc, etc. Maybe that's where the pressure for "best of breed" services comes from. Small steps, I guess... Paul. -- The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair. -- Douglas Adams From pje at telecommunity.com Fri Aug 27 23:00:45 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 23:00:24 2004 Subject: [Web-SIG] Re: Regarding the WSGI draft In-Reply-To: References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> Message-ID: <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> At 09:40 PM 8/27/04 +0100, Paul Moore wrote: >I have a server which >runs MoinMoin and Roundup (among other web apps). MoinMoin runs under >mod_python, whereas Roundup runs as its own server, accessed via >Apache and mod_proxy. If I wanted to add PyBlosxom, I'd need to run >it as CGI (which, given the server hardware, is horribly slow). The >variety of servers and backends gets hard to manage (and that's just >with 3 applications!) > >I'd much prefer to only use one underlying architecture (probably >mod_python), but Roundup and PyBlosxom don't support it. Ben's idea >of application writers being able to easily support multiple servers, >much like the DB API supports multiple backends, would be a real >bonus for me, as it would make it far more likely that I could do >something like this. (Either because application writers would >include additional support, or because it would be simple enough for >me to add it myself). > >I get the impression that the WSGI idea of layering and middleware >might make this more likely in the longer term, but I don't see how it >might happen. 
It certainly doesn't make it seem like something I could >do for myself with an existing application. Maybe I'm missing >something crucial here, but I'd certainly like to see this clarified, >if it's the case. Well, if you can identify the top-level control point of PyBlosxom and Roundup, you can always try converting them to WSGI. But, maybe if there's a stdlib module for WSGI utilities, a useful one would probably be something to run some code in such a way that it thinks it's running under CGI, even though it's really running under WSGI. The degree to which this could be assured is of course dependent on precisely what the application *does*, but getting 80% of CGIs (that don't depend on some kind of global state that isn't reset after each execution) to be able to run in arbitrary WSGI servers would be a handy thing, and most appropriate for the stdlib. Anybody want to volunteer to write it? ;) If it helps, WSGIServer has some code for parsing stdout headers; see: http://cvs.eby-sarna.com/PEAK/src/peak/util/WSGIServer.py?rev=1.3&content-type=text/vnd.viewcvs-markup in the WSGIRequestHandler class. (Note: this is the code I mentioned that's based on the December WSGI draft, where the response status and headers were embedded in the output stream rather than being function arguments. So don't use it as an example of a proper WSGI server at the moment!) (Offtopic, I'd note that a major reason PyBlosxom is slow may have nothing to do with CGI: my offhand impression of its code is that it appears to scan through file directories for every page rendering, just to do things like find what "flavours" might be defined in some of your post directories. But I could be wrong, and there may be some "caching" plugins that would help this.) From pje at telecommunity.com Fri Aug 27 23:12:51 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri Aug 27 23:12:29 2004 Subject: [Web-SIG] Re: Regarding the WSGI draft In-Reply-To: References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040827170109.022bacb0@mail.telecommunity.com> At 09:50 PM 8/27/04 +0100, Paul Moore wrote: >"Phillip J. Eby" writes: > > > First, users can experiment with other frameworks, especially if those > > frameworks are lightweight. This builds competitive pressure in the > > direction of lightweight, easy-to-integrate frameworks. So framework > > developers begin to break their monolithic approaches down into smaller > > pieces that operate on segments of WSGI. For example, a session service > > that you pass the incoming 'environ' and outgoing 'headers' to, in order > > for it to read and set cookies. (Notice that this *isn't* a WSGI-defined > > or standardized service, just a service implemented *in terms of* WSGI.) > >I think this starts to address the question I raised in my previous >posting, about "run anywhere" applications. If an application is >written to use WSGI-compliant services, it could run on any >WSGI-compliant server. > >But doesn't this raise a complementary issue? With 10 applications >running, I have one server. But I also have 5 session handling >services, 8 authentication services, 3 error handling services, etc, >etc. Maybe that's where the pressure for "best of breed" services >comes from. > >Small steps, I guess... Right. Journey of a thousand miles, single step, that sort of thing. :) Anyway, once you have 5, 8, 3, etc. things that are focused on specific areas, you have an opportunity for *focused* discussion on that area, and a chance of making some progress on a standard. Right now, WSGI is focused intently on HTTP, because that's the *only* thing everybody's definitely got in common. When WSGI is also "common", then it's easy to look at other layers, because the server differences are factored out. 
So, we can pull out another layer, making the *next* layer up come into sharper focus, and so on. And, as you say, the duplication *will* provide a new kind of market pressure, to reduce duplication and consolidate the choices. The overall process is somewhat organic, I think, but it has to be started in a way that will take advantage of the forces currently in play (e.g. developer interest, existing investments, etc.) rather than working against them. From ianb at colorstudy.com Fri Aug 27 23:15:06 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Aug 27 23:15:40 2004 Subject: [Web-SIG] Re: Regarding the WSGI draft In-Reply-To: <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> References: <412E2C3E.7000900@kylotan.eidosnet.co.uk> <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> Message-ID: <412FA45A.8010809@colorstudy.com> Phillip J. Eby wrote: > Well, if you can identify the top-level control point of PyBlosxom and > Roundup, you can always try converting them to WSGI. But, maybe if > there's a stdlib module for WSGI utilities, a useful one would probably > be something to run some code in such a way that it thinks it's running > under CGI, even though it's really running under WSGI. The degree to > which this could be assured is of course dependent on precisely what the > application *does*, but getting 80% of CGIs (that don't depend on some > kind of global state that isn't reset after each execution) to be able > to run in arbitrary WSGI servers would be a handy thing, and most > appropriate for the stdlib. I happened to be playing with just such a thing: http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/pycgiwrapper.py?rev=206&view=log There's a few parts I kind of punted on, though now that I think about it I know what I did wrong, so I'll fix it a bit this evening. 
Anyway, it's intended to work both for multiprocess (e.g., mod_python) and threaded servers, with decreasing likelihood that any particular script will actually work. But I haven't yet tested it under anything but CGI, so it really *should* work ;) I'll try running it with your WSGIServer and see how it goes. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Fri Aug 27 23:17:23 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 23:17:03 2004 Subject: [Web-SIG] FYI: PEP 333 posted to python-dev and python-list In-Reply-To: <5.1.1.6.0.20040827154658.01efaec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040827171300.022a9d10@mail.telecommunity.com> At 03:49 PM 8/27/04 -0400, Phillip J. Eby wrote: >Anything we talked about in the last two days or so isn't in it yet, as >this is the version I submitted to the PEP editors. Argh. It bounced back to me for length reasons. :( I'm going to have to refer people to the online text. From pje at telecommunity.com Fri Aug 27 23:36:21 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 23:35:59 2004 Subject: [Web-SIG] Re: Regarding the WSGI draft In-Reply-To: <412FA45A.8010809@colorstudy.com> References: <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> <412E2C3E.7000900@kylotan.eidosnet.co.uk> <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com> At 04:15 PM 8/27/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>Well, if you can identify the top-level control point of PyBlosxom and >>Roundup, you can always try converting them to WSGI. But, maybe if >>there's a stdlib module for WSGI utilities, a useful one would probably >>be something to run some code in such a way that it thinks it's running >>under CGI, even though it's really running under WSGI. 
The degree to >>which this could be assured is of course dependent on precisely what the >>application *does*, but getting 80% of CGIs (that don't depend on some >>kind of global state that isn't reset after each execution) to be able to >>run in arbitrary WSGI servers would be a handy thing, and most >>appropriate for the stdlib. > >I happened to be playing with just such a thing: > >http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/pycgiwrapper.py?rev=206&view=log Wow; you certainly thought this through more than I did. E.g. your model for dealing with threads. The part for dealing with missing threads/threading though should probably use dummy_threading, though. Also, I notice that you're using multiple 'environ' replacements for some reason, even though any use of it is going to be wrapped in a global thread lock. So, the '_environs' dictionary seems superfluous. A minimal approach to adjusting the FieldStorage class could be: cgi.FieldStorage.__init__.im_func.func_defaults=(None,None,"",environ,0,0) which will make the default environ be the one you want. Doing that, and replacing 'os.environ' for the request's duration (and putting them back when done) seem to be required. (Note that code that doesn't use the 'cgi' module but directly checks os.environ won't work with the current state of things.) Hm. Actually, I just looked and it looks like you're not wrapping the execution in the global mutex, but you really need to because not only are sys.std* global, CGI apps aren't generally written to be multithreaded. Nice design otherwise, though. I particularly like that if you don't want to run a script file, you can just override 'run_script' to do whatever the request body is. >There's a few parts I kind of punted on, though now that I think about it >I know what I did wrong, so I'll fix it a bit this evening. 
Anyway, it's >intended to work both for multiprocess (e.g., mod_python) and threaded >servers, with decreasing likelihood that any particular script will >actually work. > >But I haven't yet tested it under anything but CGI, so it really *should* >work ;) I'll try running it with your WSGIServer and see how it goes. Don't forget: WSGIServer has *not* been updated to PEP 333 yet; it's still based on the old streaming approach! I've been too busy updating the spec (and replying to every thread) to update the code. :) From pje at telecommunity.com Fri Aug 27 23:59:51 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Aug 27 23:59:29 2004 Subject: [Web-SIG] Stuff left to be done on WSGI Message-ID: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> I don't know if it's possible for us to get these items together in time for 2.4; if we don't, we don't. There's little harm in having a separate 'wsgi' distribution until 2.5 rolls around. I'm thinking the package should include: * BaseHTTPServer-based WSGI server * CGI-based WSGI gateway (run WSGI apps under CGI) * WSGI app that wraps CGI applications so they can run under WSGI * Utility routines to fulfill certain parts of the spec's requirements * HTTP/1.1 practice guidelines, and utility routines where appropriate * Documentation This looks like quite a list to do in just a few days, despite the fact that we have skeleton implementations of the first four items and part of the fifth. And that's completely ignoring these currently outstanding issues in the PEP itself: * List-of-tuples vs. email.Message for outgoing headers * Exception handling Plus, I'm a couple days behind in updates to reflect the SIG's current consensus on other outstanding issues, and haven't done anything to separate the HTTP/1.1 guidelines out. Anyway, we really need to finish the outstanding open issues, because until the spec is firm on those items, we're coding on sand in those areas. 
I personally would like to use email.Message, and I'm even tempted to make 'Status' a header, so that it's just 'start_response(headers)' instead of 'start_response(status,headers)'. The Content-Transfer-Encoding boilerplate is only needed by servers and gateways, and I don't think adding another two lines of code creates a big burden there. But it makes middleware's job a lot easier: just add or modify headers, rather than having to turn the sequence of headers into some other structure and back again, or having to write utility routines to duplicate the functionality already in email.Message. With regard to exception handling, Ian has pointed out that it's hard for middleware to trap exceptions well, because it can't tell whether the next app down the chain has written headers yet, unless it replaces 'start_response', which then means it disables any advanced server APIs. After thinking about this for a while, I'm having trouble seeing a problem with that. Specifically, exception-catching middleware *is* modifying the output mechanism, because it will change the output in that case. It doesn't seem to me that you can safely write exception-catching middleware that can work without disabling the use of extension APIs for application output. The only other thing that comes to mind is requiring servers to support multiple 'start_response' calls in some way that makes sense for exception handlers, while requiring it to still work in the case where an extension API has already been used for output. 
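To make the trade-off concrete, here is a rough sketch of what handing middleware an email.Message would buy. It uses Python 3's email.message.Message, which is the same class under its modern name. Treating Status as an ordinary header is only the proposal floated above. The helper names and the X-Powered-By header are hypothetical.

```python
from email.message import Message


def headers_from_list(status, header_list):
    """Pack a status line and (name, value) pairs into one Message.

    Status is stored as an ordinary header, as the proposal suggests.
    """
    msg = Message()
    msg["Status"] = status
    for name, value in header_list:
        msg[name] = value        # __setitem__ appends, so repeated headers survive
    return msg


def headers_to_list(msg):
    """Split a Message back into (status, [(name, value), ...]) for the server."""
    pairs = [(k, v) for k, v in msg.items() if k.lower() != "status"]
    return msg["Status"], pairs


def tag_response(msg):
    """Hypothetical middleware step: edit headers in place, no list rebuilding."""
    if "content-type" in msg:                       # lookups ignore case
        msg.replace_header("Content-Type", "text/plain; charset=utf-8")
    del msg["X-Debug"]                              # no error if it is absent
    msg["X-Powered-By"] = "example"                 # hypothetical header
    return msg


msg = headers_from_list("200 OK", [("Content-type", "text/plain"),
                                   ("Set-Cookie", "a=1"),
                                   ("Set-Cookie", "b=2"),
                                   ("X-Debug", "on")])
status, pairs = headers_to_list(tag_response(msg))
print(status)                        # 200 OK
print(msg.get_all("Set-Cookie"))     # ['a=1', 'b=2']
```

Note that replace_header keeps the original spelling of the header name (Content-type here). Ian's worry about email-specific behaviour is not unfounded either: methods such as set_charset() add MIME-Version and Content-Transfer-Encoding headers as a side effect.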
From ianb at colorstudy.com Sat Aug 28 00:09:11 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Aug 28 00:09:22 2004 Subject: [Web-SIG] Re: Regarding the WSGI draft In-Reply-To: <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com> References: <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> <412E2C3E.7000900@kylotan.eidosnet.co.uk> <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com> Message-ID: <412FB107.7020702@colorstudy.com> Phillip J. Eby wrote: >> I happened to be playing with just such a thing: >> >> http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/pycgiwrapper.py?rev=206&view=log > > > Wow; you certainly thought this through more than I did. E.g. your > model for dealing with threads. The part for dealing with missing > threads/threading though should probably use dummy_threading, though. I figure if threads are missing, then we better not be in a wsgi.threaded environment, and if it's not threaded server then I don't use the threads or threading modules. > Also, I notice that you're using multiple 'environ' replacements for > some reason, even though any use of it is going to be wrapped in a > global thread lock. So, the '_environs' dictionary seems superfluous. > A minimal approach to adjusting the FieldStorage class could be: > > > cgi.FieldStorage.__init__.im_func.func_defaults=(None,None,"",environ,0,0) > > which will make the default environ be the one you want. Doing that, > and replacing 'os.environ' for the request's duration (and putting them > back when done) seem to be required. (Note that code that doesn't use > the 'cgi' module but directly checks os.environ won't work with the > current state of things.) Maybe other people don't use os.environ, but I always have a lot when doing cgi scripts, so I want to handle that case. It had actually never occurred to me before to access environ through the cgi module... > Hm. 
Actually, I just looked and it looks like you're not wrapping the > execution in the global mutex, but you really need to because not only > are sys.std* global, CGI apps aren't generally written to be multithreaded. Well, there are basically two code paths, the multithreaded and the multiprocess. I thought about the multithreaded more, but have only tested the multi-process. In both cases I try to replace sys.stdout and os.environ. I forgot to put things back the way they were with the multiprocess technique, so it's a little broken now. With the multithreaded case, it uses the thread ID to figure out what stream or environment you will be looking at, so it doesn't need a lock around run_script -- each thread sees a stdout and environ that is appropriate for it. I guess I could just change stdin too and not worry about fiddling with the cgi module at all. Though it can cause problems. E.g., if instead of the cgi server passing sys.stdout.write, it passed:

def write(s):
    sys.stdout.write(s)

That would cause all sorts of problems. Unless it used sys.__stdout__.write(s); I don't know if that would be a good or bad style. That's what I did to work around my bug. Anyway, the whole thing is a bit of a hack, so I don't expect it to work seamlessly with all scripts or all servers, though hopefully without heroic modifications it would be possible. MoinMoin would be an excellent test, as I believe it is hopelessly bound to the cgi module, but would benefit nicely from running on a different environment, at least long-running multi-process. > Don't forget: WSGIServer has *not* been updated to PEP 333 yet; it's > still based on the old streaming approach! I've been too busy updating > the spec (and replying to every thread) to update the code. :) Hmm... I don't even remember what the old spec looked like anymore. I'll give it a look-see.
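In Ian's multithreaded code path, each thread sees its own stdout, so run_script needs no lock. In Python 3 this can be approximated by installing one proxy object as sys.stdout and having it forward to a per-thread stream. This is a hypothetical sketch built on threading.local, not the thread-ID lookup in pycgiwrapper. The class and helper names are invented. os.environ would need a similar mapping proxy, which is not shown.

```python
import io
import sys
import threading


class ThreadLocalStream:
    """Installed once as sys.stdout; forwards to whatever the current thread bound."""

    def __init__(self, default):
        self._default = default              # threads that never bind use this
        self._local = threading.local()

    def bind(self, stream):
        self._local.stream = stream

    def unbind(self):
        if hasattr(self._local, "stream"):
            del self._local.stream

    def _target(self):
        return getattr(self._local, "stream", self._default)

    def write(self, data):
        return self._target().write(data)

    def flush(self):
        self._target().flush()


def run_in_thread(script, proxy, results, key):
    """Run a CGI-style callable that prints to sys.stdout; keep what it printed."""
    buf = io.StringIO()
    proxy.bind(buf)
    try:
        script()
    finally:
        proxy.unbind()
    results[key] = buf.getvalue()


def make_script(n):
    def script():
        for _ in range(3):
            print("request", n)          # print() looks up sys.stdout on every call
    return script


proxy = ThreadLocalStream(sys.stdout)
saved, sys.stdout = sys.stdout, proxy    # process-wide swap, done once
results = {}
try:
    threads = [threading.Thread(target=run_in_thread,
                                args=(make_script(n), proxy, results, n))
               for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
finally:
    sys.stdout = saved

print(repr(results[2]))                  # 'request 2\nrequest 2\nrequest 2\n'
```

The caveat Ian raises applies here too. If a server's write function calls sys.stdout.write from inside a bound thread, the proxy captures that output instead of sending it on to the client.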
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Sat Aug 28 02:00:18 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Aug 28 02:00:25 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> Message-ID: <412FCB12.2030209@colorstudy.com> Phillip J. Eby wrote: > I don't know if it's possible for us to get these items together in time > for 2.4; if we don't, we don't. I can't imagine we would make it. Hopefully we produce something for 2.5, that can be installed on previous Python installations under the same name. (Not like optparse/optik) I would hope that we can come to some consensus and produce something useable before 2.5, with the understanding that it will be included in 2.5. I would kind of like to see a "web" package. > There's little harm in having a > separate 'wsgi' distribution until 2.5 rolls around. I'm thinking the > package should include: > > * BaseHTTPServer-based WSGI server > * CGI-based WSGI gateway (run WSGI apps under CGI) You've noted these are missing error handling. What kind were you thinking of specifically? There's exception handling, which seems straight forward. Spec compliance? Certainly an anal version of these servers should be written, that checks every type passed around, looks for common mistakes, etc. I don't know if the anal and the useable version need to be the same thing. Did you have any other error cases you were thinking of? > * WSGI app that wraps CGI applications so they can run under WSGI Two models -- one that optimistically tries to load the cgi module in a fake environment (what I did), plus another that actually runs any CGI script. And maybe another one that forks and ultimately dies, can run any Python CGI script, but saves some startup time. But that last one isn't that important. 
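In the second model, the wrapper actually runs an arbitrary CGI script. It then has to turn the script's raw output back into a WSGI status and header list. Below is a minimal sketch of just that parsing step. It assumes CGI/1.1 conventions: an optional Status header, and a blank line ending the headers. The function name is hypothetical. Spawning the subprocess and feeding it wsgi.input are left out.

```python
def parse_cgi_output(data):
    """Split raw CGI output (bytes) into (status, headers, body) for WSGI.

    Header lines may end in CRLF or a bare LF; a blank line ends them.
    """
    # Use whichever blank-line separator occurs first.
    found = [(data.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(i, sep) for i, sep in found if i != -1]
    if not found:
        raise ValueError("CGI output has no blank line after the headers")
    i, sep = min(found)
    head, body = data[:i], data[i + len(sep):]

    status = None
    headers = []
    for line in head.decode("latin-1").splitlines():
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("malformed CGI header line: %r" % line)
        name, value = name.strip(), value.strip()
        if name.lower() == "status":
            status = value                  # e.g. "404 Not Found"
        else:
            headers.append((name, value))

    if status is None:
        # Simplification: CGI/1.1 also allows local (server-internal) redirects,
        # but here any Location without a Status becomes a client redirect.
        if any(n.lower() == "location" for n, _ in headers):
            status = "302 Found"
        else:
            status = "200 OK"
    return status, headers, body


status, headers, body = parse_cgi_output(
    b"Status: 404 Not Found\nContent-Type: text/html\n\n<p>gone</p>")
print(status, headers, body)   # 404 Not Found [('Content-Type', 'text/html')] b'<p>gone</p>'
```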
> * Utility routines to fulfill certain parts of the spec's requirements > * HTTP/1.1 practice guidelines, and utility routines where appropriate > * Documentation > > This looks like quite a list to do in just a few days, despite the fact > that we have skeleton implementations of the first four items and part > of the fifth. And that's completely ignoring these currently > outstanding issues in the PEP itself: > > * List-of-tuples vs. email.Message for outgoing headers > * Exception handling > > Plus, I'm a couple days behind in updates to reflect the SIG's current > consensus on other outstanding issues, and haven't done anything to > separate the HTTP/1.1 guidelines out. > > Anyway, we really need to finish the outstanding open issues, because > until the spec is firm on those items, we're coding on sand in those areas. > > I personally would like to use email.Message, and I'm even tempted to > make 'Status' a header, so that it's just 'start_response(headers)' > instead of 'start_response(status,headers)'. The > Content-Transfer-Encoding boilerplate is only needed by servers and > gateways, and I don't think adding another two lines of code creates a > big burden there. But it makes middleware's job a lot easier: just add > or modify headers, rather than having to turn the sequence of headers > into some other structure and back again, or having to write utility > routines to duplicate the functionality already in email.Message. If we use email.Message, using a status header seems fine. If not, I think it should be separate -- I don't want to search a list for the status header. I don't think the utility functions are a big deal at all, and I worry that there's some gotchas to email.Message, specifically where it is intended for email. So I'm certainly not adamantly opposed to email.Message, but I'm not adamantly for it either. 
I'd rather see a superclass of email.Message (such a superclass does not yet exist, but should be easy to write/extract) that is more minimal. > With regard to exception handling, Ian has pointed out that it's hard > for middleware to trap exceptions well, because it can't tell whether > the next app down the chain has written headers yet, unless it replaces > 'start_response', which then means it disables any advanced server APIs. > > After thinking about this for a while, I'm having trouble seeing a > problem with that. Specifically, exception-catching middleware *is* > modifying the output mechanism, because it will change the output in > that case. It doesn't seem to me that you can safely write > exception-catching middleware that can work without disabling the use of > extension APIs for application output. To me it doesn't feel like the middleware is modifying the output. It is augmenting the output in a case where there has been an unexpected failure. I guess that could cause a problem, but then I think any middleware that is sensitive to the response being modified must still always allow for extra response coming in through normal channels. But, I don't know. I'm still up in the air. Really, I just don't like wrapping start_response, from a mechanical point of view. It feels awkward to me. I wish I could just query the server as to what point in the response it is at. > The only other thing that comes to mind is requiring servers to support > multiple 'start_response' calls in some way that makes sense for > exception handlers, while requiring it to still work in the case where > an extension API has already been used for output. That seems too hard. 
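For reference, this is roughly what wrapping start_response looks like in practice. The sketch is in Python 3 and follows the draft's convention that start_response returns a write callable. catch_errors and the demo apps are hypothetical. The sketch treats any call to start_response as the point after which a clean error page can no longer be sent.

```python
import io
import traceback


def catch_errors(app):
    """Wrap an app so failures before the response starts become a 500 page."""
    def middleware(environ, start_response):
        started = []                          # non-empty once the app has begun

        def tracking_start_response(status, headers):
            started.append(status)
            return start_response(status, headers)   # pass the write callable on

        try:
            # list() pulls the whole body so errors raised mid-iteration are seen
            # here too; a real version would have to wrap the iterator instead.
            return list(app(environ, tracking_start_response))
        except Exception:
            if started:
                raise        # output is committed; leave it to the server
            environ["wsgi.errors"].write(traceback.format_exc())
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"Internal Server Error\n"]
    return middleware


def broken_app(environ, start_response):
    raise RuntimeError("boom")


def late_failure_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"partial"
    raise RuntimeError("too late")


def fake_server(app):
    """Minimal stand-in for a server: returns (status, body, error log)."""
    seen = {}

    def start_response(status, headers):
        seen["status"] = status
        return lambda data: None              # draft-style write callable

    environ = {"wsgi.errors": io.StringIO()}
    body = b"".join(app(environ, start_response))
    return seen["status"], body, environ["wsgi.errors"].getvalue()


print(fake_server(catch_errors(broken_app))[:2])
# ('500 Internal Server Error', b'Internal Server Error\n')
```

Running late_failure_app through the same wrapper re-raises the RuntimeError and leaves the decision to the server. That is the case the wrapper cannot rescue.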
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From mnot at mnot.net Sat Aug 28 02:11:30 2004 From: mnot at mnot.net (Mark Nottingham) Date: Sat Aug 28 02:11:33 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> Message-ID: I'd be inclined to keep a separation between status and headers, so that one doesn't have to worry about collisions, namespace pollution, etc. For that matter, my preference would be for environ to be split into (environ, request_method, request_url, request_headers) or similar. However, I know it's late, and I don't want to hold things up. email.Message seems like a reasonable thing to do. On Aug 27, 2004, at 2:59 PM, Phillip J. Eby wrote: > I personally would like to use email.Message, and I'm even tempted to > make 'Status' a header, so that it's just 'start_response(headers)' > instead of 'start_response(status,headers)'. -- Mark Nottingham http://www.mnot.net/ From ianb at colorstudy.com Sat Aug 28 02:22:11 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Aug 28 02:22:15 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> Message-ID: <412FD033.6060607@colorstudy.com> Mark Nottingham wrote: > For that matter, my preference would be for environ to be split into > (environ, request_method, request_url, request_headers) or similar. > However, I know it's late, and I don't want to hold things up. I think this would make pass-through a bit harder, which I imagine could be fairly common. And it would also add redundancy, since environ as defined by CGI contains all those other objects. If not CGI variables, then we wouldn't be building on any particular spec. Also, request_url isn't actually part of environ right now.
Instead there are SCRIPT_NAME and PATH_INFO, which provide important information about how to parse the URL. There's also the (optional) REQUEST_URI, which I think is useful, but only advisory. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From mnot at mnot.net Sat Aug 28 02:54:00 2004 From: mnot at mnot.net (Mark Nottingham) Date: Sat Aug 28 02:54:07 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: <412FD033.6060607@colorstudy.com> References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <412FD033.6060607@colorstudy.com> Message-ID: Could you expand on the problems that would be encountered with pass-through? I don't think it would add redundancy in the CGI case; it would just require CGI WSGI servers to remove HTTP headers from the environment and put them in the proper data structure. WRT URIs, my preference (once again, just stating what I'd do, not saying that I think this MUST change) would be to base it on the underlying specs, and not make it so CGI-centric; i.e., have 'abs_path' and 'query' (these are the BNF productions in both RFC 2396 and RFC 2616), and that's it; anything else (e.g., script location) would be in the environment, and probably server-specific. Cheers, On Aug 27, 2004, at 5:22 PM, Ian Bicking wrote: > Mark Nottingham wrote: >> For that matter, my preference would be for environ to be split into >> (environ, request_method, request_url, request_headers) or similar. >> However, I know it's late, and I don't want to hold things up. > > I think this would make pass-through a bit harder, which I imagine could be > fairly common. And it would also add redundancy, since environ as > defined by CGI contains all those other objects. If not CGI variables, > then we wouldn't be building on any particular spec. > > Also, request_url isn't actually part of environ right now. Instead > there are SCRIPT_NAME and PATH_INFO, which provide important > information about how to parse the URL.
There's also the (optional) > REQUEST_URI, which I think is useful, but only advisory. > > -- > Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org > -- Mark Nottingham http://www.mnot.net/ From floydophone at gmail.com Sat Aug 28 03:56:53 2004 From: floydophone at gmail.com (Peter Hunt) Date: Sat Aug 28 03:56:58 2004 Subject: [Web-SIG] My port of jonpy to current WSGI draft Message-ID: <6654eac40408271856228642c2@mail.gmail.com> I've taken the liberty of adding a jonpy adapter to WSGI. In short, it works. It's an example of a high-level, servlet interface to WSGI, and it allows you to write real WSGI apps _now_, and also supports apps that were written to run on other platforms. My hello, world executed alright, though I haven't sufficiently tested it yet. From the looks of it, however, I think it's a complete implementation. The only design issue I see is that it doesn't use yielding; it is push. The attached files are:
- wsgicgi.py - the run_with_cgi method described in the pre-PEP, except I fixed a typo and fixed some issues regarding the blank line between the headers and content, and commented out the non-standard Status: header.
- test.cgi - the hello, world test script, taken verbatim from the jonpy website
- jonpy_wsgi.py - the jonpy middleware I wrote
I know there are no unit tests or comments or docs, but it allows many real world apps to run on WSGI _today_. Tested on Windows XP, and IIS (so sue me ;) ).
-------------- next part --------------
import os, sys

def run_with_cgi(application):
    environ = {}
    environ.update(os.environ)
    environ['wsgi.input'] = sys.stdin
    environ['wsgi.errors'] = sys.stderr
    environ['wsgi.version'] = '1.0'
    environ['wsgi.multithread'] = False
    environ['wsgi.multiprocess'] = True

    def start_response(status,headers):
        #print "Status:", status
        for key,val in headers:
            print "%s: %s" % (key,val)
        print
        return sys.stdout.write

    result = application(environ, start_response)
    if result:
        try:
            for data in result:
                sys.stdout.write(data)
        finally:
            if hasattr(result,'close'):
                result.close()
-------------- next part --------------
from jon import cgi

class WSGIRequest(cgi.Request):
    """An implementation of Request which is also a WSGI app."""

    def __init__(self, handler):
        cgi.Request.__init__(self, handler)

    def __call__(self, environ, start_response):
        self.environ = environ
        self.stdin = environ['wsgi.input']
        self.start_response = start_response
        self._writefunc = None
        cgi.Request._init(self)
        self.process()

    def process(self):
        """Execute the handler"""
        self._init()
        try:
            handler = self._handler_type()
        except:
            self.traceback()
        else:
            try:
                handler.process(self)
            except:
                handler.traceback(self)
        self.close()

    def output_headers(self):
        self._writefunc = self.start_response("200 OK", self._headers)

    def error(self, s):
        self.environ['wsgi.errors'].write(s)

    def _write(self, s):
        assert self._writefunc != None
        self._writefunc(s)

#def simple_app(environ, start_response):
#    """Simplest possible application object"""
#    status = '200 OK'
#    headers = [('Content-type','text/plain')]
#    write = start_response(status, headers)
#    write('Hello world!\n')
-------------- next part --------------
#!/usr/bin/env python
from jon import cgi
import wsgicgi
import jonpy_wsgi
# bit redundant: cgi->wsgi->cgi

class Handler(cgi.Handler):
    def process(self, req):
        req.set_header("Content-Type", "text/plain")
        req.write("Hello, %s!\n" % req.params.get("greet", "world"))
wsgicgi.run_with_cgi(jonpy_wsgi.WSGIRequest(Handler))
From floydophone at gmail.com Sat Aug 28 03:58:56 2004 From: floydophone at gmail.com (Peter Hunt) Date: Sat Aug 28 03:59:02 2004 Subject: [Web-SIG] Whoops! Quick little patch to my previous post Message-ID: <6654eac404082718585af408ab@mail.gmail.com> jonpy_wsgi.py: I added a redundant call to cgi.Request._init(self) in __call__; feel free to remove it. From floydophone at gmail.com Sat Aug 28 04:42:34 2004 From: floydophone at gmail.com (Peter Hunt) Date: Sat Aug 28 04:42:37 2004 Subject: [Web-SIG] My WSGIHTTPServer implementation Message-ID: <6654eac404082719426d43aaf4@mail.gmail.com> As the PEAK one is horribly out of date, I decided to implement a new one. I don't know if this is exactly the interface you want, but it's a start. The way it works is you write a .py script, and include a module-level "application" callable, which is your WSGI application. It will execute it from there. It's horribly insecure, but should help with people testing their WSGI apps. The docs are scarce. Attached is the implementation as well as the example app from the PEP.
-------------- next part --------------
#!/usr/bin/env python

def application(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    headers = [('Content-type','text/plain')]
    write = start_response(status, headers)
    write('Hello world!\n')
-------------- next part --------------
"""WSGI-savvy HTTP Server.

SECURITY WARNING: DON'T USE THIS CODE UNLESS YOU ARE INSIDE A FIREWALL
-- it may execute arbitrary Python code or external programs.

"""

__version__ = "0.4"

__all__ = ["WSGIHTTPRequestHandler"]

import os
import sys
import urllib
import BaseHTTPServer
import SimpleHTTPServer
import select
import traceback


class WSGIHTTPRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):

    """Complete HTTP server with GET, HEAD and POST commands.

    GET and HEAD also support running CGI scripts.

    The POST command is *only* implemented for CGI scripts.

    """

    # Make rfile unbuffered -- we need to read one line and then pass
    # the rest to a subprocess, so we can't use buffered input.
    rbufsize = 0

    def do_POST(self):
        """Serve a POST request.

        This is only implemented for CGI scripts.

        """

        if self.is_cgi():
            self.run_cgi()
        else:
            self.send_error(501, "Can only POST to CGI scripts")

    def do_GET(self):
        if self.is_cgi():
            self.run_cgi()
        else:
            SimpleHTTPServer.SimpleHTTPRequestHandler.do_GET(self)

    def send_head(self):
        """Version of send_head that support CGI scripts"""
        if self.is_cgi():
            return self.run_cgi()
        else:
            return SimpleHTTPServer.SimpleHTTPRequestHandler.send_head(self)

    def is_cgi(self):
        """Test whether self.path corresponds to a Python script
        """
        path = self.path
        if "?" in path:
            path = path[:path.rfind("?")]
        if self.is_python(path):
            i = path.rfind("/")
            if i == -1:
                self.cgi_info = "/",path
            else:
                self.cgi_info = path[:i],path[i+1:]
            #self.cgi_info = os.path.split(path)#path[:-1], path[-1]
            return True
        else:
            return False

    def is_python(self, path):
        """Test whether argument path is a Python script."""
        head, tail = os.path.splitext(path)
        return tail.lower() in (".py", ".pyw")

    def run_cgi(self):
        """Execute a CGI script."""
        dir, rest = self.cgi_info
        i = rest.rfind('?')
        if i >= 0:
            rest, query = rest[:i], rest[i+1:]
        else:
            query = ''
        i = rest.find('/')
        if i >= 0:
            script, rest = rest[:i], rest[i:]
        else:
            script, rest = rest, ''
        scriptname = dir + '/' + script
        scriptfile = self.translate_path(scriptname)
        if not os.path.exists(scriptfile):
            self.send_error(404, "No such CGI script (%s)" % `scriptname`)
            return
        if not os.path.isfile(scriptfile):
            self.send_error(403, "CGI script is not a plain file (%s)" % `scriptname`)
            return
        ispy = self.is_python(scriptname)
        if not ispy:
            self.send_error(403, "CGI script is not a Python script (%s)" % `scriptname`)
            return

        # Reference: http://hoohoo.ncsa.uiuc.edu/cgi/env.html
        # XXX Much of the following could be prepared ahead of time!
        env = {}
        env['SERVER_SOFTWARE'] = self.version_string()
        env['SERVER_NAME'] = self.server.server_name
        env['GATEWAY_INTERFACE'] = 'CGI/1.1'
        env['SERVER_PROTOCOL'] = self.protocol_version
        env['SERVER_PORT'] = str(self.server.server_port)
        env['REQUEST_METHOD'] = self.command
        uqrest = urllib.unquote(rest)
        env['PATH_INFO'] = uqrest
        env['PATH_TRANSLATED'] = self.translate_path(uqrest)
        env['SCRIPT_NAME'] = scriptname
        if query:
            env['QUERY_STRING'] = query
        host = self.address_string()
        if host != self.client_address[0]:
            env['REMOTE_HOST'] = host
        env['REMOTE_ADDR'] = self.client_address[0]
        # XXX AUTH_TYPE
        # XXX REMOTE_USER
        # XXX REMOTE_IDENT
        if self.headers.typeheader is None:
            env['CONTENT_TYPE'] = self.headers.type
        else:
            env['CONTENT_TYPE'] = self.headers.typeheader
        length = self.headers.getheader('content-length')
        if length:
            env['CONTENT_LENGTH'] = length
        accept = []
        for line in self.headers.getallmatchingheaders('accept'):
            if line[:1] in "\t\n\r ":
                accept.append(line.strip())
            else:
                accept = accept + line[7:].split(',')
        env['HTTP_ACCEPT'] = ','.join(accept)
        ua = self.headers.getheader('user-agent')
        if ua:
            env['HTTP_USER_AGENT'] = ua
        co = filter(None, self.headers.getheaders('cookie'))
        if co:
            env['HTTP_COOKIE'] = ', '.join(co)
        # XXX Other HTTP_* headers
        # Since we're setting the env in the parent, provide empty
        # values to override previously set values
        for k in ('QUERY_STRING', 'REMOTE_HOST', 'CONTENT_LENGTH',
                  'HTTP_USER_AGENT', 'HTTP_COOKIE'):
            env.setdefault(k, "")
        env.update(os.environ)

        # now, set WSGI vars
        env['wsgi.input'] = self.rfile
        env['wsgi.errors'] = sys.stderr
        env['wsgi.version'] = '1.0'
        env['wsgi.multithread'] = False
        env['wsgi.multiprocess'] = True

        decoded_query = query.replace('+', ' ')

        try:
            ns = {}
            execfile(scriptfile,ns,ns)
            ns["application"](env, self.start_response)
        except:
            traceback.print_exc(file=sys.stderr)
            self.log_error("WSGI script could not be executed.")

    def start_response(self, status, headers):
        code,desc = status.split(" ",1)
        self.send_response(int(code), desc)
        for k,v in headers:
            self.wfile.write("%s: %s\r\n" % (k,v))
        self.wfile.write("\r\n")
        return self.wfile.write


def test(HandlerClass = WSGIHTTPRequestHandler,
         ServerClass = BaseHTTPServer.HTTPServer):
    SimpleHTTPServer.test(HandlerClass, ServerClass)


if __name__ == '__main__':
    test()
From pje at telecommunity.com Sat Aug 28 05:13:43 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Aug 28 05:13:34 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: <412FCB12.2030209@colorstudy.com> References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com> At 07:00 PM 8/27/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>I don't know if it's possible for us to get these items together in time >>for 2.4; if we don't, we don't. > >I can't imagine we would make it. You're probably right; it's just so tantalizingly close, as AMK mentioned. >I would hope that we can come to some consensus and produce something >useable before 2.5, with the understanding that it will be included in >2.5. I would kind of like to see a "web" package. I think we'll have better luck with a 'wsgi' package, but I could be wrong. 'web' just seems like a nuisance attractor for all sorts of unproductive bickering on so many levels. On a more immediate practical level, we'd be crazy to try to claim 'web' for a third-party package that we want to propose for the stdlib, but a package named 'wsgi' would be more than fair game. >>There's little harm in having a separate 'wsgi' distribution until 2.5 >>rolls around. I'm thinking the package should include: >> * BaseHTTPServer-based WSGI server >> * CGI-based WSGI gateway (run WSGI apps under CGI) > >You've noted these are missing error handling. What kind were you >thinking of specifically? > >There's exception handling, which seems straight forward.
Well, to be honest, I haven't a clue what one does about errors *after* the headers are written. You can't send anything useful to the client, because the status is already set. If you sent a Content-Length, you can break the connection before that point, and it's a fair guess the client will know something's wrong. If you *didn't* send a content length and break the connection, the client gets an incomplete file and maybe doesn't know it. Sending an error message once 'write()' has been called will garble the output. All of these options are especially unsatisfactory when binary files are involved, where "unsatisfactory" could mean anything from "annoying" to "catastrophic" (e.g. garbling an executable). > Spec compliance? Certainly an anal version of these servers should be > written, that checks every type passed around, looks for common mistakes, > etc. I don't know if the anal and the useable version need to be the > same thing. I wasn't even addressing spec compliance, although test suites for all the implementations, factored so that they could be used as a basis for testing other implementations, would certainly be nice. >Two models -- one that optimistically tries to load the cgi module in a >fake environment (what I did), plus another that actually runs any CGI script. I'm not following what the difference is, exactly, but I guess we'll need to get into the design more. >If we use email.Message, using a status header seems fine. If not, I >think it should be separate -- I don't want to search a list for the >status header. Right, that's all I was thinking. >I don't think the utility functions are a big deal at all, and I worry >that there's some gotchas to email.Message, specifically where it is >intended for email. So I'm certainly not adamantly opposed to >email.Message, but I'm not adamantly for it either. I'd rather see a >superclass of email.Message (such a superclass does not yet exist, but >should be easy to write/extract) that is more minimal. 
Why don't you take a look at the code? I have. Here are the methods: as_string, __str__ -- format the message as a string is_multipart -- returns true if payload has been set to a list get_unixfrom/set_unixfrom, add_payload/set_payload/get_payload/attach, get_charsets, walk -- stuff for manipulating parts of the message we don't care about. set_charset/get_charset -- sets the character set parameters of the content-type, which is actually useful. On the down side, setting the character set sets MIME-Version, but it also sets the Content-Transfer-Encoding, so it doesn't force the server to default one. __len__, __getitem__, __setitem__, __delitem__, __contains__, has_key, get, keys, values, items -- case-insensitive dictionary-like interface (i.e., the stuff we mainly want) get_all -- all values for a header name add_header, replace_header -- more stuff we want get_type, get_main_type, get_subtype, get_content_type, get_content_maintype, get_content_subtype, get_content_subtype, get_param, get_params, set_param, del_param, set_type, get_boundary, set_boundary, get_content_charset -- miscellaneous content-type analysis and manipulation. Not necessarily very helpful, except maybe for middleware. But they hardly hurt. get_filename -- extract filename from Content-Disposition if present. Not particularly helpful, but also not damaging in any way. Perhaps more eyes should look at this, but I haven't found anything in here that's damaging or even annoying apart from setting MIME-Version if it's not there and the content-type is touched. >But, I don't know. I'm still up in the air. Really, I just don't like >wrapping start_response, from a mechanical point of view. It feels >awkward to me. I wish I could just query the server as to what point in >the response it is at. Well, we could offer a facility for that, but first I'd like to explore what error handling should *do* in different situations. 
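Here is one shape such a facility could take, sketched in Python 3. The server keeps a small state object for each response and exposes it in environ. Choosing among the outcomes Phillip describes for errors after the headers are written then becomes a lookup. The ResponseState class, the environ key, and the strategy labels are all hypothetical; nothing like them was in the draft.

```python
class ResponseState:
    """Hypothetical per-response record a server could expose to middleware."""

    ENVIRON_KEY = "x-example.response_state"    # made-up key, not in any spec

    def __init__(self):
        self.headers_sent = False
        self.content_length = None   # int once a Content-Length header is sent
        self.bytes_sent = 0

    def record_headers(self, headers):
        self.headers_sent = True
        for name, value in headers:
            if name.lower() == "content-length":
                self.content_length = int(value)

    def record_body(self, data):
        self.bytes_sent += len(data)


def error_strategy(state):
    """Pick what a server can still do when the application raises."""
    if not state.headers_sent:
        return "send_500"            # nothing committed: replace the response
    if state.content_length is None:
        return "close_silently"      # client may not notice the truncation
    if state.bytes_sent < state.content_length:
        return "close_short"         # short body: client can detect the error
    return "log_only"                # response already complete


state = ResponseState()
environ = {ResponseState.ENVIRON_KEY: state}
print(error_strategy(environ[ResponseState.ENVIRON_KEY]))   # send_500
state.record_headers([("Content-Type", "application/octet-stream"),
                      ("Content-Length", "1024")])
state.record_body(b"x" * 100)
print(error_strategy(state))                                # close_short
```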
>>The only other thing that comes to mind is requiring servers to support >>multiple 'start_response' calls in some way that makes sense for >>exception handlers, while requiring it to still work in the case where an >>extension API has already been used for output. > >That seems too hard. Well, to some extent we have to look at the question of what should happen in those circumstances anyway, whether we solve the problem in that specific way or not. Because if the application *does* call start_response more than once, the server has to be able to handle it *somehow*. Really, the ultimate error handling *has* to be done by servers, unless they want to take the route of crashing the entire process when something bad happens. :) From pje at telecommunity.com Sat Aug 28 05:18:17 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Aug 28 05:18:03 2004 Subject: [Web-SIG] Re: Regarding the WSGI draft In-Reply-To: <412FB107.7020702@colorstudy.com> References: <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com> <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> <412E2C3E.7000900@kylotan.eidosnet.co.uk> <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040827231446.021cd010@mail.telecommunity.com> At 05:09 PM 8/27/04 -0500, Ian Bicking wrote: >Though it can cause problems. E.g., if instead of the cgi server passing >sys.stdout.write, it passed: > >def write(s): > sys.stdout.write(s) > >That would cause all sorts of problems. Unless it used >sys.__stdout__.write(s); I don't know if that would be a good or bad >style. That's what I did to work around my bug. There's another way... make the dummy file object put in for sys.stdout do this: def write(self,data): sys.stdout = self.__oldstdout__ try: self.wsgi_writefunc(data) finally: sys.stdout = self Voila. Now, even if the WSGI server is written to use stdout, it still works. 
The same trick can and should be used for stdin and stderr. It's messy, but it should suffice. Actually, to be a really decent emulation, the dummy stdout.write() should probably buffer the data, and look for flush() before calling the wsgi_writefunc. Assuming it's not still buffering headers. But I digress. Clearly, CGI is a pain in the, er... gateway. :) From pje at telecommunity.com Sat Aug 28 05:22:01 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Aug 28 05:21:49 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040827231832.030f8e60@mail.telecommunity.com> At 05:11 PM 8/27/04 -0700, Mark Nottingham wrote: >I'd be inclined to keep a separation between status and headers, so that >one doesn't have to worry about collisions, namespace pollution, etc. It's only a collision if some future version of HTTP decides to use 'Status:' as a response header, in which case CGI is in trouble. :) >For that matter, my preference would be for environ to be split into >(environ, request_method, request_url, request_headers) or similar. >However, I know it's late, and I don't want to hold things up. Don't worry about the lateness. Let's do it right. That having been said, I've previously mentioned these reasons for *not* doing request headers and suchlike: 1. lots of code in-the-field knows how to do sensible things with CGI variables, but not HTTP headers 2. 
HTTP doesn't differentiate between "target of this request" and "where the application is", but CGI does (SCRIPT_NAME + PATH_INFO) From ianb at colorstudy.com Sat Aug 28 06:03:28 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Aug 28 06:03:38 2004 Subject: [Web-SIG] Re: Regarding the WSGI draft In-Reply-To: <5.1.1.6.0.20040827231446.021cd010@mail.telecommunity.com> References: <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com> <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> <412E2C3E.7000900@kylotan.eidosnet.co.uk> <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com> <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com> <5.1.1.6.0.20040827231446.021cd010@mail.telecommunity.com> Message-ID: <41300410.8090404@colorstudy.com> Phillip J. Eby wrote: > At 05:09 PM 8/27/04 -0500, Ian Bicking wrote: > >> Though it can cause problems. E.g., if instead of the cgi server >> passing sys.stdout.write, it passed: >> >> def write(s): >> sys.stdout.write(s) >> >> That would cause all sorts of problems. Unless it used >> sys.__stdout__.write(s); I don't know if that would be a good or bad >> style. That's what I did to work around my bug. > > > There's another way... make the dummy file object put in for sys.stdout > do this: > > def write(self,data): > sys.stdout = self.__oldstdout__ > try: > self.wsgi_writefunc(data) > finally: > sys.stdout = self > > Voila. Now, even if the WSGI server is written to use stdout, it still > works. The same trick can and should be used for stdin and stderr. Hmm... possibly. Another thought I had was to buffer all the output, then only return as an iterator (or with a single call to the server's write function) when the application has finished. This way the only problem would be with server extensions, as no server code would normally be written while the script was running. Hrm, though that has its own problems if the script needs to stream output. Yours would be more general in that case. 
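The stdout-swapping trick quoted above can be written out as a small Python sketch. The class and function names (`WSGIStdout`, `run_cgi_style`) are hypothetical. `wsgi_write` stands for whatever write callable `start_response` returned. The sketch assumes the script prints text that should be encoded as UTF-8, because WSGI bodies are bytes on Python 3. The same pattern could be applied to stdin and stderr.

```python
import sys


class WSGIStdout:
    """Stand-in for sys.stdout while a CGI-style script runs.

    Each write() puts the real sys.stdout back, hands the data to the WSGI
    write callable, then reinstalls itself. As a result, server code that
    prints to sys.stdout does not feed back into the application's output.
    """

    def __init__(self, wsgi_write, encoding="utf-8"):
        self.wsgi_write = wsgi_write      # callable returned by start_response
        self.encoding = encoding          # assumption: scripts print text
        self.old_stdout = sys.stdout      # the stream to restore temporarily

    def write(self, data):
        if isinstance(data, str):
            data = data.encode(self.encoding)  # WSGI bodies are bytes on Python 3
        sys.stdout = self.old_stdout
        try:
            self.wsgi_write(data)
        finally:
            sys.stdout = self

    def flush(self):
        # A fuller emulation might buffer in write() and only pass data on here.
        pass


def run_cgi_style(script, wsgi_write):
    """Run script() with its print() output diverted to wsgi_write."""
    proxy = WSGIStdout(wsgi_write)
    sys.stdout = proxy
    try:
        script()
    finally:
        sys.stdout = proxy.old_stdout
```

Ian's alternative would use an in-memory buffer in place of the proxy and return its contents as a one-item iterable once the script finishes. That approach is simpler, but it gives up streaming.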
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Sat Aug 28 06:51:57 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Sat Aug 28 06:52:03 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com> References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com> Message-ID: <41300F6D.4050607@colorstudy.com> Phillip J. Eby wrote: >> I would hope that we can come to some consensus and produce something >> useable before 2.5, with the understanding that it will be included in >> 2.5. I would kind of like to see a "web" package. > > > I think we'll have better luck with a 'wsgi' package, but I could be > wrong. 'web' just seems like a nuisance attractor for all sorts of > unproductive bickering on so many levels. > > On a more immediate practical level, we'd be crazy to try to claim 'web' > for a third-party package that we want to propose for the stdlib, but a > package named 'wsgi' would be more than fair game. I would only want to use "web" if we could get agreement that it would be in 2.5 under that name. I was thinking of it like a package for various Python web-related modules (the Next Generation; forgoing this current generation which is all in the root). Almost all the modules in the root have issues. Well, let's enumerate... webbrowser: this seems like a totally weird module to me cgi: ick ick ick. cgitb: this is okay. urllib: defunct? urllib2: surprisingly hard to use in a number of ways. There was some discussion about this early in Web-SIG. I think the client stuff John Lee has done at: http://wwwsearch.sourceforge.net/ is better, and I think he's interested in that direction.
Probably not right now, but at some point this could well improve on urllib* httplib: actually okay, kind of; needed for some things that urllib can't do. But it also seems redundant in other ways. urlparse: like os.path, this is a rather annoying module to use, though I guess it works fine. I'd like to see something like Jason Orendorff's path module, but for URLs. BaseHTTPServer, SimpleHTTPServer, CGIHTTPServer: it seems odd that this is three modules. And none of the three actually claims to work that well. It's wonky. They're useful modules, but limited in scope. Cookie: weird interface. Has some insecure parts. I think mod_python differs mostly in that it has secure alternatives. xmlrpclib: a good module. SimpleXMLRPCServer: like the HTTPServers, seems a little odd. DocXMLRPCServer: what a weird module. robotparser: never knew this existed. HTMLParser: lives in the world between web and XML. Some of the client tools in wwwsearch are very HTML-centric as well. But it all fits together. htmllib: deprecated, I think? Or HTMLParser? I don't know what's going on here. htmlentitydefs: another odd little module. Anyway, I think there's a case to be made for a new generation of web libraries, and a package to bring them together. I don't know if we need deeper hierarchy than that. E.g., web.wsgi.cgiadapter. I don't think so. I'd rather "WSGI" be a term only those in the know use -- it means nothing unless you expand the acronym, and even then it's pretty vague. Ultimately I hope most web programmers just don't need to think about any of it. >>> There's little harm in having a separate 'wsgi' distribution until >>> 2.5 rolls around. I'm thinking the package should include: >>> * BaseHTTPServer-based WSGI server >>> * CGI-based WSGI gateway (run WSGI apps under CGI) >> >> >> You've noted these are missing error handling. What kind were you >> thinking of specifically? >> >> There's exception handling, which seems straightforward.
> > > Well, to be honest, I haven't a clue what one does about errors *after* > the headers are written. You can't send anything useful to the client, > because the status is already set. > > If you sent a Content-Length, you can break the connection before that > point, and it's a fair guess the client will know something's wrong. If > you *didn't* send a content length and break the connection, the client > gets an incomplete file and maybe doesn't know it. Sending an error > message once 'write()' has been called will garble the output. > > All of these options are especially unsatisfactory when binary files are > involved, where "unsatisfactory" could mean anything from "annoying" to > "catastrophic" (e.g. garbling an executable). Yes, you are right. Which means the catcher has to keep track of the headers that were sent if it hopes to do anything. In that case, it might check for text/html or text/plain; if not those two, then just stop the response short and log the error. If so, and if configured to show errors, then it could display them; cgitb goes to some length to make HTML render correctly. That makes me think that wrapping send_response is more reasonable. Though it makes error resolution in servers more complex. >> Spec compliance? Certainly an anal version of these servers should >> be written, that checks every type passed around, looks for common >> mistakes, etc. I don't know if the anal and the useable version need >> to be the same thing. > > > I wasn't even addressing spec compliance, although test suites for all > the implementations, factored so that they could be used as a basis for > testing other implementations, would certainly be nice. Yes, I've meant to work on this. I have a simple "echo" application that sends results based on the query; throwing errors, displaying text, displaying the environ, etc. I was thinking that along with a client could make a good structure for further testing. 
Then the echo application could be coded in different styles of application as well -- for instance, jonpy, and the same tests run. It would be useful for testing middleware as well. I'll try to give it a go sometime soon. >> Two models -- one that optimistically tries to load the cgi module in >> a fake environment (what I did), plus another that actually runs any >> CGI script. > > I'm not following what the difference is, exactly, but I guess we'll > need to get into the design more. One runner would actually fork a process and run the CGI script separately. This would be useful for, say, implementing CGIHTTPServer in terms of WSGI. It would always work, because it would actually run the script as a CGI script. >> I don't think the utility functions are a big deal at all, and I worry >> that there's some gotchas to email.Message, specifically where it is >> intended for email. So I'm certainly not adamantly opposed to >> email.Message, but I'm not adamantly for it either. I'd rather see a >> superclass of email.Message (such a superclass does not yet exist, but >> should be easy to write/extract) that is more minimal. > > > Why don't you take a look at the code? I have. Well good, now I don't need to ;) > Here are the methods: > > as_string, __str__ -- format the message as a string > > is_multipart -- returns true if payload has been set to a list Can you do this with HTTP? I know some MIME stuff works (like content-disposition: attachment; filename=blah). Would this work too? In a meaningful way? The cgi module has some weird MIME stuff in it that I don't think any web client has ever exercised. > get_unixfrom/set_unixfrom, add_payload/set_payload/get_payload/attach, > get_charsets, walk -- stuff for manipulating parts of the message we > don't care about. Yes. If these accidentally are used, will it affect the as_string representation? > set_charset/get_charset -- sets the character set parameters of the > content-type, which is actually useful.
On the down side, setting the > character set sets MIME-Version, but it also sets the > Content-Transfer-Encoding, so it doesn't force the server to default one. Would that start opening up the possibility of accepting Unicode to write()/app_iter? > __len__, __getitem__, __setitem__, __delitem__, __contains__, has_key, > get, keys, values, items -- case-insensitive dictionary-like interface > (i.e., the stuff we mainly want) > > get_all -- all values for a header name > > add_header, replace_header -- more stuff we want Very good, though not hard to reimplement. > get_type, get_main_type, get_subtype, get_content_type, > get_content_maintype, get_content_subtype, get_content_subtype, > get_param, get_params, set_param, del_param, set_type, get_boundary, > set_boundary, get_content_charset -- miscellaneous content-type analysis > and manipulation. Not necessarily very helpful, except maybe for > middleware. But they hardly hurt. > > get_filename -- extract filename from Content-Disposition if present. > Not particularly helpful, but also not damaging in any way. Sure. > > Perhaps more eyes should look at this, but I haven't found anything in > here that's damaging or even annoying apart from setting MIME-Version if > it's not there and the content-type is touched. Okay, looking through the code briefly, I can't help but think that all the complex parts are parts we don't care about. A case-insensitive dictionary that accepts multiple values for a key isn't hard to implement. Certainly we could match the interface of email.Message where it applies. If it ended up in the standard library, that's fine -- it's one of those things people keep reinventing anyway, so a canonical implementation would be good. 
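As the paragraph above notes, a case-insensitive mapping that allows several values per key is small. The following Python sketch is one way to write it. The `Headers` class is hypothetical and is not `email.message.Message`. It stores headers as the ordered list of `(name, value)` pairs that WSGI already uses, and it borrows a few method names (`get_all`, `add_header`, `replace_header`) from `email.message.Message`. One deliberate difference: here assignment replaces existing values, whereas `email.message.Message` appends a new header. (Python's later `wsgiref.headers.Headers` took a broadly similar approach.)

```python
class Headers:
    """Hypothetical minimal case-insensitive, multi-valued header container."""

    def __init__(self, items=None):
        self._items = list(items or [])  # ordered (name, value) pairs

    def __len__(self):
        return len(self._items)

    def __contains__(self, name):
        name = name.lower()
        return any(k.lower() == name for k, _ in self._items)

    def __getitem__(self, name):
        # Like email.message.Message: first match, or None if absent.
        return self.get(name)

    def __setitem__(self, name, value):
        # Replace every existing value (email.message.Message would append).
        del self[name]
        self._items.append((name, value))

    def __delitem__(self, name):
        name = name.lower()
        self._items = [(k, v) for k, v in self._items if k.lower() != name]

    def get(self, name, default=None):
        name = name.lower()
        for k, v in self._items:
            if k.lower() == name:
                return v
        return default

    def get_all(self, name, default=None):
        name = name.lower()
        values = [v for k, v in self._items if k.lower() == name]
        return values or default

    def add_header(self, name, value):
        self._items.append((name, value))

    def replace_header(self, name, value):
        # Replace the first match in place, keeping its original spelling.
        lname = name.lower()
        for i, (k, _) in enumerate(self._items):
            if k.lower() == lname:
                self._items[i] = (k, value)
                return
        raise KeyError(name)

    def items(self):
        return list(self._items)  # suitable for passing to start_response
```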
>>> The only other thing that comes to mind is requiring servers to >>> support multiple 'start_response' calls in some way that makes sense >>> for exception handlers, while requiring it to still work in the case >>> where an extension API has already been used for output. >> >> >> That seems too hard. > > > Well, to some extent we have to look at the question of what should > happen in those circumstances anyway, whether we solve the problem in > that specific way or not. Because if the application *does* call > start_response more than once, the server has to be able to handle it > *somehow*. Really, the ultimate error handling *has* to be done by > servers, unless they want to take the route of crashing the entire > process when something bad happens. :) Good question. I think servers should consider that an error, but they should handle that error gracefully. Which probably means keeping a "has send_response already been called" flag. Now, if I could get access to that flag from middleware... and maybe access to the headers and status that have already been sent... (and really, why not? We aren't worried about streaming headers like we are about bodies) -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Sat Aug 28 18:56:35 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Aug 28 18:56:42 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: <41300F6D.4050607@colorstudy.com> References: <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com> <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040828121031.03326890@mail.telecommunity.com> At 11:51 PM 8/27/04 -0500, Ian Bicking wrote: >I don't know if we need deeper hierarchy than that. E.g., >web.wsgi.cgiadapter. I don't think so. 
I'd rather "WSGI" be a term only >those in the know use -- it means nothing unless you expand the acronym, >and even then it's pretty vague. Ultimately I hope most web programmers >just don't need to think about any of it. Flat is better than nested; let's not mix other projects into this. The WSGI stuff will have enough content to deserve a package of its own, and we don't want it to be dependent upon a bunch of "next generation" stuff that's not even designed yet. >Yes, you are right. Which means the catcher has to keep track of the >headers that were sent if it hopes to do anything. In that case, it might >check for text/html or text/plain; if not those two, then just stop the >response short and log the error. If so, and if configured to show >errors, then it could display them; cgitb goes to some length to make HTML >render correctly. > >That makes me think that wrapping send_response is more reasonable. Though >it makes error resolution in servers more complex. I'm not sure I follow you. The error handling in the server would look just like the handling in middleware, no? In fact, this potentially sounds like a job for another boilerplate function in wsgi.util, or perhaps a class. I imagine we might have an AbstractWSGIServer that defines basic start-response, write, and other operations, with abstract methods for sending/receiving data to and from the client, and various overrideable methods for policy. The simple WSGIServer and CGI gateway would both derive from it, or perhaps delegate to it. >>Here are the methods: >>as_string, __str__ -- format the message as a string >>is_multipart -- returns true if payload has been set to a list > >Can you do this with HTTP? I know some MIME stuff works (like >content-disposition: attachment; filename=blah). Would this work too? In >a meaningful way? The cgi module has some weird MIME stuff in it that I >don't think any web client has ever exercised. 
The as_string/__str__ aren't really useful for HTTP, because they include the payload, and optionally a "unix from" line. They'd only be useful in debugging, just to dump out some info. >>get_unixfrom/set_unixfrom, add_payload/set_payload/get_payload/attach, >>get_charsets, walk -- stuff for manipulating parts of the message we >>don't care about. > >Yes. If these accidentally are used, will it affect the as_string >representation? Yes, which is why we don't need/care about them. >>set_charset/get_charset -- sets the character set parameters of the >>content-type, which is actually useful. On the down side, setting the >>character set sets MIME-Version, but it also sets the >>Content-Transfer-Encoding, so it doesn't force the server to default one. > >Would that start opening up the possibility of accepting Unicode to >write()/app_iter? In my view, no, because then we'd force the server to know about every possible encoding the client and app can come up with. If the app uses this, it should handle the encoding. We might want to include a utility routine or two to pull what the client accepts out of HTTP_ACCEPT et al. >>__len__, __getitem__, __setitem__, __delitem__, __contains__, has_key, >>get, keys, values, items -- case-insensitive dictionary-like interface >>(i.e., the stuff we mainly want) >>get_all -- all values for a header name >>add_header, replace_header -- more stuff we want > >Very good, though not hard to reimplement. But why should everybody reimplement it, if we're not going to be in the stdlib till 2005? >Okay, looking through the code briefly, I can't help but think that all >the complex parts are parts we don't care about. Not so; content-type parameter setting is quite handy. For example, if you're doing multipart push, you'll need set_boundary, and get_boundary might also be useful.
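The following Python sketch uses the standard library's `email.message.Message` (default `compat32` policy) as a response-header container, to illustrate the content-type parameter methods mentioned above. `set_param` edits a parameter without adding the `MIME-Version` and `Content-Transfer-Encoding` headers that `set_charset` adds. `set_boundary` and `get_boundary` cover the multipart-push case. The helper name `build_headers` and the header values are illustrative only.

```python
from email.message import Message


def build_headers():
    headers = Message()  # default compat32 policy: values are plain str
    headers["Content-Type"] = "text/html"
    # set_param adds charset=... to Content-Type; unlike set_charset() it
    # does not add MIME-Version or Content-Transfer-Encoding headers.
    headers.set_param("charset", "utf-8")
    headers.add_header("Set-Cookie", "a=1")
    headers.add_header("Set-Cookie", "b=2")
    return headers


headers = build_headers()
print(headers.get_content_type())     # text/html
print(headers.get_content_charset())  # utf-8
print(headers.get_all("set-cookie"))  # ['a=1', 'b=2']
response_headers = headers.items()    # list of (name, value) pairs

# Multipart push: the boundary lives in a Content-Type parameter.
push = Message()
push["Content-Type"] = "multipart/x-mixed-replace"
push.set_boundary("frame")
print(push.get_boundary())            # frame
```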
>>Well, to some extent we have to look at the question of what should >>happen in those circumstances anyway, whether we solve the problem in >>that specific way or not. Because if the application *does* call >>start_response more than once, the server has to be able to handle it >>*somehow*. Really, the ultimate error handling *has* to be done by >>servers, unless they want to take the route of crashing the entire >>process when something bad happens. :) > >Good question. I think servers should consider that an error, but they >should handle that error gracefully. Which probably means keeping a "has >send_response already been called" flag. > >Now, if I could get access to that flag from middleware... and maybe >access to the headers and status that have already been sent... (and >really, why not? We aren't worried about streaming headers like we are >about bodies) You dodged my question... what are you going to *do* with that? Because we need to formulate sensible error handling policies for the general case, including things like an I/O error due to the client disconnecting. Here are possible loci of error: * Before start_response is called (application error) * During start_response (server error or application error) * After start_response, before first write (application error) * During a write (server error or application error) * Between writes, before return (application error) * After return/during iteration (application error) * During a post-return write (server error or application error) * During 'close()' (application error) The reason those are "server or application" is because start_response and write can fail due to bad data passed by the application, so it's really an application error in that case. The server might fail for some other reason, of course, like a lost client connection.
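One way to ground the discussion is a hypothetical base class for servers and gateways. It loosely follows the "AbstractWSGIServer" idea floated earlier in the thread, but it is a sketch and not an existing class. It keeps the "headers already sent" flag Ian describes and applies one possible policy to the error loci listed above. Before any output, a repeated `start_response` simply replaces the pending status and headers, and an exception becomes a 500 response. After output has started, a repeated call is an error, and an exception is re-raised so the caller can drop the connection. The CGI-style `Status:` output and the `_send` hook are assumptions made for illustration.

```python
class BaseResponder:
    """Hypothetical server/gateway base; subclasses implement _send(bytes)."""

    def __init__(self):
        self.status = None
        self.headers = None
        self.headers_sent = False  # the "has output started?" flag

    def start_response(self, status, headers):
        if self.headers_sent:
            # Too late: the client has already seen a status and headers.
            raise RuntimeError("start_response() called after headers were sent")
        # Before any output, a repeat call just replaces the pending values.
        self.status = status
        self.headers = list(headers)
        return self.write

    def write(self, data):
        if self.status is None:
            raise RuntimeError("write() called before start_response()")
        if not self.headers_sent:
            self._send_headers()
        self._send(data)

    def _send_headers(self):
        # Mark first: if the send itself fails (e.g. the client went away),
        # the error path below must not try to send a second response.
        self.headers_sent = True
        lines = ["Status: %s" % self.status]  # CGI-style status header
        lines += ["%s: %s" % (name, value) for name, value in self.headers]
        self._send(("\r\n".join(lines) + "\r\n\r\n").encode("latin-1"))

    def _send(self, data):
        raise NotImplementedError

    def run(self, app, environ):
        result = None
        try:
            result = app(environ, self.start_response)
            for chunk in result:
                if chunk:
                    self.write(chunk)
            if not self.headers_sent:
                if self.status is None:
                    raise RuntimeError("application never called start_response()")
                self._send_headers()  # empty body: headers must still go out
        except Exception:
            if self.headers_sent:
                raise  # nothing sensible left to send; drop the connection
            self.start_response("500 Internal Server Error",
                                [("Content-Type", "text/plain")])
            self.write(b"Internal Server Error\n")
        finally:
            if hasattr(result, "close"):
                result.close()
```

A subclass for a real transport only needs to supply `_send`. Exposing `headers_sent`, `status` and `headers` to middleware would be a separate design decision.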
It makes no sense for a failed write to cause a middleware error handler to attempt to write some more data! It seems we need an error parameter like: environ['wsgi.fatal_errors'] = SomeExceptionClass1, SomeExceptionClass2,... Such that one would use: try: # invoke child application, etc. except environ['wsgi.fatal_errors']: raise except: # regular error handling here In other words, an application or middleware component should abort if it receives one of these exception types. I'm inclined to think that application WSGI programming errors should be treated as fatal: if the app sends bad parameters to start_response or write, there's little point in proceeding further. From pje at telecommunity.com Sat Aug 28 05:33:04 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Aug 28 18:58:35 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: References: <412FD033.6060607@colorstudy.com> <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <412FD033.6060607@colorstudy.com> Message-ID: <5.1.1.6.0.20040827232241.030fa080@mail.telecommunity.com> At 05:54 PM 8/27/04 -0700, Mark Nottingham wrote: >Could you expand on the problems that would be encountered with pass-through? > >I don't think it would add redundancy in the CGI case, it would just >require CGI WSGI servers to remove http headers from the environment and >put them in the proper data structure. > >WRT URIs, my preference (once again, just stating what I'd do, not saying >that I think this MUST change) would be to base it on the underlying >specs, and not make it so CGI-centric; i.e., have 'abs_path' and 'query' >(these are the BNF productions in both 2396 and 2616), and that's it; >anything else (e.g., script location) would be in the environment, and >probably server-specific. And now every framework that's already based on parsing CGI variables is stuck having to write code to turn all that stuff (including your "server-specific" application location) into CGI variables. 
*And* we get to write something to take the CGI variables and turn them into this other format, and throw away the script location so the script can try to figure it back out again. Why reinvent the wheel, when CGI has already shown itself to be of practical use for this? From jjl at pobox.com Sat Aug 28 19:18:57 2004 From: jjl at pobox.com (John J Lee) Date: Sat Aug 28 19:19:01 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: <41300F6D.4050607@colorstudy.com> References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com> <41300F6D.4050607@colorstudy.com> Message-ID: On Fri, 27 Aug 2004, Ian Bicking wrote: > Phillip J. Eby wrote: [...] > > wrong. 'web' just seems like a nuisance attractor for all sorts of > > unproductive bickering on so many levels. [...] > be in 2.5 under that name. I was thinking of it like a package for > various Python web-related modules (the Next Generation; forgoing this > current generation which is all in the root). +0 for web as bag-of-modules Seems uncontroversial, since anybody with a web module has an equal right to lay claim to a patch of land within it. OTOH, I've never found the Python practise of sticking all stdlib modules in the root namespace to be troublesome. And the reality is that there is no grand scheme here: people generally do small pieces of work as they find they need / want to do it. > Almost all the modules in the root have issues. Well, let's enumerate... > > webbrowser: this seems like a totally weird module to me Why? > cgi: ick ick ick. > cgitb: this is okay. > urllib: defunct? It's not about to go away. (especially since Guido wrote it, I think ;-) Unfortunately, I think there are enough bugs in both urllib and urllib2 that it's hard to say that either is unconditionally better for all purposes.
There was some > discussion about this early in Web-SIG. I think the client stuff John > Lee has done at: http://wwwsearch.sourceforge.net/ is better, and I > think he's interested in that direction. Probably not right now, but at > some point this could well improve on urllib* This is what I hope to do on urllib2 for 2.5, very roughly in order of priority. I guess you're referring above mostly to 3 in this list. 1, 2 and 3 will likely happen, 4, 5, and 6 may or may not. Help is welcome :-) 1 Add more handlers from ClientCookie: Robot rules, http-equiv, refresh, etc. 2 Add features that are present in urllib but missing from urllib2 (urlretrieve is the most obvious, and easy to fix). 3 A class bearing some resemblance to mechanize.UserAgent, as we discussed here before. The idea is to avoid having to make a new object each time you want to change URL-opener behaviour. 4 Possibly improve proxy, authentication support, if I can be bothered. I think this is probably still quite buggy, despite valuable changes from Anthony Baxter and others. 5 Connection caching. 6 HEAD, GET byte range (and maybe something to make resuming downloads as easy as possible), conditional GET requests, a function to do file uploads. [...] > DocXMLRPCServer: what a weird module. Weird indeed. Never noticed it before. [...] > HTMLParser: lives in the world between web and XML. Some of the client > tools in wwwserver are very HTML-centric as well. But it all fits together. > htmllib: deprecated, I think? Or HTMLParser? I don't know what's going > on here. As you probably know, htmllib just adds some possibly-convenient bits and pieces on top of sgmllib. sgmllib/htmllib is more relaxed about bad HTML than is HTMLParser, so is certainly worth keeping. John From py-web-sig at xhaus.com Mon Aug 30 02:32:23 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Aug 30 02:54:40 2004 Subject: [Web-SIG] My experiences implement WSGI on java/j2ee/jython. 
Message-ID: <41327597.5060909@xhaus.com> Dear Web-Sig, Firstly, I must say, I am totally impressed with the WSGI initiative. At first it wasn't clear how such low-level structures could improve the fragmented situation with python-web frameworks. But now that I've spent some time implementing a framework that complies with the spec, I understand it a *lot* better, and can see a lot of its benefits. Secondly, I must apologise in advance for the length of this post :-) I decided to write a java/j2ee/jython framework which layers WSGI on top of java servlets. I decided this for a number of reasons: - Because I want WSGI to succeed, and in open-source chances of success are greatly enhanced by running code. - Because jython needs to be included in WSGI from the ground up. - Because cpython and jython should be able to share web components. - Because WSGI needs testing against as many server architectures as possible. - Because the best way to test the quality and usability of a spec is to write software that implements it. - Because I pray for the day when we can pick and mix capabilities from the huge wealth of python web frameworks out there. - Because J2EE (i.e. traditional servlets) are sometimes far too restrictive, in terms of the way they handle cookies, authorisation, etc, and require configuring lots of XML files, which can be a pain: I don't like coding in XML, I like coding in python, where I can keep my configuration all in an appropriate format. - Because I want cpythonistas to keep jython in mind. - Because someone had to do it :-), and I do J2EE and jython stuff all the time in my work - Because WSGI was small enough to implement in a day or two. - A load of other good reasons. My code is not ready for release. I only spent yesterday writing it: it's not big, approx 500 lines of java. But I haven't even compiled it yet, so it's got loads of syntax errors, no comments, no documentation, etc. I expect the compilation and debugging to take a day or two.
However, I'm ridiculously busy at the moment, and really can't spare much time. The fact that I sacrificed my weekend to get jython WSGI up-and-running quickly may give you an idea of how important I consider the WSGI initiative. I promise I'll release my code by next weekend, whatever state it's in. If it's not 100% running, it'll be 90+% running, at least. My design for the moment is really just to show a proof of concept, and a bare-bones framework. The framework will simply allow the user, through configuration, to map a URL to a python file, and to specify the name of a callable object within that file, which will obviously be the application. Application objects will be cached, based on the filename they came from. The request will be dispatched to the application in a WSGI-compliant way. Simple. For the moment, I'm taking the easy way out, in relation to things like threading guarantees. Anything that asks to be single-threaded will still use a single instance, but calls from multiple threads will be synchronized on that single object, which wouldn't really work in a production framework. As WSGI evolves, I'll make these kinds of facilities more robust and scalable. I don't see the point yet in trying to build any more facilities into my framework, e.g. url->object mapping, session management, page-template management components, authorization, etc. Hopefully, all of these facilities will become available as WSGI middleware components, written in nice python: not java, or nasty apache conf files, or servlet container XML files, blah, blah, blah. Anyway, while I was writing my thing (with printed WSGI spec in hand, covered in annotations, tick marks and red ink :-), I came across a few points in the spec that I'd like to raise about things that are either observations, or things that are incompletely specified, or that induce me to misunderstand, or seem just right or wrong.
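Alan's framework is written in Java, but the dispatch design he describes can be sketched in Python to make it concrete. The sketch loads a named callable from a Python file, caches it per file and callable name, and serializes calls for applications that cannot be run concurrently. `load_app`, `serialized` and the default name `application` are all hypothetical choices. Serializing with a single lock is the same stopgap described above, and it would not scale in production.

```python
import importlib.util
import threading

_app_cache = {}                # (path, callable name) -> application object
_cache_lock = threading.Lock()


def load_app(path, callable_name="application"):
    """Load the callable named callable_name from the file at path, once."""
    with _cache_lock:
        key = (path, callable_name)
        if key not in _app_cache:
            spec = importlib.util.spec_from_file_location(
                "wsgi_app_%d" % len(_app_cache), path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            _app_cache[key] = getattr(module, callable_name)
        return _app_cache[key]


def serialized(app):
    """Let only one thread run app at a time, including body iteration."""
    lock = threading.Lock()

    def wrapper(environ, start_response):
        with lock:
            result = app(environ, start_response)
            try:
                return list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()

    return wrapper
```

Consuming the body inside the lock keeps iteration single-threaded as well, but it gives up streaming for these applications.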
Also, I've spent today catching up with the web-sig archives, to review everyone's comments (now that I'm in a position to understand them), and to make sure that I'm not going over old ground. So I've added one or two points of my own, based on reading those archives. Hopefully some of them will be useful.

Lastly, does anyone have any name suggestions for a java/j2ee/jython WSGI-compliant framework? I've been thinking along the lines of "modjy", but I'm open to better ones :-)

So on to the points/questions.

0. On choice of CGI as a basis.
===============================

My experience with J2EE has clearly demonstrated to me that CGI is the right choice to base WSGI upon. The J2EE servlet spec has a specific method to return every single CGI variable: the specs even say things like "this method returns the same as the CGI variable SCRIPT_NAME", etc. My job as "translator" couldn't have been easier. I expect that many other containers/frameworks will also support the CGI spec in this way.

1. Default values of environment variables when not present.
============================================================

The spec says that compulsory environment variables, for example "CONTENT_LENGTH" or "CONTENT_TYPE", must have a value, i.e. "must be present, but may be an empty string, if there is no more appropriate value for them". I read "empty string" to mean "".

There are obviously two different choices for how to represent values for headers/env-vars that are not present in the request, i.e. 1. an empty string as described above or 2. as a python None value. It seems more correct to me to use the latter option, None, for when the header/env-var is not available, i.e. the client did not send it. This allows the use of the "" value to indicate (the admittedly rare and malformed case) that the client sent the header name, but did not specify a header value.
If WSGI uses the empty string for both cases, then we lose the ability to distinguish between when the header was sent with no value, and when it wasn't sent at all. I don't think it's a big deal losing that ability, but I could imagine that there might be, for example, some security application that might like to have access to that information.

For simplicity of the spec, and robustness of servers/apps running on WSGI, I understand why it is a good thing to make the default values as robust as possible, i.e. in case some app author tries to use a header value without checking if it is None first.

I suppose I'm really pointing out a possible wording difficulty in the spec, which says "may be an empty string, if there is no more appropriate value". To me None is "a more appropriate value" sometimes, so I suppose I could legitimately interpret that to mean that I can use None values in my WSGI-compliant framework, because my server infrastructure allows me to detect their absence or lack of value.

So perhaps either the wording of the spec needs to be tightened up to exclude this? Or the default environment values need to be more clearly specified? Or perhaps a discussion of None vs. empty string needs to be added to the Q&A at the end?

2. The SCRIPT_NAME variable.
============================

At first I was a little wary of the SCRIPT_NAME variable, and how I would construct it, until I realised that the beginning of the URL->Callable mapping is outside the scope of WSGI: it is in the control of whichever program/process/container is receiving HTTP requests through sockets from the client, and resolving/dispatching them according to its configuration files: in my case that was a J2EE container, e.g. Tomcat.

The J2EE call that returns a value equivalent to the CGI SCRIPT_NAME variable is the HttpServletRequest.getServletPath() method.
There is an interesting note on it which says that "This method will return an empty string ("") if the servlet used to process this request was matched using the "/*" pattern." Which seems a little odd, until you realise that the SCRIPT_NAME = "" case is when the application object is responsible for dealing with the entire URL space. Maybe it's worth adding a note to this effect in the WSGI spec as well? It helped me understand things better.

An idea occurs to me for a nice little reusable WSGI middleware component which is a URI mapper, with functionality akin to apache mod_rewrite, resolving URIs to python callables. A lot of frameworks like to do things with URL rewriting and mapping, in order to present a nice clean URL interface to a tree of objects. Quixote is one such framework that likes to have crisp URLs. But much of the time installing such frameworks requires configuring apache and invoking mod_rewrite and its "cool voodoo" to get the job done, which can be difficult to debug and get working, and scares newbies. (On re-reading the spec, and the mailing list, I see I'm not the only one to have thought of such a URI mapping component :-)

If I wrote such a reusable mapping component, I could then simply configure my entire "container", e.g. Apache, Tomcat, etc., to resolve all requests for a URL hierarchy to my python component, and nice-n-easy python code takes care of it from there: no mod_rewrite rules, no complex java servlet mapping algorithm, just python. That's a big win in terms of both installation simplicity and portability, since that standard component could then be used across all WSGI frameworks and the containers in which they live. I like this WSGI idea :-)

3. Status code and message.
===========================

The WSGI spec states that the status value passed to start_response should be of the form "999 Message here". That's fine, I can parse up the string easily enough to get the java data types I need to send to the container.
However, J2EE does not allow me to set the message string: I can only set the status code, and that must have an integer value. So, in terms of compliance with WSGI, am I in violation of the WSGI spec by not transmitting the actual textual status message specified by the application? If that's a problem, there's nothing I can do about it. I wonder how often this will be the case with other server/container frameworks?

4. Binary vs. textual writing.
==============================

When python opens a file in text mode, line-ending translation takes place on all python strings written to the file, changing '\n' to whatever is the appropriate local line-ending. This is not noticeable on *nix, since *nix uses the same line-ending character as python, '\n', so no translation is necessary. This means that people running python on *nix can write binary data through channels opened in text mode.

On other platforms though, namely Windows and classic MacOS, different line-endings are used, and python's '\n' gets translated to '\r\n' and '\r' respectively. This corrupts binary files, e.g. .jpg, .gif, if they contain '\n'. So Windows and MacOS python users must open files explicitly in binary mode if they want to avoid this translation.

It is a fundamental requirement (to me at least) that WSGI be able to handle writing of binary data. And I'm fairly sure the intention for the write() callable in WSGI is that it take python "strings", which includes strings of binary data. But perhaps it needs to be made explicitly clear in the WSGI spec that the write() callable explicitly writes in binary mode, i.e. that no translation is taking place on byte strings passed to it, and the application/user is responsible for all encoding concerns relating to byte strings passed to the write() callable.

5A. Python 2.1 vs. python 2.2: iterators and generators.
========================================================

The WSGI spec says that python 2.2 features are required to be compliant.
However, it appears to me that the only python 2.2 features in use are iterators and generators, used when the application object returns an iterator. In fact, it's just that the example in the WSGI spec uses a generator (and its corresponding 'yield' keyword): actual applications are not required to use a generator: they can also return an object that implements the iterator protocol. That means returning an object with a .next() method when the .__iter__() method is called. The iterator.next() method keeps returning values, until the iterator runs out, in which case it raises StopIteration. Like generators, the iterator protocol was also introduced in python 2.2, but they are two separate things.

However, even though jython is based on python 2.1, and thus doesn't have built-in support for either iterators or generators, I have still implemented the iterator protocol in my java/jython framework, by simply invoking the .__iter__() and .next() methods on application objects, and catching StopIteration exceptions. So I can support components and applications returning iterators, and I'm thus compliant with the spec, even though I'm running on 2.1. (This is only possible because I'm embedding: it is still not possible to support the iterator protocol in, say, jython for-loops.)

Does the spec need to be changed to reflect this iterators/versioning issue? Or to more clearly define the difference between iterators and generators?

It's conceivable that even a python 1.5 framework could be programmed to support the iterator protocol: it's *very* easy to implement.

5B. A "python.version" WSGI variable?
=====================================

Of course, it will be the case that some middleware and applications will require more advanced and recent (2.2, 2.3, 2.4) language features, such as generators, generator expressions, decorators, etc. But such components and applications will not be usable under jython, which is 2.1.
It would be nice for components and applications to have a way of knowing what version of python they are running under. Similarly, there will be jython components and applications that require java libraries, and thus won't be usable on cpython of any version.

Would it be useful to define a WSGI variable "python.version", similar to "wsgi.version", which gives the python version in effect? In most cases under jython, it wouldn't help, because its 2.1 compiler would choke when loading python files with newer python syntax anyway, giving syntax errors. But it might be useful in some circumstances, perhaps for sophisticated dispatchers with the requisite meta-data available to them? I'm not sure on this one. Maybe the values of sys.platform and os.name give enough information to deal with this problem?

6. Streaming and flushing.
==========================

I see there has been discussion on the list about streaming output and flushing. In one message, Phillip said:

"I'm suggesting that write() should be guaranteed to either:

1) Flush all output before returning, or
2) Put data in a buffer that will be emptied by another thread or by the operating system

To be a conforming implementation, a server/gateway must do one or the other."

In the J2EE case (and I'm sure with Apache CGI), that's very simple to deal with, since the container will do its own buffering completely outside your control, and send the pieces with chunked-transfer encoding if necessary. So even if I put a flush on the output channel in my framework, I'm only flushing it to the container's buffer: it's still not guaranteed to send output back down the return socket to the client. Just a datapoint.

7. Redirects.
=============

I read some discussion in the lists on how to handle container specific facilities, e.g. Apache/mod_python's ability to internally redirect a request.

J2EE offers the same capabilities, to internally redirect a request, without sending a response back to the client.
It happens in a slightly different way, because you first ask your container for a dispatcher, based on a url, and then call that dispatcher to redirect to the URL. And the client may not see any redirect HTTP responses: it's all internal to the container.

I see the solution to this redirect platform-dependence problem in the implementation of a platform-independent WSGI middleware component that takes all responsibility for redirects. This component examines the wsgi.environment present, seeking hints for the optimal way to redirect the request: if mod_python is available, use the mod_python API call; if modjy is available, use the getDispatcher(uri).redirect() dance, etc. If none of these platform specific techniques are available, it can fall back to sending a 302 or 307 response back to the client, and let the client re-request the new URL.

If the platform specific techniques are available, their availability will be signalled in wsgi.envvars by the presence of variables such as "mod_python.request" or "modjy.servlet_context", etc. So one ultraportable component could do it all (albeit chock full of special cases).

Problem solved?

8. Write callable and fileno()
==============================

It is a good idea to check for the fileno() attribute on the write callable, since many platforms/frameworks have high-performance ways of transferring file contents to sockets, for example. Java 1.4 nio has this capability, through the use of directBuffers, memory-mapped files, and special natively implemented methods to transfer between the two. I'd be surprised if containers like Apache don't support something similar. This can drastically improve throughput on static files.

Java objects have "channel"s or "outputStream"s, not "fileno"s. But that's an easy problem to fix.

9. Server-detected headers.
===========================

I can see the reason for servers/containers intercepting client headers and translating/augmenting/deleting them.
However, do we need a specification of what to do with certain specific headers? As with CGI, should I recognise the "Status: " header or the "Location: " header, and translate it to the relevant status code, or do a redirect, respectively? If I don't do those translations, won't I be breaking reams of python CGI code out there that relies on Apache doing this?

10. The "wsgi.errors" environment variable.
==========================================

Under J2EE, setting the "wsgi.input" variable is easy: I just wrap the HttpServletRequest.getInputStream() with an org.python.core.PyFile, and bingo.

However, the J2EE HttpServletRequest has no corresponding error stream, nor does the corresponding HttpServletResponse paired with each request. The only mechanism I can use to send error output is the "sendError(int, message)" method of HttpServletResponse. That allows me to send both an integer status code and a textual message, about which the J2EE docs say "The server defaults to creating the response to look like an HTML-formatted server error page containing the specified message, setting the content type to "text/html", leaving cookies and other headers unmodified". So I can't send error output this way without also knowing a status code for it as well.

Which makes me wonder: what is the "wsgi.errors" variable for? Yes, it's for writing error data. But what do we expect to happen to data that gets written to it? Will it be wrapped or translated in some way, and used to construct an error response to the user? Or should it be locally logged by the server?

I know that this is all J2EE-specific stuff, as is confirmed by the rest of the documentation sentence I quoted above: "If an error-page declaration has been made for the web application corresponding to the status code passed in [to the sendError method], it will be served back in preference to the suggested msg parameter."
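One of the options raised here, having the server log wsgi.errors output locally rather than turning it into a response, can be sketched as a small file-like object. This is a minimal Python 3 sketch under stated assumptions: the `LogErrorStream` name is hypothetical, and the standard `logging` module stands in for whatever server-side log the container offers (in a servlet container, something like ServletContext.log()). It provides the write(), writelines() and flush() methods an error stream is expected to have.

```python
import logging


class LogErrorStream:
    """Hypothetical file-like object for environ['wsgi.errors'] that forwards
    complete lines to a server-side logger instead of to the client."""

    def __init__(self, logger):
        self.logger = logger
        self._buffer = ""

    def write(self, text):
        self._buffer += text
        # Emit each complete line as one log record; keep a trailing partial line
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            self.logger.error(line)

    def writelines(self, lines):
        for line in lines:
            self.write(line)

    def flush(self):
        # Push out anything still waiting for a newline
        if self._buffer:
            self.logger.error(self._buffer)
            self._buffer = ""
```

Buffering until a newline means a message written in several pieces ends up as a single log record; whatever is left over is emitted on flush().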
WSGI (rightly) has no concept of "configured error page declarations", so it would seem the "sendError" method is not the right method to use to implement "wsgi.errors". So I'm going to have to treat the error output in some other way, which means I need to know more about what it is.

Before I can implement a jython framework that is fully compliant with the WSGI spec, I need to know what will happen to any output sent to "wsgi.errors", so that I can code for whatever eventualities arise. Or if it's always to be a framework-specific thing, maybe I'll just redirect all "wsgi.errors" output to /dev/null, for example? The J2EE ServletContext for each servlet has a "log(message)" method. Maybe I should just send error output there, in which case it will end up in the server logs?

That's all for now.

onwards-and-upwards-ly y'rs,

Alan.

From ianb at colorstudy.com Mon Aug 30 04:22:25 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 30 04:22:32 2004
Subject: [Web-SIG] My experiences implement WSGI on java/j2ee/jython.
In-Reply-To: <41327597.5060909@xhaus.com>
References: <41327597.5060909@xhaus.com>
Message-ID: <41328F61.20705@colorstudy.com>

Great to see more implementation. My thoughts on some of the questions (only quoting the relevant portions)...

Alan Kennedy wrote:
> 1. Default values of environment variables when not present.
> ============================================================
>
> The spec says that compulsory environment variables, for example
> "CONTENT_LENGTH" or "CONTENT_TYPE", must have a value, i.e. "must be
> present, but may be an empty string, if there is no more appropriate
> value for them". I read "empty string" to mean "".
>
> There are obviously two different choices for how to represent values
> for headers/env-vars that are not present in the request, i.e. 1. an
> empty string as described above or 2. as a python None value.
> It seems more correct to me to use the latter option, None, for when the
> header/env-var is not available, i.e. the client did not send it. This
> allows the use of the "" value to indicate (the admittedly rare and
> malformed case) that the client sent the header name, but did not
> specify a header value. If WSGI uses the empty string for both cases,
> then we lose the ability to distinguish between when the header was sent
> with no value, and when it wasn't sent at all.

Elsewhere in the spec (I forget where) I believe it is very strict that all CGI variables (if present) must have non-unicode string values. So None would not be allowed in any CGI variable (only extension variables). I think for all the required variables using the empty string should be sufficient to indicate ambiguity. Applications can't depend on there being a good distinction between a missing key and an empty string, as different parent containers can go either way, so the WSGI gateway might not have any information to work on.

> 2. The SCRIPT_NAME variable.
> ============================
>
> At first I was a little wary of the SCRIPT_NAME variable, and how I
> would construct it, until I realised that the beginning of the
> URL->Callable mapping is outside the scope of WSGI: it is in the control
> of whichever program/process/container is receiving HTTP requests
> through sockets from the client, and resolving/dispatching them
> according to its configuration files: in my case that was a J2EE
> container, e.g. Tomcat.
>
> The J2EE call that returns a value equivalent to the CGI SCRIPT_NAME
> variable is the HttpServletRequest.getServletPath() method. There is an
> interesting note on it which says that "This method will return an empty
> string ("") if the servlet used to process this request was matched
> using the "/*" pattern." Which seems a little odd, until you realise
> that the SCRIPT_NAME = "" case is when the application object is
> responsible for dealing with the entire URL space.
> Maybe it's worth adding a note to this effect in the WSGI spec as well?
> It helped me understand things better.

That makes sense to me. I don't think SCRIPT_NAME should ever be "/" -- usually PATH_INFO should either be the empty string, or start with /, so if your application applies to the root domain then PATH_INFO should be the entire request path, and SCRIPT_NAME the empty string.

> An idea occurs to me for a nice little reusable WSGI middleware
> component which is a URI mapper, with functionality akin to apache
> mod_rewrite, resolving URIs to python callables. A lot of frameworks
> like to do things with URL rewriting and mapping, in order to present a
> nice clean URL interface to a tree of objects. Quixote is one such
> framework that likes to have crisp URLs. But much of the time installing
> such frameworks requires configuring apache and invoking mod_rewrite and
> its "cool voodoo" to get the job done, which can be difficult to debug
> and get working, and scares newbies. (On re-reading the spec, and the
> mailing list, I see I'm not the only one to have thought of such a URI
> mapping component :-)

Definitely. I like the idea that most WSGI servers and middleware (except for the URL mappers) would just take a single application, to keep the techniques separate.

> 3. Status code and message.
> ===========================
>
> The WSGI spec states that the status value passed to start_response
> should be of the form "999 Message here". That's fine, I can parse up
> the string easily enough to get the java data types I need to send to
> the container. However, J2EE does not allow me to set the message
> string: I can only set the status code, and that must have an integer
> value.

That raises an interesting question. As far as I know, no client ever pays any attention to the message. It's purely noise, conveying no information. It might make sense, for simplicity, for the status code to be an integer, as it apparently is in Java.

> 5A. Python 2.1 vs. python 2.2: iterators and generators.
> ========================================================
>
> The WSGI spec says that python 2.2 features are required to be
> compliant. However, it appears to me that the only python 2.2 features
> in use are iterators and generators, used when the application object
> returns an iterator. In fact, it's just that the example in the WSGI
> spec uses a generator (and its corresponding 'yield' keyword): actual
> applications are not required to use a generator: they can also return
> an object that implements the iterator protocol. That means returning
> an object with a .next() method when the .__iter__() method is called.
> The iterator.next() method keeps returning values, until the iterator
> runs out, in which case it raises StopIteration. Like generators, the
> iterator protocol was also introduced in python 2.2, but they are two
> separate things.
>
> However, even though jython is based on python 2.1, and thus doesn't
> have built-in support for either iterators or generators, I have still
> implemented the iterator protocol in my java/jython framework, by simply
> invoking the .__iter__() and .next() methods on application objects, and
> catching StopIteration exceptions. So I can support components and
> applications returning iterators, and I'm thus compliant with the spec,
> even though I'm running on 2.1. (This is only possible because I'm
> embedding: it is still not possible to support the iterator protocol in,
> say, jython for-loops.)
>
> Does the spec need to be changed to reflect this iterators/versioning
> issue? Or to more clearly define the difference between iterators and
> generators?
>
> It's conceivable that even a python 1.5 framework could be programmed to
> support the iterator protocol: it's *very* easy to implement.

That's also an interesting question. I guess with both Jython and Zope 2.6 and earlier being Python 2.1, it should be given some consideration.
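The hand-driven iteration Alan describes (call __iter__(), then call next() until StopIteration, without relying on a for-loop) can be sketched as a server-side loop. This is a minimal illustration in modern Python 3, so the iterator method is __next__() rather than the 2.x .next(); `run_app`, `CountdownBody` and the `write` argument are hypothetical names. The loop also calls the iterable's close() method when one exists, which the WSGI spec asks servers to do. Headers are only recorded, not sent, to keep the sketch short.

```python
def run_app(app, environ, write):
    """Hypothetical server loop: call a WSGI application and drive the
    returned iterable through the iterator protocol by hand."""
    responses = []

    def start_response(status, headers, exc_info=None):
        # Only recorded here; a real server would send these before the body
        responses.append((status, headers))
        return write

    result = app(environ, start_response)
    try:
        iterator = iter(result)          # invokes result.__iter__()
        while True:
            try:
                chunk = next(iterator)   # invokes iterator.__next__()
            except StopIteration:
                break                    # the iterator has run out
            write(chunk)
    finally:
        # WSGI servers must call close() on the iterable if it defines one
        if hasattr(result, "close"):
            result.close()
    return responses


class CountdownBody:
    """A hand-written iterator: no generator or 'yield' involved."""

    def __init__(self, n):
        self.n = n
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.n == 0:
            raise StopIteration
        self.n -= 1
        return b"%d\n" % (self.n + 1)

    def close(self):
        self.closed = True
```

For example, an application returning `CountdownBody(3)` would have `b"3\n"`, `b"2\n"` and `b"1\n"` passed to `write`, followed by a call to its close() method.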
One question: should the application iterable be a Python 2.2 style iterable? I.e., is it up to Python 2.1 servers to implement the Python 2.2 iterator protocol themselves? Or should the application be responsible for returning an iterator appropriate for the Python version?

In Python <2.2 (including 1.5.2) the protocol was that you called __getitem__ with ever-increasing integers, until an IndexError was raised. There was no concept of a special __iter__() function. But I guess Python 2.2's iter() builtin could be simulated:

    import types

    def iter(obj):
        # Simulates Python 2.2's iter() builtin for Python < 2.2
        if type(obj) in (types.ListType, types.TupleType):
            return obj
        elif type(obj) is types.FileType:
            return FileIter(obj)
        elif hasattr(obj, '__iter__'):
            return IterWrapper(obj.__iter__())
        else:
            # assume obj is already an iterator with a .next() method
            return IterWrapper(obj)

    class FileIter:
        def __init__(self, file):
            self.file = file
        def __getitem__(self, index):
            # while this copies Python 2.2, you wouldn't actually have to
            # iterate line by line:
            value = self.file.readline()
            if value == '':
                raise IndexError
            return value

    class IterWrapper:
        def __init__(self, obj):
            self.obj = obj
        def __getitem__(self, index):
            # we ignore the index
            try:
                return self.obj.next()
            except StopIteration:
                raise IndexError

Then in Jython you'd do:

    for s in iter(obj):
        write(s)

One issue is that StopIteration isn't defined in earlier versions of Python. You may be able to add it to __builtins__. Obviously none of this means anything if the application uses generators, but in many cases that should make it more portable. I think it might be the right idea to have the server implement this kind of backward portability, rather than applications. But that might be something for the spec, if so.

> 5B. A "python.version" WSGI variable?
> =====================================
>
> Of course, it will be the case that some middleware and applications will
> require more advanced and recent (2.2, 2.3, 2.4) language
> features, such as generators, generator expressions, decorators, etc.
> But such components and applications will not be usable under jython,
> which is 2.1. It would be nice for components and applications to have a
> way of knowing what version of python they are running under. Similarly,
> there will be jython components and applications that require java
> libraries, and thus won't be usable on cpython of any version.
>
> Would it be useful to define a WSGI variable "python.version", similar
> to "wsgi.version", which gives the python version in effect? In most
> cases under jython, it wouldn't help, because its 2.1 compiler would
> choke when loading python files with newer python syntax anyway, giving
> syntax errors. But it might be useful in some circumstances, perhaps for
> sophisticated dispatchers with the requisite meta-data available to
> them? I'm not sure on this one. Maybe the values of sys.platform and
> os.name give enough information to deal with this problem?

sys.version_info has the information you are looking for.

> 7. Redirects.
> =============
>
> I read some discussion in the lists on how to handle container specific
> facilities, e.g. Apache/mod_python's ability to internally redirect a
> request.
>
> J2EE offers the same capabilities, to internally redirect a request,
> without sending a response back to the client. It happens in a slightly
> different way, because you first ask your container for a dispatcher,
> based on a url, and then call that dispatcher to redirect to the URL.
> And the client may not see any redirect HTTP responses: it's all
> internal to the container.
>
> I see the solution to this redirect platform-dependence problem in the
> implementation of a platform-independent WSGI middleware component that
> takes all responsibility for redirects. This component examines the
> wsgi.environment present, seeking hints for the optimal way to redirect
> the request: if mod_python is available, use the mod_python API call;
> if modjy is available, use the getDispatcher(uri).redirect() dance, etc.
> If none of these platform specific techniques are available, it can fall
> back to sending a 302 or 307 response back to the client, and let the
> client re-request the new URL.
>
> If the platform specific techniques are available, their availability
> will be signalled in wsgi.envvars by the presence of variables such as
> "mod_python.request" or "modjy.servlet_context", etc. So one
> ultraportable component could do it all (albeit chock full of special
> cases).
>
> Problem solved?

I can also imagine in some future version of WSGI (or some standard building on it) that we could decide on a standard interface for doing internal redirects, available under a standard key.

> 9. Server-detected headers.
> ===========================
>
> I can see the reason for servers/containers intercepting client headers
> and translating/augmenting/deleting them. However, do we need a
> specification of what to do with certain specific headers? As with
> CGI, should I recognise the "Status: " header or the "Location: "
> header, and translate it to the relevant status code, or do a redirect,
> respectively? If I don't do those translations, won't I be breaking
> reams of python CGI code out there that relies on Apache doing this?

Right now there should be no Status header, and a Location header should not imply a redirect, unlike with CGI. Any CGI responses have to be wrapped to comply. But there are other issues besides this, so they would already have to be wrapped.

> 10. The "wsgi.errors" environment variable.
> ==========================================
>
> Under J2EE, setting the "wsgi.input" variable is easy: I just wrap the
> HttpServletRequest.getInputStream() with an org.python.core.PyFile, and
> bingo.
>
> However, the J2EE HttpServletRequest has no corresponding error stream,
> nor does the corresponding HttpServletResponse paired with each request.
> The only mechanism I can use to send error output is the "sendError(int,
> message)" method of HttpServletResponse.
> That allows me to send both an integer status code and a textual
> message, about which the J2EE docs say "The server defaults to creating
> the response to look like an HTML-formatted server error page containing
> the specified message, setting the content type to "text/html", leaving
> cookies and other headers unmodified".

Stuff written to wsgi.errors isn't supposed to go to the client. Under Apache it would typically end up in the error log. Under CGI wsgi.errors is usually stderr (and CGI scripts run under Apache that write to stderr also end up writing to the error log).

Error logs -- at least the kind that WSGI implies -- are fairly free form. Though I guess a server could buffer the output sent to wsgi.errors, put in some delimiters, add some request information, and turn it into a nicely formatted log entry.

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From pje at telecommunity.com Mon Aug 30 04:53:12 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 30 04:53:33 2004
Subject: [Web-SIG] My experiences implement WSGI on java/j2ee/jython.
In-Reply-To: <41327597.5060909@xhaus.com>
Message-ID: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>

At 01:32 AM 8/30/04 +0100, Alan Kennedy wrote:
>I suppose I'm really pointing out a possible wording difficulty in the
>spec, which says "may be an empty string, if there is no more appropriate
>value". To me None is "a more appropriate value" sometimes, so I suppose I
>could legitimately interpret that to mean that I can use None values in my
>WSGI-compliant framework, because my server infrastructure allows me to
>detect their absence or lack of value.
>
>So perhaps either the wording of the spec needs to be tightened up to
>exclude this? Or the default environment values need to be more clearly
>specified? Or perhaps a discussion of None vs. empty string needs to be
>added to the Q&A at the end?
I went to add this to the PEP, and found it was already there:

"""Also note that CGI-defined variables must be strings, if they are present at all. It is a violation of this specification for a CGI variable's value to be of any type other than ``str``."""

>So, in terms of compliance with WSGI, am I in violation of the WSGI spec
>by not transmitting the actual textual status message specified by the
>application? If that's a problem, there's nothing I can do about it.

Personally, I would just document it as a minor nonconformance of your servlet implementation; it's not likely to be an issue in practice.

>It is a fundamental requirement (to me at least) that WSGI be able to
>handle writing of binary data. And I'm fairly sure the intention for the
>write() callable in WSGI is that it take python "strings", which includes
>strings of binary data. But perhaps it needs to be made explicitly clear in
>the WSGI spec that the write() callable explicitly writes in binary mode,
>i.e. that no translation is taking place on byte strings passed to it, and
>the application/user is responsible for all encoding concerns relating to
>byte strings passed to the write() callable.

Added a note about this.

>However, even though jython is based on python 2.1, and thus doesn't have
>built-in support for either iterators or generators, I have still
>implemented the iterator protocol in my java/jython framework, by simply
>invoking the .__iter__() and .next() methods on application objects, and
>catching StopIteration exceptions. So I can support components and
>applications returning iterators, and I'm thus compliant with the spec,
>even though I'm running on 2.1. (This is only possible because I'm
>embedding: it is still not possible to support the iterator protocol in,
>say, jython for-loops.)

Unfortunately, your technique doesn't actually work, unless you're also going to patch the Jython __builtins__ to include 'StopIteration', 'iter', and so forth.
You would have to use the pre-2.2 iteration protocol, which uses __getitem__ and IndexError. I think this would have to be something you document as a spinoff or "application note" for WSGI users who must use a pre-2.2 version of Python. One of the reasons we decided to go ahead and require 2.2.2 was to avoid having to deal with the absence of True/False, iterators, and generators. >It's conceivable that even a python 1.5 framework could be programmed to >support the iterator protocol: it's *very* easy to implement. But not actually *usable* in a pre-2.2 Python, because StopIteration doesn't exist, so code can't raise it. If it has to import it from somewhere, then it can't be used with multiple WSGI servers or gateways, because each one is expecting a different StopIteration class. >Would it be useful to define a WSGI variable "python.version", similar to >"wsgi.version", which gives the python version in effect? -1; that's what sys.version, sys.hexversion, sys.version_info, and so on are for. >In the J2EE case (and I'm sure with Apache CGI), that's very simple to >deal with, since the container will do its own buffering completely >outside your control, and send the pieces with chunked-transfer encoding >if necessary. So even if I put a flush on the output channel in my >framework, I'm only flushing it to the container's buffer: it's still not >guaranteed to send output back down the return socket to the client. That is potentially a problem, since the point is to guarantee that when 'write()' returns to the application, the output isn't going to just sit in the buffer while the application moves ahead with other things: it should be going to the client. >I see the solution to this redirect platform-dependence problem in the >implementation of a platform-independent WSGI middleware component that >takes all responsibility for redirects.
This component examines the >wsgi.environment present, seeking hints for the optimal way to redirect >the request: if mod_python is available, use the mod_python API call: if >modjy is available, use the getDispatcher(uri).redirect() dance, etc. If >none of these platform-specific techniques are available, it can fall back >to sending a 302 or 307 response back to the client, and let the client >re-request the new URL. I'm afraid internal and external redirects are *not* interchangeable. Specifically, internal redirects break relative URLs. So, internal redirects need to be something that's a server extension, and *should* be something obscure to do, because you'd better know what you're doing. >8. Write callable and fileno() >============================== > >It is a good idea to check for the fileno() attribute on the write callable, No, it isn't. First of all, it's a callable, not a stream, so it won't have such an attribute. Second, even if it *is* the write method of a stream, it's none of the application's business. Perhaps you're confusing this with the part where the server is allowed to check whether the application's return value has a fileno()? >9. Server-detected headers. >=========================== > >I can see the reason for servers/containers intercepting client headers >and translating/augmenting/deleting them. However, do we need a >specification of what to do with certain specified headers? As with CGI, >should I recognise the "Status: " header or the "Location: " header, and >translate it to the relevant status code, or do a redirect, respectively? >If I don't do those translations, won't I be breaking reams of python CGI >code out there that relies on Apache doing this? Again, WSGI doesn't support internal redirects. The spec as currently written doesn't consider "status" to be a header. Meanwhile, "Location" is a valid HTTP header, so there's no issue there. If you're doing a WSGI implementation, don't worry about CGI.
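The portable fallback discussed above, an external 302/307 redirect that the client re-requests, can be sketched as a tiny WSGI application. This is a hypothetical sketch in Python 3: `application_url` and `make_redirect_app` are invented names, and it assumes the `wsgi.url_scheme`, `HTTP_HOST`, `SERVER_NAME`, `SERVER_PORT` and `SCRIPT_NAME` keys as they appear in the final PEP 333. Because the `Location` value is absolute and the browser fetches it anew, relative URLs on the target page resolve correctly, which is exactly what an internal redirect cannot guarantee.

```python
from urllib.parse import quote


def application_url(environ):
    """Rebuild the absolute URL of the application root from environ."""
    scheme = environ["wsgi.url_scheme"]
    url = scheme + "://"
    if environ.get("HTTP_HOST"):
        url += environ["HTTP_HOST"]
    else:
        url += environ["SERVER_NAME"]
        default_port = "443" if scheme == "https" else "80"
        if environ["SERVER_PORT"] != default_port:
            url += ":" + environ["SERVER_PORT"]
    return url + quote(environ.get("SCRIPT_NAME", ""))


def make_redirect_app(target_path, status="302 Found"):
    """Return a WSGI app that sends the client to target_path under the app root."""
    def redirect_app(environ, start_response):
        location = application_url(environ) + quote(target_path)
        body = b"Redirecting to " + location.encode("ascii")
        start_response(status, [("Location", location),
                                ("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]
    return redirect_app
```

Passing `status="307 Temporary Redirect"` asks the client to repeat the same method (for example a POST) at the new URL.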
If the CGI code is ported to WSGI, then fixing these issues is part of the port. If the CGI is run under a "WSGI-to-CGI" wrapper, then this is the wrapper's responsibility. In no case is the interpretation of Status or Location headers part of the WSGI server's responsibility. >Which makes me wonder what the "wsgi.errors" variable is for? Yes, it's >for writing error data. But what do we expect to happen to data that gets >written to it? Will it be wrapped or translated in some way, and used >to construct an error response to the user? Or should it be locally logged >by the server? """An output stream to which error output can be written. For most servers, this will be the server's error log.""" I've just added some additional explanatory text: ``wsgi.errors`` An output stream to which error output can be written, for the purpose of recording program or other errors in a standardized and possibly centralized location. For many servers, this will be the server's main error log. Alternatively, this may be ``sys.stderr``, or a log file of some sort. The server's documentation should include an explanation of how to configure this or where to find the recorded output. A server or gateway may supply different error streams to different applications, if this is desired. >The J2EE ServletContext for each servlet has a "log(message)" method. >Maybe I should just send error output there, in which case it will end up in >the server logs? That is probably the right place for a servlet-based WSGI gateway to write errors to. From pje at telecommunity.com Mon Aug 30 05:01:40 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 30 05:02:02 2004 Subject: [Web-SIG] My experiences implement WSGI on java/j2ee/jython.
In-Reply-To: <41328F61.20705@colorstudy.com> References: <41327597.5060909@xhaus.com> <41327597.5060909@xhaus.com> Message-ID: <5.1.1.6.0.20040829225537.02740d10@mail.telecommunity.com> At 09:22 PM 8/29/04 -0500, Ian Bicking wrote: >One question: should the application iterable be a Python 2.2 style >iterable? I.e., is it up to Python 2.1 servers to implement the Python >2.2 iterator protocol themselves? Or, should the application be >responsible for returning an iterator appropriate for the Python version? How about we just add a "Using WSGI with earlier Python Versions" subsection to the application/implementation notes? It would simply note that a WSGI server/gateway intended to work pre-2.2 *must* use only a 'for' loop to iterate over an iterable returned by the application, and that applications needing to work pre-2.2 would have to implement the old-style iteration protocol. It is *not* necessary for either the server or application to go through any special contortions to emulate the 2.2 iterator protocol, because current versions of Python still support the old iterator protocol. See PEP 234: """For backwards compatibility, the PyObject_GetIter() function implements fallback semantics when its argument is a sequence that does not implement a tp_iter function: a lightweight sequence iterator object is constructed in that case which iterates over the items of the sequence in the natural order.""" ('iter(ob)' is basically just Python for 'PyObject_GetIter(ob)' in C.) From pje at telecommunity.com Mon Aug 30 05:16:00 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Mon Aug 30 05:16:18 2004 Subject: [Web-SIG] Other kinds of environment variables In-Reply-To: <5.1.1.6.0.20040827000752.0239b2b0@mail.telecommunity.com> References: <6BBA3664-F7DB-11D8-82BE-000A95BD86C0@mnot.net> <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com> <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040829231521.022907a0@mail.telecommunity.com> At 12:11 AM 8/27/04 -0400, Phillip J. Eby wrote: >At 08:44 PM 8/26/04 -0700, Mark Nottingham wrote: >>Digest auth sucks much less, and also uses REMOTE_USER. > >As I said, REMOTE_USER in a CGI environment leads to nasty local-system >security holes: potentially a local user can just set >REMOTE_USER=whoeverIwantToBe and invoke the application. > >Maybe we should, however, have a configuration key for >'wsgi.auth_available' that indicates the availability of the >HTTP_AUTHORIZATION header. Absence of 'wsgi.auth_available' would mean >that the availability is unknown, while true or false would indicate >definite availability or lack thereof. Nobody's responded to this; does that mean you all think it's a brilliant idea? ;) From pje at telecommunity.com Mon Aug 30 05:25:47 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 30 05:26:06 2004 Subject: [Web-SIG] Pending modifications to PEP 333 Message-ID: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> Here are some changes I've proposed in the last few days to resolve issues people brought up, but which I haven't gotten much feedback on: * 'wsgi.fatal_errors' key for exceptions that apps and middleware shouldn't trap * 'wsgi.auth_available' flag * Make the 'headers' object an 'email.Message' (well, there's been some feedback, but I think I addressed the concerns, and there was no feedback since) * what should a server or gateway's default error handling be, for each of the eight contexts in which an exception can occur? * notes on writing pre-2.2 compatible iteration code * anything else? 
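For the proposed notes on pre-2.2 compatible iteration, here is a sketch of both sides using only the old sequence protocol. `CountdownBody` and `send_body` are hypothetical names, and the code is Python 3 so that it runs today. It relies on the PEP 234 fallback quoted earlier in the thread: a `for` loop still calls `__getitem__` with 0, 1, 2, ... until `IndexError` when an object has no `__iter__`. The server never calls `iter()` or `next()` and never catches `StopIteration` itself, which is what keeps the same loop structure usable on Python 2.1.

```python
class CountdownBody:
    """Application side: an old-style iterable (no __iter__, no next())."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # The old protocol asks for items 0, 1, 2, ... until IndexError.
        if index >= self.count:
            raise IndexError(index)
        return ("chunk %d\n" % (self.count - index)).encode("ascii")


def send_body(result, write):
    """Server side: consume the application's result with a plain for loop."""
    try:
        for data in result:  # works for old-style and 2.2-style iterables
            write(data)
    finally:
        # Give the application a chance to release resources.
        if hasattr(result, "close"):
            result.close()
```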
I'd really like to get everything but the HTTP/1.1-specific stuff (which Mark Nottingham is working on) wrapped up early this week, if possible. So far, there has been surprisingly little comment on the PEP either from c.l.py or python-dev, so I'm going to take their silence to mean that the PEP is basically perfect, apart from the currently known issues. ;) From ianb at colorstudy.com Mon Aug 30 05:33:35 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 30 05:33:42 2004 Subject: [Web-SIG] My experiences implement WSGI on java/j2ee/jython. In-Reply-To: <5.1.1.6.0.20040829225537.02740d10@mail.telecommunity.com> References: <41327597.5060909@xhaus.com> <41327597.5060909@xhaus.com> <5.1.1.6.0.20040829225537.02740d10@mail.telecommunity.com> Message-ID: <4132A00F.8010706@colorstudy.com> Phillip J. Eby wrote: > How about we just add a "Using WSGI with earlier Python Versions" > subsection to the application/implementation notes? > > It would simply note that a WSGI server/gateway intended to work pre-2.2 > *must* use only a 'for' loop to iterate over an iterable returned by the > application, and that applications needing to work pre-2.2 would have to > implement the old-style iteration protocol. This would mean that applications would have to be written with backward compatibility in mind, which may not be terribly unreasonable. But I don't see any reasonable way you can write version-neutral code. For instance, file objects are not iterable in older Pythons, so you can't return those. That's pretty annoying. And there's no method that is invoked which warns you that you need to be backward-compatible -- __iter__ is called on newer Pythons, but nothing is called on older ones. Of course, those same functions I put in the other email could be applied on the application side, maybe conditionally depending on Python version. From a practical standpoint, though, I suspect servers are going to be more aware of their target Python version than applications.
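The file-object case Ian mentions can be covered by a small wrapper that exposes the old sequence protocol, which both old and new `for` loops understand. A minimal sketch in Python 3, assuming the application opens the file itself; `FileWrapper` and its `blksize` parameter are illustrative names, not part of the spec under discussion. The `index` argument is ignored because each call just reads the next block, so the wrapper only works with a loop that requests items in order, as a `for` loop does.

```python
class FileWrapper:
    """Serve a file-like object in fixed-size blocks.

    Only __getitem__ is defined (the pre-2.2 sequence protocol), so a plain
    for loop can consume it even where file objects are not iterable.
    """

    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize
        if hasattr(filelike, "close"):
            self.close = filelike.close  # let the server close the file

    def __getitem__(self, index):
        # index is ignored: each call simply reads the next block.
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise IndexError(index)  # end of file ends the iteration
```

An application would return `FileWrapper(open(path, "rb"))` as its body, and a server that calls `close()` after the loop also closes the underlying file.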
So server authors are going to have more incentive to write the code to deal with older Python versions. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Mon Aug 30 05:38:57 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 30 05:39:05 2004 Subject: [Web-SIG] Pending modifications to PEP 333 In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> Message-ID: <4132A151.1000006@colorstudy.com> Phillip J. Eby wrote: > Here are some changes I've proposed in the last few days to resolve > issues people brought up, but which I haven't gotten much feedback on: > > * 'wsgi.fatal_errors' key for exceptions that apps and middleware > shouldn't trap > > * what should a server or gateway's default error handling be, for each > of the eight contexts in which an exception can occur? Those are hard problems that need lots of thought. I haven't given them much thought, so I don't have any comments. > * 'wsgi.auth_available' flag Sure. > * Make the 'headers' object an 'email.Message' (well, there's been some > feedback, but I think I addressed the concerns, and there was no > feedback since) I'm -0 on email.Message. > * notes on writing pre-2.2 compatible iteration code I'd rather allow lazier applications and put more of the pre-2.2 compatibility work in the hands of the server. > * anything else? Integer status code? And the Status header. I'm -0 on a status header. I'm +1 on integer status code.
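On the integer-versus-string status question, either form can be derived from the other, so a server or middleware can offer whichever one callers prefer. A sketch assuming Python 3's `http.HTTPStatus` for default reason phrases (nothing like it existed in the 2004 standard library); `status_line` and `status_code` are hypothetical helper names.

```python
from http import HTTPStatus


def status_line(code, reason=None):
    """Turn an integer status code into a WSGI-style status string.

    If no reason phrase is given, the standard phrase for the code is used.
    """
    if reason is None:
        reason = HTTPStatus(code).phrase
    return "%d %s" % (code, reason)


def status_code(status):
    """Recover the integer code from a status string such as '404 Not Found'."""
    return int(status.split(" ", 1)[0])
```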
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Mon Aug 30 06:12:18 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 30 06:12:25 2004 Subject: [Web-SIG] Stuff left to be done on WSGI In-Reply-To: <5.1.1.6.0.20040828121031.03326890@mail.telecommunity.com> References: <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com> <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com> <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com> <5.1.1.6.0.20040828121031.03326890@mail.telecommunity.com> Message-ID: <4132A922.7070008@colorstudy.com> Phillip J. Eby wrote: > At 11:51 PM 8/27/04 -0500, Ian Bicking wrote: > >> I don't know if we need deeper hierarchy than that. E.g., >> web.wsgi.cgiadapter. I don't think so. I'd rather "WSGI" be a term >> only those in the know use -- it means nothing unless you expand the >> acronym, and even then it's pretty vague. Ultimately I hope most web >> programmers just don't need to think about any of it. > > > Flat is better than nested; let's not mix other projects into this. The > WSGI stuff will have enough content to deserve a package of its own, and > we don't want it to be dependent upon a bunch of "next generation" stuff > that's not even designed yet. Will it really? And how will it be organized? There are some utility functions, which don't deserve a module. There's WSGIHTTPServer, based on BaseHTTPServer. And maybe some CGI WSGI server. I imagine other things could come along, but not right away, and where would they go? Added to some top-level module? A new module? I also *really* dislike the name wsgi for a module. It's a fine name for discussing this, but I'm really opposed to it becoming a name used more widely. Not because I think there's a better name, but because the function is important and the name isn't.
One of the things we can do if this is an approved PEP is that we don't have to qualify this as one-of-many, using a distinguishing name. >> Yes, you are right. Which means the catcher has to keep track of the >> headers that were sent if it hopes to do anything. In that case, it >> might check for text/html or text/plain; if not those two, then just >> stop the response short and log the error. If so, and if configured >> to show errors, then it could display them; cgitb goes to some length >> to make HTML render correctly. >> >> That makes me think that wrapping send_response is more reasonable. >> Though it makes error resolution in servers more complex. > > > I'm not sure I follow you. The error handling in the server would look > just like the handling in middleware, no? In fact, this potentially > sounds like a job for another boilerplate function in wsgi.util, or > perhaps a class. I imagine we might have an AbstractWSGIServer that > defines basic start-response, write, and other operations, with abstract > methods for sending/receiving data to and from the client, and various > overrideable methods for policy. The simple WSGIServer and CGI gateway > would both derive from it, or perhaps delegate to it. To me that feels like it makes implementation more complicated, rather than less. Maybe not really, but I think it will *feel* more complicated. I think a good example is more helpful to authors. All these issues are very much part of the control flow, and abstracting control flow leads (IMHO) to confusing class structures. >>> set_charset/get_charset -- sets the character set parameters of the >>> content-type, which is actually useful. On the down side, setting >>> the character set sets MIME-Version, but it also sets the >>> Content-Transfer-Encoding, so it doesn't force the server to default >>> one. >> >> >> Would that start opening up the possibility of accepting Unicode to >> write()/app_iter? 
> > > In my view, no, because then we'd force the server to know about every > possible encoding the client and app can come up with. If the app uses > this, it should handle the encoding. We might want to include a utility > routine or two to pull what the client accepts out of HTTP_ACCEPT et al. Python seems to be pretty good at dealing with a lot of different encodings. A lot of work on this has gone into the base Python distribution -- I don't think there's any better source of code on encoding. It opens up a big can of worms, so I don't mind ignoring encoding, but maybe that's just because I'm American and I'm lazy and usually ignore encoding, so it's mysterious to me. >>> __len__, __getitem__, __setitem__, __delitem__, __contains__, >>> has_key, get, keys, values, items -- case-insensitive dictionary-like >>> interface (i.e., the stuff we mainly want) >>> get_all -- all values for a header name >>> add_header, replace_header -- more stuff we want >> >> >> Very good, though not hard to reimplement. > > > But why should everybody reimplement it, if we're not going to be in the > stdlib till 2005? Well, if we already have utility functions, this is just a utility class. And it would be very small and easy to understand. Smaller and easier to understand than email.Message, certainly, and with no distracting vestigial pieces. >> Okay, looking through the code briefly, I can't help but think that >> all the complex parts are parts we don't care about. > > > Not so; content-type parameter setting is quite handy. For example, if > you're doing multipart push, you'll need e.g. set_boundary and > get_boundary might also be useful. > > >>> Well, to some extent we have to look at the question of what should >>> happen in those circumstances anyway, whether we solve the problem in >>> that specific way or not. Because if the application *does* call >>> start_response more than once, the server has to be able to handle it >>> *somehow*.
Really, the ultimate error handling *has* to be done by >>> servers, unless they want to take the route of crashing the entire >>> process when something bad happens. :) >> >> >> Good question. I think servers should consider that an error, but >> they should handle that error gracefully. Which probably means >> keeping a "has send_response already been called" flag. >> >> Now, if I could get access to that flag from middleware... and maybe >> access to the headers and status that have already been sent... (and >> really, why not? We aren't worried about streaming headers like we >> are about bodies) > > > You dodged my question... what are you going to *do* with that? > Because we need to formulate sensible error handling policies for the > general case, including things like an I/O error due to the client > disconnecting. Well, in some cases I would try to display errors to the client. Though maybe a class of errors -- particularly those that happen during the iteration phase, or after start_response -- could just go to a log. OTOH, I'd want to show *some* indication to the client that an error has occurred, and the response is incomplete, at least for human-readable content (text/html and maybe text/plain). But not in all cases, e.g. an I/O error. OTOH, I might log errors *only* when I couldn't display them to the client (during development). > Here are possible loci of error: > > * Before start_response is called (application error) Easy to handle. Display a traceback, or a technical-problems error message and log the error. > * During start_response (server error or application error) What application errors are you thinking of? Like invoking start_response incorrectly? Server errors should probably be handled by the server.
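Ian's "has start_response already been called" flag could look roughly like this on the server side. It is a hypothetical sketch: `ResponseState` and its attributes are invented names, the code is Python 3, and it writes a bare HTTP/1.0 status line. A repeated `start_response` is tolerated only while nothing has been sent (a design choice, not spec text), after which it raises; the status and headers stay readable as attributes, which is the kind of access Ian would like middleware to have.

```python
class ResponseState:
    """Per-request bookkeeping for a hypothetical WSGI server (Python 3)."""

    def __init__(self, send):
        self.send = send            # callable that pushes bytes to the client
        self.status = None
        self.headers = None
        self.headers_sent = False   # the "already started" flag

    def start_response(self, status, headers):
        if self.headers_sent:
            # Too late: the client has already received a status line.
            raise RuntimeError("start_response() called after headers were sent")
        # Before any output, a second call simply replaces the first.
        self.status = status
        self.headers = list(headers)
        return self.write

    def write(self, data):
        if self.status is None:
            raise RuntimeError("write() called before start_response()")
        if not self.headers_sent:
            lines = ["HTTP/1.0 %s" % self.status]
            lines += ["%s: %s" % (name, value) for name, value in self.headers]
            head = "\r\n".join(lines) + "\r\n\r\n"
            self.send(head.encode("latin-1"))
            self.headers_sent = True
        if data:
            self.send(data)
```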
It might be nice if the server always raised a single exception (say, WSGIServerError), so a start_response definition might look like:

    def start_response(status, headers):
        try:
            blah blah
        except ServerIOError:
            do something
            raise WSGIServerError

And applications shouldn't catch (or should re-raise) a server error. > * After start_response, before first write (application error) I'd like the option here to display an error to the client, dependent on the content-type. > * During a write (server error or application error) Another WSGIServerError? > * Between writes, before return (application error) Depending on content-type, a last write would be good. > * After return/during iteration (application error) Again, depending on content-type, a last write (well, iteration) would be nice. Less important generally. > * During a post-return write (server error or application error) I'm not sure what you have in mind here. > * During 'close()' (application error) Logged to wsgi.errors, nothing else. > The reason those are "server or application" is because start_response > and write can fail due to bad data passed by the application, so it's > really an application error in that case. The server might fail for > some other reason, of course, like a lost client connection. > > One issue here is that an application or middleware error handler needs > to know whether the error is the application's or the server's. It > makes no sense for a failed write to cause a middleware error handler to > attempt to write some more data! It seems we need an error parameter like:

>     environ['wsgi.fatal_errors'] = SomeExceptionClass1, SomeExceptionClass2,...

> Such that one would use:

>     try:
>         # invoke child application, etc.
>     except environ['wsgi.fatal_errors']:
>         raise
>     except:
>         # regular error handling here

> In other words, an application or middleware component should abort if > it receives one of these exception types.
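The proposed `wsgi.fatal_errors` key would let middleware separate application bugs from server failures. Here is a minimal sketch of such middleware, assuming the key holds a tuple of exception classes; since it was only a proposal at this point, the sketch treats it as optional. `error_catcher` is a hypothetical name, the code is Python 3, and it buffers the whole body, so it only suits small responses. It also assumes the server lets a second `start_response` call replace the first while nothing has been sent.

```python
import traceback


def error_catcher(app):
    """Wrap app: turn ordinary exceptions into a 500 page, re-raise fatal ones."""
    def middleware(environ, start_response):
        # Proposed key; an empty tuple in an except clause matches nothing.
        fatal = environ.get("wsgi.fatal_errors", ())
        try:
            result = app(environ, start_response)
            try:
                body = list(result)  # buffer so errors surface before any output
            finally:
                if hasattr(result, "close"):
                    result.close()
            return body
        except fatal:
            raise  # server or connection trouble: writing more would be pointless
        except Exception:
            # Record the traceback where the server keeps its error output.
            traceback.print_exc(file=environ["wsgi.errors"])
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"An internal error occurred.\n"]
    return middleware
```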
I'm inclined to think that > application WSGI programming errors should be treated as fatal: if the > app sends bad parameters to start_response or write, there's little > point in proceeding further. Hmm... that would work too. Then the type of the exception wouldn't be lost, though servers would also be able to encode the type inside a single exception. OTOH, by using a tuple there, you could avoid requiring any wsgi module which defines this particular exception. I would probably call these "server_errors" rather than "fatal_errors", though I guess it amounts to the same thing. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From floydophone at gmail.com Mon Aug 30 06:16:54 2004 From: floydophone at gmail.com (Peter Hunt) Date: Mon Aug 30 06:16:57 2004 Subject: [Web-SIG] My repository of WSGI code Message-ID: <6654eac4040829211664129ec@mail.gmail.com> http://st0rm.hopto.org/wsgi/ The files listed there are: - jonpy_wsgi.py - wsgi to jonpy adapter - test.cgi.py - test jonpy application using wsgi - wsgicgi.py - run a wsgi application under - WSGIHTTPServer.py - copycat of CGIHTTPServer, except it runs WSGI apps - testhttpserver.py - tests the WSGIHTTPServer.py class Please submit any patches/comments. Perhaps we could improve upon these scripts and include them in the distribution? From ods at strana.ru Mon Aug 30 11:06:49 2004 From: ods at strana.ru (Denis S. Otkidach) Date: Mon Aug 30 11:12:12 2004 Subject: [Web-SIG] Re: Pending modifications to PEP 333 In-Reply-To: <4132A151.1000006@colorstudy.com> References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <4132A151.1000006@colorstudy.com> Message-ID: <20040830130649.74f1826f.ods@strana.ru> On Sun, 29 Aug 2004 22:38:57 -0500 Ian Bicking wrote: > > * Make the 'headers' object an 'email.Message' (well, there's been some > > feedback, but I think I addressed the concerns, and there was no > > feedback since) > > I'm -0 on email.Message. 
Below is a class we use for headers in our framework for several years. I guess it's more comfortable than list of tuples or email.Message. Anyway, we have to fix only "must have" interface, but not all useful methods.

    class Headers:
        '''Dictionary-like object of HTTP headers with case insensitive key
        lookup and add() method. The order of headers is preserved.'''

        def __init__(self, data={}):
            self._headers = []
            self._headers_map = {}
            if data:
                if isinstance(data, dict):
                    # From dictionary
                    for key, value in data.iteritems():
                        self.add(key, value)
                else:
                    # from any sequence of pairs
                    for key, value in data:
                        self.add(key, value)
            # XXX Here can be initialization from other types: string, file.

        def __iter__(self):
            return iter(self._headers)

        def __len__(self):
            return len(self._headers)

        def keys(self):
            # Note: header names are returned lower-cased.
            return self._headers_map.keys()

        def has_key(self, key):
            # The map is keyed by lower-cased names, so normalise the lookup.
            return self._headers_map.has_key(key.lower())

        def add(self, key, value):
            self._headers.append((key, value))
            self._headers_map.setdefault(key.lower(), []).append(value)

        def __getitem__(self, key):
            '''Get header. If there are several header with the same key, their
            values are joined.'''
            # RFC 2616, 4.2 Message Headers
            return ', '.join(self._headers_map[key.lower()])

        def __setitem__(self, key, value):
            '''Replace headers with the same key.'''
            del self[key]
            self.add(key, value)

        def __delitem__(self, key):
            '''Delete all headers with this key. Never fail.'''
            key = key.lower()
            if self._headers_map.has_key(key):
                del self._headers_map[key]
                self._headers = [(k, v) for (k, v) in self._headers
                                 if k.lower() != key]

        def __str__(self):
            return '\r\n'.join(['%s: %s' % h for h in self._headers]) + '\r\n'

-- Denis S. Otkidach http://www.python.ru/ [ru] From pje at telecommunity.com Mon Aug 30 15:33:14 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Mon Aug 30 15:33:42 2004 Subject: [Web-SIG] Re: Pending modifications to PEP 333 In-Reply-To: <20040830130649.74f1826f.ods@strana.ru> References: <4132A151.1000006@colorstudy.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <4132A151.1000006@colorstudy.com> Message-ID: <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com> At 01:06 PM 8/30/04 +0400, Denis S. Otkidach wrote: >On Sun, 29 Aug 2004 22:38:57 -0500 >Ian Bicking wrote: > > > > * Make the 'headers' object an 'email.Message' (well, there's been some > > > feedback, but I think I addressed the concerns, and there was no > > > feedback since) > > > > I'm -0 on email.Message. > >Below is a class we use for headers in our framework for several years. >I guess it's more comfortable than list of tuples or email.Message. >Anyway, we have to fix only "must have" interface, but not all useful >methods. Hi Denis; thanks for the input. Unfortunately, WSGI needs to either use a class/type that's available in the Python standard library, or else a simple protocol like "sequence of name,value pairs". From wilk-ml at flibuste.net Mon Aug 30 16:01:53 2004 From: wilk-ml at flibuste.net (William Dode) Date: Mon Aug 30 16:02:01 2004 Subject: [Web-SIG] Re: Pending modifications to PEP 333 In-Reply-To: <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com> (Phillip J. Eby's message of "Mon, 30 Aug 2004 09:33:14 -0400") References: <4132A151.1000006@colorstudy.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <4132A151.1000006@colorstudy.com> <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com> Message-ID: <87isb0912m.fsf@blakie.riol> "Phillip J. Eby" writes: > At 01:06 PM 8/30/04 +0400, Denis S. 
Otkidach wrote: >>On Sun, 29 Aug 2004 22:38:57 -0500 >>Ian Bicking wrote: >> >> > > * Make the 'headers' object an 'email.Message' (well, there's been some >> > > feedback, but I think I addressed the concerns, and there was no >> > > feedback since) >> > >> > I'm -0 on email.Message. >> >>Below is a class we use for headers in our framework for several years. >>I guess it's more comfortable than list of tuples or email.Message. >>Anyway, we have to fix only "must have" interface, but not all useful >>methods. > > Hi Denis; thanks for the input. Unfortunately, WSGI needs to either > use a class/type that's available in the Python standard library, or > else a simple protocol like "sequence of name,value pairs". I also think email.Message is overkill for this and it can be very surprising to see an "email message" here... -- William Dod? - http://flibuste.net From steve at holdenweb.com Mon Aug 30 16:04:52 2004 From: steve at holdenweb.com (Steve Holden) Date: Mon Aug 30 16:07:25 2004 Subject: [Web-SIG] Re: Pending modifications to PEP 333 In-Reply-To: <20040830130649.74f1826f.ods@strana.ru> References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <4132A151.1000006@colorstudy.com> <20040830130649.74f1826f.ods@strana.ru> Message-ID: <41333404.6070508@holdenweb.com> Denis S. Otkidach wrote: > On Sun, 29 Aug 2004 22:38:57 -0500 > Ian Bicking wrote: > > >>>* Make the 'headers' object an 'email.Message' (well, there's been some >>>feedback, but I think I addressed the concerns, and there was no >>>feedback since) >> >>I'm -0 on email.Message. > > > Below is a class we use for headers in our framework for several years. > I guess it's more comfortable than list of tuples or email.Message. > Anyway, we have to fix only "must have" interface, but not all useful > methods. > > [...] > > def __getitem__(self, key): > '''Get header. 
If there are several header with the same key, their > values are joined.''' > # RFC 2616, 4.2 Message Headers > return ', '.join(self._headers_map[key.lower()]) > [...] Since this module has seen productions use, can we take it you've had no problem joining cookie values with dates containing commas? This was one of the arguments for maintaining separate multiple headers of the same type, IIRC. regards Steve -- XXX Please note recent change of email address From wilk-ml at flibuste.net Mon Aug 30 16:58:49 2004 From: wilk-ml at flibuste.net (William Dode) Date: Mon Aug 30 16:58:51 2004 Subject: [Web-SIG] Pending modifications to PEP 333 In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> (Phillip J. Eby's message of "Sun, 29 Aug 2004 23:25:47 -0400") References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> Message-ID: <87eklo8yfq.fsf@blakie.riol> "Phillip J. Eby" writes: > So far, there has been surprisingly little comment on the > PEP either from c.l.py or python-dev, so I'm going to take their > silence to mean that the PEP is basically perfect, apart from the > currently known issues. ;) First, thanks you (and the others) for the great works. Like a lot of people i think, i did my own modest framework, because it's near my need and it's not difficult to do. I don't think it can be a problem to still have a lot of framework in the community, each one is very specific and it's not difficult to write his own framework "aux petits oignons". But it's more difficult to write a server, everybody make his hack on top of BaseHTTPServer and reinvent the wheels. It's also because of the need to adapt his framework to BaseHTTPServer that this server doesn't evolve in the lib std, the same for cgi. So, when servers will follow this specification it'll be a breath of oxygen ! I'll keep my framework and throw away my servers. You found a really good point with this gateway :-) -- William Dod? 
- http://flibuste.net From ods at strana.ru Mon Aug 30 17:33:09 2004 From: ods at strana.ru (Denis S. Otkidach) Date: Mon Aug 30 17:38:19 2004 Subject: [Web-SIG] Re: Pending modifications to PEP 333 In-Reply-To: <41333404.6070508@holdenweb.com> References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <4132A151.1000006@colorstudy.com> <20040830130649.74f1826f.ods@strana.ru> <41333404.6070508@holdenweb.com> Message-ID: <20040830193309.1baa0b01.ods@strana.ru> On Mon, 30 Aug 2004 10:04:52 -0400 Steve Holden wrote: > [...]

> >     def __getitem__(self, key):
> >         '''Get header. If there are several header with the same key, their
> >         values are joined.'''
> >         # RFC 2616, 4.2 Message Headers
> >         return ', '.join(self._headers_map[key.lower()])

> [...] > Since this module has seen production use, can we take it you've had no > problem joining cookie values with dates containing commas? This was one > of the arguments for maintaining separate multiple headers of the same > type, IIRC. As you can see, we do keep separate headers with the same name, so there is no problem with the Set-Cookie header. For completeness there should be a method like FieldStorage.getlist() here, but we have never needed it. -- Denis S. Otkidach http://www.python.ru/ [ru] From ods at strana.ru Mon Aug 30 17:38:40 2004 From: ods at strana.ru (Denis S. Otkidach) Date: Mon Aug 30 17:43:49 2004 Subject: [Web-SIG] Re: Pending modifications to PEP 333 In-Reply-To: <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com> References: <4132A151.1000006@colorstudy.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <4132A151.1000006@colorstudy.com> <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com> Message-ID: <20040830193840.6b12ae9b.ods@strana.ru> On Mon, 30 Aug 2004 09:33:14 -0400 "Phillip J. Eby" wrote: > >Below is a class we use for headers in our framework for several years. > >I guess it's more comfortable than list of tuples or email.Message.
> >Anyway, we have to fix only "must have" interface, but not all useful > >methods. > > Hi Denis; thanks for the input. Unfortunately, WSGI needs to either use a > class/type that's available in the Python standard library, or else a > simple protocol like "sequence of name,value pairs". "Sequence of name,value pairs" is OK: my class satisfies this interface if by "sequence" you mean just an iterable object, and not a real list. -- Denis S. Otkidach http://www.python.ru/ [ru] From wilk-ml at flibuste.net Mon Aug 30 19:10:55 2004 From: wilk-ml at flibuste.net (William Dode) Date: Mon Aug 30 19:11:12 2004 Subject: [Web-SIG] Pending modifications to PEP 333 In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> (Phillip J. Eby's message of "Sun, 29 Aug 2004 23:25:47 -0400") References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> Message-ID: <87sma47dr4.fsf@blakie.riol> "Phillip J. Eby" writes: > So far, there has been surprisingly little comment on the > PEP either from c.l.py or python-dev, so I'm going to take their > silence to mean that the PEP is basically perfect, apart from the > currently known issues. ;) One important thing, of course, is that current frameworks and servers implement the spec. Once the most famous ones begin, the others will follow, but who will begin? Shall we ask them on their mailing lists? Are they here? -- William Dodé - http://flibuste.net From fumanchu at amor.org Mon Aug 30 19:11:14 2004 From: fumanchu at amor.org (Robert Brewer) Date: Mon Aug 30 19:16:50 2004 Subject: [Web-SIG] Pending modifications to PEP 333 Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E98@exchange.hqamor.amorhq.net> William Dode wrote: > "Phillip J. Eby" writes: > > > So far, there has been surprisingly little comment on the > > PEP either from c.l.py or python-dev, so I'm going to take their > > silence to mean that the PEP is basically perfect, apart from the > > currently known issues.
;) > > One important things of course is that current frameworks and servers > implement the specs. When the most famous will begin, the others will > follow, but who will begin ? Shall we ask them on their mailing-list ? > Are they here ? The intermediate step for me as a framework writer is to write my own WSGI wrapper for mod_python, for example, so that when mod_python grows its own WSGI interface, the replacement will be nearly seamless. I expect others are doing the same, if only for testing purposes, so I don't think we're in a huge rush. But yes, some of the "more famous" server authors are here and gave input on the spec. Robert Brewer MIS Amor Ministries fumanchu@amor.org From py-web-sig at xhaus.com Mon Aug 30 22:02:57 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Aug 30 21:58:27 2004 Subject: [Web-SIG] Iterator protocols. In-Reply-To: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> Message-ID: <413387F1.7010804@xhaus.com> [Alan Kennedy] >> However, even though jython is based on python 2.1, and thus doesn't >> have built-in support for either iterators or generators, I have still >> implemented the iterator protocol in my java/jython framework [Phillip J. Eby] > Unfortunately, your technique doesn't actually work, unless you're also > going to patch the Jython __builtins__ to include 'StopIteration', > 'iter', and so forth. You would have to use the pre-2.2 iteration > protocol, which uses __getitem__ and IndexError. I think this would > have to be something you document as a spinoff or "application note" for > WSGI users who must use a pre-2.2 version of Python. One of the reasons > we decided to go ahead and require 2.2.2 was to avoid having to deal > with the absence of True/False, iterators, and generators. [Ian Bicking] > That's an interesting question.
I guess with both Jython and Zope > 2.6 and earlier being Python 2.1, it should be given some consideration. > > One question: should the application iterable be a Python 2.2 style > iterable? I.e., it is up to Python 2.1 servers to implement the Python > 2.2 iterator protocol themselves? Or, should the application be > responsible to return an iterator, appropriate for the Python version? > > In Python <2.2 (including 1.5.2) the protocol was that you called > __getitem__ with ever-increasing integers, until an IndexError was > raised. There was no concept of a special __iter__() function. But I > guess Python 2.2's iter() builtin could be simulated: Well, now I'm confused :-) Firstly, my 2.1 implementation of the 2.2 iterator protocol does work, because I do create a StopIteration exception and poke it into __builtin__. Which isn't the prettiest of approaches, but it works. I'm currently testing on an application object defined like this:

################
class handler:

    def __init__(self, environ, start_response):
        start_response("200 OK", [])
        self.i = 0

    def __iter__(self):
        return self

    def next(self):
        if self.i < 6:
            self.i += 1
            return "<h%d>Hello WSGI World!</h%d>\n" % (self.i, self.i)
        else:
            raise StopIteration()
#################

And it works as I expected ;-) So the two consequent questions I have are 1. Is there something wrong with my approach of defining a StopIteration exception, and poking it into __builtin__? 2. Do I need to implement the old pre-2.2 iterator protocol as well? It had never occurred to me to implement that: I was focussed only on 2.2 iterators. While we're on the subject of python 2.2 requisites, it's also trivial for me to define True and False. Which leaves generators as the only 2.2 facility I can't do anything about.
But since generators are optional for application/middleware authors, doesn't that mean that 2.2.2 is not required as the minimum version for framework authors, only for 2.2-dependent components that are plugged into their framework? Keep up the good work! Regards, Alan. From pje at telecommunity.com Mon Aug 30 22:18:43 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Aug 30 22:18:13 2004 Subject: [Web-SIG] Iterator protocols. In-Reply-To: <413387F1.7010804@xhaus.com> References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> At 09:02 PM 8/30/04 +0100, Alan Kennedy wrote: >So the two consequent questions I have are > >1. Is there something wrong with my approach of defining a StopIteration >exception, and poking it into __builtin__? Yes; it won't work with anything else that pokes its own StopIteration into __builtin__. This is very fragile; don't do it. >2. Do I need to implement the old pre-2.2 iterator protocol as well? It >had never occurred to me to implement that: I was focussed only on 2.2 >iterators. If you're writing a server or gateway, you don't need to implement it at all: use a "for" loop to iterate over the iterable, and all will be well. If you're writing an application that must work under pre-2.2 Python, you must implement the *old* iterator protocol, and only that protocol. You do not have to implement the new iterator protocol "as well". Implement the old protocol *instead*. Following these guidelines will make your code both "forward" and "backward" compatible, since newer Pythons still recognize the old iterator protocol. >While we're on the subject of python 2.2 requisites, it's also trivial for >me to define True and False. Which leaves generators as the only 2.2 >facility I can't do anything about. 
But since generators are optional for >application/middleware authors, doesn't that mean that 2.2.2 is not >required as the minimum version for framework authors, only for >2.2-dependent components that are plugged into their framework? Correct. By the way, there's no need to define True and False either; a server or gateway supporting a pre-2.2.2 version of Python should just use 1 and 0. The PEP doesn't actually require the use of True and False; it just refers to "true values" and "false values". From py-web-sig at xhaus.com Mon Aug 30 22:25:00 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Aug 30 22:20:30 2004 Subject: [Web-SIG] Container buffering of output. In-Reply-To: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> Message-ID: <41338D1C.9000704@xhaus.com> [Alan Kennedy] >> In .. J2EE .. the container will do its own buffering completely >> outside your control, and send the pieces with chunked-transfer >> encoding if necessary. So even if I put a flush on the output channel >> in my framework, I'm only flushing it to the container's buffer: it's >> still not guaranteed to send output back down the return socket to the >> client. [Phillip J. Eby] > That is potentially a problem, since the point is to guarantee that when > 'write()' returns to the application, the output isn't going to just sit > in the buffer while the application moves ahead with other things: it > should be going to the client. Hmmm, I don't see how it would be a problem. Although I suppose that depends on what you mean by "the output isn't going to just sit in the buffer": which buffer? As you say, when the write() returns, the application's output has been sent as far as I can send it. My entire thread of execution for a request may have ended, and the output may still be sitting in some container's (e.g. Apache, Tomcat) buffer, i.e. not sent to the client: there's nothing I can do about that.
I can call flush on my OutputStream, but I can't guarantee that the container will respect that by actually flushing to the client, for whatever reasons it may have. This already happens with plain CGI. That's the way that containers like Apache and Tomcat deal with most dynamic content: buffer CGI/etc. output until the buffer is full, then send a chunk to the client. The behaviour of the container will probably be different if a Content-Length header is set: it might pass the output straight through, or it might still buffer it. That's container-specific. This is all an inevitable consequence of running inside a container of some kind. However, if the container were written in Python, e.g. SimpleHTTPServer, Medusa or Twisted, it could meet the guarantee "sent down the socket to the client before the write() returns", because it holds the socket connected to the client. He who holds the socket calls the shots. I don't see any of this presenting a problem for WSGI. Regards, Alan. From ianb at colorstudy.com Mon Aug 30 22:35:16 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Aug 30 22:35:37 2004 Subject: [Web-SIG] Iterator protocols. In-Reply-To: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> Message-ID: <41338F84.6030403@colorstudy.com> Phillip J. Eby wrote: > At 09:02 PM 8/30/04 +0100, Alan Kennedy wrote: > >> So the two consequent questions I have are >> >> 1. Is there something wrong with my approach of defining a >> StopIteration exception, and poking it into __builtin__? > > > Yes; it won't work with anything else that pokes its own StopIteration > into __builtin__. This is very fragile; don't do it. Why?
So long as he is catching the StopIteration that is in __builtin__, which may or may not be the object he originally put in there, it should all be fine. So maybe he should do:

try:
    StopIteration
except NameError:
    import __builtin__
    class StopIteration(Exception):
        pass
    __builtin__.StopIteration = StopIteration
    del StopIteration

-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From py-web-sig at xhaus.com Mon Aug 30 23:05:08 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Aug 30 23:00:37 2004 Subject: [Web-SIG] Iterator protocols. In-Reply-To: <41338F84.6030403@colorstudy.com> References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> <41338F84.6030403@colorstudy.com> Message-ID: <41339684.9060606@xhaus.com> [Ian Bicking] > Why? So long as he is catching the StopIteration that is in > __builtin__, which may or may not be the object he originally put in > there, it should all be fine. So maybe he should do:
>
> try:
>     StopIteration
> except NameError:
>     import __builtin__
>     class StopIteration(Exception):
>         pass
>     __builtin__.StopIteration = StopIteration
>     del StopIteration

:-) Here's my implementation: minds think alike!

private void create_stop_iteration ( )
{
    interp.exec(
        "try:\n"+
        " StopIteration\n"+
        "except NameError:\n"+
        " class StopIteration(Exception): pass\n"+
        " import sys ; sys.add_package('org.python.core')\n"+
        " from org.python.core import __builtin__\n"+
        " __builtin__.StopIteration = StopIteration\n"+
        " del StopIteration\n"
    );
}

Regards, Alan. From py-web-sig at xhaus.com Mon Aug 30 22:45:03 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Aug 30 23:09:54 2004 Subject: [Web-SIG] Iterator protocols.
In-Reply-To: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> Message-ID: <413391CF.4070109@xhaus.com> Phillip, I really am confused now by what you say. Is it possible that you're misunderstanding my approach? I should make it explicitly clear that I am writing this in Java. So when I say I'm iterating over the iterable, I do it this way

//-------------------------------------
    PyObject iterable = app_result.invoke("__iter__");
    PyObject next_object = null;
    while (true)
    {
        try
        { next_object = iterable.invoke("next"); }
        catch (PyException pe)
        {
            // Pseudo-code here
            if pe is StopIteration: break
        }
        start_response_callable.write_callable.write(((PyString)next_object).toString());
    }
//-------------------------------------

So in light of that ..... [Alan Kennedy] >> 1. Is there something wrong with my approach of defining a >> StopIteration exception, and poking it into __builtin__? [Phillip J. Eby] > Yes; it won't work with anything else that pokes its own StopIteration > into __builtin__. This is very fragile; don't do it. Hmm, I still don't see the problem. I've got complete control of the interpreter, since I am instantiating it. So I can guarantee that any mods I make will be made before any other code. I think of it as specializing the interpreter to have a new exception. [Alan Kennedy] >> 2. Do I need to implement the old pre-2.2 iterator protocol as well? >> It had never occurred to me to implement that: I was focussed only on >> 2.2 iterators. [Phillip J. Eby] > If you're writing a server or gateway, you don't need to implement it at > all: use a "for" loop to iterate over the iterable, and all will be well. Ah, but this sentence only makes sense if I'm writing python/jython: I'm writing java.
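For comparison, the "for" loop recommended above can be sketched in Python. This is a minimal illustration, not code from the thread: `write_body`, `OldStyleBody` and `NewStyleBody` are hypothetical names, and the new-style class is written for a modern Python 3 interpreter, where the 2.2 `next()` method is spelled `__next__()`. One `for` statement accepts a list, an object that only implements the pre-2.2 `__getitem__`/`IndexError` protocol, and a 2.2-style iterator, which is why a server written in Python needs no special cases.

```python
def write_body(result, write):
    """Send every chunk of an application's return value via write().

    A plain for loop accepts any iterable: lists, objects implementing only
    the old __getitem__/IndexError protocol, and objects with __iter__()."""
    for chunk in result:
        write(chunk)


class OldStyleBody:
    """Hypothetical pre-2.2 style iterable: only __getitem__ is defined."""

    def __init__(self, chunks):
        self.chunks = chunks

    def __getitem__(self, index):
        # The IndexError raised by the list lookup past the end stops the loop.
        return self.chunks[index]


class NewStyleBody:
    """Hypothetical 2.2 style iterator (Python 3 spells next() as __next__())."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def __iter__(self):
        return self

    def __next__(self):
        if not self.chunks:
            raise StopIteration
        return self.chunks.pop(0)


if __name__ == "__main__":
    for body in (["a", "b"], OldStyleBody(["a", "b"]), NewStyleBody(["a", "b"])):
        sent = []
        write_body(body, sent.append)
        print(sent)  # each prints ['a', 'b']
```

The server-side code never inspects which protocol the application chose; the interpreter's iteration machinery does that work.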
> If you're writing an application that must work under pre-2.2 Python, > you must implement the *old* iterator protocol, and only that protocol. > You do not have to implement the new iterator protocol "as well". > Implement the old protocol *instead*. To me, the purpose of implementing the 2.2 iterator protocol is so that applications and components run inside my framework will work, if they support the 2.2 iterator protocol. I'm really not interested in the pre-2.2 protocol at all, though I suppose I should be if people want to run pre-2.2 iterable components in my framework. > Following these guidelines will make your code both "forward" and > "backward" compatible, since newer Pythons still recognize the old > iterator protocol. To some degree, my framework *is* the python in this case. [Alan Kennedy] >> While we're on the subject of python 2.2 requisites, it's also trivial >> for me to define True and False. Which leaves generators as the only >> 2.2 facility I can't do anything about. But since generators are >> optional for application/middleware authors, doesn't that mean that >> 2.2.2 is not required as the minimum version for framework authors, >> only for 2.2-dependent components that are plugged into their framework? [Phillip J. Eby] > Correct. By the way, there's no need to define True and False either; a > server or gateway supporting a pre-2.2.2 version of Python should just > use 1 and 0. The PEP doesn't actually require the use of True and > False, it just refers to "true values" and "false values". I think I'll set them anyway. That way, components running inside my framework won't break if they refer to True or False. Kind regards, Alan. From pje at telecommunity.com Tue Aug 31 01:59:37 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Aug 31 01:59:15 2004 Subject: [Web-SIG] Iterator protocols. 
In-Reply-To: <413391CF.4070109@xhaus.com> References: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com> At 09:45 PM 8/30/04 +0100, Alan Kennedy wrote: >I really am confused now by what you say. Is it possible that you're >misunderstanding my approach? No; your approach just isn't portable, and breaks the cross-server compatibility that's the point of WSGI. See below. >I should make it explicitly clear that I am writing this in Java. So when >I say I'm iterating over the iterable, I do it this way
>
>//-------------------------------------
>    PyObject iterable = app_result.invoke("__iter__");
>    PyObject next_object = null;
>    while (true)
>    {
>        try
>        { next_object = iterable.invoke("next"); }
>        catch (PyException pe)
>        {
>            // Pseudo-code here
>            if pe is StopIteration: break
>        }
>        start_response_callable.write_callable.write(((PyString)next_object).toString());
>    }
>//-------------------------------------
>
>So in light of that .....

...this code won't work if the application returns, say, a list. But a list *would* be a perfectly valid iterable in a "normal" WSGI server or gateway; therefore, this approach is broken. Meanwhile, an application that wants to support running in pre-2.2 containers *other* than yours is now forced to implement *both* the old and the new protocol! This is clearly broken, since there's no reason to require backward-compatible application code to implement a protocol that isn't implemented by the version of Python they're trying to support. >[Phillip J. Eby] > > If you're writing a server or gateway, you don't need to implement it at > > all: use a "for" loop to iterate over the iterable, and all will be well.
> >Ah, but this sentence only makes sense if I'm writing python/jython: I'm >writing java. Well, perhaps you should check whether there is a Java API you can access from Jython that's akin to PyObject_GetIter() in the C API, that's used in both Jython 2.1 and Jython 2.2; then your code will be forward and backward compatible without implementing both the old and the new protocols. If there is no such API, and you want to support the 2.2 protocol, you'll need to hardcode both the old and new protocols, due to the fact that you're not coding in Python (where a simple "for" loop suffices to ensure portability). >To me, the purpose of implementing the 2.2 iterator protocol is so that >applications and components run inside my framework will work, if they >support the 2.2 iterator protocol. I'm really not interested in the >pre-2.2 protocol at all, though I suppose I should be if people want to >run pre-2.2 iterable components in my framework. If a piece of code is written for 2.2 and its iterator protocol, why do you think it'll work in your server at all? It's far more likely that the only code you can run in your server will be code written for a 2.1 version of Python. And such code, if it has an iterable at all, is going to be written to the old iterator protocol, because it will presumably want to be able to run in pre-2.2 CPython containers, too. So, no matter what, *no* code is going to work in your server unless it was specifically written for your server: the exact opposite of the point of WSGI. From ianb at colorstudy.com Tue Aug 31 05:01:22 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Tue Aug 31 05:01:29 2004 Subject: [Web-SIG] Status code, status header Message-ID: <4133EA02.6090301@colorstudy.com> After a little thought, I'm -1 on a status header, even with email.Message. 
Mostly because at some point I believe Alan asked about what to do about a Location header, thinking in terms of CGI behavior where, if you don't provide the Status header, the server guesses: either 200, or 302 if there's a Location header pointing to a remote location, or an internal redirect otherwise. WSGI explicitly doesn't allow that, and the requirement is clearer when the application has to say explicitly what the status code is. If status were a header, I think we'd have to deal with the situation where that header was missing. I'm also +1 on turning status into an integer. I think it makes things a little simpler, and those message strings are just a distraction. The final server can put that string in ("200 OK", etc.) if it wants to, but if it doesn't, it doesn't matter. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Tue Aug 31 05:07:51 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Tue Aug 31 05:07:57 2004 Subject: [Web-SIG] The write callable (vs. file-like object) Message-ID: <4133EB87.7060501@colorstudy.com> Another comment I meant to make, but forgot about amid exceptions. The write callable is a bit awkward, because most code wants a file-like object, not a callable. So I had to do the dumb thing of creating a fake instance with one "write" instance variable. That feels silly. I think I'd prefer if the return value from start_response was a file-like object. Arguably, where the callable is harder to use, it's easier to produce. E.g., you could pass a bound method (that's not write) as the callable, like aList.append. So I'm not sure about this. OTOH, returning a file-like object leaves more room for extension, like the ability to write unicode; even if we leave it out now, I don't see any good place where that could be added in the future, as the interface is rather minimal in that area. But my thinking is a little fuzzy in that area.
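The workaround described above (a small object whose `write` attribute is the callable) might look like the sketch below. The class name `WriteFile` and its extra `writelines()`/`flush()` methods are illustrative assumptions, not part of the WSGI draft. The adapter costs only a few lines, and the reverse direction is free: a bound method such as `aList.append` already satisfies the callable interface.

```python
class WriteFile:
    """Hypothetical file-like adapter around a WSGI-style write callable."""

    def __init__(self, write):
        # Expose the callable directly as the .write attribute.
        self.write = write

    def writelines(self, lines):
        # Mirror file.writelines(): write each item, adding no separators.
        for line in lines:
            self.write(line)

    def flush(self):
        # The write callable is meant to send data before it returns,
        # so there is nothing left to flush here.
        pass


if __name__ == "__main__":
    collected = []
    out = WriteFile(collected.append)  # a bound method works as the callable
    out.write("<p>")
    out.writelines(["Hello", "</p>"])
    print("".join(collected))  # <p>Hello</p>
```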
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From tony at lownds.com Tue Aug 31 08:15:55 2004 From: tony at lownds.com (tony@lownds.com) Date: Tue Aug 31 08:34:35 2004 Subject: [Web-SIG] wsgi.fatal_errors In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> Message-ID: <55924.68.122.69.79.1093932955.squirrel@*> > Here are some changes I've proposed in the last few days to resolve issues > people brought up, but which I haven't gotten much feedback on: > > * 'wsgi.fatal_errors' key for exceptions that apps and middleware > shouldn't trap > What about defining an exception class that applications can raise with an HTML payload, which servers are supposed to send to the client? Middleware should be free to alter the payload as much as they like. The server should not send the payload when the Content-Type is not HTML. By using exceptions as a backchannel, the application and middleware do not have to keep track of state to handle an error sanely. With these examples, the FormatExceptions middleware really needs to be the "innermost" middleware. I think exception-handling middleware that works regardless of how it is stacked is a non-goal.
For example,

def an_application(env, start_response):
    try:
        form = read_form(env)   # read_form() and do_work() stand in for real app logic
        html = do_work(form)
        start_response('200 OK', [('Content-type', 'text/html')])
        return [html]
    except:
        import sys, cgitb
        raise env['wsgi.error_class'], cgitb.html(sys.exc_info())

...and middleware that formats the exception:

def FormatExceptions(app):
    import sys, cgitb
    def middleware(env, start_response):
        try:
            return app(env, start_response)
        except:
            raise env['wsgi.error_class'], cgitb.html(sys.exc_info())
    return middleware

...and more complicated middleware that uses this concept:

class AddContent:

    def __init__(self, app, header='', footer=''):
        self.app = app
        self.header = header
        self.footer = footer

    def __call__(self, env, start_response):
        return AddContentHandler(self, env, start_response).run()

    def add_length(self, length):
        return length + len(self.header) + len(self.footer)


class AddContentHandler:

    def __init__(self, add_content, env, start_response):
        self.env = env
        self.orig_start_response = start_response
        self.add_content = add_content
        self.written_header = False
        self.publish_extension()

    def publish_extension(self):
        self.env.setdefault('wsgi.extensions', []).append('add_content')
        self.env.setdefault('add_content.instance', []).append(self.add_content)

    def start_response(self, status, headers):
        # Grow any Content-Length so it covers the added header and footer
        new_headers = []
        for name, value in headers:
            if name.lower() == 'content-length':
                value = str(self.add_content.add_length(int(value)))
            new_headers.append((name, value))
        self.orig_write = self.orig_start_response(status, new_headers)
        return self.write

    def write(self, data):
        if not self.written_header:
            self.orig_write(self.add_content.header)
            self.written_header = True
        return self.orig_write(data)

    def run(self):
        try:
            result = self.add_content.app(self.env, self.start_response)
        except self.env['wsgi.error_class'], e:
            # wrap exception html -- try not to duplicate header
            html = str(e)
            if not self.written_header:
                self.written_header = True
                html = self.add_content.header + html
            html += self.add_content.footer
            raise self.env['wsgi.error_class'], html
        else:
            self.result = iter(result)
            return self

    def __iter__(self):
        if not self.written_header:
            self.written_header = True
            yield self.add_content.header
        for i in self.result:
            yield i
        yield self.add_content.footer

-Tony From py-web-sig at xhaus.com Tue Aug 31 17:16:31 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Tue Aug 31 17:15:47 2004 Subject: [Web-SIG] Iterator protocols. In-Reply-To: <5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com> References: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> <5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com> Message-ID: <4134964F.4030201@xhaus.com> Dear Phillip, OK, now I understand what you're saying about iterators. Sorry for being so thick, and thanks for your patience. More below. [Alan Kennedy] >> I really am confused now by what you say. Is it possible that you're >> misunderstanding my approach? [Phillip J. Eby] > No; your approach just isn't portable, and breaks the cross-server > compatibility that's the point of WSGI. See below. [Snip some java code posted by Alan] [Phillip J. Eby] > ...this code won't work if the application returns, say, a list. But a > list *would* be a perfectly valid iterable in a "normal" WSGI server or > gateway; therefore, this approach is broken. > > Meanwhile, an application that wants to support running in pre-2.2 > containers *other* than yours, is now forced to implement *both* the old > and the new protocol! > > This is clearly broken, since there's no reason to require > backward-compatible application code to implement a protocol that isn't > implemented by the version of Python they're trying to support.
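The "hardcode both protocols" option discussed in this thread, for a server that cannot lean on a Python `for` loop, can be sketched in Python for readability; a Java gateway would need to make the equivalent calls through Jython's own API. This is an illustrative sketch only: `drain` is a hypothetical name, and the check for `__next__` is there only so the code also runs on Python 3, where 2.2's `next()` method was renamed.

```python
def drain(result, write):
    """Send each chunk of `result` to `write`, hardcoding both protocols.

    New (2.2) protocol: call __iter__(), then next() until StopIteration.
    Old (pre-2.2) protocol: call __getitem__(0), (1), ... until IndexError."""
    get_iter = getattr(result, "__iter__", None)
    if get_iter is not None:
        iterator = get_iter()
        # Python 2.2 spells the method next(); Python 3 renamed it __next__().
        step = getattr(iterator, "__next__", None) or getattr(iterator, "next")
        while True:
            try:
                chunk = step()
            except StopIteration:
                break
            write(chunk)
    else:
        index = 0
        while True:
            try:
                chunk = result[index]
            except IndexError:
                break
            write(chunk)
            index += 1


if __name__ == "__main__":
    class OldStyle:
        # Hypothetical application result using only the old protocol.
        def __getitem__(self, i):
            if i < 3:
                return "chunk%d" % i
            raise IndexError(i)

    out = []
    drain(OldStyle(), out.append)
    drain(["list", "items"], out.append)
    print(out)  # ['chunk0', 'chunk1', 'chunk2', 'list', 'items']
```

Lists and other built-in sequences always take the first branch because they define `__iter__`; the fallback branch is reached only by objects that implement nothing but `__getitem__`. Exceptions other than `StopIteration` and `IndexError` propagate to the caller unchanged.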
My misunderstanding was based on the fact that I mistakenly thought that the application object authors would always implement the 2.2 iterator protocol on their own objects, i.e. explicit .__iter__() and .next() methods, etc: I forgot that they could just return a simple python object, e.g. list, etc, which is of course an iterable as well. [Phillip J. Eby] > .... perhaps you should check whether there is a Java API you can > access from Jython that's akin to PyObject_GetIter() in the C API, > that's used in both Jython 2.1 and Jython 2.2; then your code will be > forward and backward compatible without implementing both the old and > the new protocols. Unfortunately not: jython 2.1 does not have such a method in the PyObject API. The only iterator related methods in the jython 2.1 PyObject API are __getitem__() __len__() Jython 2.2alpha does have 2.2 iterator support, i.e. all built-in sequence objects implement the 2.2 iterator protocol. http://cvs.sourceforge.net/viewcvs.py/jython/jython/org/python/core/PyObject.java?rev=2.30&view=log But jython 2.2 is unfortunately currently out-of-the-question: not production quality yet. And it could be a while before it becomes production quality. I want to create a robust jython WSGI solution for right now. [Phillip J. Eby] > If there is no such API, and you want to support the 2.2 protocol, > you'll need to hardcode both the old and new protocols, due to the fact > that you're not coding in Python (where a simple "for" loop suffices to > ensure portability). I see now that that is my only option. Which is fine, it's not actually that much work. And I would have to do some of it for WSGI anyway, due to the requirements relating to application objects with __len__ methods, etc. [Alan Kennedy] >> To me, the purpose of implementing the 2.2 iterator protocol is so >> that applications and components run inside my framework will work, if >> they support the 2.2 iterator protocol. 
I'm really not interested in >> the pre-2.2 protocol at all, though I suppose I should be if people >> want to run pre-2.2 iterable components in my framework. [Phillip J. Eby] > If a piece of code is written for 2.2 and its iterator protocol, why do > you think it'll work in your server at all? To me, the whole point of implementing the 2.2 iterator protocol under jython 2.1 was so that there is at least a sporting chance that third-party WSGI components written for cpython 2.2 will run under my 2.1 container. I only want to do what I can to make sure that jython is not left behind ..... [Phillip J. Eby] > It's far more likely that > the only code you can run in your server will be code written for a 2.1 > version of Python. I'm hoping to maximize portability, and to minimize dependencies. [Phillip J. Eby] > And such code, if it has an iterable at all, is > going to be written to the old iterator protocol, because it will > presumably want to be able to run in pre-2.2 CPython containers, too. Well, as I mentioned above, I will attempt to explicitly support both the old and new iterator protocols. Do you think other folks developing embedded (i.e. not coded in python) frameworks should consider the same? [Phillip J. Eby] > So, no matter what, *no* code is going to work in your server unless it > was specifically written for your server: the exact opposite of the > point of WSGI. And framework-specificity is the very thing that I want to avoid most. Kind regards, Alan. From pje at telecommunity.com Tue Aug 31 17:29:09 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Aug 31 17:28:41 2004 Subject: [Web-SIG] Status code, status header In-Reply-To: <4133EA02.6090301@colorstudy.com> Message-ID: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> At 10:01 PM 8/30/04 -0500, Ian Bicking wrote: >After a little thought, I'm -1 on a status header, even with email.Message. 
I think email.Message is also dead, due to its absence in Python versions prior to 2.2. >I'm also +1 on turning status into an integer. I think it makes things a >little simpler, and those message strings are just a distraction. The >final server can put that string in ("200 OK", etc) if it wants to, but if >it doesn't it doesn't matter. I'm still -1 on this, for the reasons stated previously. I might be convinced if you can show me that a significant number of popular servers already have the necessary table(s) to do this with; e.g. Twisted, ZServer, Apache (CGI/FastCGI), mod_python, etc. In theory, the "reason-phrase" can be null. In practice, I wonder. Also, I don't think the message strings are "just a distraction": they clarify the intent of the code that contains them. From pje at telecommunity.com Tue Aug 31 17:42:42 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Aug 31 17:42:11 2004 Subject: [Web-SIG] The write callable (vs. file-like object) In-Reply-To: <4133EB87.7060501@colorstudy.com> Message-ID: <5.1.1.6.0.20040831112916.0356d740@mail.telecommunity.com> At 10:07 PM 8/30/04 -0500, Ian Bicking wrote: >Another comment I meant to make, but forgot about amid exceptions. The >write callable is a bit awkward, because most code wants a file-like >object, not a callable. What other file-like properties do you want it to have? Keep in mind that the average response should be sent by calling 'write()' at most *once*, to write the entire page content, buffering the output of some template. 'write()' imposes a potentially high synchronization cost that reduces throughput if it's overused. It should *not* be used as the target of output from any kind of page template. Application frameworks should buffer template output (e.g. to a StringIO) and then either 'write()' or yield the result. 
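The following is a minimal sketch of the buffering pattern just described: render into an in-memory buffer, then hand the whole page to the server in one piece. It assumes modern Python 3 conventions from PEP 3333 (the body must be bytes, and `start_response` takes a native-string status). The application name `buffered_app` and the rendered rows are hypothetical. Returning a one-element list, rather than calling `write()` once per row, lets the server send the page in a single operation and makes it easy to supply an accurate Content-Length.

```python
import io


def buffered_app(environ, start_response):
    """Hypothetical WSGI application that buffers its output before sending it."""
    buffer = io.StringIO()
    buffer.write("<ul>\n")
    for n in range(3):
        # Each template fragment goes into the in-memory buffer, not to write().
        buffer.write("<li>item %d</li>\n" % n)
    buffer.write("</ul>\n")

    body = buffer.getvalue().encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # A single chunk: the server can transmit the whole page at once.
    return [body]


if __name__ == "__main__":
    def fake_start_response(status, headers):
        print(status, headers)

    print(b"".join(buffered_app({}, fake_start_response)).decode("utf-8"))
```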
Multiple calls to 'write()' are for streaming output only, such as each segment of a multipart server push, or for supporting frameworks that can't work any other way. I guess I need to beef up the parts that say this. The preferred mechanism for generating WSGI output is via the iterable return value, as it allows the maximum concurrency and throughput for the server. If we didn't need it for backward-compatibility with existing frameworks, 'write()' and 'start_response()' simply wouldn't exist, and the status and headers would be part of the return value as well. >Like, the ability to write unicode; even if we leave it out now I don't >see any good place where that could be added in the future, as the >interface is rather minimal in that area. But my thinking is a little >fuzzy in that area. If Python currently had a "byte array" type, we'd be using that instead of strings. Direct writing of Unicode isn't intended to ever be directly supported by the standard, although in principle you could create some kind of "encoding middleware" that sits directly atop the application. (An application or framework written to it would technically not be WSGI-compliant.) I guess I need to add something about byte arrays to the spec, especially since Java/Jython may have this issue today (i.e. strings are Unicode, but for HTTP a byte array is needed). From amk at amk.ca Tue Aug 31 17:43:44 2004 From: amk at amk.ca (A.M. Kuchling) Date: Tue Aug 31 17:44:19 2004 Subject: [Web-SIG] Status code, status header In-Reply-To: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> References: <4133EA02.6090301@colorstudy.com> <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> Message-ID: <20040831154344.GA17594@rogue.amk.ca> On Tue, Aug 31, 2004 at 11:29:09AM -0400, Phillip J. Eby wrote: > At 10:01 PM 8/30/04 -0500, Ian Bicking wrote: > >After a little thought, I'm -1 on a status header, even with email.Message. 
> > I think email.Message is also dead, due to its absence in Python versions > prior to 2.2. Do note that rfc822.py is on the road to deprecation, presumably in favour of email.Message. If email.Message has problems, therefore, you should try to fix them. --amk From pje at telecommunity.com Tue Aug 31 17:47:51 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Aug 31 17:47:19 2004 Subject: [Web-SIG] Iterator protocols. In-Reply-To: <4134964F.4030201@xhaus.com> References: <5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com> <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com> <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com> <5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040831114403.0356da10@mail.telecommunity.com> At 04:16 PM 8/31/04 +0100, Alan Kennedy wrote: >[Phillip J. Eby] > > And such code, if it has an iterable at all, is > > going to be written to the old iterator protocol, because it will > > presumably want to be able to run in pre-2.2 CPython containers, too. > >Well, as I mentioned above, I will attempt to explicitly support both the >old and new iterator protocols. > >Do you think other folks developing embedded (i.e. not coded in python) >frameworks should consider the same? I don't think this is going to be an issue anywhere else; AFAIK any other non-CPython target will have 2.2 iterator support built-in. For CPython 2.2 and up, 'PyObject_GetIter()' will do. If somebody needs to support earlier versions, they should just implement the old iterator protocol. It doesn't make any sense to try to support CPython 2.1 objects implementing a CPython 2.2 protocol, the special case of Jython notwithstanding. From pje at telecommunity.com Tue Aug 31 17:55:24 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue Aug 31 17:54:51 2004 Subject: [Web-SIG] Status code, status header In-Reply-To: <20040831154344.GA17594@rogue.amk.ca> References: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> <4133EA02.6090301@colorstudy.com> <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040831115305.0356e440@mail.telecommunity.com> At 11:43 AM 8/31/04 -0400, A.M. Kuchling wrote: >On Tue, Aug 31, 2004 at 11:29:09AM -0400, Phillip J. Eby wrote: > > At 10:01 PM 8/30/04 -0500, Ian Bicking wrote: > > >After a little thought, I'm -1 on a status header, even with > email.Message. > > > > I think email.Message is also dead, due to its absence in Python versions > > prior to 2.2. > >Do note that rfc822.py is on the road to deprecation, presumably in >favour of email.Message. If email.Message has problems, therefore, >you should try to fix them. It doesn't actually have any serious problems w/respect to WSGI usage, just stuff we don't need. However, despite our change to a 2.2.2 version target, Jython has since then emerged as a use case, so I believe we're moving back to at least a 2.1 version target. IIRC, 'email.Message' isn't available in 2.1. Anyway, the alternative is "list of (name,value) tuples", not anything from the rfc822 module. From pje at telecommunity.com Tue Aug 31 18:11:03 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue Aug 31 18:10:49 2004 Subject: [Web-SIG] wsgi.fatal_errors In-Reply-To: <55924.68.122.69.79.1093932955.squirrel@*> References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> At 11:15 PM 8/30/04 -0700, tony@lownds.com wrote: > > Here are some changes I've proposed in the last few days to resolve issues > > people brought up, but which I haven't gotten much feedback on: > > > > * 'wsgi.fatal_errors' key for exceptions that apps and middleware > > shouldn't > > trap > > > >What about defining an exception class that applications can raise with an >HTML payload, which servers are supposed to send to the client? >Middleware should be free to alter the payload as much as they like. The >server should not send the payload when content-type is not html. > >By using exceptions as a backchannel, the application and middleware do >not have to keep track of the state to sanely handle an error. Interesting. But I think you've just given me an idea for a possibly simpler way to do this, with some other advantages. Suppose that instead of 'start_response(status,headers)' we had 'set_response(status,headers,body=None)'. And the difference would be that our 'set_response' does nothing until/unless you call write() or yield a result from the return iterable. Therefore, you could call 'set_response' multiple times, with only the last such call taking effect. (If you supply a non-None 'body', then calling write() or returning an iterable is an error.) Now consider error handling middleware: it simply calls 'set_response(error_status,error_headers,error_body)', and returns None. At this point, we've isolated the complexity to exist only for streaming responses once the first body chunk has been generated. We can handle this by making a call to 'set_response()' a fatal error if a body chunk has been generated.
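The deferred-header idea proposed here can be sketched in modern Python. Everything in it is hypothetical: `set_response`, `PendingResponse`, `ResponseAlreadyStarted`, `run`, and `catch_errors` are illustrative names for the proposal, not part of any published WSGI API. The sketch also only catches exceptions raised while the application is being called, not ones raised later during iteration.

```python
class ResponseAlreadyStarted(Exception):
    """Hypothetical fatal error: a body chunk has already been sent."""


class PendingResponse:
    """Sketch of the proposed set_response() semantics (hypothetical API)."""

    def __init__(self):
        self.status = None
        self.headers = None
        self.body = None
        self.started = False  # True once the first body chunk goes out
        self.sent = []        # stands in for the client connection

    def set_response(self, status, headers, body=None):
        if self.started:
            # Headers are already on the wire; only the server can recover.
            raise ResponseAlreadyStarted(status)
        # Nothing is sent yet, so a later call simply replaces this one.
        self.status, self.headers, self.body = status, headers, body

    def write(self, chunk):
        if self.body is not None:
            raise RuntimeError("body was already supplied to set_response()")
        if not self.started:
            self.started = True
            self.sent.append((self.status, self.headers))
        self.sent.append(chunk)


def run(app, environ):
    """Hypothetical server loop: commit headers only on the first chunk."""
    resp = PendingResponse()
    result = app(environ, resp.set_response)
    if resp.body is not None:
        if result is not None:
            raise RuntimeError("app supplied a body and also returned an iterable")
        resp.sent.append((resp.status, resp.headers))
        resp.sent.append(resp.body)
    else:
        for chunk in result:
            resp.write(chunk)
    return resp.sent


def catch_errors(app):
    """Error-handling middleware: just call set_response() again."""
    def wrapper(environ, set_response):
        try:
            return app(environ, set_response)
        except Exception:
            # If a body chunk was already sent, this raises
            # ResponseAlreadyStarted, which propagates to the server.
            set_response("500 Internal Server Error",
                         [("Content-Type", "text/plain")], b"Internal error")
            return None
    return wrapper
```

The middleware never checks whether headers have gone out. If they have, `set_response()` raises and the exception propagates to the server, which is where the proposal wants that complexity to live.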
Thus, no special handling is needed by an exception handler: it just tries to do 'set_response()', and allows the fatal error (if any) to propagate. Now, the server can catch the fatal error and deal with it. I think this will let us keep all of the complications in the server, where they always have to exist, no matter what else we do. Exception-handling middleware is then delightfully simple. On the other hand, output-transforming middleware becomes somewhat more complex, as it would now have three output sources to transform (body param to set_response(), write(), and output iterable). This is a fairly significant change to the spec, that introduces lots of new angles to cover. But, I think it could be an "exceptionally" clean solution to the problem. ;) From py-web-sig at xhaus.com Tue Aug 31 19:35:43 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Tue Aug 31 19:31:11 2004 Subject: [Web-SIG] The write callable (vs. file-like object) In-Reply-To: <5.1.1.6.0.20040831112916.0356d740@mail.telecommunity.com> References: <5.1.1.6.0.20040831112916.0356d740@mail.telecommunity.com> Message-ID: <4134B6EF.2010708@xhaus.com> [Phillip J. Eby] > If Python currently had a "byte array" type, we'd be using that instead > of strings. Direct writing of Unicode isn't intended to ever be > directly supported by the standard, although in principle you could > create some kind of "encoding middleware" that sits directly atop the > application. (An application or framework written to it would > technically not be WSGI-compliant.) > > I guess I need to add something about byte arrays to the spec, > especially since Java/Jython may have this issue today (i.e. strings are > Unicode, but for HTTP a byte array is needed). Hmmm: looking under the jython covers, I think there is no problem with binary strings. 
org.python.core.PyFile implements the write method for *binary* data by transcoding the Unicode string using the java.lang.String.getBytes(int,int,byte[],int) method (which is deprecated because it doesn't transcode unicode characters properly).

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(int,%20int,%20byte[],%20int)

The javadoc says:

"Copies characters from this string into the destination byte array. Each byte receives the 8 low-order bits of the corresponding character. The eight high-order bits of each character are not copied and do not participate in the transfer in any way."

Which, AFAICT, is not a problem, because (I'm presuming) jython stores binary data as one byte per character of a string, i.e. the low byte. So the above transcoding would be fine, when you're dealing with bytes, not actual characters.

When the output is *character* data (i.e. the "if (binary)" clause is false, see below), the java.lang.String.getBytes() method is used, which transcodes properly to bytes, according to the "platform's default charset", which is set at JVM startup time.

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes()

If anyone is interested, here is the code for the PyFile.getBytes(String) method, called by PyFile.write().

    protected byte[] getBytes(String s) {
        // Yes, I known the method is depricated, but it is the fastest
        // way of converting between between byte[] and String
        if (binary) {
            byte[] buf = new byte[s.length()];
            s.getBytes(0, s.length(), buf, 0);
            return buf;
        } else
            return s.getBytes();
    }

So, I think all is well here: jython knows how to properly manage byte strings vs. python strings.

Regards,

Alan.

P.S. The spelling mistakes in the code comments above are verbatim from the jython 2.1 codebase.
All other speeling misteaks are my own ;-) From py-web-sig at xhaus.com Tue Aug 31 19:50:31 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Tue Aug 31 19:49:13 2004 Subject: [Web-SIG] wsgi.fatal_errors In-Reply-To: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> Message-ID: <4134BA67.6020603@xhaus.com> [Phillip J. Eby] > This is a fairly significant change to the spec, that introduces lots of > new angles to cover. But, I think it could be an "exceptionally" clean > solution to the problem. ;) +1 on changing the spec until it's perfect, or as close as possible to same. But I'm trying to manage an implementation here as well. It would be nice if we could have a simple versioning scheme on the spec, i.e. a date string or a version label, which I could use as a tag/label in my versioning system. Maybe a change history as well? Just a suggestion. No problem if it's considered too much hassle. Kind regards, Alan. From py-web-sig at xhaus.com Tue Aug 31 21:01:24 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Tue Aug 31 20:56:51 2004 Subject: [Web-SIG] Returned application object and fileno. Message-ID: <4134CB04.2010803@xhaus.com> Dear Sig, Currently the spec says that the application can return an object which has a callable fileno attribute, which can return a file descriptor. The current wording is "If the returned iterable has a fileno attribute, the server may assume that this is a fileno() method returning an operating system file descriptor, and that it is allowed to read directly from that descriptor up to the end of the file, and/or use any appropriate operating system facilities (e.g. the sendfile() system call) to transmit the file's contents. 
If the server does this, it must begin transmission with the file's current position, and end at the end of the file." Problem is that jython doesn't support file descriptors, or the fileno() method. If you invoke fileno() on an org.python.core.PyFile, you get a Py.IOError("fileno() is not supported in jpython") exception. Is there any more portable way that we can detect the application returning a file(-like object)? Maybe checking type(app_object) == types.FileType? Or checking if the object has a read() method? I can imagine that a similar problem may arise later with IronPython on the MS CLR, which I believe doesn't use file descriptors either: like java, it is stream based. Regards, Alan. From pje at telecommunity.com Tue Aug 31 21:05:23 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Aug 31 21:04:59 2004 Subject: [Web-SIG] wsgi.fatal_errors In-Reply-To: <4134BA67.6020603@xhaus.com> References: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040831150321.02820e50@mail.telecommunity.com> At 06:50 PM 8/31/04 +0100, Alan Kennedy wrote: >[Phillip J. Eby] >>This is a fairly significant change to the spec, that introduces lots of >>new angles to cover. But, I think it could be an "exceptionally" clean >>solution to the problem. ;) > >+1 on changing the spec until it's perfect, or as close as possible to same. > >But I'm trying to manage an implementation here as well. > >It would be nice if we could have a simple versioning scheme on the spec, >i.e. a date string or a version label, which I could use as a tag/label in >my versioning system. Maybe a change history as well?
There's a "Last Modified" header: http://www.python.org/peps/pep-0333.html And a revision history: http://cvs.sourceforge.net/viewcvs.py/python/python/nondist/peps/pep-0333.txt Note that both of these will be slightly out of sync with the "real" Python CVS, as both are updated by cronjobs. From pje at telecommunity.com Tue Aug 31 23:21:01 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Aug 31 23:20:30 2004 Subject: [Web-SIG] Re: Pending modifications to PEP 333 In-Reply-To: <20040830193840.6b12ae9b.ods@strana.ru> References: <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com> <4132A151.1000006@colorstudy.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <4132A151.1000006@colorstudy.com> <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040831171705.0238f250@mail.telecommunity.com> At 07:38 PM 8/30/04 +0400, Denis S. Otkidach wrote: >On Mon, 30 Aug 2004 09:33:14 -0400 >"Phillip J. Eby" wrote: > > > >Below is a class we use for headers in our framework for several years. > > >I guess it's more comfortable than list of tuples or email.Message. > > >Anyway, we have to fix only "must have" interface, but not all useful > > >methods. > > > > Hi Denis; thanks for the input. Unfortunately, WSGI needs to either use a > > class/type that's available in the Python standard library, or else a > > simple protocol like "sequence of name,value pairs". > >"sequence of name,value pairs" is OK - my class satisfies this interface if >you mean just iterable object when saying "sequence", and not real list. As it happens, the current spec is ambiguous: it says both "list" and "sequence" in different places. I've standardized it to be "list", as in 'type(headers) is ListType'. This means your approach will require you to call 'list(myHeadersObject)', but it will allow middleware to manipulate the list in-place using boilerplate routines. 
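A minimal sketch of the kind of boilerplate routine that a plain list of (name, value) tuples allows. `replace_header`, `get_header`, and `add_cache_control` are hypothetical helpers, not anything defined by the spec. Header names are compared case-insensitively, as HTTP requires, and the list is modified in place via slice assignment, so the caller's own list object is the one that changes.

```python
def replace_header(headers, name, value):
    """Drop every header called `name` (case-insensitive), then add one, in place."""
    lowered = name.lower()
    headers[:] = [(n, v) for (n, v) in headers if n.lower() != lowered]
    headers.append((name, value))


def get_header(headers, name, default=None):
    """Return the first value for `name` (case-insensitive), or `default`."""
    lowered = name.lower()
    for n, v in headers:
        if n.lower() == lowered:
            return v
    return default


def add_cache_control(app):
    """Hypothetical middleware that forces a Cache-Control header."""
    def wrapper(environ, start_response):
        def my_start_response(status, headers):
            replace_header(headers, "Cache-Control", "no-cache")
            return start_response(status, headers)
        return app(environ, my_start_response)
    return wrapper
```

A headers class like Denis's would first be turned into such a list with `list(my_headers_object)`, after which routines like these apply unchanged.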
From pje at telecommunity.com Tue Aug 31 23:56:11 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Aug 31 23:55:40 2004 Subject: [Web-SIG] Pending modifications to PEP 333 In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040831173043.023a93e0@mail.telecommunity.com> I'm just about to check in a major update to the PEP, per the details below. It will be a while before it shows up in the HTML version of the PEP or the sourceforge ViewCVS, though. At 11:25 PM 8/29/04 -0400, Phillip J. Eby wrote: >Here are some changes I've proposed in the last few days to resolve issues >people brought up, but which I haven't gotten much feedback on: > >* 'wsgi.fatal_errors' key for exceptions that apps and middleware >shouldn't trap > >* 'wsgi.auth_available' flag I've added these to the "Open Issues" section now >* Make the 'headers' object an 'email.Message' (well, there's been some >feedback, but I think I addressed the concerns, and there was no feedback >since) ...and removed this, because it's effectively dead due to lack of popular support, added annoyances, and the need to support pre-2.2 versions of Python. However, I've updated the spec to be unambiguous in requiring a *list* of header tuples, so that middleware and servers can modify the headers in place using boilerplate routines, if desired. >* what should a server or gateway's default error handling be, for each of >the eight contexts in which an exception can occur? Added to open issues. >* notes on writing pre-2.2 compatible iteration code Completed and added to the PEP. >* anything else? The application object must now *always* return an iterable; 'None' is no longer a valid return value. This simplifies server logic and helps encourage the use of an iterable. Also, it's now explicit that the server must not try to use any attributes of the iterable not explicitly mentioned by the PEP (e.g. 'read()' is a no-no). 
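The pre-2.2 compatibility notes mentioned above are not reproduced in this thread, so the following is only a sketch of the general technique, written in modern Python. It shows an application iterable that supports both the 2.2+ protocol (`__iter__` plus `next`, spelled `__next__` in Python 3) and the older `__getitem__` protocol, in which the caller passes increasing indexes until `IndexError` is raised. `ChunkIterable`, `consume_pre22`, and `consume_modern` are hypothetical names. Neither consumer uses anything but iteration, in line with the rule that servers must not touch attributes such as `read()`.

```python
class ChunkIterable:
    """Application return value usable under both iteration protocols."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._pos = 0

    # Python 2.2+ protocol: iter() returns self, next() yields chunks.
    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._chunks):
            raise StopIteration
        chunk = self._chunks[self._pos]
        self._pos += 1
        return chunk

    next = __next__  # the method name Python 2.x looks for

    # Pre-2.2 protocol: the index is ignored; IndexError ends the loop.
    def __getitem__(self, index):
        try:
            return self.__next__()
        except StopIteration:
            raise IndexError(index)


def consume_pre22(result):
    """How a pre-2.2 server would drain a result: index until IndexError."""
    chunks = []
    index = 0
    while True:
        try:
            chunk = result[index]
        except IndexError:
            break
        chunks.append(chunk)
        index += 1
    return chunks


def consume_modern(result):
    """A 2.2+ server just uses a for loop (iter() under the hood)."""
    return [chunk for chunk in result]
```

A plain list also satisfies both consumers, which is one reason returning a list is the simplest portable choice for an application.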
I've also clarified that 'fileno()', if present, *must* be an OS file descriptor, and is only relevant to servers on platforms where file descriptors exist. I've also done a significant edit to further clarify that the 'write()' callable is a backward compatibility hack, and isn't intended to be used unless you really, really need it. I've also significantly clarified the issues surrounding buffering and streaming. I also refactored the examples to be more compliant with the spec's intentions and to be more explanatory/exemplary of desirable behaviors. Last, but not least, the language regarding a server modifying or deleting application-supplied headers has been clarified to restrict its applicability to connection-management headers, and to clarify where any replaced or deleted headers should be recorded.
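One possible reading of the `fileno()` rule, sketched in modern Python. The server treats `fileno()` purely as a fast path and falls back to ordinary iteration when the method is missing or raises (as it does on Jython 2.1, or on an `io.BytesIO`). It never probes for `read()`. `send_result` and its `send` callback are hypothetical names; this is not the PEP's reference code.

```python
import os


def send_result(result, send, blocksize=8192):
    """Transmit an application's return value, using fileno() only if it works.

    `send` is a hypothetical callable that writes bytes to the client.
    """
    fileno = getattr(result, "fileno", None)
    fd = None
    if fileno is not None:
        try:
            fd = fileno()
        except (OSError, ValueError, AttributeError):
            # e.g. Jython 2.1 raises IOError ("fileno() is not supported");
            # io objects raise io.UnsupportedOperation (an OSError/ValueError).
            fd = None
    if fd is not None:
        # Read from the descriptor's current position to end of file.
        # Caveat: for a buffered Python file that has already been read from,
        # the OS-level position may be ahead of the Python-level one.
        while True:
            block = os.read(fd, blocksize)
            if not block:
                break
            send(block)
    else:
        # Portable fallback: plain iteration, nothing else assumed.
        for chunk in result:
            send(chunk)
```

The descriptor branch could equally use `os.sendfile()` where the platform offers it; plain `os.read()` keeps the sketch portable across POSIX and Windows.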