Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25905

Modified Files:
    pep-0333.txt
Log Message:
Major clarifications and some minor semantic changes; see
http://mail.python.org/pipermail/web-sig/2004-August/000730.htm
for a complete summary of the changes and their rationales.


Index: pep-0333.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0333.txt,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -d -r1.3 -r1.4
--- pep-0333.txt 30 Aug 2004 03:04:59 -0000 1.3
+++ pep-0333.txt 31 Aug 2004 22:00:28 -0000 1.4
@@ -53,7 +53,7 @@
 
 However, since no existing servers or frameworks support WSGI, there
 is little immediate reward for an author who implements WSGI support.
-Thus, WSGI *must* be easy to implement, so that an author's initial
+Thus, WSGI **must** be easy to implement, so that an author's initial
 investment in the interface can be reasonably low.
 
 Thus, simplicity of implementation on *both* the server and framework
@@ -146,17 +146,23 @@
         """Simplest possible application object"""
         status = '200 OK'
         headers = [('Content-type','text/plain')]
-        write = start_response(status, headers)
-        write('Hello world!\n')
+        start_response(status, headers)
+        return ['Hello world!\n']
 
 
     class AppClass:
-        """Much the same thing, but as a class
+        """Produce the same output, but using a class
 
-        (Note: 'AppClass' is the "application", so calling it
-        returns an instance of 'AppClass', which is the iterable
-        return value of the "application callable", as required
-        by the spec.)
+        (Note: 'AppClass' is the "application" here, so calling it
+        returns an instance of 'AppClass', which is then the iterable
+        return value of the "application callable" as required by
+        the spec.)
+
+        If we wanted to use *instances* of 'AppClass' as application
+        objects instead, we would have to implement a '__call__'
+        method, which would be invoked to execute the application,
+        and we would need to create an instance for use by the
+        server or gateway.
         """
 
         def __init__(self, environ, start_response):
@@ -167,10 +173,7 @@
             status = '200 OK'
             headers = [('Content-type','text/plain')]
             self.start(status, headers)
-
             yield "Hello world!\n"
-            for i in range(1,11):
-                yield "Extra line %s\n" % i
 
 Throughout this specification, we will use the term "a callable" to
 mean "a function, method, class, or an instance with a ``__call__``
@@ -200,22 +203,26 @@
         environ['wsgi.multiprocess'] = True
         environ['wsgi.last_call'] = True
 
+        def write(data):
+            sys.stdout.write(data)
+            sys.stdout.flush()
+
         def start_response(status,headers):
-            write = sys.stdout.write
-            write("Status: %s\r\n" % status)
+
+            sys.stdout.write("Status: %s\r\n" % status)
             for key,val in headers:
-                write("%s: %s\r\n" % (key,val))
-            write("\r\n")
+                sys.stdout.write("%s: %s\r\n" % (key,val))
+            sys.stdout.write("\r\n")
+            return write
 
         result = application(environ, start_response)
-        if result is not None:
-            try:
-                for data in result:
-                    sys.stdout.write(data)
-            finally:
-                if hasattr(result,'close'):
-                    result.close()
+        try:
+            for data in result:
+                write(data)
+        finally:
+            if hasattr(result,'close'):
+                result.close()
 
 In the next section, we will specify the precise semantics that
 these illustrations are examples of.
@@ -227,12 +234,12 @@
 The application object must accept two positional arguments.  For
 the sake of illustration, we have named them ``environ`` and
 ``start_response``, but they are not required to have these names.
-A server or gateway *must* invoke the application object using
+A server or gateway **must** invoke the application object using
 positional (not keyword) arguments.  (E.g. by calling ``result =
 application(environ,start_response)`` as shown above.)
 
 The ``environ`` parameter is a dictionary object, containing CGI-style
-environment variables.  This object *must* be a builtin Python
+environment variables.  This object **must** be a builtin Python
 dictionary (*not* a subclass, ``UserDict`` or other dictionary
 emulation), and the application is allowed to modify the dictionary
 in any way it desires.  The dictionary must also include certain
@@ -243,7 +250,7 @@
 The ``start_response`` parameter is a callable accepting two
 positional arguments.  For the sake of illustration, we have named
 them ``status`` and ``headers``, but they are not required to have
-these names, and the application *must* invoke the ``start_response``
+these names, and the application **must** invoke the ``start_response``
 callable using positional arguments (e.g.
 ``start_response(status,headers)``).
 
@@ -252,21 +259,31 @@
 tuples describing the HTTP response header.  This ``start_response``
 callable must return a ``write(body_data)`` callable that takes one
 positional parameter: a string to be written as part of the HTTP
-response body.
+response body.  (Note: the ``write()`` callable is provided only
+to support certain existing frameworks' imperative output APIs;
+it should not be used by new applications or frameworks.  See
+the `Buffering and Streaming`_ section for more details.)
 
-The application object may return either ``None`` (indicating that
-there is no additional output), or it may return a non-empty
-iterable yielding strings.  (For example, it could be a
-generator-iterator that yields strings, or it could be a
-sequence such as a list of strings.)  The server or gateway will
-treat the strings yielded by the iterable as if they had been
-passed to the ``write()`` method.  If a call to ``len(iterable)``
-succeeds, the server must be able to rely on the result being
-accurate.  That is, if the iterable returned by the application
-provides a working ``__len__()`` method, it *must* return an
-accurate result.
+The application object must return an iterable yielding strings.
+(For example, it could be a generator-iterator that yields strings,
+or it could be a sequence such as a list of strings.)  The server
+or gateway must transmit these strings to the client in an
+unbuffered fashion, completing the transmission of each string
+before requesting another one.  (See the `Buffering and Streaming`_
+section below for more on how application output must be handled.)
 
-If the returned iterable has a ``fileno`` attribute, the server *may*
+The server or gateway must not modify supplied strings in any way;
+they must be treated as binary byte sequences with no character
+interpretation, line ending changes, or other modification.  The
+application is responsible for ensuring that the string(s) to be
+written are in a format suitable for the client.
+
+If a call to ``len(iterable)`` succeeds, the server must be able
+to rely on the result being accurate.  That is, if the iterable
+returned by the application provides a working ``__len__()``
+method, it **must** return an accurate result.
+
+If the returned iterable has a ``fileno`` attribute, the server **may**
 assume that this is a ``fileno()`` method returning an operating
 system file descriptor, and that it is allowed to read directly from
 that descriptor up to the end of the file, and/or use any appropriate
@@ -275,20 +292,36 @@
 transmission with the file's current position, and end at the end of
 the file.
 
-Finally, if the application returned an iterable, and the iterable has
-a ``close()`` method, the server or gateway *must* call that method
-upon completion of the current request, whether the request was
-completed normally, or terminated early due to an error.  (This is to
-support resource release by the application.  This protocol is
-intended to support PEP 325, and also the simple case of an
-application returning an open text file.)
+Note that an application **must not** return an iterable with a
+``fileno`` attribute if it is anything other than a method returning
+an **operating system file descriptor**.  "File-like" objects
+that do not possess a true operating system file descriptor number
+are expressly forbidden.  Servers running on platforms where file
+descriptors do not exist, or where there is no meaningful API for
+accelerating transmission from a file descriptor, should ignore the
+``fileno`` attribute.
 
-(Note: the application *must* invoke the ``start_response()`` callable
-before the iterable yields its first body string, so that the server
-can send headers before any body content.  However, this invocation
-*may* be performed by the iterable's first iteration, so servers *must
-not* assume that ``start_response()`` has been called before they
-begin iterating over the iterable.)
+If the iterable returned by the application has a ``close()`` method,
+the server or gateway **must** call that method upon completion of the
+current request, whether the request was completed normally, or
+terminated early due to an error.  (This is to support resource release
+by the application.  This protocol is intended to support PEP 325, and
+also other simple cases such as an application returning an open text
+file.)
+
+(Note: the application **must** invoke the ``start_response()``
+callable before the iterable yields its first body string, so that the
+server can send the headers before any body content.  However, this
+invocation **may** be performed by the iterable's first iteration, so
+servers **must not** assume that ``start_response()`` has been called
+before they begin iterating over the iterable.)
+
+Finally, servers **must not** directly use any other attributes of
+the iterable returned by the application.  For example, if the
+iterable is a file object, it may have a ``read()`` method, but
+the server **must not** utilize it.  Only attributes specified
+here, or accessed via e.g. the PEP 234 iteration APIs are
+acceptable.
 
 
 ``environ`` Variables
@@ -296,8 +329,8 @@
 
 The ``environ`` dictionary is required to contain these CGI
 environment variables, as defined by the Common Gateway Interface
-specification [2]_.  The following variables *must* be present, but
-*may* be an empty string, if there is no more appropriate value for
+specification [2]_.  The following variables **must** be present, but
+**may** be an empty string, if there is no more appropriate value for
 them:
 
 * ``REQUEST_METHOD``
@@ -446,11 +479,11 @@
 example, to minimize intermingling of data from multiple processes
 writing to the same error log.)
 
-The methods listed in the table above *must* be supported by all
+The methods listed in the table above **must** be supported by all
 servers conforming to this specification.  Applications conforming
-to this specification *must not* use any other methods or attributes
+to this specification **must not** use any other methods or attributes
 of the ``input`` or ``errors`` objects.  In particular, applications
-*must not* attempt to close these streams, even if they possess
+**must not** attempt to close these streams, even if they possess
 ``close()`` methods.
 
 
@@ -462,36 +495,46 @@
 (As with all WSGI callables, the arguments must be supplied
 positionally, not by keyword.)  The ``start_response`` callable is
 used to begin the HTTP response, and it must return a
-``write(body_data)`` callable.
+``write(body_data)`` callable (see the `Buffering and Streaming`_
+section, below).
 
 The ``status`` argument is an HTTP "status" string like ``"200 OK"``
-or ``"404 Not Found"``.  The string *must* be pure 7-bit ASCII,
+or ``"404 Not Found"``.  The string **must** be pure 7-bit ASCII,
 containing no control characters.  It must not be terminated with
 a carriage return or linefeed.
 
-The ``headers`` argument is a sequence of
-``(header_name,header_value)`` tuples.  Each ``header_name`` must be a
-valid HTTP header name, without a trailing colon or other punctuation.
-Each ``header_value`` *must not* include *any* control characters,
-including carriage returns or linefeeds, either embedded or at the
-end.  (These requirements are to minimize the complexity of any
-parsing that must be performed by servers, gateways, and intermediate
-response processors that need to inspect or modify response headers.)
+The ``headers`` argument is a list of ``(header_name,header_value)``
+tuples.  It must be a Python list; i.e. ``type(headers) is
+ListType``, and the server **may** change its contents in any way
+it desires.  Each ``header_name`` must be a valid HTTP header name,
+without a trailing colon or other punctuation.  Each ``header_value``
+**must not** include *any* control characters, including carriage
+returns or linefeeds, either embedded or at the end.  (These
+requirements are to minimize the complexity of any parsing that must
+be performed by servers, gateways, and intermediate response
+processors that need to inspect or modify response headers.)
 
 In general, the server or gateway is responsible for ensuring that
 correct headers are sent to the client: if the application omits
 a needed header, the server or gateway *should* add it.  For example,
 the HTTP ``Date:`` and ``Server:`` headers would normally be supplied
-by the server or gateway.  If the application supplies a header that
-the server would ordinarily supply, or that contradicts the server's
-intended behavior (e.g. supplying a different ``Connection:`` header),
-the server or gateway *may* discard the conflicting header, provided
-that its action is recorded for the benefit of the application author.
+by the server or gateway.  (A reminder for server/gateway authors: HTTP header names are case-insensitive, so be sure to take that into consideration when examining application-supplied headers!)
 
+If the application supplies headers that would affect the persistence
+of the client's connection (e.g. ``Connection:``, "keep-alives", etc.),
+the server or gateway is permitted to discard or modify these headers,
+if the server cannot or will not conform to the application's requested
+semantics.  E.g., if the application requests a persistent connection
+but the server wishes transience, or vice versa.
+
+However, if a server or gateway discards or overrides any application
+header for any reason, it **must** record this action in a log (such as
+the ``wsgi.errors`` log) for the benefit of the application author.
+
 
 Handling the ``Content-Length`` Header
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -510,64 +553,102 @@
 by the iterable.
 
 And, if the server and client both support HTTP/1.1 "chunked
-encoding" [3]_, then the server *may* use chunked encoding to send
+encoding" [3]_, then the server **may** use chunked encoding to send
 a chunk for each ``write()`` call or string yielded by the iterable,
 thus generating a ``Content-Length`` header for each chunk.  This
 allows the server to keep the client connection alive, if it wishes
-to do so.  Note that the server *must* comply fully with RFC 2616 when
-doing this, or else fall back to one of the other strategies for
+to do so.  Note that the server **must** comply fully with RFC 2616
+when doing this, or else fall back to one of the other strategies for
 dealing with the absence of ``Content-Length``.
 
 
-The ``write()`` Callable
--------------------------
-
-The return value of the ``start_response()`` callable is a
-one-argument `write()`` callable, that accepts strings to write as
-part of the HTTP response body.  The server or gateway must
-not modify supplied strings in any way; they must be treated
-as binary byte sequences with no character interpretation, line
-ending changes, or other modification.  The application is responsible
-for ensuring that the string(s) to be written are in a format suitable
-for the client.
-
-Note that the purpose of the ``write()`` callable is primarily to
-support existing application frameworks that support a streaming
-"push" API.  Therefore, strings passed to ``write()`` *must* be sent
-to the client *as soon as possible*; they must *not* be buffered
-unless the buffer will be emptied in parallel with the application's
-continuing execution (e.g. by a separate I/O thread).  If the server
-or gateway does not have a separate I/O thread available, it *must*
-finish writing the supplied string before it returns from each
-``write()`` invocation.
-
-If the application returns an iterable, each string produced by the
-iterable must be treated as though it had been passed to ``write()``,
-with the data sent in an "as soon as possible" manner.  That is,
-the iterable should not be asked for a new string until the previous
-string has been sent to the client, or is buffered for such sending
-by a parallel thread.
+Buffering and Streaming
+-----------------------
 
-Notice that these rules discourage the generation of content before a
-client is ready for it, in excess of the buffer sizes provided by the
-server and operating system.  For this reason, some applications may
-wish to buffer data internally before passing any of it to ``write()``
-or yielding it from an iterator, in order to avoid waiting for the
-client to catch up with their output.  This approach may yield better
-throughput for dynamically generated pages of moderate size, since the
-application is then freed for other tasks.
+Generally speaking, applications will achieve the best throughput
+by buffering their (modestly-sized) output and sending it all at
+once.  When this is the case, applications **should** simply
+return a single-element iterable containing their entire output as
+a single string.
 
-In addition to improved performance, buffering all of an application's
+(In addition to improved performance, buffering all of an application's
 output has an advantage for error handling: the buffered output can
-be thrown away and replaced by an error page, rather than dumping an
+be discarded and replaced by an error page, rather than dumping an
 error message in the middle of some partially-completed output.  For
 this and other reasons, many existing Python frameworks already
 accumulate their output for a single write, unless the application
 explicitly requests streaming, or the expected output is larger than
-practical for buffering (e.g. multi-megabyte PDFs).  So, these
-application frameworks are already a natural fit for the WSGI
-streaming model: for most requests they will only call ``write()``
-once anyway!
+practical for buffering (e.g. multi-megabyte PDFs).)
+
+For large files, however, or for specialized uses of HTTP streaming
+(such as multipart "server push"), an application may need to provide
+output in smaller blocks (e.g. to avoid loading a large file into
+memory).  It's also sometimes the case that part of a response may
+be time-consuming to produce, but it would be useful to send ahead the
+portion of the response that precedes it.
+
+In these cases, applications **should** return an iterator (usually
+a generator-iterator) that produces the output in a block-by-block
+fashion.  These blocks may be broken to coincide with multipart
+boundaries (for "server push"), or just before time-consuming
+tasks (such as reading another block of an on-disk file).
+
+WSGI servers and gateways **must not** delay the transmission
+of any block; they **must** either fully transmit the block to
+the client, or guarantee that they will continue transmission
+even while the application is producing its next block.  A
+server/gateway may provide this guarantee in one of two ways:
+
+1. Send the entire block to the operating system (and request
+   that any O/S buffers be flushed) before returning control
+   to the application, OR
+
+2. Use a different thread to ensure that the block continues
+   to be transmitted while the application produces the next
+   block.
+
+By providing this guarantee, WSGI allows applications to ensure
+that transmission will not become stalled at an arbitrary point
+in their output data.  This is critical for proper functioning
+of e.g. multipart "server push" streaming, where data between
+multipart boundaries should be transmitted in full to the client.
+
+
+The ``write()`` Callable
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some existing application framework APIs support unbuffered
+output in a different manner than WSGI.  Specifically, they
+provide a "write" function or method of some kind to write
+an unbuffered block of data, or else they provide a buffered
+"write" function and a "flush" mechanism to flush the buffer.
+
+Unfortunately, such APIs cannot be implemented in terms of
+WSGI's "iterable" application return value, unless threads
+or other special mechanisms are used.
+
+Therefore, to allow these frameworks to continue using an
+imperative API, WSGI includes a special ``write()`` callable,
+returned by the ``start_response`` callable.
+
+New WSGI applications and frameworks **should not** use the
+``write()`` callable if it is possible to avoid doing so.  The
+``write()`` callable is strictly a hack to support existing
+frameworks' imperative APIs.  In general, applications
+should either be internally buffered, or produce iterable output.
+
+The ``write()`` callable is returned by the ``start_response()``
+callable, and it accepts a single parameter: a string to be
+written as part of the HTTP response body, that is treated exactly
+as though it had been yielded by the output iterable.  In other
+words, before ``write()`` returns, it must guarantee that the
+passed-in string was either completely sent to the client, or
+that it is buffered for transmission while the application
+proceeds forward.
+
+An application **may** return a non-empty iterable even if it
+invokes ``write()``, and that output must be treated normally
+by the server or gateway.
 
 
 Implementation/Application Notes
@@ -594,7 +675,7 @@
 --------------
 
 Servers *should* trap and log exceptions raised by
-applications, and *may* continue to execute, or attempt to shut down
+applications, and **may** continue to execute, or attempt to shut down
 gracefully.  Applications *should* avoid allowing exceptions to
 escape their execution scope, since the result of uncaught exceptions
 is server-defined.
@@ -687,6 +768,48 @@
 a possibility.
 
 
+Supporting Older (<2.2) Versions of Python
+------------------------------------------
+
+Some servers, gateways, or applications may wish to support older
+(<2.2) versions of Python.  This is especially important if Jython
+is a target platform, since as of this writing a production-ready
+version of Jython 2.2 is not yet available.
+
+For servers and gateways, this is relatively straightforward:
+servers and gateways targeting pre-2.2 versions of Python must
+simply restrict themselves to using only a standard "for" loop to
+iterate over any iterable returned by an application.  This is the
+only way to ensure source-level compatibility with both the pre-2.2
+iterator protocol (discussed further below) and "today's" iterator
+protocol (see PEP 234).
+
+(Note that this technique necessarily applies only to servers,
+gateways, or middleware that are written in Python.  Discussion of
+how to use iterator protocol(s) correctly from other languages is
+outside the scope of this PEP.)
+
+For applications, supporting pre-2.2 versions of Python is slightly
+more complex:
+
+* You may not return a file object and expect it to work as an iterable,
+  since before Python 2.2, files were not iterable.
+
+* If you return an iterable, it **must** implement the pre-2.2 iterator
+  protocol.  That is, provide a ``__getitem__`` method that accepts
+  an integer key, and raises ``IndexError`` when exhausted.
+
+Finally, middleware that wishes to support pre-2.2 versions of Python,
+and iterates over application return values or itself returns an
+iterable (or both), must follow the appropriate recommendations above.
+
+(Note: It should go without saying that to support pre-2.2 versions
+of Python, any server, gateway, application, or middleware must also
+use only language features available in the target version, use
+1 and 0 instead of ``True`` and ``False``, etc.)
+
+
+
 Server Extension APIs
 ---------------------
 
@@ -710,7 +833,7 @@
 frameworks to function almost entirely as middleware of various kinds.
 
 So, to provide maximum compatibility, servers and gateways that
-provide extension APIs that replace some WSGI functionality, *must*
+provide extension APIs that replace some WSGI functionality, **must**
 design those APIs so that they are invoked using the portion of the
 API that they replace.  For example, an extension API to access HTTP
 request headers must require the application to pass in its current
@@ -747,9 +870,9 @@
 HTTP 1.1 Expect/Continue
 ------------------------
 
-Servers and gateways *must* provide transparent support for HTTP 1.1's
-"expect/continue" mechanism, if they implement HTTP 1.1.  This may be
-done in any of several ways:
+Servers and gateways **must** provide transparent support for HTTP
+1.1's "expect/continue" mechanism, if they implement HTTP 1.1.  This
+may be done in any of several ways:
 
 1. Reject all client requests containing an ``Expect: 100-continue``
    header with a "417 Expectation failed" error.  Such requests will
@@ -918,71 +1041,28 @@
 Open Issues
 ===========
 
-The format of the ``headers`` passed to the ``start_response``
-callable has seen some debate.  Currently, it is a sequence of tuples,
-but other formats have been suggested, such as a dictionary of lists,
-or an ``email.Message`` object (from the Python standard library's
-``email`` package).  For various practical reasons, the "dictionary of
-lists" approach has been ruled out, but ``email.Message`` is still a
-candidate, as it provides several advantages for many middleware
-developers and some application or framework developers, without an
-excessive burden to anyone else.
-
-Specifically, ``email.Message`` objects offer a mutable data
-structure, (not unlike a case-insensitive dictionary) for containing
-MIME headers, such as those used in an HTTP response.  This makes it
-very easy to modify headers, or add multi-valued headers such as
-``Set-Cookie`` headers.  If the ``headers`` passed to
-``start_response`` were an ``email.Message``, then it would be easy
-for middleware and servers to modify response headers, e.g. to supply
-defaults for missing headers.  It also leads to cleaner-looking
-application code in some cases, e.g.::
-
-    from email.Message import Message
-
-    def application(environ, start_response):
-        headers = Message()
-        headers.set_type("text/plain")
-        headers.add_header("Set-Cookie", "FOO=BAR", path="/foobar")
-        start("200 OK", headers)("Hello world!")
-
-Some have pointed out that this requires the developers of existing
-frameworks to convert whatever header format they use to an
-``email.Message``.  But this is only relevant if the format they
-already use is a list of name/value pairs: in all other cases they
-would have to perform some conversion anyway.
-
-In the event that the ``email.Message`` format is *not* chosen,
-however, application developers will still have the option of using it
-as a helper class.  For example, the code below works with the current
-WSGI spec, by passing the message object's ``items()`` (a list of
-tuples) to ``start_response()``::
-
-    def application(environ, start_response):
-        headers = Message()
-        headers.set_type("text/plain")
-        headers.add_header("Set-Cookie", "FOO=BAR", path="/foobar")
-        start_response("200 OK", headers.items())("Hello world!")
-
-But this doesn't help middleware authors, who would have to convert
-the response headers into a ``Message`` object and back again if they
-needed to modify the headers.
-
-One other issue that's been brought up in relation to
-``email.Message`` is that its ``set_type()`` method also sets a
-``MIME-Version`` header.  In order to comply properly with the MIME
-and HTTP specifications, it would then be necessary for server/gateway
-authors to ensure the presence of a ``Content-Transfer-Encoding``,
-e.g.::
-
-    if ('MIME-Version' in headers and
-        'Content-Transfer-Encoding' not in headers
-    ):
-        headers['Content-Transfer-Encoding'] = "8bit"
-
-Also, ``email.Message`` has various features unrelated to HTTP or WSGI
-that should not be used, and might be distracting or confusing to
-authors.
+* Some persons have requested information about whether the
+  ``HTTP_AUTHENTICATION`` header may be provided by the server.
+  That is, some web servers do not supply this information to
+  e.g. CGI applications, and they would like the application
+  to know that this is the case so it can use alternative
+  means of authentication.
+
+* Error handling: strategies for effective error handling are
+  currently in discussion on the Web-SIG mailing list.  In
+  particular, a mechanism for specifying what errors an
+  application or middleware should *not* trap (because they
+  indicate that the request should be aborted), and mechanisms
+  for servers, gateways, and middleware to handle exceptions
+  occurring at various phases of the response processing.
+
+* Byte strings: future versions of Python may replace today's
+  8-bit strings with some kind of "byte array" type.  Some sort
+  of future-proofing would be good to have, and strategies for
+  this should be discussed on Web-SIG and Python-Dev.  Nearly
+  every string in WSGI is potentially affected by this, although
+  some contexts should perhaps continue to allow strings as long as
+  they're pure ASCII.
 
 
 Acknowledgements
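The revised calling convention in this checkin (the application always returns an iterable, ``close()`` is called unconditionally, ``start_response()`` may be deferred until the first iteration, and ``write()`` output is handled as though it had been yielded) can be exercised with a small in-memory driver. This is a minimal sketch, not part of the PEP: ``run_app``, ``simple_app``, ``lazy_app`` and ``write_app`` are hypothetical names, the code targets a modern Python 3 interpreter, and it uses plain ``str`` body data to mirror the draft's "strings".

```python
def run_app(application, environ):
    """Drive a WSGI-style application in memory (hypothetical helper).

    Returns (status, headers, body) instead of writing to a socket.
    """
    response = {}   # filled in by start_response()
    body = []

    def write(data):
        # Data given to write() is handled exactly as if the application's
        # iterable had yielded it; headers must already have been supplied.
        if 'status' not in response:
            raise AssertionError('body data produced before start_response()')
        body.append(data)

    def start_response(status, headers):
        response['status'] = status
        response['headers'] = headers
        return write

    # Positional (not keyword) arguments, as the spec requires.
    result = application(environ, start_response)
    try:
        # Do not assume start_response() has been called yet: a generator
        # may only call it during its first iteration.
        for data in result:
            write(data)
    finally:
        # close() is called whether or not the request completed normally.
        if hasattr(result, 'close'):
            result.close()
    return response.get('status'), response.get('headers'), ''.join(body)


def simple_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ['Hello world!\n']     # fully buffered, single-element iterable


def lazy_app(environ, start_response):
    # A generator function: its body, including start_response(), runs
    # only when the gateway starts iterating over the result.
    start_response('200 OK', [('Content-type', 'text/plain')])
    yield 'Hello '
    yield 'world!\n'


def write_app(environ, start_response):
    # Imperative style via write(): supported, but discouraged for new code.
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write('Hello ')
    return ['world!\n']           # an iterable may still follow write() calls
```

All three applications produce the same body under this driver; only their buffering style differs.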
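For the large-file case described under "Buffering and Streaming", a generator-iterator can hand the server one block per iteration and rely on the server's mandatory ``close()`` call for cleanup. This is a sketch under stated assumptions: ``make_file_app`` and its ``open_file`` argument are hypothetical, bodies are ``bytes`` because it runs on Python 3, and 8192 is an arbitrary block size.

```python
import io


def make_file_app(open_file, blocksize=8192):
    """Build a WSGI-style app that streams a file block by block.

    `open_file` is a hypothetical zero-argument callable returning a
    binary file-like object; `blocksize` is an arbitrary default.
    """
    def application(environ, start_response):
        start_response('200 OK', [('Content-type', 'application/octet-stream')])
        return _blocks(open_file, blocksize)
    return application


def _blocks(open_file, blocksize):
    # The file is opened on the first iteration, so a server that calls
    # close() before iterating never leaves an open file behind.
    f = open_file()
    try:
        while True:
            block = f.read(blocksize)  # potentially slow: read the next block
            if not block:
                break
            yield block                # hand exactly one block to the server
    finally:
        # Runs when the data is exhausted, or when the server calls the
        # generator's close() early (GeneratorExit is raised at the yield).
        f.close()


if __name__ == '__main__':
    app = make_file_app(lambda: io.BytesIO(b'abcdefghij'), blocksize=4)
    result = app({}, lambda status, headers: None)
    try:
        for block in result:
            print(block)
    finally:
        result.close()
```

Because generator objects already provide ``close()``, the application needs no extra wrapper class to satisfy the "server must call close()" rule.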
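The pre-2.2 requirements for applications can be illustrated with an iterable that defines only ``__getitem__`` and signals exhaustion with ``IndexError``, consumed by a server using nothing but a plain "for" loop. This sketch is written for a modern interpreter so that it can be run today (a genuinely pre-2.2 version would also avoid ``True``/``False`` and generators); ``LegacyBlocks``, ``legacy_app`` and ``drain`` are hypothetical names.

```python
class LegacyBlocks:
    """Iterable using only the pre-2.2 sequence protocol (hypothetical class).

    Indexes 0, 1, 2, ... return successive blocks of `data`; IndexError
    signals exhaustion, which a plain "for" loop understands.
    """

    def __init__(self, data, blocksize):
        self.data = data
        self.blocksize = blocksize

    def __getitem__(self, index):
        start = index * self.blocksize
        if start >= len(self.data):
            raise IndexError(index)   # end of output
        return self.data[start:start + self.blocksize]


def legacy_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return LegacyBlocks('Hello world!\n', 5)


def drain(iterable):
    # Server side: a bare "for" loop works with both the pre-2.2
    # __getitem__ protocol and the PEP 234 __iter__ protocol.
    chunks = []
    for chunk in iterable:
        chunks.append(chunk)
    return chunks
```

Here ``drain(legacy_app({}, start_response))`` yields the blocks ``'Hello'``, ``' worl'`` and ``'d!\n'`` even though ``LegacyBlocks`` has no ``__iter__`` method.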
pje@users.sourceforge.net